Counterfactual Civilization Simulation Version -1.0 aka my application to Johannes Mayer's SPAR project

post by Morphism (pi-rogers) · 2024-03-10T10:10:51.339Z

Contents

  0. The Problem
  1. The Plan
    Short Summary
    Long Summary
      Vessel Location
      Ancient Earth Simulation
      The Output
      Bonus: Adapting this into a real-life plan for outer-alignment of an inner-aligned formal-goal AGI
    Full todo list
  2. Progress on a subproblem
    Summary
    The Formalizations
  3. Retrospective
    Notes on thought process
    Notes on time
    Notes on desired output
    Summary of failures and how I'll prevent them in the future
  4. Documentation
  5. What now?

This is my "object level output" submission for Johannes Mayer's 2024 SPAR Application (the linked doc seems to be reused from the 2023 AISC application). Unless otherwise noted, all quote blocks in this post are from the application question doc.

For those of you reading this who aren't Johannes Mayer: I don't think this is the best use of your time, but your judgement on that is likely better than mine (especially since it's conditioned on mine), so if you still want to, read on!

0. The Problem

Make the following assumptions:

  • Reality can be perfectly modelled by a discrete model (including time).
  • You can compute everything that can be computed using finite memory and compute instantly.
  • You know the current state of the world perfectly.
  • You know the laws of physics perfectly.

 

Using these assumptions, come up with a high-level plan that when executed saves the world with very high probability. Be careful not to generate a missing steps plan.

 

If any constraints listed here are holding you back, and you think you could do better without them, ignore them! If there is a change you can make to the instructions such that you can get better outputs, make that change. If you made any changes (including ignoring instructions), briefly list them in the beginning and briefly explain how each change you made is an improvement over the original version.

Here are the changes I have made:

1. The Plan

Short Summary

Grab Eliezer Yudkowsky, Nate Soares, and a bunch of other really smart, really sane people from our world model. Make sure to grab enough people to healthily propagate the species if necessary. Also grab some offices, labs, farms, etc.; everything they would need to survive on a mostly lifeless planet. Then, simulate all of that on ancient Earth, as early as there was enough oxygen in the atmosphere, so probably around 2 billion years ago. Set up the simulation so that it terminates once a predetermined "Signal Event" occurs, and then outputs the contents of a predetermined output channel. Since the result of this simulation can be computed with finite (albeit large) amounts of memory and compute, I can compute it instantly. If the plan worked, the output should contain instructions for a pivotal act that saves the world with very high probability (e.g. code for an aligned one-shot AGI). Execute those instructions.
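The control flow in this summary (simulate, stop at the Signal Event, read the output channel) can be sketched as a small driver loop. This is a hypothetical illustration, not part of the original plan: `step`, `signal_event`, and `read_output` stand in for the god-computer's physics update, the predetermined Signal Event, and the predetermined output channel, and the `max_steps` cap is an added assumption so that the sketch always halts.

```python
from typing import Callable, Optional, TypeVar

State = TypeVar("State")


def run_until_signal(
    initial: State,
    step: Callable[[State], State],          # hypothetical: one tick of the physics model
    signal_event: Callable[[State], bool],   # hypothetical: the predetermined Signal Event
    read_output: Callable[[State], bytes],   # hypothetical: the predetermined output channel
    max_steps: int,                          # assumption: cap so the sketch always halts
) -> Optional[bytes]:
    """Advance the simulation until the Signal Event fires, then return the output channel."""
    state = initial
    for _ in range(max_steps):
        if signal_event(state):
            return read_output(state)
        state = step(state)
    return None  # the Signal Event never occurred within the step budget


# Toy usage: the "world" is an integer counter and the Signal Event fires at 5.
print(run_until_signal(0, lambda s: s + 1, lambda s: s >= 5, lambda s: str(s).encode(), max_steps=100))
# prints b'5'
```

Returning `None` rather than raising keeps "no signal yet" distinguishable from an actual (possibly empty) output.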

Long Summary

Vessel Location

Assuming our complete knowledge of the world state is only a low-level physics model, it is nontrivial to "grab" high-level structures like people and farms. Also, our world model presumably contains the whole universe, so it is nontrivial to even locate the Earth within it. My workaround is to put all of the people and things we want to grab into a designated area that we will call "the vessel". Then we will mark the vessel with something easily identifiable in the low-level model. Call this thing "the flag". The flag will be a pattern of tiles on a square grid, with some tiles made of Chromium and others made of Zirconium (metals rare enough that it would be really weird for the pattern to come up naturally, but not so rare that I'm unable to buy the necessary amounts with the vast amount of money I can make using my unbounded computational power). The pattern will be a binary sequence encoding a bunch of data entangled with our planet, e.g. the entirety of Wikipedia (this is probably overkill; just a few bytes of data should be more than enough). Then we will specify this pattern and tell our god-computer to locate the flag and "grab" a certain volume (specified in natural units) below it; this volume will contain the vessel. Things necessary for this part of the plan:
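The post doesn't pin down how the binary sequence maps onto tiles, so the following is only one possible layout, chosen for illustration: each row of the flag holds one byte, most significant bit on the left, with a Chromium tile for a 1 and a Zirconium tile for a 0. The names `encode_flag` and `decode_flag` are hypothetical.

```python
CR, ZR = "Cr", "Zr"  # assumed convention: Chromium tile = 1 bit, Zirconium tile = 0 bit


def encode_flag(data: bytes) -> list[list[str]]:
    """Lay out `data` as a grid of tiles: one row per byte, most significant bit on the left."""
    return [[CR if (byte >> (7 - i)) & 1 else ZR for i in range(8)] for byte in data]


def decode_flag(grid: list[list[str]]) -> bytes:
    """Invert encode_flag: read each 8-tile row back into one byte."""
    out = bytearray()
    for row in grid:
        value = 0
        for tile in row:
            if tile not in (CR, ZR):
                raise ValueError(f"unexpected tile material: {tile!r}")
            value = (value << 1) | int(tile == CR)
        out.append(value)
    return bytes(out)


grid = encode_flag(b"Earth")  # a few bytes, as suggested above
print(decode_flag(grid))      # prints b'Earth'
```

Since the same encoding can be reused with different predefined data, one layout would serve both the input flag and any later flag the simulated civilization builds.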

Ancient Earth Simulation

After grabbing the vessel, we run Earth back 2 billion years (using our perfect physics knowledge, we can run the simulation backwards in time) to the point where the vessel will be placed. Then we place the vessel on the surface of this ancient Earth and run the simulation forward! If the plan is successful, the researchers will likely start by repopulating the Earth and building a civilization that has a higher sanity waterline and is better at coordinating than ours. In particular, this civilization will take AI risk seriously and not rush headfirst into AGI. This civilization will also know that it is being simulated, and it will know how to send a message back to us once it solves alignment. Likely the message will actually end up being sent by the friendly AGI that the civilization builds, which will be able to reason about our world very well (especially given all of the data we put in the vessel) and know the best message to send to us. The flip side of this, though, is that if this civilization dies or goes insane, the message we receive will be built by a different intelligent civilization that evolves, or perhaps by an unfriendly AGI. To prevent this, we will have a specified "check-in" condition that the civilization will use to signal to us that it is still alive and well. One idea I had for such a condition is "if a hundred years go by without n photons in this specified frequency range exiting the Earth's atmosphere (defined in terms of distance from the center of gravity), then terminate the simulation and output an error message with a few snapshots of the vessel destination along its timeline up until that point", where the frequency range is one used for cellular or radio communications (so that it's really easy for our civilization to confirm aliveness), and n is large enough that black-body radiation from the Earth and reflected sunlight aren't enough to meet it on their own.
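One way to read the check-in condition is as a sliding-window watchdog over yearly photon counts. The sketch below assumes, hypothetically, that the simulator can report for each simulated year how many photons in the chosen frequency band crossed the defined atmosphere boundary; it returns the first year at which the trailing hundred-year total falls below `n`. The function name and the per-year granularity are illustrative choices, not part of the original plan.

```python
from collections import deque
from typing import Iterable, Optional


def first_failed_check_in(yearly_counts: Iterable[int], n: int, window_years: int = 100) -> Optional[int]:
    """Return the first simulated year (0-indexed) whose trailing window saw fewer than n photons, else None.

    The k-th element of yearly_counts is the hypothetical number of in-band photons
    that left the atmosphere boundary during year k, as reported by the simulator.
    """
    window = deque()  # counts for the most recent `window_years` years
    total = 0
    for year, count in enumerate(yearly_counts):
        window.append(count)
        total += count
        if len(window) > window_years:
            total -= window.popleft()
        # Only judge once a full window has elapsed, matching "if a hundred years go by".
        if len(window) == window_years and total < n:
            return year  # terminate here and emit the error message plus snapshots
    return None


# Toy usage: 150 years of steady transmissions, then silence.
print(first_failed_check_in([10] * 150 + [0] * 100, n=500))  # prints 200
```

A running total keeps each year's check O(1), which matters little for a god-computer but keeps the sketch honest about what it computes.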
Also, we'll only simulate our solar system (which we'll define as a sphere of a specified radius with the sun's center of gravity as its center) to prevent grabby aliens and faraway superintelligences from hijacking the simulation. Things necessary for this part:
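The solar-system cutoff amounts to computing a center point and discarding everything outside a fixed radius. A minimal sketch, assuming (hypothetically) that the physics model exposes particles with a mass and a position in natural units, that we already know which particles belong to the Sun, and that the Sun's center of gravity is approximated by its center of mass:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    mass: float                           # natural units (assumption)
    position: tuple[float, float, float]  # natural units (assumption)


def center_of_mass(particles: list[Particle]) -> tuple[float, float, float]:
    """Mass-weighted mean position of the given particles (here: the hypothetical Sun particles)."""
    total = sum(p.mass for p in particles)
    x, y, z = (sum(p.mass * p.position[i] for p in particles) / total for i in range(3))
    return (x, y, z)


def restrict_to_sphere(particles: list[Particle], center: tuple[float, float, float], radius: float) -> list[Particle]:
    """Keep only particles within `radius` of `center`; everything else is dropped from the simulation."""
    return [p for p in particles if math.dist(p.position, center) <= radius]


sun = [Particle(1.0, (-1.0, 0.0, 0.0)), Particle(1.0, (1.0, 0.0, 0.0))]
print(center_of_mass(sun))  # prints (0.0, 0.0, 0.0)
```

In a real run the cutoff would presumably be re-applied every step, so that nothing arriving from outside the sphere can influence the simulation.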

The Output

Most of the paragraph below is just speculation about the contents of the output, not technical details of the plan itself. Feel free to skip everything after its first sentence.

Once our simulated beings produce a good plan, they will show it to us by building a new vessel, this time with different (pre-defined) data in the flag, but with the same general structure. Inside the Output Vessel, they will put all of the information that they want to give us. At this point, whoever (or whatever) is doing this is way smarter than I am, so whatever I can think of to put in the Output Vessel is likely worse than what they will actually end up putting, but my speculation will at least provide a lower bound. I think at the very least, they'll give us code for a self-improving one-shot AGI that executes a pivotal act that results in the world being saved with super high probability. I'd also guess that they might put brainscans of themselves into the Output Vessel, so that they can join us in the utopia we build. Unless, of course, friendly AGI is impossible, in which case they will probably just send us a more mundane but still very smart plan for preventing risks from unaligned AGI, or, if that turns out to also be impossible, preventing whatever other x-risks we face (e.g. biorisk). I'd guess they'll do this by simply telling us how to build a civilization like theirs on our world. Things necessary for this part:

Bonus: Adapting this into a real-life plan for outer-alignment of an inner-aligned formal-goal AGI

At some point while coming up with this plan, I realized that it is actually very similar to the QACI alignment plan, and that perhaps we can turn this into a formal goal like QACI. I call this tentative alignment plan "CCS", which stands for Counterfactual Civilization Simulation. The big obstacle to this, of course, is that it would require a specification of actual physics precise enough to simulate the Earth with people on it for possibly billions of years. We don't expect the AGI to actually run this simulation; it's just there to make a formal goal that reliably points to our values.

Comparison of CCS vs QACI:

  • QACI requires a true name of "counterfactual", but that's about it. It just needs to ask, "If we replace this blob with a question, what will most likely replace the answer blob?". Physics and everything else is expected to be inferred from the existence of this "question" blob. CCS, on the other hand, requires a prior specification of an approximation of physics at least good enough to simulate an Earth with humans for billions of years.
  • QACI is a function that must be called recursively (since we aren't expecting anyone to solve alignment fully within the short interval), creating a big complicated graph. There are lots of clever tricks for preventing this from causing a memetic catastrophe, but there are lots of places these tricks can fail. CCS, on the other hand, only needs to be called once. The simulacra solving alignment have a LOT more time than we do, and they can build an entire civilization optimized around our/their goal.
  • QACI is vulnerable to Solomonoff daemons and superintelligences launched within the simulated world (since it is the modern world with all of its AI development, and there might be a bunch of timelines dying during the QACI interval without us realizing). CCS immediately selects a single world without going through the universal prior, and that world is one where they can delay AI development for as long as they want!
  • The output is easier to "grab" from QACI, since it's just a file on a computer that can straightforwardly be interpreted as a math expression. That said, it shouldn't be too hard to rig up something similar for CCS, e.g. by filling the Output Vessel with more Chromium-Zirconium checkerboards that encode the math expression.
  • In general, CCS seems safer but also harder than QACI.

Full todo list

Roughly listed in the chronological order in which we should do these. Things marked with a (*) are things that we would need for an actual real-life alignment plan, but not in the hypothetical scenario.

2. Progress on a subproblem

Summary

I focused on the problem of formalizing flag location so that we can program it into our supercomputer. I started with neutron and proton location, then individual elements, then finding atom densities of regions of space, then seeing how close defined rectangular prisms are to being tiles of the desired type. Finally, this all culminated in a "distance" function that, when minimized, should give us a point in spacetime and some orienting vectors that correspond to the top-left corner of a correctly built vessel flag! Todo: Add measures to defend against flag impersonations created by aliens or alien superintelligences.
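The formalizations themselves aren't reproduced in this post, but the overall shape of the approach (element densities in tile-sized regions, compared against the desired pattern and summed into a distance that is then minimized) can be illustrated on a toy model. This is a hypothetical sketch, not the formalization from the application: the world is a 2D grid with one element label per site, tiles are axis-aligned squares of `tile_size` sites, and the search ranges over top-left corners only, so time and the orienting vectors are left out.

```python
from itertools import product

World = list[list[str]]    # hypothetical toy world: world[y][x] is the element occupying site (x, y)
Pattern = list[list[str]]  # pattern[r][c] is the element the tile in row r, column c should be made of


def tile_mismatch(world: World, x0: int, y0: int, size: int, element: str) -> float:
    """1 minus the fraction of sites in one size-by-size tile that hold the desired element."""
    sites = [world[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    return 1.0 - sites.count(element) / len(sites)


def flag_distance(world: World, pattern: Pattern, x: int, y: int, tile_size: int) -> float:
    """Sum of tile mismatches if the flag's top-left corner were at site (x, y); 0 means a perfect flag."""
    return sum(
        tile_mismatch(world, x + c * tile_size, y + r * tile_size, tile_size, element)
        for r, row in enumerate(pattern)
        for c, element in enumerate(row)
    )


def locate_flag(world: World, pattern: Pattern, tile_size: int) -> tuple[tuple[int, int], float]:
    """Brute-force minimization of flag_distance over every top-left corner where the flag fits."""
    flag_h = len(pattern) * tile_size
    flag_w = len(pattern[0]) * tile_size
    candidates = product(range(len(world) - flag_h + 1), range(len(world[0]) - flag_w + 1))
    best_y, best_x = min(candidates, key=lambda yx: flag_distance(world, pattern, yx[1], yx[0], tile_size))
    return (best_x, best_y), flag_distance(world, pattern, best_x, best_y, tile_size)
```

A correctly built flag scores exactly 0. Note that `min` simply returns the first best candidate, so, like the todo above says, this sketch has no defense against impersonated flags.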

The Formalizations

3. Retrospective

For a "summary" of this section, just jump to the "summary of failures and how I'll prevent them in the future" subsection.

Notes on thought process

Notes on time

Notes on desired output

Summary of failures and how I'll prevent them in the future

4. Documentation

For now I am omitting this from the public post, as it is a little embarrassing. To the best of my knowledge, it does not contain infohazards or private personal information, and the random number generator I used for glomarization did not roll a 1 (it was a d6), so if I do end up publishing it, it will be completely uncensored. I sent this uncensored documentation to Johannes Mayer along with a link to this post as my application to his SPAR team.

5. What now?

Well, I'll continue to develop this plan whenever I think it's the best use of my time. You can track my forecasted probability of pursuing this plan full-time here. If others make different enough predictions, I might subsidize a manifold market on it. It's quite low right now since there are a whole bunch of other things I could be doing in the near-future (including hopefully working on Johannes Mayer's Science Algorithm project :)). Anyone else is of course welcome to work on this as well if they want to. Message me on LW if you come up with anything cool!

  1. But as they say in dath ilan, cheating is technique!
