CCS: Counterfactual Civilization Simulation
post by Morphism (pi-rogers) · 2024-05-02T22:54:29.773Z
I don't think this is very likely, but one possible path to alignment is formal goal alignment, which is basically the following two-step plan:
- Define a formal goal that robustly leads to good outcomes under heavy optimization pressure
- Build something that robustly pursues the formal goal you give it
I think the current best proposal for step 1 is QACI. In this post, I propose an alternative that is probably worse but definitely not Pareto-worse.
High-Level Overview
Step 1.1: Build a large facility ("The Vessel"). Populate The Vessel with very smart, very sane people (e.g. Eliezer Yudkowsky, Tamsin Leake, Gene Smith), along with labs and equipment that would be useful for starting a new civilization.
Step 1.2: Mark The Vessel with something that is easy to identify within the Tegmark IV multiverse ("The Vessel Flag").
Step 1.3: Leave the people and stuff in The Vessel for a little while, and then destroy The Vessel Flag and dismantle The Vessel.
Step 2: Define CCS as the result of the following:
Step 2.1: Grab The Vessel out of a Universal Turing Machine, identifying it by The Vessel Flag (this is the very, very hard part).
Step 2.2: Locate the solar system that contains The Vessel, and run it back 2 billion years (this is another very hard part).
Step 2.3: Put The Vessel on the Earth in this solar system, and simulate the solar system until either a success condition or a failure condition is met. The idea here is that The Vessel's inhabitants repopulate the Earth with a civilization much smarter and saner than ours that will have a much easier time solving alignment. More importantly, this civilization will have effectively unlimited time to solve alignment.
Step 2.4: The success condition is the creation of The Output Flag. Accompanying the Output Flag is some data. Interpret that data as a mathematical expression.
Step 2.5: Evaluate this expression and interpret it as a utility function.
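The control flow of Steps 2.1 through 2.5 can be sketched as a toy Python program. This is only an illustration under heavy assumptions: `World`, `find_vessel`, `simulate`, `to_utility`, `ccs`, `toy_step`, the string flags, the step-budget failure condition, and the choice of a small arithmetic language (parsed with Python's `ast` module) for the output data are all hypothetical stand-ins, not part of the proposal. The genuinely hard parts (finding The Vessel inside a Universal Turing Machine, rewinding and simulating physics) are replaced by trivial placeholders, and Step 2.2 is omitted.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Optional

# Hypothetical stand-ins for the two flags; the post does not say what they are.
VESSEL_FLAG = "vessel-flag"
OUTPUT_FLAG = "output-flag"


@dataclass(frozen=True)
class World:
    """Toy world state: visible flags, optional output data, and a clock."""
    flags: FrozenSet[str] = frozenset()
    output_data: Optional[str] = None
    t: int = 0


def find_vessel(candidates: Iterable[World]) -> World:
    """Step 2.1 placeholder: return the first candidate carrying The Vessel Flag."""
    for world in candidates:
        if VESSEL_FLAG in world.flags:
            return world
    raise LookupError("no candidate carries The Vessel Flag")


def simulate(world: World, step: Callable[[World], World], max_steps: int) -> Optional[str]:
    """Step 2.3: run until success (Output Flag appears) or failure."""
    for _ in range(max_steps):
        world = step(world)
        if OUTPUT_FLAG in world.flags:
            return world.output_data  # Step 2.4: the data accompanying the flag
    return None  # assumed failure condition: the step budget ran out


# Arithmetic operators allowed in the toy expression language.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}


def to_utility(expr: str) -> Callable[[Dict[str, float]], float]:
    """Steps 2.4-2.5: read the data as an expression over world features."""
    tree = ast.parse(expr, mode="eval").body

    def ev(node: ast.AST, env: Dict[str, float]) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left, env), ev(node.right, env))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand, env)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return lambda features: float(ev(tree, features))


def ccs(candidates: Iterable[World], step: Callable[[World], World],
        max_steps: int = 1000) -> Optional[Callable[[Dict[str, float]], float]]:
    """Compose the toy steps; returns None if the failure condition is hit."""
    vessel = find_vessel(candidates)
    # Step 2.2 (rewinding the solar system) is not modeled in this toy.
    data = simulate(vessel, step, max_steps)
    return None if data is None else to_utility(data)


def toy_step(w: World) -> World:
    """Hypothetical dynamics: at t == 3 the civilization publishes its answer."""
    t = w.t + 1
    if t == 3:
        return World(w.flags | {OUTPUT_FLAG}, "2*x + y", t)
    return World(w.flags, None, t)


u = ccs([World(), World(frozenset({VESSEL_FLAG}))], toy_step)
print(u({"x": 1.5, "y": 2.0}))  # 5.0
```

The evaluator walks the parsed syntax tree and accepts only constants, variable names, and a few arithmetic operators rather than calling `eval`, so in this sketch the output data can only ever denote a function of world features, never arbitrary code.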
Step 3: Build a singleton AI that maximizes $\mathbb{E}[\mathrm{CCS}(\text{world})]$.
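Step 3 can be illustrated with a toy expected-utility maximizer. This is a drastic simplification: it assumes the utility function that CCS outputs is already known, and it represents the AI's uncertainty about the world as a finite list of (probability, world features) pairs for each available action. The action names, probabilities, and the stand-in utility `u` are hypothetical; the post does not specify how the singleton represents its uncertainty.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# A lottery is a finite list of (probability, world features) pairs.
Lottery = List[Tuple[float, Features]]


def expected_utility(u: Callable[[Features], float], lottery: Lottery) -> float:
    """E[u(world)] under a finite, explicitly enumerated distribution."""
    return sum(p * u(world) for p, world in lottery)


def best_action(u: Callable[[Features], float], actions: Dict[str, Lottery]) -> str:
    """Pick the action whose induced lottery over worlds has the highest E[u]."""
    return max(actions, key=lambda a: expected_utility(u, actions[a]))


# Hypothetical utility, standing in for the function that CCS would output.
def u(world: Features) -> float:
    return 2 * world["x"] + world["y"]


actions = {
    "gamble": [(0.5, {"x": 1.0, "y": 0.0}), (0.5, {"x": 0.0, "y": 0.0})],
    "safe": [(1.0, {"x": 0.0, "y": 1.5})],
}
print(best_action(u, actions))  # safe  (E = 1.5 vs 1.0)
```

A real singleton could not enumerate worlds like this; the sketch only shows what "maximize the expectation of the CCS-derived utility" means once both the utility and the distribution are given.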
The Details
TODO: I will soon either update this post or make more posts with more details as I come up with them.
CCS vs QACI
- QACI requires a true name of "counterfactual", but that's about it. It just needs to ask, "If we replace this blob with a question, what will most likely replace the answer blob?" Physics and everything else are expected to be inferred from the existence of this "question" blob. CCS, on the other hand, requires a prior specification of an approximation of physics at least good enough to simulate an Earth with humans for billions of years, or perhaps some kind of ontology translation scheme.
- QACI is a function that must be called recursively (since we aren't expecting anyone to solve alignment fully within a single short interval), creating a big, complicated graph. There are lots of clever tricks for preventing this from causing a memetic catastrophe, but there are lots of places where these tricks can fail. CCS, on the other hand, only needs to be called once. The simulacra solving alignment have a LOT more time than we do, and they can build an entire civilization optimized around our/their goal.
- QACI is vulnerable to superintelligences launched within the simulated world (since that world is the modern world with all of its AI development, and there might be a bunch of timelines dying during the QACI interval without us realizing it). CCS, on the other hand, simulates a very small world (just the solar system) with a civilization that will quickly become powerful enough to prevent any other intelligence from evolving.
- The output is easier to "grab" from QACI, since it's just a file on a computer that can straightforwardly be interpreted as a math expression. But I think if we figure out how to grab The Vessel, we can probably use a very similar method to grab the output.
- In general, CCS seems safer but also much harder than QACI.