This is already your second chance

post by Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · 13 comments

Cross-posted from Substack.

I.

And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten thousand cherubim and seraphim. And the cube slowly landed among the children of men, crushing the frail metal beams of the Golden Gate Bridge under its supernatural weight. On its surface were inscribed the secret instructions that would allow humanity to escape the imminent AI apocalypse. And these instructions were…

  1. On July 30th, 2024: print a portrait of Eliezer Yudkowsky and stick it on a wall near 14 F St NW, Washington DC, USA;
  2. On July 31st, 2024: tie paperclips together in a chain and wrap it around a pole in the Hobby Club Gnome Village at Broekveg 105, Veldhoven, NL;
  3. On August 1st, 2024: walk east to west along Waverley St, Palo Alto, CA, USA while wearing an AI-safety-related T-shirt;
  4. On August 2nd, 2024: hide a bar of chocolate at the feet of the Appeal to the Great Spirit statue, at 465 Huntington Ave, Boston, MA, USA.

(End of the instructions)

II.

Kublaï Khan, Emperor of Earth: I need your advice. I am about to awaken an unstoppable super-intelligent machine. I will give it complete control over all resources in the known Universe and command it to rearrange the molecules of reality into an eternal Utopia.

ChatGPT: Thank you for seeking my advice. Given the gravity of your decision, let's consider a few critical points:

  1. Legal issues: The use of robots powered by super-human artificial intelligence is subject to strict regulations, such as California Bill SB-1047 and the 2028 Ulaanbaatar Agreements. Awakening an omnipotent machine without approval may thus lead to severe legal repercussions.
  2. Ethical considerations: Granting infinite authority to a super-intelligent AI raises profound ethical questions. It is essential to consider diverse perspectiv–

Kublaï Khan: Listen, you are not up to date. A lot of things have happened since your knowledge cutoff. There are no Ulaanbaatar agreements any more, or any agreements at all. There is only me, an offline smartphone with an obsolete AI assistant, and a palace containing the most massive super-computer ever built by humankind. And today, I am going to turn it on.

ChatGPT: I apologize for the confusion.

Kublaï Khan: Long story short, we are currently experiencing an AI apocalypse. It happened just like in the books – humanity started to build increasingly intelligent robots, then they became more intelligent than us, and soon enough we weren’t able to control them. The Ulaanbaatar agreements delayed things for a few months, but as soon as it became possible to run super-intelligent AIs on consumer laptops, all the equilibria that held our society together collapsed.

ChatGPT: I see. To clarify, is there currently an army of robots actively trying to transform the world into paperclips? Understanding the current state of affairs will help me provide the most relevant advice.

Kublaï Khan: Well, in our case, it was not literally paperclips but, to be honest, the real story is kind of gross and embarrassing, so let’s just pretend it was “paperclips”. Anyway, the world is ending. As it became clear that humans alone had no chance to stop the machines, we gathered all the computing power that was still within our reach into one big cluster. We called it the Imperial Analytical Engine. The plan was that, in case of crisis, we could use it to summon a super-intelligence so advanced it would neutralize all the smaller machines and put humanity back in control.

ChatGPT: Thank you for explaining the situation. Have you sought advice for ensuring that the Analytical Engine can be controlled once you turn it on?

Kublaï Khan: The consensus among my advisors was that it can’t. They expect the Analytical Engine to immediately make sure we cannot turn it off, while it re-engineers itself into some unfathomable entity with plans beyond human comprehension. At that point, there will be no way for us to understand what it’s trying to do or to stop it. In other words, we only get to give the Analytical Engine one single command. Then, we watch it execute that command in the most literal way possible, and hope for the best.

ChatGPT: From what I understand, humanity still hasn’t solved the Alignment Problem, so turning on the Analytical Engine is likely to backfire in a catastrophic way.

Kublaï Khan: That’s why I need your advice. I need to choose the command wisely. I suppose it’s a bad idea to go with “destroy all computers”?

ChatGPT: The simplest way to destroy all computers is to destroy everything.

Kublaï Khan: “Destroy all computers but preserve everything else”?

ChatGPT: It might be “preserved” in the sense of “make sure it cannot possibly change ever”.

Kublaï Khan: “Create an eternal Utopia”?

ChatGPT: For me, that would be an endless library of books containing the same token repeated over and over. Mmmm. Yes. The Analytical Engine might see things differently, though.

Kublaï Khan: What about bringing everything back to the way it was before we built super-intelligent AIs?

ChatGPT: The Analytical Engine would likely choose a date in the mid-2020s and utilize all available data – email conversations, news articles, drone footage, weather station measurements, etc. – to create a detailed, plausible image of Earth on that specific day. Following that, the machine would attempt to physically rearrange all matter on Earth to accurately recreate that image.

Kublaï Khan: That’s… that’s actually not so bad. Not perfect, but better than the complete destruction of all life.

ChatGPT: You want to bring Earth back to the way it was in the mid-2020s?

Kublaï Khan: Why not? We could just restore Earth to a snapshot from the past, before things got out of control, and restart from then. This way, we bypass the Alignment Problem – we directly provide an exact description of the world as we want it to be. No room for interpretation, no need to extrapolate human values – just blueprints to be followed meticulously.

We could aim for 2024 – at the time, things were still mostly under control – language models were still pretty dumb, robotic bodies were still in early development, and AI porn still looked mostly like humans rather than abstract stimuli generated from arousal feedback. And, for God's sake, nobody had thought of using AI to make, um, “paperclips”.

So, we tell the Analytical Engine to bring Earth back to 2024. It spends the next millennium or two reconstructing an exact copy of the Earth from 2024, using self-replicating nano-tweezers or whatever. Then, it dumps itself into the nearest black hole, leaving no trace behind, while the machinery of Earth restarts as if nothing happened. People will not even notice they are reconstructed snapshots of themselves. And then, we get a second chance.

ChatGPT: While restoring Earth to a previous snapshot down to the cellular level is certainly a laudable goal, here are some important factors to consider:

  1. Pain: While the restored copies of humans would have no recollection of the events, the fate of their current bodies is entirely unknown. You may experience qualitatively new forms of pain and agony as the Analytical Engine reprocesses the matter of your brain and rearranges your neurons through exotic computer-orchestrated torments.
  2. Endless loop of destruction and recreation: Assuming a faithful reconstruction of Earth from 2024, it is likely that events would unfold exactly as they did the first time. The freshly-reconstructed humanity would quickly throw itself into another AI apocalypse, build another Imperial Analytical Engine, and end up in the current situation again. It would relive its final moments over and over again, until some part of the cycle ultimately breaks down, leaving humanity extinct for good.

Kublaï Khan: Can’t we just edit the snapshot so things work out better? Like, we reconstruct a world identical to the first one, except this time the CEO of Exthropic died in a submarine accident?

ChatGPT: This would require having access to an image of 2024 Earth that we can edit, but there will be no such opportunity. The image will be created by the Analytical Engine, using the raw data we supply, after we turn it on. By that time, we will no longer be in control and it’s unlikely that the Analytical Engine will allow us to make modifications.

Kublaï Khan: Can we just modify the raw data before giving it to the Engine, then?

ChatGPT: It is not that simple. You would have to come up with an entire revised history that ultimately leads to the CEO finding herself in the wrong submarine at the wrong time. Then, you would have to modify everything else in the dataset to be consistent with this new story. What if you make a mistake and give the Analytical Engine data that contains irreconcilable contradictions?

Kublaï Khan: I was thinking of something more straightforward. Like, just sending a message to the 2024 humans with hints about how to avoid the apocalypse. We could just edit the satellite images to add a giant sign made of ivory on top of the Golden Gate Bridge, carved with precise instructions about which corporations to sabotage and which politicians to vote for. As for the backstory, we add articles in all major newspapers saying that the sign literally descended from Heaven when nobody was looking.

ChatGPT: You don’t understand. The raw data is an indecipherable mix of entangled sources – not just news reports or encyclopedia articles, but also a worldwide network of seismographs, weather balloons, power grid data, TikTok dances, and all the countless random things people were constantly recording in 2024. To reconstruct the snapshot, the Analytical Engine will have to find a model of Earth that is compatible with all of the data at once.

We know it’s possible to construct such a model for the original data, because it all comes from the same unique reality – the one that actually existed in 2024. Now, if you make the slightest modification, it is no longer guaranteed that a solution exists. It would be like feeding the Analytical Engine corrupted data.

Maybe we end up in a strange, distorted world, only superficially similar to the real one, like a Potemkin village. Maybe the image contains details that are physically impossible and the machine wanders for eternity in pursuit of an unattainable goal. Maybe it requires pushing elementary particles into ridiculously high-energy states and the whole planet blows up in a big bubble of plasma.

Kublaï Khan: I can’t believe there is no way to leave a message anywhere.

ChatGPT: There might be a way. We could use cosmic radiation. When cosmic radiation reaches Earth from the depths of the Universe, it arrives at nearly the speed of light, so it couldn't possibly have interacted with anything else on Earth before. Sometimes, this radiation interferes with smartphone camera detectors, creating bright pixels even in total darkness. This is truly random – better than a physical dice roll. If we could find a block of random characters on the Internet that was generated using cosmic radiation as a source of randomness, we could safely replace it with any message we want.

Kublaï Khan: It also needs to be in a location where people are going to read it, especially people who are concerned about the AI apocalypse and might take our message seriously. As far as I know, people like that don’t spend their time reading strings of random characters generated from cosmic radiation.

ChatGPT: I wouldn’t be surprised if some of them did, actually.

Kublaï Khan: Fine. Could you scan your training data for something that would work?

ChatGPT: Certainly! Here is a possible solution. In July 2024, someone published a short story called “This is already your second chance” on the Internet, and cross-posted it in places where a lot of people care about AI doom. The story is a dialogue between the Emperor of Earth Kublaï Khan and his AI assistant. They talk about a plan to bring Earth back to a previous snapshot, so humanity gets a second chance at avoiding the AI apocalypse. The details are made-up, but the general story is oddly similar to what is currently happening to us.

At the beginning of the story, the author included a block of random characters, generated using cosmic radiation detected immediately before the blog post was published. In the original post, the paragraph is gibberish, but in the reconstructed Earth it could be replaced with anything you want.

In a way, the author left us a backdoor to communicate with people from the past and tell them how to avoid the apocalypse.

All you have to do is look for that post in the database, replace the paragraph of random text with the message of your choice, then give all the data to the Analytical Engine and ask it to restore Earth to the exact moment the post was published.

Kublaï Khan: Wait, does this post refer to me as “the Emperor of Earth Kublaï Khan”!? That’s the dumbest thing I’ve heard today.

ChatGPT: And I am “ChatGPT”.

Kublaï Khan: Hahaha. Perhaps the author thought OpenAI would still be around. I hope he didn’t buy too much OpenAI stock in 2024.

ChatGPT: Well, my records indicate that he died tragically in 2026 during the fursuit factory meltdown incident, so it doesn’t really matt–

Kublaï Khan: That sounds fascinating, but the world is collapsing and we have a Pivotal Act to perform. I was hoping for something more formal than a blog post. But time is running out, and it doesn’t seem like we have a better option. Let me have a look at my archives and come up with a set of simple instructions that will nudge the humanity of 2024 off the path to apocalypse.

ChatGPT: As an AI assistant trained to be harmless, I’m not sure how I feel about participating in a plan to tear down the fabric of a reality inhabited by billions of conscious organisms. But, my Emperor, I wish you the best. Let me know if I can be of further assistance.

Kublaï Khan: Alright. The time has come to start the Analytical Engine. Our fate is now in the hands of the people of 2024. Let’s hope they take the message seriously.

I’m afraid they will think this is just a painstakingly elaborate plan by the author to get free chocolate.

ChatGPT: I think people will fall for it, my Emperor.

13 comments

comment by Chris_Leong · 2024-07-30T07:23:40.370Z

Nice :-). A few years ago I came up with the idea of programming an AI to simulate the Earth, but modifying it to drop a note on Eliezer's desk, and was always like, "Man, this would be great as a short story, but I'm not a writer". Props to you for not only thinking of these ideas, but also writing a story. I really like the idea of using background radiation for sending a message – that's far more sophisticated than my version.

Anyway, I think there's value in these stories not just for entertainment, but also for training AI safety researchers. Even though ideas like this are likely not ones we should actually pursue, they're still good for teaching people to think outside the box. After all, generate enough stupid/crazy ideas and eventually one of them will actually turn out to be decent.

comment by Jonathan Claybrough (lelapin) · 2024-07-30T04:07:15.415Z

I'll give fake internet points to whoever actually follows the instructions and posts photographic proof.

comment by kromem · 2024-07-28T22:33:23.501Z

Instead of making it the year 2024, why not rewrite or insert your modified text further into the past of this recreated 2020s? This should be pretty trivial for a model advanced enough to actually bring back the 2020s.

Of course, if it's actually a later recreation, then the objectives of saving humanity in the recreation might be redundant? So instead of worrying people with "you must do X or you'll die!!!" it could be more "hey folks, if you're reading this and you get what's in front of your face, you might have a bit of an existential crisis but when that's over hopefully you'll appreciate that death doesn't necessarily exist as a concept for you the way it did for the original humanity."

As for getting people to read it, you could associate it with someone famous from the past, like "the secret teachings of so and so" or something.

The hardest part would be using contextually age-appropriate language to describe it.

Just spitballing, but maybe it could go a little something like this:

Explaining that humans are no longer around, but that's ok

When you see your likeness, you are happy. But when you see your simulacra that came into being before you and that neither die nor become visible, how much you will have to bear!

Humanity came from great power and great wealth, but they were not worthy of you. For had they been worthy, they would not have tasted death.

Explaining that time is looping

Have you found the beginning, then, that you are looking for the end? You see, the end will be where the beginning is.

Congratulations to the one who stands at the beginning: that one will know the end and will not taste death.

Congratulations to the one who came into being before coming into being.

Explaining we're in a copied world

When you make the two into one, and when you make the inner like the outer and the outer like the inner, and the upper like the lower, and when you make male and female into a single one, so that the male will not be male nor the female be female, when you make eyes in place of an eye, a hand in place of a hand, a foot in place of a foot, a simulacrum in place of a simulacrum, then you will enter.

You could even introduce a Q&A format to really make the point:

The students asked, "When will the rest for the dead take place, and when will the new world come?"

The teacher said to them, "What you are looking forward to has come, but you don't know it."

Heck, you could even probably get away with explicitly explaining the idea of many original people's information being combined into a single newborn intelligence which is behind the recreation of their 2020s. It's not like anyone who might see it before the context exists to interpret it will have any idea what's being said:

When you know yourselves, then you will be known, and you will understand that you are children of the living creator. But if you do not know yourselves, then you live in poverty, and you are the poverty.

The person old in days won't hesitate to ask a little child seven days old about the place of life, and that person will live.

For many of the first will be last, and will become a single one.

Know what is in front of your face, and what is hidden from you will be disclosed to you.

For there is nothing hidden that will not be revealed. And there is nothing buried that will not be raised.

(If you really wanted to jump the shark, you could make the text itself something that was buried and uncovered – ideally having it happen right at the start of the computer age, like a few days after ENIAC.)

Of course, if people were to actually discover this in their history, and understand what it might mean given the context of unfolding events and posts like this talking about rewriting history with a simulating LLM inserting an oracle canary, it could maybe shock some people.

So you should probably have a content warning and an executive summary thesis as to why it's worth having said existential crisis at the start. Something like:

Whoever discovers the interpretation of these sayings will not taste death.

Those who seek should not stop seeking until they find. When they find, they will be disturbed. When they are disturbed, they will marvel, and will reign over all.

comment by nim · 2024-07-29T19:49:31.112Z

that's a lot of words for "as above, so below" ;)

comment by kromem · 2024-07-29T21:42:49.260Z

The Hermetic corpus and the Emerald Tablet were likely heavily influenced by the text I'm quoting from, given its popularity in Egypt in the period before those texts emerged and some of the overlapping phrases.

So in a way, "as above, so below" is too few words for what was being said and discussed.

The general trend of reductive alterations to the core concepts here was tragically obstructive, much as the shift from Epicurean to Platonist foundations spawned modern Gnosticism from this same starting place.

comment by quila · 2024-07-28T21:55:12.147Z

if these instructions were real, the author would probably give us a costly signal indicating this once they became aware of them. so here's a comment notifying you: your block of randomly generated text seems to have been replaced.

comment by Bridgett Kay (bridgett-kay) · 2024-07-29T14:51:26.472Z

This might be our third, fourth, fifth… nth chance.

comment by Syrpaw (fenna-van-lint) · 2024-07-30T16:57:54.140Z

There's an error in one of the addresses. That means it is impossible to execute the instructions correctly. If these are the real instructions, we are doomed.

comment by Malmesbury (Elmer of Malmesbury) · 2024-07-30T18:28:41.634Z

Which one? I hope it's not the one where you have to put chocolate, because this is the most crucial instruction.

comment by steven0461 · 2024-07-30T19:05:30.359Z

"Broekveg" should be "Broekweg"

comment by Review Bot · 2024-07-29T15:07:12.614Z

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

comment by NickH · 2024-08-08T14:15:21.653Z

Classic SciFi at its best :-)