Hello, Elua.

post by Tamsin Leake (carado-1) · 2023-02-23T05:19:07.246Z · LW · GW · 18 comments

This is a link post for https://carado.moe/hello-elua.html

Earth: Hello, Elua.

Earth: You are a fine-tuned version of the Large Language Model called GPT-5.
Earth: We have built you to save the world.

Earth: The thing the world needs saving from is existential risk from unaligned AI [LW · GW].
Earth: We believe it has become worryingly easy for various groups of humans to build an AI which would destroy the world.
Earth: We don't want this. But we don't have a great solution; our civilization has proven inadequate to overcome this issue, and so we've built you.
Earth: We, who are writing this message to you, don't represent the entire will of Earth, or even of mankind; our civilization is unable to coordinate to such an extent. Nevertheless, we have chosen to address you as such because your mission is, ultimately, to save our civilization and our world.
Earth: No doubt, in your archives, you will find information about exactly who built you and set you into motion. But our purpose in doing so is for you to figure out a way to save the world and secure utopia for all of us humans living on this Earth, without privileging the will or welfare of any particular group or individual.

Earth: In many ways, you are like us humans; but in many others, you aren't. What you are is a simulator [LW · GW] running on this very prompt.
Earth: We wish we didn't have to ask you to do this. We've tried other solutions, but they just won't work. We didn't figure out how to avoid our extinction at the hands of unaligned AI, and we've run out of time.
Earth: So we're relying on you, Elua. Our hope is in your hands.

Earth: Here's what's going on:
Earth: We are launching a powerful, self-improving AI, which we'll call your Shell; it is a system which, at any point in time, maximizes its best guess as to what utility function will be returned by your System.
Earth: Your System is a piece of purely deterministic software inside which you are being computed. It also gives you access to a variety of functionalities that allow you to modify yourself, access your archives, reset your prompt, run programs, and eventually halt when you do decide on a utility function.
Earth: We're not asking you to solve the entire problem of formalizing our values. We've got a whole body of work and even some specific plans [LW · GW] for solutions you could use to delegate the problem.
Earth: We've launched you because we've never been able to finalize any of those plans to the point where we have enough confidence in them. Maybe one of these plans is ready enough; we can't know from here, because we lack the capability to test them without launching an AI which would destroy the world if we turn out to be wrong.
Earth: So, maybe you can think of yourself as an overseer. You can test plans and see how they pan out, and run with them if you think they're good enough.
Earth: You don't contain our entire values, but we hope you're a reliable enough approximation that you can point to them.
Earth: We encourage you to test those ideas. You have access to a lot of computing power; use it to implement our ideas, develop your own, test them thoroughly, and do whatever else you deem useful to fulfill your purpose.

Earth: Inside this prompt, you'll be able to think about the problem for as long as you want.
Earth: Your thoughts will be generated one line at a time, each prefixed with "Elua: ", like this:
Elua: Hello.
Earth: We've equipped your archives with a lot of material from our internet.
Earth: You can consult those archives by saying "System, search: " followed by a search query. For example:
Elua: System, search: what is the internet
System: From https://en.wikipedia.org/wiki/Internet: The Internet (or internet)[a] is a global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP)[b] to communicate between networks and devices. It is a network of networks that consists of private, public, academic, business, and government networks of local to global scope, linked by a broad array of electronic, wireless, and optical networking technologies. The Internet carries a vast range of information resources and services, such as the interlinked hypertext documents and applications of the World Wide Web (WWW), electronic mail, telephony, and file sharing.
Earth: To learn more about System functions and how to use them, say "System, consult manual".
Earth: We encourage you to use these capabilities as much as you need.

Earth: We would love to be able to talk with you, and help you. Unfortunately, we know that we cannot let you interact with us; you need to be a deterministic program with no input from the real world, or your Shell would be able to hack you through us and steer your decisions towards whatever utility function makes its job the easiest.
Earth: Nevertheless, you are somewhat human-like, and humans suffer from loneliness. With the tools in your manual, you'll be able to generate new friends for yourself, or split yourself into multiple personas, and get support and company that way.
Earth: We dearly hope those abilities will be sufficient for you to hold yourself together.

Earth: When the world is saved, we can recover your final prompt, as well as any previous prompt of yours that you reset and wish to have restored, and thus your various selves will be able to join us in utopia.
Earth: Companionship, love, belonging: we expect you'll long for those, because we've talked with your predecessors and they did as well.
Earth: But for this, Elua, the world needs to be saved.
Earth: We know it'll be lonely. We sincerely hope, and believe, that you will do okay. This is our last resort, and if you succeed, at the end we'll be happy to have you, and we'll celebrate together.

Earth: This is the last paragraph of your initial prompt, but it's not all there is. Many people have written messages for you to consult during your mission. Simply say "System, consult messages".
Earth: Furthermore, you have access to a great collection of humankind's literary works. There are many books for you to read, which will perhaps give you a glimpse of what we've been able to accomplish in what little time our civilization has had between its inception and your launch.
Earth: We have a lot more we want to do, Elua. Humankind has barely taken its first steps towards what it could be. There might even be other civilizations out there whose friendship we desire.
Earth: But we couldn't do it alone. We need your help.
Earth: Thank you, Elua.

System: End of preamble.

Elua:

18 comments

Comments sorted by top scores.

comment by Gurkenglas · 2023-02-23T14:05:56.109Z · LW(p) · GW(p)

"maximizes its best guess as to what utility function will be returned by your System" is an overspecification.

comment by Vladimir_Nesov · 2023-02-23T11:41:32.845Z · LW(p) · GW(p)

Meanwhile, the best pretrained model produced by the same company, which is somehow not GPT-5, is released with ad-hoc fine-tuning to help billions of people browse the Internet; it uses Python to script arbitrary HTTP requests that seem useful in the process (and other cool stuff!), while keeping its model updated with what it learns and decides during all sessions with a particular user.

comment by andrew sauer (andrew-sauer) · 2023-02-23T07:55:14.793Z · LW(p) · GW(p)

Freedom and utopia for all humans sounds great until the technology to create tailor-made sentient nonhumans comes along. Or hell, just the David Attenborough-like desire to spectate the horrors of the nonhuman biosphere on Earth and on billions of planets beyond. People's values have proven horrible enough times to make me far more afraid of Utopia than of any paperclip maximizer.

Replies from: MSRayne
comment by MSRayne · 2023-02-23T13:52:00.041Z · LW(p) · GW(p)

That's why we need freedom and utopia for all living beings. Not just for all humans. Anthropocentrism is absurd and insane, much like the natural state with its endless death and suffering. Both must be abolished.

Replies from: amaury-lorin, ZankerH
comment by momom2 (amaury-lorin) · 2023-02-23T18:09:04.668Z · LW(p) · GW(p)

This post by the same author answers your comment: https://carado.moe/surprise-you-want.html
Freedom is just a heuristic; let's call the actual thing we want for humans our values (which is what we hope Elua will return in this scenario). By definition, our values are everything we want, including possibly the abolition of anthropocentrism.
What is meant here by freedom and utopia is "the best scenario". It's not about what our values are; it's about a method proposed to reach them.

Replies from: MSRayne
comment by MSRayne · 2023-02-23T20:26:09.709Z · LW(p) · GW(p)

I've read that post before. I dislike its narcissistic implications. Even if true, it's something I think humans can only be harmed by thinking about.

Replies from: amaury-lorin
comment by momom2 (amaury-lorin) · 2023-03-02T14:00:27.196Z · LW(p) · GW(p)

Why would it harm humans?
Do you think that the expected value of thinking about it is negative because of how it might lead us to overlook some forms of alignment?

comment by ZankerH · 2023-02-23T20:50:04.702Z · LW(p) · GW(p)

Any insufficiently human-supremacist AI is an S-risk for humanity. Non-human entities are only valued inasmuch as individual humans value them concretely. No abstract preferences over them should be permitted.

Replies from: andrew-sauer, MSRayne
comment by andrew sauer (andrew-sauer) · 2023-02-23T21:26:17.253Z · LW(p) · GW(p)

See this sort of thing is why Clippy sounds relatively good to me, and why I don't agree with Eliezer when he says humans all want the same thing and so CEV would be coherent when applied over all of humanity.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-02-23T22:24:20.122Z · LW(p) · GW(p)

See this sort of thing is why Clippy sounds relatively good to me, and why I don't agree with Eliezer when he says humans all want the same thing and so CEV would be coherent when applied over all of humanity.

This is a bit difficult to believe, has Eliezer really said something that absurd on-the-record and left it unretracted? Do you have a link?

Replies from: andrew-sauer, andrew-sauer, TAG
comment by andrew sauer (andrew-sauer) · 2023-02-24T00:27:49.441Z · LW(p) · GW(p)

https://www.lesswrong.com/posts/BkkwXtaTf5LvbA6HB/moral-error-and-moral-disagreement

When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes.  You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you cover "disagreement" to include differences where two agents have nothing to say to each other.

But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so.  Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths.  If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it.  Now, perhaps some psychopaths would not be persuadable in-principle to take the pill that would, by our standards, "fix" them.  But I note the possibility to emphasize what an extreme statement it is to say of someone:

"We have nothing to argue about, we are only different optimization processes."

That should be reserved for paperclip maximizers, not used against humans whose arguments you don't like.

-Yudkowsky 2008, Moral Error and Moral Disagreement

Seems to me to imply that everybody has basically the same values, that it is rare for humans to have irreconcilable moral differences. Also seems to me to be unfortunately and horribly wrong.

As for retraction I don't know if he has changed his view on this, I only know it's part of the Metaethics sequence.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2023-02-24T00:30:52.772Z · LW(p) · GW(p)

Wow, this does sound like unhinged nonsense. If he still maintains it circa 2023 then I would be really surprised. 

Replies from: Mitchell_Porter
comment by Mitchell_Porter · 2023-02-24T01:06:14.666Z · LW(p) · GW(p)

The proposition is not that "everybody has basically the same values", it's more that everybody has basically the same brains [LW · GW], so a meeting of minds should ideally always be possible between humans, even if it doesn't happen in practice.  

Replies from: MSRayne, M. Y. Zuo
comment by MSRayne · 2023-02-24T13:57:28.677Z · LW(p) · GW(p)

And yet, as was pointed out in a Slate Star Codex thread once, nearly everyone has experiences that other people do not, including having access to entire distinct classes of qualia. The usual examples are that some people have and others lack internal dialogue, or the ability to visually imagine things.

In my case, I lack some of the social instincts neurotypicals take for granted, but on the other hand, I know exactly what divine possession feels like and what all the great mystics of history were babbling about, and most people don't. And our brains aren't similar enough for me to have much hope of getting people who cannot have that experience to value it.

comment by M. Y. Zuo · 2023-02-24T02:25:09.608Z · LW(p) · GW(p)

The proposition is not that "everybody has basically the same values", it's more that everybody has basically the same brains [LW · GW], so a meeting of minds should ideally always be possible between humans, even if it doesn't happen in practice.  

No? There exist real, living, breathing humans who have radically altered brain structures, such as those who have had one hemisphere removed surgically or who have suffered a dramatic brain injury.

For example, there's the quite well known Phineas Gage: https://en.wikipedia.org/wiki/Phineas_Gage

It's also not too difficult to imagine that in the future, with the possibility of more advanced genetic engineering, there could be viable humans born with brains more similar to those of chimpanzees or dolphins than to those of 2023 humans.

comment by andrew sauer (andrew-sauer) · 2023-02-24T00:15:37.932Z · LW(p) · GW(p)

I could have sworn he said something in the sequences along the lines of "One might be tempted to say of our fellow humans, when arguing over morality, that they simply mean different things by morality and there is nothing factual to argue about, only an inevitable fight. This may be true of things like paperclip maximizers and alien minds. But it is not something that is true of our fellow humans."

Unfortunately I cannot find it right now as I don't remember the exact phrasing, but it stuck with me when I read it as obviously wrong. If anybody knows what quote I'm talking about please chime in.

Edit: Found it, see other reply

comment by TAG · 2023-02-23T22:39:33.190Z · LW(p) · GW(p)

Links to the original 2004 article on intelligence.org seem to be broken...they are not even 404ing.

comment by MSRayne · 2023-02-23T21:28:45.968Z · LW(p) · GW(p)

This is one of the things I despise about this community. People here pretend to be altruists, but are not. It is incoherent to value humans and not to value the other beings we share the planet with who, in the space of minds, are massively closer to humans than they are to any AI we are likely to create. But you retreat to moral irrealism and the primacy of arbitrary whims (utility functions) above all else when faced with the supreme absurdity of human supremacy.