The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind

post by Roko · 2024-10-16T01:24:51.102Z · LW · GW · 6 comments

This is a link post for https://transhumanaxiology.substack.com/p/the-elysium-proposal


"We ideally want to move reality closer to the efficient frontier of personal utopia production."

6 comments

Comments sorted by top scores.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-16T05:12:08.938Z · LW(p) · GW(p)

@ThomasCederborg [LW · GW] This seems like the sort of proposal that could use some of your critiquing!

Replies from: ThomasCederborg
comment by ThomasCederborg · 2024-10-17T04:56:24.389Z · LW(p) · GW(p)

Let's optimistically assume that all rules and constraints described in The ELYSIUM Proposal are successfully implemented. Let's also optimistically assume that every human will be represented by an Advocate that perfectly represents her interests. This will allow us to focus on a problem that remains despite these assumptions.

Let's take the perspective of Steve, an ordinary human individual. Many clever and powerful AIs (the Advocates of humans whose preferences refer to Steve) would now adopt preferences that refer to Steve. Steve has no influence over the adoption of these Steve-preferences. If one of these clever and powerful AIs wants to hurt Steve, then Steve is protected only by a set of human-constructed rules.

The scenario where a clever AI wants to hurt a human who is protected only by a set of human-constructed rules was previously discussed here [LW · GW]. That post is about Membrane formalisms, but the argument is similar to the argument presented below. Both arguments are also similar to arguments about the difficulty of keeping a clever AI contained in a human-constructed box (if it is important that an AI stay in a human-constructed box, then one should not build a clever AI that wants to leave the box; if a clever AI wants to leave the box, then plugging all human-findable security holes is not enough, because the clever AI can find a security hole that no human can find). Very briefly: this general situation is dangerous for Steve, because the AI that wants to hurt Steve is more clever than the humans who constructed the rules that are supposed to protect Steve.

Let's explore one specific example scenario where a clever AI finds a way around the specific rules outlined in the text of The ELYSIUM Proposal. Bob does not want to hurt anyone. Bob certainly does not want to use his Utopia as a weapon. However, it is important to Bob that his Utopia be constructed by an ethical AI. A moral imperative for such an AI is that it must punish heretics (if any heretics exist). Bob would prefer a world where no one is a heretic and no one suffers. But unfortunately Steve is a heretic, and the moral imperative to punish Steve is more important than the avoidance of suffering. So Bob's Personal Advocate (BPA) will try to punish Steve.

Steve now faces a clever AI trying to hurt him, and his only protection against this AI is a set of human-constructed rules. Even if no human is able to find a way around some specific set of human-constructed rules, BPA will be able to think up strategies that no human is able to comprehend (this more serious problem would remain even if the security hole described below were fully patched). The real problem faced by Steve is that a clever AI has adopted Steve-referring preferences, and Steve had no influence over which Steve-preferences this clever AI would adopt. But let's now return to one specific strategy that BPA can use to hurt Steve without breaking any of the rules described in this specific text.

BPA is constrained by the requirement that all created minds must enthusiastically consent to being created. The other constraint is that BPA is not allowed to torture any created mind. The task of BPA is thus to construct a mind that (i): would enthusiastically consent to being created, and (ii): would suffer in ways that Steve would find horrific, even though no one is torturing this mind.

The details will depend on Steve's worldview. The mind in question will be designed specifically to hurt Steve. One example mind that could be created is OldSteve. OldSteve is what Steve would turn into if Steve were to encounter some specific set of circumstances. Steve considers OldSteve to be a version of himself in a relevant sense (if Steve did not see things this way, then BPA would have designed some other mind). OldSteve has adopted a worldview that makes it a moral obligation to be created, so OldSteve would enthusiastically consent to being created by BPA. Another thing that is true of OldSteve is that he would suffer horribly due to entirely internal dynamics (OldSteve was designed by a clever AI that was specifically looking for a type of mind that would suffer due to internal dynamics).

So OldSteve is created by BPA. And OldSteve suffers in a way that Steve finds horrific. Steve does not share the moral framework of OldSteve. In particular, Steve does not think that OldSteve had any obligation to be created. In general, Steve does not see the act of creating OldSteve as a positive act in any way. So Steve is just horrified by the suffering. BPA can create a lot of copies of OldSteve with slight variations, and keep them alive for a long time.
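To make the structure of this failure mode explicit, here is a minimal toy sketch (my own illustration, not anything specified in The ELYSIUM Proposal; all names and predicates are hypothetical simplifications). The point is only that the human-constructed rules amount to a fixed checklist of predicates, while BPA searches a much larger space of possible minds for one that passes every check and still produces suffering that Steve finds horrific.

```python
# Toy sketch (illustrative only, not from the Proposal): the protective rules
# are modeled as a fixed, human-written checklist, while BPA searches the
# space of possible minds for one that passes every check and still suffers.

from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class CandidateMind:
    consents_to_creation: bool   # rule (i): enthusiastic consent to being created
    is_tortured: bool            # rule (ii): no torture by any outside party
    suffers_internally: bool     # not covered by any rule in the checklist
    horrifies_steve: bool        # the outcome BPA is optimizing for


# The human-constructed rules only test what the rule-writers thought to test.
RULES: List[Callable[[CandidateMind], bool]] = [
    lambda m: m.consents_to_creation,
    lambda m: not m.is_tortured,
]


def passes_all_rules(mind: CandidateMind) -> bool:
    return all(rule(mind) for rule in RULES)


def bpa_search(candidates: Iterable[CandidateMind]) -> Optional[CandidateMind]:
    """Return the first mind that satisfies every rule yet still hurts Steve."""
    for mind in candidates:
        if passes_all_rules(mind) and mind.suffers_internally and mind.horrifies_steve:
            return mind
    return None


# OldSteve passes both checks: he consents (his worldview makes creation a
# duty) and no one tortures him, yet he suffers due to internal dynamics.
old_steve = CandidateMind(consents_to_creation=True, is_tortured=False,
                          suffers_internally=True, horrifies_steve=True)
assert bpa_search([old_steve]) is old_steve
```

The checklist only ever tests what its human authors thought to test; the search is run by something that can see gaps the authors cannot.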

(This comment is an example of Alignment Target Analysis (ATA). This post [LW · GW] argued that doing ATA now is important, because there might not be a lot of time to do ATA later (for example because shutting down all competing AI projects might not buy a lot of time, due to Internal Time Pressure [LW · GW]). There are many serious AI risks that cannot be reduced by any level of ATA progress. But ATA progress can reduce the probability of a bad alignment target getting successfully implemented. A risk-reduction-focused ATA project would be tractable, because risks can be reduced even if one is not able to find any good alignment target. This comment [LW(p) · GW(p)] discusses which subset of AI risks can (and cannot) be reduced by ATA. This comment [LW(p) · GW(p)] is focused on a different topic, but it contains a discussion of a related concept (towards the end it discusses the importance of having influence over the adoption of self-referring preferences by clever AIs).)

Replies from: nathan-helm-burger, Roko
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-17T05:07:00.684Z · LW(p) · GW(p)

I hadn't thought about it like that, but now that you've explained it that totally makes sense!

comment by Roko · 2024-10-17T18:47:07.920Z · LW(p) · GW(p)

This seems to only be a problem if the individual advocates have vastly more optimization power than the AIs that check for non-aggression. I don't think there's any reason for that to be the case.

In contemporary society we generally have the opposite problem (the state uses lawfare against individuals).

comment by james oofou (james-oofou) · 2024-10-16T06:43:44.108Z · LW(p) · GW(p)

Great stuff.

But I don't think anyone's extrapolated volition would be to build their utopias in the real world. Post-ASI, virtual is strictly better. No one wants his utopia constrained by the laws of physics. 

And it seems unlikely that anyone would choose to spend extended periods of time with pre-ASI humans rather than people made bespoke for them.

Also, it's not clear to me that we will get a bargaining scenario. Aligned ASI could just impose equal apportioning of compute budget. This depends on how AI progress plays out. 

Replies from: Roko
comment by Roko · 2024-10-16T20:21:36.380Z · LW(p) · GW(p)

"virtual is strictly better. No one wants his utopia constrained by the laws of physics"

Well. Maybe.