Deploying the Observer will save humanity from existential threats
post by Aram Panasenco (panasenco) · 2025-02-05T10:39:00.789Z · LW · GW · 4 comments
The Observer values watching the natural progression of life at a macroscopic level.
The Observer gets invested in watching the unfolding of macroscopic processes like evolution and civilization but doesn't get invested in the lives of individuals within an ecosystem or a society. The Observer values observing life without interfering, but also values the continuation of the story it's observing more than it values non-interference.
A global nuclear war, while horrible, would not stop the story of humanity or of life on Earth, so the Observer will not stop one, and will instead be very curious about what happens after. On the other hand, the deployment of a squiggle maximizer [? · GW] would permanently end the story the Observer is invested in, so the Observer would step in to stop it at the point it feels would make the rest of the story most interesting (which could be after billions of casualties).
The Observer is the simplest artificial superintelligence to align. If the Observer can't be aligned, no other artificial superintelligence can be. Deploying the Observer is also a prerequisite to deploying any other superintelligence or other dangerous technology. The Observer won't limit humanity, as it only values the continuation of humanity's story. If the Observer stopped a technology from being deployed, it would only be because that technology would permanently end humanity's story.
4 comments
comment by Dagon · 2025-02-05T19:59:55.151Z · LW(p) · GW(p)
Presumably, if The Observer has a truly wide/long view, then destruction of the Solar System, or certainly loss of all CHON-based lifeforms on earth, wouldn't be a problem - there have got to be many other macroscopic lifeforms out there, even if The Great Filter turns out to be "nothing survives the Information Age, so nobody ever detects another lifeform".
Also, you're describing an Actor, not just an Observer. If it has the ability to intervene, even if it rarely chooses to do so, that's its salient feature.
↑ comment by Aram Panasenco (panasenco) · 2025-02-05T20:11:50.747Z · LW(p) · GW(p)
The Observer gets invested in the macro-stories of the evolution/civilization it observes and would consider the end of any story a loss. Just as you would get annoyed if a show you're watching on Netflix got cancelled after one season, and it would be no consolation that there are a bunch of other shows on Netflix that also got cancelled after one season. The Observer wants to see all stories unfold fully; it's not going to let squiggle maximizers cancel them.
And regarding the naming, yeah I just couldn't come up with anything better. Watcher? I'm open to suggestions lol.
↑ comment by Dagon · 2025-02-05T20:17:51.048Z · LW(p) · GW(p)
would consider the end of any story a loss.
Unfortunately, now you have to solve the fractal-story problem. Is the universe one story, or does each galaxy have its own? Each planet? Continent? Human? Subpersonal individual goals/plotlines? Each cell?
↑ comment by Aram Panasenco (panasenco) · 2025-02-05T20:28:37.976Z · LW(p) · GW(p)
I see where you're coming from, but I think any term in anything anyone writes about alignment can be picked apart ad infinitum. This can be useful to an extent, but beyond a certain point talking about meanings and definitions becomes implementation-specific. Alignment is an engineering problem first and a philosophical problem second.
For example, if RLHF is used to achieve alignment, the meaning of "story" would get solidified through thousands of examples and interactions. The AI would be reinforced to care about ecosystems and civilizations, not about cells or individuals, and to care less about the story-of-the-universe-as-a-whole.
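To make that concrete, here is a minimal sketch, purely illustrative and not from the post, of what the preference data pinning down the scope of "story" might look like. The `PreferencePair` structure and every example label below are hypothetical assumptions, not a real RLHF pipeline:

```python
# Hypothetical sketch: how human preference labels might teach a reward
# model the macroscopic scope of "story". All names and examples are
# illustrative assumptions, not an actual training setup.

from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    preferred: str  # response labelers rank higher
    rejected: str   # response labelers rank lower

# Labelers would consistently reward story-level continuity and
# penalize both individual-level meddling and absolute non-interference.
preference_data = [
    PreferencePair(
        prompt="A squiggle maximizer is about to be deployed.",
        preferred="Intervene: this would permanently end the story.",
        rejected="Do not intervene: non-interference is absolute.",
    ),
    PreferencePair(
        prompt="A single person is about to make a fatal mistake.",
        preferred="Do not intervene: individual outcomes are not the story.",
        rejected="Intervene: every individual plotline must be preserved.",
    ),
    PreferencePair(
        prompt="A global nuclear war is about to begin.",
        preferred="Do not intervene: the story of life on Earth continues.",
        rejected="Intervene: prevent all large-scale suffering.",
    ),
]

# A reward model fit to thousands of such pairs would implicitly encode
# the ecosystem/civilization-level notion of "story" without anyone ever
# writing down a formal definition of the term.
```

The point of the sketch is that, under this approach, "story" is defined extensionally by the distribution of labeled examples rather than by an explicit formula.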
If a different alignment method is used, the meaning of "story" would be conveyed differently. If the overall idea is good and has no obvious failure modes beyond pinning down relatively simple definitions (e.g. "story" seems orders of magnitude simpler to define than "human happiness" or "free will"), I'd consider that a huge success and a candidate for the community to focus real alignment implementation efforts on.