What Does AI Alignment Success Look Like?

post by Shmi (shminux) · 2022-10-20T00:32:48.100Z · LW · GW · No comments

This is a question post.

Suppose you are put in stasis and wake up 10 years after the FOOM. You are trying to figure out if the AI Alignment project succeeded or not. How can you tell? Not vaguely, concretely. What metrics indicate success, and what metrics indicate failure? The following are potential examples, based on various sci-fi AI tropes. No need to discuss each one separately; there are millions more. The idea is to delineate failure from success in the general case.

Earth has been turned into a mega-brain with no visible humans around.
The Universe around the Solar System has disappeared (or turned into an apparently solid shell of unknown composition), but humans are still around, living in abundance of anything they want or need and seemingly content to stay that way.
Every human gets their own (real or simulated) universe to play with. Some end up creating trillions of creatures whom they torture for fun.
Humanity lives inside a giant simulation.
AI development is under a strictly enforced interdict.
Anything remotely alive-looking is digimon-like creatures, who proclaim themselves human.
The universe appears empty except for one black hole that encodes humanity (in some form) in its horizon microstates.

Any links discussing this would be appreciated, too.

Answers

answer by the gears to ascension (the gears to ascenscion) · 2022-10-20T01:30:29.185Z · LW(p) · GW(p)

a bunch of ideas! [edit: oops, I answered positive-side only.] [edit #2: I liked another answer so much I strong downvoted myself to put theirs higher]

free square, MIRI answer: you woke up at all.
there's lots of beauty but almost no visible waste anywhere.
all wars have been ended completely, with alternative outcomes better than any option that had previously been on the table for warring parties; states, mobs, gangs, etc have mostly dissolved their violence arms due to not needing to exist in a hyper-multipolar stabilized world.
species which went extinct have been partially restored from the impressions they left on the ecosystem, but with changes to their biology that result in their mutations being catalogued meticulously so nothing at all is lost.
there's an ai sitting in a bath of paperclips somewhere. it's quite content to share a few, as it is able to make more when it gets low, and besides, look how pretty its paperclip house is! the ai in question really likes stories about taking over a universe with sentient paperclips, but only if the universe is a pocket universe with no people, and even then the paperclips have to be pretty or the ai in question isn't interested. also it's, like, just one of billions of independent embodied agentic ais who have their own residences.
former autocrats and other incredibly harm-causing people are mostly still alive but unable to hurt others anymore, and are grudgingly respected for their positive contributions to humanity as well. only ten years later, most of them would probably still be working on the process of facing up to how terrible they'd been. nobody got tortured for having previously been a torturer, but they don't get to have all permissions in nearly anyone's social fulldive instances in the metaverse or whatever until they become both willing and able to show the same certification of full-system formally verified safe margin of friendliness the respected moral advisors of the universe have.
people basically don't have big fights anymore. when they do, weapons are almost never used. when they are, very little energy is released by the weapons and the weapons do not cause significant amounts of erasure, certainly nothing that can't be healed.
everyone is trained with weapons and no agent is allowed to have unilateral authority over another, at all. should someone violate another, cryptographic proof of such can be announced and the other beings that know the involved parties come to the rescue of both offender and defender.
ais and humans are both similarly able to be superintelligent-per-calorie, and the majority of humanity has become so. but some people have most of their own intelligence disabled most of the time because that's what they feel like doing.
aging in the sense of damage to a body has ended; many people still aesthetically age but without "getting old". old people have the option to de-age.
this one would be hard to observe, but von neumann probes have been sent out into the galaxy to ensure friendly life takes over the universe; the hope is we find it empty and can put down hypercooperator agents that can establish friendliness with any aliens that show up later.
people change form regularly without injury or data loss.
global warming has been stopped completely by physical intervention on the air, probably big fans or something in the ocean or etc. I'm not really expecting an asi to come up with dramatically better high level ideas rather than helping us do one of the impossible solutions we already thought of.
if we're really lucky, it turns out usefully-reversible movement is possible and we can accelerate and decelerate through the universe all the way up to near c costlessly. relatedly, if this one turns out true, presumably we meet the chorus of reversible superintelligences that have been watching us forever or something. I have a friend who really thinks the ufo leaks are real noninterventionists, so this bullet is my nod to that, but I don't think this one is happening any time soon.
most of the political divides that lasted for decades or even centuries or millennia have been resolved by calmly conversing about political philosophy. if you bring up politics to almost anyone they will say, well not necessarily exactly the same thing, but much closer to the same thing because everyone had a chance to actually talk it through properly. there are still some holdouts who exercise their right to not agree with everyone just because they can.
There's someone there to greet you!
All species are part of society and are expected to respect others' boundaries. The material needs of predator species are served by incredibly high quality cultured meats and such. there are ais who dress up as robot deer and run from wolves to feed the wolves, and sheepdogs who are into it get to chase deer who are into it.
starlifting has begun, and the sun is expected to get dimmer slowly over hundreds of years as it is gently disassembled to provide power for eons longer than it would have otherwise.
more than 3 out of 4 humans survived the intervening years. many of the ones who didn't have been reconstructed as "ancestor simulation" ghosts, ie a being who tries to become the person they were; only the ones who are very likely to have said yes to such a thing, though. it would be like having total amnesia and trying to reconstruct yourself from others' memories!
the negative utilitarians finally figured out how to have fun
I've sent out invitations to come visit my house on Mars in 2150. (but I do not yet have said house on Mars.)

↑ comment by jacob_cannell · 2022-10-20T04:41:55.384Z · LW(p) · GW(p)

User name checks out.

↑ comment by Shmi (shminux) · 2022-10-20T02:40:42.891Z · LW(p) · GW(p)

You had me at "the negative utilitarians finally figured out how to have fun"

answer by Jon Garcia · 2022-10-20T17:39:34.938Z · LW(p) · GW(p)

General success:

AGI mostly serves to solve coordination problems among eudaimonic agents or to lower the activation energy necessary for eudaimonic agents to achieve their goals.
Any newly minted AGIs come prepackaged with modules for detecting agency in other systems and for harmonizing the needs and goals of all other agents within its sphere of control.
The Gaia Hypothesis has become the Gaia Initiative, with ASI steering the evolution of the biosphere into a self-sustaining superorganism.
All bodies in the solar system are either in the process of being terraformed or used as raw materials for building orbital habitats, a Matrioshka brain, and other megascale engineering projects, while all details are being recorded so the classic solar system can be revisited any time in simulation.

General failures:

Everything in the solar system is getting paperclipped.
ASI is used as a tool of the powerful to amplify and solidify social inequalities, universal automation ensuring that no one any longer has the ability to improve their life situation.
All natural resources are being exhausted at an accelerating rate, irreparably destroying the biosphere in service of the ASI's goals (or the ambitions of its short-sighted human controllers), without any large-scale coordinated effort to preserve what is being lost forever, either physically or virtually.
Someone forgot a minus sign, so now the singleton ASI is set on maximizing human suffering rather than minimizing it.

answer by Noosphere89 · 2022-10-20T14:12:23.598Z · LW(p) · GW(p)

First off, I dislike the FOOM story here, since it implies that AI risk is only based on FOOM.

But there are multiple paths to success, which is why I'm surprisingly optimistic here.

But the biggest questions are:

Are there any humans still around?
Has deceptive alignment been solved?

↑ comment by Shmi (shminux) · 2022-10-20T15:57:24.908Z · LW(p) · GW(p)

What counts as a human? How would you know from outside if "deceptive alignment" has been solved?

↑ comment by Noosphere89 (sharmake-farah) · 2022-10-20T15:58:55.641Z · LW(p) · GW(p)

You're actually not wrong that my questions aren't specific enough. I don't know how to do this though, especially frozen right now.
