What does a positive outcome without alignment look like?

post by Donald Hobson (donald-hobson) · 2020-05-09T13:57:23.464Z · LW · GW · 2 comments

This is a question post.

AI-alignment:
I will take a bet at 10:1 odds that human-level AI will be developed before we have a working example of "aligned AI", that is, an AI algorithm that provably incorporates human values in a way that is robust against recursive self-improvement.
Positive outcome to the singularity:
This is even more of a sucker bet than Foom vs Moof. However, my belief is closer to 1:1 than it is to 100:1, since I think there is a real danger that a hostile power such as China develops AI before us, or that we haven't developed sufficiently robust institutions to survive the dramatic economic upheaval that human-level AI will produce.

You clearly have some sort of grudge against or dislike of China. In the face of a pandemic, they want basically what we want: to stop it from spreading, and someone else to blame it on. Chinese people are not inherently evil.

But on to AI. First, let's consider the concept of a Nash equilibrium. A Nash equilibrium is a game-theoretic situation in which everyone is doing the action that most benefits their utility function, conditional on everyone else following the equilibrium. In other words, no one player can benefit from unilaterally doing something different.

Democracy is a Nash equilibrium. Given that everyone else is holding elections, arranging themselves into a democratic government and passing laws, it's in your own best interest to play along.

Dictatorship is another Nash equilibrium. If you are the dictator, you can order whatever you want and get it. If you are anyone else, you had better do what the dictator wants, or the dictator will order someone to kill you.
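To make the definition concrete, here is a minimal sketch (my own illustration, not something from the original discussion) that checks whether a pair of actions is a pure-strategy Nash equilibrium of a two-player game. The payoff numbers and the is_nash helper are invented for illustration; the toy game is a coordination game in which both "everyone plays along" and "everyone defects" are equilibria, loosely mirroring the democracy/dictatorship point above.

```python
# Minimal sketch: pure-strategy Nash equilibrium check for a two-player
# game given as payoff matrices. Payoff numbers are illustrative only.

def is_nash(payoffs_row, payoffs_col, i, j):
    """True if (i, j) is a pure Nash equilibrium: neither player gains
    by unilaterally switching to a different action."""
    row_best = all(payoffs_row[i][j] >= payoffs_row[k][j]
                   for k in range(len(payoffs_row)))
    col_best = all(payoffs_col[i][j] >= payoffs_col[i][k]
                   for k in range(len(payoffs_col[i])))
    return row_best and col_best

# Actions: 0 = play along with the existing order, 1 = defect to strongman rule.
ROW = [[3, 0],
       [1, 1]]
COL = [[3, 1],
       [0, 1]]

print(is_nash(ROW, COL, 0, 0))  # True: everyone playing along is an equilibrium
print(is_nash(ROW, COL, 1, 1))  # True: everyone defecting is also an equilibrium
print(is_nash(ROW, COL, 0, 1))  # False: the row player would rather defect too
```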

In addition to making sure that AI isn't developed first by an organization hostile to Western liberal values, we also need to make sure that when AI is developed, it is born into a world that encourages its peaceful development. This means promoting norms of liberty, free trade and protection of personal property. In a world with multiple actors trading freely, the optimal strategy is one of trade and cooperation. Violence will only be met with countervailing force.

These are descriptions of Nash equilibria in human society. Their stability depends on humans having human values and capabilities.

In the 20-to-50-years-after-Moof timeframe, you have some very powerful AIs running around. AIs that could easily wipe out humanity if they wanted to. AIs that probably have molecular nanotech. AIs that could make more goods using fewer resources than any human. If everything a human can produce is worth less to the AIs than the resources needed to keep that human alive, then it isn't a good time to be a human.

There are probably many Nash equilibria between a group of super-intelligent AIs, and by steering the surroundings as the AIs grow, we might be able to exert some choice over which Nash equilibrium is chosen. But I don't see why any of the Nash equilibria between superintelligences will be friendly to humans.

Suppose you had one staple-maximising AI and one paperclip maximiser. One Nash equilibrium is working together to fill the universe with a mixture of both, while inspecting each other's cognition for plans of defection. Humans are made of atoms that could become a mixture of paperclips and staples.

Another equilibrium could be a war. Two AIs trying to destroy each other, with humans all killed in the crossfire. For humans to survive, you need an equilibrium where the AIs aren't shooting at each other, but where, if one of them converted the humans into an equal mix of staples and paperclips, the other would start shooting. Why would one AI start shooting because the other AI did an action that benefited both equally?
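As a rough illustration (again with invented payoffs, not anything from the post), a stag-hunt-style game between the staple maximiser and the paperclip maximiser has exactly these two pure equilibria: cooperate on a paperclip-staple universe, or mutual war. Note that nothing in the equilibrium structure itself mentions humans.

```python
# Toy staple-maximiser vs paperclip-maximiser game; payoffs are invented.
# Actions: 0 = cooperate (split the universe), 1 = war.
from itertools import product

STAPLER = [[4, 0],   # row player's payoffs
           [2, 2]]
CLIPPER = [[4, 2],   # column player's payoffs
           [0, 2]]

def is_nash(i, j):
    """True if neither AI gains by unilaterally changing its action."""
    row_best = all(STAPLER[i][j] >= STAPLER[k][j] for k in range(2))
    col_best = all(CLIPPER[i][j] >= CLIPPER[i][k] for k in range(2))
    return row_best and col_best

equilibria = [(i, j) for i, j in product(range(2), repeat=2) if is_nash(i, j)]
print(equilibria)  # [(0, 0), (1, 1)]: cooperation and mutual war are both stable
```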

If you have several AIs and one of them cares about humans, it might bargain for human survival with the others. But that implies some human managed to do some amount of alignment.

Answers

2 comments


comment by Logan Zoellner (logan-zoellner) · 2020-05-09T14:46:54.554Z · LW(p) · GW(p)
You clearly have some sort of grudge against or dislike of China. In the face of a pandemic, they want basically what we want: to stop it from spreading, and someone else to blame it on. Chinese people are not inherently evil.

I certainly don't think the Chinese are inherently evil. Rather, I think that, from the view of an American in the 1990s, a world dominated by a totalitarian China which engages in routine genocide and bans freedom of expression would be a "negative outcome to the rise of China".

These are descriptions of Nash equilibria in human society. Their stability depends on humans having human values and capabilities.

Yes. Exactly. We should be trying to find a Nash equilibrium in which humans are still alive (and ideally relatively free to pursue their values) after the singularity. I suspect such a Nash equilibrium involves multiple AIs competing with strong norms against violence and a focus on positive-sum trades.

But I don't see why any of the Nash equilibria between superintelligences will be friendly to humans.

This is precisely what we need to engineer! Unless your claim is that there is no Nash equilibrium in which humanity survives, which seems like a fairly hopeless standpoint to assume. If you are correct, we all die. If you are wrong, we abandon our only hope of survival.

Why would one AI start shooting because the other AI did an action that benefited both equally?

Consider deep seabed mining. I would estimate the percentage of humans who seriously care about (or are even aware of the existence of) the sponges living at the bottom of the deep ocean at <1%. Moreover, there are substantial positive economic gains from mining deep-sea nodules that could potentially be split among multiple nations. Nonetheless, every attempt to legalize deep-sea mining has run into a hopeless tangle of legal restrictions, because most countries view blocking their rivals as more useful than actually mining the deep sea.

If you have several AIs and one of them cares about humans, it might bargain for human survival with the others. But that implies some human managed to do some amount of alignment.

I would hope that some AIs have an interest in preserving humans for the same reason some humans care about protecting life on the deep seabed, but I don't think this is a necessary condition for ensuring humanity's survival in a post-singularity world. We should be trying to establish a Nash equilibrium in which even insignificant actors have their values and existence preserved.

My point is, I'm not sure that aligned AI (in the narrow technical sense of coherently extrapolated values) is even a well-defined term. Nor do I think it is an outcome to the singularity we can easily engineer, since it requires us to both engineer such an AI and to make sure that it is the dominant AI in the post-singularity world.

comment by Donald Hobson (donald-hobson) · 2020-05-09T18:24:09.782Z · LW(p) · GW(p)
This is precisely what we need to engineer! Unless your claim is that there is no Nash equilibrium in which humanity survives, which seems like a fairly hopeless standpoint to assume. If you are correct, we all die. If you are wrong, we abandon our only hope of survival.

What I am saying is that if you roll a bunch of random superintelligences, superintelligences that don't care in the slightest about humanity in their utility functions, then I don't see why selection of a Nash equilibrium would be enough to get a nice future. It certainly isn't enough if humans are doing the selection and we don't know what the AIs want or what technologies they will have. Will one superintelligence be sufficiently transparent to another superintelligence that they will be able to provide logical proofs of their future behaviour to each other? Where does the arms race of stealth and detection end up?

If at least some of the AIs have been deliberately designed to care about us, then we might get a nice future.

From the article you link to:

After the initial euphoria of the 1970s, a collapse in world metal prices, combined with relatively easy access to minerals in the developing world, dampened interest in seabed mining.

On the other hand, people do drill for oil in the ocean. It sounds to me like deep seabed mining is unprofitable or not that profitable, given current tech and metal prices.

I suspect such a Nash equilibrium involves multiple AIs competing with strong norms against violence and a focus on positive-sum trades.

If you have a tribe of humans, and the tribe has norms, then everyone is expected to be able to understand those norms. The norms have to be fairly straightforward to humans. "Don't do X except for [100 subtle special cases]" gets simplified to "don't do X". This happens even when everyone would be better off with the special cases. When you have big corporations with legal teams, the agreements can be more complicated. When you have superintelligences, the agreements can be far more complicated. Humans and human organisations are reluctant to agree to a complicated deal that only benefits them slightly, because of the overhead cost of reading and thinking about the deal.

What's more, the Nash equilibria that humanity has been in have changed with technology and society. If a Nash equilibrium is all that protects humanity, and an AI comes up with a way to kill off all humans and distribute their resources equally, without any other AI being able to figure out who killed the humans, then that AI will kill all humans. Nash equilibria are fragile to details of situation and technology. If one AI can build a spacecraft and escape to a distant galaxy, which will be over the cosmic event horizon before the other AIs can do anything, that changes the equilibrium. In a Dyson swarm, one AI deliberately letting debris fly about might be able to Kessler-syndrome the whole swarm, a kind of mutually assured destruction, but the debris-deflection tech might improve and change the Nash equilibrium again.

My point is, I'm not sure that aligned AI (in the narrow technical sense of coherently extrapolated values) is even a well-defined term. Nor do I think it is an outcome to the singularity we can easily engineer, since it requires us to both engineer such an AI and to make sure that it is the dominant AI in the post-singularity world.

We need an AI that in some sense wants the world to be a nice place to live. If we were able to give a fully formal, exact definition of this, we would be much further along in AI alignment. Saying that you want an image that is "beautiful and contains trees" is not a formal specification of the RGB values of each pixel. However, there are images that are beautiful and contain trees. Likewise, saying you want an "aligned AI" is not a formal description of every byte of source code, but there are still patterns of source code that are aligned AIs.

Scenario 1. Suppose someone figured out alignment and shared the result widely. Making your AI aligned is straightforward. Almost all the serious AI experts agree that AI risks are real and alignment is a good idea. All the serious AI research teams are racing to build an aligned AI.

Scenario 2. Aligned AI is a bit harder than unaligned AI. However, all the world's competent AI experts realise that aligned AI would benefit everyone, and that it is harder to align an AI when you are in a race. They come together into a single worldwide project to build aligned AI. They take their time to do things right. Any competing group is tiny and hopeless, partly because the project makes an effort to reach out to and work with anyone competent in the field.