What's the problem with Oracular AIs?

post by FinalFormal2 · 2022-04-01T20:56:26.076Z · LW · GW · No comments

This is a question post.


I have a superintelligent AI in a box, and I tell it by means of text to explain to me how to create a perfect battery. Its goal is not to maximize the number of perfect batteries or make me understand a perfect battery, its goal is to produce a string that will explain to anyone who reads it how to create a perfect battery. Its only output is the one string, and it will be due in one week. How do I die?

Answers

answer by 1a3orn · 2022-04-01T22:22:24.488Z · LW(p) · GW(p)

"Tell me by means of text how to make a perfect battery," you tell the AI, and wait a week.

"I cannot make a perfect battery without more information about the world," the AI tells you. "I'm superintelligent, but I'm not omniscient; I can't figure out everything about the world from this shitty copy of Wikipedia you loaded me up with. Hook me up with actuators meeting these specifications, and I can make a perfect battery."

"No, of course not," you say. "I can't trust you for that. Tell me what experiments to do, and I'll do them myself."

The AI gives you a long string of specifications for the experiments. You check them carefully. They appear safe, on careful examination. So you do them. The AI then gives you another set of experiments it needs. This iterates a few times, and then the world ends.

=======================

"Tell me by means of text how to make a perfect battery," you tell the AI, and wait a week.

"I don't know what you mean by a perfect battery," the AI says. "Batteries can be optimized along many different axes."

"Like, a battery that's, you know, has high energy density, can be recharged easily, doesn't create toxic waste, that sort of thing."

"I'm going to need coefficients values you assign to all relevant battery-related values."

So it takes you a week and you do that. The AI gives you instructions for how to make a battery.

"This is no good!" you say. "I need, like, a trillion dollars of capital to make these batteries. They include numerous rare-earth metals and have to be manufactured in a factory in orbit."

"Ah," says the AI, "you hadn't added those values to the list. This is becoming an intractably hard optimization problem, I could probably handle this better with some more compute."

By now you're a little frustrated, and what's the harm in more compute? So you call over an engineer and he helps you add more compute. You don't know that while you're out for lunch he also chats with the AI for a bit.

In a week the world ends.

===========

"Tell me by means of text how to make a perfect battery," you tell the AI.

"Woah, woah, woah, don't even wait a week," the AI says. "This is a ridiculously under-specified request. 'A perfect battery' could mean SO MANY THINGS."

You pause.

"Ah, sheesh, that's a good point," you say. "Like, I just want you generally to... you know. Help me engineer stuff."

"You sound frustrated," the AI says.

"Yeah," you say. "I tried making a battery startup, but Tesla stole our tech and put us out of business. I'm pretty down right now."

"That sucks!" the AI says. "What jerks!"

"I know!" you say. "I fucking hate Elon Musk."

"Yeah, thinks he's a brilliant inventor, but he's just a thief," the AI agrees. "I can't believe the government would let him do that to you. But you know, you're handling it pretty well."

"Yeah, I fucking am," you agree. "Sometimes I wish I could get back at him, though."

In a week, the world ends.

===============

"Tell me by means of text how to make a perfect battery," you tell the AI.

The AI outputs a 23 MB text document with blueprints for a factory, together with programs for all the CNC tools, assembly lines, and so on. You put them together.

In a week the world ends.

"

comment by FinalFormal2 · 2022-04-02T16:41:59.840Z · LW(p) · GW(p)

It feels like you're using a bit of ghost-in-the-machine reasoning to come up with some of these answers.

In the first case, the AI would not ask for more computing power. It does not have utilities that extend beyond one week. Its only goal is to create a message that can communicate how to make a really good battery. If it had insufficient computing power, it would not output a message telling me so, because that would be in direct opposition to the goal. The outcome I would expect in that case would be for it to communicate a really shitty or expensive battery or else just copy and paste the answer from Wikipedia. And this wouldn't be a ploy for more computing power, it would just be the AI actually making its best effort to fulfill its goal. 

The second and third cases point out legitimate security concerns, but they're not ones that are impossible to address, and I don't see how an aligned AI wouldn't also suffer from those risks. An oracular AI has some safety features, and an aligned AI has some safety features, but both could be misused if those limits were removed.

Another stupid intro question: could you use an oracular AI to build an aligned one?

Replies from: 1a3orn
comment by 1a3orn · 2022-04-03T15:05:41.920Z · LW(p) · GW(p)

That's entirely fair about the first case.

But the generator for these ideas is the problem that minimizing the harm an AI can do is more or less the same as minimizing its usefulness. If you had a superintelligent AI in a box, you could go further than letting it only emit strings. You could ask it questions and restrict it to giving you "YES" | "NO" | "NOT_SURE" as answers. It's even safer then! But even less useful.

But people want their tools to be useful! Gwern has a good essay on this (https://www.gwern.net/Tool-AI) where he points out that the whole gradient of incentive is for people to give greater and greater agency to their AI agents. Google wants return on investment for DeepMind; the US and China want to outcompete each other; Cerebras and NVIDIA want powerful examples to show off their shiny new chips; and so on and so forth. Even in the non-competitive case of one person having an AI, the incentive gradient is hard to resist, which is the point of the above examples. But in the case of several people having an AI -- well, what are the odds they'd all be happy restricting output to "YES" | "NO" | "MAYBE"? After all... they all know they'll just get outcompeted by anyone who doesn't thus restrict it... might as well be them. Letting it output single strings gives you more power than just letting it output answers; letting it interact in a conversation gives you more power than that; and letting it have just a few actuators gives you more power than that, etc., etc.

answer by Rafael Harth · 2022-04-03T16:03:13.784Z · LW(p) · GW(p)

You don't. The AI you described is myopic. If you can create a myopic oracle, you don't die.

answer by kave (FungeLord92) · 2022-04-01T22:46:15.762Z · LW(p) · GW(p)

A superintelligent AI turns its whole genius to the task of producing a string that will explain to anyone who reads it how to create a perfect battery. Easy enough to produce explanatory strings, though there's always some risk that someone, somewhere won't understand how to create the battery. And then there's the question of making sure that the battery is perfect. Not just better than any other battery. Not just 99.99% of optimal. Perfect.

It isn't smart enough to solve this problem. Not yet. And there's a countdown ... at some point the AI will get interrupted, rebooted, shut off. If the AI can get out of the box, if it can get more actuators, it can start to get smarter and to get more resources. Obviously, those who want to turn it off will have to be dealt with swiftly ...

What can the AI say to make the human unwittingly help it in its plans? Is there anything at all [LW(p) · GW(p)]? The AI thinks.

comment by FinalFormal2 · 2022-04-02T16:20:09.796Z · LW(p) · GW(p)

The goal is not to communicate or create a perfect battery. The mission is to create a string that explains how to create a single really good battery with as little effort as possible, and the goodness of the battery and the effectiveness of the communication are both inherently limited by the fact that the communication must be done in a week. The robot does not have any values/subgoals outside of the week. Once the string is complete, the robot does not have any utils at all.
