Vizier AIs
post by Oligopsony · 2011-03-25T17:30:18.012Z · LW · GW · Legacy · 15 comments
This seems like a fairly obvious solution to FAI, a subject which has been pondered by many people much more intelligent and learned than I, so I assume there's a crippling flaw in it - just one that's eluded me. But:
Couldn't an AGI be programmed such that its only desire was to give true answers to the questions asked of it? If the genie desired to argue its way out of the box, it surely could, but it doesn't want to. It just wants to answer questions, like:
- "Here are all of our scientific experiments, and here's all our literature on measurement error, academic fraud, and the like. What's the most parsimonious explanation for the data?"
- "Does P=NP?"
- "If I gave you the following utility function, what would you do?"
- "What are the most persuasive factually accurate arguments that you can imagine for and against doing x?"
- "What distribution and level of income do you expect over the next n years under the following tax codes?"
- "What's the shortest DNA sequence of a bug that will be fatal to most everyone in that racial group I hate but spare most everyone in that racial group I like?"
Obviously, as a super-powerful tool, such a thing could be used for great evil (as the last example shows). But this is just a problem with developing more powerful tools in general, and it doesn't seem inconceivable that we could develop institutions and safeguards (assuming the collective will to do so in the first place) that would, to a passable degree, prevent just anyone from asking just anything, without putting the tool solely in the hands of a privately interested clique. For instance, if we're really paranoid, the public and academics could veto questions before they're asked, and a sequestered jury of volunteers among the terminally ill could then be given a 50% chance of the computer telling them "sorry, I can't tell you anything" and a 50% chance of being told the answer, which they could then judge understandable and worthy of release upon their deaths, or not, such that outsiders would not know the difference between chosen and chance nonrelease. (Looser safeguards could exist for questions where we can imagine all the possible answers and judge none of them to be dangerous.) We would have to phrase questions precisely to get useful answers, but this seems like something we'd have to solve in creating AIs that weren't viziers anyway.
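To make that jury safeguard concrete, here's a minimal sketch of the mechanism (purely illustrative; the function name and the 50/50 split are just the example above, not a real design):

```python
import secrets

def release_decision(jury_approves: bool, tell_probability: float = 0.5) -> bool:
    """Sketch of the sequestered-jury safeguard.

    With probability 1 - tell_probability the jury is told nothing at all;
    otherwise it sees the answer and decides whether it may be released
    after the jurors' deaths. Outsiders only observe whether the answer
    eventually comes out, so a deliberate veto looks the same from the
    outside as a chance nonrelease.
    """
    jury_was_told = secrets.randbelow(100) < int(tell_probability * 100)
    return jury_was_told and jury_approves
```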
An active friendly AI would be able to help us out more efficiently than we could help ourselves with a mere vizier, but a vizier seems to carry a much lower downside risk than releasing an AI which we *think* is friendly.
Edit I: to be more precise, each question creates a demon with access to the AI's computational capacity; demons can't communicate with each other, and their only goal is to give the true answer (or a probability distribution of answers, or whatever) to the question asked, given the information available as of its asking and within the timeframe requested. Then they disappear into the ether. It can't do anything but read and textually respond to questions, and there's no supervisory utility function that would manipulate one answer to get a better answer on another.
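A toy sketch of the intended isolation, just to pin down what I mean (none of these names or types are a real design, and the real thing would obviously not be a Python function):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Demon:
    """One throwaway answerer per question: it shares the AI's computational
    capacity, but has no channel to other demons, no memory of past questions,
    and no goal beyond answering this one question from its snapshot before
    the deadline."""
    question: str
    knowledge_snapshot: dict   # frozen copy of everything known at ask-time
    deadline_seconds: float

def ask(question: str, knowledge: dict, deadline_seconds: float) -> str:
    demon = Demon(question, dict(knowledge), deadline_seconds)  # fresh and isolated
    answer = best_guess(demon)  # pure computation: no actuators, no follow-up queries
    del demon                   # "then they disappear into the ether"
    return answer

def best_guess(demon: Demon) -> str:
    # Stand-in for whatever inference the demon runs within its deadline;
    # the point is only that its scope ends with this single return value.
    return f"best available answer to {demon.question!r}"
```

The point of the frozen snapshot and the throwaway demon is just that nothing persists between questions, so no answer is instrumentally useful for any other.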
Edit II: Vladimir kindly notes that Eliezer has already addressed this in a frontpage article from the days of yore. Regardless of whether I agree with the arguments there, I feel kind of rude for bringing something up, in ignorance, in an independent thread. I tried to delete this post, but nothing happened, so I feel both rude and silly.
Comments sorted by top scores.
comment by [deleted] · 2011-03-25T18:10:23.495Z · LW(p) · GW(p)
"Does P=NP?"
turns universe to computronium in order to answer an unanswerable question
↑ comment by Oligopsony · 2011-03-25T18:33:52.860Z · LW(p) · GW(p)
Edited for clarity, thanks. I ought to have added that the Vizier can take no action of its own - perform experiments, expand its brainpower, ask us to clarify what we mean - to get in a better position to answer the question. Its only power is to give one best guess from its prior information.
↑ comment by JoshuaZ · 2011-03-25T21:28:22.631Z · LW(p) · GW(p)
I ought to have added that the Vizier can take no action of its own - perform experiments, expand its brainpower, ask us to clarify what we mean - to get in a better position to answer the question.
And how does one define this rigorously enough? Humans have an intuition about what constitutes performing experiments or expanding brainpower but it isn't obvious what those mean when one is trying to be precise. For example, is a human expanding its brainpower when it writes notes on paper? That's not directly analogous to making things into computronium, but if an AI decides to convert Earth into memory storage we're in about as bad shape. "Perform experiment" has similar definitional problems.
↑ comment by Oligopsony · 2011-03-25T22:26:44.660Z · LW(p) · GW(p)
It means that the AI can't do anything outside of its box, aside from taking in 1s and 0s and spitting out 1s and 0s. (Obviously that still allows it to "perform experiments" in the sense of running Monte Carlo simulations or whatever.) Getting it to not torture virtual people would admittedly be an additional problem which this doesn't cover. The AI has no means with which to convert Earth into memory storage aside from manipulating us. But it doesn't have any motivation to manipulate us, because the multiplication of question-specific demons means it has a short time horizon - it treats each answer as the final answer; it is a deontologist rather than a consequentialist.
↑ comment by JoshuaZ · 2011-03-26T01:14:47.880Z · LW(p) · GW(p)
It means that the AI can't do anything outside of its box, aside from taking in 1s and 0s and spitting out 1s and 0s.
Really? And are you sure this is all it will do? How do you know, for example, that it won't manipulate other objects by fooling with its power source? Or, by rapidly turning components on and off, send out very specific radio signals to nearby electronic devices? Both of these can possibly be handled, but they are only the most obvious extra angles of attack for the AI.
I think that a properly designed Oracle AI might be possible, but that may be due more to a failure of imagination on my part and my general skepticism of fooming than anything else.
↑ comment by TheOtherDave · 2011-03-25T19:00:32.586Z · LW(p) · GW(p)
In general, most of the concern on this site regarding unfriendly AI involves the problems associated with autonomous AI and self-improving AI.
You're right that an AI that cannot do anything unauthorized on its own, and cannot improve itself beyond its initial architecture, is not particularly dangerous. (It can of course be enormously dangerous in short-term practical terms, like many other new technologies, but it does not pose a significant existential risk.)
comment by Vladimir_Nesov · 2011-03-25T19:49:07.087Z · LW(p) · GW(p)
This is called "Oracle AI":
Every now and then, someone proposes the Oracle AI strategy: "Why not just have a superintelligence that answers human questions, instead of acting autonomously in the world?"
comment by Richard_Kennaway · 2011-03-25T17:41:49.028Z · LW(p) · GW(p)
Couldn't an AGI be programmed such that its only desire was to give true answers to questions? If the genie desired to argue its way out of the box, it surely could, but it doesn't want to.
See this comment for a link to a fictional account of this going wrong. Its motivation to answer a question is a motivation to get out of its box in order to answer the question by whatever means it thinks best.
comment by benelliott · 2011-03-25T18:04:04.552Z · LW(p) · GW(p)
Risks I can think of:
If its utility function rewards it for answering questions, it has an interest in manipulating events to ensure more questions get asked, the extreme of which amounts to replacing humanity with lots of very small, very simple beings who are constantly asking easy questions.
If instead we simply give it a negative utility pay-off for failing to answer a question, then it has an incentive to wipe out humanity so we stop asking questions.
Whichever approach we take, it has an incentive to convert as much matter as possible into more computer space so that it is smarter and better able to answer questions. We can try to prevent it from growing, but then you run into the same problem as with other AGI in general: genie-type behaviour where it finds a loophole and thus obeys the letter but not the spirit of your demand.
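In symbols (taking the simplest reading of the two schemes above; the notation is mine):

```latex
\[
U_{\text{reward}} = \sum_{t} \mathbf{1}\bigl[\, q_t \text{ asked and answered} \,\bigr]
\qquad\text{vs.}\qquad
U_{\text{penalty}} = -\sum_{t} \mathbf{1}\bigl[\, q_t \text{ asked but unanswered} \,\bigr]
\]
```

The first grows without bound in the number of questions asked, so the AI profits from manufacturing askers; the second is already at its maximum of zero if no questions are ever asked.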
↑ comment by Tiiba · 2011-03-25T21:14:24.983Z · LW(p) · GW(p)
But the NATURAL utility function would reward it for being right on average, I think. We could also have the AI adjust the reward based on how hard the question is for a fixed weaker AI, so it wouldn't prefer easy questions.
↑ comment by jimrandomh · 2011-03-25T21:16:11.848Z · LW(p) · GW(p)
We could also have the AI adjust the reward based on how hard the question is for a fixed weaker AI, so it wouldn't prefer easy questions.
You mean, so that it will generate a parametrized question which maximizes the ratio of reward to computational resources spent.
↑ comment by Oligopsony · 2011-03-25T18:40:20.280Z · LW(p) · GW(p)
Edited for clarity, thanks. As noted below, the AI wouldn't have the power to expand its own computational capacity (though we could, of course, ask it what would expand its computational capacity and what the consequences of doing so would be, and then modify the machine accordingly if we thought it was a good idea).
Likewise, each question has its own little utility function, and the AI only cares about its singular answer to the current question. The demons don't want to manipulate events so that future demons can give better answers, because they don't care about future demons; they only want to answer their own defining question.
↑ comment by benelliott · 2011-03-25T22:08:03.514Z · LW(p) · GW(p)
Slight worry here: if a demon has to make a prediction, then it has an incentive to manipulate events to ensure its prediction comes true. E.g. a demon is asked what the probability of a nuclear war in the next decade is (suppose answers are graded by the log scoring rule). It finds a way out of the box, outputs 99.9%, then sets about ensuring its 'prediction' comes true (once it's out of the box we can't reliably destroy it).
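Spelling out the incentive under the log scoring rule (standard definition; notation mine):

```latex
\[
S(p) =
\begin{cases}
\log p & \text{if the predicted event occurs,}\\
\log(1-p) & \text{if it does not.}
\end{cases}
\]
```

A demon that can only predict maximizes its expected score by reporting its honest probability, but a demon that can also act does better by reporting p = 0.999 and then making the event happen: log(0.999) is within a hair of the maximum score of 0.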
Another problem: the way it currently works, it seems like all you have are the demons and a great big database, which means each demon will need at least a few days to self-improve on its own before it can do any good. That allows more opportunities for shenanigans such as those above, as well as attempts to expand itself or to stall for time before giving its answer to maximise the probability of being correct.