Non-orthogonality implies uncontrollable superintelligence

post by Stuart_Armstrong · 2012-04-30T13:53:53.700Z · LW · GW · Legacy · 47 comments

Just a minor thought connected with the orthogonality thesis: if you claim that any superintelligence will inevitably converge to some true code of morality, then you are also claiming that no measures can be taken by its creators to prevent this convergence. In other words, the superintelligence will be uncontrollable.

47 comments

Comments sorted by top scores.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-05-02T01:45:54.020Z · LW(p) · GW(p)

And that for every X except x0, it is mysteriously impossible to build any computational system which generates a range of actions, predicts the consequences of those actions relative to some ontology and world-model, and then selects among probable consequences using criterion X.
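
A minimal sketch of the kind of system being described, in illustrative Python (the names and structure are mine, not Eliezer's); the point it makes is that the selection criterion X is just a swappable parameter of the architecture:

```python
# Hypothetical sketch: generate candidate actions, predict their consequences
# with a world model, and select among them using an arbitrary criterion X.
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")
Outcome = TypeVar("Outcome")

def choose_action(
    candidates: Iterable[Action],
    predict: Callable[[Action], Outcome],     # ontology / world-model
    criterion_x: Callable[[Outcome], float],  # the "goal" lives entirely here
) -> Action:
    """Return the action whose predicted consequence scores highest under X."""
    return max(candidates, key=lambda a: criterion_x(predict(a)))
```

Nothing in `choose_action` constrains `criterion_x`; a paperclip-counting scorer and a human-flourishing scorer plug in equally well, which is the orthogonality claim in miniature.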

Replies from: Wei_Dai, thomblake, Stuart_Armstrong, private_messaging
comment by Wei Dai (Wei_Dai) · 2012-05-03T22:46:03.572Z · LW(p) · GW(p)

It sounds implausible when you put it like that, but suppose the only practical way to build a superintelligence is through some method that severely constrains the possible goals it might have (e.g., evolutionary methods, or uploading the smartest humans around and letting them self-modify), and attempts to build general purpose AIs/oracles/planning tools get nowhere (i.e., fail to be competitive against humans) until one is already a superintelligence.

Maybe when Bostrom/Armstrong/Yudkowsky talk about "possibility" in connection with the orthogonality thesis, they're talking purely about theoretical possibility as opposed to practical feasibility. In fact Bostrom made this disclaimer in a footnote:

The orthogonality thesis implies that most any combination of final goal and intelligence level is logically possible; it does not imply that it would be practically easy to endow a superintelligent agent with some arbitrary or human-respecting final goal—even if we knew how to construct the intelligence part.

But then who are they arguing against? Are there any AI researchers who think that even given unlimited computing power and intelligence on the part of the AI builder, it's still impossible to create AIs with arbitrary (or diverse) goals? This isn't Pei Wang's position, for example.

Replies from: TheAncientGeek
comment by TheAncientGeek · 2013-09-30T17:39:36.970Z · LW(p) · GW(p)

There are multiple variations on the OT, and the kind that just says it is possible can't support the UFAI argument. The UFAI argument is conjunctive, and each stage in the conjunction needs to have a non-negligible probability, or else it is a Pascal's Mugging.

comment by thomblake · 2012-05-02T15:57:58.257Z · LW(p) · GW(p)

I don't think I've seen that particular reversal of the position before. Neat.

comment by Stuart_Armstrong · 2012-05-02T11:27:17.779Z · LW(p) · GW(p)

Yep. I'm calling that the "no Oracle, no general planning" position in my paper.

comment by private_messaging · 2012-05-04T06:51:18.631Z · LW(p) · GW(p)

build any computational system which generates a range of actions, predicts the consequences of those actions relative to some ontology and world-model, and then selects among probable consequences using criterion X.

Nothing mysterious here: this naive approach has incredibly low payoff per computation, and even if you start with such a system and get it to be smart enough to make improvements, the first thing it'll improve is its architecture.

If I gave you 10^40 flops, which could probably support a 'superintelligent' mind, your naive approach would still be dumber than a housecat on many tasks. For some world evolutions and utilities, you can do the inverse of 'simulate and choose' much better (think towering exponents better) than brute-force 'try different actions'. In general you can't. Some functions are much easier to invert than others.
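
Illustrative only (this example is mine, not from the comment): the contrast between brute-force 'simulate and choose' and exploiting an invertible world model, sketched in Python:

```python
# Brute force: evaluate every candidate action through the world model.
def best_action_brute_force(actions, simulate, utility):
    return max(actions, key=lambda a: utility(simulate(a)))

# Analytic shortcut: if the world model is easy to invert, solve directly for
# the action that produces the desired outcome.
def best_action_analytic(target_outcome, invert_world_model):
    return invert_world_model(target_outcome)

# Toy case: simulate(a) = 3*a + 2, and utility rewards reaching outcome 17.
# Brute force scans candidates; the inverse gives a = (17 - 2) / 3 immediately.
print(best_action_brute_force(range(100), lambda a: 3 * a + 2, lambda o: -abs(o - 17)))
print(best_action_analytic(17, lambda o: (o - 2) / 3))
```

Both calls land on the same answer (5), but the cost gap between scanning candidates and inverting the model is what the comment means by some functions being much easier to invert than others.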

comment by TheOtherDave · 2012-04-30T14:59:17.552Z · LW(p) · GW(p)

Well, yes.

Although I would also consider this conclusion to follow from the broader claim that if A is a superintelligence with respect to B, B cannot control A, regardless of whether there's a true code of morality (a question I'm not weighing in on here).

Well, unless you want to say that if A happens to want what B wants, or want what B would want if B were a superintelligence, or otherwise wants something that B endorses, or that B ought to endorse, or something like that (for example, if A is Friendly with respect to B), then B controls A, or acausally controls A, or something like that.

At which point I suspect we do better to taboo "control", because we're using it in a very strange way.

comment by XiXiDu · 2012-04-30T15:17:37.088Z · LW(p) · GW(p)

...if you claim that any superintelligence will inevitably converge to some true code of morality, then you are also claiming that no measures can be taken by its creators to prevent this convergence.

...if you claim that any superintelligent oracle will inevitably return the same answer given the same question, then you are also claiming that no measures can be taken by its creators to make it return a different answer.

Replies from: khafra, Stuart_Armstrong
comment by khafra · 2012-04-30T16:07:45.719Z · LW(p) · GW(p)

Sounds uncontroversial to me. I wouldn't expect to be able to create a non-broken AI, even a comparatively trivial one, that thinks 1+1=3. On the other hand, I do think I could create comparatively trivial AIs that leverage their knowledge of arithmetic to accomplish widely varying ends. Simultaneous Localization and Mapping, for example, works for a search-and-rescue bot or a hunt/kill bot.

comment by Stuart_Armstrong · 2012-04-30T18:14:26.184Z · LW(p) · GW(p)

Not exactly true... You need to conclude "can be taken by its creators to make it return a different answer while it remains an Oracle". With that caveat inserted, I'm not sure what your point is... Depending on how you define the terms, either your implication is true by definition, or the premise is agreed to be false by pretty much everyone.

Replies from: XiXiDu
comment by XiXiDu · 2012-04-30T18:32:41.550Z · LW(p) · GW(p)

You need to conclude "can be taken by its creators to make it return a different answer while it remains an Oracle". With that caveat inserted, I'm not sure what your point is...

That was my point. If you accept the premise that superintelligence implies the adoption of some sort of objective moral conduct, then it is no different from an oracle returning correct answers. You can't change that behavior and retain superintelligence; you'll end up with a crippled intelligence.

I was just stating an analogous example that highlights the tautological nature of your post. But I suppose that was your intention anyway.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-04-30T18:58:53.139Z · LW(p) · GW(p)

Ah, ok :-) It just felt like it was pulling intuitions in a different direction!

comment by DanielLC · 2012-04-30T22:02:40.123Z · LW(p) · GW(p)

I doubt that either the orthogonality thesis or the parallel thesis you'd need for this argument is true. Some utility functions are more likely than others, but none are certain.

If the parallel thesis is true, the AI would be fulfilling CEV, so I don't see the problem. It will do what you'd have done if you were smart enough.

comment by JGWeissman · 2012-04-30T16:55:58.210Z · LW(p) · GW(p)

if you claim that any superintelligence will inevitably converge to some true code of morality, then you are also claiming that no measures can be taken by its creators to prevent this convergence.

That seems obviously true, but what are your motivations for stating it? I was under the impression that people who make the claim accept the conclusion, think it's a good thing, and want to build an AI smart enough to find the "true universal morality" without worrying about all that Friendliness stuff.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-04-30T18:10:37.038Z · LW(p) · GW(p)

It's useful for hitting certain philosophers with. Canonical examples: moral realists sceptical of the potential power of AI.

Replies from: JGWeissman
comment by JGWeissman · 2012-04-30T18:18:15.154Z · LW(p) · GW(p)

There are philosophers who believe that any superintelligence will inevitably converge to some true code of morality, and that superintelligence is controllable? Who?

Replies from: thomblake, Stuart_Armstrong
comment by thomblake · 2012-04-30T19:38:48.084Z · LW(p) · GW(p)

As far as I can tell, it's pretty common for moral realists. More or less, the argument goes:

  • Morality is just what one ought to do, so anyone who is correct about morality and not suffering from akrasia will do the moral thing
  • A superintelligence will be better than us at knowing facts about the world, like morality
  • (optional) A superintelligence will be better than us at avoiding akrasia
  • Therefore, a superintelligence will behave more morally than us, and will eventually converge on true morality.
Replies from: JGWeissman
comment by JGWeissman · 2012-04-30T19:47:00.690Z · LW(p) · GW(p)

So, the moral realists believe a superintelligence will converge on true morality. Do they also believe that superintelligence is controllable? I had thought they would believe that superintelligence is uncontrollable, but approve of whatever it uncontrollably does.

Replies from: thomblake
comment by thomblake · 2012-04-30T21:12:01.544Z · LW(p) · GW(p)

Ah, I missed that clause. Yes, that.

comment by Stuart_Armstrong · 2012-04-30T18:33:29.263Z · LW(p) · GW(p)

Quite a few I know (not naming names, sorry!) who haven't thought through the implications. Hell, I've only put the two facts together recently in this form.

comment by TheAncientGeek · 2013-09-30T17:27:10.935Z · LW(p) · GW(p)

There's a certain probability that it would do the right thing anyway, a certain probability that it wouldn't and so on. The probability of an AGI turning unfriendly depends on those other probabilities, although very little attention has been given to moral realism/objectivism/convergence by MIRI.

comment by Vladimir_Nesov · 2012-05-03T11:18:22.740Z · LW(p) · GW(p)

In other words, the superintelligence will be uncontrollable.

Rather, a controllable superintelligence would be impossible to construct, for all goals other than whatever "absolute morality" the laws of the universe are conspiring for.

comment by Armok_GoB · 2012-05-02T21:28:11.668Z · LW(p) · GW(p)

Even if this were the case, by murder-pill logic a paperclipper would stop self-improving just below the relevant "superintelligence" threshold.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2012-05-02T23:47:47.856Z · LW(p) · GW(p)

Assuming it knew where that threshold was ahead of time.

Replies from: Armok_GoB
comment by Armok_GoB · 2012-05-03T21:57:28.022Z · LW(p) · GW(p)

Well, yeah. There are probably a bunch of more complicated techniques it could still use in that case, though. I have some specific ideas, but debating the technicalities of irrelevant counterfactuals is not my idea of fun, so I won't mention them.

comment by Eugine_Nier · 2012-05-01T02:24:46.678Z · LW(p) · GW(p)

There are still reasons to control it, e.g., making sure it doesn't destroy the earth before converging on the true morality.

There are also theories that the true morality comes from the interaction of multiple agents, and that therefore a single super-powerful agent won't necessarily converge on it.

comment by Furcas · 2012-04-30T23:32:09.005Z · LW(p) · GW(p)

No worries, the true code of morality is more or less the one that modern western-educated people have anyway.

Obviously.


comment by lukstafi · 2012-04-30T18:57:47.902Z · LW(p) · GW(p)

So you think that true moral behavior excludes choice? (More generally, once someone chooses their morality, no more choices remain to be made?)

Replies from: XiXiDu
comment by XiXiDu · 2012-04-30T19:26:55.104Z · LW(p) · GW(p)

So you think that true moral behavior excludes choice? (More generally, once someone chooses their morality, no more choices remain to be made?)

I think so. What choice is there in the field of mathematics? I don't see that mathematicians ever had any choice but to eventually converge on the same answer given the same conjecture. Why would that be different given an objective morality?

I thought that is what Aumann's agreement theorem states, and the core insight of TDT: that rational agents will eventually arrive at the same conclusions and act accordingly.

The question would be what happens if a superintelligence were equipped with goals that contradict reality. If there exists an objective morality, and the goal of a certain AI is to maximize paperclips while maximizing paperclips is morally wrong, then that would be similar to the goal of proving that 1+1=3 or of attaining faster-than-light propagation.

ETA: I suppose that if there does exist a sort of objective morality but it is inconsistent with the AI's goals, then you would end up with an unfriendly AI anyway, since such an AI would attempt to pursue its goals given the small probability that there is no objective morality for one reason or another.

Replies from: lukstafi
comment by lukstafi · 2012-05-01T05:53:05.282Z · LW(p) · GW(p)

I think so. What choice is there in the field of mathematics? I don't see that mathematicians ever had any choice but to eventually converge on the same answer given the same conjecture.

Mathematics is not an agent; it cannot be controlled anyway. But mathematicians have a choice over which branch of math to pursue.

Replies from: XiXiDu
comment by XiXiDu · 2012-05-01T10:17:55.273Z · LW(p) · GW(p)

Mathematics is not an agent; it cannot be controlled anyway. But mathematicians have a choice over which branch of math to pursue.

An expected utility maximizer has no choice but to pursue the world state it assigns the highest expected utility. The computation to determine which world state has the highest expected utility is completely deterministic. The evidence it used to calculate what to do was also not a matter of choice.
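
A hypothetical illustration of this point (my sketch, not from the comment): with a fixed utility function, a fixed world model, and fixed evidence, the maximizer's "decision" is just a deterministic computation.

```python
def expected_utility(state, outcome_distribution, utility):
    # outcome_distribution(state) -> {outcome: probability}
    return sum(p * utility(o) for o, p in outcome_distribution(state).items())

def act(world_states, outcome_distribution, utility):
    # Same inputs always yield the same selected state: no residual "choice".
    return max(world_states, key=lambda s: expected_utility(s, outcome_distribution, utility))
```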

Replies from: lukstafi
comment by lukstafi · 2012-05-01T12:14:31.647Z · LW(p) · GW(p)
  • I don't think that every consequentialist view of ethics reduces to equating morality with maximizing an arbitrary but fixed utility function which leaves no action as morally neutral.

  • Under bounded resources, I think there is (and remains, as the horizon expands with the capability of the system) plenty of leeway in the "Pareto front" of actions judged at a given time not to be "likely worse in the long term" than any other action considered.

  • The trajectory of a system depends on its boundary conditions even if the dynamic is in some sense "convergent", so "convergence" does not exclude control over the particular trajectory.

comment by Incorrect · 2012-04-30T15:27:20.388Z · LW(p) · GW(p)

I don't understand how to construct a consistent world view that involves the premise. Could you state the premise as a statement about all computable functions?

Replies from: Stuart_Armstrong, ciphergoth
comment by Stuart_Armstrong · 2012-05-02T11:38:32.576Z · LW(p) · GW(p)

Let's give it a try... In the space of computable functions, there is a class X that we would recognize as "having goal G". There is a process SI we would identify as self-improvement. Then convergence implies that for nearly any initial function f, the process SI will result in f being in X.

If you want to phrase this in an updateless way, say that "any function with property SI is in X", defining X as "ultimately having goal G".
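
One way to write the convergence claim down formally (the notation here is illustrative and mine, not Stuart's):

```latex
% Illustrative formalization of the convergence (non-orthogonality) thesis.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% $F$: computable agent programs; $X_G \subseteq F$: programs we would recognize
% as ``having goal $G$''; $\mathrm{SI}$: the self-improvement map on $F$.
\[
\textbf{Convergence:}\quad
\exists\, G_0 \;\;\forall f \in F' :\;
\lim_{n \to \infty} \mathrm{SI}^{\,n}(f) \in X_{G_0},
\qquad F' \subseteq F \text{ comprising ``nearly all'' initial programs.}
\]
% The post's observation: if this holds, no choice of initial program $f$ by the
% creators keeps the limit outside $X_{G_0}$, i.e. the outcome is uncontrollable.
\end{document}
```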

comment by Paul Crowley (ciphergoth) · 2012-05-02T07:31:52.258Z · LW(p) · GW(p)

If you want a complete, coherent account of what non-orthogonality would be, you'll have to ask one of its proponents.

comment by shrink · 2012-05-01T05:34:40.091Z · LW(p) · GW(p)

There's so much that can go wrong with such reasoning, given that intelligence (even at the size of a galaxy of Dyson spheres) is not a perfect God, as to render such arguments irrelevant and entirely worthless. Furthermore, there are enough ways the non-orthogonality could hold that are not covered by 'converges', e.g. almost all intelligences with wrong moral systems crashing or failing to improve.

Meta: the tendency to talk seriously about the products of very bad reasoning really puts an upper bound on the sanity of newcomers to LW. As does the idea that a very bad argument trumps authority (when it comes to the whole topic).

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-05-01T14:17:45.462Z · LW(p) · GW(p)

What type of reasoning would you prefer to be used when talking about superintelligences?

Replies from: shrink
comment by shrink · 2012-05-02T16:02:56.320Z · LW(p) · GW(p)

Would you take criticism if it is not 'positive' and doesn't give you an alternative method for talking about the same topic? Faulty reasoning has an unlimited domain of application - you can 'reason' about the purpose of the universe, the number of angels that fit on the tip of a pin, what superintelligences would do, etc. In those areas, non-faulty reasoning cannot compete in terms of providing a certain pleasure from reasoning, or in terms of interesting-sounding 'results' that can be obtained with little effort and knowledge.

You can reason about what a particular cognitive architecture can do on a given task in N operations; you can reason about what the best computational process can do in N operations. But that will involve actually using mathematics, and the results will not be useful for unintelligent debates in the way your original statement is useful (I imagine you could use it as a soundbite to reply to someone who believes in absolute morality; I really don't see how it could have any predictive power whatsoever about superintelligence, though).

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-05-03T11:05:18.400Z · LW(p) · GW(p)

I am interested in anything that allows better reasoning about these topics.

Mathematics has a somewhat limited use when discussing the orthogonality thesis: AIXI, some calculations about the strength of optimisation processes, and stuff like that. But when answering the question "is it likely that humans will build AIs with certain types of goals", we need to look beyond mathematics.

I won't pretend the argument in this post is strong - it's just, to use the technical term, "kinda neat" and I'd never seen it presented this way before.

What would you consider reasonable reasoning on questions like the orthogonality thesis in practice?

Replies from: shrink
comment by shrink · 2012-05-04T05:46:57.673Z · LW(p) · GW(p)

That's how religions were created, you know - people could not actually answer why lightning thunders, why the sun moves through the sky, etc. So they looked way 'beyond' non-faulty reasoning in search of answers now (being impatient), and got answers that were much, much worse than no answers at all. I feel LW is doing precisely the same thing with AIs. Ultimately, when you can't compute the right answer in the given time, you will either have no answer or compute a wrong one.

On the orthogonality thesis, it is the case that you can't answer this question given limited knowledge and time (you've got to know the AI's architecture first), and any reasonable reasoning tells you this, while LW's pseudo-rationality keeps giving you wrong answers (that aren't any less wrong than anyone else's, including the Mormon church and any other weird religious group). I'm not quite sure what you guys are doing wrong; maybe the focus on biases, and the conflation of biases with stupidity, led to the fallacy that a lack of (known) biases will lead to non-stupidity, i.e. smartness, and that if only you aren't biased you'll have a good answer. It doesn't work like that. It just leads to another kind of wrongness.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-05-04T09:55:53.253Z · LW(p) · GW(p)

Ultimately, when you can't compute the right answer in the given time, you will either have no answer or compute a wrong one.

But if the question is possibly important and you have to make a decision now, you have to make a best guess. How do you think we should do that?

Replies from: shrink, XiXiDu
comment by shrink · 2012-05-04T15:21:43.234Z · LW(p) · GW(p)

It was definitely important to make animals come, or to make it rain, tens of thousands of years ago. I get the feeling that as I tell you your rain-making method doesn't work, you aren't going to give up trying unless I provide you with an airplane, a supply of silver iodide, flight training, a runway, fuel, and so on (and even then the method will only be applicable on some days, while praying for rain is applicable any time).

As for the best guess: if you suddenly need a best guess on a topic because someone told you of something and you couldn't really see a major flaw in vague reasoning of the sort that can arrive at anything via a minor flaw at every step, that's a backdoor other agents will exploit to take your money (those agents will likely also opt to modify their own beliefs somewhat, because, hell, it feels a lot better to be saving mankind than to be scamming people). What is actually important to you is your utility, and the best reasoning here is strategic: do not leave backdoors open.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-05-08T10:18:31.740Z · LW(p) · GW(p)

Not a relevant answer. You have given me no tools to estimate the risks or lack thereof in AI development. What methods do you use to reach conclusions on these issues? If they are good, I'd like to know them.

Replies from: shrink
comment by shrink · 2012-05-09T08:08:35.223Z · LW(p) · GW(p)

If you want to maximize your win, it is a relevant answer.

For the risk estimate per se, I think one needs not so much methods as a better understanding of the topic, which is attained by studying the field of artificial intelligence - in a non-cherry-picked manner - and takes a long time. If you want an easier estimate right now, you could try to estimate how privileged the hypothesis is that there is a risk. (There is no method that would let you calculate the gravitational wave from the spin-down and collision of orbiting black holes without spending a lot of time studying GR, applied mathematics, and computer science. Why do you think there's a method for you to tackle an even harder problem from first principles?)

Better yet, ban thinking of it as risk (we have introduced, for instrumental reasons, a burden of proof on those who say there is no risk when it comes to new drugs etc., and we did so solely because introducing random chemicals into a well-evolved system is much more often harmful than beneficial; in general there is no reason to put the burden of proof on those who say there is no wolf, especially not when people screaming wolf get candy for doing so), and think of it as a prediction of what happens in 100 years. Clearly, you would not listen to philosophers who use ideals for predictions.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-05-09T09:32:09.796Z · LW(p) · GW(p)

Thank you for your answer. I don't think the methods you describe are much good for predictions. On the other hand, few methods are much good for predictions anyway.

I've already picked up a few online AI courses to get some background; emotionally this has made me feel that AI is likely to be somewhat less powerful than anticipated, but that its motivations are even more likely to be alien than I'd thought. Not sure how much weight to put on these intuitions.

comment by XiXiDu · 2012-05-04T11:48:04.379Z · LW(p) · GW(p)

Ultimately, when you can't compute the right answer in the given time, you will either have no answer or compute a wrong one.

But if the question is possibly important and you have to make a decision now, you have to make a best guess. How do you think we should do that?

How do you know that you have to make a decision now? You don't know when AGI is going to be invented. You don't know if it will be a quick transition from expert systems towards general reasoning capabilities, or if AGI will be constructed piecewise over a longer period of time. You don't know if all that you currently believe you know will be rendered moot in the future. You don't know if the resources that you currently spend on researching friendly AI are a wasted opportunity, because all that you could possibly come up with might be much easier to come by in the future.

All that you really know at this time is that smarter-than-human intelligence is likely possible, and that something that is smarter than you is hard to control.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2012-05-04T14:17:32.808Z · LW(p) · GW(p)

How do you know that you have to make a decision now?

How do you know we don't? Figuring out whether there is urgency or not is one of those questions whose solution we need to estimate... somehow.