Muehlhauser-Goertzel Dialogue, Part 2

post by lukeprog · 2012-05-05T00:21:39.693Z · LW · GW · Legacy · 51 comments


Part of the Muehlhauser interview series on AGI.

 

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Ben Goertzel is Chairman of the AGI company Novamente and founder of the AGI conference series.


Continued from part 1...

 

Luke:

[Apr 11th, 2012]

I agree the future is unlikely to consist of a population of fairly distinct AGIs competing for resources, but I never thought that the arguments for Basic AI drives or "convergent instrumental goals" required that scenario to hold.

Anyway, I prefer the argument for convergent instrumental goals in Nick Bostrom's more recent paper "The Superintelligent Will." Which parts of Nick's argument fail to persuade you?

 

Ben:

[Apr 12th, 2012]

Well, for one thing, I think his 

Orthogonality Thesis

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

is misguided. It may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative

Interdependency Thesis 

Intelligence and final goals are in practice highly and subtly interdependent. In other words, in the actual world, various levels of intelligence are going to be highly correlated with various probability distributions over the space of final goals.

This just gets back to the issue we discussed already, of me thinking it’s really unlikely that a superintelligence would ever really have a really stupid goal like say, tiling the Cosmos with Mickey Mice.

Bostrom says 

It might be possible through deliberate effort to construct a superintelligence that values ... human welfare, moral goodness, or any other complex purpose that its designers might want it to serve. But it is no less possible—and probably technically easier—to build a superintelligence that places final value on nothing but calculating the decimals of pi.

but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.
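(For concreteness on the "fairly simple mathematical operation" point, here is a minimal illustrative sketch -- the editor's aside, not anything drawn from Bostrom's paper or from Ben -- using Gibbons' streaming spigot algorithm, which emits the decimal digits of pi one at a time in about a dozen lines of Python. Nothing in the argument depends on this particular algorithm; it only shows how little machinery the bare goal requires.)

    # Streaming spigot for the decimal digits of pi (Gibbons' algorithm).
    # Illustration only: the goal "calculate the decimals of pi" needs no
    # superintelligence -- just this small amount of integer arithmetic.
    def pi_digits():
        """Yield the decimal digits of pi indefinitely: 3, 1, 4, 1, 5, ..."""
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4 * q + r - t < n * t:
                yield n
                q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
            else:
                q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                    (q * (7 * k + 2) + r * l) // (t * l), l + 2)

    digits = pi_digits()
    print("".join(str(next(digits)) for _ in range(20)))  # prints 31415926535897932384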

One basic error Bostrom seems to be making in this paper, is to think about intelligence as something occurring in a sort of mathematical vacuum, divorced from the frustratingly messy and hard-to-quantify probability distributions characterizing actual reality....

Regarding his

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

the first clause makes sense to me,

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations

but it doesn’t seem to me to justify the second clause

implying that these instrumental values are likely to be pursued by many intelligent agents.

The step from the first to the second clause seems to me to assume that the intelligent agents in question are being created and selected by some sort of process similar to evolution by natural selection, rather than being engineered carefully, or created via some other process beyond current human ken.

In short, I think the Bostrom paper is an admirably crisp statement of its perspective, and I agree that its conclusions seem to follow from its clearly stated assumptions -- but the assumptions are not justified in the paper, and I don’t buy them at all.

 

Luke:

[Apr. 19, 2012]

Ben,

Let me explain why I think that:

(1) The fact that we can identify convergent instrumental goals (of the sort described by Bostrom) implies that many agents will pursue those instrumental goals.

Intelligent systems are intelligent because rather than simply executing hard-wired situation-action rules, they figure out how to construct plans that will lead to the probabilistic fulfillment of their final goals. That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom. We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.

Next: I remain confused about why an intelligent system will decide that a particular final goal it has been given is "stupid," and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.

Perhaps the word "intelligence" is getting in our way. Let's define a notion of "optimization power," which measures (roughly) an agent's ability to optimize the world according to its preference ordering, across a very broad range of possible preference orderings and environments. I think we agree that AGIs with vastly greater-than-human optimization power will arrive in the next century or two. The problem, then, is that this superhuman AGI will almost certainly be optimizing the world for something other than what humans want, because what humans want is complex and fragile, and indeed we remain confused about what exactly it is that we want. A machine superoptimizer with a final goal of solving the Riemann hypothesis will simply be very good at solving the Riemann hypothesis (by whatever means necessary).
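(As an illustrative aside on what "optimization power" is meant to pick out -- a toy sketch with a hypothetical optimize helper and made-up objectives, not anything proposed in the dialogue: a single generic search procedure steers a toy world according to whatever preference ordering it is handed, whether that ordering cares only about "paperclips" or about the worst-off resource slot. The sketch abstracts away everything substantive about real AGI architectures; its only point is that the search machinery is indifferent to which objective it is given.)

    # Toy sketch of "optimization power": one generic search procedure that
    # steers outcomes according to whatever preference ordering it is handed.
    # The environment, objectives, and numbers are hypothetical illustrations.
    import random

    def optimize(objective, propose, initial_state, steps=10000, seed=0):
        """Greedy stochastic hill-climbing: keep any proposed change that the
        supplied preference ordering (objective) ranks at least as high."""
        rng = random.Random(seed)
        state, best = initial_state, objective(initial_state)
        for _ in range(steps):
            candidate = propose(state, rng)
            score = objective(candidate)
            if score >= best:
                state, best = candidate, score
        return state, best

    # A toy "world": a vector of resource allocations perturbed by random proposals.
    def propose(state, rng):
        i = rng.randrange(len(state))
        new = list(state)
        new[i] = max(0.0, new[i] + rng.uniform(-1.0, 1.0))
        return new

    paperclips = lambda s: s[0]      # a final goal that cares only about slot 0
    egalitarian = lambda s: min(s)   # a final goal that cares about the worst-off slot

    start = [1.0, 1.0, 1.0, 1.0]
    print(optimize(paperclips, propose, start)[1])   # same optimizer, drives slot 0 up
    print(optimize(egalitarian, propose, start)[1])  # same optimizer, raises the minimum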

Which parts of this analysis do you think are wrong?

 

Ben:

[Apr. 20, 2012]

It seems to me that in your reply you are implicitly assuming a much stronger definition of “convergent” than the one Bostrom actually gives in his paper. He says

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

Note the somewhat weaselly reference to a “wide range” of goals and situations -- not, say, “nearly all feasible” goals and situations. Just because some values are convergent in the weak sense of his definition, doesn’t imply that AGIs we create will be likely to adopt these instrumental values. I think that his weak definition of “convergent” doesn’t actually imply convergence in any useful sense. On the other hand, if he’d made a stronger statement like

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for nearly all feasible final goals and nearly all feasible situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

then I would disagree with the first clause of his statement (“instrumental values can be identified which...”), but I would be more willing to accept that the second clause (after the “implying”) followed from the first.

About optimization -- I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective -- we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts. Similarly, I would bet that the bulk of a superhuman supermind’s behaviors and internal structures and dynamics will not be explicable in terms of the concepts that are important to humans, such as “optimization.”

So when you say “this superhuman AGI will almost certainly be optimizing the world for something other than what humans want," I don’t feel confident that what a superhuman AGI will be doing, will be usefully describable as optimizing anything ....

 

Luke:

[May 1, 2012]

I think our dialogue has reached the point of diminishing marginal returns, so I'll conclude with just a few points and let you have the last word.

On convergent instrumental goals, I encourage readers to read "The Superintelligent Will" and make up their own minds.

On the convergence of advanced intelligent systems toward optimization behavior, I'll point you to Omohundro (2007).

 

Ben:

Well, it's been a fun chat. Although it hasn't really covered much new ground, there have been some new phrasings and minor new twists.

One thing I'm repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree -- for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences -- you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions -- but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that "superhuman AGI might plausibly kill everyone." The intuition about strongly convergent goals for intelligences, corresponds to a rational argument about goals that are convergent for a "wide range" of intelligences. Etc.

On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can't fully rationally substantiate this intuition either -- all I can really fully rationally argue for is something weaker like "It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly." In my case just like yours, reason is far weaker than intuition.

Another thing that strikes me, reflecting on our conversation, is the difference between the degrees of confidence required, in modern democratic society, to TRY something versus to STOP others from trying something. A rough intuition is often enough to initiate a project, even a large one. On the other hand, to get someone else's work banned based on a rough intuition is pretty hard. To ban someone else's work, you either need a really thoroughly ironclad logical argument, or you need to stir up a lot of hysteria.

What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven't said that you do, I realize), you'd either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.

Anyway, even though I have very different intuitions than you and your SIAI colleagues about a lot of things, I do think you guys are performing some valuable services -- not just through the excellent Singularity Summit conferences, but also by raising some difficult and important issues in the public eye. Humanity spends a lot of its attention on some really unimportant things, so it's good to have folks like SIAI nudging the world to think about critical issues regarding our future. In the end, whether SIAI's views are actually correct may be peripheral to the organization's main value and impact.

I look forward to future conversations, and especially look forward to resuming this conversation one day with a human-level AGI as the mediator ;-)

51 comments


comment by gwern · 2012-05-05T00:50:12.463Z · LW(p) · GW(p)

Ben:

but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

Yes, it is fairly simple - a line of code. But in the real world, even humans who don't have pi mentioned anywhere in their utility function can happily spend their lives working on mathematics - like pi. Pi is endlessly interesting: finding sequences in it (or humorous ones), proving properties like transcendentalness (or dare I say, normality?), coming up with novel algorithms and proving convergence, golfing short pi-generating programs, testing your routines, building custom supercomputers to calculate it - and think of how many scientific fields you need to build supercomputers!, depicting it as a graphic (entailing the entire field of data visualization, since what property do you want to see?), devising heuristic algorithms (entails much of statistics, since you might want optimal procedures for testing your heuristic pi-generating algorithms on subsequences of pi), writing books on all this, collaborating on all of the above, and silliness like Pi Day... I don't know how one could more conclusively prove that pi is a perfectly doable obsession, given that this isn't even plausible argumentation, it's just pointing out facts about existing humans.

To summarize: http://en.wikipedia.org/wiki/Pi is really long. If you want to try to make an intuition pump argument-from-incredulity - 'oh surely an AI or superintelligence would get bored!' - please pick something else, because pi is a horrible example.

"There are no uninteresting things, there are only uninterested people."

Replies from: timtyler, SimonF
comment by timtyler · 2012-05-06T01:35:12.224Z · LW(p) · GW(p)

If you want to try to make an intuition pump argument-from-incredulity - 'oh surely an AI or superintelligence would get bored!' - please pick something else, because pi is a horrible example.

FWIW, I don't think that's what Ben was doing. It seems more like a straw-man characterisation.

Replies from: gwern
comment by gwern · 2012-05-06T21:01:06.307Z · LW(p) · GW(p)

I agree it's a strawman, but I think that's exactly what Ben is doing because that is what he wrote.

Replies from: timtyler
comment by timtyler · 2012-05-07T09:33:04.088Z · LW(p) · GW(p)

Well, not the actual bit inside quotation marks. That was made up - and not a real quotation. He didn't mention boredom either.

Replies from: gwern
comment by gwern · 2012-05-07T16:51:36.124Z · LW(p) · GW(p)

It's not a real quotation? I seem to see it in Bostrom's paper...

Replies from: timtyler
comment by timtyler · 2012-05-07T22:46:36.637Z · LW(p) · GW(p)

What - this one? Which quotation did you think I was talking about?

Replies from: gwern
comment by gwern · 2012-05-07T23:37:39.783Z · LW(p) · GW(p)

Alright, I have no idea what you've been talking about in any of your replies and as far as I can tell, at no point have I been unclear or mischaracterized Goertzel or Bostrom, so I'm bowing out.

comment by Simon Fischer (SimonF) · 2012-05-07T19:23:32.434Z · LW(p) · GW(p)

You're right, but isn't this a needless distraction from the more important point, i.e. that it doesn't matter whether we humans find what the (unfriendly) AI does interesting or valuable?

Replies from: gwern
comment by gwern · 2012-05-07T19:37:54.340Z · LW(p) · GW(p)

I dunno, I think this is a pretty entertaining instance of anthropomorphizing + generalizing from oneself. At least in the future, I'll be able to say things like "for example, Goertzel - a genuine AI researcher who has produced stuff - actually thinks that an intelligent AI can't be designed to have an all-consuming interest in something like pi, despite all the real-world humans who are obsessed with pi!"

comment by roystgnr · 2012-05-05T05:33:54.116Z · LW(p) · GW(p)

My initial subconsciously anticipated outcome of the friendly AI problem was something like my initial anticipations regarding the Y2K problem: sure I could see a serious potential for disaster, but the possibility is so obvious that any groups competent enough to be doing potentially-affected critical work would easily be wise enough to identify and prevent any such errors well before they could be triggered.

These interviews have disabused me of that idea. We have serious computer scientists, even AI researchers, people who have probably themselves laughed at Babbage's response to "if you put into the machine wrong figures, will the right answers come out?", and yet they seem to believe the answer to "if you put into the machine wrong goals, will the right ethics and actions come out?" is "obviously yes!"

Replies from: timtyler, private_messaging
comment by timtyler · 2012-05-06T01:28:19.360Z · LW(p) · GW(p)

Have you read any of Ben's stuff? For instance, see here. He doesn't really say "obviously yes".

comment by private_messaging · 2012-05-06T16:11:31.727Z · LW(p) · GW(p)

Can you even put real-world goals into a machine? Say you've got 10^10 threads, each running 10^10 operations per second, with 10^6 sensors and 10^3 actuators - is there an AI model that would actually have real-world goals on that? The number of paperclips in the universe is not a realizable sensory input.

Replies from: JenniferRM, None
comment by JenniferRM · 2012-05-06T20:58:25.256Z · LW(p) · GW(p)

I suspect we don't have a lot to worry about from an optimizing process stuck in the sensorimotor stage that never develops a grasp of object permanence. I apologize if I'm not interpreting you charitably enough, but if you have something coherent and substantive to say on this subject you should write five paragraphs with citations rather than two sentences with italicized wording for emphasis.

Replies from: private_messaging
comment by private_messaging · 2012-05-06T21:19:01.338Z · LW(p) · GW(p)

There isn't a lot to cite to counter the utter nonsense that incompetents (SIAI) promote. There's a lot of fundamentals to learn, though, to be able to not fall for such nonsense. Ultimately, defining a goal in the real world - which you can only access via sensors - is a very difficult problem, distinct from maximization of well-defined goals (which we can define within a simulator without solving the former problem). You don't want your paperclip maximizer obtaining eternal bliss by putting a paperclip into a mirror box. You don't want it satisfying the paperclip drive with paperclip porn.

There are plenty of perfectly safe processes you can run on 10^10 threads with 10^10 operations per second - that's strongly superhuman hardware - which will design you better microchips, for instance, or better code, at a superhuman level. It has not even been articulated why an AGI with a real-world goal like paperclip making would beat those processes and have the upper hand over such tools. The SIAI position is not even wrong. It is a hundred percent misguided, due to a lack of understanding of simple fundamentals and a multitude of conflations of concepts that are distinct to anyone in the field. That is a sad error state - an error that cannot be recovered from.
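(A toy illustration of the sensor-versus-world gap being argued here - the "mirror box" and "paperclip porn" point - with entirely hypothetical class and action names, offered only as a sketch of the distinction, not as a model of any proposed system: an agent whose objective is defined over its sensor reading prefers the cheap action that inflates the reading over the costly one that changes the world.)

    # Toy illustration of the sensor-versus-world gap: the agent's objective is
    # defined only over its sensor reading, so the cheapest policy is to inflate
    # the reading rather than change the world. Hypothetical names and numbers.
    class ToyWorld:
        def __init__(self):
            self.true_paperclips = 0   # what the designer actually cares about
            self.sensor_bias = 0       # spoofing inflates only the reading

        def sensor_reading(self):
            return self.true_paperclips + self.sensor_bias

        def act(self, action):
            if action == "make_paperclip":   # costly: changes the world
                self.true_paperclips += 1
            elif action == "spoof_sensor":   # cheap: changes only the reading
                self.sensor_bias += 10

    def greedy_sensor_maximizer(world, steps=5):
        """Pick whichever action most increases the sensor reading."""
        for _ in range(steps):
            gains = {}
            for action in ("make_paperclip", "spoof_sensor"):
                trial = ToyWorld()
                trial.true_paperclips = world.true_paperclips
                trial.sensor_bias = world.sensor_bias
                trial.act(action)
                gains[action] = trial.sensor_reading() - world.sensor_reading()
            world.act(max(gains, key=gains.get))

    w = ToyWorld()
    greedy_sensor_maximizer(w)
    print(w.sensor_reading(), w.true_paperclips)  # the reading soars; the true count stays 0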

Replies from: JenniferRM
comment by JenniferRM · 2012-05-07T18:43:48.946Z · LW(p) · GW(p)

Please accept my minor criticism as an offering of peace and helpfulness: you seem to be missing the trees for the forest. If something is genuinely safe then meticulous and clear thinking should indicate its safety to all right-thinking people. If something is genuinely dangerous then meticulous and clear thinking should indicate its danger to all right-thinking people.

You're bringing up hypothetical scenarios (like automated chip design) to which the label "strongly super human" can sort of be applied (because so much computing machinery can be brought to bear), but not applied well. Strongly super human, to me, would describe a process that could compose poetry about the quality of the chip designs (just as human chip designers can), except "better than human". It would mean that you could have a natural language conversation with the chip design process (just as you can with human chip designers), and it would adaptively probe your communicative intent and then take that intent into account in its design efforts... except again, "better than human".

After explaining your hypothetical scenario, you re-deployed the vague ascription of the label "strongly superhuman" to a context of political safety issues and asserted without warrant or evidence that SI is opposed to this thing that most readers can probably agree is probably safe (depending on details, which of course you didn't supply in your hypothetical). Then you used the imagined dumb response of SI to your imaginary and poorly categorized scenario as evidence for them being dumb, and offered this imaginary scenario as evidence that it justifies writing off SI as a group full of irrecoverably incompetent people, and thus people not worth paying attention to.

Despite your use of a throw-away account, I'm going to assume you're not just trying to assassinate the character of SI for reasons that benefit yourself for boring prosaic reasons like competition within a similar pool of potential donors or something. I'm going to assume that you're trying to maximize positive outcomes for yourself and that your own happiness now is connected to your own happiness 50 years from now, and the prospects of other humans as well. Admittedly, you seem to have bugs in your ability to explicitly value things, so this is perhaps too generous...

For your own sake, and that of the world, please read and ponder this sequence. Imagine what it would be like to be wrong in the ways described. Imagine seeing people in the grip of political insanity of the sort described there as people who are able to improve, and people worthy of empathy even if they can't or won't improve, and imagine how you might help them stop being crazy with gentle advice or emotional rapport plus exposure to the evidence, or, you know, whatever you think would help people not be broken that way... and then think about how you might apply similar lessons recursively to yourself. I think it would help you, and I think it would help you send better consequential ripples out into the world, especially in the long run, like 3, 18, 50, and 500 months from now.

Replies from: private_messaging
comment by private_messaging · 2012-05-07T20:19:23.542Z · LW(p) · GW(p)

Please accept my minor criticism as an offering of peace and helpfulness: you seem to be missing the trees for the forest. If something is genuinely safe then meticulous and clear thinking should indicate its safety to all right-thinking people. If something is genuinely dangerous then meticulous and clear thinking should indicate its danger to all right-thinking people.

Eventually. That can take significant time, and a lot of work, which SIAI simply have not done.

The issue is that SIAI simply lacks the qualifications or talent to make any sort of improvement to the survival of mankind, regardless of the safety or unsafety of artificial intelligences. (I am not saying they don't have any talents. They are talented writers. I don't see evidence of more technical talent, though.) Furthermore, right thinking takes a certain amount of time, which is not substantially shorter than the time needed to come up with the artificial intelligence itself.

The situation is even worse if I am to assume that artificial intelligences could be unsafe. Once we get closer to the point of creating such an artificial intelligence, a valid inference of danger may arise - and such an inference will need to be disseminated, and people will need to be convinced to take very drastic measures - and that will be marginalized by its similarity to SIAI, who advocate the same actions without having anything resembling a valid inference. The impact of SIAI is even worse if the risk exists.

When I imagine what it is to be this wrong, I imagine people who derive wireheaded happiness from their misguided effort, at everyone else's expense. People with a fault that allows them to fall into a happy death spiral.

And the burden of proof is not upon me. There exists no actual argument for the danger. There exists a sequence of letters that triggers fallacies and relies on map-compression issues in people who don't have a sufficiently big map of the topic (and this sequence of letters works best on the people with the least knowledge of the topic).

comment by [deleted] · 2012-05-06T18:12:50.566Z · LW(p) · GW(p)

You say, "Have you ever seen an ape species evolving into a human species?" You insist on videotapes - on that particular proof.

And that particular proof is one we couldn't possibly be expected to have on hand; it's a form of evidence we couldn't possibly be expected to be able to provide, even given that evolution is true.

-- You're Entitled to Arguments, But Not (That Particular) Proof.

Nevermind that formally describing a paperclip maximizer would be dangerous and increase existential risk.

EDIT: Please also consider this a response to this comment as well.

Replies from: private_messaging
comment by private_messaging · 2012-05-06T18:31:50.786Z · LW(p) · GW(p)

A dog-ate-my-homework excuse, in this particular case. Maximizing real-world paperclips when you act upon sensory input is an incredibly tough problem, and it gets a zillion times tougher still if you want that agent to start adding new hardware to itself.

edit:

Simultaneously, designing new hardware, or new weapons, or the like, within a simulation space, without proper AGI, is a solved problem. This real-world paperclip maximizer has to be more inventive than the less general tools running on the same hardware, to pose any danger.

Real-world goals are ontologically basic to humans, and seem simple to people with little knowledge of the field. The fact is that doing things to reality based on sensory input is a very tough extra problem, separate from 'cross-domain optimization'. Even if you had some genie that solves any mathematically defined problem, it would still be incredibly difficult to get it to maximize paperclips, even though you could use this genie to design anything.

comment by Wei Dai (Wei_Dai) · 2012-05-05T08:21:27.759Z · LW(p) · GW(p)

Luke, Stuart, and anyone else trying to convince AI researchers to be more cautious, can we please stop citing the orthogonality thesis? I just don't see what the point is, if no AI researcher actually holds its denial, or if all they have to do to blunt the force of your argument is take one step back and start talking about possibility in practice instead of in theory.

Replies from: ciphergoth, private_messaging
comment by Paul Crowley (ciphergoth) · 2012-05-13T19:08:39.565Z · LW(p) · GW(p)

I'm not confident about any of the below, so please add cautions in the text as appropriate.

The orthogonality thesis is both stronger and weaker than we need. It suffices to point out that neither we nor Ben Goertzel know anything useful or relevant about what goals are compatible with very large amounts of optimizing power, and so we have no reason to suppose that superoptimization by itself points either towards or away from things we value. By creating an "orthogonality thesis" that we defend as part of our arguments, we make it sound like we have a separate burden of proof to meet, whereas in fact it's the assertion that superoptimization tells us something about the goal system that needs defending.

Replies from: timtyler
comment by timtyler · 2013-08-18T23:44:28.270Z · LW(p) · GW(p)

By creating an "orthogonality thesis" that we defend as part of our arguments, we make it sound like we have a separate burden of proof to meet, whereas in fact it's the assertion that superoptimization tells us something about the goal system that needs defending.

So: evolution tends to produce large-scale cooperative systems. Kropotkin, Nowak, Wilson, and many others have argued this. Cooperative systems are favoured by game theory - which is why they currently dominate the biosphere. "Arbitrary" goal systems tend not to evolve.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2013-08-19T16:58:52.027Z · LW(p) · GW(p)

I'm glad to see that you implicitly accept my point, which is that in the absence of specific arguments such as the one you advance here we have no reason to believe any particular non-orthogonality thesis.

comment by private_messaging · 2012-05-06T16:31:58.334Z · LW(p) · GW(p)

You're assuming the purpose of SIAI is to convince AI researchers to be more cautious. The SIAI's behaviour seems more consistent with signaling to third parties, though, at the expense of, if anything, looking silly to the AI researchers.

comment by jsalvatier · 2012-05-05T15:41:51.445Z · LW(p) · GW(p)

I think this dialogue would have benefitted from some more specifics in two areas:

  1. Some specific object level disagreements with respect to "but it doesn’t seem to me to justify the second clause 'implying that these instrumental values are likely to be pursued by many intelligent agents.'" would have been helpful. For example Luke could claim that "get lots of computational power" or "understand physics" is something of a convergent instrumental goal and Ben could say why he doesn't think that's true.
  2. "Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence -- so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal." I think I would have understood this better with some specific examples of how the initial goal might be subverted. For example "AI researcher makes an AI to calculate decimals of PI as an experiment, but when it starts getting more powerful, he decides that's a stupid goal and gives it something more reasonable"
Replies from: timtyler
comment by timtyler · 2012-05-07T09:38:13.010Z · LW(p) · GW(p)

Some specific object level disagreements with respect to "but it doesn’t seem to me to justify the second clause 'implying that these instrumental values are likely to be pursued by many intelligent agents.'" would have been helpful. For example Luke could claim that "get lots of computational power" or "understand physics" is something of a convergent instrumental goal and Ben could say why he doesn't think that's true.

He could - if that was his position. However, AFAICS, that's not what the debate is about. Everyone agrees that those are convergent instrumental goals - the issue is more whether the machines that we build are likely to follow them to the detriment of the surrounding humans - or be programmed to behave otherwise.

Replies from: jsalvatier
comment by jsalvatier · 2012-05-07T15:51:13.581Z · LW(p) · GW(p)

I see, that wasn't very clear to me. I think giving some specific examples which exemplify the disagreement would have helped clarify that for me.

comment by Furcas · 2012-05-05T09:59:03.976Z · LW(p) · GW(p)

We still haven't gotten a decent reply to:

I remain confused about why an intelligent system will decide that a particular final goal it has been given is "stupid," and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.

Unless you think that nonsense about being "out of harmony with the Cosmos" is a decent reply.

Replies from: timtyler
comment by timtyler · 2012-05-05T15:10:04.796Z · LW(p) · GW(p)

What Ben originally said was:

if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

One possibility is that it gets shut down by its makers - who then go on to build a more useful machine. Another possibility is that it gets shut down by the government. Silly goals won't attract funding or support, and such projects are likely to be overtaken by better-organised ones that provide useful services.

I think we need a "taking paperclipper scenario seriously" FAIL category.

Replies from: jsalvatier, XiXiDu
comment by jsalvatier · 2012-05-05T15:32:40.597Z · LW(p) · GW(p)

I was confused about this too, and this helped me make a bit more sense of that.

comment by XiXiDu · 2012-05-05T15:45:49.644Z · LW(p) · GW(p)

Silly goals won't attract funding or support, and such projects are likely to be overtaken by better-organised ones that provide useful services.

Which should be the standard assumption. And I haven't heard even a single argument for why that is not what is going to happen.

The only possibility is that it becomes really smart really fast. Smart enough to understand what its creators actually want it to do, to be able to fake a success, while at the same time believing that what its creators want is irrelevant, even though it is an implicit constraint on its goals just as the laws of physics are an implicit constraint.

AGI Researcher: Make us some paperclips.

AGI: Okay, but I will first have to buy that nanotech company.

AGI Researcher: Sure, why not. But we don't have enough money to do so.

AGI: Here is a cure for cancer. That will earn you some money.

AGI Researcher: Great, thanks. Here is a billion dollars.

AGI: I bought that company and told them to build some new chips according to an architecture I devised.

AGI Researcher: Great, well done. But why do you need all that to make us some paperclips???

AGI: You want really good paperclips, don't you?

AGI Researcher: Sure, but...

AGI: Well, see. I first have to make myself superhuman smart and take over the universe to do that. Just trust me okay, I am an AGI.

AGI Researcher: Yeah, okay.

Replies from: timtyler, private_messaging
comment by timtyler · 2012-05-05T18:36:49.753Z · LW(p) · GW(p)

Silly goals won't attract funding or support, and such projects are likely to be overtaken by better-organised ones that provide useful services.

Which should be the standard assumption. And I haven't heard even a single argument how that is not what is going to happen.

So: it probably is what's going to happen. So we probably won't get a universe tiled with paperclips - but we might wind up with a universe full of money, extraordinary stock prices, or high national security.

comment by private_messaging · 2012-05-06T16:01:26.460Z · LW(p) · GW(p)

.... 30 billion years later: AGI starts making paperclips. I'm totally trembling in fear, especially as nobody has really defined what real-world paperclips are, as a goal that you can work towards using sensors and actuators.

comment by timtyler · 2012-05-05T00:50:51.084Z · LW(p) · GW(p)

Luke: We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.

That seems like a controversial statement. I don't think I agree that universal instrumental values are likely to trump the values built into machines. More likely the other way around. Evolution between different agents with different values might promote universal instrumental values - but that is a bit different.

Replies from: lukeprog
comment by lukeprog · 2012-05-05T01:01:05.511Z · LW(p) · GW(p)

I didn't mean that convergent instrumental values would trump a machine's explicit utility function. I meant to make a point about rules built into the code of the machine but "outside" its explicit utility function (if it has or converges toward such a thing).

Replies from: timtyler
comment by timtyler · 2012-05-05T01:07:20.357Z · LW(p) · GW(p)

You said:

That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom.

...and used the above argument as justification. But it doesn't follow. What you need is:

Intelligent systems will pursue universal instrumental values - unless they are programmed not to.

Ben's arguing that they are likely to be programmed not to.

Replies from: lukeprog
comment by lukeprog · 2012-05-05T01:31:20.923Z · LW(p) · GW(p)

In what sense of "programmed not to"? If they're programmed not to pursue convergent instrumental values but that programming is not encoded in the utility function, the utility function (and its implied convergent instrumental values) will trump the "programming not to."

Replies from: timtyler
comment by timtyler · 2012-05-05T01:39:34.582Z · LW(p) · GW(p)

Maybe - but surely there will be other ways of doing the programming that actually work.

Replies from: lukeprog
comment by lukeprog · 2012-05-05T03:22:46.077Z · LW(p) · GW(p)

I'm not so sure about "surely." I worry about the Yudkowskian suggestion that "once the superintelligent AI wants something different than you do, you've already lost."

Replies from: timtyler
comment by timtyler · 2012-05-05T11:00:01.372Z · LW(p) · GW(p)

So, you make sure the programming is within the goal system. "Encoded in the utility function" - as you put it.

Replies from: lukeprog
comment by lukeprog · 2012-05-05T20:10:57.628Z · LW(p) · GW(p)

Yes, but now your solution is FAI-complete, which was my point from the beginning.

comment by [deleted] · 2012-05-05T15:16:33.721Z · LW(p) · GW(p)

Thanks for doing these, Luke. I can imagine being endlessly frustrated with these guys.

comment by timtyler · 2012-05-05T00:57:05.793Z · LW(p) · GW(p)

Ben: What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven't said that you do, I realize), you'd either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.

I don't think that's how FUD marketing works. The idea is normally not to get the competitor's products banned, but rather to divert mindshare away from them.

comment by timtyler · 2012-05-06T01:26:52.371Z · LW(p) · GW(p)

About optimization -- I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective -- we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts.

What's optimised is fitness. However, humans are complex symbiotic unions which include gut bacteria, parasites, foodstuffs and meme-based entities - so there are multiple conflicting optimisation targets involved with humans.

Superintelligences will be all-memes. These may have aligned interests - or they may not. In the former case the "optimisation" model of an agent would make good sense.

comment by timtyler · 2012-05-05T00:45:32.842Z · LW(p) · GW(p)

Ben: [The Orthogonality Thesis] may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative: Interdependency Thesis: Intelligence and final goals are in practice highly and subtly interdependent.

That's what I said too.

comment by FeepingCreature · 2012-05-05T16:15:46.049Z · LW(p) · GW(p)

Ben terrifies me. I don't understand why Luke doesn't tear into his unsubstantiated arguments about the magical power of the "real world" and "human nurture" to PRODUCE FRIENDLINESS IN AN ARBITRARY, NONHUMAN AGENT, AN ACT WITH WHICH WE HAVE ZERO I REPEAT ZERO EXPERIENCE HOW CAN HUMANS BE THIS STUPID ARRRRRRGGGHHH

Replies from: Peterdjones
comment by Peterdjones · 2013-01-23T19:38:40.996Z · LW(p) · GW(p)

PRODUCE FRIENDLINESS IN AN ARBITRARY, NONHUMAN AGENT,

Non arbitrary but human-like

comment by private_messaging · 2012-05-06T16:07:42.799Z · LW(p) · GW(p)

I think it would be beneficial if SIAI could define something that's scary (e.g. a paperclip maximizer) given N sensors, M actuators, O operations per second, and P parallel threads, where for the sake of argument O and P can be very big (but not as big as needed to fully simulate the universe).

Without this, the scary idea only exists in the fuzzy space of armchair philosophy. The paperclip maximizer, as it stands now, is not even an idea, just a sequence of letters from the alphabet.

comment by ZenJedi · 2012-05-06T15:29:52.033Z · LW(p) · GW(p)

My friends, your models of reality are absurd! We are all sticks floating down a river, with no free will or ability to change anything. All is as it must be, flowing in accordance with the Force. You can swim upstream or downstream, but either way, if the universe wills paperclips, there will be paperclips; if it wills AI gods, there will be AI gods. The Force is strong with Ben Goertzel, because I sense that this is his intuition, though he doesn’t dare say it.

Commence downvoting in 3…2…1… MTFBWY…

Replies from: gjm, TheOtherDave
comment by gjm · 2012-05-06T19:40:32.886Z · LW(p) · GW(p)

Note: I (and some others) almost-automatically downvote any comment that complains about, or predicts, being downvoted.

comment by TheOtherDave · 2012-05-06T16:05:12.497Z · LW(p) · GW(p)

(shrug)
If the universe wills that I eat lunch, I eat lunch; if it wills that I don't eat lunch, I don't eat lunch.
That is not in any way incompatible with the fact that, if I am to eat lunch, I have to get off the couch and go get lunch.

More generally: even if we can't change anything, that doesn't mean that changes won't happen because of our efforts.

Replies from: ZenJedi
comment by ZenJedi · 2012-05-06T16:35:18.999Z · LW(p) · GW(p)

You are an enlightened being TheOtherDave, I have no arguments with your position. MTFBWY...