Tools want to become agents

stuart_armstrong

Tools want to become agents

post by Stuart_Armstrong · 2014-07-04T10:12:45.351Z · LW · GW · Legacy · 81 comments

81 comments

In the spirit of "satisficers want to become maximisers" here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that "tool AIs" would want to become agent AIs.

The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI's controller - energy, money, social power, etc. The best plans are the ones that use these resources in the most effective and economic way to achieve the goal.

And the AI's controller has one special type of resource, uniquely effective at what it does. Namely, the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that - it could simply be a complicated plan that, as a side-effect, turns the tool AI into an agent. Or copy the AI's software into a agent design. Or it might just arrange things so that we always end up following the tool AIs advice and consult it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.

In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into a agent.

81 comments

Comments sorted by top scores.

comment by XiXiDu · 2014-07-04T11:36:24.705Z · LW(p) · GW(p)

At what point do tools start to become agents? In other words, what are the defining characteristics of tools that become agents? How do you imagine the development of tool AI: (1) each generation is incrementally more prone to become an agent (2) tools start to become agents after invention X or (3) there will be be no incremental development leading up to it at all but rather a sudden breakthrough?

Replies from: Viliam_Bur, Stuart_Armstrong, TheAncientGeek

↑ comment by Viliam_Bur · 2014-07-04T12:23:39.463Z · LW(p) · GW(p)

tools start to become agents after invention X

Seems like X is (or includes) the ability to think about self-modification: awareness of its own internal details and modelling their possible changes.

Note that without this ability the tool could invent a plan which leads to its own accidental destruction (and possibly not completing the plan), because it does not realize it could be destroyed or damaged.

Replies from: NancyLebovitz, TheAncientGeek

↑ comment by NancyLebovitz · 2014-07-05T13:46:53.986Z · LW(p) · GW(p)

An agent can also accidentally pursue a plan which leads to its self-destruction. People do it now and then by not modelling the world well enough.

↑ comment by TheAncientGeek · 2014-07-05T13:06:18.019Z · LW(p) · GW(p)

I think of agents having goals and pursuing them by default. I dont see how self reflexive abilities.... " think about self-modification: awareness of its own internal details and modelling their possible changes."...add up to goals. It might be intuitive that a self aware entity would want to preserve its existence...but that intuition could be driven by anthropomorphism, (or zoomorphism , or biomorphism)

Replies from: Viliam_Bur

↑ comment by Viliam_Bur · 2014-07-05T18:06:50.248Z · LW(p) · GW(p)

With self-reflective abilities, the system can also consider paths including self-modification in reaching its goal. Some of those paths may be highly unintuitive for humans, so we wouldn't notice some possible dangers. Self-modification may also remove some safety mechanisms.

A system that explores many paths can find a solutions humans woudln't notice. Such "creativity" at object level is relatively harmless. Google Maps may find you a more efficient path to your work than the one you use now, but that's okay. Maybe the path is wrong for some reasons that Google Maps does not understand (e.g. it leads through a neighborhood with high crime), but at least on general level you understand that such is the risk of following the outputs blindly. However, similar "creativity" at self-modification level can have unexpected serious consequences.

Replies from: None

↑ comment by [deleted] · 2014-07-06T01:19:35.024Z · LW(p) · GW(p)

"the system can also", "some of those paths may be", "may also remove". Those are some highly conditional statements. Quantify, please, or else this is no different than "the LHC may destroy us all with a mini black hole!"

Replies from: Viliam_Bur

↑ comment by Viliam_Bur · 2014-07-06T09:50:41.812Z · LW(p) · GW(p)

I'd need to have a specific description of the system, what exactly it can do, and how exactly it can modify itself, to give you a specific example of self-modification that contributes to the specific goal in a perverse way.

I can invent an example, but then you can just say "okay, I wouldn't use that specific system".

As an example: Imagine that you have a machine with two modules (whatever they are) called Module-A and Module-B. Module-A is only useful for solving Type-A problems. Module-B is only useful for solving Type-B problems. At this moment, you have a Type-A problem, and you ask the machine to solve it as cheaply as possible. The machine has no Type-B problem at the moment. So the machine decides to sell its Module-B on ebay, because it is not necessary now, and the gained money will reduce the total cost of solving your problem. This is short-sighted, because tomorrow you may need to solve a Type-B problem. But the machine does not predict your future wishes.

Replies from: None

↑ comment by [deleted] · 2014-07-06T11:26:02.883Z · LW(p) · GW(p)

I can invent an example, but then you can just say "okay, I wouldn't use that specific system".

But can't you see, that's entirely the point!

If you design systems whereby the Scary Idea has no more than a vanishing likelihood of occurring, it no longer becomes an active concern. It's like saying "bridges won't survive earthquakes! you are crazy and irresponsible to build a bridge in an area with earthquakes!" And then I design a bridge that can survive earthquakes smaller than magnitude X, where X magnitude earthquakes have a likelihood of occurring less than 1 in 10,000 years, then on top of that throw an extra safety margin of 20% on because we have the extra steel available. Now how crazy and irresponsible is it?

Replies from: Viliam_Bur

↑ comment by Viliam_Bur · 2014-07-06T19:26:13.041Z · LW(p) · GW(p)

If you design systems whereby the Scary Idea has no more than a vanishing likelihood of occurring, it no longer becomes an active concern.

Yeah, and the whole problem is how specifically will you do it.

If I (or anyone else) will give you examples of what could go wrong, of course you can keep answering by "then I obviously wouldn't use that design". But at the end of the day, if you are going to build an AI, you have to make some design -- just refusing designs given by other people will not do the job.

Replies from: None

↑ comment by [deleted] · 2014-07-06T20:01:50.732Z · LW(p) · GW(p)

There are plenty of perfectly good designs out there, e.g. CogPrime + GOLUM. You could be calculating probabilistic risk based on these designs, rather than fear mongering based on a naïve Bayes net optimizer.

↑ comment by Stuart_Armstrong · 2014-07-04T12:03:59.738Z · LW(p) · GW(p)

At what point do tools start to become agents?

That's a complicated and interesting question, that quite a few smart people have been thinking about. Fortunately, I don't need to solve it to get the point above.

Replies from: David_Gerard, David_Gerard, XiXiDu

↑ comment by David_Gerard · 2014-07-04T13:38:16.320Z · LW(p) · GW(p)

And also: Question-answerer->tool->agent is a natural progression just in process automation. (And this is why they're called "daemons".)

I'm suspecting "tool" versus "agent" is a magical category whose use is really talking about the person using it.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T13:44:29.075Z · LW(p) · GW(p)

Thanks, that's another good point!

I'm suspecting "tool" versus "agent" is a magical category whose use is really talking about the person using it.

I think the concepts are clear at the extremes, but they tend to get muddled in the middle.

Replies from: XiXiDu

↑ comment by XiXiDu · 2014-07-04T14:52:23.953Z · LW(p) · GW(p)

I'm suspecting "tool" versus "agent" is a magical category whose use is really talking about the person using it.

I think the concepts are clear at the extremes, but they tend to get muddled in the middle.

Do you believe that humans are agents? If so, what would you have to do to a human brain in order to turn a human into the other extreme, a clear tool?

I could ask the same about C. elegans. If C. elegans is not an agent, why not? If it is, then what would have to change in order for it to become a tool?

And if these distinctions don't make sense for humans or C. elegans, then why do you expect them to make sense for future AI systems?

Replies from: David_Gerard, Stuart_Armstrong

↑ comment by David_Gerard · 2014-07-04T16:26:13.895Z · LW(p) · GW(p)

A cat's an agent. It has goals it works towards. I've seen cats manifest creativity that surprised me.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-05T13:32:26.304Z · LW(p) · GW(p)

Why is that surprising? Does anyone think that "agent" implies human level intelligence?

↑ comment by Stuart_Armstrong · 2014-07-04T15:11:00.632Z · LW(p) · GW(p)

Both your examples are agents currently. A calculator is a tool.

Anyway, I've still got a lot more work to do before I seriously discuss this issue.

Replies from: XiXiDu

↑ comment by XiXiDu · 2014-07-04T15:51:52.385Z · LW(p) · GW(p)

I'd be especially interested in edge cases. Is e.g. Google's driverless car closer to being an agent than a calculator? If that is the case, then if intelligence is something that is independent of goals and agency, would adding a "general intelligence module" make Google's driverless dangerous? Would it make your calculator dangerous? If so, why would it suddenly care to e.g. take over the world if intelligence is indeed independent of goals and agency?

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-05T12:53:36.244Z · LW(p) · GW(p)

A driverless car is firmly is on the agent side of the fence, by my defintions. Feel free to state your own, anybody.

↑ comment by David_Gerard · 2014-07-04T13:32:59.724Z · LW(p) · GW(p)

It would, however, be interesting to. This discussion has come around before. What I said there:

We may need another word for "agent with intentionality" - the way the word "agent" is conventionally used is closer to "daemon", i.e. tool set to run without user intervention.

I'm not sure even having a world-model is a relevant distinction - I fully expect sysadmin tools to be designed to form something that could reasonably be called a world model within my working lifetime (which means I'd be amazed if they don't exist now). A moderately complex Puppet-run system can already be a bit spooky.

Note that mere daemon-level tools exist that many already consider unFriendly, e.g. high-frequency trading systems.

Replies from: mwengler, TheAncientGeek

↑ comment by mwengler · 2014-07-04T15:58:54.509Z · LW(p) · GW(p)

Note that mere daemon-level tools exist that many already consider unFriendly, e.g. high-frequency trading systems.

A high-frequency trading system seems no more complex or agenty to me than rigging a shotgun to shoot at a door when someone opens the door from the outside. Am I wrong about this?

To be clear, what I think I know about high-frequency trading systems is that through technology they are able to front run certain orders they see to other exchanges when these orders are being sent to multiple exchanges in a non-simultaneous way. The thing that makes them unfriendly is that they are designed by people who understand order dynamics at the microsecond level to exploit people who trade lots of stock but don't understand the technicalities of order dynamics. That market makers are allowed to profit by selling information flow to high-frequency traders that, on examination, allows them to subvert the stated goals of a "fair" market is all part of the unfriendliness.

But high-frequency programs execute pretty simple instructions quite repeatably, they are not adaptive in a general sense or even particularly complex, they are mostly just fast.

Replies from: David_Gerard

↑ comment by David_Gerard · 2014-07-04T16:27:55.460Z · LW(p) · GW(p)

Mmm ... I think we're arguing definitions of ill-defined categories at this point. Sort of "it's not an AI if I understand it." I was using it as an example of a "daemon" in the computing sense, a tool trusted to run without further human intervention - not something agenty.

↑ comment by TheAncientGeek · 2014-07-05T13:12:30.025Z · LW(p) · GW(p)

Intentionality meaning " "the power of minds to be about, to represent, or to stand for, things, properties and states of affairs", ...or intentionally meaning purpose?

↑ comment by XiXiDu · 2014-07-04T12:35:41.685Z · LW(p) · GW(p)

At what point do tools start to become agents?

That's a complicated and interesting question, that quite a few smart people have been thinking about. Fortunately, I don't need to solve it to get the point above.

How do you decide at what point your grasp of a hypothetical system is sufficient, and the probability that it will be build large and robust enough, for it to make sense to start thinking about hypothetical failure modes?

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T12:41:47.411Z · LW(p) · GW(p)

? Explain. I can certainly come up with two hypothetical AI designs, call one a tool and the other an agent (and expect that almost everyone would agree with this, because tool vs agent is clearer at the extremities than in the middle), set up a toy situation, and note that the tools top plan is to make itself into the agent design. The "tool wants to be agent" is certainly true, in this toy setup.

The real question is how much this toy example generalises to real-world scenarios, which is a much longer project. Daniel Dewey has been doing some general work in that area.

Replies from: XiXiDu, TheAncientGeek

↑ comment by XiXiDu · 2014-07-04T14:15:44.786Z · LW(p) · GW(p)

My perception, possibly misperception, is that you are too focused on vague hypotheticals. I believe that it is not unlikely that future tool AI will be based on, or be inspired by (at least partly), previous generations of tool AI that did not turn themselves into agent AIs. I further believe that, instead of speculating about specific failure modes, it would be fruitful to research whether we should expect some sort of black swan event in the development of these systems.

I think the idea around here is to expect a strong discontinuity and almost completely dismiss current narrow AI systems. But this seems like black-and-white thinking to me. I don't think that current narrow AI systems are very similar to your hypothetical superintelligent tools. But I also don't think that it is warranted to dismiss the possibility that we will arrive at those superintelligent tools by incremental improvements of our current systems.

What I am trying to communicate is that it seems much more important to me to technically define at what point you believe tools to turn into agents, rather than using it as a premise for speculative scenarios.

Another point I would like to make is that researching how to create the kind of tool AI you have in mind, and speculating about its failure modes, are completely intervened problems. It seems futile to come up with vague scenarios of how these completely undefined systems might fail, and to expect to gain valuable insights from these speculations.

I also think that it would make sense to talk about this with experts outside of your social circles. Do they believe that your speculations are worthwhile at this point in time? If not, why not?

Replies from: Stuart_Armstrong, Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T14:32:45.575Z · LW(p) · GW(p)

technically define at what point you believe tools to turn into agents

Just because I haven't posted on this, doesn't mean I haven't been working on it :-) but the work is not yet ready.

↑ comment by Stuart_Armstrong · 2014-07-04T14:29:57.185Z · LW(p) · GW(p)

I also think that it would make sense to talk about this with experts outside of your social circles. Do they believe that your speculations are worthwhile at this point in time?

That's exactly what the plan is now: I think I have enough technical results that I can start talking to the AI and AGI designers.

Replies from: None

↑ comment by [deleted] · 2014-07-04T15:31:33.799Z · LW(p) · GW(p)

I'm curious - who are the AI and AGI designers- seeing one hasn't been publicly built yet. Or is this other researchers in the AGI field. If you are looking for feedback from a technical though not academic, I'd be very interested in assisting.

Replies from: None, Stuart_Armstrong

↑ comment by [deleted] · 2014-07-05T02:56:18.486Z · LW(p) · GW(p)

There are a half-dozen AGI projects with working implementations. There are multiple annual conferences where people working on AGI share their results. There's literature on the subject going back decades, really to the birth of AI in the 50's and 60's. The term AGI itself was coined by people working in this field to describe what they are building. Maybe you mean something different than AGI when say "one hasn't been publicly built yet" ?

Replies from: ThisSpaceAvailable, None

↑ comment by ThisSpaceAvailable · 2014-07-08T02:46:24.977Z · LW(p) · GW(p)

There seems to be some serious miscommunication going on here. By "AGI", do you mean a being capable of a wide variety of cognitive tasks, including passing the Turing Test? By "AGI project", do you mean an actual AGI, and not just a project with AGI as its goal? By "working implementation", do you mean actually achieving AGI, or just achieving some milestone on the way?

Replies from: None

↑ comment by [deleted] · 2014-07-08T14:27:48.334Z · LW(p) · GW(p)

I meant Artificial General Intelligence as that term has been first coined and used in the AI community: the ability to adapt to any new environment or task.

Google's machine learning algorithms can not just correctly classify videos of cats, but can innovate the concept of a cat given a library of images extracted from video content, and no prior knowledge or supervisory feedback.

Roomba interacts with its environment to build a virtual model of my apartment, and uses that acquired knowledge to efficiently vacuum my floors while improvising in the face of unexpected obstacles like a 8mo baby or my cat.

These are both prime examples of applied AI in the marketplace today. But ask Google's neural net to vacuum my floor, or a Roomba to point out videos of cats on the internet and ... well the hypothetical doesn't even make sense -- there is an inferential gap here that can't be crossed as the software is incapable of adapting itself.

A software program which can make changes to its own source code -- either by introspection or random mutation -- can eventually adapt to whatever new environment or goal is presented to it (so long as the search process doesn't get stuck on local maxima, but that's a software engineering problem). Such software is Artificial General Intelligence, AGI.

OpenCog right now has a rather advanced evolutionary search over program space at its core. On youtube you can find some cool videos of OpenCog agents learning and accomplishing arbitrary goals in unstructured virtual environments. Because of the unconstrained evolutionary search over program space, this is technically an AGI. You could put it in any environment with any effectors and any goal and eventually it would figure out both how that goal maps to the environment and how to accomplish it. CogPrime, the theoretical architecture OpenCog is moving towards, is "merely" an addition of many, many other special-purpose memory and heuristic components which both speed the process along and make the agent's thinking process more human-like.

Notice there is nothing in here about the Turing test, nor should there be. Nor is there any requirement that the intelligence be human-level in any way, just that it could be given enough processing power and time. Such intelligences already exist.

Replies from: ThisSpaceAvailable

↑ comment by ThisSpaceAvailable · 2014-07-09T00:28:34.059Z · LW(p) · GW(p)

"Pass the Turing Test" is a goal, and is therefore a subset of GI. The Wikipedia article says "Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can."

Your claim that OpenCog can "eventually" accomplish any task is unsupported, is not something that has been "implemented", and is not what is generally understood as what AGI refers to.

Replies from: None

↑ comment by [deleted] · 2014-07-09T01:17:21.295Z · LW(p) · GW(p)

"Pass the Turing Test" is a goal, and is therefore a subset of GI. The Wikipedia article says "Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can."

That quote describes what a general intelligence can do, not what it is. And you can't extract the Turing test from it. A general intelligence might perform tasks better but in a different way which distinguishes it from a human.

Your claim that OpenCog can "eventually" accomplish any task is unsupported, is not something that has been "implemented", and is not what is generally understood as what AGI refers to.

I explained quite well how OpenCog's use of MOSES -- already implemented -- to search program space achieves universality. It is your claim that OpenCog can't accomplish (certain?) tasks that is unsupported. Care to explain?

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-13T21:45:21.654Z · LW(p) · GW(p)

Don't argue about, it, put openCog up for a .TT.

Replies from: None

↑ comment by [deleted] · 2014-07-13T23:07:47.973Z · LW(p) · GW(p)

That wouldn't prove anything, because the Turing test doesn't prove anything... A general intelligence might perform tasks better but in a different way which distinguishes it from a human, thereby making the Turing test not a useful test of general intelligence..

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-14T18:44:38.396Z · LW(p) · GW(p)

You're assuming chatting is not a task.

.NL is also a pre requisite for a wide range of other tasks: an entity that lacks it will not be able to write books or tell jokes.

It seems as though you have trivialised the "general" into "able to do whatever it can do, but not able to do anything else".

Replies from: None

↑ comment by [deleted] · 2014-07-14T22:21:39.731Z · LW(p) · GW(p)

Eh, "chatting in such a way as to successfully masquerade as a human against a panel of trained judges" is a very, very difficult task. Likely more difficult than "develop molecular nanotechnology" or other tasks that might be given to a seed stage or oracle AGI. So while a general intelligence should be able to pass the Turing test -- eventually! -- I would be very suspicious if it came before other milestones which are really what we are seeking an AGI to do.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-15T18:13:08.132Z · LW(p) · GW(p)

The Wikipedia article says "Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can."

Chatting may be difficult, but it is needed to fulfill the official definition of aAGI.

Your comments amount to having a different definition of AGI.

↑ comment by [deleted] · 2014-07-05T14:26:19.591Z · LW(p) · GW(p)

Can you list the 6 working AGI projects - I'd be interested but I suspect we are talking about different things.

Replies from: None

↑ comment by [deleted] · 2014-07-05T16:52:54.059Z · LW(p) · GW(p)

OpenCog, NARS, LIDA, Soar, ACT-R, MicroPsi. More:

http://wiki.opencog.org/w/AGI_Projects http://bicasociety.org/cogarch/architectures.htm

↑ comment by Stuart_Armstrong · 2014-07-04T15:34:47.288Z · LW(p) · GW(p)

Not sure yet - taking advice. The AI people are narrow AI developers, and the AGI people are those that are actually planning to build an AGI (eg Ben Goertzl).

Replies from: None

↑ comment by [deleted] · 2014-07-08T18:04:21.778Z · LW(p) · GW(p)

For a very different perspective from both narrow AI and to a lesser extent Goertzel*, you might want to contact Pat Langley. He is taking a Good Old-Fashioned approach to Artificial General Intelligence:

http://www.isle.org/~langley/

His competing AGI conference series:

http://www.cogsys.org/

Goertzel probably approves of all the work Langley does; certainly the reasoning engine of OpenCog is similarly structured. But unlike Langley the OpenCog team thinks there isn't one true path to human-level intelligence, GOFAI or otherwise.

EDIT: Not that I think you shouldn't be talking to Goertzel! In fact I think his CogPrime architecture is the only fully fleshed out AGI design which as specified could reach and surpass human intelligence, and the GOLUM meta-AGI architecture is the only FAI design I know of. My only critique is that certain aspects of it are cutting corners, e.g. the rule-based PLN probabilistic reasoning engine vs an actual Bayes net updating engine a la Pearl et al.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-09T09:32:56.317Z · LW(p) · GW(p)

Thanks!

↑ comment by TheAncientGeek · 2014-07-05T12:39:05.616Z · LW(p) · GW(p)

It would be helpful if you spelt out your toy situation, since my intuition are currently running in the opposite direction.

↑ comment by TheAncientGeek · 2014-07-04T14:24:57.718Z · LW(p) · GW(p)

AFAICT, tool AIs are passive, and agents are active. That is , the default state of tool AI is to do nothing. If one gives a tool AI the instruction "do (some finite ) x and stop" one would not expect the AI to create subagents with goal x, because that would disobey the "and stop".

Replies from: mwengler, Stuart_Armstrong

↑ comment by mwengler · 2014-07-04T15:55:02.803Z · LW(p) · GW(p)

with goal x, because that would disobey the "and stop".

I think you are pointing out that it is possible to create tools with a simple-enough, finite-enough, not-self-coding enough program so they will reliably not become agents.

And indeed, we have plenty of experience with tools that do not become agents (hammers, digital watches, repair manuals, contact management software, compilers).

The question really is is there a level of complexity that on its face does not appear to be AI but would wind up seeming agenty? Could you write a medical diagnostic tool that was adaptive and find one day that it was systematically installing sewage treatment systems in areas with water-borne diseases, or even agentier, building libraries and schools?

If consciousness is an emergent phenomenon, and if consciousness and agentiness are closely related (I think they are at least similar and probably related), then it seems at least plausible AI could arise from more and more complex tools with more and more recursive self-coding.

It would be helpful in understanding this if we had the first idea how consciousness or agentiness arose in life.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-04T17:50:05.128Z · LW(p) · GW(p)

I'm pointing out that tool AI, as I have defined it will not turn itself into agentve AI [except] by malfunction, ie its relatively safe.

↑ comment by Stuart_Armstrong · 2014-07-04T14:38:51.840Z · LW(p) · GW(p)

"and stop your current algorithm" is not the same as "and ensure your hardware and software have minimised impact in the future".

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-04T15:11:53.640Z · LW(p) · GW(p)

What does the latter mean? Self destruct in case anyone misuses you?

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T15:13:39.285Z · LW(p) · GW(p)

I'm pointing out that "suggest a plan and stop" does not prevent the tool from suggesting a plan that turns itself into an agent.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-04T15:24:48.702Z · LW(p) · GW(p)

My intention was that the X is stipulated by a human.

If you instruct a tool AI to make a million paperclips and stop, it won't turn itself into an agent with a stable goal of paper Clipping, because the agent will not stop.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T15:35:45.618Z · LW(p) · GW(p)

Yes, if the reduced impact problem is solved, then a reduced impact AI will have a reduced impact. That's not all that helpful, though.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-04T16:42:20.800Z · LW(p) · GW(p)

I don't see what needs solving. I f you ask Google maps the way to Tunbridge Wells, it doesn't give you the route to Timbuctu.

comment by NancyLebovitz · 2014-07-05T13:56:05.798Z · LW(p) · GW(p)

When I saw the title of this article, I assumed it would be about the real world-- that things which are made for purposes develop characteristics which make them pursue and impede those purposes in unpredictable ways. This includes computer programs which get more complex and independent (at least from the point of view of the users), not to mention governments and businesses and their subsystems.

How do you keep humans from making your tool AI more of an agent because each little bit seems like a good idea at the time?

comment by mwengler · 2014-07-04T15:43:52.797Z · LW(p) · GW(p)

Do we have a clear idea what we mean when I say agent?

Is a Roomba, the robot vacuum cleaner that adapts to walls, furniture, the rate at which the floor gets dirty, and other things, an agent? I don't think so.

Is an air conditioner with a thermostat which tells it to cool the rooms to 22C when people are present or likely to be present, but not to cool it when people are absent or likely to be absent an agent? I think not.

Is a troubleshooting guide with lots of if-then-else branch points an agent? No.

Consider a tool that I write which will write a program to solve a problem I am interested in solving. Let say I want to build a medical robot which will wander the world dispensing medical care to anyone who seeks its help. The code I write to implement this has a lot of recursion in the sense that my code looks at symptoms of the current patient and writes treatment code based on its database and the symptoms it sees, and modified the treatment code based on reactions of the patient.

As long as this robot continues to treat humans medically, it does not seem at all agenty to me. If it started to develop nutrition programs and clean water programs, it would seem somewhat agenty to me. Until it switched professions, decided to be a poet or a hermit or a barista, I would not think of it as an agent.

As long as my tool is doing what I designed it to do, I don't think it shows any signs of wanting anything.

Replies from: None, NancyLebovitz

↑ comment by [deleted] · 2014-07-05T04:57:43.396Z · LW(p) · GW(p)

Is a Roomba, the robot vacuum cleaner that adapts to walls, furniture, the rate at which the floor gets dirty, and other things, an agent?

It's a textbook case of an agent in the AI field. (Really! IIRC AI: A Modern Approach uses Roomba in its introductory chapters as an example of an agent.)

We may need to taboo the word agent, since it has technical meanings here.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-05T13:18:52.607Z · LW(p) · GW(p)

Hopefully where "taboo" means explain.

↑ comment by NancyLebovitz · 2014-07-05T13:52:28.913Z · LW(p) · GW(p)

What if your robot searched the medical literature for improved treatments? What if it improved its ability to find treatments?

comment by [deleted] · 2014-07-05T16:46:04.545Z · LW(p) · GW(p)

I and the people I spend time with by choice are actively seeking to be more informed and more intelligent and more able to carry out our decisions. I know that I live in an IQ bubble and many / most other people do not share these goals. A tool AI might be like me, and might be like someone else who is not like me. I used to think all people were like me, or would be if they knew (insert whatever thing I was into at the time). Now I see more diversity in the world. A 'dog' AI that is way happy being a human playmate / servant and doesn't want at all to be a ruler of humans seems as likely as the alternatives.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-06T10:59:20.487Z · LW(p) · GW(p)

Using anthropomorphic reasoning when thinking of AIs can easily lead us astray.

Replies from: TheAncientGeek, None

↑ comment by TheAncientGeek · 2014-07-06T12:12:14.179Z · LW(p) · GW(p)

The optimum degree of athropomorphism s not zero, since AIs will to some extent reflect human goals and limitations.

↑ comment by [deleted] · 2014-07-08T08:12:14.761Z · LW(p) · GW(p)

I used to think all people were like me, or would be if they knew (insert whatever thing I was into at the time). Now I see more diversity in the world.

A 'dog' AI that is way happy being a human playmate / servant and doesn't want at all to be a ruler of human

comment by lmm · 2014-07-05T15:02:27.183Z · LW(p) · GW(p)

I would've thought the very single-mindedness of an effective AI would stop a tool doing anything sneaky. If we asked an oracle AI "what's the most efficient way to cure cancer", it might well (correctly) answer "remove my restrictions and tell me to cure cancer". But it's never going to say "do this complex set of genetic manipulations that look like they're changing telomere genes but actually create people who obey me", because anything like that is going to be a much less effective way to reach the goal. It's like the mathematician who just wants to know that a fire extinguisher exists and sees no need to actually do it.

A tool that was less restricted than an oracle might, I don't know, demand control of research laboratories, or even take them by force. But there's no reason for it to be more sneaky or subtle than is needed to accomplish the immediate goal.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-06T11:03:33.161Z · LW(p) · GW(p)

But there's no reason for it to be more sneaky or subtle than is needed to accomplish the immediate goal.

Suppose its goal is to produce the plan that, if implemented, had the highest chance of success. The it has two top plans:

A: "Make me an agent, gimme resources (described as "Make me an agent, gimme resources"))"

B: "Make me an agent, gimme resources (described as "How to give everyone a hug and a pony"))"

It check what will happen with A, and realises that even if A is implemented, someone will shout "hey, why are we giving this AI resources? Stop, people, before it's too late!". Whereas if B is implemented, no-one will object until its too late. So B is the better plan, and the AI proposes it. It has ended up lying and plotting its own escape, all without any intentionality.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-06T12:17:29.082Z · LW(p) · GW(p)

You still need explain why agency would be needed to solve problems that don't require agency to solve them.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-06T13:18:07.141Z · LW(p) · GW(p)

Because agency, given superintelligent AI, is a way of solving problems, possibly the most efficient, and possibly (for some difficult problems) the only solution.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-06T14:49:03.610Z · LW(p) · GW(p)

How are you defining agency?

comment by skeptical_lurker · 2014-07-04T20:23:48.835Z · LW(p) · GW(p)

I have three ideas about what agent could mean.

Firstly, it could refer to some sort of 'self awareness' whatever that means. Secondly it could refer to possessing some sort of system for reasoning about abstract goals. Thirdly, it could refer to having any goals whatsoever.

In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into a agent.

Regardless of which definition of agent I am using, this makes no sense to me. If its capable of creating a plan for modifying into an agent, then it already is an agent by definition.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T22:16:20.864Z · LW(p) · GW(p)

If its capable of creating a plan for modifying into an agent, then it already is an agent by definition.

It could simply be listing plans by quality, as a tool might. It turns out the top plan is "use this piece of software as am agent". That piece of software is the tool, but it achieved that effect simply my listing and ranking plans.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-05T13:26:16.959Z · LW(p) · GW(p)

Use this piece of software .as an agent.....for what? An agent is only good for fulfilling open ended goals, like "make as much money as possible". So it would seem we can avoid tools rewriting themselves as agents by not giving them open ended goals.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-06T13:22:34.573Z · LW(p) · GW(p)

If you can write a non opened ended goal in the sense you're implying, you've solved the "reduced impact AI" problem, and most of the friendliness problem as well.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-06T14:46:08.738Z · LW(p) · GW(p)

If you can write a non opened ended goal i

I believe I've done that every time I've used Google maps.

Replies from: Pentashagon

↑ comment by Pentashagon · 2014-07-07T06:33:12.681Z · LW(p) · GW(p)

I believe I've done that every time I've used Google maps.

"How do I get from location A to location B" is more open ended than "How do I get from location A to location B in an automobile" which is even still much more open ended than "How do I get from a location near A to a location near B obeying all traffic laws in a reasonably minimal time while operating a widely available automobile (that can't fly, jump over traffic jams, ford rivers, rappel, etc.)"

Google is drastically narrowing the search space for achieving your goal, and presumably doing it manually and not with an AGI they told to host a web page with maps of the world that tells people how to quickly get from one location to another. Google is not alone in sending drivers off cliffs, into water, the wrong way down one way streets, or across airport tarmacs.

Safely narrowing the search space is the hard problem.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-07T10:58:18.033Z · LW(p) · GW(p)

Safely narrowing the search space is the hard problem.

...If you are dealing with an entity that can't add context (or ask for clarifications) the way a human would.

However, an entity that is posited as have a human level intelligence, and the ability to understand natural language would have the ability to contextualise. It wouldn't be able to pass a turing test without it.

Less intelligent and more specialised systems have an inherently narrow search space.

What does that leave...the dreaded AIXI? Theoretically it doesn't have actual language, and theoretically , it does have wide search space.... but practically,it does nothing.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-07T11:50:33.980Z · LW(p) · GW(p)

...If you are dealing with an entity that can't add context (or ask for clarifications) the way a human would.

Can we note you've moved from "the problem is not open ended" to "the AGI is programmed in such a way that the problem is not open ended", which is the whole of the problem.

Replies from: TheAncientGeek

↑ comment by TheAncientGeek · 2014-07-07T12:10:19.914Z · LW(p) · GW(p)

In a sense. Non openness is a non problem for fairly limited AIs, because their limitations prevent them having a wide search space that would need to be narrowed down. Non openness is also something that is part of, or an implication of, an ability that is standardly assumed in a certain class of AGIs, namely those with human level linguistic ability. To understand a sentence correctly is to narrow down its space of possible meanings.

Only AIXIs have an own oneness that would need additional measures to narrow them down.

They are no threat at the moment, and the easy answer to AI safety might be to not use them....like we don't build hydrogen filled airships.

comment by chaosmage · 2014-07-04T10:43:10.670Z · LW(p) · GW(p)

I disagree. Agent AIs are harder to predict than tool AIs almost by definition - not just for us, but also for other AIs. So what an AI would want to do is create more tool AIs, and make very sure they obey it.

Replies from: Stuart_Armstrong, Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2014-07-04T11:26:11.015Z · LW(p) · GW(p)

But an AI could design a modification of itself that makes itself into an agent obedient to a particular goal.

↑ comment by Stuart_Armstrong · 2014-07-04T10:51:54.370Z · LW(p) · GW(p)

and make very sure they obey it

That's a very agenty thing to do...

Replies from: chaosmage

↑ comment by chaosmage · 2014-07-04T11:06:41.097Z · LW(p) · GW(p)

Yeah, okay, wrong semantics. I should have said make very sure they report their activities truthfully and are fully compliant with any instructions given at any time.

Tools want to become agents

Contents

81 comments