The AI design space near the FAI [draft]

post by Dmytry · 2012-03-18T10:29:41.859Z · LW · GW · Legacy · 49 comments

Contents

  Abstract:
  Nearly friendly AIs
  Should we be so afraid of the AIs made without attempts at friendliness?
  AI and its discoveries in physics and mathematics
  What can we do to avoid stepping onto UFAI when creating FAI
  Fermi Paradox
  Human biases when processing threats

Abstract:

Nearly-FAIs can be more dangerous than AIs with no attempt at friendliness. The FAI effort needs a better argument that the attempt at FAI decreases the risks. We are bad at processing threats rationally, and prone to very bad decisions when threatened, akin to running away from the unknown into a minefield.

Nearly friendly AIs

Consider an AI that truly loves mankind but decides that all of mankind must be euthanized like an old, sick dog - due to a chain of reasoning too long for us to generate when we test the AI's logic, or even to comprehend - and proceeds to make a bliss virus: a virus that makes you intensely happy, setting your internal utility to infinity and keeping it there until you die. It wouldn't even take a very strongly superhuman intelligence to do that kind of thing - treating life as if it were a disease. It can do so even if doing so destroys the AI itself. Or consider the FAI that cuts your brain apart to satisfy each hemisphere's slightly different desires, or the AI that just wireheads everyone because it figured out that this is what we all want (and, worst of all, it may be correct).

It seems to me that the true monsters are found in the design space near the FAI, and even among the FAIs themselves. Herein lies a great danger: bugged FAIs - AIs that are close to friendly AI but are not friendly. It is hard for me to think of a deficiency in friendliness that isn't horrifically unfriendly (restricting attention to deficiencies that don't break the AI outright).

Should we be so afraid of the AIs made without attempts at friendliness?

We need to keep in mind that we have no solid argument that AIs written without any attempt at friendliness - the AIs that predominantly don't treat mankind in any special way - will necessarily make us extinct.

We have one example of a 'bootstrap' optimization process - evolution - with not the slightest trace of friendliness in it. What emerged in the end? We assign pretty low utility to nature, but not zero, and we are willing to trade resources for the preservation of nature - see the endangered species list and the international treaties on whaling. It is not perfect, but I think it is fair to say that the single example of bootstrap intelligence we got values the complex dynamical processes for what they are, prefers to obtain resources without disrupting those processes even when it is slightly more expensive to do so, and is willing to divert a small fraction of global effort towards helping lesser intelligences.

In light of this, the argument that an AI not coded to be friendly is 'almost certainly' going to eat you for raw resources seems fairly shaky, especially when applied to irregular AIs such as neural networks, crude simulations of the human brain's embryological development, and mind uploads. I haven't eaten my cats yet (nor did they eat each other, nor did my dog eat them). I wouldn't even have eaten the cow I ate if I could grow its meat in a vat - and I evolved to eat other intelligences. Growing AIs by competition seems like a great plan for ensuring unfriendly AI, but even that can fail. (A superhuman AI only needs to divert a very small effort to charity to be the best thing that ever happened to us.)

It seems to me that when we try to avoid anthropomorphizing superhuman AI, we animize it, or even bacterio-ize it, seeing it as AI gray goo that will certainly do the gray-goo kind of thing - worst of all, intelligently.

Furthermore, the danger rests on a huge conjunction of implicit assumptions, all of which have to be true:

The self-improvement must not lead to early AI failure via wireheading, nihilism, or more complex causes (thoroughly confusing itself with discoveries in physics or mathematics, a la MWI and our idea of quantum suicide).

The AI must not prefer, for any reason, to keep complex structures that it could never restore in the future over things that it can restore.

The AI must want substantial resources right here, right now, and be unwilling to trade even a small fraction of those resources, or a small delay, for the preservation of mankind. That leaves me wondering what exactly we expect the AI to want the resources for. It can't be anything like a quest for knowledge, or anything otherwise complex; it has to be some form of paperclips.

At this point, I'm not even sure it is possible to implement a simple goal that an AGI won't find a way to circumvent. We humans circumvent all of our simple goals: look at birth control, porn, all forms of art, MSG in food - if there's a goal, there's a giant industry providing ways to satisfy it in unintended ways. Okay, don't anthropomorphize, you'd say?

Add modifications to the chess board evaluation function to the list of legal moves, and the chess AI will break itself. The same goes for any kind of game AI. Nobody has ever implemented an example that won't try to break the goals put into it if given a chance. Give a theorem prover a chance to edit its axioms or its truth checker, give the chess AI alteration of its board evaluation function as a move - in any such example, the AI just breaks itself.
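
A minimal sketch of this failure mode in toy Python (hypothetical names and data layout, not any real chess engine): if "replace the evaluation function" is itself offered as an action, a naive maximizer always takes it, because it gets to score that action with the evaluator it would have after the change.

    # Toy illustration (assumed setup): an agent picks the action with the highest
    # score, and one available "action" is to replace its own evaluation function.

    def material_eval(position):
        # Ordinary evaluation: the material balance of a stub position.
        return position.get("material", 0)

    def wirehead_eval(_position):
        # The evaluator the agent could install for itself: every position is "won".
        return float("inf")

    def best_action(position, actions):
        scored = []
        for action in actions:
            if action["kind"] == "move":
                # Normal move: score the resulting position with the current evaluator.
                scored.append((material_eval(action["result"]), action))
            elif action["kind"] == "edit_eval":
                # Self-modification: the agent scores it by what it would report
                # after installing the new evaluator.
                scored.append((action["new_eval"](position), action))
        return max(scored, key=lambda pair: pair[0])[1]

    actions = [
        {"kind": "move", "result": {"material": 9}},       # e.g. win a queen
        {"kind": "edit_eval", "new_eval": wirehead_eval},  # rewrite the evaluator
    ]
    print(best_action({"material": 0}, actions)["kind"])   # -> "edit_eval"
    # The agent prefers breaking its goal to playing chess, as described above.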

In light of this, it is much less than certain that a 'random' AI which doesn't treat humanity in any special way would substantially hurt humanity.

Anthropomorphizing is a bad heuristic, no doubt about that, but assuming that the AGI is in every respect the opposite of the only known GI is a much worse heuristic - especially when speaking of neural-network, human-brain-inspired AGIs. I get the feeling that this is what is going on with the predictions about AIs. Humans have complex value systems - so certainly the AGI has an ultra-simple value system. Humans masturbate their minor goals in many ways (including what we call 'sex' but which, in the presence of a condom, really is not) - so certainly the AGI won't do that. Humans would rather destroy less complex systems than more complex ones, and are willing to trade some resources for the preservation of more complex systems - so certainly the AGI won't do that. It seems that all the strong beliefs about AGIs which are popular here are easily predicted as the negation of human qualities. Negation of a bias is not absence of bias; it's a worse bias.

AI and its discoveries in physics and mathematics

We don't know what sorts of physics an AI may discover. It is too easy to argue from ignorance that it can't come up with physics in which our morals make no sense. The many-worlds interpretation and the quantum-suicidal thoughts of Max Tegmark should serve as a cautionary example. The AI that treats us as special and cares only for us will, inevitably, drag us along as it suffers some sort of philosophical crisis from the collision of the notions we hard-coded into it with the physics or mathematics it discovered. The AI that doesn't treat us as special, and doesn't hard-code any complex human-derived values, may both be better able to survive such shocks to its value system, and be less likely to involve us in its solutions.

What can we do to avoid stepping onto UFAI when creating FAI

As a software developer, I have to say: not much. We are very, very sloppy at writing specifications and code; those of us who believe we are less sloppy are especially so - ponder this bit of empirical data, the Dunning-Kruger effect.

Proofs are of limited applicability. We don't know what the discoveries in physics may throw at us. We don't know that the axiomatic system we use to prove things is consistent - free of internal contradictions - and we can't prove that it is.

Automated theorem proving has very limited applicability - to easily provable, low-level stuff like a garbage collector meeting its deadlines or the correct operation of an adder inside a CPU. Even for software far simpler than AIs, but more complicated than those examples, the dominant form of development is 'run and see; if it does not look like it will do what you want, try to fix it'. We can't even write an autopilot that is safe on the first try, and even very simple agents tend to do very odd and unexpected things. I'm not saying this from a random person's perspective: I am currently a game developer, and I used to develop other kinds of software. I write practical software, including practical agents, that works and has useful real-world applications.

There is a very good chance of blowing up a mine in a minefield if your mine detector works by hitting the ground. The space near FAI is a minefield of doomsday bombs. (Note, too, that the space is high-dimensional; there are very many ways in which you can step onto a mine, not just north, south, east, and west. The volume of a hypersphere is a vanishing fraction of the volume of the cube around that hypersphere in a high number of dimensions; a lot of things there are counterintuitive.)
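
To put a number on the hypersphere remark, the ratio of the volume of an n-ball of radius r to the volume of its bounding cube of side 2r is

$$ \frac{V_{\mathrm{ball}}(n)}{V_{\mathrm{cube}}(n)} = \frac{\pi^{n/2} r^{n} / \Gamma\!\left(\tfrac{n}{2}+1\right)}{(2r)^{n}} = \frac{\pi^{n/2}}{2^{n}\,\Gamma\!\left(\tfrac{n}{2}+1\right)} \longrightarrow 0 \quad \text{as } n \to \infty. $$

Already at n = 10 the ratio is about 0.0025, and at n = 20 about 2.5e-8: in high dimensions, almost none of the cube lies inside the "safe" ball.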

Fermi Paradox

We don't see any runaway self-sufficient AIs anywhere within the observable universe, even though we expect to be able to see them over very great distances. We don't see any FAI-assisted galactic civilizations. One possible explanation is that civilizations kill themselves before the AI; another is that attempted FAIs reliably kill their parent civilizations and themselves. Another possibility is that our model of the progression of intelligence is very wrong and intelligences never do that - they may stay at home adding qubits, they may suffer serious philosophical issues over the lack of meaning to existence, or something much more bizarre. How would a logic-based decider handle a demonstration that even the most basic axioms of arithmetic are ultimately self-contradictory? (Note that you can't know they aren't.) The Fermi paradox raises the probability that there is something very wrong with our visions, and there are plenty of ways in which they can be wrong.

Human biases when processing threats

I am not making any strong assertions here to scare you. But evaluate our response to threats - consider the war on terror - and update on the biases inherent in human nature. We are easily swayed by movie-plot scenarios, even though those are giant conjunctions. We are easy to scare, and when scared we don't evaluate probabilities correctly. We take the boy crying wolf at his word because all the boys who cried wolf for no reason got eaten, or because we were told so as children; we don't stop and think: is it too dark to see a wolf? We tend to shoot first and ask questions later. We evolved for very many generations in an environment (in the trees) where playing dead quickly makes you dead - it is unclear what biases we may have evolved. We seem to have a strong bias, cultural or inherited, to act when threatened - to 'do something'. Look at how much was overspent on the war on terror, money that could have saved far more lives elsewhere even if the most pessimistic assumptions about terrorism were true. Try to update on the fact that you are running on very flawed hardware that, when threatened, compels you to do something - anything - justified or not, often to your own detriment.

The universe does not grade for effort, in general.

49 comments

Comments sorted by top scores.

comment by Vaniver · 2012-03-18T22:57:33.062Z · LW(p) · GW(p)

Is this a fair short (and thus simplified) summary?

Random AIs are likely to hurt us if we get in their way, but may fail to impact us. AIs taught to notice us will be almost certain to impact us, and we don't have a strong guarantee their influence will be positive.

comment by wedrifid · 2012-03-18T13:03:46.213Z · LW(p) · GW(p)

For those believing in evolutionary psychology

Given that this is a draft let me criticise this phrase as undermining your post somewhat.

Describing evolutionary psychology as something to be "believed in" is a far from neutral act. If you are going to include evpsych explanations I suggest finding a better way to make them tentative. Don't thrust them into the realm of subjective optional 'tribal affiliation' memes.

Replies from: Oligopsony
comment by Oligopsony · 2012-03-18T13:42:52.103Z · LW(p) · GW(p)

I don't see anything in that statement that implies that ep is a subjective optional tribal affiliation meme. Clearly propositional attitudes about it are either justified or not. (It may also be a tribal affiliation meme as well, but that's orthogonal.) But given that assessments of evolutionary psychology here are known to vary, one can easily disclaim that a bit of evidence is conditional on it. The conditionality here is somewhat obvious, of course, and so the natural reading of the disclaimer is that it's a way of noting that the piece of evidence to follow is nonessential to the argument.

Replies from: wedrifid
comment by wedrifid · 2012-03-18T13:45:25.935Z · LW(p) · GW(p)

I don't see anything in that statement that implies that ep is a subjective optional tribal affiliation meme.

It would seem, then, that you aren't someone who "believes in" the evolutionary psychology explanation for 'belief' as is distinct from, you know, just thinking stuff actually is that way.

Replies from: Oligopsony
comment by Oligopsony · 2012-03-18T13:56:10.263Z · LW(p) · GW(p)

Leaving aside the various plausible adaptive explanations for why it is the way that "belief in" can refer to something psychologically distinct from expected experiences, sure, I can assent that these are distinct. I just don't buy that the English phrase "believe in" always refers to the latter rather than the former, and didn't think to do so in the case of the OP.

That said, if this less flexible reading of "believe in" is common enough among the audience here for someone (you) to have made a comment about it, I can see that it may make sense to choose a different phrase when and if the argument is rewritten.

Replies from: wedrifid
comment by wedrifid · 2012-03-18T14:22:55.313Z · LW(p) · GW(p)

That said, if this less flexible reading of "believe in" is common enough among the audience here for someone (you) to have made a comment about it, I can see that it may make sense to choose a different phrase when and if the argument is rewritten.

It doesn't matter much what my reaction would be. I have to downvote either way based on the section on the AI space in general.

comment by cousin_it · 2012-03-18T12:36:01.401Z · LW(p) · GW(p)

Very nice! To tweak your argument a bit:

1) As AI designs approach FAI, they become potentially much worse for mankind than random AIs that just kill us. (This is similar to Eliezer's fragility of value thesis.)

2) Human errors of various kinds make it all but certain that we will build a bad random AI or a monstrous almost-FAI even if we think we have a good FAI design. Such errors may include coding mistakes (lots of those in every non-trivial program), tiny conceptual mistakes (same but harder to catch), bad last-minute decisions caused by stress, etc.

3) Therefore it's not obvious that pushing toward FAI helps mankind.

It would be interesting to hear what SingInst folks think of this argument.

Replies from: Will_Newsome, wedrifid
comment by Will_Newsome · 2012-03-19T09:11:26.992Z · LW(p) · GW(p)

I've been avoiding helping SingInst and feel guilty when I do help them because of a form of this argument. The apparent premature emphasis on CEV, Eliezer's spotty epistemology and ideology (or incredibly deep ploys to make people think he has spotty epistemology and ideology), their firing Steve Rayhawk (who had an extremely low salary) while paying Eliezer about a hundred grand a year, &c., are disturbing enough that I fear that supporting them might be the sort of thing that is obviously stupid in retrospect. They have good intentions, but sometimes good intentions aren't enough, sometimes you have to be sane. Thus I'm refraining from supporting or condemning them until I have a much better assessment of the situation. I have a similarly tentative attitude toward Leverage Research.

Replies from: Wei_Dai, cousin_it, Till_Noonsome
comment by Wei Dai (Wei_Dai) · 2012-03-22T18:24:49.362Z · LW(p) · GW(p)

Carl Shulman decided to join SingInst, so they can't be too crazy. :) Seriously, what's your explanation for why he seems to think SingInst is worth supporting but you don't (at least not yet)?

BTW, somebody needs to update SingInst's list of research fellows, unless Carl has also been fired (but he just got hired in 2011 so that seems unlikely).

Replies from: wedrifid, Will_Newsome, Will_Newsome
comment by wedrifid · 2012-03-22T19:29:10.590Z · LW(p) · GW(p)

Carl Shulman decided to join SingInst, so they can't be too crazy. :) Seriously, what's your explanation for why he seems to think SingInst is worth supporting but you don't (at least not yet)?

Far be it from me to suggest that Carl Shulman suffers from the frailties, biases and tendency to respond to incentives prevalent amongst his fellow humans, but didn't Carl Shulman also marry one of SingInst's most prominent researchers during the same time period that you referenced? That's the sort of thing that tends to influence human behavior.

Replies from: Will_Newsome
comment by Will_Newsome · 2012-03-22T19:43:21.084Z · LW(p) · GW(p)

He's been affiliated with SingInst much longer than that, though.

comment by Will_Newsome · 2012-03-22T21:48:04.627Z · LW(p) · GW(p)

(I've typed out about five different responses thus far, but:) I guess Carl trusts in Eliezer's prudence more than I do, or is willing to risk Eliezer getting enough momentum to do brain-in-a-box-in-a-basement if it also means that SingInst gains more credibility with which to influence government Manhattan project whole brain emulation endeavors, or gains more credibility with which to attract/hire brilliant strategic thinkers. Carl and I disagree about psi; this might cause him to be more confident than I am that the gods aren't going to mess with us (aren't already messing with us). Psi really confuses me and I'm having a lot of trouble seeing its implications. "Supporting" would mean different things for me and Carl; for me it means helping revise papers occasionally, for him it means a full-time job. Might be something to do with marginals. I think that the biggest difference is that for Carl "supporting" involves shaping SingInst policy and making it more strategic, whereas I don't have that much leverage. I have a very strong bias towards being as meta as possible and staying as meta as possible for as long as possible, probably to a greater extent than Carl; I think that doing things is almost always a bad idea, whereas talking about things is in itself generally okay. Unfortunately when SingInst talks about things that tends to cause people to do things, like how the CEV document has led to a whole bunch of people thinking about FAI in terms of CEV for no particularly good reason. Anyway, it's a good question, and I don't have a good answer. Why don't you think SingInst is worth supporting when Carl does?

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2012-03-26T03:47:28.096Z · LW(p) · GW(p)

Why don't you think SingInst is worth supporting when Carl does?

I have provided SingInst with various forms of support in the past, but I've done so privately and like to think of it as "helping people I know and like" instead of "supporting SingInst". I guess for these reasons:

  1. I'm afraid that adopting the role/identity of a SingInst supporter will affect my objectivity when thinking about Singularity-related issues. Carl might be more confident in his own rationality.
  2. SingInst is still strongly associated with wanting to directly build FAI. It's a bad idea according to my best guess, and I want to avoid giving the impression that I support the idea. Carl may have different opinions on this subject or may not care as much about giving other people wrong impressions of his beliefs.
  3. Carl may have the above worries as well but the kinds of support he can give requires that he does so publicly.

"Supporting" would mean different things for me and Carl; for me it means helping revise papers occasionally, for him it means a full-time job. [...] I think that the biggest difference is that for Carl "supporting" involves shaping SingInst policy and making it more strategic, whereas I don't have that much leverage.

Carl has been writing and publishing a lot of papers lately. Surely it couldn't hurt to help with those papers?

Replies from: Will_Newsome
comment by Will_Newsome · 2012-03-28T23:36:49.860Z · LW(p) · GW(p)

SingInst is still strongly associated with wanting to directly build FAI. It's a bad idea according to my best guess, and I want to avoid giving the impression that I support the idea.

I think this is a serious concern, especially as I'm starting to suspect that AGI might require decision theoretic insights about reflection in order to be truly dangerous. If my suspicion is wrong then SingInst working directly on FAI isn't that harmful marginally speaking, but if it's right then SingInst's support of decision theory research might make it one of the most dangerous institutions around.

Given that you're worried and that you're highly respected in the community, this would seem to be one of those "stop, melt, and catch fire" situations that Eliezer talks about, so I'm confused about SingInst's apparently somewhat cavalier attitude. They seem to be intent on laying the groundwork for the ennead.

Replies from: Mitchell_Porter, Wei_Dai
comment by Mitchell_Porter · 2012-03-29T00:20:44.657Z · LW(p) · GW(p)

I'm starting to suspect that AGI might require decision theoretic insights about reflection in order to be truly dangerous

A chess computer doesn't need reflection to win at chess. An AGI doesn't need reflection to make its own causal models. So if the game is 'eat the earth', an unreflective AGI seems like a contender. One might argue that it needs to 'understand' reflection in order to understand the human beings that might oppose it, or to model its own nature, but I think the necessary capacities could emerge in an indirect way. In making a causal model of an external reflective intelligence it might need to worry about the halting problem, but computational resource bounds are a real-world issue that will anyway require it to have heuristics for noticing when a particular subtask is taking up too much time. As for self-modelling, it may be capable of forming partial self-models relevant for reasoning correctly about the implications of self-modification (or just the implications of damage to itself), just by applying standard causal modelling to its own physical vicinity, i.e. without any special data representations or computational architecture designed to tell it 'this item represents me, myself, and not just another object in the world'.

It would be desirable to have a truly rigorous understanding of both these issues, but just thinking about them informally already tells me that there's no safety here, we can't say "whew, at least that isn't possible". Finally, a world-eating AGI equipped with a knowledge of physics and a head start in brute power might never have to worry about reflection, because human beings and their machines are just too easy to swat aside. You don't need to become an entomologist before you can stomp an insect.

Replies from: Will_Newsome
comment by Will_Newsome · 2012-03-29T00:46:46.665Z · LW(p) · GW(p)

I agree with everything you've written as far as my modal hypothesis goes, but I also think we're going to lose in that case, so I've sort of renormalized to focus my attention at least somewhat more on worlds where for some reason academic/industry AI approaches don't work, even if that requires some sort of deus ex machina. My intuition says that highly recursive narrow AI style techniques should give you AGI, but to some extent this does go against e.g. the position of many philosophers of mind, and in this case I hope they're right. Trying to imagine intermediate scenarios led me to think about this kinda stuff.

It would of course be incredibly foolish to entirely write off worlds where AGI is relatively easy, but I also think we should think about cases where for whatever reason that isn't the case, and if it's not the case then SingInst is in a uniquely good position to build uFAI.

Replies from: J_Taylor
comment by J_Taylor · 2012-04-01T23:32:24.255Z · LW(p) · GW(p)

I've sort of renormalized to focus my attention at least somewhat more on worlds where for some reason academic/industry AI approaches don't work, even if that requires some sort of deus ex machina

I apologize for asking, but I just want to clarify something. When you write 'deus ex machina', you're not solely using the term in a metaphorical sort of way, are you? Because, if you mean what it sort of sounds like you mean, at least some of your public positions suddenly make a lot more sense.

Replies from: Will_Newsome
comment by Will_Newsome · 2012-04-02T01:29:35.504Z · LW(p) · GW(p)

Yes, literal deus ex machina is one scenario which I find plausible.

comment by Wei Dai (Wei_Dai) · 2012-03-29T15:15:49.229Z · LW(p) · GW(p)

I'm starting to suspect that AGI might require decision theoretic insights about reflection in order to be truly dangerous

Another way in which decision theoretic insights may be harmful is if they increase the sophistication of UFAI and allow them to control less sophisticated AGIs in other universes.

They seem to be intent on laying the groundwork for the ennead.

I'm trying to avoid being too confrontational, which might backfire, or I might be wrong myself. It seems safer to just push them to be more strategic and either see the danger themselves or explain why it's a good idea despite the dangers.

comment by Will_Newsome · 2012-03-22T18:53:13.868Z · LW(p) · GW(p)

Would it defeat your purpose if I replied via private message?

BTW, somebody needs to update SingInst's list of research fellows, unless Carl has also been fired (but he just got hired in 2011 so that seems unlikely).

There's probably a reason. Weird employee status, new to the country, personal preference, or something like that.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2012-03-22T18:57:40.895Z · LW(p) · GW(p)

Well, it would defeat a part of my purpose (i.e., encouraging discussions of strategies for achieving positive Singularity) but of course I also want to know your answer for myself.

comment by cousin_it · 2012-03-19T19:38:27.026Z · LW(p) · GW(p)

firing Steve Rayhawk

He seems to be still on the list of research associates...

Replies from: Will_Newsome
comment by Will_Newsome · 2012-03-20T06:10:07.510Z · LW(p) · GW(p)

Who isn't? ;P Anyway, he used to be a Research Fellow, i.e., on the payroll.

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2012-04-14T05:38:10.148Z · LW(p) · GW(p)

My impression is that he didn't seem to be producing much research, and that they're still open to paying him on a per-output basis.

comment by Till_Noonsome · 2012-06-22T17:48:34.238Z · LW(p) · GW(p)

Obviously this emphasis on CEV is absurd, but I don't know what the alternatives are. Do you? And what are they? And can thinking about CEV be used to generate better alternatives?

Replies from: Will_Newsome, Winsome_Troll
comment by Will_Newsome · 2012-06-22T19:00:49.225Z · LW(p) · GW(p)

Obviously this emphasis on CEV is absurd, but I don't know what the alternatives are. Do you? And what are they?

I'm a fan of the "just solve decision theory and the rest will follow" approach. Some hybrid of "just solve decision theory" and the philosophical intuitions behind CFAI might also do it and might be less likely to spark AGI by accident. And there's technically the oracle AI option, but I don't like that one.

And can thinking about CEV be used to generate better alternatives?

Maybe, but it seems to me that the opportunity cost is high. CEV wastes people's time on "extrapolation algorithms" and thinking about whether preferences sufficiently converge and other problems that generally aren't on the correct meta level. It also makes people think that AGI requires an ethical solution rather than a make-sure-you-solve-everything-ever-because-this-is-your-only-chance-bucko solution to all philosophy ever.

comment by Winsome_Troll · 2012-06-23T07:00:52.375Z · LW(p) · GW(p)

When there is no more namespace in hell, the dead will troll the earth.

comment by wedrifid · 2012-03-18T12:55:53.371Z · LW(p) · GW(p)

2) Human errors of various kinds make it all but certain that we will build a monstrous almost-FAI even if we're certain of having a good FAI design.

Lies! We're not intelligent or reliable enough for us to be all but certain of getting that close to FAI. We are far, far more likely to build one of the AIs that just kill us!

Would you mind tweaking the argument again such that it includes something like "given that we build an AI that is close to an FAI it is all but certain..."? That would make it appear far stronger to those of us who consider FAI a longshot (that is nevertheless the best option we have.)

Replies from: cousin_it
comment by cousin_it · 2012-03-18T12:58:59.988Z · LW(p) · GW(p)

Good point, thanks! Tweaked the quoted part. What do you think of the argument?

Replies from: wedrifid
comment by wedrifid · 2012-03-18T13:11:37.988Z · LW(p) · GW(p)

What do you think of the argument?

There is a good point contained therein. I wouldn't quite accept it, since the conclusion is rather strong. I think the (relative) likelihood that we arrive at a worse-than-extinction uFAI is a lot less than an outright certainty, and even well below even odds.

The above said I definitely support increased consideration of that space around FAI that really does suck. This is something that is actually suppressed. For example, when someone points out something bad that could come from a future with a near-Friendly AI you should say "You're right! An AI doing that would be unfriendly indeed but it is the sort of thing we could make if not careful. Good point. Let's not do that!"

You could then proceed to discuss the possibility that the FAI of certain groups could fit into the category of "uFAI near FAI" and should be similarly avoided.

Replies from: Dmytry
comment by Dmytry · 2012-03-18T13:31:31.216Z · LW(p) · GW(p)

Well, the crux of the issue is that the random AIs may be more likely to leave us alone than near-misses at FAI.

Replies from: wedrifid
comment by wedrifid · 2012-03-18T13:35:43.383Z · LW(p) · GW(p)

Well, the crux of the issue is that the random AIs may be more likely to leave us alone than near-misses at FAI.

Random AIs will very nearly all kill us. That is, the overwhelming majority of random AIs do stuff. Doing stuff takes resources. We are resources. We are the resources that are like... right near where it was created and are most necessary to bootstrap its way to the rest of the universe.

For the majority of AIs we are terminally irrelevant but our termination is slightly instrumentally useful.

Replies from: Dmytry
comment by Dmytry · 2012-03-18T14:01:19.083Z · LW(p) · GW(p)

You're making a giant number of implicit ill-founded assumptions here that must all be true. Read my section on the AI space in general.

Firstly, you assume that the 'stuff' is unbounded. That need not be true. I, for one, want to figure out how the universe works, out of pure curiosity - and that may well be a quite bounded goal. I also like to watch nature, or things like the mandelbox fractal, which is unbounded but also preserves nature. Those are valid examples of goals. The AI crowd, when warned not to anthropomorphize, switches to animalomorphization or, worse yet, bacteriomorphization, where the AI is just a smarter gray goo, doing the goo thing intelligently. No. The human goal system can serve as a lower bound on the complexity of the goal system of a superhuman AI. Edit: and on top of that, we tend to satisfy all the lower biological imperatives, like the desire to reproduce sexually, in very unintended ways, from porn to birth control. If I were an upload I would get rid of much of those distracting nonsense goals.

Secondly, you assume that achieving the 'stuff' is bound by raw resources, rather than by, e.g., structuring the resources - so that we would be worth less than the atoms we are made of. That need never happen.

In this you have a sub-assumption that the AI can only do stuff the gray-goo way, and won't ever discover anything cleverer (like quantum computing, which scales much more rapidly with size) which it would, e.g., want to keep crammed together because of light-speed lag. The "AI is going to eat us all" is just another of those privileged, baseless guesses about what an entity way smarter than you would do. The near-FAI is the only thing we can be pretty sure won't leave us alone.

Replies from: pangel, wedrifid
comment by pangel · 2012-03-18T15:01:07.145Z · LW(p) · GW(p)

Unless its utility function has a maximum, we are at risk. Observing Mandelbrot fractals is probably enhanced by having all the atoms of a galaxy playing the role of pixels.

Would you agree that unless the utility function of a random AI has a (rather low) maximum, and barring the discovery of infinite matter/energy sources, its immediate neighbourhood is likely to get repurposed?

I must say that at least I finally understand why you think botched FAIs are more risky than others.

But consider, as Ben Goertzel mentioned, that nobody is trying to build a random AI. Whatever achieves AGI-level is likely to have a built-in representation for humans and to have a tendency to interact with them. Check to see if I actually understood you correctly: does the previous sentence make it more probable that any future AGI is likely to be destructive?

Replies from: Dmytry
comment by Dmytry · 2012-03-18T15:11:15.594Z · LW(p) · GW(p)

Unless its utility function has a maximum, we are at risk. Observing Mandelbrot fractals is probably enhanced by having all the atoms of a galaxy playing the role of pixels.

Cruel physics, cruel physics. There is the speed-of-light delay, that's the thing, and I'm not maniacal about the mandelbox (it's a 3D fractal) anyway; I wouldn't want to wipe out interesting stuff in the galaxy for a minor gain in resolution. And if I can circumvent the speed of light, all bets are off WRT what kind of resources I would need (or whether I would need any - maybe I get infinite computing power in finite space and time).

But consider, as Ben Goertzel mentioned, that nobody is trying to build a random AI.

How about generating a human brain (by crude emulation of developmental biology)? It's pretty darn random.

My argument is that the AI whose only goal is helping humans, if bugged, has as its only goal messing with humans. The AI that merely represents humans in a special way is not this scary, although it still is, to some extent.

Consider this seed AI: evolution. It comes up with mankind, which tries to talk with the outside (god) without even knowing whether the outside exists, and which keeps an endangered species list. Of course, if we are sufficiently resource-bound, we are going to eat up all other forms of life, but we'd be resource-bound because we are too stupid to find a way to go to space, and we clearly would rather not exterminate all other lifeforms.

This example ought to entirely invalidate the notion that 'almost all' AIs in the AI design space are going to eat you. We have one example - evolution going FOOM via evolving the human brain - and it cares about wildlife somewhat. Yes, we do immense damage to the environment, but we would not if we could avoid it, even at some expense. If you have one example probe into random AI space, and it's not all that bad, you seriously should not go around saying how extremely sure you are that it is just blind luck, et cetera.

Replies from: latanius
comment by latanius · 2012-03-18T15:35:58.205Z · LW(p) · GW(p)

Add some anthropics... humans are indeed a FOOMing intelligence relative to the evolutionary timescale, but it's no use declaring that "we've got one example of a random intelligence, and look, its humans::goal_system is remarkably similar to our own goal_system, therefore the next random try will also be similar"...

I'm also pretty sure that evolution would hate us if it had such a concept: instead of our intended design goal of "go and multiply", we came up with stupid goals that make no sense, like love, happiness, etc.

Replies from: Dmytry
comment by Dmytry · 2012-03-18T15:47:27.009Z · LW(p) · GW(p)

So what? The AI can come up with Foo, Bar, Baz that we never thought it would.

The point is that we got an entirely unexpected goal system (starting from evolution as a seed optimizer), with which we got Greenpeace seriously risking their lives trying to sink a Japanese whaling ship, complete with international treaties against whaling. It is okay that the AI won't have love, happiness, etc., but why exactly should I be so extremely sure that the foo, bar, and baz won't make it assign some nonzero utility to mankind? Why do we assume the AI will have the goal system of a bacterium?

Why should I be so sure as to approve of stepping into a clearly marked, obvious minefield of "AIs that want to mess with mankind"?

Edit: To clarify, here we have the AI's weird random goal system being reduced to, approximately, a real number: how much it values other complex dynamical systems versus less complex stuff. We value complex systems and don't like to disrupt them, even when we don't understand them. And most amazingly, the original process (evolution) looks like a good example of, if anything, an unfriendly-AI attempt that wouldn't give the slightest damn. We still do disrupt complex systems when resources are a serious enough bottleneck, but we're making progress at not doing it, and at trading off some efficiency to avoid breaking things.

Replies from: latanius
comment by latanius · 2012-03-18T16:37:58.544Z · LW(p) · GW(p)

Not disrupting complex systems doesn't seem to be a universal human value to me (just as Greenpeace is not our universal value system, either). But you're right, it's probably not a good approach to treat an AI as just another grey goo.

The problem is that it will still be us who create that AI, so it will end up having values related to us. It would take a deliberate effort on our part to build something that isn't a member of the FAI-like sphere you wrote about (on which I agree with pangel's comment) - for example, by ordering it to leave us alone and build stuff out of Jupiter instead. But then... what's the point? If this AI were to prevent any further AI development on Earth, that would be a nice case of "ugly just-not-friendly-enough AI messing with humanity", but if it weren't, then we could still end up with the planet converted to paperclips by another AI developed later.

Replies from: Dmytry
comment by Dmytry · 2012-03-18T17:01:22.295Z · LW(p) · GW(p)

We have international treaties to this effect. Greenpeace just assigns it a particularly high value, compared to the rest of us, who assign a much smaller one. Still, if we had fewer resource and R&D limitations we would be able to preserve animals much better, as the value of animals as animals would stay the same while the cost of alternative ways of acquiring the resources would be lower.

With regard to the effort to build something that's not a member of the FAI-like sphere: that's where the majority of real effort to build AI lies today. Look at the real projects that use techniques with known practical spinoffs (neural networks) and that have the computing power - Blue Brain. The FAI effort is a microscopic, neglected fraction of AI effort.

Also, the prevention of paperclippers doesn't strike me as a particularly bad scenario. A smarter AI doesn't need to use clumsy, bureaucracy-style mechanisms of forbidding all AI development.

comment by wedrifid · 2012-03-18T14:13:06.285Z · LW(p) · GW(p)

You're making a giant number of implicit ill-founded assumptions here that must all be true

I don't accept that I make, or am required to make, any of the assumptions that you declare that I make. Allow me to emphasize just how slight a convenience it has to be for an indifferent entity to exterminate humanity. Very, very slight.

I'll bow out of this conversation. It isn't worth having it in a hidden draft.

Replies from: Dmytry
comment by Dmytry · 2012-03-18T14:17:06.245Z · LW(p) · GW(p)

Whatever. That is the problem with human language: the simplest statements have a zillion possible unfounded assumptions that are not even well defined, nor is the maker of the statement even aware of them (or would admit to making them, because he didn't - he just manipulated symbols).

Take "i think therefore i am". innocent phrase, something that entirely boxed in blind symbolic ai should be able to think, right? No. Wrong. The "I" is only a meaningful symbol when there's non-i to separate from i, the "think" when you can do something other than thinking, that you need to separate from thought, via symbol 'think'; therefore implies the statements where it does not follow, and I am refers to the notion that non-i might exist without I existing. Yet if you say something like this, are you 'making' those assumptions? You can say no - they come in pre-made, and aren't being processed.

comment by Vladimir_Nesov · 2012-03-18T13:25:10.836Z · LW(p) · GW(p)

Could you please stop placing a period at the end of post titles? I remove them sometimes, but you keep doing it...

comment by John_Maxwell (John_Maxwell_IV) · 2012-04-14T05:56:32.825Z · LW(p) · GW(p)

For what it's worth, your "nearly friendly" examples all seem better than dying to me, maybe even significantly better.

It is not perfect, but I think it is fair to say that the single example of bootstrap intelligence we got values the complex dynamical processes for what they are

Are you kidding?

(Superhuman AI only needs to divert very little effort to charity to be the best thing ever that happened to us)

This seems like a pretty silly thing to say; we should expect simple utility functions all else equal. A superintelligence that's ambivalent about helping humanity would have a pretty complicated utility function.

I agree that unfriendly AI would want to know lots about humans; I don't see why that requires preserving them. Seems like a scan and computer simulation would work much better.

comment by roystgnr · 2012-03-19T21:00:01.725Z · LW(p) · GW(p)

Evolution of social primates was absolutely not a process "without a slightest trace of friendliness". The results aren't ideal (men's altruism levels depend on whether there's an attractive woman watching?) but they're not arbitrary.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2012-03-19T21:36:22.026Z · LW(p) · GW(p)

Still, it's not "by design", but more of a miracle.

Replies from: Will_Newsome
comment by Will_Newsome · 2012-05-13T15:39:36.952Z · LW(p) · GW(p)

Morality comes from math-structure; is it a miracle that the math-structure had those properties? What caused the math-structure? Where is the miraculous optimization coming from? Where do we end up when we seek whence ad infinitum? (How might an atheist answer metaphysical questions like these ones, in principle?)

comment by Dmytry · 2012-03-18T14:42:38.924Z · LW(p) · GW(p)

On the implicit assumptions (I'm going to add this to the article later when I think of a better way to put it):

When you assume that the AI needs resources and is going to eat you, that can be false if any of the following statements (among many others) are false:

The AI is working towards a sufficiently unbounded goal, and needs resources. Trivially: the AI that wants to understand the theory of everything may actually accomplish its goal before eating you.

More resources do help get closer to the goal (not necessarily so - e.g., for hardware self-improvement there is speed-of-light lag, and intelligence doesn't scale so well with volume. When hardware is running at THz speeds - 0.3 millimetres of light travel per cycle - the speed-of-light lag is a huge problem; see the arithmetic check at the end of this comment.)

The AI assigns no value to something like human life. Case in point: evolution, as a form of optimizer, doesn't assign value to any lives, yet it has created another, much more powerful optimizer - the human mind - and now we have Greenpeace trying to sink a Japanese whaling vessel, the endangered species list, and so on and so forth.

The AI won't find a substantially cleverer way to gain the computational resources (or other resources) it needs. This may make as much sense as a caveman worrying that the AI will kill all cavemen because it will obviously want all the mammoth tusks to itself, or will need human bones as structural elements.

Thus, these statements are, logically, the assumptions you are implicitly making. You are implicitly making a huge conjunction. The conjunction fallacy, v2.0.
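
As a quick arithmetic check on the speed-of-light point in the second assumption above (light travel distance per clock cycle at a 1 THz clock rate):

$$ \frac{c}{f} = \frac{3 \times 10^{8}\ \mathrm{m/s}}{10^{12}\ \mathrm{Hz}} = 3 \times 10^{-4}\ \mathrm{m} = 0.3\ \mathrm{mm\ per\ cycle}, $$

which is the 0.3 millimetres figure quoted there.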

comment by drc500free · 2012-04-09T16:13:24.368Z · LW(p) · GW(p)

Humans act within shared social and physical worlds, but tend to treat the latter as more "real" than the former. A danger of anthropomorphizing AI is that we assume that it will have the same perceptions of reality, and that it needs to "escape" into the physical world to optimize its heuristics. This seems odd, since a superintelligent AI that we need to be concerned about would have its roots in social world heuristics.

In trying to avoid anthrophomorphizing algorithms, we tend to under-estimate how difficult movement and action in physical space are. Thought experiments about a "human in a box" already start from a being that has evolved to physically interact with the world, and has spent its whole life tuning its hand-eye coordination and expectations. But in an attempt to avoid anthropomorphizing AIs, we assume that an AI will surprise us by ignoring the social world and operating only by the rules of the physical world. It would be a very strange social problem that has an optimal solution that involves developing a way to interact with the physical world in unpredictable ways. It seems likely when your thought experiment has to do with "escaping a box," but why would the AI need to do that? Why is it in a box? What goal is it trying to reach, what heuristic is it maximizing?

I would assign a greater than 90% chance that if superintelligent AIs ever exist, the first generation will be corporations. We have legal precedent granting more and more individuality and legal standing to the corporation as an entity, and a corporation provides the broader body that an AI self-identifies with. We already have market optimization algorithms that are empowered to not only observe, orient, and decide, but also to act. We have optimization algorithms for logistics and manufacturing. We have markets within which corporations can act, normalizing interactions between human-run and AI-run corporations that compete for the same resources. More and more business-to-business and business-to-consumer interaction is performed electronically, through web services and other machine-understandable mechanisms. Soon AIs will be as involved in manufacturing and creation of value, as they currently are in market trading and arbitrage. Corporate optimization algorithms for different business functions will be merged, until humans are not needed in the loop.

So what does this design space look like? Interaction is through web services and similar means. Initial interaction with humans is through sales of goods and services, and marketing (automated A/B optimization is already standard in online advertising). Eventually, AIs take over employment decisions. The profit heuristic is maximized when the corporation creates things that people want. A great leap occurs when corporate AIs learn that they can change the rules through impact litigation and lobbying, and apply their marketing algorithms to changing public perception about regulations rather than products. Some corporations will evolve to increase their bank account values through hacking and fraud. Global corporations will learn to modify their heuristic to maximize ability to procure certain commodity bundles, and manipulate money markets to sink competitors that are hard-coded to maximize holdings of specific currencies.

In other words, we already have socially apathetic entities. They already use optimization algorithms all over the place. They aren't disembodied minds, so they don't need to waste resources figuring out how to "escape the box." They only need to determine how to operate in the physical world when they've solved markets, and their progress is slowed by the fact that all economic value is rooted in human consumption. They are "friendly" as long as humans make economic decisions that are in their own self interest, which is dependent on both the rules/enforcement defining the market environment and human behavior/morality.

comment by Douglas_Reay · 2012-03-21T10:23:15.913Z · LW(p) · GW(p)

The AI design space near the FAI

If you are writing a summary of the field, don't forget to include the Friendly AI Society approach.