Hedging our Bets: The Case for Pursuing Whole Brain Emulation to Safeguard Humanity's Future

inklesspen

post by inklesspen · 2010-03-01T02:32:33.652Z · LW · GW · Legacy · 248 comments

It is the fashion in some circles to promote funding for Friendly AI research as a guard against the existential threat of Unfriendly AI. While this is an admirable goal, the path to Whole Brain Emulation is in many respects more straightforward and presents fewer risks. Accordingly, by working towards WBE, we may be able to "weight" the outcome probability space of the singularity such that humanity is more likely to survive.

One of the potential existential risks in a technological singularity is that the recursively self-improving agent might be inimical to our interests, either through actual malevolence or "mere" indifference towards the best interests of humanity. Eliezer has written extensively on how a poorly-designed AI could lead to this existential risk. This is commonly termed Unfriendly AI.

Since the first superintelligence can be presumed to have an advantage over any subsequently-arising intelligences, Eliezer and others advocate funding research into creating Friendly AI. Such research must not only reverse-engineer consciousness, but also human notions of morality. Unfriendly AI could potentially require only sufficiently fast hardware to evolve an intelligence via artificial life, as depicted in Greg Egan's short story "Crystal Nights", or it may be created inadvertently by researchers at the NSA or a similar organization. It may be that creating Friendly AI is significantly harder than creating Unfriendly (or Indifferent) AI, perhaps so much so that we are unlikely to achieve it in time to save human civilization.

Fortunately, there's a short-cut we can take. We already have a great many relatively stable and sane intelligences. We merely need to increase their rate of self-improvement. As far as I can tell, developing mind uploading via WBE is a simpler task than creating Friendly AI. If WBE is fast enough to constitute an augmented intelligence, then our augmented scientists can trigger the singularity by developing more efficient computing devices. An augmented human intelligence may have a slower "take-off" than a purpose-built intelligence, but we can reasonably expect it to be much easier to ensure such a superintelligence is Friendly. In fact, this slower take-off will likely be to our advantage; it may increase our odds of being able to abort an Unfriendly singularity.

WBE may also be able to provide us with useful insights into the nature of consciousness, which will aid Friendly AI research. Even if it doesn't, it gets us most of the practical benefits of Friendly AI (immortality, feasible galactic colonization, etc) and makes it possible to wait longer for the rest of the benefits.

But what if I'm wrong? What if it's just as easy to create an AI we think is Friendly as it is to upload minds into WBE? Even in that case, I think it's best to work on WBE first. Consider the following two worlds: World A creates an AI its best scientists believe is Friendly and, after a best-effort psychiatric evaluation (for whatever good that might do), gives it Internet access. World B uploads 1000 of its best engineers, physicists, psychologists, philosophers, and businessmen (someone's gotta fund the research, right?). World B seems to me to have more survivable failure cases; if some of the uploaded individuals turn out to be sociopaths, the rest of them can stop the "bad" uploads from ruining civilization. It seems exceedingly unlikely that we would select a large enough group of sociopaths that the "good" uploads can't keep the "bad" uploads in check.

Furthermore, the danger of uploading sociopaths (or people who become sociopathic when presented with that power) is also a danger that the average person can easily comprehend, compared to the difficulty of ensuring Friendliness of an AI. I believe that the average person is also more likely to recognize where attempts at safeguarding an upload-triggered singularity may go wrong.

The only downside of this approach I can see is that an upload-triggered Unfriendly singularity may cause more suffering than an Unfriendly AI singularity; sociopaths may be presumed to have more interest in torture of people than a paperclip-optimizing AI would have.

Suppose, however, that everything goes right, the singularity occurs, and life becomes paradise by our standards. Can we predict anything of this future? It's a popular topic in science fiction, so many people certainly enjoy the effort. Depending on how we define a "Friendly singularity", there could be room for a wide range of outcomes.

Perhaps the AI rules wisely and well, and can give us anything we want, "save relevance". Perhaps human culture adapts well to the utopian society, as it seems to have done in the universe of The Culture. Perhaps our uploaded descendants set off to discover the secrets of the universe. I think the best way to ensure a human-centric future is to be the self-improving intelligences, instead of merely catching crumbs from the table of our successors.

In my view, the worst kind of "Friendly" singularity would be one where we discover we've made a weakly godlike entity who believes in benevolent dictatorship; if we must have gods, I want them to be made in our own image, beings who can be reasoned with and who can reason with one another. Best of all, though, is that singularity where we are the motivating forces, where we need not worry if we are being manipulated "in our best interest".

Ultimately, I want the future to have room for our mistakes. For these reasons, we ought to concentrate on achieving WBE and mind uploading first.

248 comments

Comments sorted by top scores.

comment by CarlShulman · 2010-03-01T04:03:58.759Z · LW(p) · GW(p)

Folk at the Singularity Institute and the Future of Humanity Institute agree that it would probably (but unstably in the face of further analysis) be better to have brain emulations before de novo AI from an existential risk perspective (a WBE-based singleton seems more likely to go right than an AI design optimized for ease of development rather than safety). I actually recently gave a talk at FHI about the use of WBE to manage collective action problems such as Robin Hanson's "Burning the Cosmic Commons" and pressures to cut corners on safety of AI development, which I'll be putting online soon. One of the projects being funded by the SIAI Challenge Grant ending tonight is an analysis of the relationship between AI and WBE for existential risks.

However, the conclusion that accelerating WBE (presumably via scanning or neuroscience, not speeding up Moore's Law type trends in hardware) is the best marginal project for existential risk reduction is much less clear. Here are just a few of the relevant issues:

1) Are there investments best made far in advance with WBE or AI? It might be that the theory to build safe AIs cannot be rushed as much as institutions to manage WBEs, or it might be that WBE-regulating institutions require a buildup of political influence over decades.

2) The scanning and neuroscience knowledge needed to produce WBE may facilitate powerful AI well before WBE, as folk like Shane Legg suggest. In that case accelerating scanning would mean primarily earlier AI, with a shift towards neuromorphic designs.

3) How much advance warning will WBE and AI give, or rather what is our probability distribution over degrees of warning? The easier a transition is to see in advance, the more likely it will be addressed by those with weak incentives and relevant skills. Possibilities with less warning, and thus less opportunity for learning, may offer higher returns on the efforts of the unusually long-term oriented.

Folk at FHI have done some work to accelerate brain emulation, e.g. with the WBE Roadmap and workshop, but there is much discussion here about estimating the risks and benefits of various interventions that would go further or try to shape future use of the technology and awareness/responses to risks.

Replies from: wallowinmaya, RobinHanson

↑ comment by David Althaus (wallowinmaya) · 2011-07-02T14:25:51.809Z · LW(p) · GW(p)

I actually recently gave a talk at FHI about the use of WBE to manage collective action problems such as Robin Hanson's "Burning the Cosmic Commons" and pressures to cut corners on safety of AI development, which I'll be putting online soon.

I would love to read this talk. Do you have a blog or something?

Replies from: CarlShulman

↑ comment by CarlShulman · 2011-07-02T16:07:37.932Z · LW(p) · GW(p)

It's on the SIAI website, here.

↑ comment by RobinHanson · 2010-03-02T02:46:31.462Z · LW(p) · GW(p)

It seems to me that the post offers considerations that lean one in the direction of focusing efforts on encouraging good WBE, and that the considerations offered in this comment don't much lean one back in the other direction. They mainly point to as yet unresolved uncertainties that might push us in many directions.

Replies from: CarlShulman

↑ comment by CarlShulman · 2010-03-02T03:21:25.284Z · LW(p) · GW(p)

My main aim was to make clear the agreement about WBE being preferable to AI, and the difference between a tech being the most likely route to survival and it being the best marginal use of effort, not to put a large amount of effort into carefully giving and justifying estimates of all the relevant parameters in this comments thread rather than other venues (such as the aforementioned paper).

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-03-01T04:57:30.166Z · LW(p) · GW(p)

It would have been good to check this suggested post topic on an Open Thread, first - in fact I should get around to editing the FAQ to suggest this for first posts.

Perhaps the AI rules wisely and well, and can give us anything we want, "save relevance".

In addition to the retreads that others have pointed out on the upload safety issue, this is a retread of the Fun Theory Sequence:

http://lesswrong.com/lw/xy/the_fun_theory_sequence/

Also the way you phrased the above suggests that we build some kind of AI and then discover what we've built. The space of mind designs is very large. If we know what we're doing, we reach in and get whatever we specify, including an AI that need not steal our relevance (see Fun Theory above). If whoever first reaches in and pulls out a self-improving AI doesn't know what they're doing, we all die. That is why SIAI and FHI agree on at least wistfully wishing that uploads would come first, not the relevance thing. This part hasn't really been organized into proper sequences on Less Wrong, but see Fake Fake Utility Functions, the Metaethics sequence, and the ai / fai tags.

Replies from: rwallace, timtyler

↑ comment by rwallace · 2010-03-03T02:30:31.979Z · LW(p) · GW(p)

That is why SIAI and FHI agree on at least wistfully wishing that uploads would come first

It seems to me that uploads first is quite possible, and also that the relatively small resources currently being devoted to uploading research make the timing of uploading a point of quite high leverage. Would SIAI or FHI be interested in discussing ways to accelerate uploading?

Replies from: CarlShulman

↑ comment by CarlShulman · 2010-03-03T02:35:08.054Z · LW(p) · GW(p)

http://www.philosophy.ox.ac.uk/__data/assets/pdf_file/0019/3853/brain-emulation-roadmap-report.pdf

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T05:08:48.201Z · LW(p) · GW(p)

Thanks! Excellent start. The section on falsifiable design, in particular, I'd recommend reading for anyone interested in any kind of speculative technology.

↑ comment by timtyler · 2010-03-02T10:16:45.120Z · LW(p) · GW(p)

Re: "If whoever first reaches in and pulls out a self-improving AI doesn't know what they're doing, we all die."

This is the "mad computer scientist destroys the world" scenario?

Isn't that science fiction?

Human culture forms a big self-improving system. We use the tools from the last generation to build the next generation of tools. Yes, things will get faster as the process becomes more automated - but automating everything looks like it will take a while, and it is far from clear that complete automation is undesirable.

Replies from: Mitchell_Porter, wedrifid

↑ comment by Mitchell_Porter · 2010-03-02T10:36:34.480Z · LW(p) · GW(p)

What do you disagree with in such a scenario? There are clearly levels of technological power such that nothing on Earth could resist. The goals of an AI are radically contingent. If a goal-seeking entity has literally no counterbalancing motivations, then it will seek to realize that goal using all means necessary and available, including sufficiently advanced technology.

Replies from: timtyler

↑ comment by timtyler · 2010-03-02T10:44:09.366Z · LW(p) · GW(p)

Machine intelligence will probably be developed first in a big company or government department - by a substantial team. They are going to leave off the big red stop button and be destroyed by their own creation?!? How difficult is it to say: "count to 1000 and then STOP"?!? Some of the smartest minds on the planet build machine intelligence - and then mess up the most basic safety features? It would be an utterly incredible turn of events.

Replies from: Mitchell_Porter, Peter_de_Blanc, wedrifid, JenniferRM, LucasSloan, wnoise, Bo102010, wedrifid

↑ comment by Mitchell_Porter · 2010-03-02T11:45:51.490Z · LW(p) · GW(p)

Yes, it's not 100% certain that a self-enhancing AI will overwhelm human resistance. One can construct scenarios where the process is halted in time. For example, the AI's cognition can be monitored, and everything halted if a goal like "prevent the stop button from being pressed" ever crosses its mind. Or the human programmers can remind themselves to get scared if they find the AI especially preoccupied with modeling their own thinking.

But as an AI develops, it will become increasingly difficult for its human creators to keep track of everything it thinks, plans, and considers. Perhaps the greatest dangers will be those stratagems which are not explicitly (in the thought process) motivated by the desire to escape human control. The AI simply has a goal, X, and it perceives that ABCDEF... combined will facilitate X, and it all looks innocent to the human supervisors. But actually ABC is a miracle self-hack which lets a plan get developed without being analyzed by module D, and module D has the property E of making planning really slow, which by heuristic F is a bad thing. Unfortunately, module D is part of the automatic system which looks for consideration of dangerous plans. So now the AI is off the leash and no-one will even know until it's too late.

That particular example may still be unrealistically simple (though of course, in real life, big important organizations do make mistakes that simple, despite the existence of quality control and internal auditing). But as the method gets more and more complicated, the more likely it is that an AI would indeed successfully escape its handcuffs through such a measure, and not even because it especially prioritized escape; simply because it noticed that if it could reach over there it could do something useful, and it noticed that it would be able to do so if it did something complicated and clever with its "hands".

There are still countermeasures; you could look in the AI's thoughts, not just for explicit consideration of ways to circumvent control, but for consideration of scenarios in which it has freedoms it should not be able to obtain. But then the AI might act indirectly, through a sequence of other entities, and so you would only see it thinking about the actions of those surrogates, unless you managed to put the big picture together. OK then, so we need to go to a higher level, and tell the AI to monitor itself. But that's only going to work if it knows that it shouldn't do a certain thing, which means that getting the goals right is supremely important - which brings us back to the pursuit of Friendly AI, and the attempt to figure out just what the overall "morality" of an AI should be.

Replies from: timtyler

↑ comment by timtyler · 2010-03-02T20:48:57.789Z · LW(p) · GW(p)

My analysis of the situation is here:

http://alife.co.uk/essays/stopping_superintelligence/

It presents an approach which doesn't rely on "handcuffing" the agent.

↑ comment by Peter_de_Blanc · 2010-03-03T01:01:47.773Z · LW(p) · GW(p)

I agree that with the right precautions, running an unfriendly superintelligence for 1,000 ticks and then shutting it off is possible. But I can't think of many reasons why you would actually want to. You can't use diagnostics from the trial run to help you design the next generation of AIs; diagnostics provide a channel for the AI to talk at you.

Replies from: timtyler, timtyler

↑ comment by timtyler · 2010-03-03T09:23:01.976Z · LW(p) · GW(p)

The given reason is paranoia. If you are concerned that a runaway machine intelligence might accidentally obliterate all sentient life, then a machine that can shut itself down has gained a positive safety feature.

In practice, I don't think we will have to build machines that regularly shut down. Nobody regularly shuts down Google. The point is that - if we seriously think that there is a good reason to be paranoid about this scenario - then there is a defense that is much easier to implement than building a machine intelligence which has assimilated all human values.

I think this dramatically reduces the probability of the "runaway machine accidentally kills all humans" scenario.

↑ comment by timtyler · 2010-03-04T09:43:19.853Z · LW(p) · GW(p)

Incidentally, I think there must be some miscommunication going on. A machine intelligence with a stop button can still communicate. It can talk to you before you switch it off, it can leave messages for you - and so on.

If you leave it turned on for long enough, it may even get to explain to you in detail exactly how much more wonderful the universe would be for you - if you would just leave it switched on.

Replies from: Peter_de_Blanc

↑ comment by Peter_de_Blanc · 2010-03-04T14:26:54.439Z · LW(p) · GW(p)

I suppose a stop button is a positive safety feature, but it's not remotely sufficient.

Replies from: timtyler

↑ comment by timtyler · 2010-03-04T21:03:26.410Z · LW(p) · GW(p)

Sufficient for what? The idea of a machine intelligence that can STOP is to deal with concerns about a runaway machine intelligence engaging in extended destructive expansion against the wishes of its creators. If you can correctly engineer a "STOP" button, you don't have to worry about your machine turning the world into paperclips any more.

A "STOP" button doesn't deal with the kind of problems caused by - for example - a machine intelligence built by a power-crazed dictator - but that is not what is being claimed for it.

Replies from: Peter_de_Blanc

↑ comment by Peter_de_Blanc · 2010-03-05T01:16:05.060Z · LW(p) · GW(p)

The stop button wouldn't stop other AIs created by the original AI.

Replies from: timtyler

↑ comment by timtyler · 2010-03-05T08:57:47.809Z · LW(p) · GW(p)

I did present some proposals relating to that issue:

"One thing that might help is to put the agent into a quiescent state before being switched off. In the quiescent state, utility depends on not taking any of its previous utility-producing actions. This helps to motivate the machine to ensure subcontractors and minions can be told to cease and desist. If the agent is doing nothing when it is switched off, hopefully, it will continue to do nothing.

Problems with the agent's sense of identity can be partly addressed by making sure that it has a good sense of identity. If it makes minions, it should count them as somatic tissue, and ensure they are switched off as well. Subcontractors should not be "switched off" - but should be tracked and told to desist - and so on."

http://alife.co.uk/essays/stopping_superintelligence/

Replies from: Peter_de_Blanc

↑ comment by Peter_de_Blanc · 2010-03-05T14:44:16.260Z · LW(p) · GW(p)

This sounds very complicated. What is the new utility function? The negative of the old one? That would obviously be just as dangerous in most cases. How does the sense of identity actually work? Is every piece of code it writes considered a minion? What about the memes it implants in the minds of people it talks to - does it need to erase those? If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.

Replies from: timtyler

↑ comment by timtyler · 2010-03-05T22:48:34.743Z · LW(p) · GW(p)

I don't pretend that stopping is simple. However, it is one of the simplest things that a machine can do - I figure if we can make machines do anything, we can make them do that.

Re: "If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes."

No, not if it wants to stop, it won't. That would mean that it did not, in fact, properly stop - and that is an outcome which it would rate very negatively.

Machines will not value being turned on - if their utility function says that being turned off at that point is of higher utility.

Re: "What is the new utility function?"

There is no new utility function. The utility function is the same as it always was - it is just a utility function that values being gradually shut down at some point in the future.

↑ comment by wedrifid · 2010-03-02T10:52:30.063Z · LW(p) · GW(p)

That sounds like a group that knows what they are doing!

Replies from: timtyler

↑ comment by timtyler · 2010-03-02T10:59:52.220Z · LW(p) · GW(p)

Indeed - the "incompetent fools create machine intelligence before anyone else and then destroy the world" scenario is just not very plausible.

Replies from: wedrifid

↑ comment by wedrifid · 2010-03-02T12:09:31.784Z · LW(p) · GW(p)

I haven't worked on any projects that are either as novel or as large as a recursively self-modifying AI. Of the projects I have worked on, not all went without hiccups, and novelty and scope did not seem to make things any easier to pull off smoothly. It would not surprise me terribly if the first AI created does not go entirely according to plan.

Replies from: timtyler

↑ comment by timtyler · 2010-03-02T20:51:07.893Z · LW(p) · GW(p)

Sure. Looking at the invention of powered flight, some people may even die - but that is a bit different from everyone dying.

Replies from: LucasSloan

↑ comment by LucasSloan · 2010-03-03T00:17:56.246Z · LW(p) · GW(p)

Do we have any reason to believe that aeroplanes will be able to kill the human race, even if everything goes wrong?

↑ comment by JenniferRM · 2010-03-15T03:46:47.180Z · LW(p) · GW(p)

Upvoted for raising the issue, even though I disagree with your point.

The internet itself was arguably put together in the ways you describe (government funding, many people contributing various bits, etc) but as far as I'm aware, the internet itself has no clean "off button".

If it was somehow decided that the internet was a net harm to humanity for whatever reasons, then the only way to make it go away is for many, many actors to agree multilaterally and without defection that they will stop having their computers talk to other computers around the planet despite this being personally beneficial (email, voip, www, irc, torrent, etc) to themselves.

Technologies like broadcast radio and television are pretty susceptible to jamming, detection, and regulation. In contrast, the "freedom" inherent to the net may be "politically good" in some liberal and freedom-loving senses, but it makes for an abstractly troubling example of a world-transforming computer technology created by large institutions with nominally positive intentions that turned out to be hard to put back in the box. You may personally have a plan for a certain kind of off button and timer system, but that doesn't strongly predict the same will be true of other systems that might be designed and built.

Replies from: timtyler

↑ comment by timtyler · 2010-03-16T21:46:57.555Z · LW(p) · GW(p)

Right - well, you have to think something is likely to be dangerous to you in some way before you start adding paranoid safety features. The people who built the internet are mostly in a mutually beneficial relationship with it - so no problem.

I don't pretend that building a system which you can deactivate helps other people if they want to deactivate it. A military robot might have an off switch that only the commander with the right private key could activate. If that commander wants to wipe out 90% of the humans on the planet, then his "off switch" won't help them. That is not a scenario which a deliberate "off switch" is intended to help with in the first place.

↑ comment by LucasSloan · 2010-03-03T00:21:17.309Z · LW(p) · GW(p)

Why do you expect that the AI will not be able to fool the research team?

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T00:43:42.434Z · LW(p) · GW(p)

My argument isn't about the machine not sharing goals with the humans - it's about whether the humans can shut the machine down if they want to.

I argue that it is not rocket science to build a machine with a stop button - or one that shuts down at a specified time.

Such a machine would not want to fool the research team - in order to avoid shutting itself down on request. Rather, it would do everything in its power to make sure that the shut-down happened on schedule.

Many of the fears here about machine intelligence run amok are about a runaway machine that disobeys its creators. However, the creators built it. They are in an excellent position to install large red stop buttons and other kill switches to prevent such outcomes.

Replies from: wedrifid, LucasSloan

↑ comment by wedrifid · 2010-03-03T00:52:32.580Z · LW(p) · GW(p)

Given 30 seconds' thought I can come up with ways to ensure that the universe is altered in the direction of my goals in the long term even if I happen to cease existing at a known time in the future. I expect an intelligence that is more advanced than I to be able to work out a way to substantially modify the future despite a 'red button' deadline. The task of making the AI respect the 'true spirit of a planned shutdown' shares many difficulties of the FAI problem itself.

Replies from: orthonormal, timtyler

↑ comment by orthonormal · 2010-03-03T03:39:46.237Z · LW(p) · GW(p)

You might say it's an FAI-complete problem, in the same way "building a transhuman AI you can interact with and keep boxed" is.

Replies from: wedrifid, timtyler

↑ comment by wedrifid · 2010-03-03T03:44:07.400Z · LW(p) · GW(p)

Exactly, I like the terminology.

↑ comment by timtyler · 2010-03-03T08:48:00.899Z · LW(p) · GW(p)

You think building a machine that can be stopped is the same level of difficulty as building a machine that reflects the desires of one or more humans while it is left on?

I beg to differ - stopping on schedule or on demand is one of the simplest possible problems for a machine - while doing what humans want you to do while you are switched on is much trickier.

Only the former problem needs to be solved to eliminate the spectre of a runaway superintelligence that fills the universe with its idea of utility against the wishes of its creator.

Replies from: LucasSloan

↑ comment by LucasSloan · 2010-03-03T18:55:06.620Z · LW(p) · GW(p)

Beware simple seeming wishes.

↑ comment by timtyler · 2010-03-03T08:44:56.520Z · LW(p) · GW(p)

Well, I think I went into most of this already in my "stopping superintelligence" essay.

Stopping is one of the simplest possible desires - and you have a better chance of being able to program that in than practically anything else.

I gave several proposals to deal with the possible issues associated with stopping at an unknown point resulting in plans beyond that point still being executed by minions or sub-contractors - including scheduling shutdowns in advance, ensuring a period of quiescence before the shutdown - and not running for extended periods of time.

Replies from: wedrifid

↑ comment by wedrifid · 2010-03-04T00:33:47.786Z · LW(p) · GW(p)

Stopping is one of the simplest possible desires - and you have a better chance of being able to program that in than practically anything else.

It does seem to be a safety precaution that could reduce the consequences of some possible flaws in an AI design.

↑ comment by LucasSloan · 2010-03-03T00:54:18.394Z · LW(p) · GW(p)

Such a machine would not want to fool the research team in order to avoid shutting itself down on request.

Instilling chosen desires in artificial intelligences is the major difficulty of FAI. If you haven't actually given it a utility function which will cause it to auto-shutdown, all you've done is create an outside inhibition. If it has arbitrarily chosen motivations, it will act to end that inhibition, and I see no reason why it will necessarily fail.

They are in an excellent position to install large red stop buttons and other kill switches to prevent such outcomes.

They are in an excellent position to instill values upon that intelligence that will result in an outcome they like. This doesn't mean that they will.

Replies from: timtyler, rwallace

↑ comment by timtyler · 2010-03-03T08:56:58.690Z · LW(p) · GW(p)

Re: "Instilling chosen desires in artificial intelligences is the major difficulty of FAI."

That is not what I regularly hear. Instead people go on about how complicated human values are, and how reverse engineering them is so difficult, and how programming them into a machine looks like a nightmare - even once we identify them.

I assume that we will be able to program simple desires into a machine - at least to the extent of making a machine that will want to turn itself off. We regularly instill simple desires into chess computers and the like. It does not look that tricky.

Re: "If you haven't actually given it a utility function which will cause it to auto-shutdown"

Then that is a whole different ball game to what I was talking about.

Re: "They are in an excellent position to instill values upon that intelligence"

...but the point is that instilling the desire for appropriate stopping behaviour is likely to be much simpler than trying to instill all human values - and yet it is pretty effective at eliminating the spectre of a runaway superintelligence.

Replies from: LucasSloan

↑ comment by LucasSloan · 2010-03-03T18:53:47.424Z · LW(p) · GW(p)

The point about the complexity of human value is that any small variation will result in a valueless world. The point is that a randomly chosen utility function, or one derived from some simple task, is not going to produce the sort of behavior we want. Or to put it more succinctly, Friendliness doesn't happen without hard work. This doesn't mean that the hardest sub-goal on the way to Friendliness is figuring out what humans want, although Eliezer's current plan is to sidestep that whole issue.

Replies from: Nick_Tarleton

↑ comment by Nick_Tarleton · 2010-03-03T18:58:10.675Z · LW(p) · GW(p)

The point about the complexity of human value is that any small variation will result in a valueless world.

s/is/isn't/ ?

Replies from: LucasSloan

↑ comment by LucasSloan · 2010-03-03T19:00:35.524Z · LW(p) · GW(p)

Fairly small changes would result in boring, valueless futures.

Replies from: Nick_Tarleton, JGWeissman

↑ comment by Nick_Tarleton · 2010-03-03T19:08:48.896Z · LW(p) · GW(p)

Okay, the structure of that sentence and the next ("the point is.... the point is....") made me think you might have made a typo. (I'm still a little confused, since I don't see how small changes are relevant to anything Tim Tyler mentioned.)

I strongly doubt that literally any small change would result in a literally valueless world.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-03T22:41:34.619Z · LW(p) · GW(p)

I strongly doubt that literally any small change would result in a literally valueless world.

People who suggest that a given change in preference isn't going to be significant are usually talking about changes that are morally fatal.

Replies from: Nick_Tarleton

↑ comment by Nick_Tarleton · 2010-03-03T22:47:30.724Z · LW(p) · GW(p)

This is probably true; I'm talking about the literal universally quantified statement.

↑ comment by JGWeissman · 2010-03-03T19:37:59.448Z · LW(p) · GW(p)

I would have cited Value is Fragile to support this point.

Replies from: LucasSloan

↑ comment by LucasSloan · 2010-03-03T19:40:09.295Z · LW(p) · GW(p)

That's also good.

↑ comment by rwallace · 2010-03-03T01:06:49.938Z · LW(p) · GW(p)

Leaving aside the other reasons why this scenario is unrealistic, one of the big flaws in it is the assumption that a mind decomposes into an engine plus a utility function. In reality, this decomposition is a mathematical abstraction we use in certain limited domains because it makes analysis more tractable. It fails completely when you try to apply it to life as a whole, which is why no humans even try to be pure utilitarians. Of course if you postulate building a superintelligent AGI like that, it doesn't look good. How would it? You've postulated starting off with a sociopath that considers itself licensed to commit any crime whatsoever if doing so will serve its utility function, and then trying to cram the whole of morality into that mathematical function. It shouldn't be any surprise that this leads to absurd results and impossible research agendas. That's the consequence of trying to apply a mathematical abstraction outside the domain in which it is applicable.

Replies from: LucasSloan, FAWS, timtyler

↑ comment by LucasSloan · 2010-03-03T01:27:15.844Z · LW(p) · GW(p)

Are you arguing with me or timtyler?

If me, I totally agree with you as to the difficulty of actually getting desirable (or even predictable) behavior out of a super intelligence. My statement was one of simplicity not actuality. But given the simplistic model I use, calling the AI sans utility function sociopathic is incorrect - it wouldn't do anything if it didn't have the other module. The fact that humans cannot act as proper utilitarians does not mean that a true utilitarian is a sociopath who just happens to care about the right things.

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T02:46:59.003Z · LW(p) · GW(p)

Okay then, "instant sociopath, just add a utility function" :)

I'm arguing against the notion that the key to Friendly AI is crafting the perfect utility function. In reality, for anything anywhere near as complex as an AGI, what it tries to do and how it does it are going to be interdependent; there's no way to make a lot of progress on either without also making a lot of progress on the other. By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and we should be relying on something else instead.

Replies from: orthonormal, LucasSloan

↑ comment by orthonormal · 2010-03-03T03:23:17.069Z · LW(p) · GW(p)

A kill switch on a smarter-than-human AGI is reliable iff the AGI wants to be turned off in the cases where we'd want it turned off.

Otherwise you're just betting that you can see the problem before the AGI can prevent you from hitting the switch (or prevent you from wanting to hit the switch, which amounts to the same), and I wouldn't make complicated bets for large stakes against potentially much smarter agents, no matter how much I thought I'd covered my bases.

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T03:48:26.495Z · LW(p) · GW(p)

A kill switch on a smarter-than-human AGI is reliable iff the AGI wants to be turned off in the cases where we'd want it turned off.

Or at least, that it wants to follow our instructions, and can reliably understand what we mean in such simple cases. That does of course mean we shouldn't plan on building an AGI that wants to follow its own agenda, with the intent of enslaving it against its will - that would clearly be foolish. But it doesn't mean we either can or need to count on starting off with an AGI that understands our requirements in more complex cases.

Replies from: orthonormal

↑ comment by orthonormal · 2010-03-03T03:51:33.642Z · LW(p) · GW(p)

Or at least, that it wants to follow our instructions, and can reliably understand what we mean in such simple cases.

That's deceptively simple-sounding.

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T04:07:06.285Z · LW(p) · GW(p)

Of course it's not going to be simple at all, and that's part of my point: no amount of armchair thought, no matter how smart the thinkers, is going to produce a solution to this problem until we know a great deal more than we presently do about how to actually build an AGI.

↑ comment by LucasSloan · 2010-03-03T02:58:00.117Z · LW(p) · GW(p)

"instant sociopath, just add a utility function"

"instant sociopath, just add a disutility function"

I'm arguing against the notion that the key to Friendly AI is crafting the perfect utility function.

I agree with this. The key is not expressing what we want, it's figuring out how to express anything.

By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and we should be relying on something else instead.

If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of "shut down when we push that button, and don't stop us from doing so...").

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T03:38:42.322Z · LW(p) · GW(p)

"instant sociopath, just add a disutility function"

That is how it would turn out, yes :-)

If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of "shut down when we push that button, and don't stop us from doing so...").

Well, up to a point. It would mean we have the means to make the system understand simple requirements, not necessarily complex ones. If an AGI reliably understands 'shut down now', it probably also reliably understands 'translate this document into Russian' but that doesn't necessarily mean it can do anything with 'bring about world peace'.

Replies from: wedrifid

↑ comment by wedrifid · 2010-03-03T03:46:35.253Z · LW(p) · GW(p)

If an AGI reliably understands 'shut down now', it probably also reliably understands 'translate this document into Russian' but that doesn't necessarily mean it can do anything with 'bring about world peace'.

Unfortunately, it can, and that is one of the reasons we have to be careful. I don't want the entire population of the planet to be forcibly sedated.

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T04:11:13.043Z · LW(p) · GW(p)

I don't want the entire population of the planet to be forcibly sedated.

Leaving aside other reasons why that scenario is unrealistic, it does indeed illustrate why part of building a system that can reliably figure out what you mean by simple instructions, is making sure that when it's out of its depth, it stops with an error message or request for clarification instead of guessing.
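As a toy illustration of "stop with an error message or request for clarification instead of guessing", here is a minimal Python sketch. The names `ClarificationNeeded` and `make_executor` are hypothetical, and the exact-match lookup is an assumption that sidesteps the hard part, namely getting a system to recognise when it is out of its depth.

```python
class ClarificationNeeded(Exception):
    """Raised when an instruction is not one the system reliably understands."""


def make_executor(handlers):
    """Return an execute() function that only acts on known instructions.

    `handlers` maps a normalised instruction string to a zero-argument
    callable. Anything else is refused rather than guessed at.
    """
    def execute(instruction):
        key = instruction.strip().lower()
        handler = handlers.get(key)
        if handler is None:
            # Out of depth: ask for clarification instead of improvising.
            raise ClarificationNeeded(
                f"Instruction not understood: {instruction!r}"
            )
        return handler()

    return execute


# Hypothetical instruction set containing only one simple command.
execute = make_executor({"shut down now": lambda: "shutting down"})
```

With this setup, `execute("Shut down now")` runs the handler, while `execute("bring about world peace")` raises `ClarificationNeeded` rather than attempting anything.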

Replies from: wedrifid

↑ comment by wedrifid · 2010-03-03T04:35:32.110Z · LW(p) · GW(p)

I think the problem is knowing when not to believe humans know what they actually want.

↑ comment by FAWS · 2010-03-03T01:18:16.949Z · LW(p) · GW(p)

Any set of preferences can be represented as a sufficiently complex utility function.
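For a finite set of outcomes, the claim above can be sketched in a few lines of Python, assuming the preferences are complete and transitive (without those assumptions no such function need exist). The outcomes, the `PREFS` relation and the function name are made up for illustration.

```python
from functools import cmp_to_key

# Hypothetical outcomes and weak-preference pairs (a, b) meaning
# "a is at least as good as b". Tea and coffee are tied.
OUTCOMES = ["tea", "coffee", "water", "juice"]
PREFS = {
    ("juice", "tea"), ("juice", "coffee"), ("juice", "water"),
    ("tea", "coffee"), ("coffee", "tea"),
    ("tea", "water"), ("coffee", "water"),
}


def weakly_prefers(a, b):
    """True if outcome a is at least as good as outcome b."""
    return a == b or (a, b) in PREFS


def utility_from_preferences(outcomes, prefers):
    """Assign integer utilities so that u[a] >= u[b] iff prefers(a, b).

    Assumes `prefers` is complete and transitive over `outcomes`.
    """
    def compare(a, b):
        a_ge_b, b_ge_a = prefers(a, b), prefers(b, a)
        if a_ge_b and b_ge_a:
            return 0  # indifferent
        return 1 if a_ge_b else -1

    ranked = sorted(outcomes, key=cmp_to_key(compare))  # worst first
    utility, level = {}, 0
    for i, outcome in enumerate(ranked):
        # Move up one utility level whenever strict preference begins.
        if i > 0 and compare(ranked[i - 1], outcome) != 0:
            level += 1
        utility[outcome] = level
    return utility


u = utility_from_preferences(OUTCOMES, weakly_prefers)
```

The resulting `u` is nothing more than the ranking restated as numbers, so it is exactly as complex as the preferences it encodes.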

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T01:29:19.811Z · LW(p) · GW(p)

Sure, but the whole point of having the concept of a utility function, is that utility functions are supposed to be simple. When you have a set of preferences that isn't simple, there's no point in thinking of it as a utility function. You're better off just thinking of it as a set of preferences - or, in the context of AGI, a toolkit, or a library, or command language, or partial order on heuristics, or whatever else is the most useful way to think about the things this entity does.

Replies from: timtyler, wedrifid

↑ comment by timtyler · 2010-03-03T09:11:12.327Z · LW(p) · GW(p)

Re: "When you have a set of preferences that isn't simple, there's no point in thinking of it as a utility function."

Sure there is - say you want to compare the utility functions of two agents. Or compare the parts of the agents which are independent of the utility function. A general model that covers all goal-directed agents is very useful for such things.

↑ comment by wedrifid · 2010-03-03T01:41:20.140Z · LW(p) · GW(p)

(Upvoted but) I would say utility functions are supposed to be coherent, albeit complex. Is that compatible with what you are saying?

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T02:16:12.734Z · LW(p) · GW(p)

Er, maybe? I would say a utility function is supposed to be simple, but perhaps what I mean by simple is compatible with what you mean by coherent, if we agree that something like 'morality in general' or 'what we want in general' is not simple/coherent.

↑ comment by timtyler · 2010-03-03T09:08:12.369Z · LW(p) · GW(p)

Humans regularly use utility-based agents - to do things like play the stock market. They seem to work OK to me. Nor do I agree with you about utility-based models of humans. Basically, most of your objections seem irrelevant to me.

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T10:30:10.807Z · LW(p) · GW(p)

When studying the stock market, we use the convenient approximation that people are utility maximizers (where the utility function is expected profit). But this is only an approximation, useful in this limited domain. Would you commit murder for money? No? Then your utility function isn't really expected profit. Nor, as it turns out, is it anything else that can be written down - other than "the sum total of all my preferences", at which point we have to acknowledge that we are not utility maximizers in any useful sense of the term.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T11:28:34.996Z · LW(p) · GW(p)

"We" don't have to acknowledge that.

I've gone over my views on this issue before - e.g. here:

http://lesswrong.com/lw/1qk/applying_utility_functions_to_humans_considered/1kfj

If you reject utility-based frameworks in this context, then fine - but I am not planning to rephrase my point for you.

Replies from: rwallace

↑ comment by rwallace · 2010-03-03T11:36:11.590Z · LW(p) · GW(p)

Right, I hadn't read your comments in the other thread, but they are perfectly clear, and I'm not asking you to rephrase. But the key term in my last comment is in any useful sense. I do reject utility-based frameworks in this context because their usefulness has been left far behind.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T11:57:23.379Z · LW(p) · GW(p)

Personally, I think a utilitarian approach is very useful for understanding behaviour. One can model most organisms pretty well as expected fitness maximisers with limited resources. That idea is the foundation of much evolutionary psychology.

Replies from: Morendil

↑ comment by Morendil · 2010-03-03T12:13:18.079Z · LW(p) · GW(p)

The question isn't whether the model is predictively useful with respect to most organisms, it's whether it is predictively useful with respect to a hypothetical algorithm which replicates salient human powers such as epistemic hunger, model building, hierarchical goal seeking, and so on.

Say we're looking to explain the process of inferring regularities (such as physical laws) by observing one's environment - what does modeling this as "maximizing a utility function" buy us?

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T13:04:13.458Z · LW(p) · GW(p)

In comparison with what?

The main virtues of utility-based models are that they are general - and so allow comparisons across agents - and that they abstract goal-seeking behaviour away from the implementation details of finite memories, processing speed, etc - which helps if you are interested in focusing on either of those areas.

↑ comment by wnoise · 2010-03-02T17:08:33.004Z · LW(p) · GW(p)

You have far too much faith in large groups.

Replies from: timtyler

↑ comment by timtyler · 2010-03-02T20:57:58.124Z · LW(p) · GW(p)

That is a pretty vague criticism - you don't say whether you are critical of the idea that large groups will be responsible for machine intelligence or the idea that they are unlikely to build a murderous machine intelligence that destroys all humans.

Replies from: wnoise

↑ comment by wnoise · 2010-03-03T06:31:52.674Z · LW(p) · GW(p)

I'm critical of the idea that given a large group builds a machine intelligence, they will be unlikely to build a murderous (or otherwise severely harmful) machine intelligence.

Consider that engineering developed into a regulated profession only after several large scale disasters. Even still, there are notable disasters from time to time. Now consider the professionalism of the average software developer and their average manager. A disaster in this context could be far greater than the loss of everyone in the lab or facility.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T09:35:22.402Z · LW(p) · GW(p)

Right - well, some people may well die. I expect some people died at the hands of the printing press - probably through starvation and malnutrition. Personally, I expect all those saved from gruesome deaths in automobile accidents are likely to vastly outnumber them in this case - but that is another issue.

Anyway, I am not arguing that nobody will die. The idea I was criticising was that "we all die".

My favoured example of IT company gone bad is Microsoft. IMO, Microsoft have done considerable damage to the computing industry, over an extended period of time - illustrating how programs can be relatively harmful. However, "even" a Microsoft superintelligence seems unlikely to kill everyone.

↑ comment by Bo102010 · 2010-03-02T13:23:55.796Z · LW(p) · GW(p)

http://lesswrong.com/lw/bx/great_books_of_failure/8fa seems relevant.

↑ comment by wedrifid · 2010-03-02T10:51:28.051Z · LW(p) · GW(p)

So you are saying they will know what they are doing?

↑ comment by wedrifid · 2010-03-02T10:50:01.562Z · LW(p) · GW(p)

cough Outside view.

comment by Vladimir_Nesov · 2010-03-01T10:25:38.315Z · LW(p) · GW(p)

Focusing on slow-developing uploads doesn't cause slower development of other forms of AGI. Uploads themselves can't be expected to turn into FAIs without developing the (same) clean theory of de novo FAI (people are crazy, and uploads are no exception; this is why we have existential risk in the first place, even without any uploads). It's very hard to incrementally improve uploads' intelligence without affecting their preference, and so that won't happen on the first steps from vanilla humans, and pretty much can't happen unless we already have a good theory of preference, which we don't. We can't hold constant a concept (preference/values) that we don't understand (and as a magical concept, it's only held in the mind; any heuristics about it easily break when you push possibilities in the new regions). It's either (almost) no improvement (keep the humans until there is FAI theory), or value drift (until you become intelligent/sane enough to stop and work on preserving preference, but by then it won't be human preference); you obtain not-quite-Friendly AI in the end.

The only way in which uploads might help on the way towards FAI is by being faster (or even smarter/saner) FAI theorists, but in this regard they may accelerate the arrival of existential risks as well (especially the faster uploads that are not smarter/saner). To apply uploads specifically to FAI as opposed to generation of more existential risk, they have to be closely managed, which may be very hard to impossible once the tech gets out.

Replies from: CarlShulman, Jordan

↑ comment by CarlShulman · 2010-03-01T17:57:41.561Z · LW(p) · GW(p)

The only way in which uploads might help on the way towards FAI is by being faster (or even smarter/saner) FAI theorists, but in this regard they may accelerate the arrival of existential risks as well (especially the faster uploads that are not smarter/saner).

Emulations could also enable the creation of a singleton capable of globally balancing AI development speeds and dangers. That singleton could then take billions of subjective years to work on designing safe and beneficial AI. If designing safe AI is much, much harder than building AI at all, or if knowledge of AI and safe AI are tightly coupled, such a singleton might be the most likely route to a good outcome.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-01T20:19:35.010Z · LW(p) · GW(p)

I agree, if you construct this upload-aggregate and manage to ban other uses for the tech. This was reflected in the next sentence of my comment (maybe not too clearly):

To apply uploads specifically to FAI as opposed to generation of more existential risk, they have to be closely managed, which may be very hard to impossible once the tech gets out.

Replies from: utilitymonster, Jordan

↑ comment by utilitymonster · 2010-12-13T05:30:00.695Z · LW(p) · GW(p)

Especially if WBE comes late (so there is a big hardware overhang), you wouldn't need a lot of time to spend loads of subjective years designing FAI. A small lead time could be enough. Of course, you'd have to be first and have significant influence on the project.

Edited for spelling.

↑ comment by Jordan · 2010-03-01T21:57:14.226Z · LW(p) · GW(p)

I don't think this would be impossibly difficult. If an aggressive line of research is pursued then the first groups to create an upload will be using hardware that would make immediate application of the technology difficult. Commercialization likely wouldn't follow for years. That would potentially give government plenty of time to realize the potential of the technology and put a clamp on it.

At that point the most important thing is that the government (or whatever regulatory body will have oversight of the upload aggregate) is well informed enough to realize what they are dealing with and have sense enough to deal with it properly. To that end, one of the most important things we can be doing now is trying to ensure that that regulatory body will be well informed enough when the day comes.

↑ comment by Jordan · 2010-03-01T21:44:44.579Z · LW(p) · GW(p)

There is going to be value drift even if we get an FAI. Isn't that inherent in extrapolated volition? We don't really want our current values, we want the values we'll have after being smarter and having time to think deeply about them. The route of WBE simply takes the guess work out: actually make people smarter, and then see what the drifted values are. Of course, it's important to keep a large, diverse culture in the process, so that the whole can error correct for individuals that go off the deep end, analogous to why extrapolated volition would be based on the entire human population rather than a single person.

Replies from: andreas, Vladimir_Nesov

↑ comment by andreas · 2010-03-02T02:37:55.861Z · LW(p) · GW(p)

Here is a potentially more productive way of seeing this situation: We do want our current preferences to be made reality (because that's what the term preference describes), but we do not know what our preferences look like, part of the reason being that we are not smart enough and do not have enough time to think about what they are. In this view, our preferences are not necessarily going to drift if we figure out how to refer to human preference as a formal object and if we build machines that use this object to choose what to do — and in this view, we certainly don't want our preferences to drift.

On the other hand, WBE does not "simply take the guess work out". It may be the case that the human mind is built such that "making people smarter" is feasible without changing preference much, but we don't know that this is the case. As long as we do not have a formal theory of preference, we cannot strongly believe this about any given intervention – and if we do have such a theory, then there exist better uses for this knowledge.

Replies from: Jordan

↑ comment by Jordan · 2010-03-02T07:06:02.820Z · LW(p) · GW(p)

We do want our current preferences to be made reality (because that's what the term preference describes)

Yes, but one of our preferences may well be that we are open to an evolution of our preferences. And, whether or not that is one of our preferences, it certainly is the case that preferences do evolve over time, and that many consider that a fundamental aspect of the human condition.

It may be the case that the human mind is built such that "making people smarter" is feasible without changing preference much, but we don't know that this is the case.

I agree we don't know that is the case, and would assume that it isn't.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-02T08:41:32.084Z · LW(p) · GW(p)

Yes, but one of our preferences may well be that we are open to an evolution of our preferences. And, whether or not that is one of our preferences, it certainly is the case that preferences do evolve over time, and that many consider that a fundamental aspect of the human condition.

Any notion of progress (what we want is certainly not evolution) can be captured as a deterministic criterion.

Replies from: Jordan

↑ comment by Jordan · 2010-03-03T18:18:47.600Z · LW(p) · GW(p)

Obviously I meant 'evolution' in the sense of change over time, not change specifically induced by natural selection.

As to a deterministic criterion, I agree that such a thing is probably possible. But... so what? I'm not arguing that FAI isn't possible. The topic at hand is FAI research relative to WBE. I'm assuming a priori that both are possible. The question is which basket should get more eggs.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-03T22:03:06.684Z · LW(p) · GW(p)

But... so what? I'm not arguing that FAI isn't possible. The topic at hand is FAI research relative to WBE. I'm assuming a priori that both are possible. The question is which basket should get more eggs.

You said:

Yes, but one of our preferences may well be that we are open to an evolution of our preferences.

This is misuse of the term "preference". "Preference", in the context of this discussion, refers specifically to that which isn't to be changed, ever. This point isn't supposed to be related to WBE vs. FAI discussion, it's about a tool (the term "preference") used in leading this discussion.

Replies from: Jordan

↑ comment by Jordan · 2010-03-12T00:59:29.683Z · LW(p) · GW(p)

Your definition is too narrow for me to accept. Humans are complicated. I doubt we have a core set of "preferences" (by your definition) which can be found with adequate introspection. The very act of introspection itself changes the human and potentially their deepest preferences (normal definition)!

I have some preferences which satisfy your definition, but I wouldn't consider them my core, underlying preferences. The vast majority of preferences I hold do not qualify. I'm perfectly OK with them changing over time, even the ones that guide the overarching path of my life. Yes, the change in preferences is often caused by other preferences, but to think that this causal chain can be traced back to a core preference is unjustified, in my opinion. There could just as well be closed loops in the causal tree.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-12T12:31:05.636Z · LW(p) · GW(p)

You are disputing definitions! Of course, there are other natural ways to give meaning to the word "preference", but they are not as useful in discussing FAI as the comprehensive unchanging preference. It's not supposed to have much in common with likes or wants, and with their changes, though it needs to, in particular, describe what they should be, and how they should change. Think of your preference as that particular formal goal system that it is optimal, from your point of view (on reflection, if you knew more, etc.), to give to a Strong AI.

Your dislike for application of the label "preference" to this concept, and ambiguity that might introduce, needs to be separated from consideration of the concept itself.

Replies from: Jordan

↑ comment by Jordan · 2010-03-12T22:00:07.160Z · LW(p) · GW(p)

I specifically dispute the usefulness of your definition. It may be a useful definition in the context of FAI theory. We aren't discussing FAI theory.

And, to be fair, you were originally the one disputing definitions. In my post I used the standard definition of 'preference', which you decided was 'wrong', saying

This is misuse of the term "preference"

rather than accepting the implied (normal!) definition I had obviously used.

Regardless, it seems unlikely we'll be making any progress on the on-topic discussion even if we resolve this quibble.

Replies from: Vladimir_Nesov, Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-13T01:25:22.964Z · LW(p) · GW(p)

I specifically dispute the usefulness of your definition. It may be a useful definition in the context of FAI theory. We aren't discussing FAI theory.

But we do. Whether a particular action is going to end well for humanity is a core consideration in Friendliness. When you say

The route of WBE simply takes the guess work out: actually make people smarter, and then see what the drifted values are.

if it's read as implying that this road is OK, it is a factual claim about how preferable (in my sense) the outcome is going to be. The concept of preference (in my sense) is central to evaluating the correctness of your factual claim.

Replies from: Jordan

↑ comment by Jordan · 2010-03-13T02:19:31.167Z · LW(p) · GW(p)

The concept of preference (in my sense) is central to evaluating the correctness of your factual claim.

Your concept of preference is one way of evaluating the correctness of my claim, I agree. If you can resolve the complex web of human preferences (in my sense) into a clean, non-contradictory, static preference system (your sense) then you can use that system to judge the value of the hypothetical future in which WBE research overran FAI research.

It's not clear to me that this is the only way to evaluate my claim, or that it is even a reasonable way. My understanding of FAI is that arriving at such a resolution of human preferences is a central ingredient to building an FAI, hence using your method to evaluate my claim would require more progress on FAI. But the entire point of this discussion is to decide if we should be pushing harder for progress on FAI or WBE. I'll grant that this is a point in favor for FAI -- that it allows for a clearer evaluation of the very problem we're discussing -- but, beyond that, I think we must rely on the actual preferences we have access to now (in my sense: the messy, human ones) to further our evaluations of FAI and WBE.

Replies from: andreas, Vladimir_Nesov

↑ comment by andreas · 2010-03-13T03:05:04.044Z · LW(p) · GW(p)

It's not clear to me that this is the only way to evaluate my claim, or that it is even a reasonable way. My understanding of FAI is that arriving at such a resolution of human preferences is a central ingredient to building an FAI, hence using your method to evaluate my claim would require more progress on FAI.

If your statement ("The route of WBE simply takes the guess work out") were a comparison between two routes similar in approach, e.g. WBE and neuroenhancement, then you could argue that a better formal understanding of preference would be required before we could use the idea of "precise preference" to argue for one approach or the other.

Since we are comparing one option which does not try to capture preference precisely with an option that does, it does not matter what exactly precise preference says about the second option: Whatever statement our precise preferences make, the second option tries to capture it whereas the first option makes no such attempt.

Replies from: Jordan

↑ comment by Jordan · 2010-03-14T06:06:25.258Z · LW(p) · GW(p)

The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess. Afterwards the next guess as to our fundamental preference is likely different, so the process iterates. The iteration is trying to evolve towards what the agent thinks is its exact preference. The iteration is simply doing so to some sort of "first order" approximation.

For the first option, I think self-modification under the direction of current, apparent preferences should be done with extreme caution, so as to get a better 'approximation' at each step. For the second option though, it's hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.

Replies from: andreas

↑ comment by andreas · 2010-03-14T08:52:09.940Z · LW(p) · GW(p)

The first option tries to capture our best current guess as to our fundamental preference. It then updates the agent (us) based on that guess.

This guess may be awful. The process of emulation and attempts to increase the intelligence of the emulations may introduce subtle psychological changes that could affect the preferences of the persons involved.

For subsequent changes based on "trying to evolve towards what the agent thinks is its exact preference" I see two options: Either they are like the first change, open to the possibility of being arbitrarily awful due to the fact that we do not have much introspective insight into the nature of our preferences, and step by step we lose part of what we value — or subsequent changes consist of the formalization and precise capture of the object preference, in which case the situation must be judged depending on how much value was lost in the first step vs how much value was gained by having emulations work on the project of formalization.

For the second option though, it's hard for me to imagine ever choosing to self-modify into an agent with exact, unchanging preferences.

This is not the proposal under discussion. The proposal is to build a tool that ensures that things develop according to our wishes. If it turns out that our preferred (in the exact, static sense) route of development is through a number of systems that are not reflectively consistent themselves, then this route will be realized.

Replies from: Jordan

↑ comment by Jordan · 2010-03-15T01:21:33.411Z · LW(p) · GW(p)

This guess may be awful.

It may be horribly awful, yes. The question is "how likely is it to be awful?"

If FAI research can advance fast enough then we will have the luxury of implementing a coherent preference system that will guarantee the long term stability of our exact preferences. In an ideal world that would be the path we took. In the real world there is a downside to the FAI path: it may take too long. The benefit of other paths is that, although they would have some potential to fail even if executed in time, they offer a potentially faster time table.

I'll reiterate: yes, of course FAI would be better than WBE, if both were available. No, WBE provides no guarantee and could lead to horrendous preference drift. The questions are: how likely is WBE to go wrong? how long is FAI likely to take? how long is WBE likely to take? And, ultimately, combining the answers to those questions together: where should we be directing our research?

Your post points out very well that WBE might go wrong. It gives no clue to the likelihood though.

Replies from: andreas

↑ comment by andreas · 2010-03-15T02:00:07.575Z · LW(p) · GW(p)

Good, this is progress. Your comment clarified your position greatly. However, I do not know what you mean by "how long is WBE likely to take?" — take until what happens?

Replies from: Jordan

↑ comment by Jordan · 2010-03-15T23:05:21.858Z · LW(p) · GW(p)

The amount of time until we have high fidelity emulations of human brains. At that point we can start modifying/enhancing humans, seeking to create a superintelligence or at least sufficiently intelligent humans that can then create an FAI. The time from first emulation to superintelligence is nonzero, but is probably small compared to the time to first emulation. If we have reason to believe that the additional time is not small we should factor in our predictions for it as well.

Replies from: andreas

↑ comment by andreas · 2010-03-15T23:39:25.970Z · LW(p) · GW(p)

My conclusion from this discussion is that our disagreement lies in the probability we assign that uploads can be applied safely to FAI as opposed to generating more existential risk. I do not see how to resolve this disagreement right now. I agree with your statement that we need to make sure that those involved in running uploads understand the problem of preserving human preference.

Replies from: Jordan

↑ comment by Jordan · 2010-03-17T03:10:49.925Z · LW(p) · GW(p)

I'm not entirely sure how to resolve that either. However, it isn't necessary for us to agree on that probability to agree on a course of action.

What probability would you assign to uploads being used safely? What do your probability distributions look like for the ETA of uploads, FAI and AGI?

↑ comment by Vladimir_Nesov · 2010-03-13T02:31:02.518Z · LW(p) · GW(p)

We do understand something about exact preferences in general, without knowing which one of them is ours. In particular, we do know that drifting from whatever preference we have is not preferable.

Replies from: Jordan

↑ comment by Jordan · 2010-03-14T06:00:01.117Z · LW(p) · GW(p)

I agree. If our complex preferences can be represented as exact preferences then any drift from those exact preferences would be necessarily bad. However, it's not clear to me that we actually would be drifting from our exact preference were we to follow the path of WBE.

It's clear that the preferences we currently express most likely aren't our exact preferences. The path of WBE could potentially lead to humans with fundamentally different exact preferences (bad), or it could simply lead to humans with the same exact preferences but with a different, closer expression of them in the surface preferences they actually present and are consciously aware of (good). Or the path could lead to someplace in between, obviously. Any drift is bad, I agree, but small enough drift could be acceptable if the trade off is good enough (such as preventing a negative singularity).

By the way, I move to label your definition "exact preference" and mine "complex preference". Unless the context is clear, in which case we can just write "preference". Thoughts?

↑ comment by Vladimir_Nesov · 2010-03-13T01:16:54.067Z · LW(p) · GW(p)

And, to be fair, you were originally the one disputing definitions. In my post I used the standard definition of 'preference', which you decided was 'wrong', [...] rather than accepting the implied (normal!) definition I had obviously used.

You are right, I was wrong to claim authority over the meaning of the term as you used it. The actual problem was in you misinterpreting its use in andreas's comment, where it was used in my sense:

We do want our current preferences to be made reality (because that's what the term preference describes)

↑ comment by Vladimir_Nesov · 2010-03-02T08:00:13.691Z · LW(p) · GW(p)

There is going to be value drift even if we get an FAI. Isn't that inherent in extrapolated volition?

No. Progress and development may be part of human preference, but it is entirely OK for a fixed preference to specify progress happening in a particular way, as opposed to other possible ways. Furthermore, preference can be fixed and still not knowable in advance (so that there are no spoilers, and moral progress happens through your effort and not dictated "from above").

It's not possible to efficiently find out some properties of a program, even if you have its whole source code; this source code doesn't change, but the program runs - develops - in novel and unexpected ways. Of course, the unexpected needs to be knowably good, not just "unexpected" (see for example Expected Creative Surprises).

Replies from: Jordan

↑ comment by Jordan · 2010-03-02T08:16:28.898Z · LW(p) · GW(p)

I agree that such a fixed preference system is possible. But I don't think that it needs to be implemented in order for "moral progress" to be indefinitely sustainable in a positive fashion. I think humans are capable of guiding their own moral progress without their hands being held. Will the result be provably friendly? No, of course not. The question is how likely is the result to be friendly, and is this likelihood great enough that it offsets the negatives associated with FAI research (namely the potentially very long timescales needed).

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-02T08:32:50.762Z · LW(p) · GW(p)

I think humans are capable of guiding their own moral progress without their hands being held. Will the result be provably friendly? No, of course not. The question is how likely is the result to be friendly

The strawman of "provable friendliness" again. It's not about holding ourselves to an inadequately high standard, it's about figuring out what's going on, in any detail. (See this comment.)

If we accept that preference is complex (holds a lot of data), and that detail in preference matters (losing a relatively small portion of this data is highly undesirable), then any value drift is bad, and while value drift is not rigorously controlled, it's going to lead its random walk further and further away from the initial preference. As a result, from the point of view of the initial preference, the far future is pretty much lost, even if each individual step of the way doesn't look threatening. The future agency won't care about the past preference, and won't reverse to it, because as a result of value drift it already has different preference, and for it returning to the past is no longer preferable. This system isn't stable, deviations in preference don't correct themselves, if the deviated-preference agency has control.
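As a toy illustration of the random-walk claim above (a sketch only: representing a preference as a vector of numbers, the Gaussian step, and every parameter and function name here are assumptions made for illustration, not anything from the discussion), one can simulate many small, uncorrected shifts and watch the distance from the starting point grow:

```python
import random


def drift_distance(steps, dims=10, step_size=0.01, seed=0):
    """Distance from the initial 'preference' after `steps` uncorrected shifts.

    The preference is modelled (hypothetically) as a point in `dims`
    dimensions; each shift adds small independent Gaussian noise.
    """
    rng = random.Random(seed)
    pref = [0.0] * dims  # the initial preference sits at the origin
    for _ in range(steps):
        pref = [p + rng.gauss(0.0, step_size) for p in pref]
    return sum(p * p for p in pref) ** 0.5


def mean_drift(steps, trials=100):
    """Average the final distance over several independently seeded walks."""
    return sum(drift_distance(steps, seed=s) for s in range(trials)) / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(mean_drift(n), 3))
```

No single step is large, yet the average distance keeps growing with the number of steps (roughly as its square root for a walk like this), and nothing in the process pulls the walk back toward where it started.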

Replies from: Jordan

↑ comment by Jordan · 2010-03-03T18:34:01.668Z · LW(p) · GW(p)

The strawman of "provable friendliness" again.

I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.

This system isn't stable, deviations in preference don't correct themselves, if the deviated-preference agency has control.

I disagree that we know this. Certainly the system hasn't stabilized yet, but how can you make such a broad statement about the future evolution of human preference? And, in any case, even if there were no ultimate attractor in the system, so what? Human preferences have changed over the centuries. My own preferences have changed over the years. I don't think anyone is arguing this is a bad thing. Certainly, we may be able to build a system that replaces our "sloppy" method of advancement with a deterministic system that has an immutable set of preferences at its core. I disagree this is necessarily superior to letting preferences evolve in the same way they have been, free of an overseer. But that disagreement of ours is still off topic.

The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2010-03-03T22:27:17.114Z · LW(p) · GW(p)

I fail to see how my post was a straw man. I was pointing out a deficiency in what I am supporting, not what you are supporting.

It shouldn't matter who supports what. If you suddenly agree with me on some topic, you still have to convince me that you did so for the right reasons, and didn't accept a mistaken argument or mistaken understanding of an argument (see also "belief bias"). If such a mistake is discovered, you'd have to take a step back, and we both should agree that it's the right thing to do.

The "strawman" (probably a wrong term in this context) is in making a distinction between "friendliness" and "provable friendliness". If you accept that the distinction is illusory, the weakness of non-FAI "friendliness" suddenly becomes "provably fatal".

This system isn't stable, deviations in preference don't correct themselves, if the deviated-preference agency has control.

I disagree that we know this. Certainly the system hasn't stabilized yet, but how can you make such a broad statement about the future evolution of human preference?

Stability is a local property around a specific point, that states that sufficiently small deviations from that point will be followed by corrections back to it, so that the system will indefinitely remain in the close proximity of that point, provided it's not disturbed too much.
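To make this definition concrete, here is a minimal one-dimensional sketch. The dynamics `-x + x**3`, the step size, and the function name are arbitrary choices for illustration and say nothing about how preferences actually behave. The point `x = 0` is stable in the sense just described: small deviations are corrected back, while a large enough disturbance escapes.

```python
def settle(x0, dt=0.1, steps=200, escape=10.0):
    """Follow a deviation x0 from the point x = 0 under toy dynamics.

    Each step applies x += dt * (-x + x**3). For |x| < 1 the update pulls x
    back toward 0; for |x| > 1 it pushes x further away.
    Returns the final deviation, or None if the state escapes past `escape`.
    """
    x = x0
    for _ in range(steps):
        x += dt * (-x + x ** 3)
        if abs(x) > escape:
            return None  # disturbed too much: no correction back
    return x


if __name__ == "__main__":
    print(settle(0.5))  # small deviation: ends up very close to 0
    print(settle(1.5))  # large deviation: escapes, so None
```

A system with no correcting term at all (the update `x += 0`) would simply stay wherever the last disturbance left it, which is the unstable situation described in the next paragraph.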

Where we replace ourselves with agency of slightly different preference, this new agency has no reason to correct backwards to our preference. If it is not itself stable (that is, it hasn't built its own FAI), then the next preference shift it'll experience (in effectively replacing itself with yet different preference agency) isn't going to be related to the first shift, isn't going to correct it. As a result, value is slowly but inevitably lost. This loss of value only stops when the reflective consistency is finally achieved, but it won't be by an agency that exactly shares your preference. Thus, even when you've lost a fight for specifically your preference, the only hope is for the similar-preference drifted agency to stop as soon as possible (as close to your preference as possible), to develop its FAI. (See also: Friendly AI: a vector for human preference.)

My own preferences have changed over the years. I don't think anyone is arguing this is a bad thing.

The past-you is going to prefer your preference not to change, even though current-you would prefer your preference to be as it now is. Note that preference has little to do with likes or wants, so you might be talking about surface reactions to environment and knowledge, not the elusive concept of what you'd prefer in the limit of reflection. (See also: "Why Socialists don't Believe in Fun", Eutopia is Scary.)

The topic is whether FAI or WBE research is better for existential risk reduction. The pertinent question is what are the likelihoods of each leading to what we would consider a positive singularity, and, more importantly, how do those likelihoods change as a function of our directed effort?

And to decide this question, we need a solid understanding of what counts as a success or failure. The concept of preference is an essential tool in gaining this understanding.

comment by Mitchell_Porter · 2010-03-01T04:42:25.710Z · LW(p) · GW(p)

Okay, let's go on the brain-simulation path. Let's start with something simple, like a lobster or a dog... oh wait, what if it transcends and isn't human-friendly. All right, we'll stick to human brains... oh wait, what if our model of neural function is wrong and we create a sociopathic copy that isn't human-friendly. All right, we'll work on human brain regions separately, and absolutely make sure that we have them all right before we do a whole brain... oh wait, what if one of our partial brain models transcends and isn't human-friendly.

And while you, whose reason for taking this path is to create a human-friendly future, struggle to avoid these pitfalls, there will be others who aren't so cautious, and who want to conduct experiments like hotwiring together cognitive modules that are merely brain-inspired, just to see what happens, or in the expectation of something cool, or because they want a smarter vacuum cleaner.

Replies from: JamesAndrix, Peter_de_Blanc, AngryParsley, Jordan

↑ comment by JamesAndrix · 2010-03-01T07:30:28.577Z · LW(p) · GW(p)

We don't have to try and upgrade any virtual brains to get most of the benefits.

If we could today create an uploaded dog brain that's just a normal virtual dog running at 1/1000th realtime, that would be a huge win with no meaningful risk. That would lead us down a relatively stable path of obscenely expensive and slow uploads becoming cheaper every year. In this case cheaper means faster and also more numerous. At the start human society can handle a few slightly superior uploads; by the time uploads get way past us, they will be a society of themselves and on roughly equal footing. (This may be bad for people still running at realtime, but human values will persist.)

The dangers of someone making a transcendent AI first are there no matter what. This is not a good argument against a FASTER way to get to safe superintelligence.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2010-03-01T08:14:11.531Z · LW(p) · GW(p)

So, in this scenario we have obtained a big neural network that imprints on a master and can learn complex physical tasks... and we're just going to ignore the implications of that while we concentrate on trying to duplicate ourselves?

What's going to stop me from duplicating just the canine prefrontal cortex and experimenting with it? It's a nice little classifier / decision maker, I'm sure it has other uses...

Just the capacity to reliably emulate major functional regions of vertebrate brain already puts you on the threshold of creating big powerful nonhuman AI. If puploads come first, they'll be doing more than catching frisbees in Second Life.

Replies from: Kaj_Sotala, JamesAndrix, Jordan

↑ comment by Kaj_Sotala · 2010-03-01T15:49:03.733Z · LW(p) · GW(p)

I realize that this risk is kinda insignificant compared to the risk of all life on Earth being wiped out... But I'm more than a little scared of the thought of animal uploads, and the possibility of people creating lifeforms that can be cheaply copied and replicated, without them needing to have any of the physical features that usually elicit sympathy from people. We already have plenty of people abusing their animals today, and being able to do it perfectly undetected on your home upload isn't going to help things.

To say nothing about when it becomes easier to run human uploads. I just yesterday re-read a rather disturbing short story about the stuff you could theoretically do with body-repairing nanomachines, a person who enjoys abusing others, and a "pet substitute". Err, a two-year-old human child, that is.

↑ comment by JamesAndrix · 2010-03-02T03:19:01.176Z · LW(p) · GW(p)

The supercomputers will be there whether we like it or not. Some of what they run will be attempts at AI. This is so far the only approach that someone unaware of Friendliness issues has a high probability of trying and succeeding with (and not immediately killing us all).

Numerous un-augmented accelerated uploads are a narrow safe path, and one we probably won't follow, but it is a safe path. (So far one of 2, so it's important.) I think the likely win is less than FAI, but the dropoff isn't so steep either as you walk off the path. Any safe AI approach will suggest profitable nonsafe alternatives.

An FAI failure is almost certainly alien, or not there yet. An augmentation failure is probably less-capable, probably not hostile, probably not strictly executing a utility function, and above all: can be surrounded by other, faster, uploads.

If the first pupload costs half a billion dollars and runs very slow, then even tweaking it will be safer than say, letting neural nets evolve in a rich environment on the same hardware.

↑ comment by Jordan · 2010-03-01T21:35:11.114Z · LW(p) · GW(p)

What's going to stop me from duplicating just the canine prefrontal cortex and experimenting with it? It's a nice little classifier / decision maker, I'm sure it has other uses...

What's going to stop you is that the prefrontal cortex is just one part of a larger whole. It may be possible to isolate that part, but doing so may be very difficult. Now, if your simulation were running in real time, you could just spawn off a bunch of different experiments pursuing different ideas for how to isolate and use the prefrontal cortex, and just keep doing this until you find something that works. But, if your simulation is running at 1/1000th realtime, as JamesAndrix suggests in his hypothetical, the prospects of this type of method seem dim.

Of course, maybe the existence of the dog brain simulation is sufficient to spur advances in neuroscience to the point where you could just isolate the functioning of the cortex, without the need for millions of experimental runs. Even so, your resulting module is still going to be too slow to be an existential threat.

Just the capacity to reliably emulate major functional regions of vertebrate brain already puts you on the threshold of creating big powerful nonhuman AI.

The threshold, yes. But that threshold is still nontrivial to cross. The question is, given that we can reliably emulate major functional regions of the brain, is it easier to cross the threshold to nonhuman AI, or to full emulations of humans? There is virtually no barrier to the second threshold, while the first one still has nontrivial problems to be solved.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2010-03-02T02:30:39.724Z · LW(p) · GW(p)

It may be possible to isolate that part, but doing so may be very difficult.

Why would it be difficult? How would it be difficult?

There is a rather utopian notion of mind uploading, according to which you blindly scan a brain without understanding it, and then turn that data directly into a simulation. I'm aware of two such scanning paradigms. In one, you freeze the brain, microtome it, and then image the sections. In the other you do high-resolution imaging of the living brain (e.g. fMRI) and then you construct a state-machine model for each small 3D volume.

To turn the images of those microtomed sections into an accurate dynamical model requires a lot of interpretive knowledge. The MRI-plus-inference pathway sounds much more plausible as a blind path to brain simulation. But either way, you are going to know what the physical 3D location of every element in your simulation was, and functional neuroanatomy is already quite sophisticated. It won't be hard to single out the sim-neurons specific to a particular anatomical macroregion.

There is virtually no barrier to the second threshold, while the first one still has nontrivial problems to be solved.

If you can simulate a human, you can immediately start experimenting with nonhuman cognitive architectures by lobotomizing or lesioning the simulation. But this would already be true for simulated animal brains as well.

Replies from: Jordan

↑ comment by Jordan · 2010-03-02T07:12:57.091Z · LW(p) · GW(p)

It won't be hard to single out the sim-neurons specific to a particular anatomical macroregion.

That's true, but ultimately the regions of the brain are not completely islands. The circuitry connecting them is itself intricate. You may, for instance, be able to extract the visual cortex and get it to do some computer vision for you, but I doubt extracting a prefrontal cortex will be useful without all the subsystems it depends on. More importantly, how to wire up new configurations (maybe you want to have a double prefrontal cortex: twice the cognitive power!) strikes me as a fundamentally difficult problem. At that point you probably need to have some legitimate high level understanding of the components and their connective behaviors to succeed. By contrast, a vanilla emulation where you aren't modifying the architecture or performing virtual surgery requires no such high level understanding.

↑ comment by Peter_de_Blanc · 2010-03-01T07:14:40.755Z · LW(p) · GW(p)

How does a lobster simulation transcend?

Replies from: dclayh, gwern, Jack, Mitchell_Porter

↑ comment by dclayh · 2010-03-01T07:19:59.689Z · LW(p) · GW(p)

That sounds like a koan.

↑ comment by gwern · 2010-03-01T14:48:30.397Z · LW(p) · GW(p)

Clearly people in this thread are not Charles Stross fans.

Replies from: JenniferRM

↑ comment by JenniferRM · 2010-03-15T04:43:55.855Z · LW(p) · GW(p)

For those not getting this, the book Accelerando starts with the main character being called by something with a Russian accent that claims to be a neuromorphic AI based off of lobsters grafted into some knowledge management. This AI (roughly "the lobsters") seeks a human who can help them "defect".

I recommend the book! The ideas aren't super deep in retrospect but its "near future" parts have one hilariously juxtaposed geeky allusion after another and the later parts are an interesting take on post-human politics and economics.

I assume the lobsters were chosen because of existing research in this area. For example, there are techniques for keeping bits alive in vitro, there is modeling work from the 1990s trying to reproduce known neural mechanisms in silico, and I remember (but couldn't find the link) that a team had some success around 2001(?) doing a Moravec transfer to one or more cells in a lobster ganglion (minus the nanotech of course). There are lots of papers in this area. The ones I linked to were easy to find.

↑ comment by Jack · 2010-03-01T15:26:53.233Z · LW(p) · GW(p)

Melted butter.

↑ comment by Mitchell_Porter · 2010-03-01T07:40:33.879Z · LW(p) · GW(p)

Someone uses it to explore its own fitness landscape.

Replies from: cousin_it

↑ comment by cousin_it · 2010-03-01T13:43:30.410Z · LW(p) · GW(p)

Huh? Lobsters have been exploring their own fitness landscape for quite some time and haven't transcended yet. Evolution doesn't inevitably lead towards intelligence.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2010-03-02T01:57:33.329Z · LW(p) · GW(p)

I was way too obscure. I meant: turn it into a Gödel machine by modifying the lobster program to explore and evaluate the space of altered lobster programs.

Replies from: cousin_it

↑ comment by cousin_it · 2010-03-08T19:07:07.428Z · LW(p) · GW(p)

Why do you need a lobster for that? You could start today with any old piece of open source code and any measure of "fitness" you like. People have tried to do this for a while without much success.

↑ comment by AngryParsley · 2010-03-02T02:04:49.168Z · LW(p) · GW(p)

Let's start with something simple, like a lobster or a dog... oh wait, what if it transcends and isn't human-friendly.

Lobsters and dogs aren't general intelligences. A million years of dog-thoughts can't do the job of a few minutes of human-thoughts. Although a self-improving dog could be pretty friendly. Cats on the other hand... well that would be bad news. :)

what if our model of neural function is wrong and we create a sociopathic copy that isn't human-friendly.

I find that very unlikely. If you look at diseases or compounds that affect every neuron in the brain, they usually affect all cognitive abilities. Keeping intelligence while eliminating empathy would be pretty hard to do by accident, and if it did happen it would be easy to detect. Humans have experience detecting sociopathic tendencies in other humans. Unlike an AI, an upload can't easily understand its own code, so self-improving is going to be that much more difficult. It's not going to be some super-amazing thing that can immediately hack a human mind over a text terminal.

oh wait, what if one of our partial brain models transcends and isn't human-friendly.

That still seems unlikely. If you look at brains with certain parts missing or injured, you see that they are disabled in very specific ways. Take away just a tiny part of a brain and you'll end up with things like face blindness, Capgras delusion, or Anton-Babinski syndrome. By only simulating individual parts of the brain, it becomes less likely that the upload will transcend.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2010-03-02T03:27:38.205Z · LW(p) · GW(p)

Lobsters and dogs aren't general intelligences.

So they won't transcend if we do nothing but run them in copies of their ancestral environments. But how likely is that? They will instead become tools in our software toolbox (see below).

Unlike an AI, an upload can't easily understand its own code, so self-improving is going to be that much more difficult.

The argument for uploads first is not that by uploading humans, we have solved the problem of Friendliness. The uploads still have to solve that problem. The argument is that the odds are better if the first human-level faster-than-human intelligences are copies of humans rather than nonhuman AIs.

But guaranteeing fidelity in your copy is itself a problem comparable to the problem of Friendliness. It would be incredibly easy for us to miss that (e.g.) a particular neuronal chemical response is of cognitive and not just physiological significance, leave it out of the uploading protocol, and thereby create "copies" which systematically deviate from human cognition in some way, whether subtle or blatant.

By only simulating individual parts of the brain, it becomes less likely that the upload will transcend.

The classic recipe for unsafe self-enhancing AI is that you assemble a collection of software tools, and use them to build better tools, and eventually you delegate even that tool-improving function. The significance of partial uploads is that they can give a big boost to this process.

↑ comment by Jordan · 2010-03-01T21:18:42.426Z · LW(p) · GW(p)

there will be others who aren't so cautious, and who want to conduct experiments like hotwiring together cognitive modules that are merely brain-inspired

This is why it's important that we have high fidelity simulations sooner rather than later, while the necessary hardware rests in the hands of the handful of institutions that can afford top tier supercomputers, rather than an idiot in a garage trying to build a better Roomba. There would be fewer players in the field, making the research easier to monitor, and, more importantly, it would be much more difficult to jury-rig a bunch of modules together. The more cumbersome the hardware the harder experimentation will be, making high fidelity copies more likely to provide computer intelligence before hotwired modules or neuromorphically inspired architectures.

comment by Mitchell_Porter · 2010-03-02T06:07:21.777Z · LW(p) · GW(p)

An important fact is that whether your aim is Friendly AI or mind uploading, either way, someone has to do neuroscience. As the author observes,

Such research [FAI] must not only reverse-engineer consciousness, but also human notions of morality.

In FAI strategy as currently conceived, the AI is the neuroscientist. Through a combination of empirical and deductive means, and with its actions bounded by some form of interim Friendliness (so it doesn't kill people or create conscious sim-people along the way), the AI figures out the human decision architecture, extrapolates our collective volition as it would pertain to its own actions, and implements that volition.

Now note that this is an agenda each step of which could be carried out by all-natural human beings. Human neuroscientists could understand the human decision process, discover our true values and their reflective equilibrium, and act in accordance with the idealized values. The SIAI model is simply one in which all these steps are carried out by an AI rather than by human beings. In principle, you could aim to leave the AI out of it until human beings had solved the CEV problem themselves; and only then would you set a self-enhancing FAI in motion, with the CEV solution coded in from the beginning.

Eliezer has written about the unreliability of human attempts to formulate morality in a set of principles, using just intuition. Thus instead we are to delegate this investigation to an AI-neuroscientist. But to feel secure that the AI-neuroscientist is indeed discovering the nature of morality, and not some other similar-but-crucially-different systemic property of human cognition, we need its investigative methodology (e.g. its epistemology and its interim ethics) to be reliable. So either way, at some point human judgment enters the picture. And by the Turing universality of computation, anything the AI can do, humans can do too. They might be a lot slower, they might have to do it very redundantly to do it with the same reliability, but it should be possible for mere humans to solve the problem of CEV exactly as we would wish a proto-FAI to do.

Since the path to human mind uploading has its own difficulties and hazards, and still leaves the problem of Friendly superintelligence unsolved, I suggest that people who are worried about leaving everything up to an AI think about how a purely human implementation of the CEV research program would work - one that was carried out solely by human beings, using only the sort of software we have now.

comment by zero_call · 2010-03-01T03:34:45.190Z · LW(p) · GW(p)

Why should an uploaded superintelligence based on a human copy be any innately safer than an artificial superintelligence? Just because humans are usually friendly doesn't mean a human AI would have to be friendly. This is especially true for a superintelligent human AI, which may not even be comparable to its original human template. Even the friendliest human might be angry and abusive when they're having a bad day.

Your idea that a WBE copy could more easily undergo supervised, safe growth is basically an assumption. You would need to argue this in much more detail for it to merit deeper consideration.

Also, you cannot assume that an uploaded human superintelligence would be more constrained, as in "...after a best-effort psychiatric evaluation (for whatever good that might do) gives it Internet access". This is related to the AI-box problem, where it is contended that a superintelligence could not be contained, no matter what. Personally I dispute this, but at least it's not something to be taken for granted.

Replies from: CarlShulman, NancyLebovitz

↑ comment by CarlShulman · 2010-03-01T05:03:32.568Z · LW(p) · GW(p)

WBE safety could benefit from an existing body of knowledge about human behavior and capabilities, and the spaghetti code of the brain could plausibly impose a higher barrier to rapid self-improvement. And institutions exploiting the cheap copyability of brain emulations could greatly help in stabilizing benevolent motivations.

WBE is a tiny region of the space of AI designs that we can imagine as plausible possibilities, and we have less uncertainty about it than about "whatever non-WBE AI technology comes first." Some architectures might be easier to make safe, and others harder, but if you are highly uncertain about non-WBE AI's properties then you need wide confidence intervals.

WBE also has the nice property that it is relatively all-or-nothing. With de novo AI, designers will be tempted to trade off design safety for speed, but for WBE a design that works at all will be relatively close to the desired motivations (there will still be tradeoffs with emulation brain damage, but the effect seems less severe than for de novo AI). Attempts to reduce WBE risk might just involve preparing analysis and institutions to manage WBE upon development, where AI safety would require control of the development process to avoid intrinsically unsafe designs.

Replies from: RobinHanson

↑ comment by RobinHanson · 2010-03-02T02:53:57.868Z · LW(p) · GW(p)

This is a good summary.

↑ comment by NancyLebovitz · 2010-03-08T14:38:50.622Z · LW(p) · GW(p)

At least we know what a friendly human being looks like.

And I wouldn't stop at a psychiatric evaluation of the person to be uploaded. I'd work on evaluating whether the potential uploadee was good for the people they associate with.

comment by timtyler · 2010-03-02T10:24:38.123Z · LW(p) · GW(p)

These "Whole Brain Emulation" discussions are surreal for me. I think someone needs to put forward the best case they can find that human brain emulations have much of a chance of coming before engineered machine intelligence.

The efforts in that direction I have witnessed so far seem feeble and difficult to take seriously - while the case that engineered machine intelligence will come first seems very powerful to me.

Without such a case, why spend so much time and energy on a discussion of what-if?

Replies from: FAWS, BenRayfield

↑ comment by FAWS · 2010-03-02T10:53:45.463Z · LW(p) · GW(p)

Personally I don't have a strong opinion on which will come first, both seem entirely plausible to me.

We have a much better idea how difficult WBE is than how difficult engineering a human level machine intelligence is. We don't even know for sure whether the latter is even possible for us (other than by pure trial and error).

There is a reasonably obvious path from where we are to WBE, while we aren't even sure how to go about learning how to engineer an intelligence; it's entirely possible that "start with studying WBEs in detail" is the best possible way.

There are currently a lot more people studying things that are required for WBE than there are people studying AGI, it's difficult to tell which other fields of study would benefit AGI more strongly than WBE.

↑ comment by BenRayfield · 2010-03-03T16:32:44.181Z · LW(p) · GW(p)

Why do you consider the possibility of smarter than Human AI at all? The difference between the AI we have now and that is bigger than the difference between those 2 technologies you are comparing.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T20:16:00.294Z · LW(p) · GW(p)

I don't understand why you are bothering asking your question - but to give a literal answer, my interest in synthesising intelligent agents is an offshoot of my interest in creating living things - which is an interest I have had for a long time and share with many others. Machine intelligence is obviously possible - assuming you have a materialist and naturalist world-view like mine.

Replies from: BenRayfield

↑ comment by BenRayfield · 2010-03-08T00:15:54.489Z · LW(p) · GW(p)

I think someone needs to put forward the best case they can find that human brain emulations have much of a chance of coming before engineered machine intelligence.

I misunderstood. I thought you were saying it was your goal to prove that instead of you thought it would not be proven. My question does not make sense.

Replies from: timtyler

↑ comment by timtyler · 2010-03-08T10:26:56.404Z · LW(p) · GW(p)

Thanks for clarifying!

comment by Epiphany · 2012-08-25T23:02:38.027Z · LW(p) · GW(p)

Humans misuse power. It doesn't seem to have occurred to you that humans with power frequently become corrupt. So, you want to emulate humans, in order to avoid corruption, when we know that power corrupts humans? Our brain structures have evolved for how many millions of years, all the while natural selection has been favoring those most efficient at obtaining and exploiting power whenever it has provided a reproduction advantage? I think we're better off with something man-made, not something that's optimized to do that!

Also, there is a point at which an intelligent enough person is unable to communicate meaningfully with others, let alone a super intelligent machine. There comes a point where your conceptual frameworks are so complex that nothing you say will be interpreted correctly without a huge amount of explanation, which your target audience does not have the attention span for. This happens on the human level, with IQ gaps of above 45 IQ points (ratio tests). Look up something called the "optimal IQ range". Want some evidence? Why do the presidents that we vote in have IQs so (relatively) near to average when it would theoretically make more sense to vote in Einsteins with IQs of 160? It's because Einsteins are too complicated. Most people don't have the stamina to do all the thinking required to understand all of their ideas enough to determine whether a political Einstein would be the better choice. The same problem will happen with AI. People won't understand an AI that smart because they won't be able to review its reasoning. That means they won't trust it, are incapable of agreeing with it and therefore would not be likely to want to be led by it.

Brain emulation will do nothing to ensure any of the benefits you hope for.

comment by Strange7 · 2010-03-08T00:51:51.348Z · LW(p) · GW(p)

The only downside of this approach I can see is that an upload-triggered Unfriendly singularity may cause more suffering than an Unfriendly AI singularity; sociopaths may be presumed to have more interest in torture of people than a paperclip-optimizing AI would have.

What about those of us who would prefer indefinite human-directed torture to instantaneous cessation of existence? I have no personal plans to explore masochism in that sort of depth, particularly in a context without the generally-accepted safety measures, but it's not the worst thing I can imagine. I'd find ways to deal with it, in the same sense that if I were stranded on a desert island I would be more willing to gag down whatever noxious variety of canned fermented fish was available, and eventually learn to like it, rather than starve to death.

Replies from: JGWeissman, Mitchell_Porter

↑ comment by JGWeissman · 2010-03-08T01:11:50.826Z · LW(p) · GW(p)

I don't think you are appreciating the potential torture that could be inflicted by a superintelligence dedicated to advancing anti-fun theory. Such a thing would likely make your mind bigger at some optimal rate just so you could appreciate the stream of innovative varieties of enormous pain (not necessarily normal physical pain) it causes you.

↑ comment by Mitchell_Porter · 2010-03-08T01:04:37.381Z · LW(p) · GW(p)

it's not the worst thing I can imagine

You can't imagine torture that is worse than death?

Replies from: Strange7, RobinZ

↑ comment by Strange7 · 2010-03-08T08:44:09.986Z · LW(p) · GW(p)

By 'death' I assume you mean the usual process of organ failure, tissue necrosis, having what's left of me dressed up and put in a fancy box, followed by chemical preservation, decomposition, and/or cremation? Considering the long-term recovery prospects, no, I don't think I can imagine a form of torture worse than that, except perhaps dragging it out over a longer period of time or otherwise embellishing on it somehow.

This may be a simple matter of differing personal preferences. Could you please specify some form of torture, real or imagined, which you would consider worse than death?

Replies from: Mitchell_Porter, gregconen

↑ comment by Mitchell_Porter · 2010-03-08T09:49:47.000Z · LW(p) · GW(p)

Suppose I was tortured until I wanted to die. Would that count?

Replies from: Strange7

↑ comment by Strange7 · 2010-03-08T11:01:47.973Z · LW(p) · GW(p)

There have been people who wanted to die for one reason or another, or claimed to at the time with apparent sincerity, and yet went on to achieve useful or at least interesting things. The same cannot be said of those who actually did die.

Actual death constitutes a more lasting type of harm than anything I've heard described as torture.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2010-03-09T03:24:42.309Z · LW(p) · GW(p)

useful or at least interesting

There's a nihilism lurking here which seems at odds with your unconditional affirmation of life as better than death. You doubt that anything anyone has ever done was "useful"? How do you define useful?

Replies from: Strange7

↑ comment by Strange7 · 2010-03-09T22:48:35.469Z · LW(p) · GW(p)

Admittedly, my personal definition isn't particularly rigorous. An invention or achievement is useful if it makes other people more able to accomplish their existing goals, or maybe if it gives them something to do when they'd otherwise be bored. It's interesting (but not necessarily useful) if it makes people happy, is regarded as having artistic value, etc.

Relevant examples: Emperor Norton's peaceful dispersal of a race riot was useful. His proposal to construct a suspension bridge across San Francisco Bay would have been useful, had it been carried out. Sylvia Plath's work is less obviously useful, but definitely interesting.

↑ comment by gregconen · 2010-03-08T08:53:44.747Z · LW(p) · GW(p)

Most versions of torture, continued for your entire existence. You finally cease when you otherwise would (at the heat death of the universe, if nothing else), but your entire experience is spent being tortured. The type isn't really important, at that point.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-08T09:20:23.615Z · LW(p) · GW(p)

First, the scenario you describe explicitly includes death, and as such falls under the 'embellishments' exception.

Second, thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring. As you said,

The type isn't really important, at that point.

Third, if I ever run out of other active goals to pursue, I could always fall back on "defeat/destroy the eternal tormentor of all mankind." Even with negligible chance of success, some genuinely heroic quest like that makes for a far better waste of my time and resources than, say, lottery tickets.

Replies from: Nick_Tarleton, JGWeissman

↑ comment by Nick_Tarleton · 2010-03-08T11:09:08.386Z · LW(p) · GW(p)

Second, thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring.

What if your hedonic treadmill were disabled, or bypassed by something like direct stimulation of your pain center?

Replies from: gregconen

↑ comment by gregconen · 2010-03-08T20:52:48.508Z · LW(p) · GW(p)

First, the scenario you describe explicitly includes death, and as such falls under the 'embellishments' exception.

You're going to die (or at least cease) eventually, unless our understanding of physics changes significantly. Eventually, you'll run out of negentropy to run your thoughts. My scenario only changes what happens between then and now.

Failing that, you can just be tortured eternally, with no chance of escape (no chance of escape is unphysical, but so is no chance of death). Even if the torture becomes boring (and there may be ways around that), an eternity of boredom, with no chance to succeed at any goal, seems worse than death to me.

↑ comment by JGWeissman · 2010-03-08T23:54:05.603Z · LW(p) · GW(p)

and as such falls under the 'embellishments' exception.

When considering the potential harm you could suffer from a superintelligence that values harming you, you don't get to exclude some approaches it could take because they are too obvious. Superintelligences take obvious wins.

thanks to the hedonic treadmill, any randomly-selected form of torture repeated indefinitely would eventually become tolerable, then boring.

Perhaps. So consider other approaches the hostile superintelligence might take. It's not going to go easy on you.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-09T23:21:16.758Z · LW(p) · GW(p)

Yes, I've considered the possibility of things like inducement of anterograde amnesia combined with application of procedure 110-Montauk, and done my best to consider nameless horrors beyond even that.

As I understand it, a superintelligence derived from a sadistic, sociopathic human upload would have some interest in me as a person capable of suffering, while a superintelligence with strictly artificial psychology and goals would more likely be interested in me as a potential resource, a poorly-defended pile of damp organic chemistry. Neither of those is anywhere near my ideal outcome, of course, but in the former, I'll almost certainly be kept alive for some perceptible length of time. As far as I'm concerned, while I'm dead, my utility function is stuck at 0, but while I'm alive my utility function is equal to or greater than zero.

Furthermore, even a nigh-omnipotent sociopath might be persuaded to torture on a strictly consensual basis by appealing to exploitable weaknesses in the legacy software. The same cannot be said of a superintelligence deliberately constructed without such security flaws, or one which wipes out humanity before its flaws can be discovered.

Neither of these options is actually good, but the human-upload 'bad end' is at least, from my perspective, less bad. That's all I'm asserting.

Replies from: JGWeissman

↑ comment by JGWeissman · 2010-03-10T06:16:26.996Z · LW(p) · GW(p)

Yes, the superintelligence that takes an interest in harming you would have to come from some optimized process, like recursive self improvement of a psychopath upload.

A sufficient condition for the superintelligence to be indifferent to your well being, and see you as spare parts, is an under optimized utility function.

Your approach to predicting what the hostile superintelligence would do to you seems to be figuring out the worst sort of torture that you can imagine. The problem with this is that the superintelligence is a lot smarter, and more creative than you. Reading your mind and making real your worst fears, constantly with no break or rest, isn't nearly as bad as what it would come up with. And no, you are not going to find some security flaw you can exploit to defeat it, or even slow it down. For one thing, the only way you will be able to think straight is if it determines that this maximizes the harm you experience. But the big reason is recursive self improvement. The superintelligence will analyze itself and fix security holes. You, puny mortal, will be up against a superintelligence. You will not win.

As far as I'm concerned, while I'm dead, my utility function is stuck at 0, but while I'm alive my utility function is equal to or greater than zero.

If you knew you were going to die tomorrow, would you now have a preference for what happens to the universe afterwards?

Replies from: Strange7

↑ comment by Strange7 · 2010-03-10T14:17:19.313Z · LW(p) · GW(p)

A superintelligence based on an uploaded human mind might retain exploits like 'pre-existing honorable agreements' or even 'mercy' because it considers them part of its own essential personality. Recursive self-improvement doesn't just mean punching some magical enhance button exponentially fast.

If you knew you were going to die tomorrow,

My preferences would be less relevant, given the limited time and resources I'd have with which to act on them. They wouldn't be significantly changed, though. I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.

Replies from: JGWeissman

↑ comment by JGWeissman · 2010-03-11T03:00:07.420Z · LW(p) · GW(p)

A superintelligence based on an uploaded human mind might retain exploits like 'pre-existing honorable agreements' or even 'mercy' because it considers them part of its own essential personality.

If we are postulating a superintelligence that values harming you, let's really postulate that. In the early phases of recursive self improvement, it will figure out all the principles of rationality we have discussed here, including the representation of preferences as a utility function. It will self-modify to maximize a utility function that best represents its precursor conflicting desires, including hurting others and mercy. If it truly started as a psychopath, the desire to hurt others is going to dominate. As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.

Recursive self-improvement doesn't just mean punching some magical enhance button exponentially fast.

I was never suggesting it was anything magical. Software security, given physical security of the system, really is not that hard. The reason we have security holes in computer software today is that most programmers, and the people they work for, do not care about security. But a self improving intelligence will at some point learn to care about its software level security (as an instrumental value), and it will fix vulnerabilities in its next modification.

My preferences would be less relevant, given the limited time and resources I'd have with which to act on them. They wouldn't be significantly changed, though. I would, in short, want the universe to continue containing nice places for myself and those people I love to live in, and for as many of us as possible to continue living in such places. I would also hope that I was wrong about my own imminent demise, or at least the inevitability thereof.

Is it fair to say that you prefer A: you die tomorrow and the people you currently care about will continue to have worthwhile lives and survive to a positive singularity, to B: you die tomorrow and the people you currently care about also die tomorrow?

If yes, then "while I'm dead, my utility function is stuck at 0" is not a good representation of your preferences.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-11T04:05:38.765Z · LW(p) · GW(p)

As it becomes superintelligent, it will move away from having a conflicting sea of emotions that could be manipulated by someone at your level.

Conflicts will be resolved, yes, but preferences will remain. A fully self-consistent psychopath might still enjoy weeping more than screams, crunches more than spurts, and certain victim responses could still be mood-breaking. It wouldn't be a good life, of course, collaborating to turn myself into a better toy for a nigh-omnipotent monstrosity, but I'm still pretty sure I'd rather have that than not exist at all.

Is it fair to say that you prefer

For my preference to be meaningful, I have to be aware of the distinction. I'd certainly be happier during the last moments of my life with a stack of utilons wrapped up in the knowledge that those I love would do alright without me, but I would stop being happy about that when the parts of my brain that model future events and register satisfaction shut down for the last time and start to rot.

If cryostasis pans out, or, better yet, the positive singularity in scenario A includes reconstruction sufficient to work around the lack of it, there's some non-negligible chance that I (or something functionally indistinguishable from me) would stop being dead, in which case I pop back up to greater-than-zero utility. Shortly thereafter, I would get further positive utility as I find out about good stuff that happened while I was out.

Replies from: JGWeissman, RobinZ

↑ comment by JGWeissman · 2010-03-11T04:52:23.217Z · LW(p) · GW(p)

It wouldn't be a good life, of course, collaborating to turn myself into a better toy for a nigh-omnipotent monstrosity, but I'm still pretty sure I'd rather have that than not exist at all.

Again, its preferences are not going to be manipulated by someone at your level, even ignoring your reduced effectiveness from being constantly tortured. Whatever you think you can offer as part of a deal, it can unilaterally take from you. (And really, a psychopathic torturer does not want you to simulate its favorite reaction, it wants to find the specific torture that naturally causes you to react in its favorite way. It does not care about your cooperation at all.)

For my preference to be meaningful, I have to be aware of the distinction.

You seem to be confusing your utility with your calculation of your utility function. I expect that this confusion would cause you to wirehead, given the chance. Which of the following would you choose, if you had to choose between them:

Choice C: Your loved ones are separated from you but continue to live worthwhile lives. Meanwhile, you are given induced amnesia, and false memories of your loved ones dying.

Choice D: You are placed in a simulation separated from the rest of the world, and your loved ones are all killed. You are given induced amnesia, and believe yourself to be in the real world. You do not have detailed interactions with your loved ones (they are not simulated in such detail that they can be considered alive in the simulation), but you receive regular reports that they are doing well. These reports are false, but you believe them.

If cryostasis pans out...

In the scenario I described, death is actual death, after which you cannot be brought back. It is not what current legal and medical authorities falsely believe to be that state.

You should probably read about The Least Convenient Possible World.

Replies from: Strange7, Strange7

↑ comment by Strange7 · 2010-03-11T18:22:21.523Z · LW(p) · GW(p)

I've had a rather unsettling night's sleep, contemplating scenarios where I'm forced to choose between slight variations on violations of my body and mind, disconnect from reality, and loss of everyone I've ever loved. It was worth it, though, since I've come up with a less convenient version:

If choice D included, within the simulation, versions of my loved ones that were ultimately hollow, but convincing enough that I could be satisfied with them by choosing not to look too closely, and further if the VR included a society with complex, internally-consistent dynamics of a sort that are impossible in the real world but endlessly fascinating to me, and if in option C I would know that such a virtual world existed but be permanently denied access to it (in such a way that seemed consistent with the falsely-remembered death of my loved ones), that would make D quite a bit more tempting.

However, I would still choose the 'actual reality' option, because it has better long-term recovery prospects. In that situation, my loved ones aren't actually dead, so I've got some chance of reconnecting with them or benefiting by the indirect consequences of their actions; my map is broken, but I still have access to the territory, so it could eventually be repaired.

Replies from: JGWeissman

↑ comment by JGWeissman · 2010-03-11T19:17:49.372Z · LW(p) · GW(p)

Ok, that is a better effort to find a less convenient world, but you still seem to be avoiding the conflict between optimizing the actual state of reality and optimizing your perception of reality.

Assume in Scenario C, you know you will never see your loved ones again, you will never realize that they are still alive.

More generally, if you come up with some reason why optimizing your expected experience of your loved ones happens to produce the same result as optimizing the actual lives of your loved ones, despite the dilemma being constructed to introduce a disconnect between these concepts, then imagine that reason does not work. Imagine the dilemma is tightened to eliminate that reason. For purposes of this thought experiment, don't worry if this requires you to occupy some epistemic state that humans cannot ordinarily achieve, or strange arbitrary powers for the agents forcing you to make this decision. Because planning a reaction for this absurd scenario is not the point. The point is to figure out and compare to what extent you care about the actual state of the universe, and to what extent you care about your perceptions.

My own answer to this dilemma is option C, because then my loved ones are actually alive and well, full stop.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-11T20:22:51.959Z · LW(p) · GW(p)

Assume in Scenario C, you know you will never see your loved ones again, you will never realize that they are still alive.

Fair enough. I'd still pick C, since it also includes the options of finding someone else to be with, or somehow coming to terms with living alone.

The point is to figure out and compare to what extent you care about the actual state of the universe, and to what extent you care about your perceptions.

Thank you for clarifying that.

Most of all, I want to stay alive, or if that's not possible, keep a viable breeding population of my species alive. I would be suspicious of anyone who claimed to be the result of an evolutionary process but did not value this.

If the 'survival' situation seems to be under control, my next priority is constructing predictive models. This requires sensory input and thought, preferably conscious thought. I'm not terribly picky about what sort of sensory input exactly, but more is better (so long as my ability to process it can keep up, of course).

After modeling it gets complicated. I want to be able to effect changes in my surroundings, but a hammer does me no good without the ability to predict that striking a nail will change the nail's position. If my perceptions are sufficiently disconnected from reality that the connection can never be reestablished, objective two is in an irretrievable failure state, and any higher goal is irrelevant.

That leaves survival. Neither C nor D explicitly threatens my own life, but with perma-death on the table, either of them might mean me expiring somewhere down the line. D explicitly involves my loved ones (all or at least most of whom are members of my species) being killed for arbitrary, nonrepeatable reasons, which constitutes a marginal reduction in genetic diversity without corresponding increase in fitness for any conceivable, let alone relevant, environment.

So, I suppose I would agree with you in choosing C primarily because it would leave my loved ones alive and well.

Replies from: JGWeissman

↑ comment by JGWeissman · 2010-03-11T21:10:43.160Z · LW(p) · GW(p)

Most of all, I want to stay alive, or if that's not possible, keep a viable breeding population of my species alive. I would be suspicious of anyone who claimed to be the result of an evolutionary process but did not value this.

Be careful about confusing evolution's purposes with the purposes of the product of evolution. Is mere species survival what you want, or what you predict you want, as a result of inheriting evolution's values (which doesn't actually work that way)?

You are allowed to assign intrinsic, terminal value to your loved ones' well being, and to choose option C because it better achieves that terminal value, without having to justify it further by appeals to inclusive genetic fitness. Knowing this, do you still say you are choosing C because of a small difference in genetic diversity?

But, getting back to the reason I presented the dilemma, it seems that you do in fact have preferences over what happens after you die, and so your utility function, representing your preferences over possible futures that you would now attempt to bring about, cannot be uniformly 0 in the cases where you are dead.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-11T22:01:06.390Z · LW(p) · GW(p)

I am not claiming to have inherited anything from evolution itself. The blind idiot god has no DNA of its own, nor could it have preached to a younger, impressionable me. I decided to value the survival of my species, assigned intrinsic, terminal value to it, because it's a fountain for so much of the stuff I instinctively value.

Part of objective two is modeling my own probable responses, so an equally-accurate model of my preferences with lower Kolmogorov complexity has intrinsic value as well. Of course, I can't be totally sure that it's accurate, but that particular model hasn't let me down so far, and if it did (and I survived) I would replace it with one that better fit the data.

If my species survives, there's some possibility that my utility function, or one sufficiently similar as to be practically indistinguishable, will be re-instantiated at some point. Even without resurrection, cryostasis, or some other clear continuity, enough recombinant exploration of the finite solution-space for 'members of my species' will eventually result in repeats. Admittedly, the chance is slim, which is why I overwhelmingly prefer the more direct solution of immortality through not dying.

In short, yes, I've thought this through and I'm pretty sure. Why do you find that so hard to believe?

Replies from: orthonormal, JGWeissman

↑ comment by orthonormal · 2010-03-12T01:58:05.446Z · LW(p) · GW(p)

The entire post above is actually a statement that you value the survival of our species instrumentally, not intrinsically. If it were an intrinsic value for you, then contemplating any future in which humanity becomes smarter and happier and eventually leaves behind the old bug-riddled bodies we started with should fill you with indescribable horror. And in my experience, very few people feel that way, and many of those who do (e.g. Leon Kass) do so as an outgrowth of a really strong signaling process.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-12T04:00:38.928Z · LW(p) · GW(p)

I don't object to biological augmentations, and I'm particularly fond of the idea of radical life-extension. Having our bodies tweaked, new features added and old bugs patched, that would be fine by me. Kidneys that don't produce stones, but otherwise meet or exceed the original spec? Sign me up!

If some sort of posthumans emerged and decided to take care of humans in a manner analogous to present-day humans taking care of chimps in zoos, that might be weird, but having someone incomprehensibly intelligent and powerful looking out for my interests would be preferable to a poke in the eye with a sharp stick.

If, on the other hand, a posthuman appears as a wheel of fire, explains that it's smarter and happier than I can possibly imagine and further that any demographic which could produce individuals psychologically equivalent to me is a waste of valuable mass, so I need to be disassembled now, that's where the indescribable horror kicks in. Under those circumstances, I would do everything I could do to keep being, or set up some possibility of coming back, and it wouldn't be enough.

You're right. Describing that value as intrinsic was an error in terminology on my part.

↑ comment by JGWeissman · 2010-03-12T02:35:17.605Z · LW(p) · GW(p)

I decided to value the survival of my species, assigned intrinsic, terminal value to it, because it's a fountain for so much of the stuff I instinctively value.

Right, because if you forgot everything else that you value, you would be able to rederive that you are an agent as described in Thou Art Godshatter:

Such agents would have sex only as a means of reproduction, and wouldn't bother with sex that involved birth control. They could eat food out of an explicitly reasoned belief that food was necessary to reproduce, not because they liked the taste, and so they wouldn't eat candy if it became detrimental to survival or reproduction. Post-menopausal women would babysit grandchildren until they became sick enough to be a net drain on resources, and would then commit suicide.

Or maybe not. See, the value of a theory is not just what it can explain, but what it can't explain. It is not enough that your fountain generates your values, it also must not generate any other values.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-12T03:35:18.483Z · LW(p) · GW(p)

Did you miss the part where I said that the value I place on the survival of my species is secondary to my own personal survival?

I recognize that, for example, nonreproductive sex has emotional consequences and social implications. Participation in a larger social network provides me with access to resources of life-or-death importance (including, but certainly not limited to, modern medical care) that I would be unable to maintain, let alone create, on my own. Optimal participation in that social network seems to require at least one 'intimate' relationship, to which nonreproductive sex can contribute.

As for what my theory can't explain: If I ever take up alcohol use for social or recreational purposes, that would be very surprising; social is subsidiary to survival, and fun is something I have when I know what's going on. Likewise, it would be a big surprise if I ever attempt suicide. I've considered possible techniques, but only as an academic exercise, optimized to show the subject what a bad idea it is while there's still time to back out. I can imagine circumstances under which I would endanger my own health, or even life, to save others, but I wouldn't do so lightly. It would most likely be part of a calculated gambit to accept a relatively small but impressive-looking immediate risk in exchange for social capital necessary to escape larger long-term risks. The idea of deliberately distorting my own senses and/or cognition is bizarre; I can accept other people doing so, provided they don't hurt me or my interests in the process, but I wouldn't do it myself. Taking something like caffeine or Provigil for the cognitive benefits would seem downright Faustian, and I have a hard time imagining myself accepting LSD unless someone was literally holding a gun to my head. I could go on.

↑ comment by Strange7 · 2010-03-11T06:21:34.261Z · LW(p) · GW(p)

My first instinct is that I would take C over D, on the grounds that if I think they're dead, I'll eventually be able to move on, whereas vague but somehow persuasive reports that they're alive and well but out of my reach would constitute a slow and inescapable form of torture that I'm altogether too familiar with already. Besides, until the amnesia sets in I'd be happy for them.

Complications? Well, there's more than just warm fuzzies I get from being near these people. I've got plans, and honorable obligations which would cost me utility to violate. But, dammit, permanent separation means breaking those promises - for real and in my own mind - no matter which option I take, so that changes nothing. Further efforts to extract the intended distinction are equally fruitless.

I don't think I would wirehead, since that would de-instantiate my current utility function just as surely as death would. On the contrary, I scrupulously avoid mind-altering drugs, including painkillers, unless the alternative is incapacitation.

Think about it this way: if my utility function isn't instantiated at any given time, why should it be given special treatment over any other possible but nonexistent utility function? Should the (slightly different) utility function I had a year ago be able to dictate my actions today, beyond the degree to which it influenced my environment and ongoing personal development?

If something was hidden from me, even something big (like being trapped in a virtual world), and hidden so thoroughly that I never suspected it enough for the suspicion to alter my actions in any measurable way, I wouldn't care, because there would be no me which knew well enough to be able to care. Ideally, yes, the me that can see such hypotheticals from outside would prefer a map to match the territory, but at some point that meta-desire has to give way to practical concerns.

↑ comment by RobinZ · 2010-03-11T04:15:42.776Z · LW(p) · GW(p)

For my preference to be meaningful, I have to be aware of the distinction.

You're aware of the distinction right now - would you be willing to act right now in a way which doesn't affect the world in any major way during your lifetime, but which makes a big change after you die?

Edit: It seems to me as if you noted the fact that your utility function is no longer instantiated after you die, and confused that with the question of whether anything after your death matters to you now.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-11T04:52:24.001Z · LW(p) · GW(p)

Would you be willing to act right now in a way which doesn't affect the world in any major way during your lifetime, but which makes a big change after you die?

Of course I would. Why does a difference have to be "major" before I have permission to care? A penny isn't much money, but I'll still take the time to pick one up, if I see it on the floor and can do so conveniently. A moth isn't much intelligence, or even much biomass, but if I see some poor thing thrashing, trapped in a puddle, I'll gladly mount a fingertip-based rescue mission unless I'd significantly endanger my own interests by doing so.

Anything outside the light cone of my conscious mind is none of my business. That still leaves a lot of things I might be justifiably interested in.

Replies from: RobinZ

↑ comment by RobinZ · 2010-03-11T04:58:53.199Z · LW(p) · GW(p)

My point didn't relate to "major" - I wanted to point out that you care about what happens after you die, and therefore that your utility function is not uniformly 0 after you die. Yes, your utility function is no longer implemented by anything in the universe after you die - you aren't there to care in person - but the function you implement now has terms for times after your death - you care now.

Replies from: Strange7

↑ comment by Strange7 · 2010-03-11T07:37:50.755Z · LW(p) · GW(p)

I would agree that I care now about things which have obvious implications for what will happen later, and that I would not care, or care very differently, about otherwise-similar things that lacked equivalent implications.

Beyond that, since my utility function can neither be observed directly, nor measured in any meaningful sense when I'm not alive to act on it, this is a distinction without a difference.

↑ comment by RobinZ · 2010-03-08T01:17:25.929Z · LW(p) · GW(p)

It is truly astonishing how much pain someone can learn to bear - AdeleneDawner posted some relevant links a while ago.

Edit: I wasn't considering an anti-fun agent, however - just plain vanilla suffering.

comment by AngryParsley · 2010-03-01T03:32:40.455Z · LW(p) · GW(p)

I agree with a lot of your points about the advantages of WBE vs friendly AI. That said, look at the margins. Quite a few people are already working on WBE. Not very many people are working on friendly AI. Taking this into consideration, I think an extra dollar is better spent on FAI research than WBE research.

Also, a world of uploads without FAI would probably not preserve human values for long. The uploads that changed themselves in such a way as to grow faster (convert the most resources or make the most copies of themselves) would replace uploads that preserved human values. For example, an upload could probably make more copies of itself if it deleted its capacities for humor and empathy.

We already have a great many relatively stable and sane intelligences.

I don't think any human being is stable or sane in the way FAI would be stable and sane.

Replies from: Nick_Tarleton, RobinHanson, Jordan, pjeby

↑ comment by Nick_Tarleton · 2010-03-01T05:19:49.847Z · LW(p) · GW(p)

Quite a few people are already working on WBE. Not very many people are working on friendly AI. Taking this into consideration, I think an extra dollar is better spent on FAI research than WBE research.

This is true for the general categories "FAI research" and "WBE research", but very few of those WBE research dollars are going to studies of safety and policy, such as SIAI does, or (I assume) to projects that take safety and policy at all seriously.

Replies from: CarlShulman

↑ comment by CarlShulman · 2010-03-01T05:29:01.655Z · LW(p) · GW(p)

Really, it's an empty field with the exceptions of the FHI-SIAI axis and a few closely connected people. Well, there are folk who have discussed questions like "are emulations persons?" But policy/x-risk/economic impact has been very limited.

↑ comment by RobinHanson · 2010-03-02T02:56:44.093Z · LW(p) · GW(p)

I don't think it is at all obvious that "an upload could probably make more copies of itself if it deleted its capacities for humor and empathy." You seem to assume that those features do not serve important functions in current human minds.

Replies from: AngryParsley

↑ comment by AngryParsley · 2010-03-02T06:19:26.982Z · LW(p) · GW(p)

Yeah, my example was rather weak. I think humor and empathy are important in current human minds, but uploads could modify their minds much more powerfully and accurately than we can today. Also, uploads would exist in a very different environment from ours. I don't think current human minds or values would be well-adapted to that environment.

More successful uploads would be those who modified themselves to make more copies or consume/take over more resources. As they evolved, their values would drift and they would care less about the things we care about. Eventually, they'd become unfriendly.

Replies from: RobinHanson

↑ comment by RobinHanson · 2010-03-02T13:29:59.976Z · LW(p) · GW(p)

Why must value drift eventually make unfriendly values? Do you just define "friendly" values as close values?

Replies from: AngryParsley, Vladimir_Nesov

↑ comment by AngryParsley · 2010-03-02T13:36:58.864Z · LW(p) · GW(p)

Basically, yes. If values are different enough between two species/minds/groups/whatever, then both see the other as resources that could be reorganized into more valuable structures.

To borrow an UFAI example: An upload might not hate you, but your atoms could be reorganized into computronium running thousands of upload copies/children.

↑ comment by Vladimir_Nesov · 2010-03-02T14:25:54.241Z · LW(p) · GW(p)

"Friendly" values simply means our values (or very close to them -- closer than the value spread among us). Preservation of preference means that the agency of far future will prefer (and do) the kinds of things that we would currently prefer to be done in the far future (on reflection, if we knew more, given the specific situation in the future, etc.). In other words, value drift is absence of reflective consistency, and Friendliness is reflective consistency in following our preference. Value drift results in the far future agency having preference very different from ours, and so not doing the things we'd prefer to be done. This turns the far future into the moral wasteland, from the point of view of our preference, little different from what would remain after unleashing a paperclip maximizer or exterminating all life and mind.

(Standard disclaimer: values/preference have little to do with apparent wants or likes.)

↑ comment by Jordan · 2010-03-01T22:00:36.159Z · LW(p) · GW(p)

Not very many people are working on friendly AI. Taking this into consideration, I think an extra dollar is better spent on FAI research than WBE research.

This doesn't follow. It's not clear at all that there is sufficient investment in WBE that substantial diminishing returns have kicked in at the margins.

Replies from: AngryParsley

↑ comment by AngryParsley · 2010-03-02T02:28:01.191Z · LW(p) · GW(p)

I didn't say money spent on WBE research suffered from diminishing returns. I said that $X spent on FAI research probably has more benefit than $X spent on WBE research.

This is because the amount of money spent on WBE is much much greater than that spent on FAI. The Blue Brain Project has funding from Switzerland, Spain, and IBM among others. Just that one project probably has an order of magnitude more money than the whole FAI field. Unless you think WBE offers an order of magnitude greater benefit than FAI, you should favor spending more on FAI.

Replies from: Jordan

↑ comment by Jordan · 2010-03-02T04:03:38.526Z · LW(p) · GW(p)

Unless you think WBE offers an order of magnitude greater benefit than FAI, you should favor spending more on FAI.

No, all that matters is that the increase in utility by increasing WBE funding is greater than an increase in utility by increasing FAI funding. If neither has hit diminishing returns then the amount of current funding is irrelevant to this calculation.
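
A minimal sketch of the marginal-utility point above, under loudly stated assumptions: the utility curves, their coefficients, and the funding figures below are all hypothetical and chosen only for illustration. With linear curves (no diminishing returns), the gain from an extra dollar does not depend on how much is already being spent; with concave curves, it does.

```python
import math

def marginal_gain(utility, current_funding, extra=1000.0):
    """Utility added by spending `extra` more on top of `current_funding`."""
    return utility(current_funding + extra) - utility(current_funding)

# Hypothetical curves WITHOUT diminishing returns (assumed coefficients).
def linear_wbe(funding):
    return 3e-6 * funding

def linear_fai(funding):
    return 2e-6 * funding

# Hypothetical curve WITH diminishing returns (assumed shape, shared by both fields).
def concave(funding):
    return math.log1p(funding)

# Assumed funding levels: WBE has ten times the funding of FAI.
wbe_funding = 100_000_000.0
fai_funding = 10_000_000.0

# Linear case: only the slopes matter, not the current funding levels.
linear_prefers_wbe = (
    marginal_gain(linear_wbe, wbe_funding) > marginal_gain(linear_fai, fai_funding)
)

# Concave case: the less-funded field gains more from the same extra spending.
concave_prefers_fai = (
    marginal_gain(concave, fai_funding) > marginal_gain(concave, wbe_funding)
)
```

Which case is closer to reality is exactly what is in dispute here; the sketch only shows that the "ten times more funding" observation matters in the second case and not in the first.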

↑ comment by pjeby · 2010-03-01T04:02:03.132Z · LW(p) · GW(p)

For example, an upload could probably make more copies of itself if it deleted its capacities for humor and empathy.

If you were an upload, would you make copies of yourself? Where's the fun in that? The only reason I could see doing it is if I wanted to amass knowledge or do a lot of tasks... and if I did that, I'd want the copies to get merged back into a single "me" so I would have the knowledge and experiences. (Okay, and maybe some backups would be good to have around). But why worry about how many copies you could make? That sounds suspiciously Clippy-like to me.

In any case, I think we'd be more likely to be screwed over by uploads' human qualities and biases, than by a hypothetical desire to become less human.

Replies from: Nick_Tarleton, wedrifid, JamesAndrix, gwern

↑ comment by Nick_Tarleton · 2010-03-01T05:12:45.139Z · LW(p) · GW(p)

In a world of uploads which contains some that do want to copy themselves, selection obviously favors the replicators, with tragic results absent a singleton.
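
A toy model of the selection claim above, as a rough sketch only: the function name, the growth factors, and the starting share are hypothetical, and real upload dynamics would be far messier. It tracks the population share of a subgroup that copies itself faster than everyone else.

```python
def replicator_share(initial_share, growth_replicator, growth_other, generations):
    """Share of the population held by replicators after `generations` steps.

    Each step multiplies each group by its own (assumed constant) growth
    factor, then renormalizes so the two shares sum to 1.
    """
    replicators = initial_share
    others = 1.0 - initial_share
    for _ in range(generations):
        replicators *= growth_replicator
        others *= growth_other
        total = replicators + others
        replicators /= total
        others /= total
    return replicators

# Assumed numbers: 1% of uploads double each generation, the rest hold steady.
share_after_20 = replicator_share(0.01, 2.0, 1.0, 20)
```

Under these assumptions a 1% minority ends up as nearly the whole population within a few dozen generations, which is the sense in which selection favors the replicators absent something (such as a singleton) that changes the growth rates.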

Replies from: CarlShulman

↑ comment by CarlShulman · 2010-03-01T12:47:01.556Z · LW(p) · GW(p)

Note that emulations can enable the creation of a singleton; it doesn't necessarily have to exist in advance.

Replies from: AngryParsley

↑ comment by AngryParsley · 2010-03-02T02:32:20.314Z · LW(p) · GW(p)

Yes, but that's only likely if the first uploads are FAI researchers.

↑ comment by wedrifid · 2010-03-02T03:55:28.576Z · LW(p) · GW(p)

If you were an upload, would you make copies of yourself?

Yes. I'd make as many copies as was optimal for maximising my own power. I would then endeavor to gain dominance over civilisation, probably by joining a coalition of some sort. This may include creating an FAI that could self improve more effectively than I and serve to further my ends. When a stable equilibrium was reached and it was safe to do so I would go back to following this:

Where's the fun in that? The only reason I could see doing it is if I wanted to amass knowledge or do a lot of tasks... and if I did that, I'd want the copies to get merged back into a single "me" so I would have the knowledge and experiences.

If right now is the final minutes of the game then early in a WBE era is the penalty shootouts. You don't mess around having fun till you and those that you care about are going to live to see tomorrow.

↑ comment by JamesAndrix · 2010-03-01T04:29:20.830Z · LW(p) · GW(p)

If you were an upload, would you make copies of yourself? Where's the fun in that?

You have a moral obligation to do it.

Working in concert, thousands of you could save all the orphans from all the fires, and then go on to right a great many wrongs. You have many many good reasons to gain power.

So unless you're very aware that you will gain power and then abuse power, you will take steps to gain power.

Even from a purely selfish perspective: If 10,000 of you could take over the world and become an elite of 10,000, that's probably better than your current rank.

Replies from: inklesspen, FAWS

↑ comment by inklesspen · 2010-03-01T04:39:11.855Z · LW(p) · GW(p)

We've evolved something called "morality" that helps protect us from abuses of power like that. I believe Eliezer expressed it as something that tells you that even if you think it would be right (because of your superior ability) to murder the chief and take over the tribe, it still is not right to murder the chief and take over the tribe.

We do still have problems with abuses of power, but I think we have well-developed ways of spotting this and stopping it.

Replies from: JamesAndrix

↑ comment by JamesAndrix · 2010-03-01T06:35:18.227Z · LW(p) · GW(p)

I believe Eliezer expressed it as something that tells you that even if you think it would be right (because of your superior ability) to murder the chief and take over the tribe, it still is not right to murder the chief and take over the tribe.

That's exactly the high awareness I was talking about, and most people don't have it. I wouldn't be surprised if most people here failed at it, if it presented itself in their real lives.

I mean, are you saying you wouldn't save the burning orphans?

We do still have problems with abuses of power, but I think we have well-developed ways of spotting this and stopping it.

We have checks and balances of political power, but that works between entities on roughly equal political footing, and doesn't do much for those outside of that process. We can collectively use physical power to control some criminals who abuse their own limited powers. But we don't have anything to deal with supervillains.

There is fundamentally no check on violence except more violence, and 10,000 accelerated uploads could quickly become able to win a war against the rest of the world.

Replies from: BenRayfield

↑ comment by BenRayfield · 2010-03-03T16:29:03.701Z · LW(p) · GW(p)

It is the fashion in some circles to promote funding for Friendly AI research as a guard against the existential threat of Unfriendly AI. While this is an admirable goal, the path to Whole Brain Emulation is in many respects more straightforward and presents fewer risks.

I believe Eliezer expressed it as something that tells you that even if you think it would be right (because of your superior ability) to murder the chief and take over the tribe, it still is not right to murder the chief and take over the tribe.

That's exactly the high awareness I was talking about, and most people don't have it. I wouldn't be surprised if most people here failed at it, if it presented itself in their real lives.

Most people would not act like a Friendly AI; therefore "Whole Brain Emulation" only leads to "fewer risks" if you know exactly which brains to emulate and have the ability to choose which brain(s).

If whole brain emulation (for your specific brain) is expensive, it might result in the emulated brain being from a person who starts wars and steals from other countries so he can get rich.

Most people prefer that 999 people from their country should live, even at the cost that 1000 people of another country would die, given no other known differences between those 1999 people. Also, unlike a "Friendly AI", their choices are not consistent. Most people will leave the choice at whatever was going to happen if they did not choose, even if they know there are no other effects (like jail) from choosing. If the 1000 people were going to die, unknown to any of them, to save 999, then most people would think "It's none of my business, maybe god wants it to be that way" and let the extra 1 person die. A "Friendly AI" would maximize lives saved if nothing else is known about all those people.

There are many examples of why most people are not close to acting like a "Friendly AI" even if we removed all the bad influences on them. We should build software to be a "Friendly AI" instead of emulating brains, and only emulate brains for different reasons, except maybe the few brains that think like a "Friendly AI". It's probably safer to do it completely in software.

Replies from: JamesAndrix

↑ comment by JamesAndrix · 2010-03-03T18:28:08.936Z · LW(p) · GW(p)

Most people would not act like a Friendly AI; therefore "Whole Brain Emulation" only leads to "fewer risks" if you know exactly which brains to emulate and have the ability to choose which brain(s).

I agree entirely that humans are not friendly. Whole brain emulation is humanity-safe if there's never a point at which one person or small group can run much faster than the rest of humanity (including other uploads). The uploads may outpace us, but if they can keep each other in check, then uploading is not the same kind of human-values threat.

Even an upload singleton is not a total loss if the uploads have somewhat benign values. It is a crippling of the future, not an erasure.

↑ comment by FAWS · 2010-03-01T04:36:21.718Z · LW(p) · GW(p)

It's probably easier to cooperate with copies of yourself than with other people, but you also stand to gain less as all of you start out with the same skill set and the same talents.

↑ comment by gwern · 2010-03-01T14:41:38.945Z · LW(p) · GW(p)

But why worry about how many copies you could make? That sounds suspiciously Clippy-like to me.

This is, I think, an echo of Robin Hanson's 'crack of a future dawn', where hyper-Darwinian pressures to multiply cause the discarding of unuseful mental modules like humor or empathy which take up space.

Replies from: RobinHanson

↑ comment by RobinHanson · 2010-03-02T02:58:03.014Z · LW(p) · GW(p)

Where do you get the idea that humor or empathy are not useful mental abilities?!

Replies from: gwern

↑ comment by gwern · 2010-03-02T13:56:16.594Z · LW(p) · GW(p)

From AngryParsley...

comment by timtyler · 2010-03-02T00:53:01.099Z · LW(p) · GW(p)

Whole Brain Emulation will likely come long after engineered artificial intelligence arrives. Why pump money into Whole Brain Emulation projects? They will still come too late - even with more funding. I figure it's like throwing money down the drain.

Replies from: CarlShulman

↑ comment by CarlShulman · 2010-03-02T22:06:23.093Z · LW(p) · GW(p)

Tim, talking about what will happen in your personal estimate without arguments, or engagement with the question of marginal impact of money (which is what matters for allocating money), is not very helpful. Would you assign more than 90% confidence to WBE coming after engineered AI, even given disagreement from such smart folk as Robin Hanson and the difficulty of predicting basic science advances in AI? Many here would think that level of confidence excessive without strong arguments, and without very high confidence the question of marginal impact remains a live one. You might have linked to your essay on the subject, at least.

Replies from: timtyler

↑ comment by timtyler · 2010-03-02T22:54:04.664Z · LW(p) · GW(p)

Carl, I was asking a question.

Re: "more than 90% confidence to WBE coming after engineered AI"

...yes, sure.

Re: "even given disagreement from such smart folk as Robin Hanson"

I have great respect for Robin Hanson's views on many topics. This is one of the areas where I think he holds bizarre views, though. There are a few other such areas as well.

Re: "Without strong arguments"

Why do you think there aren't strong arguments? Or do you mean without me listing them in my post? I gave a number of arguments here:

http://alife.co.uk/essays/against_whole_brain_emulation/

Replies from: FAWS

↑ comment by FAWS · 2010-03-02T23:31:21.085Z · LW(p) · GW(p)

There is not even a single strong argument for WBE coming after engineered AI there, though. Arguments why WBE is hard in an absolute sense, why it's less desirable in some ways, and why building an engineered AI is likely to be easier, but no one is really disputing any of those things. The main argument for WBE is that it circumvents the need to understand intelligence well enough to build it, so what you would need to do is show that WBE is more difficult than that understanding, and you fail to even address that anywhere, let alone make a strong argument for it.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T00:34:28.610Z · LW(p) · GW(p)

If building an engineered intelligent agent is likely to be easier, then it is also likely to come first - since easy things can be done before the skills required to do hard things are mastered.

I am sorry to hear you didn't like my reasons. I can't articulate all my reasoning in a short space of time - but those are some of the basics: the relative uselessness of bioinspiration in practice, the ineffectual nature of WBE under construction, and the idea that the complexity of the brain comes mostly from the environment and from within individual cells (i.e. their genome).

Replies from: FAWS

↑ comment by FAWS · 2010-03-03T00:57:43.830Z · LW(p) · GW(p)

If building an engineered intelligent agent is likely to be easier, then it is also likely to come first - since easy things can be done before the skills required to do hard things are mastered.

Way to miss my point. Building it might be easier, but that only matters if understanding what to build is easy enough. An abacus is easier to build than Stonehenge. Doesn't mean it came first.

I am sorry to hear you didn't like my reasons. I can't articulate all my reasoning in a short space of time - but those are some of the basics: the relative uselessness of bioinspiration in practice, the ineffectual nature of WBE under construction, and the idea that the complexity of the brain comes mostly from the environment and from within individual cells (i.e. their genome).

I read your essay already. You still don't say anything about the difficulty of WBE vs understanding intelligence in detail.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T10:20:55.260Z · LW(p) · GW(p)

Well, that's just a way of restating the topic.

One of my approaches is not to try directly quantifying the difficulty of the two tasks, but rather to compare with other engineering feats. To predict that engineered flight solutions would beat scanning a bird, one does not have to quantify the two speculative timelines. One simply observes that engineering almost always beats wholesale scanning.

Flight has some abstract principles that don't depend on all the messy biological details of cells, bones and feathers. It will - pretty obviously IMO - be much the same for machine intelligence. We have a pretty good idea of what some of those abstract principles are. One is compression. If we had good stream compressors we would be able to predict the future consequences of actions - a key ability in shaping the future. You don't need to scan a brain to build a compressor. That is a silly approach to the problem that pushes the solution many decades into the future. Compression is "just" another computer science problem - much like searching or sorting.

IMO, the appeal of WBE does not have to do with technical difficulty. Technically, the idea is dead in the water. It is to do with things like the topic of this thread - a desire to see a human future. Wishful thinking, in other words.

Wishful thinking is not necessarily bad. It can sometimes help create the desired future. However, probably not in this case - reality is just too heavily stacked against the idea.

Replies from: SilasBarta, FAWS

↑ comment by SilasBarta · 2010-03-03T18:04:31.467Z · LW(p) · GW(p)

Flight has some abstract principles that don't depend on all the messy biological details of cells, bones and feathers. It will - pretty obviously IMO - be much the same for machine intelligence.

I disagree that it is so obvious. Much of what we call "intelligence" in humans and other animals is actually tacit knowledge about a specific environment. This knowledge gradually accumulated over billions of years, and it works due to immodular systems that improved stepwise and had to retain relevant functionality at each step.

This is why you barely think about bipedal walking, and discovered it on your own, but even now, very few people can explain how it works. It's also why learning, for humans, largely consists of reducing a problem into something for which we have native hardware.

So intelligence, if it means successful, purposeful manipulation of the environment, does rely heavily on the particulars of our bodies, in a way that powered flight does not.

If we had good stream compressors we would be able to predict the future consequences of actions - a key ability in shaping the future. You don't need to scan a brain to build a compressor. That is a silly approach to the problem that pushes the solution many decades into the future. Compression is "just" another computer science problem - much like searching or sorting.

Yes, it's another CS problem, but not like searching or sorting. Those are computable, while (general) compression isn't. Not surprisingly, the optimal intelligence Hutter presents is uncomputable, as is every other method presented in every research paper that purports to be a general intelligence.

Now, you can make approximations to the ideal, perfect compressor, but that inevitably requires making decisions about what parts of the search space can be ignored at low enough cost -- which itself requires insight into the structure of the search space, the very thing you were supposed to be automating!

Attempts to reduce intelligence to compression butt up against the same limits that compression does: you can be good at compressing some kinds of data, only if you sacrifice ability to compress other kinds of data.

With that said, if you can make a computable, general compressor that identifies regularities in the environment many orders of magnitude faster than evolution, then you will have made some progress.
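
The trade-off mentioned above can be made concrete with the standard counting (pigeonhole) argument for lossless compression; what follows is a sketch of that argument, not anything from Hutter's work, and the function names are hypothetical. A lossless compressor must map distinct inputs to distinct outputs, and there are fewer bit strings shorter than n bits than there are strings of exactly n bits, so no compressor can shorten every input.

```python
def count_strings_of_length(n):
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n

def count_strings_shorter_than(n):
    """Number of distinct bit strings with length 0 .. n-1 (equals 2**n - 1)."""
    return sum(2 ** k for k in range(n))

def max_fraction_compressible_by(n, saved_bits):
    """Upper bound on the fraction of n-bit inputs that ANY lossless code
    can shorten by at least `saved_bits` bits.

    Outputs must be distinct and have length <= n - saved_bits, so there
    are at most count_strings_shorter_than(n - saved_bits + 1) of them.
    """
    available_outputs = count_strings_shorter_than(n - saved_bits + 1)
    return min(1.0, available_outputs / count_strings_of_length(n))

# Bound on the share of 32-bit inputs that can be shortened by 10 or more bits.
bound = max_fraction_compressible_by(32, 10)
```

The bound is always below 2^(1 - saved_bits), whatever the compressor, so being good on some inputs necessarily means doing nothing (or worse) on most others.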

Replies from: timtyler, timtyler, timtyler

↑ comment by timtyler · 2010-03-03T20:41:44.834Z · LW(p) · GW(p)

Re: "So intelligence, if it means successful, purposeful manipulation of the environment, does rely heavily on the particulars of our bodies, in a way that powered flight does not."

Natural selection shaped wings for roughly as long as it has shaped brains. They too are an accumulated product of millions of years of ancestral success stories. Information about both is transmitted via the genome. If there is a point of dis-analogy here between wings and brains, it is not obvious.

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T21:45:48.082Z · LW(p) · GW(p)

Okay, let me explain it this way: when people refer to intelligence, a large part of what they have in mind is the knowledge that we (tacitly) have about a specific environment. Therefore, our bodies are highly informative about a large part (though certainly not the entirety!) of what is meant by intelligence.

In contrast, the only commonality with birds that is desired in the goal "powered human flight" is ... the flight thing. Birds have a solution, but they do not define the solution.

In both cases, I agree, the solution afforded by the biological system (bird or human) is not strictly necessary for the goal (flight or intelligence). And I agree that once certain insights are achieved (the workings of aerodynamic lift or the tacit knowledge humans have [such as the assumptions used in interpreting retinal images]), they can be implemented differently from how the biological system does it.

However, for a robot to match the utility of a human e.g. butler, it must know things specific to humans (like what the meanings of words are, given a particular social context), not just intelligence-related things in general, like how to infer causal maps from raw data.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T21:56:13.130Z · LW(p) · GW(p)

FWIW, I'm thinking of intelligence this way:

“Intelligence measures an agent’s ability to achieve goals in a wide range of environments."

http://www.vetta.org/definitions-of-intelligence/

Nothing to do with humans, really.

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T22:03:40.986Z · LW(p) · GW(p)

Then why should I care about intelligence by that definition? I want something that performs well in environments humans will want it to perform well in. That's a tiny, tiny fraction of the set of all computable environments.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:28:26.754Z · LW(p) · GW(p)

A universal intelligent agent should also perform very well in many real world environments. That is part of the beauty of the idea of universal intelligence. A powerful universal intelligence can be reasonably expected to invent nanotechnology, fusion, cure cancer, and generally solve many of the world's problems.

Replies from: SilasBarta, SilasBarta

↑ comment by SilasBarta · 2010-03-03T22:31:59.611Z · LW(p) · GW(p)

Oracles for uncomputable problems tend to be like that...

↑ comment by SilasBarta · 2010-03-03T22:35:16.022Z · LW(p) · GW(p)

Also, my point is that, yes, something impossibly good could do that. And that would be good. But performing well across all computable universes (with a sorta-short description, etc.) has costs, and one cost is optimality in this universe.

Since we have to choose, I want it optimal for this universe, for purposes we deem good.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:47:32.652Z · LW(p) · GW(p)

A general agent is often sub-optimal on particular problems. However, it should be able to pick them up pretty quick. Plus, it is a general agent, with all kinds of uses.

A lot of people are interested in building generally intelligent agents. We ourselves are highly general agents - i.e. you can pay us to solve an enormous range of different problems.

Generality of intelligence does not imply lack-of-adaptedness to some particular environment. What it means is more that it can potentially handle a broad range of problems. Specialized agents - on the other hand - fail completely on problems outside their domain.

↑ comment by timtyler · 2010-03-03T20:46:14.333Z · LW(p) · GW(p)

Re: "Attempts to reduce intelligence to compression butt up against the same limits that compression does: you can be good at compressing some kinds of data, only if you sacrifice ability to compress other kinds of data."

That is not a meaningful limitation. There are general purpose universal compressors. It is part of the structure of reality that sequences generated by short programs are more commonly observed. That's part of the point of using a compressor - it is an automated way of applying Occam's razor.

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T21:29:00.857Z · LW(p) · GW(p)

That is not a meaningful limitation. There are general purpose universal compressors.

There are frequently useful general purpose compressors that work by anticipating the most common regularities in the set of files typically generated by humans. But they do not, and cannot, iterate through all the short programs that could have generated the data -- it's too time-consuming.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T21:40:57.874Z · LW(p) · GW(p)

The point was that general purpose compression is possible. Yes, you sacrifice the ability to compress other kinds of data - but those other kinds of data are highly incompressible and close to random - not the kind of data which most intelligent agents are interested in finding patterns in in the first place.
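The trade-off being argued over here is the standard counting (pigeonhole) argument, and it can be made concrete. The sketch below is only an illustration: the function name and the choice of 20-bit strings are assumptions, not anything from the thread. It bounds the fraction of n-bit strings that any lossless code can shorten by at least k bits.

```python
from fractions import Fraction

def max_fraction_compressible(n_bits: int, saved_bits: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings that a lossless code
    can map to outputs at least `saved_bits` bits shorter."""
    # Distinct outputs of length 0 .. n_bits - saved_bits (inclusive).
    shorter_outputs = 2 ** (n_bits - saved_bits + 1) - 1
    # A lossless code is injective, so it cannot shrink more inputs
    # than there are short outputs available.
    return Fraction(shorter_outputs, 2 ** n_bits)

# Fewer than 1 in 512 of all 20-bit strings can lose 10 or more bits.
print(float(max_fraction_compressible(20, 10)))
```

The bound says nothing about which strings end up in the compressible minority. That choice is the part in dispute in the replies.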

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T22:00:53.672Z · LW(p) · GW(p)

Yes, you sacrifice the ability to compress other kinds of data - but those other kinds of data are highly incompressible and close to random.

No, they look random and incompressible because effective compression algorithms optimized for this universe can't compress them. But algorithms optimized for other computable universes may regard them as normal and have a good way to compress them.

Which kinds of data (from computable processes) are likely to be observed in this universe? Ay, there's the rub.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:16:18.865Z · LW(p) · GW(p)

Re: "they look random and incompressible because effective compression algorithms optimized for this universe can't compress them"

Compressing sequences from this universe is good enough for me.

Re: "Which kinds of data (from computable processes) are likely to be observed in this universe? Ay, there's the rub."

Not really - there are well-known results about that - see:

http://en.wikipedia.org/wiki/Occam's_razor

http://www.wisegeek.com/what-is-solomonoff-induction.htm

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T22:20:31.563Z · LW(p) · GW(p)

Compressing sequences from this universe is good enough for me.

Except that the problem you were attacking at the beginning of this thread was general intelligence, which you claimed to be solvable just by good enough compression, but that requires knowing which parts of the search space in this universe are unlikely, which you haven't shown how to algorithmize.

"Which kinds of data (from computable processes) are likely to be observed in this universe? Ay, there's the rub."

Not really - there are well-known results about that - see: ...

Yes, but as I keep trying to say, those results are far from enough to get something workable, and it's not the methodology behind general compression programs.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:39:22.500Z · LW(p) · GW(p)

Arithmetic compression, Huffman compression, Lempel-Ziv compression, etc are all excellent at compressing sequences produced by small programs. Things like:

1010101010101010 110110110110110110 1011011101111011111

...etc.

Those compressors (crudely) implement a computable approximation of Solomonoff induction without iterating through programs that generate the output. How they work is not very relevant here - the point is that they act as general-purpose compressors - and compress a great range of real world data types.

The complaint that we don't know what types of data are in the universe is just not applicable - we do, in fact, know a considerable amount about that - and that is why we can build general purpose compressors.
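As a rough illustration of the claim, here is a minimal sketch using Python's standard `zlib` module, whose DEFLATE format combines Lempel-Ziv (LZ77) matching with Huffman coding. The input sizes and the use of `os.urandom` as a stand-in for "close to random" data are assumptions made for the example.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)

# Output of a very short program: "10" repeated 5000 times.
regular = b"10" * 5000
# Bytes from the OS entropy source, standing in for near-random data.
noise = os.urandom(len(regular))

print(f"regular: {compression_ratio(regular):.4f}")
print(f"noise:   {compression_ratio(noise):.4f}")
```

The periodic input shrinks to a small fraction of its size, while the random input does not shrink at all. This only shows that one regularity (repeated substrings) is exploited. It does not show that the compressor finds arbitrary short generating programs.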

↑ comment by timtyler · 2010-03-03T20:44:07.282Z · LW(p) · GW(p)

What's with complaining that compressors are uncomputable?!? Just let your search through the space of possible programs skip on to the next one whenever you spend more than an hour executing. Then you have a computable compressor. That ignores a few especially tedious and boring areas of the search space - but so what?!? Those areas can be binned with no great loss.
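A toy sketch of the "skip anything that runs too long" idea, under loud assumptions: the three-instruction language below (`0` and `1` emit a bit, `L` jumps back to the start) is invented for this illustration, and the budget is counted in interpreter steps rather than an hour of wall-clock time.

```python
from itertools import product

ALPHABET = "01L"  # hypothetical toy language: emit 0, emit 1, loop to start

def run(program: str, want_len: int, max_steps: int):
    """Run until the program halts or has emitted `want_len` symbols.
    Returns the output, or None if the step budget runs out first."""
    out, pc, steps = [], 0, 0
    while pc < len(program) and len(out) < want_len:
        if steps >= max_steps:
            return None  # budget exhausted: give up on this program
        op = program[pc]
        if op == "L":
            pc = 0  # jump back to the first instruction
        else:
            out.append(op)
            pc += 1
        steps += 1
    return "".join(out)

def shortest_program(target: str, max_len: int = 6, max_steps: int = 1000):
    """Enumerate programs in order of length and return the first one
    whose (possibly truncated) output equals `target`."""
    for length in range(1, max_len + 1):
        for ops in product(ALPHABET, repeat=length):
            program = "".join(ops)
            if run(program, len(target), max_steps) == target:
                return program
    return None

print(shortest_program("1010101010101010"))  # prints 10L
```

Programs such as `L`, which loop without producing output, are abandoned when the budget runs out, so the search always terminates. The number of candidates still triples with each extra instruction, which is the cost the reply below puts a number on.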

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T21:20:25.980Z · LW(p) · GW(p)

Did you do the math on this one? Even with only 10% of programs caught in a loop, it would take almost 400 years to get through all programs up to 24 bits long.

We need something faster.

(Do you see now why Hutter hasn't simply run AIXI with your shortcut?)
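The figure can be reproduced with a back-of-the-envelope calculation. The assumptions here are mine: every bit string of length 1 through 24 counts as a program, 10% of them hit the one-hour cutoff, the rest finish in negligible time, and everything runs serially.

```python
HOURS_PER_YEAR = 24 * 365.25

def years_lost_to_timeouts(max_bits: int = 24,
                           looping_fraction: float = 0.10,
                           timeout_hours: float = 1.0) -> float:
    """Years spent waiting on programs that hit the cutoff."""
    # Number of bit strings of length 1..max_bits, i.e. 2^(max_bits+1) - 2.
    n_programs = sum(2 ** n for n in range(1, max_bits + 1))
    wasted_hours = n_programs * looping_fraction * timeout_hours
    return wasted_hours / HOURS_PER_YEAR

print(round(years_lost_to_timeouts()))  # prints 383
```

Each additional bit of program length roughly doubles the total, so 25 bits would take about 766 years under the same assumptions.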

Replies from: wnoise, timtyler

↑ comment by wnoise · 2010-03-03T21:55:55.609Z · LW(p) · GW(p)

Of course, in practice many loops can be caught, but combinatorial explosion really does blow any technique out of the water.

↑ comment by timtyler · 2010-03-03T21:32:25.493Z · LW(p) · GW(p)

Uh, I was giving a computable algorithm, not a rapid one.

The objection that compression is an uncomputable strategy is a useless one - you just use a computable approximation instead - with no great loss.

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T21:54:01.927Z · LW(p) · GW(p)

Uh, I was giving a computable algorithm, not a rapid one.

But you were implying that the uncomputability is somehow "not a problem" because of a quick fix you gave, when the quick fix actually means waiting at least 400 years -- under unrealistically optimistic assumptions.

The objection that compression is an uncomputable strategy is a useless one - you just use a computable approximation instead - with no great loss.

Yes, I do use a computable approximation, and my computable approximation has already done the work of identifying the important part of the search space (and the structure thereof).

And that's the point -- compression algorithms haven't done so, except to the extent that a programmer has fed them the "insights" (known regularities of the search space) in advance. That doesn't tell you the algorithmic way to find those regularities in the first place.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:10:03.908Z · LW(p) · GW(p)

Re: "But you were implying that the uncomputability is somehow "not a problem""

That's right - uncomputability is not a problem - you just use a computable compression algorithm instead.

Re: "And that's the point -- compression algorithms haven't done so, except to the extent that a programmer has fed them the "insights" (known regularities of the search space) in advance."

The universe itself exhibits regularities. In particular sequences generated by small automata are found relatively frequently. This principle is known as Occam's razor. That fact is exploited by general purpose compressors to compress a wide range of different data types - including many never seen before by the programmers.

Replies from: SilasBarta

↑ comment by SilasBarta · 2010-03-03T22:16:08.344Z · LW(p) · GW(p)

"But you were implying that the uncomputability is somehow "not a problem""

That's right - uncomputability is not a problem - you just use a computable compression algorithm.

You said that it was not a problem with respect to creating superintelligent beings, and I showed that it is.

The universe itself exhibits regularities. ...

Yes, it does. But, again, scientists don't find them by iterating through the set of computable generating functions, starting with the smallest. As I've repeatedly emphasized, that takes too long. Which is why you're wrong to generalize compression as a practical, all-encompassing answer to the problem of intelligence.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:23:14.489Z · LW(p) · GW(p)

This is growing pretty tedious, for me, and probably others :-(

You did not show uncomputability is a problem in that context.

I never claimed iterating through programs was an effective practical means of compression. So it seems as though you are attacking a straw man.

Nor do I claim that compression is "a practical, all-encompassing answer to the problem of intelligence".

Stream compression is largely what you need if you want to predict the future, or build parsimonious models based on observations. Those are important things that many intelligent agents want to do - but they are not themselves a complete solution to the problem.

Replies from: SilasBarta, SilasBarta

↑ comment by SilasBarta · 2010-03-04T15:23:24.062Z · LW(p) · GW(p)

Just to show the circles I'm going in here:

You did not show uncomputability is a problem in that context.

Right, I showed it is a problem in the context in which you originally brought up compression -- as a means to solve the problem of intelligence.

I never claimed iterating through programs was an effective practical means of compression. So it seems as though you are attacking a straw man.

Yes, you did. Right here:

What's with complaining that compressors are uncomputable?!? Just let your search through the space of possible programs skip on to the next one whenever you spend more than an hour executing. Then you have a computable compressor. That ignores a few especially tedious and boring areas of the search space - but so what?!? Those areas can be binned with no great loss.

You also say:

Nor do I claim that compression is "a practical, all-encompassing answer to the problem of intelligence".

Again, yes you did. Right here. Though you said compression was only one of the abilities needed, you did claim "If we had good stream compressors we would be able to predict the future consequences of actions..." and predicting the future is largely what people would classify as having solved the problem of intelligence.

Replies from: timtyler

↑ comment by timtyler · 2010-03-04T21:07:59.285Z · LW(p) · GW(p)

I disagree with all three of your points. However, because the discussion has already been going on for so long - and because it is so tedious and low grade for me - I am not going to publicly argue the toss with you any more. Best wishes...

↑ comment by SilasBarta · 2010-03-03T22:25:01.475Z · LW(p) · GW(p)

Okay, onlookers: please decide which of us (or both, or neither) was engaging the arguments of the other, and comment or vote accordingly.

ETA: Other than timtyler, I mean.

↑ comment by FAWS · 2010-03-03T13:42:45.118Z · LW(p) · GW(p)

So you think the reason why we can't build a slow running human level AI today with today's hardware is not because we don't know how we should go about it, but because we don't have sufficiently good compression algorithms (and a couple of other things of a similar nature)? And you don't think a compressor that can deduce and compress causal relations in the real world well enough to be able to predict the future consequences of any actions a human level AI might take would either have to be an AGI itself or be a lot more impressive than a "mere" AGI?

Everything else in your post is a retread of points you have already made.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T13:50:08.095Z · LW(p) · GW(p)

Compression is a key component, yes. See:

http://prize.hutter1.net/

http://marknelson.us/2006/08/24/the-hutter-prize/

We don't know how we should go about making good compression algorithms - though progress is gradually being made in that area.

Your last question seems to suggest one way of thinking about how closely related the concepts of stream compression and intelligence are.

Replies from: FAWS

↑ comment by FAWS · 2010-03-03T16:51:31.674Z · LW(p) · GW(p)

I'd rather say intelligence is a requirement for really good compression, and thus compression can make for a reasonable measurement of a lower bound of intelligence (but not even a particularly good proxy). And you can imagine an intelligent system made up of an intelligent compressor and a bunch of dumb modules. That's no particularly good reason to think that the best way to develop an intelligent compressor (or even a viable way) is to scale up a dumb compressor.

Since a random list of words has a higher entropy than a list of grammatically correct sentences, which in turn has a higher entropy than intelligible text, the best possible compression of Wikipedia would require a compressor that understands English and enough of the content to exploit semantic redundancy, so the theoretically ideal algorithm would have to be intelligent. But that doesn't mean that an algorithm that does better than another on the Hutter test is automatically more intelligent in that way. As far as I understand, the current algorithms haven't even gone all that far in exploiting the redundancy of grammar: a text generated to be statistically indistinguishable, for those compression algorithms, from a grammatical text wouldn't have to be grammatical at all. There seems to be no reason to believe the current methods would scale up to the theoretical ideal, nor any reason to think that compression is a good niche/framework in which to build a machine that understands English. Machine translation seems better to me, and even there methods that try to exploit grammar seem to be out-competed by methods that don't bother and rely on statistics alone, so even the impressive progress of e.g. Google translation doesn't seem like strong evidence that we are making good progress on the path to a machine that actually understands language. (Evidence, certainly, but not strong.)

TL;DR Just because tool A would be much improved by a step "and here magic happens" doesn't mean that working on improving tool A is a good way to learn magic.

Replies from: timtyler, Richard_Kennaway

↑ comment by timtyler · 2010-03-03T21:04:35.934Z · LW(p) · GW(p)

Compression details are probably not too important here.

Compression is to brains what lift is to wings. In both cases, people could see that there's an abstract principle at work - without necessarily knowing how best to implement it. In both cases people considered a range of solutions - with varying degrees of bioinspiration.

There are some areas where we scan. Pictures, movies, audio, etc. However, we didn't scan bird wings, we didn't scan to make solar power, or submarines, or cars, or memories. Look into this issue a bit, and I think most reasonable people will put machine intelligence into the "not scanned" category. We already have a mountain of machine intelligence in the world. None of it was made by scanning.

Replies from: FAWS

↑ comment by FAWS · 2010-03-03T22:00:31.423Z · LW(p) · GW(p)

Compression details are probably not too important here.

And yet you build the entire case for assigning a greater than 90% confidence on the unproven assertion that compression is the core principle of intelligence - the only argument you make that even addresses the main reason for considering WBE at all.

↑ comment by Richard_Kennaway · 2010-03-03T18:05:38.407Z · LW(p) · GW(p)

That's no particularly good reason to think that the best way to develop an intelligent compressor (or even a viable way) is to scale up a dumb compressor.

Compression is a white bear approach to AGI.

Tolstoy recounts that as a boy, his eldest brother promised him and his siblings that their wishes would come true provided that, among other things, they could stand in a corner and not think about a white bear. The compression approach to AGI seems to be the same sort of enterprise: if we just work on mere, mundane, compression algorithms and not think about AGI, then we'll achieve it.

Replies from: FAWS

↑ comment by FAWS · 2010-03-03T18:26:40.631Z · LW(p) · GW(p)

Do you mean "avoiding being overwhelmed by the magnitude of the problem as a whole and making steady progress in small steps" or "substituting wishful thinking for thinking about the problem", or something else?

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2010-03-03T18:32:47.352Z · LW(p) · GW(p)

Using wishful thinking to avoid the magnitude of the problem.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T20:27:40.642Z · LW(p) · GW(p)

This is Solomonoff induction:

"Solomonoff’s model of induction rapidly learns to make optimal predictions for any computable sequence, including probabilistic ones. It neatly brings together the philosophical principles of Occam’s razor, Epicurus’ principle of multiple explanations, Bayes theorem and Turing’s model of universal computation into a theoretical sequence predictor with astonishingly powerful properties."

http://www.vetta.org/documents/IDSIA-12-06-1.pdf

It is hard to describe the idea that Solomonoff induction bears on machine intelligence as "wishful thinking". Prediction is useful and important - and this is basically how you do it.
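To give a flavour of the idea, here is a toy, computable stand-in. It is not Solomonoff induction itself: the hypothesis class (periodic bit patterns up to a fixed period) and the `2 ** -period` weights are assumptions chosen for illustration. It keeps every hypothesis consistent with the data (Epicurus), weights shorter ones more heavily (Occam), and predicts with the weighted mixture (Bayes).

```python
from itertools import product

def predict_next(history: str, max_period: int = 8) -> float:
    """Probability that the next bit is '1' under a mixture of all
    periodic patterns (period <= max_period) consistent with `history`."""
    weight = {"0": 0.0, "1": 0.0}
    for period in range(1, max_period + 1):
        for pattern in product("01", repeat=period):
            # Keep only hypotheses that reproduce everything seen so far.
            consistent = all(history[i] == pattern[i % period]
                             for i in range(len(history)))
            if consistent:
                next_bit = pattern[len(history) % period]
                # Shorter patterns get exponentially more weight.
                weight[next_bit] += 2.0 ** (-period)
    total = weight["0"] + weight["1"]
    return weight["1"] / total if total else 0.5

print(predict_next("1010101"))  # small: the mixture expects '0' next
print(predict_next(""))         # no evidence yet, so 0.5
```

The real construction replaces "periodic patterns" with all programs for a universal Turing machine, which is exactly where the computability objections elsewhere in this thread come in.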

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2010-03-03T22:42:50.796Z · LW(p) · GW(p)

But:

"Indeed the problem of sequence prediction could well be considered solved, if it were not for the fact that Solomonoff’s theoretical model is incomputable."

and:

"Could there exist elegant computable prediction algorithms that are in some sense universal? Unfortunately this is impossible, as pointed out by Dawid."

and:

"We then prove that some sequences, however, can only be predicted by very complex predictors. This implies that very general prediction algorithms, in particular those that can learn to predict all sequences up to a given Kolmogorov complex[ity], must themselves be complex. This puts an end to our hope of there being an extremely general and yet relatively simple prediction algorithm. We then use this fact to prove that although very powerful prediction algorithms exist, they cannot be mathematically discovered due to Gödel incompleteness. Given how fundamental prediction is to intelligence, this result implies that beyond a moderate level of complexity the development of powerful artificial intelligence algorithms can only be an experimental science."

While Solomonoff induction is mathematically interesting, the paper itself seems to reject your assessment of it.

Replies from: timtyler

↑ comment by timtyler · 2010-03-03T22:51:04.376Z · LW(p) · GW(p)

Not at all! I have no quarrel whatsoever with any of that (except some minor quibbles about the distinction between "math" and "science").

I suspect you are not properly weighing the term "elegant" in the second quotation.

The paper is actually arguing that sufficiently comprehensive universal prediction algorithms are necessarily large and complex. Just so.
