Hacking the CEV for Fun and Profit

post by Wei Dai (Wei_Dai) · 2010-06-03T20:30:29.518Z · LW · GW · Legacy · 207 comments

It’s the year 2045, and Dr. Evil and the Singularity Institute have been in a long and grueling race to be the first to achieve machine intelligence, thereby controlling the course of the Singularity and the fate of the universe. Unfortunately for Dr. Evil, SIAI is ahead in the game. Its Friendly AI is undergoing final testing, and Coherent Extrapolated Volition is scheduled to begin in a week. Dr. Evil learns of this news, but there’s not much he can do, or so it seems.  He has succeeded in developing brain scanning and emulation technology, but the emulation speed is still way too slow to be competitive.

There is no way to catch up with SIAI's superior technology in time, but Dr. Evil suddenly realizes that maybe he doesn’t have to. CEV is supposed to give equal weighting to all of humanity, and surely uploads count as human. If he had enough storage space, he could simply upload himself, and then make a trillion copies of the upload. The rest of humanity would end up with less than 1% weight in CEV. Not perfect, but he could live with that. Unfortunately he only has enough storage for a few hundred uploads. What to do…

Ah ha, compression! A trillion identical copies of an object would compress down to be only a little bit larger than one copy. But would CEV count compressed identical copies to be separate individuals? Maybe, maybe not. To be sure, Dr. Evil gives each copy a unique experience before adding it to the giant compressed archive. Since they still share almost all of the same information, a trillion copies, after compression, just manages to fit inside the available space.

Now Dr. Evil sits back and relaxes. Come next week, the Singularity Institute and the rest of humanity are in for a rather rude surprise!

207 comments

Comments sorted by top scores.

comment by CarlShulman · 2010-06-03T21:14:36.291Z · LW(p) · GW(p)

The point about demarcating individuals is important for ethical theories generally (and decision theories that make use of spooky 'reference classes'). Bostrom's Duplication of experience paper illustrates the problem further.

Also, insofar as CEV is just a sort of idealized deliberative democracy, this points to the problem of emulations with systematically unrepresentative values rapidly becoming the majority when emulation hardware is cheap.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-06-10T17:10:13.927Z · LW(p) · GW(p)

Any ethical theory that depends on demarcating individuals, or "counting people", appears doomed.

It seems likely that in the future, "individuals" will be constantly forked and merged/discarded as a matter of course. And like forking processes in Unix, such operations will probably make use of copy-on-write memory to save resources. Intuitively it makes little sense to attach a great deal of ethical significance to the concept of "individual" in those circumstances.
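
A minimal sketch of that copy-on-write idea in Python (the class, page layout, and method names are illustrative assumptions, not any real emulation API):

    import copy

    class MindState:
        """Toy copy-on-write container for an upload's memories."""

        def __init__(self, pages=None):
            # pages maps a page id to its contents; a fork shares this
            # dict with its parent until one of them writes to it.
            self._pages = pages if pages is not None else {}
            self._shared = False

        def fork(self):
            # O(1): the child references the same page dict, no copying yet,
            # analogous to fork() sharing physical pages in Unix.
            child = MindState(self._pages)
            self._shared = child._shared = True
            return child

        def write(self, page_id, value):
            # The first write after a fork triggers the actual copy.
            # (A real implementation would track sharing per page with
            # reference counts instead of copying the whole store.)
            if self._shared:
                self._pages = copy.deepcopy(self._pages)
                self._shared = False
            self._pages[page_id] = value

        def read(self, page_id):
            return self._pages.get(page_id)

    original = MindState()
    original.write("memory:2045-06-01", "lab notes")
    branch = original.fork()                 # cheap: nothing is duplicated yet
    branch.write("memory:2045-06-02", "one unique experience")  # copy happens here
    assert original.read("memory:2045-06-02") is None

Dr. Evil's compressed archive is the same observation applied to storage rather than memory: the marginal cost of each additional copy is only its unique experience.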

Is it time to give up, and start looking for ethical theories that don't depend on a concept of "individual"? I'm curious what your thoughts are.

Replies from: Vladimir_M, red75, CarlShulman
comment by Vladimir_M · 2010-06-10T18:44:39.979Z · LW(p) · GW(p)

Arguably, the concept of "individual" is incoherent even with ordinary humans, for at least two reasons.

First, one could argue that the human brain doesn't operate as a single agent in any meaningful sense, but instead consists of a whole bunch of different agents struggling to gain control of external behavior -- and what we perceive as our stream of consciousness is mostly just delusional confabulation giving rise to the fiction of a unified mind thinking and making decisions. (The topic was touched upon in this LW post and the subsequent discussion.)

Second, it's questionable whether the concept of personal identity across time is anything more than an arbitrary subjective preference. You believe that a certain entity that is expected to exist tomorrow can be identified as your future self, so you assign it a special value. From the evolutionary perspective, it's clear why humans have this value, and the concept is more or less coherent assuming the traditional biological constraints on human life, but it completely breaks down once this assumption is relaxed (as discussed in this recent thread). Therefore, one could argue that the idea of an "individual" existing through time has no objective basis to begin with, and the decision to identify entities that exist in different instants of time as the same "individual" can't be other than a subjective whim.

I haven't read and thought about these problems enough to form a definite opinion yet, but it seems to me that if we're really willing to go for a no-holds-barred reductionist approach, they should both be considered very seriously. Trouble is, their implications don't sound very pleasant.

Replies from: mattnewport, SilasBarta, red75, None
comment by mattnewport · 2010-06-10T21:40:49.409Z · LW(p) · GW(p)

It strikes me that there's a somewhat fuzzy continuum in both directions. The concept of a coherent identity is largely a function of how aligned the interests of the component entities are. This ranges all the way from individual genes or DNA sequences, through cells and sub-agents in the brain, past the individual human and up through family, community, nation, company, religion, species and beyond.

Coalitions of entities with interests that are more aligned will tend to have a stronger sense of identity. Shifting incentives may lead to more or less alignment of interests and so change the boundaries where common identity is perceived. A given entity may form part of more than one overlapping coalition with a recognizable identity and shifting loyalties between coalitions are also significant.

comment by SilasBarta · 2010-06-10T21:48:35.075Z · LW(p) · GW(p)

Therefore, one could argue that the idea of an "individual" existing through time has no objective basis to begin with, and the decision to identify entities that exist in different instants of time as the same "individual" can't be other than a subjective whim.

Evolution may have reasons for making us think this, but how do you conclude that the identification of an individual existing through time is subjective? You can quite clearly recognize that there is a being of approximately the same composition and configuration in the same location from one moment to the next.

Even (and especially) with the Mach/Barbour view that time as a fundamental coordinate doesn't exist, you can still identify a persistent individual in that it is the only one with nearly-identical memories to another one at the nearest location in the (indistinguishable-particle based) configuration space. (Barbour calls this the "Machian distinguished simplifier" or "fundamental distance", and it matches our non-subjective measures of time.)

ETA: See Vladimir_M's response below; I had misread his comment, thereby criticizing a position he didn't take. I'll leave the above unchanged because of its discussion of fundamental distance as a related metric.

Replies from: Vladimir_M
comment by Vladimir_M · 2010-06-10T22:06:40.934Z · LW(p) · GW(p)

SilasBarta:

You can quite clearly recognize that there is a being of approximately the same composition and configuration in the same location from one moment to the next.

That's why I wrote that "the concept [of personal identity] is more or less coherent assuming the traditional biological constraints on human life." It falls apart when we start considering various transhuman scenarios where our basic intuitions no longer hold, and various intuition pump arguments provide conflicting results.

Arguably, some of the standard arguments that come into play when we discuss these issues also have the effect that once they've been considered seriously, our basic intuitions about our normal biological existence also start to seem arbitrary, even though they're clearly defined and a matter of universal consensus within the range of our normal everyday experiences.

Replies from: SilasBarta
comment by SilasBarta · 2010-06-10T22:41:27.036Z · LW(p) · GW(p)

Point taken, I misread you as saying that our intuitions were arbitrary specifically in the case of traditional biological life, not just when they try to generalize outside this "training set". Sorry!

comment by red75 · 2010-06-10T21:24:18.187Z · LW(p) · GW(p)

On the other hand, one could say that the human brain can be described as a collection of interconnected subsystems, acting more or less coherently and coordinated by the neural activity that we perceive as the stream of consciousness. The stream of consciousness can then be seen as a unifying device that lets us treat the brain's activity as the operation of a single agent. This point of view, while remaining reductionist-compatible, reinforces the perception of the self as a real acting agent, thus, hopefully, reinforcing the underlying neural coordination and making the brain (oneself) more effective.

I'll be convinced that personal identity is a subjective preference if someone can explain a strange coincidence: only the "tomorrow me" will have these few terabytes of my memories.

comment by [deleted] · 2010-06-10T21:20:19.308Z · LW(p) · GW(p)

Therefore, one could argue that the idea of an "individual" existing through time has no objective basis to begin with, and the decision to identify entities that exist in different instants of time as the same "individual" can't be other than a subjective whim.

That's roughly my current view. Two minor points. I think "whim" may overstate the point. An instinct cooked into us by millions of years of evolution isn't what I'd call "whim". Also, "subjective" seems to presuppose the very subject whose reality is being questioned.

comment by red75 · 2010-06-10T22:21:18.313Z · LW(p) · GW(p)

I think there are still ethically significant situations here.

Is it ethical to forcibly merge with one's own copy? To create an unconscious copy and use it as a slave/whatever? To forcibly create copies of someone? To discard one's own copy?

Why would a conscious agent be less significant if it can split/merge at will? Of course, voting will be meaningless, but is it reasonable to drop all ethics?

Replies from: jimrandomh, SilasBarta
comment by jimrandomh · 2010-06-10T23:00:39.186Z · LW(p) · GW(p)

With regard to your own copies, it's more a matter of practicality than ethics. Before you make your first copy, while you're still uncertain about which copy you'll be, you should come up with detailed rules about how you want your instances to treat each other, then precommit to follow them. That way, none of your copies can ever force another to do anything without one of them breaking a precommitment.

Replies from: red75
comment by red75 · 2010-06-11T04:45:20.779Z · LW(p) · GW(p)

What if the second copy experiences that "click" moment which makes his/her goals diverge, and he/she is unable to convince the first copy either to break the precommitment on merging or to undergo this "click" moment as well?

comment by SilasBarta · 2010-06-10T22:38:35.909Z · LW(p) · GW(p)

More importantly, does it count as gay or masturbation if you have sex with your copy?

comment by CarlShulman · 2010-06-11T03:39:09.483Z · LW(p) · GW(p)

I'm curious what your thoughts are.

I have a plan for a LW post on the subject, although I don't know when I'll get around to it.

comment by Alexandros · 2010-06-04T13:01:36.275Z · LW(p) · GW(p)

Thinking about this a bit more, and assuming CEV operates on the humans that exist at the time of its application: Why would CEV operate on humans that do exist, and not on humans that could exist? It seems this is what Dr. Evil is taking advantage of, by densely populating identity-space around him and crowding out the rest of humanity. But this could occur for many other reasons: Certain cultures encouraging high birth rates, certain technologies or memes being popular at the time of CEV-activation that affect the wiring of the human brain, or certain historical turns that shape the direction of mankind. A more imaginative scenario: what if another scientist, who knows nothing about FAI and CEV, finds it useful to address a problem by copying himself into trillions of branches, each examining a certain hypothesis, and all the branches are (merged/discarded) when the answer is found. Let's further say that CEV t-zero occurs when the scientist is deep in a problem-solving cycle. Would the FAI take each branch as a separate human/vote? This scenario involves no intent to defraud the system. It also is not manipulation of a proxy, as there is a real definitional problem here whose answer is not easily apparent to a human. Applying CEV to all potential humans that could have existed in identity-space would deal with this, but pushes CEV further and further into uncomputable territory.

Replies from: Wei_Dai, MichaelVassar, MugaSofer
comment by Wei Dai (Wei_Dai) · 2010-06-09T00:14:23.493Z · LW(p) · GW(p)

Why would CEV operate on humans that do exist, and not on humans that could exist?

To do the latter, you would need a definition of "human" that can not just distinguish existing humans from existing non-humans, but also pick out all human minds from the space of all possible minds. I don't see how to specify this definition. (Is this problem not obvious to everyone else?)

For example, we might specify a prototypical human mind, and say that "human" is any mind which is less than a certain distance from the prototypical mind in design space. But then the CEV of this "humankind" is entirely dependent on the prototype that we pick. If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of "humanity" come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.
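
To make the dependence concrete, here is a minimal sketch, with made-up two-dimensional "mind vectors" and a Euclidean metric standing in for distance in mind design space (both are illustrative assumptions):

    import math

    def distance(a, b):
        # Euclidean distance between two toy "mind vectors" -- a gross
        # simplification standing in for distance in mind design space.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def humans(minds, prototype, radius):
        # "Human" = any mind within `radius` of the chosen prototype.
        return [m for m in minds if distance(m, prototype) <= radius]

    def average(minds):
        n = len(minds)
        return [sum(m[i] for m in minds) / n for i in range(len(minds[0]))]

    # Toy population: a small cluster of ordinary minds near the origin...
    population = [[0.0, 0.1], [0.1, -0.1], [-0.1, 0.0]]
    # ...plus Dr. Evil's near-identical uploads far off to one side.
    population += [[10.0, 10.0 + i * 1e-6] for i in range(1000)]

    # If the designers fix the prototype, they decide who counts:
    print(len(humans(population, prototype=[0.0, 0.0], radius=1.0)))  # 3
    # If the prototype is the population average, the uploads drag it
    # toward themselves and the original humans fall outside the radius:
    print(len(humans(population, prototype=average(population), radius=1.0)))  # 1000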

Replies from: Alexandros, Mitchell_Porter, Alexandros
comment by Alexandros · 2010-06-09T09:26:05.157Z · LW(p) · GW(p)

The problem is indeed there, but if the goal is to find out the human coherent extrapolated volition, then a definition of human is necessary.

If we have no way of picking out human minds from the space of all possible minds, then we don't really know what we're optimizing for. We can't rule out the possibility that a human mind will come into existence that will not be (perfectly) happy with the way things turn out.* This may well be an inherent problem in CEV. If FAI will prevent such humans from coming into existence, then it has in effect enforced its own definition of a human on humanity.

But let's try to salvage it. What if you were to use existing humans as a training set for an AI to determine what a human is and is not (assuming you can indeed carve reality/mind-space at the joints, which I am unsure about). Then you can use this definition to pick out the possible human minds from mind-space and calculate their coherent extrapolated volition.

This would be resistant to identity-space stuffing like what you describe, but not resistant to systematic wiping out of certain genes/portions of identity-space before CEV-application.

But the wiping out of genes and introduction of new ones is the very definition of evolution. We then would need to differentiate between intentional wiping out of genes by certain humans from natural wiping out of genes by reality/evolution, a rabbit-hole I can't see the way out of, possibly a category error. If we can't do that, we have to accept the gene-pool at the time of CEV-activation as the verdict of evolution about what a human is, which leaves a window open to gaming by genocide.

Perhaps taking the time the idea of CEV was introduced as the starting point would prevent the possibility of manipulation, or perhaps trying to infer whether there was any intentional gaming of CEV would also work. Actually, this would deal with both genocide and stuffing without any additional steps. But this assumes rewinding time and global knowledge of all human thoughts and memories as capabilities. Great fun :)

*Come to think of it, what guarantees that the result of CEV will not be something that some of us simply do not want? If such clusters exist, will the FAI create separate worlds for each one?

EDIT: Do you think there would be a noticeable difference between 1900AD!CEV and 2000AD!CEV?

comment by Mitchell_Porter · 2010-06-09T00:24:29.620Z · LW(p) · GW(p)

If the FAI designers are allowed to just pick any prototype they want, they can make the CEV of "humanity" come out however they wish, so they might as well have the FAI use the CEV of themselves. If they pick the prototype by taking the average of all existing humans, then that allows the same attack described in my post.

Who ever said that CEV is about taking the average utility of all existing humans? The method of aggregating personal utilities should be determined by the extrapolation, on the basis of human cognitive architecture, and not by programmer fiat.

comment by Alexandros · 2010-06-09T06:32:06.231Z · LW(p) · GW(p)

So how about taking all humans that do exist, determining the boundary humans, and using the entire section of identity-space delineated by them? That is still vulnerable to Dr. Evil killing everyone, but not to the trillion near-copy strategy. No?
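
For what it's worth, a toy sketch of why that would resist copy-stuffing but not killing; the two-dimensional "mind vectors" and the axis-aligned box are illustrative assumptions standing in for whatever region the boundary humans actually delineate (a convex hull, say):

    def bounding_region(minds):
        # Axis-aligned box spanned by the extreme ("boundary") individuals
        # in each dimension; a convex hull would be the more general version.
        dims = range(len(minds[0]))
        return [(min(m[d] for m in minds), max(m[d] for m in minds)) for d in dims]

    existing = [[0.0, 0.2], [0.5, -0.3], [-0.4, 0.1]]

    # Stuffing the population with copies of an existing mind (exact copies
    # here for simplicity; near-copies inside the region behave the same)
    # leaves the delineated region unchanged...
    stuffed = existing + [[0.0, 0.2]] * 1000
    assert bounding_region(stuffed) == bounding_region(existing)

    # ...but deleting a boundary individual shrinks it, which is the
    # genocide loophole mentioned above.
    culled = [m for m in existing if m != [0.5, -0.3]]
    print(bounding_region(existing))  # [(-0.4, 0.5), (-0.3, 0.2)]
    print(bounding_region(culled))    # [(-0.4, 0.0), (0.1, 0.2)]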

comment by MichaelVassar · 2010-06-08T05:13:43.316Z · LW(p) · GW(p)

Yep. This is also, arguably, why cryonics doesn't work.

Replies from: JoshuaZ, Alexandros, NancyLebovitz
comment by JoshuaZ · 2010-06-08T05:17:30.657Z · LW(p) · GW(p)

I don't follow your logic. What is the connection to cryonics?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-08T08:19:04.792Z · LW(p) · GW(p)

If the point is just to use the volitions of "people in general" as opposed to "people alive now", it might also turn out to be correct to ignore "people present now" altogether, including cryonauts, in constructing the future. So this is not just an argument for cryonics not working, but also for the people alive at the time not contributing/surviving in any sense (in other words, everyone survives to an equal degree, irrespective of whether they opted in on cryonics).

Replies from: MichaelVassar
comment by MichaelVassar · 2010-06-09T16:20:13.438Z · LW(p) · GW(p)

Conditional on the development of FAI. The fact that humanity arises and produces FAI capable of timeless trade at a certain rate would cause the human-volition regarding behaviors of all trading super-intelligences and of all human friendly AIs in the multiverse.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-09T19:25:09.724Z · LW(p) · GW(p)

The fact that humanity arises and produces FAI capable of timeless trade at a certain rate would cause the human-volition regarding behaviors of all trading super-intelligences and of all human friendly AIs in the multiverse.

Can't parse this statement. The fact that a FAI could appear causes "human-volition" about all AGIs? What's human-volition, what does it mean to cause human-volition, how is the fact that FAI could appear relevant?

comment by Alexandros · 2010-06-08T10:14:01.315Z · LW(p) · GW(p)

I'm stumped. Can you elaborate?

comment by NancyLebovitz · 2010-06-08T09:26:09.112Z · LW(p) · GW(p)

Do you mean that cryonics doesn't work, or that cryonics isn't worth doing?

comment by MugaSofer · 2013-01-24T11:13:30.747Z · LW(p) · GW(p)

Why would CEV operate on humans that do exist, and not on humans that could exist?

Because for every possible human mind, there's one with the sign flipped on its utility function. Unless your definition of "human" describes their utility function, in which case ...

comment by RobinHanson · 2010-06-08T20:05:16.390Z · LW(p) · GW(p)

One could of course define the CEV in terms of some previous population, say circa the year 2000. But then you might wonder why it is fair to give higher weight to those groups that managed to reproduce from 1900 to 2000, and so defining it in terms of year 1900 people might be better. But then how far back do you go? How is the increase in population that Dr. Evil manages to achieve for his descendants less legitimate than all the prior gains of various groups in previous periods?

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2013-11-27T19:59:06.128Z · LW(p) · GW(p)

How is the increase in population that Dr. Evil manages to achieve for his descendants less legitimate than all the prior gains of various groups in previous periods?

It is different insofar as he did it intentionally. Our society differentiates between these cases in general, so why should the CEV not?

Replies from: Jiro
comment by Jiro · 2013-11-27T21:46:50.905Z · LW(p) · GW(p)

There are cases where groups achieved decreases in population for other groups they didn't like (typically by mass-murdering them). Moral theories about population are based on the people in the surviving population, not the people who would have survived without the mass murder. If a population that is decreased by mass murder produces a group that is acceptable for the purposes of CEV, why not a population that is increased by mass uploading?

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2013-11-28T06:30:55.454Z · LW(p) · GW(p)

I'm not clear what you are driving at. My point was: "Changes in number of individuals caused intentionally do not count". This applies to self-cloning, birth-control and murder alike.

Replies from: Jiro
comment by Jiro · 2013-11-30T03:04:36.417Z · LW(p) · GW(p)

As the proposal does not include a clause saying "Rome wiped out the Carthaginians, so we need to add a couple million Carthaginians to the calculation because that's how many would have existed in the present otherwise", I don't think you are seriously proposing that changes in number of individuals caused intentionally don't count, at least when the changes are reductions.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2013-11-30T09:32:05.100Z · LW(p) · GW(p)

The goal is to hack your share among the individuals benefiting from CEV, right?

With respect to this goal, I do not treat all intentions alike.

That Rome wiped out Carthage is not intentional with respect to hacking the CEV. I'd argue that the deeds of our ancestors are time-barred, because collective memory forgives them after a sufficiently long time (you just have to ask the youth about the Holocaust to see this effect). Thus, even though there is no statute of limitations on murder in the individual case, there effectively is one for societies.

Time has an effect on intentions. This is because it is usually recognized that you can only look so far into the future, even regarding your own goals.

Another approach, via an analogy: Assume a rich ancestor dies and leaves a large inheritance, and in his last will has benefited all living descendants alike (possibly even unborn ones via a family trust). Then by analogy the following holds:

  • If you kill any of the other heirs after the event, you usually void your share.

  • If you kill any of the other heirs before the event, and the will makes no exception for this, then you still get your share.

  • If you have children (probably including clones), it depends on the wording of the will (aha). If it is a simple inheritance, your children will only participate through your share if born after the event. If it is a family trust, they will benefit equally.

comment by AlexMennen · 2010-06-08T02:04:46.733Z · LW(p) · GW(p)

Simple solution: Build an FAI to optimize the universe to your own utility function instead of humanity's average utility function. They will be nearly the same thing anyway (remember, you were tempted to have the FAI use the average human utility function instead, so clearly, you sincerely care about other people's wishes). And in weird situations in which the two are radically different (like this one), your own utility function more closely tracks the intended purpose of an FAI.

Replies from: AlexMennen, Roko, derefr, PhilGoetz, None, MugaSofer
comment by AlexMennen · 2010-06-09T22:30:23.885Z · LW(p) · GW(p)

Here's what I've been trying to say: The thing that you want an FAI to do is optimize the universe to your utility function. That's the definition of your utility function. This will be very close to the average human utility function because you care about what other people want. If you do not want the FAI to do things like punishing people you hate (and I assume that you don't want that), then your utility function assigns a great weight to the desires of other people, and if an FAI with your utility function does such a thing, it must have been misprogrammed. The only reason to use the average human utility function instead is TDT: If that's what you are going to work towards, people are more likely to support your work. However, if you can convince them that on the average, your utility function is expected to be closer to theirs than the average human's is because of situations like this, then that should not be an issue.

comment by Roko · 2010-06-12T16:44:36.820Z · LW(p) · GW(p)

I dispute this claim:

They will be nearly the same thing anyway

There is a worrying tendency on LW to acknowledge verbally moral antirealism, but then argue as if moral realism is true. We have little idea how much our individual extrapolations will disagree on what to do with the universe, indeed there is serious doubt over just how weird those extrapolations will seem to us. There is no in-principle reason for humans to agree on what to do under extrapolation, and in practice we tend to disagree a lot before extrapolation.

Replies from: AlexMennen, Douglas_Knight
comment by AlexMennen · 2010-06-13T15:08:42.223Z · LW(p) · GW(p)

There is a worrying tendency on LW to acknowledge verbally moral antirealism, but then argue as if moral realism is true.

I did not intend to imply that moral realism was true. If I somehow seemed to indicate that, please explain so I can make the wording less confusing.

There is no in-principle reason for humans to agree on what to do under extrapolation, and in practice we tend to disagree a lot before extrapolation.

True, but many of the disagreements between people relate to methods rather than goals or morals, and these disagreements are not relevant under extrapolation. Plus, I want other people to get what they want, so if an AI programmed to optimize the universe to my utility function does not do something fairly similar to optimizing the universe to the average human utility function, either the AI is misprogrammed or the average human utility function changed radically through unfavorable circumstances like the one described in the top-level post. I suspect that the same thing is true of you. And if you do not want other people to get what they want, what is the point of using the average human utility function in the first place?

Replies from: Roko
comment by Roko · 2010-06-13T16:19:34.776Z · LW(p) · GW(p)

I want other people to get what they want

Bob wants the AI to create as close an approximation to hell as possible, and throw you into it forever, because he is a fundamentalist christian.

Are you sure you want bob to get what he wants?

Replies from: AlexMennen, purpleposeidon
comment by AlexMennen · 2010-06-13T16:39:49.607Z · LW(p) · GW(p)

Most fundamentalist christians, although they believe that there is a hell and that people like me are destined for it, and want their religion to be right, probably would not want an approximation of their religion created conditional on it not already being right. An AI cannot make Bob right.

That being said, there probably are some people who would want me thrown into hell anyway even if their religion stipulating that I would be was not right in the first place. I should amend my statement: I want people to get what they want in ways that do not conflict, or conflict only minimally, with what other people want. Also, the possibility that there are a great many people like Bob (as I said, I'm not quite sure how many fundamentalists would want to make their religion true even if it isn't) is a very good reason not to use the average human utility function for the CEV. As you said, I do not want Bob to get what he wants and I suspect that you don't either. So why would you want to create an FAI with a CEV that is inclined to accommodate Bob's wish (which greatly conflicts with what other people want) if it proves especially popular?

Replies from: Blueberry, Roko
comment by Blueberry · 2010-06-13T17:20:13.174Z · LW(p) · GW(p)

CEV doesn't just average people's wishes. It extrapolates what people would do if they were better informed. Even if Bob wants to create a hell right now, his extrapolated volition may be for something else.

comment by Roko · 2010-06-13T19:03:42.455Z · LW(p) · GW(p)

So why would you want to create an FAI with a CEV that is inclined to accommodate Bob's wish

I wouldn't.

Replies from: AlexMennen
comment by AlexMennen · 2010-06-14T03:55:56.166Z · LW(p) · GW(p)

Well, I suppose we can reliably expect that there are not enough people like Bob, and me getting tortured removes much more utility from me than it gives Bob, but that's missing the point.

Imagine yourself in a world in which the vast majority of people want to subject a certain minority group to eternal torture. The majority who want that minority group to be tortured is so vast that an FAI with an average human utility function-based CEV would be likely to subject the members of that minority group to eternal torture. You have the ability to create an FAI with a CEV based off of the average human utility function, with your personal utility function, or not at all. What do you do?

Replies from: Roko
comment by Roko · 2010-06-14T09:12:29.687Z · LW(p) · GW(p)

With my personal utility function, of course, which would, by my definition of the term "right", always do the right thing.

Replies from: AlexMennen
comment by AlexMennen · 2010-06-15T04:43:29.928Z · LW(p) · GW(p)

Silly me, I thought that we were arguing about whether using a personal utility function is a better substitute, and I was rather confused at what appeared to be a sudden concession. Looking at the comments above, I notice that you in fact only disputed my claim that the results would be very similar.

comment by purpleposeidon · 2010-07-09T07:25:31.302Z · LW(p) · GW(p)

I want bob to think he gets what he wants.

comment by Douglas_Knight · 2010-06-12T18:23:51.563Z · LW(p) · GW(p)

There are a lot of different positions people could take and I think you often demand unreasonable dichotomies. First, there is something more like a trichotomy of realism, (anti-realist) cognitivism and anti-cognitivism. Only partially dependent on that is the question of extrapolation. One could believe that there is a (human-)right answer to human moral questions here-and-now, without believing that weirder questions have right answers or that the answer to simple questions would be invariant under extrapolation.

Just because philosophers are wasting the term realism doesn't mean that it's a good idea to redefine it. You are the one guilty of believing that everyone will converge on a meaning for the word.

I happen to agree with the clause you quote because I think the divergence of a single person is so great as to swamp 6 billion people. I imagine that if one could contain that divergence, one would hardly worry about the problem of different people.

Replies from: Roko, Roko
comment by Roko · 2010-06-12T23:11:01.204Z · LW(p) · GW(p)

I happen to agree with the clause you quote because I think the divergence of a single person is so great as to swamp 6 billion people. I imagine that if one could contain that divergence, one would hardly worry about the problem of different people.

Today, people tend to spend more time and worry about the threat that other people pose than the threat that they themselves (in another mood, perhaps) pose.

This might weakly indicate that inter-person divergence is bigger than intra-person.

Looking from another angle, what internal conflicts are going to be persistent and serious within a person? It seems to me that I don't have massive trouble reconciling different moral intuitions, compared to the size and persistence of, say, the Israel-Palestine conflict, which is an inter-person conflict.

comment by Roko · 2010-06-12T23:07:14.558Z · LW(p) · GW(p)

The difference between Eliezer's cognitivism and the irrealist stance of, e.g., Greene is just syntactic; they mean the same thing. That is, they mean that values are arbitrary products of chance events rather than logically derivable truths.

comment by derefr · 2010-06-08T02:22:01.237Z · LW(p) · GW(p)

This seems to track with Eliezer's fictional "conspiracies of knowledge": if we don't want our politicians to get their hands on our nuclear weapons (or the theory for their operation), then why should they be allowed a say in what our FAI thinks?

comment by PhilGoetz · 2010-06-09T18:08:39.538Z · LW(p) · GW(p)

Besides, the purpose of CEV is to extrapolate the volition humanity would have if it were more intelligent - and since you just created the first AI, you are clearly the most intelligent person in the world (not that you didn't already know that). Therefore, using your own current utility function is an even better approximation than trying to extrapolate humanity's volition to your own level of intelligence!

comment by [deleted] · 2010-06-08T03:14:43.405Z · LW(p) · GW(p)

"I was tempted not to kill all those orphans, so clearly, I'm a compassionate and moral person."

Replies from: AlexMennen
comment by AlexMennen · 2010-06-08T19:08:43.993Z · LW(p) · GW(p)

That's not an accurate parallel. The fact that you thought it was a good idea to use the average human utility function proves that you expect it to have a result almost identical to that of an FAI using your own utility function. If the average human wants you not to kill the orphans, and you also want not to kill the orphans, it doesn't matter which algorithm you use to decide not to kill the orphans.

Replies from: None
comment by [deleted] · 2010-06-09T02:59:08.971Z · LW(p) · GW(p)

I think that you're looking too deeply into this; what I'm trying to say is that accepting excuses of the form "I was tempted to do ~x before doing x, so clearly I have properties characteristic of someone who does ~x" is a slippery slope.

Replies from: Kingreaper, Kingreaper
comment by Kingreaper · 2010-06-21T19:29:19.156Z · LW(p) · GW(p)

If you killed the orphans because otherwise Dr. Evil would have converted the orphans into clones of himself, and taken over the world, then your destruction of the orphanage is more indicative of a desire for Dr. Evil not to take over the world than any opinion on orphanages.

The fact you were tempted not to destroy the orphanage (despite the issue of Dr. Evil) is indicative of the fact you don't want to kill orphans.

comment by Kingreaper · 2010-06-21T19:10:15.598Z · LW(p) · GW(p)

I don't see how it is slippery at all. Instead, it seems that you have simply jumped off the slope.

If you were tempted to save the orphans you have some properties that lead to not killing orphans. You likely share some properties with compassionate, moral people.

That doesn't make you compassionate or moral. I'm often tempted to murder people by cutting out their heart and shoving it into their mouth.

This doesn't make me a murderer, but it does mean I have some properties characteristic of murderers.

comment by MugaSofer · 2013-01-24T11:00:19.646Z · LW(p) · GW(p)

What if you're a preference utilitarian?

Replies from: AlexMennen
comment by AlexMennen · 2013-01-24T19:08:41.758Z · LW(p) · GW(p)

If you are a true preference utilitarian, then the FAI will implement preference utilitarianism when it maximizes your utility function.

Replies from: MugaSofer
comment by MugaSofer · 2013-01-25T09:17:11.785Z · LW(p) · GW(p)

My point was that a preference utilitarian would let Dr Evil rule the world, in that scenario.

Although, obviously, if you're a preference utilitarian then that's what you actually want.

comment by Oscar_Cunningham · 2010-06-03T21:13:51.232Z · LW(p) · GW(p)

The thing I've never understood about CEV is how the AI can safely read everyone's brain. The whole point of CEV is that the AI is unsafe unless it has a human value system, but before it can get one, it has to open everyone's heads and scan their brains!? That doesn't sound like something I'd trust a UFAI to do properly.

I bring this up because without knowing how the CEV is supposed to occur it is hard to analyse this post. I also agree with JoshuaZ that this didn't deserve a top-level post.

Replies from: CarlShulman, Mitchell_Porter
comment by CarlShulman · 2010-06-03T21:25:48.556Z · LW(p) · GW(p)

Presumably by starting with some sort of prior, and incrementally updating off of available information (the Web, conversation with humans, psychology literature, etc). At any point it would have to use its current model to navigate tradeoffs between the acquisition of new information about idealised human aims and the fulfillment of those aims.

This does point to another more serious problem, which is that you can't create an AI to "maximize the expected value of the utility function written in this sealed envelope" without a scheme for interpersonal comparison of utility functions (if you assign 50% probability to the envelope containing utility function A, and 50% probability to the envelope containing utility function B, you need an algorithm to select between actions when each utility function alone would favor a different action). See this OB post by Bostrom.
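
A minimal sketch of that last problem, with made-up actions, probabilities, and payoff numbers: each candidate utility function is only defined up to a positive affine transformation, so "maximize the expected value of the envelope" picks out different actions depending on an arbitrary choice of scales unless a normalization scheme is fixed in advance.

    def expected_utility(action, candidates):
        # candidates: list of (probability, utility_function) pairs.
        return sum(p * u(action) for p, u in candidates)

    def best(actions, candidates):
        return max(actions, key=lambda a: expected_utility(a, candidates))

    # Two candidate utility functions for the sealed envelope.
    u_a = {"x": 1.0, "y": 0.4}.get      # prefers action x
    u_b = {"x": 0.2, "y": 1.0}.get      # prefers action y
    actions = ["x", "y"]

    # With one arbitrary choice of scales the 50/50 mixture favors y...
    print(best(actions, [(0.5, u_a), (0.5, u_b)]))             # 'y'

    # ...but rescaling u_a (an equally valid representation of the
    # same preferences) flips the decision to x.
    u_a_scaled = lambda a: 10.0 * u_a(a)
    print(best(actions, [(0.5, u_a_scaled), (0.5, u_b)]))      # 'x'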

comment by Mitchell_Porter · 2010-06-04T02:45:59.499Z · LW(p) · GW(p)

The C in CEV stands for Coherent, not Collective. You should not think of CEV output as occurring through brute-force simulation of everyone on Earth. The key step is to understand the cognitive architecture of human decision-making in the abstract. The AI has to find the right concepts (the analogues of utility function, terminal values, etc). Then it is supposed to form a rational completion and ethical idealization of the actual architecture, according to criteria already implicit in that architecture. Only then does it apply the resulting decision procedure to the contingent world around us.

Replies from: Oscar_Cunningham
comment by Oscar_Cunningham · 2010-06-04T07:54:15.031Z · LW(p) · GW(p)

Still not following you. Does it rely on everyone's preferences or not? If it does then it has to interact with everybody. It might not have to scan their brains and brute force an answer, but it has to do something to find out what they want. And surely this means letting it loose before it has human values? Even if you plan to have it just go round and interview everyone, I still wouldn't trust it.

Replies from: Mitchell_Porter
comment by Mitchell_Porter · 2010-06-04T10:36:34.669Z · LW(p) · GW(p)

Does it rely on everyone's preferences or not? If it does then it has to interact with everybody.

CEV is more like figuring out an ethical theory, than it is about running around fighting fires, granting wishes, and so on. The latter part is the implementation of the ethical theory. That part - the implementation - has to be consultative or otherwise responsive to individual situations. But the first part, CEV per se - deciding on principles - is not going to require peering into the mind of every last human being, or even very many of them.

It is basically an exercise in applied neuroscience. We want to understand the cognitive basis of human rationality and decision-making, including ethical and metaethical thought, and introduce that into an AI. And it's going to be a fairly abstract thing. Although human beings love food, sex, and travel, there is no way that these are going to be axiomatic values for an AI, because we are capable of coming up with ideas about what amounts to good or bad treatment of organisms or entities with none of those interests. So even if our ethical AI looks at an individual human being and says, that person should be fed, it won't be because its theory of the world says "every sentient being must be given food" as an ethical first principle. The ultimate basis of such judgments is going to be something a whole lot more abstract which doesn't even refer to or presuppose human beings directly, but which, when applied to an entity like a human being, is capable of giving rise to such judgments.

(By the way, I don't mean that such reasoning from abstract beginnings will be the basis of real-time judgments. You don't go through life recomputing everything from first principles at every moment; as you discover important new implications of those principles, or effective new heuristics of action, you store them in memory and act directly on them later on. The ultimate basis of such complex decision-making in an AI would be pragmatically relevant only under certain circumstances, as when it was asked to justify a particular decision from first principles, or when it was faced with very novel situations, or when it was engaged in thought experiments.)

Although I have been insisting that the output of CEV - something like a theory of ethical action - must be independent, not only of what we all want from moment to moment, but even independent of many basic human realities (because it must have something to say about entities which don't have those qualities or needs), it cannot be entirely independent of human nature. We all agree that certain outcomes are bad. That agreement is coming from something in us; a rock, for example, would not agree, or disagree. So some part of human decision-making cognition is necessary for a Friendly AI to be recognized as such. The whole idea of CEV is about extracting that part of how we work, and copying it across to silicon.

This is where the applied neuroscience comes in. We cannot trust pure thought to get this right. Even pure thought combined with psychological experiment is probably not enough; we need to understand what the brain is doing when we make these judgments. At the same time, pure neuroscience is not enough either; it would just give us a neutral causal description of how the brain works; it wouldn't tell us how to normatively employ that information in making a Friendly AI. Thus, applied neuroscience. The human beings who set a CEV process in motion would need to avoid two things: they would have to avoid wrong a priori normatives, and they would have to avoid wrong implicit normatives. By an implicit normative, I mean something, some factor in their thinking and their practice, which isn't explicitly recognized as helping to determine the CEV outcome, but which is doing so anyway.

I'm saying a lot which isn't in the existing expositions of CEV (e.g. Yudkowsky, Mijic, Nesov), but it comes just from taking the philosophy and adding the fact that all this information about human meta-preferences is meant to come from the study of the brain. That's called neuroscience, and in principle even fallible human cognition, in the form of human neuroscience, may figure all this out before we ever have self-enhancing AIs. In other words, we may reach a point where the combination of psychology, philosophy, neuroscience really is telling us, this is what humans actually want, or want to want, etc. (Though it may be hard for non-experts to tell the difference between false premature claims of such knowledge, and the real thing.)

All that could happen even before there is a Singularity, and in that case the strategy for a Friendly outcome will be able to dispense with automating the deduction of neuroethical first principles, and concentrate on simply ensuring that the first transhuman AI operates according to those principles, rather than according to some utility function which, when pursued with superhuman intelligence, leads to disaster. But the CEV philosophy, and the idea of "reflective decision theory", is meant to offer a way forward, if we do figure out artificial intelligence before we figure out artificial ethics.

comment by blogospheroid · 2010-06-11T05:32:58.142Z · LW(p) · GW(p)

Summing up the counterhacks presented in the thread, not including the deeper discussions of other issues people had with CEV.

  • Taking into account only the variance from one mind to another, so that very similar minds cluster and their volition is counted but not given any great preference. Problem: normal human majorities are also turned into minorities.

  • Taking into account the cycle time of humans.

  • Taking into account unique experiences, weighted by hours of unique experience.

  • Doing CEV on possible human minds instead of presently existing human minds. Problem: what is the boundary of human mind-space?

  • Establishing the protocols of what constitutes a valid and desirable extrapolation from a small sub-group, and only then unleashing the CEV to fetch the volition of all of humanity. Problem: it is difficult to distinguish the means and methods of valid extrapolation from the actual morals of the initial small sub-group.

Am I comprehensive enough, or did I miss any other significant counterhack in the thread?

comment by Yoreth · 2010-06-04T03:11:06.332Z · LW(p) · GW(p)

This seems to be another case where explicit, overt reliance on a proxy drives a wedge between the proxy and the target.

One solution is to do the CEV in secret and only later reveal this to the public. Of course, as a member of said public, I would instinctively regard with suspicion any organization that did this, and suspect that the proffered explanation (some nonsense about a hypothetical "Dr. Evil") was a cover for something sinister.

Replies from: blogospheroid
comment by blogospheroid · 2010-06-04T06:57:50.405Z · LW(p) · GW(p)

Since I wrote about Extrapolated Volition as a solution to Goodhart's law, I think I should explain why I did so.

Here, what is sought is friendliness (your goal, G), whereas the friendliness architecture, the actual measurable thing, is the proxy (G*).

Extrapolated volition is one way of avoiding G* diverging from G, because when one extrapolates the volition of the persons involved, one gets closer to G.

In friendly AI, the entire living humanity's volition is sought to be extrapolated. Unfortunately, this proxy, like any other proxy, is subject to hack attacks. The scale of this problem is such that other solutions proposed cannot be utilised.

EDIT : edited for grammar in 3rd para

Replies from: Houshalter
comment by Houshalter · 2010-06-04T15:31:40.875Z · LW(p) · GW(p)

In friendly AI, the entire living humanity's volition is sought to be extrapolated.

That's the number one thing they are doing wrong then. This is exactly why you don't want to do that. Instead, the original programmer(s)'s volition should be the one to be extrapolated. If the programmer wants what is best for humanity, then the AI will also. If the programmer doesn't want what's best, then why would you expect him to make this for humanity in the first place? See, by wanting what is best for humanity, the programmer also doesn't want all the potential bugs and problems that could come up, like this one. The only problem I can see is if there are multiple people working on it. Do they put their trust in one leader who will then take control?

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-04T16:49:13.340Z · LW(p) · GW(p)

You are assuming that the programmer's personal desires reflect what is best for humans as a whole. Relying on what humans think that is rather than a top-down approach will likely work better. Moreover, many people see an intrinsic value in some form of democratic approach. Thus, even if I could program a super-smart AI to push through my personal notion of "good" I wouldn't want to because I'd rather let collective decision making occur than impose my view on everyone.

This is aside from other issues like the fact that there likely won't be a single programmer for such an AI but rather a host of people working on it.

A lot of these issues are discussed in much more detail in the sequences and older posts. You might be downvoted less if you read more of those instead of rehashing issues that have been discussed previously. At least if you read those, you'll know what arguments have been made before and which have not been brought up. Many online communities one can easily jump into without reading much of their recommended reading. Unfortunately, that's not the case for Less Wrong.

Replies from: kodos96, Houshalter
comment by kodos96 · 2010-06-04T23:59:01.574Z · LW(p) · GW(p)

I don't seem to recall any of the sequences specifically addressing CEV and such (I read about it via eliezer's off-site writings). Did I miss a sequence somewhere?

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-05T00:14:53.182Z · LW(p) · GW(p)

I wasn't sure. That's why I covered my bases with "sequences and older posts." But I also made my recommendation above because many of the issues being discussed by Houshalter aren't CEV specific but general issues of FAI and metaethics, which are covered explicitly in the sequences.

comment by Houshalter · 2010-06-04T18:42:25.519Z · LW(p) · GW(p)

You are assuming that the programmer's personal desires reflect what is best for humans as a whole.

[...]

if I could program a super-smart AI to push through my personal notion of "good" I wouldn't want to because I'd rather let collective decision making occur than impose my view on everyone.

But my point is, if that's what you want, then it will do it. If you want to make it a democracy, then you can spend years trying to figure out every possible exception and end up with a disaster like what's presented in this post, or you can make the AI and it will organize everything the way you want it as best it can without creating any bizarre loopholes that could destroy the world. It's always going to be a win-win for whoever created it.

This is aside from other issues like the fact that there likely won't be a single programmer for such an AI but rather a host of people working on it.

Possibly, though I doubt it. But even if it is, you can just do that democracy thing on the group in question, not the whole world. Also, until your AI is smart enough and powerful enough to work at that level, it's going to be extremely dangerous to declare that the AI will be in charge of the world from then on. Even if it's working perfectly, without the proper resources and strategy in place, it's going to be very tough to just "take over" and it will likely cost lives. In fact, to me that's the scariest part of AI. Good or bad, at some point the old system is going to have to be abolished.

A lot of these issues are discussed in much more detail in the sequences and older posts. You might be downvoted less if you read more of those instead of rehashing issues that have been discussed previously. At least if you read those, you'll know what arguments have been made before and which have not been brought up. Many online communities one can easily jump into without reading much of their recommended reading. Unfortunately, that's not the case for Less Wrong.

I only have so much time in a day and in that time there is only so much I can read/do. But I do try.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-04T19:00:44.005Z · LW(p) · GW(p)

It's always going to be a win-win for whoever created it.

Well, thankfully a lot of the people here care enough about the opinions of others that they want to work out a framework that will work well for others. Note, incidentally, that it isn't necessarily the case that it will even be a win for the programmer. Bad AIs can end up trying to paperclip the Earth. Even the democracy example would be difficult for the AI to achieve. Say, for example, that I tell the AI to determine things with a democratic system and to give that the highest priority, and then a majority of people decide to do away with the democracy; what is the AI supposed to do? Keep in mind that AIs are not going to act like villainous computers from bad scifi where simply giving the machines an apparent contradiction will make them overheat and melt down.

Possibly, though I doubt it. But even if it is, you can just do that democracy thing on the group in question, not the whole world. Also, until your AI is smart enough and powerful enough to work at that level, it's going to be extremely dangerous to declare that the AI will be in charge of the world from then on. Even if it's working perfectly, without the proper resources and strategy in place, it's going to be very tough to just "take over" and it will likely cost lives.

This is an example where knowing about prior discussions here would help. In particular, you seem to be assuming that the AI will take quite a bit of time to get to be in charge. Now, as a conclusion, that's one I agree with. But a lot of very smart people such as Eliezer Yudkowsky consider the chance that an AI might take over in a very short timespan to be very high. And a decent number of LWians agree with Eliezer or at least consider such results to be likely enough to take seriously. So just working off the assumption that an AI will come to global power but will do so slowly is not a good assumption here: It is one you can preface explicitly as a possibility and say something like "If AI doesn't foom very fast then " but just taking your position for granted like that is a major reason you are getting downvoted.

Replies from: Houshalter
comment by Houshalter · 2010-06-04T20:39:19.174Z · LW(p) · GW(p)

Well, thankfully a lot of the people here care enough about the opinions of others that they want to work out a framework that will work well for others.

That's my point. If they do care about that, then the AI will do it. If it doesn't, then it's not working right.

Note, incidentally, that it isn't necessarily the case that it will even be a win for the programmer. Bad AIs can end up trying to paperclip the Earth.

Bad AIs can, sure. If it's bad, though, what does it matter who it's trying to follow orders from? It will ultimately try to turn them into paperclips as well.

Say, for example, that I tell the AI to determine things with a democratic system and to give that the highest priority, and then a majority of people decide to do away with the democracy; what is the AI supposed to do?

It's only really a contradiction to us. Either the AI has a goal to make sure that there is always a democracy or it has a goal to simply build a democracy in which case it can abolish itself if it decides to do so.

This is an example where knowing about prior discussions here would help. In particular, you seem to be assuming that the AI will take quite a bit of time to get to be in charge. Now, as a conclusion, that's one I agree with. But a lot of very smart people such as Eliezer Yudkowsky consider the chance that an AI might take over in a very short timespan to be very high. And a decent number of LWians agree with Eliezer or at least consider such results to be likely enough to take seriously. So just working off the assumption that an AI will come to global power but will do so slowly is not a good assumption here: It is one you can preface explicitly as a possibility and say something like "If AI doesn't foom very fast then " but just taking your position for granted like that is a major reason you are getting downvoted.

You're right. Sorry. There are a lot of variables to consider. It is one likely scenario to consider. Currently, the internet isn't interfaced with the actual world enough that you could control everything from it, and I can't see any possible way any entity could take over. Doesn't mean it can't happen, but it's also wrong to assume it will.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-04T21:46:27.479Z · LW(p) · GW(p)

That's my point. If they do care about that, then the AI will do it. If it doesn't, then it's not working right.

So care about other people how? And to what extent? That's the point of things like CEV.

It's only really a contradiction to us. Either the AI has a goal to make sure that there is always a democracy or it has a goal to simply build a democracy in which case it can abolish itself if it decides to do so.

Insufficient imagination. What if, for example, we tell the AI to try the first one and it decides that the solution is to kill the people who don't support a democracy? That's the point: even when you've got something resembling a rough goal, you are assuming your AI will accomplish the goals the way a human would.

To get some idea of how easily something can go wrong, it might help to, say, read about the stamp collecting device for starters. There's a lot that can go wrong with an AI. Even dumb optimizers often arrive at answers that are highly unexpected. Smart optimizers have the same problems, but more so.

Bad AIs can, sure. If it's bad, though, what does it matter who it's trying to follow orders from? It will ultimately try to turn them into paper clips as well.

What matters is that an unfriendly AI will make things bad for everyone. If someone screws up just once and makes a very smart paperclipper then that's an existential threat to humanity.

You're right. Sorry. There are a lot of variables to consider, and this is one likely scenario. Currently, the internet isn't interfaced with the actual world enough that you could control everything from it, and I can't see any possible way any entity could take over. That doesn't mean it can't happen, but it's also wrong to assume it will.

Well, no one is assuming that it will. But some people assign the scenario a high probability, and given how bad the outcome would be, even a very tiny probability is worth taking seriously. Note, incidentally, that there's a lot a very smart entity could do simply with basic internet access. For example, consider what happens if the AI finds a fast way to factor numbers. Well then, lots of secure communication channels over the internet are now vulnerable. And that's aside from the more plausible but less dramatic problem of an AI finding flaws in programs that we haven't yet noticed. Even if our AI just decided to take over most of the world's computers to increase its processing power, that's a pretty unpleasant scenario for the rest of us. And that's on the lower end of problems. Consider how often some bad hacking incident occurs where a system that should never have been online turns out to be accessible online. Now think about how many automated or nearly fully automated plants there are (for cars, for chemicals, for 3-D printing). And that situation will only get worse over the next few years.

Worse, a smart AI can likely get people to release it from its box and allow it a lot more free rein. See the AI box test. Even if the AI has trouble with that, an AI with internet access (which you seem to think wouldn't be that harmful) might not have trouble finding someone sympathetic to it if it portrayed itself sympathetically. These are only some of the most obvious failure modes. It may well be that some of the sneakiest things such an AI could do won't even occur to us because they are so far beyond anything humans would think of. It helps for this sort of thing not only to have a minimally restricted imagination but also to realize that even such an imagination is likely too small to encompass all the possible things that can go wrong.

Replies from: None, Blueberry
comment by [deleted] · 2010-06-04T22:36:26.788Z · LW(p) · GW(p)

That's my point. If they do care about that, then the AI will do it. If it doesn't, then it's not working right.

So care about other people how? And to what extent? That's the point of things like CEV.

If I understand Houshalter correctly, then his idea can be presented using the following story:

Suppose you worked out the theory of building self-improving AGIs with stable goal systems. The only problem left now is to devise an actual goal system that will represent what is best for humanity. So you spend the next several years engaged in deep moral reflection and finally come up with the perfect implementation of CEV completely impervious to the tricks of Dr. Evil and his ilk.

However, the morality upon which you have reflected for all those years isn't an external force accessible only to humans. It is a computation embedded in your brain. Whatever you ended up doing was the result of your brain-state at the beginning of the story and the stimuli that have affected you since that point. All of this could have been simulated by a Sufficiently Smart™ AGI.

So the idea is: instead of spending those years coming up with the best goal system for your AGI, simply run it and tell it to simulate a counterfactual world in which you did, and then do what you would have done. Whatever results from that, you couldn't have done better anyway.

Of course, this is all under the assumption that formalizing Coherent Extrapolated Volition is much more difficult than formalizing My Very Own Extrapolated Volition (for any given value of me).

comment by Blueberry · 2010-06-04T21:54:38.298Z · LW(p) · GW(p)

To get some idea of how easily something can go wrong, it might help to, say, read about the stamp collecting device for starters.

Thanks for that link. That is brilliant, especially Eliezer's comment:

Seth, I see that you were a PhD student in NEU’s Electrical Engineering department. Electrical engineering isn’t very complicated, right? I mean, it’s just:

while device is incomplete

…get some wires

…connect them

The part about getting wires can be implemented by going to a hardware store, and as for connecting them, a soldering iron should do the trick.

comment by Scott Alexander (Yvain) · 2010-06-05T14:35:43.906Z · LW(p) · GW(p)

EDIT: Doesn't work, see Wei Dai below.

This isn't a bug in CEV, it's a bug in the universe. Once the majority of conscious beings are Dr. Evil clones, then Dr. Evil becomes a utility monster and it gets genuinely important to give him what he wants.

But allowing Dr. Evil to clone himself is bad; it will reduce the utility of all currently existing humans except Dr. Evil.

If a normal, relatively nice but non-philosopher human ascended to godhood, ve would probably ignore Dr. Evil's clones' wishes. Ve would destroy the clones and imprison the doctor, because ve was angry at Dr. Evil for taking the utility-lowering action of cloning himself and wanted to punish him.

But everything goes better than expected! Dr. Evil hears a normal human is ascending to godhood, realizes making the clones won't work, and submits passively to the new order. And rationalists should win, so a superintelligent AI should be able to do at least as well as a normal human by copying normal human methods when they pay off.

So an AI with sufficiently good decision theory could (I hate to say "would" here, because making quick assumptions that an AI would do the right thing is a good way to get yourself killed) use the same logic. Ve would say, before even encountering the world, "I am precommitting that anyone who cloned themselves a trillion times gets all their clones killed. This precommitment will prevent anyone who genuinely understands my source code from having cloned themselves in the past, and will therefore increase utility." Then ve opens ver sensors, sees Dr. Evil and his clones, says "Sorry, I'd like to help you, but I precommitted to not doing so," kills all of the clones as painlessly as possible, and gets around to saving the world.

Replies from: Wei_Dai, Douglas_Knight
comment by Wei Dai (Wei_Dai) · 2010-06-05T17:22:11.360Z · LW(p) · GW(p)

"I am precommiting that anyone who cloned themselves a trillion times gets all their clones killed. This precommitment will prevent anyone who genuinely understands my source code from having cloned themselves in the past, and will therefore increase utility."

Wait, increase utility according to what utility function? If it's an aggregate utility function where Dr. Evil has 99% weight, then why would that precommitment increase utility?

Replies from: Yvain, MugaSofer
comment by Scott Alexander (Yvain) · 2010-06-05T20:25:12.703Z · LW(p) · GW(p)

You're right. It will make a commitment to stop anyone who tries the same thing later, but it won't apply it retroactively. The original comment is wrong.

comment by MugaSofer · 2013-01-24T11:05:13.861Z · LW(p) · GW(p)

Wait, increase utility according to what utility function?

The current CEV of humanity, or your best estimate of it, I think. If someone forces us to kill orphans or they'll destroy the world, saving the world is higher utility, but we still want to punish the guy who made it so.

I think that's where the idea came from, anyway; I agree with Yvain that it doesn't work.

comment by Douglas_Knight · 2010-06-05T15:03:21.653Z · LW(p) · GW(p)

This isn't a bug in CEV, it's a bug in the universe. Once the majority of conscious beings are Dr. Evil clones, then Dr. Evil becomes a utility monster and it gets genuinely important to give him what he wants.

I think that's wrong. At the very least, I don't think it matches the scenario in the post. In particular, I think "how many people are there?" is a factual question, not a moral question. (and the answer is not an integer)

Replies from: Dre
comment by Dre · 2010-06-05T15:11:40.610Z · LW(p) · GW(p)

But the important (and moral) question here is "how do we count the people for utility purposes?" We also need a normative way to aggregate their utilities, and one vote per person would need to be justified separately.

Replies from: Blueberry
comment by Blueberry · 2010-06-05T16:40:48.897Z · LW(p) · GW(p)

This scenario actually gives us a guideline for aggregating utilities. We need to prevent Dr. Evil from counting more than once.

One proposal is to count people by different hours of experience, so that if I've had 300,000 hours of experience, and my clone has one hour that's different, it counts as 1/300,000 of a person. But if we go by hours of experience, we have the problem that with enough clones, Dr. Evil can amass enough hours to overwhelm Earth's current population (giving ten trillion clones each one unique hour of experience should do it).

So this indicates that we need to look at the utility functions. If two entities have the same utility function, they should be counted as the same entity, no matter what different experiences they have. This way, the only way Dr. Evil will be able to aggregate enough utility is to change the utility function of his clones, and then they won't all want to do something evil. Something like using a convergent series for the utility of any one goal might work: if Dr. Evil wants to destroy the world, his clone's desire to do so counts for 1/10 of that, and the next clone's desire counts for 1/100, so he can't accumulate more than 10/9 of his original utility weight.
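For illustration, here is a minimal sketch of that convergent-series weighting. The 1/10 ratio and all the names below are assumptions made up for the example, not anything from the CEV document; the point is only that the geometric sum is bounded at 10/9 of the original weight no matter how many clones are added.

```python
def aggregate_goal_weight(base_weight, n_copies, ratio=0.1):
    """Total weight a shared goal gets from n_copies near-identical holders.

    Copy k contributes base_weight * ratio**k, so the total is a geometric
    series capped at base_weight / (1 - ratio), however many copies exist.
    """
    return base_weight * (1 - ratio ** n_copies) / (1 - ratio)

print(aggregate_goal_weight(1.0, 1))   # 1.0  -- a lone Dr. Evil
print(aggregate_goal_weight(1.0, 3))   # 1.11 -- three copies: 1 + 1/10 + 1/100
print(1.0 / (1.0 - 0.1))               # 1.111... = 10/9, the ceiling for any clone army
```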

comment by Roko · 2010-06-06T20:07:10.509Z · LW(p) · GW(p)

In a sense, this kind of thing (value drift due to differential reproduction) is already happening. For example, see this article about the changing racial demographics of the USA:

According to an August 2008 report by the U.S. Census Bureau, those groups currently categorized as racial minorities—blacks and Hispanics, East Asians and South Asians—will account for a majority of the U.S. population by the year 2042. Among Americans under the age of 18, this shift is projected to take place in 2023

An increasing Latin-American population in the US seems to be causing an increase in Catholicism, which, if it continued, would constitute a significant axiological shift, as anyone who takes a pro-choice stance on abortion will attest.

Extrapolating these kinds of effects out to 2100 and beyond seems to indicate that we are (by default) in for much more such change.

Replies from: NancyLebovitz, RobinZ
comment by NancyLebovitz · 2010-06-07T21:39:43.872Z · LW(p) · GW(p)

What effect do you think the demographic shift will have? It isn't as though blacks, Hispanics, East Asians, and South Asians are going to be a single power bloc.

Replies from: Roko
comment by Roko · 2010-06-07T23:17:36.817Z · LW(p) · GW(p)

I honestly have no idea, and I wish I had a good way of thinking about this.

Far right people say that the end of white America will be the beginning of doom and disaster. It is important not to let either politically motivated wishful thinking or politically correct wishful thinking bias one when trying to get to the real facts of the matter.

One way to think about the problem is to note that different immigrant populations have different social outcomes, and that there is some evidence that this is partially genetic, though it is obviously partly cultural. You could simply ask what would happen if you extrapolated the ethnic socioeconomics in proportion to each group's increased share of the population. So, in this spherical cow model of societies, overall American socioeconomic and political variables are simply linear combinations of those for ethnic subpopulations. In this simple model, fewer whites would be a bad thing, though more East Asians would probably be a good thing.

Another effect is to note that humans innately cluster and ally with those who are similar to them, so as the ethnic diversity (you could even calculate an ethnic entropy as −Σ_i r_i log r_i) increases, you'd expect the country to be more disunified, and more likely to have a civil war or other serious internal conflict. (Hypothesis: ethnic entropy correlates with civil wars.)
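The entropy gestured at here is just the Shannon entropy of the group shares. A minimal illustration, with shares made up purely to show how the measure behaves:

```python
import math

def shannon_entropy(shares):
    """Shannon entropy -sum(r_i * log r_i) over population shares r_i."""
    return -sum(r * math.log(r) for r in shares if r > 0)

# Made-up shares, only to show the measure's behavior: a near-homogeneous
# population has low entropy; an even four-way split has the maximum log(4).
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.17
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.39 == math.log(4)
```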

My rational expectation is that the demographic shift will have a mildly bad effect on overall welfare in 2042, but that no one will believe me because it sounds "racist".

If the trend continues to 2100 and no major effect intervenes, the US will be Hispanic-majority as far as I know. My rational expectation of outcomes in this case is that something disruptive will happen, i.e. that the model will somehow break, perhaps violently. I just can't imagine a smooth transition to a majority-Hispanic US.

comment by RobinZ · 2010-06-07T16:46:22.213Z · LW(p) · GW(p)

What, by 2042? There's no chance that it will take all the way to 2043 for this to occur? I wish I could be as confident in predicting events three decades yet to come!

(So far as I can tell from tracking down the press release and then the study in question, the prediction was formed by estimating birth, death, and migration factors for each age group within each demographic and performing the necessary calculations. Error bars are not indicated anywhere I have found.)

Replies from: Roko
comment by Roko · 2010-06-07T17:55:44.194Z · LW(p) · GW(p)

I don't think that the message is significantly changed if you add some variance to that.

Replies from: RobinZ
comment by RobinZ · 2010-06-07T18:47:16.186Z · LW(p) · GW(p)

You're right - I'm just a bit trigger-happy about futurism.

Replies from: Roko
comment by Roko · 2010-06-07T21:33:32.344Z · LW(p) · GW(p)

I think that a serious flaw of Less Wrong is that the majority of commenters weigh the defensibility of a statement far higher than the value of the information that it carries (in the technical sense of information value).

A highly defensible statement can be nearly useless if it doesn't pertain to something of relevance, whereas a mildly inaccurate and/or mildly hard to rhetorically defend statement can be extremely valuable if it reveals an insight that is highly relevant to aspects of reality that we care about.

comment by Blueberry · 2010-06-05T20:22:13.849Z · LW(p) · GW(p)

I just realized there's an easier way to "hack" CEV. Dr. Evil just needs to kill everyone else, or everyone who disagrees with him.

comment by Alexandros · 2010-06-03T21:33:10.412Z · LW(p) · GW(p)

What if the influence is weighted by degree of divergence from the already-scanned minds, something like a reverse PageRank? All Dr. Evils would cluster, and therefore count as only a bit more than one vote. Also, this could cover the human spectrum better, less influenced by cultural factors. I guess this would give outliers much more influence, but if outliers are in all directions, would they cancel each other out? What else could go terribly wrong with this?
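One minimal sketch of what such a divergence weighting could look like. The similarity measure and all names below are invented for illustration, not a worked-out proposal; the idea is just that a cluster of near-duplicates ends up sharing roughly one vote.

```python
def divergence_weights(minds, similarity):
    """Give each mind a vote of 1 / (its total similarity to all minds, itself included).

    A cluster of near-identical minds then shares roughly one vote between them,
    while a genuinely distinct mind keeps close to a full vote.
    """
    return [1.0 / sum(similarity(m, other) for other in minds) for m in minds]

def similarity(a, b):
    # Crude stand-in for a real mind-similarity measure.
    return 1.0 if a == b else 0.0

# Toy example: three identical "evil" minds and two distinct others.
minds = ["evil", "evil", "evil", "alice", "bob"]
print(divergence_weights(minds, similarity))  # ~[0.33, 0.33, 0.33, 1.0, 1.0]
```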

Replies from: Tiiba, blogospheroid, timtyler
comment by Tiiba · 2010-06-04T01:04:49.763Z · LW(p) · GW(p)

Imagine if elections worked that way: one party, one vote, so the time cubists would get only slightly less influence than the Democrats. I dunno...

comment by blogospheroid · 2010-06-04T10:23:16.610Z · LW(p) · GW(p)

Whuffie, the money/social-capital analog in Cory Doctorow's "Down and Out in the Magic Kingdom", had some feature like that. Right-handed whuffie was whuffie given by people you liked, and left-handed whuffie was whuffie given by people you didn't like. So mind similarity could incorporate some such measure, where left-handed whuffie was made really important.

Replies from: Alexandros
comment by Alexandros · 2010-06-04T11:35:10.177Z · LW(p) · GW(p)

If I've learned to rely on something, it's this community providing references to amazing and relevant material. Thank you so much.

Aside: As it turns out, the Whuffie, especially its weighted variety, is very close to what I had in mind for a distributed cloud architecture I am working on. I called the concept 'community currency' but the monetary reference always puts people off. Perhaps referring to Whuffie (or the related Kudos) will help me communicate it better. Again, many thanks.

comment by timtyler · 2010-06-04T00:24:53.327Z · LW(p) · GW(p)

It sounds as though it has much the same problems - most obviously difficulty of implementation.

Technology has historically been used to give those in power what they want. Are those in power likely to promote a system that fails to assign their aims privileged status? Probably not, IMO. There's democracy - but that still leaves lots of room for lobbying.

comment by Vladimir_Nesov · 2010-06-03T20:43:35.246Z · LW(p) · GW(p)

Since Dr. Evil is human, it shouldn't be that bad. Extrapolated volition kicks in, making his current evil intentions irrelevant, possibly even preferring to reverse the voting exploit.

Replies from: Baughn, dclayh
comment by Baughn · 2010-06-03T21:31:13.726Z · LW(p) · GW(p)

That is not the most inconvenient possible world.

The conservative assumption to make here is that Dr. Evil is, in fact, evil; and that, after sufficient reflection, he is still evil. Perhaps he's brain-damaged; it doesn't matter.

Given that, will he be able to hijack CEV? Probably not, now that the scenario has been pointed out, but what other scenarios might be overlooked?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-03T21:45:18.102Z · LW(p) · GW(p)

I agree. (Presumably we shouldn't include volitions of tigers in the mix, and the same should go for the actually evil alien mutants.)

Replies from: Baughn
comment by Baughn · 2010-06-03T22:59:42.162Z · LW(p) · GW(p)

So, how do we decide who's evil?

Replies from: Kutta
comment by Kutta · 2010-06-04T08:14:34.233Z · LW(p) · GW(p)

A surprisingly good heuristic would be "choose only humans".

Replies from: Blueberry, Strange7
comment by Blueberry · 2010-06-04T08:18:40.646Z · LW(p) · GW(p)

If I were to implement CEV, I'd start with only biological, non-uploaded, non-duplicated, non-mentally ill, naturally born adult humans, and then let their CEV decide whether to include others.

Replies from: NancyLebovitz, None, Baughn
comment by NancyLebovitz · 2010-06-04T11:54:17.516Z · LW(p) · GW(p)

Is there a biological tech level you're expecting when building an FAI becomes possible?

What do you mean by "naturally born"? Are artificial wombs a problem?

It's conceivable that children have important input for what children need that adults have for the most part forgotten.

Replies from: Blueberry
comment by Blueberry · 2010-06-04T15:49:32.586Z · LW(p) · GW(p)

Is there a biological tech level you're expecting when building an FAI becomes possible?

I don't know. We don't actually need any technology other than Python and vi. ;) But it's possible uploads, cloning, genetic engineering, and so forth will be common then.

What do you mean by "naturally born"? Are artificial wombs a problem?

Yes, just to be safe, we should avoid anyone born through IVF, for instance, or whose conception was engineered or assisted in a lab, or who experienced any genetic modification. I'm not sure exactly where to draw the line: fertility drugs might be ok. I meant anyone conceived through normal intercourse without any technological intervention. Such people can be added in later if the CEV of the others wants them added.

It's conceivable that children have important input for what children need that adults have for the most part forgotten.

Yes, this is a really good point, but CEV adds in what we would add in if we knew more and remembered more.

Replies from: thomblake, NancyLebovitz
comment by thomblake · 2010-06-04T16:14:33.847Z · LW(p) · GW(p)

What do you mean by "naturally born"? Are artificial wombs a problem?

Yes, just to be safe, we should avoid anyone born through IVF, for instance, or whose conception was engineered or assisted in a lab, or who experienced any genetic modification. I'm not sure exactly where to draw the line: fertility drugs might be ok. I meant anyone conceived through normal intercourse without any technological intervention

That's terrible. You're letting in people who are mutated in all sorts of ways through stupid, random, 'natural' processes, but not those who have the power of human intelligence overriding the choices of the blind idiot god. If the extropians/transhumanists make any headway with germline genetic engineering, I want those people in charge.

comment by NancyLebovitz · 2010-06-04T19:00:37.421Z · LW(p) · GW(p)

Exclude people who aren't different or problematic in any perceptible way because of your yuck factor?

Minor point, but are turkey basters technology?

Aside from the problem of leaving out what seems to be obviously part of the human range, I think that institutionalizing that distinction for something so crucial would lead to prejudice.

Replies from: Blueberry
comment by Blueberry · 2010-06-04T20:41:34.720Z · LW(p) · GW(p)

I have no particular yuck factor involving IVF. And you're right that it's not obvious where to draw the line with things like turkey basters. To be safe, I'd exclude them.

Keep in mind that this is just for the first round, and the first round group would presumably decide to expand the pool of people. It's not permanently institutionalized. It's just a safety precaution, because the future of humanity is at stake.

Replies from: NancyLebovitz
comment by NancyLebovitz · 2010-06-05T09:34:17.544Z · LW(p) · GW(p)

What risk are you trying to protect against?

Replies from: Blueberry
comment by Blueberry · 2010-06-05T17:16:04.196Z · LW(p) · GW(p)

Something like the Dr. Evil CEV hack described in the main post. Essentially, we want to block out any way of creating new humans that could be used to override CEV, so it makes sense to start by blocking out all humans created artificially. It might also be a good idea to require the humans to have been born before a certain time, say 2005, so no humans created after 2005 can affect CEV (at least in the first round).

Turkey basters are probably not a threat. However, there's an advantage to being overly conservative here. The very small number of people created or modified through some sort of artificial means for non-CEV-hacking reasons can be added in after subsequent rounds of CEV. But if the first round includes ten trillion hacked humans by mistake, it will be too late to remove them because they'll outvote everyone else.

Replies from: NancyLebovitz
comment by NancyLebovitz · 2010-06-05T18:27:48.199Z · LW(p) · GW(p)

Requiring that people have been incubated in a human womb seems like enough of a bottleneck, though even that's politically problematic if there are artificial wombs or tech for incubation in non-humans.

However, I'm more concerned that uncaring inhuman forces already have a vote.

Replies from: Blueberry, Blueberry
comment by Blueberry · 2010-06-05T18:40:09.856Z · LW(p) · GW(p)

You may also be interested in this article:

The associations between prevalence and cultural dimensions are consistent with the prediction that T. gondii can influence human culture. Just as individuals infected with T. gondii score themselves higher in the neurotic factor guilt-proneness, nations with high T. gondii prevalence had a higher aggregate neuroticism score. In addition, Western nations with high T. gondii prevalence were higher in the ‘neurotic’ cultural dimensions of masculine sex roles and uncertainty avoidance. These results were predicted by a logical scaling-up from individuals to aggregate personalities to cultural dimensions.

Can the common brain parasite, Toxoplasma gondii, influence human culture?

comment by Blueberry · 2010-06-05T20:25:03.085Z · LW(p) · GW(p)

Requiring that people have been incubated in a human womb seems like enough of a bottleneck

You're probably right. It probably is. But we lose nothing by being more conservative, because the first round of CEV will add in all the turkey baster babies.

comment by [deleted] · 2010-06-06T12:02:55.841Z · LW(p) · GW(p)

What constitutes mental illness is a horrible can of worms. Even defining the borders of what constitutes brain damage is terribly hard.

comment by Baughn · 2010-06-04T09:01:31.368Z · LW(p) · GW(p)

Ha. Okay, that's a good one.

You might find that deciding who's mentally ill is a little harder, but the other criteria should be reasonably easy to define, and there are no obvious failure conditions. Let me think this over for a bit. :)

comment by Strange7 · 2010-06-04T08:26:52.675Z · LW(p) · GW(p)

Define human.

Replies from: Kutta
comment by Kutta · 2010-06-04T21:52:10.671Z · LW(p) · GW(p)

Featherless biped.

Replies from: None, Strange7, anonym
comment by [deleted] · 2010-06-04T23:36:20.538Z · LW(p) · GW(p)

Ten thousand years later, postkangaroo children learn from their history books about Kutta, the one who has chosen to share the future with his marsupial brothers and sisters :)

comment by Strange7 · 2010-06-07T23:58:09.643Z · LW(p) · GW(p)

If an upload remembers having had legs, and/or is motivated to acquire for itself a body with exactly two legs and no feathers, please explain either how this definition would adequately exclude uploads or why you are opposed to equal rights for very young children (not yet capable of walking upright) and amputees.

comment by anonym · 2010-06-05T03:27:50.379Z · LW(p) · GW(p)

Includes sexbots, and excludes uploaded versions of me.

Replies from: Blueberry
comment by Blueberry · 2010-06-05T08:40:46.373Z · LW(p) · GW(p)

The point is to exclude uploaded versions of you. I'm more concerned about including plucked chickens.

BTW, what is the difference between a sexbot and a catgirl?

Replies from: anonym
comment by anonym · 2010-06-05T19:54:09.609Z · LW(p) · GW(p)

A sexbot is a robot for sex -- still a human under the featherless biped definition as long as it has two legs and no feathers.

If the point is to exclude "uploaded versions", what counts as uploaded? How about if I transfer my mind (brain state) to another human body? If that makes me still a human, what rational basis is there for defining a mind-body system as human or not based on the kind of body it is running in?

comment by dclayh · 2010-06-03T20:51:38.800Z · LW(p) · GW(p)

Moreover, the CEV of one trillion Evil-clones will likely be vastly different from the CEV of one Dr. Evil. For instance, Dr. Evil may have a strong desire to rule all human-like beings, but for each given copy this desire will be canceled out by the desire of the 1 trillion other copies not to be ruled by that copy.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-03T20:56:29.451Z · LW(p) · GW(p)

Dr. Evil may have a strong desire to rule all human-like beings

No matter what he currently thinks, it doesn't follow that it's his extrapolated volition to rule.

Replies from: blogospheroid, CarlShulman
comment by blogospheroid · 2010-06-04T09:32:59.740Z · LW(p) · GW(p)

The hack troubled me and I read part of the CEV document again.

The terms "Were more the people we wished we were" , "Extrapolated as we wish that extrapolated", and "Interpreted as we wish that interpreted" are present in the CEV document explaining extrapolation. These pretty much guarantee that a hack such like what Wei Dai mentioned would be an extremely potent one.

However, the conservatism in the rest of the document, with phrases like those below, seems to take care of it fairly well.

"It should be easier to counter coherence than to create coherence. " "The narrower the slice of the future that our CEV wants to actively steer humanity into, the more consensus required. " "the initial dynamic for CEV should be conservative about saying "yes", and listen carefully for "no". "

I just hope the actual numbers, when entered, match that. If they do, then I think the CEV might just come back to the programmers saying "I see something weird. Kindly explain."

Replies from: novalis
comment by novalis · 2010-06-04T15:41:33.669Z · LW(p) · GW(p)

The narrower the slice of the future that our CEV wants to actively steer humanity into, the more consensus required.

This sounded really good when I read it in the CEV paper. But now I realize that I have no idea what it means. What is the area being measured for "narrowness"?

Replies from: blogospheroid
comment by blogospheroid · 2010-06-05T08:59:51.262Z · LW(p) · GW(p)

My understanding of a narrower future is: more choices taken away, weighted by the number of people they are taken away from, compared to the matrix of choices present at the time of activation of CEV.

Replies from: novalis
comment by novalis · 2010-06-07T02:37:16.971Z · LW(p) · GW(p)

There are many problems with this definition:

(1) It does not know how to weight the choices of people not yet alive at the time of activation.

(2) It does not know how to determine which choices count. For example, is Baskin Robbins to be preferred to Alinea, because Baskin Robbins offers 31 choices while Alinea offers just one (12 courses or 24)? Or Baskin Robbins^^^3 for most vs 4 free years of schooling in a subject of choice for all? Does it improve the future to give everyone additional unpalatable choices, even if few will choose them?

I understand that CEV is supposed to be roughly the sum over what people would want, so some of the more absurd meanings would be screened off. But I don't understand how this criterion is specific enough that if I were a Friendly superpower, I could use it to help me make decisions.

comment by CarlShulman · 2010-06-03T21:16:31.304Z · LW(p) · GW(p)

But he should still give sizable credence to that desire persisting.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-03T21:21:35.562Z · LW(p) · GW(p)

Why? And, more importantly, why should he care? It's in his interest to have the FAI follow his extrapolated volition, not his revealed preference, be it in the form of his own belief about his extrapolated volition or not.

Replies from: CarlShulman
comment by CarlShulman · 2010-06-03T21:35:26.610Z · LW(p) · GW(p)

Why?

Because the power of moral philosophy to actually change things like the desire for status is limited, even in very intelligent individuals interested in moral philosophy. The hypothesis that thinking much faster, knowing much more, etc, will radically change that has little empirical support, and no strong non-empirical arguments to produce an extreme credence.

Replies from: Vladimir_Nesov, steven0461
comment by Vladimir_Nesov · 2010-06-03T21:41:19.958Z · LW(p) · GW(p)

When we are speaking about what to do with the world, which is what formal preference (extrapolated volition) is ultimately about, this is different in character (domain of application) from any heuristics that a human person has for what he personally should be doing. Any human consequentialist is a hopeless dogmatic deontologist in comparison with their personal FAI. Even if we take both views as representations of the same formal object, syntactically they have little in common. We are not comparing what a human will do with what advice that human will give to himself if he knew more. Extrapolated volition is a very different kind of wish, a kind of wish that can't be comprehended by a human, and so no heuristics already in mind will resemble heuristics about that wish.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-06-04T14:24:22.838Z · LW(p) · GW(p)

so no heuristics already in mind will resemble heuristics about that wish

But you seem to have the heuristic that the extrapolated volition of even the most evil human "won't be that bad". Where does that come from?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-11T12:40:11.725Z · LW(p) · GW(p)

But you seem to have the heuristic that the extrapolated volition of even the most evil human "won't be that bad". Where does that come from?

That's not a heuristic in the sense I use the word in the comment above, it's (rather weakly) descriptive of a goal and not rules for achieving it.

The main argument (and I changed my mind on this recently) is the same as for why another normal human's preference isn't that bad: sympathy. If human preference has a component of sympathy, of caring about other human-like persons' preferences, then there is always a sizable slice of the control-of-the-universe pie going to everyone's preference, even if it is orders of magnitude smaller than the slice for the preference in control. I don't expect that even the most twisted human can have a whole aspect of preference completely absent, even if it is manifested to a smaller degree than usual.

This apparently changes my position on the danger of value drift, and modifying minds of uploads in particular. Even though we will lose preference to the value drift, we won't lose it completely, so long as people holding the original preference persist.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-06-11T13:56:15.495Z · LW(p) · GW(p)

I don't expect that even the most twisted human can have a whole aspect of preference completely absent, even if manifested to smaller degree than usual.

Humans also have other preferences that are in conflict with sympathy, for example the desire to see one's enemies suffer. If sympathy is manifested to a sufficiently small degree, then it won't be enough to override those other preferences.

Are you aware of what has been happening in Congo, for example?

comment by steven0461 · 2010-06-03T23:26:43.804Z · LW(p) · GW(p)

It seems to me there's a pretty strong correlation between philosophical competence and endorsement of utilitarian (vs egoist) values, and also that most who endorse egoist values do so because they're confused about e.g. various issues around personal identity and the difference between pursuing one's self-interest and following one's own goals.

Replies from: mattnewport, timtyler
comment by mattnewport · 2010-06-04T00:20:28.781Z · LW(p) · GW(p)

Can we taboo "utilitarian", since nobody ever seems to be able to agree on what it means? Also, do you have any references to strong arguments for whatever you mean by utilitarianism? I've yet to encounter any good arguments in favour of it, but given how many apparently intelligent people seem to consider themselves utilitarians, they presumably exist somewhere.

Replies from: RomanDavis
comment by RomanDavis · 2010-06-04T01:30:35.398Z · LW(p) · GW(p)

Utility is just a basic way to describe "happiness" (or, if you prefer, "preferences") in an economic context. The unit of measurement for utility is sometimes called a utilon. To say you are a utilitarian just means that you'd prefer an outcome that results in the largest total number of utilons over the human population. (Or in the universe, if you allow for Babyeaters, Clippies, Utility Monsters, Super Happies, and so on.)

Replies from: mattnewport
comment by mattnewport · 2010-06-04T01:42:05.510Z · LW(p) · GW(p)

Alicorn, who I think is more of an expert on this topic than most, had this to say:

I'm taking an entire course called "Weird Forms of Consequentialism", so please clarify - when you say "utilitarianism", do you speak here of direct, actual-consequence, evaluative, hedonic, maximizing, aggregative, total, universal, equal, agent-neutral consequentialism?

Just the other day I debated with PhilGoetz whether utilitarianism is supposed to imply agent-neutrality or not. I still don't know what most people mean on that issue.

Even assuming agent neutrality there is a major difference between average and total utilitarianism. Then there are questions about whether you weight agents equally or differently based on some criteria. The question of whether/how to weight animals or other non-human entities is a subset of that question.

Given all these questions it tells me very little about what ethical system is being discussed when someone uses the word 'utilitarian'.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-04T02:04:11.225Z · LW(p) · GW(p)

Given all these questions it tells me very little about what ethical system is being discussed when someone uses the word 'utilitarian'.

It does substantially reduce the decision space. For example, it is generally a safe bet that the individual is not going to subscribe to deontological claims that say "killing humans is always bad." I'd thus be very surprised to ever meet a pacifist utilitarian.

It probably is fair to say that given the space of ethical systems generally discussed on LW, talking about utilitarianism doesn't narrow the field down much from that space.

comment by timtyler · 2010-06-04T00:44:56.126Z · LW(p) · GW(p)

I haven't seen any stats on that issue. Is there any evidence relating to the topic?

Replies from: mattnewport
comment by mattnewport · 2010-06-04T02:02:55.991Z · LW(p) · GW(p)

Depending on how you define 'philosophical competence', the results of the PhilPapers survey may be relevant.

The PhilPapers Survey was a survey of professional philosophers and others on their philosophical views, carried out in November 2009. The Survey was taken by 3226 respondents, including 1803 philosophy faculty members and/or PhDs and 829 philosophy graduate students.

Here are the stats for Philosophy Faculty or PhD, All Respondents

Normative ethics: deontology, consequentialism, or virtue ethics?

Other 558 / 1803 (30.9%)
Accept or lean toward: consequentialism 435 / 1803 (24.1%)
Accept or lean toward: virtue ethics 406 / 1803 (22.5%)
Accept or lean toward: deontology 404 / 1803 (22.4%)

And for Philosophy Faculty or PhD, Area of Specialty Normative Ethics

Normative ethics: deontology, consequentialism, or virtue ethics?

Other 80 / 274 (29.1%)
Accept or lean toward: deontology 78 / 274 (28.4%)
Accept or lean toward: consequentialism 66 / 274 (24%)
Accept or lean toward: virtue ethics 50 / 274 (18.2%)

As utilitarianism is a subset of consequentialism it appears you could conclude that utilitarians are a minority in this sample.

Replies from: timtyler
comment by timtyler · 2010-06-04T02:11:12.490Z · LW(p) · GW(p)

Thanks! For perspective:

* Utilitarianism
* Ethical egoism and altruism
* Rule consequentialism
* Motive consequentialism
* Negative consequentialism
* Teleological ethics

Replies from: mattnewport
comment by mattnewport · 2010-06-04T02:31:09.656Z · LW(p) · GW(p)

Unfortunately the survey doesn't directly address the main distinction in the original post since utilitarianism and egoism are both forms of consequentialism.

comment by PaulAlmond · 2010-08-19T13:58:44.580Z · LW(p) · GW(p)

I think this can be dealt with in terms of measure. In a series of articles, "Minds, Measure, Substrate and Value" I have been arguing that copies cannot be considered equally, without regard to substrate: We need to take account of measure for a mind, and the way in which the mind is implemented will affect its measure. (Incidentally, some of you argued against the series: After a long delay [years!], I will be releasing Part 4, in a while, which will deal with a lot of these objections.)

Without trying to present the full argument here: the minimum size of the algorithm that can "find" a mind by examining some physical system will determine the measure of that mind - because it will give an indication of how many other algorithms will exist that can find a mind. I think an AI would come to this view too: it would have to use some concept of measure to get coherent results; otherwise it would be finding high-measure, compressed human minds woven into Microsoft Windows (they would just need a LOT of compressing...). Compressing your mind will increase the size of the algorithm needed to find it and will reduce your measure, just as running your mind on various kinds of physical substrate would. Ultimately, it comes down to this:

"Compressing your mind will have an existential cost, such existential cost depending on the degree of compression."

(Now, I just know that is going to get argued with, and the justification for it would be long. Seriously, I didn't just make it up off the top of my head.)

When Dr Evil carries out his plan, each of the trillion minds can only be found by a decompression program, and there must be at least enough bits to distinguish one copy from another. Even ignoring the "overhead" for the mechanics of the decompression algorithm itself, the bits needed to distinguish one copy from another will have an existential cost for each copy - reducing its measure. An AI doing CEV with a consistent approach will take this into account and regard each copy as not having as great a vote.
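To illustrate just the compression side of this (nothing below measures "minds" or their measure; the sizes are what matter), a small demo of why near-identical copies are cheap to store while genuinely distinct ones are not:

```python
import os
import zlib

base = os.urandom(10_000)  # stand-in for one upload's data (10 KB, within zlib's 32 KB window)
near_copies = b"".join(base + bytes([i % 256]) for i in range(100))  # 100 copies, one unique byte of "experience" each
distinct    = b"".join(os.urandom(10_000) for _ in range(100))       # 100 genuinely different "uploads"

print(len(zlib.compress(near_copies, 9)))  # far smaller than 100 copies' worth
print(len(zlib.compress(distinct, 9)))     # ~1,000,000 bytes: random, distinct data doesn't compress
```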

Another scenario, which might give some focus to all this:

What if Dr Evil decides to make one trillion identical copies and run them separately? People would disagree on whether the copies would count: I say they would and think that can be justified. However, he can now compress them and just have the one copy which "implies" the trillion. Again, issues of measure would mean that Dr Evil's plan would have problems. You could add random bits to the finding algorithm to "find" each mind, but then you are just decreasing the measure: After all, you can do that with anyone's brain.

That's compression out of the way.

Another issue is that these copies will only be almost similar, and hence capable of being compressed, as long as they aren't run for any appreciable length of time (unless you have some kind of constraint mechanism to keep them almost similar - which might be imagined - but then the AI might take that into account and not regard them as "properly formed" humans). As soon as you start running them, they will start to diverge, and compression will start to become less viable. Is the AI supposed to ignore this and look at "potential future existence for each copy"? I know someone could say that we just run them very slowly, so that while you and I have years of experience, each copy has one second of experience, so that during this time the storage requirements increase a bit, but not much. Does that second of experience get the same value in CEV? I don't pretend to answer these last questions, but the issues are there.

Replies from: PhilGoetz, jimrandomh
comment by PhilGoetz · 2011-02-06T17:04:18.591Z · LW(p) · GW(p)

That was my first reaction, but if you rely on information-theoretic measures of difference, then insane people will be weighted very heavily, while homogeneous cultures will be weighted little. The basic precepts of Judaism, Christianity, and Islam might each count as one person.

comment by jimrandomh · 2010-08-20T13:19:50.169Z · LW(p) · GW(p)

Does this imply that someone could gain measure, by finding a simpler entity with volition similar to theirs and self-modifying into it or otherwise instantiating it? If so, wouldn't that encourage people to gamble with their sanity, since verifying similarity of volition is hard, and gets harder the greater the degree of simplification?

Replies from: PaulAlmond
comment by PaulAlmond · 2010-08-21T00:30:55.493Z · LW(p) · GW(p)

I think I know what you are asking here, but I want to be sure. Could you elaborate, maybe with an example?

comment by JGWeissman · 2010-06-04T17:12:05.238Z · LW(p) · GW(p)

What if we used a two-tiered CEV? A CEV applied to a small, hand-selected group of moral philosophers could be used to determine weighting rules and ad hoc exceptions to the CEV that runs on all of humanity to determine the utility function of the FAI.

Then, when the CEV encounters the trillion Dr. Evil uploads, it will consult what the group of moral philosophers would have wanted done about it if "they knew more, thought faster, were more the people we wished we were, had grown up farther together", which would be something like weighting them together as one person.
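A structural sketch of the two-tier idea, with both extrapolation steps left as black boxes. Every name below is a placeholder invented for this thread, not anything from the CEV document; it only shows where the tier-1 rules would plug in.

```python
def two_tier_cev(panel, everyone, extrapolate_rules, extrapolate_volition):
    # Tier 1: the small, hand-selected panel's extrapolated volition is asked
    # only for meta-level rules (how to weight duplicates, uploads, etc.),
    # not for object-level values.
    rules = extrapolate_rules(panel)

    # Tier 2: the full CEV runs over all of humanity, but each subject's
    # influence is set by the tier-1 rules, so a trillion near-identical
    # uploads might end up sharing roughly one person's worth of weight.
    weights = {person: rules.weight_of(person) for person in everyone}
    return extrapolate_volition(everyone, weights)
```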

Replies from: kodos96, MugaSofer
comment by kodos96 · 2010-06-04T17:54:59.418Z · LW(p) · GW(p)

What if we used a two-tiered CEV? A CEV applied to a small, hand-selected group of moral philosophers

And who would select the initial group? Oh, I know! We can make it a 3-tiered system, and have CEV applied to an even smaller group choose the group of moral philosophers!

Wait... my spidey-sense is tingling... I think it's trying to tell me that maybe there's a problem with this plan

Replies from: JGWeissman
comment by JGWeissman · 2010-06-04T18:20:56.545Z · LW(p) · GW(p)

And who would select the initial group?

SIAI, whose researchers came up with the idea of CEV because they have the goal of representing all of humanity with the FAI they want to create.

Ultimately, you have to trust someone to make these decisions. A small, select group will be designing and implementing the FAI anyway.

Replies from: kodos96
comment by kodos96 · 2010-06-04T18:41:43.696Z · LW(p) · GW(p)

A small, select group will be designing and implementing the FAI anyway.

Yes, but a small, select group determining its moral values is a different thing entirely, and seems to defeat the whole purpose of CEV. At that point you might as well just have the small group of moral philosophers explicitly write out a "10 Commandments" style moral code and abandon CEV altogether.

Replies from: JGWeissman
comment by JGWeissman · 2010-06-04T19:07:02.859Z · LW(p) · GW(p)

Yes, but a small, select group determining its moral values is a different thing entirely

That is not what I am proposing. You are attacking a straw man. The CEV of the small group of moral philosophers does not determine the utility function directly. It only determines the rules used to run the larger CEV on all of humanity, based on what the moral philosophers consider a fair way of combining utility functions, not what they want the answer to be.

Replies from: kodos96
comment by kodos96 · 2010-06-04T20:09:55.692Z · LW(p) · GW(p)

The CEV of the small group of moral philosophers does not determine the utility function directly

That may not be the intention, but if they're empowered to create ad hoc exceptions to CEV, that could end up being the effect.

Basically, my problem is that you're proposing to fix a (possible) problem with CEV by using CEV. If there really is a problem here with CEV (and I'm not convinced there is), then that problem should be fixed - just running a meta-CEV doesn't solve it. All you're really doing is substituting one problem for another: the problem of who would be the "right" people to choose for the initial bootstrap. That's a Really Hard Problem, and if we knew how to solve it, then we wouldn't really even need CEV in the first place - we could just let the Right People choose the FAI's utility function directly.

Replies from: JGWeissman
comment by JGWeissman · 2010-06-04T21:35:04.744Z · LW(p) · GW(p)

That may not be the intention, but if they're empowered to create ad hoc exceptions to CEV, that could end up being the effect.

You seem to be imagining the subjects of the CEV acting as agents within some negotiating process, making decisions to steer the result to their preferred outcome. Consider instead that the CEV is able to ask the subjects questions, which could be about the fairness (not the impact on the final result) of treating a subject of the larger CEV in a certain way, and get honest answers. If your thinking process has a form like "This would be best for me, but that wouldn't really be fair to this other person", the CEV can focus in on the "but that wouldn't really be fair to this other person". Even better, it can ask the question "Is it fair to that other person", and figure out what your honest answer would be.

Basically, my problem is that you're proposing to fix a (possible) problem with CEV by using CEV.

No, I am trying to solve a problem with a CEV applied to an unknown set of subjects by using a CEV applied to a known set of subjects.

All you're really doing is substituting one problem for another: the problem of who would be the "right" people to choose for the initial bootstrap. That's a Really Hard Problem, and if we knew how to solve it, then we wouldn't really even need CEV in the first place - we could just let the Right People choose the FAI's utility function directly.

The problem of selecting a small group of subjects for the first CEV is orders of magnitude easier than specifying a Friendly utility function. These subjects do not have to write out the utility function, or even directly care about all the things that humanity as a whole cares about. They just have to care about the problem of fairly weighting everyone in the final CEV.

Replies from: torekp
comment by torekp · 2010-06-05T22:01:14.822Z · LW(p) · GW(p)

If your thinking process has a form like "This would be best for me, but that wouldn't really be fair to this other person", the CEV can focus in on the "but that wouldn't really be fair to this other person". Even better, it can ask the question "Is it fair to that other person", and figure out what your honest answer would be.

I think this is an even better point than you make it out to be. It obviates the need to consult the small group of subjects in the first place. It can be asked of everyone. When this question is asked of the Dr. Evil clones, the honest answer would be "I don't give a care what's fair," and the rules for the larger CEV will then be selected without any "votes" from Evil clones.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-05T22:06:55.821Z · LW(p) · GW(p)

torekp!CEV: "Is it fair to other people that Dr. Evil becomes the supreme ruler of the universe?"
Dr. Evil clone #574,837,904,521: "Yes, it is. As an actually evil person, I honestly believe it."

Replies from: D_Alex
comment by D_Alex · 2010-06-06T05:02:08.209Z · LW(p) · GW(p)

And right there is the reason why the plan would not work...!

The wishes of the evil clones would not converge on any particular Dr. Evil. You'd get a trillion separate little volitions, which would be outweighed by the COHERENT volition of the remaining 1%.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-06T05:24:03.987Z · LW(p) · GW(p)

That might be true if Dr. Evil's goal is to rule the world. But if Dr. Evil's goals are either a) for the world to be ruled by a Dr. Evil or b) to destroy the world, then this is still a problem. Both of those seem like much less likely failure modes, more like something out of a comic book (and the fact that we are calling this fellow Dr. Evil doesn't help matters), but it does suggest that there are serious general failures of the CEV protocol.

Replies from: stcredzero
comment by stcredzero · 2010-06-07T00:08:18.283Z · LW(p) · GW(p)

Both of those seem like much less likely failure modes, more like something out of a comic book (and the fact that we are calling this fellow Dr. Evil doesn't help matters)

It could be worse: the reason there are only two Sith, a master and an apprentice, is that the Force can be used to visualize the CEV of a particular group, and the Sith have mastered this and determined that 2 is the largest reliably stable population.

comment by MugaSofer · 2013-01-24T11:08:03.488Z · LW(p) · GW(p)

An excellent idea! Of course, the CEV of a small group would probably be less precise, but I expect it's good enough for determining the actual CEV procedure.

What if it turns out we're ultimately preference utilitarians?

comment by Tyrrell_McAllister · 2010-06-03T21:04:46.265Z · LW(p) · GW(p)

Insofar as the Evil clones are distinct individuals, they seem to be almost entirely potentially distinct. They will need to receive more computing resources before they can really diverge into distinct agents.

I would expect CEV to give the clones votes only to the extent that CEV gives votes to potential individuals. But the number of potential clones of normal humans is even greater than Evil's trillion, even accounting for their slightly greater actuality. So, I think that they would still be outvoted.

Replies from: JoshuaZ, Baughn
comment by JoshuaZ · 2010-06-03T21:43:54.867Z · LW(p) · GW(p)

Does a pair of identical twins raised in the same environment get marginally less weight for the CEV than two unrelated individuals raised apart? If not, how do you draw the line for what degree of distinction matters?

comment by Baughn · 2010-06-03T21:32:50.059Z · LW(p) · GW(p)

This depends on how the AI counts humans.

I agree that it should discount similar individuals in this way, but I am not entirely sure on the exact algorithm. What should happen if, for example, there are two almost identical individuals on opposite sides of the world - by accident?

comment by JoshuaZ · 2010-06-03T20:36:36.224Z · LW(p) · GW(p)

That's an interesting point, but I'm having trouble seeing it as worthy of a top-level post. Maybe if you had also proposed a solution.

Replies from: Wei_Dai, Alexandros, Kevin
comment by Wei Dai (Wei_Dai) · 2010-06-03T21:43:44.835Z · LW(p) · GW(p)

I can see why you might feel that way, if this was just a technical flaw in CEV that can be fixed with a simple patch. But I've been having a growing suspicion that the main philosophical underpinning of CEV, namely preference utilitarianism, is seriously wrong, and this story was meant to offer more evidence in that vein.

Replies from: Vladimir_Nesov, SilasBarta, CarlShulman, RomanDavis, wuwei, timtyler, Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-03T21:55:03.609Z · LW(p) · GW(p)

Why should anyone choose aggregation of preference over a personal FAI, other than under explicit pressure? Whatever obligations you feel (as part of your preference, as opposed to as part of an imaginary game where you play fair) will be paid in full according to your personal preference. This explicit pressure to include other folks in the mix can only be exerted by those present, and presumably "in the know", so there is no need to include the dead or potential future folk. Whatever sympathy you have for them, you'll have the ability to express through the personal FAI. The virtue of laziness in FAI design again (this time, moral laziness).

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-06-04T14:23:56.217Z · LW(p) · GW(p)

Why should anyone choose aggregation of preference over a personal FAI, other than under explicit pressure?

But that doesn't explain why Eliezer is vehemently against any unequal weighting of volitions in CEV, such as the "geopolitical-power-weighted set of volitions" that Roko suggested might be necessary if major political powers got involved.

As far as I can tell, Eliezer's actual motivations for wanting to build CEV of humanity instead of a personal FAI are:

  1. It's the fair thing to do (in some non-game-theoretic sense of fairness).
  2. The CEV of humanity has a better chance of leading to a good outcome than his personal extrapolated volition. (See this comment.)

Personally I don't think these reasons are particularly good, and my current position is close to yours and Roko's. But the fact that Eliezer has stuck to his beliefs on this topic makes me wonder if we're missing something.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2010-06-04T16:15:51.295Z · LW(p) · GW(p)

A given person's preference is one thing, but their mind is another. If we do have a personal preference on one hand, and a collection of many people's preferences on the other, the choice is simple. But the people included in the preference extraction procedure are not the same thing as their preferences. We use a collection of people, not a collection of preferences.

It's not obvious to me that my personal preference is best described by my own brain and not an extrapolation from as many people's brains as possible. Maybe I want to calculate, but I'm personally a flawed calculator, as are all the others, each in its own way. By examining as many calculators as possible, I could glimpse a better picture of how correct calculation is done, than I could ever find by only examining myself.

I value what is good not because humans value what is good, and I value whatever I in particular value (as opposed to what other people value) not because it is I who values that. If looking at other people's minds helps me to figure out what should be valued, then I should do that.

That's one argument for extrapolating collective volition; however, it's a simple argument, and I expect that whatever can be found from my mind alone should be enough to reliably present arguments such as this, and thus to decide to go through the investigation of other people if that's necessary to improve understanding of what I value. Whatever moral flaws specific to my mind exist shouldn't be severe enough to destroy this argument, if it's true, but the argument could also be false. If it's false, then I lose by defaulting to the collective option, but if it's true, delegating it to FAI seems like a workable plan.

At the same time, there are likely practical difficulties in getting my mind in particular accepted as the preference source for the FAI. If I can't get my preference in particular, then as close to the common ground for humanity as I can get (a decision to which as many people as possible agree as much as possible) is better for me (by its construction: if it's better for most of humanity, it's also better for me in particular).

comment by SilasBarta · 2010-06-03T21:55:56.755Z · LW(p) · GW(p)

If that was your point, I wish you had gone into more detail about that in a top-level article.

comment by CarlShulman · 2010-06-03T22:00:56.396Z · LW(p) · GW(p)

A similar problem shows up in hedonic utilitarianism, or indeed in any case where your algorithm for determining what to do requires 'counting people.'

comment by RomanDavis · 2010-06-04T03:03:19.139Z · LW(p) · GW(p)

Time inconsistency doesn't bother me at all. It's not my fault if you're dead.

comment by wuwei · 2010-06-03T23:18:33.717Z · LW(p) · GW(p)

CEV is not preference utilitarianism, or any other first-order ethical theory. Rather, preference utilitarianism is the sort of thing that might be CEV's output.

Replies from: Wei_Dai
comment by Wei Dai (Wei_Dai) · 2010-06-04T10:17:10.157Z · LW(p) · GW(p)

Obviously CEV isn't identical to preference utilitarianism, but CEV and preference utilitarianism have the following principles in common, which the hack exploits:

  • Give people what they want, instead of what you think is good for them.
  • If different people want different things, give each individual equal weight.

It seems clear that Eliezer got these ideas from preference utilitarianism, and they share some of the same flaws as a result.
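For concreteness, here is the second principle written out as code, and what duplication does to it. This is a minimal sketch in Python with invented outcome labels and a thousand copies standing in for a trillion; it is not an actual CEV algorithm, just the equal-weight-per-individual rule taken literally:

    # Equal weight per "individual", taken literally: a plurality vote over a
    # list containing one preferred outcome per person.
    from collections import Counter

    def aggregate(preferences):
        tally = Counter(preferences)
        return tally.most_common(1)[0][0]

    # Before the hack: a small, diverse stand-in population.
    humanity = ["nice_future"] * 8 + ["evil_empire"]
    print(aggregate(humanity))            # -> nice_future

    # After the hack: the same population plus a pile of near-identical uploads.
    uploads = ["evil_empire"] * 1000
    print(aggregate(humanity + uploads))  # -> evil_empire

Any rule that hands out weight per head, without a principled answer to what counts as a head, is open to exactly this move.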

comment by timtyler · 2010-06-03T23:48:23.548Z · LW(p) · GW(p)

Whether preference utilitarianism is "right" or "wrong" would appear to depend on whether you are a preference utilitarian - or not.

comment by Vladimir_Nesov · 2010-06-03T21:50:12.344Z · LW(p) · GW(p)

Well, you always have the game-theoretic mix option; nothing "seriously wrong" with that (and so with preference aggregation more broadly construed than CEV in particular), although it's necessarily a worse outcome than a personal FAI.

comment by Alexandros · 2010-06-03T21:36:01.988Z · LW(p) · GW(p)

To be honest, I like that it's straight and to the point without being stretched out. And the point itself is quite powerful.

comment by Kevin · 2010-06-03T20:47:43.855Z · LW(p) · GW(p)

I missed the part where Less Wrong had a definition of the required worthiness of a top-level post.

Replies from: Tyrrell_McAllister, RobinZ, JoshuaZ
comment by Tyrrell_McAllister · 2010-06-03T20:51:23.401Z · LW(p) · GW(p)

I missed the part where Less Wrong had a definition of the required worthiness of a top-level post.

The definition is still being established, largely through comments such as JoshuaZ's.

comment by RobinZ · 2010-06-03T20:51:11.656Z · LW(p) · GW(p)

From the About page:

Less Wrong is a partially moderated community blog that allows general authors to contribute posts as well as comments. Users vote posts and comments up and down (with code based on Reddit's open source). "Promoted" posts (appearing on the front page) are chosen by the editors on the basis of substantive new content, clear argument, good writing, popularity, and importance.

We suggest submitting links with a short description. Recommended books should have longer descriptions. Links will not be promoted unless they are truly excellent - the "promoted" posts are intended as a filtered stream for the casual/busy reader.

As far as I can tell, most people vote along the lines of the promotion criteria.

comment by JoshuaZ · 2010-06-03T21:28:16.692Z · LW(p) · GW(p)

There isn't a formal definition as of yet, but ideally I'd like to see top-level posts satisfy the following criteria:

1) Too long or involved to be included in an open thread.
2) Of general interest to the LW community.
3) Contribute substantial new and interesting points.
4) Likely to generate wide-ranging discussion.

I have trouble seeing this post as meeting 1 or 4.

Replies from: Kevin
comment by Kevin · 2010-06-03T23:01:46.041Z · LW(p) · GW(p)

People also complained about the "AI in a box boxes you" post, which was a great post nearly identical in structure to this one. Few people read the open thread; good posts should not default to the open thread. Why are your criteria for top-level posts so arbitrarily difficult? We are not facing a problem of an influx of low-quality content, and the moderation+promotion system works well.

Replies from: JoshuaZ
comment by JoshuaZ · 2010-06-04T01:31:51.878Z · LW(p) · GW(p)

My criteria for top-level posts are not "so arbitrarily difficult." Frankly, I'm not completely sure that the "AI boxes you" post should have been a top-level post either. However, given that that post did not focus on any specific AI solution but on a more general set of issues, whereas this one focuses on CEV, there may be a distinction between them. That said, I agree that as of right now the moderation/promotion system is working well. But I suspect that that is partially due to people implicitly applying criteria like the ones I listed in their moderation decisions.

Incidentally, I'm curious what evidence you have that the open threads are not read as widely as top-level posts. In particular, I'm not sure this applies to non-promoted top-level posts. I suspect that it is true, and indeed, if it isn't, then my own logic for wanting criterion 2 becomes substantially weaker. Now that you've made our shared premise explicit, I have to wonder what evidence we have for the claim.

comment by Wei Dai (Wei_Dai) · 2010-07-04T23:24:49.613Z · LW(p) · GW(p)

I just noticed that my old alter ego came up with a very similar "hack" two years ago:

Why doesn't Zaire just divide himself in half, let each half get 1/4 of the pie, then merge back together and be in possession of half of the pie?

comment by novalis · 2010-06-04T16:19:02.192Z · LW(p) · GW(p)

I think it might be possible to patch around this by weighting people by their projected future cycle count. Otherwise, I fear that you may end up with a Repugnant Conclusion even without an adversary -- a very large number of happy emulated people running very slowly would outweigh a smaller number of equally happy people running at human-brain-speed. Of course, this still gives an advantage to the views of those who can afford more computing power, but it's a smaller advantage. And perhaps our CEV would be to at least somewhat equalize the available computing power per person.
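A rough sketch of what that weighting could look like, in Python; the cycle counts and outcome labels are invented for illustration, and nothing here is a claim about how CEV would actually estimate anyone's future compute budget:

    # Weight each person's preference by projected future cycle count rather
    # than counting heads, so many slow copies no longer swamp fewer
    # full-speed people.
    from collections import defaultdict

    def weighted_aggregate(people):
        # people: a list of (preferred_outcome, projected_cycles) pairs
        totals = defaultdict(float)
        for outcome, cycles in people:
            totals[outcome] += cycles
        return max(totals, key=totals.get)

    # Illustrative numbers only: a thousand entries stand in for the trillion
    # compressed copies sharing a tiny compute budget.
    copies = [("evil_empire", 1e3)] * 1000
    humans = [("nice_future", 1e15)] * 100
    print(weighted_aggregate(copies + humans))  # -> nice_future

Under head-counting the copies win outright; under cycle-weighting the outcome tracks where the computation actually happens, which blunts the hack while still, as noted above, favoring whoever can afford the most hardware.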

Replies from: MugaSofer
comment by MugaSofer · 2013-01-24T11:10:48.041Z · LW(p) · GW(p)

Otherwise, I fear that you may end up with a Repugnant Conclusion even without an adversary -- a very large number of happy emulated people running very slowly would outweigh a smaller number of equally happy people running at human-brain-speed.

Of course, the slowest possible clock speed is ... none, or one tick per lifetime of the universe or something, so we'd all end up as frozen snapshots.

comment by [deleted] · 2013-01-25T00:30:40.968Z · LW(p) · GW(p)

Doesn't the "coherent" aspect of "coherent extrapolated volition" imply, generally speaking, that it's not democracy-of-values, so to speak? That is to say, CEV of humanity is supposed to output something that follows on from the extrapolated values of both the guy on the street corner holding a "Death to all fags" sign who's been arrested twice for assaulting gay men outside bars, and the queer fellow walking past him -- if prior to the implementation of a CEV-using FAI the former should successfully mobilize raise the visibility of his standpoint so much that the world becomes very polarized with many openly-homophobic people and a small number of people who aren't, FAI won't then execute all the queers because the majority wanted it. AFAIK that's pretty fundamental to the definition of the term...

Replies from: MugaSofer
comment by MugaSofer · 2013-01-25T11:22:56.793Z · LW(p) · GW(p)

Presumably the sign guy based his hatred on a mistaken belief (e.g. "God is always right and he told me gays are Evil"). Dr Evil was implied, I think, to have different terminal values; if he didn't, then CEV would be fine with him, and it would also ruin the appropriateness of his name.

Replies from: None
comment by [deleted] · 2013-01-25T15:58:13.074Z · LW(p) · GW(p)

That covers "Extrapolated", not "coherent", though. If Dr Evil really has supervillain terminal values, that still doesn't cohere with the many humans who don't.

Replies from: Elithrion, MugaSofer
comment by Elithrion · 2013-01-30T23:17:31.915Z · LW(p) · GW(p)

I think you would find that there is more coherence among the 99% of humans who are Dr. Evil than among the 1% of humans who are not.

comment by MugaSofer · 2013-01-28T11:53:52.991Z · LW(p) · GW(p)

Well, psychopaths don't share our moral terminal values, and I would still expect them to get shouted down. Dr Evil's clones outnumber us. I guess it comes down to how small a minority still holds human values, doesn't it?

Replies from: TheOtherDave, Andreas_Giger
comment by TheOtherDave · 2013-01-28T14:14:12.283Z · LW(p) · GW(p)

psychopaths don't share our moral terminal values

You know, I keep hearing this said on LW as though it were a foregone conclusion. Is there an argument you can point me to that makes the case for believing this?

Replies from: MugaSofer
comment by MugaSofer · 2013-02-19T14:36:33.602Z · LW(p) · GW(p)

My (faulty) memory claims it's from an interview with a psychopath I saw on TV (he had been working on a project to help identify violent psychopaths, and had been unaware of his condition until he tested himself as a control). He described being aware of what other people considered "right" or "moral", but feeling no particular motivation towards it. His example was buying ice cream instead of going to his grandmother's funeral, as I recall.

However, I also recall feeling confirmation, not surprise, on watching this interview, so I probably have source amnesia on this one. Still, data point.

It's worth noting that your classic "serial killer" almost certainly has other issues in any case.

Replies from: TheOtherDave, ArisKatsaris
comment by TheOtherDave · 2013-02-19T23:31:27.451Z · LW(p) · GW(p)

Hm.
I infer you aren't asserting that going to one's grandmother's funeral rather than buying ice cream is a moral terminal value for non-psychopaths, but rather that there's some moral terminal value implicit in that example which the psychopath in question demonstrably doesn't share but the rest of us do.
Is that right?
If so, can you say more about how you arrive at that conclusion?

Replies from: MugaSofer
comment by MugaSofer · 2013-02-20T10:53:15.089Z · LW(p) · GW(p)

Well, it was his example. The idea is that they can model our terminal values (as well as anybody else can) but they aren't moved by them. Just like I can imagine a paperclipper that would cheerfully render down humans for the iron in our blood, but I'm not especially inclined to emulate it.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-02-20T14:13:23.454Z · LW(p) · GW(p)

I still don't see how you get from observing someone describing not being moved by the same surface-level social obligations as their peers (e.g., attending grandma's funeral) to the conclusion that that person doesn't share the same moral terminal values as their peers, but leaving that aside, I agree that someone doesn't need to be moved by a value in order to model it.

Replies from: MugaSofer
comment by MugaSofer · 2013-02-21T17:28:50.660Z · LW(p) · GW(p)

Oh, it was only an example; he described his experience in much more detail. I guess he didn't want to use a more, well, disturbing example; he had been studying violent psychopaths, after all. (He also claimed his murderous predispositions had probably been curbed by a superlative home life.)

comment by ArisKatsaris · 2013-02-19T23:44:36.545Z · LW(p) · GW(p)

Wait a sec, there are three different claims that seem to have gotten confused in this thread.

  • Psychopaths don't have a moral sense as we'd recognize it.
  • Psychopaths have a different moral sense than most people.
  • Psychopaths have pretty much the same moral sense as us, but it doesn't drive them nearly as much as most people.

The difference in the above is between the absence of a moral push, a moral push in a different direction, or a push in the same direction but feebler than is felt by most people.

And I think distinguishing between the three is probably significant in discussions about CEV...

Replies from: MugaSofer
comment by MugaSofer · 2013-02-20T09:52:43.771Z · LW(p) · GW(p)

I think that knowing what people mean by "right" and actually having it as a terminal value are different things, but I'm not sure if 3 means regular garden-variety akrasia or simply terminal values with a different weighting from our own.

comment by Andreas_Giger · 2013-01-28T13:48:04.693Z · LW(p) · GW(p)

If Dr Evil's clones outnumber us, clearly they're the ones who hold human values and we're the psychopaths.

Replies from: MugaSofer
comment by MugaSofer · 2013-02-19T14:33:16.938Z · LW(p) · GW(p)

In which case it would be nice to find a way to make sure our values are treated as "human" and his as "psychopath", wouldn't it?

comment by wedrifid · 2010-08-11T12:40:01.916Z · LW(p) · GW(p)

Good post. It seems I missed it; I probably wasn't around at that time in June. That is one extreme example of a whole range of problems with blindly implementing a Coherent Extrapolated Volition of that kind.

comment by Epiphany · 2012-08-31T06:37:24.305Z · LW(p) · GW(p)

And then Dr. Evil, forced to compete with 999,999,999,999 copies of himself that all want to rule, is back to square one. Would you see multiplying your competition by 999,999,999,999 as a solution to how to rule the universe? If you were as selfish as Dr. Evil, and intelligent enough to attempt to take over the universe, wouldn't it occur to you that the copies are all going to want to be the one in charge? Perhaps it won't, but if it did, would you try multiplying your competition then? If not, then maybe part of the solution to this is making it common knowledge that multiplying your competition by nearly a trillion isn't going to gain you any power.

That wouldn't keep trolls and idiots from trying it, though. And even though they'd be divided against themselves once they were there, which would probably make it impossible for any one of them to rule, that doesn't mean they wouldn't do heinous things like vote their way into some sort of preferred class. I just don't think a trillion copies of a selfish person would result in the original person ruling, is all.

Replies from: MugaSofer
comment by MugaSofer · 2013-01-24T10:57:23.308Z · LW(p) · GW(p)

Assume the least convenient possible world; the copies are united against the noncopies.

Replies from: wedrifid
comment by wedrifid · 2013-01-25T03:52:44.230Z · LW(p) · GW(p)

Assume the least convenient possible world; the copies are united against the noncopies.

That is the most convenient possible world from the perspective of making the grandparent's point. Was that your intention?

The least convenient possible world would have all the copies united in favor of the original Dr. Evil.

The world that I infer is imagined in the original post (also fairly inconvenient to the grandparent) is one in which Dr. Evil wants the universe to be shaped according to his will rather than wanting to be the one who shapes the universe according to his will. This is a world in which having more copies of himself ensures that he does get his way more and so he has an incentive to do it.

Replies from: MugaSofer
comment by MugaSofer · 2013-01-25T09:03:26.785Z · LW(p) · GW(p)

That is the most convenient possible world from the perspective of making the grandparent's point. Was that your intention?

Well ... yeah. The least convenient world for avoiding the point.

Hey, I didn't name it. shrugs

The least convenient possible world would have all the copies united in favor of the original Dr. Evil.

That is less convenient.

The world that I infer is imagined in the original post (also fairly inconvenient to the grandparent) is one in which Dr. Evil wants the universe to be shaped according to his will rather than wanting to be the one who shapes the universe according to his will. This is a world in which having more copies of himself ensures that he does get his way more and so he has an incentive to do it.

Yup, that's what I meant by "the copies are united".

comment by Jonii · 2010-06-03T21:43:30.253Z · LW(p) · GW(p)

But of course an AI realizes that satisfying the will of a trillion copies of Dr. Evil wasn't what its programmers intended.

The pun being that this legendarily bad argument is surprisingly strong here. I know, I shouldn't be explaining my jokes.

Replies from: Unknowns
comment by Unknowns · 2010-06-04T01:55:08.507Z · LW(p) · GW(p)

Of course, the AI realizes that its programmers did not want it doing what the programmers intended, but what the CEV intended instead, so this response fails completely.