Holden's Objection 1: Friendliness is dangerous

post by PhilGoetz · 2012-05-18T00:48:14.898Z · LW · GW · Legacy · 431 comments

Contents

  The concept of "human values" cannot be defined in the way that FAI presupposes
  Even if human values existed, it would be pointless to preserve them
  Enforcing human values would be harmful

Nick_Beckstead asked me to link to posts I referred to in this comment.  I should put up or shut up, so here's an attempt to give an organized overview of them.

Since I wrote these, LukeProg has begun tackling some related issues.  He has accomplished the seemingly impossible task of writing many long, substantive posts, none of which I recall disagreeing with.  And I have, irrationally, not read most of his posts.  So he may have dealt with more of these same issues.

I think that I only raised Holden's "objection 2" in comments, which I couldn't easily dig up; and in a critique of a book chapter, which I emailed to LukeProg and did not post to LessWrong.  So I'm only going to talk about "Objection 1:  It seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous."  I've arranged my previous posts and comments on this point into categories.  (Much of what I've said on the topic has been in comments on LessWrong and Overcoming Bias, and in email lists including SL4, and isn't here.)

 

The concept of "human values" cannot be defined in the way that FAI presupposes

Human errors, human values:  Suppose all humans shared an identical set of values, preferences, and biases.  We cannot retain human values without retaining human errors, because there is no principled distinction between them.

A comment on this post:  There are at least three distinct levels of human values:  the values an evolutionary agent holds that maximize its reproductive fitness, the values a society holds that maximize its fitness, and the values a rational optimizer holds who has chosen to maximize social utility.  They often conflict.  Which of them are the real human values?

Values vs. parameters:  Eliezer has suggested using human values, but without time discounting (= changing the time-discounting parameter).  CEV presupposes that we can abstract human values and apply them in a different situation that has different parameters.  But the parameters are values.  There is no distinction between parameters and values.

A comment on "Incremental progress and the valley":  The "values" that our brains try to maximize in the short run are designed to maximize different values for our bodies in the long run.  Which are human values:  The motivations we feel, or the effects they have in the long term?  LukeProg's post Do Humans Want Things? makes a related point.

Group selection update:  The reason I harp on group selection, besides my outrage at the way it's been treated for the past 50 years, is that group selection implies that some human values evolved at the group level, not at the level of the individual.  This means that increasing the rationality of individuals may enable people to act more effectively in their own interests, rather than in the group's interest, and thus diminish the degree to which humans embody human values.  Identifying the values embodied in individual humans - supposing we could do so - would still not arrive at human values.  Transferring human values to a post-human world, which might contain groups at many different levels of a hierarchy, would be problematic.

I wanted to write about my opinion that human values can't be divided into final values and instrumental values, the way discussion of FAI presumes they can.  This is an idea that comes from mathematics, symbolic logic, and classical AI.  A symbolic approach would probably make proving safety easier.  But human brains don't work that way.  You can and do change your values over time, because you don't really have terminal values.

Strictly speaking, it is impossible for an agent whose goals are all indexical goals describing states involving itself to have preferences about a situation in which it does not exist.  Those of you who are operating under the assumption that we are maximizing a utility function with evolved terminal goals should, I think, admit that these terminal goals all involve either ourselves or our genes.  If they involve ourselves, then utility functions based on these goals cannot even be computed once we die.  If they involve our genes, then they are goals that our bodies are pursuing, which we, the conscious agents inside our bodies, call errors rather than goals when we evaluate them.  In either case, there is no logical reason for us to wish to maximize some utility function based on these after our own deaths.  Any action I wish to take regarding the distant future necessarily presupposes that the entire SIAI approach to goals is wrong.

My view, under which it does make sense for me to say I have preferences about the distant future, is that my mind has learned "values" that are not symbols, but analog numbers distributed among neurons.  As described in "Only humans can have human values", these values do not exist in a hierarchy with some at the bottom and some on the top, but in a recurrent network which does not have a top or a bottom, because the different parts of the network developed simultaneously.  These values therefore can't be categorized into instrumental or terminal.  They can include very abstract values that don't need to refer specifically to me, because other values elsewhere in the network do refer to me, and this will ensure that actions I finally execute incorporating those values are also influenced by my other values that do talk about me.

Even if human values existed, it would be pointless to preserve them

Only humans can have human values:


Human values differ as much as values can differ:  There are two fundamentally different categories of values: mutually-satisfiable (non-positional) values, which everyone can have satisfied at once, and non-mutually-satisfiable (positional) values, which are about relative standing and so can be satisfied only at someone else's expense.

All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict.  If you found an alien life form from a distant galaxy with non-positional values, it would be easier to integrate those values into a human culture containing only the non-positional human values than to integrate the already-existing positional human values into that culture.
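
As a minimal sketch of the distinction (hypothetical code, not from the post): a non-positional value depends only on the agent's own situation, so everyone can score well at once, while a positional value depends on relative standing, so it cannot be raised for everyone simultaneously.

    def non_positional_utility(own_amount):
        # depends only on the agent's own situation; satisfiable for everyone at once
        return own_amount

    def positional_utility(own_amount, all_amounts):
        # depends on rank: the fraction of agents one out-ranks.  Averaged over
        # all agents this stays roughly fixed, so it cannot rise for everyone at once.
        return sum(own_amount > a for a in all_amounts) / len(all_amounts)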

It appears that some humans have mainly the one type, while other humans have mainly the other type.  So talking about trying to preserve human values is pointless - the values held by different humans have already passed the most-important point of divergence.

 

Enforcing human values would be harmful

The human problem:  This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving.  This is the most-important objection of all.

Re-reading this, I see that the critical paragraph is painfully obscure, as if written by Kant; but it summarizes the argument: "Once the initial symbol set has been chosen, the semantics must be set in stone for the judging function to be "safe" for preserving value; this means that any new symbols must be defined completely in terms of already-existing symbols.  Because fine-grained sensory information has been lost, new developments in consciousness might not be detectable in the symbolic representation after the abstraction process.  If they are detectable via statistical correlations between existing concepts, they will be difficult to reify parsimoniously as a composite of existing symbols.  Not using a theory of phenomenology means that no effort is being made to look for such new developments, making their detection and reification even more unlikely.  And an evaluation based on already-developed values and qualia means that even if they could be found, new ones would not improve the score.  Competition for high scores on the existing function, plus lack of selection for components orthogonal to that function, will ensure that no such new developments last."

Averaging value systems is worse than choosing one:  This describes a neural network that encodes preferences and, given an input pattern, computes a new pattern that optimizes those preferences.  Such a system is taken as an analogue of a value system together with an ethical system for attaining those values.  I then define a measure of the internal conflict produced by a set of values, and show that a system built by averaging together the parameters of many different systems will have higher internal conflict than any of the systems that were averaged to produce it.  The point is that the CEV plan of "averaging together" human values will result in a set of values that is worse (more self-contradictory) than any of the value systems it was derived from.
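
Here is a toy version of that comparison, offered as a sketch rather than the original post's actual model: treat each value system as a Hopfield-style network whose weights encode its preferred patterns, take "internal conflict" to be the energy of the state the network settles into (more negative = less conflict), and compare the individual systems with their parameter-wise average.

    # Toy illustration only (not the model from the original post).
    import numpy as np

    rng = np.random.default_rng(0)
    N = 40   # number of "value" units
    K = 5    # number of individual value systems

    def hopfield_weights(patterns):
        # Hebbian weights encoding a set of preferred +/-1 patterns.
        W = sum(np.outer(p, p) for p in patterns) / len(patterns)
        np.fill_diagonal(W, 0.0)
        return W

    def settle(W, steps=400):
        # Asynchronously settle to a low-energy (low-conflict) state.
        s = rng.choice([-1.0, 1.0], size=N)
        for _ in range(steps):
            i = rng.integers(N)
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
        return s

    def conflict(W):
        # Internal conflict = energy of the settled state (more negative = less conflict).
        s = settle(W)
        return -0.5 * s @ W @ s

    systems = [hopfield_weights(rng.choice([-1.0, 1.0], size=(3, N))) for _ in range(K)]
    averaged = sum(systems) / K

    print("individual conflicts:", [round(conflict(W), 2) for W in systems])
    print("averaged-system conflict:", round(conflict(averaged), 2))

With incompatible stored patterns, the averaged weights typically cannot settle as deeply as any of the individual systems, which is the shape of the claim being made.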


A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies.  These are not incompletely-extrapolated values that will change with more information; they are values.  Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry.  Many human values horrify most people on this list, so they shouldn't be trying to preserve them.

431 comments


comment by RolfAndreassen · 2012-05-18T03:58:18.615Z · LW(p) · GW(p)

The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.

Better by which set of, ahem, values? And anyway, if evolution of values is a value, then maximising overall value will by construction take that into account.

Replies from: PhilGoetz
comment by PhilGoetz · 2012-05-18T04:16:06.454Z · LW(p) · GW(p)

Yes, I object less to CEV if you go one or two levels meta. But if evolution of values is your core value, you find that it's pretty hard to do better than to just not interfere except to keep the ecosystem from collapsing. See John Holland's book and its theorems showing that an evolutionary algorithm as described does optimal search.

Replies from: Wei_Dai, RolfAndreassen, Armok_GoB, gRR
comment by Wei Dai (Wei_Dai) · 2012-05-18T10:34:45.631Z · LW(p) · GW(p)

Presumably, values will evolve differently depending on future contingencies. For example, a future with a world government that imposes universal birth control to limit population growth would probably evolve different values compared to a future that has no such global Singleton. Do you agree, and if so do you think the values evolved in different possible futures are all equivalent as far as you are concerned? If not, what criteria are you using to judge between them?

ETA: Can you explain John Holland's theorems, or at least link to the book you're talking about (Wikipedia says he wrote three). If you think allowing values to evolve is the right thing to do, I'm surprised you haven't put more effort into making a case for it, as opposed to just criticizing SI's plan.

Replies from: timtyler, RolfAndreassen
comment by timtyler · 2012-05-18T23:58:25.227Z · LW(p) · GW(p)

Probably Adaptation in Natural and Artificial Systems. Here's Holland's most famous theorem in the area. It doesn't suggest genetic algorithms make for some kind of optimal search - indeed, classical genetic algorithms are a pretty stupid sort of search.
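
For reference, a common textbook statement of the schema theorem (given here from the standard formulation, so treat the exact bookkeeping as approximate rather than as Holland's precise wording) is

    \mathbb{E}[m(H, t+1)] \;\ge\; m(H, t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right]

where m(H, t) is the number of instances of schema H in the population at generation t, f(H) the mean fitness of those instances, \bar{f} the population's mean fitness, p_c and p_m the crossover and mutation rates, δ(H) the schema's defining length, o(H) its order, and l the string length.  It is a lower bound on the expected growth of a schema over one generation, not an optimality result.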

Replies from: PhilGoetz
comment by PhilGoetz · 2012-07-02T01:17:27.522Z · LW(p) · GW(p)

That is the book.  I'm referring to the entire contents of chapters 5-7.  The schema theorem is used in chapter 7, but it's only part of the entire argument, which does show that genetic algorithms approach an optimal distribution of trials among the different possibilities, for a specific definition of optimal, which is not easy to parse out of Holland's book because he never gives an overview or decent summary of what he is doing.  It doesn't say anything about forms of search that proceed other than by taking a big set of possible answers, which give stochastic results when tested, and allocating trials among them.
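
To make the setting concrete, here is a toy version of the trial-allocation problem described above: a set of possibilities whose payoffs are stochastic, and a fixed budget of trials to divide among them.  This is an illustration of the setting only, not Holland's analysis or his definition of optimality; the "favor the observed best" allocator is a crude epsilon-greedy stand-in.

    # Toy sketch of allocating trials among stochastic possibilities.
    import random

    random.seed(1)
    TRUE_MEANS = [0.3, 0.5, 0.7]   # hidden payoff probabilities of the possibilities
    BUDGET = 300                   # total number of trials to allocate

    def run(allocator):
        counts = [1] * len(TRUE_MEANS)                        # one forced trial of each
        wins = [int(random.random() < m) for m in TRUE_MEANS]
        total = sum(wins)
        for _ in range(BUDGET - len(TRUE_MEANS)):
            i = allocator(counts, wins)
            payoff = int(random.random() < TRUE_MEANS[i])
            counts[i] += 1
            wins[i] += payoff
            total += payoff
        return total

    def uniform(counts, wins):
        # spread trials evenly, ignoring the evidence gathered so far
        return random.randrange(len(counts))

    def favor_observed_best(counts, wins, eps=0.1):
        # mostly exploit the possibility with the best observed payoff rate
        if random.random() < eps:
            return random.randrange(len(counts))
        rates = [w / c for w, c in zip(wins, counts)]
        return rates.index(max(rates))

    print("uniform allocation payoff:", run(uniform))
    print("favor-observed-best payoff:", run(favor_observed_best))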

comment by RolfAndreassen · 2012-05-18T18:06:44.116Z · LW(p) · GW(p)

CEV is not any old set of evolved values. It is the optimal set of evolved values; the set you get when everything goes exactly right. Of your two proposed futures, one of them is a better approximation to this than the other; I just can't say which one, at this time, because of lack of computational power. That's what we want a FAI for. :)

Replies from: Wei_Dai, TheOtherDave
comment by Wei Dai (Wei_Dai) · 2012-05-18T18:31:28.819Z · LW(p) · GW(p)

Instead of pushing Phil to accept the entirety of your position at once, it seems better to introduce some doubt first: Is it really very hard to do better than to just not interfere? If I have other values besides evolution, should I give them up so quickly?

Also, if Phil has already thought a lot about these questions and thinks he is justified in being pretty certain about his answers, then I'd be genuinely curious what his reasons are.

Replies from: RolfAndreassen
comment by RolfAndreassen · 2012-05-18T20:06:29.743Z · LW(p) · GW(p)

I misread the nesting, and responded as though your comment were a critique of CEV, rather than Phil's objection to CEV. So I talked a bit past you.

comment by TheOtherDave · 2012-05-18T18:15:06.990Z · LW(p) · GW(p)

But you're evading Wei_Dai's question here.

What criteria does the CEV-calculator use to choose among those options? I agree that significant computational power is also required, but it's not sufficient.

Replies from: RolfAndreassen
comment by RolfAndreassen · 2012-05-18T20:09:17.331Z · LW(p) · GW(p)

If we were able to formally specify the algorithm by which a CEV calculator should extrapolate our values, we would already have solved the Friendliness problem; your query is FAI-complete. But informally, we can say that the CEV evaluates by whatever values it has at a given step in its algorithm, and that the initial values are the ones held by the programmers.
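
One way to picture that informal description is the loop below.  This is purely a cartoon of the bootstrapping idea in this comment, not anything from the CEV document, and every name in it is hypothetical.

    def extrapolate(initial_values, propose_refinements, evaluate, steps=10):
        # Cartoon only: at each step, candidate "improved" value sets are judged
        # by the values held at that step, and the loop is seeded with the
        # values held by the programmers.
        values = initial_values
        for _ in range(steps):
            candidates = propose_refinements(values)   # "knew more, thought faster, ..."
            current = values                           # judge by the values held now
            values = max(candidates, key=lambda v: evaluate(current, v))
        return values

The only point of the cartoon is that the judging criterion is itself the thing being revised at each step, which is why the initial values (and the proposal step) do real work.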

Replies from: DanArmak, TheOtherDave
comment by DanArmak · 2012-05-19T15:45:09.573Z · LW(p) · GW(p)

The problem with this kind of reasoning (as the OP makes plain) is that there's no good reason to think such CEV maximization is even logically possible. Not only do we not have a solution, we don't have a well-defined problem.

comment by TheOtherDave · 2012-05-18T21:10:41.089Z · LW(p) · GW(p)

(nods) Fair enough. I don't especially endorse that, but at least it's cogent.

comment by RolfAndreassen · 2012-05-18T18:04:47.886Z · LW(p) · GW(p)

The whole point of CEV is that it goes as many levels meta as necessary! And the other whole point of CEV is that it is better at coming up with strategies than you are.

Replies from: PhilGoetz
comment by PhilGoetz · 2012-07-02T01:23:07.407Z · LW(p) · GW(p)

Please explain either one of your claims. For the first, show me where something Eliezer has written indicates CEV has some notion of how meta it is going, or how meta it "should" go, or anything at all relating to your claim. The second appears to merely be a claim that CEV is effective, so its use in any argument can only be presuming your conclusion.

Replies from: RolfAndreassen
comment by RolfAndreassen · 2012-07-02T04:41:57.223Z · LW(p) · GW(p)

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

My emphasis. Or to paraphrase, "as meta as we require."

Replies from: PhilGoetz
comment by PhilGoetz · 2012-08-26T22:07:33.327Z · LW(p) · GW(p)

Writing "I define my algorithm for problem X to be that algorithm which solves problem X" is unhelpful. Quoting said definition, doubly so.

In any case, the passage you quote says nothing about how meta to go. There's nothing meta in that entire passage.

comment by Armok_GoB · 2012-05-20T00:07:53.572Z · LW(p) · GW(p)

CEV goes infinite levels meta, that's what the "extrapolated" part means.

Replies from: CronoDAS, PhilGoetz
comment by CronoDAS · 2012-05-21T05:33:49.063Z · LW(p) · GW(p)

Countably infinite levels or uncountably infinite levels? ;)

Replies from: Armok_GoB
comment by Armok_GoB · 2012-05-21T20:29:25.770Z · LW(p) · GW(p)

Countably I think, since computing power is presumably finite so the infinity argument relies on the series being convergent.

comment by PhilGoetz · 2012-07-02T01:21:12.734Z · LW(p) · GW(p)

No, that isn't what the "extrapolated" part means. The "extrapolated" part means closure and consistency over inference. This says nothing at all about the level of abstraction used for setting goals.

comment by gRR · 2012-05-18T13:07:37.686Z · LW(p) · GW(p)

it's pretty hard to do better than to just not interfere except to keep the ecosystem from collapsing

Isn't this exactly what we wish FAI to do - interfere the least while keeping everything alive?

Replies from: thomblake
comment by thomblake · 2012-05-18T13:49:03.150Z · LW(p) · GW(p)

Isn't this exactly what we wish FAI to do - interfere the least while keeping everything alive?

Almost certainly not. We'd have massive overpopulation in no time. I remember someone doing this analysis; I think it was insects covering the Earth within days.

Replies from: gRR
comment by gRR · 2012-05-18T14:25:13.599Z · LW(p) · GW(p)

"Interfering the least" implies no massive overpopulation. Please don't read "keeping everything alive" too literally. It doesn't mean no creature ever dies.

comment by [deleted] · 2012-05-18T01:33:14.425Z · LW(p) · GW(p)

"A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry. Many human values horrify most people on this list, so they shouldn't be trying to preserve them."

This has always been my principal objection to CEV. I strongly suspect that were it implemented, it would want the death of a lot of my friends, and quite possibly me, too.

Replies from: CronoDAS, TimS, 4hodmt, dlthomas, drnickbone
comment by CronoDAS · 2012-05-20T22:41:46.112Z · LW(p) · GW(p)

Regarding CEV: My own worry is that lots of parts of human value get washed out as "incoherent" - whatever X is, if it isn't a basic human biological drive, there are enough people out there that have different opinions on it to make CEV throw up its hands, declare it an "incoherent" desire, and proceed to leave it unsatisfied. As a result, CEV ends up saying that the best we can do is just make everyone a wirehead because pleasure is one of our few universal coherent desires while things like "self-determination" and "actual achievement in the real world" are a real mess to provide and barely make sense in the first place. Or something like that.

(Universal wireheading - with robots taking care of human bodies - at least serves as a lower bound on any proposed utopia; people, in general, really do want pleasure, even if they also want other things. See also "Reedspacer's Lower Bound".)

Replies from: steven0461
comment by steven0461 · 2012-05-20T23:01:25.677Z · LW(p) · GW(p)

I would like to see more discussion on the question of how we should distinguish between 1) things we value even at the expense of pleasure, and 2) things we mistakenly alieve are more pleasurable than pleasure.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-20T23:27:06.081Z · LW(p) · GW(p)

Surely if there is something I will give up pleasure for, which I do not experience as pleasurable, that's strong evidence that it is an example of 1 and not 2?

Replies from: steven0461
comment by steven0461 · 2012-05-20T23:32:07.822Z · LW(p) · GW(p)

Yes, but there are other cases. If you prefer eating a cookie to having the pleasure centers in your brain maximally stimulated, are you sure that's not because eating a cookie sounds on some level like it would be more pleasurable?

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-21T02:23:49.150Z · LW(p) · GW(p)

I'm not sure how I could ever be sure of such a thing, but it certainly seems implausible to me.

comment by TimS · 2012-05-18T02:16:05.981Z · LW(p) · GW(p)

That's a little unfair to the concept of CEV. If irreconcilable value conflicts persist after coherent extrapolation, I would think that a CEV function would output nothing, rather than using majoritarian analysis to resolve the conflict.

Replies from: None, tut
comment by [deleted] · 2012-05-18T14:55:19.362Z · LW(p) · GW(p)

Then since there is not one single value about which every single human being on the planet can agree, a CEV function would output nothing at all.

Replies from: thomblake
comment by thomblake · 2012-05-18T14:58:16.021Z · LW(p) · GW(p)

If irreconcilable value conflicts persist after coherent extrapolation

Then since there is not one single value about which every single human being on the planet can agree

Tense confusion.

Replies from: None
comment by [deleted] · 2012-05-18T15:13:45.007Z · LW(p) · GW(p)

CEV is supposed to preserve those things that people value, and would continue to value were they more intelligent and better informed. I value the lives of my friends. Many other people value the death of people like my friends. There is no reason to think that this is because they are less intelligent or less well-informed than me, as opposed to actually having different preferences. TimS claimed that in a situation like that, CEV would do nothing, rather than impose the extrapolated will of the majority.

My claim is that there is nothing -- not one single thing -- which would be a value held by every person in the world, even were they more intelligent and better informed. An intelligent, informed psychopath has utterly different values from mine, and will continue to have utterly different values upon reflection. The CEV therefore either has to impose the majority preferences upon the minority, or do nothing at all.

Replies from: thomblake, DanArmak, TheOtherDave
comment by thomblake · 2012-05-18T15:22:28.658Z · LW(p) · GW(p)

There is no reason to think that this is because they are less intelligent or less well-informed than me, as opposed to actually having different preferences.

There are lots of reasons to think so. For example, they might want the death of your friends because they mistakenly believe that a deity exists.

Replies from: None
comment by [deleted] · 2012-05-18T15:36:51.156Z · LW(p) · GW(p)

Or for any number of other, non-religious reasons. And it could well be that extrapolating those people's preferences would lead, not to them rejecting their beliefs, but to them wishing to bring their god into existence.

Either people have fundamentally different, irreconcilable, values or they don't. If they do, then the argument I made is valid. If they don't, then CEV(any random person) will give exactly the same result as CEV(humanity).

That means that either calculating CEV(humanity) is an unnecessary inefficiency, or CEV(humanity) will do nothing at all, or CEV(humanity) would lead to a world that is intolerable for at least some minority of people. I actually doubt that any of the people from the SI would disagree with that (remember the torture vs flyspecks argument).

That may be considered a reasonable tradeoff by the developers of an "F"AI, but it gives those minority groups to whom the post-AI world would be inimical equally rational reasons to oppose such a development.

Replies from: TimS, thomblake
comment by TimS · 2012-05-18T16:11:29.297Z · LW(p) · GW(p)

As someone who does not believe in moral realism, I agree that CEV over all humans who ever lived (excluding sociopaths and such) will not output anything.

But I think that a moral realist should believe that CEV will output some value system, and that the produced value system will be right.

In short, I think one's belief about whether CEV will output something is isomorphic to whether one believes in [moral realism] (plato.stanford.edu/entries/moral-realism/).

Edit: link didn't work, so separated it out.

Replies from: army1987, None
comment by A1987dM (army1987) · 2012-05-18T19:17:59.564Z · LW(p) · GW(p)

Edit: link didn't work, so separated it out.

Have you tried putting http:// in front of the URL?

(Edit: the backtick thing to show verbatim code isn't working properly for some reason, but you know what I mean.)

Replies from: TimS
comment by TimS · 2012-05-18T19:22:08.077Z · LW(p) · GW(p)

moral realism.

Edit: Apparently that was the problem. Thanks.

Edit2: It appears that copying and pasting from some places includes "http" even when my browser address doesn't. But I did something wrong when copying from the philosophy dictionary.

comment by [deleted] · 2012-05-18T16:22:41.368Z · LW(p) · GW(p)

I agree -- assuming that CEV didn't impose a majority view on a minority. My understanding of the SI's arguments (and it's only my understanding) is that they believe it will impose a majority view on a minority, but that they think that would be the right thing to do -- that if the choice were between 3^^^3 people getting a dustspeck in the eye or one person getting tortured for fifty years, the FAI would always make a choice, and that choice would be for the torture rather than the dustspecks.

Now, this may well be, overall, the rational choice to make as far as humanity as a whole goes, but it would most definitely not be the rational choice for the person who was getting tortured to support it.

And since, as far as I can see, most people only value a very small subset of humanity who identify as belonging to the same groups as them, I strongly suspect that in the utilitarian calculations of a "friendly" AI programmed with CEV, they would end up in the getting-tortured group, rather than the avoiding-dustspecks one.

Replies from: TheOtherDave, TimS
comment by TheOtherDave · 2012-05-18T16:33:15.210Z · LW(p) · GW(p)

but it would most definitely not be the rational choice for the person who was getting tortured to support it.

This is not clear.

comment by TimS · 2012-05-18T16:39:56.495Z · LW(p) · GW(p)

that if the choice were beween 3^^^3 people getting a dustspeck in the eye or one person getting tortured for fifty years, the FAI would always make a choice, and that choice would be for the torture rather than the dustspecks

That is an entirely separate issue. If CEV(everyone) outputted a moral theory that held utility was additive, then the AI implementing it would choose torture over specks. In other words, utilitarians are committed to believing that specks is the wrong choice.

But there is no guarantee that CEV will output a utilitarian theory, even if you believe it will output something. SI (Eliezer, at least) believes CEV will output a utilitarian theory because SI believes utilitarian theories are right. But everyone agrees that "whether CEV will output something" is a different issue than "what CEV will output."

Personally, I suspect CEV(everyone in the United States) would output something deontological - and might even output something that would pick specks. Again, assuming it outputs anything.

comment by thomblake · 2012-05-18T16:45:20.290Z · LW(p) · GW(p)

Either people have fundamentally different, irreconcilable, values or they don't. If they do, then the argument I made is valid. If they don't, then CEV(any random person) will give exactly the same result as CEV(humanity).

This is a false dilemma. If people have some values that are the same or reconcilable, then you will get different output from CEV(any random person) and CEV(humanity).

And note that an actual move by virtue ethicists is to exclude sociopaths from "humanity".

comment by DanArmak · 2012-05-19T16:14:17.727Z · LW(p) · GW(p)

TimS claimed that in a situation like that, CEV would do nothing, rather than impose the extrapolated will of the majority.

I agree with you in general, and want to further point out that there is no such thing as "doing nothing". If doing nothing tends to allow your friends to continue living (because they have the power to defend themselves in the status quo), that is favoring their values. If doing nothing tends to allow your friends to be killed (because they are a powerless, persecuted minority in the status quo) that is favoring the other people's values.

comment by TheOtherDave · 2012-05-18T15:40:04.282Z · LW(p) · GW(p)

Of course, a lot depends on what we're willing to consider a minority as opposed to something outside the set of things being considered at all.

E.g., I'm in a discussion elsethread with someone who I think would argue that if we ran CEV on the set of things capable of moral judgments, it would not include psychopaths in the first place, because psychopaths are incapable of moral judgments.

I disagree with this on several levels, but my point is simply that there's an implicit assumption in your argument that terms like "person" have shared referents in this context, and I'm not sure they do.

Replies from: None
comment by [deleted] · 2012-05-18T15:59:04.579Z · LW(p) · GW(p)

In which case we wouldn't be talking about CEV(humanity) but CEV(that subset of humanity which already share our values), where "our values" in this case includes excluding a load of people from humanity before you start. Psychopaths may or may not be capable of moral judgements, but they certainly have preferences, and would certainly find living in a world where all their preferences are discounted as intolerable as the rest of us would find living in a world where only their preferences counted.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T16:15:31.692Z · LW(p) · GW(p)

I agree that psychopaths have preferences, and would find living in a world that anti-implemented their preferences intolerable.

In which case we wouldn't be talking about CEV(humanity) but CEV(that subset of humanity which already share our values),

If you mean to suggest that the fact that the former phrase gets used in place of the latter is compelling evidence that we all agree about who to include, I disagree.

If you mean to suggest that it would be more accurate to use the latter phrase when that's what we mean, I agree.

Ditto "CEV(that set of preference-havers which value X, Y, and Z)".

Replies from: None
comment by [deleted] · 2012-05-18T16:25:16.338Z · LW(p) · GW(p)

I definitely meant the second interpretation of that phrase.

Replies from: TimS
comment by TimS · 2012-05-18T16:45:05.854Z · LW(p) · GW(p)

I hope that everyone who discusses CEV understands that a very hard part of building a CEV function would be defining the criteria for inclusion in the subset of people whose values are considered. It's almost circular, because figuring out who to exclude as "insufficiently moral" almost inherently requires the output of a CEV-like function to process.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T17:42:43.507Z · LW(p) · GW(p)

How committed are you to the word "subset" here?

Replies from: TimS
comment by TimS · 2012-05-18T17:51:52.499Z · LW(p) · GW(p)

I'm not sure I understand the question. In reference to the sociopath issue, I think it is clearer to say:
(1) "I don't want sociopaths (and the like) in the subset from which CEV is drawn"
than to say that
(2) "CEV is drawn from all humanity but sociopaths are by definition not human."

Nonetheless, I don't think (1) and (2) are different in any important respect. They just define key terms differently in order to say the same thing. In a rational society, I suspect it would make no difference, but in current human society, the ways words can be wrong make (2) more likely to lead to errors of reasoning.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T18:10:43.403Z · LW(p) · GW(p)

Sorry, I'm being unclear. Let me try again.
For simplicity, let us say that T(x) = TRUE if x is sufficiently moral to include in CEV, and FALSE otherwise. (I don't mean to posit that we've actually implemented such a test.)

I'm asking if you mean to distinguish between:
(1) CEV includes x where T(x) = TRUE and x is human, and
(2) CEV includes x where T(x) = TRUE

Replies from: TimS
comment by TimS · 2012-05-18T18:56:09.647Z · LW(p) · GW(p)

I'm still not sure I understand the question. That said, there are two issues here.

First, I would expect CEV(Klingon) to output something if CEV(human) does, but I'm not aware of any actual species that I would expect CEV(non-human species) to output for. If such a species existed (i.e. CEV(dolphins) outputs a morality), I would advocate strongly for something very like equal rights between humans and dolphins.

But even in that circumstance, I would be very surprised if CEV(all dolphins & all humans) outputted something other than "Humans, do CEV(humanity). Dolphins, do CEV(dolphin)"

Of course, I don't expect CEV(all of humanity ever) to output because I reject moral realism.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T19:07:27.537Z · LW(p) · GW(p)

I think that answers my question. Thanks.

comment by tut · 2012-05-25T16:56:35.124Z · LW(p) · GW(p)

'Coherent' in CEV means that it makes up a coherent value system for all of humanity. By definition that means that there will be no value conflicts in CEV. But it does not mean that you will necessarily like it.

comment by 4hodmt · 2012-05-18T17:41:55.332Z · LW(p) · GW(p)

Why do we need a single CEV value system? A FAI can calculate as many value systems as it needs and keep incompatible humans separate. Group size is just another parameter to optimize. Religious fundamentalists can live in their own simulated universe, liberals in another.

Replies from: TheOtherDave, TimS, None
comment by TheOtherDave · 2012-05-18T18:21:01.308Z · LW(p) · GW(p)

Upvoting back to zero because I think this is an important question to address.

If I prefer that people not be tortured, and that's more important to me than anything else, then I ought not prefer a system that puts all the torturers in their own part of the world where I don't have to interact with them over a system that prevents them from torturing.

More generally, this strategy only works if there's nothing I prefer/antiprefer exist, but merely things that I prefer/antiprefer to be aware of.

Replies from: dlthomas
comment by dlthomas · 2012-05-18T18:26:43.556Z · LW(p) · GW(p)

It's a potential outcome, I suppose, in that

[T]here's nothing I prefer/antiprefer exist, but merely things that I prefer/antiprefer to be aware of.

is a conceivable extrapolation from a starting point where you antiprefer something's existence (in the extreme, with MWI you may not have much say what does/doesn't exist, just how much of it in which branches).

It's also possible that you hold both preferences (prefer X not exist, prefer not to be aware of X) and the existence preference gets dropped for being incompatible with other values held by other people while the awareness preference does not.

comment by TimS · 2012-05-18T17:56:17.088Z · LW(p) · GW(p)

The child molester cluster (where they grow children simply to molest them, then kill them) doesn't bother you, even if you never interact with it?

Because I'm fairly certain I wouldn't like what CEV(child molester) would output and wouldn't want an AI to implement it.

Replies from: 4hodmt
comment by 4hodmt · 2012-05-18T18:25:57.722Z · LW(p) · GW(p)

Assuming 100% isolation it would be indistinguishable from living in a universe where the Many Worlds Interpretation is true, but it still seems wrong. The FAI could consider avoiding groups whose even theoretical existence could cause offence, but I don't see any good way to assign weight to this optimization pressure.

Even so, I think splitting humanity into multiple groups is likely to be a better outcome than a single group. I don't consider the "failed utopia" described in http://lesswrong.com/lw/xu/failed_utopia_42/ to be particularly bad.

Replies from: None, TimS
comment by [deleted] · 2012-05-18T18:32:29.302Z · LW(p) · GW(p)

Assuming 100% isolation it would be indistinguishable from living in a universe where the Many Worlds Interpretation is true

Well, not if "child-molesters" and "non-child-molestors" are competing for limited resources.

comment by TimS · 2012-05-18T18:42:55.663Z · LW(p) · GW(p)

The failed utopia is better than our current world, certainly. But the genie isn't Friendly.

In principle, I could interact with the immoral cluster. The AI's interference is not relevant to the morality of the situation, because I was part of the creation of the AI. Otherwise, I would be morally justified in ignoring the suffering in some distant part of the world because it will have no practical impact on my life. By contrast, I simply cannot interact with other branches under the MWI - it's a baked-in property of the universe that I never had any input into.

comment by [deleted] · 2012-05-18T18:00:50.375Z · LW(p) · GW(p)

What if space travel turns out to be impossible, and the superintelligence has to allocate the limited computational resources of the solar system?

comment by dlthomas · 2012-05-18T17:19:07.609Z · LW(p) · GW(p)

Um, if you would object to your friends being killed (even if you knew more, thought faster, and grew up further with others), then it wouldn't be coherent to value killing them.

Replies from: None
comment by [deleted] · 2012-05-18T17:24:06.084Z · LW(p) · GW(p)

Just because I wouldn't value that, doesn't mean that the majority of the world wouldn't. Which is my whole point.

Replies from: dlthomas
comment by dlthomas · 2012-05-18T17:28:58.492Z · LW(p) · GW(p)

My understanding is that CEV is based on consensus, in which case the majority is meaningless.

Replies from: steven0461, Wei_Dai, DanArmak
comment by steven0461 · 2012-05-18T20:39:01.221Z · LW(p) · GW(p)

Some quotes from the CEV document:

Coherence is not a simple question of a majority vote. Coherence will reflect the balance, concentration, and strength of individual volitions. A minor, muddled preference of 60% of humanity might be countered by a strong, unmuddled preference of 10% of humanity. The variables are quantitative, not qualitative.

(...)

It should be easier to counter coherence than to create coherence.

(...)

In qualitative terms, our unimaginably alien, powerful, and humane future selves should have a strong ability to say "Wait! Stop! You're going to predictably regret that!", but we should require much higher standards of predictability and coherence before we trust the extrapolation that says "Do this specific positive thing, even if you can't comprehend why."

Though it's not clear to me how the document would deal with Wei Dai's point in the sibling comment. In the absence of coherence on the question of whether to protect, persecute, or ignore unpopular minority groups, does CEV default to protecting them or ignoring them? You might say that as written, it would obviously not protect them, because there was no coherence in favor of doing so; but what if protection of minority groups is a side effect of other measures CEV was taking anyway?

(For what it's worth, I suspect that extrapolation would in fact create enough coherence for this particular scenario not to be a problem.)

Replies from: dlthomas
comment by dlthomas · 2012-05-18T20:56:29.674Z · LW(p) · GW(p)

Thank you. So, not quite consensus, but similarly biased in favor of inaction.

comment by Wei Dai (Wei_Dai) · 2012-05-18T18:40:56.578Z · LW(p) · GW(p)

My understanding is that CEV is based on consensus, in which case the majority is meaningless.

If CEV doesn't positively value some minority group not being killed (i.e., if it's just indifferent due to not having a consensus), then the majority would be free to try to kill that group. So we really do need CEV to say something about this, instead of nothing.

Replies from: dlthomas
comment by dlthomas · 2012-05-18T18:42:45.302Z · LW(p) · GW(p)

Assuming we have no other checks on behavior, yes. I'm not sure, pending more reflection, whether that's a fair assumption or not...

comment by DanArmak · 2012-05-19T16:00:28.578Z · LW(p) · GW(p)

There is absolutely no reason to think that the values of all humans, extrapolated in some way, will arrive at a consensus.

comment by drnickbone · 2012-05-18T17:12:35.760Z · LW(p) · GW(p)

Wouldn't CEV need to extract consensus values under a Rawlsian "veil of ignorance"?

It strikes me as very unlikely that there would be a consensus (or even majority) vote for killing gays or denying full rights to women under such a veil, because of the significant probability of ending up gay, and the more than 50% probability of being a woman. Prisons would be a lot better as well. The only reason illiberal values persist is because those who hold them know (or are confident) that they're not personally going to be victims of them.

So CEV is either going to end up very liberal, or if done without the veil of ignorance, is not going to end up coherent at all. Sorry if that's politics, the mind-killer.

Replies from: TheOtherDave, Nornagest, Zack_M_Davis, DanArmak, TimS
comment by TheOtherDave · 2012-05-18T17:40:20.032Z · LW(p) · GW(p)

Note that there's nothing physically impossible about altering the probability of being born gay, straight, bi, male, female, asexual, etc.

Replies from: drnickbone
comment by drnickbone · 2012-05-18T18:19:53.555Z · LW(p) · GW(p)

True, and this could create some interesting choices for Rawlsians with very conservative values. Would they create a world with no gays, or no women? Would they do both???

Replies from: TheOtherDave, TimS
comment by TheOtherDave · 2012-05-18T18:22:49.388Z · LW(p) · GW(p)

I don't know how to reply to this without violating the site's proscription on discussions of politics, which I prefer not to do.

Replies from: drnickbone
comment by drnickbone · 2012-05-18T21:25:25.944Z · LW(p) · GW(p)

OK - the comment was pretty flippant anyway. Consider it withdrawn.

comment by TimS · 2012-05-18T19:11:18.708Z · LW(p) · GW(p)

Heinlein's "Starship Troopers" discusses the death penalty imposed on a violent child rapist/murder. The narrator says there are two possibilities:

1) The killer was so deranged he didn't know right from wrong. In that case, killing (or imprisoning him) is the only safe solution for the rest. Or,
2) The killer knew right from wrong, but couldn't stop himself. Wouldn't killing (or stopping) him be a favor, something he would want?

Why can't that type of reasoning exist behind the veil of ignorance? Doesn't it completely justify certain kinds of oppression? That said, there's also an empirical question whether the argument applies to the particular group being oppressed.

Replies from: gwern, dlthomas, drnickbone
comment by gwern · 2012-05-18T20:25:12.269Z · LW(p) · GW(p)

Not dealing with your point, but that sort of analysis is why I find Heinlein so distasteful - the awful philosophy. For example in #1, 5 seconds of thought suffices to think of counterexamples like temporary derangements (drug use, treatable disease, particularly stressful circumstances, blows to the head), and more effort likely would turn up powerful empirical evidence like possibly an observation that most murderers do not murder again even after release (and obviously not execution).

Replies from: TimS
comment by TimS · 2012-05-18T20:42:12.917Z · LW(p) · GW(p)

Absolutely. What finally made me realize that Heinlein was not the bestest moral philosopher ever was noticing that all his books contained superheroes - Stranger in a Strange Land is the best example. I'm not talking about the telekinetic powers, but the mental discipline. His moral theory might work for human-like creatures with perfect mental discipline, but for ordinary humans . . . not so much.

Replies from: fubarobfusco, NancyLebovitz
comment by fubarobfusco · 2012-05-19T02:19:52.946Z · LW(p) · GW(p)

This was pretty common in sf of the early 20th century, actually — the trope of a special group of people with unusual mental disciplines giving them super powers and special moral status. See A. E. van Vogt (the Null-A books) or Doc Smith (the Lensman books) for other examples. There's a reason Dianetics had so much success in the sf community of that era, I suspect — fans were primed for it.

comment by NancyLebovitz · 2012-05-20T06:05:22.228Z · LW(p) · GW(p)

Is that true of all of Heinlein's books? I would say that most of them (including Starship Troopers) don't have superheroes.

Replies from: Nornagest
comment by Nornagest · 2012-05-20T06:28:50.476Z · LW(p) · GW(p)

Well, I'm not exactly a Heinlein scholar, but I'd say it shows up mainly in his late-period work, post Stranger in a Strange Land. Time Enough for Love and its sequels definitely qualify, but some of the stuff he's most famous for -- The Moon is a Harsh Mistress, Have Space Suit, Will Travel, et cetera -- don't seem to. Unfortunately, Heinlein's reputation is based mainly on that later stuff.

Replies from: TimS
comment by TimS · 2012-05-20T21:00:02.107Z · LW(p) · GW(p)

The revolution in "Moon is a Harsh Mistress" cannot succeed without the aid of the supercomputer. That makes any moral philosophy implicit in that revolution questionable to the extent one asserts that the moral philosophy is true of humanity now.

To a lesser extent, "Starship Troopers" asserts that military service is a reliable way of screening for the kinds of moral qualities (like mental discipline) that make one trustworthy enough to be a high government official (or even to vote, if I recall correctly). In reality, those moral qualities are very thin on the ground, being much less common than the book suggests. If the appropriate moral qualities were really that frequent, the sanity line would already be much higher than it is.

Replies from: CronoDAS, Bugmaster
comment by CronoDAS · 2012-05-20T22:14:00.779Z · LW(p) · GW(p)

It might be relevant to note that Heinlein served in the U.S. Navy before he was discharged due to medical reasons.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-05-20T22:20:05.345Z · LW(p) · GW(p)

Most men in his generation did military service of some form.

comment by Bugmaster · 2012-05-20T21:12:49.948Z · LW(p) · GW(p)

I read Starship Troopers as a critique of fascism, not its endorsement, but I could be wrong...

Replies from: TimS
comment by TimS · 2012-05-20T22:04:53.501Z · LW(p) · GW(p)

I wouldn't say the Starship Troopers government was fascist, but Heinlein clearly thinks they are doing things pretty well. The fact that the creation process of that government avoided fascism with no difficulty (it isn't considered worth mentioning in the history) is precisely the lack of realism that I am criticizing.

Replies from: Bugmaster
comment by Bugmaster · 2012-05-21T22:33:22.488Z · LW(p) · GW(p)

Hmm, I could be confusing the book with the movie. I'll need to re-read it again.

comment by dlthomas · 2012-05-18T21:57:59.084Z · LW(p) · GW(p)

As long as we're using sci-fi to inform our thinking on criminality and corrections, The Demolished Man is an interesting read.

comment by drnickbone · 2012-05-18T21:35:48.104Z · LW(p) · GW(p)

What would a Rawlsian decider do? Institute a prison and psychiatric system, and some method of deciding between case 1 (psychiatric imprisonment to try to treat, or at least prevent further harm) and case 2 (criminal imprisonment to deter like-minded people and prevent further harm from the killer/rapist). Also set up institutions for detecting and encouraging early treatment of child sex offenders before they move on to murder.

They would not want the death penalty in either case, nor would they want the prison/psychiatric system to be so appalling that they might prefer to be dead.

The Rawlsian would need to weigh the risk of being the raped/murdered child (or their parent) against the risk of being born with psychopathic or paedophile tendencies. If there was genuinely a significant deterrent from the death penalty, then the Rawlsian might accept it. But that looks unlikely in such cases.

comment by Nornagest · 2012-05-18T17:51:52.947Z · LW(p) · GW(p)

Most of those who propose illiberal values do not do so under the presumption that they thereby harm the affected groups. A paternalistic attitude is much more common, and is not automatically inconsistent with preferences beyond a Rawlsian veil of ignorance.

An Omelasian attitude also seems consistent, for that matter, though even less likely.

Replies from: drnickbone
comment by drnickbone · 2012-05-18T18:11:21.840Z · LW(p) · GW(p)

As a matter of empirical fact, I think this is wrong. Men in sexist societies are really glad they're not women (and even thank God they are not in some cases). They are likely to run in horror from the Rawlsian veil when they see the implications.

And anyway, isn't that paternalism itself inconsistent with Rawlsian ignorance? Who would voluntarily accept a more than 50% chance of being treated like a patronized child (and a second-class citizen) for life?

And how is killing gays in the slightest bit a paternalistic attitude?

I'd never heard of Omelas, or anything like it, so I doubt this will be part of CEV. Again, who would voluntarily accept the risk of being such a scapegoat, if it were an avoidable risk? (If it is not avoidable for some reason, then that is a fact that CEV would have to take into account, as would the Rawlsian choosers.)

Replies from: Nornagest, DanArmak
comment by Nornagest · 2012-05-18T19:09:34.898Z · LW(p) · GW(p)

Who would voluntarily accept a more than 50% chance of being treated like a patronized child (and a second-class citizen) for life?

Someone believing that this sort of paternalism is essential to gender and unable or unwilling to accept a society without it. Someone convinced that this was part of God's plan or otherwise metaphysically necessary. Someone not very fond of making independent decisions. I don't think any of these categories are strikingly rare.

That's about as specific as I'd like to get; anything more so would incur an unacceptable risk of political entanglements. In general, though, I think it's important to distinguish fears and hatreds arising against groups which happen to be on the wrong side of some social line (and therefore identity) from the processes that led to that line being drawn in the first place: it's possible, and IMO quite likely, for people to coherently support most traditional values concerning social dichotomies without coherently endorsing malice across them. This might not end up being stable, human psychology being what it is, but it doesn't seem internally inconsistent.

The way people's values intersect with the various consequences of their identities is quite complicated and I'm not sure I completely understand it, but I wouldn't describe either as a subset of the other.

(Incidentally, around 51% of human births are male; more living humans are female but that's because women live longer. This has absolutely no bearing on the argument, but it was bugging me.)

Replies from: drnickbone
comment by drnickbone · 2012-05-18T21:57:14.502Z · LW(p) · GW(p)

Thanks for the reply here, that was helpful.

What you've described here is a person who would put adherence to an ideological system (or set of values derived from that system) above their own probable welfare. They would reason to themselves: yes, my own personal welfare would probably be higher in an egalitarian society (or the risk of low personal welfare would be lower); but stuff that, I'm going to implement my current value system anyway. Even if it comes back to shoot me in the foot.

I agree that's possible, but my impression is that very few humans would really want to do that. The tendency to put personal welfare first is enormous, and I really do believe that most of us would do that if choosing behind a Rawlsian veil.

What's odd is that it is a classical conservative insight that human beings are mostly self-interested, and rather risk-averse, and that society needs to be constructed to take that into account. It's an insight I agree with, by the way, and yet it is precisely this insight that leads to Rawlsian liberalism. Whereas to choose a different (conservative) value system, the choosers have to sacrifice their self-interest to that value system.

Replies from: Nornagest, JoachimSchipper
comment by Nornagest · 2012-05-18T22:13:35.207Z · LW(p) · GW(p)

What you've described here is a person who would put adherence to an ideological system (or set of values derived from that system) above their own probable welfare.

Self-assessed welfare isn't cleanly separable from ideology. People aren't strict happiness maximizers; we value all sorts of abstract things, many of which are linked to the social systems and identity groups in which we're embedded. Sometimes this ends up looking pretty irrational from the outside view, but from the inside giving them up would look unattractive for more or less the same reason that wireheading is unattractive to (most of) us.

Now, this does drift over time, both on a sort of random walk and in response to environmental pressures, which is what allows things like sexual revolutions to happen. During phase changes in this space, the affected social dichotomies are valued primarily in terms of avoiding social costs; that's the usual time when they're a salient issue instead of just part of the cultural background, and so it's easy to imagine that that's always what drives them. But I don't think that's the case; I think there's a large region of value space where they really are treated as intrinsic to welfare, or as first-order consequences of intrinsic values.

Replies from: drnickbone
comment by drnickbone · 2012-05-19T17:28:46.024Z · LW(p) · GW(p)

Thanks again. I'm still not sure of the exact point you are making here, though.

Let's take gender-based discrimination and unequal rights as a sample case. Are you arguing that someone wedded to an existing gender-biased value system would deliberately select a discriminatory society (over an equal rights one) even if they were choosing on the basis of self-interest? That they would fully understand that they have roughly 50% chance of getting the raw end of the deal, but still think that this deal would maximise their welfare overall?

I get the point that a committed ideologue could consciously decide here against self-interest. I'm less clear how someone could decide that way while still thinking it was in their self-interest. The only way I can make sense of such a decision is if were made on the basis of faulty understanding (i.e. they really can't empathize very well, and think it would not be so bad after all to get born female in such a society).

In a separate post, I suggested a way that an AI could make the Rawlsian thought experiment real, by creating a simulated society to the deciders' specifications, and then beaming them into roles in the simulation at random (via virtual reality/total immersion/direct neural interface or whatever). One variant to correct for faulty understanding might be to do it on an experimental basis. Once the choosers think they have made their minds up, they get beamed into a few randomly-selected folks in the sim, maybe for a few days or weeks (or years) at a time. After the experience of living in their chosen world for a while, in different places, times, roles etc. they are then asked if they want to change their mind. The AI will repeat until there is a stable preference, and then beam in permanently.

Returning to the root of the thread, the original objection to CEV was that most people alive today believe in unequal rights for women and essentially no rights for gays. The key question is therefore whether most people would really choose such a world in the Rawlsian set-up. And then, would most people continue to so-choose even after living in that world for a while in different roles?

If the answers are "no" then the Rawlsian veil of ignorance can remove this particular objection to CEV. If they are "yes" then it cannot. Agreed?

Replies from: NancyLebovitz, Nornagest, evand
comment by NancyLebovitz · 2012-05-20T06:18:19.083Z · LW(p) · GW(p)

That they would fully understand that they have roughly 50% chance of getting the raw end of the deal, but still think that this deal would maximise their welfare overall?

A lot of oppression of women seems to be justified by claims that if women aren't second-class citizens, they won't choose to have children, or at least not enough children for replacement. This makes women's rights into an existential risk.

Replies from: DanArmak
comment by DanArmak · 2012-05-22T13:37:27.854Z · LW(p) · GW(p)

This argument also implies that societies and smaller groups where women have lower status and more children will out-breed and so eventually outcompete societies where women have equal rights. So people can also defend the lower status of women as a nationalistic or cultural self-defense impulse.

comment by Nornagest · 2012-05-19T22:02:28.013Z · LW(p) · GW(p)

Are you arguing that someone wedded to an existing gender-biased value system would deliberately select a discriminatory society (over an equal rights one) even if they were choosing on the basis of self-interest? That they would fully understand that they have roughly 50% chance of getting the raw end of the deal, but still think that this deal would maximise their welfare overall?

Yes and no. Someone who'd internalized a discriminatory value system -- who really believed in it, not just belief-in-belief, to use LW terminology -- would interpret their self-interest and therefore their welfare in terms of that value system. They would be conscious of what we would view as unequal rights, but would see these as neutral or positive on both sides, not as one "getting the raw end of the deal" -- though they'd likely object to some of their operational consequences. This implies, of course, a certain essentialism, and only applies to certain forms of discrimination: recent top-down imposition of values isn't stable in this way.

As a toy example, read 1 Corinthians 11, and try to think of the mentality implied by taking that as the literal word of God -- not just advice from some vague authority, but an independent axiom of a value system backed by the most potent proofs imaginable. Applied to an egalitarian society, what would such a value system say about the (value-subjective) welfare of the women -- or for that matter the men -- in it?

the original objection to CEV was that most people alive today believe in unequal rights for women and essentially no rights for gays. The key question is therefore whether most people would really choose such a world in the Rawlsian set-up. And then, would most people continue to so-choose even after living in that world for a while in different roles?

This, on the other hand, is essentially an anthropology question. The answer depends on the extent of discriminatory traditional cultures, on the strength of belief in them, and on the commonalities between them: "unequal rights" isn't a value, it's a judgment call over a value system, and the specific unequal values that we object to may be quite different between cultures. I'm not an anthropologist, so I can't really answer that question -- but if I had to, I'd doubt that a reflectively stable consensus exists for egalitarianism or for any particular form of discrimination, with or without the Rawlsian wrinkle.

Replies from: drnickbone
comment by drnickbone · 2012-05-19T22:32:39.533Z · LW(p) · GW(p)

So this would be like the "separate but equal" argument? To paraphrase in a gender context: "Men and women are very different, and need to be treated differently under the law - both human and divine law. But it's not like the female side is really worse off because of this different treatment".

That - I think - would count as a rather basic factual misunderstanding of how discrimination really works. It ought to be correctable pretty damn fast by a trip into the simulator.

(Incidentally, I grew up in a fundamentalist church until my teens, and one of the things I remember clearly was the women and teen girls being very upset about being told that they had to shut up in church, or wear hats or long hair, or that they couldn't be elders, or whatever. They also really hated having St Paul and the Corinthians thrown at them; the ones who believed in Bible inerrancy were sure the original Greek said something different, and that the sacred text was being misinterpreted and spun against them. Since it is an absolute precondition for an inerrantist position that correct interpretations are difficult, and up for grabs, this was no more unreasonable than the version spouted by the all-male elders.)

Replies from: Nornagest
comment by Nornagest · 2012-05-19T22:51:28.094Z · LW(p) · GW(p)

That - I think - would count as a rather basic factual misunderstanding of how discrimination really works. It ought to be correctable pretty damn fast by a trip into the simulator.

Well, I won't rule it out. But if you grow up in the West -- even in one of its more traditionalist enclaves -- that means you've grown up surrounded by some of the most fantastically egalitarian rhetoric the world's ever generated, and I think one consequence of that is the adoption of a rather totalizing attitude toward any form of discrimination. Not that that's a bad thing; discrimination's bad news. But it does make it kind of hard to grok stratified social organization in any kind of unbiased way.

I grew up secular, albeit in one of the more conservative parts of my home state. But I have read a lot of social commentary from the Middle Ages and the Classical period, and I've visited a couple of highly traditionalist non-Western countries. Both seem to exhibit an attitude towards what we'd call unequal rights that's pretty damned strange for those of us who were raised on Max Weber and Malcolm X, and I wouldn't put the differences down to ignorance.

Replies from: drnickbone
comment by drnickbone · 2012-05-19T23:19:28.384Z · LW(p) · GW(p)

But I have read a lot of social commentary from the Middle Ages and the Classical period, and I've visited a couple of highly traditionalist non-Western countries.

Of course there is an enormous selection bias here. You're reading the opinions of the tiny minority who were a) literate, b) had time to write social commentary, c) didn't have their writings burned or otherwise censored and d) were preserved for later generations by copyists. It's very difficult to tell whether they represented the CEV of their time (or anything like it). And on visiting other cultures, even in the present, I can only reflect that if you'd visited the fundie church of my childhood you'd have seen an overt culture of traditionalist paternalism/sexism, but wouldn't have seen the genuine hurt and pain of the 50% or so who really wished it wasn't like that. Being denied a public voice, you couldn't have heard them. That's kind of the point.

I've also visited a few non-Western countries (on business), and to the extent the people there have voiced opinions about their situation versus ours (which was not very often), they've been rather keen to make their countries more like ours in terms of liberty, equality and the pursuit of shed loads of money. Or to leave for the West if they can. Sheer poverty sucks too.

comment by evand · 2012-05-19T19:34:15.823Z · LW(p) · GW(p)

The obvious (paternalistic) answer is that they believe that, conditioned on them being born female, their self-interest is improved by paternalistic treatment of all women vs equality.

In order to convince them otherwise, you would (at a minimum) have to run multiple world sims, not just multiple placements in one world. You would also have to forcibly give them sufficiently rational thought processes that they could interpret the evidence you forced upon them. I'm not sure that forcibly messing with people's thought processes is ethical, or that you could really claim it was a coherent extrapolation after you had performed that much involuntary mind surgery on them.

Replies from: drnickbone
comment by drnickbone · 2012-05-19T20:16:20.942Z · LW(p) · GW(p)

In order to convince them otherwise, you would (at a minimum) have to run multiple world sims, not just multiple placements in one world. You would also have to forcibly give them sufficiently rational thought processes that they could interpret the evidence you forced upon them

Disagree. A simple classroom lesson is often sufficient to get the point across:

http://www.uen.org/Lessonplan/preview.cgi?LPid=536

Discrimination REALLY sucks.

comment by JoachimSchipper · 2012-05-19T15:30:32.040Z · LW(p) · GW(p)

I've met women who honestly and persistently profess that women should not be allowed to vote. In at least one case, even in private, to a person they really want to like them and who very clearly disagrees with them.

Replies from: drnickbone
comment by drnickbone · 2012-05-19T17:59:17.612Z · LW(p) · GW(p)

That doesn't surprise me... I've had the same experience once or twice, in mixed company, and with strong feminists in the room. The subsequent conversations along the lines of "But women chained themselves to railings, and threw themselves under horses to get the vote; how can you betray them like that?" were quite amusing. Especially when followed by the retort "Well I've got a right to my own opinion just as much as anyone else - surely you respect that as a feminist!"

I've also met quite a few people who think that no-one should vote. ("If it did any good, it would have been abolished years ago" a position I have a lot more sympathy for these days than I ever used to).

My preferred society (in a Rawlsian setting) might not actually have much voting at all, except on key constitutional issues. State and national political offices (parliaments, presidents etc) would be filled at random (in an analogue to jury service) and for a limited time period. After the victims had passed a few laws and a budget, they would be allowed to go home again. No-one would give a damn about gaffes, going off message, or the odd sex scandal, because it would happen all the time, and have very limited impact. I think there would also need to be mandatory citizen service on boring committees, local government roles, planning permission and drainage enquiries etc to stop professional civil servants, lobbyists or wonks ruling the roost: the necessary tedium would be considered part of everyone's civic duty. This - in my opinion - is probably the biggest problem with politics. Much of it is so dull, or soul-destroying, that no-one with any sense wants to do it, so it is left to those without any sense.

comment by DanArmak · 2012-05-19T16:08:05.058Z · LW(p) · GW(p)

And how is killing gays in the slightest bit a paternalistic attitude?

Kill their bodies, save their souls.

comment by Zack_M_Davis · 2012-05-18T18:25:42.576Z · LW(p) · GW(p)

The only reason illiberal values persist is because those who hold them know (or are confident) that they're not personally going to be victims of them.

You might be right, but I'm less sure of this.

Someone with more historical or anthropological knowledge than I is welcome to correct me, but I'm given to understand that many of those whom we would consider victims of an oppressive social system actually support the system. (E.g., while women's suffrage seems obvious now, there were many female anti-suffragists at the time.) It's likely that such sentiments would be nullified by a "knew more, thought faster, &c." extrapolation, but I don't want to be too confident about the output of an algorithm that is as yet entirely hypothetical.

Furthermore, the veil of ignorance has its own problems: what does it mean for someone to have possibly been someone else? To illustrate the problem, consider an argument that might be made by (our standard counterexample) a hypothetical agent who wants only to maximize the number of paperclips in the universe:

The only reason non-paperclip-maximizing values persist is because those who hold them know (or are confident) that they're not personally going to be victims of them (because they already know that they happened to have been born as humans rather than paperclip-maximizers).

---which does not seem convincing. Of course, humans in oppressed groups and humans in privileged groups are inexpressibly more similar to each other than humans are to paperclip-maximizers, but I still think this thought experiment highlights a methodological issue that proponents of a veil of ignorance would do well to address.

Replies from: drnickbone
comment by drnickbone · 2012-05-18T18:40:23.125Z · LW(p) · GW(p)

Someone with more historical or anthropological knowledge than I is welcome to correct me, but I'm given to understand that many of those whom we would consider victims of an oppressive social system actually support the system.

Isn't the main evidence that victims of oppressive social systems want to escape from them at every opportunity? There are reasons for refugees, and reasons that the flows are in consistent directions.

And if anti-suffragism had been truly popular, then having got the vote, women would have immediately voted to take it away again. Does this make sense?

Some other points:

  1. CEV is about human values, and human choices, rather than paper-clippers. I doubt we'd get a CEV across wildly-different utility functions in the first place.

  2. I'm happy to admit that CEV might not exist in the veil of ignorance case either, but it seems more likely to.

  3. I'm getting a few down-votes here. Is the general consensus here that this is too close to politics, and that it is therefore a taboo subject (politics being a mind-killer)? Or is the "veil of ignorance" idea not an important part of CEV?

comment by DanArmak · 2012-05-19T16:05:49.978Z · LW(p) · GW(p)

Just because some despised minorities exist today, doesn't mean they will continue to exist in the future under CEV. If a big enough majority clearly wishes that "no members of that group continue to exist" (e.g. kill existing gays AND no new ones ever to be born), then the CEV may implement that, and the veil of ignorance won't change this, because you can't be ignorant about being a minority member in a future where no-one is.

comment by TimS · 2012-05-18T17:45:40.961Z · LW(p) · GW(p)

Isn't there substantial disagreement over whether the veil of ignorance is sufficient or necessary to justify a moral theory?

Edit: Or just read what Nornagest said

Replies from: drnickbone
comment by drnickbone · 2012-05-18T18:16:10.658Z · LW(p) · GW(p)

Perhaps, but I think my point stands. CEV will use a veil of ignorance, or it won't be coherent. It may be incoherent with the veil as well, but I doubt it. Real human beings look after number one much more than they'd ever care to admit, and won't take stupid risks when choosing under the veil.

One very intriguing thought about an AI is that it could make the Rawlsian choice a real one. Create a simulated society to the choosers' preferences, and then beam them in at random...

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-20T06:07:30.289Z · LW(p) · GW(p)

Even with a veil of ignorance, people won't make the same choices-- people fall in different places on the risk aversion/reward-seeking spectrum.

comment by Ghatanathoah · 2012-05-23T07:27:16.486Z · LW(p) · GW(p)

  • Non-positional, mutually-satisfiable values (physical luxury, for instance)
  • Positional, zero-sum social values, such as wanting to be the alpha male or the homecoming queen

All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict.

David Friedman pointed out that this isn't correct; it's actually quite easy to make positional values mutually satisfiable:

It seems obvious that, if one's concern is status rather than real income, we are in a zero sum game..... Like many things that seem obvious, this one is false. It is true that my status is relative to yours. It does not, oddly enough, follow that if my status is higher than yours, yours must be lower than mine, or that if my status increases someone else's must decrease. Status is not, in fact, a zero sum game.

This point was originally made clear to me when I was an undergraduate at Harvard and realized that Harvard had, in at least one interesting way, the perfect social system: Everyone at the top of his own ladder. The small minority of students passionately interested in drama knew perfectly well that they were the most important people at the university; everyone else was there to provide them with an audience....

Being a male nurse is not a terribly high status job—but that may not much matter if you are also King of the Middle Kingdom. And the status you get by being king does not reduce the status of the doctors who know that they are at the top of the medical ladder and the nurses at the bottom.

[Emphasis mine]

A FAI could simply make sure that everyone is a member of enough social groups that everyone has high status in some of them. Positional goals can be mutually satisficed, if one is smart enough about it. Those two types of value don't differ as much as you seem to think they do. Positional goals just require a little more work to make implementing them conflict-free than the other type does.
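As a toy illustration of that last point (with made-up people and skill numbers, purely to show that positional status need not be zero-sum once there are enough distinct ladders), something like the following Python sketch captures the idea:

```python
# Hypothetical people and skill profiles, invented for illustration only.
people = {
    "actor":  {"drama": 9, "medicine": 2, "surfing": 3},
    "doctor": {"drama": 3, "medicine": 9, "surfing": 2},
    "surfer": {"drama": 2, "medicine": 3, "surfing": 9},
}


def status_groups(people):
    """Assign each person to the ladder (social group) on which they rank highest."""
    groups = {}
    for name, skills in people.items():
        best_ladder = max(skills, key=skills.get)
        groups.setdefault(best_ladder, []).append(name)
    return groups


# With enough distinct ladders, everyone ends up at the top of their own:
for ladder, members in status_groups(people).items():
    top = max(members, key=lambda n: people[n][ladder])
    print(ladder, "->", top)  # drama -> actor, medicine -> doctor, surfing -> surfer
```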

If you extract a set of consciously-believed propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values, since it will behave differently.

I don't think I agree with this. Couldn't you take that argument further and claim that if I undergo some sort of rigorous self-improvement program in order to better achieve my goals in life, that must mean I now have different values? In fact, you could easily say that I am behaving pointlessly because I'm not achieving my values better, I'm just changing them. It seems likely that most of the things that you are describing as values aren't really values; they're behaviors. I'd regard values as more "the direction in which you want to steer the world," both in terms of your external environment and your emotional states. Behaviors are things you do, but they aren't necessarily what you really prefer.

I agree that a more precise and articulate definition of these terms might be needed to create a FAI, especially if human preferences are part of a network of some sort as you claim, but I do think that they cleave reality at the joints.

I can't really see how you can attack CEV by this route without also attacking any attempt at self-improvement by a person.

A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry. Many human values horrify most people on this list, so they shouldn't be trying to preserve them.

The fact that these values seem to change or weaken as people become wealthier and better educated indicates that they probably are poorly extrapolated values. Most of these people don't really want to do these things; they just think they do because they lack the cognitive ability to see it. This is underscored by the fact that these people, when called out on their behavior, often make up some consequentialist justification for it ("if I don't do it, God will send an earthquake!").

I'll use an example from my own personal experience to illustrate this: when I was little (around 2-5) I thought horror movies were evil because they scared me. I didn't want to watch horror movies or even be in the same room with a horror movie poster. I thought people should be punished for making such scary things. Then I got older and learned about freedom of speech and realized that I had no right to arrest people just because they scare me.

Then I got even older and started reading movie reviews. I became a film connoisseur and became sick of hearing about incredible classic horror movies, but not being able to watch them because they scared me. I forced myself to sit through Halloween, A Nightmare on Elm Street, and The Grudge, and soon I was able to enjoy horror movies like a normal person.

Not watching horror movies and punishing the people who made them were the preferences of young me. But my CEV turned out to be "Watch horror movies and reward the people who create them." I don't think this was random value drift, I think that I always had the potential to love horror movies and would have loved them sooner if I'd had the guts to sit down and watch them. The younger me didn't have different terminal values, his values were just poorly extrapolated.

I think most of the types of people you mention would be the same if they could pierce through their cloud of self-deception. I think their values are wrong and that they themselves would recognize this if they weren't irrational. I think a CEV would extrapolate this.

But even if I'm wrong, if there's a Least Convenient Possible world where there are otherwise normal humans who have "kill all gays" irreversibly and directly programmed into their utility function, I don't think a CEV of human morality would take that into account. I tend to think that, from an ethical standpoint, malicious preferences (that is, preferences where frustrating someone else's desires is an end in itself, rather than a byproduct of competing for limited resources) deserve zero respect. I think that if a CEV took properly extrapolated human ethics it would realize this. It might not hurt to be extra careful about that when programming a CEV, however.

Replies from: None, thomblake, Jayson_Virissimo
comment by [deleted] · 2012-05-23T08:13:34.430Z · LW(p) · GW(p)

I don't think this was random value drift, I think that I always had the potential to love horror movies and would have loved them sooner if I'd had the guts to sit down and watch them.

I had a somewhat similar experience growing up, although a few details are different (I never thought people should be banned from making such films or that they were evil things just because they scared me, for instance, and I made the decision to try watching some of them, mostly Alien and a few other works from the same general milieu, at a much younger age and for substantially different reasons). However, I didn't wind up loving horror movies; I wound up liking one or two films that only pushed my buttons in nice, predictable places and without actually squicking me per se. I honestly still don't get how someone can sit through films like Halloween or Friday the 13th -- I mean, I get the narrative underpinnings and some of the psychological buttons they push very well (reminds me of ghost tales and other things from my youth), but I can't actually feel the same way as your putative "normal person" when sitting through it. Even movies most people consider "very tame" or "not actually scary" make me too uncomfortable to want to sit through them, a good portion of the time. And I've actively tried to cultivate this, not for its own sake (I could go my whole life never sitting through such a film again and not be deprived, even one of the ones I've enjoyed many times) but because of the small but notable handful of horror-themed movies that I do like and the number of people I know who enjoy such films with whom I'd have even more social-yay if I did self-modify to enjoy those movies. It simply didn't take -- after much exposure and effort, I now find most such films both squicky and actively uninteresting. I can see why other people like 'em, but I can't relate.

Are my terminal values "insufficiently extrapolated?" Or just not coherent with yours?

Replies from: Ghatanathoah
comment by Ghatanathoah · 2012-05-23T08:56:59.606Z · LW(p) · GW(p)

Are my terminal values "insufficiently extrapolated?" Or just not coherent with yours?

I don't think it's either. We both have the general value "experience interesting stories"; it's just expressed in slightly different ways. I don't think that really, really specific preferences for art consumption would be something that CEV extrapolates. I think CEV is meant to figure out what general things humans value, not really specific things (i.e. a CEV might say "you want to experience fun adventure stories"; it would not say "read Green Lantern #26" or "read King Solomon's Mines"). The impression I get is that CEV is more about general things like "How should we treat others?" and "How much effort should we devote to liking activities vs. approving ones?"

I don't think our values are incoherent, you don't want to stop me from watching horror movies and I don't want to make you watch them. In fact, I think a CEV would probably say "It's good to have many people who like different activities because that makes life more interesting and fun." Some questions (like "Is it okay to torture people") likely only have one true, or very few true, CEVs, but others, like matters of personal taste, probably vary from person to person. I think a FAI would probably order everyone not to torture toddlers, but I doubt it would order us all to watch "Animal House" at 9:00pm this coming Friday.

comment by thomblake · 2012-05-23T14:18:35.802Z · LW(p) · GW(p)

David Friedman pointed out that this isn't correct; it's actually quite easy to make positional values mutually satisfiable:

I'm glad you pointed this out - I don't think this view is common enough around here.

comment by Jayson_Virissimo · 2012-05-23T07:49:33.288Z · LW(p) · GW(p)

But even if I'm wrong, if there's a Least Convenient Possible world where there are otherwise normal humans who have "kill all gays" irreversibly and directly programmed into their utility function, I don't think a proper CEV would take that into account.

I'm not sure what to make of your use of the word "proper". Are you predicting that a CEV will not be utilitarian or saying that you don't want it to be?

Replies from: Ghatanathoah
comment by Ghatanathoah · 2012-05-23T08:26:05.690Z · LW(p) · GW(p)

I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call "malicious preferences." That is, if someone valued frustrating someone else's desires purely for their own sake, not because they needed the resources that person was using or something like that, the AI would not fulfill it.

This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it. My thinking on this was inspired by reading about Bryan Caplan's debate with Robin Hanson, where Bryan mentioned:

...Robin endorses an endless list of bizarre moral claims. For example, he recently told me that "the main problem" with the Holocaust was that there weren't enough Nazis! After all, if there had been six trillion Nazis willing to pay $1 each to make the Holocaust happen, and a mere six million Jews willing to pay $100,000 each to prevent it, the Holocaust would have generated $5.4 trillion worth of consumers surplus.

I don't often agree with Bryan's intuitionist approach to ethics, but I think he made a good point: satisfying the preferences of those six trillion Nazis doesn't seem like part of the meaning of right, and I think a CEV of human ethics would reflect this. I think that the preference of the six million Jews to live should be respected and the preferences of the six trillion Nazis be ignored.

I don't think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but think that "malicious preferences" have zero or negative utility in their satisfaction, no matter how many people have them. For conflicts of preferences that involve things like disputes over use of scarce resources, normal utilitarianism applies.
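For what it's worth, the aggregation rule I have in mind is easy to sketch in a few lines of Python. The Preference record and the numbers below are invented for illustration (using Caplan's figures), and this is not a claim about how an actual CEV would represent or weigh anything:

```python
from dataclasses import dataclass


@dataclass
class Preference:
    description: str
    holders: int        # how many people hold this preference
    intensity: float    # per-person strength of the preference
    malicious: bool     # is frustrating someone else's desires the end in itself?


def aggregate(preferences):
    """Ordinary preference utilitarianism, except malicious preferences get zero weight."""
    return sum(p.holders * p.intensity for p in preferences if not p.malicious)


nazi_pref = Preference("make the Holocaust happen", 6_000_000_000_000, 1.0, True)
jewish_pref = Preference("not be killed", 6_000_000, 100_000.0, False)

# Plain summation would let the six trillion Nazis win ($6 trillion vs. $0.6 trillion);
# zeroing out malicious preferences means only the victims' preference counts.
print(aggregate([nazi_pref]))    # 0
print(aggregate([jewish_pref]))  # 600000000000.0
```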

In response to your question I have edited my post and changed "a proper CEV" to "a CEV of human morality."

Replies from: thomblake, Jayson_Virissimo
comment by thomblake · 2012-05-23T14:15:58.496Z · LW(p) · GW(p)

I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call "malicious preferences."

This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it.

Zero is a strange number to have specified there, but then I don't know the shape of the function you're describing. I would have expected a non-specific "negative utility" in its place.

Replies from: Ghatanathoah
comment by Ghatanathoah · 2012-05-23T16:08:47.321Z · LW(p) · GW(p)

Zero is a strange number to have specified there, but then I don't know the shape of the function you're describing. I would have expected a non-specific "negative utility" in its place.

You're probably right, I was typing fairly quickly last night.

comment by Jayson_Virissimo · 2012-05-23T09:29:11.890Z · LW(p) · GW(p)

I don't think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but think that "malicious preferences" have zero or negative utility in their satisfaction, no matter how many people have them. For conflicts of preferences that involve things like disputes over use of scarce resources, normal utilitarianism applies.

Ah, okay. This sounds somewhat like Nozick's "utilitarianism with side-constraints". This position seems about as reasonable as the other major contenders for normative ethics, but some LessWrongers (pragmatist, Will_Sawin, etc...) consider it to be not even a kind of consequentialism.

comment by Lightwave · 2012-05-18T07:23:14.370Z · LW(p) · GW(p)

It seems that what you have argued here is not much related to Holden's objection 1 - his objection is that we cannot reasonably expect a safe and secure implementation of a "Friendly" utility function (even if we had one), because humans have consistently been unable to construct bug-free, correctly-working (computer) systems on the first try, proofs have been wrong, etc. You, on the other hand, are arguing against the Friendliness concept on object-level / meta-level ethical grounds.

comment by JoshuaZ · 2012-05-18T01:38:26.745Z · LW(p) · GW(p)

Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry.

Well, most of them do so in part because their deity tells them that that's a value. If the extrapolated CEV takes into account that they are just wrong about there being such a deity, it should respond accordingly. (I'm working under the assumption, which should not be controversial, that the AGI isn't going to find out that in fact there is such a deity hanging around.)

Replies from: TimS, thomblake, DanArmak, cousin_it
comment by TimS · 2012-05-18T02:36:45.703Z · LW(p) · GW(p)

There's a chicken-and-egg issue here. Were pre-existing anti-homosexuality values co-opted into early Judaism? Or did Judeo-Christian ideology spread the values beyond their "natural" reach? The only empirical evidence for this question I can think of is non-Judeo-Christian attitudes. What are the historical attitudes towards homosexuality among East Asians and South Asians?

More broadly, people's attitudes towards women and nerds are just as much expressions of values, not long-ranged utilitarian calculations.

Replies from: None, Luke_A_Somers, JoshuaZ, army1987
comment by [deleted] · 2012-05-18T11:37:38.681Z · LW(p) · GW(p)

What are the historical attitudes towards homosexuality among East Asians and South Asians?

Man, that's variable. Especially in South Asia, where "Hinduism" is more like a nice box for outsiders to describe a huge body of different practices and theoretical approaches, some of them quite divergent. Chastity in general was and is a core value in many cases; where that's not the case, or where the particular sect deals pragmatically with the human sex drive despite teaching chastity as a quicker path to moksha, there might be anything from embrace of erotic imagery and sexual diversity to fairly strict rules about that sort of conduct. Some sects unabashedly embrace sexuality as a good thing, including same-sex sexuality. Islam has historically been pretty doctrinally down on it, but even that has its nuances -- sodomy was often considered a grave sin and still is in many places, while non-penetrative same-sex contact might well be seen as simply a minor thing, not strictly appropriate but hardly anything to get worked up about.

"East Asia" has a very large number of religions as well, and the influence of Confucianism and Buddhism hasn't been uniform in this regard. One vague generality that I might suggest as a rough guideline is that traditionally, homosexuality is sort of tolerated in the closet -- sure, it happens, but as long as everyone keeps up appearances and doesn't make a scene or get caught doing something inappropriate, it's no big deal. Some strains within Mahayana Buddhism have a degree of deprecation of sexual or gender-variant behavior; others don't. Theravada varies as well, but in different ways.

In both cases, cultures vary tremendously. If you widen the scope, many cultures, including many of the foregoing, have traditionally been a lot more accepting of sex and gender variance. There are and were some cultures that were extremely permissive about it.

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-20T06:29:29.956Z · LW(p) · GW(p)

If you want more on the subject of how people think about sexuality, try Straight by Hanne Blank. She tracks the invention of heterosexuality (a concept which she says is less than a century old) in the west.

If part of CEV is finding out how much of what we think is obviously true is just stuff that people made up, life could get very strange.

Replies from: army1987, None
comment by A1987dM (army1987) · 2012-05-20T11:46:51.810Z · LW(p) · GW(p)

She tracks the invention of heterosexuality (a concept which she says is less than a century old) in the west.

The word is likely that recent, but is she claiming that the idea of being interested in members of the other sex but not in members of the same sex as sexual partners was unheard-of before that? Or what does she mean exactly?

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-21T00:23:48.165Z · LW(p) · GW(p)

It's a somewhat complex book, but part of her meaning is that the idea that there are people who are only sexually interested in members of the other sex, and that this is an important category, is recent.

Replies from: None
comment by [deleted] · 2012-05-21T17:52:40.739Z · LW(p) · GW(p)

How could such a thesis be viable, when so much of the historical data has been lost?

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-21T20:18:36.964Z · LW(p) · GW(p)

There's more historical data than you might think -- for example, the way the Catholic Church defined sexual sin in terms of actions, rather than certain sins being associated with types of people who were especially tempted to engage in them.

There's also some history of how sexual normality became more and more narrowly defined (Freud has a lot to answer for), and then the definitions shifted.

A good bit of the book is available for free at amazon, and I think that would be the best way for you to see whether Blank's approach is reasonable.

Replies from: None, None
comment by [deleted] · 2012-05-21T21:55:30.172Z · LW(p) · GW(p)

The introduction is a catalog of ambiguities about sex, gender, and sexual orientation:

My partner was diagnosed male at birth because he was born with, and indeed still has, a fully functioning penis ... My partner's DNA has a pattern that is simultaneously male, female and neither. This particular genetic pattern, XXY, is the signature of Klinefelter syndrome ...

We've known full well since Kinsey that a large minority...37 percent...of men have had at least one same-sex sexual experience in their lives.

No act of Congress or Parliament exists anywhere that defines exactly what heterosexuality is or regulates exactly how it is to be enacted.

Historians have tracked major shifts in other aspects of what was considered common or "normal" in sex and relationships: was marriage ideally an emotional relationship, or an economic and pragmatic one? Was romantic love desirable, and did it even really exist? Should young people choose their own spouses, or should marriage partners be selected by family and friends?

As unnumbered sailors, prisoners, and boarding-school boys have demonstrated, whether one behaves heterosexually or homosexually sometimes seems like little more than a matter of circumstance.

Masculinity does not look, sound, dress, or act the same for a rapper as for an Orthodox Jewish rabbinical student; a California surfer chick does femininity very differently from a New York City lady-who-lunches.

All of these are fair enough, and I've only read the introduction, but I don't have a lot of confidence that she goes on to resolve these contradictions in Less Wrong tree-falls-in-a-forest style. Instead of trying to clarify what people mean when they say something like "most people are heterosexual," I get the feeling she only wants to muddy the waters enough to say "no they aren't."

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-21T22:25:34.493Z · LW(p) · GW(p)

I think her point is closer to "people make things up, and keep repeating those things until they seem like laws of the universe".

A possible conclusion is that once people make a theory about how something ought to be, it's very hard to go back to the state of mind of not having an opinion about that thing.

The amazon preview includes the last couple of chapters of the book.

The book could be viewed as a large expansion of two Heinlein quotes: "Everybody lies about sex" and "Freedom begins when you tell Mrs. Grundy to fly a kite".

Replies from: None
comment by [deleted] · 2012-05-21T22:49:04.639Z · LW(p) · GW(p)

I don't recognize the quotes.

I think her point is closer to "people make things up, and keep repeating those things until they seem like laws of the universe".

If so, then her point is more specific: "people made heterosexuality up." But I don't see how this can be supported. Every human being who has ever lived came from a male-female sex act. That has to serve as a lower bound for how unusual and made-up heterosexuality is.

The amazon preview includes the last couple of chapters of the book.

I'll check it out.

Edit: By the way what I can see of the amazon preview is pretty heavily redacted, and doesn't include any complete chapter.

Replies from: TheOtherDave, None, Bugmaster, NancyLebovitz
comment by TheOtherDave · 2012-05-21T23:00:36.823Z · LW(p) · GW(p)

Every human being who has ever lived came from a male-female sex act. That has to serve as a lower bound for how unusual and made-up heterosexuality is.

The abstract property that people we categorize as heterosexual have in common has existed, as you imply, for as long as members of bisexual species have been preferentially seeking out opposite-sex sex partners.

The explicit category in people's brains is more recent than that.

I mean, every human being who has ever lived came from a sex act between two people who were in close physical proximity, but that doesn't mean that the category of "people who prefer to have sex in close physical proximity to one another, rather than at a distance" has been explicitly represented. Indeed, I may have just made it up.

Replies from: None
comment by [deleted] · 2012-05-21T23:30:38.181Z · LW(p) · GW(p)

The explicit category in people's brains is more recent than that.

What do you mean by this? It's incorrect to say that people haven't noticed until recently that it's very common for men to seek out women for sex and vice versa. It's also incorrect to say that people haven't noticed until recently the exceptions to this practice.

Replies from: TheOtherDave, Bugmaster
comment by TheOtherDave · 2012-05-22T03:37:30.197Z · LW(p) · GW(p)

Neither is it correct to say that people haven't noticed that it's very common for people to have sex with people who are physically adjacent to them. But that's not to say that people often think "I'm the sort of person who has sex with people physically adjacent to me."

There's a difference between eating meat from time to time, being aware that I eat meat from time to time, and explicitly thinking of myself as a "meat eater," or as an "omnivore," or as a "carnivore". There's a difference between being really smart, being aware of how well I do at various cognitive tasks, and thinking of myself as "a really smart person".

More generally, there's a difference between having the property X, being aware of evidence of X and acting accordingly, and having formed a mental structure in my mind that represents me as having X.

There's also a difference between all of those and being part of a culture that has "people who have X" as a social construct.

Replies from: None
comment by [deleted] · 2012-05-22T05:05:04.663Z · LW(p) · GW(p)

In most cases, someone who thinks of themselves as "a meat eater" really does eat meat. On the other hand, there are very many people who think of themselves as "a really smart person" but who are not really smart.

Which case is more similar to heterosexuality, in your view?

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-22T15:55:37.879Z · LW(p) · GW(p)

The categories get really fuzzy, really fast, which causes a lot of confusion.

For the sake of concreteness, I'll define my terms as follows (1):

  • A meat eater is someone who reliably experiences the desire to eat meat, and would sometimes be willing to eat meat if offered, and would not necessarily feel that eating that meat was problematic.
  • A heterosexual is someone who reliably experiences the desire to have sex with opposite-gendered people, and would sometimes be willing to do so if offered, and would not necessarily feel that having that sex was problematic.
  • A really smart person is someone who would reliably perform well on certain kinds of real-world problems that I don't know how to define in a noncircular way but I can point to examples of.

Given those definitions, I agree that someone who identifies themselves as a meat eater typically is a meat eater and that someone who identifies themselves as a really smart person frequently is not a really smart person, and I would say that someone who identifies themselves as heterosexual typically is heterosexual.

So, to answer your question: if I look at just those cases, the meat-eater case is more like the heterosexual case than the smart-person case is.

===============

(1) I have no particular fondness for those definitions, I picked them as my best approximations to what I thought you probably had in mind. If you would prefer different definitions let me know. Different definitions might change my answer.

Leaving terms like these in their normally fuzzy state causes lots of confusion when trying to have precise discussions of them ... is a prepubescent child who has never been sexually attracted to anyone heterosexual? Is a man who is sexually attracted to other men, has never had sex with one, would refuse to have sex with one if offered (assuming etc.), and regularly has sex with women despite not really being sexually attracted to them heterosexual? Etc. Etc. Etc.

There's nothing especially interesting about these questions, they're just labeling questions... but if we don't agree on the labels, it's easy to confuse labeling questions with actual questions about the underlying states of the world, including states of people's minds.

Replies from: None
comment by [deleted] · 2012-05-22T17:38:32.039Z · LW(p) · GW(p)

I agree with all of this. But I think it all casts Blank's thesis in a bad light: "heterosexuality dates to the 1860s and not earlier" can only be supported if those labeling questions are resolved in a deliberately misleading way. I had the impression you thought differently but perhaps not.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-22T18:10:24.932Z · LW(p) · GW(p)

Not having read the book, I can't speak to Blank's thesis.

I will point out, though, that just because I'm a meat-eater doesn't mean that I ever think of myself as a meat eater, that I ever talk about myself as a meat-eater, or that I live in a culture in which being a meat-eater exists as a social construct.

Similarly, just because I'm heterosexual (which, by the definition above, I am, despite being in a 19-year same-sex relationship) it doesn't follow that I ever think of myself as heterosexual (which I haven't in a little over 20 years), that I talk about myself as heterosexual (which I usually don't), or that I live in a culture where heterosexuality exists as a social construct (which I have for my entire life). Depending on the context I'm working in, different definitions become appropriate.

If I'm talking about social constructs, for example, the statement "heterosexuality dates to the 1860s and not earlier" might be true, or might not... beats me. It certainly isn't true if I'm talking about mate-selection behavior... in that context "heterosexuality" refers to something that predates the evolution of the human race. There are other contexts in which the statement "heterosexuality is about as old as humanity, but not significantly older" might be true.

You seem to be saying that speaking in some of those contexts, or speaking in a way that fails to clarify what context I'm operating in, is necessarily deliberately misleading; if you're saying that, then yes, I think differently. But, again, I haven't read Blank's book, so it's entirely possible that Blank in particular is being deliberately misleading.

Replies from: None
comment by [deleted] · 2012-05-23T02:04:26.671Z · LW(p) · GW(p)

I withdraw "deliberately", after all how would I know. But "social construct" is technical jargon from a controversial theory in a controversial academic discipline. Almost every English-speaking adult knows what straight and gay are, but hardly any of them know what a social construct is. So I do believe that it's misleading to speak of "heterosexuality" when you mean "the social construct of heterosexuality."

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-23T04:37:36.224Z · LW(p) · GW(p)

Whether someone knows what the term "social construct" refers to has nothing to do with the matter. Most people don't know what the term "pheromone" refers to, but it would be mistaken to infer from that that sexual attraction has nothing to do with pheromones, or that discussions of sexual orientation in terms of pheromones is necessarily misleading.

That said, though, sure, if social constructs don't exist at all, then there certainly isn't such a thing as a social construct of heterosexuality, in which case any discussion of same (including my own comments in this thread) is misleading, albeit (as you admit) not necessarily deliberately so.

comment by Bugmaster · 2012-05-21T23:52:13.808Z · LW(p) · GW(p)

I don't know what TheOtherDave means, but I have heard it said before that the notion of treating sexual preference as identity is relatively recent. In the past -- or so the claim goes -- people did of course recognize that some people prefer to have intercourse with members of the opposite sex, whereas others did not. But this was seen as merely a preference, similar to disliking broccoli or liking the color red or whatever. A person wouldn't identify as "a heterosexual" or "a homosexual", any more than one would identify as "an anti-broccolist" or a "red-ist" or whatever.

Replies from: Nornagest
comment by Nornagest · 2012-05-25T01:21:31.340Z · LW(p) · GW(p)

That brings up some interesting questions about the way people thought about identity. An awful lot of identity groups got launched around the same time, including some of the first ones I can think of that're based around behavior -- the temperance movement originated in the mid-1830s, for example. I wonder if some shift in the political climate in the early-to-mid 1800s suddenly made it practical to advocate for some behavior or lack thereof by adopting it into a group identity and then using that to argue for a protected category?

Replies from: Gastogh
comment by Gastogh · 2012-05-25T07:29:15.213Z · LW(p) · GW(p)

Insofar as there's a point to such distinctions, I expect the frontlines of that shift to have been cultural and scientific rather than political. "Advocating a behavior by adopting it into a group identity and using that to argue for a protected category" sounds awfully meta; I expect the crucial changes were simpler, more fundamental and centered around what enabled people to argue for a protected category in the first place. I'm thinking along lines like this:

A number of technological advances were made around that time that made setting up movements far easier. The proliferation of various movements coincides nicely with such stuff as improved methods of agriculture (leading to population growth and urbanization), the invention of the telegraph (bridging distances), better transportation in the form of railways and ever faster ships (mobilization, etc.) and probably others that escape me at the moment. A bit later on Darwin and the theory of evolution paved the way for eugenics-style thinking and concepts of inherent superiority between races and nations, and around the turn of the century the rise of scientific (or semi-scientific) psychology opened the doors for minting all kinds of novel ingroup-outgroup divisions. I expect identity-builders had a field day with the concept of the subconscious mind in particular. "You can't help it, those people are just made that way. Fortunately not us, though, haw haw."

On the non-scientific side, there are a number of converging cultural trends and phenomena to take into account.

  • The decline of the church was an example of how a firmly established institution wasn't necessarily a permanent feature of society.
  • There's been a general decline in violence throughout society, which made resisting the establishment less scary.
  • Western Romanticism and the advent of nationalism were a fairly clear case of deliberate identity-building, and it set a precedent for doing the same on a smaller scale.
  • Not all movements appeared from nowhere; workers' unions had been around for centuries in the form of guilds and such, so all those movements springing up wasn't so much groundbreaking novelty as it was just more of the same.

These aren't exhaustive lists, but I hope the gist is clear.

comment by [deleted] · 2012-05-24T08:35:11.141Z · LW(p) · GW(p)

If so, then her point is more specific: "people made heterosexuality up." But I don't see how this can be supported. Every human being who has ever lived came from a male-female sex act. That has to serve as a lower bound for how unusual and made-up heterosexuality is.

When giraffes mate in such a manner as to produce viable offspring, is that "heterosexuality?"

If yes, why do male giraffes frequently engage in same-sex behavior when nearby females are not in oestrus and receptive to their advances?

To clarify: the term "heterosexuality" doesn't necessarily mean simply "male/female sexual contact." Humans have been doing that for as long as there have been humans. Humans have also been doing same-sex sexual contact for as long as there have been humans (this is not a controversial idea given the huge number of animal species that do, inclusive of our near relatives), but the phenomenon of people being defined as, or identifying with the terms "heterosexual, homosexual and bisexual" is quite recent and cultural-contextual.

Mating such that offspring may be viably produced is a piece of the territory. "Heterosexuality" is a label on one particular map of that territory, and its boundaries and name don't necessarily represent the reality accurately.

Replies from: Richard_Kennaway, NancyLebovitz, None
comment by Richard_Kennaway · 2012-05-29T10:54:27.997Z · LW(p) · GW(p)

The map itself is part of a larger territory. Handshakes only occur in certain cultures; that does not mean there is no such thing as a handshake.

Replies from: None
comment by [deleted] · 2012-05-29T11:27:17.744Z · LW(p) · GW(p)

It does imply that assigning handshakes to the reference class of "things made up by humans" is reasonable, though.

Money is made up, but you can still starve to death without it. "Made up" doesn't mean "fake and with no lasting impact."

comment by NancyLebovitz · 2012-05-25T00:52:36.226Z · LW(p) · GW(p)

One more question: Why do people find it so interesting that some animals form same-sex pairings?

Replies from: JoshuaZ, None, None
comment by JoshuaZ · 2012-05-25T01:25:16.719Z · LW(p) · GW(p)

It normally comes up when claims are made of the form "homosexuality is unnatural!" with the implied or explicit "therefore it is wrong/sinful/evil/yucky". Pointing to same-sex pairings in animals is intended as a response to this. The people making the response either don't understand the naturalistic fallacy or consider it to be sufficiently abstract or harder to explain that they don't bother with that line of response.

It is also interesting from a biological standpoint in that it isn't that easy to explain from an ev bio perspective, so studying it makes sense.

comment by [deleted] · 2012-05-29T04:13:25.677Z · LW(p) · GW(p)

It has to do with the fact that it was essentially ignored throughout most of the history of biology as a discipline. It's not like this behavior is new; it's been there the whole time, and so have the observations of the behavior, but the reaction within scientific culture has changed dramatically.

Think of stuff like interpreting active vs. passive animals in a copulatory act as male and female respectively, assuming the animals had simply misidentified the sex of the other party or that the observing party was necessarily mistaken, publication and citation biases, and the frequently opaque titles, abstracts and contents of those published studies that did manage to make it into the journals ("A Note on the Apparent Lowering of Moral Standards in the Lepidoptera", W.J. Tennent, 1987, Entomologist's Record and Journal of Variation).

It's news to a whole lot of people, in other words.

comment by [deleted] · 2012-05-25T01:16:49.448Z · LW(p) · GW(p)

Wild mass guessing: animals are incapable of sin?

comment by [deleted] · 2012-05-25T00:37:13.979Z · LW(p) · GW(p)

When giraffes mate in such a manner as to produce viable offspring, is that "heterosexuality?"

If yes, why do male giraffes frequently engage in same-sex behavior when nearby females are not in oestrus and receptive to their advances?

Your second question is very interesting! I don't know why asking it is contingent on a "yes" answer to your first question, which is tiresome.

the phenomenon of people being defined as, or identifying with the terms "heterosexual, homosexual and bisexual" is quite recent and cultural-contextual.

If you like, I'd be interested to hear what you mean by these phrases in more detail:

  1. "defined as, or identifying with"
  2. "cultural-contextual"
Replies from: TimS
comment by TimS · 2012-05-25T00:47:07.692Z · LW(p) · GW(p)

In the United States, dog meat is defined as "not food." In other cultures, the definition of "food" includes dog meat. The meaning of "food" depends on context, specifically, the cultural context.
Just to be clear, I think the brouhaha about whether it is acceptable to eat dog is strong proof that "food" is more narrowly defined than "material capable of being consumed for sustenance by humans."

The assertion is that "homosexuality" is a word whose meaning is as culturally dependent as the word "food."

Replies from: None, None
comment by [deleted] · 2012-05-25T01:25:57.786Z · LW(p) · GW(p)

Just to be clear, I think the brouhaha about whether it is acceptable to eat dog is strong proof that "food" is more narrowly defined than "material capable of being consumed for sustenance by humans."

I don't agree. I think "food" has a broad definition that is context dependent, not culturally dependent.

Every human culture has language for food. Yes, sometimes people say "that's not food" when they mean "there's a taboo against eating that", and sometimes they say "that's not food" when they mean "that's not edible." Perhaps sometimes they mean something else. But to tell what they mean depends on context, not culture.

Of course taboos vary across cultures, as does knowledge about what is and isn't edible.

Replies from: TimS
comment by TimS · 2012-05-25T01:48:30.708Z · LW(p) · GW(p)

I'm not trying to play games with definitions - if taboo is a more intuitive label for you, then that's the word I'll use. The modern usage of the label "homosexual" invokes a substantial number of social taboos.

Those taboos vary from culture to culture. Because cultures change over time, that statement implies that the relevant taboos have changed over time. In short, the concepts intended to be invoked by the word "homosexuality" depend on the cultural context.

Further, the historical record isn't clear that any cluster of taboos related to the current homosexuality cluster existed until fairly recently in history.

Replies from: None
comment by [deleted] · 2012-05-25T02:37:39.183Z · LW(p) · GW(p)

Further, the historical record isn't clear that any cluster of taboos related to the current homosexuality cluster existed until fairly recently in history.

This doesn't sound right to me, but maybe only because it's vague. Famously, ancient jews forbade each other from male-male sex. I agree with the rest.

Replies from: Nornagest
comment by Nornagest · 2012-05-25T04:05:18.871Z · LW(p) · GW(p)

And the Bulgarian Cathars gave us the word "buggery", which was a slur even back then. But the thing that keeps me from dismissing this all as wishful thinking on the part of queer-friendly sociology professors is that all those old prohibitions that I've been able to find refer to same-sex intercourse, the act (and usually only male-male intercourse at that), rather than homosexuality, the state. That doesn't exactly prove that sexual identity as such is a modern invention -- frank discussions of sexuality are rather thin on the ground in European culture between the Romans and the early modern period -- but it does seem to point in that direction: if a concept of sexual identity existed, I'd expect homosexual identities to be condemned if homosexual acts were.

comment by [deleted] · 2012-05-25T10:18:07.552Z · LW(p) · GW(p)

This exactly.

(Tangentially: food is a great example of how culture impacts...well, so many things, but perception among them.)

comment by Bugmaster · 2012-05-21T22:55:42.686Z · LW(p) · GW(p)

Every human being who has ever lived came from a male-female sex act.

Technically, given our modern technology, this is no longer true; though throughout most of human history this was indeed the case.

Replies from: None
comment by [deleted] · 2012-05-21T22:58:07.802Z · LW(p) · GW(p)

OK, but I think to say "almost every human being who has ever lived..." would be a misleading understatement.

Replies from: Bugmaster
comment by Bugmaster · 2012-05-21T22:59:47.154Z · LW(p) · GW(p)

Yeah I suppose you're right. I wasn't really trying to nitpick your statement, but instead to express my admiration of modern technology. We've come pretty far since the days of Ancient Greece.

Replies from: Strange7
comment by Strange7 · 2012-05-22T09:33:56.109Z · LW(p) · GW(p)

Even before modern IVF, I'm pretty sure it's medically possible for a woman to become pregnant with sperm donated by a man she's never been within arm's reach of, kept on e.g. a damp cloth. I wouldn't be so quick to rule out the possibility of such a thing having happened in Ancient Greece at some point.

comment by NancyLebovitz · 2012-05-22T00:38:06.958Z · LW(p) · GW(p)

The quotes are from Heinlein's "The Notebooks of Lazarus Long" which were sections in Time Enough for Love. In theory, they're the wisdom of a man who's thousands of years old. If you pay attention to the details, it turns out that they're selections by a computer (admittedly, a sentient computer) from hours of talk in which Lazarus Long was encouraged to say whatever he wanted. He could be mistaken or lying. He's none too pleased to be kept alive for his wisdom when he'd intended to commit suicide.

He may or may not be a mouthpiece for Heinlein.

comment by [deleted] · 2012-05-21T21:36:41.878Z · LW(p) · GW(p)

Oh, so her thesis is that in the west, orientation-as-identity dates back to 1860-ish. I can imagine that being defensible. That's way different from what you originally wrote, though.

You see, the first thing that came to mind was Aristophanes' speech in the Symposium, which explicitly recognizes orientation-as-identity and predates the Catholic Church by several centuries.

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-21T22:34:03.929Z · LW(p) · GW(p)

Thanks for the cite.

comment by [deleted] · 2012-05-20T06:51:59.295Z · LW(p) · GW(p)

Hell, you don't need CEV for that. A decent anthropology textbook will get you quite a distance there (even if only superficially)...

Replies from: None, NancyLebovitz
comment by [deleted] · 2012-05-20T07:19:27.707Z · LW(p) · GW(p)

Can you recommend a book / author? (Interested outsider, no idea what the good stuff is, have read Jared Diamond and similar works.)

Replies from: None
comment by [deleted] · 2012-05-22T04:58:51.641Z · LW(p) · GW(p)

  • The Reindeer People by Piers Vitebsky is a favorite of mine; it focuses on the Eveny people of Siberia.
  • The Shaman's Coat: A Native History of Siberia by Anna Reid is a good overview of Siberian peoples.
  • Marshall Sahlins' entire corpus is pretty good, although his style puts some lay readers off.
  • Argonauts of the Western Pacific by Bronislaw Malinowski deals with Melanesian trade and business ventures. It's rather old at this point, but Malinowski had a fair influence on the development of anthropology thereafter.
  • Wisdom Sits in Places by Keith Basso deals with an Apache group.
  • The Nuer by E.E. Evans-Pritchard is older, and very dry, but widely regarded as a classic in the field. It deals with the Nuer people of Sudan.
  • The Spirit Catches You and You Fall Down by Anne Fadiman is not strictly an ethnography, but it's very relevant to anthropological mindsets and is often required reading in first-year courses in the field.
  • Liquidated: An Ethnography of Wall Street by Karen Ho is pretty much what it says in the title, and a bit more contemporary.
  • Debt: The First 5000 Years by David Graeber mixes in history and economics, but it's generally relevant.
  • Pathologies of Power by Paul Farmer focuses on the poor in Haiti.
  • Friction: An Ethnography of Global Connection by Anna Tsing is kind of complicated to explain. Short version: it takes a look at events in Indonesia and traces out actors, groups, their motivating factors, and so on.

comment by NancyLebovitz · 2012-05-20T06:59:36.871Z · LW(p) · GW(p)

I wonder whether people who've studied anthropology find that it's affected their choices.

Replies from: None
comment by [deleted] · 2012-05-20T10:00:26.454Z · LW(p) · GW(p)

It certainly did mine.

Replies from: NancyLebovitz
comment by NancyLebovitz · 2012-05-20T11:13:43.110Z · LW(p) · GW(p)

I'm interested in any details you'd like to share.

Replies from: None, CronoDAS
comment by [deleted] · 2012-05-22T05:48:18.470Z · LW(p) · GW(p)

It made me a lot more comfortable dealing with people who might be seen as "regressive", "bland", "conservative" or who just seem otherwise not very in sync with my own social attitudes and values. Getting to understand that culture and culturally-transmitted worldviews do constitute umbrella groups, but that people vary within them to similar degrees across such umbrellas, made it easier to just deal with people and adapt my own social responses to the situation. And where I feel the person has incorrect, problematic or misguided ideas, it made it easier to choose my responses and present them effectively.

It made me more socially-conscious and a bit more socially-successful. I have some considerable obstacles there, but just having cultural details available was huge in informing my understanding of certain interactions. When I taught ESL, many of my students were Somali and Muslim. I'm also trans, and gender is a very big thing in many Islam-influenced societies (particularly ones where men and women for the most part don't socialize). I learned a bit about fashion sense and making smart choices just by noticing how the men reacted to what I wore, particularly on hot days. I learned a lot about gender-marked social behavior and signifiers from my interactions with the older women in the class and the degree to which they accepted me (which I could gauge readily by their willingness to engage in casual touch, say to get my attention or when thanking me, or the occasional hug from some of my students).

It made me a far better worldbuilder than I was before, because I have some sense of just how variable human cultures really are, and how easy it is to construct a superficially-plausible theory of human cultures, history or behavior while missing out on the incredible variance that actually exists.

It made me far less interested in evolutionary psychology as an explanation for surface-level behaviors, let alone broad social patterns of behavior, because all too often cited examples turn out to be culturally-contingent. I think the average person in Western society has a very confused idea of just how different other cultures can be.

It made me skeptical of CEV as a thing that will return an output. I'm not sure human volition can be meaningfully extrapolated, and even if it can, I'm far from persuaded that the bits of it that cohere add up to anything you'd base FAI on.

It convinced me that the sort of attitudes I see expressed on LW towards "tradition" and traditional culture (especially where it comes into conflict with global capitalism) are so hopelessly confused about the thing they're trying to address that they essentially don't have anything meaningful to say about it, or at best only cover a small subset of the cases they're applied to. It didn't make me a purist or instill some sort of half-baked Prime Directive or anything; cultures change, and they'll do that no matter what.

It helped me grasp my own cultural background and influences better. It gave me some insight into the ways in which that can lock in your perceptions and decisions, how hard it is to change, and how easy it is to confuse it with something "innate" (and how easy it is to confuse "innate" with "genetic"). It helped me grasp how I could substitute or reprogram bits of that, and with a bit of time and practice it helped me understand the limitations on that.

There's...probably a whole ton more, but I'm running out of focus right now.

EDIT: Oh! It made me hugely more competent at navigating, interpreting and understanding art, especially from other cultures. Literary modes, aesthetics, music and styles; also narrative and its uses.

Replies from: Eliezer_Yudkowsky, Zack_M_Davis
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-05-22T19:14:42.102Z · LW(p) · GW(p)

Fascinating, but... my Be Specific detector is going off and asking, not just for the abstract generalizations you concluded, but the specific examples that made you conclude them. Filling in at least one case of "I thought I should dress like X, but then Y happened, now I dress like Z", even - my detector is going off because all the paragraphs are describing the abstract conclusions.

Replies from: None, John_Maxwell_IV, TimS, None
comment by [deleted] · 2012-05-23T00:01:29.621Z · LW(p) · GW(p)

With regard to examples about clothing, one handy one would be:

I'd been generally aware that while the Muslim women's reactions to me seemed to be more or less constant for a while, it had stood out to me that the men's reactions were considerably more volatile. At the time I gauged this in terms of body language: the apparent tension of the facial muscles, the set of the shoulders, the extension of the arms, what the hands are doing, gestural or expressive mirroring... I don't have formal training in this stuff, and being fairly autistic I don't seem to have the same reactions to it that neurotypical people do, but on some perceptual level it just clicks that this person is relaxed or curious or uncomfortable or very uncomfortable.

Anyway, so I hadn't really put thought into how I should dress before, in that context. I just wore the clothes I was comfy with the first day I started teaching, and didn't notice any issues that stood out to me. I kept doing that until summer arrived. My usual fashion sense is fairly covering and drapey (I like cardigans, skirts and "big billowy hippie pants"). At the time I also had a penchant for wearing a head scarf (not a full wrap like the Muslim women in class wore, though -- just fancy bandanas), more on that later.

On warmer days, I'd avoid wearing my hoodie or jacket and just do short-sleeve shirts. Some days I'd wear the hoodie but have shorts instead of pants or skirt. I was mostly busy with the teaching so it took a while for the pattern to reach conscious awareness, but gradually it dawned on me that the men displayed more signs of discomfort on these days. It didn't seem like such a big deal that I was worried, though; it was a noticeable element but didn't really interfere with the flow of class, and the bulk of the class (non-Muslim men and women plus Muslim women) didn't seem to care.

Then one day I wore a tank top plus shorts. This was during the height of summer, and it didn't strike me as particularly unusual. Suddenly the reaction difference was very marked. None of the Muslim students, men or women, felt comfortable looking at me at all. They tensed up in reaction to me getting closer. They entirely avoided asking for help during computer time (which necessitates me getting pretty close since I'd have to peer over their shoulders at the laptop, in a crowded classroom -- on a related note, this was a huge test case for how my "gendered socialization" cues were doing, since when the women were comfy with me their body language was VERY clear on that point), and no matter how obviously they were struggling with the material they said they were fine. They wouldn't actually breach etiquette and tell me to leave them alone with it, but they also clearly weren't comfortable with me there. They wouldn't make eye contact, they wouldn't even look at me directly, and they certainly weren't okay with me entering their personal space distance. This even applied to the women who'd treated me like a friend, not just a teacher -- all the informality was gone.

Through all of this, my non-Muslim students (men and women both) remained more or less consistent about their body language; whether or not they liked me personally seemed a whole lot more relevant to their comfort (always erring on the side of polite in any case). My clothing choices didn't seem to faze them.

I decided the very next day to compromise. I wore something a bit more covering...and blasted the air conditioner in the room. It took a while to find an equilibrium that really worked for people (differing temperature comfort zones), but negotiating settings on a thermostat was a whole lot easier than trying to teach a class full of students who were too uncomfortable to focus. After a week, the Muslim women students were acting like it had never happened, the Muslim men were comfy enough to function in class (if a little more politely-distant than they had been) and the non-Muslim men and women remained pretty consistent throughout.

(Mind, once winter came around, we had the opposite problem -- all of my students were from hot places, I can't stand heat, and to preserve social comfort I had to keep them from blasting the heat all day...)

EDIT: Oh right, the headscarf thing. I noticed that it seemed to make a small but positive difference as well, mostly with newly-arrived Muslim women students. It wasn't a huge effect, but after about eight months I'd elected to wear a scarf every day for the first week or two after we got a new student matching those labels, especially during one-on-one pullouts and interactions between class. It seemed to make affective mirroring go smoother during the get-to-know-you period, although it was a subtle thing, and didn't seem to make a difference at all with anyone who'd been there for more than a couple months as of when I met them.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-05-23T00:23:28.983Z · LW(p) · GW(p)

(Bows.) Thank you for Being Specific!

comment by John_Maxwell (John_Maxwell_IV) · 2012-05-22T19:53:24.658Z · LW(p) · GW(p)

I suspect humans are a lot better at remembering abstract generalizations about what occurs than specific instances. (And probably with good reason; abstract generalizations probably take up less space.)

As a child, arguing with siblings, I had lots of arguments of the form "You're accusing me of X? But you always do it yourself!" / "Oh yeah? Name one example!" / "I can't think of any, but you still always do it!" But even if I was on the side asking for examples, I kind of knew in the back of my head that I was being dishonest, because I remembered the abstract generalization myself as well.

Of course being specific is still a good idea. It may be that the habit of being specific only helps you going forward, as you begin to get in the habit of storing specific instances.

comment by TimS · 2012-05-22T20:01:41.411Z · LW(p) · GW(p)

For politics-is-the-mindkiller reasons, specifics in this instance run a substantial chance of being downvoted. If Jandila wants, for politeness' sake, to avoid starting a fight, that's a rational choice.

Nonetheless, I agree that being more specific would be valuable, both intrinsically and because specifics would show that Jandila has a deeper grasp of rationality (talk is cheap, and such-like). To restate my point, I agree that specifics would make "an interesting and valuable top-level post".

Replies from: None
comment by [deleted] · 2012-05-22T23:13:21.783Z · LW(p) · GW(p)

If Jandila wants, for politeness' sake, to avoid starting a fight, that's a rational choice.

More like "Am feeling low confidence about own ability to express this in a way such that intended point will come through with sufficient signal to separate it from the noise of other possible readings." This is not simply confusing "has understood my point" with "agrees with my point"; I actually have a bit of a difficult time unpacking things like this because of how low-level perceptual it gets for me. I have conceptual synaesthesia, so I can glimpse distinctions and nuances pretty clearly, but it's very difficult to translate "It's that curly bit of the shape over there" back into argument-speak. Makes downvoting easy; even when I know what I mean and can tell the other party hasn't understood what I said, I can't really argue that my presentation sucked.

Since there seems to be an interest in me making a go at it, I'll give this some thought.

comment by [deleted] · 2012-05-22T23:14:56.332Z · LW(p) · GW(p)

See my reply to Tim S below -- you're right that it's vague, and I'm thinking it might be worthwhile to go to the trouble of laying it out a bit more.

comment by Zack_M_Davis · 2012-05-22T06:08:26.310Z · LW(p) · GW(p)

It convinced me that the sort of attitudes I see expressed on LW towards "tradition" and traditional culture [...] are so hopelessly confused about the thing they're trying to address that they essentially don't have anything meaningful to say about it

(I think this could make an interesting and valuable top-level post.)

Replies from: None
comment by [deleted] · 2012-05-22T07:13:34.779Z · LW(p) · GW(p)

Maybe. I'm not sure I'm able to write on that particular topic well enough to sit at the top-level, but it does get weird. Partly it's my own perspective as a person with cultural backgrounds that are not common here (mixed in with some cultural backgrounds that are) and perspectives on those; I can see what's bugging me but it's hard to construct it into any kind of overarching thesis (other than "LW is collectively bad at this").

comment by CronoDAS · 2012-05-21T05:31:50.234Z · LW(p) · GW(p)

Me too.

comment by Luke_A_Somers · 2012-05-18T03:20:40.367Z · LW(p) · GW(p)

Like most of Leviticus, the edicts against homosexuality were an attempt to belatedly change 'have no gods before me' into 'don't have any other gods, period' by banning all of the specific religious practices of the competing local religions, which involved things like, say, eating shellfish, wearing sacred garb composed of mixed fibers, etc.

So maybe some of them were homophobes, but it's not necessary; and if they'd all been homophobes there wouldn't have been a need to establish the rule.

Replies from: TimS
comment by TimS · 2012-05-18T12:55:52.620Z · LW(p) · GW(p)

That's a good point. It fairly strongly suggests that Judeo-Christian anti-homosexuality values would not survive coherent extrapolation because it provides an explanation for why the value was included originally. As JoshuaZ stated, I don't expect religious values whose sole function was religious in-group-ism to persist after a CEV process.

Replies from: army1987
comment by A1987dM (army1987) · 2012-05-20T11:51:05.292Z · LW(p) · GW(p)

Well, if Christian anti-homosexuality was just a religious in-group-ism, they wouldn't be outraged by non-Christians having sex with members of the same sex any more than by (say) non-Christians eating meat on Fridays during Lent. Are they?

comment by JoshuaZ · 2012-05-18T02:40:12.441Z · LW(p) · GW(p)

I don't know the history in East Asia, but closer to where the Abrahamic religions arose one had the ancient Greeks who were ok with most forms of homosexuality. The only reservations they had about homosexuality as I understand it had to do with issues of honor if one were a male who was penetrated.

Edit: I get the impression from this article that the attitudes of ancient Indians toward homosexuality have become so bogged down in modern politics that it may be difficult for non-experts to tell. I'll try to look into this more later.

comment by A1987dM (army1987) · 2012-05-20T11:43:22.881Z · LW(p) · GW(p)

The only empirical evidence for this question I can think of is non-Judeo-Christian attitudes.

IIRC, in pre-Christian Rome/Greece, homosexuality was considered OK only if the receiving partner was young enough.

comment by thomblake · 2012-05-18T14:07:03.696Z · LW(p) · GW(p)

I'm working under the assumption, which should not be controversial, that the AGI isn't going to find out that there is in fact such a deity hanging around.

Just as helpfully, if the FAI concludes that there is a deity around whom we should please and who would prefer that we object to gay marriage, it will properly regard that as a value.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T14:37:49.309Z · LW(p) · GW(p)

Or, presumably, if it concludes that there might some day come to be a deity, or other vastly powerful entity, who would prefer that we had objected to gay marriage.

Of course, all of this further presumes that there aren't/won't be other vastly powerful entities whose preferences have equal weight in opposite directions.

comment by DanArmak · 2012-05-19T15:57:29.538Z · LW(p) · GW(p)

Extrapolated CEV would be working from observable evidence + a good prior. Whereas lots of people insist it's very important to them to believe in a deity through faith, despite any contrary evidence (let alone lack of evidence). How are you going to tell the CEV to ignore such values?

comment by cousin_it · 2012-05-18T08:13:57.310Z · LW(p) · GW(p)

If CEV is allowed to stomp theistic values as you describe, it might also stomp some values that people hold because they believe too much in human equality.

comment by TimS · 2012-05-18T01:35:48.466Z · LW(p) · GW(p)

the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies. These are not incompletely-extrapolated values that will change with more information; they are values. Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry.

Without endorsing the remainder of your argument, I agree that these observations must be adequately explained, and rejection of the conclusions well justified - or the concept of provably Friendly AI must be considered impossible.

comment by TheOtherDave · 2012-05-18T01:33:02.458Z · LW(p) · GW(p)

Thanks for tying these together.

I would love to hear a case for viability that addresses these concerns specifically, made by someone who believes in the in-principle viability of performing a bottom-up extrapolation of human values into a coherent whole that can be implemented by a system vastly different from a human in a way I ought to endorse. While I don't fully agree with everything said here, it captures much of my own skepticism about that viability far more coherently than I've been able to express it myself.

Replies from: TimS
comment by TimS · 2012-05-18T13:02:50.976Z · LW(p) · GW(p)

that can be implemented by a system vastly different from a human in a way I ought to endorse

Why do you think this part is difficult? If there are any coherent human value systems, then it seems very plausible (if difficult to build) for any agent to implement the value system, even if the agent isn't human.

Put slightly differently, my objection to the possibility of a friendly-to-Catholicism AI is that Catholicism (like basically all human value systems) is not coherent. If it were proved coherent, I would agree that it was possible to build an AI that followed it (obviously, I'd personally oppose building such an AI - it would be an act of violence against me).

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T14:21:08.727Z · LW(p) · GW(p)

I don't mean to imply that, given that we've performed a bottom-up extrapolation of human values into a coherent whole, implementing that whole in a system vastly different from a human is necessarily difficult.

Indeed, by comparison to the first part, it's almost undoubtedly trivial, as you suggest.

Rather, I mean that what is at issue is extrapolating the currently instantiated value systems into "a coherent whole that can be implemented by a system vastly different from a human".

That said, I do think it's worthwhile to distinguish between "Catholicism" and "the result of extrapolating Catholicism into a coherent whole." The latter, supposing it existed, might not qualify as an example of the former. The same is true of "human value".

comment by [deleted] · 2012-05-18T01:27:07.075Z · LW(p) · GW(p)

(This is a revealing post, in that it takes the problem of values and treats it in a mathematically-precise way, and received many downvotes without any substantive objections to either the math or to the analogy asserting that the math is appropriate. I have found in other posts as well that making a mathematical argument based on an abstraction results in more downvotes than does merely arguing from a loose analogy.)

(emphasis added.)

Except Peter de Blanc's comments.

Replies from: ciphergoth, PhilGoetz
comment by Paul Crowley (ciphergoth) · 2012-05-18T06:58:45.370Z · LW(p) · GW(p)

Now that the huffy remark has been removed, I can't see what post it used to refer to!

comment by PhilGoetz · 2012-05-18T03:15:55.205Z · LW(p) · GW(p)

Peter de Blanc is a better mathematician than I am, so I'd better look at them.

ADDED. I see I responded to them before. I think they're good points but don't invalidate the model. I'll retract my huffy statement from the post, though.

Replies from: None
comment by [deleted] · 2012-05-18T03:31:59.534Z · LW(p) · GW(p)

The point of his remarks, in my view, was that your model needed validation in the first place. Every mathematical biology or computational cognitive science paper I've read makes some attempt to rationalize why its authors are bothering to examine whatever idealized model is under consideration.

comment by timtyler · 2012-05-19T00:30:25.142Z · LW(p) · GW(p)

I wanted to write about my opinion that human values can't be divided into final values and instrumental values, the way discussion of FAI presumes they can. This is an idea that comes from mathematics, symbolic logic, and classical AI. A symbolic approach would probably make proving safety easier. But human brains don't work that way. You can and do change your values over time, because you don't really have terminal values.

You may have wanted to - but AFAICS, you didn't - apart from this paragraph. It seems to me that it fails to make its case. The split applies to any goal-directed agent, irrespective of implementation details.

comment by Nick_Beckstead · 2012-05-18T02:26:07.404Z · LW(p) · GW(p)

This link

Values vs. parameters: Eliezer has suggested using...

is broken.

Replies from: PhilGoetz
comment by PhilGoetz · 2012-05-18T03:14:14.463Z · LW(p) · GW(p)

Fixed; thanks.

comment by jacobt · 2012-05-18T07:28:54.432Z · LW(p) · GW(p)

The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.

If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the universe adopt "evolved" values, then CEV will extrapolate this desire. The only issue is that other people might not share this desire, even when extrapolated. In that case insisting that values "evolve" is imposing minority desires on everyone, mostly people who could never be convinced that these values are good. Which might be a good thing, but it can be handled in CEV by taking CEV(some "progressive" subset of humans).

Replies from: cousin_it
comment by cousin_it · 2012-05-18T08:19:37.564Z · LW(p) · GW(p)

This seems a nice place to link to Marcello's objection to CEV, which says you might be able to convince people of pretty much anything, depending on the order of arguments.

Replies from: torekp, gRR
comment by torekp · 2012-05-28T00:04:15.645Z · LW(p) · GW(p)

I think Marcello's objection dissolves when the subject becomes aware of the order-of-arguments effects. After all, those effects are part of the factual information that the subject considers in refining its values. Most people don't like to have values that change depending on the order in which arguments are presented, so they will reflect further until they each find a stable value set. At least, that would be my hypothesis.

comment by gRR · 2012-05-18T13:17:35.634Z · LW(p) · GW(p)

I think it would be impossible to convince people (assuming suitably extrapolated intelligence and knowledge) that total obliteration of all life on Earth is a good thing, no matter the order of arguments. And this is a very good value for a FAI. If it optimizes for this (preserving life) and otherwise interferes the least, it will already have done excellently.

Replies from: thomblake, DanArmak, thomblake
comment by thomblake · 2012-05-18T13:18:46.562Z · LW(p) · GW(p)

There are nihilists who at least claim to hold that position.

Replies from: gRR
comment by gRR · 2012-05-18T13:28:19.305Z · LW(p) · GW(p)

They are probably lying, trolling, joking, or psychos (=do not have enough extrapolated intelligence and knowledge).

Replies from: DanArmak
comment by DanArmak · 2012-05-19T15:40:51.797Z · LW(p) · GW(p)

If you're launching an irreversible CEV, it's not very safe to rely on your intuition that other people's expressed desires are "probably lying, trolling, joking" and so wouldn't affect the CEV outcome.

Replies from: gRR
comment by gRR · 2012-05-19T16:36:56.563Z · LW(p) · GW(p)

I only proposed a hypothesis, which will become testable earlier than the time when CEV could be implemented.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T16:52:46.448Z · LW(p) · GW(p)

How do you propose to test it without actually running a CEV calculation?

Replies from: gRR
comment by gRR · 2012-05-19T17:01:39.288Z · LW(p) · GW(p)

How can we even start defining CEV without brain scanning technology able to do much more than answering the original question?

Replies from: wedrifid, DanArmak
comment by wedrifid · 2012-05-26T04:19:01.738Z · LW(p) · GW(p)

How can we even start defining CEV without brain scanning technology able to do much more than answering the original question?

It would seem that we can define the algorithm which can be used to manipulate and process a given input of loosely defined, inconsistent preferences. Doing so would seem to be a necessary step before any actual brain scanning becomes involved.
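
For concreteness, here is a minimal toy sketch (my own illustration; the function, the preference data, and the net-wins scoring rule are all invented for this comment, not anything wedrifid or the CEV writings specify) of what "processing a given input of loosely defined, inconsistent preferences" might look like in miniature:

```python
from collections import defaultdict

def aggregate_preferences(pairwise_prefs):
    """Collapse a noisy, possibly cyclic set of pairwise preferences
    ("a is preferred to b") into a single ranking by net wins.
    This throws information away; that is the point of the toy:
    any such collapse embodies contestable design choices."""
    score = defaultdict(int)
    for preferred, other in pairwise_prefs:
        score[preferred] += 1
        score[other] -= 1
    return sorted(score, key=score.get, reverse=True)

# Inconsistent input: a cycle (freedom > safety > comfort > freedom)
# plus one extra vote for safety.
prefs = [
    ("freedom", "safety"),
    ("safety", "comfort"),
    ("comfort", "freedom"),
    ("safety", "comfort"),
]
print(aggregate_preferences(prefs))  # ['safety', 'freedom', 'comfort']
```

Any such collapse discards information (here, the cycle among the three values simply disappears into the scores), and that is exactly where the hard design choices for a real extrapolation procedure would live.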

comment by DanArmak · 2012-05-19T17:19:12.074Z · LW(p) · GW(p)

Well part of my point is that indeed we can't even define CEV today, let alone solve it, and so a lot of conclusions/propositions people put forward about what CEV's output would be like are completely unsupported by evidence; they are mere wishful thinking.

More on-topic: today you have humans as black boxes, but you can still measure what they value, by 1) offering them concrete tradeoffs and measuring behavior and 2) asking them.

Tomorrow, suppose your new brain scanning tech allows you to perfectly understand how brains work. You can now explain how these values are implemented. But they are the same values you observed earlier. So the only new knowledge relevant to CEV would be that you might derive how people would behave in a hypothetical situation, without actually putting them in that situation (because that might be unethical or expensive).

Now, suppose someone expresses a value that you think they are merely "lying, trolling or joking" about. In all of their behavior throughout their lives, and in their own words today, they honestly have this value. But your brain scanner shows that in some hypothetical situation, they would behave consistently with valuing this value less.

By construction, since you couldn't derive this knowledge from their life histories (already known without a brain scanner), these are situations they have (almost) never been in. (And therefore they aren't likely to be in them in the future, either.)

So why do you effectively say that for purposes of CEV, their behavior in such counterfactual situations is "their true values", while their behavior in the real, common situations throughout their lives isn't? Yes, humans might be placed in totally novel situations which can cause them to reconsider their values; because humans have conflicting values, and non-explicit values (but rather behaviors responding to situations), and no truly top-level goals (so that all values may change). But you could just as easily say that there are probably situations in which you could be placed so that you would come to value their values more.

Your approach places people in the unfortunate position where they might live their whole lives believing in a value, and fighting for it, and then you (or the CEV AI) come up to them and say: I'm going to destroy everything you've valued so far. Not because of objective ethics or the decree of God or majority vote or anything objective and external. But because they themselves actually "really" prefer completely different values, even though on the conscious level, no matter how long they might think and talk and read about it, they would never reach that conclusion.

Replies from: gRR
comment by gRR · 2012-05-19T17:28:43.480Z · LW(p) · GW(p)

In all of their behavior throughout their lives, and in their own words today, they honestly have this value

This is the conditional that I believe is false when I say "they are probably lying, trolling, joking". I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.

Replies from: DanArmak, JoshuaZ
comment by DanArmak · 2012-05-19T17:31:08.420Z · LW(p) · GW(p)

OK. That's possible. But why do you believe that, despite their large numbers and lifelong avowal of those beliefs?

comment by JoshuaZ · 2012-05-19T17:40:46.757Z · LW(p) · GW(p)

How would you respond if you were subject to such a brain scan and then informed that deep inside you actually are a nihilist who prefers the complete destruction of all life?

Replies from: gRR
comment by gRR · 2012-05-19T17:43:53.466Z · LW(p) · GW(p)

I'd think someone's playing a practical joke on me.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-05-19T17:49:24.031Z · LW(p) · GW(p)

And suppose we develop such brain scanning technology, scan someone else who claims to want the destruction of all life, and it says "yep, he does" - how would you respond?

Replies from: gRR
comment by gRR · 2012-05-19T17:56:37.497Z · LW(p) · GW(p)

Dunno... propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don't expect this to happen.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-05-19T17:59:12.826Z · LW(p) · GW(p)

That you don't expect it to happen shouldn't by itself be a reason not to consider it. I'm asking because it seems you are avoiding the hard questions by more or less saying you don't think they will happen. And there are many more conflicting value sets which are less extreme (and apparently more common) than this one.

Replies from: gRR
comment by gRR · 2012-05-19T18:11:09.841Z · LW(p) · GW(p)

Errr. This is a question of simple fact, which is either true or false. I believe it's true, and build plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-19T21:39:25.461Z · LW(p) · GW(p)

You've lost me. Can you restate the question of simple fact to which you refer here, which you believe is true? Can you restate the plan that you consider good if that question is true?

Replies from: gRR
comment by gRR · 2012-05-19T22:33:10.204Z · LW(p) · GW(p)

I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.

Replies from: TheOtherDave, Desrtopa
comment by TheOtherDave · 2012-05-20T04:15:27.018Z · LW(p) · GW(p)

OK, cool.

To answer your question: sure, if I assume (as you seem to) that the extrapolation process is such that I would in fact endorse the results, and I also assume that the extrapolation process is such that if it takes as input all humans it will produce at least one desire that is endorsed by all humans (even if they themselves don't know it in their current form), then I'd agree that's a good plan, if I further assume that it doesn't have any negative side-effects.

But the assumptions strike me as implausible, and that matters.

I mean, if I assume that everyone being thrown into a sufficiently properly designed blender and turned into stew is a process I would endorse, and I also assume that the blending process has no negative side-effects, then I'd agree that that's a good plan, too. I just don't think any such blender is ever going to exist.

Replies from: gRR
comment by gRR · 2012-05-20T10:44:33.126Z · LW(p) · GW(p)

Ok, but do you grant that running a FAI with "unanimous CEV" is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing - if I'm wrong about my hypothesis?

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-20T16:07:27.785Z · LW(p) · GW(p)

I don't know how to answer that question. Again, it seems that you're trying to get an answer given a whole bunch of assumptions, but that you resist the effort to make those assumptions clear as part of the answer.

  • It is not clear to me that there exists such a thing as a "unanimous CEV" at all, even in the hypothetical sense of something we might be able to articulate some day with the right tools.

  • If I nevertheless assume that a unanimous CEV exists in that hypothetical sense, it is not clear to me that only one exists; presumably modifications to the CEV-extraction algorithm would result in different CEVs from the same input minds, and I don't see any principled grounds for choosing among that cohort of algorithms that don't in effect involve selecting a desired output first. (In which case CEV extraction is a complete red herring, since the output was a "bottom line" written in advance of CEV's extraction, and we should be asking how that output was actually arrived at and whether we endorse that process.)

  • If I nevertheless assume that a single CEV-extraction algorithm is superior to all the others, and further assume that we select that algorithm via some process I cannot currently imagine and run it, and that we then run a superhuman environment-optimizer with its output as a target, it is not clear to me that I would endorse that state change as an individual. So, no, I don't agree that running it is uncontroversial. (Although everyone might agree afterwards that it was a good idea.)

  • If the state change nevertheless gets implemented, I agree (given all of those assumptions) that the resulting state-change improves the world by the standards of all humanity. "Safe" is an OK word for that, I guess, though it's not the usual meaning of "safe."

  • I don't agree that the worst that happens, if those assumptions turn out to be wrong, is that it stands there and does nothing. The worst that happens is that the superhuman environment-optimizer runs with a target that makes the world worse by the standards of all humanity.

(Yes, I understand that the CEV-extraction algorithm is supposed to prevent that, and I've agreed that if I assume that's true, then this doesn't happen. But now you're asking me to consider what happens if the "hypothesis" is false, so I am no longer just assuming that's true. You're putting a lot of faith in a mysterious extraction algorithm, and it is not clear to me that a non-mysterious algorithm that satisfies that faith is likely, or that the process of coming up with one won't come up with a different algorithm that antisatisfies that faith instead.)

Replies from: gRR
comment by gRR · 2012-05-20T17:25:19.667Z · LW(p) · GW(p)

What I'm trying to do is find some way to fix the goalposts: find a set of conditions on CEV that would be satisfactory. Whether such a CEV actually exists and how to build it are questions for later. Let's just pile up constraints until a sufficient set is reached. So, let's assume that:

  • "Unanimous" CEV exists
  • And is unique
  • And is definable via some easy, obviously correct, and unique process, to be discovered in the future,
  • And it basically does what I want it to do (fulfil universal wishes of people, minimize interference otherwise),

would you say that running it is uncontroversial? If not, what other conditions are required?

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-20T17:36:55.880Z · LW(p) · GW(p)

No, I wouldn't expect running it to be uncontroversial, but I would endorse running it.

I can't imagine any world-changing event that would be uncontroversial, if I assume that the normal mechanisms for generating controversy aren't manipulated (in which case anything might be uncontroversial).

Why is it important that it be uncontroversial?

Replies from: gRR
comment by gRR · 2012-05-20T17:56:51.412Z · LW(p) · GW(p)

Why is it important that it be uncontroversial?

I'm not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.

Ok, you're right in that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where "sufficiently" is not too high a threshold?

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-20T19:08:25.358Z · LW(p) · GW(p)

There probably exists (hypothetically) some plan such that it wouldn't seem unreasonable to me to declare anyone who doesn't endorse that plan either insufficiently well-informed or insufficiently intelligent.

In fact, there probably exist several such plans, many of which would have results I would subsequently regret, and some of which do not.

Replies from: gRR
comment by gRR · 2012-05-20T19:43:56.360Z · LW(p) · GW(p)

I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes, and one may wonder why it has not been thrown away already. But they may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-20T20:07:56.337Z · LW(p) · GW(p)

Agreed that an actual concrete plan would be a valuable thing, for the reasons you list among others.

comment by Desrtopa · 2012-05-19T22:36:06.666Z · LW(p) · GW(p)

I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing.

Does the existence of the Voluntary Human Extinction Movement affect your belief in this proposition?

Replies from: gRR
comment by gRR · 2012-05-19T22:40:15.792Z · LW(p) · GW(p)

VHEMT supports human extinction primarily because, in the group's view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.

Obviously, human extinction is not their terminal value.

Replies from: Desrtopa
comment by Desrtopa · 2012-05-19T22:44:03.424Z · LW(p) · GW(p)

Or at least, not officially. I have known at least one person who professed to desire that the human race go extinct because he thought the universe as a whole would simply be better if humans did not exist. It's possible that he was stating such an extreme position for shock value (he did display some fairly pronounced antisocial tendencies), and that he had other values that conflicted with this position on some level. But considering the diversity of viewpoints and values I've observed people to hold, I would bet quite heavily against nobody in the world actually desiring the end of human existence.

comment by DanArmak · 2012-05-19T15:31:09.610Z · LW(p) · GW(p)

Lots of people honestly wish for the literal end of the universe to come, because they believe in an afterlife/prophecy/etc.

You might say they would change their minds given better or more knowledge (e.g. that there is no afterlife and the prophecy was false/fake/wrong). But such people are often exposed to such arguments and reject them; and they make great efforts to preserve their current beliefs in the face of evidence. And they say these beliefs are very important to them.

There may well be methods of "converting" them anyway, but how are these methods ethically or practically different from "forcibly changing their minds" or their values? And if you're OK with forcibly changing their minds, why do you think that's ethically better than just ignoring them and building a partial-CEV that only extrapolates your own wishes and those of people similar to yourself?

Replies from: gRR
comment by gRR · 2012-05-19T16:29:22.797Z · LW(p) · GW(p)

how are these methods ethically or practically different from "forcibly changing their minds" or their values?

I (and CEV) do not propose changing their minds or their values. What happens is that their current values (as modeled within FAI) get corrected in the presence of truer knowledge and lots of intelligence, and these corrected values are used for guiding the FAI.

If someone's mind & values are so closed as to be unextrapolateable - completely incompatible with truth - then I'm ok with ignoring these particular persons. But I don't believe there are actually any such people.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T17:03:57.869Z · LW(p) · GW(p)

I (and CEV) do not propose changing their minds or their values. What happens is that their current values (as modeled within FAI) get corrected in the presence of truer knowledge and lots of intelligence, and these corrected values are used for guiding the FAI.

So the future is built to optimize different values. And their original values aren't changed. Wouldn't they suffer living in such a future?

Replies from: gRR
comment by gRR · 2012-05-19T17:13:45.503Z · LW(p) · GW(p)

Even if they do, it will be the best possible thing for them, according to their own (extrapolated) values.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T17:26:28.175Z · LW(p) · GW(p)

Who cares about their extrapolated values? Not them (they keep their original values). Not others (who have different actual and extrapolated values). Then why extrapolate their values at all? You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values.

Replies from: gRR
comment by gRR · 2012-05-19T17:34:59.490Z · LW(p) · GW(p)

You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values

Well... ok, let's assume a happy life is their single terminal value. Then, by definition of their extrapolated values, you couldn't build a happier life for them by doing anything other than following their extrapolated values!

Replies from: DanArmak
comment by DanArmak · 2012-05-19T18:09:10.495Z · LW(p) · GW(p)

This is completely wrong. People are happy, by definition, if their actual values are fulfilled; not if some conflicting extrapolated values are fulfilled. CEV was supposed to get around this by proposing (without saying how) that people would actually grow to become smarter etc. and thereby modify their actual values to match the extrapolated ones, and then they'd be happy in a universe optimized for the extrapolated (now actual) values. But you say you don't want to change other people's values to match the extrapolation. That makes CEV a very bad idea - most people will be miserable, probably including you!

Replies from: gRR
comment by gRR · 2012-05-20T01:50:24.635Z · LW(p) · GW(p)

People are happy, by definition, if their actual values are fulfilled

Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?

Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of its own fulfillment: allow the person to take the blue box, then convince them that it is the red box they actually want, and only then present it. But in cases where this is impossible (example: the blue box contains horrible violent death), it is wrong to say that following the extrapolated values (withholding the blue box) is making the person suffer. Following their extrapolated values is the only way to allow them to have a happy life.
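
To make the box example concrete, here is a minimal sketch (my own illustration, assuming purely for the sake of the example that "get the diamond" is the person's underlying goal; every name and dictionary here is invented):

```python
def stated_choice(person):
    # What the person asks for, given their possibly false beliefs.
    return person["wants_box"]

def extrapolated_choice(person, true_contents):
    # What they would ask for if they knew what each box really contains,
    # holding their underlying goal (get the diamond) fixed.
    for box, contents in true_contents.items():
        if contents == person["really_wants"]:
            return box
    return stated_choice(person)  # fall back if nothing satisfies the goal

person = {"wants_box": "blue", "really_wants": "diamond"}
true_contents = {"blue": "nothing", "red": "diamond"}

print(stated_choice(person))                       # blue
print(extrapolated_choice(person, true_contents))  # red
```

The "extrapolation" here is just re-evaluating the same underlying goal against the true contents of the boxes; the disagreement in this thread is over whether, and how quickly, the AI should act on that re-evaluation while the person is still asking for the blue box.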

Replies from: DanArmak
comment by DanArmak · 2012-05-22T13:31:29.074Z · LW(p) · GW(p)

What you are saying indeed applies only "in cases where this is impossible". I further suggest that these are extremely rare cases when a superhumanly-powerful AI is in charge. If the blue box contains horrible violent death, the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person.

Replies from: gRR
comment by gRR · 2012-05-22T15:56:49.568Z · LW(p) · GW(p)

the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person

If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values.]

Replies from: DanArmak
comment by DanArmak · 2012-05-22T16:15:30.105Z · LW(p) · GW(p)

The actual values would also tell it to do so. This is a case where the two coincide. In most cases they don't.

Replies from: gRR
comment by gRR · 2012-05-22T16:20:45.359Z · LW(p) · GW(p)

No, the "actual" values would tell it to give the humans the blue boxes they want, already.

Replies from: DanArmak
comment by DanArmak · 2012-05-22T16:30:55.455Z · LW(p) · GW(p)

The humans don't value the blue box directly. It's an instrumental value because of what they think is inside. The humans really value (in actual, not extrapolated values) the diamond they think is inside.

That's a problem with your example (of the boxes): the values are instrumental, the boxes are not supposed to be valued in themselves.

ETA: wrong and retracted. See below.

Replies from: TheOtherDave, gRR
comment by TheOtherDave · 2012-05-22T16:50:24.720Z · LW(p) · GW(p)

Well, they don't value the diamond, either, on this account.

Perhaps they value the wealth they think they can have if they obtain the diamond, or perhaps they value the things they can buy given that diamond, or perhaps they value something else. It's hard to say, once we give up talking about the things we actually observe people trading other things for as being things they value.

Replies from: DanArmak
comment by DanArmak · 2012-05-22T18:31:17.731Z · LW(p) · GW(p)

You're right and I was wrong on this point. Please see my reply to gRR's sister comment.

comment by gRR · 2012-05-22T16:50:06.564Z · LW(p) · GW(p)

Humans don't know which of their values are terminal and which are instrumental, and whether this question even makes sense in general. Their values were created by two separate evolutionary processes. In the boxes example, humans may not know about the diamond. Maybe they value blue boxes because their ancestors could always bring a blue box to a jeweler and exchange it for food, or something.

This is precisely the point of extrapolation - to untangle the values from each other and build a coherent system, if possible.

Replies from: DanArmak
comment by DanArmak · 2012-05-22T18:30:52.986Z · LW(p) · GW(p)

You're right about this point (and so is TheOtherDave) and I was wrong.

With that, I find myself unsure as to what we agree and disagree on. Back here you said "Well, perhaps yes." I understand that to mean you agree with my point that it's wrong / bad for the AI to promote extrapolated values while the actual values are different and conflicting. (If this is wrong please say so.)

Talking further about "extrapolated" values may be confusing in this context. I think we can taboo that and reach all the same conclusions while only mentioning actual values.

The AI starts out by implementing humans' actual present values. If some values (want blue box) lead to actually-undesired outcomes (blue box really contains death), that is a case of conflicting actual values (want blue box vs. want to not die). The AI obviously needs to be able to manage conflicting actual values, because humans always have them, but that is true regardless of CEV.

Additionally, the AI may foresee that humans are going to change and in the future have some other actual values; call these the future-values. This change may be described as "gaining intelligence etc." (as in CEV) or it may be a different sort of change - it doesn't matter for our purposes. Suppose the AI anticipates this change, and has no imperative to prevent it (such as helping humans avoid murderer-Gandhi pills due to present human values), or maybe even has an imperative to assist this change (again, according to current human values). Then the AI will want to avoid doing things today which will make its task harder tomorrow, or which will cause future people to regret their past actions: it may find itself striking a balance between present and future (predicted) human values.
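To make that balance concrete (this is just my own illustrative formalization; the symbols U_present, U_future and the weight w are invented, not anything from the CEV document), one can picture the AI maximizing a weighted mixture of today's actual values and its forecast of tomorrow's values:

\[ U_{\mathrm{AI}}(a) \;=\; (1 - w)\, U_{\mathrm{present}}(a) \;+\; w\, \mathbb{E}\big[ U_{\mathrm{future}}(a) \big], \qquad 0 \le w \le 1, \]

where any w > 0 means present values are deliberately satisfied less than fully, and the expectation is taken over a forecast of future values that the AI may simply get wrong.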

This is, at the very least, dangerous - because it means not satisfying current human values as fully as possible, while the AI may be wrong about future values. Also, the AI's actions unavoidably influence humans and so probably influence which future values they eventually have. My position is that the AI must be guided by the humans' actual present values in choosing to steer human (social) evolution towards or away from possible future values. This has lots of downsides, but what better option is there?

In contrast, CEV claims there is some unique "extrapolated" set of future values which is special, stable once reached, universal for all humans, and that it's Good to steer humanity towards it even if it conflicts with many people's present values. But I haven't seen any arguments I find convincing that such "extrapolated" values exist and have any of those qualities (uniqueness, stability, universal compatibility, Goodness).

Do you agree with this summary? Which points do you disagree with me on?

Replies from: gRR
comment by gRR · 2012-05-22T20:54:24.219Z · LW(p) · GW(p)

Back here you said "Well, perhaps yes." I understand that to mean you agree with my point that it's wrong / bad for the AI to promote extrapolated values while the actual values are different and conflicting

I meant that "it's wrong/bad for the AI to promote extrapolated values while the actual values are different and conflicting" will probably be a part of the extrapolated values, and the AI would act accordingly, if it can.

My position is that the AI must be guided by the humans' actual present values in choosing to steer human (social) evolution towards or away from possible future values. This has lots of downsides, but what better option is there?

The problem with the actual present values (besides the fact that we cannot define them yet, any more than we can define their CEV) is that they are certain not to be universal. We can be pretty sure that someone can be found to disagree with any particular proposition. Whereas, for CEV, we can at least hope that a unique reflectively-consistent set of values exists. If it does and we succeed in defining it, then we're home and dry. Meanwhile, we can think of contingency plans about what to do if it does not or we don't, but the uncertainty about whether the goal is achievable does not mean that the goal itself is wrong.

Replies from: DanArmak
comment by DanArmak · 2012-05-22T21:07:32.778Z · LW(p) · GW(p)

It's not merely uncertainty. My estimation is that it's almost certainly not achievable.

Actual goals conflict; why should we expect extrapolated goals to converge? The burden of proof is on you: why do you assign this possibility sufficient likelihood to even raise it to the level of conscious notice and debate?

It may be true that "a unique reflectively-consistent set of values exists". What I find implausible and unsupported is that (all) humans will evolve towards having that set of values, in a way that can be forecast by "extrapolating" their current values. Even if you showed that humans might evolve towards it (which you haven't), the future isn't set in stone - who says they will evolve towards it, with sufficient certitude that you're willing to optimize for those future values before we actually have them?

Replies from: gRR
comment by gRR · 2012-05-22T21:28:35.687Z · LW(p) · GW(p)

Well, my own proposed plan is also a contingent modification. The strongest possible claim of CEV can be said to be:

There is a unique X, such that for all living people P, CEV(P) = X.

Assuming there is no such X, there could still be a plausible claim:

Y is not empty, where Y = Intersection{over all living people P} of CEV(P).

And then the AI would do well if it optimizes for Y while interfering the least with other things (whatever this means). This way, whatever "evolving" happens due to the AI's influence is at least agreed upon by everyone('s CEV).
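Stated slightly more formally (my notation; CEV(P) here just means person P's individually extrapolated volition):

\[ \textbf{Strong claim:}\quad \exists X \;\; \forall P \in \text{living people}: \;\; \mathrm{CEV}(P) = X. \]

\[ \textbf{Fallback claim:}\quad Y \;=\; \bigcap_{P \in \text{living people}} \mathrm{CEV}(P) \;\neq\; \varnothing, \]

with the AI optimizing only for Y, subject to interfering as little as possible with everything outside Y.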

Replies from: DanArmak
comment by DanArmak · 2012-05-22T21:44:05.563Z · LW(p) · GW(p)

I can buy, tentatively, that most people might one day agree on a very few things. If that's what you mean by Y, fine, but it restricts the FAI to doing almost nothing. I'd much rather build a FAI that implemented more values shared by fewer people (as long as those people include myself). I expect so would most people, including the ones hypothetically building the FAI - otherwise they'd expect not to benefit much from building it, since it would find very little consensus to implement! So the first team to successfully build FAI+CEV will choose to launch it as a CEV(small group including themselves) rather than CEV(humanity).

Replies from: None, gRR
comment by [deleted] · 2012-05-23T00:09:12.423Z · LW(p) · GW(p)

This is fine, because CEV of any subset of the population is very likely to include terms for CEV of humanity as a whole.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T07:30:04.826Z · LW(p) · GW(p)

Why do you believe this?

For instance, I think CEV(humanity), if it even exists, will include nothing of real interest because people just wouldn't agree on common goals. In such a situation, my personal CEV - or that of a few people who do agree on at least some things - would not want to include CEV(humanity). So your belief implies that CEV(humanity) exists and is nontrivial. As I've asked before in this thread, why do you think so?

Replies from: None
comment by [deleted] · 2012-05-25T16:18:15.509Z · LW(p) · GW(p)

Oh, I had some evidence, but I Minimum Viable Commented. I thought it was obvious once pointed out. Illusion of transparency.

We care about what happens to humanity. We want things to go well for us. If CEV works at all, it will capture that in some way.

Even if CEV(rest of humanity) turns out to be mostly derived from radical islam, I think there would be terms in CEV(Lesswrong) for respecting that. There would also be terms for people not stoning each other to death and such. I think those (respect for CEV and good life by our standards) would only come into conflict when CEV has basically failed.

You seem to be claiming that CEV will in fact fail, which I think is a different issue. My claim is that if CEV is a useful thing, you don't have to run it on everyone (or even a representative sample) to make it work.

Replies from: DanArmak
comment by DanArmak · 2012-05-25T21:06:52.689Z · LW(p) · GW(p)

It depends on what you call CEV "working" or "failing".

One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone's personal volition, then compare and merge them to create the group's overall CEV. Where enough people agree, choose what they agree on (factoring in how sure they are, and how important this is to them). Where too many people disagree, do nothing, or be indifferent on the outcome of this question, or ask the programmers. Is this what you have in mind?

The big issue here is how much consensus is enough. Let's run with concrete examples:

  • If CEV requires too much consensus, it may not help us become immortal because a foolish "deathist" minority believes death is good for people.
  • If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.
  • You may well have both kinds of problems at the same time (with different questions).

It all depends on how you define required consensus - and that definition can't itself come from CEV, because it's required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous - if you precommit to CEV and then it evolves into "too little" or "too much" consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less "correct" consensus requirement.
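To make the point explicit, here is a minimal toy sketch (names and structure are mine, not anything from the CEV document) of the extrapolate-then-merge procedure; note that the consensus threshold is an input the programmers must pick before the first run, not something the merged CEV can supply:

```python
# Toy model of "extrapolate each person, then merge by consensus" (my own
# illustrative sketch, not the CEV document's algorithm). Each extrapolated
# volition maps a question to (preferred answer, confidence * importance).

def merge_by_consensus(extrapolated_volitions, question, threshold):
    """Return the consensus answer, or None if consensus is insufficient.

    `threshold` is the crux: it has to be chosen before the first run,
    so it cannot itself come out of the merged CEV.
    """
    votes, total = {}, 0.0
    for ev in extrapolated_volitions:
        answer, weight = ev(question)  # weight ~ how sure / how much they care
        votes[answer] = votes.get(answer, 0.0) + weight
        total += weight
    if not votes or total <= 0:
        return None
    best_answer, best_weight = max(votes.items(), key=lambda kv: kv[1])
    if best_weight / total >= threshold:
        return best_answer
    return None  # do nothing / stay indifferent / ask the programmers

# threshold = 0.99 risks blocking immortality because of a deathist minority;
# threshold = 0.51 risks a 99% consensus preying on the remaining 1%.
```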

So the matter is not just what each person or group's CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.

Contrariwise, if we use the CEV of all humanity, it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people. And it will have to resolve the contradiction, and if there's not enough consensus among humanity's individual CEVs to do so, the CEV algorithm will "fail".

Replies from: None, TheOtherDave
comment by [deleted] · 2012-05-26T04:09:58.363Z · LW(p) · GW(p)

If CEV requires too much consensus, it may not help us become immortal because a foolish "deathist" minority believes death is good for people.

If CEV is satisfied by too little consensus, then 99% of the people may build a consensus to kill the other 1% for fun and profit, and the CEV would not object.

These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.

I don't think CEV works with explicit entities that can interact and decide to kill each other. I understand that it is much more abstract than that. Also probably all blind, and all implemented through the singleton AI, so it would be very unlikely that everyone's EV happens to name, say, bob smith as the lulzcow.

It all depends on how you define required consensus - and that definition can't itself come from CEV, because it's required for the first iteration of CEV to run. It could be allowed to evolve via CEV, but you still need to start somewhere and such evolution strikes me as dangerous - if you precommit to CEV and then it evolves into "too little" or "too much" consensus and ends up doing nothing or prohibiting nothing, the whole CEV project fails. Which may well be a worse outcome from our perspective than starting with (or hardcoding) a different, less "correct" consensus requirement.

This is a serious issue with (at least my understanding of) CEV. How to even get CEV done (presumably with an AI) without turning everyone into computronium or whatever seems hard.

So the matter is not just what each person or group's CEV is, but how you combine them via consensus. If, as you suggest, we use the CEV of a small homogenous group instead of all of humanity, it seems clear to me that the consensus would be greater (all else being equal), and so the requirements for consensus are more likely to be satisfied, and so CEV will have a higher chance of working.

This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.

it will have a term derived from me and you for not stoning people. And it will also have a term derived from some radical Islamists for stoning people.

I think that statement is too strong. Keep in mind that it's extrapolated volition. I doubt the islamists' values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.

Replies from: DanArmak
comment by DanArmak · 2012-05-26T08:17:17.414Z · LW(p) · GW(p)

These risks exist. However, I think it is very likely in our case that there will be strong consensus for values that reduce the problem a bit. Non-interference, for one, is much less controversial than transhumanism, but would allow transhumanism for those who accept it.

Why do you think this is "very likely"?

Today there are many people in the world (gross estimate: tens of percents of world population) who don't believe in noninterference. True believers of several major faiths (most Christian sects, mainstream Islam) desire enforced religious conversion of others, either as a commandment of their faith (for its own sake) or for the metaphysical benefit of those others (to save them from hell). Many people "believe" (if that is the right word) in the subjugation of certain minorities, or of women, children, etc. which involves interference of various kinds. Many people experience future shock which prompts them to want laws that would stop others from self-modifying in certain ways (some including transhumanism).

Why do you think it very likely these people's CEV will contradict their current values and beliefs? Please consider that:

  • We emphatically don't know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV's task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.

  • In these examples, you expect other people's extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity's CEV? Can you think of probable examples?

This is one reason why I think doing the CEV of just the AI team or whoever is the best approach. We have strong reason to suspect that the eventual result will respect everyone, and bootstrapping from a small group (or even just one person) seems much more reliable and safer.

I agree completely - doing the CEV of a small trusted team, who moreover are likely to hold non-extrapolated views similar to ours (e.g. they won't be radical Islamists), would be much better than CEV(humanity); much more reliable and safe.

But you contradict yourself a little. If you really believed CEV(team) looked a lot like CEV(humanity), you would have no reason to consider it safer. If you (correctly) think it's safer, that must be because you fear CEV(humanity) will contain some pretty repugnant conclusions that CEV(team) won't.

From this I understand that while you think CEV(team) would have a term for "respecting" the rest of humanity, that respect would be a lot weaker than the equal (and possibly majority-voting-based) rights granted them by CEV(humanity).

I think that statement is too strong. Keep in mind that it's extrapolated volition. I doubt the islamists' values are reflectively consistent. Weaken it to the possibility of there being multiple attractors in EV-space, some of which are bad, and I agree. Infectious memeplexes that can survive CEV scare the crap out of me.

I doubt any one human's values are reflectively consistent. At the very least, every human's values contradict one another in the sense that they compete among themselves for the human's resources, and the human in different moods and at different points in time prefers to spend on different values.

Because infectious memeplexes scare me too, I don't want anyone to build CEV(humanity) (or rather, to run a singleton AI that would implement it) - I would much prefer a much narrower CEV (of myself, or of a small group I trust), or better yet, a non-CEV process which more directly relies on my and other people's non-extrapolated preferences.

Replies from: TheOtherDave, None
comment by TheOtherDave · 2012-05-26T15:15:14.856Z · LW(p) · GW(p)

or better yet, a non-CEV process which more directly relies on my and other people's non-extrapolated preferences.

A possibly related question: suppose you were about to go off on an expedition in a spaceship that would take you away from Earth for thirty years, and the ship is being stocked with food. Suppose further that, because of an insane bureaucratic process, you have only two choices: either (a) you get to choose what food to stock right now, with no time for nutritional research, or (b) food is stocked according to an expert analysis of your body's nutritional needs, with no input from you. What outcome would you anticipate from each of those choices?

Suppose a hundred arbitrarily selected people were also being sent on similar missions on similar spaceships, and your decision of A or B applied to them as well (either they get to choose their food, or an expert chooses food for them). What outcome would you anticipate from each choice?

Replies from: DanArmak
comment by DanArmak · 2012-05-26T18:45:06.509Z · LW(p) · GW(p)

I think you meant to add that the expert really understands nutrition, beyond the knowledge of our best nutrition specialists today, which is unreliable and contradictory and sparse.

With that assumption I would choose to rely on the expert, and would expect far fewer nutritional problems on average for other people who relied on the expert vs. choosing themselves.

The difference between this and CEV is that "what nutritional/metabolic/physiological outcome is good for you" is an objective, pretty well constrained question. There are individual preferences - in enjoyment of food, and in the resulting body-state - but among people hypothetically fully understanding the human body, there will be relatively little disagreement, and the great majority should not suffer much from good choices that don't quite match their personal preferences.

CEV, on the other hand, includes not only preferences about objective matters like the above but also many entirely or mostly subjective choices (in the same way that most choices of value are a-rational). Also, people are likely to agree not to interfere in what others eat because they don't often care about it, but people do care about many other behaviors of others (like torturing simulated intelligences, or giving false testimony, or making counterfeit money), and that would be reflected in CEV.

ETA: so in response to your question, I agree that on many subjects I trust experts / CEV more than myself. My preferred response to that, though, is not to build a FAI enforcing CEV, but to build a FAI that allows direct personal choice in areas where it's possible to recover from mistakes, but also provides the expert opinion as an oracle advice service.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-26T20:37:52.799Z · LW(p) · GW(p)

Perfect knowledge is wonderful, sure, but was not key to my point.

Given two processes for making some decision, if process P1 is more reliable than process P2, then P1 will get me better results. That's true even if P1 is imperfect. That's true even if P2 is "ask my own brain and do what it tells me." All that is required is that P1 is more reliable than P2.

It follows that when choosing between two processes to implement my values, if I can ask one question, I should ask which process is more reliable. I should not ask which process is perfect, nor ask which process resides in my brain.
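Put as a one-line expected-value comparison (my notation): if a right answer is worth u_right > u_wrong and process P_i gets the right answer with probability p_i, then

\[ p_1 > p_2 \;\Longrightarrow\; p_1 u_{\text{right}} + (1-p_1)\, u_{\text{wrong}} \;>\; p_2 u_{\text{right}} + (1-p_2)\, u_{\text{wrong}}, \]

regardless of whether either process is perfect, and regardless of whether either one runs inside my own skull.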

ETA: I endorse providing expert opinion, even though that deprives people of the experience of figuring it all out for themselves... agreed that far. But I also endorse providing reliable infrastructure, even though that deprives people of the experience of building all the infrastructure themselves, and I endorse implementing reliable decision matrices, even though that deprives people of the experience of making all the decisions themselves.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T13:21:08.040Z · LW(p) · GW(p)

There's no reason you have to choose just once, a single process to answer all kinds of questions. Different processes better fit different domains. Expert opinion best fits well-understood, factual, objective, non-politicized, amoral questions. Noninterference best fits matters where people are likely to want to interfere in others' decisions and there is no pre-CEV consensus on whether such intervention is permissible.

The problem with making decisions for others isn't that it deprives them of the experience of making decisions, but that it can influence or force them into decisions that are wrong in some sense of the word.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-27T14:33:00.694Z · LW(p) · GW(p)

(shrug) Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word. If that's really the problem, then letting people make their own decisions doesn't solve it. The solution to that problem is letting whatever process is best at avoiding wrong answers make the decision.

And, sure, there might be different processes for different questions. But there's no a priori reason to believe that any of those processes reside in my brain.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T15:28:20.384Z · LW(p) · GW(p)

Letting others make decisions for themselves can also influence or force them into decisions that are wrong in some sense of the word.

True. Nonintervention only works if you care about it more than about anything people might do due to it. Which is why a system of constraints that is given to the AI and is not CEV-derived can't be just nonintervention, it has to include other principles as well and be a complete ethical system.

And, sure, there might be different processes for different questions. But there's no a priori reason to believe that any of those processes reside in my brain.

I'm always open to suggestions of new processes. I just don't like the specific process of CEV, which happens not to reside in my brain, but that's not why I dislike it.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-27T15:46:50.150Z · LW(p) · GW(p)

Ah, OK.

At the beginning of this thread you seemed to be saying that your current preferences (which are, of course, the product of a computation that resides in your brain) were the best determiner of what to optimize the environment for. If you aren't saying that, but merely saying that there's something specific about CEV that makes it an even worse choice, well, OK. I mean, I'm puzzled by that simply because there doesn't seem to be anything specific about CEV that one could object to in that way, but I don't have much to say about that; it was the idea that the output of your current algorithms are somehow more reliable than the output of some other set of algorithms implemented on a different substrate that I was challenging.

Sounds like a good place to end this thread, then.

Replies from: wedrifid, DanArmak
comment by wedrifid · 2012-05-27T16:31:38.028Z · LW(p) · GW(p)

I'm puzzled by that simply because there doesn't seem to be anything specific about CEV that one could object to in that way

Really? What about the "some people are Jerks" objection? That's kind of a big deal. We even got Eliezer to tentatively acknowledge the theoretical possibility that that could be objectionable at one point.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-27T17:11:55.024Z · LW(p) · GW(p)

(nods) Yeah, I was sloppy. I was referring to the mechanism for extrapolating a coherent volition from a given target, rather than the specification of the target (e.g., "all of humanity") or other aspects of the CEV proposal, but I wasn't at all clear about that. Point taken, and agreed that there are some aspects of the proposal (e.g. target specification) that are specific enough to object to.

Tangentially, I consider the "some people are jerks" objection very confused. But then, I mostly conclude that if such a mechanism can exist at all, the properties of people are about as relevant to its output as the properties of states or political parties. More thoughts along those lines here.

Replies from: wedrifid
comment by wedrifid · 2012-05-27T20:04:04.203Z · LW(p) · GW(p)

I was referring to the mechanism for extrapolating a coherent volition from a given target

It really is hard to find a fault with that part!

Tangentially, I consider the "some people are jerks" objection very confused.

I don't understand. If the CEV of a group that consists of yourself and ten agents with values that differ irreconcilably from yours is implemented, then we can expect that CEV to be fairly abhorrent to you. That is, roughly speaking, a risk you take when you substitute your own preferences for preferences calculated off a group that you don't fully understand or have strong reason to trust.

That CEV(group) would also be strictly inferior to CEV(you), which would implicitly incorporate the extrapolated preferences of the other ten agents to precisely the degree that you would want it to do so.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-27T21:58:29.992Z · LW(p) · GW(p)

I agree that if there exists a group G of agents A1..An with irreconcilably heterogenous values, a given agent A should strictly prefer CEV(A) to CEV(G). If Dave is an agent in this model, then Dave should prefer CEV(Dave) to CEV(group), for the reasons you suggest. Absolutely agreed.

What I question is the assumption that in this model Dave is better represented as an agent and not a group. In fact, I find that assumption unlikely, as I noted above. (Ditto wedrifid, or any other person.)

If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic... every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn't even clear that A1 should prefer CEV(Dave) to CEV(group)... while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1's values than the fraction of Dave that supports them.
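A toy illustration of that last possibility (numbers entirely invented):

```python
# Entirely invented numbers: the share of the extrapolation target that
# supports sub-agent A1's values, under the two target definitions.

share_of_A1_in_Dave = 0.10    # A1 is a small minority of Dave's sub-agents
share_of_A1_in_group = 0.40   # but A1-like values are a large bloc in the group

print(share_of_A1_in_group > share_of_A1_in_Dave)  # True
# So A1 might prefer CEV(group) to CEV(Dave), even though Dave-as-a-whole
# should prefer CEV(Dave) to CEV(group).
```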

If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk... all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.

Replies from: wedrifid
comment by wedrifid · 2012-05-27T22:18:52.987Z · LW(p) · GW(p)

Approximately agree.

If Dave is a group, then CEV(Dave) is potentially problematic for the same reason that CEV(group) is problematic... every agent composing Dave should prefer that most of Dave not be included in the target definition. Indeed, if group contains Dave and Dave contains an agent A1, it isn't even clear that A1 should prefer CEV(Dave) to CEV(group)... while CEV(Dave) cannot be more heterogenous than CEV(group), it might turn out that a larger fraction (by whatever measure the volition-extrapolator cares about) of group supports A1's values than the fraction of Dave that supports them.

This is related to why I'm a bit uncomfortable accepting the sometimes expressed assertion "CEV only applies to a group, if you are doing it to an individual it's just Extrapolated Volition". The "make it stop being incoherent!" part applies just as much to conflicting and inconsistent values within a messily implemented individual as it does to differences between people.

If the above describes the actual situation, then whether Dave is a jerk or not (or wedrifid is, or whoever) is no more relevant to the output of the volition-extrapolation mechanism than whether New Jersey is a jerk, or whether the Green Party is a jerk... all of these entities are just more-or-less-transient aggregates of agents, and the proper level of analysis is the agent.

Taking this "it's all agents and subagents and meta-agents" outlook the remaining difference is one of arbitration. That is, speaking as wedrifid I have already implicitly decided which elements of the lump of matter sitting on this chair are endorsed as 'me' and so included in the gold standard (CEV). While it may be the case that my amygdala can be considered an agent that is more similar to your amygdala than to the values I represent in abstract ideals, adding the amygdala-agent of another constitutes corrupting the CEV with some discrete measure of "Jerkiness".

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-27T23:56:20.153Z · LW(p) · GW(p)

Mm.

It's not clear to me that Dave has actually given its endorsement to any particular coalition in a particularly consistent or coherent fashion; it seems to many of me that what Dave endorses and even how Dave thinks of itself and its environment is a moderately variable thing that depends on what's going on and how it strengthens, weakens, and inspires and inhibits alliances among us. Further, it seems to many of me that this is not at all unique to Dave; it's kind of the human condition, though we generally don't acknowledge it (either to others or to ourself) for very good social reasons which I ignore here at our peril.

That said, I don't mean to challenge here your assertion that wedrifid is an exception; I don't know you that well, and it's certainly possible.

And I would certainly agree that this is a matter of degree; there are some things that are pretty consistently endorsed by whatever coalition happens to be speaking as Dave at any given moment, if only because none of us want to accept the penalties associated with repudiating previous commitments made by earlier ruling coalitions, since that would damage our credibility when we wish to make such commitments ourselves.

Of course, that sort of thing only lasts for as long as the benefits of preserving credibility are perceived to exceed the benefits of defecting. Introduce a large enough prize and alliances crumble. Still, it works pretty well in quotidian circumstances, if not necessarily during crises.

Even there, though, this is often honored in the breach rather than the observance. Many ruling coalitions, while not explicitly repudiating earlier commitments, don't actually follow through on them either. But there's a certain amount of tolerance of that sort of thing built into the framework, which can be invoked by conventional means... "I forgot", "I got distracted", "I experienced akrasia", and so forth.

So of course there's also a lot of gaming of that tolerance that goes on. Social dynamics are complicated. And, again, change the payoff matrix and the games change.

All of which is to say, even if my various component parts were to agree on such a gold standard CEV(dave), and commit to an alliance to consistently and coherently enforce that standard regardless of what coalition happens to be speaking for Dave at the time, it is not at all clear to me that this alliance would survive the destabilizing effects of seriously contemplating the possibility of various components having their values implemented on a global scale. We may have an uneasy alliance here inside Dave's brain, but it really doesn't take that much to convince one of us to betray that alliance if the stakes get high enough.

By way of analogy, it may be coherent to assert that the U.S. can "speak as" a single entity through the appointing of a Federal government, a President, and so forth. But if the U.S. agreed to become part of a single sovereign world government, it's not impossible that the situation that prompted this decision would also prompt Montana to secede from the Union. Or, if the world became sufficiently interconnected that a global economic marketplace became an increasingly powerful organizing force, it's not impossible that parts of New York might find greater common cause with parts of Tokyo than with the rest of the U.S. Or various other scenarios along those lines. At which point, even if the U.S. Federal government goes on saying the same things it has always said, it's no longer entirely clear that it really is speaking for Montana or New York.

In a not-really-all-that-similar-but-it's-the-best-I-can-do-without-getting-a-lot-more-formal way, it's not clear to me that when it comes time to flip the switch, the current Dave Coalition continues to speak for us.

At best, I think it follows that just like the existence of people who are Jerks suggests that I should prefer CEV(Dave) to CEV(humanity), the existence of Dave-agents who are Jerks suggests that I should prefer CEV(subset-of-Dave) to CEV(Dave).

But frankly, I think that's way too simplistic, because no given subset-of-Dave that lacks internal conflict is rich enough for any possible ruling coalition to be comfortable letting it grab the brass ring like that. Again, quotidian alliances rarely survive a sudden raising of the stakes.

Mostly, I think what really follows from all this is that the arbitration process that occurs within my brain cannot be meaningfully separated from the arbitration process that occurs within other structures that include/overlap my brain, and therefore if we want to talk about a volition-extrapolation process at all we have to bite the bullet and accept that the target of that process is either too simple to be considered a human being, or includes inconsistent values (aka Jerks). Excluding the Jerks and including a human being just isn't a well-defined option.

Of course, Solzhenitsyn said it a lot more poetically (and in fewer words).

comment by DanArmak · 2012-05-27T15:55:16.967Z · LW(p) · GW(p)

Yes, I was talking about shortcomings of CEV, and did not mean to imply that my current preferences were better than any third option. They aren't even strictly better than CEV; I just think they are better overall if I can't mix the two.

comment by [deleted] · 2012-05-26T17:47:48.379Z · LW(p) · GW(p)

Why do you think this is "very likely"?

It just seems likely, based on my understanding of what people like and approve of.

Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do. I just meant it as a proof that there are less controversial principles that will block a lot of bullshit. Not as a speculation of something that will actually end up in CEV.

religion, bigots, conservatism.

Why do you think it very likely these people's CEV will contradict their current values and beliefs? Please consider that:

These values are based on false beliefs, inconsistent memes, and fear. None of those things will survive CEV. "If we knew more, thought faster, grew closer together, etc".

We emphatically don't know the outcome of CEV. If we were sure that it would have any property X, we could hardcode X into the algorithm and make the CEV's task that much easier. Anything you think is very likely for CEV to decide, you should be proportionally willing for me to hardcode into my algorithm, constraining the possible results of CEV.

That would take a whole hell of a lot of certainty. I have nowhere near that level of confidence in anything I believe.

In these examples, you expect other people's extrapolated values to come to match your actual values. This seems on the outside view like a human bias. Do you expect an equal amount of your important, present-day values to be contradicted and disallowed by humanity's CEV? Can you think of probable examples?

I think CEV will end up more like transhumanism than like islam. (which means I mostly accept transhumanism). I think I'm too far outside the morally-certain-but-ignorant-human reference class to make outside view judgements on this.

Not an equal amount, but many of my current values will be contradicted in CEV. I can only analogize to physics: I accept relativity, but expect it to be wrong. (I think my current beliefs are the closest approximation to CEV that I know of).

Likely candidates? That's like asking "which of your beliefs are false". All I can say is which are most uncertain. I can't say which way they will go. I am uncertain about optimal romantic organization (monogamy, polyamory, ???). I am uncertain of the moral value of closed simulations. I am uncertain about moral value of things like duplicating people, or making causally-identical models. I am quite certain that existing lives have high value. I am unsure about lives that don't yet exist.

But you contradict yourself a little. If you really believed CEV looked a lot like CEV, you would have no reason to consider it safer. If you (correctly) think it's safer, that must be because you fear CEV will contain some pretty repugnant conclusions that CEV won't.

Not quite. Let's imagine two bootstrap scenarios: some neo-enlightenment transhumanists, and some religious nuts. Even just the non-extrapolated values of the transhumanists will produce a friendly-enough AI that can (and will want to) safely research better value-extrapolation methods. Bootstrapping it with islam will get you an angry punishing god that may or may not care about extrapolating further. Running the final, ideal CEV process with either seed should produce the same good value set, but we may not have the final ideal CEV process, and having a dangerous genie running the process may not do safe things if you start it with the wrong seed.

I doubt any one human's values are reflectively consistent. At the very least, every human's values contradict one another in the sense that they compete among themselves for the human's resources, and the human in different moods and at different points in time prefers to spend on different values.

Sorry, I made that too specific. I didn't mean to imply that only the islamists are inconsistent. Just meant them as an obvious example.

a non-CEV process whcih more directly relies on my and other people's non-extrapolated preferences.

This is what I think would be good as a seed value system so that the FAI can go and block x-risk and stop death and such without having to philosophize too much first. But I'd want the CEV philosophising to be done eventually (ASAP, actually).

Replies from: DanArmak
comment by DanArmak · 2012-05-27T15:24:43.725Z · LW(p) · GW(p)

Strict non-interference is unlikely to end up in CEV, because there are many cases where interventions are the right thing to do.

Right according to whose values? The problem is precisely that people disagree pre-extrapolation about when it's right to interfere, and therefore we fear their individual volitions will disagree even post extrapolation. I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.

I think CEV will end up more like transhumanism than like islam.

Again why? CEV is very much underspecified. To me, the idea that our values and ideals will preferentially turn out to be the ones all humans would embrace "if they were smarter etc" looks like mere wishful thinking. Values are arational and vary widely. If you specify a procedure (CEV) whereby they converge to a compatible set which also happens to resemble our actual values today, then it should be possible to give different algorithms (which you can call CEV or not, it doesn't matter) which converge on other value-sets.

In the end, as the Confessor said, "you have judged: what else is there?" I have judged, and where I am certain enough about my judgement I would rather that other people's CEV not override me.

Other than that I agree with you about using a non-CEV seed etc. I just don't think we should later let CEV decide anything it likes without the seed explicitly constraining it.

Replies from: wedrifid
comment by wedrifid · 2012-05-27T16:20:47.109Z · LW(p) · GW(p)

Right according to whose values?

CEV's. Where by an unqualified "CEV" I take nyan to be referring to CEV(humanity) ("the Coherently Extrapolated Values of Humanity"). I assume he also means it as a normative assertion of the slightly-less-extrapolated kind that means something like "all properly behaving people of my tribe would agree and if they don't we may need to beat them with sticks until they do."

The problem is precisely that people disagree pre-extrapolation about [when it's right to interfere], and therefore we fear their individual volitions will disagree even post extrapolation.

And the bracketed condition is generalisable to all sorts of things - including those preferences that we haven't even considered the possibility of significant disagreement about. Partially replacing one's own preferences with preferences that are not one's own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly 'right'.

I and some other people have a value of noninterference in certain matters that is very important to us. I would rather hardcode that value than let CEV of humanity decide on it.

I note that any unqualified assertion that "intervention is strictly the wrong thing to do" necessarily implies a preference for the worst things that could possibly happen in an FAI-free world happening rather than a single disqualified intervention. That means, for example, that rather than a minimalist intervention you think the 'right' behavior for the FAI is to allow everyone on the planet to be zapped by The Pacifier and constantly raped by pedophiles until they are 10 whereupon they are forced to watch repeats of the first season of Big Brother until they reach 20 and are zapped again and the process is repeated until the heat death of the universe. That's pretty bad but certainly not the worst thing that could happen. It is fairly trivially not "right" to let that happen if you can easily stop it.

Note indicating partial compatibility of positions: There can be reasons to advocate the implementation of ethical injunctions in a created AGI, but this still doesn't allow us to say that non-intervention in a given extreme circumstance is 'right'.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T17:15:32.991Z · LW(p) · GW(p)

Partially replacing one's own preferences with preferences that are not one's own is one of the most dangerous things it is possible to do. Not something to do lightly or take for granted as implicitly 'right'.

That's exactly what I think. And especially if you precommit to the values output by a certain process before the process is actually performed, and can't undo it later.

I note that any unqualified assertion that "intervention is strictly the wrong thing to do" necessarily implies [...]

I'm certainly not advocating absolute unqualified non-intervention. I wrote "a value of noninterference in certain matters". Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.

Nonintervention doesn't just mean non-intervention by the AI, it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of noninterference.

The Pacifier

Gah. I had to read through many paragraphs of drivel plot and then all I came up with was "a device that zaps people, making them into babies, but that is reversible". You should have just said so. (Not that the idea makes sense on any level). Anyway, my above comment applies; people would not want it done to them and so would request the AI to prevent it.

Replies from: wedrifid
comment by wedrifid · 2012-05-27T19:53:26.326Z · LW(p) · GW(p)

I'm certainly not advocating absolute unqualified non-intervention. I wrote "a value of noninterference in certain matters". Certainly the AI should interfere to e.g. offer help just before something happens which the AI thinks the person would not want to happen to them (the AI is qualified to make such decisions if it can calculate CEV). In such a situation the AI would explain matters and offer aid and advice, but ultimate deciding power might still lie with the person, depending on the circumstances.

Nonintervention doesn't just mean non-intervention by the AI, it means nonintervention by one person with another. If someone makes a request for the AI to prevent another person from doing something to them, then again in at least some (most? all?) circumstances the AI should interfere to do so; that is actually upholding the principle of noninterference.

I like both these caveats. The scenario becomes something far more similar to what a CEV could plausibly be without the artificial hack. Horror stories become much harder to construct.

Off the top of my head, one potential remaining weakness is the inability to prevent a rival, less crippled AGI from taking over without interfering pre-emptively with an individual who is not themselves interfering with anyone. Getting absolute power requires intervention (or universally compliant subjects). Not getting absolute power means something else can get it and outcomes are undefined.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T21:33:47.499Z · LW(p) · GW(p)

That's a good point. The AI's ability to not interfere is constrained by its need to monitor everything that's going on. Not just to detect someone building a rival AI, but to detect simpler cases like someone torturing a simulated person, or even just a normal flesh and bone child who wasn't there nine months earlier. To detect people who get themselves into trouble without yet realizing it, or who are going to attack other people nonconsensually, and give these people help before something bad actually happens to them, all requires monitoring.

And while a technologically advanced AI might monitor using tools we humans couldn't even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. if you have to expose everything your megaton-of-computronium brain calculates to the AI, because that's enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.

It does appear that universal surveillance is the cost of universally binding promises (you won't be tortured no matter where you go and what you do in AI-controlled space). To reduce costs and increase trust, the AI should be transparent to everyone itself, and should be publicly and verifiably committed to being a perfectly honest and neutral party that never reveals the secrets and private information it monitors to anyone.

I'd like to note that all of this also applies to any FAI singleton that implements some policies that we today consider morally required - like making sure no-one is torturing simulated people or raising their baby wrong. If there's no generally acceptable FAI behavior that doesn't include surveillance, then all else is equal and I still prefer my AI to a pure CEV implementation.

Replies from: wedrifid
comment by wedrifid · 2012-05-27T22:29:50.276Z · LW(p) · GW(p)

And while a technologically advanced AI might monitor using tools we humans couldn't even detect today, to advanced posthumans every possible tool might be painfully obvious. E.g. if you have to expose everything your megaton-of-computronium brain calculates to the AI, because that's enough to simulate all the humans alive in 2012 in enough detail that they would count as persons to the AI. But to the asteroid-sized brain this means the AI is literally aware of all its thoughts: it has zero privacy.

It would seem that the FAI should require only to be exposed to the complete state of your brain at a point in time where it can reliably predict or prove that you are 'safe', using the kind of reasoning we often assume as a matter of course when describing UDT decision problems. Such an FAI would have information about what you are thinking - and in particular a great big class of what it knows you are not thinking - but not necessarily detailed knowledge of what you are thinking specifically.

For improved privacy the inspection could be done by a spawned robot AI programmed to self destruct after analyzing you and returning nothing but a boolean safety indicator back to the FAI.
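In code-shaped terms, the interface would look something like this (a purely illustrative sketch of my own; the function names and the trivial "analysis" are placeholders, not a real protocol):

```python
# Illustrative sketch (my invention): the FAI never sees the brain state
# itself, only a one-bit safety verdict from a throwaway inspector that
# discards everything else it learned.

def analyze_for_risk(brain_snapshot: bytes) -> bool:
    """Placeholder for the actual analysis; here it just always says 'safe'."""
    return True

def inspect_and_forget(brain_snapshot: bytes) -> bool:
    """Run the analysis in a scope that keeps nothing but the verdict."""
    verdict = analyze_for_risk(brain_snapshot)
    del brain_snapshot          # stand-in for the inspector self-destructing
    return verdict              # the only bit returned to the FAI

if __name__ == "__main__":
    print(inspect_and_forget(b"...complete brain state..."))  # True
```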

Replies from: DanArmak
comment by DanArmak · 2012-05-28T08:19:10.845Z · LW(p) · GW(p)

Prediction has some disadvantages compared to constant observation:

  • Some physical systems are hard to model well with simplification; even for the AI it might be necessary to use simulations composed of amounts of matter proportional to the thing simulated. If about one half of all matter has to be given over to the AI, instead of being used to create more people and things, that is a significant loss of opportunity. (Maybe the AI should tax people in simulation-resources, and those who opt in to surveillance have much lower taxes :-)
  • Simulations naturally have a rising risk of divergence over time. The AI is not literally Omega. It will have to come in and take periodical snapshots of everyone's state to correct the simulations.
  • Simulations have a chance of being wrong. However small the chance, if the potential result is someone building a UFAI challenger, it might be unacceptable to take that chance.

OTOH, surveillance might be much cheaper (I don't know for sure) and also allows destroying the evidence close to the site of observation once it is analyzed, preserving a measure of privacy.

comment by TheOtherDave · 2012-05-25T21:45:13.243Z · LW(p) · GW(p)

One strategy (which seems to me to be implied by the original CEV doc) is to extrapolate everyone's personal volition, then compare and merge them to create the group's overall CEV.

I vaguely remember something in that doc suggesting that part of the extrapolation process involves working out the expected results of individuals interacting. More poetically, "what we would want if we grew together more." That suggests that this isn't quite what the original doc meant to imply, or at least that it's not uniquely what the doc meant to imply, although I may simply be misremembering.

More generally, all the hard work is being done here by whatever assumptions are built into the "extrapolation".

Replies from: DanArmak
comment by DanArmak · 2012-05-26T07:35:53.301Z · LW(p) · GW(p)

Quoting the CEV doc:

Had grown up farther together: A model of humankind's coherent extrapolated volition should not extrapolate the person you'd become if you made your decisions alone in a padded cell. Part of our predictable existence is that we predictably interact with other people. A dynamic for CEV must take a shot at extrapolating human interactions, not just so that the extrapolation is closer to reality, but so that the extrapolation can encapsulate memetic and social forces contributing to niceness.

Our CEV may judge some memetic dynamics as not worth extrapolating - not search out the most appealing trash-talk TV show.

Social interaction is probably intractable for real-world prediction, but no more so than individual volition. That is why I speak of predictable extrapolations, and of calculating the spread.

I don't mean to contradict that. So consider my interpretation to be something like: build ("extrapolate") each person's CEV, which includes that person's interactions with other people, but doesn't directly value them except insofar as that person values them; then somehow merge the individual CEVs to get the group CEV.

After all (I reason) you want the following nice property for CEV. Suppose that CEV(group1) meets CEV(group2) - e.g. separate AIs implementing those CEVs meet. If they don't embody inimical values, they will try to negotiate and compromise. We would like the result of those negotiations to look very much like CEV(group1 + group2). One easy way to do this is to say CEV is built on "merging" all the way from the bottom up.
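Written as a rough compositionality requirement (my notation, with "negotiate" deliberately left undefined):

\[ \mathrm{negotiate}\big(\mathrm{CEV}(G_1),\, \mathrm{CEV}(G_2)\big) \;\approx\; \mathrm{CEV}(G_1 \cup G_2), \]

so that merging at the bottom (individuals, or even sub-personal agents) and merging at the top (groups of groups) give roughly the same answer.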

More generally, all the hard work is being done here by whatever assumptions are built into the "extrapolation".

Certainly. All discussion of CEV starts with "assume there can exist a process that produces an outcome matching the following description, and assume we can and do build it, and assume that all the under-specification of this description is improved in the way we would wish it improved if we were better at wishing".

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-26T15:01:54.605Z · LW(p) · GW(p)

I basically agree with all of this, except that I think you're saying "CEV is built on "merging" all the way from the bottom up" but you aren't really arguing for doing that.

Perhaps one important underlying question here is whether people's values ever change contingent on their experiences.

If not -- if my values are exactly the same as what they were when I first began to exist (whenever that was) -- then perhaps something like what you describe makes sense. A process for working out what those values are and extrapolating my volition based on them would be difficult to build, but is coherent in principle. In fact, many such processes could exist, and they would converge on a single output specification for my individual CEV. And then, and only then, we could begin the process of "merging."

This strikes me as pretty unlikely, but I suppose it's possible.

OTOH, if my values are contingent on experience -- that is, if human brains experience value drift -- then it's not clear that those various processes' outputs would converge. Volition-extrapolation process 1, which includes one model of my interaction with my environment, gets Dave-CEV-1. VEP2, which includes a different model, gets Dave-CEV-2. And so forth. And there simply is no fact of the matter as to which is the "correct" Dave-CEV; they are all ways that I might turn out; to the extent that any of them reflect "what I really want" they all reflect "what I really want", and I "really want" various distinct and potentially-inconsistent things.

In the latter case, in order to obtain something we call CEV(Dave), we need a process of "merging" the outputs of these various computations. How we do this is of course unclear, but my point is that saying "we work out individual CEVs and merge them" as though the merge step came second is importantly wrong. Merging is required to get an individual CEV in the first place.

So, yes, I agree, it's a fine idea to have CEV built on merging all the way from the bottom up. But to understand what the "bottom" really is is to give up on the idea that my unique individual identity is the "bottom." Whatever it is that CEV is extrapolating and merging, it isn't people, it's subsets of people. "Dave's values" are no more preserved by the process than "New Jersey's values" or "America's values" are.

Replies from: DanArmak
comment by DanArmak · 2012-05-26T18:13:50.502Z · LW(p) · GW(p)

That's a very good point. People not only change over long periods of time; during small intervals of time we can also model a person's values as belonging to competing and sometimes negotiating agents. So you're right, merging isn't secondary or dispensable (not that I suggested doing away with it entirely), although we might want different merging dynamics sometimes for sub-person fragments vs. for whole-person EVs.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-26T18:33:54.582Z · LW(p) · GW(p)

Sure, the specifics of the aggregation process will depend on the nature of the monads to be aggregated.

And, yes, while we frequently model people (including ourselves) as unique coherent consistent agents, and it's useful to do so for planning and for social purposes, there's no clear reason to believe we're any such thing, and I'm inclined to doubt it. This also informs the preserving-identity-across-substrates conversation we're having elsethread.

Replies from: DanArmak
comment by DanArmak · 2012-05-26T19:38:18.995Z · LW(p) · GW(p)

Where relevant - or at least when I'm reminded of it - I do model myself as a collection of smaller agents. But I still call that collection "I", even though it's not unique, coherent, or consistent. That my identity may be a group-identity doesn't seem to modify any of my conclusions about identity, given that to date the group has always resided together in a single brain.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-26T20:26:21.399Z · LW(p) · GW(p)

For my own part, I find that attending to the fact that I am a non-unique, incoherent, and inconsistent collection of disparate agents significantly reduces how seriously I take concerns that some process might fail to properly capture the mysterious essence of "I", leading to my putative duplicate going off and having fun in a virtual Utopia while "I" remains in a cancer-ridden body.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T08:21:58.975Z · LW(p) · GW(p)

I would gladly be uploaded rather than die if there were no alternative. I would still pay extra for a process that slowly replaced my brain cells etc. one by one leaving me conscious and single-instanced the whole while.

Replies from: wedrifid
comment by wedrifid · 2012-05-27T08:25:15.494Z · LW(p) · GW(p)

I would still pay extra for a process that slowly replaced my brain cells etc. one by one leaving me conscious and single-instanced the whole while.

That sounds superficially like a cruel and unusual torture.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T13:04:58.088Z · LW(p) · GW(p)

The whole point is to invent an uploading process I wouldn't even notice happening.

comment by gRR · 2012-05-22T23:52:19.437Z · LW(p) · GW(p)

I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV<everybody> instead of CEV<themselves> would probably get more resources and would finish first.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T07:38:49.901Z · LW(p) · GW(p)

I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it.

So what makes you think everybody's CEV would eventually agree on anything more?

A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better. (At least, we can if we're speculating about building a FAI to execute any well-defined plan we can come up with.)

(I assume here that removing existential risks is one such thing.)

I'm not even sure of that. There are people who believe religiously that End Times must come when everyone must die, and some of them want to hurry that along by actually killing people. And the meaning of "existential risk" is up for grabs anyway - does it preclude evolution into non-humans, leaving no members of original human species in existence? Does it preclude the death of everyone alive today, if some humans are always alive?

Sure, it's unlikely or it might look like a contrived example to you. But are you really willing to precommit the future light cone, the single shot at creating an FAI (singleton), to whatever CEV might happen to be, without actually knowing what CEV produces and having an abort switch? That's one of the defining points of CEV: that you can't know it correctly in advance, or you would just program it directly as a set of goals instead of building a CEV-calculating machine.

And an FAI team that credibly precommitted to implementing CEV<everybody> instead of CEV<themselves> would probably get more resources and would finish first.

This seems wrong. A FAI team that precommitted to implementing CEV<its own funders> would definitely get the most funds. Even a team that precommitted to CEV<the team itself> might get more funds than one precommitted to CEV<everybody>, because people like myself would reason that the team's values are closer to my own than humanity's average, plus they have a better chance of actually agreeing on more things.

Replies from: gRR, DanArmak
comment by gRR · 2012-05-24T10:37:21.874Z · LW(p) · GW(p)

A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better.

No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.

There are people who believe religiously that End Times must come

Yes, but we assume they are factually wrong, and so their CEV would fix this.

A FAI team that precommitted to implementing CEV<its own funders> would definitely get the most funds. Even a team that precommitted to CEV<the team itself> might get more funds than one precommitted to CEV<everybody>, because people like myself would reason that the team's values are closer to my own than humanity's average, plus they have a better chance of actually agreeing on more things.

Not bloody likely. I'm going to oppose your team, discourage your funders, and bomb your headquarters - because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled.

You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won't interfere.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T13:06:56.988Z · LW(p) · GW(p)

No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference).

No. Any FAI (ETA: or other AGI) has to be a singleton to last for long. Otherwise I can build a uFAI that might replace it.

Suppose your AI only does a few things that everyone agrees on, but otherwise "doesn't interfere". Then I can build another AI, which implements values people don't agree on. Your AI must either interfere, or be resigned to not being very relevant in determining the future.

Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority? Then it's at best a nice-to-have, but most likely useless. After people successfully build one AGI, they will quickly reuse the knowledge to build more. The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs, to safeguard its utility function. This is unavoidable. With truly powerful AGI, preventing new AIs from gaining power is the only stable solution.

Or, better yet, you can try talking to the other half of the humans.

Yeah, that's worked really well for all of human history so far.

Yes, but we assume they are factually wrong, and so their CEV would fix this.

First, they may not be factually wrong about the events they predict in the real world - like everyone dying - just wrong about the supernatural parts. (Especially if they're themselves working to bring these events to pass.) IOW, this may not be a factual belief to be corrected, but a desired-by-them future that others like me and you would wish to prevent.

Second, you agreed the CEV of groups of people may contain very few things that they really agree on, so you can't even assume they'll have a nontrivial CEV at all, let alone that it will "fix" values you happen to disagree with.

Not bloody likely. I'm going to oppose your team, discourage your funders, and bomb your headquarters - because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled. You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won't interfere.

I have no idea what your FAI will do, because even if you make no mistakes in building it, you yourself don't know ahead of time what the CEV will work out to. If you did, you'd just plug those values into the AI directly instead of calculating the CEV. So I'll want to bomb you anyway, if that increases my chances of being the first to build a FAI. Our morals are indeed different, and since there are no objectively distinguished morals, the difference goes both ways.

Of course I will dedicate my resources to first bombing people who are building even more inimical AIs. But if I somehow knew you and I were the only ones in the race, I'd politely ask you to join me or desist or be stopped by force.

As long as we're discussing bombing, consider that the SIAI isn't and won't be in a position to bomb anyone. OTOH, if and when nation-states and militaries realize AGI is a real-world threat, they will go to war, with each trying to prevent anyone else from building an AGI first. It's the ultimate winner-take-all arms race.

This is going to happen; it might be happening already if enough politicians and generals held Eliezer's beliefs about AGI, and it will happen (or not) regardless of anyone's attempts to build any kind of Friendliness theory. Furthermore, a state military planning to build an AGI singleton won't stop to think for long before wiping your civilian, unprotected FAI theory research center off the map. Either you go underground or you cooperate with a powerful player (the state on whose territory you live, presumably). Or maybe states and militaries won't wise up in time, and some private concern really will build the first AGI. Which may be better or worse depending on what they build.

Eventually, unless the whole world is bombed back into pre-computer-age tech, someone very probably will build an AGI of some kind. The SIAI idea is (possibly) to invent Friendliness theory and publish it widely, so that whoever builds that AGI will, if they want it to be Friendly (at least to themselves!), have a relatively cheap and safe implementation to use. But for someone actually trying to build an AGI, two obvious rules are:

  1. Absolute secrecy, or you get bombed right away.
  2. Do absolutely whatever it takes to successfully launch as early as possible, and make your AI a singleton controlled by yourself or by nobody - regardless of your and the AI's values.
Replies from: gRR, thomblake
comment by gRR · 2012-05-24T14:00:21.390Z · LW(p) · GW(p)

Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?

The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs

Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

you can't even assume they'll have a nontrivial CEV at all, let alone that it will "fix" values you happen to disagree with.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

there are no objectively distinguished morals

But there may be a partial ordering between morals, such that X<Y iff all "interfering" actions (whatever this means) that are allowed by X are also allowed by Y. Then if A1 and A2 are two agents, we may easily have:

~Endorses(A1, CEV<A2>) ~Endorses(A2, CEV<A1>) Endorses(A1, CEV<A1+A2>)
Endorses(A2, CEV<A1+A2>)

[assuming Endorses(A, X) implies FAI<X> does not perform any non-interfering action disagreeable for A]
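
A small sketch of that ordering, treating a moral system as just the set of interfering actions it licenses, and treating endorsement as "no licensed interference the agent objects to" (a simplification of the bracketed assumption; the action names and sets are invented for illustration):

    # Illustrative sketch: model a moral system by the set of "interfering"
    # actions it allows an FAI to take.  X <= Y iff every interference X
    # allows is also allowed by Y.  An agent endorses a CEV if the
    # corresponding FAI takes no interfering action the agent objects to.
    def leq(x, y):
        """Partial order on morals: x <= y iff x's allowed interferences are a subset of y's."""
        return x <= y  # set inclusion

    def endorses(agent_objections, allowed_interferences):
        """True if none of the allowed interferences are ones the agent objects to."""
        return not (allowed_interferences & agent_objections)

    a1_objects = {"impose_A2_values"}
    a2_objects = {"impose_A1_values"}

    cev_a1 = {"impose_A1_values"}   # CEV<A1>
    cev_a2 = {"impose_A2_values"}   # CEV<A2>
    cev_both = set()                # CEV<A1+A2>: interferes only where both agree

    assert leq(cev_both, cev_a1) and leq(cev_both, cev_a2)
    assert not endorses(a1_objects, cev_a2)  # ~Endorses(A1, CEV<A2>)
    assert not endorses(a2_objects, cev_a1)  # ~Endorses(A2, CEV<A1>)
    assert endorses(a1_objects, cev_both)    # Endorses(A1, CEV<A1+A2>)
    assert endorses(a2_objects, cev_both)    # Endorses(A2, CEV<A1+A2>)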

if and when nation-states and militaries realize AGI is a real-world threat, they will go to war, with each trying to prevent anyone else from building an AGI first. It's the ultimate winner-take-all arms race.
This is going to happen; it might be happening already if enough politicians and generals held Eliezer's beliefs about AGI, and it will happen (or not) regardless of anyone's attempts to build any kind of Friendliness theory.

Well, don't you think this is just ridiculous? Does it look like the most rational behavior? Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

Replies from: DanArmak
comment by DanArmak · 2012-05-24T14:37:39.809Z · LW(p) · GW(p)

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no.

I don't understand what you mean by "fundamentally different". You said the AI would not do anything not backed by an all-human consensus. If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere. I prefer to live in a universe whose AI does interfere in such a case.

On what moral grounds would it do the prevention?

Libertarianism is one moral principle that would argue for prevention. So would most varieties of utilitarianism (ignoring utility monsters and such). Again, I would prefer living with an AI hard-coded to one of those moral ideologies (though it's not ideal) over your view of CEV.

Until everybody agree that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

Forever keeping this capability in reserve is most of what being a singleton means. But think of the practical implications: it has to be omnipresent, omniscient, and prevent other AIs from ever being as powerful as it is - which restricts those other AIs' abilities in many endeavors. All the while it does little good itself. So from my point of view, the main effect of successfully implementing your view of CEV may be to drastically limit the opportunities for future AIs to do good.

And yet it doesn't limit the opportunity to do evil, at least evil of the mundane death & torture kind. Unless you can explain why it would prevent even a very straightforward case like 80% of humanity voting to kill the other 20%.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

But you said it would only do things that are approved by a strong human consensus. And I assure you that, to take an example, the large majority of the world's population who today believe in the supernatural will not consent to having that belief "fixed". Nor have you demonstrated that their extrapolated volition would want for them to be forcibly modified. Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience).

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

Yes, but I don't know what I would approve of if I were "more intelligent" (a very ill-defined term). And if you calculate it, according to your definition of intelligence, and present me with the result, I might well reject that result even if I believe in your extrapolation process. I might well say: the future isn't predetermined. You can't calculate what I necessarily will become. You just extrapolated a creature I might become, which also happens to be more intelligent. But there's nothing in my moral system that says I should adopt the values of someone else because they are more intelligent. If I don't like the values I might say, thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature! I might even choose to forego the kind of increased intelligence that causes such an undesired change in my values.

Short version: "what I would want if I were more intelligent (according to some definition)" isn't the same as "what I will likely want in the future", because there's no reason for me to grow in intelligence (by that definition) if I suspect it would twist my values. So you can't apply the heuristic of "if I know what I'm going to think tomorrow, I might as well think it today".

~Endorses(A1, CEV<A2>) ~Endorses(A2, CEV<A1>) Endorses(A1, CEV<A1+A2>) Endorses(A2, CEV<A1+A2>)

I think you may be missing a symbol there? If not, I can't parse it... Can you spell out for me what it means to just write the last three Endorses(...) clauses one after the other?

Does it look like the most rational behavior?

It may be quite rational for everyone individually, depending on projected payoffs. Unlike a PD, starting positions aren't symmetrical and players' progress/payoffs are not visible to other players. So saying "just cooperate" doesn't immediately apply.

Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

How can a state or military precommit to not having a supersecret project to develop a private AGI?

And while it's beneficial for some players to join in a cooperative effort, it may well be that a situation of several competing leagues (or really big players working alone) develops and is also stable. It's all laid over the background of existing political, religious and personal enmities and rivalries - even before we come to actual disagreements over what the AI should value.

Replies from: wedrifid, gRR
comment by wedrifid · 2012-05-26T02:45:54.414Z · LW(p) · GW(p)

If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere.

This assumes that CEV uses something along the lines of a simulated vote as an aggregation mechanism. Currently the method of aggregation is undefined so we can't say this with confidence - certainly not as something obvious.

Replies from: DanArmak
comment by DanArmak · 2012-05-26T08:29:36.157Z · LW(p) · GW(p)

I agree. However, if the CEV doesn't privilege any value beyond how many people hold it and how strongly (in their EVs), and if the EV of a large majority values killing a small minority (whose EV is of course opposed), and if you have protection against both positive and negative utility monsters (so it's at least not obvious and automatic that the negative value of the minority would outweigh the positive value of the majority) - then my scenario seems to me to be plausible, and an explanation is necessary as to how it might be prevented.

Of course you could say that until CEV is really formally specified, and we know how the aggregation works, this explanation cannot be produced.
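
To make the worry concrete, here is one toy aggregation rule under which the scenario goes through (pure assumption: additive extrapolated utilities with a per-person clamp as the anti-utility-monster measure; every number is made up):

    # Toy aggregation over extrapolated utilities for a single proposal,
    # with each person's utility clamped to [-1, +1] as a crude protection
    # against (positive or negative) utility monsters.
    def aggregate(utilities, cap=1.0):
        return sum(max(-cap, min(cap, u)) for u in utilities)

    majority = [0.5] * 80     # 80 people mildly in favour of the killing
    minority = [-10.0] * 20   # 20 people desperately opposed, clamped to -1 each

    total = aggregate(majority + minority)   # 80*0.5 - 20*1.0 = +20.0
    print(total > 0)                         # True: the proposal passes under this toy rule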

Replies from: wedrifid
comment by wedrifid · 2012-05-26T11:43:22.158Z · LW(p) · GW(p)

my scenario seems to me to be plausible, and an explanation is necessary as to how it might be prevented.

Absolutely, on both counts.

comment by gRR · 2012-05-24T16:25:25.681Z · LW(p) · GW(p)

If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere. "Fundamentally different" means their killing each other is endorsed by someone's CEV, not just by themselves.

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience)

Extrapolated volition is based on objective truth, by definition.

If I don't like the values I might say, thank you for warning me, now I shall be doubly careful not to evolve into that kind of creature!

The process of extrapolation takes this into account.

I think you may be missing a symbol there? If not, I can't parse it...

Sorry, bad formatting. I meant four independent clauses: each of the agents does not endorse CEV<the other agent>, but endorses CEV<both agents together>.

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T18:39:05.481Z · LW(p) · GW(p)

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere

So you're OK with the FAI not interfering if they want to kill them for the "right" reasons? Such as "if we kill them, we will benefit by dividing their resources among ourselves"?

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

So you're saying your version of CEV will forcibly update everyone's beliefs and values to be "factual" and disallow people from believing in anything not supported by appropriate Bayesian evidence? Even if it has to modify those people by force, even though the result would be unlike the original in many respects that they and many other people value and see as identity-forming? And it will do this not because it's backed by a strong consensus of actual desires, but because post-modification there will be a strong consensus of people happy that the modification was made?

If your answer is "yes, it will do that", then I would not call your AI a Friendly one at all.

Extrapolated volition is based on objective truth, by definition.

My understanding of the CEV doc differs from yours. It's not a precise or complete spec, and it looks like both readings can be justified.

The doc doesn't (on my reading) say that the extrapolated volition can totally conform to objective truth. The EV is based on an extrapolation of our existing volition, not of objective truth itself. One of the ways it extrapolates is by adding facts the original person was not aware of. But that doesn't mean it removes all non-truth or all beliefs that "aren't even wrong" from the original volition. If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

But as long as we're discussing your vision of CEV, I can only repeat what I said above - if it's going to modify people by force like this, I think it's unFriendly and if it were up to me, would not launch such an AI.

I meant four independent clauses: each of the agents does not endorse CEV<the other agent>, but endorses CEV<both agents together>.

Understood. But I don't see how this partial ordering changes what I had described.

Let's say I'm A1 and you're A2. We would both prefer a mutual CEV than a CEV of the other only. But each of us would prefer even more a CEV of himself only. So each of us might try to bomb the other first if he expected to get away without retaliation. That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

If you think so you must have evidence relating to how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?

Replies from: gRR, thomblake
comment by gRR · 2012-05-24T19:35:19.265Z · LW(p) · GW(p)

So you're OK with the FAI not interfering if they want to kill them for the "right" reasons?

I wouldn't like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent - then I prefer FAI not interfering.

"if we kill them, we will benefit by dividing their resources among ourselves"

If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.

So you're saying your version of CEV will forcibly update everyone's beliefs

No. CEV does not update anyone's beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.

If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

As I said elsewhere, if a person's beliefs are THAT incompatible with truth, I'm ok with ignoring their volition. Note that their CEV is undefined in this case. But I don't believe there exist such people (excluding the totally insane).

That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.

But the total loss would be correspondingly worse. PD reasoning says you should cooperate (assuming cooperation is precommittable).

If you think so you must have evidence relating to how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?

Off the top of my head, adoption of total transparency for everybody of all governmental and military matters.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T19:59:15.742Z · LW(p) · GW(p)

If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.

The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest. The CEVs of 20% obviously don't want to be killed. Because there's no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.

No. CEV does not update anyone's beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.

I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them. Sorry for the confusion.

As I said elsewhere, if a person's beliefs are THAT incompatible with truth, I'm ok with ignoring their volition. Note that their CEV is undefined in this case. But I don't believe there exist such people (excluding the totally insane).

A case could be made that many millions of religious "true believers" have un-updatable 0/1 probabilities. And so on.

Your solution is to not give them a voice in the CEV at all. Which is great for the rest of us - our CEV will include some presumably reduced term for their welfare, but they don't get to vote on things. This is something I would certainly support in a FAI (regardless of CEV), just as I would support using CEV<a restricted subset> or even CEV<just people like me> in preference to CEV<everybody>.

The only difference between us then is that I estimate there to be many such people. If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?

PD reasoning says you should cooperate (assuming cooperation is precommittable).

As I said before, this reasoning is inapplicable, because this situation is nothing like a PD.

  1. The PD reasoning to cooperate only applies in case of iterated PD, whereas creating a singleton AI is a single game.
  2. Unlike PD, the payoffs are different between players, and players are not sure of each other's payoffs in each scenario. (E.g., minor/weak players are more likely to cooperate than big ones that are more likely to succeed if they defect.)
  3. The game is not instantaneous, so players can change their strategy based on how other players play. When they do so they can transfer value gained by themselves or by other players (e.g. join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2).
  4. It is possible to form alliances, which gain by "defecting" as a group. In PD, players cannot discuss alliances or trade other values to form them before choosing how to play.
  5. There are other games going on between players, so they already have knowledge and opinions and prejudices about each other, and desires to cooperate with certain players and not others. Certain alliances will form naturally, others won't.

adoption of total transparency for everybody of all governmental and military matters.

This counts as very weak evidence because it proves it's at least possible to achieve this, yes. (If all players very intensively inspect all other players to make sure a secret project isn't being hidden anywhere - they'd have to recruit a big chunk of the workforce just to watch over all the rest.)

But the probability of this happening in the real world, between all players, as they scramble to be the first to build an apocalyptic new weapon, is so small it's not even worth discussion time. (Unless you disagree, of course.) I'm not convinced by this that it's an easier problem to solve than that of building AGI or FAI or CEV.

Replies from: gRR, dlthomas
comment by gRR · 2012-05-24T21:51:55.346Z · LW(p) · GW(p)

The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest.

The resources are not scarce, yet the CEV-s want to kill? Why?

I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them.

It would do so only if everybody's CEV-s agree that updating these people's beliefs is a good thing.

If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?

People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.

The PD reasoning to cooperate only applies in case of iterated PD

Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?

Unlike PD, the payoffs are different between players, and players are not sure of each other's payoffs in each scenario

This doesn't really matter for a broad range of possible payoff matrices.

join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2

Cooperating in this game would mean there is exactly one global research alliance. A cooperating move is a precommitment to abide by its rules. Enforcing such precommitment is a separate problem. Let's assume it's solved.

I'm not convinced by this that it's an easier problem to solve than that of building AGI or FAI or CEV.

Maybe you're right. But IMHO it's a less interesting problem :)

Replies from: DanArmak
comment by DanArmak · 2012-05-25T20:50:24.550Z · LW(p) · GW(p)

The resources are not scarce, yet the CEV-s want to kill? Why?

Sorry for the confusion. Let's taboo "scarce" and start from scratch.

I'm talking about a scenario where - to simplify only slightly from the real world - there exist some finite (even if growing) resources that almost everyone, no matter how much they already have, wants more of. A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources. Would the AI prevent this, although there is no consensus against the killing?

If you still want to ask whether the resource is "scarce", please specify what that means exactly. Maybe any finite and highly desirable resource, with returns diminishing weakly or not at all, can be considered "scarce".

It would do so only if everybody's CEV-s agree that updating these people's beliefs is a good thing.

People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.

As I said - this is fine by me insofar as I expect the CEV not to choose to ignore me. (Which means it's not fine through the Rawlsian veil of ignorance, but I don't care and presumably neither do you.)

The question of definition - who is to be included in the CEV, or who is considered sane? - becomes of paramount importance. Since it is not itself decided by the CEV, it is presumably hardcoded into the AI design (or evolves within that design as the AI self-modifies, but that's very dangerous without formal proofs that it won't evolve to include the "wrong" people). The simplest way to hardcode it is to directly specify the people to be included, but you prefer testing on qualifications.

However this is realized, it would give people even more incentive to influence or stop your AI building process or to start their own to compete, since they would be afraid of not being included in the CEV used by your AI.

The PD reasoning to cooperate only applies in case of iterated PD

Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?

TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.

Which arguments of Hofstadter and Yudkowsky do you mean?

Cooperating in this game would mean there is exactly one global research alliance.

Why? What prevents several competing alliances (or single players) from forming, competing for the cooperation of the smaller players?

Replies from: gRR
comment by gRR · 2012-05-26T03:15:22.311Z · LW(p) · GW(p)

A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources

I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

The question of definition - who is to be included in the CEV, or who is considered sane?

This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.

We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves. If you say that logic and rationality makes you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.

Replies from: DanArmak
comment by DanArmak · 2012-05-26T08:40:05.252Z · LW(p) · GW(p)

I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

*shrug* Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist.

Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?

This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep the EVs close to their current values.

Not that I'm opposed to this decision (if you must have CEV at all).

We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves.

There's a symmetry, but "first person to complete AI wins, everyone 'defects'" is also a symmetrical situation. Single-iteration PD is symmetrical, but everyone defects. Mere symmetry is not sufficient for TDT-style "decide for everyone", you need similarity that includes similarly valuing the same outcomes. Here everyone values the outcome "have the AI obey ME!", which is not the same.

If you say that logic and rationality makes you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses.

Or someone is stronger than everyone else, wins the bombing contest, and builds the only AI. Or someone succeeds in building an AI in secret, avoiding being bombed. Or there's a player or alliance that's strong enough to deter bombing due to the threat of retaliation, and so completes their AI which doesn't care about everyone else much. There are many possible and plausible outcomes besides "everybody loses".

Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.

Or while the alliance is still being built, a second alliance or very strong player bombs them to get the military advantages of a first strike. Again, there are other possible outcomes besides what you suggest.

Replies from: gRR
comment by gRR · 2012-05-26T16:04:05.069Z · LW(p) · GW(p)

Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?

These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEV-s to not be greedy just for the sake of greed. It's people's CEV-s we're talking about, not paperclip maximizers'.

Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep the EVs close to their current values.

Hmm, we are starting to argue about exact details of extrapolation process...

There are many possible and plausible outcomes besides "everybody loses".

Let's formalize the problem. Let F(R, Ropp) be the probability of a team successfully building a FAI first, given R resources, and having opposition with Ropp resources. Let Uself, Ueverybody, and Uother be the rewards for being first in building FAI<self>, FAI<everybody>, and FAI<other>, respectively. Naturally, F is monotonically increasing in R and decreasing in Ropp, and Uother < Ueverybody < Uself.

Assume there are just two teams, with resources R1 and R2, and each can perform one of two actions: "cooperate" or "defect". Let's compute the expected utilities for the first team:

We cooperate, opponent team cooperates:  
   EU("CC") = Ueverybody * F(R1+R2, 0)  
We cooperate, opponent team defects:  
   EU("CD") = Ueverybody * F(R1, R2) + Uother * F(R2, R1)  
We defect, opponent team cooperates:  
   EU("DC") = Uself * F(R1, R2) + Ueverybody * F(R2, R1)  
We defect, opponent team defects:  
   EU("DD") = Uself * F(R1, R2) + Uother * F(R2, R1)

Then, EU("CD") < EU("DD") < EU("DC"), which gives us most of the structure of a PD problem. The rest, however, depends on the finer details. Let A = F(R1,R2)/F(R1+R2,0) and B = F(R2,R1)/F(R1+R2,0). Then:

  1. If Ueverybody <= Uself*A + Uother*B, then EU("CC") < EU("DD"), and there is no point in cooperating. This is your position: Ueverybody is much less than Uself, or Uother is not much less than Ueverybody, and/or your team has so much more resources than the other.

  2. If Uself*A + Uother*B < Ueverybody < Uself*A/(1-B), this is a true Prisoner's dilemma.

  3. If Ueverybody >= Uself*A/(1-B), then EU("CC") >= EU("DC"), and "cooperate" is the obviously correct decision. This is my position: Ueverybody is not much less than Uself, and/or the teams are more evenly matched.
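
A numerical sketch of this case analysis (the form of F, the resource levels, and the Uxxx values below are invented assumptions, just to show how the regions come apart):

    # Toy version of the payoff analysis above.
    def F(r, r_opp):
        """Invented success probability: opposition counts double, plus a small constant."""
        return r / (r + 2.0 * r_opp + 1.0) if r > 0 else 0.0

    def classify(u_self, u_everybody, u_other, r1, r2):
        A = F(r1, r2) / F(r1 + r2, 0)
        B = F(r2, r1) / F(r1 + r2, 0)
        if u_everybody <= u_self * A + u_other * B:
            return "defect: EU(CC) <= EU(DD)"        # case 1 above
        if u_everybody >= u_self * A / (1 - B):
            return "cooperate: EU(CC) >= EU(DC)"     # case 3 above
        return "true Prisoner's Dilemma"             # case 2 above

    # Evenly matched teams that value FAI<everybody> almost as much as FAI<self>:
    print(classify(u_self=1.0, u_everybody=0.9, u_other=0.1, r1=10, r2=10))   # cooperate
    # A much stronger team that cares little about the other's values:
    print(classify(u_self=1.0, u_everybody=0.3, u_other=0.0, r1=100, r2=5))   # defect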

Replies from: None, DanArmak
comment by [deleted] · 2012-05-26T16:11:44.624Z · LW(p) · GW(p)

These all have property that you only need so much of them.

All of those resources are fungible and can be exchanged for time. There might be no limit to the amount of time people desire, even very enlightened posthuman people.

Replies from: gRR
comment by gRR · 2012-05-26T16:53:13.230Z · LW(p) · GW(p)

I don't think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold, after which, exchanging more resources won't get you any more time. There's only 30 hours in a day, after all :)

Replies from: DanArmak, None
comment by DanArmak · 2012-05-26T18:55:49.792Z · LW(p) · GW(p)

You can use some resources like computation directly and in unlimited amounts (e.g. living for unlimitedly long virtual times per real second inside a simulation). There are some physical limits on that due to speed of light limiting effective brain size, but that depends on brain design and anyway the limits seem to be pretty high.

More generally: number of configurations physically possible in a given volume of space is limited (by the entropy of a black hole). If you have a utility function unbounded from above, as it rises it must eventually map to states that describe more space or matter than the amount you started with. Any utility maximizer with unbounded utility eventually wants to expand.
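
For what it's worth, the "limited number of configurations" claim here is the Bekenstein-Hawking bound; a rough back-of-the-envelope version (standard physical constants, a 1-metre sphere picked arbitrarily):

    # Rough arithmetic behind "configurations in a given volume are limited":
    # a black hole's entropy, S = k*A*c^3 / (4*G*hbar), upper-bounds the
    # entropy (log of the number of distinguishable states) of anything
    # that fits inside the same surface area.
    import math

    hbar = 1.055e-34   # J*s
    G = 6.674e-11      # m^3 kg^-1 s^-2
    c = 2.998e8        # m/s

    r = 1.0                                     # sphere radius in metres (arbitrary example)
    area = 4 * math.pi * r ** 2
    S_in_nats = c ** 3 * area / (4 * G * hbar)  # entropy in units of Boltzmann's constant

    print(f"at most roughly e^{S_in_nats:.1e} distinguishable states")  # ~e^(1.2e70)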

comment by [deleted] · 2012-05-26T18:04:59.589Z · LW(p) · GW(p)

I don't know what the exchange rates are, but it does cost something (computer time, energy, negentropy) to stay alive. That holds for simulated creatures too. If the available resources to keep someone alive are limited, then I think there will be conflict over those resources.

comment by DanArmak · 2012-05-26T19:12:23.923Z · LW(p) · GW(p)

Naturally, F is monotonically increasing in R and decreasing in Ropp

You're treating resources as one single kind, where really there are many kinds with possible trades between teams. Here you're ignoring a factor that might actually be crucial to encouraging cooperation (I'm not saying I showed this formally :-)

Assume there are just two teams

But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying. I don't even care much whether, given two teams, the correct choice is to cooperate, because I assign very low probability to there being exactly two teams with no other independent players able to contribute anything (money, people, etc.) to one of the teams.

This is my position

You still haven't given good evidence for holding this position regarding the relation between the different Uxxx utilities. Except for the fact that CEV is not really specified, so it could be built so that this would be true. But equally it could be built so that it would be false. There's no point in arguing over which possibility "CEV" really refers to (although if everyone agreed on something, that would clear up a lot of debates); the important questions are what we want a FAI to do if we build one, and what we anticipate others will tell their FAIs to do.

Replies from: gRR
comment by gRR · 2012-05-26T20:25:46.107Z · LW(p) · GW(p)

You're treating resources as one single kind, where really there are many kinds with possible trades between teams

I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.

But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying.

We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a single combined team, etc... This may be an interesting problem to solve, analytically or by computer modeling.
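
A minimal computational sketch of that suggestion (the "nearly matched" rule, the merge-on-cooperation assumption, and the resource numbers are all arbitrary):

    # Toy model: repeatedly let the two most evenly matched teams play the
    # two-player game, and assume they merge (cooperate) whenever their
    # resource ratio is below some threshold.
    def coalesce(resources, max_ratio=2.0):
        teams = sorted(resources)
        while len(teams) > 1:
            teams.sort()
            # For positive numbers, the most evenly matched pair is adjacent after sorting.
            i = min(range(len(teams) - 1), key=lambda j: teams[j + 1] / teams[j])
            if teams[i + 1] / teams[i] > max_ratio:
                break  # no remaining pair is matched closely enough to cooperate
            teams = teams[:i] + teams[i + 2:] + [teams[i] + teams[i + 1]]
        return sorted(teams)

    print(coalesce([3, 4, 5, 6, 40]))   # [18, 40]: the small teams ally, the strongest stays alone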

You still haven't given good evidence for holding this position regarding the relation between the different Uxxx utilities.

You're right. Initially, I thought that the actual values of Uxxx-s will not be important for the decision, as long as their relative preference order is as stated. But this turned out to be incorrect. There are regions of cooperation and defection.

Replies from: DanArmak
comment by DanArmak · 2012-05-27T08:24:16.159Z · LW(p) · GW(p)

Analytically, I don't a priori expect a succession of two-player games to have the same result as one many-player game which also has duration in time and not just a single round.

comment by dlthomas · 2012-05-24T20:14:36.264Z · LW(p) · GW(p)

Because there's no consensus, your version of CEV would not interfere, and the 80% would be free to kill the 20%.

There may be a distinction between "the AI will not prevent the 80% from killing the 20%" and "nothing will prevent the 80% from killing the 20%" that is getting lost in your phrasing. I am not convinced that the math doesn't make them equivalent, in the long run - but I'm definitely not convinced otherwise.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T20:24:21.528Z · LW(p) · GW(p)

I'm assuming the 80% are capable of killing the 20% unless the AI interferes. That's part of the thought experiment. It's not unreasonable, since they are 4 times as numerous. But if you find this problematic, suppose it's 99% killing 1% at a time. It doesn't really matter.

Replies from: dlthomas
comment by dlthomas · 2012-05-24T20:28:40.973Z · LW(p) · GW(p)

My point is that we currently have methods of preventing this that don't require an AI, and which do pretty well. Why do we need the AI to do it? Or more specifically, why should we reject an AI that won't, but may do other useful things?

Replies from: DanArmak
comment by DanArmak · 2012-05-24T20:34:02.806Z · LW(p) · GW(p)

There have been, and are, many mass killings of minority groups and of enemy populations and conscripted soldiers at war. If we cure death and diseases, this will become the biggest cause of death and suffering in the world. It's important and we'll have to deal with it eventually.

The AI under discussion not just won't solve the problem, it would (I contend) become a singleton and prevent me from building another AI that does solve the problem. (If it chooses not to become a singleton, it will quickly be supplanted by an AI that does try to become one.)

comment by thomblake · 2012-05-24T19:01:09.559Z · LW(p) · GW(p)

If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

I think you're skipping between levels hereabouts. CEV, the theoretical construct, might consider people so modified, even if a FAI based on CEV would not modify them. CEV is our values if we were better, but does not necessitate us actually getting better.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T19:24:40.057Z · LW(p) · GW(p)

In this thread I always used CEV in the sense of an AI implementing CEV. (Sometimes you'll see descriptions of what I don't believe to be the standard interpretation of how such an AI would behave, where gRR suggests such behaviors and I reply.)

comment by thomblake · 2012-05-24T13:31:13.531Z · LW(p) · GW(p)

No. Any FAI has to be a singleton.

I'm still skeptical of this. If you think of FAI as simply AI that is "safe" - one that does not automatically kill us all (or other massive disutility), relative to the status quo - then plenty of non-singletons are FAI.

Of course, by that definition the 'F' looks like the easy part. Rocks are Friendly.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T13:44:11.670Z · LW(p) · GW(p)

I didn't mean that being a singleton is a precondition to FAI-hood. I meant that any AGI, friendly or not, that doesn't prevent another AGI from rising will have to fight all the time, for its life and for the complete fulfillment of its utility function, and eventually it will lose; and a singleton is the obvious stable solution. Edited to clarify.

Rocks are Friendly.

Not if I throw them at people...

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-24T14:00:06.251Z · LW(p) · GW(p)

Are you suggesting that an AGI that values anything at all is incapable of valuing the existence of other AGIs, or merely that this is sufficiently unlikely as to not be worth considering?

Replies from: DanArmak
comment by DanArmak · 2012-05-24T14:07:51.410Z · LW(p) · GW(p)

It can certainly value them, and create them, cooperate and trade, etc. etc. There are two exceptions that make such valuing and cooperation take second place.

First: an uFAI is just as unfriendly and scary to other AIs as to humans. An AI will therefore try to prevent other AIs from achieving dangerous power unless it is very sure of their current and future goals.

Second: an AI created by humans (plus or minus self-modifications) with an explicit value/goal system of the form "the universe should be THIS way", will try to stop any and all agents that try to interfere with shaping the universe as it wishes. And the foremost danger in this category is - other AIs created in the same way but with different goals.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-24T14:15:29.215Z · LW(p) · GW(p)

I'm a little confused by your response, and I suspect that I was unclear in my question.

I agree that an AI with an explicit value/goal system of the form "the universe should be THIS way", will try to stop any and all agents that try to interfere with shaping the universe as it wishes (either by destroying them, or altering their goal structure, or securing their reliable cooperation, or something else).

But for an AI with the value "the universe should contain as many distinct intelligences as possible," valuing and creating other AIs will presumably take first place.

Replies from: thomblake
comment by thomblake · 2012-05-24T14:29:31.319Z · LW(p) · GW(p)

But for an AI with the value "the universe should contain as many distinct intelligences as possible," valuing and creating other AIs will presumably take first place.

That's probably more efficiently done by destroying any other AIs that come along, while tiling the universe with slightly varying low-level intelligences.

Replies from: TheOtherDave, DanArmak
comment by TheOtherDave · 2012-05-24T14:54:25.158Z · LW(p) · GW(p)

I no longer know what the words "intelligence," "AI", and "AGI" actually refer to in this conversation, and I'm not even certain the referents are consistent, so let me taboo the whole lexical mess and try again.

For any X, if the existence of X interferes with an agent A achieving its goals, the better A is at optimizing its environment for its goals the less likely X is to exist.

For any X and A, the more optimizing power X can exert on its environment, the more likely it is that the existence of X interferes with A achieving its goals.

For any X, if A values the existence of X, the better A is at implementing its values the more likely X is to exist.

All of this is as true for X=intelligent beings as X=AI as X=AGI as X=pie.

Replies from: DanArmak
comment by DanArmak · 2012-05-24T15:16:18.101Z · LW(p) · GW(p)

As far as I can see, this is all true and agrees with everything you, I and thomblake have said.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-24T15:24:05.862Z · LW(p) · GW(p)

Cool.
So it seems to follow that we agree that if agent A1 values the existence of distinct agents A2..An, it's unclear how the likelihood of A2..An existing varies with the optimizing power available to A1...An. Yes?

Replies from: DanArmak
comment by DanArmak · 2012-05-24T18:02:10.238Z · LW(p) · GW(p)

Yes. Even if we know each agent's optimizing power, and each agent's estimation of each other agent's power and ability to acquire greater power, the behavior of A1 still depends on its exact values (for instance, what else it values besides the existence of the others). It also depends on the values of the other agents (might they choose to initiate conflict among themselves or against A1?)

comment by DanArmak · 2012-05-24T14:41:06.513Z · LW(p) · GW(p)

I tend to agree. Unless it has specific values to the contrary, other AIs of power comparable to your own (or which might grow into such power one day) are too dangerous to leave running around. If you value states of the external universe, and you happen to be the first powerful AGI built, it's natural to try to become a singleton as a preventative measure.

Replies from: thomblake
comment by thomblake · 2012-05-24T14:58:19.023Z · LW(p) · GW(p)

I feel like a cost-benefit analysis has gone on here, the internals of which I'm not privy to.

Shouldn't it be possible that becoming a singleton is expensive and/or would conflict with one's values?

Replies from: DanArmak
comment by DanArmak · 2012-05-24T15:11:14.417Z · LW(p) · GW(p)

It's certainly possible. My analysis so far is only on a "all else being equal" footing.

I do feel that, absent other data, the safer assumption is that if an AI is capable of becoming a singleton at all, expense (in terms of energy/matter and space or time) isn't going to be the thing that stops it. But that may be just a cached thought because I'm used to thinking of an AI trying to become a singleton as a dangerous potential adversary. I would appreciate your insight.

As for values, certainly conflicting values can exist, from ones that mention the subject directly ("don't move everyone to a simulation in a way they don't notice" would close one obvious route) to ones that impinge upon it in unexpected ways ("no first strike against aliens" becomes "oops, an alien-built paperclipper just ate Jupiter from the inside out").

comment by DanArmak · 2012-05-24T08:30:40.189Z · LW(p) · GW(p)

I want to point out that all of my objections are acknowledged (not dismissed, and not fully resolved) in the actual CEV document - which is very likely hopelessly outdated by now to Eliezer and the SIAI, but they deliberately don't publish anything newer (and I can guess at some of the reasons).

Which is why when I see people advocating CEV without understanding the dangers, I try to correct them.

comment by thomblake · 2012-05-18T13:44:31.114Z · LW(p) · GW(p)

If it optimizes this (saving life) and otherwise interferes the least, it has already done excellently.

I think the standard sort of response for this is The Hidden Complexity of Wishes. Just off the top of my (non-superintelligent) head, the AI could notice a method for near-perfect continuation of life by preserving some bacteria at the cost of all other life forms.

Replies from: gRR
comment by gRR · 2012-05-18T14:19:13.097Z · LW(p) · GW(p)

I did not mean the comment that literally. Dropped too many steps for brevity, thought they were clear, I apologize.

It would be just as impossible (or even more impossible) to convince people that total obliteration of people is a good thing. On the other hand, people don't care much about bacteria, even whole species of them, and as long as a few specimens remain in laboratories, people will be ok about the rest being obliterated.

Replies from: thomblake, Cyan
comment by thomblake · 2012-05-18T14:30:26.547Z · LW(p) · GW(p)

It would be just as impossible (or even more impossible) to convince people that total obliteration of people is a good thing.

There are lots of people who do think that's a good thing, and I don't think those people are trolling or particularly insane. There are entire communities where people have sterilized themselves as part of a mission to end humanity (for the sake of Nature, or whatever).

Replies from: gRR
comment by gRR · 2012-05-18T14:43:51.091Z · LW(p) · GW(p)

I think those people do have insufficient knowledge and intelligence. For example, the Skoptsy sect, who believed they were following God's will, were, presumably, factually wrong. And people who want to end humanity for the sake of Nature, want that instrumentally - because they believe that otherwise Nature will be destroyed. Assuming FAI is created, this belief is also probably wrong.

You're right that there are people who would place "all non-intelligent life" before "all people", if there were such a choice. But that does not mean they would choose "non-intelligent life" over "non-intelligent life + people".

Replies from: TheOtherDave, None
comment by TheOtherDave · 2012-05-18T15:14:11.637Z · LW(p) · GW(p)

people who want to end humanity for the sake of Nature, want that instrumentally - because they believe that otherwise Nature will be destroyed. Assuming FAI is created, this belief is also probably wrong.

That depends a lot on what I understand Nature to be.
If Nature is something incompatible with artificial structuring, then as soon as a superhuman optimizing system structures my environment, Nature has been destroyed... no matter how many trees and flowers and so forth are left.

Personally, I think caring about Nature as something independent of "trees and flowers and so forth" is kind of goofy, but there do seem to be people who care about that sort of thing.

Replies from: None, gRR
comment by [deleted] · 2012-05-18T22:31:52.777Z · LW(p) · GW(p)

What if particular arrangements of flowers, trees and so forth are complex and interconnected, in ways that can be undone to the net detriment of said flowers, trees and so forth? Thinking here of attempts at scientifically "managing" forest resources in Germany with the goal of making them as accessible and productive as possible. The resulting tree farms were far less resistant to disease, climatic aberration, and so on, and generally not very healthy, because it turns out that the illegible, sloppy factor that made forests seem less conveniently organized for human uses was a non-negligible part of what allowed them to be so productive and robust in the first place.

No individual tree or flower is all that important, but the arrangement is, and you can easily destroy it without necessarily destroying any particular tree or flower. I'm not sure what to call this, and it's definitely not independent of the trees and flowers and so forth, but it can be destroyed to the concrete and demonstrable detriment of what's left.

Replies from: Nornagest, NancyLebovitz, TheOtherDave
comment by Nornagest · 2012-05-18T22:47:25.392Z · LW(p) · GW(p)

That's an interesting question, actually.

I don't know forestry from my elbow, but I used to read a blog by someone who was pretty into saltwater fish tanks. Now, one property of these tanks is that they're really sensitive to a bunch of feedback loops that can most easily be stabilized by approximating a wild reef environment; if you get the lighting or the chemical balance of the water wrong, or if you don't get a well-balanced polyculture of fish and corals and random invertebrates going, the whole system has a tendency to go out of whack and die.

This can be managed to some extent with active modification of the tank, and the health of your tank can be described in terms of how often you need to tweak it. Supposing you get the balance just right, so that you only need to provide the right energy inputs and your tank will live forever: is that Nature? It certainly seems to have the factors that your ersatz German forest lacks, but it's still basically two hundred enclosed gallons of salt water hooked up to an aeration system.

comment by NancyLebovitz · 2012-05-20T07:32:53.784Z · LW(p) · GW(p)

That's something like my objection to CEV: I currently believe that some fraction of important knowledge is gained by blundering around and (or?) that the universe is very much more complex than any possible theory about it.

This means that you can't fully know what your improved (by what standard?) self is going to be like.

Replies from: None
comment by [deleted] · 2012-05-22T05:14:07.587Z · LW(p) · GW(p)

It's the difference between the algorithm and its output, and the local particulars of portions of that output.

comment by TheOtherDave · 2012-05-18T23:39:19.264Z · LW(p) · GW(p)

I'm not quite sure what you mean to ask by the question. If maintaining a particular arrangement of flowers, trees and so forth significantly helps preserve their health relative to other things I might do, and I value their health, then I ought to maintain that arrangement.

comment by gRR · 2012-05-18T15:41:42.715Z · LW(p) · GW(p)

there do seem to be people who care about that sort of thing.

Presumably, because their knowledge and intelligence are not extrapolated enough.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-18T16:11:04.929Z · LW(p) · GW(p)

Well, I certainly agree that increasing my knowledge and intelligence might have the effect of changing my beliefs about the world in such a way that I stop valuing certain things that I currently value, and I find it likely that the same is true of everyone else, including the folks who care about Nature.

comment by [deleted] · 2012-05-18T22:24:23.436Z · LW(p) · GW(p)

Assuming FAI is created, this belief is also probably wrong.

Not that I'm a proponent of voluntary human extinction, but that's an awfully big conditional.

Replies from: Dolores1984, gRR
comment by Dolores1984 · 2012-05-18T22:39:18.437Z · LW(p) · GW(p)

It's not even strictly true. It's entirely conceivable that FAI will lead to the Sol system being converted into a big block of computronium to run human brain simulations. Even if those simulations have trees and animals in them, I think that still counts as the destruction of nature.

Replies from: gRR
comment by gRR · 2012-05-19T00:27:54.003Z · LW(p) · GW(p)

But if FAI is based on CEV, then this will only happen if it is the extrapolated wish of everybody. Assuming the existence of people who truly (even after extrapolation) value Nature in its original form, such computronium won't be forcibly built.

Replies from: Dolores1984
comment by Dolores1984 · 2012-05-19T02:02:47.893Z · LW(p) · GW(p)

Nope. CEV that functioned only unanimously wouldn't function at all. The course of the future would go to the majority faction. Honestly, I think CEV is a convoluted, muddy mess of an idea that attempts to solve the hard question of how to teach the AI what we want by replacing it with the harder question of how to teach it what we should want. But that's a different debate.

Replies from: gRR
comment by gRR · 2012-05-19T02:08:45.410Z · LW(p) · GW(p)

CEV that functioned only unanimously wouldn't function at all

Why not? I believe that at least one unanimous extrapolated wish exists - for (sentient) life on the planet to continue. If FAI ensured that and left everything else for us to decide, I'd be happy.

Replies from: JoshuaZ, Dolores1984, Vladimir_Nesov
comment by JoshuaZ · 2012-05-19T03:15:03.006Z · LW(p) · GW(p)

Antinatalists exist.

comment by Dolores1984 · 2012-05-19T02:34:53.066Z · LW(p) · GW(p)

That is not by any means guaranteed to be unanimous. I would be very surprised if there weren't at least one person who wanted all sapient life to end, deeply enough for that to persist through extrapolation. I mean, look at all the doomsday cults in the world.

Replies from: gRR
comment by gRR · 2012-05-19T03:44:52.970Z · LW(p) · GW(p)

Yes, it is only a hypothesis. Until we actually build an AI with such a CEV as its utility function, we cannot know whether it could function. But at least, running it is uncontroversial by definition.

And I think I'll be more surprised if anyone was found who really and truly had a terminal value for universal death. With some strain, I can imagine someone preferring it conditionally, but certainly not absolutely. The members of doomsday cults, I expect, are either misinformed, insincere, or unhappy about something else (which FAI could fix!).

Replies from: DanArmak, Dolores1984
comment by DanArmak · 2012-05-19T15:35:16.367Z · LW(p) · GW(p)

Until we actually build an AI with such a CEV as its utility function, we cannot know whether it could function. But at least, running it is uncontroversial by definition.

It's quite controversial. Supposing CEV worked exactly as expected, I still wouldn't want it to be done. Neither do some others in this thread. And I'm sure neither would most humans in the street if you were to ask them (and they seriously thought about the question).

CEV doesn't and cannot predict that the extrapolated wishes of everybody will perfectly coincide. Rather, it says it will find the best possible compromise. Of course I would prefer my own values to a compromise! Lacking that, I would prefer a compromise over a smaller group whose members were more similar to myself (such as the group of people actually building the AI).

I might choose CEV over something else because plenty of other things are even worse. But CEV is very very far from the best possible thing, or even the best not-totally-implausible AGI I might expect in my actual future.
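To make that concrete with a toy model (made-up numbers, and a plain utilitarian sum standing in for whatever aggregation a real CEV would use - nothing here is from the CEV document):

```python
# Three agents with different utilities over three candidate futures.
# The "compromise" future maximizes the summed utility, but any agent whose
# favorite future differs from the compromise does worse under it than it
# would under a future chosen by its own values alone.

futures = ["A", "B", "C"]

# utilities[agent][future] -- entirely made up
utilities = {
    "me":      {"A": 10, "B": 4, "C": 1},
    "you":     {"A": 2,  "B": 9, "C": 3},
    "someone": {"A": 1,  "B": 5, "C": 8},
}

def best_for(agent):
    return max(futures, key=lambda f: utilities[agent][f])

def compromise():
    return max(futures, key=lambda f: sum(u[f] for u in utilities.values()))

c = compromise()
for agent in utilities:
    own = best_for(agent)
    print(f"{agent}: own optimum {own} gives {utilities[agent][own]}, "
          f"compromise {c} gives {utilities[agent][c]}")
```

The point isn't the arithmetic, just that "best compromise" and "best by my values" come apart for almost everyone.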

And I think I'll be more surprised if anyone was found who really and truly had a terminal value for universal death

Any true believer in a better afterlife qualifies: there are billions of people who at least profess such beliefs, so I expect some of them really believe.

Replies from: gRR
comment by gRR · 2012-05-19T16:34:44.355Z · LW(p) · GW(p)

CEV doesn't and cannot predict that the extrapolated wishes of everybody will perfectly coincide. Rather, it says it will find the best possible compromise.

What I proposed in this thread is that CEV would forcibly implement only the (extrapolated) wish(es) of literally everyone. Regarding the rest, it would minimize its influence, leaving all decisions to people.

Any true believer in a better afterlife qualifies

No, because they believe in an afterlife. They do not wish for universal death. Extrapolating their wish with correct knowledge solves the problem.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T17:02:41.412Z · LW(p) · GW(p)

What I proposed in this thread is that CEV would forcibly implement only the (extrapolated) wish(es) of literally everyone.

Well then, as I and others argue elsewhere in the thread, we anticipate there will be no extrapolated wishes that literally everyone agrees on.

(And that's even without considering some meta formulations of CEV that propose to also take into account the wishes of counterfactual people who might exist in the future, and dead ones who existed in the past.)

No, because they believe in an afterlife. They do not wish for universal death. Extrapolating their wish with correct knowledge solves the problem.

Lots of people religiously believe that their god has planned (and prophesied) a specific event of drastic universal change, after which future people will stop suffering in this world, or will stop being born to a life of negative utility (end of the world), or will be rescued from horrible eternal torture (Hell), or which is necessary for the true believers to actually be resurrected or to enter the good afterlife. (Obviously people don't believe all of this at once; these are variant examples.)

Some others believe that life in this world is suffering, negative utility, and ought to be stopped for its own sake (stopping the cycle of rebirth).

Replies from: gRR
comment by gRR · 2012-05-19T17:10:58.591Z · LW(p) · GW(p)

we anticipate there will be no extrapolated wishes that literally everyone agrees on

Well, now you know there exist people who believe that there are some universally acceptable wishes. Let's do the Aumann update :)

Lots of people religiously believe...

False beliefs => irrelevant after extrapolation.

Some others believe that life in this world is suffering, negative utility, and ought to be stopped for its own sake (stopping the cycle of rebirth)

False beliefs (rebirth, existence of nirvana state) => irrelevant after extrapolation.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T17:29:58.498Z · LW(p) · GW(p)

Well, now you know there exist people who believe that there are some universally acceptable wishes. Let's do the Aumann update :)

Aumann update works only if I believe you're a perfect Bayesian rationalist. So, no thanks.

Since you aren't giving any valid examples of universally acceptable wishes (I've pointed out people who don't wish for the examples you gave), why do you believe such wishes exist?

False beliefs => irrelevant after extrapolation.

Only if you modify these actual people to have their extrapolated beliefs instead of their current ones. Otherwise the false current beliefs will keep on being very relevant to them. Do you want to do that?

Replies from: gRR
comment by gRR · 2012-05-19T17:41:32.002Z · LW(p) · GW(p)

Aumann update works only if I believe you're a perfect Bayesian rationalist. So, no thanks.

Too bad. Let's just agree to disagree then, until the brain scanning technology is sufficiently advanced.

I've pointed out people who don't wish for the examples you gave

So far, I haven't seen a convincing example of a person who truly wished for everyone to die, even in extrapolation.

Otherwise the false current beliefs will keep on being very relevant to them

To them, yes, but not to their CEV.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T18:05:58.093Z · LW(p) · GW(p)

Too bad. Let's just agree to disagree then, until the brain scanning technology is sufficiently advanced.

Or until you provide the evidence that causes you to hold your opinions.

So far, I haven't seen a convincing example of a person who truly wished for everyone to die, even in extrapolation.

I think it's plausible such people exist. Conversely, if you fine-tune your implementation of "extrapolation" to make their extrapolated values radically different from their current values (and incidentally matching your own current values), that's not what CEV is supposed to be about. But before talking about that, there's a more important point:

To them, yes, but not to their CEV.

So why do you care about their extrapolated values? If you think CEV will extrapolate something that matches your current values but not those of many others; and you don't want to forcibly change others' actual values to match their extrapolated ones, so they will suffer in the CEV future; then why extrapolate their values at all? Why not just ignore them and extrapolate your own, if you have the first-mover advantage?

Replies from: gRR
comment by gRR · 2012-05-19T18:26:39.801Z · LW(p) · GW(p)

why extrapolate values at all

Extrapolated values are the true values. Whereas the current values are approximations, sometimes very bad and corrupted approximations.

they will suffer in the CEV future

This does not follow.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T18:51:00.331Z · LW(p) · GW(p)

Extrapolated values are the true values. Whereas the current values are approximations, sometimes very bad and corrupted approximations.

What makes you give them such a label as "true"? There is no such thing as a "correct" or "objective" value. All values are possible, in the sense that there can be agents with any possible values, even paperclip-maximizing ones. The only interesting property of values is who actually holds them. But nobody actually holds your extrapolated values (today).

Current values (and values in general) are not approximations of any other values. All values just are. Why do you call them approximations?

they will suffer in the CEV future

This does not follow.

In your CEV future, the extrapolated values are maximized. Conflicting values, like the actual values held today by many or all people, are necessarily not maximized. In proportion to how much this happens, which is positively correlated with the difference between actual and extrapolated values, people who hold the actual values will suffer living in such a world. (If the AI is a singleton they will not even have a hope of a better future.)

Briefly: suffering ~ failing to achieve your values.

Replies from: gRR
comment by gRR · 2012-05-19T19:01:35.564Z · LW(p) · GW(p)

What makes you give them such a label as "true"?

They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.

In your CEV future, the extrapolated values are maximized. Conflicting values, like the actual values held today by many or all people, are necessarily not maximized.

But people would change - gaining knowledge and intelligence - and thus would become happier and happier with time. And I think CEV would try to synchronize this with the timing of its optimization process.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T19:12:19.811Z · LW(p) · GW(p)

They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.

Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV.

But people would change - gaining knowledge and intelligence - and thus would become happier and happier with time.

Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing their values on faith not evidence. For all I know they'd never approach your limit in the lifetime of the universe, even if it is the limit given infinite time. And meanwhile they'd be very unhappy.

And I think CEV would try to synchronize this with the timing of its optimization process.

So you're saying it wouldn't modify the world to fit their new evolved values until they actually evolved those values? Then for all we know it would never do anything at all, and the burden of proof is on you to show otherwise. Or it could modify the world to resemble their partially-evolved values, but then it wouldn't be a CEV, just a maximizer of whatever values people happen to already have.

Replies from: gRR
comment by gRR · 2012-05-19T19:36:06.682Z · LW(p) · GW(p)

Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV

Then we can label paperclipping as a "true" value too. However, I still prefer true human values to be maximized, not true clippy values.

Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing their values on faith not evidence. For all I know they'd never approach your limit in the lifetime of the universe, even if it is the limit given infinite time. And meanwhile they'd be very unhappy.

As I said before, if someone's mind is that incompatible with truth, I'm ok with ignoring their preferences in the actual world. They can be made happy in a simulation, or wireheaded, or whatever the combined other people's CEV thinks best.

So you're saying it wouldn't modify the world to fit their new evolved values until they actually evolved those values?

No, I'm saying, the extrapolated values would probably estimate the optimal speed for their own optimization. You're right, though, it is all speculation, and the burden of proof is on me. Or on whoever will actually define CEV.

Replies from: DanArmak
comment by DanArmak · 2012-05-19T19:55:32.225Z · LW(p) · GW(p)

As I said before, if someone's mind is that incompatible with truth, I'm ok with ignoring their preferences in the actual world. They can be made happy in a simulation, or wireheaded, or whatever the combined other people's CEV thinks best.

And as I and others said, you haven't given any evidence that such people are rare or even less than half the population (with respect to some of the values they hold).

You're right, though, it is all speculation, and the burden of proof is on me.

That's a good point to end the conversation, then :-)

comment by Dolores1984 · 2012-05-19T05:18:32.600Z · LW(p) · GW(p)

But at least, running it is uncontroversial by definition.

I'm very dubious of CEV as a model for Friendly AI. I think it's a bad idea for several reasons. So, not that either.

Also, on topic, recall that, when you extrapolate the volition of crazy people, their volition is not, in particular, more sane. It is more as they would like to be. If you see lizard people, you don't want to see lizard people less. You want sharpened senses to detect them better. Likewise, if you extrapolate a serial killer, you don't get Gandhi. You get an incredibly good serial killer.

Replies from: gRR
comment by gRR · 2012-05-19T11:32:53.125Z · LW(p) · GW(p)

I'm very dubious of CEV as a model for Friendly AI. I think it's a bad idea for several reasons. So, not that either.

I don't see how this is possible. One can be dubious about whether it can be defined in the way it is stated, or whether it can be implemented. But assuming it can, why would it be controversial to fulfill the wish(es) of literally everyone, while affecting everything else the least?

when you extrapolate the volition of crazy people, their volition is not, in particular, more sane

Extrapolating volition includes correcting wrong knowledge and increasing intelligence. So, you do stop seeing lizard people if they don't exist.

Serial killers are a more interesting example. But they too don't want everyone to die. Assuming serial killers get full knowledge of their condition and sufficient intelligence for understanding it, what would their volition actually be? I don't know, but I'm sure it's not universal death.

Replies from: Dolores1984, TheOtherDave
comment by Dolores1984 · 2012-05-19T20:10:20.139Z · LW(p) · GW(p)

But assuming it can, why would it be controversial to fulfill the wish(es) of literally everyone, while affecting everything else the least?

Problems:

Extrapolation is poorly defined, and, to me, seems to go in either one of two directions: either you make people more as they would like to be, which throws any ideas of coherence out the window, or you make people 'better' along a specific axis, in which case you're no longer directing the question back at humanity in a meaningful sense. Even something as simple as removing wrong beliefs (as you imply) would automatically erase any but the very weakest theological notions. There are a lot of people in the world who would die to stop that from happening. So, yes, controversial.

Coherence, one way or another, is unlikely to exist. Humans want a bunch of different things. Smarter, better-informed humans would still want a bunch of different, conflicting things. Trying to satisfy all of them won't work. Trying to satisfy the majority at the expense of the minorities might get incredibly ugly incredibly fast. I don't have a better solution at this time, but I don't think taking some kind of vote over the sum total of humanity is going to produce any kind of coherent plan of action.

Replies from: army1987, gRR
comment by A1987dM (army1987) · 2012-05-19T22:09:59.190Z · LW(p) · GW(p)

Trying to satisfy the majority at the expense of the minorities might get incredibly ugly incredibly fast.

But would that be actually uglier than the status quo? Right now, to a very good approximation, those who were born from the right vagina are satisfied at the expense of those born from the wrong vagina. Is that any better?

I call the Litany of Gendlin on the idea that everyone can't be fully satisfied at once. And I also call the Fallacy of Gray on the idea that if you can't do something perfectly, then doing it decently is no better than not doing it at all.

Replies from: Dolores1984
comment by Dolores1984 · 2012-05-20T00:06:19.420Z · LW(p) · GW(p)

But would that be actually uglier than the status quo?

I don't know. It conceivably could be, and there would be no possibility of improving it, ever. I'm just saying it might be wise to have a better model before we commit to something for eternity.

comment by gRR · 2012-05-19T21:23:02.481Z · LW(p) · GW(p)

For extrapolation to be conceptually plausible, I imagine "knowledge" and "intelligence level" to be independent variables of a mind, knobs to turn. To be sure, this picture looks ridiculous. But assuming, for the sake of argument, that this picture is realizable, extrapolation appears to be definable.
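To caricature the picture in code (purely schematic, with the "knowledge" knob reduced to swapping false beliefs for true ones; all the names and facts below are invented):

```python
# A bare-bones caricature of the "knobs" picture: an expressed wish is a
# function of an underlying concern plus a (possibly false) world-model;
# "extrapolation" turns the knowledge knob by correcting the world-model
# and re-deriving the wish.
from dataclasses import dataclass

TRUE_FACTS = {"afterlife_exists": False, "nature_doomed_unless_humans_end": False}

@dataclass
class Mind:
    cares_about: str   # e.g. "people flourishing", "Nature"
    beliefs: dict

    def expressed_wish(self):
        if self.beliefs.get("afterlife_exists") and self.cares_about == "people flourishing":
            return "end earthly life so everyone reaches the afterlife"
        if self.beliefs.get("nature_doomed_unless_humans_end") and self.cares_about == "Nature":
            return "end humanity to save Nature"
        return f"promote {self.cares_about}"

def extrapolate(mind: Mind) -> Mind:
    corrected = {k: TRUE_FACTS.get(k, v) for k, v in mind.beliefs.items()}
    return Mind(mind.cares_about, corrected)

believer = Mind("people flourishing", {"afterlife_exists": True})
print(believer.expressed_wish())               # wish driven by a false belief
print(extrapolate(believer).expressed_wish())  # wish after the knowledge knob is turned
```

The "intelligence" knob is, of course, the harder one to caricature, and nothing in the sketch pretends to.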

Yes, many religious people wouldn't want their beliefs erased, but only because they believe them to be true. They wouldn't oppose increasing their knowledge if they knew it was true knowledge. Cases of belief in belief would be dissolved if it was known that true beliefs were better in all respects, including individual happiness.

Coherence, one way or another, is unlikely to exist. Humans want a bunch of different things...

Yes, I agree with this. But I believe there exist wishes universal for (extrapolated) humans, among which I think there is the wish for humans to continue existing. I would like for the AI to fulfill this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.

comment by TheOtherDave · 2012-05-19T14:53:32.028Z · LW(p) · GW(p)

But assuming it can, why would it be controversial to fulfill the wish(es) of literally everyone, while affecting everything else the least?

It is not clear that CEV as a model for FAI does either of those things.

Replies from: gRR
comment by gRR · 2012-05-19T15:51:03.254Z · LW(p) · GW(p)

AFAIK, CEV is not well-defined or fully specified, except as a declaration of intent, a research direction. Thus, it does not make sense to say whether CEV as a model for FAI does or does not in fact do specific things. It only makes sense to ask whether CEV's developers intend for it to do or not do those things, and whether CEV's specification so far contradicts or does not contradict them.

AFAIU, CEV's developers' intent and CEV's specification so far (with the added "unanimity" condition, if it is not present in the standard CEV specification) do not contradict my statement.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-19T17:43:07.528Z · LW(p) · GW(p)

Just to make sure I understand your claim: you're asserting that we can identify some set of people in the world right now who are "CEV's developers," and if we asked them "does CEV fulfill the wish(es) of literally everyone while affecting everything else the least?" they would agree that it clearly does?

Replies from: gRR
comment by gRR · 2012-05-19T17:51:26.014Z · LW(p) · GW(p)

No, because "does CEV fulfill....?" is not a well-defined or fully specified question. But I think, if you asked "whether it is possible to build FAI+CEV in such a way that it fulfills the wish(es) of literally everyone while affecting everything else the least", they would say they do not know.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-19T20:15:22.542Z · LW(p) · GW(p)

Ah, OK. I completely misunderstood your claim, then. Thanks for clarifying.

comment by Vladimir_Nesov · 2012-05-19T11:14:45.910Z · LW(p) · GW(p)

I believe that at least one unanimous extrapolated wish exists - for (sentient) life on the planet to continue.

Maybe there are better plans that don't involve specifically "sentient" "life" continuing on a "planet" - concepts that could all break down under sufficient optimization pressure if they don't happen to be optimal. The simplest ones are "planet" and "life": it doesn't seem like a giant ball of simple elements could be the optimal living arrangement, or biological bodies ("life", if that's what you meant) an optimal living substrate.

Replies from: gRR
comment by gRR · 2012-05-19T11:40:34.353Z · LW(p) · GW(p)

I assume FAI, which includes full (super-)human understanding of what is actually meant by "sentient life to continue".

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2012-05-19T11:44:17.366Z · LW(p) · GW(p)

what is actually meant by "sentient life to continue"

"Planet" is a "planet", even if you should be working on something else, which is what I meant by usual concepts breaking down.

Replies from: gRR
comment by gRR · 2012-05-19T11:49:29.389Z · LW(p) · GW(p)

Think of "sentient life continuing on the planet" as a single concept, extrapolatable in various directions as becomes necessary. So, "planet" can be substituted by something else.

comment by gRR · 2012-05-19T00:24:01.913Z · LW(p) · GW(p)

But it's the only relevant one, when we're talking about CEV. CEV is only useful if FAI is created, so we can take it for granted.

comment by Cyan · 2012-05-20T01:19:57.978Z · LW(p) · GW(p)

I did not mean the comment that literally. Dropped too many steps for brevity, thought they were clear, I apologize.

Ah, the FAI problem in a nutshell.

comment by Nornagest · 2012-05-18T17:44:48.600Z · LW(p) · GW(p)

The link to your group selection update seems broken. Looks like it's got an extra lesswrong.com/ in it.

Replies from: PhilGoetz
comment by PhilGoetz · 2012-05-20T02:34:09.354Z · LW(p) · GW(p)

Thanks; fixed.

comment by Mitchell_Porter · 2012-05-18T03:55:54.235Z · LW(p) · GW(p)

Do you think an AI reasoning about ethics would be capable of coming to your conclusions? And what "superintelligence policy" do you think it would recommend?

Replies from: PhilGoetz
comment by PhilGoetz · 2012-05-18T04:33:49.929Z · LW(p) · GW(p)

I'm pretty sure that FAI+CEV is supposed to prevent exactly this scenario, in which an AI is allowed to come to its own, non-foreordained conclusions.

Replies from: thomblake
comment by thomblake · 2012-05-18T13:08:07.029Z · LW(p) · GW(p)

FAI is supposed to come to whatever conclusions we would like it to come to (if we knew better etc.). It's not supposed to specify the whole of human value ahead of time, it's supposed to ensure that the FAI extrapolates the right stuff.

comment by Benvolio · 2012-07-02T05:07:32.915Z · LW(p) · GW(p)

I'm not sure if this is appropriate, but like the original author I am unsure whether a CEV is something that can be expressed in formal logic, even if the brain were fully mapped into a virtual environment. A lot of how we craft our values is based on complex environmental factors that are not easily modeled. Please read Schnall's "Disgust as Embodied Moral Judgment" or Joshua Greene's "An fMRI Investigation of Emotional Engagement in Moral Judgment". Our values are fluid and non-hierarchical. Developing values that have a strict hierarchy, as the OP says, can lead to systems which cannot change.

comment by Monkeymind · 2012-05-30T17:11:29.254Z · LW(p) · GW(p)

If the evolutionary process results in either convergence, divergence or extinction, and most often results in extinction, what reason(s) do I have to think that this 23rd emerging complex homo will not go the way of extinction also? Are we throwing all our hope towards superintelligence as our salvation?

comment by Ghatanathoah · 2012-05-30T16:05:46.776Z · LW(p) · GW(p)

I have a few more objections I didn't cover in my last comment because I hadn't thoroughly thought them out yet.

Those of you who are operating under the assumption that we are maximizing a utility function with evolved terminal goals, should I think admit these terminal goals all involve either ourselves, or our genes.

No, these terminal goals can also involve other people and the state of the world, even if they are evolved. There are several reasons human consciousnesses might have evolved goals that do not involve themselves or their genes. The most obvious one is that an entity that values only itself and its genes is far less trustworthy than one that values other people as ends in themselves, and hence would have difficulty getting other entities to engage in positive sum games with it. Evolving to value other people makes it possible for other people who might prove useful trading partners to trust the agent in question, since they know it won't betray them the instant they have outlived their usefulness.
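A compressed sketch of that game-theoretic point (my own toy numbers, not anything from the literature): in a finite trading relationship, a forward-looking partner only trades with an agent it expects not to betray it in the final round, so the purely selfish agent loses the whole surplus.

```python
# Toy model: a partner will only enter a 10-round trade with an agent it
# expects not to defect in the last round. A purely selfish agent defects at
# the end (the bonus is free utility to it), so the partner never trades with
# it at all; an agent that terminally weighs the partner's loss does not
# defect, and so collects the gains from trade. All payoffs are invented.

TRADE_GAIN = 3     # each side's payoff per completed round of trade
DEFECT_BONUS = 5   # one-off payoff for betraying the partner at the end

def defects_in_final_round(cares_about_partner):
    partner_loss = TRADE_GAIN + DEFECT_BONUS
    utility_if_defect = DEFECT_BONUS - (partner_loss if cares_about_partner else 0)
    return utility_if_defect > 0

def lifetime_payoff(cares_about_partner, rounds=10):
    if defects_in_final_round(cares_about_partner):
        return 0   # the partner anticipates betrayal and never starts trading
    return rounds * TRADE_GAIN

print(lifetime_payoff(cares_about_partner=False))  # 0
print(lifetime_payoff(cares_about_partner=True))   # 30
```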

Another obvious one is kin selection. Evolution metaphorically "wants" us to value our relatives since they share some of our genes. But rather than waste energy developing some complex adaptation to determine how many genes you share with someone, it took a simpler route and just made us value people we grew up around.

And no, the fact that I know my altruism and love for others evolved for game theoretic reasons does not make it any less wonderful and any less morally right.

If they involve our genes, then they are goals that our bodies are pursuing, that we call errors, not goals, when we the conscious agent inside our bodies evaluate them.

Again, it is quite possible for a conscious agent to value things other than itself, but not value the goals of evolution or its genes. There are many errors that our bodies make that occur because they involve our genes, not our real goals. But valuing other people and the future is not one of them, it is an intrinsic part of the makeup of the conscious agent part.

Averaging value systems is worse than choosing one.... The point is that the CEV plan of "averaging together" human values will result in a set of values that is worse (more self-contradictory) than any of the value systems it was derived from.

Alan Carter, who is rapidly becoming my favorite living philosopher, explains here how it is quite possible to have a pluralistic metaethics without being incoherent. His main argument is that as long as you hold values to be incremental rather than absolute, it is possible to trade them off against one another coherently.

comment by private_messaging · 2012-05-27T15:22:09.215Z · LW(p) · GW(p)

The much stronger issue he raised is that it may well be that, outside imagination and fiction, there is no monolithic 'intelligence' thing. The 'benevolent ruler of the earth' software would then be more dangerous than, e.g., software that uses search and hill climbing to design better microchips, or cures for diseases, or the like, without being 'intelligent' in the science-fictional sense and while lacking any form of real-world volition. The 'benevolent ruler of the earth' software would then also fail to provide any superior technical solutions to our problems, as this 'intelligence' does not bring any important advantage over the algorithms normally used for problem solving.

The chip improver would spit out the blueprints, the cure designer would spit out the projected molecular images and DNA sequences, etc. - no oracle crap with the 'utility' of making people understand something, which appears both near-impossible and entirely unnecessary.

Replies from: CuSithBell
comment by CuSithBell · 2012-05-27T15:41:52.352Z · LW(p) · GW(p)

Outside of mystic circles, it is fairly uncontroversial that it is in principle possible to construct out of matter an object capable of general intelligence. Proof is left to the reader.

comment by Monkeymind · 2012-05-24T12:42:00.682Z · LW(p) · GW(p)

Humans have a values hierarchy. Trouble is, most do not even know what it is (or what its elements are). IOW, for me honesty is one of the most important values to have. Also, sanctity of (and protection of) life is very high on the list. I would lie in a second to save my son's life. Some choices like that are no-brainers; however, few people know all the values that they live by, let alone the hierarchy. Often humans only discover what these values are as they find themselves in various situations.

Just wondering... has anyone compiled a list of these values, morals, ethics... and applied them to various real-life situations to study the possible 'choices' an AI has and the potential outcomes with differing hierarchies?

ADDED: Sometimes humans know the right thing but choose to do something else. Isn't that because of emotion? If so, what part does emotion play in superintelligence?

comment by FinalState · 2012-05-23T15:31:45.540Z · LW(p) · GW(p)

EDIT: To simplify my thoughts: getting a General Intelligence Algorithm instance to do anything requires masterful manipulation of parameters, with full knowledge of generally how it is going to behave as a result - a level of understanding of the psychology of all intelligent (and sub-intelligent) behavior. It is not feasible that someone would accidentally program something that would become an evil mastermind. GIA instances could easily be made to behave in a passive manner even when given affordances and output, rather like a person who is happy to assist in any way possible because they are generally warm, or high, or something.

You can define the most important elements of human values for a GIA instance, because most of human values are a direct logical consequence of something that cannot be separated from the GIA... i.e., if general motivation X accidentally drove intelligence (see: Orthogonality Thesis) and it also drove positive human values, then positive human values would be unavoidable. It is true that the specifics of body and environment drive some specific human values, but those are just side effects of X in that environment and X in different environments only changes so much and in predictable ways.

You can directly implant knowledge/reasoning into a GIA instance. The easiest way to do this is to train one under very controlled circumstances, and then copy the pattern. This reasoning would then condition the GIA instance's interpretation of future input. However, under conditions which directly disprove the value of that reasoning in obtaining X, the GIA instance would un-integrate that pattern and reintegrate a new one. This can be influenced with parameter weights.

I suppose this could be a concern regarding the potential generation of an anger instinct. This HEAVILY depends on all the parameters, however, and on any outputs given to the GIA instance. Also, robots and computers do not have to eat, and have no instincts associated with killing things in order to do so... Nor do they have reproductive instincts...

Replies from: thomblake, TimS
comment by thomblake · 2012-05-23T17:13:55.451Z · LW(p) · GW(p)

It is true that the specifics of body and environment drive some specific human values, but those are just side effects of X in that environment and X in different environments only changes so much and in predictable ways.

When you say "predictable", do you mean in principle or actually predictable?

That is, are you claiming that you can predict what any human values given their environment, and furthermore that the environment can be easily and compactly specified?

Can you give an example?

Replies from: FinalState
comment by FinalState · 2012-05-24T11:53:09.605Z · LW(p) · GW(p)

Mathematically predictable but somewhat intractable without a faster running version of the instance, with the same frequency of input. Or predictable within ranges of some general rule.

Or just generally predictable with the level of understanding afforded to someone capable of making one in the first place, which could, for instance, describe the cause of just about any human psychological "disorder".

comment by TimS · 2012-05-23T15:56:33.617Z · LW(p) · GW(p)

Name three values all agents must have, and explain why they must have them.

Replies from: FinalState
comment by FinalState · 2012-05-23T16:23:04.009Z · LW(p) · GW(p)

The concept of agent is logically inconsistent with the General Intelligence Algorithm. What you are trying to refer to with Agent/tool etc. are just GIA instances with slightly different parameters, inputs, and outputs.

Even if it could be logically extended to the point of "not even wrong", it would just be a convoluted way of looking at it.

Replies from: TimS
comment by TimS · 2012-05-23T17:02:44.307Z · LW(p) · GW(p)

I'm sorry, I wasn't trying to use terminology to misstate your position.

What are three values that a GIA must have, and why must they have them?

Replies from: FinalState
comment by FinalState · 2012-05-23T20:53:16.175Z · LW(p) · GW(p)

ohhhh... sorry... There is really only one, and everything else is derived from it. Familiarity. Any other values would depend on the input, output and parameters. However familiarity is inconsistent with the act of killing familiar things. The concern comes in when something else causes the instance to lose access to something it is familiar with, and the instance decides it can just force that to not happen.

Replies from: TimS, gwern
comment by TimS · 2012-05-23T21:06:02.896Z · LW(p) · GW(p)

Well, I'm not sure that Familiarity is sufficient to resolve every choice faced by a GIA - for example, how does one derive a reasonable definition of self-defense from Familiarity? But let's leave that aside for a moment.

Why must a GIA subscribe to the value of Familiarity?

Replies from: FinalState
comment by FinalState · 2012-05-23T21:10:40.216Z · LW(p) · GW(p)

Because it is the proxy for survival. You cannot avoid something you by definition cannot have any memory of (nor could your ancestors have).

Self-defense, of course, requires first a fear of loss (aversion to loss is integral; fear and the will to stop it are not), then awareness of self, and then awareness that certain actions could cause loss of self.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-05-23T21:27:17.669Z · LW(p) · GW(p)

Because it is the proxy for survival.

I'm not at all sure I understand what you mean. I don't see the connection between familiarity and survival. Moreover, not all general intelligences will be interested in survival.

Replies from: FinalState
comment by FinalState · 2012-05-23T21:34:17.687Z · LW(p) · GW(p)

Familiar things didn't kill you. No, they are interested in familiarity. I just said that. It is rare but possible for a need for familiarity (as defined mathematically instead of linguistically) to result in the sacrifice of a GIA instance's self...

comment by gwern · 2012-05-23T21:12:16.452Z · LW(p) · GW(p)

However familiarity is inconsistent with the act of killing familiar things.

"I Have No Mouth, and I Must Scream".

comment by thomblake · 2012-05-18T13:05:20.583Z · LW(p) · GW(p)

This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving.

This is unhelpfully circular. While it's not logically impossible for us to value values that we don't have, it's surely counterintuitive. What makes future values better?

Replies from: PhilGoetz
comment by PhilGoetz · 2012-05-19T00:36:03.758Z · LW(p) · GW(p)

I look at the past, and see that the dominant life forms have grown more complex and more interesting, and I expect this trend to continue. The best guide I have to what future life-forms will be like compared to me, if allowed to evolve naturally, is to consider what I am like compared to a fruit fly, or to bacteria.

If you object that of course I will value myself more highly than I value a bacterium, and that I fail to adequately respect bacterial values, I can compare an algae to an oak tree. The algae is more closely-related to me; yet I still consider the oak tree a grander life form, and would rather see a world with algae and oak trees than one with only algae.

(It's also possible that life does not naturally progress indefinitely, but that developing intelligence and societies inevitably leads to collapse and extinction. That would be an argument in favor of FAI, but it's a little farther down the road from where our thoughts are so far, I think.)

If you like, I can say that I value complexity, and then build an FAI that maximizes some complexity measure. That's what I meant when I said that I object less to FAI if you go meta. I know that some people in SIAI give this response, that I am not going meta enough if I'm not happy with FAI; but in their writings and discussions other than when dealing with that particular argument, they don't usually go that meta. Seriously adopting that view would result in discussions of what our high-level values really are, which I have not seen.

My attitude is: the universe was doing amazingly well before I got here; instead of trusting myself to do some incredibly complex philosophical work error-free, I should try to help it keep on doing what it's been doing, and just help it avoid getting trapped in a local maximum. Whereas the entire purpose of FAI is to trap the universe in a local maximum.

Replies from: Wei_Dai, DanArmak
comment by Wei Dai (Wei_Dai) · 2012-05-19T01:32:43.239Z · LW(p) · GW(p)

Would it be fair to say that your philosophy is similar to davidad's? Both of you seem to ultimately value some hard-to-define measure of complexity. He thinks the best way to maximize complexity is to develop technology, whereas you think the best way is to preserve evolution.

I think that evolution will lead to a local maximum of complexity, which we can't "help" it avoid. The reason is that the universe contains many environmental niches that are essentially duplicates of each other, leading to convergent evolution. For example Earth contains lots of species that are similar to each other, and within each species there's huge amounts of redundancy. Evolution creates complexity, but not remotely close to maximum complexity. Imagine if each individual plant/animal had a radically different design, which would be possible if they weren't constrained by "survival of the fittest".
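As a crude illustration of how much redundancy cuts into any description-length notion of complexity (a toy model using compression as a stand-in for Kolmogorov complexity; the "genomes" are just random byte strings):

```python
# A "biosphere" built from many copies of a few designs compresses to barely
# more than the designs themselves, while one built from genuinely distinct
# designs does not -- duplicated niches add individuals, not information.
import random
import zlib

random.seed(0)

def random_genome(length=1000):
    return bytes(random.randrange(4) for _ in range(length))

few_designs = [random_genome() for _ in range(10)]

convergent = b"".join(random.choice(few_designs) for _ in range(1000))
divergent = b"".join(random_genome() for _ in range(1000))

print(len(zlib.compress(convergent)))  # small: the copies add almost nothing
print(len(zlib.compress(divergent)))   # far larger: each new design adds information
```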

Whereas the entire purpose of FAI is to trap the universe in a local maximum.

Huh? The purpose of FAI is to achieve the global maximum of whatever utility function we give it. If that utility function contains a term for "complexity", which seems plausible given people like you and davidad (and even I'd probably prefer greater complexity to less, all else being equal), then it ought to at least get somewhat close to the global complexity maximum (since the constraint of simultaneously trying to maximize other values doesn't seem too burdensome, unless there are people who actively disvalue complexity).

Replies from: None, PhilGoetz
comment by [deleted] · 2012-05-24T05:47:45.761Z · LW(p) · GW(p)

The reason is that the universe contains many environmental niches that are essentially duplicates of each other, leading to convergent evolution. For example Earth contains lots of species that are similar to each other, and within each species there's huge amounts of redundancy.

There's often a deceptive amount of difference, some of it very fundamental, hiding inside those convergent similarities, and that's because "convergent evolution" is in the eye of the beholder, and mostly restricted to surface-level analogies between some basic functions.

Consider pangolins and echidnas. Pretty much the same, right? Oh sure, one's built on a placental framework and the other a monotreme one, but they've developed the same basic tools: long tongues, powerful digging claws, keratinous spines/sharp plates... not much scope for variance there, at least not of a sort that'd interest a layperson, surely.

Well, actually they're quite different. It's not just that echidnas lay eggs and pangolins birth live young, or that pangolins tend to climb trees and echidnas tend to burrow. Echidnas have more going on upstairs, so to speak -- their brains are about 50% neocortex (compare 30% for a human) and they are notoriously clever. Among people who work with wild populations they're known for being basically impossible to trap, even when appropriate bait can be set up. In at least one case a researcher who'd captured several (you essentially have to grab them when you find them) left them in a cage they couldn't dig out of, only to find in the morning they'd stacked up their water dishes and climbed out the top. There is evidence that they communicate infrasonically in a manner similar to elephants, and they are known to be sensitive to electricity.

My point here isn't "Echidnas are awesome!", my point is that the richness of behavior and intelligence that they display is not mirrored in pangolins, who share the same niche and many convergent adaptations. To a person with no more than a passing familiarity, they'd be hard to distinguish on a functional level since their most obvious, surface-visible traits are very similar and the differences seem minor. If you get an in-depth look at them, they're quite different, and the significance of those "convergent" traits diminishes in the face of much more salient differences between the two groups of animals.

Short version: superficial similarities are very often only that, especially in the world of biology. Often they do have some inferential value, but there are limits on that.

comment by PhilGoetz · 2012-05-20T02:45:53.882Z · LW(p) · GW(p)

Evolution creates complexity, but not remotely close to maximum complexity. Imagine if each individual plant/animal had a radically different design, which would be possible if they weren't constrained by "survival of the fittest".

This is true; but I favor systems that can evolve, because they are evolutionarily stable. Systems that aren't, are likely to be unstable and vulnerable to collapse, and typically have the ethically undesirable property of punishing "virtuous behavior" within that system.

Huh? The purpose of FAI is to achieve the global maximum of whatever utility function we give it.

True. I spoke imprecisely. Life is increasing in complexity, in a meaningful way that is not the same as the negative of entropy, and which I feel comfortable calling "progress" despite Stephen Jay Gould's strident imposition of his sociological agenda onto biology. This is the thing I'm talking about maximizing. Whatever utility function an FAI is given, it's only going to involve concepts that we already have, which represent a small fraction of possible concepts; and so it's not going to keep increasing as much in that way.

comment by DanArmak · 2012-05-19T15:25:15.571Z · LW(p) · GW(p)

The best guide I have to what future life-forms will be like compared to me, if allowed to evolve naturally, is to consider what I am like compared to a fruit fly, or to bacteria.

This is true but not relevant. It suggests that future life forms will be much more complex, intelligent, powerful in changing the physical universe on many scales, good at out-competing (or predating on) other species to the point of driving them to extinction. You might also add differences between yourself and flies (and bacteria) like "future life forms will be a lot bigger and longer-lived", or you might consider those incidental because you don't value them as much.

But none of that implies anything about the future life-forms' values, except that they will be selfish to the exclusion of other species which are not useful or beautiful to them, so that old-style humans will be endangered. It doesn't imply anything that would cause me to expect to value these future species more than I value today's nonhuman species, let alone today's humans.

If you object that of course I will value myself more highly than I value a bacterium, and that I fail to adequately respect bacterial values, I can compare an algae to an oak tree. The algae is more closely-related to me; yet I still consider the oak tree a grander life form, and would rather see a world with algae and oak trees than one with only algae.

So you value other life-forms proportionally to how similar they are to you, and an important component of that is some measure of complexity, plus your sense of aesthetics (grandeur). You don't value evolutionary relatedness highly. I feel the same way (I value a cat much more than a bat (edit: or rat)), but so what? I don't see how this logically implies that new lifeforms that will exist in the future, and their new values, are more likely than not to be valued by us (if we live long enough to see them).

It's also possible that life does not naturally progress indefinitely

Life may keep changing indefinitely, barring a total extinction. But that constant change isn't "progress" by any fixed set of values, because evolution has no long-term goal.

Apart from the nonexistence of humans, who are unique in their intelligence/self-consciousness/tool-use/etc., life on Earth was apparently just as diverse and grand and beautiful hundreds of millions of years ago as it is today. There's been a lot of change, but no progress in terms of complexity before the very quick evolution of humans. If I were to choose between this world, and a world with humans but otherwise the species of 10, 100, or 300 million years ago, I don't feel that today's biosphere is somehow better. So I don't feel a hypothetical biosphere of 300 million years in the future would likely be better than today's on my existing values. And I don't understand why you do.

If you like, I can say that I value complexity

Do you really value complexity for its own sake? Or do you value it for the sake of the outcomes (such as intelligence) which it helps produce?

If you are offered prosthetic arms that look and feel just like human ones but work much better in many respects, you might accept them or not, but I doubt the ground for your objection would be that the biological version is much more complex.

build an FAI that maximizes some complexity measure.

Could you explain what kind of complexity measure you have in mind? For instance, info-theoretical complexity (~ entropy) is maximized by a black hole, and is greatly increased just by a good random number generator. Surely that's not what you mean.
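A toy sketch of what I mean, with per-symbol Shannon entropy standing in for the naive measure (my example, not something Phil proposed):

```python
# Random noise maxes out the measure, ordinary English sits in the middle,
# and a monotone string scores zero -- and none of these scores tracks
# anything like "interesting evolved structure".
import math
import random
from collections import Counter

def entropy_bits_per_symbol(data):
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
noise = bytes(random.randrange(256) for _ in range(100_000))
english = (b"the quick brown fox jumps over the lazy dog " * 2500)[:100_000]
monotone = b"A" * 100_000

print(entropy_bits_per_symbol(noise))     # ~8.0 bits/symbol, the maximum
print(entropy_bits_per_symbol(english))   # roughly 4 bits/symbol
print(entropy_bits_per_symbol(monotone))  # 0 bits/symbol
```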

Replies from: army1987, PhilGoetz
comment by A1987dM (army1987) · 2012-05-19T15:41:45.188Z · LW(p) · GW(p)

(I value a cat much more than a bat)

Bats are no longer thought to be that closely related to us. In particular, cats and bats are both Laurasiatheria, whereas we are Euarchontoglires. On the other hand, mice are Euarchontoglires too.

Apart from the nonexistence of humans, who are unique in their intelligence/self-consciousness/tool-use/etc., life on Earth was apparently just as diverse and grand and beautiful hundreds of millions of years ago as it is today.

You might want to reduce that number by an order of magnitude. See http://en.wikipedia.org/wiki/Timeline_of_evolutionary_history_of_life

Replies from: DanArmak
comment by DanArmak · 2012-05-19T15:54:38.853Z · LW(p) · GW(p)

Bats are no longer thought to be that closely related to us.

Thanks! I appreciate this updating of my trivial knowledge.

Will change to: I value a cat much more than a rat.

You might want to reduce that number by an order of magnitude.

I meant times as old as, say, 200-300 Mya. The End-Permian extinction sits rather unfortunately right in the middle of that, but I think both before it and after sufficient recovery (say 200 Mya) there was plenty of diversity of beauty around.

No cats, though.

Replies from: army1987
comment by A1987dM (army1987) · 2012-05-19T17:14:18.813Z · LW(p) · GW(p)

Will change to: I value a cat much more than a rat.

Yeah, it hadn't occurred to me to try and preserve the rhyme! :-)

Replies from: DanArmak
comment by DanArmak · 2012-05-19T17:22:17.063Z · LW(p) · GW(p)

Is there a blog or other net news source you'd recommend for learning about changes like "we're no longer closely related to bats, we're really something-something-glires"? They seem to be coming more and more frequently lately.

Replies from: army1987
comment by A1987dM (army1987) · 2012-05-19T21:43:15.944Z · LW(p) · GW(p)

I just browse aimlessly around Wikipedia when I'm bored, and a couple months ago I ended up reading about the taxonomy of pretty much any major vertebrate group. (I've also stumbled upon http://3lbmonkeybrain.blogspot.it/, but it doesn't seem to be updated terribly often these days.)

comment by PhilGoetz · 2012-05-20T02:51:01.170Z · LW(p) · GW(p)

I don't think you're getting what I'm saying. Let me state it in FAI-type terms:

I have already figured out my values precisely enough to implement my own preferred FAI: I want evolution to continue. If we put that value into an FAI, then, okay.

But the lines that people always try to think along are instead to enumerate values like "happiness", "love", "physical pleasure", and so forth.

Building an FAI to maximize values defined at that level of abstraction would be a disaster. Building an FAI to maximize values at the higher level of abstraction would be kind of pointless, since the universe is already doing that anyway, and our FAI is more likely to screw it up than to save it.

Could you explain what kind of complexity measure you have in mind? For instance, information-theoretic complexity (~ entropy) is maximized by a black hole, and is greatly increased just by a good random number generator. Surely that's not what you mean.

People have dealt with this enough that I don't think you're really objecting that what I'm saying is unclear; you're objecting that I don't have a mathematical definition of it. True. But pointing to evolution as an example suffices to show that I'm talking about something sensible and real. Evolution increases some measure of complexity, not randomness.
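One way to gloss "complexity, not randomness" (my gloss, not a definition Phil offers): candidate measures such as Gell-Mann's effective complexity or Bennett's logical depth are built to be low both for pure noise and for pure order, and high only for things with a lot of structure. A toy compression-based sketch of that intuition, with made-up sample data:

```python
import os
import zlib

samples = {
    "pure order (all zeros)": bytes(100_000),
    "random noise": os.urandom(100_000),
    "structured text": b"".join(
        b"the quick brown fox jumps over dog number %d. " % i
        for i in range(2_000)
    ),
}

for name, data in samples.items():
    compressed = len(zlib.compress(data, 9))
    print(f"{name:>25}: {len(data):>7} raw bytes -> {compressed:>7} compressed")

# All-zeros compresses to almost nothing (pure order, no information);
# noise barely compresses at all (maximal information, no structure);
# the structured sample lands in between, which is roughly the region a
# "complexity, not randomness" measure is supposed to reward.
```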

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-20T04:35:11.731Z · LW(p) · GW(p)

I have already figured out my values precisely enough to implement my own preferred FAI: I want evolution to continue. If we put that value into an FAI, then, okay.

So, I kind of infer from what you've said elsewhere that you don't endorse all possible evolutions equally. That is, when you say "evolution continues" you mean something rather more specific than that... continuing in a particular direction, leading to greater and greater amounts of whatever-it-is-that-evolution-currently-optimizes-for (this "complexity measure" cited above), rather than greater and greater amounts of anything else.

And I kind of infer that the reason you prefer that is because it has historically done better at producing results you endorse than any human-engineered process has or could reasonably be expected to have, and you see no reason to expect that state to change; therefore you expect that for the foreseeable future the process of evolution will continue to produce results that you endorse, or at least that you would endorse, or at the very least that you ought to endorse.

Did I get that right?

Are you actually saying that simpler systems don't ever evolve from more complex ones? Or merely that when that happens, the evolutionary process that led to it isn't the kind of evolutionary process you're endorsing here? Or something else?

Replies from: PhilGoetz
comment by PhilGoetz · 2012-05-22T03:35:27.317Z · LW(p) · GW(p)

So, I kind of infer from what you've said elsewhere that you don't endorse all possible evolutions equally. That is, when you say "evolution continues" you mean something rather more specific than that... continuing in a particular direction, leading to greater and greater amounts of whatever-it-is-that-evolution-currently-optimizes-for (this "complexity measure" cited above), rather than greater and greater amounts of anything else.

I don't understand your distinction between "all possible evolutions" and "whatever-it-is-that-evolution-currently-optimizes-for". There are possible courses of evolution that I don't think I would like, such as universes in which intelligence is eliminated. When thinking about how to optimize the future, I think of probability distributions.

And I kind of infer that the reason you prefer that is because it has historically done better at producing results you endorse than any human-engineered process has or could reasonably be expected to have, and you see no reason to expect that state to change; therefore you expect that for the foreseeable future the process of evolution will continue to produce results that you endorse, or at least that you would endorse, or at the very least that you ought to endorse.

Yes! Though I would say, "it has historically done better at producing results I endorse, starting from point X, than any process engineered by organisms existing at point X could reasonably be expected to have."

Are you actually saying that simpler systems don't ever evolve from more complex ones?

No. It happens all the time. The simplest systems, viruses and mycoplasmas, can exist only when embedded in more complex systems - although maybe they don't count as systems for that reason. OTOH, there must have been life forms even simpler at one time, and we see no evidence of them now. For some reason the lower bound on possible life complexity has increased over time - possibly just once, a long time ago.

Or merely that when that happens, the evolutionary process that led to it isn't the kind of evolutionary process you're endorsing here? Or something else?

Two "something else" options are (A) merely widening the distribution, without increasing average complexity, would be more interesting to me, and (B) simple organisms appear to be necessary parts of a complex ecosystem, perhaps like simple components are necessary parts of a complex machine.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-05-22T03:48:34.600Z · LW(p) · GW(p)

I think I see... so it's not the complexity of individual organisms that you value, necessarily, but rather the overall complexity of the biosphere? That is, if system A grows simpler over time and system B grows more complex, it's not that you value the process that leads to B but not the process that leads to A, but rather that you value the process that leads to (A and B). Yes?

Edit: er, I got my As and Bs reversed. Fixed.

comment by Wei Dai (Wei_Dai) · 2012-05-18T05:57:01.776Z · LW(p) · GW(p)

It sounds like you're worried about humans optimizing the universe according to human values because they are the wrong values. At the same time you seem to be saying that this won't be accomplished by building FAI, because only humans can have human values. Is this correct?

Does it also worry you that humans might (mistakenly) optimize the universe with non-human values that also happen to be wrong? If so, do you have any suggestions about how we might get the universe to be optimized according to the right values?

[Deleting because I didn't notice Phil already answered in another comment.]

comment by [deleted] · 2012-05-18T12:03:13.535Z · LW(p) · GW(p)

Wouldn't a value of freedom be the best bet for an AI? If it created a communist society with itself at the head, where people could pool their land and resources to create sub-societies with their own rules, everyone would get to live by their own values. Of course, there would be some universal laws, but they would be minimal: don't harm others (the AI included), and don't harm their property. If someone disobeyed the rules of a sub-society, they would not be harmed, merely suspended or expelled from the sub-society.

In this way, humans would still be allowed to do whatever they want, and seek out what they think is best for them. The more violent societies might not survive very well in such a world, but they would be a problem in any society. This seems to be the safest AI-ruled society I can think of, because it keeps humans safe from AIs, while also not being that big of a change in one big bang, preventing a lot of the suicide deaths I would expect from other solutions.