Posts

[Link] Quantum theory as the most robust description of reproducible experiments 2014-05-08T11:18:52.888Z
Logical Uncertainty as Probability 2012-04-29T22:26:35.078Z
Against the Bottom Line 2012-04-21T10:20:04.861Z
Difference between CDT and ADT/UDT as constant programs 2012-03-19T19:41:30.982Z
Anthropic Reasoning by CDT in Newcomb's Problem 2012-03-14T00:44:47.242Z

Comments

Comment by gRR on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapters 105-107 · 2015-02-18T06:08:09.848Z · LW · GW

I am confused about how the Philosopher's Stone could help with reviving Hermione. Does QQ mean to permanently transfigure her dead body into a living Hermione? But then, would it not mean that Harry could do it now, albeit temporarily? And, he wouldn't even need a body. He could then just temporarily transfigure any object into a living Hermione. Also, now that I think of it, he could transfigure himself a Feynman and a couple of Einsteins...

Comment by gRR on Let's reimplement EURISKO! · 2014-05-10T10:53:46.641Z · LW · GW

The AI can be adapted for other, less restricted, domains

That the ideas from a safe AI can be used to build an unsafe AI is a general argument against working on (or even talking about) any kind of AI whatsoever.

The AI adds code that will evolve into another AI into its output

The output is to contain only proofs of theorems. Specifically, a proof (or refutation) of the theorem in the input. The state of the system is to be reset after each run so as to not accumulate information.

The AI could self-modify incorrectly and result in unfriendly AI

Any correct or incorrect self-modification is still restricted to the math domain, and so cannot result in an unsafe AI.

bug in the environment itself

Guarding against software bugs is easy in this case. You design an abstract virtual machine environment for the AI, then design the software that implements this environment, then formally prove that the software is correct. Guarding against errors caused by cosmic rays is also easy. You estimate the probability of such an error, and then add redundancy until the probability is so low that it won't happen until the end of the universe.
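
To make the redundancy point concrete, here is a back-of-the-envelope sketch (my own illustration, not part of the original comment; the per-run corruption probability, the number of runs, and the majority-voting scheme are all assumptions):

```python
# How many replicas with majority voting are needed so that the expected number
# of silent corruption failures over a given horizon stays below one?
from math import comb

def majority_failure_prob(p: float, k: int) -> float:
    """Probability that a strict majority of k independent replicas is corrupted."""
    need = k // 2 + 1
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(need, k + 1))

def replicas_needed(p: float, runs: float, target_failures: float = 1.0) -> int:
    """Smallest odd k whose expected failure count over `runs` runs is below target."""
    k = 1
    while majority_failure_prob(p, k) * runs >= target_failures:
        k += 2
    return k

if __name__ == "__main__":
    p = 1e-9      # assumed probability that a cosmic ray silently corrupts one replica per run
    runs = 1e30   # assumed number of runs "until the end of the universe"
    print(replicas_needed(p, runs))  # -> 7 replicas suffice under these assumptions
```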

Look up how difficult it is to sandbox untrusted code

Sandboxing untrusted code is easy. The difficult thing is sandboxing it while making it think that it runs normally. This is irrelevant here.

I don't believe that a system can work only in formal proofs

It is quite probable that a "pure math Oracle" system cannot work. The point was, it can be made safe to try.

Comment by gRR on [Link] Quantum theory as the most robust description of reproducible experiments · 2014-05-08T21:16:30.796Z · LW · GW

Well, I liked the paper, but I'm not knowledgeable enough to judge its true merits. It deals heavily with Bayesian-related questions, somewhat in Jaynes's style, so I thought it could be relevant to this forum.

At least one of the authors is a well-known theoretical physicist with an awe-inspiring Hirsch index, so presumably the paper would not be trivially worthless. I think it merits a more careful read.

Comment by gRR on Harry Potter and the Methods of Rationality discussion thread, part 25, chapter 96 · 2013-07-26T19:58:23.144Z · LW · GW

Regarding the "he's here... he is the end of the world" prophecy, in view of the recent events, it seems like it can become literally true without it being a bad thing. After all, it does not specify a time frame. So Harry may become immortal and then tear apart the very stars in heaven, some time during a long career.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-26T20:25:46.107Z · LW · GW

You're treating resources as one single kind, where really there are many kinds with possible trades between teams

I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.

But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying.

We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a single combined team, etc... This may be an interesting problem to solve, analytically or by computer modeling.

You still haven't given good evidence for holding this position regarding the relation between the different Uxxx utilities.

You're right. Initially, I thought that the actual values of Uxxx-s will not be important for the decision, as long as their relative preference order is as stated. But this turned out to be incorrect. There are regions of cooperation and defection.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-26T16:53:13.230Z · LW · GW

I don't think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold, after which, exchanging more resources won't get you any more time. There's only 30 hours in a day, after all :)

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-26T16:04:05.069Z · LW · GW

Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?

These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEV-s to not be greedy just for the sake of greed. It's people's CEV-s we're talking about, not paperclip maximizers'.

Well it does need hardcoding: you need to tell the CEV to exclude people whose EVs are too similar to their current values despite learning contrary facts. Or even all those whose belief-updating process differs too much from perfect Bayesian (and how much is too much?) This is something you'd hardcode in, because you could also write ("hardcode") a CEV that does include them, allowing them to keep the EVs close to their current values.

Hmm, we are starting to argue about exact details of extrapolation process...

There are many possible and plausible outcomes besides "everybody loses".

Let's formalize the problem. Let F(R, Ropp) be the probability of a team successfully building a FAI first, given R resources, and having opposition with Ropp resources. Let Uself, Ueverybody, and Uother be our rewards in case the first FAI built is FAI<our own CEV>, FAI<everybody's CEV>, and FAI<the other team's CEV>, respectively. Naturally, F is monotonically increasing in R and decreasing in Ropp, and Uother < Ueverybody < Uself.

Assume there are just two teams, with resources R1 and R2, and each can perform one of two actions: "cooperate" or "defect". Let's compute the expected utilities for the first team:

We cooperate, opponent team cooperates:  
   EU("CC") = Ueverybody * F(R1+R2, 0)  
We cooperate, opponent team defects:  
   EU("CD") = Ueverybody * F(R1, R2) + Uother * F(R2, R1)  
We defect, opponent team cooperates:  
   EU("DC") = Uself * F(R1, R2) + Ueverybody * F(R2, R1)  
We defect, opponent team defects:  
   EU("DD") = Uself * F(R1, R2) + Uother * F(R2, R1)

Then, EU("CD") < EU("DD") < EU("DC"), which gives us most of the structure of a PD problem. The rest, however, depends on the finer details. Let A = F(R1,R2)/F(R1+R2,0) and B = F(R2,R1)/F(R1+R2,0). Then:

  1. If Ueverybody <= Uself*A + Uother*B, then EU("CC") <= EU("DD"), and there is no point in cooperating. This is your position: Ueverybody is much less than Uself, or Uother is not much less than Ueverybody, and/or your team has far more resources than the other.

  2. If Uself*A + Uother*B < Ueverybody < Uself*A/(1-B), this is a true Prisoner's dilemma.

  3. If Ueverybody >= Uself*A/(1-B), then EU("CC") >= EU("DC"), and "cooperate" is the obviously correct decision. This is my position: Ueverybody is not much less than Uself, and/or the teams are more evenly matched.
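
Here is a small numeric sketch of this case analysis (my own illustration; the functional form of F and all the payoff numbers are made-up assumptions, chosen only to show that each of the three regimes is reachable):

```python
def F(r: float, r_opp: float) -> float:
    """Assumed probability of building a FAI first with resources r against opposition r_opp."""
    return r**2 / (r**2 + r_opp**2 + 100.0)

def classify(U_self, U_everybody, U_other, R1, R2) -> str:
    """Compute the expected utilities from the comment above and name the resulting regime."""
    EU_CC = U_everybody * F(R1 + R2, 0)
    EU_DC = U_self * F(R1, R2) + U_everybody * F(R2, R1)
    EU_DD = U_self * F(R1, R2) + U_other * F(R2, R1)
    # EU("CD") is always the worst of the four and does not affect the classification.
    if EU_CC <= EU_DD:
        return "case 1: no point in cooperating"
    if EU_CC < EU_DC:
        return "case 2: true Prisoner's Dilemma"
    return "case 3: 'cooperate' is the obviously correct decision"

print(classify(U_self=10, U_everybody=9, U_other=1, R1=5, R2=5))   # evenly matched, Ueverybody close to Uself -> case 3
print(classify(U_self=10, U_everybody=4, U_other=1, R1=5, R2=5))   # intermediate Ueverybody -> case 2
print(classify(U_self=10, U_everybody=2, U_other=1, R1=50, R2=1))  # one team far ahead, small Ueverybody -> case 1
```

The same sketch could be iterated pairwise to model N teams, as suggested earlier in the thread.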

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-26T03:15:22.311Z · LW · GW

A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources

I have trouble thinking of a resource that would make even one person's CEV, let alone 80%, want to kill people, in order to just have more of it.

The question of definition, who is to be included in the CEV? or - who is considered sane?

This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus, they are automatically excluded.

TDT applies where agents are "similar enough". I doubt I am similar enough to e.g. the people you labelled insane.

We are talking about people building FAI-s. Surely they are intelligent enough to notice the symmetry between themselves. If you say that logic and rationality make you decide to 'defect' (=try to build FAI on your own, bomb everyone else), then logic and rationality would make everyone decide to defect. So everybody bombs everybody else, no FAI gets built, everybody loses. Instead you can 'cooperate' (=precommit to build FAI<everybody's CEV> and to bomb everyone that did not make the same precommitment). This gets us a single global alliance.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-24T21:51:55.346Z · LW · GW

The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest.

The resources are not scarce, yet the CEV-s want to kill? Why?

I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them.

It would do so only if everybody's CEV-s agree that updating these people's beliefs is a good thing.

If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?

People that would still have false factual beliefs no matter how much evidence and how much intelligence they have? They would be incurably insane. Yes, I would agree to ignore their volition, no matter how many they are.

The PD reasoning to cooperate only applies in case of iterated PD

Err. What about arguments of Douglas Hofstadter and EY, and decision theories like TDT?

Unlike PD, the payoffs are different between players, and players are not sure of each other's payoffs in each scenario

This doesn't really matter for a broad range of possible payoff matrices.

join research alliance 1, learn its research secrets, then defect and sell the secrets to alliance 2

Cooperating in this game would mean there is exactly one global research alliance. A cooperating move is a precommitment to abide by its rules. Enforcing such precommitment is a separate problem. Let's assume it's solved.

I'm not convinced by this that it's an easier problem to solve than that of building AGI or FAI or CEV.

Maybe you're right. But IMHO it's a less interesting problem :)

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-24T19:35:19.265Z · LW · GW

So you're OK with the FAI not interfering if they want to kill them for the "right" reasons?

I wouldn't like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent - then I prefer FAI not interfering.

"if we kill them, we will benefit by dividing their resources among ourselves"

If the resources are so scarce that dividing them is so important that even CEV-s agree on the necessity of killing, then again, I prefer humans to decide who gets them.

So you're saying your version of CEV will forcibly update everyone's beliefs

No. CEV does not update anyone's beliefs. It is calculated by extrapolating values in the presence of full knowledge and sufficient intelligence.

If the original person effectively assigns 0 or 1 "non-updateable probability" to some belief, or honestly doesn't believe in objective reality, or believes in "subjective truth" of some kind, CEV is not necessarily going to "cure" them of it - especially not by force.

As I said elsewhere, if a person's beliefs are THAT incompatible with truth, I'm ok with ignoring their volition. Note, that their CEV is undefined in this case. But I don't believe there exist such people (excluding totally insane).

That there exists a possible compromise that is better than total defeat doesn't mean total victory wouldn't be much better than any compromise.

But the total loss would be correspondingly worse. PD reasoning says you should cooperate (assuming cooperation is precommittable).

If you think so you must have evidence relating to how to actually solve this problem. Otherwise they'd both look equally mysterious. So, what's your idea?

Off the top of my head, adoption of total transparency for everybody of all governmental and military matters.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-24T16:25:25.681Z · LW · GW

If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere

The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEV-s won't endorse it, and the FAI will interfere. "Fundamentally different" means their killing each other is endorsed by someone's CEV, not just by themselves.

But you said it would only do things that are approved by a strong human consensus.

Strong consensus of their CEV-s.

Maybe their extrapolated volition simply doesn't value objective truth highly (because they today don't believe in the concept of objective truth, or believe that it contradicts everyday experience)

Extrapolated volition is based on objective truth, by definition.

If I don't like the values I might say, thank-you for warning me, now I shall be doubly careful not to evolve into that kind of creature!

The process of extrapolation takes this into account.

I think you may be missing a symbol there? If not, I can't parse it...

Sorry, bad formatting. I meant four independent clauses: each of the agents does not endorse CEV<the other agent>, but endorses CEV<both agents>.

How can a state or military precommit to not having a supersecret project to develop a private AGI?

That's a separate problem. I think it is easier to solve than extrapolating volition or building AI.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-24T14:00:21.390Z · LW · GW

Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?

If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?

The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs

Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantle the new one (the original is still the first and the strongest).

you can't even assume they'll have a nontrivial CEV at all, let alone that it will "fix" values you happen to disagree with.

But I can be sure that CEV fixes values that are based on false factual beliefs - this is a part of the definition of CEV.

I have no idea what your FAI will do

But you can be sure that it is something about which you (and everybody) would agree, either directly or if you were more intelligent and knew more.

there are no objectively distinguished morals

But there may be a partial ordering between morals, such that X<Y iff all "interfering" actions (whatever this means) that are allowed by X are also allowed by Y. Then if A1 and A2 are two agents, we may easily have:

~Endorses(A1, CEV<A2>)
~Endorses(A2, CEV<A1>)
Endorses(A1, CEV<A1, A2>)
Endorses(A2, CEV<A1, A2>)

[assuming Endorses(A, X) implies FAI does not perform any non-interfering action disagreeable for A]
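
As a toy formalization of this ordering (my own sketch, not from the comment), a moral can be modeled as the set of interfering actions it allows, with endorsement easier to obtain the lower a moral sits in the order:

```python
# A moral is modeled as the frozenset of "interfering" actions it permits;
# X <= Y (set inclusion) is the proposed partial order.
cev_a1 = frozenset({"redistribute_resources"})   # hypothetical CEV<A1>
cev_a2 = frozenset({"enforce_tradition"})        # hypothetical CEV<A2>
cev_both = frozenset()                           # hypothetical CEV<A1, A2>: interferes least

def leq(X: frozenset, Y: frozenset) -> bool:
    """X <= Y: every interfering action allowed by X is also allowed by Y."""
    return X <= Y

print(leq(cev_both, cev_a1), leq(cev_both, cev_a2))  # True True: the combined CEV sits below both, so both agents can endorse it
print(leq(cev_a1, cev_a2), leq(cev_a2, cev_a1))      # False False: the individual CEVs are incomparable
```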

if and when nation-states and militaries realize AGI is a real-world threat, they will go to war with each trying to prevent anyone else from building an AGI first. It's the ultimate winner-take-all arms race.
This is going to happen, it might be happening already if enough politicians and generals had the beliefs of Eliezer about AGI, and it will happen (or not) regardless of anyone's attempts to build any kind of Friendliness theory.

Well, don't you think this is just ridiculous? Does it look like the most rational behavior? Wouldn't it be better for everybody to cooperate in this Prisoner's Dilemma, and do it with a credible precommitment?

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-24T10:37:21.874Z · LW · GW

A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better.

No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.

There are people who believe religiously that End Times must come

Yes, but we assume they are factually wrong, and so their CEV would fix this.

A FAI team that precommitted to implementing CEV would definitely get the most funds. Even a team that precommitted to CEV might get more funds than CEV, because people like myself would reason that the team's values are closer to my own than humanity's average, plus they have a better chance of actually agreeing on more things.

Not bloody likely. I'm going to oppose your team, discourage your funders, and bomb your headquarters - because we have different moral opinions, right here, and if the differences turn out to be fundamental, and you build your FAI, then parts of my value will be forever unfulfilled.

You, on the other hand, may safely support my team, because you can be sure to like whatever my FAI will do, and regarding the rest, it won't interfere.

Comment by gRR on Problematic Problems for TDT · 2012-05-23T19:01:14.263Z · LW · GW

The problems look like a kind of anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-22T23:52:19.437Z · LW · GW

I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV<everybody> instead of CEV<some narrower group> would probably get more resources and would finish first.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-22T21:28:35.687Z · LW · GW

Well, my own proposed plan is also a contingent modification. The strongest possible claim of CEV can be said to be:

There is a unique X, such that for all living people P, CEV<P> = X.

Assuming there is no such X, there could still be a plausible claim:

Y is not empty, where Y = Intersection{over all living people P} of CEV<P>.

And then AI would do well if it optimizes for Y while interfering the least with other things (whatever this means). This way, whatever "evolving" will happen due to AI's influence is at least agreed upon by everyone('s CEV).

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-22T20:54:24.219Z · LW · GW

Back here you said "Well, perhaps yes." I understand that to mean you agree with my point that it's wrong / bad for the AI to promote extrapolated values while the actual values are different and conflicting

I meant that "it's wrong/bad for the AI to promote extrapolated values while the actual values are different and conflicting" will probably be a part of the extrapolated values, and the AI would act accordingly, if it can.

My position is that the AI must be guided by the humans' actual present values in choosing to steer human (social) evolution towards or away from possible future values. This has lots of downsides, but what better option is there?

The problem with the actual present values (besides the fact that we cannot define them yet, any more than we can define their CEV) is that they are certain to not be universal. We can be pretty sure that someone can be found to disagree with any particular proposition. Whereas, for CEV, we can at least hope that a unique reflectively-consistent set of values exists. If it does and we succeed in defining it, then we're home and dry. Meanwhile, we can think of contingency plans about what to do if it does not or we don't, but the uncertainty about whether the goal is achievable does not mean that the goal itself is wrong.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-22T16:50:06.564Z · LW · GW

Humans don't know which of their values are terminal and which are instrumental, and whether this question even makes sense in general. Their values were created by two separate evolutionary processes. In the boxes example, humans may not know about the diamond. Maybe they value blue boxes because their ancestors could always bring a blue box to a jeweler and exchange it for food, or something.

This is precisely the point of extrapolation - to untangle the values from each other and build a coherent system, if possible.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-22T16:20:45.359Z · LW · GW

No, the "actual" values would tell it to give the humans the blue boxes they want, already.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-22T15:56:49.568Z · LW · GW

the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person

If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values].

Comment by gRR on How likely the AI that knows it's evil? Or: is a human-level understanding of human wants enough? · 2012-05-21T12:29:00.349Z · LW · GW

If it extrapolates coherently, then it's a single concept, otherwise it's a mixture :)

This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word "sound" appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of "sound", and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together; then it is a global optimization problem.

Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
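
A rough sketch of the purely syntactic version described above (my own illustration; the toy corpus, the bag-of-words context representation, and the cluster count are all assumptions, and a real run would need a large corpus):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

corpus = [
    "the sound of the falling tree echoed through the forest",
    "turn the sound down, the music is far too loud",
    "her reasoning was sound and the proof was correct",
    "a sound argument rests on true premises",
]

# Represent each mention of "sound" by its context words (here: the rest of the sentence).
contexts = [s.replace("sound", "") for s in corpus]
X = CountVectorizer().fit_transform(contexts)

# Cluster the contexts; each cluster is taken to stand for one meaning of the word.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for sentence, label in zip(corpus, labels):
    print(label, "-", sentence)
```

With a real corpus and a better context representation, the clusters would be expected to separate the acoustic sense from the "valid reasoning" sense.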

Comment by gRR on How likely the AI that knows it's evil? Or: is a human-level understanding of human wants enough? · 2012-05-21T10:27:53.989Z · LW · GW

Does it rely on true meanings of words, particularly? Why not on concepts? Individually, "vibrations of air" and "auditory experiences" can be coherent.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-20T19:43:56.360Z · LW · GW

I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes and one may wonder why it is not thrown away already. But they may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-20T17:56:51.412Z · LW · GW

Why is it important that it be uncontroversial?

I'm not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.

Ok, you're right in that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where "sufficiently" is not too high a threshold?

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-20T17:25:19.667Z · LW · GW

What I'm trying to do is find some way to fix the goalposts. Find a set of conditions on CEV that would suffice. Whether such a CEV actually exists and how to build it are questions for later. Let's just pile up constraints until a sufficient set is reached. So, let's assume that:

  • "Unanimous" CEV exists
  • And is unique
  • And is definable via some easy, obviously correct, and unique process, to be discovered in the future,
  • And it basically does what I want it to do (fulfil universal wishes of people, minimize interference otherwise),

would you say that running it is uncontroversial? If not, what other conditions are required?

Comment by gRR on Oh, mainstream philosophy. · 2012-05-20T10:45:43.834Z · LW · GW

I value the universe with my friend in it more than one without her.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-20T10:44:33.126Z · LW · GW

Ok, but do you grant that running a FAI with "unanimous CEV" is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing - if I'm wrong about my hypothesis?

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-20T01:50:24.635Z · LW · GW

People are happy, by definition, if their actual values are fulfilled

Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?

Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of its own fulfillment: allow the person to take the blue box, then convince them that it is the red box they actually want, and only then present it. But in cases where this is impossible (example: the blue box contains a horrible violent death), it is wrong to say that following the extrapolated values (withholding the blue box) is making the person suffer. Following their extrapolated values is the only way to allow them to have a happy life.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T22:40:15.792Z · LW · GW

VHEMT supports human extinction primarily because, in the group's view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.

Obviously, human extinction is not their terminal value.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T22:33:10.204Z · LW · GW

I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.

Comment by gRR on Oh, mainstream philosophy. · 2012-05-19T22:27:25.533Z · LW · GW

But he assumes that it is worse for me because it is bad for my friend to have died. Whereas, in fact, it is worse for me directly.

Comment by gRR on Oh, mainstream philosophy. · 2012-05-19T21:42:27.439Z · LW · GW

People sometimes respond that death isn't bad for the person who is dead. Death is bad for the survivors. But I don't think that can be central to what's bad about death. Compare two stories.
Story 1. Your friend is about to go on the spaceship that is leaving for 100 Earth years to explore a distant solar system. By the time the spaceship comes back, you will be long dead. Worse still, 20 minutes after the ship takes off, all radio contact between the Earth and the ship will be lost until its return. You're losing all contact with your closest friend.
Story 2. The spaceship takes off, and then 25 minutes into the flight, it explodes and everybody on board is killed instantly.
Story 2 is worse. But why? It can't be the separation, because we had that in Story 1. What's worse is that your friend has died. Admittedly, that is worse for you, too, since you care about your friend. But that upsets you because it is bad for her to have died.

Actually, I think the universe is better for me with my friend being alive in it, even if I won't ever see her. My utility function is defined over the world states, not over my sensory inputs.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T21:23:02.481Z · LW · GW

For extrapolation to be conceptually plausible, I imagine "knowledge" and "intelligence level" to be independent variables of a mind, knobs to turn. To be sure, this picture looks ridiculous. But assuming, for the sake of argument, that this picture is realizable, extrapolation appears to be definable.

Yes, many religious people wouldn't want their beliefs erased, but only because they believe them to be true. They wouldn't oppose increasing their knowledge if they knew it was true knowledge. Cases of belief in belief would be dissolved if it was known that true beliefs were better in all respects, including individual happiness.

Coherence, one way or another, is unlikely to exist. Humans want a bunch of different things...

Yes, I agree with this. But, I believe there exist wishes universal for (extrapolated) humans, among which I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T19:36:06.682Z · LW · GW

Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV

Then we can label paperclipping as a "true" value too. However, I still prefer true human values to be maximized, not true clippy values.

Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing their values on faith not evidence. For all I know they'd never approach your limit in the lifetime of the universe, even if it is the limit given infinite time. And meanwhile they'd be very unhappy.

As I said before, if someone's mind is that incompatible with truth, I'm ok with ignoring their preferences in the actual world. They can be made happy in a simulation, or wireheaded, or whatever the combined other people's CEV thinks best.

So you're saying it wouldn't modify the world to fit their new evolved values until they actually evolved those values?

No, I'm saying, the extrapolated values would probably estimate the optimal speed for their own optimization. You're right, though, it is all speculations, and the burden of proof is on me. Or on whoever will actually define CEV.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T19:01:35.564Z · LW · GW

What makes you give them such a label as "true"?

They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.

In your CEV future, the extrapolated values are maximized. Conflicting values, like the actual values held today by many or all people, are necessarily not maximized.

But people would change - gaining knowledge and intelligence - and thus would become happier and happier with time. And I think CEV would try to synchronize this with the timing of its optimization process.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T18:26:39.801Z · LW · GW

why extrapolate values at all

Extrapolated values are the true values. Whereas the current values are approximations, sometimes very bad and corrupted approximations.

they will suffer in the CEV future

This does not follow.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T18:11:09.841Z · LW · GW

Errr. This is a question of simple fact, which is either true or false. I believe it's true, and build the plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in case the belief is true.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:56:37.497Z · LW · GW

Dunno... propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don't expect this to happen.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:51:26.014Z · LW · GW

No, because "does CEV fulfill....?" is not a well-defined or fully specified question. But I think, if you asked "whether it is possible to build FAI+CEV in such a way that it fulfills the wish(es) of literally everyone while affecting everything else the least", they would say they do not know.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:43:53.466Z · LW · GW

I'd think someone's playing a practical joke on me.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:41:32.002Z · LW · GW

Aumann update works only if I believe you're a perfect Bayesian rationalist. So, no thanks.

Too bad. Let's just agree to disagree then, until the brain scanning technology is sufficiently advanced.

I've pointed out people who don't wish for the examples you gave

So far, I didn't see a convincing example of a person who truly wished for everyone to die, even in extrapolation.

Otherwise the false current beliefs will keep on being very relevant to them

To them, yes, but not to their CEV.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:34:59.490Z · LW · GW

You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values

Well... ok, let's assume a happy life is their single terminal value. Then by definition of their extrapolated values, you couldn't build a happier life for them if you did anything other than follow their extrapolated values!

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:28:43.480Z · LW · GW

In all of their behavior throughout their lives, and in their own words today, they honestly have this value

This is the conditional that I believe is false when I say "they are probably lying, trolling, joking". I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.

Comment by gRR on How can we ensure that a Friendly AI team will be sane enough? · 2012-05-19T17:19:56.155Z · LW · GW

Well, assuming EY's view of intelligence, the "cautionary position" is likely to be a mathematical statement. And then why not prove it? Given several decades? That's a lot of time.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:13:45.503Z · LW · GW

Even if they do, it will be the best possible thing for them, according to their own (extrapolated) values.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:10:58.591Z · LW · GW

we anticipate there will be no extrapolated wishes that literally everyone agrees on

Well, now you know there exist people who believe that there are some universally acceptable wishes. Let's do the Aumann update :)

Lots of people religiously believe...

False beliefs => irrelevant after extrapolation.

Some others believe that life in this world is suffering, negative utility, and ought to be stopped for its own sake (stopping the cycle of rebirth)

False beliefs (rebirth, existence of nirvana state) => irrelevant after extrapolation.

Comment by gRR on How can we ensure that a Friendly AI team will be sane enough? · 2012-05-19T17:04:05.934Z · LW · GW

My conditional was "cautionary position is the correct one". I meant, provably correct.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T17:01:39.288Z · LW · GW

How can we even start defining CEV without brain scanning technology able to do much more than answering the original question?

Comment by gRR on How can we ensure that a Friendly AI team will be sane enough? · 2012-05-19T16:42:54.387Z · LW · GW

What is the chance some of them will try to seize first-mover advantage, and refuse to wait for another 30 years, and ignore Friendliness? I estimate high. The payoff is the biggest in human history: first-mover will potentially control a singleton that will rewrite to order the very laws of physics in its future light-cone, and prevent any other AGI from ever being built! This is beyond even "rule the world forever and reshape it in your image" territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?

Remember, we're describing the situation where the cautionary position is provably correct. So your "greatest temptation ever" is (provably) a temptation to die a horrible death together with everyone else. Anyone smart enough to even start building AI would know and understand this.

Comment by gRR on Holden's Objection 1: Friendliness is dangerous · 2012-05-19T16:36:56.563Z · LW · GW

I only proposed a hypothesis, which will become testable earlier than the time when CEV could be implemented.