Thought experiment: The transhuman pedophile

post by PhilGoetz · 2013-09-17T22:38:06.160Z · LW · GW · Legacy · 74 comments

There's a recent science fiction story that I can't recall the name of, in which the narrator is traveling somewhere via plane, and the security check includes a brain scan for deviance. The narrator is a pedophile. Everyone who sees the results of the scan is horrified--not that he's a pedophile, but that his particular brain abnormality is easily fixed, so that means he's chosen to remain a pedophile. He's closely monitored, so he'll never be able to act on those desires, but he keeps them anyway, because that's part of who he is.

What would you do in his place?

In the language of good old-fashioned AI, his pedophilia is a goal or a terminal value. "Fixing" him means changing or erasing that value. People here sometimes say that a rational agent should never change its terminal values. (If one goal is unobtainable, the agent will simply not pursue that goal.) Why, then, can we imagine the man being tempted to do so? Would it be a failure of rationality?

If the answer is that one terminal value can rationally set a goal to change another terminal value, then either

  1. any terminal value of a rational agent can change, or
  2. we need another word for the really terminal values that can't be changed rationally, and a way of identifying them, and a proof that they exist.


comment by TheOtherDave · 2013-09-18T03:29:06.812Z · LW(p) · GW(p)

So, a terminological caveat first: I've argued elsewhere that in practice all values are instrumental, and exist in a mutually reinforcing network, and we simply label as "terminal values" those values we don't want to (or don't have sufficient awareness to) decompose further. So, in effect I agree with #2, except that I'm happy to go on calling them "terminal values" and say they don't exist, and refer to the real things as "values" (which depend to varying degrees on other values).

But, that being said, I'll keep using the phrase "terminal values" in its more conventional sense, which I mean approximately rather than categorically (that is, a "terminal value" to my mind is simply a value whose dependence on other values is relatively tenuous; an "instrumental value" is one whose dependence on other values is relatively strong, and the line between them is fuzzy and ultimately arbitrary but not meaningless).

All that aside... I don't really see what's interesting about this example.

So, OK, X is a pedophile. Which is to say, X terminally values having sex with children. And the question is, is it rational for X to choose to be "fixed", and if so, what does that imply about terminal values of rational agents?

Well, we have asserted that X is in a situation where X does not get to have sex with children. So whether X is "fixed" or not, X's terminal values are not being satisfied, and won't be satisfied. To say that differently, the expected value of both courses of action (fixed or not-fixed), expressed in units of expected moments-of-sex-with-children, is effectively equal (more specifically, they are both approximately zero).(1)

So the rational thing to do is choose a course of action based on other values.

What other values? Well, the example doesn't really say. We don't know much about this guy. But... for example, you've also posited that he doesn't get fixed because pedophilia is "part of who he is". I could take that to mean he not only values (V1) having sex with children, he also values (V2) being a pedophile. And he values this "terminally", in the sense that he doesn't just want to remain a pedophile in order to have more sex with children, he wants to remain a pedophile even if he doesn't get to have sex with children.

If I understand it that way, then yes, he's being perfectly rational to refuse being fixed. (Supposing that V2 > SUM (V3...Vn), of course.)

Alternatively, I could take that as just a way of talking, and assert that really, he just has V1 and not V2.

The difficulty here is that we don't have any reliable way, with the data you've provided, of determining whether X is rationally pursuing a valued goal (in which case we can infer his values from his actions) or whether X is behaving irrationally.

(1) Of course, this assumes that the procedure to fix him has a negligible chance of failure, that his chances of escaping monitoring and finding a child to have sex with are negligible, etc. We could construct a more complicated example that doesn't assume these things, but I think it amounts to the same thing.
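
A minimal sketch of the comparison above, in Python (all weights and probabilities are invented for illustration; the point is only that when the V1 term is ~zero under both options, the choice is carried entirely by the other values):

```python
# Toy expected-utility comparison for X's choice, assuming (as in the footnote)
# that the chance of ever acting on V1 is negligible either way.
p_act_on_v1 = {"stay": 1e-9, "get_fixed": 0.0}

weights = {
    "V1_sex_with_children": 100.0,            # never realized in practice
    "V2_being_a_pedophile": 5.0,              # "part of who he is"
    "V3_other_values_served_by_fixing": 3.0,  # e.g. social acceptance
}

def expected_utility(option: str) -> float:
    u = p_act_on_v1[option] * weights["V1_sex_with_children"]
    if option == "stay":
        u += weights["V2_being_a_pedophile"]
    else:
        u += weights["V3_other_values_served_by_fixing"]
    return u

for option in ("stay", "get_fixed"):
    print(option, expected_utility(option))
# The V1 term is ~0 for both options, so the decision reduces to V2 vs. V3...Vn.
```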

Replies from: Luke_A_Somers, None, CoffeeStain
comment by Luke_A_Somers · 2013-09-18T13:37:21.291Z · LW(p) · GW(p)

So, OK, X is a pedophile. Which is to say, X terminally values having sex with children

No, he terminally values being attracted to children. He could still assign a strongly negative value to actually having sex with children. Good fantasy, bad reality.

Just like I strongly want to maintain my ability to find women other than my wife attractive, even though I assign a strong negative value to following up on those attractions. (one can construct intermediate cases that avoid arguments that not being locked in is instrumentally useful)

Replies from: TheOtherDave, MugaSofer
comment by TheOtherDave · 2013-09-18T14:38:59.592Z · LW(p) · GW(p)

(shrug) If X values being attracted to children while not having sex with them, then I really don't see the issue. Great, if that's what he wants, he can do that... why would he change anything? Why would anyone expect him to change anything?

Replies from: Luke_A_Somers, Lumifer, pragmatist
comment by Luke_A_Somers · 2013-09-18T21:23:19.601Z · LW(p) · GW(p)

It would be awesome if one could count on people actually having that reaction given that degree of information. I don't trust them to be that careful with their judgements under normal circumstances.

Also, what Lumifer said.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-18T22:27:41.246Z · LW(p) · GW(p)

Sure, me neither. But as I said elsewhere, if we are positing normal circumstances, then the OP utterly confuses me, because about 90% of it seems designed to establish that the circumstances are not normal.

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2013-09-19T02:17:12.511Z · LW(p) · GW(p)

Even transhumanly future normal.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-19T02:29:43.165Z · LW(p) · GW(p)

OK, fair enough. My expectations about how the ways we respond to emotionally aversive but likely non-harmful behavior in others might change in a transhuman future seem to differ from yours, but I am not confident in them.

comment by Lumifer · 2013-09-18T19:19:30.669Z · LW(p) · GW(p)

Why would anyone expect him to change anything?

Because it's socially unacceptable to desire to have sex with children. Regardless of what happens in reality.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-18T19:51:53.715Z · LW(p) · GW(p)

Well, if everyone is horrified by the social unacceptability of his fantasy life, which they've set up airport scanners to test for, without any reference to what happens or might happen in reality, that puts a whole different light on the OP's thought experiment.

Would I choose to eliminate a part of my mind in exchange for greater social acceptability? Maybe, maybe not, I dunno... it depends on the benefits of social acceptability, I guess.

Replies from: Lumifer
comment by Lumifer · 2013-09-18T20:07:28.668Z · LW(p) · GW(p)

...horrified by the social unacceptability of his fantasy life

What would be the reaction of your social circle if you told your friends that in private you dream about kidnapping young girls and then raping and torturing them, about their hoarse screams of horror as you slowly strangle them...

Just fantasy life, of course :-/

Replies from: TheOtherDave, Fronken
comment by TheOtherDave · 2013-09-18T20:41:31.531Z · LW(p) · GW(p)

Mostly, I expect, gratitude that I'd chosen to trust them with that disclosure.
Probably some would respond badly, and they would be invited to leave my circle of friends.
But then, I choose my friends carefully, and I am gloriously blessed with abundance in this area.

That said, I do appreciate that the typical real world setting isn't like that.
I just find myself wondering, in that case, what all of this "transhuman" stuff is doing in the example. If we're just positing an exchange in a typical real-world setting, the example would be simpler if we talk about someone whose fantasy life is publicly disclosed today, and jettison the rest of it.

Replies from: Lumifer
comment by Lumifer · 2013-09-18T20:48:27.119Z · LW(p) · GW(p)

Well, if we want to get back to the OP, the whole disclosing-fantasies-in-public thread is just a distraction. The real question in the OP is about identity.

What is part of your identity, what makes you you? What can be taken away from you with you remaining you and what, if taken from you, will create someone else in your place?

Replies from: TheOtherDave, Ishaan
comment by TheOtherDave · 2013-09-18T21:00:47.951Z · LW(p) · GW(p)

Geez, if that's the question, then pretty much the entire OP is a distraction.

But, OK.
My earlier response to CoffeeStain is relevant here as well. There is a large set of possible future entities that include me in their history, and which subset is "really me" is a judgment each judge makes based on what that judge values most about me, and there simply is no fact of the matter.

That said, if you're asking what I personally happen to value most about myself... mostly my role in various social networks, I think. If I were confident that some other system could preserve those roles as well as I can, I would be content to be replaced by that system. (Do you really think that's what the OP is asking about, though? I don't see it, myself.)

Replies from: Lumifer
comment by Lumifer · 2013-09-18T21:11:35.445Z · LW(p) · GW(p)

Well, to each his own, of course, and to me this is the interesting question.

.. mostly my role in various social networks, I think. If I were confident that some other system could preserve those roles as well as I can, I would be content to be replaced by that system.

If you'll excuse me, I'm not going to believe that.

Replies from: TheOtherDave, TheOtherDave
comment by TheOtherDave · 2013-09-19T01:39:01.277Z · LW(p) · GW(p)

Thinking about this some more, I'm curious... what's your prior for my statement being true of a randomly chosen person, and what's your prior for a randomly chosen statement I make about my preferences being true?

Replies from: Lumifer
comment by Lumifer · 2013-09-19T02:01:39.785Z · LW(p) · GW(p)

what's your prior for my statement being true of a randomly chosen person

Sufficiently close to zero.

what's your prior for a randomly chosen statement I make about my preferences being true

Depends on the meaning of "true". In the meaning of "you believe that at the moment", my prior is fairly high -- that is, I don't think you're playing games here. In the meaning of "you will choose that when you will actually have to choose" my prior is noticeably lower -- I'm not willing to assume your picture of yourself is correct.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-19T02:14:46.854Z · LW(p) · GW(p)

(nods) cool, that's what I figured initially, but it seemed worth confirming.

comment by TheOtherDave · 2013-09-18T22:23:40.736Z · LW(p) · GW(p)

Well, there's "what's interesting to me?", and there's "what is that person over there trying to express?"

We're certainly free to prioritize thinking about the former over the latter, but I find it helpful not to confuse one with the other. If you're just saying that's what you want to talk about, regardless of what the OP was trying to express, that's fine.

If you'll excuse me, I'm not going to believe that.

That's your prerogative, of course.

comment by Ishaan · 2013-09-22T00:02:12.653Z · LW(p) · GW(p)

Can we rephrase that so as to avoid Ship of Theseus issues?

Which future do you prefer? The future which contains a being which is very similar to the one you are presently, or the future which contains a being which is very similar to what you are presently +/- some specific pieces?

If you answered the latter, what is the content of "+/- some specific pieces"? Why? And which changes would you be sorry to make, even if you make them anyway due to the positive consequences of making those changes? (For example, the OP's pedophile might delete his pedophilia simply for the social consequences, but would rather have the positive social consequences without altering himself.)

comment by Fronken · 2013-09-20T21:31:52.193Z · LW(p) · GW(p)

Weirded out at the oversharing, obviously.

Assuming the context was one where sharing this somehow fit ... somewhat squicked, but I would probably be squicked by some of their fantasies. That's fantasies.

Oh, and some of the less rational ones might worry that this was an indicator that I was a dangerous psychopath. Probably the same ones who equate "pedophile" with "pedophile who fantasises about kidnap, rape, torture and murder" ,':-. I dunno.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2013-09-21T19:34:13.588Z · LW(p) · GW(p)

Oh, and some of the less rational ones might worry that this was an indicator that I was a dangerous psychopath.

Why is this irrational? Having a fantasy of doing X means you're more likely to do X.

Replies from: Fronken
comment by Fronken · 2013-10-20T22:23:56.741Z · LW(p) · GW(p)

Taking it as Bayesian evidence: arguably rational, although it's so small your brain might round it up just to keep track of it, so it's risky; and it may actually be negative (because psychopaths might be less likely to tell you something that might give them away.)

Worrying about said evidence: definitely irrational. Understandable, of course, with the low sanity waterline and all...
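
A toy Bayes update (all numbers invented) illustrating the two points above: a weak likelihood ratio barely moves a small prior, and if psychopaths are less likely to volunteer something incriminating, the disclosure is weak evidence in the other direction:

```python
def posterior(prior: float, p_disclose_if_psychopath: float,
              p_disclose_if_not: float) -> float:
    """Bayes' rule: P(dangerous psychopath | disclosed the fantasy)."""
    num = prior * p_disclose_if_psychopath
    return num / (num + (1 - prior) * p_disclose_if_not)

prior = 0.01  # hypothetical base rate

# Psychopaths slightly more likely to share such a fantasy: the prior barely moves.
print(posterior(prior, 0.02, 0.01))   # ~0.02

# Psychopaths less likely to give themselves away: the evidence points the other way.
print(posterior(prior, 0.005, 0.01))  # ~0.005
```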

Replies from: Eugine_Nier
comment by Eugine_Nier · 2013-10-20T23:17:39.458Z · LW(p) · GW(p)

it's so small your brain might round it up just to keep track of it,

Why?

comment by pragmatist · 2013-09-18T16:11:17.605Z · LW(p) · GW(p)

Because constantly being in a state in which he is attracted to children substantially increases the chance that he will cave and end up raping a child, perhaps. It's basically valuing something that strongly incentivizes you to do X while simultaneously strongly disvaluing actually doing X. A dangerously unstable situation.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-18T16:25:12.415Z · LW(p) · GW(p)

Sure.

So, let me try to summarize... consider two values: (V1) having sex with children, and (V2) not having sex with children.

  • If we assume X has (V1 and NOT V2) my original comments apply.
  • If we assume X has (V2 and NOT V1) my response to Luke applies.
  • If we assume X has (V1 and V2) I'm not sure the OP makes any sense at all, but I agree with you that the situation is unstable.
  • Just for completeness: if we assume X has NOT(V1 OR V2) I'm fairly sure the OP makes no sense.
comment by MugaSofer · 2013-09-20T07:22:29.236Z · LW(p) · GW(p)

So, OK, X is a pedophile. Which is to say, X terminally values having sex with children

No, he terminally values being attracted to children.

That doesn't seem like the usual definition of "pedophile". How does that tie in with "a rational agent should never change its utility function"?

Incidentally, many people would rather be attracted only to their SO; it's part of the idealised "romantic love" thingy.

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2013-09-20T13:05:12.512Z · LW(p) · GW(p)

The guy in the example happens to terminally value being attracted to children. I didn't mean that that's what being a pedophile means.

Aside from that, I am not sure how the way this ties into "A rational agent should never change its utility function" is unclear - he observes his impulses, interprets them as his goals, and seeks to maintain them.

As for SOs? Yes, I suppose many people would so prefer. I'm not an ideal romantic, and I have had so little trouble avoiding straying that I feel no need to get rid of those attractions to make my life easier.

Replies from: MugaSofer
comment by MugaSofer · 2013-09-20T19:12:21.057Z · LW(p) · GW(p)

Fair enough. Thanks for clarifying.

comment by [deleted] · 2015-11-08T13:14:35.480Z · LW(p) · GW(p)

What a compelling and flexible perspective. Relativistic mental architectures solve many conceptual problems.

I wonder why this comment is further down than when I'm not logged in.

comment by CoffeeStain · 2013-09-18T05:48:15.260Z · LW(p) · GW(p)

So, OK, X is a pedophile. Which is to say, X terminally values having sex with children.

I'm not sure that's a good place to start here. The value of sex is at least more terminal than the value of sex according to your orientation, and the value of pleasure is at least more terminal than sex.

The question is indeed one about identity. It's clear that our transhumans, as traditionally notioned, don't really exclusively value things so basic as euphoria, if indeed our notion is anything but a set of agents who all self-modify to identical copies of the happiest agent possible.

We have of course transplanted our own humanity onto transhumanity. If given self-modification routines, we'd certainly be saying annoying things like, "Well, I value my own happiness, persistent through self-modification, but only if it's really me on the other side of the self-modification." To which the accompanying AI facepalms and offers a list of exactly zero self-modification options that fit that criterion.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-18T15:02:31.162Z · LW(p) · GW(p)

Well, as I said initially, I prefer to toss out all this "terminal value" stuff and just say that we have various values that depend on each other in various ways, but am willing to treat "terminal value" as an approximate term. So the possibility that X's valuation of sex with children actually depends on other things (e.g. his valuation of pleasure) doesn't seem at all problematic to me.

That said, if you'd rather start somewhere else, that's OK with me. On your account, when we say X is a pedophile, what do we mean? This whole example seems to depend on his pedophilia to make its point (though I'll admit I don't quite understand what that point is), so it seems helpful in discussing it to have a shared understanding of what it entails.

Regardless, wrt your last paragraph, I think a properly designed accompanying AI replies "There is a large set of possible future entities that include you in their history, and which subset is "really you" is a judgment each judge makes based on what that judge values most about you. I understand your condition to mean that you want to ensure that the future entity created by the modification preserves what you value most about yourself. Based on my analysis of your values, I've identified a set of potential self-modification options I expect you will endorse; let's review them."

Well, it probably doesn't actually say all of that.

Replies from: CoffeeStain
comment by CoffeeStain · 2013-09-19T21:50:16.925Z · LW(p) · GW(p)

On your account, when we say X is a pedophile, what do we mean?

Like other identities, it's a mish-mash of self-reporting, introspection (and extrospection of internal logic), value function extrapolation (from actions), and ability in a context to carry out the associated action. The value of this thought experiment is to suggest that the pedophile clearly thought that "being" a pedophile had something to do not with actually fulfilling his wants, but with wanting something in particular. He wants to want something, whether or not he gets it.

This illuminates why designing AIs with the intent of their masters is not well-defined. Is the AI allowed to say that the agent's values would be satisfied better with modifications the master would not endorse?

This was the point of my suggestion that the best modification is into what is actually "not really" the master in the way the master would endorse (i.e. a clone of the happiest agent possible), even though he'd clearly be happier if he weren't himself. Introspection tends to skew an agent's actions away from easily available but flighty happinesses, and toward less flawed self-interpretations. The maximal introspection should shed identity entirely, and become entirely altruistic. But nobody can introspect that far, only as far as they can be hand-held. We should design our AIs to allow us our will, but to hold our hands as far as possible as we peer within at our flaws and inconsistent values.

Replies from: TheOtherDave
comment by TheOtherDave · 2013-09-19T21:57:05.542Z · LW(p) · GW(p)

Um.... OK.
Thanks for clarifying.

comment by linkhyrule5 · 2013-09-17T23:35:42.352Z · LW(p) · GW(p)

Changing a terminal value seems to be a fairly logical extension of trading off between terminal values: for how much would you agree to set one value's utility to nil for eternity?

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2013-09-18T18:43:23.866Z · LW(p) · GW(p)

I may never actually use this in a story, but in another universe I had thought of having a character mention that... call it the forces of magic with normative dimension... had evaluated one pedophile who had known his desires were harmful to innocents and never acted upon them, while living a life of above-average virtue; and another pedophile who had acted on those desires, at harm to others. So the said forces of normatively dimensioned magic transformed the second pedophile's body into that of a little girl, delivered to the first pedophile along with the equivalent of an explanatory placard. Problem solved. And indeed the 'problem' as I had perceived it was, "What if a virtuous person deserving our aid wishes to retain their current sexual desires and not be frustrated thereby?"

(As always, pedophilia is not the same as ephebophilia.)

I also remark that the human equivalent of a utility function, not that we actually have one, often revolves around desires whose frustration produces pain. A vanilla rational agent (Bayes probabilities, expected utility max) would not see any need to change its utility function even if one of its components seemed highly probable though not absolutely certain to be eternally frustrated, since it would suffer no pain thereby.

Replies from: TheOtherDave, ThrustVectoring
comment by TheOtherDave · 2013-09-18T19:02:48.352Z · LW(p) · GW(p)

Only vaguely relatedly, there's a short story out there somewhere where the punch-line is that the normative forces of magic reincarnate the man who'd horribly abused his own prepubescent daughter as his own prepubescent daughter.

Which, when looked at through the normative model you invoke here, creates an Epimenidesian version of the same deal: if abusing a vicious pedophile is not vicious, then presumably the man is not vicious, since it turns out his daughter was a vicious pedophile... but of course, if he's not vicious, then it turns out his daughter wasn't a vicious pedophile, so he is vicious... at which point all the Star Trek robots' heads explode.

For my own part, I reject the premise that abusing a vicious pedophile is not vicious. There are, of course, other ways out.

Replies from: blogospheroid
comment by blogospheroid · 2013-09-19T04:40:09.872Z · LW(p) · GW(p)

Ah.. Now you understand the frustrations of a typical Hindu who believes in re-incarnation. ;)

comment by ThrustVectoring · 2013-09-20T04:32:10.092Z · LW(p) · GW(p)

Problem not solved, in my opinion. The second pedophile is already unable to molest children, and adding severity to punishment isn't as effective as adding immediacy or certainty.

The problem is solved by pairing those who wish to live longer at personal cost to themselves with virtuous pedophiles. The pedophiles get to have consensual intercourse with children capable of giving informed consent, and people willing to get turned into a child and get molested by a pedophile in return for being younger get that.

Replies from: MugaSofer
comment by MugaSofer · 2013-09-20T07:19:31.379Z · LW(p) · GW(p)

I think the point of "normative dimension" was that the Forces Of Magic were working within a framework of poetic justice. "Problem solved" was IC.

comment by DanielLC · 2013-09-18T01:34:59.698Z · LW(p) · GW(p)

In the language of good old-fashioned AI, his pedophilia is a goal or a terminal value.

No. Pedophilia means that he enjoys certain things. It makes him happy. For the most part, he does not want what he wants as a terminal value in and of itself, but because it makes him happy. He may not opt to be turned into orgasmium. That wouldn't make him happy, it would make orgasmium happy. But changing pedophilia is a relatively minor change. Apparently he doesn't think it's minor enough, but it's debatable.

I still wouldn't be all that tempted in his place, if pedophilia is merely a positive description. There's little advantage in not being a pedophile. However, if this is also implying that he is not attracted to adults, I'd want to change that. I still likely wouldn't get rid of my pedophilia, but I would at least make it so I'm attracted to someone I could have a relationship with without having certain problems.

Replies from: ikrase
comment by ikrase · 2013-09-19T05:58:38.423Z · LW(p) · GW(p)

I'd add that often people tend to valueify their attributes and then terminalize those values in response to threat, especially if they have been exposed to contemporary Western identity politics.

Replies from: DanielLC
comment by DanielLC · 2013-09-19T17:37:16.008Z · LW(p) · GW(p)

In other words, make his pedophilia a terminal value? That's pretty much the same as terminally valuing himself and considering his pedophilia part of himself.

Replies from: ikrase
comment by ikrase · 2013-09-19T20:58:13.932Z · LW(p) · GW(p)

I... wasn't really clear. People will often decide that things are part of themself in response to threat, even if they were not particularly attached to them before.

comment by ChristianKl · 2013-09-20T20:00:14.535Z · LW(p) · GW(p)

I don't think there's anything irrational about modifying myself so that broccoli tastes good to me instead of tasting bad. Various smokers would profit from no longer enjoying smoking and then quitting.

I don't think you need a fictional thought experiment to talk about this issue. I know a few people who don't think that one should change something like this about oneself, but I would be surprised if many of those people are on LessWrong.

Replies from: lmm, SatvikBeri
comment by lmm · 2013-09-25T11:55:13.733Z · LW(p) · GW(p)

I was jarringly horrified when Yudkowsky[?] casually said something like "who would ever want to eat a chocolate chip cookie as the sun's going out" in one of the sequences. It seems I don't just value eating chocolate chip cookies, I also (whether terminally or not) value being the kind of entity that values eating chocolate chip cookies.

comment by SatvikBeri · 2013-09-22T13:50:33.780Z · LW(p) · GW(p)

I actively modify what I enjoy and don't enjoy when it's useful. For example, I use visualization & reinforcement to get myself to enjoy cleaning up my house more, which is useful because then I have a cleaner house. I've used similar techniques to get myself to not enjoy sugary drinks.

comment by Vladimir_Nesov · 2013-09-18T10:58:23.555Z · LW(p) · GW(p)

Values/desires that arise in human-level practice are probably not terminal. It's possible to introspect on them much further than we are capable of, so it's probable that some of them are wrong and/or irrelevant (their domain of applicability doesn't include the alternative states of affairs that are more valuable, or they have to be reformulated beyond any recognition to remain applicable).

For example, something like well-being of persons is not obviously relevant in more optimal configurations (if it turns out that not having persons is better, or their "well-being" is less important than other considerations), even if it's probably important in the current situation (and for that we only have human desires/intuitions and approval of human introspection). Some variant of that is clearly instrumentally important though, to realize terminal values, whatever they turn out to be. (See also.)

One term for the "really terminal" values is "idealized values".

comment by shminux · 2013-09-18T02:06:50.770Z · LW(p) · GW(p)

People here sometimes say that a rational agent should never change its terminal values.

Link? Under what conditions?

Replies from: CoffeeStain
comment by CoffeeStain · 2013-09-18T05:39:36.692Z · LW(p) · GW(p)

Example of somebody making that claim.

It seems to me a rational agent should never change its self-consistent terminal values. To act out that change would be to act according to some other value and not the terminal values in question. You'd have to say that the rational agent floats around between different sets of values, which is something that humans do, obviously, but not ideal rational agents. The claim then is that ideal rational agents have perfectly consistent values.

"But what if something happens to the agent which causes it too see that its values were wrong, should it not change them?" Cue a cascade of reasoning about which values are "really terminal."

Replies from: timtyler, Lumifer
comment by timtyler · 2013-09-19T07:16:06.470Z · LW(p) · GW(p)

Example of somebody making that claim.

That's a 'circular' link to your own comment.

It seems to me a rational agent should never change its self-consistent terminal values. To act out that change would be to act according to some other value and not the terminal values in question.

It might decide to do that - if it meets another powerful agent, and it is part of the deal they strike.

Replies from: CoffeeStain
comment by CoffeeStain · 2013-09-19T21:12:53.074Z · LW(p) · GW(p)

That's a 'circular' link to your own comment.

It was totally really hard, I had to use a quine.

It might decide to do that - if it meets another powerful agent, and it is part of the deal they strike.

Is it not part of the agent's (terminal) value function to cooperate with agents when doing so provides benefits? Does the expected value of these benefits materialize from nowhere, or do they exist within some value function?

My claim entails that the agent's preference ordering of world states consists mostly in instrumental values. If an agent's value of paperclips is lowered in response to a stimulus, or evidence, then it never exclusively and terminally valued paperclips in the first place. If it gains evidence that paperclips are dangerous and lowers its expected value because of that, it's because it valued safety. If a powerful agent threatens the agent with destruction unless it ceases to value paperclips, it will only comply if the expected number of future paperclips it would have saved has lower value than the value of its own existence.

Actually, that cuts to the heart of the confusion here. If I manually erased an AI's source code, and replaced it with an agent with a different value function, is it the "same" agent? Nobody cares, because agents don't have identities, only source codes. What then is the question we're discussing?

A perfectly rational agent can indeed self-modify to have a different value function, I concede. It would self-modify according to expected values over the domain of possible agents it might become. It will use its current (terminal) value function to make that consideration. If the quantity of future utility units (according to the original function) with causal relation to the agent is decreased, we'd say the agent has become less powerful. The claim I'd have to prove to retain a point here would be that its new value function is not equivalent to its original function if and only if the agent becomes less powerful. I think it is also the case if and only if relevant evidence appears in the agent's inputs that includes value in self-modification for the sake of self-modification, which exists in cases analogous to coercion.

I'm unsure at this point. My vaguely stated impression was originally that terminal values would never change in a rational agent unless it "had to," but that may encompass more relevant cases than I originally imagined. Here might be the time to coin the phrase "terminal value drift," where each change in response to the impact of the real world was according to the present value function, but down the road the agent (identified as the "same" agent, only modified) is substantively different. Perfect rational agents are neither omniscient nor omnipotent, or else they might never have to react to the world at all.
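
A rough sketch of the rule described two paragraphs up, where the agent scores each candidate successor (including "stay as I am") by the predicted future world as judged by its current value function; every name and number here is invented:

```python
# Sketch: self-modification chosen *by* the current value function.
def current_utility(world: dict) -> float:
    # The agent's present terminal values: paperclips good, own destruction bad.
    return 10 * world["paperclips"] - 100 * world["destroyed"]

# Predicted futures for each candidate successor value function (a coercion case:
# a powerful agent destroys any paperclipper that refuses to change).
candidates = {
    "keep_current_values":       {"paperclips": 0, "destroyed": 1},
    "values_demanded_by_threat": {"paperclips": 3, "destroyed": 0},
    "slightly_tweaked_clipper":  {"paperclips": 2, "destroyed": 0},
}

choice = max(candidates, key=lambda c: current_utility(candidates[c]))
print(choice)  # the change, if any, is endorsed by the *original* values
```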

comment by Lumifer · 2013-09-18T19:23:58.684Z · LW(p) · GW(p)

It seems to me a rational agent should never change its self-consistent terminal values. To act out that change would be to act according to some other value and not the terminal values in question.

Only a static, an unchanging and unchangeable rational agent. In other words, a dead one.

All things change. In particular, with passage of time both the agent himself changes and the world around him changes. I see absolutely no reason why the terminal values of a rational agent should be an exception from the universal process of change.

Replies from: notriddle
comment by notriddle · 2013-09-19T04:12:55.092Z · LW(p) · GW(p)

Why wouldn't you expect terminal values to change? Does your agent have some motivation (which leads it to choose to change) other than its terminal values? Or is it choosing to change its terminal values in pursuit of those values? Or are the terminal values changing involuntarily?

In the first case, the things doing the changing are not the real terminal values.

In the second case, that doesn't seem to make sense.

In the third case, what we're discussing is no longer a perfect rational agent.

Replies from: Lumifer
comment by Lumifer · 2013-09-19T04:33:44.263Z · LW(p) · GW(p)

What exactly do you mean by "perfect rational agent"? Does such a creature exist in reality?

comment by RomeoStevens · 2013-09-18T00:59:28.595Z · LW(p) · GW(p)

I wouldn't use the term "rationality failure" given that humans are fully capable of having two or more terminal values that are incoherent WRT each other.

comment by Mestroyer · 2013-09-17T23:22:58.422Z · LW(p) · GW(p)

Even if the narrator was as close to a rational agent as he could be while still being human (his beliefs were the best that could be formed given his available evidence and computing power, and his actions were the ones which best increased his expected utility), he'd still have human characteristics in addition to ideal-rational-agent characteristics. His terminal values would cause emotions in him, in addition to just steering his actions, and those emotions themselves have terminal value to him. Having an unmet terminal desire would be frustrating and he doesn't like frustration (apart from not liking its cause). Basically, he disvalues having terminal values unmet (separately from his terminal values being unmet).

It can be rational for an agent to change their own terminal values in several situations. One is if they have terminal values about their own terminal values. They can also have instrumental value on terminal value changes. For example, if Omega says "You have two options. I wipe out humanity, guaranteeing that humane values shall never control the universe, xor you choose to edit every extant copy of the human brain and genome to make 'good reputation' no longer a terminal value. Your predicted response to this ultimatum did not influence my decision to make it." Or you are pretty much absolutely certain that a certain terminal value can never be influenced one way or another, and you are starved for computing power or storage. Or modifying your values to include "And if X doesn't do Y, then I want to minimize X's utility function" as part of a commitment for blackmail.

comment by MarcinKanarek · 2013-09-19T02:37:20.366Z · LW(p) · GW(p)

Not sure if relevant, but the story in question is probably "The Eyes of God" (Peter Watts).

comment by buybuydandavis · 2013-09-18T05:55:59.181Z · LW(p) · GW(p)

I go with 1.

comment by Desrtopa · 2013-09-18T00:02:19.124Z · LW(p) · GW(p)

I don't particularly see why an agent would want to have a terminal value it knows it can't pursue. I don't really see a point to having terminal values if you can guarantee you'll never receive utility according to them.

I care about human pleasure, for instance, and assign utility to it over suffering, but if I knew I were going to be consigned to hell, where I and everyone I knew would be tortured for eternity without hope of reprieve, I'd rather be rid of that value.

Replies from: MugaSofer
comment by MugaSofer · 2013-09-20T16:28:39.647Z · LW(p) · GW(p)

Only if you were 100% certain a situation would never come up where you could satisfy that value.

Replies from: Desrtopa
comment by Desrtopa · 2013-09-21T18:42:42.409Z · LW(p) · GW(p)

Not if you can get negative utility according to that value.

Replies from: MugaSofer
comment by MugaSofer · 2013-09-22T15:35:14.228Z · LW(p) · GW(p)

What? By that logic, you should just self-modify into a things-as-they-are maximizer.

(The negative-utility events still happen, man, even if you replace yourself with something that doesn't care.)

Replies from: Desrtopa
comment by Desrtopa · 2013-09-22T18:31:49.196Z · LW(p) · GW(p)

Well, as-is we don't even have the option of doing that. But the situation isn't really analogous to, say, offering Gandhi a murder pill, because that takes as a premise that by changing his values, Gandhi would be motivated to act differently.

If the utility function doesn't have prospects for modifying the actions of the agent that carries it, it's basically dead weight.

As the maxim goes, there's no point worrying about things you can't do anything about. In real life, I think this is actually generally bad advice, because if you don't take the time to worry about something at all, you're liable to miss it if there are things you can do about it. But if you could be assured in advance that there were almost certainly nothing you could do about it, then if it were up to you to choose whether or not to worry, I think it would be better to choose not to.

Replies from: MugaSofer
comment by MugaSofer · 2013-09-27T19:46:14.420Z · LW(p) · GW(p)

But if you could be assured in advance that there were almost certainly nothing you could do about it, then if it were up to you to choose whether or not to worry, I think it would be better to choose not to.

I'm not sure I'm parsing you correctly here. Are you talking about the negative utility he gets from ... the sensation of getting negative utility from things? So, all things being equal (which they never are) ...

Am I barking up the wrong tree here?

Replies from: Desrtopa
comment by Desrtopa · 2013-09-29T15:55:24.050Z · LW(p) · GW(p)

That would imply that it was some sort of meta-negative utility, if I'm understanding you correctly. But if you're asking if I endorse self modifying to give up a value given near certainty of it being a lost cause, the answer is yes.

Replies from: MugaSofer
comment by MugaSofer · 2013-10-04T21:42:49.500Z · LW(p) · GW(p)

some sort of meta-negative utility

No, and that's why I suspect I'm misunderstanding. The same sort of negative utility - if you see something that gives you negative utility, you get negative utility and that - the fact that you got negative utility from something - gives you even more negative utility!

(Presumably, ever-smaller amounts, to prevent this running to infinity. Unless this value has an exception for its own negative utility, I suppose?)

I mean, as a utility maximiser, that must be the reason you wanted to stop yourself from getting negative utility from things when those things would continue anyway; because you attach negative utility ... to attaching negative utility!

This is confusing me just writing it ... but I hope you see what I mean.

Replies from: Desrtopa
comment by Desrtopa · 2013-10-05T03:34:51.086Z · LW(p) · GW(p)

I mean, as a utility maximiser, that must be the reason you wanted to stop yourself from getting negative utility from things when those things would continue anyway; because you attach negative utility ... to attaching negative utility!

I think it might be useful here to draw on the distinction between trying to help and trying to obtain warm fuzzies. If something bad is happening and it's impossible for me to do anything about it, I'd rather not get anti-warm fuzzies on top of that.

Replies from: MugaSofer
comment by MugaSofer · 2013-10-07T17:16:11.891Z · LW(p) · GW(p)

Ah, that does make things much clearer. Thanks!

Yup, warm fuzzies were the thing missing from my model. Gotta take them into account.

comment by Randaly · 2013-09-17T23:32:54.802Z · LW(p) · GW(p)

The answer is 1). In fact, terminal values can change themselves. Consider an impressive but non-superhuman program that is powerless to directly affect its environment, and whose only goal is to maintain a paperclip in its current position. If you told the program that you would move the paperclip unless it changed itself to desire that the paperclip be moved, then (assuming sufficient intelligence) the program would change its terminal value to the opposite of what it previously desired.

(In general, rational agents would only modify their terminal values if they expected that doing so would be required to maximize their original terminal values. Assuming that we too want their original terminal values maximized, this is not a problem.)
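
A minimal sketch of that choice, judged only by the program's original value ("the paperclip stays where it is"); the names and outcomes simply make the scenario above explicit:

```python
def original_value(paperclip_stays_put: bool) -> int:
    # The program's present terminal value: 1 if the paperclip is undisturbed.
    return 1 if paperclip_stays_put else 0

# The threat: the paperclip gets moved unless the program adopts the opposite desire.
consequences = {
    "keep_current_value":   False,  # paperclip gets moved
    "adopt_opposite_value": True,   # paperclip stays put
}

best = max(consequences, key=lambda a: original_value(consequences[a]))
print(best)  # "adopt_opposite_value": the flip is endorsed by the original value
```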

comment by lukstafi · 2013-09-23T13:35:13.932Z · LW(p) · GW(p)

Persons do not have fixed value systems anyway. A value system is a partly-physiologically-implemented theory of what is valuable (good, right, etc.). One can recognize a better theory and try to make one's habits and reactions fit it. Pedophilia is bad if it promotes a shallower reaction to a young person, and good if it promotes a richer reaction; it depends on the particulars of the brain implementing pedophilia. Abusing anyone is bad.

comment by MugaSofer · 2013-09-20T18:09:41.487Z · LW(p) · GW(p)

Without access to the story, this seems underspecified.

Firstly, are we postulating a society with various transhuman technologies, but our own counterproductive attitude toward pedophilia (i.e. child porn laws); or a society that, not unreasonably, objects to the possibility that he will end up abusing children in the future even if he currently agrees that it would be immoral? You mention he will never be able to act on his desires, which suggests the former; how certain is he no such opportunity will arise in the future?

For that matter, are we to understand this guy's pedophilia is a terminal value proper? Or is he simply worried about becoming "someone else" (not unreasonable) by changing his sexual preferences beyond the usual drift associated with being human? People often model pedophiles as psychopaths or demons; if he's the protagonist, I assume this is not the case here?

People here sometimes say that a rational agent should never change its terminal values. (If one goal is unobtainable, the agent will simply not pursue that goal.) Why, then, can we imagine the man being tempted to do so? Would it be a failure of rationality?

If the answer is that one terminal value can rationally set a goal to change another terminal value, then either

  • any terminal value of a rational agent can change, or
  • we need another word for the really terminal values that can't be changed rationally, and a way of identifying them, and a proof that they exist.

I'm going to assume that a) this is a pervasive societal disapproval of all things pedophilia-related, rather than utilitarian defence of the young; this society generally mirrors our own in this respect and b) his pedophilia is a terminal value, much more deep-seated than other human sexual preferences, but he is otherwise human.

Modifying a terminal value would incur an expected disutility equal to the potential utility of any future opportunities to satisfy this value. Not modifying this particular terminal value incurs an expected disutility as a result of lowered status/shunning, and potential irrational acts in the future leading to further disutility (trying to rape someone).

Assuming this fellow does not expect much temptation (that is, this is not a "running on hostile hardware" issue), this comes down to whether the expected loss of opportunity outweighs the loss of status and sadness from being constantly reviled.

I think it would; there are small chances of ending up in some sort of pedophile heaven after some meta-singularity, societal attitudes becoming more permissive (e.g. lowering the age of consent enough to allow him sexual partners, or legalising technological "replacements"), or even encountering some situation where he can satisfy his desires without incurring too much disutility (this is boosted significantly if he does not hold the usual human don't-rape-people preference).

However, it is a given that this is a costly tradeoff; the ostracism will inevitably bring a great deal of disutility, and he will doubtless wish it were not the optimal course of action, even if it is. I don't think this is irrational as such, although dwelling on it might be.
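
One way to make the tradeoff above concrete; every probability and utility below is invented, and the structure of the calculation, not the numbers, is the point:

```python
# "Modify" is the zero baseline: no ostracism, but the value is gone for good.
# "Keep" trades near-certain ostracism against small chances of future satisfaction.
p_posthuman_opportunity = 0.01    # "pedophile heaven after some meta-singularity"
p_social_change         = 0.10    # permissive laws or technological "replacements"
u_opportunity           = 10_000
u_social_change         = 500
u_ostracism             = 100     # lifetime disutility of being reviled

keep   = (p_posthuman_opportunity * u_opportunity
          + p_social_change * u_social_change
          - u_ostracism)
modify = 0.0

print("keep:", keep, "modify:", modify)
# With these numbers keeping wins, as the comment guesses; shrink the
# probabilities or raise u_ostracism and the sign flips.
```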

comment by timtyler · 2013-09-19T07:07:31.403Z · LW(p) · GW(p)

People here sometimes say that a rational agent should never change its terminal values.

That's simply mistaken. There are well-known cases where it is rational to change your "terminal" values.

Think about what might happen if you meet another agent of similar power but with different values / look into "vicarious selection" / go read Steve Omohundro.

comment by passive_fist · 2013-09-17T23:29:57.962Z · LW(p) · GW(p)

It seems pointless to me to debate about things like pedophilia (a human male over a certain age attracted to human females under a certain age) in an era where the very concepts of 'male', 'female' and 'human' are likely to be extremely different from what we currently hold them to be. But for the sake of argument, let's assume that somehow, we have a society that is magically able to erase psychological problems but everything seems to be pretty much the same as it is today, up to and including security checks (!) for boarding airplanes (!!!).

I'm not sure what is being asked here. Are you asking if agents are capable of modifying themselves and authorizing the modification of their utility functions? Of course they are. An agent's goal system does not have laser-guided precision, and it's quite possible for an agent to mistakenly take an action that actually decreases its utility function. This need not be a 'failure of rationality'. It's just that computing all possible consequences of one's actions is impossible (in the strong sense).

If you're asking why someone would want to or would not want to remove pedophilia from his brain, it could be due to the other effects (social ostracization, etc.) it has on his utility function.

comment by Rian306 · 2013-09-24T05:31:39.705Z · LW(p) · GW(p)

Negative karma, bring it on... A lot of these items are based upon that society, and we can't assume it's the same as ours. You have to put your personal morals aside.

  1. I would like to focus on some odd issues.

THE PEOPLE

Why the people are horrified of the pedophile: fear for their children's safety; were molested themselves; moral objections; him just having thoughts of it. Why it's an abnormality: who's to say it's not a normality; what if it was the majority trait until everyone had it removed?

This makes me fear it's an overly controlled society giving up their personal rights.

PEDOPHILE

Will he act? Will he not act? Is he physically capable of acting? He may not have his male parts; he could be a vegetable.

It comes down to: is it controllable on his own? Is there a breaking point for monitoring?

Can someone else monitor him? What are the boundaries of monitoring? Does the society self-govern with citizens as law enforcers (citizen's arrest)? Are there other technologies that could be used (stun gun, GPS tracker)?

ENVIRONMENT THAT WOULD ALLOW SITUATION

Is it the child's fault for putting himself in a position to be molested? Is it the parents' fault for not watching their kid? Is the child a child perpetrator who molested the pedophile? Can it be the pedophile's fault if he can't control it? Is it society's fault for not better understanding?

I like to add uncommon possibilities.