Comments
I can't figure out an answer to any of those questions without having a way to decide which utility function is better. This seems to be a problem, because I don't see how it's even possible.
But why does it matter what they think about it for the short time before it happens, compared to the enjoyment of it long after?
So you positively value "eating ice cream" and negatively value "having eaten ice cream" - I can relate. What if the change, instead of making you dislike ice cream and like veggies, made you dislike fitness and enjoy sugar crashes? The only real difference I can see is that the first increases your expected lifespan and so increases the overall utility. They both resolve the conflict and make you happy, though, so aren't they both better than what you have now?
I guess you're right. It's the difference between "what I expect" and "what I want".
As far as I can tell, the only things that keep me from reducing myself to a utilon-busybeaver are a) insufficiently detailed information on the likelihoods of each potential future-me function, and b) an internally inconsistent utility function.
What I'm addressing here is b) - my valuation of a universe composed entirely of minds that most-value a universe composed entirely of themselves is path-dependent. My initial reaction is that that universe is very negative on my current function, but I find it hard to believe that it's truly of larger magnitude than {number of minds}*{length of existence of this universe}*{number of utilons per mind}*{my personal utility of another mind's utilon}.
Even for a very small positive value for the last (and it's definitely not negative or 0 - I'd need some justification to torture someone to death), the sheer scale of the other values should trivialize my personal preference that the universe include discovery and exploration.
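To put made-up numbers on it (every constant below is invented; only the shape of the comparison matters):

```python
# All numbers are hypothetical; the point is that the product scales
# without bound while my personal preference stays fixed.
n_minds = 1e40                      # minds tiling the universe (assumed)
lifetime_seconds = 1e17             # remaining lifetime of the universe (assumed)
utilons_per_mind_second = 1.0       # utilons each mind generates per second (assumed)
my_weight_per_other_utilon = 1e-12  # small but positive weight I give another mind's utilon

value_of_tiled_universe = (n_minds * lifetime_seconds
                           * utilons_per_mind_second * my_weight_per_other_utilon)

# Any finite value I place on "a universe that includes discovery and exploration"
my_preference_for_discovery = 1e9

print(value_of_tiled_universe > my_preference_for_discovery)  # True for these numbers
```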
Hm. If people have approximately-equivalent utility functions, does that help them all satisfy their utility functions better? If so, it makes sense for none of them to value stealing (since having all of them value stealing could be a problem). In a large enough society, though, the ripple effect of my theft is negligible. That's beside the point, though.
"Avoid death" seems like a pretty good basis for a utility function. I like that.
Fair.
So you, like I, might consider turning the universe into minds that most value a universe filled with themselves?
I'm not saying I can change to liking civil war books. I'm saying if I could choose between A) continuing to like scifi and having fantasy books, or B) liking civil war books and having civil war books, I should choose B, even though I currently value scifi>stats>civil war. By extension, if I could choose A) continuing to value specific complex interactions and having different complex interactions, or B) liking smiley faces and building a smiley-face maximizer I should choose B even though it's counterintuitive. This one is somewhat more plausible, as it seems it'd be easier to build an AI that could change my values to smiley faces and make smiley faces than it would be to build one that works toward my current complicated (and apparently inconsistent) utility function.
I don't think society-damaging actions are "objectively" bad in the way you say. Stealing something might be worse than just having it, due to negative repercussions, but that just changes the relative ordering. Depending on the value of the thing, it might still be higher-ordered than buying it.
You're saying that present-me's utility function counts and no-one else's does (apart from their position in present-me's function) because present-me is the one making the decision? That my choices must necessarily depend on my present function and only depend on other/future functions in how much I care about their happiness? That seems reasonable. But my current utility function tells me that there is an N large enough that N utilon-seconds for other people's functions counts more in my function than any possible thing in the expected lifespan of present-me's utility function.
Say there's a planet, far away from ours, where gravity is fairly low, atmospheric density fairly high, and the ground uniformly dangerous, and the sentient resident species has wings and two feet barely fitted for walking. Suppose, also, that by some amazingly unlikely (as far as I can see) series of evolutionary steps, these people have a strong tendency to highly value walking and negatively value flying.
If you had the ability to change their hardwired values toward transportation (and, for whatever reason, did not have the ability to change their non-neural physiology and the nature of their planet), would it be wrong to do so? If it's wrong, what makes it wrong? Your (or my, because I seem to agree with you) personal negative-valuation of {changing someone else's utility function} is heavily outweighed by the near-constant increase in happiness for generations of these people. If anything, it appears it would be wrong not to make that change. If that's the case, though, then surely it'd be wrong not to build a superintelligence designed to maximise "minds that most-value the universe they perceive", which, while not quite a smiley-face maximizer, still leads to tiling behaviour.
No matter how I reason about it, it seems tiling behaviour isn't necessarily bad. My emotions say it's bad, and Eliezer seems to agree. Does Aumann's Agreement Theorem apply to utility?
If I considered it high-probability that you could make a change and you were claiming you'd make a change that wouldn't be of highly negative utility to everyone else, I might well prepare for that change. Because your proposed change is highly negative to everyone else, I might well attempt to resist or counteract that change. Why does that make sense, though? Why do other peoples' current utility functions count if mine don't? How does that extend to a situation where you changed everyone else? How does it extend to a situation where I could change everyone else but I don't have to? If an AI programmed to make its programmer happy does so by directly changing the programmer's brain to provide a constant mental state of happiness, why is that a bad thing?
I like this idea, but I would also, it seems, need to consider the (probabilistic) length of time each utility function would last.
That doesn't change your basic point, though, which seems reasonable.
The one question I have is this: In cases where I can choose whether or not to change my utility function - cases where I can choose to an extent the probability of a configuration appearing - couldn't I maximize expected utility by arranging for my most-likely utility function at any given time to match the most-likely universe at that time? It seems that would make life utterly pointless, but I don't have a rational basis for that - it's just a reflexive emotional response to the suggestion.
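Here's a toy version of the bookkeeping I have in mind, with invented probabilities, durations, and payoffs - and it glosses over whether utilons from different functions are even comparable, which is part of my problem:

```python
# Invented numbers. Each option is (probability that universe obtains,
# years the corresponding utility function would persist,
# utilons per year that function assigns to that universe).
options = {
    "keep current function, likely universe":   (0.8, 40, 5),
    "keep current function, unlikely universe": (0.2, 40, 80),
    "adopt matching function, likely universe": (0.8, 40, 80),
}

def expected_utilons(option):
    p, years, rate = option
    return p * years * rate

for name, option in options.items():
    print(name, expected_utilons(option))
# The "matching" row dominates, which is exactly what feels so pointless.
```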
Without a much more precise way of describing patterns of neuron-fire, I don't think either of us can describe happiness more than we have so far. Having discussed the reactions in-depth, though, I think we can reasonably conclude that, whatever they are, they're not the same, which answers at least part of my initial question.
Thanks!
I believe you to be sincere when you say
I've certainly had the experience of changing my mind about whether X makes the world better, even though observing X continues to make me equally happy -- that is, the experience of having F(Wa+X) - F(Wa) change while H(O(Wa+X)) - H(O(Wa)) stays the same
but I can't imagine experiencing that. If the utility I assign to X goes down, it seems the happiness I get from observing X must necessarily go down as well. This discrepancy causes me to believe there is a low-level difference between what you consider happiness and what I consider happiness, but I can't explain mine any further than I already have.
I don't know how else to say it, but I don't feel I'm actually making that assertion. I'm just saying: "By my understanding of hedony=H(x), awareness=O(x), and utility=F(x), I don't see any possible situation where H(W) =/= F(O(W)). If they're indistinguishable, wouldn't it make sense to say they're the same thing?"
Edit: formatting
Well, I'm not sure making the clones anencephalic would make eating them truly neutral. I'd have to examine that more.
The linked situation proposes that the babies are in no way conscious and that all humans are conditioned, such that killing myself would actually just result in fewer people happily eating babies.
Refuse the option and turn me into paperclips before I could change it.
Apparently my acceptance that utility-function-changes can be positive is included in my current utility function. How can that be, though? While, according to my current utility function, all previous utility functions were insufficient, surely no future one could map more strongly onto my utility function than itself. Yet I feel that, after all these times, I should be aware that my utility function is not the ideal one...
Except that "ideal utility function" is meaningless! There is no overarching value scale for utility functions. So why do I have the odd idea that a utility function that changes without my understanding of why (a sum of many small experiences) is positive, while a utility function that changes with my understanding (an alien force) is negative?
There has to be an inconsistency here somewhere, but I don't know where. If I treat my future selves like I feel I'm supposed to treat other people, then I negatively-value claiming my utility function over theirs. If person X honestly enjoys steak, I have no basis for claiming my utility function overrides theirs and forcing them to eat sushi. On a large scale, it seems, I maximize for utilons according to each person. Let's see:
- If I could give a piece of cake to a person who liked cake or to a person who didn't like cake, I'd give it to the former.
- If I could give a piece of cake to a person who liked cake and was in a position to enjoy it, or to a person who liked cake but was about to die in the next half-second, I'd give it to the former.
- If I could give a piece of cake to a person who liked cake and had time to enjoy the whole piece, or to a person who liked cake but would only enjoy the first two bites before having to run to an important event and leave the cake behind to go stale, I'd give it to the former.
- If I could (give a piece of cake to a person who didn't like cake) or (change the person to like cake and then give them a piece of cake), I should be able to say "I'd choose the latter" to be consistent, but the anticipation still results in consternation.

Similarly, if cake was going to be given and I could change the recipient to like cake or not, I should be able to say "I choose the latter", but that is similarly distressing. If my future self was going to receive a piece of cake and I could change it/me to enjoy cake or not, consistency would dictate that I do so.
It appears, then, that the best thing to do would be to make some set of changes in reality and in utility functions (which, yes, are part of reality) such that everyone most-values exactly what happens. If the paperclip maximizer isn't going to get a universe of paperclips and is instead going to get a universe of smiley faces, my utility function seems to dictate that, regardless of the paperclip maximizer's choice, I change the paperclip maximizer (and everyone else) into a smiley face maximizer. It feels wrong, but that's where I get if I shut up and multiply.
I understand it to mean, roughly, that when comparing hypothetical states of the world Wa and Wb, I perform some computation F(W) on each state such that if F(Wa) > F(Wb), then I consider Wa more valuable than Wb.
That's precisely what I mean.
Another way of saying this is that if O(W) is the reality that I would perceive in a world W, then my happiness in Wa is F(O(Wa)). It simply cannot be the case, on this view, that I consider a proposed state-change in the world to be an improvement, without also being such that I would be made happier by becoming aware of that state-change actually occurring.
Yes
Further, if I sincerely assert about some state change that I believe it makes the world better, but it makes me less happy, it follows that I'm simply mistaken about my own internal state... either I don't actually believe it makes the world better, or it doesn't actually make me less happy, or both. Did I get that right? Or are you making the stronger claim that I cannot in point of fact ever sincerely assert something like that?
Hm. I'm not sure what you mean by "sincerely", if those are different. I would say if you claimed "X would make the universe better" and also "Being aware of X would make me less happy", one of those statements must be wrong. I think it requires some inconsistency to claim F(Wa+X) > F(Wa) but F(O(Wa+X)) < F(O(Wa)) with a single function, though it is consistent to claim F1(Wa+X) > F1(Wa) while F2(O(Wa+X)) < F2(O(Wa)) for two different functions, which is relatively common (Pascal's Wager comes to mind).
It was confusing me, yes. I considered hedons exactly equivalent to utilons.
Then you made your excellent case, and now it no longer confuses me. I revised my definition of happiness from "reality matching the utility function" to "my perception of reality matching the utility function" - which it should have been from the beginning, in retrospect.
I'd still like to know if people see happiness as something other than my new definition, but you have helped me from confusion to non-confusion, at least regarding the presence of a distinction, if not the exact nature thereof.
Well, the situation I was referencing assumed that the babies were never actually sentient at any point, but that's not relevant to the actual situation. You're saying that my expected future utility functions, in the end, are just more values in my current function?
I can accept that.
The problem now is that I can't tell what those values are. It seems there's a number N large enough that if N people were to be reconfigured to heavily value a situation and the situation was then to be implemented, I'd accept the reconfiguration. This was counterintuitive and, out of habit, still feels like it should be, but it makes a surprising amount of sense.
Oh! I didn't catch that at all. I apologize.
You've made an excellent case for them not being the same. I agree.
That makes sense. I had only looked at the difference within "things that affect my choices", which is not a full representation of things. Could I reasonably say, then, that hedons are the intersection of "utilons" and "things of which I'm aware", or is there more to it?
Another way of phrasing what I think you're saying: "Utilons are where the utility function intersects with the territory, hedons are where the utility function intersects with the map."
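A minimal sketch of that phrasing, with the world states, the observation function O, and the utility function F all made up for illustration:

```python
# Made-up illustration: F scores a world-description, O is what I actually
# perceive of a world. "Utilons" = F(territory); "hedons" = F(map) = F(O(world)).

def F(description):
    """Hypothetical utility function: I value my friend being well."""
    return 10 if description.get("friend_is_well", False) else 0

def O(world):
    """Observation: only the keys I actually perceive make it onto my map."""
    return {k: v for k, v in world.items() if k in world["perceived"]}

# Her situation improves, but I never hear about it:
world = {"friend_is_well": True, "perceived": ()}
print(F(world), F(O(world)))   # 10 0  -> utilons without hedons

# Her situation improves and I find out:
world = {"friend_is_well": True, "perceived": ("friend_is_well",)}
print(F(world), F(O(world)))   # 10 10 -> both
```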
And if I'm "best at" creating dissonance, hindering scientific research, or some other negatively-valued thing? If I should do the thing at which I'm most effective, regardless of how it fits my utility function...
I don't know where that's going. I don't feel that's a positive thing, but that's inherent in the proposition that it doesn't fit my utility function.
I guess I'm trying to say that "wasting my life" has a negative value with a lower absolute value than "persuading humanity to destroy itself" - though oratory is definitely not my best skill, so it's not a perfect example.
If I had some reason (say an impending mental reconfiguration to change my values) to expect my utility function to change soon and stay relatively constant for a comparatively long time after that, what does "maximizing my utility function now" look like? If I were about to be conditioned to highly-value eating babies, should I start a clone farm to make my future selves most happy, or should I kill myself in accordance with my current function's negative valuation of that action?
My utility function maximises (and I think this is neither entirely nonsensical nor entirely trivial in the context) utilons. I want my future selves to be "happy", which is ill-defined.
I don't know how to say this precisely, but I want as many utilons as possible from as many future selves as possible. The problem arises when it appears that actively changing my future selves' utility functions to match their worlds is the best way to do that, but my current self recoils from the proposition. If I shut up and multiply, I get the opposite result that Eliezer does and I tend to trust his calculations more than my own.
Thanks for pointing that out! The general questions still exist, but the particular situation produces much less anxiety with the knowledge that the two functions have some similarities.
I'm not sure what you're asking, but it seems to be related to constancy.
A paperclip maximizer believes maximum utility is gained through maximum paperclips. I don't expect that to change.
I have at various times believed:
- Belief in (my particular incarnation of) the Christian God had higher value than lack thereof
- Personal employment as a neurosurgeon would be preferable to personal employment as, say, a mathematics teacher
- Nothing at all was positively valued, and the negative value of physical exertion significantly outweighed any other single value
Given the changes so far, I have no reason to believe my utility function won't change in the future. My current utility function values most of my actions under previous functions negatively, meaning that per instantiation (per unit time, per approximate "me", etc.) the result is negative. Surely this isn't optimal?
I would not have considered utilons to have meaning without my ability to compare them in my utility function.
You're saying utilons can be generated without your knowledge, but hedons cannot? Does that mean utilons are a measure of reality's conformance to your utility function, while hedons are your reaction to your perception of reality's conformance to your utility function?
The hedonic scores are identical and, as far as I can tell, the outcomes are identical. The only difference is if I know about the difference - if, for instance, I'm given a choice between the two. At that point, my consideration of 2 has more hedons than my consideration of 1. Is that different from saying 2 has more utilons than 1?
Is the distinction perhaps that hedons are about now while utilons are overall?
Hm. This is true. Perhaps it would be better to say "Perceiving states in opposite-to-conventional order would give us reason to assume probabilities entirely consistent with considering a causality in opposite-to-conventional order."
Unless I'm missing something, the only reason to believe causality goes in the order that places our memory-direction before our non-memory direction is that we base our probabilities on our memory.
Well, Eliezer seems to be claiming in this article that the low-to-high ordering is more valid than the high-to-low one, but I don't see how they're anything but both internally consistent.
I can only assume it wouldn't accept. A paperclip maximizer, though, has much more reason than I do to assume its utility function would remain constant.
I've read this again (along with the rest of the Sequence up to it) and I think I have a better understanding of what it's claiming. Inverting the axis of causality would require inverting the probabilities, such that an egg reforming is more likely than an egg breaking. It would also imply that our brains contain information on the 'future' and none on the 'past', meaning all our anticipations are about what led to the current state, not where the current state will lead.
All of this is internally consistent, but I see no reason to believe it gives us a "real" direction of causality. As far as I can tell, it just tells us that the direction we calculate our probabilities is the direction we don't know.
Going from a low-entropy universe to a high-entropy universe seems more natural, but only because we calculate our probabilities in the direction of low-to-high entropy. If we based our probabilities on the same evidence perceived in the opposite direction, it would be low-to-high that seemed to need universes discarded and high-to-low that seemed natural.
...right?
The basic point of the article seems to be "Not all utilons are (reducible to) hedons", which confuses me from the start. If happiness is not a generic term for "perception of a utilon-positive outcome", what is it? I don't think all utilons can be reduced to hedons, but that's only because I see no difference between the two. I honestly don't comprehend the difference between "State A makes me happier than state B" and "I value state A more than state B". If hedons aren't exactly equivalent to utilons, what are they?
An example might help: I was arguing with a classmate of mine recently. My claim was that every choice he made boiled down to the option which made him happiest. Looking back on it, I meant to say it was the option whose anticipation gave him the most happiness, since making choices based on the result of those choices breaks causality. Anyway, he argued that his choices were not based on happiness. He put forth the example that, while he didn't enjoy his job, he still went because he needed to support his son. My response was that while his reaction to his job as an isolated experience was negative, his happiness from {job + son eating} was more than his happiness from {no job + son starving}.
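With invented numbers, the point is just that the job scores negatively in isolation while the bundle containing it scores highest:

```python
# Invented utilities for the two bundles my classmate was actually choosing between.
u_job, u_no_job = -5, 0
u_son_eats, u_son_starves = 50, -100

happiness_with_job = u_job + u_son_eats           # 45
happiness_without_job = u_no_job + u_son_starves  # -100

print(happiness_with_job > happiness_without_job)  # True: he keeps going to work
```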
I thought at the time that we were disagreeing about basic motivations, but this article and its responses have caused me to wonder if, perhaps, I don't use the word 'happiness' in the standard sense.
To give a hyperbolic thought exercise: If I could choose between all existing minds (except mine, to make the point about relative values) experiencing intense agony for a year and my own death, I think I'd be likely to choose my death. This is not because I expect to experience happiness after death, but because considering the state of the universe in the second scenario brings me more happiness than considering the state of the universe in the first. As far as I can tell, this is exactly what it means to place a higher value on the relative pleasure and continuing functionality of all-but-one mind than on my own continued existence.
To anyone who argues that utilons aren't exactly equivalent to hedons (either that utilons aren't hedons or that utilons are reducible to hedons), please explain to me what you (and my sudden realisation that you exist allows me to realise you seem amazingly common) think happiness is.
I don't see why you need to count the proportional number of Eliezers at all. I'm guessing the reason you expect an ordered future isn't because of the relation of {number of Boltzmann Eliezers}/{number of Earth Eliezers} to 1. It seems to me you expect an orderly future because you (all instances of you and thus all instances of anything that is similar enough to you to be considered 'an Eliezer') have memories of an orderly past. These memories could have sprung into being when you did a moment ago, yes, but that doesn't give you any other valid way to consider things. Claiming you're probabilistically not a Boltzmann Eliezer because you can count the Boltzmann Eliezers assumes you have some sort of valid data in the first place, which means you're already assuming you're not a Boltzmann Eliezer.
You anticipate experiencing the future of Earth Eliezer because it's the only future out of unconsiderably-many that has enough definition for 'anticipation' to have any meaning. If sprouting wings and flying away, not sprouting wings but still flying away, sprouting wings and crashing, and not sprouting wings and teleporting to the moon are all options with no evidence to recommend one over another, what does it even mean to expect one of them? Then add to that a very large number of others - I don't know how many different experiences are possible given a human brain (and there's no reason to assume a Boltzmann brain that perceives itself as you do now necessarily has a human-brain number of experiences) - and you have no meaningful choice but to anticipate Earth Eliezer's future.
Unless I'm missing some important part of your argument, it doesn't seem that an absolute count of Eliezers is necessary. Can't you just assume a future consistent with the memories available to the complex set of thought-threads you call you?
I realise I'm getting to (and thus getting through) this stuff a lot later than most commenters. Having looked, though, I can't find any information on post-interval etiquette or any better place to attempt discussion of the ideas each post/comment produces and, as far as I can tell, the posts are still relevant. If I'm flouting site policy or something with my various years-late comments, I'm sorry - please let me know so I know to stop.
The issue with polling 3^^^3 people is that once they are all aware of the situation, it's no longer purely (3^^^3 dust specks) vs (50yrs torture). It becomes (3^^^3 dust specks plus 3^^^3 feelings of altruistically having saved a life) vs (50yrs torture). The reason most of the people polled would accept the dust speck is not because their utility of a speck is more than 1/3^^^3 their utility of torture. It's because their utility of (a speck plus feeling like a lifesaver) is more than their utility of (no speck plus feeling like a murderer).
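With invented magnitudes for a single polled person, the altruism term is what flips the comparison:

```python
# Invented numbers for a single polled person.
u_speck = -1                    # disutility of one dust speck
u_torture_share = -1e-30        # that person's ~1/3^^^3 share of the torture (effectively zero)
u_feel_like_lifesaver = 1000    # feeling of having spared someone 50 years of torture
u_feel_like_murderer = -1000    # feeling of having chosen the torture instead

# The bare trade the thought experiment intends:
print(u_speck > u_torture_share)  # False: one speck outweighs a ~zero share of the torture

# The trade a fully informed, polled person actually faces:
print(u_speck + u_feel_like_lifesaver > u_torture_share + u_feel_like_murderer)  # True
```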
I may misunderstand your meaning of "warm fuzzies", but I find I obtain significant emotional satisfaction from mathematics, music, and my social interactions with certain people. I see no reason to believe that people receive some important thing from the fundamental aspects of religion that cannot be obtained in less detrimental ways.
I acknowledge the legitimacy of demanding I google the phrase before requesting another link and will attempt to increase the frequency with which that's part of my response to such an occasion, but maintain the general usefulness of pointing out a broken link in a post, especially one that's part of a Sequence.
The Jesus Camp link is broken. Does anyone have an alternative? I don't know what Eliezer is referencing there.
The ideal point of a police system (and, by extension, a police officer) is to choose force in such a way as to "minimize the total sum of death".
It appears that you believe that the current police system is nothing like that, while Eliezer seems to believe it is at least somewhat like that. While I don't have sufficient information to form a realistic opinion, it seems to me highly improbable that 95% of police actions are initiations of force or that every police officer chooses every day to minimize total sum of death.
The largest issue here is that Eliezer is focusing on "force chosen to minimize death" and you're focusing on "people in blue uniforms". While both are related to the ideal police system, they are not sufficiently similar to each other for an argument between them to make much sense.
I'm not sure I understand, but are you saying there's a reason to view a progression of configurations in one direction over another? I'd always (or at least for a long time) essentially considered time a series of states (I believe I once defined passage of time as a measurement of change), basically like a more complicated version of, say, the graph of y=ln(x). Inverting the x-axis (taking the mirror image of the graph) would basically give you the same series of points in reverse, but all the basic rules would be maintained - the height above the x-axis would always be the natural log of the x-value. Similarly, inverting the progression of configurations would maintain all physical laws. This seems to me to fit all your posts on time up until this one.
This one, though, differs. Are you claiming in this post that one could invert the t-axis (or invert the progression of configurations in the timeless view) and obtain different physical laws (or at least violations of the ones in our given progression)? If so, I see a reason to consider a certain order to things. Otherwise, it seems that, while we can say y=ln(x) is "increasing" or describe a derivative at a point, we're merely describing how the points relate to each other when ordered by increasing x-value, rather than claiming that the value of ln(5) somehow depends on the value of ln(4.98), as opposed to both merely depending on the definition of the function. We can use derivatives to determine temporally local configurations just as we can use derivatives to approximate x-local function values. But as far as I can tell it is, in the end, a configuration A that happens to define brains containing some information on another configuration B, which defined brains containing information on some configuration C, so we say C happened, then B, then A. It's just like the analogy: we have a set of points with no inherent order, and we read them in order of increasing x-values (which we generally place left-to-right), but the set isn't inherently ordered that way - it's just a set of y-values that depend on their respective x-values.
Short version: Are you saying there's a physical reason to order the configurations C->B->A other than that A contains memories of B containing memories of C?