Your Utility Function is Your Utility Function

post by David Udell · 2022-05-06T07:15:24.439Z · LW · GW · 17 comments

Spoilers for mad investor chaos and the woman of asmodeus (planecrash Book 1).

The Watcher spoke on, then, about how most people have selfish and unselfish parts - not selfish and unselfish components in their utility function, but parts of themselves in some less Law-aspiring way than that.  Something with a utility function, if it values an apple 1% more than an orange, if offered a million apple-or-orange choices, will choose a million apples and zero oranges.  The division within most people into selfish and unselfish components is not like that, you cannot feed it all with unselfish choices whatever the ratio.  Not unless you are a Keeper, maybe, who has made yourself sharper and more coherent; or maybe not even then, who knows?  For (it was said in another place) it is hazardous to non-Keepers to know too much about exactly how Keepers think.

It is dangerous to believe, said the Watcher, that you get extra virtue points the more that you let your altruistic part hammer down the selfish part.  If you were older, said the Watcher, if you were more able to dissect thoughts into their parts and catalogue their effects, you would have noticed at once how this whole parable of the drowning child, was set to crush down the selfish part of you, to make it look like you would be invalid and shameful and harmful-to-others if the selfish part of you won, because, you're meant to think, people don't need expensive clothing - although somebody who's spent a lot on expensive clothing clearly has some use for it or some part of themselves that desires it quite strongly.

--Eliezer Yudkowsky, planecrash

I've been thinking a lot lately about exactly how altruistic I am. The truth is that I'm not sure: I care a lot about not dying, and about my girlfriend and family and friends not dying, and about all of humanity not dying, and about all life on this planet not dying too. And I care about the glorious transhuman future and all that, and the astronomically many (or whatever) possible good future lives hanging in the balance.

And I care about some of these things disproportionately to their apparent moral magnitude. But, what I care about is what I care about. Rationality is the art of getting more of what you want, whatever that is; of systematized winning, by your own lights. You will totally fail in that art if you bulldoze your values in a desperate effort to fit in, or to be a "good" person, in the way your model of society seems to ask you to. What you ought to do instead is protect your brain's balance of undigested value-judgements: be corrigible to the person you will eventually, on reflection, grow up to be. Don't rush to lock in any bad, "good"-sounding values now; you are allowed to think for yourself [LW · GW] and discover what you stably value.

It is not the Way to do what is "right," or even to do what is "right" instrumentally effectively. The Way is to get more of what you want and endorse on reflection, whatever that ultimately is, through instrumental efficacy. If you want that, you'll have to protect the kernel encoding those still-inchoate values, in order to ever-so-slowly tease out what those values are. How you feel is your only guide to what matters. [LW · GW] Eventually, everything you care about could be generated from that wellspring. [LW · GW]

17 comments

Comments sorted by top scores.

comment by Quintin Pope (quintin-pope) · 2022-05-06T09:24:37.801Z · LW(p) · GW(p)

Well put. If you really do consist of different parts, each wanting different things, then your values should derive from a multi-agent consensus among your parts, not just an argmax over the values of the different parts.

In other words, this:

Something with a utility function, if it values an apple 1% more than an orange, if offered a million apple-or-orange choices, will choose a million apples and zero oranges. The division within most people into selfish and unselfish components is not like that, you cannot feed it all with unselfish choices whatever the ratio. Not unless you are a Keeper, maybe, who has made yourself sharper and more coherent

seems like a very limited way of looking at “coherence”. In the context of multi-agent negotiations, becoming “sharper and more coherent” should equate to having an internal consensus protocol that comes closer to the Pareto frontier of possible multi-agent equilibria.

Technically, “allocate all resources to a single agent” is a Pareto optimal distribution, but it’s only possible if a single agent has an enormously outsized influence on the decision making process. A person for whom that is true would, I think, be incredibly deranged and obsessive. None of my parts aspire to create such a twisted internal landscape.

I instead aspire to be the sort of person whose actions both reflect a broad consensus among my individual parts and effectively implement that consensus in the real world. Think results along the lines of the equilibrium that emerges from superrational agents exchanging influence [LW · GW], rather than some sort of “internal dictatorship” where one part infinitely dominates all the others.
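For concreteness, a minimal toy sketch of that contrast in Python (the proportional-influence rule and all the numbers here are illustrative assumptions, not the superrational bargaining equilibrium itself):

```python
# Toy contrast: a single argmax-style utility function versus a simple
# consensus rule among internal "parts". All numbers are made up.

N_CHOICES = 1_000_000

def pure_argmax(u_apple=1.01, u_orange=1.00, n=N_CHOICES):
    """A coherent maximizer that values apples 1% more takes the apple
    every single time: a million apples, zero oranges."""
    apples = n if u_apple > u_orange else 0
    return apples, n - apples

def proportional_consensus(weights, prefs, n=N_CHOICES):
    """Split the choices in proportion to the parts' aggregate preference
    strengths, so no part is starved even if it is slightly outweighed.
    prefs[i] = (utility of an apple, utility of an orange) for part i."""
    apple_score = sum(w * a for w, (a, _) in zip(weights, prefs))
    orange_score = sum(w * o for w, (_, o) in zip(weights, prefs))
    apples = round(n * apple_score / (apple_score + orange_score))
    return apples, n - apples

print(pure_argmax())  # (1000000, 0)

parts = [(1.01, 1.00),   # one part slightly prefers apples
         (0.20, 1.00)]   # another part strongly prefers oranges
print(proportional_consensus([0.5, 0.5], parts))  # a mixed bundle, roughly (376947, 623053)
```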

Replies from: Slider
comment by Slider · 2022-05-07T20:23:16.485Z · LW(p) · GW(p)

So policy debates should appear one-sided? Wouldn't a consensus protocol be "duller" in that it takes fewer actions than one that didn't abide by a consensus?

Replies from: quintin-pope
comment by Quintin Pope (quintin-pope) · 2022-05-08T00:16:48.581Z · LW(p) · GW(p)

The result about superrational agents was only demonstrated for superrational agents. That means agents which implement, essentially, the best of all possible decision theories. So they cooperate with each other and have all the non-contradictory properties that we want out of the best possible decision theory, even if we don't currently know how to specify such a decision theory.

It's a goal to aspire to, not a reality that's already been achieved.

Replies from: Slider
comment by Slider · 2022-05-08T13:36:32.875Z · LW(p) · GW(p)

Simplifying even further, an answer of "Don't have internal conflict" to "How do I deal with internal conflict?" is pretty much correct and unhelpful.

I thought there was a line favoring consensus as an approach to conflict situations, something along the lines of: "How do I deal with internal conflict?" "In conflict you are outside the Pareto frontier, so you should do nothing, as there is nothing mutually agreeable to be found," or "cooperate to the extent that mutual agreement exists, and then do nothing past the point where true disagreement starts."

comment by Dagon · 2022-05-06T16:46:47.405Z · LW(p) · GW(p)

Humans (at least most of us, including all I've talked deeply with) don't actually have utility functions which are consistent over time (nor reflectively consistent at points in time, but that's a different point).  You almost certainly have preference curves over multiple timeframes and levels of abstraction.

You can choose to reinforce some conceptions of your preference sets and suppress or override others, and over time these will become habit, and your "true" utility function will be different from what you used to think.

You probably can't fully rewrite yourself completely arbitrarily, but you CAN over time "become better" by adjusting your instincts and your interpretation of experiences to more closely match your intellectual meta-preferences.

comment by Slider · 2022-05-06T10:06:41.554Z · LW(p) · GW(p)

The story involves a lot of "looking down" on people for being ineffective out of confusion or underdevelopment. There is probably value in being aware of your value structure and not being in error about it, but it also feels like other agents might get categorized as confused when they simply have a value structure drawn along different lines than yours.

Say that I care about wormless apples a lot and don't like wormy apples. I encounter a lot of choices between an apple and an orange. Say that every time the apple has a worm, I pick the orange. From the perspective of an outsider (who might not have similar worm-detection abilities) it can seem that 100 times out of 1000 I go for the orange instead of the apple. That outsider might then be tempted to think that I am not following a utility function, but am being confused or contradictory in my fruit buying. But in my own terms and perception, I am. And even if I only had an unconscious ugh field around worms that I non-conceptually intuit are there, would that unawareness of the function make me not follow that function?
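A toy illustration of this (the 10% worm rate and the utilities are made-up numbers): a chooser who conditions on a feature the observer can't see looks inconsistent when you only record which fruit was picked.

```python
# Toy illustration: a chooser whose utility depends on a hidden feature
# (worms) looks "incoherent" to an observer who only sees apple vs orange.

import random

random.seed(0)

def my_choice(apple_has_worm):
    u_apple = 0.0 if apple_has_worm else 1.01   # wormy apples are worthless to me
    u_orange = 1.00
    return "orange" if u_orange > u_apple else "apple"

# Apples have worms about 10% of the time (made-up rate).
picks = [my_choice(apple_has_worm=(random.random() < 0.10)) for _ in range(1000)]

# The outsider, who can't detect worms, just sees roughly 100 oranges out
# of 1000 apple-vs-orange choices and suspects I have no utility function.
print(picks.count("orange"), "oranges out of", len(picks))
```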

So if people whose values are imperceptibly complex and fragile look, to me, the same as confused people, then going around judging everybody who doesn't make sense in my terms can turn into trying to hammer people from unfamiliar shapes into familiar shapes, which could be a form of xenophobia. I guess in Keltham's situation, assuming that anybody different is lesser can make sense. And in the area of sexuality, Keltham is not sure whether his world is strictly superior. But what about the areas where the comparison is not easy? The general pattern of Keltham learning about Cheliax is that "why are people this stupid?" gets explained by a bad environment. Markets are screwy because there are no good roads, not simply because traders are stupid. That tends to make fixing the situation more pressing, but less judgmental toward the actors. Infrastructure that Keltham just took for granted is revealed to have conditions for its existence.

In the story, the "societally right" person in Cheliax is expected to be Evil (which has a narrower meaning in the setting). The bashing and shaming of people's Good parts would work through the same logic. There is a difference between conceiving of rationality as "the art of getting more of what you want" and "the art of getting more of what you *want*". The point here is more about "expected to want" vs "actually want", and not about "others want" vs "I want".

comment by Jasnah Kholin (Jasnah_Kholin) · 2023-07-25T06:35:49.772Z · LW(p) · GW(p)

One of the things that I searched for in EA and didn't find, but think should exist: an algorithm, or algorithms, to decide how much to donate, as a personal-negotiation thing.

There is Scott Alexander's post about 10% as a Schelling point and a way to placate anxiety, and there is the Giving What We Can calculation, but neither has anything to do with personal values.

I want an algorithm that is about introspection - about not smashing your altruistic and utilitarian parts, but not smashing the other parts either; about finding what number is the right number for me, by my own Utility Function.

and I just... didn't find those discussions.
 

In dath ilan, where people expect to be able to name a price for more or less everything, and have done extensive training to give the same answer to the questions 'how much would you pay to get this extra?', 'how much additional payment would you forgo to get this extra?', 'how much would you pay to avoid losing this?', and 'how much additional payment would you demand if you were losing this?', there are answers.
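As a minimal sketch of what such an exercise could even look like (the question framing and the 20% tolerance are my own assumptions, not anything canonical from dath ilan or EA):

```python
# Toy sketch of the four-question exercise: elicit the four prices and
# flag how far apart they are. The 20% tolerance is an arbitrary assumption.

def price_consistency(pay_to_get, forgo_to_get, pay_to_keep, demand_if_losing,
                      tolerance=0.20):
    answers = {
        "pay to get it": pay_to_get,
        "forgo to get it": forgo_to_get,
        "pay to avoid losing it": pay_to_keep,
        "demand if losing it": demand_if_losing,
    }
    lo, hi = min(answers.values()), max(answers.values())
    spread = (hi - lo) / hi if hi else 0.0
    return {"answers": answers, "spread": spread, "consistent": spread <= tolerance}

# Someone who would pay $50 to gain a thing but demand $200 to give it up
# has a 75% spread to reconcile before they can name a single price.
print(price_consistency(50, 60, 55, 200))
```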
 

What is the EA analog? How much would I be willing to pay if my parents would never learn about it? If I could press a button and pay 1% more in taxes that would go to top GiveWell charities, but without all the second-order effects except the money, what number would I choose? What if negative numbers were allowed? What about the creation of a city with rules of its own, that takes taxes for EA causes - how much would I accept then?
 

Where are the "how to figure out how much money you want to donate in a Lawful way?" exercises?
 

Or maybe it's because far too many people prefer, and try, to have their thinking, logical part win the internal battle against the other, more egotistical ones?
 

Where are all the posts about "how to find out what you really care about in a Lawful way"? The closest things I have found are Internal Double Crux and the Multi-agent Model of the Soul and all its versions. But where are my numbers?
 

comment by Joey KL (skluug) · 2022-05-06T21:40:53.307Z · LW(p) · GW(p)

This kind of thing seems totally backwards to me. In what sense do I lose if I "bulldoze my values"? It only makes sense to describe me as "having values" insofar as I don't do things like bulldoze them! It seems like a way to pretend existential choices don't exist--just assume you have a deep true utility function, and then do whatever maximizes it.

Why should I care about "teasing out" my deep values? I place no value on my unknown, latent values at present, and I see no reason to think I should!

Replies from: David Udell
comment by David Udell · 2022-05-06T22:08:19.100Z · LW(p) · GW(p)

You might not value growing and considering your values, but that would make you pretty unusual, I think. Most people think that they have a pretty-good-but-revisable understanding of what it is that they care about; e.g., people come to value life relatively more than they did before after witnessing the death of a family member for the first time. Most people seem open to having their minds changed about what is valuable in the world.

If you're not like that, and you're confident about it, then by all means lock in your values now against your future self.

I don't do that because I think I might be wrong about myself, and think I'll settle into a better understanding of myself after putting more work into it.

Replies from: skluug
comment by Joey KL (skluug) · 2022-05-06T22:54:15.266Z · LW(p) · GW(p)

I feel like you say this because you expect your values-upon-reflection to be good by the lights of your present values--in which case, you're not so much valuing reflection as just enacting your current values.

If Omega told me if I reflected enough, I'd realize what I truly wanted was to club baby seals all day, I would take action to avoid ever reflecting that deeply!

It's not so much that I want to lock in my present values as it is I don't want to lock in my reflective values. They seem equally arbitrary to me.

comment by TAG · 2022-05-06T12:28:27.028Z · LW(p) · GW(p)

The Way is to get more of what you want

Even if it destroys value for other people? If not, why not?

Replies from: dkirmani, AnthonyC
comment by dkirmani · 2022-05-06T17:00:58.267Z · LW(p) · GW(p)

Yup. Rationality is orthogonal to values.

Replies from: SamuelKnoche
comment by SamuelKnoche · 2022-05-10T07:44:24.413Z · LW(p) · GW(p)

Worth noting, though, that you should expect those who are most successful at getting more of what they want to have a utility function with a big overlap with the utility function of others, and to be able to credibly commit to not destroying value for other people. We live in a society.

Replies from: Dagon
comment by Dagon · 2022-05-12T20:48:49.230Z · LW(p) · GW(p)

a utility function with a big overlap with the utility function of others,

Or at least to be pretty good at appearing that way, while privately defecting to increase their own control of resources.

comment by AnthonyC · 2022-05-07T13:59:01.748Z · LW(p) · GW(p)

Does what I want include wanting other people to get what they want? Often, yes. Not always. I want my family and friends and pets to get more of what they specifically want, and a large subset of strangers to mostly get more of what they want. But there are people who want to hurt others, and people who want to hurt themselves, and people who want things that reduce the world's capacity to support people getting what they want, and I don't want those people to get what they want. This will create conflict between agents, often value-destroying conflict, and zero- or negative-sum competitions, which I nevertheless will sometimes participate in and try to win.

comment by TAG · 2022-05-06T12:21:23.789Z · LW(p) · GW(p)

First of all, you need to decide whether you are an instrumental rationalist or an epistemic rationalist.[*] If you are an epistemic rationalist, you need to figure out whether moral realism is true. Because if moral realism is true, that defines what you (really) should do.

[*] Because there is more than one Way.

Replies from: dkirmani
comment by dkirmani · 2022-05-06T16:58:19.852Z · LW(p) · GW(p)

Nah. Epistemic rationality is instrumentally convergent.