Can you define "utility" in utilitarianism without using words for specific human emotions?

alex_lw

Can you define "utility" in utilitarianism without using words for specific human emotions?

post by SurvivalBias (alex_lw) · 2022-09-21T03:29:34.261Z · LW · GW · No comments

This is a question post.

  Answers
    15 Richard_Kennaway
    13 oumuamua
    1 Dagon
    0 Viktor Rehnberg
None
No comments

I'm trying to get a slightly better grasp of utilitarianism as it is understood in rat/EA circles, and here's my biggest confusion at the moment.

How do you actually define "utility", not in the sense of how to compute it, but in the sense of specifying wtf are you even trying to compute? People talk about "welfare", "happiness" or "satisfaction", but those are intrinsically human concepts and most people seem to assume non-human agents at least in theory can have utility. So let's taboo those words, and all other words referring to specific human emotions (you can still use the word "human" or "emotion" itself if you have to). Caveats:

Your definition should exclude things like AlphaZero or a $50 robot toy following a lights spot.
If you use the word "sentient" or synonyms, provide at least some explanation of what do you mean by it.

If the answer is different for different flavors of utilitarianism, please clarify which one(s) your definition(s) apply to.

Alternatively, if "utility" is defined in human terms by design, can you explain what is the supposed process for mapping internal states of those non-human agents into human terms?

Answers

answer by Richard_Kennaway · 2022-09-21T17:49:23.710Z · LW(p) · GW(p)

"Utilitarianism" has two different, but related meanings. Historically, it generally means "the morally right action is the action that produces the most good", or as Bentham put it, "the greatest amount of good for the greatest number". Leave aside for the moment that this ignores the tradeoff between how much good and how many people, and exactly what the good is. Bentham and like-minded thinkers mean by "good" things like material well-being, flourishing, "happiness", and so on. They are pointing in a certain direction, even if a bit vaguely. Utilitarianism in this sense is about people, and its conception of the good consists of what humans generally want. It is necessarily expressed in terms of human concepts, because that is what it is about.

The other thing that the word "utilitarianism" has become used for is the thing that various theorems prove can be constructed from a preference relation satisfying certain axioms. Von Neumann and Morgenstern are the usual names mentioned, but there are also Savage, Cox, and others. Collectively, these are, as Eliezer has put it [LW · GW], "multiple spotlights all shining on the same core mathematical structure". The theory is independent of of any specific preference relation and of what the utility function determined by those preferences comes out to be. (ETA: This use of the word might be specific to the rationalist community. "Utility theory" is I think the more widely used term. Accordingly I've replaced "VNMU" by VNMUT" below.)

To distinguish these two concepts I shall call them "Benthamite utilitarianism" and "Von Neuman-Morgenstern utility theory", or BU and VNMUT for short. How do they relate to each other, and what does either have to say about AI?

BU has a specific notion of the individual good. VNMUT does not. VNMUT is concerned only with the structure of the preference relation, not its content. In VNMUT, the preference relation is anything satisfying the axioms; in BU it is a specific thing, not up for grabs, described by words such as "welfare", "happiness", or "satisfaction".

By analogy: BU is like studying the structure of some particular group, such as the Monster Group, while VNMUT is like group theory, which studies all groups and does not care where they came from or what they are used for.

VNMUT is made of theorems. BU is not. BU contains no mathematical structure to elucidate what is meant by "the greatest good for the greatest number". The slogan is a rallying call, but leaves many hard decisions to be made.
Neither BU nor VNMUT have a satisfactory concept of collective good. BU is silent about the tradeoff between the greatest good and the greatest number. There is no generally agreed on extension of VNMUT to mathematically construct a collective preference relation or utility function. There have been many attempts, on both the practical side (BU) and the theoretical side (VNMUT), but the body of such work does not have the coherence of those "multiple spotlights all shining on the same core mathematical structure". The differing attitudes we observe to the Repugnant Conclusion illustrate the lack of consensus.

What do either of these have to do with AI?

If a program is trained to produce outputs that maximise some objective function, that value is at least similar to a utility in the VNMUT sense, although it is not derived from a preference relation. The utility (objective function) is primitive and a preference relation can be derived from it: the program "prefers" a higher value to a lower.

As for BU, whether a program optimises for the human good is up to what its designers choose to have it optimise. Optimise for deadly poisons and that may be what you get. (I don't know if anyone has experimented with the compounds that that experiment suggested, although it seems to me quite likely that some military lab somewhere is doing so, if they weren't already.) Optimise for peace and love, and maybe you get something like that, or maybe you end up painting smiley faces onto everything. The AI itself is not feeling or emoting. Its concepts of "welfare", "happiness", or "satisfaction", such as they are, are embodied in the training procedure its programmers used to judge its outputs as desired or undesired.

↑ comment by M. Y. Zuo · 2022-10-03T20:46:05.241Z · LW(p) · GW(p)

How can we know conclusively that 'The AI itself is not feeling or emoting.'?

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2022-10-03T20:58:08.182Z · LW(p) · GW(p)

"Conclusively" is doing too much work there. Do you attribute feelings or emotions to current AIs? I deny them on the same grounds as I deny them to any of the other software I use, and to rocks. I say current AIs, because that is what I had in mind, and because there would be no point in arguing "But suppose someone did make an AI with emotions! Then it would have emotions!"

Replies from: M. Y. Zuo

↑ comment by M. Y. Zuo · 2022-10-03T22:31:37.735Z · LW(p) · GW(p)

If the objectionable word is removed:

How can we know - that 'The AI itself is not feeling or emoting.'?

Would you have a different answer?

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2022-10-04T07:36:22.019Z · LW(p) · GW(p)

I just gave my answer. For more, there's this from my recent ding-dong with Signer [LW(p) · GW(p)]. Briefly, in the absence of any method of actually detecting and measuring consciousness (a concept in which I include feelings and emotions), a consciousnessometer, we must fall back on the experiences that give rise to the very concept, on the basis of which we attribute consciousness to people besides ourselves, and to some extent other animals. On that basis I see no reason to attribute it to any extant piece of software.

Replies from: M. Y. Zuo

↑ comment by M. Y. Zuo · 2022-10-12T18:16:19.262Z · LW(p) · GW(p)

Briefly, in the absence of any method of actually detecting and measuring consciousness (a concept in which I include feelings and emotions), a consciousnessometer, we must fall back on the experiences that give rise to the very concept, on the basis of which we attribute consciousness to people besides ourselves, and to some extent other animals.

That seems like a less popular understanding.

Why must consciousness include 'feelings' and 'emotions'?

If someone has their portion of the brain responsible for emotional processing damaged, do they become less conscious?

Merriam-webster also lists that as number 2 in their dictionary, and a different definition in the number one position:

Definition of consciousness
1a: the quality or state of being aware especially of something within oneself
b: the state or fact of being conscious of an external object, state, or fact
c: AWARENESSespecially : concern for some social or political cause The organization aims to raise the political consciousness of teenagers.
2: the state of being characterized by sensation, emotion, volition, and thought : MIND
3: the totality of conscious states of an individual
4: the normal state of conscious liferegained consciousness
5: the upper level of mental life of which the person is aware as contrasted with unconscious processes

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2022-10-13T07:21:22.610Z · LW(p) · GW(p)

Why must consciousness include 'feelings' and 'emotions'?

If they are present, they are part of consciousness. They are included in the things of which one is aware within oneself (and in item 2 of the definition you quote). I did not intend any implication that they must be present, for consciousness to be present.

answer by oumuamua · 2022-09-21T09:09:22.275Z · LW(p) · GW(p)

People talk about "welfare", "happiness" or "satisfaction", but those are intrinsically human concepts

No, they are not. Animals can feel e.g. happiness as well.

If you use the word "sentient" or synonyms, provide at least some explanation of what do you mean by it.

Something is sentient if being that thing is like something. For instance, it is a certain way to be a dog, so a dog is sentient. As a contrast, most people who aren't panpsychists do not believe that it is like anything to be a rock, so most of us wouldn't say of a rock that it is sentient.

Sentient beings have conscious states, each of which are (to a classical utilitarian) desirable to some degree (which might be negative, of course). That is what utilitarians mean by "utility": The desirability of a certain state of consciousness.

I expect that you'll be unhappy with my answer, because "desirability of a certain state of consciousness" does not come with an algorithm for computing that, and that is because we simply do not have an understanding of how consciousness can be explained in terms of computation.

Of course having such an explanation would be desirable, but its absence doesn't render utilitarianism meaningless, because humans still have an understanding of what approximately we mean by terms such as "pleasure", "suffereing", "happiness", even if it is merely in a "I know it when I see it" kind of way.

↑ comment by SurvivalBias (alex_lw) · 2022-09-22T02:03:30.587Z · LW(p) · GW(p)

No, they are not. Animals can feel e.g. happiness as well.

Yeah but the problem here is that we perceive happiness in animals only in as much as it looks like our own happiness. Did you notice that the closer an animal to a human the more likely we are to agree it can feel emotions? An ape can definitely display something like a human happiness, so we're pretty sure it can experience it. A dog can display something mostly like human happiness so most likely they can feel it too. A lizard - meh, maybe but probably not. An insect, most people would say no. Maybe I'm wrong and there's an argument that animals can experience happiness which is not based on their similarity to us, in that case I'm very curious to see this argument.

Sentience

For the record, I believe we do have at least crude mechanistic model of how consciousness works in general, and yes what's with the hard problem of consciousness in particular (the latter being a bit of a wrong question [LW · GW]).

Otherwise, I actually think it somewhat answers my question. One my qualm would be that sentience does seem to come on a spectrum - but that can in theory be addressed by some scaling factor. The bigger issue for me is that it implies that a hardcore total utilitarian would be fine with a future populated by trillions of sentient but otherwise completely alien AIs successfully achieving their alien goals (e.g. maximizing paperclips) and experiencing desirable-state-of-consciousness about it. But I think some hardcore utilitarians would bite this bullet, and that wouldn't be a biggest bullet for a utilitarian to bite either.

answer by Dagon · 2022-09-21T15:35:51.307Z · LW(p) · GW(p)

[note: anti-realist non-Utilitarian here; I don't believe "utility" is actually a universal measurable thing, nor that it's comparable across entities (nor across time for any real entity). Consider this my attempt at an ITT on this topic for Utilitarianism]

One possible answer is that it's true that those emotions are pretty core to most people's conception of utility (at least most people I've discussed it with). But this does NOT mean that the emotions ARE the utility, they're just an evolved mechanism which points to utility, and not necessarily the only possible mechanism. Goodhart's Law hits pretty hard if you think of the emotions directly as utility.

Utility itself is an abstraction over the level of satisfaction of goals/preferences about the state of the universe for an entity. Or in some conceptions, the eu-satisfaction of the goals the entity would have if it were fully informed.

↑ comment by SurvivalBias (alex_lw) · 2022-09-21T20:08:30.871Z · LW(p) · GW(p)

>Utility itself is an abstraction over the level of satisfaction of goals/preferences about the state of the universe for an entity.

You can say that a robot toy has a goal of following a light source. Or thermostat has a goal of keeping the room temperature at a certain setting. But I'm yet to hear anyone counting those things towards total utility calculations.

Of course a counterargument would be "but those are not actual goals, those are the goals of humans that set it", but in this case you've just hidden all the references to humans into the word "goal" and are back to square 1.

answer by Viktor Rehnberg · 2022-09-21T08:17:51.620Z · LW(p) · GW(p)

Utility when it comes to a single entity is simply about preferences.

The entity should have

For any two outcomes/states of the world the entity should prefer one over the other or consider them equally preferable
The entity should be coherent in its preferences such that if it preferes to $B$ and $B$ to $C$ , then the entity prefers $A$ to $C$
When it comes to probabilities, if the entity prefers $A$ to $B$ then the entity prefers $A$ with probability $p$ to $B$ with probability $p$ all else equal. Furthermore, there exist a probability $p$ such that $p A$ and $(1 - p) C$ is equally preferable to $B$ with certainty with the preference ordering from 2.

This is simply Von Neumann -- Morgenstern utility theory and means that for such an entity you can translate the preference ordering to a real valued function $U$ over preferences. When we only consider a single agent this function is undetermined up to a any scaling with positive scalar values or shifting with scalar values.

Usually I'd like to add the expected utility hypothesis as well that

$U (A) - U (not A) = 2 (U (A) - U (0.5 A + 0.5 (not A)))$

where $0.5 A$ is $A$ with probability $0.5$ .

(Edit: Apparently step 3 implies the expected utility hypothesis. And cubefox [LW(p) · GW(p)] pointed out that my notation here was weird. An improved notation would be that

$U (X) = E [U (X)] = \sum ω \in Ω P_{X} (ω) U (ω)$

where $X$ is a random variable over the set of states $Ω$ . Then I'd say that the expected utility hypothesis is the step $U (X) = E [U (X)]$ .

end of edit.)

Now the tricky part to me is when it comes to multiple entities with utility functions. How do you combine these into a single valued function, how are they aggregated.

Here there are differences in

Aggregation function. Should you sum the contributions (total utilitarianism), average, take the minimum (for a maximin strategy), ...
Weighting. For each individual utility function we have a freedom in scale and shift. If we fix utility 0 as this entity does not exist or the world does not exist, then what remains is a scale of the utility functions which effectively functions as weighting in aggregations like sum and average. Here questions like, how many cows living lives worth living would are needed to choose that over a human having a life worth living and how do you determine where in the scale you are in a life worth living.

Another tricky part is that humans and other entities are not coherent to satisfy the axioms in Von Neumann -- Morgenstern utility theory. What to do then, which preferences are "rational" and which are not?

↑ comment by Viktor Rehnberg (viktor.rehnberg) · 2022-09-21T08:24:33.873Z · LW(p) · GW(p)

You could perhaps argue that "preference" is a human concept. You could extend it with something like coherent extrapolated volition to be what the entity would prefer if it knew all that was relevant, had all the time needed to think about it and was more coherent. But, in the end if something has no preference, then it would be best to leave it out of the aggregation.

↑ comment by SurvivalBias (alex_lw) · 2022-09-21T20:02:46.242Z · LW(p) · GW(p)

So utility theory is a useful tool, but as far as I understand it's not directly used as a source of moral guidance (although I assume once you have some other source you can use utility theory to maximize it). Whereas utilitarianism as a metaethics school is concerned exactly with that, and you can hear people in EA talking about "maximizing utility" as the end in and of itself all the time. It was in this latter sense that I was asking.

Can you define "utility" in utilitarianism without using words for specific human emotions?

Contents

Answers

Definition of consciousness

No comments