Impossible moral problems and moral authority

post by Charlie Steiner · 2019-11-18T09:28:28.766Z · score: 15 (11 votes) · LW · GW · 5 comments

The Tails Come Apart As Metaphor For Life, but with an extra pun.

Suppose you task your friends with designing the Optimal Meal. The meal that maximizes utility, in virtue of its performance at the usual roles food fills for us. We leave aside considerations such as sourcing the ingredients ethically, or writing the code for an FAI on the appetizer in tiny ketchup print, or injecting the lettuce with nanobots that will grant the eater eternal youth, and solely concern ourselves with arranging atoms to get a good meal qua meal.

So you tell your friends to plan the best meal possible, and they go off and think about it. One comes back and tells you that their optimal meal is like one of those modernist 30-course productions, where each dish is a new and exciting adventure. The next comes back and says that their optimal meal is mostly just a big bowl of their favorite beef stew, with some fresh bread and vegetables.

To you, both of these meals seem good - certainly better than what you've eaten recently. But then you start worrying that if this meal is important, then the difference in utility between the two proposed meals might be large, even though they're both better than the status quo (say, cold pizza). In a phrase, gastronomical waste. But then how do you deal with the fact that different people have chosen different meals? Do you just have to choose one yourself?

Now your focus turns inward, and you discover a horrifying fact. You're not sure which meal you think is better. You, as a human, don't have a utility function written down anywhere, you just make decisions and have emotions. And as you turn these meals over in your mind, you realize that different contexts, different fleeting thoughts or feelings, different ways of phrasing the question, or even just what side of the bed you got up on that morning, might influence you to choose a different meal at a point of decision, or rate a meal differently during or after the fact.

You contain within yourself the ability to justify either choice, which is remarkably like being unable justify either choice. This "Optimal Meal" was a boondoggle all along. Although you can tell that either would be better than going home and eating cold pizza, there was never any guarantee that your "better" was a total ordering of meals, not merely a partial ordering.

Then, disaster truly strikes. Your best friend asks you "So, what do you want to eat?"

You feel trapped. You can't decide. So you call your mom. You describe to her these possible meals, and she listens to you and makes sympathetic noises and asks you about the rest of your day. And you tell her that you're having trouble choosing and would like her help deciding, and so she thinks for a bit, and then she tells you that maybe you should try the modernist 30-course meal.

Then you and your friends go off to the Modernism Bistro, and you have a wonderful time.


This is a parable about how choosing the Optimal Arrangement Of All Atoms In The Universe is an impossible moral problem. Accepting this as a given, what kind of thing is happening when we accept the decision of some authority (superhuman AI or otherwise) as to what should be done with those atoms?

When you were trying to choose what to eat, there was no uniquely right choice, but you still had to make a choice anyhow. If some moral authority (e.g. your mom) makes a sincere effort to deliberate on a difficult problem, this gives you an option that you can accept as "good enough," rather than "a waste of unknowable proportions."

How would an AI acquire this moral authority stuff? In the case of humans, we can get moral authority by:

You might think "Of course we shouldn't trust an AI just because it's persuasive." But in an important sense, none of these reasons is good enough. We're talking about trusting something as an authority on an impossible problem, here.

A good track record on easier problems is a necessary condition to even be thinking about the right question, true. I'm not advocating that we fatalistically accept some random nonsense as the meaning of life. The point is that even after we try our hardest, we (or an AI making the choice for us) will be left in the situation of trying to decide between Optimal Meals, and narrowing this choice down to one option shouldn't be thought of as a continuation of the process that generated those options.

If after dinner, you called your mom back and said "That meal was amazing - but how did you figure out that was what I really wanted?", you would be misunderstanding what happened. Your mom didn't solve the problem of underdetermination of human values, she just took what she knew of you and made a choice - an ordinary, contingent choice. Her role was never to figure out what you "really wanted," it was to be an authority whose choice you and your friends could accept.

So there are two acts of trust that I'm thinking about this week. The first is how to frame FAI as a trusted authority rather than an oracle telling us the one best way to arrange all the atoms. And the second is how an FAI should trust its own decision-making process when it does meta-ethical reasoning, without assuming that it's doing what humans uniquely want.

5 comments

Comments sorted by top scores.

comment by G Gordon Worley III (gworley) · 2019-11-18T19:01:57.005Z · score: 5 (3 votes) · LW(p) · GW(p)

I think all of your reasons for how a human comes to have moral authority boil down to something like having a belief that doing things that this authority says are expected to be good (have positive valence, in my current working theory of values [LW · GW]). This perhaps gives a way of reframing alignment as the problem of constructing an agent to whom you would give moral authority to decide for you, rather than as we normally do as an agent that is value aligned.

comment by cousin_it · 2019-11-18T20:34:39.501Z · score: 3 (1 votes) · LW(p) · GW(p)

It seems to me that with meals, there's a fact of the matter that AI could help with. After all, if two copies of you went and had the different meals, one of them would probably be happier than the other.

Though that happiness might not depend only on the chosen meal. For example, if one meal is a cake that looks exactly like Justin Bieber, that might be actually not as fun as it sounds. But if you skipped it in favor of an ordinary burrito, you'd forever regret that you didn't get to see the Justin Bieber cake.

comment by Dagon · 2019-11-19T00:09:28.172Z · score: 4 (2 votes) · LW(p) · GW(p)

I think you still hit the multidimension comparison problem. One of the branches may be happier, the other more satisfied. One might spend the rest of the evening doing some valuable work, the other didn't have as much time. One consumed a slightly better mix of protein and fat, the other got the right amount of carbs and micronutrients.

Without a utility function to tell you the coefficients, there _IS_ no truth of the matter which is better.

Edit: this problem also hits intertemporally, even if you solve the comparability problem for multiple dimensions of utility. The one that's best when eating may be different than the one that you remember most pleasurably afterward, which may be different than the one you remember most pleasantly a year from now.

comment by Mary Chernyshenko (mary-chernyshenko) · 2019-11-18T13:49:41.476Z · score: 1 (1 votes) · LW(p) · GW(p)

But it is not important. If it were important, you wouldn't think about it as "what do I want". For example, if you want "world peace" and you have two ways to achieve it, and you can't choose because both are so great, it means either that you have no dice or there is a reason why both would fail.

comment by Charlie Steiner · 2019-11-19T04:46:44.618Z · score: 3 (2 votes) · LW(p) · GW(p)

Yes, I hope that my framing of the problem supports this sort of conclusion :P

An alternate framing where it still seems important would be "moral uncertainty". Where when we don't know what to do, it's because we are lacking some facts, maybe even key facts. So I'm sort of sneakily arguing against that frame.