Preferences over non-rewards

post by Stuart_Armstrong · 2017-11-03T15:28:02.504Z · score: 10 (4 votes) · LW · GW

Contents

  Preferences over preferences and knowledge
  Aliefs
  Tribalism and signalling
  Personal identity
  "You're not the boss of me!"
  Caring about derivatives rather than positions
  Values that don't make sense out of context
  A more complex format needed

In this penultimate post in the "learning human values" series, I just want to address some human values/preferences/rewards that don't fit neatly into the (p, R) model [LW · GW], where p is the planning algorithm and R the actual reward.
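The (p, R) decomposition treats observed behaviour as a planner p applied to a reward R. A minimal sketch of why it is tricky (the softmax planner, action names, and numbers here are my own illustrative assumptions, not from the post): two very different (p, R) pairs can produce exactly the same behaviour, so behaviour alone cannot pin down the "actual" reward.

```python
import math

def softmax_policy(reward, beta):
    """Planner p: maps a reward function over actions to a choice distribution.
    beta > 0 is roughly rational; beta < 0 is anti-rational."""
    weights = {a: math.exp(beta * r) for a, r in reward.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

R = {"left": 1.0, "right": 0.0}

# A roughly rational planner paired with reward R ...
pi1 = softmax_policy(R, beta=2.0)
# ... and an anti-rational planner paired with the negated reward -R.
pi2 = softmax_policy({a: -r for a, r in R.items()}, beta=-2.0)

# Both (p, R) pairs yield identical observed behaviour:
assert all(abs(pi1[a] - pi2[a]) < 1e-9 for a in R)
```

The ambiguity is the point: even before considering the preferences discussed below, recovering R from behaviour requires assumptions about p.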

Preferences over preferences and knowledge

Most people have preferences over their own preferences - and that of others. For example, consider someone who has an incorrect religious faith. They might believe something like:

"I want to always continue believing. I flinch away from certain sceptical arguments, but I'm sure my deity would protect me from doubt if I ever decided to look into them".

I hope this doesn't sound completely implausible as a description of a real person. Here they have beliefs, preferences over their future beliefs, and beliefs about their future beliefs. This doesn't seem easy to capture in the (p, R) framework. We can also see that asking them the equivalent questions "Do you want to doubt your deity?" and "Do you want to learn the truth?" will get very different answers.

But it's not just theism, an example which is too easy to pick on. I have preferences over knowledge, as do most people: I would prefer that people had accurate information. I would also prefer that, when choosing between possible formalisations of preferences [LW · GW], people went with the less destructive and less self-destructive options. These are not overwhelmingly strong preferences, but they certainly exist.

Aliefs

Consider the following scenario: someone believes that roller-coasters are perfectly safe, but enjoys riding them for the feeling of danger they give. That feeling is an alief: an instinctive attitude that contradicts one's explicit belief. It's clear that the challenge here is not reconciling the belief of safety with the alief of danger (that reconciliation is simple: roller-coasters are safe), but somehow transforming the feeling of danger into another form that preserves the initial enjoyment.

Tribalism and signalling

The theism argument might suggest that tribalism will be a major problem, as various groups pressure adherents to conform to certain beliefs and preferences.

But actually that need not be such a problem. It's clear that there is a strong desire to remain part of the group (or, sometimes, just of a group). Once that desire is identified, all the rest becomes instrumental: the human will either do the actions needed to remain part of the group without changing their beliefs or preferences (just because evolution doesn't allow us to separate the two easily doesn't mean an AI can't help us do it), or will rationally sacrifice beliefs and preferences to the cause of remaining part of the group.

Most signalling cases can be dealt with in the same way. So, though tribalism is a major reason people can end up with contingent preferences, it doesn't in itself pose problems to the (p, R) model.

Personal identity

The problem of personal identity is a tricky one. I would like to remain alive, happy, and curious, having interesting experiences and doing worthwhile and varied activities, etc.

Now, this is partially preferences about future preferences, but there's the implicit identity: I want this to happen to me. Even when I'm being altruistic, I want these experiences to happen to someone, not just to happen in some abstract sense.

But the concept of personal identity is a complicated one, and it's not clear if it can be collapsed easily into the (p, R) format.

"You're not the boss of me!"

Finally, even if personal identity is defined, it remains the case that people judge a situation differently depending on how it was reached. Being forced or manipulated into a situation will make them resent it much more than if they reached it through "natural" means. Of course, what counts as acceptable versus unacceptable manipulation changes over time, and is filled with biases, inconsistencies, and incorrect beliefs (in my experience, far too many people think themselves immune to advertising, for instance).

Caring about derivatives rather than positions

People react strongly to situations getting worse or better, and much less to the absolute quality of the situation.
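A minimal sketch of the contrast (the wealth trajectories and the loss-aversion weight of 2 are my own illustrative assumptions): a reward over positions only sees the final level, while a reward over derivatives reacts to every gain and loss along the way, so two histories with identical endpoints can be valued very differently.

```python
def level_utility(history):
    """Utility from absolute position: only the final level matters."""
    return history[-1]

def change_utility(history):
    """Utility from derivatives: sum of reactions to each gain or loss.
    Losses weighted twice as heavily (loss aversion) -- an illustrative choice."""
    total = 0.0
    for prev, cur in zip(history, history[1:]):
        delta = cur - prev
        total += delta if delta >= 0 else 2.0 * delta
    return total

steady = [100, 100, 100, 100]   # flat at 100 throughout
bumpy = [100, 130, 90, 100]     # same endpoints, with a rise then a fall

assert level_utility(steady) == level_utility(bumpy)   # positions agree
assert change_utility(steady) > change_utility(bumpy)  # reactions differ
```

A reward R defined purely over world-states resembles level_utility; capturing the derivative-sensitive reactions requires putting history (or at least recent change) into the domain of R.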

Values that don't make sense out of context

AIs would radically reshape the world and society. And yet humans have deeply held values that only make sense in narrow contexts; sometimes, they already no longer make sense. For instance, in my opinion, of the five categories in moral foundations theory, one no longer makes sense and three only make partial sense (and it seems to me that holding values in a world where it's literally impossible to satisfy them is part of the problem people have with the modern world).

This can be seen as a subset of the whole "underdefined human values", but it could also be seen as an argument for preserving or recreating certain contexts, in which these values make sense.

A more complex format needed

These are just some of the challenges to the (p, R) format, and there are certainly others. It's not clear how much that format needs to be complicated in order to usefully model all these extra types of preferences.
