Human wanting

post by TsviBT · 2023-10-24T01:05:39.374Z · LW · GW · 1 comment

Contents

  Human wanting
  The meaning of wanting
  Baked-in claims
  The variety of human wanting
  The role of human wanting
    Human "wanting" is a network of conjectural concepts
    Human wanting is half of an aligned AGI

[Metadata: crossposted from https://tsvibt.blogspot.com/2023/08/human-wanting.html. First completed August 22, 2023.]

We have pretheoretic ideas of wanting that come from our familiarity with human wanting, in its variety. To see what way of wanting can hold sway in a strong and strongly growing mind, we have to explicate these ideas, and create new ideas.

Human wanting

The problem of AGI alignment is sometimes posed along these lines: How can you make an AGI that wants to not kill everyone, and also wants to do some other thing that's very useful?

What role is the idea of "wanting" playing here? It's a pretheoretical concept. It makes an analogy to humans.

The meaning of wanting

What does it say about a human? When a human wants X, in a deep sense, then:

And, it is at least sometimes feasible for a human to choose to want X, or even for a human to choose that another human will want X. Wanting is specifiable.

Baked-in claims

The concept of wanting is, like all concepts, problematic. It comes along with some claims:

  1. Agents (or minds, or [things that we'll encounter and that have large effects]) want; or at least, we can choose to make agents that want.
  2. All these features apply to the way the agent wants.
  3. All these features can possibly coexist.
  4. What the agent wants is something that can be specified.
  5. This sort of wanting is the sort of thing that's going on in a human when we say that the human wants something.
  6. Wanting is a Thing; it will reveal a dense and densening region of internal relations upon further investigation.

These claims are dubious when applied to human wanting, and more dubious when applied to other minds wanting.

The variety of human wanting

If we follow the reference to wanting in humans, we find a menagerie of wants: proleptic, ambiguous, process-level, inexplicit, and more.

The role of human wanting

Human wanting plays two roles in AGI alignment:

Human "wanting" is a network of conjectural concepts

Our familiarity with human wanting suggests hypotheses for concepts that might be useful in describing and designing an AGI.

Our familiarity with human wanting can't be relied on too heavily without further analysis. We might observe behavior in another mind and say "This mind wants such and such", and then draw conclusions from that statement; but those conclusions may not follow from the observations, even though they would follow if the mind were a human. The desirable properties that come along with a human wanting X may not come along with designs, incentives, selection, behavior, or any other feature, even if that feature overlaps in some ways with our familiar idea of wanting.

That human wanting shows great variety does not, in general, argue against using any other idea of wanting. Both our familiar ideas about human wanting and our more theoretical ideas about wanting might prove to be useful starting points for creating capable minds with specifiable effects.

Human wanting is half of an aligned AGI

It's the wants of humanity that the AGI is supposed to help bring about, so the AGI+human system has to accommodate human wanting.

Human wanting is proleptic, ambiguous, process-level, inexplicit, and so on. Human wanting is provisional. Because human wanting is provisional, the AGI must be correctable (corrigible). The AGI must be correctable through and through, in all aspects (since all aspects touch on how the AGI+human wants), even to the point of a paradox of tolerance: the human might want to correct the AGI in a way that the AGI recognizes as ruining the correctable nature of the AGI, and that should be allowed (with warning).

1 comment


comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-10-24T02:04:37.358Z · LW(p) · GW(p)

I like this a lot. I think it's going to be really important when analyzing a created agent to compare the style/extent of its wanting to human wanting. I expect we will still create something that has a limited subset of the wanting expressed by humans. I don't think enough thought has yet gone into analyzing what aspects of wanting are expressed by current RL agents, and how we could measure that objectively.