The Human's Hidden Utility Function (Maybe)
post by lukeprog · 2012-01-23T19:39:42.722Z · LW · GW · Legacy · 90 comments
Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice. And suppose that upon reflection we would clearly reject the outputs of two of these systems, whereas the third system looks something more like a utility function we might be able to use in CEV.
What I just described is part of the leading theory of choice in the human brain.
Recall that human choices are made when certain populations of neurons encode expected subjective value (in their firing rates) for each option in the choice set, with the final choice being made by an argmax or reservation price mechanism.
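To make that readout concrete, here is a minimal Python sketch (mine, not from the post or the cited literature); the option names, numbers, and threshold are invented for illustration.

```python
# Toy readout of a choice circuit: each option's expected subjective value is a
# stand-in for the firing rate of its value-coding neural population.
values = {"apple": 3.2, "banana": 2.7, "wait": 0.5}

# Argmax mechanism: choose the option whose value signal is strongest.
argmax_choice = max(values, key=values.get)

# Reservation-price mechanism: act only if the best option clears a threshold.
reservation_price = 1.0
choice = argmax_choice if values[argmax_choice] >= reservation_price else None

print(argmax_choice, choice)
```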
Today's news is that our best current theory of human choices says that at least three different systems compute "values" that are then fed into the final choice circuit:
- The model-based system "uses experience in the environment to learn a model of the transition distribution, outcomes and motivationally-sensitive utilities." (See Sutton & Barto 1998 for the meanings of these terms in reinforcement learning theory.) The model-based system also "infers choices by... building and evaluating the search decision tree to work out the optimal course of action." In short, the model-based system is responsible for goal-directed behavior. However, making all choices with a goal-directed system using something like a utility function would be computationally prohibitive (Daw et al. 2005), so many animals (including humans) first evolved much simpler methods for calculating the subjective values of options (see below).
- The model-free system also learns a model of the transition distribution and outcomes from experience, but "it does so by caching and then recalling the results of experience rather than building and searching the tree of possibilities. Thus, the model-free controller does not even represent the outcomes... that underlie the utilities, and is therefore not in any position to change the estimate of its values if the motivational state changes. Consider, for instance, the case that after a subject has been taught to press a lever to get some cheese, the cheese is poisoned, so it is no longer worth eating. The model-free system would learn the utility of pressing the lever, but would not have the informational wherewithal to realize that this utility had changed when the cheese had been poisoned. Thus it would continue to insist upon pressing the lever. This is an example of motivational insensitivity."
- The Pavlovian system, in contrast, calculates values based on a set of hard-wired preparatory and consummatory "preferences." Rather than calculate value based on what is likely to lead to rewarding and punishing outcomes, the Pavlovian system calculates values consistent with automatic approach toward appetitive stimuli, and automatic withdrawal from aversive stimuli. Thus, "animals cannot help but approach (rather than run away from) a source of food, even if the experimenter has cruelly arranged things in a looking-glass world so that the approach appears to make the food recede, whereas retreating would make the food more accessible (Hershberger 1986)."
Or, as Jandila put it:
- Model-based system: Figure out what's going on, and what actions maximize returns, and do them.
- Model-free system: Do the thingy that worked before again!
- Pavlovian system: Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
In short:
We have described three systems that are involved in making choices. Even in the case that they share a single, Platonic, utility function for outcomes, the choices they express can be quite different. The model-based controller comes closest to being Platonically appropriate... The choices of the model-free controller can depart from current utilities because it has learned or cached a set of values that may no longer be correct. Pavlovian choices, though determined over the course of evolution to be appropriate, can turn out to be instrumentally catastrophic in any given experimental domain...
[Having multiple systems that calculate value] is [one way] of addressing the complexities mentioned, but can lead to clashes between Platonic utility and choice. Further, model-free and Pavlovian choices can themselves be inconsistent with their own utilities.
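To make the contrast between cached and model-based values concrete, here is a minimal Python sketch of the poisoned-cheese example quoted above. It is my own illustration, not Dayan's; the data structures and numbers are assumptions.

```python
# Toy version of the poisoned-cheese example: a one-step world with two actions.
actions = ["press_lever", "do_nothing"]
outcome_of = {"press_lever": "cheese", "do_nothing": "nothing"}  # learned transition model
utility = {"cheese": 1.0, "nothing": 0.0}                        # current motivational state

# Model-free system: a cached value per action, learned while the cheese was still good.
q_cached = {"press_lever": 1.0, "do_nothing": 0.0}

def model_free_choice():
    # Recalls cached action values; never consults outcomes or current utilities.
    return max(actions, key=lambda a: q_cached[a])

def model_based_choice():
    # Searches the (one-step) model and evaluates outcomes under current utilities.
    return max(actions, key=lambda a: utility[outcome_of[a]])

# The cheese is poisoned: its utility changes, but there is no new lever-pressing experience yet.
utility["cheese"] = -1.0

print(model_free_choice())   # 'press_lever' -- motivationally insensitive cached value
print(model_based_choice())  # 'do_nothing'  -- goal-directed revaluation
```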
We don't yet know how choice results from the inputs of these three systems, nor how the systems might interact before they deliver their value calculations to the final choice circuit, nor whether the model-based system really uses anything like a coherent utility function. But it looks like the human might have a "hidden" utility function that would reveal itself if it wasn't also using the computationally cheaper model-free and Pavlovian systems to help determine choice.
At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system's preferences as representing "my values," and I would reject the outputs of the model-free and Pavlovian systems as the outputs of dumb systems that evolved for their computational simplicity, and can be seen as ways of trying to approximate the full power of a model-based system responsible for goal-directed behavior.
On the other hand, as Eliezer points out, perhaps we ought to be suspicious of this, because "it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone's going to jump up and say: 'Ha ha! Love and friendship were actually in the other two!'"
Unfortunately, it's too early to tell whether these results will be useful for CEV. But it's a little promising. This is the kind of thing that sometimes happens when you hack away at the edges of hard problems. This is also a repeat of the lesson that "you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it."
(For pointers to the relevant experimental data, and for an explanation of the mathematical role of each valuation system in the brain's reinforcement learning system, see Dayan (2011). All quotes in this post are from that chapter, except for the last one.)
Comments sorted by top scores.
comment by Scott Alexander (Yvain) · 2012-01-24T18:03:05.781Z · LW(p) · GW(p)
This is also a repeat of the lesson that "you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it."
On the other hand, rationality can be faster than science. And I'm feeling pretty good about positing three different forms of motivation, divided between model-free tendencies based on conditioning, and model-based goals, then saying we could use transhumanism to focus on the higher-level rational ones, without having read the particular neuroscience you're citing...
...actually, wait. I read as much of the linked paper as I could (Google Books hides quite a few pages) and I didn't really see any strong neuroscientific evidence. It looked like they were inferring the existence of the three systems from psychology and human behavior, and then throwing in a bit of neuroscience by mentioning some standard results like the cells that represent error in reinforcement learning. What I didn't see was a description of how three separate systems naturally fall out of brain studies. But I missed a lot of the paper - is there anything like that in there?
Replies from: lukeprog↑ comment by lukeprog · 2012-01-25T21:50:35.998Z · LW(p) · GW(p)
What I didn't see was a description of how three separate systems naturally fall out of brain studies. But I missed a lot of the paper - is there anything like that in there?
Some, yes. I've now updated the link in the OP so it points to a PDF of the full chapter.
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-01-23T22:32:28.764Z · LW(p) · GW(p)
Um, objection, I didn't actually say that and I would count the difference as pretty significant here. I said, "I would be suspicious of that for the inverse reason my brain wants to say 'but there has to be a different way to stop the train' in the trolley problem - it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone's going to jump up and say: 'Ha ha! Love and friendship were actually in the other two!'"
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T22:42:52.637Z · LW(p) · GW(p)
What? You said that? Sorry, I didn't mean to misquote you so badly. I'll blame party distractions or something. Do you remember the line about a gift basket and it possibly making CEV easier?
Anyway, I'll edit the OP immediately to remove the misquote.
For reference, the original opening to this post was:
Me: "Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice. And suppose that upon reflection we would clearly reject the outputs of two of these systems, whereas the third system looks something more like a utility function. How would you feel?"
Eliezer: "I would feel like someone had left an enormous gift basket at my front door. That could make CEV easier."
Me: "Okay, well, what I just described is part of the leading theory of choice in the human brain."
comment by Nick_Beckstead · 2012-02-07T17:35:23.489Z · LW(p) · GW(p)
What's the evidence that this is the "leading theory of choice in the human brain"? (I am not saying I have evidence that it isn't, but it's important for this post that some large relevant section of the scientific community thinks this theory is awesome.)
comment by cousin_it · 2012-01-23T21:44:19.577Z · LW(p) · GW(p)
Congratulations on continuing this line of inquiry!
One thing that worries me is that it seems to focus on the "wanting" part to the exclusion of the "liking" part, so we may end up in a world we desire today but won't enjoy tomorrow. In particular, I suspect that a world built according to our publicly stated preferences (which is what many people seem to think when they hear "reflective equilibrium") won't be very fun to live in. That might happen if we get much of our fun from instinctive and Pavlovian actions rather than planned actions, which seems likely to be true for at least some people. What do you think about that?
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T22:02:16.310Z · LW(p) · GW(p)
I think that upon reflection, we would desire that our minds be designed in such a way that we get pleasure from getting the things we want, or pleasure whenever we want, or something — instead of how the system is currently set up, where we can't always choose when we feel good and we only sometimes feel good as a result of getting what we want.
Replies from: Multiheaded, None↑ comment by Multiheaded · 2012-01-24T17:34:35.475Z · LW(p) · GW(p)
Yeah, I agree. I said that we should, in principle, rewire ourselves for this very reason in Bakkot's (in)famous introduction thread, but Konkvistador replied he's got reasons to be suspicious and fearful about such an undertaking.
↑ comment by [deleted] · 2012-01-24T01:24:19.194Z · LW(p) · GW(p)
It would be nice if liking and wanting coincided, but why does "make pleasurable that which we desire" sound better to you than "make desirable that which we find pleasurable"?
Suppose Kelly can't stop thinking about pickle milkshakes. "Oh dang," thinks she, "I could go for a pickle milkshake". But in fact, she'd find a pickle milkshake quite gross. What would Kelly-mature want for Kelly-now? Have someone tell her that pickle milkshakes are gross? Modify her tongue to enjoy pickle milkshakes? Directly make her stop wanting pickle milkshakes? Search flavour space for a beverage superficially similar to pickle milkshakes that does not upset her stomach? Take second order utilities into account and let her drink the milkshake, provided it's not damaging to her health in the long term, so that she's in control of and can learn from her pickle milkshake experiences?
The things you listed sound like modifying Kelly's tongue. Is that a fair characterization?
comment by Vladimir_Nesov · 2012-01-23T20:58:34.673Z · LW(p) · GW(p)
Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations
Humans violate any given set of axioms simply because they are not formally flawless, so such explanations only start being relevant when discussing an idealization, in this case a descriptive one. But properties of descriptive idealizations don't easily translate into properties of normative idealizations.
comment by Alicorn · 2012-01-23T19:59:13.338Z · LW(p) · GW(p)
The quoted summaries of each of the three systems are confusing and I don't feel like I have an understanding of them, except insofar as the word "Pavlovian" gives a hint. Can you translate more clearly, please?
Replies from: None, None, lukeprog↑ comment by [deleted] · 2012-01-23T20:11:25.549Z · LW(p) · GW(p)
Or, to put it more simply:
- Figure out what's going on, and what actions maximize returns, and do them.
- Do the thingy that worked before again!
- Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
↑ comment by lukeprog · 2012-01-23T20:25:30.652Z · LW(p) · GW(p)
Added to the original post, credit given.
Replies from: JoachimSchipper↑ comment by JoachimSchipper · 2012-01-24T12:11:02.749Z · LW(p) · GW(p)
Could you put it before the hard-to-parse explanations? It was nice to confirm my understanding, but it would have saved me a minute or two of effort if you'd put those first.
↑ comment by Shmi (shminux) · 2012-01-23T21:22:15.046Z · LW(p) · GW(p)
Maybe give Luke a lesson or two on C^3 (clear, concise and catchy) summaries.
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T22:05:34.345Z · LW(p) · GW(p)
Note that I wrote this post in two hours flat and made little attempt to optimize presentation in this case.
Replies from: shminux, lukeprog, Swimmer963↑ comment by Shmi (shminux) · 2012-01-23T22:58:36.498Z · LW(p) · GW(p)
Sorry, I did not intend my comment to rub you the wrong way (or any of my previous comments that might have). FWIW, I think that you are doing a lot of good stuff for the SIAI, probably most of it invisible to an ordinary forum regular. I realize that you cannot afford to spend an extra two hours per post on polishing the message. Hopefully one of the many skills of your soon-to-be-hired executive assistant will be that of "optimizing presentation".
Replies from: lukeprog, MACHISMO↑ comment by lukeprog · 2012-01-26T17:56:18.043Z · LW(p) · GW(p)
For my own reference, here are the posts I tried to write well:
- Secure Your Beliefs
- Optimal Philanthropy for Human Beings
- A Rationalist's Tale
- Existential Risk
- Rationality Lessons Learned from Irrational Adventures in Romance
- What Curiosity Looks Like
- Can the Chain Still Hold You?
↑ comment by TheOtherDave · 2012-01-26T19:01:43.128Z · LW(p) · GW(p)
It might be an interesting exercise to record predictions, in a hidden-but-reliable form, about the karma of posts six months out, by way of calibrating one's sense of how well-received those posts will be by their target community.
↑ comment by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2012-01-23T23:29:08.854Z · LW(p) · GW(p)
It's still better than the posts I write in 2 hours! Did that 2 hours include the time spent researching, or were you just citing sources you'd already read for other reasons? In either case...not bad.
↑ comment by Scott Alexander (Yvain) · 2012-01-24T04:02:27.615Z · LW(p) · GW(p)
Is 2 operant/Skinnerian conditioning, and 3 classical/Pavlovian conditioning?
Replies from: None↑ comment by [deleted] · 2012-01-23T20:08:39.077Z · LW(p) · GW(p)
The first one incorporates information about past experiences into simplified models of the world, and then uses the models to steer decisions through search-space based upon a sort of back-of-the-envelope, hazy calculation of expected value. It's a utility function, basically, as implemented by the brain.
The second one also incorporates information about past experiences, but rather than assembling the data into a model and performing searches over it, it derives expectations directly from what's remembered, and is insensitive to things like probability or shifting subjective values.
The third one is sort of like the first in its basic operations (incorporate information, analyze it, make models) -- but instead of calculating expected values, it aims to satisfy various inbuilt "drives", and sorts paths through search space based upon approach/avoid criteria linked to those drives.
comment by FiftyTwo · 2012-01-23T21:13:00.859Z · LW(p) · GW(p)
I'm not sure I understand the difference between 2 and 3. The term Pavlovian is being applied to the third system, but 2 sounds more like the archetypal Pavlovian learned response (dog learns that bell results in food). Does 3 refer exclusively to pre-encoded pleasant/unpleasant responses rather than learned ones? Or is there maybe a distinction between a value and an action response that I'm missing?
Replies from: Swimmer963↑ comment by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2012-01-23T23:24:04.008Z · LW(p) · GW(p)
It appears to me like 3 is only pre-encoded preferences, whereas 2 refers to preferences that are learned in an automatic, "reflex-like" way...which, yeah, sounds a lot like the Pavlovian learned response.
comment by BrianNachbar · 2012-01-27T15:04:40.686Z · LW(p) · GW(p)
Where do the model-based system's terminal goals come from?
comment by Linda Linsefors · 2022-08-25T11:11:07.840Z · LW(p) · GW(p)
If anyone reads this comment...
Do you know if these claims have held up? Does this post still agree with current neuroscience, or have there been some major updates?
↑ comment by Gunnar_Zarncke · 2022-08-26T22:04:07.233Z · LW(p) · GW(p)
I think the three sub-systems can be loosely mapped to the structure discussed in the [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering [LW · GW] as follows:
- the model-based system is the Learning System, except that the Learning System doesn't calculate value but only learns to model better via reward prediction error.
- the Pavlovian system is the Steering System and is the only system that provides ground truth "value" (this value is low-level reward; abstract concepts of value are formed by the learning system around this ground truth, but these exist only in so far as they are useful to predict the ground truth).
- the model-free system doesn't exist as a separate system but is in the shallower parts of the Learning System. I don't think it maps to the Thought Assessor but may be wrong.
In this framework, one could say, as Eliezer suspected, that the value originated outside the model-based system.
comment by jimmy · 2012-01-24T00:35:29.306Z · LW(p) · GW(p)
I'm skeptical of any clear divide between the systems. Of course, there are more abstract and more primitive information paths, but they talk to each other, and I don’t buy that they can be cleanly separated.
Plans can be more or less complicated, and can involve "I don't know how this part works, but it worked last time, so let's do this", and what worked last time can be very pleasurable and rewarding - so it doesn't seem to break down cleanly into any one category.
I'd also argue that, to the extent that abstract planning is successful, it is because it propagates top down and affects the lower Pavlovian systems. If your thoughts about your project aren't associated with motivation and wanting to actually do something, then your abstract plans aren't of much use. It just isn't salient that this is happening unless the process is disrupted and you find yourself not doing what you "want" to do.
Another point that is worth stating explicitly is that algorithms for maximizing utility are not utility functions. In theory, you could have 3 different optimizers that all maximize the same utility function, or 3 different utility functions that all use the same optimizer - or any other combination.
I don’t think this is a purely academic distinction either - I think that we have conflicts at the same level all the time (multiple personality disorder being an extreme case). Conflicts between systems with no talk at between levels look like someone saying they want something, and then doing another without looking bothered at all. When someone is obviously pained by the conflict, then they are clearly both operating on an emotional level, even if the signal originated at different places. Or I could create a pavlovian conflict in my dog by throwing a steak on the wet tile, and watching as his conditioned fear of wet tile fights the conditioned desire of steak.
comment by Multiheaded · 2012-01-24T17:35:37.603Z · LW(p) · GW(p)
'Ha ha! Love and friendship were actually in the other two!'"
This concern is not abstract, and it's very personal for me. As I've said around here before, I often find myself exhibiting borderline-sociopathic thinking in many situations, but the arrangement of empathy and ethical inhibitions in my brain, though off-kilter in many ways*, drives me to take even abstract ethical problems (LW examples: Three Worlds Collide, dust specks, infanticide, recently Moldbug's proposal of abolishing civil rights for the greater good) very personally and generates all kinds of strong emotions about them - yet it has kept me from doing anything ugly so far.
(The most illegal thing I've done in my life during the moments when I 'let myself go' was some petty and outwardly irrational shoplifting in my teenage years; reflecting back upon that, I did it not solely to get an adrenaline rush but also to push my internal equilibrium into a place where this "superego" thing would receive an alarm and come back online.)
What if this internal safety net of mine is founded solely upon #2 and #3?
(As I've mentioned in some personal anecdotes - and hell, I don't wish to drone on and on about this, just feeling it's relevant - this part of me was either very weak or dormant until I watched Evangelion when I was 18. The weird, lingering cathartic sensation and the feeling of psychological change, which felt a little like growing up several years in a week, was the most interesting direct experience of my life so far. However, I've mostly been flinching from consciously* trying to push myself towards the admirable ethics of interpersonal relations that I view as the director's key teaching. It's painful enough when it's happening without conscious effort on your part!)
Replies from: TheOtherDave, None↑ comment by TheOtherDave · 2012-01-24T18:05:18.067Z · LW(p) · GW(p)
Do you have any particular reason for expecting it to be?
Or is this a more general "what if"? For example, if you contemplate moving to a foreign country, do you ask yourself what if your internal safety net is founded solely on living in the country you live in now?
Replies from: JoachimSchipper, Multiheaded↑ comment by JoachimSchipper · 2012-01-25T12:38:08.196Z · LW(p) · GW(p)
I'm not Multiheaded, but it feels-as-if the part of my brain that does math has no problem at all with personally slaughtering a million people if it saves one million and ten (1); the ethical injunction against that, which is useful, feels-as-if it comes from "avoid the unpleasant (c.q. evil) thing". (Weak evidence based on introspection, obviously.)
(1) Killing a million people is really unpleasant, but saving ten people should easily overcome that even if I care more about myself than about others.
Replies from: Multiheaded↑ comment by Multiheaded · 2012-01-26T22:57:02.581Z · LW(p) · GW(p)
Roughly that; I've thought about it in plenty more detail, but everything beyond this summary feels vague and I'm currently too lazy to make it coherent enough to post.
↑ comment by Multiheaded · 2012-01-24T18:07:53.426Z · LW(p) · GW(p)
Do you have any particular reason for expecting it to be?
It feels like I do, but it'll take a bit of very thoughtful writing to explicate why. So maybe I'll explain it here later.
comment by Bugmaster · 2012-01-25T04:28:02.778Z · LW(p) · GW(p)
This might be a silly question, but still:
Are the three models actually running on three different sets of wetware within the brain, or are they merely a convenient abstraction of human behavior?
Replies from: BrianNachbar↑ comment by BrianNachbar · 2012-01-27T19:39:32.866Z · LW(p) · GW(p)
I think what matters is whether they're concurrent—which it sounds like they are. Basically, whether they're more or less simultaneous and independent. If you were emulating a brain on a computer, they could all be on one CPU, or on different ones, and I don't think anyone would suggest that the em on the single CPU should get a different CEV than an identical one on multiple CPUs.
Replies from: Bugmaster↑ comment by Bugmaster · 2012-01-27T19:56:36.442Z · LW(p) · GW(p)
I was really more interested in whether or not we can observe these models running independently in real, currently living humans (or chimps or rats, really). This way, we could gather some concrete evidence in favor of this three-model approach; and we could also directly measure how strongly the three models are weighted relative to each other.
Replies from: MaoShan↑ comment by MaoShan · 2012-02-16T03:24:50.097Z · LW(p) · GW(p)
If you could reduce the cognitive cost of the model-based system by designing a "decision-making app", you could directly test if it was beneficial and actually (subjectively or otherwise) improved the subject's lives. If it was successful, you'd have a good chance of beta-testing a real CEV.
comment by Vladimir_Nesov · 2012-01-23T20:42:40.726Z · LW(p) · GW(p)
It seems to me that the actual situation is that upon reflection we would clearly reject (most of) the outputs of all three systems. What the human brain actually computes, in any of its modules or in all of them together, is not easily converted into considerations about how decisions should be made.
In other words, the valuations made by human valuation systems are irrelevant, even though the only plausible solution involves valuations based on human valuation systems. And converting brains into definitions of value will likely break any other abstractions about the brains that theorize them as consisting of various modules with various purposes.
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T21:02:22.734Z · LW(p) · GW(p)
I said that " it seems that upon reflection I would embrace an extrapolation of the model-based system's preferences as representing 'my values'."
Which does, in fact, mean that I would reject "most of the outputs of all three systems."
Note: I've since changed "would" to "might" in that sentence.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-23T21:14:00.873Z · LW(p) · GW(p)
I said that " it seems that upon reflection I would embrace an extrapolation of the model-based system's preferences as representing 'my values'."
OK, I didn't notice that; I was referring more to the opening dialog. Though "extrapolation" still doesn't seem to fit, because brain "modules" are not the same kind of thing as goals. A two-step process where you first extract "current preferences" and then "extrapolate" them is likely not how this works, so positing that you get the final preferences somehow starting from the brains is weaker (and correspondingly better, in the absence of knowledge of how this is done).
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T22:03:16.531Z · LW(p) · GW(p)
I agree that the two-step process may very well not work. This is an extremely weak and preliminary result. There's a lot more hacking at the edges to be done.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-23T22:43:59.091Z · LW(p) · GW(p)
I agree that the two-step process may very well not work. This is an extremely weak and preliminary result.
What are you referring to by "this" in the second sentence? I don't think there is a good reason to posit the two-step process, so if this is what you refer to, what's the underlying result, however weak and preliminary?
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T22:49:46.185Z · LW(p) · GW(p)
By "this" I meant the content of the OP about the three systems that contribute to choice.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-23T22:55:29.732Z · LW(p) · GW(p)
OK, in that case I'm confused, since I don't see any connection between the first and the second sentences...
Replies from: lukeprog↑ comment by lukeprog · 2012-01-23T22:59:08.333Z · LW(p) · GW(p)
Let me try again:
Two-step process = (1) Extract preferences, (2) Extrapolate preferences. This may not work. This is one reason that this discovery about three valuation systems in the brain is so weak and preliminary for the purposes of CEV. I'm not sure it will turn out to be relevant to CEV at all.
Replies from: Vladimir_Nesov, pjeby↑ comment by Vladimir_Nesov · 2012-01-23T23:31:16.259Z · LW(p) · GW(p)
I see, so the two-step thing acts as a precondition. Is it right that you are thinking of descriptive idealization/analysis of human brain as a path that might lead to definition of "current" (extracted) preferences, which is then to be corrected by "extrapolation"? If so, that would clarify for me your motivation for hoping to get anything FAI-relevant out of neuroscience: extrapolation step would correct the fatal flaws of the extraction step.
(I think extrapolation step (in this context) is magic that can't work, and instead analysis of human brain must extract/define the right decision problem "directly", that is formally/automatically, without losing information during descriptive idealization performed by humans, which any object-level study of neuroscience requires.)
Replies from: lukeprog↑ comment by lukeprog · 2012-01-24T00:37:02.579Z · LW(p) · GW(p)
Extraction + extrapolation is one possibility, though at this stage in the game it still looks incoherent to me. But sometimes things look incoherent before somebody smart comes along and makes them coherent and tractable.
Another possibility is that an FAI uploads some subset of humans and has them reason through their own preferences for a million subjective years and does something with their resulting judgments and preferences. This might also be basically incoherent.
Another possibility is that a single correct response to preferences falls out of game theory and decision theory, as Drescher attempts in Good and Real. This might also be incoherent.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-24T00:58:43.738Z · LW(p) · GW(p)
In these terms, the plan I see as the most promising is that the correct way of extracting preferences from humans that doesn't require further "extrapolation" falls out of decision theory.
(Not sure what you meant by Drescher's option (what's "response to preferences"?): does the book suggest that it's unnecessary to use humans as utility definition material? In any case, this doesn't sound like something he would currently believe.)
Replies from: lukeprog↑ comment by lukeprog · 2012-01-24T01:03:23.828Z · LW(p) · GW(p)
As I recall, Drescher still used humans as utility definition material but thought that there might be a single correct response to these utilities — one which falls out of decision theory and game theory.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-24T01:19:26.892Z · LW(p) · GW(p)
What's "response to utilities" (in grandparent you used "response to preferences" which I also didn't understand)? Response of what for what purpose? (Perhaps, the right question is about what you mean by "utilities" here, as in extracted/descriptive or extrapolated/normative.)
Replies from: lukeprog↑ comment by lukeprog · 2012-01-24T07:28:23.653Z · LW(p) · GW(p)
Response of what for what purpose?
Yeah, I don't know. It's kind of like asking what "should" or "ought" means. I don't know.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-24T13:40:58.170Z · LW(p) · GW(p)
No, it's not a clarifying question about subtleties of that construction, I have no inkling of what you mean (seriously, no irony), and hence fail to parse what you wrote (related to "response to utilities" and "response to preferences") at the most basic level. This is what I see in the grandparent:
Drescher still used humans as utility definition material but thought that there might be a single correct borogove — one which falls out of decision theory and game theory.
Replies from: lukeprog
↑ comment by lukeprog · 2012-01-25T01:51:46.960Z · LW(p) · GW(p)
For our purposes, how about...
Drescher still used humans as utility definition material but thought that there might be a single, morally correct way to derive normative requirements from values — one which falls out of decision theory and game theory.
Replies from: Vladimir_Nesov
↑ comment by Vladimir_Nesov · 2012-01-25T02:16:48.570Z · LW(p) · GW(p)
Still no luck. What's the distinction between "normative requirements" and "values", in what way are these two ideas (as intended) not the same?
Replies from: lukeprog↑ comment by lukeprog · 2012-01-25T06:09:34.764Z · LW(p) · GW(p)
Suppose that by "values" in that sentence I meant something similar to the firing rates of certain populations of neurons, and by "normative requirements" I meant what I'd mean if I had solved metaethics.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-25T10:05:12.964Z · LW(p) · GW(p)
Then that would refer to the "extrapolation" step (falling out of decision theory, as opposed to something CEV-esque), and assume that the results of an "extraction" step are already available, right? Does (did) Drescher hold this view?
Replies from: lukeprog↑ comment by lukeprog · 2012-01-25T14:03:32.721Z · LW(p) · GW(p)
From what I meant, it needn't assume that the results of an extraction step are already available, and I don't recall Drescher talking in so much detail about it. He just treats humans as utility material, however that might work.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2012-01-25T17:37:17.173Z · LW(p) · GW(p)
OK, thanks! That would agree with my plan then.
(In general, it's not clear in what ways descriptive "utility" can be more useful than original humans, or what it means as "utility", unless it's already normative preference, in which case it can't be "extrapolated" any further. "Extrapolation" makes more sense as a way of constructing normative preference from something more like an algorithm that specifies behavior, which seems to be CEV's purpose, and could then be seen as a particular method of extraction-without-need-for-extrapolation.)
↑ comment by pjeby · 2012-01-23T23:07:15.520Z · LW(p) · GW(p)
I think you've also missed the possibility that all three "systems" might just be the observably inconsistent behavior of one system in different edge cases, or at least that the systems are far more entangled and far less independent than they seem.
(I think you may have also ignored the part where, to the extent that the model-based system has values, they are often more satisficing than maximizing.)
comment by AspiringKnitter · 2012-01-24T20:33:31.422Z · LW(p) · GW(p)
If I understand this correctly, then the model-based system and the model-free system sound like inside and outside views.
Replies from: Manfred, lessdazed, JoachimSchipper↑ comment by JoachimSchipper · 2012-01-25T12:40:56.151Z · LW(p) · GW(p)
My inside view already feels pretty probabilistic, actually. (I suspect LW-reading mathematicians are not a very good model of the average human, though.)
comment by [deleted] · 2012-08-05T06:21:51.205Z · LW(p) · GW(p)
Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and
A question, probably silly: Suppose you calculate what a person would do given every possible configuration of sensory inputs, and then construct a utility function that returns one if that thing is done and zero otherwise. Can't we then say that any deterministic action-taking thing acts according to some utility function?
Or, even more trivially, just let the utility be constant. Then any action maximizes utility.
Edit: If you're using utility functions to predict actions, then the constant utility function is like a maximum entropy prior, and the "every possible configuration" thing is like a hypothesis that simply lists all observations without positing some underlying pattern, so it would eventually get killed off by being more complicated than hypotheses that actually "compress" the evidence.
Replies from: Richard_Kennaway↑ comment by Richard_Kennaway · 2012-08-05T11:39:25.039Z · LW(p) · GW(p)
A question, probably silly: Suppose you calculate what a person would do given every possible configuration of sensory inputs, and then construct a utility function that returns one if that thing is done and zero otherwise. Can't we then say that any deterministic action-taking thing acts according to some utility function?
No, although this idea pops up often enough that I have given it a name: the Texas Sharpshooter Utility Function.
There are two things glaringly wrong with it. Firstly, it is not a utility function in the sense of VNM (proof left as an exercise). Secondly, it does not describe how anything works -- it is purely post hoc (hence the name).
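A tiny illustration of the post-hoc point (my sketch, not Richard_Kennaway's; the action names are invented): each such "utility function" can only be written down after the corresponding action is observed, so it rationalizes everything and predicts nothing.

```python
# Post-hoc "utility function": defined only after seeing what the agent did.
def sharpshooter_utility(observed_action):
    return lambda action: 1.0 if action == observed_action else 0.0

history = ["press_lever", "do_nothing", "press_lever"]
for t, action in enumerate(history):
    U = sharpshooter_utility(action)          # drawn around the bullet hole at step t
    print(t, U("press_lever"), U("do_nothing"))
# Each U_t rationalizes step t perfectly, yet none constrains or predicts step t+1.
```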
comment by gaffa · 2012-01-23T22:41:57.200Z · LW(p) · GW(p)
As a first reaction (and without being read up on the details), I'm very skeptical. Assuming these three systems are actually in place, I don't see any convincing reason why any one of them should be trusted in isolation. Natural selection has only ever been able to work on their compound output, oblivious to the role played by each one individually and how they interact.
Maybe the "smart" system has been trained to assign some particular outcome a value of 5 utilons, whereas we would all agree that it's surely and under all circumstances worth more than 20, because as it happens throughout evolution one of the other "dumb" systems has always kicked in and provided the equivalent of at least 15 utilons. If you then extract the first system bare and naked, it might deliver some awful outputs.
Replies from: mfb, endoself↑ comment by mfb · 2012-02-05T18:14:40.384Z · LW(p) · GW(p)
As I understand it, the first system should be able to predict the result of the other two - if the brain knows a bit about how brains work.
While I don't know if the brain really has three different systems, I think that the basic idea is true: The brain has the option to rely on instincts, on "it worked before", or on "let's make a pro/contra list" - this includes any combination of the concepts.
The "lower" systems evolved before the "higher" ones, therefore I would expect that they can work as a stand-alone system as well (and they do in some animals).
↑ comment by endoself · 2012-01-24T04:09:52.038Z · LW(p) · GW(p)
I'm not familiar with the theory beyond what Luke has posted, but I think only one system is active at a time, so there is no summation occurring. However, we don't yet know what determines which system makes a particular decision or how these systems are implemented, so there definitely could be problems isolating them.
comment by Deanushka · 2012-02-06T22:31:15.897Z · LW(p) · GW(p)
Just some initial thoughts,
I do understand that these statements are broad generalisations of what really occurs, though the premise is that a successful choice would be made by weighting the options provided by these scenarios.
As with genetics and other systems, the beneficial-error scenario (for example, a miskeyed note on a keyboard leading to a favourable variation of the sequence) seems excluded from these scenarios.
Improvisation based on self-introduced errors may also be core to how these utilities could evolve reason.
Model-based system: Figure out what's going on, and what actions maximize returns, and do them.
Model-free system: Do the thingy that worked before again!
Pavlovian system: Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
comment by mfb · 2012-01-29T17:37:31.635Z · LW(p) · GW(p)
I think that you can keep up the utility function a bit longer if you add the costs of thinking to it - required time and energy, and maybe an aversion to thinking about it. "I could compare these two items in the supermarket for 30 minutes and finally find out which product is better - or I could just ignore the new option and take the same thing as last time." It can be the perfectly rational option to just stick with something which worked before.
It is also rational to decide how much time you invest in deciding something (and if there is a lot of money involved, this is usually done). If the time for a decision is not enough to build and use a model, you fall back to more "primitive" methods. In fact, most everyday decisions have to be made like that. Each second, you have several options available, and no possibility of re-thinking all of them every time.
We need all 3 systems for our life. The interesting thing is just to decide which system is useful for which decision and how much time it should get. Look at it from a higher perspective, and you can get a well-defined utility function for a brain which has access to these systems to evaluate things.
comment by Dmytry · 2012-01-26T22:51:27.440Z · LW(p) · GW(p)
Okay, which system decides which way the rat should turn when the rat is navigating a maze? A cat doing actual path-finding on a complex landscape? (Which is surprisingly hard to do if you are coding a cat AI. Path-finding, well, it is rather "rational" in the sense that animals don't walk into walls and the like.) A human navigating a maze with a map to get food? A cat doing path-finding while avoiding a place where the cat had a negative experience? ("Conditioning".)
It seems to me that those 3 "systems", if there are such 3 systems, aren't interacting in the way that the article speaks of.
comment by TheOtherDave · 2012-01-23T23:40:02.936Z · LW(p) · GW(p)
At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system's preferences as representing "my values," and I would reject the outputs of the model-free and Pavlovian systems as the outputs of dumb systems that evolved for their computational simplicity, and can be seen as ways of trying to approximate the full power of a model-based system responsible for goal-directed behavior.
At a glance, I might be more comfortable embracing an extrapolation of the combination of the model-based system's preferences and the Pavlovian system's preferences.
Admittedly, a first step in extrapolating the Pavlovian system's preferences might be to represent its various targets as goals in a model, thereby leaving the extrapolator with a single system to extrapolate, but given that 99% of the work takes place after this point I'm not sure how much I care. Much more important is to not lose track of that stuff accidentally.
comment by timtyler · 2012-01-25T01:48:08.769Z · LW(p) · GW(p)
Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice.
Er, I don't think so. To quote from here:
Utility maximisation is a general framework which is powerful enough to model the actions of any computable agent. The actions of any computable agent - including humans - can be expressed using a utility function. This was spelled out by Dewey in a 2011 paper titled "Learning What to Value", in his section about "O-Maximisers".
Some argue that humans have no utility function. However, this makes little sense: all computable agents have utility functions. The human utility function may not be easy to write down - but that doesn't mean that it doesn't exist.
Replies from: JoachimSchipper, Manfred
↑ comment by JoachimSchipper · 2012-01-25T12:45:34.664Z · LW(p) · GW(p)
Why would this necessarily be true? Somewhere in mind-design-space is a mind (or AI/algorithm) that confidently asserts A > B, B > C and C > A. (I'm not sufficiently versed in the jargon to know whether this mind would be an "agent", though - most minds are not goal-seeking in any real sense of the word.)
Replies from: timtyler↑ comment by timtyler · 2012-01-25T12:49:51.626Z · LW(p) · GW(p)
That mind would have some associated behaviour and that behaviour could be expressed by a utility function (assuming computability - which follows from the Church–Turing–Deutsch principle).
Navel gazing, rushing around in circles, burning money, whatever - all have corresponding utility functions.
Dewey explains why in more detail - if you are prepared to follow the previously-provided link from here.
Replies from: JoachimSchipper↑ comment by JoachimSchipper · 2012-01-25T13:53:40.350Z · LW(p) · GW(p)
I've taken a look at the paper. If "outcomes" are things like "chose A", "chose B" or "chose C", the above mind is simply not an O-maximizer: consider a world with observations "I can choose between A and B/B and C/C and A" (equally likely, independent of any past actions or observations) and actions "take the first offered option" or "take the second offered option" (played for one round, for simplicity, but the argument works fine with multiple rounds); there is no definition of U that yields the described behaviour. (I'm aware that the paper asserts that "any agents [sic] can be written in O-maximizer form", but note that the paper may simply be wrong. It's clearly an unfinished draft, and no argument or proof is given.)
If outcomes are things like "chose A given a choice between A and B", which is not clear to me from the paper, then my mind is indeed an O-maximizer (that is, there is a definition of U such that an O-maximizer produces the same outputs as my mind). However, as I understand it, you have also encoded any cognitive errors in the utility function: if a mind can be Dutch-booked into an undesirable state, the associated O-maximizer will have to act on a U function that values this undesirable state highly if it comes about as a result of being Dutch-booked. (Remember, the O-maximizer maximizes U and behaves like the original mind.) As an additional consideration, most decision/choice theory seems to assume a ranking of outcomes, not (path, outcome) pairs.
Replies from: timtyler↑ comment by timtyler · 2012-01-25T15:30:01.462Z · LW(p) · GW(p)
I've taken a look at the paper. If "outcomes" are things like "chose A", "chose B" or "chose C", the above mind is simply not an O-maximizer: consider a world with observations "I can choose between A and B/B and C/C and A" (equally likely, independent of any past actions or observations) and actions "take the first offered option" or "take the second offered option" (played for one round, for simplicity, but the argument works fine with multiple rounds); there is no definition of U that yields the described behaviour.
What?!? You haven't clearly specified the behaviour of the machine. If you are invoking an uncomputable random number generator to produce an "equally likely" result then you have an uncomputable agent. However, there's no such thing as an uncomputable random number generator in the real world. So: how is this decision actually being made?
I'm aware that the paper asserts that "any agents [sic] can be written in O-maximizer form", but note that the paper may simply be wrong. It's clearly an unfinished draft, and no argument or proof is given.
It applies to any computable agent. That is any agent - assuming that the Church–Turing–Deutsch principle is true.
The argument given is pretty trivial. If you doubt the result, check it - and you should be able to see if it is correct or not fairly easily.
Replies from: JoachimSchipper↑ comment by JoachimSchipper · 2012-01-25T16:57:55.343Z · LW(p) · GW(p)
The world is as follows: each observation x_i is one of "the mind can choose between A and B", "the mind can choose between B and C" or "the mind can choose between C and A" (conveniently encoded as 1, 2 and 3). Independently of any past observations (x_1 and the like) and actions (y_1 and the like), each of these three options is equally likely. This fully specifies a possible world, no?
The mind, then, is as follows: if the last observation is 1 ("A and B"), output "A"; if the last observation is 2 ("B and C"), output "B"; if the last observation is 3 ("C and A"), output "C". This fully specifies a possible (deterministic, computable) decision procedure, no? (1)
I argue that there is no assignment to U("A"), U("B") and U("C") that causes an O-maximizer to produce the same output as the algorithm above. Conversely, there are assignments to U("1A"), U("1B"), ..., U("3C") that cause the O-maximizer to output the same decisions as the above algorithm, but then we have encoded our decision algorithm into the U function used by the O-maximizer (which has its own issues, see my previous post.)
(1) Actually, the definition requires the mind to output something before receiving input. That is a technical detail that can be safely ignored; alternatively, just always output "A" before receiving input.
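For what it's worth, here is a quick sketch of this claim in code (mine, not JoachimSchipper's), assuming strict rankings and a deterministic argmax over the offered menu: no assignment of utilities to the bare outcomes A, B, C reproduces the cyclic choices.

```python
from itertools import permutations

menus = [("A", "B"), ("B", "C"), ("C", "A")]
cyclic_mind = {menu: menu[0] for menu in menus}  # always takes the first offered option

def maximizer_choice(U, menu):
    # An outcome-level utility maximizer picks the highest-utility option on the menu.
    return max(menu, key=lambda outcome: U[outcome])

# Try every strict ranking of U("A"), U("B"), U("C"); none matches the cyclic mind.
matches = [U for ranking in permutations((1, 2, 3))
           for U in [dict(zip("ABC", ranking))]
           if all(maximizer_choice(U, menu) == cyclic_mind[menu] for menu in menus)]
print(matches)  # [] -- no outcome-level utility function reproduces this behaviour
```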
Replies from: timtyler↑ comment by timtyler · 2012-01-25T18:13:30.796Z · LW(p) · GW(p)
I argue that there is no assignment to U("A"), U("B") and U("C") that causes an O-maximizer to produce the same output as the algorithm above.
...but the domain of a utility function surely includes sensory inputs and remembered past experiences (the state of the agent). You are trying to assign utilities to outputs.
If you try and do that you can't even encode absolutely elementary preferences with a utility function - such as: I've just eaten a peanut butter sandwich, so I would prefer a jam one next.
If that is the only type of utility function you are considering, it is no surprise that you can't get the theory to work.
↑ comment by Manfred · 2012-01-25T15:39:54.212Z · LW(p) · GW(p)
The point is about how humans make decisions, not about what decisions humans make.
Replies from: timtyler↑ comment by timtyler · 2012-01-25T18:30:35.544Z · LW(p) · GW(p)
The point is about how humans make decisions, not about what decisions humans make.
Er, what are you talking about? Did you not understand what was wrong with Luke's sentence? Or what are you trying to say?
Replies from: Manfred↑ comment by Manfred · 2012-01-25T19:39:29.229Z · LW(p) · GW(p)
The way I know to assign a utility function to an arbitrary agent is to say "I assign what the agent does utility 1, and everything else utility less than one." Although this "just so" utility function is valid, it doesn't peek inside the skull - it's not useful as a model of humans.
What I meant by "how humans make decisions" is a causal model of human decision-making. The reason I wouldn't call all agents "utility maximizers" is because I want utility maximizers to have a certain causal structure - if you change the probability balance of two options and leave everything else equal, you want it to respond thus. As gwern recently reminded me by linking to that article on Causality, this sort of structure can be tested in experiments.
Replies from: timtyler↑ comment by timtyler · 2012-01-25T20:40:54.954Z · LW(p) · GW(p)
Although this "just so" utility function is valid, it doesn't peek inside the skull - it's not useful as a model of humans.
It's a model of any computable agent. The point of a utility-based framework capable of modelling any agent is that it allows comparisons between agents of any type. Generality is sometimes a virtue. You can't easily compare the values of different creatures if you can't even model those values in the same framework.
The reason I wouldn't call all agents "utility maximizers" is because I want utility maximizers to have a certain causal structure - if you change the probability balance of two options and leave everything else equal, you want it to respond thus.
Well, you can define your terms however you like - if you explain what you are doing. "Utility" and "maximizer" are ordinary English words, though.
It seems to be impossible to act as though you don't have a utility function (as was originally claimed), though. "Utility function" is a perfectly general concept which can be used to model any agent. There may be slightly more concise methods of modelling some agents - that seems to be roughly the concept that you are looking for.
So: it would be possible to say that an agent acts in a manner such that utility maximisation is not the most parsimonious explanation of its behaviour.
Replies from: Manfred↑ comment by Manfred · 2012-01-26T01:23:58.630Z · LW(p) · GW(p)
Although this "just so" utility function is valid, it doesn't peek inside the skull - it's not useful as a model of humans.
It's a model of any computable agent.
Sorry, replace "model" with "emulation you can use to predict the emulated thing."
There may be slightly more concise methods of modelling some agents - that seems to be roughly the concept that you are looking for.
I'm talking about looking inside someone's head and finding the right algorithms running. Rather than "what utility function fits their actions," I think the point here is "what's in their skull?"
Replies from: timtyler↑ comment by timtyler · 2012-08-05T12:30:12.242Z · LW(p) · GW(p)
I'm talking about looking inside someone's head and finding the right algorithms running. Rather than "what utility function fits their actions," I think the point here is "what's in their skull?"
The point made by the O.P. was:
Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions)
It discussed actions - not brain states. My comments were made in that context.