Anchoring vs Taste: a model

post by Stuart_Armstrong · 2019-02-13T19:03:08.851Z · LW · GW



Here I'll build on my observation [LW · GW] that anchoring bias is formally similar to taste-based preferences, and develop some more formalism for learning the values/preferences/reward functions of a human.

Anchoring or taste

An agent H (think of them as a simplified human) confronts one of two scenarios:

- Scenario I: H watches a movie in which a chocolate bar is bought, either for £0.01 or for £100, and is then asked how much they would pay for a chocolate bar.
- Scenario II: H is offered a chocolate bar, which either contains nuts or doesn't.

In both cases, H will spend £1 for the bar (£0.01 movie/no nuts) or £3 (£100 movie/nuts).

We want to say that scenario I is due to anchoring bias, while scenario II is due to taste differences. Can we?

Looking into the agent

We can't directly say anything about H just by their actions, of course - even with simplicity priors. But we can make some assumptions if we look inside their algorithm, and see how they model [LW · GW] the situation.

Assume that H's internal structure consists of two pieces: a modeller M and an assessor A. Any input i is streamed to both M and A. Then M can interrogate A by sending an internal variable v, receives another variable w in return, and then outputs o.

In pictures, this looks as follows, where each variable is indexed by the timestep at which it is transmitted:

Here the input i decomposes into m (the movie) and q (the question). Assume that these variables are sufficiently well grounded [LW · GW] that when I describe them ("the modeller", "the movie", "the key variables", and so on), these descriptions mean what they seem to.

So the modeller M will construct a list of all the key variables, and pass these on to the assessor A to get an idea of the price. The price will return in w, and then M will simply output that value as o.
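The pipeline above can be sketched in code. This is my own minimal formalisation, not the post's: the function names and the dictionary representation of the input are assumptions for illustration.

```python
def run_agent(modeller, assessor, raw_input):
    """One step of the M/A pipeline: the input reaches both parts,
    M picks the key variables, A prices them, and that price is output."""
    v = modeller(raw_input)      # M constructs the key variables
    w = assessor(v, raw_input)   # A answers; it saw the raw input too
    return w                     # the agent outputs A's assessment as o

# Toy instantiation: A returns a fixed price for the question it is asked.
def toy_modeller(raw_input):
    return {"question": "price of a chocolate bar?"}

def toy_assessor(passed_vars, raw_input):
    return 1.0 if "question" in passed_vars else 0.0

print(run_agent(toy_modeller, toy_assessor, {"movie": "cheap chocolate"}))  # 1.0
```

Note that the assessor receives both the variables M deliberately passed on and the raw input stream; that distinction is what the rest of the post turns on.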

A human-like agent

First we'll design H to look human-like. In scenario I the modeller will pass v = (q) to the assessor - only the question "how much is a bar of chocolate worth?" will be passed on (in a real-world scenario, more details about what kind of chocolate it is would be included, but let's ignore those details here). The answer w will be £1 or £3, as indicated above, depending on m (which is also an input into A).

In scenario II, the modeller will pass on v = (q, n), where n is a boolean that indicates whether the chocolate contains nuts or not. The response will be £1 if n is false, and £3 if n is true.
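The two scenarios can be sketched concretely. This is a hedged illustration of the human-like agent H; the dictionary keys ("movie_price", "nuts") and the price values are my own encoding of the setup above.

```python
def modeller_H(raw_input):
    """H's modeller: passes on the question, plus the nuts variable n
    in scenario II. The movie m is never passed on."""
    v = {"question": "how much is a bar of chocolate worth?"}
    if "nuts" in raw_input:
        v["nuts"] = raw_input["nuts"]
    return v

def assessor_H(passed_vars, raw_input):
    """H's assessor: prices via n when it was passed on (taste); otherwise
    the movie influences the answer directly, bypassing M (anchoring)."""
    if "nuts" in passed_vars:
        return 3.0 if passed_vars["nuts"] else 1.0
    return 3.0 if raw_input.get("movie_price") == 100.0 else 1.0

def H(raw_input):
    return assessor_H(modeller_H(raw_input), raw_input)

print(H({"movie_price": 100.0}))  # 3.0 (anchored high)
print(H({"movie_price": 0.01}))   # 1.0 (anchored low)
print(H({"nuts": True}))          # 3.0 (likes nuts)
```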

Can we now say that anchoring is a bias but the taste for nuts is a preference? Almost, we're nearly there. To complete this, we need to make the normative assumption [LW · GW]:

    H's preferences are given by the assessor A's response to the variables deliberately passed on by the modeller M; influences on A that bypass M are not preferences.

Now we can say that anchoring is a bias (because the variable that changes the assessment, the movie m, affects A but is not passed on via M), while taste is likely a preference (because the key taste variable n is passed on by M).
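The normative assumption can be turned into a simple labelling rule. This is my own one-line formalisation sketch: a variable that changes A's assessment counts as a preference if M passes it on, and as a bias if it reaches A only through the raw input stream.

```python
def classify(variable, passed_by_M, affects_A):
    """Label a variable under the normative assumption above.
    passed_by_M: set of variables the modeller deliberately passes on.
    affects_A:   set of variables that change the assessor's answer."""
    if variable not in affects_A:
        return "irrelevant"
    return "preference" if variable in passed_by_M else "bias"

# For the human-like agent H: the movie affects A but is not passed on,
# while the nuts variable is passed on.
print(classify("movie", passed_by_M={"question", "nuts"},
               affects_A={"movie", "nuts"}))  # bias
print(classify("nuts", passed_by_M={"question", "nuts"},
               affects_A={"movie", "nuts"}))  # preference
```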

A non-human agent

We can also design an agent H' with the same behaviour as H, but clearly non-human. For H', the modeller passes on v = (q) in scenario II, while it passes v = (q, b) in scenario I, where b is a boolean encoding whether the movie-chocolate was bought for £0.01 or for £100.

In that case, the normative assumption will assess anchoring as a demonstration of preference, while the presence of nuts is clearly an irrational bias. And I'd agree with this assessment - but I wouldn't call H' a human, for reasons explained here [LW · GW].
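The swap can be made concrete with the same illustrative encoding as before (again my own sketch, not the post's code): H' outputs exactly the same prices as H, but its modeller passes on the movie boolean b and not the nuts variable, so the classification of bias and preference flips.

```python
def modeller_Hp(raw_input):
    """H''s modeller: passes on b (whether the movie chocolate cost £100)
    in scenario I, and does NOT pass on the nuts variable in scenario II."""
    v = {"question": "how much is a bar of chocolate worth?"}
    if "movie_price" in raw_input:
        v["expensive_movie"] = (raw_input["movie_price"] == 100.0)
    return v

def assessor_Hp(passed_vars, raw_input):
    """H''s assessor: prices via b when it was passed on; otherwise the
    nuts variable influences the answer directly, bypassing the modeller."""
    if "expensive_movie" in passed_vars:
        return 3.0 if passed_vars["expensive_movie"] else 1.0
    return 3.0 if raw_input.get("nuts") else 1.0

def Hp(raw_input):
    return assessor_Hp(modeller_Hp(raw_input), raw_input)

# Behaviourally indistinguishable from H on both scenarios...
print(Hp({"movie_price": 100.0}))  # 3.0
print(Hp({"nuts": True}))          # 3.0
# ...but under the normative assumption, anchoring is now a preference
# (b is passed on) and the nuts response is a bias (n bypasses M).
```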
