Probability space has 2 metrics

post by Donald Hobson (donald-hobson) · 2019-02-10T00:28:34.859Z · LW · GW · 11 comments


A metric $d$ is technically defined as a function from pairs of points to the non-negative reals, with the properties that $d(x,y) = 0$ if and only if $x = y$, that $d(x,y) = d(y,x)$, and that $d(x,z) \le d(x,y) + d(y,z)$.

Intuitively, a metric is a way of measuring how similar points are: which points are near which others. Probabilities can be represented in several different ways, including the standard range $p \in [0,1]$ and the log odds $b \in (-\infty, +\infty)$. They are related by $b = \log\frac{p}{1-p}$ and $p = \frac{e^b}{1+e^b}$ and $p = \frac{1}{1+e^{-b}}$ (the equations are algebraically equivalent).
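
For concreteness, here is a minimal sketch of the conversion between the two representations (the helper names are my own, not from the post; natural logs are used, and taking logs base 2 instead would measure log odds in bits):

```python
import math

def prob_to_log_odds(p):
    """Convert a probability in (0, 1) to log odds (natural log)."""
    return math.log(p / (1 - p))

def log_odds_to_prob(b):
    """Convert log odds back to a probability."""
    return 1 / (1 + math.exp(-b))

# Round-tripping recovers the original probability.
assert abs(log_odds_to_prob(prob_to_log_odds(0.9)) - 0.9) < 1e-12
```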

The two metrics of importance are the Bayesian metric $B(x,y) = |b_x - b_y|$ and the probability metric $P(x,y) = |p_x - p_y|$.

Suppose you have a prior $b_1$, in log odds, for some proposition. Suppose you update on some evidence that is twice as likely to appear if the proposition is true, to get a posterior $b_2$, in log odds. Then $b_2 = b_1 + \log 2$. The metric $B$ measures how much evidence you need to move between probabilities.
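
A small worked sketch of that update (the numbers are my own): a prior probability of 0.2 is odds 1:4; evidence twice as likely under the proposition doubles the odds to 2:4, i.e. adds $\log 2$ to the log odds.

```python
import math

prior_p = 0.2
prior_b = math.log(prior_p / (1 - prior_p))   # log odds of 1:4
posterior_b = prior_b + math.log(2)           # evidence with likelihood ratio 2
posterior_p = 1 / (1 + math.exp(-posterior_b))
print(round(posterior_p, 4))                  # 0.3333, i.e. odds 2:4 = 1:2
```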

Suppose you have a choice of actions: the first action will make an event of utility $u$ happen with probability $p_1$, the other will cause the probability of the event to be $p_2$. How much should you care? The difference in expected utility is $(p_1 - p_2)u$, so the metric $P$, scaled by the utility, measures how much the choice matters.
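
A quick numerical sketch (numbers are my own) of why the probability metric is the relevant one here: the expected-utility gap depends only on $|p_1 - p_2|$, whether the probabilities sit near 0.5 or near the extremes.

```python
u = 100  # utility of the event
for p1, p2 in [(0.47, 0.46), (0.01, 0.00)]:
    gap = (p1 - p2) * u
    print(p1, p2, round(gap, 6))   # both gaps are 1.0 utility unit
```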

The first metric, $B$, stretches probabilities near 0 or 1 and is uniform in log odds. The second, $P$, squashes all log odds with large absolute value together, and is uniform in probabilities. The first is used for Bayesian updates, the second for expected utility calculations.

Suppose an imperfect agent reasoned using a single metric, something in between these two: some metric function less squashed up than $P$ but more squashed than $B$ around the ends. Suppose it crudely substituted this new metric into its reasoning processes whenever one of the other two metrics was required.

In decision theory problems, such an agent would rate small differences in probability as more important than they really are when facing probabilities near 0 or 1. From the inside, the difference between no chance and a probability of 0.01 would feel far larger than the difference between probabilities 0.46 and 0.47.

The Allais Paradox [LW · GW]

However, the agent's metric is more squashed than $B$ near the ends, so moving from 10000:1 odds to 1000:1 odds seems to require less evidence than moving from 10:1 to 1:1, even though both are a factor-of-ten change in odds. When facing small probabilities, such an agent would perform larger Bayesian updates than really necessary, based on weak evidence.
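
To make that concrete (a sketch with my own numbers): a move from 10000:1 to 1000:1 and a move from 10:1 to 1:1 are both a factor-of-ten change in odds, so they are the same size under the Bayesian metric $B$, while being wildly different sizes under the probability metric $P$.

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

pairs = [(10000 / 10001, 1000 / 1001),   # 10000:1 odds -> 1000:1 odds
         (10 / 11, 1 / 2)]               # 10:1 odds -> 1:1 odds
for p1, p2 in pairs:
    print(round(abs(log_odds(p1) - log_odds(p2)), 3),   # B distance: ~2.303 both times
          round(abs(p1 - p2), 5))                        # P distance: tiny vs. large
```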

Privileging the Hypothesis [LW · GW]

As both of these behaviors correspond to known human biases, could humans be using only a single metric on probability space?

11 comments


comment by rossry · 2019-02-11T06:23:16.236Z · LW(p) · GW(p)

The speculative proposition that humans might only be using one metric rings true and is compellingly presented.

However, I feel a bit clickbaited by the title, which (to me) implies that probability-space has only two metrics (which isn't true; indeed, the later proposition depends on there being others). Maybe consider changing it to "Probability space has multiple metrics", to avoid confusion?

comment by Shmi (shminux) · 2019-02-10T03:13:49.985Z · LW(p) · GW(p)

Note that the closer the probability of something is to 0 or to 1, the harder it is to evaluate accurately. A simple example: starting with a fair coin and observing a sequence of N heads in a row, what is an unbiased estimate of the coin's bias? Log odds of N heads are -N when starting with a point estimate of a fair coin, which matches the Bayesian updates, so it is reasonable to conclude that the probability of heads is 1-2^(-N), but at levels small enough, there are so many other factors that can interfere that the calculation ceases being accurate. Maybe the coin has heads on both sides? Maybe your brain makes you see heads when the coin flip outcome is actually tails? Maybe you are only hallucinating the coin flips? So, if you finally get a tail, reducing the estimated probability of heads, you are able to reject multiple other unlikely possibilities as well, and it makes sense that one would need less evidence when moving from -N to -N+1 for large N than for small N.

Replies from: Davidmanheim
comment by Davidmanheim · 2019-02-14T10:38:15.499Z · LW(p) · GW(p)

Yes - and this is equivalent to saying that evidence about probability provides Bayesian metric evidence - you need to transform it.

Replies from: shminux
comment by Shmi (shminux) · 2019-02-14T15:39:20.234Z · LW(p) · GW(p)

Could you explain your point further?

comment by Alexei · 2019-02-10T07:47:23.006Z · LW(p) · GW(p)

I don’t think I’ve read this view before, or if I have, I’ve forgotten it. Thanks for writing this up!

comment by Lukas Finnveden (Lanrian) · 2019-02-10T22:01:17.637Z · LW(p) · GW(p)

I think this should have b instead of p:

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2019-02-11T10:55:11.889Z · LW(p) · GW(p)

Fixed, thanks.

comment by Charlie Steiner · 2019-02-10T20:42:49.752Z · LW(p) · GW(p)

Awesome idea! I think there might be something here, but I think the difference between "no chance" and "0.01% chance" is more of a discrete change from not tracking something to tracking it. We might also expect neglect of "one in a million" vs "one in a trillion" in both updates and decision-making, which causes a mistake opposite that predicted by this model in the case of decision-making.

comment by Sniffnoy · 2019-02-10T04:43:13.631Z · LW(p) · GW(p)

I'm pretty sure this point has been made here before, but, hey, it's worth repeating, no? :)

comment by Bucky · 2019-02-11T22:29:45.228Z · LW(p) · GW(p)

I like the theory. How would we test it?

We have a fairly good idea of how people weight decisions based on probabilities via offering different bets and seeing which ones get chosen.

I don't know how much quantification has been done on incorrect Bayesian updates. Could one suggest trades where one is given options, one of which has been recommended by an "expert" who has made the correct prediction to a 50:50 question on a related topic x times in a row? How much do people adjust based on the evidence of the expert? This doesn't sound perfect to me; maybe someone else has a better version, or maybe people are already doing this research?!

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2019-02-11T22:50:32.220Z · LW(p) · GW(p)

Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card, and it's blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. The people should perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
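
For what it's worth, a minimal sketch of the correct update in that setup (my own illustration, with base-2 logs so the unit is bits): seeing a blue side has likelihood 1 for a blue/blue card and 1/2 for a red/blue card, so the correct Bayesian update is always the same size, exactly one bit (a factor of two in odds), whatever the pack composition. Any dependence of subjects' update size on the composition would be the predicted distortion.

```python
import math

def bits(p):
    return math.log2(p / (1 - p))

# f = fraction of the pack that is blue on both sides; the rest are red/blue.
for f in [0.5, 0.9, 0.99]:
    posterior_odds = (f / (1 - f)) * 2           # likelihood ratio for seeing blue is 2
    posterior_p = posterior_odds / (1 + posterior_odds)
    print(f, round(bits(posterior_p) - bits(f), 6))   # always 1.0 bit
```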