Book Review: Heuristics and Biases (MIRI course list)
post by So8res · 2013-09-02T15:37:15.211Z · LW · GW · Legacy · 22 comments
I'm reviewing the books on the MIRI course list.
Upon deciding to read the course list, the first book I picked up was Heuristics and Biases. It's a tome of 42 papers in psychology detailing a vast array of human biases. This post constitutes a review of the book in general, as well as a brief summary of each paper.
This review includes a number of biases not covered by the Sequences.
Heuristics and Biases: The Psychology of Intuitive Judgement
This book is dry. Dry like old sand. On Venus. When the sun's out.
I'd never had my mind blown by text so dry before.
Most of the 42 papers introduced studies that revealed cognitive biases. A few other papers summarized and discussed results, proposing models that could explain many biases at once. Each paper came with lots of data.
The book had two main themes. The first was an exploration of heuristics and biases. The second was an argument that biases are not secretly great.
There's a segment of the psychologist population who argue that biases are not actually failures but instead the result of contrived experiments. Researchers are just looking at people wrong.
This book dedicates a handful of papers to tearing such arguments apart (via compelling demonstrations of bias and studies on real-world data). There was one particular quote that stuck with me (which I'll have to paraphrase, since I lent the book out a few days ago):
People in the human-optimality camp seem to think there is only one type of error in human judgement, that being the error of psychologists when they design tests.
I was sort of hoping for a paper studying the bias of psychologists as they design tests for biases, just so I could read the fallout.
But I digress. Such arguments didn't interest me much, as I came to the table ready and willing to believe that the brain has many pathological failure modes.
Fortunately, the main focus of the book was the varied ways in which brains fail at simple tasks. Many of the biases within are famous or were discussed in the sequences. In those cases, it was good to see the actual experimental setups and the real data. Many other biases were completely new to me.
The high points of each chapter are summarized below. I've marked the most salient chapters with green numbers (in the table of contents) and green headers (in the text).
Table of Contents
- Extensional vs Intuitive Reasoning: The Conjunction Fallacy in Probability Judgement
- Representativeness Revisited: Attribute Substitution in Intuitive Judgement
- How Alike Is It? versus How Likely Is It?: A Disjunction Fallacy in Probability Judgements
- Imagining Can Heighten or Lower the Perceived Likelihood of Contracting a Disease: The Mediating Effect of Ease of Imagery
- The Availability Heuristic Revisited: Ease of Recall and Content of Recall as Distinct Sources of Information
- Incorporating the Irrelevant: Anchors in Judgements of Belief and Value
- Putting Adjustment Back in the Anchoring and Adjustment Heuristic
- Self-Anchoring in Conversation: Why Language Users Do Not Do What They "Should"
- Inferential Correction
- Mental Contamination and the Debiasing Problem
- Sympathetic Magical Thinking: The Contagion and Similarity "Heuristics"
- Compatibility Effects in Judgement and Choice
- The Weighting of Evidence and the Determinants of Confidence
- Inside the Planning Fallacy: The Causes and Consequences of Optimistic Time Predictions
- Probability Judgement Across Cultures
- Durability Bias in Affective Forecasting
- Resistance of Personal Risk Perceptions to Debiasing Interventions
- Ambiguity and Self-Evaluation: The Role of Idiosyncratic Trait Definitions in Self-Serving Assessments of Ability
- When Predictions Fail: The Dilemma of Unrealistic Optimism
- Norm Theory: Comparing Reality to its Alternatives
- Counterfactual Thought, Regret, and Superstition: How To Avoid Kicking Yourself
- Two Systems of Reasoning
- The Affect Heuristic
- Individual Differences in Reasoning: Implications for the Rationality Debate?
- Support Theory: A Nonextensional Representation of Subjective Probability
- Unpacking, Repacking, and Anchoring: Advances in Support Theory
- Remarks on Support Theory: Recent Advances and Future Directions
- The Use of Statistical Heuristics in Everyday Inductive Reasoning
- Feelings as Information: Moods Influence Judgements and Processing Strategies
- Automated Choice Heuristics
- How Good are Fast and Frugal Heuristics?
- Intuitive Politicians, Theologians, and Prosecutors: Exploring the Empirical Implications of Deviant Functionalist Metaphors
- The Hot Hand in Basketball: On the Misprediction of Random Sequences
- Like Goes With Like: The Role of Representativeness in Erroneous and Pseudo-Scientific Beliefs
- When Less is More: Counterfactual Thinking and Satisfaction among Olympic Medalists
- Understanding Misunderstandings: Social Psychological Perspectives
- Assessing Uncertainty in Physical Constants
- Do Analysts Overreact?
- The Calibration of Expert Judgement: Heuristics and Biases Beyond the Laboratory
- Clinical versus Actuarial Judgement
- Heuristics and Biases in Application
- Theory-Driven Reasoning about Plausible Pasts and Probable Futures in World Politics
1. Extensional vs Intuitive Reasoning: The Conjunction Fallacy in Probability Judgement
This chapter mainly covered the conjunction fallacy: the phenomenon where people think a bookish introvert is more likely to be an accountant who plays jazz on the side than someone (in general) who plays jazz on the side.
It also generalized the conjunction fallacy by introducing a "narrative effect": events seem more likely if you are given an example of how they could occur.
For instance, given a description of a scummy employer, people rate "He killed an employee" as less likely than "He killed an employee to stop them from going to the police".
Suggestion: Whenever you're presented with a hypothetical, generate at least one explanation (preferably more) for how it could occur (unpack hypotheticals manually).
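To make the arithmetic concrete, here's a minimal sketch with made-up numbers (the probabilities below are purely illustrative, not from the paper): no matter what numbers you pick, the conjunction can't be more likely than either conjunct.

```python
# Made-up probabilities for the bookish introvert. Whatever the numbers,
# P(accountant AND jazz) can never exceed P(jazz).
p_accountant = 0.10
p_jazz_given_accountant = 0.05
p_jazz_given_not_accountant = 0.03

p_accountant_and_jazz = p_accountant * p_jazz_given_accountant
p_jazz = p_accountant_and_jazz + (1 - p_accountant) * p_jazz_given_not_accountant

print(p_accountant_and_jazz, p_jazz)    # 0.005 vs 0.032
assert p_accountant_and_jazz <= p_jazz  # holds for any consistent assignment
```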
2. Representativeness Revisited: Attribute Substitution in Intuitive Judgement
Your intuition's likelihood rating is more a measure of representativeness than of probability.
For example, if you give people five suspects in a murder mystery, with little evidence for each, then people will give low probabilities that each individual committed the crime. If you then give each of them a motive, then people will give higher probabilities for all suspects.
The number that your gut throws out isn't probability (which should be roughly 1/5 for each suspect in both cases). Rather, it's a function of argument strength (how much dirt you have on that one person).
Note that people still rate any two people as equally likely, and that they rate each as 20% likely when asked to analyze all five at once. However, when people rate suspects individually, their likelihood ratings are a good predictor of how much dirt they have on the person (without taking other suspects into account).
Further studies concluded that when you ask people for a probability, they give you a measure of how well the considered object (one suspect) represents the parent group (suspicious people).
This theme emerges throughout the book in different forms.
3. How Alike Is It? versus How Likely Is It?: A Disjunction Fallacy in Probability Judgements
This was basically an exploration of the conjunction bias and representation theory.
For starters, it flipped the conjunction bias around and examined the disjunction bias (the propensity to rate the likelihood of X higher than the likelihood of X ∨ Y). It found a disjunction bias that was weaker than the conjunction bias, and then explored the bias in the light of representativeness. It exploited the fact that subcategories can appear more representative than parent categories. For example, a sparrow is a representative bird but not a very representative animal. They found that perceived likelihood was better predicted by representativeness than by actual likelihood.
4. Imagining Can Heighten or Lower the Perceived Likelihood of Contracting a Disease: The Mediating Effect of Ease of Imagery
Things that are easy to imagine are judged to be more likely. If you ask people to imagine breaking an arm, they can do it easily, and their reported likelihood of breaking an arm goes up. But if you ask people to imagine an abstract genetic thyroid disease, they'll have trouble picturing it, and their reported likelihood of contracting the disease will go down.
Takeaway 1: Be careful about telling people to picture rare things. If they are hard to imagine, people might end up less worried after trying to imagine themselves as victims.
Takeaway 2: When you're assessing probabilities, be careful not to conflate "easy to picture" with "likely to happen".
Your brain is wont to do this for you, so you should explicitly discount probabilities for easy-to-picture futures and explicitly increase probabilities for hard-to-picture ones.
5. The Availability Heuristic Revisited: Ease of Recall and Content of Recall as Distinct Sources of Information
This study was pretty interesting. They gave people sheets of paper with grayed-out 't's on each line. They asked the subjects to think of words that started with the letter 't'.
A control group was given no further instruction. The facilitated group was told that the gray 't's would facilitate. The inhibited group was told that the gray 't's would inhibit. Afterwards, each person was asked what proportion of words began with 't'.
The facilitated group gave lower proportions than the control; the inhibited group gave higher ones.
The conclusion was that people used the perceived difficulty as a source of input. All groups found it relatively easy to think of t-words. The facilitated group attributed some of this ease to the paper (and concluded that there were fewer t-words). The inhibited group concluded that t-words must be really prevalent, because even with the inhibiting paper they thought of a bunch of t-words. (Note that the paper was the same in all scenarios.)
More generally, it seems that people treat ease-of-recall as a heuristic to measure prevalence-of-item.
This heuristic is often good, but be wary: it's relatively easy to exploit.
6. Incorporating the Irrelevant: Anchors in Judgements of Belief and Value
This was a pretty standard introduction to anchoring and adjustment.
7. Putting Adjustment Back in the Anchoring and Adjustment Heuristic
Anchoring isn't just something that happens in your environment. You self-anchor. For example, when did George Washington become president? Most people self-anchor on 1776 and adjust upward from there, usually stopping short of the correct answer (1789).
8. Self-Anchoring in Conversation: Why Language Users Do Not Do What They "Should"
Your brain doesn't auto-filter irrelevant data.
There was a game where people had a bunch of boxes facing them with little items in them.
A partner sat facing them. Some of the boxes were clearly open, so that the partner could see the contents. Some of the boxes were clearly closed, and this was made obvious to the subject.
---------
| |6| | |
---------
| | |5| |
---------
|4| | | |
---------
In this example, imagine that the 5 and 6 boxes are open but the 4-box is closed. (In other words, the partner can see 5 and 6 but not 4, and the subject knows this.)
The partner would give instructions like "Move the low number one square to the left". The subject would reach for the number 4 before self-correcting and reaching for the number 5 (the smallest number that the partner can see).
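Here's a toy sketch of the correction the subjects had to make by hand (the layout and numbers are my stand-ins for the study's setup, not its actual materials):

```python
# Each box holds an item and a flag for whether the partner can also see it.
boxes = [
    {"value": 6, "partner_sees": True},
    {"value": 5, "partner_sees": True},
    {"value": 4, "partner_sees": False},  # closed box: only the subject sees it
]

# What the subject's gut does: use everything *they* can see.
egocentric_pick = min(boxes, key=lambda b: b["value"])   # -> 4

# What cooperative conversation requires: restrict to mutual information.
mutual = [b for b in boxes if b["partner_sees"]]
correct_pick = min(mutual, key=lambda b: b["value"])     # -> 5

print(egocentric_pick["value"], correct_pick["value"])
```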
Moral: your brain won't restrict itself to mutual information. You've got to do it manually. This wasn't too surprising to me until more studies built off of these results:
Subjects listened to the same voicemail and were told either that the speaker's intent was sarcastic or that it was sincere. Both groups thought it equally likely that a third listener would pick up on the intended sarcasm/sincerity — they failed to control for their own privileged information.
Subjects asked to tap out songs thought that a majority of listeners would recognize the song they were tapping. Listeners actually guessed correctly about 2% of the time. Tappers blamed the failure on poor attention.
Another study observed speakers accidentally using information not available to an audience.
The conclusion was basically that the brain automatically uses all of the information available. It requires conscious effort to restrict yourself to only mutual information. See also the Illusion of Transparency and Inferential Distance.
Takeaway: Assume that intentions are opaque. Monitor your conversations to make sure you're restricting yourself to mutual information.
9. Inferential Correction
This paper had three points. First, the correspondence bias (the fundamental attribution error): if you see a person acting anxious, you're more likely to conclude that they're an "anxious person" than that they're in an anxious situation.
The second point is that you're more susceptible to this problem when you're under cognitive load (time pressure, busy with other tasks, etc.)
Third and most terrifying is the study referenced in this part of the sequences — You believe everything you read unless you exert cognitive power to disbelieve things.
In sum, there are many things that your brain does without your consent (jump to conclusions, believe what it reads) that require cognitive effort to undo.
Takeaway: Any conclusions drawn while you're distracted should be thrown out. Make sure you're not under cognitive load before deciding something important.
(Also known as the "always masturbate before making relationship decisions" rule.)
10. Mental Contamination and the Debiasing Problem
This chapter was an exploration of debiasing. The conclusion is that you have to control your exposure to stimuli that influence your responses. People are notoriously bad at de-biasing themselves. Once you're exposed to biasers (anchors, etc.), you're going to be biased.
A poignant example was the fact that many psychologists who know about the halo effect still fail to grade papers blind, thinking that they can debias themselves.
Moral: Don't be that guy. Grade papers blind. Audition musicians behind a screen. Remove names from résumés. Control your exposure.
11. Sympathetic Magical Thinking: The Contagion and Similarity "Heuristics"
Would you drink a glass of orange juice after I dip a cockroach in it?
Probably not, and understandably so: cockroaches can be dirty.
But would you drink a glass of orange juice after I dip a sterilized cockroach in it?
Your concern in the second case is hard to justify. It seems that humans still have a lot of "magical" thinking in them — touching gross things makes things gross.
Furthermore, it appears that some people root their aversion in things like germ theory, whereas others root their aversion in fear. For example, most people don't want to wear a sweater worn by a person with AIDS. Washing it is sufficient to remove the malaise for some but not others.
12. Compatibility Effects in Judgement and Choice
If you ask people to rank bets then they will weight the odds too heavily. If you ask them to price bets, they'll weight the payoffs too heavily. Generally, compatible units get more weight.
(As a corollary, anchors work much better when they share a unit with the thing being anchored.)
This was simple but surprising.
Recommendation: Find ways to normalize things before assessing them. (For example, normalize bets to risk-adjusted expected payoffs.) If you try to let your gut do things, it will give additional weight to dimensions of the problem that are compatible with the question.
13. The Weighting of Evidence and the Determinants of Confidence
People basically only use argument strength (how much the evidence looks like the hypothesis) when judging likelihood. They completely ignore base rates (the prior probability) and disregard the weight of evidence (how much evidence there is).
Base rate neglect is prevalent. Think back to the representativeness effect: when people analyze five suspects, the suspects seem more suspicious if they have motives. The assessed likelihood that each was the murderer goes up in proportion to the amount of dirt that people have, even if everyone else is just as suspicious. When you ask people to assess the likelihood that the janitor was the murderer, they think about reasons why the janitor is suspicious — without regard for who else is also suspicious.
This neglect is probably the biggest hurdle between most people and Bayesian reasoning. It's the root of doctors believing tests with low false positive rates despite high base rates, and many other deviations from "rational" reasoning.
Argument weight neglect is also prevalent. People act like five coin flips (all heads) are good evidence that a coin is weighted towards heads, despite how small the sample size is. Jokingly, the authors note that people seem to believe in the "law of small numbers": that small samples reliably reflect the process that generated them. Experts manage to factor in argument weight, but even they do not give it enough credence.
Your brain treats evidence representationally: it's very good at measuring how well the evidence matches the hypothesis, and very bad at assessing how much the evidence should count for.
Turns out your intuition just doesn't know how to take argument weight or base rates into account. You've got to do it manually.
Suggestion: When you're assessing likelihood, start with your intuitive likelihood. Treat that number as argument strength. Ignore it completely. Force yourself to bring the base rates to mind. Force yourself to consider the amount of evidence, not just the degree to which it looks good. Once you have the base rates and the evidence weight in mind, factor in your original number accordingly.
You have to do this consciously: Your brain just isn't going to do it for you.
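Here's a worked version of the doctor-and-test situation mentioned above, with illustrative numbers of my own choosing (a rare condition, a sensitive test, a low false positive rate):

```python
# Base rate neglect: a "reliable" test on a rare condition still yields
# mostly false positives. All numbers below are illustrative.
base_rate = 0.001        # P(disease): 1 in 1000
sensitivity = 0.99       # P(positive | disease)
false_positive = 0.02    # P(positive | no disease)

p_positive = sensitivity * base_rate + false_positive * (1 - base_rate)
p_disease_given_positive = sensitivity * base_rate / p_positive

# ~0.047: under 5%, despite the "only 2% false positive rate"
print(round(p_disease_given_positive, 3))
```

The test looks strong (high argument strength), but the base rate does almost all of the work.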
14. Inside the Planning Fallacy: The Causes and Consequences of Optimistic Time Predictions
Inside views suck. People's predictions are the same as their best-case predictions. And when their predictions fail, people blame it on exceptional circumstances and fail to learn in the general case.
In other words, when people miss a deadline because their computer crashed, they update to think that computer crashes were more likely, but fail to draw the more general conclusion that their estimates were too optimistic.
This paper sowed the seeds for a later revelation about updating both your model and the likelihood that you should switch models.
15. Probability Judgement Across Cultures
This paper basically showed that Asian people (China and Thailand, IIRC) are way more overconfident than American people, even though both groups expect the opposite to be true. It might have had some other points, but I forget. What I took away from this paper is that common knowledge can be very wrong: you've got to go out and collect real data.
16. Durability Bias in Affective Forecasting
You expect good/bad events to make you feel happy/sad for a long time. People expect that disabilities will make them sadder for longer than they actually do. People expect winning the lottery will make them happier for longer than it actually does.
There was also a study showing that durability bias is stronger for bad events (i.e., recovery from bad events happens faster than the fading of good feelings after good events), perhaps due to some sort of "psychological immune system" that people deploy to deal with bad things after the fact (look at the bright side, etc.).
17. Resistance of Personal Risk Perceptions to Debiasing Interventions
People are hard to debias, despite many varied interventions.
Seriously, the interventions here were many, varied, and clever. None of them worked.
Moral: Control your exposure.
18. Ambiguity and Self-Evaluation: The Role of Idiosyncratic Trait Definitions in Self-Serving Assessments of Ability
You've probably heard of the "better than average" effect — everyone thinks that they're better than average at driving/leadership/etc. What this paper showed was really cool, though: this effect is due in large part to the fact that everyone defines ambiguous attributes differently.
For example, an organized person might define "good leadership" as organizational skills and the ability to keep things moving, whereas an outgoing person might define "good leadership" as the ability to inspire others and garner loyalty. It should come as no surprise, then, that everybody thinks they're better than average at "leadership".
This paper had a study where half the subjects would define how an ambiguous trait was measured, and the other half would rate themselves according to that measurement. When this was done, the "better than average" effect disappeared.
19. When Predictions Fail: The Dilemma of Unrealistic Optimism
Turns out it's hard to reduce optimistic biases.
This paper explored how optimistic biases can persist, given that you'd expect evolution to have weeded out overconfident people. (It's just you and me, Mr. Tiger.) But as it turns out, overconfidence bias melts away as events draw closer. This paper was a really interesting read.
Basically, it showed how optimism bias is strongest when it's most useful (when it can act as a self-fulfilling prophecy) and disappears when it's harmful (people are often underconfident in the moment).
20. Norm Theory: Comparing Reality to its Alternatives
You think it's worse to miss a plane by 5 minutes than by 30 minutes. Some counterfactual worlds seem closer than others.
21. Counterfactual Thought, Regret, and Superstition: How To Avoid Kicking Yourself
It hurts more for bad things to happen in exceptional cases. It feels worse when someone is killed in an accident while taking the scenic route home for the first time than when they're killed in an accident during their daily routine.
Given that this is the case, superstitions make sense.
Takeaway: You will naturally be more outraged when bad things happen in exceptional cases. You probably need to manually scale down concern for the exceptional cases.
22. Two Systems of Reasoning
Introduces System 1 (the subconscious processes that provide your intuitions) and System 2 (the conscious thinker that corrects System 1) and explores their strengths and weaknesses. Notes that every bias can be viewed as a pair of failures: the failure of System 1, which applied a heuristic that wasn't actually applicable, and the failure of System 2 to notice the error and correct it.
23. The Affect Heuristic
Explicitly covered the Affect Heuristic. People were asked to rate bets. Each person would only see one of the below bets. Bet B was consistently rated better than bet A.
Bet A: 20%: Win $9
Bet B: 20%: Win $9, 80%: Lose 5¢
A $9 win looks much better compared to a 5¢ loss than it does without context. This can be used to induce preference reversals.
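For the record, the arithmetic (my own quick check, not from the chapter):

```python
# Expected values of the two bets: A is strictly better, yet B was rated higher.
ev_a = 0.20 * 9.00                  # $1.80
ev_b = 0.20 * 9.00 - 0.80 * 0.05    # $1.76
print(ev_a, ev_b)
```

The tiny loss gives the $9 win something concrete to look good against, and that comparison is apparently worth more to the gut than four cents of expected value.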
This chapter felt somewhat redundant at this point in the book.
24. Individual Differences in Reasoning: Implications for the Rationality Debate?
This paper was mostly an argument against people who think that human biases are actually optimal and we're just looking at them wrong. It ruled out things like random error, contrived tests, and bad questions via a clever series of studies.
25. Support Theory: A Nonextensional Representation of Subjective Probability
This paper was really cool. It basically said two things:
- Your brain doesn't assign likelihoods to events. It assigns likelihoods to descriptions. Different descriptions of the same event can yield wildly different probabilities.
- When you unpack an event, your subjective probabilities are subadditive. In other words, the subjective probability of A is consistently lower than the combined subjective probabilities of all of A's components.
The remainder of the paper formally introduced this mathematical system and showed how it generalizes Representativeness and the Conjunction fallacy.
Takeaway: Normalize events before assessing their probability. This is hard, as it's not always obvious how events should be normalized.
26. Unpacking, Repacking, and Anchoring: Advances in Support Theory
This paper was a deeper exploration of Support Theory (introduced in the preceding chapter). It mostly explored unpacking and repacking: the propensity for an argument to feel more likely as you consider more of its components. (This is a generalization of the narrative effect from earlier.)
Basically, the likelihood you assign to a broad category (1000 people killed by floods in North America next year) is lower than the summed probability of its component parts (the likelihood of a flood caused by a California earthquake, of one caused by the eruption of Mt. Rainier, etc.).
This bias is difficult to defend against, because people aren't going to ask you to rate both "chance of a flood" and "chance of a flood caused by an earthquake in California" at the same time. Somehow, you've got to know to unpack/repack arguments even when you're asked only one side of the question.
Recommendation: manually try to unpack/repack any scenarios that come your way before giving probability assessments.
If someone asks you to assess the probability that the USA strikes Syria due to chemical weapons use, you should first try to repack this argument (what is the chance that the US strikes Syria?) and unpack this argument (What is the chance they strike Syria due to chemical weapons use by the rebels? What about by the government?) before answering.
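To see what subadditivity looks like in numbers, here's a sketch with invented judgment values (nothing below comes from the paper):

```python
# Invented "gut" probabilities for one packed question and its unpacked pieces.
packed_estimate = 0.15   # P(US strikes Syria), judged as a single question
unpacked_estimates = {
    "strike after chemical weapons use by the government": 0.12,
    "strike after chemical weapons use by the rebels":     0.05,
    "strike for any other reason":                         0.04,
}

total_unpacked = sum(unpacked_estimates.values())
print(total_unpacked, packed_estimate)   # 0.21 vs 0.15: the judgments are subadditive

# The two numbers describe the same event, so at least one of them is wrong.
# The exercise is to notice the mismatch and reconcile it explicitly rather
# than reporting whichever framing you happened to be asked about.
```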
27. Remarks on Support Theory: Recent Advances and Future Directions
This chapter felt a bit redundant. It tied more biases into the support theory model. I don't remember anything novel, but something could have slipped my mind. You could skip this chapter.
28. The Use of Statistical Heuristics in Everyday Inductive Reasoning
This paper was another one arguing against the human-optimality camp. Basically, the paper rejected the argument that statistical reasoning isn't applicable to human lives by pointing out that people reason statistically (and do better) when they've been trained in statistics.
29. Feelings as Information: Moods Influence Judgements and Processing Strategies
Happy people are more overconfident. Sad people are more underconfident. However, having people list reasons for their current mood removes the effect of mood upon predictions: priming reasons why your mood is not related to the question at hand helps people remove mood bias.
Takeaway: Before making big decisions, list the reasons why your mood is what it is.
30. Automated Choice Heuristics
People tend to "choose by liking", taking the choice that looks better. This can lead to some serious problems. For example:
I have a die. It has 2 green sides, 1 blue side, 2 red sides, and 1 yellow side. You get to choose between the two bets:
Bet A
- Green or Blue: +$20
- Red: -$10
- Yellow: -$10
Bet B
- Green: +$20
- Blue: +$15
- Red or Yellow: -$10
Many people choose bet B over bet A.
Now unpack the bets:
Bet A
- Green: +$20
- Blue: +$20
- Red: -$10
- Yellow: -$10
Bet B
- Green: +$20
- Blue: +$15
- Red: -$10
- Yellow: -$10
In this format, it's clear that A dominates. Nobody picks B.
I found this study astonishing.
Moral: Normalize options before choosing between them! Unpack things manually, and stop trusting your brain to intuit probabilities.
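The same point in arithmetic (a quick check using the die as described above):

```python
from fractions import Fraction

faces = ["green", "green", "blue", "red", "red", "yellow"]    # the six die faces

bet_a = {"green": 20, "blue": 20, "red": -10, "yellow": -10}  # unpacked Bet A
bet_b = {"green": 20, "blue": 15, "red": -10, "yellow": -10}  # unpacked Bet B

# Face by face, A pays at least as much as B (strictly more on blue): A dominates.
assert all(bet_a[color] >= bet_b[color] for color in bet_a)

def expected_value(bet):
    return sum(Fraction(1, 6) * bet[face] for face in faces)

print(expected_value(bet_a), expected_value(bet_b))   # 5 vs 25/6 (~4.17) dollars
```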
31. How Good are Fast and Frugal Heuristics?
Bayesianism is great. Yay Bayesianism. But it's expensive. Turns out that simple algorithms, like choosing according to a few binary attributes, are much faster and almost as good. This helps explain many human biases.
This paper was an interesting read from a "how might my brain be producing these failures" perspective, but didn't introduce any new biases.
32. Intuitive Politicians, Theologians, and Prosecutors: Exploring the Empirical Implications of Deviant Functionalist Metaphors
This paper basically argued that certain biases make sense from a social perspective: nobody elects the underconfident politician. This was interesting, but didn't teach any new biases. Skippable.
33. The Hot Hand in Basketball: On the Misprediction of Random Sequences
Which of these sequences was randomly generated?
- HTTHTHTHHHTTHTTHTHHT
- HHHHHHTTTHHHTTHTTHTH
The answer is #2. I just generated both sequences a few seconds ago, the first by hand and the second at random. I didn't expect it to be quite so perfect an example of the point of this chapter, which was:
People don't know how randomness works. People see "streaks" in purely random data. If most people saw the second sequence as hits and misses (H = hit, T = miss) in a series of basketball free throws, they would conclude that the player was on a "hot streak" in the first part of the sequence.
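A quick simulation backs up the point about streaks in random data (this is just my own sanity check, not anything from the chapter):

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive symbols."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

random.seed(0)
runs = [longest_run([random.choice("HT") for _ in range(20)]) for _ in range(10_000)]

# Typical result: the average longest run in 20 fair flips is around 4-5,
# and runs of 6+ show up in a sizable minority of sequences.
print(sum(runs) / len(runs), sum(r >= 6 for r in runs) / len(runs))
```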
34. Like Goes With Like: The Role of Representativeness in Erroneous and Pseudo-Scientific Beliefs
This chapter tied homeopathy, "rebirth", and other pseudo-scientific hokum into representativeness bias. This was pretty redundant at this point in the book. I felt it was fairly skippable.
35. When Less is More: Counterfactual Thinking and Satisfaction among Olympic Medalists
Silver medalists feel worse than bronze medalists. When researchers interviewed runners-up, they found them more likely to have thoughts along the lines of "if only…", whereas third-place finishers had more thoughts along the lines of "at least…"
Takeaway: Come in first or third, I guess? I'm not sure you can hack your brain into being more satisfied with second place than with third.
36. Understanding Misunderstandings: Social Psychological Perspectives
This article covered things like the polarization effect and the false polarization effect. Turns out there's a false polarization effect. I had no idea.
People in group A picture others in group A as extreme, and picture themselves as a moderate:
B--------|--me----A
People in group B have the same image:
B----me--|--------A
In reality, neither group is as extreme as each group thinks:
------B--|--A------
But people argue as if each side is extreme:
B--------|--------A
Even when common ground exists, the false polarization effect can make it very difficult to find.
Takeaway: Don't assume that members of The Other Side are extremist. Search for common ground.
Other neat tidbits:
- One effect of confirmation bias is that I can give conflicting evidence to two opposed groups, and both will leave the room more convinced of their own arguments.
- Everybody thinks that the news is biased in favor of the other side.
I've got to watch out for that last one, personally. I get frustrated when people give "obviously wrong" arguments (such as the Chinese Room) any consideration whatsoever. This is something I'm working on.
37. Assessing Uncertainty in Physical Constants
The values of physical constants (in the literature, as decided by panels of physicists) are refined as time goes on. The updated values regularly differ from the old values by 3+ standard deviations (well outside the old 99% confidence intervals). This happens decade after decade after decade.
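For scale, here's a back-of-the-envelope check of how rare a 3-standard-deviation miss should be if the reported error bars were honest and roughly Gaussian:

```python
import math

# P(|error| > 3 standard deviations) for a Gaussian: about 0.27%.
p_three_sigma_miss = math.erfc(3 / math.sqrt(2))
print(p_three_sigma_miss)   # ~0.0027 — yet such misses kept happening, decade after decade
```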
Moral of the story: Visualize the ways you could be way off. Use outside views. Increase your error bars.
38. Do Analysts Overreact?
Short version: many people don't understand regression to the mean.
When something does really well, you should expect it to perform more averagely in the future.
When something does really poorly, you should expect it to perform more averagely in the future.
In general, expect things to perform more averagely in the future.
Moral: Don't get caught up thinking something is really good or really bad just from a few data points. It's likely just at a statistical high/low, and it's likely to regress to the mean.
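A minimal simulation of the effect, under the toy assumption that observed performance is a fixed skill plus independent noise in each period:

```python
import random

random.seed(1)
n = 10_000
skill = [random.gauss(0, 1) for _ in range(n)]
period1 = [s + random.gauss(0, 1) for s in skill]
period2 = [s + random.gauss(0, 1) for s in skill]

# Take the top 10% of performers in period 1 and see how they do in period 2.
cutoff = sorted(period1)[int(0.9 * n)]
top = [i for i in range(n) if period1[i] >= cutoff]

avg1 = sum(period1[i] for i in top) / len(top)
avg2 = sum(period2[i] for i in top) / len(top)
print(avg1, avg2)   # the period-2 average is noticeably closer to the overall mean
```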
39. The Calibration of Expert Judgement: Heuristics and Biases Beyond the Laboratory
This paper was staged as somewhat of a coup de grâce for the humans-are-secretly-optimal group. It studied many experts in their natural environment, using data collected from the wild (not from laboratories). It concluded that experts are consistently poorly calibrated in ways that can't be explained by raw error, but can be explained fairly well by documented biases.
An interesting tidbit here was that overconfidence bias is actually usually overextremity bias: underconfidence when p is below some threshold, overconfidence when it's above.
The scary thing was that even experts with constant and reliable feedback weren't able to debias themselves. It took both feedback and knowledge of biases to approach good calibration.
40. Clinical versus Actuarial Judgement
This chapter was my favorite.
Basically, an algorithmic procedure for classifying psychoses was developed from data using Bayesian analysis and linear regression. The resulting decision rule had an 82% success rate. The average expert had a 69% success rate, and none of them scored above 75%.
Here's the fun part:
If you give the experts the decision rule, along with the relevant probabilities, they still do worse than the decision rule.
They could just go with the decision rule every time and increase their accuracy. However, they know that the decision rule is sometimes wrong, and so they occasionally choose to override it. In this particular study, not one expert was capable of beating the decision rule even when they got to cheat and use the decision rule's answer.
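A rough illustration of why "rule plus occasional expert override" loses. The 82% and 69% figures are from the study as summarized above; the override rate, and the assumption that overridden cases go at the expert's unaided accuracy, are my own invention:

```python
# Assumptions: only the 0.82 and 0.69 come from the summary above; the rest is invented.
rule_accuracy = 0.82      # the decision rule, followed every time
expert_accuracy = 0.69    # the expert's unaided judgment
override_rate = 0.30      # fraction of cases where the expert overrides the rule

# If overridden cases are decided about as well as the expert's unaided judgment,
# blending drags accuracy below just following the rule.
blended = (1 - override_rate) * rule_accuracy + override_rate * expert_accuracy
print(blended)   # 0.781 < 0.82
```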
Moral: shut up and multiply.
41. Heuristics and Biases in Application
This was pretty much a summary of the book. Its major points were that humans are not secretly optimal, that you've just got to know about your biases, and that you should externalize your decision procedures as much as possible.
42. Theory-Driven Reasoning about Plausible Pasts and Probable Futures in World Politics
If Franz Ferdinand hadn't been shot, would World War I still have started? Most experts say yes. Europe was at a boiling point. It was going to occur sooner or later.
Now ask experts to predict something (like the collapse of the Euro in the 90s). Interview the people who predicted wrong. Most of them will point out that it "almost happened".
Distant history is immutable. Near history is mutable.
More generally, the data that your model was trained on is immutable, but the predictions that your model got wrong were mutable. This was just one example of data supporting the chapter's thesis, which was that it's very difficult for people to consider that their entire model is wrong.
Moral of the story: When your predictions were wrong but your model still seems accurate, stop. Just stop. Take a deep breath. It will look like there was a black swan, or an extreme circumstance, or a violated assumption. It will look like your model would have been right, except for some fluke. And maybe that's true.
But your predictions were wrong, and you need to update accordingly. When your predictions are wrong, you shouldn't just patch your model: you should also downgrade the probability that you're using the correct model.
This is one of the scarier things I learned. One of the points made by this paper was that people who value parsimony highly are more likely to defend their models to the death. This is particularly relevant to me, as I value parsimony highly and I have a lot of models that I'm known to defend.
When your model fails, you can't just patch your model — you also have to update the probability that you should be using a different model entirely.
Discussion
This is only the tip of the iceberg: there were many smaller revelations I had while reading this book that I just don't have time to cover.
I've presented a lot of these biases in absolute terms for the sake of brevity. As you might expect, the effects listed above are not binary. Seeing the actual data is illustrative.
One of the biggest reasons it's important to see the actual studies here is because of the gap between data and interpretation. The conclusions of the psychologists might be incorrect. The data may be saying something that the authors missed.
It's important to read experiments with an eye for alternative interpretations. Fortunately, professional psychologists are better at this than me. 95% of the alternate interpretations I cooked up were ruled out by follow-up studies in the next paragraph.
In fact, a large number of follow-up studies ruled out interpretations that hadn't even crossed my mind. I made a little game of it: before reading the follow-up studies, I'd think up alternative interpretations. For every one that was addressed, I got a point. For every one I missed, I lost a point. (Whenever one wasn't addressed, I went back and made sure I understood the study.)
What to read
If you have doubts about the biases mentioned above, I strongly recommend picking up this book and reading chapters corresponding to the biases that you doubt. The book rules out many alternative interpretations and will put many doubts to rest.
If you're really serious about reducing your biases, the whole book is quite useful (except perhaps for chapters 27, 32, and 34). Even if you've already heard about the biases, there's just no substitute for actual data.
If you're already familiar with more famous biases and you've already read the sequences, you'd still do well to read the papers headlined in green: 2, 4, 8, 9, 12, 13, 18, 25, 26, 29, 30, 31, 33, 36, 38, 40, 42. I expect those chapters to contain the most new content for this community.
Altogether, this book is too large and dry for most people. I expect casual rationalists to be fine with high-level summaries.
22 comments
Comments sorted by top scores.
comment by Ben Pace (Benito) · 2013-09-02T17:34:14.198Z · LW(p) · GW(p)
For Lukeprog's decision theory FAQ, he made all of the sections open and close-able. This post would be more reader-friendly like that :)
Edit: No, Lukeprog didn't. My bad. It doesn't actually look like a thing you can do.
Replies from: CronoDAS, So8res, lukeprog
↑ comment by So8res · 2013-09-03T23:18:34.980Z · LW(p) · GW(p)
I don't see an easy way to do that with the editor, and unfortunately I don't have the spare time to figure out what the lesswrong editor lets me do (Does it let me run javascript? How was Luke doing it?).
I've added "summary breaks" between each article.
↑ comment by lukeprog · 2013-09-04T01:57:05.519Z · LW(p) · GW(p)
That's not true of the decision theory FAQ. I did it for the version of So You Want to Save the World on my website, but I don't think you can use the same Javascript on Less Wrong.
Replies from: Benito
↑ comment by Ben Pace (Benito) · 2013-09-04T04:37:45.000Z · LW(p) · GW(p)
Huh, that was an unusual misremembering. Cheers!
comment by arundelo · 2013-09-02T17:21:47.860Z · LW(p) · GW(p)
Argument strength is given much, much more credence than argument weight.
Do you have definitions for these terms? Some brief searching didn't help me, but from the context you do give this is what I gathered:
Say we are measuring the strength and weight of the argument, "Alice tested positive for Foo's Disease and therefore has Foo's Disease". The argument's strength increases as the test's false positive rate decreases. The argument's weight increases as the disease's base rate (its frequency in Alice's demographic group) decreases.
Is that about right?
Is argument strength completely independent of base rate? Is argument weight dependent solely on base rate?
Replies from: So8res
↑ comment by So8res · 2013-09-03T15:06:22.810Z · LW(p) · GW(p)
Argument strength is how much the evidence favors your hypothesis (the proportion of heads to tails when you're checking if a coin is weighted).
Argument weight is the number of tests that you've done. (Have you flipped the coin three times, or one hundred?)
People treat 5 tests, all heads, as better evidence of a weighted coin than 1000 tests, 55% heads. The latter is actually more indicative of a weighted coin (assuming sane priors).
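One crude way to put numbers on that is to ask how surprising each result would be if the coin were fair (just tail probabilities, not a full Bayesian treatment):

```python
from scipy.stats import binom

# P(a result at least this extreme | fair coin)
p_five_heads = 0.5 ** 5                    # 5/5 heads: ~0.031
p_550_of_1000 = binom.sf(549, 1000, 0.5)   # >= 550 heads in 1000 flips: ~0.0009
print(p_five_heads, p_550_of_1000)
```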
Sorry, I should have made this more clear. I'll edit the post.
comment by Sithlord_Bayesian · 2013-09-02T16:19:38.894Z · LW(p) · GW(p)
The values of physical constants are regularly outside of the expected interval by 3+ standard deviations. This goes on for decades.
Moral of the story: Visualize the ways you could be way off. Use outside views. Increase your error bars.
Does this refer to literature values of physical constants, or the values of physical constants as guessed at by participants of psych studies?
Replies from: So8res
comment by Unnamed · 2013-09-02T22:56:44.689Z · LW(p) · GW(p)
The chapters that I've recommended, as being most important & relevant for the purpose of improving human thinking, are 1, 2, 6, 9, 14, 16, 20, 21, 22, 23, 29, 40.
Interestingly, the overlap between my recommendations and So8res's seems to be at chance levels; apparently our standards are uncorrelated.
Replies from: So8res↑ comment by So8res · 2013-09-03T08:03:30.346Z · LW(p) · GW(p)
Note that my chapter list was geared towards the LW community and people already familiar with the sequences. I should have made that more explicit. My list had the chapters that surprised me most / taught me something new. I would have put forth a very different list if I were ordering them by usefulness to newcomers. I assumed that the community is already familiar with conjunction bias, the affect heuristic, the anchoring effect, and the other big names.
I may be suffering from false consensus; I'd be interested to hear others' thoughts.
comment by CronoDAS · 2013-09-02T22:23:09.793Z · LW(p) · GW(p)
It's hard not to neglect base rates when you don't know them. For example, consider the following "classic" example:
[describes the stereotype of an engineer]
This person is a graduate student. Is it more likely that this person is studying psychology or engineering?
If I don't know whether or not more graduate students are studying psychology than engineering, I can't use the base rate to come to the conclusion that the person is more likely to be an "atypical" psychology student than a "typical" engineering student.
(Does anyone actually know the base rate in this case?)
Replies from: Vaniver↑ comment by Vaniver · 2013-09-03T00:59:34.573Z · LW(p) · GW(p)
(Does anyone actually know the base rate in this case?)
Guess: Engineering is at least twice as likely, but at large state schools with significant engineering programs it's more like 5 or 10 times as likely.
Looking it up at my school: about 4 times as many mechanical engineers as psychology graduate students, and that's not counting electrical, chemical, petroleum, or various other forms of engineering (which are smaller than mechanical and don't advertise their numbers as prominently). (I go to a large state school with a significant engineering program.)
Looking it up nationally: it's easiest to track PhDs granted, and we see in 2010 there were 7,552 engineering PhDs granted and 3,421 psychology PhDs granted. This doesn't quite track the number of graduate students, since I suspect the proportion of master's degrees is higher in engineering (and the average PhD duration might be different).
It's also worth pointing out that if you don't know the base rates, you can often look them up in real-world situations, but remembering that you need to look them up is generally the hard part.
Replies from: CronoDAS
comment by itaibn0 · 2013-09-02T21:03:36.391Z · LW(p) · GW(p)
I have already asked about this, but it seems appropriate to re-ask it here. Chapter 22 seems like it has the information I'm looking for. Can anyone summarize what evidence it gives for the two systems model? The reason I'm asking is that I have doubts about the model but could have easily missed something.
Replies from: Unnamed, CronoDAS
↑ comment by Unnamed · 2013-09-02T22:48:33.716Z · LW(p) · GW(p)
For a summary, try Kahneman's 2003 paper, A perspective on judgment and choice: Mapping bounded rationality.
Replies from: itaibn0
↑ comment by itaibn0 · 2013-09-02T23:13:32.645Z · LW(p) · GW(p)
Thanks. I'll try to read that.
Added: After reading a large portion of it, I'm updating in favor of my original conclusion. The paper describes many other things besides the two-systems model, but when it mentions the model it mostly describes how to fit various results into this framework, not why the framework is valid. They talk about how slow, deliberative thinking is different from fast, intuitive thinking, but they don't address what I consider to be the main contentious issue: whether these are natural categories. I conclude that talk of "System 1" and "System 2" in Less Wrong is more shibboleth than insight.
comment by Adele_L · 2013-09-02T16:58:23.934Z · LW(p) · GW(p)
There probably is some reason why each particular bias or heuristic exists (since we are evolved beings). It must have hit some sweet spot of being beneficial in most situations in the ancestral environment, and being easy to compute given the already existing structures in the brain. So I thought it would be interesting to try steelmanning these as I read the list, and this turned out to be pretty easy for most of them.
For example, with number 4, it seems like the main problem here is that you just don't have a very complete causal model of your body. When you do have a good causal model, this heuristic seems to just be an instance of Occam's razor. This also suggests a potential remedy - first make a good causal model of the situation (which you should be doing anyway if it is important), then make sure that when you imagine possible futures, it goes through the model you developed, and doesn't shortcut through preconceived notions or intuitions. And then, this heuristic should serve you well.