[SEQ RERUN] Availability

post by MinibearRex · 2011-08-19T04:29:51.999Z


Today's post, Availability, was originally published on 06 September 2007. A summary (taken from the LW wiki):

Availability bias is a tendency to estimate the probability of an event based on whatever evidence about that event pops into your mind, without taking into account that some pieces of evidence are more memorable, or easier to come by, than others. In effect, the bias consists of reasoning from a mismatched data set, which leads to a distorted model and a biased estimate.


Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was Absurdity Heuristic, Absurdity Bias, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.

1 comment


comment by MoreOn · 2011-08-19T19:57:47.391Z

Subjects thought that accidents caused about as many deaths as disease.

Lichtenstein et aliōrum research subjects were 1) college students and 2) members of a chapter of the League of Women Voters. Students thought that diseases are 1.62 times more likely than accidents, and league members thought they were 11.6 times more likely (geometric mean). Sadly, no standard deviation was given. The true value is 15.4. Note that only 57% of students and 79% of league members got the direction right, which biased the geometric mean down even further.
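To see why wrong-direction answers drag a geometric mean down, here is a minimal Python sketch. It assumes each answer is stored as a pair (did the subject pick the truly more frequent cause?, the ratio they wrote, which the instructions say is at least 1), and that the mean is taken over log ratios with the sign flipped for wrong-direction answers. The paper's exact aggregation procedure is an assumption here, the helper names are hypothetical, and the responses are made-up illustrative numbers, not data from the study.

```python
import math


# Hypothetical helper (not from the paper): converts one answer into a log ratio
# measured in the direction of the TRUE answer.
def signed_log_ratio(picked_correct: bool, ratio: float) -> float:
    """Return log(ratio) if the subject picked the truly more frequent cause,
    and -log(ratio) if they picked the other one."""
    if ratio < 1:
        raise ValueError("the instructions ask for ratios >= 1")
    log_r = math.log(ratio)
    return log_r if picked_correct else -log_r


# Hypothetical helper: geometric mean = exp(arithmetic mean of the log ratios).
def geometric_mean_ratio(judgments: list[tuple[bool, float]]) -> float:
    logs = [signed_log_ratio(correct, ratio) for correct, ratio in judgments]
    return math.exp(sum(logs) / len(logs))


# Made-up responses for illustration only, NOT data from Lichtenstein et al.
all_right = [(True, 10), (True, 20), (True, 5)]
one_flipped = [(True, 10), (True, 20), (False, 5)]  # third subject picked the wrong cause

print(round(geometric_mean_ratio(all_right), 2))    # 10.0  (cube root of 10*20*5)
print(round(geometric_mean_ratio(one_flipped), 2))  # 3.42  (cube root of 10*20/5)
```

Working in log space makes "wrong by a factor of 5" count exactly as much as "right by a factor of 5", so a single wrong-direction answer pulls the mean well below what the correct-direction subjects said.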

There were some messed-up answers. For example, students thought that tornadoes killed more people than asthma, when in fact asthma kills 20x more people than tornadoes. All accidents are about as likely as stomach cancer (well, 1.19x more likely), but they were judged to be 29 times more likely. Pairs like these are a minority: subjects were generally bad at guessing which cause of death was more frequent only when the true ratio was less than 2:1. These are the graphs from the paper.

The following excerpt is from "Judged Frequency of Lethal Events" by Lichtenstein, Slovic, Fischhoff, Layman and Combs.

Instructions. The subjects' instructions read as follows:

Each item in part one consists of two different possible causes of death. The question you are to answer is: Which cause of death is more likely? We do not mean more likely for you, we mean more likely in general, in the United States.

Consider all the people now living in the United States—children, adults, everyone. Now supposing we randomly picked just one of those people. Will that person more likely die next year from cause A or cause B? For example: Dying in a bicycle accident versus dying from an overdose of heroin. Death from each cause is remotely possible. Our question is, which of these two is the more likely cause of death?

For each pair of possible causes of death, A and B, we want you to mark on your answer sheet which cause you think is MORE LIKELY. Next, we want you to decide how many times more likely this cause of death is, as compared with the other cause of death given in the same item. The pairs we use vary widely in their relative likelihood. For one pair, you may think that the two causes are equally likely. If so, you should write the number 1 in the space provided for that pair. Or, you may think that one cause of death is 10 times, or 100 times, or even a million times as likely as the other cause of death. You have to decide: How many times as likely is the more likely cause of death? Write the number in the space provided. If you think it's twice as likely, write 2. If it's 10 thousand times as likely, write 10,000, and so forth.

There were further instructions about relative likelihoods and scales, and a glossary to help subjects understand some of the categories:

All accidents: includes any kind of accidental event; excludes diseases and natural disasters (floods, tornadoes, etc.).

All cancer: includes leukemia.

Cancer of the digestive system: includes cancer of stomach, alimentary tract, esophagus, and intestines.

Excess cold: freezing to death or death by exposure.

Nonvenomous animal: dogs, bears, etc.

Venomous bite or sting: caused by snakes, bees, wasps, etc.

Note that there was nothing about “old age” anywhere. There is no such thing as “death by old age,” but I’ll risk generalizing from my own example to say that some people think there is. And even those who know there isn’t might think, despite the instructions, “Oh, darnit, I forgot that old people count, too.”

I wish I’d tested myself BEFORE reading the correct answer. As near as I could tell, I would’ve been correct about homicide vs. suicide, but wrong about diseases vs. accidents (“Old people count, too!” facepalm). I wouldn’t even bother guessing the relative frequency. I didn’t have a clue.

When I need to know the number of square feet in an acre, or the world population, it takes me seconds to get from the question to the answer. Here, I dutifully spent ~20 minutes googling the CDC website, looking for the actual death counts. It wasn’t some heroic effort, but it’s not something I, or most other people, would casually expend on every question that starts with “Huh, I wonder…” (we should, but we don’t).

As for what I found: I dare you to click on my link and look at table 9 (http://www.cdc.gov/NCHS/data/nvsr/nvsr58/nvsr58_19.pdf). Did you? If you did, you saw that Zubon2 was right in this comment: accidents win by quite a margin in the 15-44 demographic. I couldn’t find 1978 data, but I’d expect it to be similar (Lichtenstein et al.'s tables are no help because they pool all age groups).

I spent the last two hours looking at these tables. Ask me anything! … I won’t be able to answer. Unless I have the CDC tables in front of me, I might not even do much better on Lichtenstein et aliōrum questionnaire than a typical subject (well, at least, I know tornadoes have frequency; measles doesn’t—I’ll get that question right). I suppose that people who haven’t looked at the CDC tables are getting all of their information from fragmented reports like “Drive safely! Traffic accidents are the leading cause of death among teenagers who !” or “Buy our drug! is the leading cause of death in over 55!” or “5-star exhaust pipe crash safety rating!” Humans aren’t good at integrating these fragments.

Memory is a bad guide to probability estimates. But what’s the alternative? Should we carry tables around with us?

Personally, I hope that someday data that is already out there in the public domain will be made easily accessible. I hope that finding the relative frequencies of measles-related deaths and tornado-related deaths will be as quick as finding the number of square feet in an acre or the world population, and that political squabbles will focus on whether or not certain data should be in the public domain (“You can’t force hospitals to put their data online! That violates the patients’ right to privacy!” “Well, but….”)