The Bat and Ball Problem Revisited
post by drossbucket · 2018-12-13T07:16:30.017Z · 24 comments
Cross posted from my personal blog.
In this post, I'm going to assume you've come across the Cognitive Reflection Test before and know the answers. If you haven't, it's only three quick questions, go and do it now.
One of the striking early examples in Kahneman's Thinking, Fast and Slow is the following problem:
(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.
How much does the ball cost? _____ cents
This question first turns up informally in a paper by Kahneman and Frederick, who find that most people get it wrong:
Almost everyone we ask reports an initial tendency to answer “10 cents” because the sum $1.10 separates naturally into $1 and 10 cents, and 10 cents is about the right magnitude. Many people yield to this immediate impulse. The surprisingly high rate of errors in this easy problem illustrates how lightly System 2 monitors the output of System 1: people are not accustomed to thinking hard, and are often content to trust a plausible judgment that quickly comes to mind.
In Thinking, Fast and Slow, the bat and ball problem is used as an introduction to the major theme of the book: the distinction between fluent, spontaneous, fast 'System 1' mental processes, and effortful, reflective and slow 'System 2' ones. The explicit moral is that we are too willing to lean on System 1, and this gets us into trouble:
The bat-and-ball problem is our first encounter with an observation that will be a recurrent theme of this book: many people are overconfident, prone to place too much faith in their intuitions. They apparently find cognitive effort at least mildly unpleasant and avoid it as much as possible.
This story is very compelling in the case of the bat and ball problem. I got this problem wrong myself when I first saw it, and still find the intuitive-but-wrong answer very plausible looking. I have to consciously remind myself to apply some extra effort and get the correct answer.
However, this becomes more complicated when you start considering other tests of this fast-vs-slow distinction. Frederick later combined the bat and ball problem with two other questions to create the Cognitive Reflection Test:
(2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets? _____ minutes
(3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? _____ days
These are designed to also have an 'intuitive-but-wrong' answer (100 minutes, 24 days), and an 'effortful-but-right' answer (5 minutes, 47 days). But this time I seem to be immune to the wrong answers, in a way that just doesn't happen with the bat and ball:
I always have the same reaction, and I don’t know if it’s common or I’m just the lone idiot with this problem. The ‘obvious wrong answers’ for 2. and 3. are completely unappealing to me (I had to look up 3. to check what the obvious answer was supposed to be). Obviously the machine-widget ratio hasn’t changed, and obviously exponential growth works like exponential growth.
When I see 1., however, I always think ‘oh it’s that bastard bat and ball question again, I know the correct answer but cannot see it’. And I have to stare at it for a minute or so to work it out, slowed down dramatically by the fact that Obvious Wrong Answer is jumping up and down trying to distract me.
If this test was really testing my propensity for effortful thought over spontaneous intuition, I ought to score zero. I hate effortful thought! As it is, I score two out of three, because I've trained my intuitions nicely for ratios and exponential growth. The 'intuitive', 'System 1' answer that pops into my head is, in fact, the correct answer, and the supposedly 'intuitive-but-wrong' answers feel bad on a visceral level. (Why the hell would the lily pads take the same amount of time to cover the second half of the lake as the first half, when the rate of growth is increasing?)
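For the widget and lily pad questions, the 'effortful-but-right' answers can be checked by computing them directly. This is a minimal sketch, assuming each machine works at a constant rate and the patch exactly doubles every day; the function names are my own, not anything from Frederick's paper.

```python
from fractions import Fraction


def widget_minutes(machines, widgets, ref_machines=5, ref_widgets=5, ref_minutes=5):
    """Minutes for `machines` to make `widgets` at the reference per-machine rate."""
    # Widgets per machine per minute: 5 / (5 * 5) = 1/5, independent of fleet size.
    rate = Fraction(ref_widgets, ref_machines * ref_minutes)
    return widgets / (machines * rate)


def lily_pad_half_day(full_day=48):
    """Day on which a daily-doubling patch covers half the lake."""
    coverage, day = Fraction(1), full_day  # whole lake covered on full_day
    while coverage > Fraction(1, 2):
        coverage /= 2  # step back one day: undo one doubling
        day -= 1
    return day


print(widget_minutes(100, 100))  # 5
print(lily_pad_half_day())       # 47
```

Stepping backwards from the full lake makes the point in the parenthetical above: the last doubling covers the whole second half in a single day.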
The bat and ball still gets me, though. My gut hasn't internalised anything useful, and it's super keen on shouting out the wrong answer in a distracting way. My dislike for effortful thought is definitely a problem here.
I wanted to see if others had raised the same objection, so I started doing some research into the CRT. In the process I discovered a lot of follow-up work that makes the story much more complex and interesting.
I've come nowhere near to doing a proper literature review. Frederick's original paper has been cited nearly 3000 times, and dredging through that for the good bits is a lot more work than I'm willing to put in. This is just a summary of the interesting stuff I found on my limited, partial dig through the literature.
Thinking, inherently fast and inherently slow
Frederick's original Cognitive Reflection Test paper describes the System 1/System 2 divide in the following way:
Recognizing that the face of the person entering the classroom belongs to your math teacher involves System 1 processes — it occurs instantly and effortlessly and is unaffected by intellect, alertness, motivation or the difficulty of the math problem being attempted at the time. Conversely, finding $\sqrt{19163}$ to two decimal places without a calculator involves System 2 processes — mental operations requiring effort, motivation, concentration, and the execution of learned rules.
I find it interesting that he frames mental processes as being inherently effortless or effortful, independent of the person doing the thinking. This is not quite true even for the examples he gives: face-blind people and calculating prodigies exist.
This framing is important for interpreting the CRT. If the problem inherently has a wrong 'System 1 solution' and a correct 'System 2 solution', the CRT can work as intended, as an efficient tool to split people by their propensity to use one strategy or the other. If there are 'System 1' ways to get the correct answer, the whole thing gets much more muddled, and it's hard to disentangle natural propensity to reflection from prior exposure to the right mathematical concepts.
My tentative guess is that the bat and ball problem is close to being this kind of efficient tool. Although in some ways it's the simplest of the three problems, solving it in a 'fast', 'intuitive' way relies on seeing the problem in a way that most people's education won't have provided. (I think this is true, anyway; I'll go into more detail later.) I suspect that this is less true for the other two problems: ratios and exponential growth are topics that a mathematical or scientific education is more likely to build intuition for.
(Aside: I'd like to know how these other two problems were chosen. The paper just states the following:
Motivated by this result [the answers to the bat and ball question], two other problems found to yield impulsive erroneous responses were included with the “bat and ball” problem to form a simple, three-item “Cognitive Reflection Test” (CRT), shown in Figure 1.
I have a vague suspicion that Frederick trawled through something like 'The Bumper Book of Annoying Riddles' to find some brainteasers that don't require too much in the way of mathematical prerequisites. The lily pads one has a family resemblance to the classic grains-of-wheat-on-a-chessboard puzzle, for instance.)
However, I haven't found any great evidence either way for this guess. The original paper doesn't break down participants' scores by question – it just gives mean scores on the test as a whole. I did however find this meta-analysis of 118 CRT studies, which shows that the bat and ball question is the most difficult on average – only 32% of all participants get it right, compared with 40% for the widgets and 48% for the lily pads. It also has the biggest jump in success rate when comparing university students with non-students. That suggests that better mathematical education does help on the bat and ball, but it doesn't clear up how it helps. It could improve participants' ability to intuitively see the answer. Or it could improve ability to come up with an 'unintuitive' solution, like solving the corresponding simultaneous equations by a rote method.
What I'd really like is some insight into what individual people actually do when they try to solve the problems, rather than just this aggregate statistical information. I haven't found exactly what I wanted, but I did turn up a few interesting studies on the way.
No, seriously, the answer isn't ten cents
My favourite thing I found was this (apparently unpublished) ‘extremely rough draft’ by Meyer, Spunt and Frederick from 2013, revisiting the bat and ball problem. The intuitive-but-wrong answer turns out to be extremely sticky, and the paper is basically a series of increasingly desperate attempts to get people to actually think about the question.
One conjecture for what people are doing when they get this question wrong is the attribute substitution hypothesis. This was suggested early on by Kahneman and Frederick, and is a fancy way of saying that they are instead solving the following simpler problem:
(1) A bat and a ball cost $1.10 in total. The bat costs $1.00.
How much does the ball cost? _____ cents
Notice that this is missing the 'more than the ball' clause at the end, turning the question into a much simpler arithmetic problem. This simple problem does have 'ten cents' as the answer, so it's very plausible that people are getting confused by it.
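The substitution story can be checked mechanically: 'ten cents' is exactly right for the simpler question and exactly wrong for the real one. A small sketch, with prices in cents; the function names are hypothetical.

```python
def solves_original(ball, total=110, more_than_ball=100):
    """Problem (1) as stated: the bat costs 100 cents MORE THAN the ball."""
    bat = ball + more_than_ball
    return bat + ball == total


def solves_substituted(ball, total=110, bat=100):
    """The easier substituted version: the bat simply costs 100 cents."""
    return bat + ball == total


for answer in (10, 5):
    print(answer, solves_original(answer), solves_substituted(answer))
# 10 False True
# 5 True False
```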
Meyer, Spunt and Frederick tested this hypothesis by getting respondents to recall the problem from memory. This showed a clear difference: 94% of 'five cent' respondents could recall the correct question, but only 61% of 'ten cent' respondents. It's possible that there is a different common cause of both the 'ten cent' response and misremembering the question, but it at least gives some support for the substitution hypothesis.
However, getting people to actually answer the question correctly was a much more difficult problem. First they tried bolding the words 'more than the ball' to make this clause more salient. This made surprisingly little impact: 29% of respondents solved it, compared with 24% for the original problem. Printing both versions was slightly more successful, bumping up the correct response rate to 35%, but it was still a small effect.
After this, they ditched subtlety and resorted to pasting these huge warnings above the question:
These were still only mildly effective, with the correct response rate rising to 50% from 45%. People just really like the answer 'ten cents', it seems.
At this point they completely gave up and just flat out added “HINT: 10 cents is not the answer.” This worked reasonably well, though there was still a hard core of 13% who persisted in writing down 'ten cents'.
That's where they left it. At this point there's not really any room to escalate beyond confiscating the respondents' pens and pre-filling the answer 'five cents', and I worry that somebody would still try and scratch in 'ten cents' in their own blood. The wrong answer is just incredibly compelling.
So, what are people doing when they solve this problem?
Unfortunately, it's hard to tell from the published literature (or at least what I found of it). What I'd really like is lots of transcripts of individuals talking through their problem-solving process. The closest I found was this paper by Szaszi et al., who did carry out this sort of interview, but it doesn't include any examples of individual responses. Instead, it gives an aggregated overview of types of responses, which doesn't go into the kind of detail I'd like.
Still, the examples given for their response categories give a few clues. The categories are:

Correct answer, correct start. Example given: 'I see. This is an equation. Thus if the ball equals to x, the bat equals to x plus 1... '

Correct answer, incorrect start. Example: 'I would say 10 cents... But this cannot be true as it does not sum up to €1.10...'

Incorrect answer, reflective, i.e. some effort was made to reconsider the answer given, even if it was ultimately incorrect. Example: '... but I'm not sure... If together they cost €1.10, and the bat costs €1 more than the ball... the solution should be 10 cents. I'm done.'

No reflection. Example: 'Ok. I'm done.'
These demonstrate one way to reason your way to the correct answer (solve the simultaneous equations) and one way to be wrong (just blurt out the answer). They also demonstrate one way to recover from an incorrect solution (think about the answer you blurted out and see if it actually works). Still, it's all rather abstract and high level.
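The recovery move in the 'correct answer, incorrect start' example (checking whether the blurted-out answer really sums to the total) can be pushed one step further, so that the size of the mismatch tells you the fix. This is my own illustration, not a procedure reported by Szaszi et al.; prices are in cents and the function name is hypothetical.

```python
from fractions import Fraction


def check_and_correct(guess, total=110, difference=100):
    """Test a blurted-out ball price against both conditions, then fix it."""
    bat = guess + difference
    overshoot = (bat + guess) - total  # how far the pair misses the total
    # Raising the ball by 1 cent raises the bat by 1 cent too, so the total
    # moves 2 cents per cent of error in the guess: halve the overshoot.
    return guess - Fraction(overshoot, 2)


print(check_and_correct(10))  # 5
```

Starting from 'ten cents', the pair comes to 120 cents, 10 too many, so the ball must come down by 5.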
How To Solve It
However, I did manage to stumble onto another source of insight. While researching the problem I came across this article from the online magazine of the Association for Psychological Science, which discusses a variant 'Ford and Ferrari problem'. This is quite interesting in itself, but I was most excited by the comments section. Finally some examples of how the problem is solved in the wild!
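The simplest 'analytical', 'System 2' solution is to rewrite the problem as two simultaneous linear equations and plug-and-chug your way to the correct answer. For example, writing $x$ for the price of the bat and $y$ for the price of the ball, both in cents, we get the two equations
$x + y = 110$, $x - y = 100$,
which we could then solve in various standard ways, e.g. by adding them to eliminate $y$:
$x + y + (x - y) = 2x = 210$, so $x = 105$,
which then gives
$y = 110 - x = 5$.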
There are a couple of variants of this explained in the comments. It's a very reliable way to tackle the problem: if you already know how to do this sort of rote method, there are no surprises. This sort of method would work for any similar problem involving linear equations.
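The claim that the rote method works for any similar linear problem can be made concrete with a general solver for two equations in two unknowns. This sketch uses Cramer's rule, which is just one of the standard methods (elimination and substitution are others); `solve_2x2` is a hypothetical name.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("equations are dependent or inconsistent")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# x + y = 110 and x - y = 100, prices in cents.
bat, ball = solve_2x2(1, 1, 110, 1, -1, 100)
print(bat, ball)  # 105 5
```

No insight about bats, balls or halfway points is needed: the same six coefficients in, the same procedure every time.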
However, it's pretty obvious that a lot of people won't have access to this method. Plenty of people noped out of mathematics long before they got to simultaneous equations, so they won't be able to solve it this way. What might be less obvious, at least if you mostly live in a high-maths-ability bubble, is that these people may also be missing the sort of tacit mathematical background that would even allow them to frame the problem in a useful form in the first place.
That sounds a bit abstract, so let's look at some responses (I'll paste all these straight in, so any typos are in the original). First, we have these two confused commenters:
The thing is, why does the ball have to be $.05? It could have been .04 0r.03 and the bat would still cost more than $1.
and
This is exactly what bothers me and resulted in me wanting to look up the question online. On the quiz the other 2 questions were definitive. This one technically could have more than one answer so this is where phycologists actually mess up when trying to give us a trick question. The ball at .4 and the bat at 1.06 doesn’t break the rule either.
These commenters don't automatically see two equations in two variables that together are enough to constrain the problem. Instead they seem to focus mainly on the first condition (adding up to $1.10) and just use the second one as a vague check at best ('the bat would still cost more than $1'). This means that they are unable to immediately tell that the problem has a unique solution.
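Brute force makes the difference between the two readings visible. A minimal sketch, assuming whole-cent, positive prices: treating 'the bat costs more' as a loose inequality leaves many candidates, while treating it as the exact statement 'the bat costs 100 cents more' leaves one.

```python
def splits(total=110):
    """Every (bat, ball) pair in whole cents, both positive, summing to total."""
    return [(total - ball, ball) for ball in range(1, total)]


# The commenters' reading: the prices add up to $1.10 and the bat
# "would still cost more than $1".
loose = [(bat, ball) for bat, ball in splits() if bat > 100]

# Both conditions used as exact equations.
exact = [(bat, ball) for bat, ball in splits() if bat - ball == 100]

print(len(loose), loose[:3])  # 9 [(109, 1), (108, 2), (107, 3)]
print(exact)                  # [(105, 5)]
```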
In response, another commenter, Tony, suggests a correct solution which is an interesting mix of writing the problem out formally and then figuring out the answer by trial and error:
I hear your pain. I feel as though psychologists and psychiatrists get together every now and then to prove how stoopid I am. However, after more than a little head scratching I’ve gained an understanding of this puzzle. It can be expressed as two facts and a question A=100+B and A+B=110, so B=? If B=2 then the solution would be 100+2+2 and A+B would be 104. If B=6 then the solution would be 100+6+6 and A+B would be 112. But as be KNOW A+B=110 the only number for B on it’s own is 5.
This suggests enough half-remembered mathematical knowledge to find a sensible abstract framing, but not enough to solve it the standard way.
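Tony's mix of formal setup and guessing translates directly into a search: fix the relation A = 100 + B, then try values of B until A + B comes out at 110. A sketch in cents, assuming whole-cent prices; the function name is hypothetical.

```python
def tony_trial_and_error(total=110, difference=100, max_guess=110):
    """Try B = 0, 1, 2, ... with A = difference + B until A + B hits total."""
    for b in range(max_guess + 1):
        a = difference + b
        # e.g. B = 2 gives A + B = 104 (too small), B = 6 gives 112 (too big)
        if a + b == total:
            return b
    return None  # no whole-cent solution in range


print(tony_trial_and_error())  # 5
```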
Finally, commenter Marlo Eugene provides an ingenious way of solving the problem without writing all the algebraic steps out:
Linguistics makes all the difference. The conceptual emphasis seems to lie within the word MORE.
X + Y = $1.10. If X = $1 MORE then that leaves $0.10 TO WORK WITH rather than automatically assign to Y
So you divide the remainder equally (assuming negative values are disqualified) and get 0.05.
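Marlo Eugene's steps are short enough to write out as a few lines of arithmetic. A sketch in cents, assuming non-negative prices as in the comment; the function name is my own.

```python
def marlo_eugene(total=110, extra=100):
    """Subtract the 'MORE' part, then split what's left equally."""
    remainder = total - extra  # the $0.10 left "to work with"
    ball = remainder / 2       # bat and ball share it equally
    bat = extra + ball         # the bat also gets the extra $1.00
    return bat, ball


print(marlo_eugene())  # (105.0, 5.0)
```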
So even this small sample of comments suggests a wide diversity of problem-solving methods leading to the two common answers. Further, these solutions don't all split neatly into 'System 1' 'intuitive' and 'System 2' 'analytic'. Marlo Eugene's solution, for instance, is a mixed solution of writing the equations down in a formal way, but then finding a clever way of just seeing the answer rather than solving them by rote.
I'd still appreciate more detailed transcripts, including the time taken to solve the problem. My suspicion is still that very few people solve this problem with a fast intuitive response, in the way that I rapidly see the correct answer to the lily pad question. Even the more 'intuitive' responses, like Marlo Eugene's, seem to rely on some initial careful reflection and a good initial framing of the problem.
If I'm correct about this lack of fast responses, my tentative guess for the reason is that it has something to do with the way most of us learn simultaneous equations in school. We generally learn arithmetic as young children in a fairly concrete way, with the formal numerical problems supplemented with lots of specific examples of adding up apples and bananas and so forth.
But then, for some reason, this goes completely out of the window once the unknown quantity isn't sitting on its own on one side of the equals sign. This is instead hived off into its own separate subject, called 'algebra', and the rules are taught much later in a much more formalised style, without much attempt to build up intuition first.
(One exception is the sort of puzzle sheets that are often given to young kids, where the unknowns are just empty boxes to be filled in. Sometimes you get 2+3=□, sometimes it's 2+□=5, but either way you go about the same process of using your wits to figure out the answer. Then, for some reason I'll never understand, the worksheets get put away and the poor kids don't see the subject again until years later, when the box is now called $x$ for some reason and you have to find the answer by defined rules. Anyway, this is a separate rant.)
This lack of a rich background in puzzling out the answer to specific concrete problems means most of us lean hard on formal rules in this domain, even if we're relatively mathematically sophisticated. Only a few build up the necessary repertoire of tricks to solve the problem quickly by insight. I'm reminded of a story in Feynman's The Pleasure of Finding Things Out:
Around that time my cousin, who was three years older, was in high school. He was having considerable difficulty with his algebra, so a tutor would come. I was allowed to sit in a corner while the tutor would try to teach my cousin algebra. I'd hear him talking about x.
I said to my cousin, "What are you trying to do?"
"I'm trying to find out what x is, like in 2x + 7 = 15."
I say, "You mean 4."
"Yeah, but you did it by arithmetic. You have to do it by algebra."
I learned algebra, fortunately, not by going to school, but by finding my aunt's old schoolbook in the attic, and understanding that the whole idea was to find out what x is: it doesn't make any difference how you do it.
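Feynman's 'arithmetic' route to 2x + 7 = 15 amounts to running the operations backwards: take away the 7, then halve what's left. A toy sketch of that undoing for equations of the form a*x + b = c; the helper name is my own.

```python
def undo_linear(a, b, c):
    """Find x in a*x + b = c by undoing the operations in reverse order."""
    before_adding = c - b      # undo "+ b"
    return before_adding / a   # undo "* a"


print(undo_linear(2, 7, 15))  # 4.0
```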
I think this reliance on formal methods might be somewhat less true for exponential growth and ratios, the subjects underpinning the lily pad and widget questions. Certainly I seem to have better intuition there, without having to resort to rote calculation. But I'm not sure how general this is.
How To Visualise It
If you wanted to solve the bat and ball problem without having to 'do it by algebra', how would you go about it?
My original post on the problem was a pretty quick, throwaway job, but over time it picked up some truly excellent comments by anders and Kyzentun, which really start to dig into the structure of the problem and suggest ways to 'just see' the answer. The thread with anders in particular goes into lots of other examples of how we think through solving various problems, and is well worth reading in full. I'll only summarise the bat-and-ball-related parts of the comments here.
We all used some variant of the method suggested by Marlo Eugene in the comments above. Writing out the basic problem again, we have:
$x + y = 110$, $x - y = 100$.
Now, instead of immediately jumping to the standard method of eliminating one of the variables, we can just look at what these two equations are saying and solve it directly 'by thinking'. We have a bat, $x$. If you add the price of the ball, $y$, you get 110 cents. If you instead remove the same quantity you get 100 cents. So the bat's price must be exactly halfway between these two numbers, at 105 cents. That leaves five for the ball.
Now that I'm thinking of the problem in this way, I directly see the equations as being 'about a bat that's halfway between 100 and 110 cents', and the answer is incredibly obvious.
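The 'halfway' reading finds the bat first, as the midpoint of 110 and 100 cents, and only then the ball. A sketch in cents; the function and parameter names are my own.

```python
def bat_by_midpoint(bat_plus_ball=110, bat_minus_ball=100):
    """The bat sits exactly halfway between x + y and x - y."""
    bat = (bat_plus_ball + bat_minus_ball) / 2
    ball = bat_plus_ball - bat
    return bat, ball


print(bat_by_midpoint())  # (105.0, 5.0)
```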
Kyzentun suggests a variant on the problem that is much less counterintuitive than the original:
A centered piece of text and its margins are 110 columns wide. The text is 100 columns wide. How wide is one margin?
Same numbers, same mathematical formula to reach the solution. But less misleading because you know there are two margins, and thus know to divide by two after subtracting.
In the original problem, the 110 units and 100 units both refer to something abstract, the sum and difference of the bat and ball. In Kyzentun's version these become much more concrete objects, the width of the text and the total width of the margins. The work of seeing the equations as relating to something concrete has mostly been done for you.
Similarly, anders works the problem by 'getting rid of the 100 cents', and splitting the remainder in half to get at the price of the ball:
I just had an easy time with #1 which I haven’t before. What I did was take away the difference so that all the items are the same (subtract 100), evenly divide the remainder among the items (divide 10 by 2) and then add the residuals back on to get 105 and 5.
The heuristic I seem to be using is to treat objects as made up of a value plus a residual. So when they gave me the residual my next thought was “now all the objects are the same, so whatever I do to one I do to all of them”.
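anders' 'value plus residual' heuristic does not depend on there being exactly two items, which is part of why it feels so general. A minimal sketch; the function name and the three-item usage below are made-up illustrations, not examples from the thread.

```python
def split_with_residuals(total, residuals):
    """Each price = a common base value + its own residual."""
    remainder = total - sum(residuals)    # take away the differences
    base = remainder / len(residuals)     # now all items are the same: share evenly
    return [base + r for r in residuals]  # add the residuals back on


print(split_with_residuals(110, [100, 0]))  # [105.0, 5.0]
```

For instance, `split_with_residuals(120, [0, 30, 60])` gives `[10.0, 40.0, 70.0]`: take away 90, share the remaining 30 three ways, then add the residuals back on.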
I think that after reasoning my way through all these perspectives, I'm finally at the point where I have a quick, 'intuitive' understanding of the problem. But it's surprising how much work it was for such a simple bit of algebra.
Final thoughts
Rather than making any big conclusions, the main thing I wanted to demonstrate in this post is how complicated the story gets when you look at one problem in detail. I've written about close reading recently, and this has been something like a close reading of the bat and ball problem.
Frederick's original paper on the Cognitive Reflection Test is in that generic social science style where you define a new metric and then see how it correlates with a bunch of other macro-scale factors (either big social categories like gender or education level, or the results of other statistical tests that try to measure factors like time preference or risk preference). There's a strange indifference to the details of the test itself – at no point does he discuss why he picked those specific three questions, and there's no attempt to model what was making the intuitive-but-wrong answer appealing.
The later paper by Meyer, Spunt and Frederick is much more interesting to me, because it really starts to pick apart the specifics of the bat and ball problem. Is an easier question getting substituted? Can participants reproduce the correct question from memory?
I learned the most from the individual responses, though. This is where you really get to see the variety of ways that people tackle the problem. Careful reflection definitely seems to improve the chance of a correct answer in general, but many of the responses don't really fit the neat 'fast vs slow' division of the original setup.
Questions
I'm interested in any comments on the post, but here are a few specific things I'd like to get your answers to:

My rapid, intuitive answer for the bat and ball question is wrong (at least until I retrained it by thinking about the problem way too much). However, for the other two I 'just see' the correct answer. Is this common for other people, or do you have a different split?

If you're able to rapidly 'just see' the answer to the bat and ball question, how do you do it?

How do people go about designing tests like these? This isn't at all my field and I'd be interested in any good sources. I'd kind of assumed that there'd be some kind of serious-business Test Creation Methodology, but for the CRT at least it looks like people just noticed they got surprising answers for the bat and ball question and looked around for similar questions. Is that unusual compared to other psychological tests?
24 comments
Comments sorted by top scores.
comment by moridinamael · 2018-12-13T17:06:51.234Z
My daughter is just starting to learn subtraction. She was very frustrated by it, and if I verbally asked "What's seven minus five?" she was about 50% likely to give the right answer. I asked her a sequence of simple subtraction problems and she consistently performed at about that level. In the course of our back and forth I switched my phrasing to the form "You have seven apples and you take away five, how many left?" and she immediately started answering the questions 100% correctly, very rapidly too. Experimentally I switched back to the prior form and she started getting them wrong again. It was apparent to me that simply phrasing the problem in terms of concrete objects was activating something like visualization which made the problems easy, and just phrasing it as abstract numbers was failing to activate this switch. So as you say, for more tricky arithmetic problems, it may be the case that what mental circuits are "activated automatically" determine the first answer you arrive at, and you can exploit that effect with edge cases like this.
comment by drossbucket · 2018-12-13T19:08:03.157Z
Strangely, it can sometimes also go the other way!
One of my most eye-opening teaching experiences occurred when I was helping a six-year-old who was struggling with basic addition – or so it appeared. She was trying to work through a book that helped her to the concept of addition via various examples such as “If Nellie has three apples and is then given two more, how many apples does she have?” The poor little girl didn’t have a clue.
However, after spending a short time with her I discovered that she could do 3+2 with no problem whatsoever. In fact, she had no trouble with addition. She just couldn’t get her head around all these wretched apples, cakes, monkeys etc that were being used to “explain” the concept of addition to her. She needed to work through the book almost “backwards” – I had to help her understand that adding up apples was just an example of an abstract addition she could do perfectly well! Her problem was that all the books for six-year-olds went the other way round.
I think this is unusual though.
comment by Sniffnoy · 2019-02-18T18:16:52.201Z
comment by drossbucket · 2019-02-18T19:17:23.001Z
Ooh, I'd forgotten about that test, and how the beer version was much easier; that would be another good one to read up on.
comment by drossbucket · 2019-12-24T17:50:51.832Z
I haven't thought about the bat and ball question specifically very much since writing this post, but I did get a lot of interesting comments and suggestions that have sort of been rolling around my head in background mode ever since. Here are a few I wanted to highlight:
Is the bat and ball question really different to the others? First off, it was interesting to see how much agreement there was with my intuition that the bat and ball question was interestingly different to the other two questions in the CRT. Reading through the comments I count four other people who explicitly agree with this (1, 2, 3, 4) and three who either explicitly disagree or point out that they find the widget problem hardest (5, 6, 7). I'd be intrigued to know if other people also disagree that the bat and ball feels different to them.
Concrete vs abstract quantities. Out of the people who agreed that the bat and ball is different, this comment from @awbery does a particularly good job of giving a potential explanation for why:
The problem is a ‘two things’ problem. The first sentence presents two things, a bat and a ball. The language correctly reflects there are two things we should consider. The first sentence is ‘this plus that equals $1.10’. It correctly sounds like a + b; two things. The first sentence presents the state of affairs, not the problem itself. The second sentence presents the problem. The language of the second sentence reinforces the two things idea because there’s still the bat and the ball and they’re compared against each other: ‘there’s this one and it’s more than that one’. The trickiness is that it is a two things problem, but the two things we need to consider are not the most object level single units, but the bat, and the bat-plus-ball. Our brains are pulled toward the object level division of things by the language and the visual nature of the problem. We have to think really hard to understand that the abstract construct of the problem is the same shape as the state of affairs – there are two things to consider in relation to each other – but while the bat and the ball are still involved, they’re reconfigured by a non-intuitive/non-object-like division.
There’s no object level mirror trick in the other two problems, they’re straight forward maths mapping an object level visual representation. The widget problem presents a process which doesn’t change how the machines and widgets relate to each other in its solution. Our brains don’t have to mash up the pond and the lilies to separate the visual presentation to an abstract level. We can see that the pond is the same pond, half covered with lilies then fully covered with lilies at the next step. We don’t suddenly have some new abstract unreal configuration of lilies and pond to contend with.
I think this is why Kyzentun and Ander’s methods help get at the bat and ball problem intuitively – because they bypass the conflict between object level and abstract and translate it into the formal algebra realm. The problem as presented is non-intuitive because the objects visualization it suggests doesn’t reflect the shape of the formal solution.
So I think this is a particular type of problem, one in which visual shape and language of the presentation collude to obfuscate the visualization of the solution at an abstract/formal level. It’s a different type of problem to the other two in this sense, because the objects they present can be used as given in the solution.
Closeness to correct answer. Another interesting possibility is in TheManxLoiner's comment: that the bat and ball problem is difficult because the incorrect answer is 'close to the real one', whereas for the other two problems the incorrect answer is 'wildly off'. I've written a comment in response but I need to think about this more.
Ethnomethodology. David Chapman pointed out that these introspective accounts of what people are thinking when they solve maths problems are very unreliable, and that I'd probably be better concentrating strictly on what people do, as in ethnomethodology:
Yes, the fundamental principle of ethnomethodological methodology is “look at what people say and do, and don’t ever speculate about what’s happening in their head, because we can’t know.” At first that seems like a straitjacket, and highly unintuitive; but it forces you to really look, and then you see what is going on.
This sounds promising. I'm only just getting round to reading some ethnomethodology, and I haven't got my bearings yet.
Cognitive decoupling. There's a link with cognitive decoupling (in Stanovich's original sense) that could be worth exploring further. Success in the bat and ball problem seems to involve decoupling from the noisy wrong answer. David Chapman recommended Formal Languages in Logic by Dutilh Novaes for more background on this. So far I've read maybe a third of it. I've also written a bit more about cognitive decoupling and the history of the term here.
Next steps. I'm not sure where I'm going to take this next. Probably nowhere much for a while, as I have other priorities. But some options are:
 Anders came up with a load of similar problems in the comments. These are designed to be cognitively unpleasant in the same way as the bat and ball, so I keep putting them off. I should actually go through them!
 I'm going to continue reading Dutilh Novaes and some ethnomethodology.
 Connect more specifically to Stanovich's idea of cognitive decoupling.
Testing theories? Further out, it could be interesting to actually test some theories by trying alternative, disguised versions of the question, on Mechanical Turk or something. Right now I've barely considered this, because I haven't thought through what I'd want carefully enough yet, but it might be interesting to test variations in:
 how concrete the things the quantities refer to are (e.g. really concrete like 'the price of the bat', or more abstract like 'the difference between the price of the bat and ball'). Some of Anders' variant questions might fit the bill.
 how close in magnitude the intuitive-but-wrong answer is, as in TheManxLoiner's comment
I'm very ignorant about experiment design, so to do this I'd need to get help from someone more knowledgeable. And psych research sounds like a gigantic minefield even if you are knowledgeable, so I'd probably end up wasting my time. But probably I'd learn something from going through the process, and it's something that could maybe happen in the future.
comment by Scott Alexander (Yvain) · 2019-12-20T21:49:31.895Z
It's nice to see such an in-depth analysis of the CRT questions. I don't really share drossbucket's intuition: for me the 100 widget question feels counterintuitive in the same way as the ball and bat question, but neither feels really aversive, so it was hard for me to appreciate the feelings that generated this post. But this gives a good example of an idea of "training mathematical intuitions" I hadn't thought about before.
comment by Mary Chernyshenko (marychernyshenko) · 2018-12-13T10:53:02.877Z
I didn't "just see" the answers to the questions the first time I saw them, but neither would I say that I had to solve them entirely formally. It was more like docking a boat: the river keeps tugging at the tail end until you feel the boat's side touch the berth and know it has stopped. There's a kind of natural inertia to these kinds of puzzles.
Also, there are problems like "one wallet contains ten coins, another one contains twice more, and the total is twenty; explain" that get asked much earlier than kids learn algebra, if I remember right. But they get dismissed, in favour of cases where you must learn not to count the same bits of evidence twice (cough Bayes cough). I like to think this dismissal bites people in the backside when they learn Mendelian genetics (more easily seen when the genes in question interact hierarchically) or, Merlin forbid, mass spectrometry, where the math difficulty is complicated by the chem difficulty of molecules not dividing into usual subunits.
Whew, I was thinking to write a separate post on this, but now I don't have to! Profit!
comment by TheManxLoiner · 2018-12-18T03:44:41.237Z
I have the same experience as you, drossbucket: my rapid answer to (1) was the common incorrect answer, but for (2) and (3) my intuition is well-honed.
A possible reason for this is that the intuitive but incorrect answer in (1) is a decent approximation to the correct answer, whereas the common incorrect answers in (2) and (3) are wildly off the correct answer. For (1) I have to explicitly do a calculation to verify the incorrectness of the rapid answer, whereas in (2) and (3) my understanding of the situation immediately rules out the incorrect answers.
Here are questions which might be similar to (1):
(4a) I booked seats J23 to J29 in a cinema. How many seats have I booked?
(4b) There is a 20m fence in which the fence posts are 2m apart. How many fence posts are there?
(4c) How many numbers are there in this list: 200,201,202,203,204,...,300.
(5) In 24 hours, how many times do the hour hand and minute hand of a standard clock overlap?
(6) You are in a race and you just overtake second place. What is your new position in the race?
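(4a)-(4c) are all fencepost (off-by-one) traps, and (5) needs a small rate calculation. A minimal Python sketch of the counting, with hypothetical function names; it assumes the fence in (4b) is straight with a post at each end, and that in (5) both hands start together at midnight and the next midnight belongs to the following day. (6) is left out because it's a wording trap rather than a counting one.

```python
from fractions import Fraction


def seats_booked(first: int, last: int) -> int:
    # (4a) Inclusive range: J23 to J29 counts both end seats.
    return last - first + 1


def fence_posts(length_m: int, spacing_m: int) -> int:
    # (4b) Assumes a straight fence with a post at each end, not a closed loop.
    return length_m // spacing_m + 1


def numbers_in_list(start: int, stop: int) -> int:
    # (4c) Both endpoints are in the list, so use an inclusive range.
    return len(range(start, stop + 1))


def hand_overlaps(hours: int = 24) -> int:
    # (5) Minute hand: 6 degrees/min; hour hand: 0.5 degrees/min.
    # The minute hand gains 5.5 degrees/min, so it laps the hour hand
    # every 360 / 5.5 = 720/11 minutes, starting from both at 12.
    lap = Fraction(720, 11)
    end = hours * 60
    count, t = 0, Fraction(0)
    # Count t = 0 but not t = end, so midnight isn't counted twice.
    while t < end:
        count += 1
        t += lap
    return count


print(seats_booked(23, 29), fence_posts(20, 2), numbers_in_list(200, 300), hand_overlaps())
# 7 11 101 22
```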
comment by drossbucket · 2019-12-24T17:14:40.020Z
A possible reason for this is that the intuitive but incorrect answer in (1) is a decent approximation to the correct answer, whereas the common incorrect answers in (2) and (3) are wildly off the correct answer. For (1) I have to explicitly do a calculation to verify the incorrectness of the rapid answer, whereas in (2) and (3) my understanding of the situation immediately rules out the incorrect answers.
I must have missed this comment before, sorry. This is a really interesting point. Just to write it out explicitly,
(1) correct answer: 5, incorrect answer: 10
(2) correct answer: 5, incorrect answer: 100
(3) correct answer: 47, incorrect answer: 24
Now, for both (1) and (3) the wrong answer is off by roughly a factor of two. But I also share your sense that the answer to (3) is 'wildly off', whereas the answer to (1) is 'close enough'.
There are a couple of possible reasons for this. One is that 5 cents and 10 cents both just register as 'some small change', whereas 24 days and 47 days feel meaningfully different.
But also, it could be to do with relative size compared to the other numbers that appear in the problem setup. In (1), 5 and 10 are both similarly small compared to 100 and 110. In (3), 24 is small compared to 48, but 47 isn't.
Or something else. I haven't thought about this much.
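To put numbers on the two candidate explanations, here's a small Python sketch that, for each question, computes the wrong-to-correct ratio and each answer as a fraction of the largest number in the setup. Using 'the largest number in the setup' as the scale is an assumption made for illustration, and `compare` is a hypothetical helper.

```python
# (correct, intuitive-but-wrong, largest number in the setup) for each CRT question.
PROBLEMS = {
    "(1) bat and ball": (5, 10, 110),   # cents
    "(2) widgets": (5, 100, 100),       # minutes
    "(3) lily pads": (47, 24, 48),      # days
}


def compare(correct: int, wrong: int, largest: int) -> tuple[float, float, float]:
    """Return (wrong / correct, correct / largest, wrong / largest)."""
    return wrong / correct, correct / largest, wrong / largest


for name, (correct, wrong, largest) in PROBLEMS.items():
    ratio, correct_scale, wrong_scale = compare(correct, wrong, largest)
    print(f"{name}: wrong/correct = {ratio:.2f}, "
          f"correct/setup = {correct_scale:.2f}, wrong/setup = {wrong_scale:.2f}")
```

For (1) both answers come out under a tenth of the setup scale, while for (3) the correct answer sits close to it, which is the pattern described above.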
There's a variant 'Ford and Ferrari' problem that is somewhat related:
> A Ferrari and a Ford together cost $190,000. The Ferrari costs $100,000 more than the Ford. How much does the Ford cost?
So here we have correct answer: 45000, incorrect answer: 90000
Here the incorrect answer feels somewhat wrong, as the Ford is improbably close in price to the Ferrari. People appeared to do better on this modified problem than the bat and ball, but I haven't looked into the details.
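The Ford and Ferrari version has the same structure as the bat and ball, so one formula covers both. A minimal Python sketch with illustrative function names: the cheaper item costs (total - difference) / 2, and the tempting answer simply skips the halving, so it is always exactly twice the correct one. The ratio is identical in both versions; what changes is how implausible the wrong answer looks in context.

```python
def cheaper_price(total: int, difference: int) -> int:
    """Price of the cheaper item, given x + y == total and x - y == difference."""
    if (total - difference) % 2:
        raise ValueError("no whole-unit solution")
    return (total - difference) // 2


def intuitive_price(total: int, difference: int) -> int:
    # The tempting answer: take the difference off the total and stop there.
    return total - difference


for name, total, difference in [
    ("bat and ball (cents)", 110, 100),
    ("Ford and Ferrari (dollars)", 190_000, 100_000),
]:
    correct = cheaper_price(total, difference)
    tempting = intuitive_price(total, difference)
    # The ratio is 2.0 in both cases.
    print(f"{name}: correct {correct}, intuitive {tempting}, ratio {tempting / correct}")
```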
comment by habryka (habryka4) · 2019-12-02T03:50:15.023Z
I've referenced the cognitive reflection test as one of those litmus tests of rationality, where I feel like any decent practice of rationality should get people to reliably answer the questions on that test. I found this to actually be the best coverage of the whole test, and its analysis of people's reasoning to be a significant step up from what I've seen in other coverages of the test.
comment by Ben Pace (Benito) · 2019-11-29T23:00:34.053Z
I do the bat and the ball problem with low effort, and the exponential growth one with no effort, but I find the machines one a bit confusing.
For the bat and the ball, I do something similar to the margins example. I visualise an amount (represented by a length on the number line), I visualise a dollar higher than that amount on that number line, and then move them around so that they're not overlapping, then see that the sum is $1.10. Then I realise there's two identical bits that are added to the $1, which means they're 10/2 each.
(Btw, the bit about them adding a hint and there still being people who wrote 10 cents made me laugh out loud, that's hilarious.)
I'm not sure how to visualise machines taking 5 mins to make 5 things. Do they all do a different bit of the job? Can they all work on one widget simultaneously, speeding that one up? I guess you're expected to assume they each work on one widget. Okay, I guess that kind of makes sense, and is intuitive if that's true.
comment by lionhearted · 2020-01-06T03:25:01.660Z
I just wanted to say this was a really fun read. I hadn't considered the multiple ways people could get to the right or wrong answer.
comment by Ben Pace (Benito) · 2019-12-02T19:05:52.708Z
Seconding Habryka. I’d really like to see this reviewed.
comment by asimovio · 2020-11-02T18:38:28.221Z
It was around the same for me: I knew I had to be careful for the first problem (that was accentuated by the fact that we were warned about the failure rate). For the second one though, I was too careful: I immediately started transcribing the problem in a pertinent format, i.e. I created the machine.minute unit (equivalent to the kWh unit) that allowed me to understand that a widget is made in 5 machine.minutes. Then I looked at the question, and computed that you’d need 500 machine.minutes to make 100 widgets, so 5 minutes with 100 machines. During that last step, something was bugging me, and it was the realization that the problem was much simpler and intuitive. I still finished what I was doing, then looked at it as a whole, and saw the "you need more widgets but have more machines so you need the same time" intuitively.
And for the last one, it was very obvious of course.
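asimovio's machine-minute bookkeeping is easy to write down. A minimal Python sketch, assuming (as Ben Pace's comment above also settles on) that each machine works independently at a constant rate; the constant and function names are illustrative.

```python
from fractions import Fraction

# Reference case from the question: 5 machines take 5 minutes to make 5 widgets.
REF_MACHINES, REF_MINUTES, REF_WIDGETS = 5, 5, 5

# Assumes every machine works independently at the same constant rate,
# so the job can be measured in machine-minutes (like kWh for energy).
MACHINE_MINUTES_PER_WIDGET = Fraction(REF_MACHINES * REF_MINUTES, REF_WIDGETS)


def minutes_needed(machines: int, widgets: int) -> Fraction:
    total_machine_minutes = widgets * MACHINE_MINUTES_PER_WIDGET
    return total_machine_minutes / machines


print(MACHINE_MINUTES_PER_WIDGET)  # 5
print(minutes_needed(100, 100))    # 5
```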
comment by Alia1d · 2019-02-25T23:07:45.854Z
It’s been many years since I first saw this question, so my memories may not be accurate, but I think my internal thoughts went something like this: ‘Well 1.10 minus 1 is .10, but wait I know this is a trick question so … Ah! I also need to divide by 2. The answer is .05.’ And then I checked my answer by doing 1.05 + .05 and 1.05 - .05. Introspecting now on why I leaped to the idea of dividing by two, I think what I was seeing was something like: in this context “costs $1.00 more than” means exactly $1 more than, so it’s saying that without the $1 the two things are equal and you need to divide the cost between them.
This makes me think of ordinary real life contexts where I would say “costs $1.00 (or $20 or $100) more than.” It seems possible it might be clear to both me and my listener I meant ‘at least x more than,’ ‘as much as x more than,’ or ‘approximately x more than.’ I wonder if changing the wording to “The bat costs exactly $1.00 more than the ball” would help any.
comment by po8crg · 2019-02-21T17:41:47.467Z
This is exactly what bothers me and resulted in me wanting to look up the question online. On the quiz the other 2 questions were definitive. This one technically could have more than one answer, so this is where psychologists actually mess up when trying to give us a trick question. The ball at .04 and the bat at 1.06 doesn’t break the rule either.
Interesting: these could cover a couple of misunderstandings: one is that B>=100, the other that "The bat costs $1.00 more than the ball" does not mean B-b=100, but that B-b>=100.
In ordinary language, "that costs $1.00 more than the other one" is not incorrect if the difference is $1.01.
I suspect that person would have been corrected by saying "the bat costs precisely one dollar more than the ball"
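The 'at least' reading can be checked by brute force. A minimal Python sketch, assuming whole-cent prices of at least one cent each (`ball_prices` is a hypothetical helper): under the exact reading only a 5-cent ball works, while the 'at least $1.00 more' reading admits several answers, including a 4-cent ball with a $1.06 bat.

```python
def ball_prices(total: int = 110, difference: int = 100, exact: bool = True) -> list[int]:
    """Whole-cent ball prices consistent with the question.

    exact=True reads "costs $1.00 more than" as bat - ball == difference;
    exact=False reads it as bat - ball >= difference.
    Assumes both items cost at least one cent.
    """
    prices = []
    for ball in range(1, total):
        bat = total - ball  # the first sentence fixes the total
        gap = bat - ball
        if (gap == difference) if exact else (gap >= difference):
            prices.append(ball)
    return prices


print(ball_prices(exact=True))   # [5]
print(ball_prices(exact=False))  # [1, 2, 3, 4, 5]
```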
comment by Rana Dexsin · 2018-12-13T23:10:26.798Z
The bat and ball problem I answer in what I'll call one conscious timestep with the correct “five cents”, but it happens too fast for me to verify how (beyond the usual trouble with verifying internal reflection). I would speculate, in decreasing order of intuitive probability, that in order to get the answer, either (a) I've seen an exactly analogous “trick” problem before and am pattern-matching on that or (b) I'm doing the algebra quickly using my seemingly well-developed mathematical intuition. I can also imagine (c) I'm leaping to the “wrong” answer, then trying to verify it, noticing it's wrong, and correcting it, all in the same subconscious flash, but that feels off. Imagining the “ten cents” answer doesn't actually feel compelling; it just feels wrong. (It feels like a similar emotion to noticing I've gotten the wrong amount of change, in fact.)
The widgets problem I do a noticeable double-take on, but it's rapidly corrected within one conscious timestep; the “100” is a momentary flicker before my brain settles on the correct answer. Imagining “100” afterwards feels wrong, but less immediately so than “ten cents” did. It feels like I have a bias there toward answering “how many widgets can you produce in a fixed time” questions, so I might have an echo of the misreading “how many widgets can 100 machines produce in [assumed to be the same amount of time as before, since no contrary time value is presented to override this]”.
The lily pads question takes me a conscious timestep longer to answer than either of the other two; the initial flash is “inconclusive”, and then I see myself rechecking the part where the quantity doubles every step before answering “47”. (I notice I didn't remember that the steps were days, only remembering that there was a time unit; I don't know if that's relevant.) Imagining “24” afterwards feels some intermediate level of wrong between “ten cents” and “100”; my mental graph of the growth curve puts the expected value 24 at “way too low” intuitively before I can compute the actual exponent.
comment by anna_macdonald · 2018-12-13T20:43:17.901Z
However, for the other two I 'just see' the correct answer. Is this common for other people, or do you have a different split?
For all three questions, the wrong answer comes to my mind first*. But especially in the context of expecting a trick question, I second-guess it and come up with the correct answer fairly quickly.
*In the third question, the actual answer "24" does not come to mind first, but the general sense of "half that number" does. My mind does not actually calculate what half of 48 is before finishing thinking through the problem.
comment by clone of saturn · 2018-12-13T09:23:25.060Z
I just saw the answer to the bat and ball problem within a few seconds. As I remember, my thought process was something like: Could it be 10 cents? No, that adds up to $1.20. So there's an extra 10 cents... oh, of course, the difference between $1 and $1.10 has to be distributed evenly between both items, so the answer is 5 cents.
I've taken a course that covered simultaneous equations, but my memory of it is hazy enough that I'm sure that method would've taken me much longer.
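This guess, check and split procedure works from any starting guess, because if the ball is off by some amount, the bat (ball plus $1) is off by the same amount, so the sum is off by twice that. A minimal Python sketch with an illustrative function name, working in exact fractions of a cent; starting from the tempting 10 cents it lands on 5 cents after one correction.

```python
from fractions import Fraction


def guess_and_adjust(guess, total=Fraction(110), difference=Fraction(100)):
    """Start from a guess for the ball (in cents), check it, and correct it.

    Returns the final ball price and the list of guesses tried.
    """
    ball = Fraction(guess)
    tried = [ball]
    while True:
        # Check: with bat = ball + difference, how far is the sum from the total?
        excess = ball + (ball + difference) - total
        if excess == 0:
            return ball, tried
        # Ball and bat are each off by the same amount, so the sum is off by
        # twice that: split the excess evenly and take half off the ball.
        ball -= excess / 2
        tried.append(ball)


ball, tried = guess_and_adjust(10)
print(f"tried {', '.join(map(str, tried))}; answer {ball} cents")
# tried 10, 5; answer 5 cents
```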
comment by Pattern · 2018-12-13T19:32:05.136Z
I'm going to pull a reverse true Scotsman here and say that is simultaneous equations. (When we think of 'solving simultaneous equations' we imagine people pulling the answer out, rather than pushing the solution in and seeing if it fits: solving versus checking, as it were.)
comment by Lanrian · 2018-12-13T09:00:00.323Z
However, for the other two I 'just see' the correct answer. Is this common for other people, or do you have a different split?
I think I figured out and verified the answer to all 3 questions in 5-10 seconds each, when I first heard them (though I was exposed to them in the context of "Take the cognitive reflection test which people fail because the obvious answer is wrong", which always felt like cheating to me).
If I recall correctly, the third question was easier than the second question, which was easier than bat & ball: I think I generated the correct answer as a suggestion for 2 and 3 pretty much immediately (alongside the supposedly obvious answers), and I just had to check them. I can't quite remember my strategy for bat & ball, but I think I generated the $0.1 ball, $1 bat answer, saw that the difference was $0.9 instead of $1, adjusted to $0.05, $1.05, and found that that one was correct.
comment by Bucky · 2018-12-13T09:56:01.389Z
This is pretty much the same for me. I think the solution to bat and ball of "10cents, oh no, that doesn't work. Split the difference evenly for 5 cents? yup that's better" is all done on system 1.
Kahneman's examples of system 1 thinking include (I think) a Chess Grandmaster seeing a good chess move, so he includes the possibility of training your system 1 to be able to do more things. In the case of the OP, system 1 has been trained to really understand exponential growth and ratios. I think that for me both "quickly check that your answer is right" and "try something vaguely sensible and see what happens" are ingrained enough as general principles that I don't have to exert effort to apply them to simple problems.
A problem which I would volunteer for a CRT is the snail climbing out of a well. Here there's an obvious but wrong answer but I think if you realise that it's wrong then the correct answer isn't too hard to figure out.
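The comment doesn't give the numbers, so this minimal Python sketch uses hypothetical ones: a 30 m well, 3 m up each day and 2 m back each night. The obvious answer divides the depth by the 1 m net daily gain; simulating day by day shows the snail gets out sooner, because on its last day it reaches the top before it can slip back. Function names are illustrative.

```python
def snail_days(depth: int, climb: int, slip: int) -> int:
    """Day on which the snail first reaches the top of the well.

    Hypothetical rules: it climbs `climb` metres each day and slips back
    `slip` metres each night, and it is out as soon as it reaches `depth`.
    """
    if climb < depth and climb <= slip:
        raise ValueError("the snail never gets out")
    height, day = 0, 0
    while True:
        day += 1
        height += climb      # daytime climb
        if height >= depth:
            return day       # out before it can slip back
        height -= slip       # night-time slip


def obvious_answer(depth: int, climb: int, slip: int) -> int:
    # Tempting shortcut: divide the depth by the net gain per day.
    return depth // (climb - slip)


print(snail_days(30, 3, 2), obvious_answer(30, 3, 2))  # 28 30
```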