Regression To The Mean [Draft][Request for Feedback]

post by faul_sname · 2012-06-22T17:55:51.917Z · LW · GW · Legacy · 14 comments

"Rewarding good performance leads to faster improvement than punishing bad performance"

"In general, unusually bad performance improves after punishment, but good performance tends not to improve and sometimes even gets worse after praise is administered."

 

These statements seem contradictory, yet both describe real effects. The apparent contradiction is caused by a phenomenon known as "regression to the mean," which says that a measurement taken after an exceptional one will tend to be closer to average. The improvement after a reprimand is not caused by any effect the reprimand had, nor is the worsening after praise due to the praise. Both observations are due to regression to the mean.

 

Regression to the mean is caused by two things.

1. Exceptionally good performance is far above average, and exceptionally bad performance is far below average.

2. Most performance is about average.

 

Let's put this in concrete terms. Let's say we are trying to teach our friend Bob to play darts. He's not very good yet, and while he almost always hits the board, he can't really get a higher level of accuracy than that.

 

On his 12th throw, Bob misses the dartboard entirely. This is extraordinarily bad, even for him. On his next throw, he gets an 8, which is fairly typical, and much better. 

 

On his 57th throw, Bob manages to get a bullseye. We slap him on the back, congratulating him on his improvement. It seems Bob really is getting the hang of this after all. Proud of his accomplishment, he lines up his next attempt. He cocks his arm, throws, and...

Gets a 12. 

 

This is regression to the mean. Since there is a large random factor in darts, especially for unskilled players, a good throw will probably not be followed up with another good throw (since good throws are rare, and one shot is independent of another). This effect shows up whenever there is a large random component in performance.
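
The darts story can be checked with a quick simulation. This is only a sketch under made-up assumptions: the scoring model (a 2% chance of a 50-point bullseye, otherwise a uniform score from 0 to 20, with 0 standing for a miss), the seed, and the helper names `throw` and `mean_after` are all invented for illustration and are not measurements of any real player. What it shows is that, when throws are independent, the throw after a bullseye and the throw after a miss both average out to about the same ordinary score.

```python
import random

def throw(rng):
    """One dart throw for a hypothetical unskilled player."""
    if rng.random() < 0.02:
        return 50                  # rare bullseye
    return rng.randint(0, 20)      # 0 stands for missing the board entirely

def mean_after(scores, trigger):
    """Average score of the throw that immediately follows each `trigger` score."""
    following = [nxt for cur, nxt in zip(scores, scores[1:]) if cur == trigger]
    return sum(following) / len(following)

rng = random.Random(0)             # fixed seed so the run is repeatable
scores = [throw(rng) for _ in range(100_000)]

overall = sum(scores) / len(scores)
after_bullseye = mean_after(scores, 50)
after_miss = mean_after(scores, 0)

print(f"overall mean:          {overall:.2f}")
print(f"mean after a bullseye: {after_bullseye:.2f}")
print(f"mean after a miss:     {after_miss:.2f}")
```

Nothing in the model reacts to praise or criticism, yet the throw after a bullseye looks like a decline and the throw after a miss looks like an improvement.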

 

It's important to be aware of when an exceptional observation was likely due to random variation in measurement, not an exceptional characteristic of the thing being measured. For example, stock performance is mostly random. If one of your stocks does extremely well this year, you should expect it to perform much closer to average next year. If your kid scores 160 on his/her first IQ test, you should expect lower performance on later tests. If it took you 3 hours to get to work yesterday because of traffic instead of the usual 1, you shouldn't worry too much about it taking 3 hours again today.

 

 


 

This is a very rough draft of a post on regression to the mean. I wrote it because the sequences didn't cover this particular bias blind spot, and it's an important one. I appreciate any feedback, whether it's suggesting examples, making the post easier to understand, or cleaning up errors and awkward phrasings.

14 comments

comment by OrphanWilde · 2012-06-22T18:54:38.557Z · LW(p) · GW(p)

I would suggest leading with a scenario in which this bias is relevant.

I.e., "Bob hates his boss; Bob's boss is always criticizing employees, and never says anything positive about their performance. Bob's boss believes the evidence supports this policy, because every time bad performance has been criticized, performance has improved - and every time praise has been offered for good performance, performance has shown no improvement, or sometimes even gotten worse."

(I recommend you write something better than this, though. This particular example reads like something horrible out of an HR handbook written by an eldritch god who is trying to pass as human. Extra eldritch god points for a continuing narrative in which Bob's boss manages to do everything precisely wrong.)

Replies from: gwern, faul_sname
comment by gwern · 2012-06-22T19:27:34.182Z · LW(p) · GW(p)

I've read before that coaches do this a lot.

comment by faul_sname · 2012-06-22T22:14:18.084Z · LW(p) · GW(p)

I definitely like this. I don't guarantee that my example would be any better.

Then again, I could just give credit where credit is due and quote Thinking, Fast and Slow here. Kahneman has a good real-world example of a flight instructor who did just that.

comment by jsalvatier · 2012-06-24T02:38:54.658Z · LW(p) · GW(p)

This is an excellent topic, and this is a good start, but I think this post could use some work. In fact, if you wanted focused help with this, I would be interested in writing this with you. You can email me at jsalvatier@gmail.com or talk to me on Skype (jsalvatier).

The way you've described this is a little vague, I think, and also makes the concept seem more complex than it is. I also think it would be useful to explore ways in which regression to the mean can lead you astray.

The core of regression to the mean is not that complicated: take a set of variables for which it's semi-realistic to say they're independently and identically distributed (that is, they have the same distribution and each one is a separate realization from that distribution). For example, the number of pushups you can do each day this week, or the number of minutes you're early/late for school each day. If you notice that a value is unusually high compared to the ones that came before it, then later values are likely to be lower than it. If one value is higher than the mean of the distribution, then, on average, later values will be below it, because the mean of these variables is the mean of the distribution.

Here's an illustration which I think might help: Let's say you're a math teacher and you have the students take two tests during the year. After a while, you notice that students who did really well on the first test tend to do worse on the next test. It may be tempting to conclude that this is because students are afraid of success, or worried about doing too much better than their peers, or because students get lazy when they're successful. However, there's a simpler explanation: tests are noisy measurements, so students who did very well on the first test are probably good students who also got a bit lucky. If they had been unlucky, they might still have been quite good, but not at the top. Since on the next test each person is just as likely to be lucky as unlucky (the mean of luck is 0), they'll probably do worse the next time around.
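
The two-test scenario can be sketched in a few lines. The model is an assumption for illustration only: each score is a stable ability plus independent luck, and the specific numbers (class mean 70, standard deviations of 10, 10,000 students, the seed) are invented rather than taken from any real class.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
N = 10_000

# Hypothetical model: score = stable ability + independent luck on each test.
ability = [rng.gauss(70, 10) for _ in range(N)]
test1 = [a + rng.gauss(0, 10) for a in ability]
test2 = [a + rng.gauss(0, 10) for a in ability]

def mean(xs):
    return sum(xs) / len(xs)

# Pick out the students who landed in the top 10% on the first test.
cutoff = sorted(test1)[int(0.9 * N)]
top = [i for i in range(N) if test1[i] >= cutoff]

top_test1 = mean([test1[i] for i in top])
top_test2 = mean([test2[i] for i in top])
class_test2 = mean(test2)

print(f"top group, first test:  {top_test1:.1f}")
print(f"top group, second test: {top_test2:.1f}")
print(f"whole class, second:    {class_test2:.1f}")
```

Under these assumptions the top group scores lower on the second test than on the first, but still above the class average: they fall back toward their own ability, not all the way to the class mean.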

You also mention stock returns. I think it would be good to explain why "go invest all your money in the mutual fund that did the best last year" isn't a very effective strategy.

comment by Evercy · 2012-06-23T01:07:25.329Z · LW(p) · GW(p)

Good explanation of the phenomenon. Some thoughts:

The first two quotes and your explanations for them are good. But in this part:

"These statements seem contradictory, yet both describe real effects. The apparent contradiction is caused by a phenomenon known as "regression to the mean,""

I would add a sentence stating that positive reinforcement is not the actual cause of the regression. This will end any confusion. Additionally, the example with the IQ test needs a bit more clarification. While the average human IQ is around 100, the kid doesn't necessarily regress to the same average. Most humans have their own "average" that they regress or improve to after periods of extreme positive or negative performance. So the kid might not regress at all if his natural ability is indeed that high.

comment by TimS · 2012-06-22T19:17:12.974Z · LW(p) · GW(p)

I don't see the relationship between positive reinforcement and regression to the mean. As discussed in this post, positive reinforcement increases the frequency of a particular behavior, not the quality of the behavior. By contrast, regression to the mean tells us facts about the quality of future behavior, not the quantity.

Concretely, I reward my son for voiding in the potty to increase the frequency that he voids in the potty. (Sorry, potty training is on the mind). But I'm not sure what a quality measure would look like in this circumstance. By contrast, we can imagine a soccer goalkeeper who allows no goals for an entire season. Regression to the mean strongly suggests he won't achieve that feat next season. But what behavior is becoming more or less frequent?

I'm not saying that there's no insight in the convergence of the two concepts. But the current draft suggests a conflict that is not clearly demonstrated.

Replies from: beoShaffer, TheOtherDave, OrphanWilde
comment by beoShaffer · 2012-06-30T19:55:06.564Z · LW(p) · GW(p)

Regression to the mean can work with non-continuous variables. Sticking with the potty training example, call using the toilet properly a 1 and failing to use it properly a 0. If a baby has a fairly stable tendency to use the toilet 3 out of 4 times that the situation comes up, the baby would have a mean of .75. If you observe a 0 from the baby, you should expect the next observation to be closer to the mean, which most likely means a 1.
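
A minimal sketch of the binary case: the 0.75 success rate comes from the example above, while the independence of attempts, the seed, and the names `P_SUCCESS` and `mean_next` are assumptions made for illustration.

```python
import random

P_SUCCESS = 0.75                  # stable tendency from the example above
rng = random.Random(2)            # fixed seed so the run is repeatable

# 1 = used the toilet properly, 0 = did not; attempts assumed independent.
outcomes = [1 if rng.random() < P_SUCCESS else 0 for _ in range(100_000)]

def mean_next(outcomes, observed):
    """Average of the outcome that immediately follows each `observed` value."""
    nxt = [b for a, b in zip(outcomes, outcomes[1:]) if a == observed]
    return sum(nxt) / len(nxt)

after_failure = mean_next(outcomes, 0)
after_success = mean_next(outcomes, 1)

print(f"success rate after a 0: {after_failure:.3f}")
print(f"success rate after a 1: {after_success:.3f}")
```

Both rates should land near 0.75, so an observed 0 tends to be followed by something better and an observed 1 by something no better, with no reward or punishment anywhere in the model.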

comment by TheOtherDave · 2012-06-22T19:48:59.304Z · LW(p) · GW(p)

Increasingly unrelated to the OP... you could certainly introduce a quality measure to your son's potty training, if you wanted. For example, you could differentially reward for latency, or for predictable schedules, or for whistling more tunefully while sitting on the toilet, or whatever quality standards you wished to impose.

There still wouldn't be any particular relationship to regression to the mean, though.

Replies from: TimS
comment by TimS · 2012-06-22T19:59:11.438Z · LW(p) · GW(p)

That's a good point. Reinforcement is pretty narrowly focused on the individual. By contrast, regression to the mean makes a lot more sense if there is a population of data (the rate of goals allowed by other goalkeepers this season or by the specified goalkeeper in prior seasons - but my example is rapidly going to become less useful because of endpoint issues - there's no way to allow fewer than zero goals per season).

comment by OrphanWilde · 2012-06-22T19:24:45.418Z · LW(p) · GW(p)

I believe that's what he was trying to address by discussing the "random component" - he omits the opposing nonrandom and controllable component. The only situation I was able to find to match this bias was in the workplace, where working harder can compensate for random components to a limited extent, but not sufficiently to erase variability altogether.

Which I guess is what should be emphasized - the distinction between the random and the nonrandom component, and their apparent convergence.

comment by Bruno_Coelho · 2012-06-25T18:26:09.406Z · LW(p) · GW(p)

Great post.

I think we can measure performance for almost any variable factor, but people don't expect to be average after an outlier performance. In the private sector, people are rewarded for being exceptional, even if they are only slightly better than their peers. If people knew that they are not as good as they think at the things they believe they are good at, they would update on their own performance less often.

comment by [deleted] · 2012-06-25T08:59:29.659Z · LW(p) · GW(p)

Added your blog to the Blogs by LWers list.

comment by Douglas_Knight · 2012-06-22T20:35:38.293Z · LW(p) · GW(p)

whenever there is a large random component in performance.

I think that is a good way of explaining regression to the mean, but it's possible that "large" may be misleading. Any random component of performance will produce regression to the mean, although the amount depends on the amount of randomness (or really, unmeasured-ness). I'm not sure what to do about this. Perhaps address it at the end. Or just mislead.

People make the errors you describe, so we need some way of talking about them, but here is someone who doesn't like the way people use the phrase "regression to the mean."
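
To illustrate how the amount of regression depends on the amount of randomness, here is a rough sketch. The model is an assumption, not something from the post: each measurement is a fixed skill plus independent noise, with skill drawn from a standard normal. The function name `retained_fraction`, the sample size, and the seed are likewise invented for illustration.

```python
import random

def retained_fraction(noise_sd, n=50_000, seed=3):
    """Estimate how much of a first-measurement deviation persists in a second.

    Hypothetical model: measurement = skill + noise, with skill ~ N(0, 1)
    and independent noise ~ N(0, noise_sd) on each measurement.
    Returns the least-squares slope of the second measurement on the first.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    first = [s + rng.gauss(0, noise_sd) for s in skill]
    second = [s + rng.gauss(0, noise_sd) for s in skill]
    mean_f = sum(first) / n
    mean_s = sum(second) / n
    cov = sum((f - mean_f) * (s - mean_s) for f, s in zip(first, second)) / n
    var = sum((f - mean_f) ** 2 for f in first) / n
    return cov / var

for noise_sd in (0.0, 0.5, 1.0, 2.0):
    print(f"noise sd {noise_sd}: retained fraction {retained_fraction(noise_sd):.2f}")
```

In this model the slope should come out close to 1 / (1 + noise_sd**2): with no noise nothing regresses, a small random component produces a small regression, and a large one wipes out most of the original deviation.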

comment by DanielLC · 2012-06-22T19:41:46.680Z · LW(p) · GW(p)

"In general, unusually bad performance improves after punishment, but good performance tends not to improve and sometimes even gets worse after praise is administered."

I interpreted this to mean that punishment and praise result in that general effect on the performance. If you mean that the immediate next action is better than the current one, then that's really misleading.