No apparent Dunning-Kruger effect for LW participation

post by John_Maxwell (John_Maxwell_IV) · 2012-12-22T04:39:29.271Z · score: 5 (14 votes) · LW · GW · Legacy · 18 comments

Precommitted to publishing this in Discussion to fight publication bias.  It looks like intelligence (as measured by IQ, SAT scores, etc.) isn't meaningfully related to how much one posts to LW.  Probably in the ideal case, they would be related and higher-IQ people would post more, but that doesn't appear to be going on either.

How well-educated you are doesn't seem to be much related to participation either.  I'm not controlling for hours spent on LW for any of this, though.

Script output:

Correlation between "IQ" and "KarmaScore": 0.0343
Correlation between "SATscoresoutof1600" and "KarmaScore": 0.0517
Correlation between "SATscoresoutof2400" and "KarmaScore": 0.1000
Correlation between "TimeinCommunity" and "KarmaScore": 0.1770
Breakdown of average "IQ" by "LessWrongUse":
  137.2500    "I lurk, but never registered an account"
  139.3659    "I've registered an account, but never posted"
  138.8491    "I've posted a comment, but never a top-level post"
  137.8182    "I've posted in Discussion, but not Main"
  138.9394    "I've posted in Main"
Breakdown of average "SATscoresoutof1600" by "LessWrongUse":
 1469.5495    "I lurk, but never registered an account"
 1462.1429    "I've registered an account, but never posted"
 1488.0000    "I've posted a comment, but never a top-level post"
 1510.2941    "I've posted in Discussion, but not Main"
 1515.1515    "I've posted in Main"
Breakdown of average "SATscoresoutof2400" by "LessWrongUse":
 2202.9848    "I lurk, but never registered an account"
 2242.7273    "I've registered an account, but never posted"
 2211.8000    "I've posted a comment, but never a top-level post"
 2244.5455    "I've posted in Discussion, but not Main"
 2212.7273    "I've posted in Main"
Breakdown of average "KarmaScore" by "Sequences":
    0.0000    "Never even knew they existed until this moment"
    2.4444    "Know they existed, but never looked at them"
   48.9677    "Some, but less than 25%"
  105.9658    "About 25% of the Sequences"
  280.6434    "About 50% of the Sequences"
  704.3240    "About 75% of the Sequences"
 1185.0264    "Nearly all of the Sequences"
Breakdown of average "TimeinCommunity" by "LessWrongUse":
   17.3262    "I lurk, but never registered an account"
   23.6875    "I've registered an account, but never posted"
   30.0064    "I've posted a comment, but never a top-level post"
   29.5035    "I've posted in Discussion, but not Main"
   44.9663    "I've posted in Main"
Breakdown of average "AutismScore" by "LessWrongUse":
   24.3504    "I lurk, but never registered an account"
   28.0526    "I've registered an account, but never posted"
   22.7227    "I've posted a comment, but never a top-level post"
   23.7917    "I've posted in Discussion, but not Main"
   23.7391    "I've posted in Main"
Breakdown of average "KarmaScore" by "Profession":
  200.9000    "Other "social science""
  368.1176    "Biology"
  898.8333    "Statistics"
  373.3000    "Art"
  441.0035    "Computers (practical: IT, programming, etc.)"
  193.5263    "Business"
  260.2281    "Finance / Economics"
 1129.8438    "Computers (AI)"
 1719.5161    "Philosophy"
  113.8507    "Computers (other academic, computer science)"
  351.9024    "Engineering"
  335.8081    "Other"
  531.9570    "Mathematics"
 2505.8571    "Medicine"
  393.6364    "Neuroscience"
   81.5000    "Law"
  530.1607    "Physics"
  498.2941    "Other "hard science""
 1033.7391    "Psychology"
Breakdown of average "KarmaScore" by "Degree":
  612.8599    "Bachelor's"
  503.3694    "Master's"
  195.7708    "None"
  241.7024    "High school"
 1484.6757    "2 year degree"
   77.9444    "MD/JD/other professional degree"
  389.4167    "Other"
 1099.7925    "Ph D."

Script source here.
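
The linked script is not reproduced here. As a rough sketch of how figures like the ones above could be computed (not the actual script; the rows, the short `LessWrongUse` labels, and the helper names `pearson` and `breakdown` are all made up for illustration):

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

def breakdown(rows, value_key, group_key, reducer):
    """Group rows by group_key and reduce each group's value_key values."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(value_key) is not None and row.get(group_key) is not None:
            groups[row[group_key]].append(row[value_key])
    return {group: reducer(values) for group, values in groups.items()}

def mean(values):
    return sum(values) / len(values)

# Made-up rows for illustration only; not survey data.
rows = [
    {"IQ": 130, "KarmaScore": 0, "LessWrongUse": "lurker"},
    {"IQ": 140, "KarmaScore": 50, "LessWrongUse": "commenter"},
    {"IQ": 135, "KarmaScore": None, "LessWrongUse": "lurker"},
    {"IQ": 150, "KarmaScore": 100, "LessWrongUse": "commenter"},
]

r = pearson([row["IQ"] for row in rows], [row["KarmaScore"] for row in rows])
by_use = breakdown(rows, "IQ", "LessWrongUse", mean)
```

Respondents who left a question blank are dropped pairwise here, which is one possible choice among several; swapping `mean` for a median function as the `reducer` gives the other kind of breakdown.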


Comments sorted by top scores.

comment by Unnamed · 2012-12-22T22:17:52.489Z · score: 4 (6 votes) · LW(p) · GW(p)

The Intelligence metric I used here (which combines the 5 questions on SAT, ACT, and IQ) correlates positively with each of the five measures of ties to LW, with correlations ranging from 0.07 to 0.17. Correlations (all statistically significant at .05 level, n range from 813 to 874):

0.10 karma (log(karma+1))
0.17 sequence reading (treated as continuous scale)
0.11 LW use (treated as continuous scale)
0.07 time in community (sqrt)
0.16 meetup attendance
0.16 composite of those 5 questions
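
For anyone wondering what the log(karma+1) and sqrt transforms buy you, here is a small sketch with made-up karma values (not survey data). The point is only that a single huge score dominates the raw numbers much more than the transformed ones:

```python
import math

# Made-up karma values for illustration only; note the single large outlier.
karma = [0, 0, 3, 10, 40, 5000]

log_karma = [math.log(k + 1) for k in karma]   # the +1 keeps zero karma defined
sqrt_karma = [math.sqrt(k) for k in karma]     # a milder compression than the log

raw_share = max(karma) / sum(karma)            # outlier's share of the raw total
log_share = max(log_karma) / sum(log_karma)    # outlier's share after the log transform

print(raw_share)
print(log_share)
```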

comment by buybuydandavis · 2012-12-22T06:55:41.067Z · score: 3 (3 votes) · LW(p) · GW(p)

I would think that all the karma score breakdowns by category would be better by median than mean, as the top scorers would really skew the data.
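
A toy illustration of the skew, using made-up karma scores (mostly zeros and small values plus one heavy poster), not survey data:

```python
from statistics import mean, median

# Made-up karma scores for illustration only.
karma = [0, 0, 0, 2, 5, 12, 9000]

print(mean(karma))    # pulled far upward by the single large value
print(median(karma))  # the middle value, unaffected by how large the top score is
```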

comment by John_Maxwell (John_Maxwell_IV) · 2012-12-22T07:17:29.866Z · score: 5 (5 votes) · LW(p) · GW(p)

Breakdown of median "IQ" by "LessWrongUse":
  136.0000    "I lurk, but never registered an account"
  140.0000    "I've registered an account, but never posted"
  139.5000    "I've posted a comment, but never a top-level post"
  138.0000    "I've posted in Discussion, but not Main"
  138.0000    "I've posted in Main"

Breakdown of median "SATscoresoutof1600" by "LessWrongUse":
 1490.0000    "I lurk, but never registered an account"
 1515.0000    "I've registered an account, but never posted"
 1505.0000    "I've posted a comment, but never a top-level post"
 1515.0000    "I've posted in Discussion, but not Main"
 1530.0000    "I've posted in Main"

Breakdown of median "SATscoresoutof2400" by "LessWrongUse":
 2245.0000    "I lurk, but never registered an account"
 2250.0000    "I've registered an account, but never posted"
 2240.0000    "I've posted a comment, but never a top-level post"
 2260.0000    "I've posted in Discussion, but not Main"
 2180.0000    "I've posted in Main"

Breakdown of median "KarmaScore" by "Sequences":
    0.0000    "Never even knew they existed until this moment"
    0.0000    "Know they existed, but never looked at them"
    0.0000    "Some, but less than 25%"
    0.0000    "About 25% of the Sequences"
    4.0000    "About 50% of the Sequences"
   59.0000    "About 75% of the Sequences"
  100.0000    "Nearly all of the Sequences"

Breakdown of median "TimeinCommunity" by "LessWrongUse":
   12.0000    "I lurk, but never registered an account"
   19.0000    "I've registered an account, but never posted"
   24.0000    "I've posted a comment, but never a top-level post"
   24.0000    "I've posted in Discussion, but not Main"
   55.0000    "I've posted in Main"

Breakdown of median "AutismScore" by "LessWrongUse":
   24.0000    "I lurk, but never registered an account"
   26.0000    "I've registered an account, but never posted"
   23.0000    "I've posted a comment, but never a top-level post"
   24.0000    "I've posted in Discussion, but not Main"
   22.0000    "I've posted in Main"

Breakdown of median "KarmaScore" by "Profession":
    0.0000    "Other "social science""
    0.0000    "Biology"
    0.0000    "Statistics"
    0.0000    "Art"
    3.0000    "Computers (practical: IT, programming, etc.)"
    7.0000    "Business"
   10.0000    "Finance / Economics"
   65.0000    "Computers (AI)"
   10.0000    "Philosophy"
    0.0000    "Computers (other academic, computer science)"
    0.0000    "Engineering"
    0.0000    "Other"
   18.0000    "Mathematics"
    0.0000    "Medicine"
   10.0000    "Neuroscience"
    0.0000    "Law"
    6.5000    "Physics"
   10.0000    "Other "hard science""
   43.0000    "Psychology"

Breakdown of median "KarmaScore" by "Degree":
    0.0000    "Bachelor's"
    8.0000    "Master's"
    0.0000    "None"
    5.0000    "High school"
    0.0000    "2 year degree"
    0.0000    "MD/JD/other professional degree"
   80.5000    "Other"
  100.0000    "Ph D."

Of 1063 survey takers, 981 reported a karma score, and 466 reported that it was zero (Yvain instructed users to put in zero if they didn't have an account). I'm actually surprised there weren't more zeros.

comment by buybuydandavis · 2012-12-22T20:15:27.881Z · score: 1 (1 votes) · LW(p) · GW(p)

Of 1063 survey takers, 981 reported a karma score, and 466 reported that it was zero (Yvain instructed users to put in zero if they didn't have an account). I'm actually surprised there weren't more zeros.

Which is why you do exploratory data analysis, and look at your distributions first.

Me, I'm surprised that people who don't post bothered to take a long survey. You have to have an account to post, right? Do you have access to the system data, so that you could compare the distribution of karma scores for registered users versus the poll results?

Next suggestion - I'd probably make separate analyses based on LessWrongUse - never posted and never registered as one class, posted and more (interactive users) as another. And substitute rank(karmascore) over the whole population for karmascore.
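
A minimal sketch of what that could look like, assuming ties share the average of their positions (one common ranking convention). The short `use` labels, the rows, and the helper names are hypothetical stand-ins, not the survey's actual fields:

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical short labels standing in for the survey's LessWrongUse answers.
NON_POSTERS = {"lurker", "registered"}

def split_by_class(rows):
    """Split rows into non-posters and interactive users."""
    passive = [r for r in rows if r["use"] in NON_POSTERS]
    interactive = [r for r in rows if r["use"] not in NON_POSTERS]
    return passive, interactive

# Made-up rows for illustration only.
rows = [
    {"use": "lurker", "karma": 0},
    {"use": "registered", "karma": 0},
    {"use": "commenter", "karma": 15},
    {"use": "main", "karma": 4000},
]
ranks = average_ranks([r["karma"] for r in rows])
passive, interactive = split_by_class(rows)
```

With this many zeros in the data, a large share of respondents would end up tied at the same rank, which is part of why analysing the two classes separately may be worth doing.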

comment by Kevin Kostlan (kevin-kostlan) · 2019-10-01T15:20:40.170Z · score: 1 (1 votes) · LW(p) · GW(p)

Why are the IQs at Mensa levels when, unlike Mensa, we don't have a cutoff? This is a *strong* self-selection effect.

comment by peuddO · 2012-12-28T07:18:07.276Z · score: 1 (1 votes) · LW(p) · GW(p)

Time to abandon cryosleep. I hope this post isn't too big.

This comparison seems to rely on too many dubious assumptions: First, that the IQ scores reported in the survey were precise for a uniform standard deviation. Second, that these scores correlate strongly with the forms of competence relevant to LessWrong. Third, that this correlation will further correlate strongly with the total Karma of a user. Fourth, it rests on an understanding of the Dunning-Kruger effect and its implications that I either don't understand or don't at all agree with.

Pertaining to the question of IQ, I have yet to see a LessWrong survey that required specificity on the IQ question. Standard deviation and test type aren't included with the answers, and so the answers are hard to standardize. The internal relationship of these scores is obfuscated by us not knowing which tests they were derived from. Yvain's request to only include "respectable tests" is sensible, but still leaves a lot of room for interpretation, and could reasonably include differing standard deviations. Even assuming a strong prior for the more common standard deviations of 15 and 16, a lot of these IQ scores are out of bounds of what might be considered accurate testing. Tests with a wide battery of subtests will be especially likely to make some scores roof out and lead to distortions of the average - g may be strong, but it's difficult to combat test design. Don't expect anything above Mensa entry level to be measured especially accurately. 150 is probably indicative of higher ability than 135, but it's hard to say how strongly a score has been distorted by an arbitrary roof to the level tested.

IQ score totals (when not accounting for standard deviations) are not especially well-correlated to begin with, and not accounting for these variables and many others besides them will only compound that problem. I'm sure there's interesting stuff you could achieve with the survey numbers, but I doubt an accurate intra-community comparison of the user base is one of those things.
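
To make the standard deviation point concrete, here is a small sketch of re-expressing a score from one scale on another, assuming both scales have a mean of 100 (the function name `convert_iq` is just for illustration):

```python
def convert_iq(score, from_sd, to_sd, mean=100.0):
    """Re-express an IQ score on a scale with a different standard deviation."""
    z = (score - mean) / from_sd   # distance from the mean in standard deviations
    return mean + z * to_sd

print(convert_iq(132, from_sd=16, to_sd=15))  # 130.0
```

The gap widens with distance from the mean, so unlabelled scores at the high end are the ones most affected.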

As for the second and third problem, IQ obviously has strong correlations with some forms of competence. However, I would also expect most posters here to be at least vaguely aware of the Dunning-Kruger effect or the general concept it derives from, and so post selectively on the stuff they are fairly sure they know. This would skew the correlations towards widely supported sentiments, well-crafted posts and total volume of posts, except for users who are polymaths of some description (of which, admittedly, we have a few).

As for the fourth problem: if the IQ results are anywhere near accurate, sub-normal ability is very abnormal on LessWrong. Most of us posting here aren't stupid, or even close to normal intelligence, let alone significantly sub-normal intelligence. The Dunning-Kruger effect does not operate on a sliding scale where people of higher intelligence tend to think of themselves as even smarter. It inverts. People of actual ability tend to underestimate themselves. Accounting for that, it is also difficult to quantify the effect on people who are far above the LessWrong mean IQ (if we accept that measurement at all), mainly because those people are very rare. Do they tend to hold extremely pessimistic views of their own ability, or do they estimate more rationally than less intelligent individuals? It's difficult to muster a normative study of something like that - being far out of bounds of any conventional IQ test certainly doesn't help - and arguing from conjecture would be inaccurate in lieu of some very strong priors.

comment by John_Maxwell (John_Maxwell_IV) · 2012-12-28T08:30:05.949Z · score: 1 (1 votes) · LW(p) · GW(p)

Good to know, thanks.

comment by gjm · 2012-12-22T11:08:46.415Z · score: 1 (1 votes) · LW(p) · GW(p)

Interesting, but I'm having trouble thinking of any possible set of figures that would indicate a Dunning-Kruger effect. What was it you were looking for?

Maybe we have a difference in terminology. To me, the Dunning-Kruger effect is where less competent people have higher opinions of their ability than more competent people. The LW survey could have shown that if, e.g., people with lower IQ as measured by had higher self-reported IQs. But I don't see how any statistical relationship between intelligence and LW karma could be an instance of D-K. What am I missing?

comment by John_Maxwell (John_Maxwell_IV) · 2012-12-22T13:57:26.493Z · score: 0 (2 votes) · LW(p) · GW(p)

I was equating "inclination to contribute to LW" with "opinion of one's ability" (in this case to come up with useful and accurate insights). In other words, if D-K is correct, maybe there are a bunch of high-IQ LW readers who never contribute (because they're underestimating their ability, and they don't think they have anything useful to say) and lots of low-IQ LW readers who contribute lots (because they incorrectly see themselves as brilliant and full of insights).

Of course, voting does give pretty good feedback. But still, interesting that there's no apparent trend for folks with higher IQs to contribute more or be more willing to post in Main.

comment by ChristianKl · 2012-12-23T13:44:12.537Z · score: 2 (2 votes) · LW(p) · GW(p)

I don't think online contribution has much to do with estimation of your own ability. Contributing is just a habit. Someone might learn his contributing habit on reddit. Afterwards when he reads something on LessWrong where he has an opinion, he will also write a comment if he has the time.

comment by gjm · 2012-12-22T16:12:36.258Z · score: 2 (2 votes) · LW(p) · GW(p)

Hmm. But high LW karma indicates not only inclination to contribute, but also success in contributing usefully and accurately. Someone in the grip of the Dunning-Kruger effect who posts a lot of useless insight-free twaddle will get relatively little karma. So if there's no observed relationship between intelligence and LW karma, and if intelligence correlates with value of contributions, then that is evidence of something Dunning-Kruger-esque.

comment by dougclow · 2012-12-23T08:18:11.491Z · score: 4 (4 votes) · LW(p) · GW(p)

Someone in the grip of the Dunning-Kruger effect who posts a lot of useless insight-free twaddle will get relatively little karma.

How do you know? (Genuine question.)

comment by gjm · 2012-12-23T15:08:54.510Z · score: 2 (2 votes) · LW(p) · GW(p)

I don't know; it's a guess, but it seems a fairly reasonable one. It looks to me as if useless insight-free twaddle tends to get little (or, often, substantially negative) karma.

I'm not claiming this as some sort of exceptionless universal truth, though. Probably carefully tuned insight-free twaddle could get quite a lot of karma.

What's your opinion on this?

comment by MixedNuts · 2012-12-24T12:24:39.667Z · score: 2 (2 votes) · LW(p) · GW(p)

Um. From experience (I'm very reluctant to give examples, for obvious reasons): Comments that don't take much thought to come up with seem to net around as much karma as moderately thoughtful ideas, and can be produced with much less time and effort. Of course excellent insights shoot through the roof, but someone who can't produce many of them in a lifetime can get much more karma pointing out the obvious and making silly jokes than racking their brain. Also, basic ideas in domains LW knows little about gain a lot of karma, which is defensible as comparative advantage, but so do well-phrased restatements of basic ideas on LW.

comment by gjm · 2012-12-24T16:44:07.604Z · score: 1 (1 votes) · LW(p) · GW(p)

I'm not claiming that the karma system successfully elicits insight-optimizing behaviour from LW participants (so, in particular, I don't think anything I've said is inconsistent with saying that some people can get more karma from pointing out the obvious than from racking their brain). Only that there's some correlation between karma and quality and that an intelligent, knowledgeable, wise person will (for any given level of effort and matchedness between their ideas and LW's collective prejudices and interests) tend to do better than someone less intelligent, knowledgeable and wise.

I still don't have anything I could reasonably call evidence for this, of course, but if you disagree with it then my casual observations apparently differ from yours.

So what's a third party to think? Well, obviously they should take your position more seriously than mine because you have a bit more karma than I do. No, wait ...

comment by dougclow · 2012-12-23T19:32:12.776Z · score: 1 (1 votes) · LW(p) · GW(p)

I think it's very likely to be the case that karma score is strongly correlated with behaviours of which the LW community approves. I think it's probably the case that rewarded behaviours are more likely to include demonstrating insight, or at least, particular sorts of insight. I think it might be the case that these are correlated with being useful but am not (yet?) convinced.

Fundamentally, it strikes me strongly that it's very, very hard to assess this empirically. For my day job I've been thinking about how you could do a very similar task (for a totally different site) and it's really very hard without either getting a few known-expert people (who have had no prior contact with the site) to re-rate all the contributions (and hoping for decent inter-rater reliability) or being able to grab a stratified sample of participants and giving them some sort of independent test.

Your good point about carefully tuned insight-free twaddle amused me (thinking of a few examples) and then when I started thinking about how I could tell that apart ... rather disturbed me.

comment by ygert · 2012-12-23T08:51:17.931Z · score: 1 (1 votes) · LW(p) · GW(p)

It is testable. Costly to test, but testable. If someone wants to test it, they could make a new account, and use it to post lots of useless insight-free twaddle, and see how much karma they get on that account. However, probably the costs of that experiment would be too high to actually pull it off. (Both the cost in time to the tester, and the cost to the signal/noise ratio of Less Wrong of adding that many more posts of useless insight-free twaddle.)

comment by John_Maxwell (John_Maxwell_IV) · 2012-12-23T00:28:07.380Z · score: 1 (1 votes) · LW(p) · GW(p)

that is evidence of something Dunning-Kruger-esque.


Had a thought: It's possible that D-K susceptible folks posted some comments, got voted down, decided they didn't like LW, and left.