post by [deleted]

comment by Unnamed · 2020-01-20T03:17:24.562Z · LW(p) · GW(p)

The popular conception of Dunning-Kruger has strayed from what's in Kruger & Dunning's research. Their empirical results look like this, not like the "Mt. Stupid" graph.

comment by Isnasene · 2020-01-20T06:14:34.366Z · LW(p) · GW(p)

When I first looked at these plots, I thought "ahhh, the top one has two valleys and the bottom one has two peaks. So, once you account for one plotting error and the other plotting accuracy, they capture the same behavior." But this isn't really what's happening.

Comparing these plots is a little tricky. For instance, the double-descent graph shows two curves -- "train error" (which can be interpreted as lack of confidence in model performance) and "test error" (which can be interpreted as lack of actual performance/lack of wisdom). Analogizing the double-descent curve to Dunning-Kruger might be easier if one just plots "test error" on the y-axis and "train error" on the x-axis. Or better yet, 1 - error on both axes.
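
As a rough illustration of that re-plotting, here's a minimal matplotlib sketch. The two error curves are synthetic placeholders with a double-descent-ish shape (not data from the actual figures), swept over a made-up "capacity" axis; only the change of axes is the point.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the two curves (NOT real data): train error falls
# monotonically with capacity, test error has a rough double-descent shape.
capacity = np.linspace(0.0, 1.0, 200)
train_error = 0.55 * np.exp(-5.0 * capacity)
test_error = (0.35
              + 0.25 * np.exp(-8.0 * capacity)
              + 0.30 * np.exp(-60.0 * (capacity - 0.55) ** 2))

# Re-plot as "confidence" (1 - train error) on x vs "wisdom" (1 - test error) on y,
# so the trajectory can be eyeballed against the usual Dunning-Kruger cartoon.
plt.plot(1 - train_error, 1 - test_error, marker=".")
plt.xlabel("confidence (1 - train error)")
plt.ylabel("wisdom (1 - test error)")
plt.title("Double descent on Dunning-Kruger-style axes (synthetic curves)")
plt.show()
```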

But actually trying to dig into the plots in this way is confusing. In the underfitted regime, there's a pretty high level of knowledge (ie test error near the minimum value) with pretty low confidence (ie train error far from zero). In the overfitted regime, we then get the second descent to a higher level of knowledge (ie test error at the minimum) but now with extremely high confidence. Maybe we can tentatively interpret these minima as the "valley of despair" and "slope of enlightenment" but

  • In both cases, our train error is lower than our test error -- implying a disproportionate amount of confidence all the time. This is not consistent with the Dunning-Kruger effect (see the toy sketch after this list)
    • The "slope of enlightenment" especially has way more unjustified confidence (ie train error near zero) despite still having some objectively pretty high test error (around 0.3). This is not consistent with the Dunning-Kruger effect
  • We see the same test error associated with both a high train error (in the underfit regime) and with a low train error (in the overfit regime). The Dunning-Kruger effect doesn't capture the potential for different levels of confidence at the same level of wisdom
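
For anyone who wants to poke at this concretely, here's a toy numpy sketch (my own construction, not anything from the paper): it sweeps the number of random cosine features in a minimum-norm least-squares fit, a standard setting where test error descends, peaks near the interpolation threshold, then descends again. In runs like this the train error typically sits below the test error in both the underfit and overfit regimes, and a similar test error shows up at very different train errors, matching the two points above. All names here (target, features, all_freqs) are hypothetical choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem, chosen only to exhibit double descent.
def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test = 20, 500
x_train = rng.uniform(0, 1, n_train)
x_test = rng.uniform(0, 1, n_test)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
y_test = target(x_test) + 0.1 * rng.standard_normal(n_test)

# Random cosine features; model capacity is controlled by how many we use.
all_freqs = rng.uniform(0, 30, 300)

def features(x, n_features):
    return np.cos(np.outer(x, all_freqs[:n_features]))

for n_features in [2, 5, 10, 15, 20, 25, 40, 80, 150, 300]:
    Phi_train = features(x_train, n_features)
    Phi_test = features(x_test, n_features)
    # np.linalg.lstsq returns the minimum-norm solution once the system is
    # underdetermined (n_features > n_train), i.e. in the interpolating regime.
    w, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_mse = np.mean((Phi_train @ w - y_train) ** 2)
    test_mse = np.mean((Phi_test @ w - y_test) ** 2)
    print(f"{n_features:4d} features: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```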

To me, the above deviations from Dunning-Kruger make sense. My mechanistic understanding of the effect is that it appears in fields of knowledge that are vast, but whose vastness can only be explored by those with enough introductory knowledge. So what happens is

  • You start out learning something new and you're not confident
  • You master the introductory material and feel confident that you get things
  • You now realize that your introductory understanding gives you a glimpse into the vast frontier of the subject
  • Exposure to this vast frontier reduces your confidence
  • But as you explore it, both your understanding and confidence rise again

And this process can't really be captured in a set-up with a fixed train and test set. Maybe it could show up in reinforcement learning, though, since exploration is possible there.