How do I choose the best metric to measure my calibration?
post by ChristianKl · 2017-01-04T19:06:39.059Z · LW · GW · Legacy · 3 comments

This is a link post for http://stats.stackexchange.com/q/253443/3807
Comments sorted by top scores.
comment by Manfred · 2017-01-05T01:39:13.367Z · LW(p) · GW(p)
"Proper scoring rule" just means that you attain the best score by giving the most accurate probabilities you can. In that sense, any concave proper scoring rule will give you a good feedback mechanism. The reason people like log scoring rule is because it corresponds to information (the kind you can measure in bits and bytes), and so a given amount of score increase has some meaning in terms of you using your information better.
The information measured by your log score is identical to Shannon's idea of information carried by digital signals. When a binary event is completely unknown to you, you can gain 1 bit of information by learning about it. For events that you can predict to high accuracy, the entropy of the event (according to your distribution) is lower, and you gain less information by learning the result. In fact, if you look at the expected score, it goes to zero as the event becomes more and more predictable (though you're still incentivized to answer correctly).
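To make the entropy point concrete, here is a small sketch (my own, assuming your reported probability p matches the true chance of the event): the expected log score is then minus the binary entropy of the event, which sits at -1 bit for a fair coin and climbs toward zero as the event becomes predictable.

```python
import math

# When your reported probability p equals the true chance of the event,
# the expected log score is minus the binary entropy: -1 bit at p = 0.5,
# approaching 0 as p -> 1.
def expected_log_score(p: float) -> float:
    return p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:>5}: expected score = {expected_log_score(p):.4f} bits")
# p =   0.5: expected score = -1.0000 bits  (fully uncertain: 1 bit to learn)
# p = 0.999: expected score = -0.0114 bits  (nearly predictable: little left to learn)
```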
But I think this leaves out something interesting that I don't have a good answer for, which is that this straightforward interpretation only works when you, the human, don't screw up. When you do screw up, I'm not sure there's a clear interpretation of the score.
Replies from: wubbles

comment by ignoranceprior · 2017-01-05T02:49:43.450Z · LW(p) · GW(p)
This may or may not help: Choosing a Strictly Proper Scoring Rule