How do I choose the best metric to measure my calibration?
post by ChristianKl · 2017-01-04T19:06:39.059Z · LW · GW · Legacy · 3 comments

This is a link post for http://stats.stackexchange.com/q/253443/3807
Comments sorted by top scores.
comment by Manfred · 2017-01-05T01:39:13.367Z · LW(p) · GW(p)
"Proper scoring rule" just means that you attain the best score by giving the most accurate probabilities you can. In that sense, any concave proper scoring rule will give you a good feedback mechanism. The reason people like log scoring rule is because it corresponds to information (the kind you can measure in bits and bytes), and so a given amount of score increase has some meaning in terms of you using your information better.
The information measured by your log score is identical to Shannon's idea of information carried by digital signals. When a binary event is completely unknown to you, you can gain 1 bit of information by learning about it. For events that you can predict to high accuracy, the entropy of the event (according to your distribution) is lower, and you gain less information by learning the result. In fact, if you look at the expected score, it goes to zero as the event becomes more and more predictable (though you're still incentivized to answer correctly).
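To make the entropy point concrete, here is a small sketch (my own, assuming your reported probability p matches the true chance of the event): the expected log score is then minus the binary entropy of the event, which sits at -1 bit for a fair coin and climbs toward zero as the event becomes predictable.

```python
import math

# When your reported probability p equals the true chance of the event,
# the expected log score is minus the binary entropy: -1 bit at p = 0.5,
# approaching 0 as p -> 1.
def expected_log_score(p: float) -> float:
    return p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:>5}: expected score = {expected_log_score(p):.4f} bits")
# p =   0.5: expected score = -1.0000 bits  (fully uncertain: 1 bit to learn)
# p = 0.999: expected score = -0.0114 bits  (nearly predictable: little left to learn)
```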
But I think this leaves out something interesting that I don't have a good answer for, which is that this straightforward interpretation only works when you, the human, don't screw up. When you do screw up, I'm not sure there's a clear interpretation of the score.
Replies from: wubbles

comment by ignoranceprior · 2017-01-05T02:49:43.450Z · LW(p) · GW(p)
This may or may not help: Choosing a Strictly Proper Scoring Rule