comment by Bucky · 2020-11-05T12:15:44.928Z
In my 2020 predictions I mentioned that I found the calibration buckets used on e.g. SSC (50%, 60%, 70%, 80%, 90% and 95%) difficult to work with at the top end, as the odds ratio between adjacent buckets is large (2.25 between 80% and 90%, 2.11 between 90% and 95%). This means that when I want to say 85%, both buckets are a decent way off.
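To make the odds-ratio arithmetic concrete, here is a small Python sketch that converts each bucket to odds and divides neighbouring odds. The helper names `odds` and `adjacent_ratios` are my own and purely illustrative; `fractions.Fraction` keeps the ratios exact.

```python
from fractions import Fraction

def odds(percent):
    """Odds in favour for a probability given in whole percent, e.g. 80 -> 4 (4:1)."""
    p = Fraction(percent, 100)
    return p / (1 - p)

def adjacent_ratios(buckets):
    """Ratio of odds between each pair of neighbouring buckets."""
    return [odds(hi) / odds(lo) for lo, hi in zip(buckets, buckets[1:])]

ssc = [50, 60, 70, 80, 90, 95]
for lo, hi, r in zip(ssc, ssc[1:], adjacent_ratios(ssc)):
    print(f"{lo}% -> {hi}%: x{float(r):.2f}")

# How far is an 85% belief from the two nearest buckets?
print(f"80% -> 85%: x{float(odds(85) / odds(80)):.2f}")
print(f"85% -> 90%: x{float(odds(90) / odds(85)):.2f}")
```

Run as-is, it reports gaps of 1.50, 1.56, 1.71, 2.25 and 2.11 between the SSC buckets, and shows that an 85% belief is a factor of about 1.42 away from the 80% bucket and about 1.59 away from the 90% bucket.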
At the time I suggested using 50%, 65%, 75%, 85%, 91% and 95% instead, to keep the odds ratios between adjacent buckets fairly similar across the range (maximum 1.89) while still using relatively round numbers.
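The "maximum 1.89" figure can be checked the same way by taking the widest odds ratio between neighbouring buckets for each scheme. As before, the function names are hypothetical helpers, not from any library.

```python
from fractions import Fraction

def odds(percent):
    """Odds in favour for a probability given in whole percent, e.g. 75 -> 3 (3:1)."""
    p = Fraction(percent, 100)
    return p / (1 - p)

def max_adjacent_ratio(buckets):
    """Largest ratio of odds between neighbouring buckets (the widest gap)."""
    return max(odds(hi) / odds(lo) for lo, hi in zip(buckets, buckets[1:]))

schemes = {
    "SSC": [50, 60, 70, 80, 90, 95],
    "proposed": [50, 65, 75, 85, 91, 95],
}
for name, buckets in schemes.items():
    print(f"{name}: widest gap x{float(max_adjacent_ratio(buckets)):.2f}")
```

This gives 2.25 for the SSC buckets and 1.89 for the proposed ones (exactly 17/9, between 75% and 85%).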
Alternatively, I suggested dropping the 50% bucket, since predictions at 50% don't help measure calibration; that would let you narrow the gaps between buckets further without increasing the number of buckets.
At the time I couldn't come up with nice round percentage values that would keep the ratios similar. The best I found were 57%, 69%, 79%, 87%, 92% and 95% (maximum ratio 1.78), which seemed hard to work with as they're difficult to remember.
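One way to generate buckets like these (an assumption on my part; the comment doesn't say how they were found) is to space them evenly in log-odds, putting 50% half a step below the first bucket so that, for example, 57% and its mirror 43% are one step apart, and choosing the step so the top bucket is exactly 95%. The sketch below does that for six buckets, rounds to whole percentages and recomputes the widest gap. `half_step_buckets` and `max_adjacent_ratio` are hypothetical names.

```python
import math

def half_step_buckets(top_percent, n):
    """Hypothetical construction: n buckets evenly spaced in log-odds, with
    bucket k (k = 1..n) at (k - 0.5) * d, so 50% sits half a step below the
    first bucket and the top bucket lands exactly on top_percent."""
    top_log_odds = math.log(top_percent / (100 - top_percent))
    d = top_log_odds / (n - 0.5)        # log of the common odds ratio
    buckets = []
    for k in range(1, n + 1):
        o = math.exp((k - 0.5) * d)     # odds for bucket k
        buckets.append(100 * o / (1 + o))
    return buckets, math.exp(d)

def max_adjacent_ratio(buckets):
    """Largest odds ratio between neighbouring buckets (given in percent)."""
    odds = [p / (100 - p) for p in buckets]
    return max(b / a for a, b in zip(odds, odds[1:]))

raw, step = half_step_buckets(95, 6)
rounded = [round(p) for p in raw]
print(f"unrounded step: x{step:.2f}")
print("rounded buckets:", rounded)
print(f"widest gap after rounding: x{max_adjacent_ratio(rounded):.2f}")
```

Before rounding, every step is a factor of about 1.71; rounding to whole percentages gives 57, 69, 79, 87, 92 and 95 and stretches the widest gap (79% to 87%) to about 1.78.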
An alternative scheme I've come up with is to express the buckets as odds rather than percentages. The buckets would be:
The percentage equivalents are similar to the scheme mentioned previously, with the same maximum ratio between adjacent buckets. I prefer this because it has a simple pattern to remember and adjacent buckets are easy to compare (e.g. for every 3 times X doesn't occur, would I expect X to occur 7 or 12 times?).
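Converting an odds bucket back to a percentage is simple arithmetic: odds of a:b correspond to a/(a+b). This sketch applies it to the 7:3 and 12:3 comparison from the text; `odds_to_percent` is a hypothetical helper name.

```python
from fractions import Fraction

def odds_to_percent(for_count, against_count):
    """Convert odds 'for:against' (e.g. 7:3) into a probability in percent."""
    return 100 * Fraction(for_count, for_count + against_count)

# The comparison from the text: for every 3 times X doesn't occur,
# does X occur 7 times or 12 times?
low, high = (7, 3), (12, 3)
print(f"7:3  = {float(odds_to_percent(*low)):.1f}%")   # 70.0%
print(f"12:3 = {float(odds_to_percent(*high)):.1f}%")  # 80.0%

# Both buckets share the same 'against' count, so their odds ratio
# is just the ratio of the 'for' counts.
print(f"ratio = x{high[0] / low[0]:.2f}")              # x1.71
```

Because both buckets are written with the same "against" count of 3, the ratio between them can be read straight off the "for" counts, which is part of what makes adjacent buckets easy to compare.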
I've tried this out and found it nice to work with (not initially, but after getting used to it), though that may just be a personal thing.