It's because I changed it to only show estimates for probabilities that have received at least 4 answers, and you have not yet answered enough questions. I am not confident that this change is good, and I might revert it.
Thanks for your recommendation! I have corrected the problem with the asymmetric distribution (now computing the whole distribution) and added a second graph showing exactly what you suggested, and it looks good.
Unfortunately, for the first approach I implemented, the MAP is not always within the 90% confidence interval (it is outside it when the MAP is 1 or 0). I agree that this is confusing and seems undesirable.
(You might need to hard-refresh the page (Ctrl+Shift+R) to see the update.)
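To make the MAP issue concrete, here is a minimal sketch (Python/scipy only for illustration, assuming the uniform prior on the true probability described below): when every answer so far is correct, the posterior mode (MAP) sits at 1, but the central 90% interval does not reach 1.

```python
# Minimal sketch, assuming a uniform Beta(1, 1) prior on the true probability.
from scipy.stats import beta

n_right, n_wrong = 4, 0                      # e.g. 4 correct answers out of 4
posterior = beta(1 + n_right, 1 + n_wrong)   # posterior is Beta(5, 1)

map_estimate = 1.0                           # mode of Beta(5, 1) is at 1
low, high = posterior.ppf(0.05), posterior.ppf(0.95)  # central 90% interval

print(map_estimate, low, high)               # 1.0 lies above high (~0.99)
```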
You are right about the proportion of dots within the error bars. This sounds like something I would want to change.
100% is not within the error bar because these are not exactly error bars, but Bayesian estimates of where your true probability lies, using a uniform prior between 0% and 100%. If I pick a coin whose probability p of heads is drawn uniformly between 0% and 100%, then after observing 4 heads out of 4 throws, you should still believe on average that the probability of heads is (n_heads + 1) / (n_throws + 2) = 5/6 ≈ 83%, and a central 75% confidence interval would not contain the probability 100%.
So you need to show more evidence that your 100% answers are indeed right 100% of the time. I agree this is confusing, and I want to change it for the better, but I am unsure how.
For each answer probability p, I count the number of times it was the right answer and the number of times it was a wrong answer. If anyone has a recommendation on how to compute the top and bottom of the error bars from these counts, I would really appreciate it.
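For context, here is a rough sketch of the kind of computation I have in mind (assuming the uniform prior from the coin example above; the 75% level is just an illustration, not a final choice): the counts give a Beta(n_right + 1, n_wrong + 1) posterior, and the bottom and top of the bar are two quantiles of that posterior.

```python
from scipy.stats import beta

def error_bar(n_right: int, n_wrong: int, level: float = 0.75):
    """Posterior mean and central credible interval for the true probability
    of being right, given right/wrong counts and a uniform Beta(1, 1) prior."""
    posterior = beta(1 + n_right, 1 + n_wrong)
    tail = (1.0 - level) / 2.0
    return posterior.mean(), posterior.ppf(tail), posterior.ppf(1.0 - tail)

# Coin example from above: 4 heads out of 4 gives a mean of 5/6 ~ 83% and a
# 75% interval whose top (~97%) does not reach 100%.
print(error_bar(4, 0))
```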
I was waiting to make the app a bit better first. I made a post out of it today:
https://www.lesswrong.com/posts/7KRWCRBmvLhTMZB5Y/bayes-up-an-app-for-sharing-bayesian-mcq
Here you can see a user's calibration graph (available in the app):
https://twitter.com/le_science4all/status/1225498307348377600
And here you can see calibration graphs for some of the app's quizzes:
https://twitter.com/le_science4all/status/1225527782647705600
They clearly show overconfidence in the participants' answers.
Since I read this post, I have implemented this small app:
- GitHub: https://github.com/Stokastix/bayes-up
- Deployed at: https://bayes-up.web.app/
- Using MCQ from here: https://opentdb.com/
I make apps only as a hobby, so it is not bug-free, scalable, or great. Feel free to send advice, comments, or requests.
Several similar apps exist, all of which had to solve the difficulty of making a set of interesting questions. I could make a small list if you are interested.