Introducing bayescalc.io

post by Adele Lopez (adele-lopez-1) · 2023-07-07T16:11:12.854Z · LW · GW · 29 comments

This is a link post for https://bayescalc.io

I made a simple online calculator for doing elementary hypothesis testing!

Link to example shown.

I was disappointed that an intuitive and easy-to-use app for applying Bayes' theorem apparently did not exist, so I decided to make one. My goal was to make something that:

  1. Helped people correctly and quickly evaluate the effect of evidence while comparing hypotheses.
  2. Was easy enough for someone who didn't know math to use.
    1. And which also helped show what the math was doing in an intuitive way, so that you didn't have to trust math you didn't understand.
  3. Felt good enough to use that it would actually be used.
  4. Could be used to share simple models of things in a way that would help people have more productive discussions, and promote shared model building.

Hopefully I've at least made substantial progress on these goals, and I'd really appreciate feedback on ways in which it falls short! This includes even minor interface or design issues. You can leave feedback as a comment here, or on the issues page.

I'd also be really happy to see people share examples they've made in the comments!

 

29 comments

Comments sorted by top scores.

comment by Misaligned-Semi-intelligence (MisalignedIntelligence) · 2023-07-07T21:18:41.508Z · LW(p) · GW(p)

I think this is very well-made and I already have uses for it. 

I'm not sure how intuitive it would be for someone who really doesn't know math and who is entirely new to the concept of Bayes' theorem. It's easy to forget how confusing things (especially math-related things) can be once you have the benefit of hindsight.

I think something like a "show me an example" button that fills it with realistic data could help, with descriptive labels that connect the written description on the right with the different components in the visual representation. A clearer "start here" visual on the screen would help too. I know it seems obvious that "How to use" is where to start, but it didn't immediately draw my attention when I opened the page, and I think making it more prominent would help with accessibility.

If you really wanted to push accessibility, you could add a "wizard" that asks questions to help you fill out the data, and that helps clarify what questions those percentages are actually an answer to.

Really I think it's great as-is. I only think it could be improved a little considering goal 2.

Replies from: fxgn, adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-08T17:59:24.494Z · LW(p) · GW(p)

Thanks for the feedback, I'm really happy to hear that you already have uses for it!

You're right about needing examples; I'm thinking I'll add a tutorial that walks someone completely unfamiliar with Bayes' theorem through what it means and how it works, with lots of examples. That will take a while to design and write though.

I'm curious to know if other people felt the same way about the "How to use" part. I'm reluctant to make it more attention-grabbing, because I want it to feel unobtrusive. My current thinking is that the main interface will catch the user's attention first, and if that's not clear they'll look at the wall of text to the right.

Instead of a wizard, I was thinking of adding a feature that explains what a specific component means when the user is hovering over it. Does that seem like it would address the issue adequately? I don't like wizards because I feel like they get in the way, but maybe that's an unusual preference.

Replies from: programcrafter
comment by ProgramCrafter (programcrafter) · 2023-07-08T20:08:12.657Z · LW(p) · GW(p)

Bayes' theorem already has a tutorial! However, I think that examples more commonplace than the ones there would improve the page.

For example, "do I really have a certain illness" doesn't resonate as much as "does someone offend me intentionally", though the latter is a bit too emotional. I think that "is a certain letter a fraud" would make a good example; for instance, it involves several different pieces of evidence.

comment by romeostevensit · 2023-07-08T07:48:08.747Z · LW(p) · GW(p)

Nice! Relatedly, some EA made a Shapley Value calculator: http://shapleyvalue.com/

comment by Sabiola (bbleeker) · 2023-07-08T15:49:28.777Z · LW(p) · GW(p)

Nitpick: in the help text, "effect your beliefs" should be "affect your beliefs".

Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-09T07:28:20.407Z · LW(p) · GW(p)

Fixed now (but may require a cache refresh)!

comment by ProgramCrafter (programcrafter) · 2023-07-08T13:35:53.286Z · LW(p) · GW(p)

Could you add probability logarithms (credibility/evidence decibels), please?

Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-09T07:27:49.391Z · LW(p) · GW(p)

Added! I hope you like the design :)

comment by ryan_b · 2023-07-07T20:05:11.840Z · LW(p) · GW(p)

Strong upvote for Did The Thing!

comment by Mart_Korz (Korz) · 2023-07-07T19:43:08.293Z · LW(p) · GW(p)

After playing around for a few minutes, I like your app with >95% probability ;) Compare this bayescalc.io calculation.

comment by charcombination · 2023-07-28T10:43:32.257Z · LW(p) · GW(p)

I came across your site from a comment you made on the discussion about the UAP Disclosure Act [LW(p) · GW(p)]. Since my comment focuses mostly on the general usage of the tool and the application of Bayes, I'll post it here.

The design is very nice and the tool itself is very intuitive. It would be nice if every evidence element had a button to remove it; currently this is only possible for the last one.


For someone not too familiar with the practical application of Bayes, I'm wondering how to rate the probability for evidence when it is not known. In your example, you give "not aliens" a probability of 99.999999%, and you put the probability that politicians would take this seriously in such a world at 5%. This seems like a reasonable guess for the time before they took it seriously. Now they do, so it happened in a world where aliens (presumably) certainly don't exist. Could I not very well reason that it's therefore also 50% - 90% likely to happen? How do I choose this number - intuition, base rates?

Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-30T07:16:58.426Z · LW(p) · GW(p)

Thanks, I'm very glad you find it intuitive!

Only allowing the last piece of evidence to be deleted was a deliberate decision. The problem is that deleting evidence from the middle changes the meaning of the likelihood values (the sliders) for all of the evidence below it, so the correct values for those sliders may change too. If I allowed it to be deleted anyway, it would be very easy to mistakenly keep using the now-incorrect values (and it would give the impression that doing so was fine). I know this makes it more annoying and inconvenient, but that's because the math itself is annoying and inconvenient!
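
One way to see why deleting from the middle matters, sketched in standard probability notation. The symbols H (a hypothesis) and E_1, E_2, E_3 (pieces of evidence in the order listed) are introduced here for illustration and are not the app's own labels. Each slider is a likelihood conditioned on the evidence above it, so the joint likelihood factors by the chain rule:

```latex
P(E_1, E_2, E_3 \mid H) = P(E_1 \mid H)\, P(E_2 \mid H, E_1)\, P(E_3 \mid H, E_1, E_2)
```

If E_2 were removed, the third slider would need to hold P(E_3 | H, E_1) instead of P(E_3 | H, E_1, E_2). These are equal only when E_3 is independent of E_2 given H and E_1, which cannot be assumed in general.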

The meaning of the e.g. Hypothesis B slider for Evidence #3 is "In what percentage of worlds where Hypothesis B is true would I see Evidence #3?" (hopefully this was clear, just reiterating to make sure we're on the same page). This is called the likelihood of Evidence #3 given Hypothesis B. When answering this, we don't use the fact that we've seen this piece of evidence (in this case that politicians are taking this seriously), which is always just going to be true for actual evidence. Hopefully that makes sense?

As for choosing this number, or the prior values, it's in general a difficult problem that has been debated a lot. My recommendation is that you make up numbers that feel right (or at least are not obviously wrong), and then play around with the sliders a bit to see how much the exact value affects things. The intended use of the tool is not to make you commit to numbers, but to help you develop intuition on how much to update your beliefs given the evidence, as well as to help you figure out what numbers correspond to your intuitive feelings.

If you're serious about choosing the right number, then here is what it takes to figure it out: Each hypothesis represents a model of how some part of the world works. To properly get a number out of it, you need to develop the model in technical detail, to the point where you can represent it with an equation or a computer program. Then, you need to set the evidence above the one you're computing the likelihood for to true in your model. You then need to compute what percentage of the time this evidence turns out to be true in the model. A nice general way to do this is to run the model a whole bunch of times, and see how often it happens (and if reality has been kind enough to instantiate your model enough times, then you might be able to use this to get a "base rate"). Or if your model is relatively simple, you might be able to use math to compute the exact value. This is typically a lot of work, and doesn't actually help train your intuition about the intuitive mental models you actually use on a day-to-day basis much. But going through this process is helpful for understanding what the numbers you make up are trying to be. I hope this is helpful and not just more confusing.
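
The "run the model a whole bunch of times" approach can be sketched roughly as below. This is only an illustration under stated assumptions: the function `estimate_likelihood`, the toy weather model, and all of its probabilities are hypothetical and have nothing to do with the app's internals. Runs where the earlier evidence does not hold are discarded, which is how the evidence above gets "set to true".

```python
import random


def estimate_likelihood(simulate_world, earlier_evidence_holds, evidence_holds,
                        n_runs=100_000, seed=0):
    """Estimate P(evidence | hypothesis, earlier evidence) by repeated simulation.

    simulate_world(rng) draws one world in which the hypothesis is true.
    Runs where the earlier evidence fails are discarded (conditioning).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = hits = 0
    for _ in range(n_runs):
        world = simulate_world(rng)
        if not earlier_evidence_holds(world):
            continue  # this run is not a world where the earlier evidence is true
        kept += 1
        if evidence_holds(world):
            hits += 1
    if kept == 0:
        raise ValueError("the earlier evidence never occurred in the simulated worlds")
    return hits / kept


# Hypothetical toy model, invented purely for illustration.
def simulate_world(rng):
    rainy = rng.random() < 0.3
    wet_ground = rainy or rng.random() < 0.1      # sprinklers sometimes run
    umbrellas = rng.random() < (0.8 if rainy else 0.05)
    return {"wet_ground": wet_ground, "umbrellas": umbrellas}


# Likelihood of "umbrellas seen", given the earlier evidence "ground is wet".
estimate = estimate_likelihood(
    simulate_world,
    earlier_evidence_holds=lambda w: w["wet_ground"],
    evidence_holds=lambda w: w["umbrellas"],
)
print(f"Estimated likelihood: {estimate:.3f}")
```

For this toy model the exact answer can also be worked out by hand as 0.2435 / 0.37, about 0.658, compared with 0.275 if the wet ground is ignored. The gap between those two numbers is the reason the earlier evidence has to be held true.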

comment by Zane · 2023-07-11T17:45:33.562Z · LW(p) · GW(p)

This is cool! You might want to reposition the "How to use" message a little; it's currently covering up the button that lets you add more hypotheses, so it took me a while to find it.

Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-11T19:18:38.131Z · LW(p) · GW(p)

Thanks!

That was a deliberate decision designed to emphasize the core features of the app, but enough people have pointed this out now that I'm considering changing it.

comment by Odd anon · 2023-07-11T08:36:09.769Z · LW(p) · GW(p)

Suggestions:

  • Allow for more than two hypotheses.
  • Maybe make the sliders "snap" to integer values, so that it looks cleaner.
  • Working with evidence percents "given all the evidence above" is sometimes hard to do. It may be useful to allow evidence-combination blocks just to allow filling things in as groups, even if only one of the numbers actually goes into the result, just so that the user can see that it all adds up to 100% and none of the dependent odds seem unreasonable.
  • Tooltips giving explanations of the terms "Prior" and "Posterior" could be good.
  • Some mouse-hover effect for the sliders' areas might help.
Replies from: programcrafter, adele-lopez-1
comment by ProgramCrafter (programcrafter) · 2023-07-11T12:47:36.618Z · LW(p) · GW(p)

More than two hypotheses are already supported, you just need to close the "How to use" box.

comment by Adele Lopez (adele-lopez-1) · 2023-07-11T16:04:35.366Z · LW(p) · GW(p)

Thanks for the suggestions!

As ProgramCrafter mentioned, more (up to five) hypotheses are already supported. It's limited to 5 because finding good colors is hard, and 5 seemed like enough - but if you find yourself needing more I'd be interested to know.

The sliders already snap to tenth values (but you can enter more precise values in the textbox), and I think snapping to integers would sacrifice too much precision. It's plausible that fifths could be better though, I'll have to test that. I do want to introduce a way to allow for more precise control while dragging the sliders, which might address this concern to some extent by making it easy to stop at an integer value exactly if desired. But I haven't thought of a good interface for doing that yet.

That sounds cool, but I'm not sure how to make a good interface for that that wouldn't look too cluttered. I'm also worried people would misuse it for convenience. But I'll keep thinking about it!

Tooltips to explain things would be cool and I have a similar thing planned already.

That's a good idea, thanks!

comment by Sting · 2023-07-10T03:29:34.039Z · LW(p) · GW(p)

Things I like:

  1. The dark color theme looks good
  2. It's nice to be able to set the hypotheses as a non-percentage, such as 10:1, and then click "%" to convert to a percentage.
  3. Being able to see the decibels for each piece of evidence is nice. So is being able to link or export a calculation. 

Possible improvements:

  1. Adding 10 decibels of evidence results in a different outcome depending on whether the decibels are added one-at-a-time or all-at-once. Compare [case 1](https://bayescalc.io/#KCdoLWVzKic3QSd-N0InMnAzMTQsMS4wMnBvc3RlMzU0LDU0MmU2KidFNiAxJzJsaWtlbGlob29kcypbNDEsMC4xXV0pKiFbLXlwb3RoZXMyXX4zcmlvcl9vZGRzKjQwLjA2dmlkZW5jZTdILWlzIAE3NjQzMi0qXw==) and [case 2](https://bayescalc.io/#KCdoRmVzRCdLQSd-S0InSXBKMTAuMCwxLjBJcG9zdGVKNTEuNzc3OTg5NTA1NDA5MTMsNDguMjIyMDEwNDk0NTkwODc2SWVHRCdMMS0yLTMtNC01LTYtNy04LTktMTAnSWxpa2VsaWhvb2RzRE1NTU1DXSlDLC0nfkxDKlswLjA4LDAuMV1EIVtGeXBvdGhlc0d2aWRlbmNlSV1-SnJpb3Jfb2Rkc0RLSEZpcyBMRUcgTSoqAU1MS0pJR0ZEQy0qXw==)
  2. When the "help" is open, the "add new hypothesis" button and decibels are hidden. 
  3. A button to toggle between showing decibels and bits of evidence would be nice. I more naturally think in bits. 
  4. Enable equations in the evidence percentage fields. It's nicer to type 1/3 rather than 33.3333333333.
  5. Allow deleting any piece of evidence, not just the last piece.
Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-10T18:26:49.036Z · LW(p) · GW(p)

Thank you! I'm glad you like those features, and I'm also glad to hear that the way the percent button feature worked was clear to you.

Regarding the possible improvements:

  1. That's not a bug, it's just a limitation of the choice to show only one digit after the decimal. The number of decibels in case 2 for each piece of evidence is 0.96910013..., whereas in case 1 it's exactly 10.

  2. That's a deliberate nudge to suggest that the new hypothesis and decibel features are more advanced and not part of the essential core of the app.

  3. That's a good idea, I'll probably do that at some point.

  4. That's also a good idea but seems fairly complicated to implement, so it will have to wait until I've finished planned improvements with a higher expected ROI.

  5. That's deliberate, because deleting evidence changes the meaning of the likelihoods for all subsequent evidence. Thus, having to delete all the evidence following the evidence you want to delete is a more honest way to convey what needs to be done, and prevents the user from shooting themselves in the foot by assuming that the subsequent likelihoods are independent. I'll explain this in the more fleshed out version of the help panel I have planned.
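
The arithmetic behind point 1 can be checked with a short sketch. The inputs are assumptions chosen to reproduce the figures quoted in this thread: prior odds of 10:1, likelihoods of 1% versus 10% once for case 1, and 8% versus 10% ten times for case 2. The helper names are hypothetical, and the app's own rounding and sign conventions are not documented here.

```python
import math


def decibels(likelihood_ratio):
    """Strength of evidence in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)


def posterior_of_first(prior_odds, likelihood_pairs):
    """Posterior probability of the first of two hypotheses.

    prior_odds is a pair such as (10, 1); each likelihood pair is
    (P(evidence | first hypothesis), P(evidence | second hypothesis)).
    """
    odds_a, odds_b = prior_odds
    for like_a, like_b in likelihood_pairs:
        odds_a *= like_a
        odds_b *= like_b
    return odds_a / (odds_a + odds_b)


prior = (10, 1)
case_1 = [(0.01, 0.1)]        # one piece of evidence, ratio 10
case_2 = [(0.08, 0.1)] * 10   # ten pieces of evidence, ratio 1.25 each

db_case_1 = decibels(0.1 / 0.01)        # 10 dB
db_each_case_2 = decibels(0.1 / 0.08)   # about 0.969 dB, displayed as 1.0
total_db_case_2 = 10 * db_each_case_2   # about 9.69 dB, not 10

posterior_case_1 = posterior_of_first(prior, case_1)  # 50%
posterior_case_2 = posterior_of_first(prior, case_2)  # about 51.78%

print(f"case 1: {db_case_1:.2f} dB, posterior {posterior_case_1:.4%}")
print(f"case 2: {total_db_case_2:.2f} dB total, posterior {posterior_case_2:.4%}")
```

Ten pieces of evidence that each display as 1.0 dB therefore sum to roughly 9.7 dB, which is why the two posteriors differ.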

Replies from: Sting
comment by Sting · 2023-07-10T23:05:48.559Z · LW(p) · GW(p)

I see! Thank you for the detailed explanations. 

Regarding point 1: The posterior percentages are shown to 5 decimal places, so I wrongly assumed that 1.0 dB meant exactly 1. 

What do you think of showing the sum of the decibels of all pieces of evidence? That would have prevented my confusion. 

You could also include 2 digits after the decimal for quantities smaller than 1.1. (Although this has the cost of introducing clutter.)

Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-11T16:11:38.487Z · LW(p) · GW(p)

I like the idea of showing the total decibels, I'll probably add that in soon!

comment by mukashi (adrian-arellano-davin) · 2023-07-08T00:40:14.161Z · LW(p) · GW(p)

Great! Can you make it so that, if I input P for Hypothesis A, 1 - P appears automatically for Hypothesis B?

Replies from: adele-lopez-1
comment by Adele Lopez (adele-lopez-1) · 2023-07-08T00:50:37.517Z · LW(p) · GW(p)

Hmm, you could use the slider to set the prior P for hypothesis A and it will set the prior for hypothesis B to 1 - P; does that not work for you for some reason?

The problem with having that behavior when you type in the number is that I want people to be able to enter the priors as odds, so I don't want to presume that the other numbers will change to allow for that.
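
To illustrate the distinction between odds and percentages, here is a minimal sketch of converting one to the other. The function name `odds_to_percentages` is hypothetical and this is not the app's actual code; it only shows why typed values cannot be assumed to sum to 100 until they are normalized.

```python
def odds_to_percentages(odds):
    """Normalize non-negative odds (e.g. [10, 1]) into percentages summing to 100."""
    if any(o < 0 for o in odds):
        raise ValueError("odds must be non-negative")
    total = sum(odds)
    if total == 0:
        raise ValueError("at least one of the odds must be positive")
    return [100 * o / total for o in odds]


# Prior odds of 10:1 correspond to roughly 90.9% and 9.1%.
percentages = odds_to_percentages([10, 1])
print([round(p, 2) for p in percentages])
```

With odds entry, typing 10 for one hypothesis says nothing about the other until the user finishes entering the ratio, so automatically filling in 1 - P would overwrite values the user may be about to type.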

comment by fx (fxgn) · 2023-07-07T21:31:38.718Z · LW(p) · GW(p)

Thanks, that's really nice. I'll definitely use it, if not for real decisions, then at least for Metaculus predictions

comment by AlphaAndOmega · 2023-07-07T18:26:11.171Z · LW(p) · GW(p)

Nice.

I admit it's a moderately shameful fact about my cognition that I consistently forget the equation for Bayes' theorem even when I constantly trumpet that other doctors should be more consistent and explicit in using it.

I can sorta figure it out when needed, but this eases a small but real pain point.

Replies from: fxgn
comment by fx (fxgn) · 2023-07-07T21:17:22.107Z · LW(p) · GW(p)

Are you using a spaced repetition system like Anki? I find it to be great for learning theorems and formulas, you should try using that if you aren't already. It's literally like a memory hack, you can just take any information and embed it into your memory (assuming you find the time to go through your due cards every day)

Replies from: AlphaAndOmega
comment by AlphaAndOmega · 2023-07-08T00:43:18.680Z · LW(p) · GW(p)

I have ADHD, and found creating my own decks to be a chore. The freely available ones related to medicine are usually oriented towards people taking the USMLE, and I'm not the target demographic.

I do still use the principles of spaced repetition in how I review my own notes, especially before exams, because of how obviously effective it is.

I hadn't considered making them for memorizing formulae, but truth be told I could just save them to my phone, which I always have on me.

If I need to refer to Bayes' theorem during a surgery, something has clearly gone wrong haha.

I did say it was only a minor issue! Thank you for the advice nonetheless, it's good advice after all.