How to estimate confidence intervals for a Fermi estimate?
post by Sønderjye · 2022-02-20T01:26:12.625Z · 1 comment
This is a question post.
Answers: 5 D0TheMath · 3 TLW · 3 GuySrinivasan · 1 comment
Suppose you want to guess the number of candyfloss sold over a month in some area, and you would like a 90% confidence interval (CI) instead of a point estimate. Your Fermi estimates for two central subcomponents are:
(1) a 90% CI of the number of candyfloss a single candyfloss seller sells per month, say 10k-40k.
(2) a 90% CI of the number of candyfloss that a professional candyfloss seller sells during a month, say 50-8000.
Can you estimate a 90% CI of candyfloss sold over a month based on that information? If not, could you if you made some assumptions about the distributions of (1) or (2) (e.g., could you do it if they were uniformly distributed)? Could you use the percentiles at the square or the square root of the extreme percentile levels (e.g., combining either the 0.25th (5%²) or the roughly 22nd (√5% ≈ 22.4%) percentiles)?
My intuition is that you can't just multiply the extremes of (1) and (2), but I'm not confident about what you need in order to make even approximately correct claims.
Edited to fix an error in (1)
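To make the two ideas in the question concrete, here is a minimal Python sketch under assumptions that are not part of the question: (1) and (2) are independent, each is log-normal, and each stated range is its 5th to 95th percentile. The upper end of the √5% rule uses the 1 - √5% ≈ 77.6th percentile, a symmetric extension of the question's idea. The helper names `fit_lognormal` and `quantile` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()          # standard normal distribution
Z95 = STD.inv_cdf(0.95)     # ≈ 1.645, z-score of the 95th percentile


def fit_lognormal(lo, hi):
    """(mu, sigma) of a log-normal whose 5th and 95th percentiles are lo and hi (assumption)."""
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * Z95)
    return mu, sigma


def quantile(mu, sigma, p):
    """p-th quantile of a log-normal with log-mean mu and log-sd sigma."""
    return math.exp(mu + sigma * STD.inv_cdf(p))


ci1 = (10_000, 40_000)  # estimate (1)
ci2 = (50, 8_000)       # estimate (2)
(mu1, s1), (mu2, s2) = fit_lognormal(*ci1), fit_lognormal(*ci2)

# A: multiply the extremes of the two 90% CIs.
naive = (ci1[0] * ci2[0], ci1[1] * ci2[1])

# B: multiply the sqrt(5%) ≈ 22.4th percentiles (and, by symmetry,
#    the 1 - sqrt(5%) ≈ 77.6th percentiles for the upper end).
p = math.sqrt(0.05)
sqrt_rule = (quantile(mu1, s1, p) * quantile(mu2, s2, p),
             quantile(mu1, s1, 1 - p) * quantile(mu2, s2, 1 - p))

# Reference: for independent log-normals, log(X*Y) is normal with mean
# mu1 + mu2 and sd sqrt(s1^2 + s2^2), so these percentiles are exact.
mu, s = mu1 + mu2, math.hypot(s1, s2)
exact = (quantile(mu, s, 0.05), quantile(mu, s, 0.95))

for name, (lo, hi) in [("multiply extremes", naive),
                       ("sqrt(5%) rule", sqrt_rule),
                       ("exact (log-normal)", exact)]:
    print(f"{name:>20}: {lo:>14,.0f} to {hi:>14,.0f}")
```

Under these assumptions, multiplying the extremes gives an interval that is too wide (it is only exact when the two quantities are perfectly correlated), while the √5% rule gives one that is too narrow: requiring both factors to be low is not the only way for the product to land in its lower tail.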
Answers
You could use a Monte Carlo simulation. That is, if you have a distribution over the possible values of each parameter in your model, you can randomly sample from that distribution for each parameter many, many times, apply the model to each set of sampled values, and aggregate the results.
This is done automatically for you in the program Guesstimate.
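A minimal sketch of that procedure in Python, using only the standard library. It is not Guesstimate's implementation; the function names are hypothetical, and the log-normal samplers (each stated 90% CI treated as the 5th to 95th percentile, the two estimates independent) are an assumption chosen for illustration.

```python
import math
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # z-score of the 95th percentile


def monte_carlo_ci(samplers, model, n=100_000, level=0.90, seed=0):
    """Central `level` interval of model(*draws), estimated by simulation.

    samplers: functions taking a random.Random and returning one draw of a parameter.
    model:    function mapping one draw of every parameter to a single result.
    """
    rng = random.Random(seed)
    results = sorted(model(*[draw(rng) for draw in samplers]) for _ in range(n))
    tail = (1 - level) / 2
    return results[round(tail * (n - 1))], results[round((1 - tail) * (n - 1))]


def lognormal_from_ci(lo, hi):
    """Sampler for a log-normal whose 5th/95th percentiles are lo/hi (an assumption)."""
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * Z95)
    return lambda rng: rng.lognormvariate(mu, sigma)


lo, hi = monte_carlo_ci(
    samplers=[lognormal_from_ci(10_000, 40_000), lognormal_from_ci(50, 8_000)],
    model=lambda x, y: x * y,  # illustrative model: a plain product of the two estimates
)
print(f"simulated 90% interval: {lo:,.0f} to {hi:,.0f}")
```

Any sampler can be swapped in, and the model can be an arbitrary function of the parameters rather than a plain product; with log-normal inputs the simulated interval should land close to the closed-form answer for a product of log-normals.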
> Can you estimate a 90% CI of candyfloss sold over a month based on that information?
Not without additional assumptions, some of which are "obviously" incorrect. In particular:
- If the estimates are correlated, combining estimates does not narrow the CI as much as it would if they were independent.
- (In the worst case, combining estimates may not improve the CI at all!)
- Consider, for instance, the case where your estimate of the number of candyfloss purchases over the month was based on a sample that included the same shop as your other estimate.
- You need to make assumptions about (or have information about) the distribution, not just the extremes.
- A common (and bad) assumption is that everything is a Normal distribution (or some simple transformation of a normal distribution, like log-normal).
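The effect of correlation can be quantified under one extra assumption: if log (1) and log (2) are jointly normal with correlation ρ, then the log of the product is normal with variance σ₁² + σ₂² + 2ρσ₁σ₂. A minimal sketch (the function `product_ci` is hypothetical; each input CI is treated as a 5th to 95th percentile range):

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)  # z-score of the 95th percentile


def product_ci(ci1, ci2, rho):
    """90% CI of X*Y when log X and log Y are jointly normal with correlation rho,
    treating each input CI as the 5th to 95th percentile range (an assumption)."""
    (a1, b1), (a2, b2) = ci1, ci2
    mu = (math.log(a1) + math.log(b1) + math.log(a2) + math.log(b2)) / 2
    s1 = (math.log(b1) - math.log(a1)) / (2 * Z95)
    s2 = (math.log(b2) - math.log(a2)) / (2 * Z95)
    # Variance of log X + log Y includes the covariance term 2*rho*s1*s2.
    sigma = math.sqrt(s1**2 + s2**2 + 2 * rho * s1 * s2)
    return math.exp(mu - Z95 * sigma), math.exp(mu + Z95 * sigma)


for rho in (0.0, 0.5, 1.0):
    lo, hi = product_ci((10_000, 40_000), (50, 8_000), rho)
    print(f"rho={rho}: {lo:,.0f} to {hi:,.0f}")
```

The interval widens as ρ grows, and at ρ = 1 it coincides with simply multiplying the extremes, the worst case in which combining the estimates gains nothing.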
Unfortunately, in most cases the distribution of the product of two random variables is a mess. (If you want a pointer, look here.)
One notable set of exceptions is log-transformations of various distributions. (This is because in log space multiplication turns into addition, and the density of a sum of independent variables is the convolution of their densities, which is often a whole lot easier to calculate.)
For instance, the product of two independent log-normal variables is "easy": it is itself log-normal. (Of course, then you need to work with the distributions, not just the CIs.) Beware correlations, however.
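For the log-normal case, the bookkeeping fits in a few lines. Assuming each stated 90% CI $[L_i, U_i]$ is the 5th to 95th percentile range of a log-normal variable, that the two variables are independent, and writing $z \approx 1.645$ for the standard normal 95th percentile:

```latex
\mu_i = \tfrac{1}{2}\left(\ln L_i + \ln U_i\right), \qquad
\sigma_i = \frac{\ln U_i - \ln L_i}{2z}

\ln(XY) = \ln X + \ln Y \sim \mathcal{N}\!\left(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2\right)

\text{90\% CI of } XY = \left[\, e^{\mu_1 + \mu_2 - z\sqrt{\sigma_1^2 + \sigma_2^2}},\;
                                 e^{\mu_1 + \mu_2 + z\sqrt{\sigma_1^2 + \sigma_2^2}} \,\right]
```

Since $\sqrt{\sigma_1^2 + \sigma_2^2} \le \sigma_1 + \sigma_2$, this interval is never wider than the one obtained by multiplying the extremes, which corresponds to using $\sigma_1 + \sigma_2$ in place of $\sqrt{\sigma_1^2 + \sigma_2^2}$.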
I'm confused about what concrete question about candyfloss the example is trying to answer. But my usual heuristic for combining estimates is that, in the absence of more information (or, more realistically, more desire to investigate), I will assume a uniform distribution over some natural scale. For example, 10k-40k is naturally measured in orders of magnitude, so pretend it's uniform over log(10k) to log(40k). 50-8000 is also naturally measured in orders of magnitude.
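That heuristic can be sketched as a short simulation. Following the answer, each range is treated as the full support of a log-uniform distribution; independence of the two estimates, the 5th/95th percentile readout, and the function names are assumptions of this sketch.

```python
import math
import random


def log_uniform(lo, hi):
    """Sampler that is uniform in log space on [lo, hi]."""
    a, b = math.log(lo), math.log(hi)
    return lambda rng: math.exp(rng.uniform(a, b))


def product_interval(range1, range2, n=100_000, seed=0):
    """Simulated 5th and 95th percentiles of X*Y for independent log-uniform X and Y."""
    rng = random.Random(seed)
    x, y = log_uniform(*range1), log_uniform(*range2)
    products = sorted(x(rng) * y(rng) for _ in range(n))
    return products[round(0.05 * (n - 1))], products[round(0.95 * (n - 1))]


lo, hi = product_interval((10_000, 40_000), (50, 8_000))
print(f"{lo:,.0f} to {hi:,.0f}")
```

In log space this is a sum of two uniforms, so the interval can also be worked out by hand; the simulation just saves the algebra.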
comment by TLW · 2022-02-19T18:21:52.680Z
Unfortunately, the product of two log-uniform distributions is not a log-uniform distribution...
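This can be checked numerically. The log of the product is the sum of two independent uniforms, whose density is the convolution of two boxes: a trapezoid rather than a flat line. A minimal sketch (function name hypothetical; the ranges are treated as log-uniform supports, as in the answer above) that bins simulated values of log(X·Y) into equal-width bins over its support:

```python
import math
import random


def log_product_histogram(range1, range2, bins=6, n=120_000, seed=1):
    """Fraction of simulated log(X*Y) values in equal-width bins, where X and Y
    are independent and log-uniform over the given ranges."""
    rng = random.Random(seed)
    (a1, b1), (a2, b2) = [(math.log(lo), math.log(hi)) for lo, hi in (range1, range2)]
    start, end = a1 + a2, b1 + b2  # support of log(X*Y)
    counts = [0] * bins
    for _ in range(n):
        s = rng.uniform(a1, b1) + rng.uniform(a2, b2)  # log(X) + log(Y)
        counts[min(int((s - start) / (end - start) * bins), bins - 1)] += 1
    return [c / n for c in counts]


hist = log_product_histogram((10_000, 40_000), (50, 8_000))
print(" ".join(f"{f:.3f}" for f in hist))
```

A log-uniform product would put 1/6 ≈ 0.167 in every bin; for these ranges the trapezoid puts roughly 0.08 in each edge bin and about 0.21 in the middle bins.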
reply by SarahNibs (GuySrinivasan) · 2022-02-19T18:30:17.762Z
I assumed the question was "I have two endpoints of several intervals but that's not enough to combine intervals". My answer is "assume uniform over some natural scale". If the actual question is "I don't know how to combine distributions, help", then I think my answer is "if you don't already know the answer, you should probably simulate, and if you can't do that, I guess use Guesstimate, because anything else will take too much scaffolding to reasonably learn".
reply by TLW · 2022-02-19T22:28:33.532Z
> I assumed the question was "I have two endpoints of several intervals but that's not enough to combine intervals". My answer is "assume uniform over some natural scale"
My point is precisely that "[...] for combining estimates [...] assume a uniform distribution over some natural scale" doesn't accomplish your stated goal of being able to combine estimates.