Some thoughts about an estimator by Taleb

post by paladim · 2022-01-04T05:37:07.658Z · LW · GW · 1 comment

I recently read Maximum ignorance probability, with applications to surgery's error rates by N.N. Taleb, where he proposes a new estimator for the parameter $p$ of a Bernoulli random variable. In this post, I review its main points and share my own thoughts about it.

The estimator in question (which I will call the maximum ignorance estimator) takes the following form:

$$\hat{p} = 1 - I^{-1}_{1/2}(n - m,\ m + 1)$$

where $I$ is the regularized incomplete beta function, $n$ is the number of independent trials and $m$ is the number of successes.

This estimator is derived by solving the following equation for $p$:

$$F(m;\, n,\, p) = q$$

where $F$ is the cumulative distribution function of a binomial with $n$ trials and probability $p$ of success. In words, this estimator sets $p$ to the value such that the probability of observing $m$ successes or fewer is exactly $q$. Using the standard identity $F(m;\, n,\, p) = I_{1-p}(n - m,\ m + 1)$ and solving for $p$ yields the closed form above. How do we pick $q$? The author sets $q$ to 0.5, as it maximizes the entropy (more on this later).

Finally, the estimator is applied to a real-world problem. A surgeon works in an area with a mortality rate of 5% and has performed 60 procedures with no fatalities. What can we say about his error probability? Applying the estimator described earlier with $n = 60$ and $m = 0$ (here the "successes" being counted are fatalities):

$$\hat{p} = 1 - I^{-1}_{1/2}(60,\ 1) = 1 - \left(\tfrac{1}{2}\right)^{1/60} \approx 0.0115$$
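As a sanity check, here is a minimal sketch in Python using SciPy's `betaincinv` (the function name `max_ignorance_estimate` is mine, not Taleb's) that reproduces the number above:

```python
# A minimal sketch of the maximum ignorance estimator using SciPy.
from scipy.special import betaincinv
from scipy.stats import binom

def max_ignorance_estimate(n: int, m: int, q: float = 0.5) -> float:
    """Return p such that P(Binomial(n, p) <= m) = q.

    Uses the identity F(m; n, p) = I_{1-p}(n - m, m + 1); requires m < n.
    """
    return 1.0 - betaincinv(n - m, m + 1, q)

# The surgeon example: n = 60 procedures, m = 0 fatalities.
p_hat = max_ignorance_estimate(60, 0)
print(p_hat)                     # ~0.0115, i.e. roughly a 1.15% error rate
print(binom.cdf(0, 60, p_hat))   # ~0.5, the defining condition of the estimator
```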

Taleb argues that the empirical approach ($\hat{p} = m/n$) does not provide a lot of information because the sample is small, i.e. the estimate is $\hat{p} = 0/60 = 0$; however, we "know" that this value is not 0, it is just that we have not observed enough samples to see a failure.

On the other hand, a Bayesian would pick a Beta prior for $p$. A Beta distribution has two parameters and we have only one constraint (the 5% mortality rate in the area), which leaves one degree of freedom. The choice of this remaining degree of freedom is arbitrary, and it is shown that it has a significant impact on the final estimate obtained (see the sketch below).
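To illustrate that sensitivity, here is a small sketch of my own (not from the paper): every prior below is a Beta with mean 0.05, but the concentration $s = a + b$, the leftover degree of freedom, is varied, and the posterior estimate moves considerably:

```python
# Beta-Binomial sensitivity: priors with the same mean 0.05 but different
# concentrations s = a + b lead to very different posterior estimates.
from scipy.stats import beta

n, m = 60, 0        # 60 procedures, 0 fatalities
prior_mean = 0.05   # the area-wide mortality rate, used as the prior mean

for s in (1, 10, 50, 200):                # assumed concentrations, for illustration
    a, b = prior_mean * s, (1 - prior_mean) * s
    posterior = beta(a + m, b + n - m)    # conjugate Beta-Binomial update
    print(f"s={s:3d}: posterior mean={posterior.mean():.4f}, "
          f"median={posterior.median():.4f}")
```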

Having gone through the main points of the article, here follow my own thoughts:

1 comment

Comments sorted by top scores.

comment by Carlos Javier Gil Bellosta (carlos-javier-gil-bellosta) · 2022-01-10T03:24:14.158Z · LW(p) · GW(p)

There are many things to say about this result by N. Taleb. To start with, a minor detail: I'd have written $\hat{p} = I^{-1}_{1/2}(m+1, n - m)$, which is much more coherent with the fact that he is inverting the CDF.

He is inverting the CDF of a Beta distribution with parameters (m+1, n-m), which is the posterior in the Beta-Binomial model under a Beta(1, 0) prior (!!!), with no explanation at all! It would have made slightly more sense to use a Beta(1, 1) instead.

Note that all he does by selecting q = 1/2 is choose as his "optimal estimate" the median of the Beta(m+1, n-m) distribution, i.e., the median of the posterior distribution.
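This is easy to check numerically; a quick sketch of mine, comparing the two quantities for the surgeon's numbers:

```python
# The maximum ignorance estimate coincides with the median of Beta(m+1, n-m).
from scipy.special import betaincinv
from scipy.stats import beta

n, m = 60, 0
p_hat = 1.0 - betaincinv(n - m, m + 1, 0.5)  # Taleb's estimate
print(p_hat, beta(m + 1, n - m).median())    # both ~0.01149
```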

Note also that he completely ignores the base rate of 5%. Can he not make use of it at all? So, even better than a Beta(1, 1), I'd have chosen the maximum entropy distribution among those betas with mean .05, i.e., one with a large variance; in fact, Taleb complains that the Bayesian approach provides funny results with highly informative beta priors.
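For what it's worth, here is a sketch (my construction, following the comment's suggestion) of how one might find that maximum entropy beta numerically: the mean constraint $a/(a+b) = 0.05$ forces $b = 19a$, leaving a one-dimensional search over $a$:

```python
# Find the Beta(a, b) with mean 0.05 (so b = 19 a) that maximizes
# differential entropy: a weakly informative prior with the right mean.
from scipy.optimize import minimize_scalar
from scipy.stats import beta

def neg_entropy(a: float) -> float:
    return -beta(a, 19.0 * a).entropy()  # entropy of the mean-0.05 beta

res = minimize_scalar(neg_entropy, bounds=(1e-3, 10.0), method="bounded")
a_opt = res.x
prior = beta(a_opt, 19.0 * a_opt)
print(a_opt, 19.0 * a_opt)  # parameters of the max-entropy beta with mean 0.05
print(prior.std())          # a spread comparable to the mean itself
```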

If I had been facing this problem, I would have inquired about the distribution of the historical records whose aggregate is the 5% average, and used it as a prior to model this new doctor.

All in all, I do not think Taleb wrote his best page that day. But he has many other great ones to learn from!