Expect to know better when you know more

post by Stuart_Armstrong · 2016-04-21T15:47:20.679Z · LW · GW · Legacy · 6 comments


A seemingly trivial result that I haven't seen posted anywhere in this form, as far as I could find. It simply shows that, in expectation, evidence increases the posterior probability of the true hypothesis.

Let H be the true hypothesis/model/environment/distribution, and ~H its negation. Let e be evidence we receive, taking values e1, e2, ..., en. Let pi = P(e=ei|H) and qi = P(e=ei|~H).

Since e is generated under H, the expected posterior weighting of H, E(P(e|H)), is Σpipi, while the expected posterior weighting of ~H, E(P(e|~H)), is Σqipi. Then, since the pi and qi both sum to 1, Cauchy–Schwarz implies that

E(P(e|H)) = Σpipi ≥ Σqipi = E(P(e|~H))

Thus, in expectation, the probability of the evidence given the true hypothesis, is higher than or equal to the probability of the evidence given its negation.
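A quick numeric illustration of the two expected weightings above, for one particular pair of distributions (the values of p and q here are illustrative assumptions, not taken from the post):

```python
# Illustrative check of E(P(e|H)) vs E(P(e|~H)) for one example pair
# of distributions. Expectations are taken with e drawn from the true
# hypothesis H, i.e. each term is weighted by p_i.

p = [0.7, 0.2, 0.1]   # p_i = P(e = e_i | H), the true distribution
q = [0.2, 0.3, 0.5]   # q_i = P(e = e_i | ~H)

expected_weight_H = sum(pi * pi for pi in p)                  # Σ p_i^2
expected_weight_notH = sum(qi * pi for pi, qi in zip(p, q))   # Σ q_i p_i

print(expected_weight_H > expected_weight_notH)  # True
```

Here Σpipi = 0.54 and Σqipi = 0.25, so the true hypothesis's expected weighting is higher for these distributions.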

This, however, doesn't mean that the Bayes factor - P(e|H)/P(e|~H) - must have expectation greater than one, since the ratio of expectations is not the same as the expectation of the ratio. The Bayes factor given e=ei is pi/qi, so the expected Bayes factor is Σ(pi/qi)pi. The negative logarithm is a convex function; hence, by Jensen's inequality, -log[E(P(e|H)/P(e|~H))] ≤ -E[log(P(e|H)/P(e|~H))]. The expectation E[log(P(e|H)/P(e|~H))] is Σ(log(pi/qi))pi, which is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence is non-negative. Thus log[E(P(e|H)/P(e|~H))] ≥ 0, and hence

E(P(e|H)/P(e|~H)) ≥ 1

Thus, in expectation, the Bayes factor, for the true hypothesis versus its negation, is greater than or equal to one.

Note that this is not true for the inverse. Indeed E(P(e|~H)/P(e|H)) = Σ(qi/pi)pi = Σqi = 1.
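These three Bayes-factor facts can be checked numerically. A minimal sketch, with one illustrative pair of distributions (p and q below are assumptions, not from the post); expectations are over e drawn from the true hypothesis H:

```python
import math

# p_i = P(e_i | H) (true distribution), q_i = P(e_i | ~H)
p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]

# Expected Bayes factor: E(P(e|H)/P(e|~H)) = Σ (p_i/q_i) p_i
e_bf = sum((pi / qi) * pi for pi, qi in zip(p, q))

# Expected log Bayes factor: Σ (log(p_i/q_i)) p_i, the KL divergence
# of q from p, which is non-negative
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Inverse Bayes factor: E(P(e|~H)/P(e|H)) = Σ (q_i/p_i) p_i = Σ q_i = 1
e_inv_bf = sum((qi / pi) * pi for pi, qi in zip(p, q))

print(e_bf >= 1.0)                   # True
print(kl >= 0.0)                     # True
print(abs(e_inv_bf - 1.0) < 1e-12)   # True
```

For these distributions the expected Bayes factor is about 2.6 and the KL divergence about 0.64, while the inverse ratio's expectation is exactly 1 (up to floating-point error), as derived above.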

In the preceding proofs, ~H played no specific role - only the fact that the qi form a probability distribution was used - and hence, for any other hypothesis H' with qi = P(e=ei|H'):

E(P(e|H)) ≥ E(P(e|H')) and E(P(e|H)/P(e|H')) ≥ 1

Thus, in expectation, the probability of the evidence given the true hypothesis is greater than or equal to that given any other hypothesis, both as a difference and as a ratio.

Now we can turn to the posterior probability P(H|e). For e=ei, this is P(H)·P(e=ei|H)/P(e=ei). We can bound the expectation of P(e|H)/P(e) as above, using the non-negative Kullback–Leibler divergence of P(e) from P(e|H), showing that it is greater than or equal to 1. Hence:

E(P(H|e)) = P(H)·E(P(e|H)/P(e)) ≥ P(H)

Thus, in expectation, the posterior probability of the true hypothesis is greater than or equal to its prior probability.
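This final result can also be checked numerically. A minimal sketch, with an illustrative prior and likelihoods (all numbers are assumptions chosen for the example, not from the post); the expectation is over e drawn from the true hypothesis H:

```python
prior_H = 0.3           # P(H), an illustrative prior
p = [0.7, 0.2, 0.1]     # p_i = P(e_i | H)
q = [0.2, 0.3, 0.5]     # q_i = P(e_i | ~H)

# Marginal probability of each outcome: P(e_i) = P(H) p_i + P(~H) q_i
r = [prior_H * pi + (1 - prior_H) * qi for pi, qi in zip(p, q)]

# Posterior for each outcome: P(H | e_i) = P(H) p_i / P(e_i),
# then averaged with e drawn from the true distribution p
posteriors = [prior_H * pi / ri for pi, ri in zip(p, r)]
expected_posterior = sum(post * pi for post, pi in zip(posteriors, p))

print(expected_posterior >= prior_H)  # True
```

For these numbers the expected posterior is about 0.47 against a prior of 0.3, matching the inequality E(P(H|e)) ≥ P(H).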

6 comments


comment by Lumifer · 2016-04-21T17:27:03.701Z · LW(p) · GW(p)

... is higher than the probability

...is equal to or higher than the probability

is greater than one

is equal to or greater than one

Thus, in expectation, the posterior probability of the true hypothesis is greater than its prior probability.

Thus, in expectation, the posterior probability of the true hypothesis is equal to or greater than its prior probability.

That matters.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2016-04-21T17:55:37.599Z · LW(p) · GW(p)

I tend to go for greater being ≥ and strictly greater being >.

Replies from: Lumifer
comment by Lumifer · 2016-04-21T18:22:27.897Z · LW(p) · GW(p)

That's not how English works and that's not how people will understand your words.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2016-04-21T19:00:21.371Z · LW(p) · GW(p)

Thinking about it, you are correct; I will put off my efforts to reform mathematical terminology to another time.

comment by gwern · 2016-04-21T21:27:31.835Z · LW(p) · GW(p)

A seemingly trivial result, that I haven't seen posted anywhere in this form, that I could find. It simply shows that we expect evidence to increase the posterior probability of the true hypothesis.

Is it the proof or result which is supposed to be new? I would be really surprised if there were no proofs that Bayesian estimators are consistent and concentrate on the true posterior result.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2016-04-22T05:28:57.434Z · LW(p) · GW(p)

The proof is so trivial that it must have been proved before, but I spent two hours searching for the result and couldn't find it (it's very plausible I just lack the correct search terms). The closest I could find were things like the Bernstein–von Mises theorem, but that's not exactly it.