Both the original post and subsequent comments seem to have some misconceptions regarding the $A_p$ distribution as introduced by Jaynes.
(i) @criticalpoints claims that the "way to think about the proposition $A_p$ is as a kind of limit ... The proposition $A_p$ can be thought of as a shorthand for an infinite collection of evidences." But as also noted, Jaynes wants $A_p$ to say that "regardless of anything else you may have been told, the probability of $A$ is $p$." In general, this need not involve any limit, or an infinite amount of evidence. For any proposition $A$ and probability value $p$, the proposition $A_p$ is something of a formal maneuver, which merely asserts whatever would be minimally required for a rational agent possessing said information to assign a probability of $p$ to proposition $A$. For a proposition $A$ and any suitable finite body of evidence $E$, there is a corresponding $(A_p|E)$ distribution.
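In symbols, the defining property (reconstructing Jaynes's ch. 18 notation, in which $(\,\cdot \mid \cdot\,)$ denotes a probability) is simply
$$(A \mid A_p\, E) = p \qquad \text{for any additional evidence } E,$$
so conditioning on $A_p$ screens off whatever else the agent has been told; no limit construction is required.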
(ii) This may have been a typo, or a result of Jaynes's very non-standard parenthetical notation for probabilities, but @criticalpoints claims:
For any proposition $A$, the probability of $A$ can be found by integrating over our probabilities of $A_p$
[emphasis added]. But evaluation of $p(A|E)$ is to be achieved by calculating the expectation of $p$ with respect to the $(A_p|E)$ distribution, not just the integral: the integral evaluates to unity, $\int_0^1 (A_p|E)\,dp = 1$, because the distribution is normalized, while
$$p(A|E) = \int_0^1 p\,(A_p|E)\,dp.$$
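As a concrete illustration (the beta form here is a hypothetical choice, not something the $A_p$ formalism requires): if the evidence $E$ is summarized by $(A_p|E) = \mathrm{Beta}(p;\, a, b)$, then
$$\int_0^1 (A_p|E)\,dp = 1, \qquad p(A|E) = \int_0^1 p\,\mathrm{Beta}(p;\, a, b)\,dp = \frac{a}{a+b},$$
and the two quantities plainly differ.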
(iii) @transhumanist_atom_understander claims that
The reason nobody else talks about the $A_p$ distribution is because the same concept appears in standard probability expositions as a random variable representing an unknown probability
The point is that Jaynes is actually extending the idea of "probability of a probability" that arises in binomial parameter estimation, or in de Finetti's theorem for exchangeable sequences, as a way to compress the information in a body of evidence $E$ relevant to some other proposition $A$. While Jaynes admits that the trick might not always work, when it does, rather than store every last detail of the information asserted in some compound evidentiary statement $E$, a rational agent would instead only need to store a description of the $(A_p|E)$ distribution, which is equivalent to a probability distribution over a parameter $p \in [0, 1]$, and so might be approximated by, say, some beta distribution or mixture of beta distributions. The mean of the $(A_p|E)$ distribution reproduces the probability of $A$, while its shape characterizes how labile the probability of $A$ will tend to be upon acquisition of additional evidence. Bayesian updating can then take place at the level of the $A_p$ distribution, rather than over some vastly complicated Boolean algebra of possible explicit conditioning information.
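To make the updating-at-the-level-of-the-distribution point concrete, here is a minimal sketch in Python, assuming (purely for illustration) that the $(A_p|E)$ density is summarized by a beta distribution with pseudo-counts $a, b$; Jaynes's construction does not require this form:

```python
# A minimal sketch: summarize the evidence E bearing on proposition A as a
# beta density over p (an illustrative assumption; the A_p density need not
# be beta in general).

def update(a, b, k, n):
    """Conjugate update of Beta(a, b) on k successes in n new trials."""
    return a + k, b + (n - k)

def mean(a, b):
    """Mean of the A_p distribution = current probability of A."""
    return a / (a + b)

# A diffuse summary of E and a sharp one, assigning the same probability to A:
diffuse = (3.0, 7.0)      # p(A|E) = 0.3, but easily moved
sharp   = (300.0, 700.0)  # p(A|E) = 0.3, but resistant to new evidence

for label, (a, b) in [("diffuse", diffuse), ("sharp", sharp)]:
    a2, b2 = update(a, b, k=1, n=1)   # one new observation supporting A
    print(f"{label}: p(A|E) = {mean(a, b):.3f} -> {mean(a2, b2):.3f}")
```

The two summaries assign the same probability to $A$ yet respond very differently to the same new datum, which is exactly the lability information that the shape of the $A_p$ distribution encodes.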
So Jaynes's concept of the $A_p$ distribution goes beyond the usual "probability of a probability" introduced in introductory Bayesian texts (such as Hoff, Lee, or Sivia) for binomial models. In my opinion, it is a clever construct that does deserve more attention.