Infra-Bayesian physicalism: proofs part I

post by Vanessa Kosoy (vanessa-kosoy) · 2021-11-30T22:26:33.149Z · LW · GW · None comments

This post is an appendix to "Infra-Bayesian physicalism: a formal theory of naturalized induction [AF · GW]".

Lemma 1: occurs for all iff, for all and ,

This lemma will be implicitly used all over the place, in order to deal with the "presence in the bridge transform" condition algebraically, in terms of function expectations. The bulk of the conditions for a contribution to lie in the bridge transform is this endofunction condition (the support condition is trivial most of the time), so it's advantageous to reformulate it. First-off, we have that iff, for all , we have By LF-duality for ultradistributions, proved in Less Basic Inframeasure theory. A contribution lies in an ultracontribution set iff its expectations, w.r.t. all functions, are less than or equal to the ultracontribution expectations.

Now, we just need to unpack the left-hand-side into our desired form. Start off with Apply how projections work Now, we can move the indicator function into the function we're taking the expectation again, because there's no difference between deleting all measure outside of an event, or taking the expectation of a function that's 0 outside of that event. So, we get Then we use how pushforwards are defined in probability theory. And that's our desired form. So, we've shown that for any particular , we have iff, for all , And so, we get our desired iff statement by going from the iff for one to the iff for all .

Proposition 2.1: For any , and , exists and satisfies . In particular, if then .

Proof sketch: We'll show that for any particular contribution in , there's a contribution which lies within that projects down to equal . And then, show the other direction, that any contribution in lands within when you project it down. Thus, the projection of must be exactly.

For the first direction, given some , let . Note that since , this means that , so the type signatures line up. Clearly, projecting down to makes again. So that leaves showing that . Applying Lemma 1, an equivalent way of stating the bridge transform is that it consists precisely of all the s.t. for all and , and also, .

Clearly, for our given , everything works out with the support condition, so that leaves the endofunction condition. Let be arbitrary. In order, the equalities were unpacking the semidirect product, substituting the dirac-delta in, reexpressing the condition for the indicator function, using that inside the indicator function, applying monotonicity of , using that , and then packing up the definition of . And that inequality has been fulfilled, so, yes, lies within . Since was arbitrary within , this shows that the projection set is as-big-or-bigger than .

Now to show the reverse direction, that anything in the projection set lies within . For any particular , remember, it must fulfill, for all , that So, in particular, we can let be the identity function, and be anything, and we'd get And, since is in , it's supported on the pairs s.t. , so that indicator function drops out of existence, and we get And then this can be written as a projection And since this holds for all the various , this means that And we're done with the other direction of things, the projection must be exactly equal to since projections of contributions in always land in , and we can surject onto .

Proposition 2.2: Let be a poset, . Then, if and only if there exists s.t.: For all , : .

Vanessa proved this one, with a very nice proof involving the max-flow min-cut theorem.

Make the following directed graph: The nodes are the elements of . Basically, two copies of the finite poset , a source node , and a sink node . will be used to denote a variable from , while a subscript of 0 or 1 denotes that the 0 or 1 version of that , respectively.

As for the edges, there is an edge from to every point in . And an edge from every point in to . And, for some and , there's an edge iff . The capacities of the edges are, for all , , and for all , , and for s.t. , .

To work up to applying min-cut max-flow, we must show that the cut of all the edges between and is a min-cut of the network.

Remember, all cuts are induced by partitions of the nodes into two sets, and , where includes the source and includes the sink . And the value of a cut is Implicitly converting both of the following sets to be subsets of , we have for any non-infinite cut. Why? Well, if it was false, then there'd be some where , and yet . So . Then, the cost of the cut would include , contradiction. So, now, we have that for any non-infinite cut, and holds.

Now, we'll change our cut. Given some cut , let our new cut be defined as (which still respects that superset property listed above since both sets would be the same), and be the complement. Letting some stuff cancel out, we get Now, because by that subset property, it means that there's no , so we get As for that other minus term, we have that by how it was defined, so , so that other minus term is 0, and we get Rearranging this a bit, we have .

And now we'll show that is underscored by . Let's go. We have This simplifies a bit as This can be rewritten a bit as And then, using what is defined as, we can get and using what the costs are, it's This holds because and the indicator function for is monotone. And so, we get . And we previously showed that . So, this means that the cut around is a minimal cut, it underscores all other cuts.

By the max-flow min-cut theorem, there's a way of having stuff flow from to that saturates the capacities of all the edges that are cut. will be used to denote the flow from to according to this max-flow way. Let's finish things up.

Define as follows. For some , if , then If , let be any probability distribution on . Note that is always a probability distribution supported on , by fiat if , and otherwise, This is because, the flow out of must equal the flow into from the source. And is supported on because the only edges out of go to where . Now that we've got this demonstrated, we'll show our desired inequality. Fix an arbitrary . We have And then we use that all the capacities of the edges cut in the min-cut are saturated according to the max-flow plan, so , and we have Now, if , then because the flow in equals the flow out, that means that , and otherwise we can cancel out, so we can rewrite as Where the first equality came from flow in equals flow out, the inequality came from the flow plan respecting the capacity of the paths, and the equality came from how the capacities were defined. So, for all , we have so we have .

For the reverse direction, if there's some s.t is supported on s.t. , then for any monotone function , we have And then we use that, since is supported on , and is monotone, is less than or equal to the expectation of w.r.t (remember, for you have a 100 percent chance of drawing something at-or-above , which guarantees that f of whatever you picked is above ). And so, we get And since this holds for every monotone , we have .

Proposition 2.3: Let be a poset, . Then, if and only if there exists s.t. For all , : .

Proof: Because , we can get a maximum flow in exactly the same way as Proposition 2.2. Then, just flip the direction of all flows, which will be denoted by swapping the order of the subscripts. Now, define And, again, if it's 0, it should be an arbitrary probability distribution supported on . Note that is always a probability distribution supported on , by fiat if , and otherwise, This is because, the flow out of must equal the flow into from the source (the old sink). And is supported on because the only edges out of go to where . Now that we've got this demonstrated, we'll show our desired inequality. Fix an arbitrary . We have Then use that (because even with the flow reversed, the flow through a path must be less than or the same as the capacity. Accordingly, we get And we're done, inequality established, using definitions and the flow saturating all the paths out of .

So, for all , we have so we have .

For the reverse direction, if there's some s.t is supported on s.t. , then for any monotone function , we have And then we use that, since is supported on , and is monotone, is greater than or equal to the expectation of w.r.t (remember, for you have a 100 percent chance of drawing something at-or-below , which guarantees that f of whatever you picked is below ). And so, we get And since this holds for every monotone , we have .

Proposition 2.4: For any , and , is downwards closed w.r.t. the induced order on . That is, if and then .

Remember, the condition for some to lie in is that it be supported on , and that, for all , and , So, given that (lower score for all monotone functions) we'll show that fulfills both conditions. The support thing is taken care of by . As for the other one, we have That inequality occurs because as you make larger, ie, go up in the partial ordering, the value assigned to the relevant point increases since is more likely now, so the function value increases from 0 to something that may be greater than 0. So, it's a monotone function, and we then use that .

Proposition 2.5: Consider a finite set , , , and . Then, . Conversly, consider any . Then, there exist and s.t. for .

Ok, so let be some arbitrary function . We have an alternate characterization of the total variation distance as And then from there we can go And since , we have and are done.

Conversely, let the value of be , and be . It is clear that this is a probability distribution because of how was defined. The is the minimum/common overlap of the two probability distributions. Then, let , and similar for the 1. Well, as long as . If , it can be any probability distribution you want. It's a probability distribution because And since , everything works out. Now we just need to show that these add up to make and that is the same as the total variation distance. And this works symmetrically for , showing that we indeed have equality. As for showing that is the total variation distance, referring back to what we've already proved, we have And now, since and , the supports of these two probability distributions are disjoint, which implies that the total variation distance is 1, so we have .

Proposition 2.6: Consider any and . Denote (the event ``program is unrealized''). Let . Then,

This will take some work to establish. One direction, showing that the bridge transform value exceeds the total variation distance value, is fairly easy. As for the other direction, it'll take some work. Let's begin. We'll make a particular contribution . It is defined as Or, put another way, restricting to , it's , and conditioning on , it's . So, for a given , we have Now, we'll go through two exhaustive possibilities for what could be. If it maps 0 to 0, then it's And our desired inequality is established. If maps 0 to 1, then it's And that is taken care of. So, our lies in .

Now, where were we? Ah right, we were at But now we can continue with Which unpacks as And then, the value of the overlap between two distributions is . So, we've shown one direction, we have In the other direction of the equality we're trying to prove, we need to constrain what might be. Starting off with what we definitely know, we have Now it's time to show that for any , that the measure on can't be too high. Specifically, to proceed further, we'll need to show that Time to establish it. On , either , or vice-versa. Without loss of generality, assume that it's that's lower.

Then, let be the constant-0 function, and be 1, and 0 everywhere else. Then, we have Just abbreviating, using that lies in the bridge transform, deabbreviating with what is, unpacking things a little bit and canceling out. At the end we used that .

Ok, so, now that we've established the key fact that We can resume our work. We were previously at and we can proceed to and from there, using the inequality we proved, proceed to And we're done, we established that it's an upper bound as well a lower bound, so we have equality.

Lemma 2: If there's a function and , then for all , is supported on the set .

Assume that the conclusion is false, that there is some nonzero probability of drawing a pair where . In particular, there must be some special value that witnesses that isn't a subset, by and . Remember that for all and , that since , we have Now, in particular, let , and be the constant function that maps everything to . Then, this turns into which is impossible. The equality as the end was our starting assumption. The middle inequality was just specializing our inequality to a particular pair of functions. And it's greater than 0 because there's a nonzero probability of drawing . And was selected to lie in and outside of , so there's a nonzero probability of getting a nonzero value. Therefore, the result follows.

Lemma 3: For any , and , if , then .

So, the support condition is trivially fulfilled, because we're restricting to the event . That just leaves the endofunction condition. Let be arbitrary, and be arbitrary. Then we have And, since , we can use that to get And we're done, we've established that our modified version of remains in .

Proposition 2.7: If is a poset and , then will denote downward closure of . For any , and if