LBIT Proofs 2: Propositions 10-18
post by Diffractor · 2020-12-16T03:45:42.643Z · LW · GW
Proposition 10: Mixture, updating, and continuous pushforward preserve the properties indicated by the diagram, and always produce an infradistribution.
We'll start by showing that mixture, updating, and continuous pushforward are always infradistributions, and then turn to property verification.
We know from the last post that mixture, updating, and continuous pushforward preserve all infradistribution properties (although you need to be careful about whether mixture preserves Lipschitzness: you need the expected value of the Lipschitz constant to be finite). However, we added a new property, compact almost-support, so that's the only part we need to re-verify.
To show that mixture has compact almost-support, remember that
Now, fix an , we will craft a compact set that accounts for all but of why functions have the expectation values they do. There is some n where , where is the Lipschitz constant of the infradistribution . Then, let be , the union of the compact -almost-supports for the infradistributions . This is a finite union of compact sets, so it's compact.
Now we can go:
The first equality is reexpressing mixtures, and the first inequality is moving the expectation outside the absolute value which doesn't decrease value, then we break up the expectation for the second equality. The second inequality is because the gap between and has a trivial upper bound from the Lipschitzness of , and for , we have that and agree on the union of the -almost-supports for the , so a particular infradistribution, by the definition of an almost-support, has these two expectations having not-very-different values. Then we just pull the gap between and out, and use the fact that for the mixture to work, , and we picked n big enough for that last tail of the infinite sum to be small. Then we're done.
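The tail-truncation step above can be sanity-checked on a finite toy model. This is only a sketch under assumptions: the four-point space, the component sets, and the weights are hypothetical, and each infradistribution is modelled as a minimum over a finite list of (measure, constant) pairs. It checks that two functions agreeing on the union of the supports of the first n components can only differ in mixture value through the tail weight.

```python
def expectation(a_measures, f):
    """Toy infradistribution: h(f) = min over (m, b) of sum_x m[x]*f[x] + b."""
    return min(sum(mx * fx for mx, fx in zip(m, f)) + b for m, b in a_measures)

def mixture(weights, components, f):
    """Mixture: the weighted average of the component expectations."""
    return sum(w * expectation(h, f) for w, h in zip(weights, components))

def sup_distance(f, g):
    return max(abs(a - b) for a, b in zip(f, g))

# Hypothetical components on X = {0, 1, 2, 3}. Each is a set of probability
# distributions, so each has Lipschitz constant 1.
components = [
    [([1.0, 0.0, 0.0, 0.0], 0.0), ([0.5, 0.5, 0.0, 0.0], 0.0)],  # lives on {0, 1}
    [([0.0, 1.0, 0.0, 0.0], 0.0)],                                # lives on {1}
    [([0.0, 0.0, 1.0, 0.0], 0.0), ([0.0, 0.0, 0.5, 0.5], 0.0)],  # lives on {2, 3}
    [([0.0, 0.0, 0.0, 1.0], 0.0)],                                # lives on {3}
]
weights = [0.5, 0.25, 0.125, 0.125]

n = 2                           # keep the first two components
tail_weight = sum(weights[n:])  # weight of the discarded tail

f = [0.3, 0.7, 0.9, 0.1]
f_prime = [0.3, 0.7, 0.2, 0.6]  # agrees with f on C = {0, 1}

gap = abs(mixture(weights, components, f) - mixture(weights, components, f_prime))
bound = tail_weight * 1.0 * sup_distance(f, f_prime)
print(gap, bound)
```

In the real proof the components only have almost-supports, so the first n terms contribute a small error too; here they contribute exactly zero, which isolates the tail term.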
Now, we will show compact almost-support for assuming has compact almost-support. Fix an . Your relevant set for will be
Where the first term is a compact set that is a -almost-support for , and that last set is a sort of "this point must be likely enough". will be the Lipschitz constant of the original . Yes, this intersection may be empty.
Now, here's how things go. Let and agree on that intersection. (if it's the empty set, then it can be any two functions). We can go:
So far, this is just a standard sequence of rewrites. The definition of the update, pulling the fraction out, using to abbreviate the rescaling term, and unpacking what means.
Now, let's see how different and are on the set . One of two things will occur. Our first possibility is that an in that compact set also has . Then
and were selected to be equal on that set, so the two functions will be identical on that point. Our second possibility is that in that compact set will have . In that case,
Because .
Putting this together, and are only apart when restricted to the compact set . By Lemma 2, we can then show that
And, we also know that:
Because . Making that substitution, we have:
Backing up to earlier, we had established that
and from shortly above, we established that
Putting these together,
For any two functions and which agree on
Witnessing that said set is an -almost-support for .
All we need to finish up is to show that this is a compact set in equipped with the subspace topology. This can be done by observing that in the original space it's a compact set, due to being the intersection of a compact set and a closed set. In the subspace topology, if we try to make an open cover of it, all the open sets that cover it in the subspace topology are the restrictions of open sets in the original topology, so we have an open cover of this set in the original topology, and we can make a finite subcover, so it's compact in the subspace topology as well.
Thus, for any , we can make a compact (in ) -almost-support for , so has compact almost-support and we've verified the last condition for an update of an infradistribution to be an infradistribution.
Now for deterministic pushforward. Fix an , and let your appropriate set for be where is a compact -almost-support for . The image of a compact set is compact, so that part is taken care of. We still need to check that it's an -almost-support for . Let be equal on this set. Then
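Here is a small numerical sketch of why points with low likelihood barely matter after an update. It assumes the update formula from the earlier posts in this sequence, (h(f★g) − h(0★g)) / (h(1★g) − h(0★g)) with f★g = L·f + (1−L)·g; that formula is not visible in this text, so treat it as an assumption. The three-point space, the set of distributions, and the names `glue` and `update` are hypothetical. The example takes the extreme case where L is exactly 0 at one point, so changing f there changes nothing.

```python
def expectation(a_measures, f):
    """Toy infradistribution: h(f) = min over (m, b) of sum_x m[x]*f[x] + b."""
    return min(sum(mx * fx for mx, fx in zip(m, f)) + b for m, b in a_measures)

def glue(f, g, L):
    """Assumed gluing: L*f + (1 - L)*g, pointwise."""
    return [l * fx + (1 - l) * gx for fx, gx, l in zip(f, g, L)]

def update(a_measures, g, L, f):
    """Assumed update formula; requires h(1*g) != h(0*g)."""
    zero = [0.0] * len(L)
    one = [1.0] * len(L)
    low = expectation(a_measures, glue(zero, g, L))
    high = expectation(a_measures, glue(one, g, L))
    return (expectation(a_measures, glue(f, g, L)) - low) / (high - low)

# Hypothetical data on X = {0, 1, 2}.
h = [([0.5, 0.3, 0.2], 0.0), ([0.2, 0.2, 0.6], 0.0)]
L = [1.0, 0.5, 0.0]   # likelihood function; the point x = 2 is ruled out
g = [0.0, 0.4, 0.8]   # off-event payoff

f = [0.2, 0.6, 0.9]
f_prime = [0.2, 0.6, 0.1]   # differs from f only where L = 0

print(update(h, g, L, f), update(h, g, L, f_prime))
```

When L is small but nonzero at a point, the glued functions differ there by at most L(x) times the gap between the two functions, which is the second possibility handled in the proof.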
And we're done. This is because, for any point , feeding it through makes a point in , and feeding it through and produces identical results because they agree on . Therefore, and agree on and thus can have values only apart, which is actually upper-bounded by . is thus a compact -almost-support for , and this can be done for any , so has compact almost-support.
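The pushforward argument can be illustrated on finite sets. This sketch assumes the pushforward evaluates a function on the target space by precomposing with the map, as the argument above suggests. The spaces, the map `g`, and the set of distributions are hypothetical.

```python
def expectation(a_measures, f):
    """Toy infradistribution: h(f) = min over (m, b) of sum_x m[x]*f[x] + b."""
    return min(sum(mx * fx for mx, fx in zip(m, f)) + b for m, b in a_measures)

def pushforward(a_measures, g, f_on_y):
    """Evaluate the pushforward at f by evaluating h at the composite f o g."""
    return expectation(a_measures, [f_on_y[y] for y in g])

# Hypothetical map from X = {0, 1, 2, 3} to Y = {0, 1, 2}, given as a list.
g = [0, 0, 1, 2]

# Every distribution puts zero mass on x = 3, so C = {0, 1, 2} is a support.
h = [([0.5, 0.25, 0.25, 0.0], 0.0), ([0.1, 0.2, 0.7, 0.0], 0.0)]
support_x = {0, 1, 2}
image = {g[x] for x in support_x}   # the image of C under g

f = [0.4, 0.9, 0.0]
f_prime = [0.4, 0.9, 1.0]   # differs from f only at y = 2, outside the image

print(pushforward(h, g, f), pushforward(h, g, f_prime))
```

Because the two functions agree on the image of C, their composites with `g` agree on C, and the two pushforward values coincide.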
These three operations always produce infradistributions, as we've shown by verifying the last condition. Updating only has two properties to check, preserving homogenity when and cohomogenity when , so let's get that knocked out.
Homogenity using homogenity for h
Cohomogenity using cohomogenity for h
Now for mixtures, we'll verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, and crispness.
Homogenity:
1-Lipschitz:
Cohomogenity:
C-additivity:
Crispness: Observe that both homogenity and C-additivity are preserved, and crispness is equivalent to the conjunction of the two.
Now for deterministic pushforwards, we'll verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, crispness, and sharpness.
Homogenity:
1-Lipschitzness:
Cohomogenity:
C-additivity:
Crispness: Both homogenity and C-additivity are preserved, so crispness is preserved too.
Sharpness:
And is the image of a compact set, so it's compact. And we're done!
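A quick finite check of the sharpness claim, assuming (as the text suggests) that a sharp infradistribution is one that takes the minimum of a function over a compact set. The sets and the map below are hypothetical, and `sharp` and `pushforward` are illustrative helper names.

```python
def sharp(points):
    """Sharp functional on a finite set C: f -> min of f over C."""
    pts = sorted(points)
    return lambda f: min(f[x] for x in pts)

def pushforward(h, g):
    """Pushforward along g (a list mapping X-indices to Y-indices)."""
    return lambda f_on_y: h([f_on_y[y] for y in g])

g = [0, 0, 1, 2]                 # hypothetical map X = {0,1,2,3} -> Y = {0,1,2}
C = {0, 2}                       # the set the original functional minimizes over
image = {g[x] for x in C}        # its image under g

pushed = pushforward(sharp(C), g)
direct = sharp(image)

test_functions = [[0.4, 0.9, 0.0], [1.0, 0.2, 0.5], [0.3, 0.3, 0.8]]
for f in test_functions:
    print(pushed(f), direct(f))
```

Minimizing the composite over C is the same as minimizing the function over the image of C, so both columns match.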
Proposition 11: The inf of two infradistributions is always an infradistribution, and inf preserves the infradistribution properties indicated by the diagram at the start of this section.
We'll first verify the infradistribution properties of the inf, and then show it preserves the indicated properties if both components have them.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if , then
This was done by monotonicity for the components. For concavity,
The first happened because and are concave, the second is because .
For normalization,
And the same argument applies to 0, so the inf is normalized.
For Lipschitzness, the inf of two Lipschitz functions is Lipschitz.
That just leaves compact almost-support. Fix an arbitrary , and get a compact -almost-support for , and a for . We will show that is a compact -almost-support for . It's compact because it's a finite union of compact sets.
Now, let and agree on . We can go:
There are four possible cases for evaluating this quantity. In case 1, and . Then our above term turns into . However, since and agree on , they must agree on , and only have expectations apart. Case 2 where and is symmetric and can be disposed of by a nearly identical argument, we just do it with and .
Case 3 where and takes a slightly fancier argument. We can go:
The end inequalities are because and agree on the -almost-supports of and , respectively, from agreeing on the union. The two inner inequalities are derived from the assumed inequalities in Case 3.
Thus,
Case 4 where the assumed starting inequalities go in the other direction is symmetric. So, no matter which infradistributions are lower in the two infs, we have
And we're done: we made a compact almost-support for assuming an arbitrary . So the inf of two infradistributions is an infradistribution.
Now to verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, crispness, and sharpness preservation.
Homogenity:
1-Lipschitzness:
Now we can split into four cases. In cases 1 and 2 where the infs turn into (and same for in case 2), we have:
(and same for ), and we're done with those cases. In cases 3 and 4 where the infs turn into (and vice-versa for case 4), we have:
Because and are 1-Lipschitz. Thus,
A symmetric argument works for case 4. So, no matter what,
And we're done, the inf is 1-Lipschitz too.
Cohomogenity:
C-additivity:
Crispness: Homogenity and C-additivity are both preserved, so crispness is preserved.
Sharpness:
And we're done.
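The claims of this proposition can be spot-checked numerically on a toy model. This is a sketch under assumptions: the three-point space and the two sets of distributions are hypothetical, and each infradistribution is modelled as a minimum over a finite list of (measure, constant) pairs. It checks that the inf is the minimum over the combined list, and that it is concave and 1-Lipschitz at one sample pair of functions.

```python
def expectation(a_measures, f):
    """Toy infradistribution: h(f) = min over (m, b) of sum_x m[x]*f[x] + b."""
    return min(sum(mx * fx for mx, fx in zip(m, f)) + b for m, b in a_measures)

def inf_of_two(h1, h2, f):
    """Pointwise inf of two infradistributions."""
    return min(expectation(h1, f), expectation(h2, f))

def sup_distance(f, g):
    return max(abs(a - b) for a, b in zip(f, g))

# Hypothetical sets of probability distributions on X = {0, 1, 2}.
h1 = [([0.5, 0.5, 0.0], 0.0), ([0.2, 0.3, 0.5], 0.0)]
h2 = [([0.0, 0.4, 0.6], 0.0), ([0.3, 0.3, 0.4], 0.0)]

f = [1.0, 0.0, 0.5]
g = [0.0, 1.0, 0.2]
p = 0.5
mix = [p * a + (1 - p) * b for a, b in zip(f, g)]

value_of_mix = inf_of_two(h1, h2, mix)
mix_of_values = p * inf_of_two(h1, h2, f) + (1 - p) * inf_of_two(h1, h2, g)
gap = abs(inf_of_two(h1, h2, f) - inf_of_two(h1, h2, g))
print(value_of_mix, mix_of_values, gap)
```

One sample pair is of course not a proof; it only shows the inequalities pointing the way the proposition says.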
Proposition 12:
Proposition 13: If a family of infradistributions has a shared upper bound on the Lipschitz constant, and for all , there is a compact set that is an -almost support for all , then , defined as , is an infradistribution. Further, for all conditions listed in the table, if all the fulfill them, then fulfills the same property.
We'll first verify the infradistribution properties of the infinite inf, and then show it preserves the indicated properties if all components have them.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if , then
This was done by monotonicity for all components. For concavity,
The first happened because and are concave, the second is because .
For normalization,
And the same argument applies to 0, so the inf is normalized.
For Lipschitzness, let be your uniform upper bound on the Lipschitz constants of the . Then,
And then, for all the , they only think those functions differ by or less. The same property applies to the inf: pick a and that very nearly attain the two infima, and show that if the infima were apart, you could have appreciably undershoot , and in fact, undershoot , which is impossible. Thus,
And we're done.
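The Lipschitz step can be checked on a finite stand-in for the family. This is a sketch under assumptions: the family of 50 members on a two-point space is hypothetical, and every measure in it has total mass at most 2, so 2 is a shared Lipschitz bound.

```python
def expectation(a_measures, f):
    """Toy infradistribution: h(f) = min over (m, b) of sum_x m[x]*f[x] + b."""
    return min(sum(mx * fx for mx, fx in zip(m, f)) + b for m, b in a_measures)

def family_inf(family, f):
    """Inf over the whole family, evaluated at f."""
    return min(expectation(h, f) for h in family)

def sup_distance(f, g):
    return max(abs(a - b) for a, b in zip(f, g))

LIPSCHITZ_BOUND = 2.0   # every measure below has total mass at most 2

# Hypothetical family: member n holds one probability distribution and one
# mass-2 measure with constant term 1/n.
family = [
    [([1.0 / n, 1.0 - 1.0 / n], 0.0), ([1.0, 1.0], 1.0 / n)]
    for n in range(1, 51)
]

f = [0.9, -3.0]
g = [0.5, -2.0]
gap = abs(family_inf(family, f) - family_inf(family, g))
print(gap, LIPSCHITZ_BOUND * sup_distance(f, g))
```

The point of the shared bound is that the inf cannot be more sensitive to a change of function than the most sensitive member of the family.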
That just leaves compact almost-support. Fix an arbitrary . We know there is some that is a compact -almost-support for all the . We will show that is an -almost-support for .
Let and agree on . We can go:
Pick a and that very very very nearly attain the inf. Then we can approximately reexpress this quantity as:
We're approximately in a case where and , so we can go:
The end inequalities are because and agree on the -almost-support of and . The two inner inequalities are derived from the assumed inequalities in our case. Thus,
And we're done: we made a compact almost-support for assuming an arbitrary . So the inf of this family of infradistributions is an infradistribution.
Now to verify homogenity, 1-Lipschitzness, cohomogenity, C-additivity, crispness, and sharpness preservation.
Homogenity:
1-Lipschitzness: Same as the Lipschitz argument, everyone has a Lipschitz constant of 1, so the inf has the same Lipschitz constant.
Cohomogenity:
C-additivity:
Crispness: Homogenity and C-additivity are both preserved, so crispness is preserved.
Sharpness:
We do have to check whether or not is compact, however. We'll start by showing that for an arbitrary , any compact set where can't be an -almost-support of for any . The proof proceeds as follows:
Let be some point in but not in . It must be some finite distance away from . Craft a continuous function supported on . is 1 on and 0 on . Use the Tietze extension theorem to extend to all of . Then
However, and 1 agree on , so can't be an -almost-support for any .
Thus, in order for there to be a compact set that's an -almost-support for all , it must be that . Then
because all the are in it and is closed. So, the closure of our union is a closed subset of a compact set and thus is compact, so is minimizing over a compact set and thus is sharp.
Proposition 14: If and , then the supremum is an infradistribution.
The supremum is defined as:
We'll verify the infradistribution properties of the sup.
We must check monotonicity, concavity, normalization, Lipschitzness, and compact almost-support. For monotonicity, if , then
This was done by so there's more options available. For concavity,
Pick your that very very very nearly attain the supremum.
Also, we can verify that:
Therefore, it is a suitable parameter and pair of functions to lower bound
. Accordingly
Putting all this together, and picking better and better approximations to the two suprema, we can conclude that:
And we have concavity.
For normalization, we're assuming it holds at the start.
Lipschitzness takes a slightly more involved argument. Pick two functions and , and without loss of generality, assume . Now, what we can do is pick a , and which approximately obtain the defining supremum for , so we have:
Now, we can note two things. First,
Therefore, the same , and , and are suitable things to lower-bound the value of . In particular, we have:
Also, we have the result that:
Because of Lipschitzness of and . Now we can begin showing our inequalities. So, we've shown that:
Therefore,
With this result, we can go:
Let's save this result for a bit later.
Also, we had:
And we also picked and and to approximately attain the supremum, so we know:
Therefore, we approximately have:
Reshuffling this around a bit, we have:
Using this with our saved result, we can get:
That last inequality was because we assumed at the start without loss of generality that got an equal or higher expectation than .
Therefore, we have our result that, in general,
And thus, the supremum of two Lipschitz infradistributions is Lipschitz. That just leaves compact almost-support, which is quite tricky to show.
Fix an arbitrary , and get a compact -almost-support for , and a for . We will show that is a compact -almost-support for . It's compact because it's a finite union of compact sets.
Now, let and agree on . Without loss of generality, assume that (if not, flip and ). We'll show that they have similar expectations by showing that is below a small number (we already know that it's above 0 by our without-loss-of-generality assumption).
We can go:
Where we picked a particular spectacularly close to the highest possible value s.t. . In particular, if is 0 or 1, we can ensure that or is itself, by monotonicity of or respectively.
For successive arguments, we need so we have to address those endpoints. Assume . Then, . Then, we have:
The way this works is our substitution, and then using that and are identical on , and so are identical on , which is -almost-support of , we can upper-bound with . And then, we just use that . If , the exact same argument works, just with and instead. That leaves the case where , which requires far more involved arguments.
As a recap, we're assuming that , and that , and . Now, we're going to pick out a continuous function with some special properties, so let the set-valued function be defined as: If , then . Otherwise, equals the intersection of:
and
We'll find a continuous selection of this set-valued function, so let's start checking the properties needed to invoke the Michael selection theorem. We need that is paracompact (all Polish spaces are paracompact, check), that is a Banach space (check), that for all , is convex (it's either a single point or the intersection of a rectangle and a half-space, which is convex in both cases), closed (yup, it's either a point or the intersection of two closed sets, i.e. closed), nonempty, and lower-hemicontinuous.
Nonemptiness isn't too bad to show. It's nonempty for all points in our compact set of interest (the set consisting of a single point), and for x not in said set, witnesses the nonemptiness, because:
Lower-hemicontinuity is much more challenging to establish. Again, we have a sequence limiting to , a point , and we must find a subsequence which limits to .
We can divide into three cases. In the first case, lies in , and infinitely many members of the sequence lie in said set. In particular, since lies in the compact set, the pair associated with it must be . Then we can isolate that particular subsequence that lies in the compact set, and have be , which, by continuity of and , and the definition of for in the compact set, lie in and limit to ie .
In preparation for the second and third cases, we'll show that the function which just takes the second branch of the function is continuous w.r.t. the Hausdorff-metric. Ie, for all ,
is continuous when the space of compact subsets of is equipped with a Hausdorff distance.
Accordingly, let limit to . Our task is to show that, no matter how tiny of a number you name, you can find a tail of the sequence where the Hausdorff distance between and is that tiny.
Specifically, we'll show that for all , there is some where all later have within Hausdorff distance of . Because and we can shrink to 0, this shows that the function is continuous in Hausdorff-distance.
Because and and are continuous functions, there's some very very large where , , and will only vary by from that point forward, regardless of which you pick. Pick some arbitrary . We'll show that it's close to a , and the argument will only depend on distances, not position in sequence, so we can flip it to show the other half of Hausdorff-distance (all points in are close to a point in ).
We can divide into four possible cases. In cases 1 and 2, we have the following property holding.
With the negation for cases 3 and 4.
And in cases 1 and 3, we have:
With the negation for cases 2 and 4.
In cases 1 and 2, you can let your selected point be . We have the result that , because:
In order, the first inequality is because only varies by over such tiny distances due to continuity of , the second inequality is being paired with something to be in so it has a known upper bound on its value, then the third inequality is because , the equality is our definition of our , and the next inequality uses the fact that we're assuming that has a particular lower bound since we're in cases 1 and 2. Then there's just a cancellation, and only varying by over such tiny distances.
You can use nearly identical arguments in cases 1 and 3 to get that, when you define to be , you have the result that
Now, in cases 3 and 4, we can let be: , and then we have:
The first equality is just pair-creation, then the second one is packing up the definition of . The first inequality is because only varies by over that distance, the second inequality is because so it's got the usual lower bound, then the next inequality after that is because we're in cases 3 and 4 so
Then, it's just another " doesn't change much over the tiny distance", moving the 's together, unpacking , and cancelling out. The net result is that we have:
You can use nearly identical arguments in cases 2 and 4 to get that, when you define to be you have the result that .
At this point, we can resume our progress on the four cases and go "ok, in case 1, we have..."
And we know that those properties lead to being defined as and being defined as . And we know that in that case,
So, all we have to check is that in order to conclude that . Let's do that.
And we have that , accordingly. The first equality was unpacking definitions, then the second was some cancellation, and then the first inequality was because by assumption so we have . The second inequality was because doesn't change much over such tiny distances and then it's just trivial cleanup.
Thus, when we picked a point where is sufficiently close to , and we're in case 1, we have that there are points , and
This is from the definitions of and in Case 1.
Now, let's address case 2, where
In this case, is defined as , and is defined as
And we know that in that case,
(the first part on the is the same argument from case 1, the second interval is from the value of )
So, all we have to check is that in order to conclude that . We know that
So we can flip this a bit to get
Accordingly, from that, we get:
And we have that , accordingly. The first inequality was definition unpacking and the inequality we just got, then the first equality is just breaking things up a bit, then the second inequality is just observing that , and then doesn't change much over such tiny distances.
Thus, when we picked a point where is sufficiently close to , and we're in case 2, we have that there are points , and
This is from the definitions of and in Case 2, and the fact that in case 2 we can derive
Extremely similar arguments to case 2 dispatch case 3 with a resolution of the corresponding lying in and
Finally, for case 4, we have:
In this case, is defined as , and is defined as
Trivially, we have:
So, all we have to check is that in order to conclude that . To do this, we have:
We know that
So we can flip this a bit to get
Accordingly, from that, we get:
(because )
And we have that , accordingly.
Thus, when we picked a point where is sufficiently close to , and we're in case 4, we have that there are points , and
This is from the definitions of and in Case 4, and the fact that in case 4 we can derive (and same for )
These 4 cases were exhaustive, so we now know that, given any and sequence of points limiting to , and any , there is a tail of sufficiently large m's where the distance from any point in to is or less. We can also flip and and use our four cases (our argument is symmetric) to show that actually, this is a bound on the Hausdorff distance between and . was arbitrary, as was the sequence and the , so this means that is continuous in Hausdorff-distance.
Ok, we're a bit in the weeds here, how does that help? Well, we were trying to verify the compact almost-support property for the supremum. This requires, as part of it, getting a continuous function with some special properties. We're going to apply a selection theorem to get it, but we could only take care of the prerequisites that aren't lower-hemicontinuity. And to show lower-hemicontinuity in general, we needed to take this detour through showing that the modified set-valued function is continuous in Hausdorff-distance. So let's pop back up the stack.
One level back up the stack, we were trying to show lower-hemicontinuity. It is the property that given any sequence which limits to , and any , there is some subsequence and where limits to . We dispatched the case where infinitely many elements of the sequence were in our , leaving two cases. There's the case where only finitely many elements of that sequence are in that compact set, but the limit point lies in that set. There's also the case where the limit point doesn't lie in that set.
Dealing with case 3, we have a sequence heading to . Strip off all the that lie in the compact set, making your . And let be whichever point in is closest to . Now, by how they were defined, , and is continuous in Hausdorff-distance, so "take the closest point" is definitely going to get you the convergence you seek to your arbitrarily selected point.
For case 2, where we're limiting to from outside the compact set, all we need to show is that (we don't necessarily have equality because and start being different on that compact set), in order to get a sequence converging to the point. So, let's do this. Because lies in , we have that .
The conditions for to be in are:
Which is obviously true for , and:
Which is the case because:
By how was made, and on that compact set.
Thus, we're done, we verified lower-hemicontinuity for in all the cases, so we can invoke the Michael selection theorem and get a continuous selection with three valuable properties. Let's abbreviate as , for notational convenience. It's projecting it to the first coordinate. is defined similarly.
Our first notable property is:
(ie, projecting down to the two coordinates makes functions which perfectly mimic and on the compact set of interest)
Our second one is:
And the same for .
And our third notable property is that:
But why do these properties hold of our selection function? Well, when lies in that compact set, , so our selection function is forced to have its projections mimic and on said compact set, taking care of the first one.
For our second property, we have:
Accordingly, we know that the projections to the two coordinates can't be too far away from and respectively.
For our third property, we have:
Accordingly, the projections to the two coordinates can't mix to exceed the function .
So, where to from here? Well, we have:
Here's why. We assumed at the very start that without loss of generality, we'd take to be the one with higher expectation value. We found a that nearly replicated the expectation value of . copies on a compact almost-support of , namely , and we also have , and similar for and . And finally, since , that mix must have lower value than . And we're done! and were arbitrary except that they agreed on , a compact set, and we got:
Witnessing that said set is a compact -almost-support. was arbitrary, so is compactly-almost-supported. This is the last condition needed to check to see that it's an infradistribution.
Proposition 15: All three characterizations of the supremum given in Definition 13 are identical.
So, the first characterization we gave was:
And the second characterization was the least infradistribution greater than in the information ordering.
And the third characterization was as the concave monotone hull of .
We will use for these three characterizations of the supremum of two infradistributions and show that they are equal.
Let's begin showing this.
This occurs by monotonicity, any mix of functions which undershoots must get a lower score because is an infradistribution.
This is because of concavity of , since it's an infradistribution. The value of the mix is as good or better than the mix of the values.
This is because (and same for ), so making that swap decreases the value. Also, this quantity is the concave monotone hull of the supremum of . Why? Well, is our first attempt at assessing the value of a function . However, it isn't necessarily monotone. So, is the monotone hull: we're saying that if there's a value below you that outscores you, then you should update the value of to be big enough. And then, to get the concave monotone hull, we replace the lower bound on with a countable/arbitrary finite mix of functions, because any concave function should have the value of the mix be at least the mix of the values, so we have to bump the value of up to at least the mix of the values to not violate concavity. Anyways, now that we know this is , we can go further to:
This is lower because now we're specializing to only certain sorts of probability distributions over , those that are only supported on the first two values, so it's harder to attain suprema. And now,
We swapped out the supremum for a specific term in it in order to do this, and used our given definition of . And then we can specialize to 1 and to itself, to get
Similarly, we could specialize to and to get . So taking stock of what we have,
For all functions, so:
(and same for ) We recall that in Proposition 14 we proved that always makes an infradistribution. Since is above both component infradistributions, and was defined as the least infradistribution that is above and , we must have equality, and
(and same for ) And we've shown the three definitions of the supremum are identical.
Proposition 16:
To recap,
Now, can be turned into a concave monotone functional , by LF-duality. Further, it's convex, closed, and upper-complete due to being the intersection of two convex closed upper-complete sets. Let's use to refer to its corresponding functional. Then:
And the same applies to , and this applies to all functions, so (and same for ).
We know from Proposition 15 that the least concave monotone functional above and is , so (and same for ) Call the corresponding set of as . Thus, translating this information ordering back to sets,
And same for . Therefore,
Therefore, all the subsets must be actual equalities, and so in particular we have:
Then we can go:
By being equivalent to the infradistribution set induced by , expanding our definition of the sup, and translating back. And we're done!
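The link between the supremum and set intersection can be seen on a three-point toy example. This is a sketch under assumptions: it takes the supremum to be the best value of p·h1(f1) + (1−p)·h2(f2) over mixes with p·f1 + (1−p)·f2 undershooting the function, as the earlier propositions describe, and the two sets of distributions, the grid, and the helper name `sup_lower_bound` are hypothetical. The first set is all distributions with at least half their mass on point 0, the second is the same for point 1, so their intersection is the single distribution (0.5, 0.5, 0).

```python
from itertools import product

def expectation(a_measures, f):
    """Toy infradistribution: h(f) = min over (m, b) of sum_x m[x]*f[x] + b."""
    return min(sum(mx * fx for mx, fx in zip(m, f)) + b for m, b in a_measures)

# Extreme points of the two hypothetical sets, and of their intersection.
h1 = [([1.0, 0.0, 0.0], 0.0), ([0.5, 0.5, 0.0], 0.0), ([0.5, 0.0, 0.5], 0.0)]
h2 = [([0.0, 1.0, 0.0], 0.0), ([0.5, 0.5, 0.0], 0.0), ([0.0, 0.5, 0.5], 0.0)]
intersection = [([0.5, 0.5, 0.0], 0.0)]

def sup_lower_bound(f, levels=(0.0, 1.0, 2.0), ps=(0.25, 0.5, 0.75)):
    """Grid search over mixes p*f1 + (1-p)*f2 = f; a lower bound on the sup."""
    best = max(expectation(h1, f), expectation(h2, f))   # the p = 1 and p = 0 cases
    for p in ps:
        for f1 in product(levels, repeat=len(f)):
            # Largest f2 allowed by the constraint, given p and f1.
            f2 = [(fx - p * f1x) / (1 - p) for fx, f1x in zip(f, f1)]
            value = p * expectation(h1, f1) + (1 - p) * expectation(h2, f2)
            best = max(best, value)
    return best

f = [1.0, 1.0, 0.0]
print(max(expectation(h1, f), expectation(h2, f)))   # pointwise max of the two
print(sup_lower_bound(f))                            # what the mixes achieve
print(expectation(intersection, f))                  # value from the intersection
```

Here the pointwise max of the two functionals is strictly below the intersection value, and the gap is closed by the mix with p = 0.5, f1 = (2, 0, 0), f2 = (0, 2, 0). The grid search only gives a lower bound in general; it happens to be tight in this example.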
Proposition 17: For any property in the table at the start of this section, will fulfill the property if both components fulfill the property.
The way to show this is to use the alternate characterizations of supremum as intersection of the infradistribution sets, and the alternate characterizations of the various properties in terms of properties of minimal points.
We will make an observation used in all further proofs of properties. In order for to have in it, there must be a minimal point of the form with below it. Similarly, for to contain , there must be a minimal point of the form below it, with .
Thus, for to lie in , and and . Part of this is because said point lies in and , the other part is because is the lowest possible point in associated with a measure component of , and it's the minimal. This observation will be used for all future sub-proofs in this proposition.
Homogenity: This is equivalent to "all minimal points have ", so if , then (homogenity for ), and same for , so .
1-Lipschitzness: This is equivalent to "all minimal points have ", so if , then , and (1-Lipschitzness of ), so .
Cohomogenity: This is equivalent to "all minimal points have ", so if , then , and (cohomogenity of ), and , and , so . Then, .
C-additivity: This is equivalent to "all minimal points have ", so if , then , and (C-additivity of ), so .
Crispness: This is equivalent to the conjunction of homogenity and C-additivity, both of which are preserved, so crispness is preserved as well.
Sharpness: Because all sharp infradistributions are crisp, must be composed entirely of probability distributions if and are sharp. If any of the probability distributions in aren't supported on (the compact set associated with the sharp infradistribution ), then they aren't in , which is impossible. Symmetric arguments apply to . Thus, only has probability distributions supported on . If there was any probability distribution supported on that set that was missing from , then it'd be present in and , and thus present in , and minimal, so we have a contradiction. Therefore consists of all probability distributions supported on which is a compact set, so the supremum is sharp as well.
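The sharpness case has a particularly clean finite illustration. This sketch assumes a sharp infradistribution takes the minimum of a function over its compact set, and that the supremum is computed from mixes that undershoot the function, as in the earlier propositions. The sets {0, 1} and {1, 2} and the witness functions are hypothetical.

```python
def sharp(points):
    """Sharp functional on a finite set C: f -> min of f over C."""
    pts = sorted(points)
    return lambda f: min(f[x] for x in pts)

h1 = sharp({0, 1})
h2 = sharp({1, 2})
h_meet = sharp({0, 1} & {1, 2})   # sharp on the intersection {1}

f = [0.0, 1.0, 0.0]

# Witness mix: f1 is free to go negative off h1's set, and likewise for f2.
p = 0.5
f1 = [1.0, 1.0, -1.0]
f2 = [-1.0, 1.0, 1.0]
mix = [p * a + (1 - p) * b for a, b in zip(f1, f2)]
value = p * h1(f1) + (1 - p) * h2(f2)

print(max(h1(f), h2(f)), value, h_meet(f))
```

Each functional alone assigns the function value 0, but the mix certifies value 1, matching the minimum over the intersection of the two sets.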
Proposition 18: If a family of infradistributions is directifiable, then (defined as the functional corresponding to the set ) exists and is an infradistribution. Further, for all conditions listed in the table, if all the fulfill them, then fulfills the same property.
A family of infradistributions being directifiable is equivalent to "for any collection of finitely many infradistributions, the supremum exists". We also know that the supremum is exactly equivalent to set intersection. So, we'll show that directifiability (any collection of finitely many infradistributions has a supremum) implies that the intersection of all the infradistribution sets has the exact properties of a set-form infradistribution.
We have six properties to check. Nonemptiness, normalization (the existence of a point , existence of a point with and nonexistence of points with ), closure, convexity, upper-completion, and compact-projection (the measure components of the infradistribution are contained in a compact set of measures).
For closure, it's the intersection of closed sets, so it's closed. For convexity, it's the intersection of convex sets, so it's convex. For upper-completion, it's the intersection of upper-complete sets, so it's upper-complete. For compact-projection, the measure components of the countable intersection are contained within the countable intersection of the sets of measure components, which is contained in a compact set, so it fulfills that property too.
This just leaves nonemptiness and normalization. We'll show normalization, which automatically implies nonemptiness. The nonexistence of points with is definitely not preserved under intersection.
However, the compact-projection property means that for any infradistribution set , the intersection of it with the surface of a-measures where is compact, so we're intersecting a bunch of compact sets. Due to the existence of supremum infradistributions for each collection of finitely many infradistributions (directifiability), we have the nonempty finite intersection property needed to conclude that the intersection of compact sets is nonempty. The same argument applies to the existence of a point with . The presence of those two points witnesses nonemptiness and normalization.
These are the last two conditions we needed to conclude the set represents an infradistribution, so the infinite supremum exists and is the infradistribution we need.
For preservation of the various properties, we can just reuse the arguments from Proposition 17 with only trivial modifications.