LBIT Proofs 6: Propositions 39-47
post by Diffractor · 2020-12-16
Proposition 39: Given a crisp infradistribution over , an infrakernel from to infradistributions over , and suggestively abbreviating as (hypothesis ) and as (your infraprior where you have Knightian uncertainty over how to mix the hypotheses), then
Proof: Assume that and are functions of type and respectively, ie, likelihood and utility don't depend on which hypothesis you're in, just on what happens. First, unpack our abbreviations and what an update means.
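As a reminder of the shape of the update (a schematic restatement, with $h$ the infradistribution being updated, $L$ the likelihood, and $g$ the off-event utility; treat the exact normalization as an assumption rather than a quote from the earlier posts):
$$h|_{g,L}(f) := \frac{h(L\cdot f + (1-L)\cdot g) - h((1-L)\cdot g)}{h(L\cdot 1 + (1-L)\cdot g) - h((1-L)\cdot g)}$$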
Then use the definition of an infrakernel pushforward.
For the next thing, we're just making types a bit more explicit, only depend on , not .
Then we pack the semidirect product back up.
And pack the update back up.
At this point, we invoke the Infra-Disintegration Theorem.
We unpack what our new modified prior is, via the Infra-Disintegration Theorem.
and unpack the semidirect product.
Now we unpack and .
And unpack what is
And reabbreviate as ,
And then pack it back up into a suggestive form as a sort of expectation.
And we're done.
Proposition 40: If a likelihood function is 0 when , and and , then
And then we apply Markov's inequality for probability distributions.
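In its standard form, for any probability distribution $\mu$, any measurable $f \ge 0$, and any threshold $a > 0$ (generic names, standing in for the notation above), Markov's inequality says:
$$\mu(\{x : f(x) \ge a\}) \le \frac{\mathbb{E}_\mu[f]}{a}$$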
Also, (because is 0 when ), so monotonicity means that
So, we can get:
And we're done.
Proposition 41: The IKR-metric is a metric.
So, symmetry is obvious, as is one direction of identity of indiscernibles (that the distance from an infradistribution to itself is 0). That just leaves the triangle inequality and the other direction of identity of indiscernibles. For the triangle inequality, observe that for any particular (instead of the supremum), it would fulfill the triangle inequality, and then it's an easy exercise for the reader to verify that the same property applies to the supremum, so the only tricky part is the reverse direction of identity of indiscernibles, that two infradistributions which have a distance of 0 are identical.
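Spelling out the triangle-inequality step: for any single $f$ in the supremum class,
$$|h_1(f) - h_3(f)| \le |h_1(f) - h_2(f)| + |h_2(f) - h_3(f)| \le d_{IKR}(h_1,h_2) + d_{IKR}(h_2,h_3)$$
and taking the supremum over $f$ on the left side preserves the bound.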
First, if , then and must perfectly agree on all the Lipschitz functions. And then, because uniformly continuous functions are the uniform limit of Lipschitz functions, and must perfectly agree on all the uniformly continuous functions.
Now, we're going to need a somewhat more sophisticated argument. Let's say that the sequence is uniformly bounded and limits to in equipped with the compact-open topology (ie, we get uniform convergence of to on all compact sets). Then, for any infradistributions, will limit to . Here's why. For any , there's some compact set that accounts for almost all of why a function inputted into an infradistribution has the value it does. Then, what we can do is realize that will, in the limit, be incredibly close to , due to and disagreeing by a bounded amount outside the set and only disagreeing by a tiny amount on the set , and the Lipschitzness of .
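Schematically, with $\lambda$ standing in for the Lipschitz constant of the infradistribution, $C$ for the compact set, $B$ for the uniform bound on the $f_n$, and $\epsilon$ for how little the complement of $C$ matters:
$$|h(f_n) - h(f)| \lesssim \lambda \sup_{x \in C}|f_n(x) - f(x)| + 2B\epsilon$$
The first term vanishes as $n \to \infty$ by compact-open convergence, and the second is arbitrarily small since $\epsilon$ was arbitrary.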
Further, according to this mathoverflow answer, uniformly continuous functions are dense in the space of all continuous functions when is equipped with the compact-open topology, so given any function , we can find a sequence of uniformly continuous functions limiting to in the compact-open topology, and then,
And so, and agree on all continuous functions, and are identical, if they have a distance of 0, giving us our last piece needed to conclude that is a metric.
Proposition 42: The IKR-metric for infradistributions is strongly equivalent to the Hausdorff distance (w.r.t. the KR-metric) between their corresponding infradistribution sets.
Let's show both directions of this. For the first one, if the Hausdorff-distance between is , then for all a-measures in , there's an a-measure in that's only or less distance away, according to the KR-metric (on a-measures).
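Since Hausdorff distance does a lot of work from here on, here is a minimal Python sketch of it for finite point sets (hypothetical illustration code, not from the original post; `dist` stands in for whatever ground metric is in play, such as the KR-metric on a-measures):

```python
def directed_hausdorff(A, B, dist):
    # Farthest you'd ever need to travel from a point of A to reach the set B.
    return max(min(dist(a, b) for b in B) for a in A)

def hausdorff(A, B, dist):
    # Hausdorff distance: worst case over both directions.
    return max(directed_hausdorff(A, B, dist), directed_hausdorff(B, A, dist))

# Tiny usage example with points on the real line:
d = lambda x, y: abs(x - y)
print(hausdorff([0.0, 1.0], [0.1, 1.0, 2.0], d))  # 1.0, since 2.0 is distance 1 from {0.0, 1.0}
```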
Now, by LF-duality, a-measures in H correspond to hyperplanes above . Two a-measures being apart means, by the definition of the KR-metric for a-measures, that they will assign values at most distance apart for 1-Lipschitz functions in .
So, translating to the concave functional view of things, and being apart means that every hyperplane above h has another hyperplane above that can only differ on the 1-Lipschitz 1-bounded functions by at most , and vice-versa.
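For reference, the KR-metric on a-measures that this argument leans on is (up to the exact normalization conventions from earlier in the sequence):
$$d_{KR}((m_1,b_1),(m_2,b_2)) := \sup_{f\ 1\text{-Lipschitz},\ f \in [-1,1]} |(m_1(f)+b_1) - (m_2(f)+b_2)|$$
so two a-measures being $\delta$ apart means their values on 1-Lipschitz, $[-1,1]$-bounded test functions differ by at most $\delta$.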
Let's say we've got a Lipschitz function . Fix an affine functional/hyperplane that touches the graph of at . Let's try to set an upper bound on what can be. If is 1-Lipschitz and 1-bounded, then we can craft a above that's nearby, and
Symmetrically, we can swap and to get , and put them together to get:
For the 1-Lipschitz functions.
Let's tackle the case where is either more than 1-Lipschitz, or strays outside of . In that case, is 1-Lipschitz and bounded in . We can craft a that only differs on 1-Lipschitz functions by or less. Then, since, for affine functionals, and using that and are close on 1-Lipschitz functions, which and 0 are, we can go:
And then we swap out for with a known penalty in value; we're taking an overestimate at this point.
This argument works for all . And, even though we just got an upper bound, to rule out being significantly below , we could run through the same upper bound argument with instead of , to show that can't be more than above .
So, for all Lipschitz , . Thus, for all Lipschitz ,
And therefore,
This establishes one part of our inequalities. Now for the other direction.
Here's how things are going to work. Let's say we know the IKR-distance between and . Our task will be to stick an upper bound on the Hausdorff-distance between and . Remember that the Hausdorff-distance being low is equivalent to "any hyperplane above has a corresponding hyperplane above that attains similar values on the 1-or-less-Lipschitz functions".
So, let's say we've got , and a . Our task is, knowing , to craft a hyperplane above that's close to on the 1-Lipschitz functions. Then we can just swap and , and since every hyperplane above is close (on the 1-Lipschitz functions) to a hyperplane above , and vice-versa, and can be shown to be close. We'll use Hahn-Banach separation for this one.
Accordingly, let the set be the set of where , and:
That's... quite a mess. It can be thought of as the convex hull of the hypograph of , and the hypograph of restricted to the 1-Lipschitz functions in and shifted down a bit. If there was a that cuts into and scores lower than it, ie , we could have , and to observe that cuts into the set . Conversely, if an affine functional doesn't cut into the set , then it lies on-or-above the graph of .
Similarly, if undershoots over the 1-or-less-Lipschitz functions in , it'd also cut into . Conversely, if the hyperplane doesn't cut into , then it sticks close to over the 1-or-less-Lipschitz functions.
This is pretty much what is doing. If we don't cut into it, we're above and not too low on the functions with a Lipschitz norm of 1 or less.
For Hahn-Banach separation, we must verify that is convex and open. Convexity is pretty easy.
First verification: Those numbers at the front add up to 1 (easy to verify), are both in (this is trivial to verify), and + isn't 1 (this is a mix of two numbers that are both below , so this is easy). Ok, that condition is down. Next up: Is our mix of and 1-Lipschitz and in ? Yes, the mix of 1-Lipschitz functions in that range is 1-Lipschitz and in that range too. Also, is our mix of and still in ? Yup.
That leaves the conditions on the b terms. For the first one, just observe that mixing two points that lie strictly below (a hyperplane) lies strictly below it as well. For the second one, since is concave, mixing two points that lie strictly below its graph also lies strictly below its graph. Admittedly, there may be divide-by-zero errors, but only when is 0, in which case, we can have our new and be anything we want as long as it fulfills the conditions, it still defines the same point (because that term gets multiplied by 0 anyways). So is convex.
But... is open? Well, observe that the region under the graph of on is open, due to Lipschitzness of . We can wiggle and around a tiny tiny little bit in any direction without matching or exceeding the graph of . So, given a point in , fix your tiny little open ball around . Since can't be 1, when you mix with , you can do the same mix with your little open ball instead of the center point, and it just gets scaled down (but doesn't collapse to a point), making a little tiny open ball around your arbitrarily chosen point in . So is open.
Now, let's define a that should be convex, so we can get Hahn-Banach separation going (as long as we can show that and are disjoint). It should be chosen to forbid our separating hyperplane being too much above over the 1-or-less Lipschitz functions. So, let be:
Obviously, cutting into this means your hyperplane is too far above over the 1-or-less-Lipschitz functions in . And it's obviously convex, because 1-or-less-Lipschitz functions in are a convex set, and so is the region above a hyperplane .
All we need to do now for Hahn-Banach separation is show that the two sets are disjoint. We'll assume there's a point in both of them and derive a contradiction. So, let's say that is in both and . Since it's in ,
But also, with the 's and 's and fulfilling the appropriate properties, because it's in . Since and , we'll write as and as , where and are nonzero. Thus, we rewrite as:
We'll be folding into a single term so I don't have to write as much stuff. Also, is an affine function, so we can split things up with that, and make:
Remember, because . So, we get:
And, if , we get a contradiction straightaway because the left side is negative, and the right side is nonnegative. Therefore, , and we can rewrite as:
And now, we should notice something really really important. Since can't be , does constitute a nonzero part of , because .
However, is a 1-or-less Lipschitz function, and bounded in , due to being in ! If wasn't Lipschitz, then given any slope, you could find areas where it's ascending faster than that rate. This still happens when it's scaled down, and can only ascend or descend at a rate of 1 or slower there since it's 1-Lipschitz as well. So, in order for to be 1-or-less Lipschitz, must be Lipschitz as well. Actually, we get something stronger, if has a really high Lipschitz constant, then needs to be pretty high. Otherwise, again, wouldn't be 1-or-less Lipschitz, since of it is composed of , which has areas of big slope. Further, if has a norm sufficiently far away from 0, then needs to be pretty high, because otherwise f wouldn't be in , since of it is composed of which has areas distant from 0.
Our most recent inequality (derived under the assumption that there's a point in and ) was:
Assuming hypothetically that we were able to show that
then because , we'd get a contradiction, showing that and are disjoint. So let's shift our proof target to trying to show
Let's begin. So, our first order of business is that
This should be trivial to verify; remember that .
Now, , and is 1-Lipschitz, and so is . Our goal now is to impose an upper bound on the Lipschitz constant of . Let us assume that said Lipschitz constant of is above 1. We can find a pair of points where the rise of from one point to the other, divided by the distance between the points, is exceptionally close to (or equal to) the Lipschitz constant of . If we're trying to have slope up as hard as it possibly can while mixing to make , which is 1-Lipschitz, then the best case for that is one where is sloping down as hard as it can, at a rate of -1. Therefore, we have that
Ie, mixing sloping up as hard as possible and sloping down as hard as possible had better make something that slopes up at a rate of 1 or less. Rearranging this equation, we get:
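(writing $\gamma$ for the mixing weight and $\phi$ for the steep component, as stand-ins for the symbols used above)
$$\gamma \cdot \mathrm{Lip}(\phi) - (1 - \gamma) \le 1 \quad\Longrightarrow\quad \mathrm{Lip}(\phi) \le \frac{2 - \gamma}{\gamma} = \frac{2}{\gamma} - 1$$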
We can run through almost the same exact argument, but with the norm of . Let us assume that said norm is above 1. We can find a point where attains its maximum/minimum, whichever is further from 0. Now, if you're trying to have be as negative/positive as it possibly can be, while mixing to make , which lies in , then the best case for that is one where is as positive/negative as it can possibly be there, ie, has a value of -1 or 1. In both cases, we have:
Now we can proceed. Since we established that all three of these quantities (1, Lipschitz constant, and norm) are upper bounded by , we have:
And we have exactly our critical
inequality necessary to force a contradiction. Therefore, and must be disjoint. Since is open and convex, and is convex, we can do Hahn-Banach separation to get something that touches and doesn't cut into .
Therefore, we've crafted a that lies above , and is within of over the 1-or-less-Lipschitz functions in , because it doesn't cut into and touches .
This same argument works for any , and it works if we swap and . Thus, since hyperplanes above the graph of an infradistribution function or correspond to points in the corresponding and , and we can take any point in /affine functional above and make a point in /affine functional above (and same if the two are swapped) that approximately agree on , there's always a point in the other infradistribution set that's close in KR-distance and so and have
And with that, we get
And we're done! Hausdorff distance between sets is within a factor of 2 of the IKR-distance between their corresponding infradistributions.
Proposition 43: A Cauchy sequence of infradistributions converges to an infradistribution, ie, the space is complete under .
So, the space of closed subsets of is complete under the Hausdorff metric. Pretty much, by Proposition 42, a Cauchy sequence of infradistributions in the IKR-distance corresponds to a Cauchy sequence of infradistribution sets converging in Hausdorff-distance, so to verify completeness, we merely need to double-check that the Hausdorff-limit of the sets fulfills the various different properties of an infradistribution. Every point in , the limiting set, has the property that there exists some Cauchy sequence of points from the sets that limit to it, and also every Cauchy sequence of points from the sets has its limit point be in .
So, for nonemptiness, you have a sequence of nonempty sets of a-measures limiting to each other in Hausdorff-distance, so the limit is going to be nonempty.
For upper completion, given any point , and any a-measure, you can fix a Cauchy sequence limiting to , and then consider the sequence , which is obviously Cauchy (you're just adding the same amount to everything, which doesn't affect the KR-distance), and limits to , certifying that , so is upper-complete.
For closure, the Hausdorff limit of a sequence of closed sets is closed.
For convexity, given any two points and in , and any , we can fix a Cauchy sequence and converging to those two points, respectively, and then consider the sequence , which lies in (due to convexity of all the ), and converges to , witnessing that this point is in , and we've just shown convexity.
For normalization, it's most convenient to work with the positive functionals, and observe that, because all the and all the because of normalization, the same property must apply to the limit, and this transfers over to get normalization for your infradistribution set.
Finally, there's the compact-projection property. We will observe that the projection of the a-measures in to just their measure components, call the set , must converge in Hausdorff-distance. The reason for this is because if they didn't, then you could find some and arbitrarily late pairs of inframeasures where and have Hausdorff-distance , and then pick a point in (or ) that's KR-distance away from the other projection. Then you can pair that measure with some gigantic term to get a point in (or , depending on which one you're picking from), and there'd be no point in (or ) within distance of it, because the measure component would only be able to change by if you moved that far, and you need to change the measure component by to land within (or ).
Because this situation occurs infinitely often, it contradicts the Cauchy-sequence-ness of the sequence, so the projections must converge in Hausdorff distance on the space of measures over . Further, they're precompact by the compact-projection property for the (which are infradistributions), so their closures are compact. Further, the Hausdorff-limit of a series of compact sets is compact, so the Hausdorff limit of the projections (technically, their closures) is a compact set of measures. Further, any sequence which converges to some , has its projection being , which limits to show that is in this Hausdorff limit. Thus, all points in project down to be in a compact set of measures, and we have compact-projection for , which is the last condition we need to check to see if it's an infradistribution.
So, the Hausdorff-limit of a Cauchy sequence of infradistribution sets is an infradistribution set, and by the strong equivalence of the infra-KR metric and Hausdorff-distance, a Cauchy limit of the infra-KR metric must be an infradistribution, and the space is complete under the infra-KR metric.
Proposition 44: If a sequence of infradistributions converges in the IKR distance for one complete metric that is equipped with, it will converge in the IKR distance for all complete metrics that could be equipped with.
So, as a brief recap, $X$ could be equipped with many different complete metrics that produce the relevant topology. Each choice of metric affects what counts as a Lipschitz function, affecting the infra-KR metric on infradistributions, as well as the KR-distance between a-measures, and the Hausdorff-distance. So, we need to show that regardless of the metric on $X$, a sequence of convergent infradistributions will still converge. Use $d_1$ for the original metric on $X$ and $d_2$ for the modified metric on $X$, and similarly, $d_{KR,1}$ and $d_{KR,2}$ for the KR-metrics on measures, and $d_{haus,1}$ and $d_{haus,2}$ for the Hausdorff distances induced by the two metrics.
Remember, our infradistribution sets are closed under adding to them, and converge according to to the set .
What we'll be doing is slicing up the sets in a particular way. In order to do this, the first result we'll need is that, for all , the set
converges, according to , to the set
So, here's the argument for this. We know that the projection sets
are precompact, ie, have compact closure, and Hausdorff-limit according to to the set
(well, actually, they limit to the closure of that set)
According to our Lemma 3, this means that the set
(well, actually, its closure) is a compact set in the space of measures. Thus, it must have some maximal amount of measure present, call that quantity , the maximal Lipschitz constant of any of the infradistributions in the sequence. It doesn't depend on the distance metric is equipped with.
Now, fix any . There's some timestep where, for all greater timesteps, .
Now, picking a point in with , we can travel distance according to and get a point in , and the term can only change by or less when we move our a-measure a little bit, so we know that our nearby point lies in
But, what if our point in has ? Well then, we can pick some arbitrary point (by normalization for ), and go:
And then we have to be a little careful. by assumption. Also, we can unpack the distance to get
And the worst-case for distance, since all the measures have their total amount of measure bounded above by , would be being 1 on one of the measures and -1 on another one of the measures, producing:
So, the distance from to
according to is at most
And then, because this point has a value of at most
Because , the value upper bound turns into
Which is a sufficient condition for that mix of two points to be only distance from a point in with a upper bound on the term, so we have that the distance from
to
is at most
Conversely, we can flip and , to get this upper bound on the Hausdorff distance between these two sets according to .
And, since and are fixed, and for any , we can find some time where the distance between these two "lower parts" of the and sets is upper-bounded by
We can have this quantity limit to 0, showing that
For any .
Ok, this is part of our result. No matter which we chop off the infradistribution sets at, we get convergence of those chopped pieces according to .
Now, we'll need a second important result, that:
Now, we only have to establish one direction of low Hausdorff distance in the limit, that any point in the latter set is close to a point in the former set, because the former set is a subset of the latter set and has distance 0 to it.
What we can do is, because has the compact-projection property, the set is precompact, so for any , we can select finitely many points in it such that every point in is within distance of our finite subset according to . For these finitely many measures, there must be some term associated with them where , so you can just take the largest one of them, and let that be your . Then, all your finitely many measures, when paired with or any larger number, will be present in , so
Because all points in the latter set are close to one of finitely many points, which are all present in the former set, the Hausdorff-1 distance must be low.
At this point, we can truly begin. We have produced the dual results:
And
And we also know that, because limits to according to 1-Hausdorff distance, and projection is 1-Lipschitz,
Now, here's the thing. (The closure of) all of these sets are compact. For instance,
will always be compact, because any sequence in here must have a subsequence where its measure converges according to (due to the compact-projection property applied to ), and then because is bounded in , we can pick out another convergent subsequence for that. Plus, it's the intersection of a closed set () and another closed set , so it's closed. All sequences have a convergent subsequence and it's closed, so this set is compact. By identical arguments,
is compact. And for
it's the projection of a compact set from earlier arguments, and
must be precompact by the compact-projection property, so it has compact closure. The exact same argument applies to
as well.
Now, for compact sets, convergence in Hausdorff-distance only depends on the topology of the underlying space, not the specific metric it's equipped with, just as long as the metrics induce the same topology. And the weak topology on the space of measures, or on the space of a-measures, doesn't depend one bit on the metric that is equipped with, just on the topology. So, the properties of these sets limiting to each other still hold when has its metric changed. Because, for measures/a-measures, we end up using the metric, but that induces the same topology on the space of a-measures, so the compact sets still converge in the metric. So, we still have our triple results of:
And
And
Now, here's how to argue that limits to in . Fix some . From our limits above, there's some value of where
And for that value of , and that , we have that there's some value of where, for all greater numbers,
And
Now, we're going to need to go in two directions for this. First, we pick a point in and show that it's close to a point in . Second, we pick a point in and show it's close to a point in .
Let . We have two possibilities. One possibility is that . Then, because
we only have to go distance to get to . The second possibility is that .
In this case, lies in the set
Which has distance from
Because we have that
Just scooch over and keep the term the same. Additionally, the set
has distance from the set
Because we have:
Further, the set
is a subset of , because is upper-closed. So, either way, we only have to travel 2-distance from to get to
Now for the reverse direction, starting with a point and getting to a nearby point in . Again, we can split into two cases. In our first case, , and because
we only have to go distance to get to . The second possibility is that . In such a case, would be guaranteed to lie in the set
which has distance from the set
Because we have:
Further, the set
has distance according to from the set
Because the latter components are the projection of the sets
and
And we already know that
So, given our point , we just have to go distance to get to the set
And all points in this set lie in because of upper completion.
Thus, given any , there's a tail of the sequence where the are all within distance (according to ) of , so if thinks that converge to , will think that as well. Further, the metric on which induces and was arbitrary, so a sequence of infradistributions converging happens regardless of which complete metric is equipped with.
Proposition 45: If a sequence of infradistributions converges to in the infra-KR distance, then for all bounded continuous functions , .
Now, the infra-KR metric is:
So, to begin with, if converges to , all bounded Lipschitz functions must have or else the infra-KR distance wouldn't converge.
For the next two, since the infra-KR distance is strongly equivalent to Hausdorff distance, and we know that
is always precompact, and they Hausdorff-limit to
And we have our Lemma 3 that the union of compact-sets which Hausdorff-limit to something is compact, so the set
is compact (well, actually precompact, but just take the closure).
Because compactness of a set of measures implies that the amount of measure doesn't run off to infinity, there's some that's a shared Lipschitz constant for all the .
Also, any uniformly continuous function can be built as the uniform limit of Lipschitz-continuous functions from above and below, so given some uniformly continuous , we can make a sequence limiting to it from above, and a sequence limiting to it from below. Then, we have:
And similarly, we can get:
Now, regardless of and ,
So, even though we don't necessarily know that the limit actually exists for , we at least know that all the values are bounded in an interval of known maximum size, which converges to the interval
Which, by monotonicity for , lies in that interval.
So, all the limit points of the sequence are in that interval. Now, as gets unboundedly high, the difference between and gets unboundedly small, so for gigantic , we have that any limit points of the sequence must be in a really tiny interval. Taking the limit, we have that the interval crunches down to a single point, and actually limits to . We've shown it now for uniformly continuous functions.
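In one line, writing $f^{lo}_k$ and $f^{hi}_k$ for the Lipschitz functions limiting to $f$ from below and above (stand-in names), monotonicity gives
$$h(f^{lo}_k) \le \liminf_{n\to\infty} h_n(f) \le \limsup_{n\to\infty} h_n(f) \le h(f^{hi}_k)$$
and letting $k \to \infty$, both ends converge to $h(f)$ (uniform convergence plus the Lipschitzness of $h$ as a functional), pinching the middle.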
Time to expand this to continuous functions in full generality. Again,
$$\{m \mid \exists b, n : (m,b) \in H_n\}$$
is precompact, so this implies that for all $\epsilon$, there is a compact set $C_\epsilon$ where all minimal points of $H_n$ (regardless of the $n$! Even for the final infradistribution set $H$!) have $\epsilon$ or less measure outside of that compact set.
Transferring to functionals, this means that for all the $h_n$ (and $h$), $C_\epsilon$ is an $\epsilon$-almost-support, and any two functions that agree on that set have expectations correspondingly close together.
Given some arbitrary , let be identical to on , (ie, uniformly continuous on that compact set), and extend it in an arbitrary uniformly continuous way to all of while staying in , by the Tietze Extension Theorem.
Regardless of the , since is a -almost-support for , we have that
Why? Well, and are identical on a -almost support for , so the magnitude of their difference is proportional to , and the maximum level of difference between the two, and and are both in , so they can differ by at most twice that much. The same result extends to the limit itself.
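Quantitatively, this is saying something like (with $f_\epsilon$ as a stand-in name for the function just constructed):
$$|h_n(f) - h_n(f_\epsilon)| \le \epsilon\,\|f - f_\epsilon\|_\infty \le 2\epsilon\,\|f\|_\infty$$
using that $f$ and $f_\epsilon$ agree on the $\epsilon$-almost-support $C_\epsilon$ and are both bounded by $\|f\|_\infty$; the exact constant depends on the normalization of almost-supports from earlier in the sequence.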
Because is bounded, and is arbitrary, we have that limits to uniformly in .
Now, we can go:
And now, to invoke the Moore-Osgood theorem to swap the two limits, we need two results. One is that, for all ,
(which is true because was selected to be uniformly continuous).
The second result we need is that for all ,
uniformly in . Which is true. So, we can invoke the Moore-Osgood theorem and swap the two results, to get
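(with $f_\epsilon$ again standing in for the uniformly continuous approximations)
$$\lim_{n\to\infty} h_n(f) = \lim_{n\to\infty}\lim_{\epsilon\to 0} h_n(f_\epsilon) = \lim_{\epsilon\to 0}\lim_{n\to\infty} h_n(f_\epsilon) = \lim_{\epsilon\to 0} h(f_\epsilon) = h(f)$$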
So, we have our final result that
For all continuous bounded functions , and we're done.
Proposition 46: A set of infradistributions is precompact in the topology induced by the IKR distance iff:
1: There's an upper bound on the Lipschitz constant of all the infradistributions in the set
2: There's a sequence of compact sets , one for each , that are compact -almost-supports for all infradistributions in the set.
3: The set of infradistributions is b-uniform.
This proof will proceed in three phases. The first phase is showing that compactness implies conditions 1 and 2. The second phase is showing that a failure of condition 3 permits you to construct a sequence with no convergent subsequence, so a failure of condition 3 implies non-precompactness, and taking the contrapositive, precompactness implies condition 3. That gets us one half of the iff implication, that precompactness implies the three conditions. For the second half of the iff implication, we assume the three conditions, and construct a convergent subsequence.
So, for our first step, due to working in Hausdorff spaces, we can characterize precompactness as "is a subset of a compact set".
Also, the projection mapping of type
Which takes a closed set of a-measures (an infradistribution) and projects it down (and takes the closure) to make a compact set of measures (by the compact-projection property), is Lipschitz (projection of sets down to one coordinate keeps their Hausdorff-distance the same or contracts it), so it's continuous. So, a compact set of infradistributions (because the infra-KR metric is strongly equivalent to the Hausdorff-distance), would get mapped to a compact set of sets of measures (because the image of a compact set is compact), which by Lemma 3, unions together to make a compact set of measures.
Doing the same process (taking your precompact set of infradistributions, mapping it through the projection, unioning together all the sets) makes a subset of that compact set of measures, so it's precompact.
Also, the necessary-and-sufficient condition for precompactness of a set of measures is that: There be a maximum amount of measure present, and for all there is a compact set where all the measures assign measure outside of that compact set.
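In symbols, writing $\mathcal{M}$ for the set of measures and $X$ for the underlying space (stand-in names), this is the standard boundedness-plus-tightness criterion:
$$\sup_{m \in \mathcal{M}} m(X) < \infty \qquad\text{and}\qquad \forall \epsilon > 0\ \exists C_\epsilon \text{ compact}: \sup_{m \in \mathcal{M}} m(X \setminus C_\epsilon) \le \epsilon$$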
So, if you take a precompact set of infradistributions, all the measure components of points in any of them have a uniform upper bound on the amount of measure present, and we also have the shared compact almost-support property. So, precompactness implies conditions 1 and 2.
Time for phase 2 of our proof, showing that a failure of condition 3 implies that there's a sequence from it with no convergent subsequence in the KR-metric.
Assume, for contradiction, that we indeed have a precompact set which fails condition 3. Using I to index your set of infradistributions, Condition 3 is:
Where is the set formed from the set by deleting all points with and taking the upper completion again. Negating this, we see that the set of infradistribution sets failing this condition is stated as:
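Spelled out, with $H_i^{\le b^*}$ as a stand-in notation for that truncated-and-regenerated set, condition 3 should read
$$\forall \epsilon > 0\ \exists b^*\ \forall i \in I:\ d_{haus}(H_i, H_i^{\le b^*}) \le \epsilon$$
and so its negation, the condition our hypothetical bad set satisfies, is
$$\exists \epsilon > 0\ \forall b^*\ \exists i \in I:\ d_{haus}(H_i, H_i^{\le b^*}) > \epsilon$$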
So, let be your of choice, and let be the infradistribution such that .
Because we're assuming that this sequence of infradistributions was selected from a precompact set, we have a guarantee that the sequence has a convergent subsequence limiting to some . We'll still be using n as our limiting variable; hopefully this doesn't cause too much confusion.
Now, from our earlier proof of Proposition 44, we can crib two results from the proof. From this proof, we know that because limits to in Hausdorff-distance,
and also,
For any . To craft this into a more usable form, we can realize that for all ,
So the distance from the former set to the latter set is 0. Also, any point in can be written as . Either , in which case the same point is present in and the distance to enter that set is 0, or , in which case the m component is present in , and from
For large , you just have to adjust the component a little bit to and then you know there's some , so by upper completion, , and this point is close to .
We took a point in and showed it's in (trivially), and took a point in and showed there's a nearby point in , so we have our modified result that:
For another modified result, due to the fact that we know
We can take any point in , descend to a point in (but cut off at ), shift over a bit to get to (but cut off at ), and add the same amount of value to this point as you took off, to make a point in that's nearby to the point you started with, and flip the two sets, to argue that
Now, here's what you do from here. We know our value. Because of the fact that
we can identify some finite value (call it ) where, for it and all greater values,
Locking this value in, and because of
and limiting to , so
We can find some finite where, for all greater values,
and
There's one last thing to note. The sequence was selected as a subsequence of a sequence of infradistributions selected so that the Hausdorff-distance between an infradistribution and its truncation of minimal points at a certain value was always or more.
Accordingly let be the value of the cutoff for (ie, the index of before we did the reindexing when we passed to a subsequence). Due to our construction process for the , we have that:
Further, diverges to infinity, so there's some where . Because, for that , , we have that
Taking stock of all we have, we know that there is some n where:
and
and
and
and, by our construction process for the sequence,
So now we can go:
But we just showed , a contradiction. Our one assumption that we made was that there could be a set of infradistributions that was both precompact and that failed to meet the shared b-uniformity condition. Therefore, if a set of infradistributions is precompact, it must fulfill the shared b-uniformity condition.
Because we've shown that precompactness implies a Lipschitz bound and shared compact-almost-support in part 1 of the proof, and that precompactness implies the shared b-uniformity condition, we have one direction of our iff statement. Precompactness implies these three properties.
Now we'll go in the other direction and establish that if these three properties are fulfilled, then every sequence of infradistributions has a convergent subsequence.
So, let's say we have some set of infradistributions that fulfills the following three properties:
(this is bounded Lipschitz constant)
(this is shared almost-compact-support)
(this is the b-uniformity condition)
Note that is but you chop off all the points in it with and regenerate it via upper-completion.
First, the compact almost-support condition and bounded amount of measure (and closure) are necessary-and-sufficient conditions for a set of measures to be compact. Thus, letting be defined as:
(ie, measures where the measure outside of the compact set is or less, for all , and the amount of measure is upper-bounded by , where that sequence of compact sets and measure upper bound came from the relevant sequence of compact sets and measure upper bound on the set , from the fact that we assumed a Lipschitz upper bound and shared compact-almost-support for it).
We know that is a compact set. All the measure components of all the points in all the lie in this set. Thus, all sets can be thought of as being a subset of the space
In particular, all our (from our arbitrarily selected sequence) are a subset of this space.
Now, here's what we do. Fix any . From the b-uniformity condition on the , there is some quantity where
What we're going to do is find a subsequence of the sequence where the sequence converges in Hausdorff-distance.
Here's how to do it. We can take each and chop it off at a value of , to make a closed set which is a subset of
Which, being a product of two compact sets, is compact. Further, the space of compact subsets of a compact space (equipped with a Hausdorff distance-metric) is compact. So, we can isolate some subsequence where the sets converge in Hausdorff-distance. If sets converge in Hausdorff-distance, their upper completions do too, so we have isolated a subsequence of our sequence where the sets converge in Hausdorff-distance. Also, each infradistribution set is only Hausdorff-distance away, at most, from the corresponding . So, for sufficiently large , the subsequence we picked out is all wandering around in a ball of size .
Now, here's what we do. Start with your sequence. Use the argument we described above for to isolate a subsequence which eventually wanders around in a ball (w.r.t. Hausdorff-distance) of size 2. Now, use the argument for to isolate a subsequence of that which eventually wanders around in a ball of size 1. And, y'know, repeat for all finite , to get a subsequence embedded in all previous subsequences which eventually wanders around in a ball of size .
Now build one final subsequence, which takes the first element of the first subsequence, the second element of the second subsequence, the third element of the third subsequence, and so on. It eventually enters the tail of the sequence for all finite , so, regardless of , the tail of that sequence starts wandering around in a ball of size . Thus, the sequence is actually Cauchy, and must converge, as we've previously shown that the space is complete in the KR/Hausdorff metric.
Assuming the three conditions on a set of infradistributions has let us show that every sequence has a convergent subsequence, and thus must be precompact, so we have the reverse direction of our iff statement and we're done.
Proposition 47: When is a compact Polish space, the spaces of cohomogenous, crisp, and sharp infradistributions are all compact in equipped with the infra-KR metric.
So, from Proposition 46, the necessary-and-sufficient conditions for a set of infradistributions to be precompact are:
1: Bounded Lipschitz constant/bounded amount of measure on minimal points. 1-Lipschitz, C-additive, cohomogenous, crisp, and sharp infradistributions fulfill this because of their iff minimal point characterizations.
2: Shared compact almost-supports. is compact by assumption, and it's the whole space so it must be a support of everything, and thus an -almost-support of everything, so this is trivially fulfilled for all infradistributions when is compact.
3: b-uniformity. Homogenous, cohomogenous, crisp, and sharp infradistributions fulfill this because they all have their minimal points having , and the condition is "there's gotta be some value you can go up to in order to have a guarantee of being within of the full set in Hausdorff-distance if you delete all the minimal points with a higher value, for all ".
Thus, cohomogenous, crisp, and sharp infradistributions fulfill the necessary-and-sufficient conditions for precompactness, and all we need is to check that the set of them is closed in the KR-metric.
To do this, we'll invoke Proposition 45, that: If a sequence of infradistributions converges to in the infra-KR distance, then for all bounded continuous functions , .
The characterization for cohomogenity was that So, we can go:
Showing that the limit of cohomogenous infradistributions is cohomogenous, and we've verified closure, which is the last property we needed for cohomogenity.
The characterization for crispness was that: for . To show it's preserved under limits, we can go:
Showing that the limit of crisp infradistributions is crisp, and we've verified closure. Sharpness is a bit more tricky.
Let's say a sequence of sharp infradistributions limits to , and all the are associated with the compact set . The minimal points of the consist of all probability distributions supported over , with a value of 0. Thus, all the sets can be written as , and so, if they converge in Hausdorff-distance, then the sets of probability distributions must converge in Hausdorff-distance, which is impossible if don't converge in Hausdorff-distance, because the dirac-delta distributions on points in the sets can transport a failure of Hausdorff-convergence of the sets up to a failure of Hausdorff-convergence of the sets of probability distributions.
Thus, the converge to a compact set in Hausdorff-distance.
We also know that, because sharp infradistributions are crisp infradistributions, and crisp infradistributions are preserved under limits, all we have to check is if the minimal points of consist exactly of all probability distributions supported over . Now, is the closed convex hull of all the dirac-delta distributions on points in , and all those points have a sequence from the that converge to them, so the associated dirac-delta distributions converge and witness that all the dirac-delta distributions on points in are present in the set . So, because infradistribution sets are closed and convex, all of must be present as minimal points in . Now we just need to rule out the presence of additional points.
Let's say we've got some probability distribution which is not supported entirely on ; there's probability mass outside that set. Because probability distributions in Polish spaces have the property that the probability on open supersets of the set of interest can be shrunk down to have arbitrarily similar measure, we can find some open superset of , call it , which has probability mass outside of it. Any point outside of must be some distance away from , because otherwise, you could pick a sequence of points in which gets arbitrarily close to the (closed) complement of , find a convergent subsequence since is compact, and you'd have a limit point which is in (due to closure) and also in the complement of (due to getting arbitrarily close to said closed set), disproving that the two sets are disjoint (because is a superset of ).
Ok, so our hypothetical "bad" probability distribution has probability measure at a distance of or more from our set of interest, . The KR distance is equivalent to the earthmover distance, which is "how much effort would it take to move this pile of dirt/pile of probability mass into the other distribution/pile of dirt".
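That equivalence is the Kantorovich-Rubinstein duality: for probability distributions $\mu, \nu$ on a metric space (and modulo the boundedness restriction on test functions, which is harmless on the compact sets in play here),
$$\sup_{f\ 1\text{-Lipschitz}}\left|\int f\,d\mu - \int f\,d\nu\right| = \inf_{\gamma \in \Pi(\mu,\nu)} \int d(x,y)\,d\gamma(x,y)$$
where $\Pi(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$, and the right-hand side is exactly the earthmover cost.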
All minimal points in must have a sequence of minimal points in limiting to them, because it's the Hausdorff-limit of those infradistributions. So, we've got some sequence limiting to our hypothetical bad distribution , but all the lie in .
There is some value where , and also where . Now, we can get something really interesting.
So, we agree that has probability mass a distance of or more away from the set , right? This means that the earthmover distance from to any point in must be or more, because you've gotta move measure a distance of at the very least.
However, the earthmover distance from to is strictly below , and because , it only needs an earthmover distance of less than to arrive at a probability distribution in , because all the dirt piled up in is only distance away from . So, the distance from to is only
distance. But we know it's impossible for it to be any closer than distance from that set, so we have a contradiction, and no such can exist in . Thus, has all the probability distributions over and nothing else, so the limit of sharp infradistributions is sharp, and we're done.
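To recap the arithmetic of that contradiction, write $\epsilon$ for the offending probability mass, $\delta$ for its distance from $C$, $\psi$ for the bad distribution, $\psi_n$ for the sequence approaching it, and $\Delta C$ for the distributions supported on $C$ (all stand-ins for the elided quantities above). Then
$$d_{KR}(\psi, \Delta C) \ge \epsilon\delta \qquad\text{but}\qquad d_{KR}(\psi, \Delta C) \le d_{KR}(\psi, \psi_n) + d_{haus}(C_n, C) < \frac{\epsilon\delta}{2} + \frac{\epsilon\delta}{2} = \epsilon\delta$$
which is the promised contradiction.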