LBIT Proofs 6: Propositions 39-47

post by Diffractor · 2020-12-16T03:33:22.665Z


 

Proposition 39: Given a crisp infradistribution  over , an infrakernel  from  to infradistributions over , and suggestively abbreviating  as (hypothesis ) and  as  (your infraprior where you have Knightian uncertainty over how to mix the hypotheses), then

Proof: Assume that  and  are functions of type  and  respectively, ie, likelihood and utility don't depend on which hypothesis you're in, just on what happens. First, unpack our abbreviations and what an update means. 

Then use the definition of an infrakernel pushforward.

For the next step, we're just making the types a bit more explicit:  only depend on , not .

Then we pack the semidirect product back up.

And pack the update back up.

At this point, we invoke the Infra-Disintegration Theorem.

We unpack what our new modified prior is, via the Infra-Disintegration Theorem.

and unpack the semidirect product.

Now we unpack  and .

And unpack what  is.

And reabbreviate  as ,

And then pack it back up into a suggestive form as a sort of expectation.

And we're done.

Proposition 40: If a likelihood function  is 0 when , and  and , then 

Then we apply Markov's inequality: for any probability distribution,

Also, (because  is 0 when ), so monotonicity means that

So, we can get:

And we're done.
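(As a quick illustration of the inequality being leaned on here: a minimal Python sketch, not from the original post, numerically checking Markov's inequality P(X ≥ a) ≤ E[X]/a for a nonnegative random variable.)

```python
import numpy as np

# Numerical check of Markov's inequality for a nonnegative random variable:
# P(X >= a) <= E[X] / a.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # nonnegative samples, E[X] = 2

for a in [1.0, 2.0, 5.0]:
    tail = np.mean(x >= a)   # empirical P(X >= a)
    bound = np.mean(x) / a   # Markov bound E[X] / a
    print(f"a={a}: P(X>=a)={tail:.4f} <= E[X]/a={bound:.4f}")
    assert tail <= bound + 1e-3  # holds up to sampling noise
```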

 

Proposition 41: The IKR-metric is a metric.

So, symmetry is obvious, as is one direction of identity of indiscernibles (that the distance from an infradistribution to itself is 0). That just leaves the triangle inequality and the other direction of identity of indiscernibles. For the triangle inequality, observe that for any particular  (instead of the supremum), the triangle inequality would hold, and it's an easy exercise for the reader to verify that the property carries over to the supremum. So the only tricky part is the reverse direction of identity of indiscernibles: two infradistributions which have a distance of 0 are identical.

First, if , then  and  must perfectly agree on all the Lipschitz functions. And then, because uniformly continuous functions are the uniform limit of Lipschitz functions,  and  must perfectly agree on all the uniformly continuous functions.

Now, we're going to need a somewhat more sophisticated argument. Let's say that the sequence  is uniformly bounded and limits to  in  equipped with the compact-open topology (ie, we get uniform convergence of  to  on all compact sets). Then, for any infradistributions,  will limit to . Here's why. For any , there's some compact set  that accounts for almost all of why a function inputted into an infradistribution has the value it does. Then, what we can do is realize that  will, in the limit, be incredibly close to , due to  and  disagreeing by a bounded amount outside the set  and only disagreeing by a tiny amount on the set , and the Lipschitzness of .

Further, according to this mathoverflow answer, uniformly continuous functions are dense in the space of all continuous functions when  is equipped with the compact-open topology, so given any function , we can find a sequence of uniformly continuous functions  limiting to  in the compact-open topology, and then,

And so, if they have a distance of 0,  and  agree on all continuous functions and are thus identical, giving us the last piece needed to conclude that  is a metric.
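(A tiny Python sketch of the triangle-inequality step above, not from the original post: represent each functional by its values on finitely many test functions, so each test function gives a pseudometric, and the IKR-style distance is the supremum over them. The finite representation is purely a hypothetical stand-in.)

```python
import numpy as np

# Each row stands in for a functional h, recorded by its values h(f) on
# finitely many test functions f. The per-function distance is
# d_f(h1, h2) = |h1(f) - h2(f)|; the IKR-style distance is the sup over f.
rng = np.random.default_rng(1)
H = rng.normal(size=(3, 50))  # three functionals, 50 test functions

def sup_distance(u, v):
    return float(np.max(np.abs(u - v)))  # supremum over the test functions

d01 = sup_distance(H[0], H[1])
d12 = sup_distance(H[1], H[2])
d02 = sup_distance(H[0], H[2])
assert d02 <= d01 + d12  # the triangle inequality survives the supremum
print(d01, d12, d02)
```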

 

Proposition 42: The IKR-metric for infradistributions is strongly equivalent to the Hausdorff distance (w.r.t. the KR-metric) between their corresponding infradistribution sets.

Let's show both directions of this. For the first one, if the Hausdorff-distance between  is , then for all a-measures  in , there's an a-measure  in  that's only  or less distance away, according to the KR-metric (on a-measures).
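(Since everything below turns on Hausdorff distance, here's a minimal Python sketch of the two-sided sup-inf definition, not from the post; finite tuples and an L1 distance are crude hypothetical stand-ins for a-measures and the KR metric.)

```python
import numpy as np

# Hausdorff distance between two finite sets A, B under a base metric d:
# max over points of one set of the distance to the nearest point of the other.
def hausdorff(A, B, d):
    sup_a = max(min(d(a, b) for b in B) for a in A)
    sup_b = max(min(d(a, b) for a in A) for b in B)
    return max(sup_a, sup_b)

# Stand-in a-measures: (weights of a two-point measure, b term); L1 as "KR".
d = lambda x, y: float(np.abs(np.asarray(x) - np.asarray(y)).sum())
A = [(0.5, 0.5, 0.0), (1.0, 0.0, 0.2)]
B = [(0.6, 0.4, 0.0), (1.0, 0.0, 0.3)]
print(hausdorff(A, B, d))  # small -> every point of A has a nearby point of B
```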

Now, by LF-duality, a-measures in H correspond to hyperplanes above . Two a-measures being  apart means, by the definition of the KR-metric for a-measures, that they will assign values at most  distance apart for 1-Lipschitz functions in .

So, translating to the concave functional view of things,  and  being  apart means that every hyperplane above h has another hyperplane above  that can only differ on the 1-Lipschitz 1-bounded functions by at most , and vice-versa.

Let's say we've got a Lipschitz function . Fix an affine functional/hyperplane  that touches the graph of  at . Let's try to set an upper bound on what  can be. If  is 1-Lipschitz and 1-bounded, then we can craft a  above  that's nearby, and

Symmetrically, we can swap  and  to get , and put them together to get:

For the 1-Lipschitz functions.

Let's tackle the case where  is either more than 1-Lipschitz, or strays outside of . In that case,  is 1-Lipschitz and bounded in . We can craft a  that only differs on 1-Lipschitz functions by  or less. Then, since, for affine functionals,  and using that  and  are close on 1-Lipschitz functions, which  and 0 are, we can go:

And then we swap out  for  with a known penalty in value; we're taking an overestimate at this point.

This argument works for all . And, even though we just got an upper bound, to rule out  being significantly below , we could run through the same upper bound argument with  instead of , to show that  can't be more than  above .

So, for all Lipschitz . Thus, for all Lipschitz ,

And therefore,

This establishes one part of our inequalities. Now for the other direction.

Here's how things are going to work. Let's say we know the IKR-distance between  and . Our task will be to stick an upper bound on the Hausdorff-distance between  and . Remember that the Hausdorff-distance being low is equivalent to "any hyperplane above  has a corresponding hyperplane above  that attains similar values on the 1-or-less-Lipschitz functions".

So, let's say we've got , and a . Our task is, knowing , to craft a hyperplane above  that's close to  on the 1-Lipschitz functions. Then we can just swap  and , and since every hyperplane above  is close (on the 1-Lipschitz functions) to a hyperplane above , and vice-versa,  and  can be shown to be close. We'll use Hahn-Banach separation for this one.

Accordingly, let the set  be the set of  where , and:

That's... quite a mess. It can be thought of as the convex hull of the hypograph of , and the hypograph of  restricted to the 1-Lipschitz functions in  and shifted down a bit. If there was a  that cuts into  and scores lower than it, ie , we could have , and  to observe that  cuts into the set . Conversely, if an affine functional doesn't cut into the set , then it lies on-or-above the graph of .

Similarly, if  undershoots  over the 1-or-less-Lipschitz functions in , it'd also cut into . Conversely, if the hyperplane  doesn't cut into , then it sticks close to  over the 1-or-less-Lipschitz functions.

This is pretty much what  is doing. If we don't cut into it, we're above  and not too low on the functions with a Lipschitz norm of 1 or less.

For Hahn-Banach separation, we must verify that  is convex and open. Convexity is pretty easy.

First verification: Those numbers at the front add up to 1 (easy to verify), are both in  (trivial to verify), and  +  isn't 1 (it's a mix of two numbers that are both below , so this is easy). Ok, that condition is down. Next up: Is our mix of  and  1-Lipschitz and in ? Yes, the mix of 1-Lipschitz functions in that range is 1-Lipschitz and in that range too. Also, is our mix of  and  still in ? Yup.

That leaves the conditions on the b terms. For the first one, just observe that the mix of two points that lie strictly below  (a hyperplane) lies strictly below it as well. For the second one, since  is concave, the mix of two points that lie strictly below its graph also lies strictly below its graph. Admittedly, there may be divide-by-zero errors, but only when  is 0, in which case we can have our new  and  be anything we want (as long as it fulfills the conditions); it still defines the same point, because that term gets multiplied by 0 anyways. So  is convex.
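(Spelling out that concavity step in generic notation, since the original displays didn't survive extraction; p is the mixing weight, and the last inequality is exactly concavity of h:)

```latex
% If b_1 < h(f_1) and b_2 < h(f_2), then for p in (0,1):
p\,b_1 + (1-p)\,b_2 \;<\; p\,h(f_1) + (1-p)\,h(f_2) \;\le\; h\bigl(p\,f_1 + (1-p)\,f_2\bigr)
```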

But... is  open? Well, observe that the region under the graph of  on  is open, due to Lipschitzness of . We can wiggle  and  around a tiny tiny little bit in any direction without matching or exceeding the graph of . So, given a point in , fix your tiny little open ball around . Since  can't be 1, when you mix with , you can do the same mix with your little open ball instead of the center point, and it just gets scaled down (but doesn't collapse to a point), making a little tiny open ball around your arbitrarily chosen point in . So  is open.

Now, let's define a  that should be convex, so we can get Hahn-Banach separation going (as long as we can show that  and  are disjoint). It should be chosen to forbid our separating hyperplane from being too much above  over the 1-or-less Lipschitz functions. So, let  be:

Obviously, cutting into this means your hyperplane is too far above  over the 1-or-less-Lipschitz functions in . And it's obviously convex, because 1-or-less-Lipschitz functions in  are a convex set, and so is the region above a hyperplane .

All we need to do now for Hahn-Banach separation is show that the two sets are disjoint. We'll assume there's a point in both of them and derive a contradiction. So, let's say that  is in both  and . Since it's in ,

But also,  with the 's and 's and  fulfilling the appropriate properties, because it's in . Since  and , we'll write  as  and  as , where  and  are nonzero. Thus, we rewrite as:

We'll be folding  into a single  term so I don't have to write as much stuff. Also,  is an affine function, so we can split things up with that, and make:

Remember,  because . So, we get:

And, if , we get a contradiction straightaway because the left side is negative, and the right side is nonnegative. Therefore, , and we can rewrite as:

And now, we should notice something really really important. Since  can't be ,  does constitute a nonzero part of , because .

However,  is a 1-or-less Lipschitz function, and bounded in , due to being in ! If  weren't Lipschitz, then given any slope, you could find areas where it's ascending faster than that rate. This still happens when it's scaled down, and  can only ascend or descend at a rate of 1 or slower there, since it's 1-Lipschitz as well. So, in order for  to be 1-or-less Lipschitz,  must be Lipschitz as well. Actually, we get something stronger: if  has a really high Lipschitz constant, then  needs to be pretty high. Otherwise, again,  wouldn't be 1-or-less Lipschitz, since  of it is composed of , which has areas of big slope. Further, if  has a norm sufficiently far from 0, then  needs to be pretty high, because otherwise f wouldn't be in , since  of it is composed of , which has areas distant from 0.

Our most recent inequality (derived under the assumption that there's a point in  and ) was:

Assuming hypothetically we were able to show that

then because , we'd get a contradiction, showing that  and  are disjoint. So let's shift our proof target to trying to show

Let's begin. So, our first order of business is that

This should be trivial to verify; remember that 

Now, , and  is 1-Lipschitz, and so is . Our goal now is to impose an upper bound on the Lipschitz constant of . Let us assume that said Lipschitz constant of  is above 1. We can find a pair of points where the rise of  from the first point to the second, divided by the distance between the points, is exceptionally close to the Lipschitz constant of , or equal to it. If we're trying to have  slope up as hard as it possibly can while mixing to make , which is 1-Lipschitz, then the best case is one where  is sloping down as hard as it can, at a rate of -1. Therefore, we have that

Ie, mixing  sloping up as hard as possible and  sloping down as hard as possible had better make something that slopes up at a rate of 1 or less. Rearranging this equation, we get:
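(The displayed rearrangement didn't survive extraction; here's a hedged reconstruction of its plausible shape, writing p for the mixing weight on the steep function and L for its Lipschitz constant:)

```latex
% Mixing a slope-L stretch with a slope -1 stretch must slope at most 1:
p\,L - (1-p) \;\le\; 1
% Rearranging gives the Lipschitz bound
L \;\le\; \frac{2 - p}{p}
```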

We can run through almost the same exact argument, but with the norm of . Let us assume that said norm is above 1. We can find a point where  attains its maximum/minimum, whichever is further from 0. Now, if you're trying to have  be as negative/positive as it possibly can be, while mixing to make , which lies in , then the best case for that is one where  is as positive/negative as it can possibly be there, ie, has a value of -1 or 1. In both cases, we have:

Now we can proceed. Since we established that all three of these quantities (1, Lipschitz constant, and norm) are upper bounded by , we have:

And we have exactly our critical

inequality necessary to force a contradiction. Therefore,  and  must be disjoint. Since  is open and convex, and  is convex, we can do Hahn-Banach separation to get something that touches  and doesn't cut into .

Therefore, we've crafted a  that lies above , and is within  of  over the 1-or-less-Lipschitz functions in , because it doesn't cut into  and touches .

This same argument works for any , and it works if we swap  and . Thus, since hyperplanes above the graph of an infradistribution function  or  correspond to points in the corresponding  or , and we can take any point in /affine functional above  and make a point in /affine functional above  (and the same with the two swapped) that approximately agrees on , there's always a point in the other infradistribution set that's close in KR-distance, and so  and  have 

And with that, we get

And we're done! Hausdorff distance between sets is within a factor of 2 of the IKR-distance between their corresponding infradistributions.

 

Proposition 43: A Cauchy sequence of infradistributions converges to an infradistribution, ie, the space  is complete under .

So, the space of closed subsets of  is complete under the Hausdorff metric. By Proposition 42, a Cauchy sequence of infradistributions  in the IKR-distance corresponds to a Cauchy sequence of infradistribution sets  converging in Hausdorff-distance, so to verify completeness, we merely need to check that the Hausdorff-limit of the  sets fulfills the various properties of an infradistribution. Every point in , the limiting set, has some Cauchy sequence of points from the  sets limiting to it, and conversely, every Cauchy sequence of points from the  sets has its limit point in .

So, for nonemptiness, you have a sequence of nonempty sets of a-measures limiting to each other in Hausdorff-distance, so the limit is going to be nonempty.

For upper completion, given any point , and any  a-measure, you can fix a Cauchy sequence  limiting to , and then consider the sequence , which is obviously Cauchy (you're just adding the same amount to everything, which doesn't affect the KR-distance), and limits to , certifying that , so  is upper-complete.

For closure, the Hausdorff limit of a sequence of closed sets is closed.

For convexity, given any two points  and  in  , and any , we can fix a Cauchy sequence  and  converging to those two points, respectively, and then consider the sequence , which lies in  (due to convexity of all the ), and converges to , witnessing that this point is in , and we've just shown convexity.

For normalization, it's most convenient to work with the positive functionals, and observe that, because all the  and all the  because of normalization, the same property must apply to the limit, and this transfers over to get normalization for your infradistribution set.

Finally, there's the compact-projection property. We will observe that the projections of the a-measures in  to just their measure components, call these sets , must converge in Hausdorff-distance. The reason is that if they didn't, you could find some  and arbitrarily late pairs of inframeasures where  and  have Hausdorff-distance , and then pick a point in  (or ) that's  KR-distance away from the other projection. Then you could pair that measure with some gigantic  term to get a point in  (or , depending on which one you're picking from), and there'd be no point in  (or ) within  distance of it, because the measure component would only be able to change by  if you moved that far, and you need to change the measure component by  to land within  (or ).

Because this situation occurs infinitely often, it contradicts the Cauchy-sequence-ness of the  sequence, so the projections  must converge in Hausdorff distance on the space of measures over . Further, they're precompact by the compact-projection property for the  (which are infradistributions), so their closures are compact. Further, the Hausdorff-limit of a series of compact sets is compact, so the Hausdorff limit of the projections  (technically, their closures) is a compact set of measures. Further, any sequence  which converges to some , has its projection being , which limits to show that  is in this Hausdorff limit. Thus, all points in  project down to be in a compact set of measures, and we have compact-projection for , which is the last condition we need to check to see if it's an infradistribution.

So, the Hausdorff-limit of a Cauchy sequence of infradistribution sets is an infradistribution set, and by the strong equivalence of the infra-KR metric and Hausdorff-distance, a Cauchy limit of the infra-KR metric must be an infradistribution, and the space  is complete under the infra-KR metric.

 

Proposition 44: If a sequence of infradistributions converges in the IKR distance for one complete metric that  is equipped with, it will converge in the IKR distance for all complete metrics that  could be equipped with.

So, as a brief recap,  could be equipped with many different complete metrics that produce the relevant topology. Each choice of metric affects what counts as a Lipschitz function, affecting the infra-KR metric on infradistributions, as well as the KR-distance between a-measures, and the Hausdorff-distance. So, we need to show that regardless of the metric on , a sequence of convergent infradistributions will still converge. Use  for the original metric on  and  for the modified metric on , and similarly,  and  for the KR-metrics on measures, and  for the Hausdorff distances induced by the two metrics.

Remember, our infradistribution sets are closed under adding  to them, and converge according to  to the set .

What we'll be doing is slicing up the sets in a particular way. In order to do this, the first result we'll need is that, for all , the set

converges, according to , to the set

So, here's the argument for this. We know that the projection sets

are precompact, ie, have compact closure, and Hausdorff-limit according to  to the set

(well, actually, they limit to the closure of that set)

According to our Lemma 3, this means that the set

(well, actually, its closure) is a compact set in the space of measures. Thus, it must have some maximal amount of measure present, call that quantity , the maximal Lipschitz constant of any of the infradistributions in the sequence. It doesn't depend on the distance metric  is equipped with.

Now, fix any . There's some timestep  where, for all greater timesteps, .

Now, picking a point  in  with , we can travel  distance according to  and get a point in , and the  term can only change by  or less when we move our a-measure a little bit, so we know that our nearby point lies in

But, what if our point  in  has ? Well then, we can pick some arbitrary point  (by normalization for ), and go:

And then we have to be a little careful.  by assumption. Also, we can unpack the distance to get

And the worst-case for distance, since all the measures have their total amount of measure bounded above by , would be  being 1 on one of the measures and -1 on another one of the measures, producing:
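(A side-computation in my own notation of why that worst case gives a bound of twice the maximal amount of measure: with f bounded in [-1,1] and both measures of total mass at most λ,)

```latex
\left|\,m_1(f) - m_2(f)\,\right| \;\le\; m_1(|f|) + m_2(|f|) \;\le\; 2\lambda
```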

So, the distance from  to

according to  is at most 

And then, because this point has a  value of at most

Because , the  value upper bound turns into 

Which is a sufficient condition for that mix of two points to be only  distance from a point in  with a  upper bound on the  term, so we have that the distance from

to

is at most

Conversely, we can flip  and , to get this upper bound on the Hausdorff distance between these two sets according to .

And since  and  are fixed, for any , we can find some time where the distance between these two "lower parts" of the  and  sets is upper-bounded by 

We can have this quantity limit to 0, showing that

For any .

Ok, this is part of our result. No matter which  we chop off the infradistribution sets at, we get convergence of those chopped pieces according to .

Now, we'll need a second important result, that:

Now, we only have to establish one direction of low Hausdorff distance in the limit, that any point in the latter set is close to a point in the former set, because the former set is a subset of the latter set and has distance 0 to it.

Because  has the compact-projection property, the set  is precompact, so for any , we can select finitely many points in it such that every point in  is within  distance of our finite subset according to . For these finitely many measures, there must be some  term associated with them where , so just take the largest one of them, and let that be your . Then, all your finitely many measures, when paired with  or any larger number, will be present in , so

All points in the latter set are close to one of finitely many points, which are all present in the former set, so the Hausdorff-1 distance must be low.
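(Here's a minimal Python sketch of pulling such a finite ε-net out of a precompact family greedily; points in the plane and the Euclidean metric are hypothetical stand-ins for projected measures and the KR metric.)

```python
import numpy as np

# Greedy epsilon-net: keep a point only if it isn't already covered. Since the
# cloud is bounded (hence precompact), the loop yields a finite covering net.
def greedy_eps_net(points, eps, d):
    net = []
    for p in points:
        if all(d(p, q) > eps for q in net):
            net.append(p)
    return net

rng = np.random.default_rng(2)
pts = rng.uniform(size=(500, 2))
d = lambda x, y: float(np.linalg.norm(x - y))
net = greedy_eps_net(pts, 0.2, d)
assert all(min(d(p, q) for q in net) <= 0.2 for p in pts)  # everyone is covered
print(len(net), "net points cover all", len(pts), "points")
```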

At this point, we can truly begin. We have produced the dual results:

And

And we also know that, because  limits to  according to 1-Hausdorff distance, and projection is 1-Lipschitz,

Now, here's the thing. (The closure of) all of these sets are compact. For instance,

will always be compact, because any sequence in here must have a subsequence where its measure converges according to  (due to the compact-projection property applied to ), and then because  is bounded in , we can pick out another convergent subsequence for that. Plus, it's the intersection of a closed set () and another closed set , so it's closed. All sequences have a convergent subsequence and it's closed, so this set is compact. By identical arguments,

is compact. And for 

it's the projection of a compact set from earlier arguments, and

must be precompact by the compact-projection property, so it has compact closure. The exact same argument applies to

as well.

Now, for compact sets, convergence in Hausdorff-distance only depends on the topology of the underlying space, not on the specific metric it's equipped with, as long as the metrics induce the same topology. And the weak topology on the space of measures, or on the space of a-measures, doesn't depend one bit on the metric that  is equipped with, just on the topology. So, these sets limiting to each other still holds when  has its metric changed. For measures/a-measures, we end up using the  metric, but that induces the same topology on the space of a-measures, so the compact sets still converge in the  metric. So, we still have our triple results of:

And

And

Now, here's how to argue that  limits to  in . Fix some . From our limits above, there's some value of  where 

And for that value of , and that , we have that there's some value of  where, for all greater numbers,

And

Now, we're going to need to go in two directions for this. First, we pick a point in  and show that it's close to a point in . Second, we pick a point in  and show it's close to a point in .

Let . We have two possibilities. One possibility is that . Then, because 

we only have to go  distance to get to . The second possibility is that .

In this case,  lies in the set

Which has distance  from

Because we have that 

Just scooch over and keep the  term the same. Additionally, the set

has distance  from the set

Because we have: 

Further, the set

is a subset of , because  is upper-closed. So, either way, we only have to travel  2-distance from  to get to 

Now for the reverse direction, starting with a point  and getting to a nearby point in . Again, we can split into two cases. In our first case, , and because

we only have to go  distance to get to . The second possibility is that . In such a case,  would be guaranteed to lie in the set

which has distance  from the set

Because we have: 

Further, the set

has distance  according to  from the set

Because the latter components are the projection of the sets

and

And we already know that

So, given our point , we just have to go  distance to get to the set

And all points in this set lie in  because of upper completion.

Thus, given any , there's a tail of the  sequence where the  are all within  distance (according to ) of , so if the first metric thinks that the  converge to , the second metric will think that as well. Further, the two metrics on  which induce  and  were arbitrary, so whether a sequence of infradistributions converges doesn't depend on which complete metric  is equipped with.

 

Proposition 45: If a sequence of infradistributions  converges to  in the infra-KR distance, then for all bounded continuous functions .

Now, the infra-KR metric is:

So, to begin with, if  converges to , all bounded Lipschitz functions must have , or else the infra-KR distance wouldn't limit to 0.

For the next two, since the infra-KR distance is strongly equivalent to Hausdorff distance, and we know that

is always precompact, and they Hausdorff-limit to

And by our Lemma 3, the union of compact sets which Hausdorff-limit to something is compact, so the set

is compact (well, actually precompact, but just take the closure).

Because compactness of a set of measures implies that the amount of measure doesn't run off to infinity, there's some  that's a shared Lipschitz constant for all the .

Also, any uniformly continuous function can be built as the uniform limit of Lipschitz-continuous functions from above and below, so given some uniformly continuous , we can make a  sequence limiting to it from above, and a  sequence limiting to it from below. Then, we have:

And similarly, we can get:

Now, regardless of  and ,

So, even though we don't necessarily know that the limit actually exists for , we at least know that all the values are bounded in an interval of known maximum size, which converges to the interval

Which, by monotonicity for , lies in that interval.

So, all the limit points of the  sequence are in that interval. Now, as  gets unboundedly high, the difference between  and  gets unboundedly small, so for gigantic , we have that any limit points of the  sequence must be in a really tiny interval. Taking the limit, we have that the interval crunches down to a single point, and  actually limits to . We've shown it now for uniformly continuous functions.
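(A small Python sketch of those squeezing approximations, my own illustration: the standard sup/inf-convolution envelopes produce n-Lipschitz functions bracketing a uniformly continuous f from above and below, converging uniformly as n grows.)

```python
import numpy as np

# Pasch-Hausdorff envelopes on a grid:
#   upper_n(x) = max_y f(y) - n|x - y|   (n-Lipschitz, >= f)
#   lower_n(x) = min_y f(y) + n|x - y|   (n-Lipschitz, <= f)
xs = np.linspace(0.0, 1.0, 400)
f = np.sqrt(xs)  # uniformly continuous on [0,1] but not Lipschitz at 0

for n in [1, 10, 100]:
    penalty = n * np.abs(xs[:, None] - xs[None, :])
    upper = np.max(f[None, :] - penalty, axis=1)
    lower = np.min(f[None, :] + penalty, axis=1)
    assert np.all(upper >= f - 1e-12) and np.all(lower <= f + 1e-12)
    print(n, float(np.max(upper - f)), float(np.max(f - lower)))  # gaps shrink
```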

Time to expand this to continuous functions in full generality. Again,

{m | ∃b, n : (m,b) ∈ H_n}

is precompact, so this implies that for all ε, there is a compact set C_ε where all minimal points of the H_n (regardless of the n! Even for the final infradistribution set H!) have ε measure outside of that compact set.

Transferring to functionals, this means that for all the h_n (and h), C_ε is an ε-almost-support, and any two functions that agree on that set have expectations correspondingly close together.

Given some arbitrary , let  be identical to  on  (ie, uniformly continuous on that compact set), and extend it in an arbitrary uniformly continuous way to all of  while staying in , by the Tietze Extension Theorem.

Regardless of the , since  is a -almost-support for , we have that

Why? Well,  and  are identical on a -almost support for , so the magnitude of their difference is proportional to , and the maximum level of difference between the two, and  and  are both in , so they can differ by at most twice that much. The same result extends to the limit  itself.

Because  is bounded, and  is arbitrary, we have that  limits to  uniformly in .

Now, we can go:

And now, to invoke the Moore-Osgood theorem to swap the two limits, we need two results. One is that, for all ,

(which is true because  was selected to be uniformly continuous).

The second result we need is that for all ,

uniformly in . Which is true. So, we can invoke the Moore-Osgood theorem and swap the two limits, to get

So, we have our final result that 

For all continuous bounded functions , and we're done.
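(For reference, the rough shape of the Moore-Osgood theorem invoked above, as a standard statement supplied by me since the displays didn't survive: if one limit exists pointwise and the other exists uniformly, the iterated limits exist and agree.)

```latex
% If lim_{m} a_{n,m} = c_n for each fixed n, and lim_{n} a_{n,m} = b_m
% uniformly in m, then both iterated limits exist and coincide:
\lim_{n\to\infty} c_n \;=\; \lim_{m\to\infty} b_m
```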

 

Proposition 46: A set of infradistributions  is precompact in the topology induced by the IKR distance iff: 
1: There's an upper bound on the Lipschitz constant of all the infradistributions in the set.
2: There's a sequence of compact sets , one for each , that are compact -almost-supports for all infradistributions in the set.
3: The set of infradistributions is b-uniform.

This proof will proceed in three phases. The first phase is showing that compactness implies conditions 1 and 2. The second phase is showing that a failure of condition 3 permits you to construct a sequence with no convergent subsequence, so a failure of condition 3 implies non-precompactness, and taking the contrapositive, precompactness implies condition 3. That gets us one half of the iff implication, that precompactness implies the three conditions. For the second half of the iff implication, we assume the three conditions, and construct a convergent subsequence.

So, for our first step, due to working in Hausdorff spaces, we can characterize precompactness as "is a subset of a compact set".

Also, the projection mapping of type

Which takes a closed set of a-measures (an infradistribution) and projects it down (and takes the closure) to make a compact set of measures (by the compact-projection property), is Lipschitz (projecting sets down to one coordinate keeps their Hausdorff-distance the same or contracts it), so it's continuous. So, a compact set of infradistributions (compact because the infra-KR metric is strongly equivalent to the Hausdorff-distance) would get mapped to a compact set of sets of measures (because the image of a compact set is compact), which, by Lemma 3, unions together to make a compact set of measures.

Doing the same process (taking your precompact set of infradistributions, mapping it through the projection, unioning together all the sets) makes a subset of that compact set of measures, so it's precompact.

Also, the necessary-and-sufficient condition for precompactness of a set of measures is that: There be a maximum amount of measure present, and for all  there is a compact set  where all the measures assign  measure outside of that compact set.

So, if you take a precompact set of infradistributions, all the measure components of points in any of them have a uniform upper bound on the amount of measure present, and we also have the shared compact almost-support property. So, precompactness implies conditions 1 and 2.
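(A minimal Python sketch of that precompactness criterion in action, on a hypothetical family of sample-based measures on the line: bounded total mass is immediate, and the compact sets are intervals [-K, K] capturing all but ε of every measure's mass simultaneously.)

```python
import numpy as np

# Uniform tightness check: find one K such that every measure in the family
# puts at most eps of its mass outside the compact set [-K, K].
rng = np.random.default_rng(3)
family = [rng.exponential(scale=s, size=10_000) for s in (0.5, 1.0, 2.0)]

def shared_compact_K(samples_list, eps, grid=np.linspace(0.0, 50.0, 501)):
    for K in grid:
        if all(np.mean(np.abs(s) > K) <= eps for s in samples_list):
            return float(K)
    return None  # not uniformly tight at this eps (on this grid)

for eps in (0.1, 0.01):
    print(eps, shared_compact_K(family, eps))
```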

Time for phase 2 of our proof, showing that a failure of condition 3 implies that there's a sequence from it with no convergent subsequence in the KR-metric.

Assume, for contradiction, that we indeed have a precompact set which fails condition 3. Using I to index your set of infradistributions, Condition 3 is:

Where  is the set formed from the set  by deleting all points with  and taking the upper completion again. Negating this, the statement that our set of infradistribution sets  fails this condition is:

So, let  be your  of choice, and let  be the infradistribution  such that .

Because we're assuming that this sequence of infradistributions was selected from a precompact set, we have a guarantee that the sequence  has a convergent subsequence limiting to some . We'll still be using n as our limiting variable; hopefully this doesn't cause too much confusion.

Now, we can crib two results from our earlier proof of Proposition 44. From that proof, we know that because  limits to  in Hausdorff-distance,

and also,

For any . To craft this into a more usable form, we can realize that for all 

So the distance from the former set to the latter set is 0. Also, any point in  can be written as . Either , in which case the same point is present in  and the distance to enter that set is 0, or , in which case the m component is present in , and from

For large , you just have to adjust the  component a little bit to  and then you know there's some , so by upper completion, , and this point is close to .

We took a point in  and showed it's in  (trivially), and took a point in  and showed there's a nearby point in , so we have our modified result that:

For another modified result, due to the fact that we know 

We can take any point in , descend to a point in  (but cut off at ), shift over a bit to get to  (but cut off at ), and add the same amount of  value to this point as you took off, to make a point in  that's nearby to the point you started with, and flip the two sets, to argue that

Now, here's what you do from here. We know our  value. Because of the fact that

we can identify some finite  value (call it ) where, for it and all greater values,

Locking this value in, and because of

and  limiting to , so

We can find some finite  where, for all greater values,

and

There's one last thing to note. The sequence  was a subsequence of a sequence of infradistributions chosen so that the Hausdorff-distance between an infradistribution and its truncation of minimal points at a certain  value was always  or more.

Accordingly let  be the value of the cutoff for  (ie, the index of  before we did the reindexing when we passed to a subsequence). Due to our construction process for the , we have that:

Further,  diverges to infinity, so there's some  where . Because, for that , we have that 

Taking stock of all we have, we know that there is some n where:

and

and

and 

and, by our construction process for the  sequence,

So now we can go:

But we just showed , a contradiction. The one assumption we made was that there could be a set of infradistributions that was both precompact and failed to meet the shared b-uniformity condition. Therefore, if a set of infradistributions is precompact, it must fulfill the shared b-uniformity condition.

Because we've shown that precompactness implies a Lipschitz bound and shared compact-almost-support in part 1 of the proof, and that precompactness implies the shared b-uniformity condition, we have one direction of our iff statement. Precompactness implies these three properties.

Now we'll go in the other direction and establish that if these three properties are fulfilled, then every sequence of infradistributions has a convergent subsequence.

So, let's say we have some set of infradistributions  that fulfills the following three properties:

(this is bounded Lipschitz constant)

(this is shared almost-compact-support)

(this is the b-uniformity condition)

Note that  is  but you chop off all the points in it with  and regenerate it via upper-completion.

First, the compact almost-support condition and bounded amount of measure (and closure) are necessary-and-sufficient conditions for a set of measures to be compact. Thus, letting  be defined as:

(ie, measures where the measure outside of the compact set  is  or less, for all , and the amount of measure is upper-bounded by , where that sequence of compact sets and measure upper bound came from the relevant sequence of compact sets and measure upper bound on the set , from the fact that we assumed a Lipschitz upper bound and shared compact-almost-support for it).

We know that  is a compact set. All the measure components of all the points in all the  lie in this set. Thus, all sets  can be thought of as being a subset of the space 

In particular, all our  (from our arbitrarily selected sequence) are a subset of this space.

Now, here's what we do. Fix any . From the b-uniformity condition on the , there is some quantity  where

What we're going to do is find a subsequence of the  sequence where the  sequence converges in Hausdorff-distance.

Here's how to do it. We can take each  and chop it off at a  value of , to make a closed set  which is a subset of 

Which, being a product of two compact sets, is compact. Further, the space of compact subsets of a compact space (equipped with a Hausdorff distance-metric) is compact. So, we can isolate some subsequence where the  sets converge in Hausdorff-distance. If sets converge in Hausdorff-distance, their upper completions do too, so we have isolated a subsequence of our  sequence where the sets  converge in Hausdorff-distance. Also, each  infradistribution set is only  Hausdorff-distance away, at most, from the corresponding . So, for sufficiently large , the  subsequence we picked out is all wandering around in a ball of size .

Now, here's what we do. Start with your  sequence. Use the argument we described above for  to isolate a subsequence which, in the tail, wanders around in a ball (w.r.t. Hausdorff-distance) of size 2. Now, use the argument for  to isolate a subsequence of that which, in the tail, wanders around in a ball of size 1. And, y'know, repeat for all finite , to get a subsequence embedded in all previous subsequences which, in the tail, wanders around in a ball of size .

Now build one final subsequence, which takes the first element of the  subsequence, the second element of the  subsequence, the third element of the  subsequence, and so on. It eventually enters the tail of the sequence for all finite , so, regardless of , the tail of that sequence starts wandering around in a ball of size . Thus, the sequence is actually Cauchy, and must converge, as we've previously shown that the space  is complete in the KR/Hausdorff metric.
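(A tiny Python sketch of that diagonalization, with hypothetical toy index sets: take the m-th element of the m-th nested subsequence, so the result is eventually inside every one of them.)

```python
# Nested subsequences of the naturals (each a subset of the previous one):
nested = [list(range(0, 1000)),      # full sequence
          list(range(1, 1000, 2)),   # odds
          list(range(3, 1000, 4))]   # 3 mod 4, a subset of the odds

# Diagonal subsequence: m-th element of the m-th subsequence.
diagonal = [nested[m][m] for m in range(len(nested))]
print(diagonal)  # past position m, it lives inside the m-th subsequence's tail
```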

Assuming the three conditions on a set of infradistributions let us show that every sequence has a convergent subsequence, so the set must be precompact. That's the reverse direction of our iff statement, and we're done.

 

Proposition 47: When  is a compact Polish space, the spaces of cohomogenous, crisp, and sharp infradistributions are all compact in  equipped with the infra-KR metric.

So, from Proposition 46, the necessary-and-sufficient conditions for a set of infradistributions to be precompact are:

1: Bounded Lipschitz constant/bounded amount of measure on minimal points. 1-Lipschitz, C-additive, cohomogenous, crisp, and sharp infradistributions fulfill this because of their iff minimal point characterizations.

2: Shared compact almost-supports.  is compact by assumption, and it's the whole space so it must be a support of everything, and thus an -almost-support of everything, so this is trivially fulfilled for all infradistributions when  is compact.

3: b-uniformity. Homogenous, cohomogenous, crisp, and sharp infradistributions fulfill this because they all have their minimal points having , and the condition is "there's gotta be some  value you can go up to in order to have a guarantee of being within  of the full  set in Hausdorff-distance if you delete all the minimal points with a higher  value, for all ".

Thus, cohomogenous, crisp, and sharp infradistributions fulfill the necessary-and-sufficient conditions for precompactness, and all we need is to check that the set of them is closed in the KR-metric. 

To do this, we'll invoke Proposition 45, that: If a sequence of infradistributions  converges to  in the infra-KR distance, then for all bounded continuous functions .

The characterization for cohomogenity was that . So, we can go: 

Showing that the limit of cohomogenous infradistributions is cohomogenous, and we've verified closure, which is the last property we needed for cohomogenity.

The characterization for crispness was that:  for . To show it's preserved under limits, we can go:

Showing that the limit of crisp infradistributions is crisp, and we've verified closure. Sharpness is a bit more tricky.

Let's say a sequence of sharp infradistributions  limits to , and all the  are associated with the compact set . The minimal points of the  consist of all probability distributions supported over , with a  value of 0. Thus, all the  sets can be written as , and so, if they converge in Hausdorff-distance, the sets of probability distributions  must converge in Hausdorff-distance. That's impossible if the  don't converge in Hausdorff-distance, because the dirac-delta distributions on points in the  sets can transport a failure of Hausdorff-convergence of the  sets up to a failure of Hausdorff-convergence of the  sets of probability distributions.

Thus, the  converge to a compact set  in Hausdorff-distance.

We also know that, because sharp infradistributions are crisp, and crispness is preserved under limits, all we have to check is whether the minimal points of  consist exactly of all probability distributions supported over . Now,  is the closed convex hull of all the dirac-delta distributions on points in , and every point of  has a sequence from the  converging to it, so the associated dirac-delta distributions converge and witness that all the dirac-delta distributions on points in  are present in the set . So, because infradistribution sets are closed and convex, all of  must be present as minimal points in . Now we just need to rule out the presence of additional points.

Let's say we've got some probability distribution  which is not supported entirely on ; there's  probability mass outside that set. Because probability measures on Polish spaces are outer-regular (open supersets of the set of interest can be chosen with arbitrarily close measure), we can find some open superset of , call it , which has  probability mass outside of it. Any point outside of  must be some  distance away from , because otherwise you could pick a sequence of points in  which gets arbitrarily close to the (closed) complement of , find a convergent subsequence since  is compact, and you'd have a limit point which is in  (due to closure) and also in the complement of  (due to getting arbitrarily close to said closed set), disproving that the two sets are disjoint (because  is a superset of ).

Ok, so our hypothetical "bad" probability distribution has  probability measure at a distance of  or more from our set of interest, . The KR distance is equivalent to the earthmover distance, which is "how much effort would it take to move this pile of dirt/pile of probability mass into the other distribution/pile of dirt".
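(A quick numerical sketch of the "moving ε of mass a distance of δ costs at least εδ" fact used below, via scipy's one-dimensional earthmover distance; the specific numbers are my own toy example.)

```python
from scipy.stats import wasserstein_distance

# A distribution with eps of its mass sitting delta away from a target point
# needs earthmover cost of at least eps * delta to become the target.
eps, delta = 0.25, 2.0
bad = ([0.0, delta], [1.0 - eps, eps])  # (locations, weights)
target = ([0.0], [1.0])
d = wasserstein_distance(bad[0], target[0], bad[1], target[1])
print(d)  # 0.5 == eps * delta
assert abs(d - eps * delta) < 1e-9
```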

All minimal points in  must have a sequence of minimal points in  limiting to them, because it's the Hausdorff-limit of those infradistributions. So, we've got some sequence  limiting to our hypothetical bad distribution , but all the  lie in .

There is some  value where , and also where . Now, we can get something really interesting.

So, we agree that  has  probability mass a distance of  or more away from the set , right? This means that the earthmover distance from  to any point in  must be  or more, because you've gotta move  measure a distance of  at the very least.

However, the earthmover distance from  to  is strictly below , and because , it only has an earthmover distance of less than  left to travel to arrive at a probability distribution in , because all the dirt piled up in  is only  distance away from . So, the distance from  to  is only 

distance. But we know it's impossible for it to be any closer than  distance from that set, so we have a contradiction, and no such  can exist in . Thus,  has all the probability distributions over  and nothing else, so the limit of sharp infradistributions is sharp, and we're done.
