LBIT Proofs 7: Propositions 48-52

post by Diffractor · 2020-12-16T03:31:23.768Z · LW · GW · 0 comments

Contents

No comments

 

 

Proposition 48: A function  is an infrakernel and fulfills the property:

iff the function  is bounded and continuous when  is equipped with the IKR metric.

So, our first direction is to assume that the relevant function is bounded and continuous, and show that it fulfills the infrakernel properties and the one extra compact b-uniformity property. The second direction is to show that, assuming the b-uniformity property and being an infrakernel, the function is bounded and continuous.

Anyways, so let's assume the function  is bounded and continuous. Ie,

And, if  limits to , then  limits to  in Hausdorff-distance/the infra-KR metric.

First, for purposes of disproof, let's assume  doesn't have bounded Lipschitz constant/amount of measure present. That is,

Then, you can let  be any number you want, no matter how large, and find some  where  and  and . Now, let's look at the norm of the infradistribution . Plug in the constant function  and we have:

This is because  had  or more measure present. Then, making that substitution, we have:

And at this point we observe that the Lipschitz constant of a constant is 0, and the norm of a constant is just its absolute value, and  by assumption, to get:

Now we use our earlier inequalities and the absolute value (everything in our earlier inequalities is a negative quantity) to get:

But  can be unboundedly high, so our maximum norm is infinity, which we stipulated isn't the case. Thus,  does have bounded Lipschitz constant (this was derived from 's bounded IKR-norm and continuity).

Now, we just need to derive the other two conditions on an infrakernel from boundedness and continuity of .

If  is a continuous function , and  is an arbitrary compact subset of , then  must be a compact subset of , which, from our characterization of the iff conditions of compactness in Proposition 46, means that it fulfills the shared compact-almost-support property and the b-uniformity condition. Thus the function  fulfills the compact-shared compact-almost-support property, and the compact b-uniformity condition. Finally, for the pointwise-convergence condition of an infrakernel, continuity of the function  means that if  limits to , then  limits to  in Hausdorff-distance, so by Proposition 45, for all continuous bounded functions f,  limits to , and this is the last condition we need to call  an infrakernel.

So, bounded and continuous functions  imply that the function is an infrakernel and fulfills the compact b-uniformity condition. Time for the reverse direction. Let's assume that  is an infrakernel  and fufills the compact b-uniformity condition. For showing boundedness, there's some Lipschitz constant upper bound , and we can go:

And we have established boundedness. For continuity, it will take a more detailed argument. Let  limit to . Our task is to show that  limits to  in Hausdorff-distance/infra-KR distance.

Now, the set  is compact, so by the conditions for an infrakernel, there's a bound on the  value of the minimal points of the  infradistributions, and the  sets have shared compact-almost-supports. We're also assuming the compact b-uniformity condition, so the  sets have the b-uniformity condition on them. By our necessary and sufficient conditions to be precompact, we have that the set of points  is precompact in , so we can isolate a convergent subsequence of the sequence of infradistributions K.

What we'll do is show that all convergent subsequences of  have the same limit point, namely . Here's the argument for it. Index the convergent subsequence with  and have  be the limiting infradistribution of it. From the pointwise convergence condition on an infrakernel,

Thus, our arbitrarily generated limit point  must perfectly match up with  and be identical to it, so since there's only one limit point, the sequence  actually converges to . So, because convergent sequences in the input induce convergent sequences in the output, the function  must be continuous, and we've shown the other direction, that infrakernels fulfilling the compact b-uniformity condition must be bounded and continuous.

 

 

Proposition 49: The space  equipped with the IKR-topology is a Polish space if  is too.

So, we've got a metric, the infra-KR distance ie the Hausdorff-distance. We know that the notion of convergence remains independent of which metric on  these metrics are being induced by, so the topology is unchanged, and they're all metrizations. We also know from Proposition 43 that  is complete under any infra-KR distance ie Hausdorff-distance. So,  is completely metrizable by any infra-KR distance induced by a complete metric on , all that remains is to show that it's a separable space, we need to find a countable dense subset.

We know that a single point is compact, and we also know that compactness implies that for all , there is a  value where  by Proposition 46, b-uniformity.

So, the space of infradistributions with the  values of their minimal points having an upper bound is dense in the space of infradistributions. All we need to do now is to have a dense countable subset of the space of infradistributions with upper bounds on the  values of their minimal points.

The way we'll be doing this is taking a dense countable subset of the space of a-measures. There are only countably many ways to choose finitely many points from this subset, and a choice of finitely many a-measures can be taken to define an infradistribution by convex hull, closure and upper-completion. Assuming the fiddly details work out, we'll have countably many infradistributions which are (hopefully) dense in the space of infradistributions with bounded  values, which are dense in the space of infradistributions, so we've got our dense countable subset.

There's still much to do. We need to show that there's a dense countable subset of the space of a-measures at all, we need to deal with technicalities involving normalization to show that we always get an infradistribution by this process, and we need to show that given any infradistribution with bounded  values, we can make an infradistribution of this form arbitrarily close to it in Hausdorff-distance.

For our first part, the cone of a-measures can be viewed as , and you can also view the space of measures over  as  (a pair of a distribution and a scaling term). This isn't quite right, because there's an identification of everything when the amount of measure is 0, but this doesn't really matter. So, we're trying to find a dense countable subset of

Now, when  is a Polish space,  equipped with the KR-metric is also a Polish space, so there's a dense countable subset. You can take the product of this with the dense countable subset of  consisting of the rational numbers 0 and above, to get a dense countable subset of the cone of a-measures.

For the normalization technicalities, we need to make sure that whenever we pick finitely many points from our dense countable subset, there's a point with , a point with , and all points have , in order to have normalization work and your batch of points define  an actual infradistribution. This can always be done, throwing out the "bad" choices of finitely many points still leaves you with a countable set.

Finally, given an infradistribution  with its minimal points all having , can we make infradistributions arbitrarily close to it in Hausdorff-distance of this form where we just specify finitely many points?

Well, yes. Because of the compact-projection property of infradistribution sets, and truncating at , we can see that all the minimal points of  lie in a compact set. So, given any , we can cover  (well, the part with  values below ) with -sized balls centered at its points, get a finite subcover from compactness, and then use denseness of our countable subset to take our finitely many balls with center points and pick something from the dense countable subset in each ball in order to get: A collection of finitely many points from the dense countable subset where each point in  with  value below  is  distance away from one of these finitely many points, so the closed convex hull of these finitely many points is only  distance away from  (but truncated at ), and then when upper completion is done, they stay only  distance away from each other.

So, given any infradistribution with bounded  values, and any , we can find finitely many points from a dense countable subset of the space of a-measures, which induces an infradistribution that's only  distance away. Because there are only countably many ways to pick finitely many points from a countable set, we have a countable set of infradistributions where any infradistribution has infradistributions of this restricted form arbitrarily close to them, and so we have separability, the last condition we need for Polishness.

 

 

Proposition 50: The infra-Vietoris topology and the topology induced by the IKR metrics are the same.

This proof will proceed in three ways. The first is showing that this collection of sets declared-to-be-open in the infra-Vietoris topology is closed under finite intersections, and thus forms a basis (instead of a sub-basis). The second way is taking any basis open set from the basis induced by the IKR metric (we'll be using the strongly equivalent Hausdorff-metric for this) and any point/infradistribution in it, and finding a Infra-Vietoris open set from the basis which is a subset of the open ball in the IKR metric. Finally, we taken any basis open set from the infra-Vietoris topology, and any point/infradistribution in such a set, and find a tiny open ball w.r.t. the IKR metric that fits within the infra-Vietoris open set.

So, here's what we're going to do: First, we're going to show that the intersection of two open sets from the basis is also in the basis. Obviously, we only need to show this for pairwise intersection, as that shows closure under finite intersection.

Consider the open set  of infradistributions to be generated by the finite family of bounded-b open sets , and the open set  of infradistributions to be generated by the finite family of bounded-b open sets .

Claim: The finite family of bounded-b open sets  and , for all the  and , is a suitable batch of open sets in the a-measures to perfectly replicate the  open set of infradistributions.

The properties for  are:

And, the properties for  lying in the set induced by the  and  families of open sets (upper completion of open set is open) is:

So let's get started on showing equivalence of these things. We'll use  for an a-measure, and use a subscript of  or  to denote which sort of set it's in.

First, let's take the conjunction of these two things.

This is equivalent to "for any , it is above some  and also above some ". Since the only notion of upper completion is adding the  term, this is equivalent to "for any , we are in one of two cases, either there is a  such that , or that "

Similarly,

Is saying "for any , we are one of two cases, either there is a  such that , or vice-versa". And so these two conditions are equivalent.

Also, if you have both of:

Then you trivially have

And in the reverse direction, if you have:

then you'll get both of

That just leaves the projection property.

If you have

then, let's say  is your arbitrary point in the closure of the projection of . This is saying  and . Again, by how upper completion is just adding , this is equivalent to "given an arbitary point  in the closure of the projection of , either  or "

And the condition

is saying "given an arbitary point  in the closure of the projection of , either  or ", which is clearly equivalent. So, we've got our bidirectional equivalence, and any binary (and thus finite) intersection of infra-Vietoris basis opens is an infra-Vietoris basis open, so it is indeed a basis.

Now we must show that this infra-Vietoris topology is identical to the topology induced by the infra-KR metric/Hausdorff distance metric. This can be done by taking any ball in any Hausdorff distance metric centered at a point/infradistribution, and finding an infra-Vietoris open set that contains the same point and is a subset of the open ball. Conversely, we should be able to take any infra-Vietoris open set, and any infradistribution/point within it, and find an IKR open ball centered on that point that lies entirely within the infra-Vietoris open set, showing that the two topologies are equal.

So, first, assuming we've got a Hausdorff-distance of  centered at some infradistribution, can we make an infra-Vietoris set that lies entirely within the ball and engulfs the center point?

Well, there's some  cutoff where the projection of the infradistribution  (cut off at that  value) is within Hausdorff-distance  of the projection of the entire infradistribution  to the measure component. Accordingly, we have one of our open sets be a -thickening of the chopped-off infradistribution, as this upper completion engulfs the entire infradistribution  within it with  room to spare. The rest is done by covering the compact chopped-off portion with all possible open balls of size , and using compactness of the chopped-off infradistribution to pick a finite subcover/finitely many open balls.  has the projection of its closure lie entirely within the -thickening of the lower part, has nonempty intersection with each open ball, and is a subset of the upper completion of the union of everything.

Now, our task is to show that any infradistribution that lies in the open set induced by that finite collection of open sets lies in the Hausdorff-distance-of- ball centered at . So, let our arbitrary new infradistribution  be a subset of the union of the upper completion of all these open sets, and have nonempty intersection with each of them, and have its closure lie in the projection of the union of them. Note that the initial open set which engulfs the entire bottom portion has all the other open sets as subsets of it, so the union and upper closure of the opens is just the upper completion of the "big" open set. Any point in  must lie in this "big" open set, and it's made by thickening up  (but chopped off) and taking the upper completion, witnessing that the distance from  to  is  or less.

For the other direction, pick an arbitrary point in . It's within  of the upper completion of one of the finitely many balls used for the open cover. And because those center points are center points of -open balls which  must have nonempty intersection with, and  is upper-complete, it has a point of distance  or less from your starting point in . Thus, anything in this basis open set has Hausdorff-distance of  or less from the center point , so all open balls in the infra-KR metric topology can have their central point engulfed by an infra-Vietoris open set, which is a subset of the initial open ball. So we're half-done, we know that all open sets of the infra-KR metric are open in the infra-Vietoris topology.

In the other direction, we need to take an infra-Vietoris open set, and an infradistribution in it, and find some tiny hausdorff-distance from that arbitrary infradistribution that remains in the infra-Vietoris open set. Let  be your infradistribution, which is a subset of the union of the upper completion of the , and has nonempty intersection with each , and the closure of its projection is a subset of the projections of the .

Now, there's one important fact we have to establish first. It is impossible to have a sequence  that gets arbitrarily close to the complement of . Fix some sequence  and assume that this sequence gets arbitrarily close to the complement of 

One of two things can happen. The first thing that can happen is that  has a subsequence with bounded  value. Then, we can pick out a convergent subsequence of that subsequence because infradistributions, when cut off at any finite  value, are compact, and take the limit point , which lies in the set  due to closure, and yet has distance 0 from the exterior of , which, being the complement of an open set, is closed. So, we've got a point in  and also not in , which is impossible.

Therefore, the  sequence must have unboundedly large  value as  increases. This projects down to make an  sequence of measures, in the compact set , and we can isolate a convergent subsequence of those, with limit point . Now, something interesting happens. 

For the  sequence of a-measures, eventually the  value of them goes well past the largest  value of one of the finitely many bounded- open sets. And when we go a tiny distance away to a point  which lies outside of , then... well, it's got nothing from any of the  below it. And nothing from any of the  above it either, because the  value of  is too high since it's near . So,  projects down to some measure  which lies outside of . And since  is really close to  (in the closure of the projection of ) is really close to the exterior of the projection of , as witnessed by . And this is a closed set, because the complement of an open set is closed, and the projection of an open set is open.

This lets us show that our  limit point lies in the complement of , which is impossible, because  lies in  which is a subset of .

So our original assumption was wrong, and there is some  where any point in  is always  or more distance away from the exterior of . Further, for the , since , you should be able to pick said point, and find some -sized open ball you can stick around it that lies entirely within the open set .

Now, a Hausdorff-distance of  around , all infradistributions that close must lie in the same Vietoris-open set. Here's why. Since all points in your new infradistribution  are at most  distance away from a point in , and given any point in  you need to travel  distance to get to the exterior of  is a subset of that open set as well. 

Also, , because given the central points in the  sets, you can travel  distance to get to a point in , which lies in the -sized open ball around said point which still lies entirely within the . So any  that close must intersect the same collection of open sets.

Finally, the projection thing is taken care of by the same sort of Hausdorff-distance argument we used to argue that said  must lie in the union of upper-completions of the open sets. So, given any Vietoris-open, and any point in it, we can find an IKR-open which engulfs said point and lies entirely in the Vietoris open.

Therefore, the two topologies are the same.

 

 

Theorem 3: Given a crisp infrakernel   then if there's a  where, regardless of  and , there's a unique crisp infradistribution  where , and can be found by starting with any crisp infradistribution on , and repeatedly applying .

This theorem is going to require showing a lot of fiddly measure-theoretic stuff, and then once everything's all set up, the actual theorem itself falls pretty fast. So, we're going to start off by noting or proving three things needed to make progress.

The first is: "given any measurable function , and any probability distribution , and any , we have that there is a continuous function  s.t. ". 

This is an analogue of the celebrated result that continuous functions are dense in the measurable functions. I'm gonna skip small bits of it, to be reconstructed if you ask. Pretty much, we use the fact that  is separable in the KR-metric, and the results from this pdf to get a few results.  is separable according to the KR-metric, and  has all its output values in that separable space, so it's separably valued. Also s is measurable, so by proposition 1.8 in the paper,  is strongly measurable, so it's the pointwise limit of a sequence of simple functions. The actual proof of this fact goes through Theorem 1.5 in that paper, and looking carefully at Theorem 1.5, the simple functions constructed have their value in , because the proof is like "fix a dense countable subset of your space of interest, then modify  to be a simple function by rounding off the outputs to the nearest of a finite collection of points from that subset". Proposition 1.10 from that paper says that since  is strongly measurable, then given any  is strongly -measurable, and then proposition 1.16 from that paper says that  is Bochner-integrable because , because all the  are probability distributions. Further, part of the proof of proposition 1.16 from that paper shows that the -valued simple functions, due to their bounded norm, converge to  in the  norm by the dominated convergence theorem.

Anyways, at this point, we've shown that any measurable function  has simple -valued functions that limit to it, so the simple functions are dense in the measurable functions . Thus, at this point, we just need to show that we can make continuous functions  that are arbitrarily close to some simple function .

Let's begin. Our simple functions , from looking carefully at the proof of Theorem 1.5 from the linked paper, take the form of partitioning  into finitely many disjoint and covering measurable sets , and mapping each of them to some probability distribution . Let  be how many disjoint sets there are.

Thus, our task is to take a simple function , and find a continuous function  where  If we can do this for any , then it will show that continuous functions are dense in the simple functions which are dense in the measurable functions in 

So, here's what you do. Every probability distribution over a Polish space has the property that the measure of any measurable set  can be approximated from below by the measure of a compact subset . We have finitely many disjoint measurable sets , and we associate each of them with a compact subset  where .

Let's abuse notation a bit and have  be the minimum distance from  to a point in . Then, let  be defined as:

Admittedly, the distance may be 0, leading to , however, this will be considered to be 1. The reason this works and is continuous is because all the  are bounded away from each other. This is because they're all subsets of  which are disjoint, so all the  are disjoint, and any two compact sets which are disjoint must be bounded away from each other, because otherwise you could fix a sequence in one compact set getting closer and closer to the other compact set, isolate a convergent subsequence, and it'd limit to be distance 0 from the other compact set, and so in it, contradicting disjointness. Now, we can go:

And then, for the next step, we observe that when  is like "well it's in , I'll return  as my output", and  is like "well, it's distance 0 from , so I turn into  by convention". Further,  always returns probability distributions, and the maximum KR-distance two probability distributions can have (when the original space  has its metric bounded above by 1, which can always be imposed without altering the topology) is 1. So, we get

And we're done. We showed that any measurable function  could be arbitrarily closely approximated in  by a simple function , and that any simple function could be arbitrarily closely approximated in  by a continuous function .

The next fact we'll need is that, when  has its distance metric bounded above by 1, which can always be done without altering the topology, then the KR-distance on probability distributions is exactly equal to the Earthmover distance, the first Wasserstein metric.

The third and final fact we need is that, if you have an function  (function to the space of compact subsets of ), which is continuous w.r.t. the Hausdorff metric, and always produces compact convex sets, then the function  (defined relative to some continuous function  and some ), defined by:

Is lower-hemicontinuous. 

The requirement for being lower-hemicontinuous is if you have some sequence  limiting to , and some , then you should be able to find some sequence of points  where  limits to .

First, we'll need that, if  limits to , then  is a compact set. We know that  is continuous and  is compact, so Lemma 3 steps in to show that said set is compact.

The relevant implication of this, when paired with Hausdorff-continuity of , is that you can take any sequence of distributions , and find a convergent subsequence that converges to something in .

The second piece for this argument we'll need is that:

So, first, we observe that since all the  are compact, we can find actual minimizers . We don't know that the limit actually exists, so pass to a subsequence indexed by  where we're taking the liminf. Pass to another subsequence where the  actually converge to a point in , which we can do by the immediately preceding point about compactness. Then,

This was done by  being continuous, so  converges to , and .

Second, fix some minimizer , and by Hausdorff-continuity of , we can find a sequence  which limits to it. Then,

And at this point, we can go:

So, we have 

Alright, that's part two of the argument for part 3 of the preliminaries for this theorem down. Time for part 3 of the argument. We'll assume that  limits to  limits to , and pick an arbitrary  s.t. , and

Let  be some point in  that's as close as possible to . Obviously, these limit to , due to Hausdorff-continuity of . Non-obviously, a tail of them (all but finitely many), will lie in . Here's why.

Let  be:

There is some finite  by which you are guaranteed three things forever afterwards.

First, , because  is Hausdorff-continuous and  limits to .

Second, , because  is continuous and  limits to .

Third, 

Because we know that this minimum-distance quantity limits to the minimum-distance quantity of the limit, we showed it above. Now, at this point, we can go:

This was splitting the distance up, observing that  was selected to be as close as possible to , so it'd be as close or closer than the Hausdorff distance between the sets they were selected from, using that two of the distances were small, using how  was defined, and using our bound on the minimum-distance quantity. Anyways, this shows that forever after some timestep, 

And therefore, since

We have that

Time for our final part! Showing lower-hemicontinuity of  requires showing that, for all sequences  limiting to , and all , we can find a sequence  limiting to . We just showed that for  fulfilling the defining inequality. But we took the closure, didn't we? What if our  was added in the closure? Then we can take a sequence  (but without the closure part) that does fulfill the defining inequality, and do a thing where we make a sequence  which limits to , one sequence for each . Then, we basically proceed for a while in  until we have a guarantee that  will remain close to , and switch to picking from the sequence , and repeat, slowly ratcheting up the second index over time. Said sequence will get arbitrarily close to the  values, which get arbitrarily close to  itself, so this "step up the sequence index when you've waited long enough" sequence can limit to 

And with this argument, we know that  is lower-hemicontinuous.

Alright, now we can start embarking on the true proof instead of just setting everything up. Remember, we assumed that there was a crisp infrakernel , all the  have compact sets of minimal points (by CAS), and that there was some fixed  where  regardless of  and , and our goal is to show that there's a unique stationary crisp infradistribution which can be found by selecting any crisp infradistribution and repeatedly iterating .

There's a bit of setup to do first, but thankfully it isn't as demanding as we've done before. First, we'll abuse notation a bit by using  and  and  and  to refer, not to the infradistribution as a whole, but to their compact convex sets of minimal points, ie, sets of probability distributions.

Second, we recall from the proof of proposition what the form for  (the pushforward of ) is. Well, actually, the set of minimal points. It is:

Ie, the closed convex hull of the union of mixtures of the sets of minimal points of . Mixing sets with regards to a probability distribution is done by fixing a measurable selection function  s.t. for all  is a probability distribution, and then you just union all those points together for all the possible selection functions to make your set .

Our final thing to recall (well, actually, look up from Wikipedia), is that since the Earthmover distance/1st Wasserstein metric/KR-distance is all the same, we'll be using an alternate formulation of distance. If  are probability distributions over , then

Ie, of all probability distributions over , restrict to those which make  and  when you project it down to the appropriate coordinates, and try to pick one that makes the integral of the distance function as low as possible. The lowest integral of distance you can get is the earthmover distance. You can check Wikipedia for this formulation. We'll be using this form once.

Let's begin. First, by the form of the set of minimal points of the pushforward,

And then, we observe that taking the closure doesn't affect the Hausdorff distance at all, so we can go:

The reason for that inequality is that convex hull doesn't increase the Hausdorff distance. If you've got a finite mix of points from  in the convex hull, then you could just take each of those, find the closest point from the analogue for , and mix those together to make a point close to your thing added in the convex hull, so distance doesn't go up when you do convex hull.

At this point, we have the inequality:

Our mission is to impose a suitable bound on the second term, and once we've done that, show that  has type  (maps compact sets of probability distributions to cmpact sets of probability distributions), and then we can apply the Banach fixpoint theorem on  to get a fix-set of probability distributions that doesn't alter when you apply  (pushforward), and can be found by iteration.

For our first part, imposing a bound on

We shall do the following. We will pick a point in the first set, and find a nearby point in the second set. A point from the first set must be associated with some , and some measurable selection function  where, for all . Said point is written as , or alternatively, as . Our task is to find a point in the second set.

Construct it as follows: Select  to be as close to  as you can get. Pick an . Then, pick some  s.t. the projection of  to the first coordinate is  and the projection to the second coordinate is , and:

Such a  can always be found. Pick a continuous function  such that

We can always pick such a function because we've already shown that continuous functions are dense in the measurable functions.

At this point, recall our function  defined as:

We know that said function will be lower-hemicontinuous. Further, it always produces closed sets (we explicitly took the closure), and they're nonempty (just look at it a bit), and are convex. Convexity is a bit more involved, and it occurs because a mix of points a certain distance or less from some set point ( in this case), will also be the same or less distance away from said point, and the  are all convex. At this point, we can invoke the Michael selection theorem to get a continuous function  where, for all .

At this point, remember our distribution  which projects down to ? Well, by the disintegration theorem (the usual probability theory version),  can be written as  where  gives the conditional probability distribution over  if you see some .

And now we can finally define our measurable selection function , it is defined as:

And, our final point that we will pick to be close to  will be: .

Uh... Is  even a legit selection function? What we need is that for all . Well, fear not.

And, because  is closed and we took the closure of a batch of points within it, we have:

And, because  is closed and compact, making  by integrating over a bunch of different choices of  to plug in just makes the average of a batch of points within , which lies in , so we're good,  regardless of  and it's a legit selection function. Now, we can get into inequality shuffling!

And then we can notice something really interesting. We can rewrite that final integral as

Look familiar? That's just the semidirect product!! It's the exact same as

And, by our construction, . So, writing this as an integral, we have:

Also, because  projects to  in the first coordinate, we can write our integral over  as an integral over  instead, it doesn't change the value.

And move the norm inside to get:

We know that  and  are close w.r.t. , so we can get:

Now, we should observe something interesting. Because , and using how  was defined, we have that:

Using the equivalence between distance of two points and the norm of subtracting the two points, we have:

And then we can use another distance inequality to go:

And now, since the latter term isn't affected by the inf, we can pull it out, and get:

Again, the projection of  to the first coordinate is , so we have:

And, because  and  are close w.r.t , we have:

And then... , and  is as close as possible within , so we have:

And we know that, as one of our starting assumptions,  where  is a constant below 1. So,

But wait, we selected \xi such that:

So, making that substitution, we have:

And this is just the definition of earthmover distance, so we have:

And,  was selected from  to be as close as possible to , so we've got one last inequality.

Our final inequality, putting all this together, is that

So, given an arbitrary point in , we can craft a point in  that isn't too far away, and switch and  to get:

But wait,  was arbitrary! So we can shrink it to 0 to get:

Putting this bound together with what we were working on at the start, 

Or, alternately,

So,  is definitely a contraction mapping. But we do need to check that if  is a compact set of probability distributions, then  is compact as well. Fortunately, this is doable because the pushforward of an infradistribution is an infradistribution, and you can chop off an infradistribution at any  level and get a compact set of a-measures. So if we chop it off at 0, we get exactly what we want.

The Hausdorff-limit of a series of convex compact sets is convex compact, and we know that for all  that are compact convex,

So,  is a contraction mapping on this closed complete metric space (the space of compact subsets of a complete space is complete), and we can apply the Banach fixpoint theorem to find a unique fixset  s.t.  is a convex compact set of probability distributions, so it corresponds to a crisp infradistribution that can be found by taking any crisp infradistribution and iterating  (the pushfoward), and we're done!

 

 

Proposition 51: If  is the infradistribution corresponding to a probability distribution , then if , where the latter  is the conventional definition of Renyi entropy for probability distributions, and the former  is the definition of Renyi Entropy for infradistributions.

For an infradistribution, the Renyi entropy would be:

But it's a probability distribution corresponding to , so the  turns into an expectation.

Note that , it's a probability distribution. To solve the maximization, we will be using Lagrange multipliers. We maximize  subject to the constraint that . Thus, we make the function

and set all the partial derivatives w.r.t. the  and  to 0, so we get a series of equations of the form:

for the various 's, and for , we have:

And then, if we can derive what the various  and what  is, we can conclude that those maximize our function. Solving this yields:

Now, for the partial derivative w.r.t. , we have:

and for the partial derivative w.r.t. , we have:

So, these are legit. Now, when we substitute these values into our maximization problem, we get:

Which is the standard formulation of Renyi entropy for a probability distribution.

 

 

Proposition 52:  exists for all infradistributions , and equals 

Our task will be to bound the difference between 

and

as .

Well, for this endeavor, our first task is to bound the distance between the an arbitrary function  and the function , for sufficiently large .

Remember, the distance between these functions is 
Now, since all our functions are bounded in , we're raising a number below 1 to a power slightly below 1, so it actually stays the same or gets larger. So, we don't need the absolute value here.

Thus, to figure out what the largest possible value of this quantity is for a given , let's let  be our variable, and  be our constant. Then we differentiate, set to 0 solve the equation for  and substitute that back in to find out the largest possible value that this quantity can be, and thus the largest possible difference between the functions  and , as a function of .

Differentiating  w.r.t.  and setting it equal to 0, we must solve for  in the following equation:

The value of  that solves this equation is:

as can be seen by substituting this back in. This is our  which attains the maximum value (no weird stuff going on, as shown by graphing the function), and we're raising a number below 1 to a high power, so it's in , and thus a permissible function.

Substituting this back into , we get:

And, this is the maximum possible difference between the function  and , as a function of .

Therefore, 

Now, due to Lipschitzness of , in the  limit,

limits to 0 uniformly in . Therefore, in the  limit,

 limits to .

Therefore,  of these quantities limit to each other. And  limits to -1.

Accordingly, in the  limit,

limits to

and so, we can define the Renyi entropy for .

0 comments

Comments sorted by top scores.