LBIT Proofs 4: Propositions 22-28

post by Diffractor · 2020-12-16T03:38:02.959Z · LW · GW · 0 comments

Part 6: Showing the basic four infradistribution properties: Normalization, monotonicity, concavity, and Lipschitzness (specifically 1-Lipschitzness). For normalization, observe that regardless of ,

The limit of the all-1 sequence is 1 and the limit of the all-0 sequence is 0, so:

For monotonicity, let . Then, regardless of ,

This is because, since , the "extend with worst-case outputs" function is bigger for  than , and then by monotonicity for , the inequality transfers to the outside.

Accordingly,

Now for concavity. For all 

This was by monotonicity (minimizing over the two parts separately produces lower values, and monotonicity transfers that to the outside), and concavity respectively, because  is an infradistribution. And now:

Concavity is shown. Now for 1-Lipschitzness. Fix an arbitrary . Our proof target is:

Now, if we knew:

Then we'd be able to pair that with the 1-Lipschitzness of  to get our desired result. So let this be our new proof target. Reexpressing it a bit, it's:

Now, if two functions are only  apart, then their minimal values over a compact set can only be  apart at most. Let your compact set be  to see the result. Thus, we have proved our proof target and we have the result that, for all 

Since they're always only  apart, the same property transfers to the limit, so we get:

And so,  is 1-Lipschitz for all . This takes care of the Lipschitzness condition on infradistributions and the uniform Lipschitz bound condition on infrakernels. All that's left is the compact almost-support condition on infradistributions and the compact-shared compact almost-support condition on infrakernels, and the pointwise convergence condition on infrakernels.
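The analytic fact powering the 1-Lipschitzness argument — two functions that are uniformly ε-close have ε-close minima over a compact set — can be sanity-checked numerically. This is just an illustrative sketch with made-up test functions, not part of the proof:

```python
# Sketch: if sup_x |f(x) - g(x)| <= eps on a compact set, then
# |min f - min g| <= eps. We discretize [0, 1] as a stand-in for
# an arbitrary compact set; f and g are arbitrary test functions.
xs = [i / 1000 for i in range(1001)]
f = lambda x: (x - 0.3) ** 2
eps = 0.01
g = lambda x: f(x) + eps * (0.5 + 0.5 * x)  # stays within eps of f everywhere
gap = abs(min(f(x) for x in xs) - min(g(x) for x in xs))
assert gap <= eps
```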

Part 7: This will use our result from Part 4 about making a suitable compact set. Our desired thing we want to prove is that

Because this property is exactly the compact-shared compact-almost-support property for .

So, here's our plan of attack. Fix our  and , so our target is

Let our  be , as defined from Part 4, and let  be arbitrary in  and  be arbitrary and agree with each other on . Then our goal becomes

Now, letting our chosen defining sequence be , our goal is to show

If we could do this, all the approximating points being only  away from each other shows that the same property transfers to the limit and we get our result. Accordingly, let  be arbitrary, and our proof goal is now:

Hypothetically, if we were able to show:

Then we could conclude the two functions were equal on the set , which, from Part 4, is an -almost support for all the  where , and it would yield our result. So our proof target is now

Accordingly, let  be arbitrary within said set, turning our proof target into:

However,  and  are equal on the set , which breaks down as . And we know that , and since the second part is being selected from , the two values are equal, and we've proved our proof target. Thus, we have compactly-shared  compact almost-support, and that's the second condition for  to be an infrakernel, as well as the last condition needed for all the  to be infradistributions. Just pointwise convergence left!

Part 8: Alright, for showing pointwise convergence, we'll need a lemma. We need to show:

This is roughly saying that the rate of convergence of  to  is uniform over compact sets. Let's begin. Let  be arbitrary, so our proof target is now:

Here's what we'll do. Because we have a  and , we can craft our sequence  of compact -supports. Now, on the set:

which is compact, we know that  is uniformly continuous, so there's some  you need to go to in order to ensure that function values are only  away from each other, which translates into an  via . Any two inputs which only differ beyond that point will be only  apart, and can only get an -difference in values from . Now that that's defined, let  and  be arbitrary. Our proof target is now to show:

We know from Part 3 that

So let's make that substitution, turning our proof target into

Assuming hypothetically we were able to show that:

Then, because  is an -almost-support for  as long as  (which we know to be the case), we could apply Lemma 2 to get our desired proof target.

The "Lipschitz constant times deviation over compact set" part would make one , and the "degree of almost-support times difference in the two functions" part would make  in the worst-case due to being an -almost-support and the worst possible case where the two functions differ by  somewhere.

So now our proof target is:

Thus, let  be arbitrary within the appropriate set, so our proof target is now:

Because we're minimizing over a compact set in both instances, we can consider the infs to be attained by a  and a , so our proof target is now

Now, because , and , we have:

And also, because , and ,

So, both inputs lie in the relevant compact set, and they agree on , so the inputs are only  apart, so they only have value differing by , and we're done, our desired result of 

goes through.
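The mechanism behind "inputs agreeing on the first N coordinates have close values" can be made concrete with a toy function of sequences whose coordinate-dependence decays geometrically (an illustrative stand-in for uniform continuity on the compact set, with made-up numbers):

```python
# Sketch: a function whose dependence on coordinate i decays like 2^-(i+1).
# Two inputs agreeing on the first N coordinates then differ by at most 2^-N.
f = lambda xs: sum(x * 2 ** -(i + 1) for i, x in enumerate(xs))
a = [1, 0, 1, 0, 1, 0, 1, 0]
b = [1, 0, 1, 0, 1, 1, 0, 1]  # agrees with a on the first 5 coordinates
diff = abs(f(a) - f(b))
assert diff <= 2 ** -5
```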

Part 9: Now, for showing our pointwise continuity property, fix a sequence  which limits to , and an arbitrary . Our task is now to show that:

Fixing some arbitrary  sequence, we could do this if we could show that

Now, because 

we can rewrite our desired proof target as:

Huh, we just need to swap our limits! And the Moore-Osgood theorem says you can swap limits while preserving equality if, for fixed ,

and (this part is harder)

limits to

uniformly in . However, back in Part 8 we established uniform convergence on any compact set. And  is a compact set! So we have uniform convergence and can invoke Moore-Osgood to swap the limits and get our proof target, showing our result, that 

So,  fulfills pointwise convergence and we've verified all properties needed to show it's an infrakernel.
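Moore-Osgood in miniature (an illustrative numeric sketch with toy double sequences, not the actual kernels from the proof): with uniform convergence in one variable the iterated limits agree; without it, they can disagree:

```python
# good(n, m): converges to 1/n uniformly in n as m grows (the error 1/m is
# independent of n), so both iterated limits agree (both are 0).
# bad(n, m) = n/(n+m): converges pointwise but not uniformly, and its two
# iterated limits disagree (0 vs 1), so the limits cannot be swapped.
good = lambda n, m: 1 / n + 1 / m
bad = lambda n, m: n / (n + m)
N = 10 ** 6
assert abs(good(N, N)) < 1e-5     # double limit is 0
assert bad(N * N, N) > 1 - 1e-5   # n >> m: approximates lim_n first -> 1
assert bad(N, N * N) < 1e-5       # m >> n: approximates lim_m first -> 0
```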

Part 10: Showing that if all the  have some nice property, then  inherits it too. Homogeneity, cohomogeneity, C-additivity, crispness, and sharpness.

Proof of homogeneity preservation:

Proof of cohomogeneity preservation: Let your  be a mere sequence of points ; that's compact.

Proof of C-additivity preservation:

Proof of crispness preservation: Crispness is equivalent to the conjunction of C-additivity and homogeneity, and both of those are preserved.

Proof of sharpness preservation: So, this will take a bit of work because we do have to get a fairly explicit form for the set you're minimizing over. Remember that when  and  are crisp, with associated compact sets  and , then the compact minimizing set for  is , from the proof of sharpness preservation for semidirect product. Another way of writing this minimizing set of the semidirect product is as the set

Now, we'll give a form for the associated compact set of . It is:

For the base case, observe that when , it's

which works out perfectly. For the induction step, because

And we are, by induction, assuming the relevant form for the associated compact set of , and know how to craft the minimizing set for semidirect products, putting them together gets us:

And so, we have the proper set form for the compact minimizing sets associated with all the sharp infradistributions .

Our conjectured set form for the infinite semidirect product would be:

Let's write the projections of this to coordinate  as . All of these are compact, because it yields the exact same result as projecting

down to coordinate , and we know this is compact because it's the compact set associated with , and projections of compact sets are compact. Note that since there's a dependence on earlier selected points, it isn't as simple as our  set being a product. But we do at least have the result that

So, since it's a subset of a compact set (product of compact sets), it's at least precompact. All we need is to verify closure to see that  is compact.

Fix a convergent sequence  limiting to , where all the  lie in

So, if this arbitrary convergent sequence didn't have its limit point be in the set, then we could conclude that:

However, we could imagine projecting our set (and convergent sequence) down to coordinates 1-through-n+1 and we'd still get that exact same failure of closure, but in the set

Ie, the compact set for , which, due to being compact, is closed, and it does contain that limit point, yielding a contradiction. Therefore,

is indeed a compact subset of .

Yes, but can  be expressed as minimizing over this set? Remember, we're using  as an abbreviation for the projection of this set to coordinate i, and let's abbreviate the set as a whole as , and the projections of it to coordinates 1-through-n+1 as , which is the same as the minimizing compact set associated with . Then:

 is a compact set, so  is uniformly continuous over it so there's always some huge  where two inputs agreeing perfectly on the first  coordinates will produce values extremely close together. Then we can go:

And because  as a set agrees with  on more and more coordinates as  increases (which we know drives  closer and closer to its true minimum value), and is constantly narrowing down, we have a monotonically increasing sequence, with limit:

And thus, all the  are sharp. We're finally done!

Proposition 23: If all the  are C-additive, then 

So, 

Now, when , the inner function turns into

And, when , we'll use as our induction assumption that

and show the same result for

So let's begin rewriting this. By unpacking  as

we get:

And then, because  if , that chunk gets clipped off by the projection, and we have

And then, since there's no dependence on , it's treated as a constant, and due to C-additivity of , we can pull it out of the  to yield

And then, because adding any further coordinates would get clipped away by the projection, we can minimize over further coordinates (it does nothing) to get

and then by our induction assumption

So, past , the value of that inner function freezes, and it determines the limit. Therefore, we have the result that

Therefore,

Regardless of , and we have our result.
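The C-additivity step above — a term with no dependence on the minimized variable passes through the inf unchanged — is the familiar fact that inf_x (f(x) + c) = (inf_x f(x)) + c. A trivial numeric sketch with made-up values:

```python
# A constant (anything not depending on x) pulls out of a minimum.
xs = [0.1 * i for i in range(20)]
f = lambda x: (x - 0.7) ** 2
c = 3.5
lhs = min(f(x) + c for x in xs)
rhs = min(f(x) for x in xs) + c
assert abs(lhs - rhs) < 1e-12
```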

 

For further progress, we need to put in some work on characterizing how infrakernels work. We'll be shamelessly stealing a result about the infra-KR metric from proofs in a later post, because there are no cyclic dependencies that show up.

Lemma 4: For an infrakernel , if  limits to , then if you let  be the set
  then .

Interestingly enough, the only result we need to steal from later-numbered propositions is that, if a sequence of infradistributions  limits to  in Hausdorff-distance, then for all functions ,  limits to , which is Proposition 45.

We'll be working backwards by taking our goal, and repeatedly showing that it can be shown if a slightly easier result can be shown, which eventually grounds out in something we can prove outright. So, our goal is: 

Now, we want  to be an arbitrary integer. Let  (distance metric between two infradistribution sets) be defined as:

Where  is the Hausdorff-distance between sets (the two truncated sets cut off at some  value), according to some KR-metric on measures, which is induced by a metric on . It doesn't really matter which metric you use; any one will work.
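For concreteness, here's the ordinary Hausdorff distance computed for finite point sets (a sketch; the modified metric in this proof additionally truncates the two sets at a b-value before comparing, and that weighting is elided above):

```python
# Hausdorff distance between finite sets A and B under a metric d:
# the farthest any point of one set is from the nearest point of the other.
def hausdorff(A, B, d):
    return max(
        max(min(d(a, b) for b in B) for a in A),
        max(min(d(a, b) for a in A) for b in B),
    )

d = lambda p, q: abs(p - q)
dist = hausdorff([0.0, 1.0, 2.0], [0.1, 1.0, 2.5], d)
assert abs(dist - 0.5) < 1e-12
```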

Now, assuming you had the result:

Then, by the definition of that distance metric, and  being an integer, said distance converging to 0 would imply your desired 

by the definition of the modified Hausdorff metric. So, let's try to prove that the modified Hausdorff distance limits to 0. The best way to do this is by assuming that it actually doesn't limit to 0, and deriving a contradiction. So, let's change our proof goal to try to derive bottom, and assume that there is some  and some subsequence of  where  always stays  away from the set  according to the modified Hausdorff metric.

At this point, given any value of , we can notice that , due to the compact-almost-support condition on our compact collection of infrakernels (the  sequence), has, for all , some component set  where all the measure components of the  lie in a compact set. Also, due to the Lipschitz-boundedness condition, there's an upper bound on the amount of measure present. This is a necessary-and-sufficient condition for the measure components of the infrakernel family  to lie in a compact set of measures. Further, the  upper bound means that the last avenue to a failure of compactness is closed off, and all the  lie in some compact set of a-measures. Call the set that all the  lie in . Closure of the  sets, plus their being subsets of a compact set, means that they're all compact. They can then be considered as points in the space , the space of compact subsets of , which, being the space of compact subsets of a compact set equipped with the Hausdorff distance metric, is compact.

By compactness, we can pick yet another subsequence which converges in , and we then get that the  converge on some subsequence. This argument always works, so we can find a subsequence where the  converge in Hausdorff-distance, and then a subsequence of that where the  converge in Hausdorff-distance, and so on, and take a diagonal (first element from the first subsequence, second element from the second subsequence, etc.).
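The diagonal construction here is the standard one; as a small sketch, with toy index sequences standing in for the nested chain of subsequences:

```python
# Diagonalization: given nested subsequences (each refining the previous),
# the diagonal takes the k-th element of the k-th subsequence, and is
# eventually drawn from every one of the subsequences.
def diagonal(subsequences):
    return [subsequences[k][k] for k in range(len(subsequences))]

# Toy nested index sets: multiples of 2, then 4, then 8, ...
subs = [[i * 2 ** (k + 1) for i in range(10)] for k in range(10)]
diag = diagonal(subs)
# From index k onward, every diagonal term belongs to subsequence k.
assert all(diag[k] % 2 ** (j + 1) == 0 for k in range(10) for j in range(k + 1))
```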

And so, we eventually arrive at a subsequence of the  where, regardless of ,  is a Cauchy sequence.

So, we can find a subsequence where the  converge, regardless of what  is. Therefore, since

We have that on the subsequence,  is a Cauchy sequence according to . Also, all the  are staying  away from , according to .

Assuming, hypothetically, we were able to show that our Cauchy  sequence actually converges to , we'd have our contradiction.

What we'll do is show that the  sets do have a set, which we'll dub , that they do limit to according to , then we'll show that said set must agree with  re: the expectations of every function, and must be an identical infradistribution, so our convergent subsequence which stays away from  actually limits to it and we have a contradiction.

So, let's specify the sets. For each , let  be . (For our convergent subsequence, this set is always well-defined)

What we want to show is that:

Ie, our proposed limit set of the convergent  sequence (according to ) is . And if we can show that chopping this set off at any b value in particular makes the limit of the chopped-off sets, then the modified Hausdorff-metric limits to 0, and this is indeed the limit point of our  sequence according to the modified Hausdorff-metric.

One direction of this is very easy, we trivially have

For the other direction, fix a particular b where equality fails. Then there exists some point which doesn't lie in . From this, we know that  and there is some  where , so we also have . Now, if , and yet , then there is some Cauchy sequence from the  that limits to , and eventually the  terms of this sequence will drop below b itself, so they will all start being present in , giving you a sequence of points in that sequence of sets that limits to , witnessing that said point lies in , the limit of . However, what if ? Then there's a Cauchy sequence from the  that limits to , and eventually the  terms of the sequence will approach  itself, and then we can mix in a little bit of some point in  with  to make a nearby point which still undershoots the cutoff, and this sequence still limits to , again witnessing that said point lies in . That's both cases down, so we've shown

For all the . Accordingly, we now know that the  sequence limits to this set.

Now, we just have to show that said limit set equals , in order to derive our desired contradiction. We do this by, letting  be arbitrary, and  be the Lipschitz constant of the infrakernel,

So, we've got six equalities. Equality 2 is what we just showed, equality 5 is because the limit for a subsequence is the same as the limit for the original sequence, and equality 6 comes from the pointwise-convergence property for infradistributions, so that leaves equalities 1, 3, and 4. Equalities 1 and 4 can be dispatched by observing that there's some fixed  upper-bound on the Lipschitz constant/amount of measure present in the infradistribution points, so if there was a minimizing point for the expectation of  in any of these infradistributions with , the expectation value of  would be so high it'd violate the Lipschitz bound on the infradistribution. Thus, clipping off the  value at this height doesn't change the expectation of the function . Finally, there's equality 3, which is addressed with Proposition 45, because the upper completions of said sets limit to each other in Hausdorff-distance.

We have now derived a contradiction, so our desired result of  for all  holds. We'll be using this.

Proposition 24: 

Our attempted set definition  will be:

So, we should be precise about this. We start with a  where  is actually a measure. Then we fix a selection function mapping x to a point in . Due to  and the cone of a-measures being separable, weak measurability and strong measurability coincide, as do the Bochner and Pettis integrals, so we can just talk about "measurability" and "the integral". Said choice function, due to lying in , is measurable (in this case, we're equipping  with the Borel -algebra). Further, said selection function doesn't "run off to infinity", ie, . This measurability, plus being integrable in a suitable sense, means the Bochner integral is well-defined, so  is well-defined and does indeed denote a unique point.
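To make "the integral of a selection function" concrete in the simplest possible case — a finitely-supported measure — the Bochner integral reduces to a measure-weighted sum of the selected points. Everything below (the selected points, the weights, and the two-component stand-in for an a-measure) is made up purely for illustration:

```python
# Sketch: s maps each x to a selected "a-measure", stubbed here as a
# (total_measure, b) pair; m is a finitely-supported measure over x.
# The integral of s against m is just the m-weighted sum of the s(x).
s = {"x1": (1.0, 0.2), "x2": (0.5, 0.1)}   # hypothetical selections s(x)
m = {"x1": 0.7, "x2": 0.3}                  # hypothetical measure weights
integral = tuple(sum(m[x] * s[x][i] for x in s) for i in (0, 1))
assert abs(integral[0] - 0.85) < 1e-12      # 0.7*1.0 + 0.3*0.5
assert abs(integral[1] - 0.17) < 1e-12      # 0.7*0.2 + 0.3*0.1
```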

Now that we know how the selection function behaves, we should clarify what  means. Semidirect products are technically only defined when  is a Markov kernel. But we're working with measures, so we can lift the restriction that for all  is a probability distribution. So, we need to just verify the first measurability condition on the Wikipedia page for "Markov kernel" instead. But wait,  needs to be a measure for this to work! Instead, it's an element of ! Well... we can consider that last coordinate to be "amount of measure on a special disconnected point", and  is then isomorphic to . Taking this variant, the semidirect product (assuming the measurability condition) would then be a measure in . Then we just apply the projection mapping , which maps  to  and maps  to our special disconnected  point. So now we have something in , which, again, is isomorphic to . That's basically what's going on when we take a semidirect product w.r.t. a selection function; there are a couple of isomorphisms and type conversions happening in the background here.

So, first, to show that this is even well-defined, we need to verify the relevant measurability condition for  to do semidirect product with it. Said measurability condition is "for all Borel  is a measurable function".

Now, here's the argument. We can split this up into two parts. First, there's the function  of type , and then there's the function , of type . Our function we're trying to show is measurable is the composition of these two. So if we can show they're both measurable, we're good; we've verified the condition. We immediately know that the first function is measurable, because our selection function has to be. So, we just need to check the second condition. The tricky part is that we've got the weak topology on , not the strong topology. The definition of a measurable function is that the preimage of a measurable set in the target is measurable in the source. The weak topology has fewer open sets, so it's got fewer measurable sets, so it's harder for the preimage of a measurable set to be measurable. Fortunately, by the answer to this mathoverflow question (and it generalizes, because the answer didn't essentially use the "probability distribution" assumption in the setup; they only used properties of the weak topology),  is indeed measurable with the -algebra induced by the weak topology on the space of finite signed measures if  is Borel-measurable. So, the composition is measurable, and we've verified the condition to make a legit semidirect product.

Now, let's show that this set has the same expectation values as the semidirect product as defined for the functionals.  was defined as

This is the definition of our set. Let's minimize a function over it.

Now, we can minimize over  and our selection function separately; the selection function must come later, since it depends on . Further, the  component of  can be written as , because we're folding all the measure over  into the value of a single point. The downwards arrow is restriction. We'll abbreviate  as  for readability.

Then, we can unpack the value of a semidirect product evaluating a function like our usual thing, producing...

Now, hypothetically, if

then we could proceed further. So let's show that. First, observe that if we replace  with , the above line is true, because the selection function is always going to pick (for ), a point from , which will have a value as-high-or-higher than the worst-case point from . So, regardless of selection function, we have the  version of the above line being true. So now we must establish that having a  in the above line is impossible. Given any , we'll craft a selection function where the two sides differ by only , and we can let  limit to 0 as everything else is fixed, yielding our result.

Our fundamental tool here will be the Kuratowski-Ryll-Nardzewski measurable selection theorem to measurably choose a suitable point from each , getting us our selection function. We do have to be careful, though, because there's one additional property we need. Namely, that , for said selection function to be .

Our multifunction for KRN-selection (given an ) can be regarded as being split into two parts. So, given our  from earlier, there must exist some compact set  which accounts for all but of its measure.

Accordingly, our multifunction  for KRN-selection will be:

If , then

If , then  is the set:

This is basically "on a compact set, act in a nearly worst-case way, and outside the compact set, just pick some point with bounded  value".

To verify the conditions to invoke KRN-selection... well, the first one is that each point gets mapped to a closed (check) and nonempty (check) set.

In order to make KRN-selection work, we need a measurability condition on our multifunction . In order to set it up, we need to show that our multifunction has closed graph for . Ie, if  limits to , and each , and  limits to , then .

To be in  at least, observe that any infradistribution can be characterized as "a point is included as long as it exceeds our worst-case  values for all the ". Fixing a particular , the  limit to . And the  limit to . Thus, regardless of , because each  lies above  (by ). This certifies that, regardless of  (the worst-case value), so .

We still have to check that in the limit,  fulfills the "almost-worst-case" defining inequality in order to lie in . To work towards this result, we'll take another detour.

If  limits to  and  limits to  uniformly on all compact sets, then  limits to , because it gets arbitrarily close on the -almost-supports for the family , for arbitrarily low  (Lemma 2). Also,  limits to  by pointwise convergence for an infrakernel.

Further, we'll show that if  limits to , and  limits uniformly to  on all compact sets, then  limits to . We obviously have convergence of the  terms, so that leaves the measure terms. For sufficiently large , the gap between  and  shrinks to 0, because compact sets of measures induce compact almost-supports for all , and we have convergence on compact sets. Also,  limits to , so that checks out.

Now that we have this, we must show that .

Well, since f is continuous, then on the compact set  (where  can be any compact subset of ), it's uniformly continuous, so the sequence  limits uniformly to  on . Ie,  limits uniformly to  on all compact sets, since  was arbitrary.

Thus, by our earlier two results, we have  limiting to  and  limiting to . And the former sequence is always within ε of the latter sequence, so the same applies to the limit points. Thus,  fulfills our final condition of approximately minimizing .

Alright, so we've verified that the function  maps each  to a closed nonempty set, and it has closed graph when restricted to . Now, which condition do we need to check to invoke KRN-selection? The precise condition we need is that, for every open set  in , the set of  where  is measurable.

Let's think about that. We can view the graph of our multifunction as a subset of
. Remember it's divided into two parts, one for  and one for not. Also, we can take our open set , extend it back to get an open set of 

intersect with the graph of our multifunction, and project down to the space , and we want to show that said projection is always Borel.

Said projection is the projection of the intersection of the open set with the two "chunks" ( where , and  where this is not the case), so it's the union of two projected sets. If we can show that the projection of the intersection of  with these two "chunks" is measurable, then the net result is measurable, because the union of two measurable sets is measurable.

For one part of it, we'll be showing that the projection of 

is an F_σ set, a countable union of closed sets, which is measurable. Here's how it works. That set that you're intersecting with ? Well, it isn't just closed (we know that already, we showed closed graph), it's compact. As you can clearly see, projecting it down to , it lands in a compact set. So, we just need to show that projecting it down to  lands in a compact set, and then we can go "huh, this set is a closed subset of the product of two compact sets (which is compact), so it must be compact."

Necessary-and-sufficient conditions for a set of a-measures to be compact are: Bounded amount of measure present ( is selecting from  and the  sets have a uniform upper bound on the amount of measure present in them), compact almost-support for all the measure components of all the  for all  (works because of the equivalence between CAS and the measure present in points in the infradistributions, and the compact-shared compact-almost-support property of an infrakernel, so the various  do have all their measure components lying in a compact set because the  are selected from a compact set), and bounded  value (the finite Lipschitz constant and finite norm of the function of interest is incompatible with approximately minimizing your function with unboundedly large  values).

So, the a-measures of the various  are contained in a compact set.

Where were we? Oh right, showing that the projection of 

is an F_σ set. Pretty much, by the previous argument, we know that the set we're intersecting with  is a compact set. Also, in Polish spaces, all open sets can be written as a countable union of closed sets, so this set as a whole can be written as a countable union of (closed intersected with compact) sets. Ie, a countable union of compact sets. The projection of a compact set is compact, so the projection of the set is a countable union of compact sets (ie, a countable union of closed sets) and thus is F_σ.
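The "open sets are countable unions of closed sets" fact can be checked on the simplest example: writing (0, 1) as the union of the closed intervals [1/n, 1 − 1/n]:

```python
# (0, 1) as a countable union of closed intervals: a point is in the open
# interval iff it lands in [1/n, 1 - 1/n] for some large enough n.
# (We only search n up to 10**5, so test points stay at distance >= 1e-4
# from the endpoints.)
in_open = lambda x: 0 < x < 1
in_union = lambda x: any(1 / n <= x <= 1 - 1 / n for n in range(3, 10 ** 5))
for x in [0.0, 1e-4, 0.5, 1 - 1e-4, 1.0]:
    assert in_open(x) == in_union(x)
```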

Ok, that's part of things done. We still have to show that the projection of the set

is measurable. We'll actually show something a bit stronger: that it's an open set. Here's how. You can consider  to be a function of type  (the space of compact subsets of a-measures) defined as , which matches up with how  is defined outside of the compact set of interest. The projection of our set of interest is:

Or, it can be rephrased as:

The complement of a closed set is open, so if we can just show that

is open, we'll be done.

Now, the topology that  is equipped with is the Vietoris topology. Ie, the basis of open sets of compact sets is given by a finite collection of open sets in , and taking all the compact sets which are a subset of the union of the opens and intersect each open.

Letting your two open sets for a Vietoris-open be the whole space itself, and the set , this shows that the set of all compact sets of a-measures which have nonempty intersection with  is open in . So, our set of interest can be viewed as the preimage under  of an open set of compact sets. Further,  is a continuous function by Lemma 4; since the Hausdorff-distance induces the Vietoris topology, the preimage of an open set is open.

So, the set of  overall where, for a given open set ,  is nonempty, is a measurable set, and we can now invoke the Kuratowski-Ryll-Nardzewski measurable selection theorem and craft a measurable choice function .

Said measurable choice function never picks points with a norm above certain bounds (it always picks from , so there's a uniform bound on the amount of measure present, and the b value is either 2 or less in one case, or upper-bounded in the other case because too high of a b value would lead to violation of the Lipschitz constant), so it's in  and we can validly use it. And now we can go:

At this point, we split m into  (the measure component on the compact set that makes up all but  of its value), and , the measure component off the compact set.

And noticing our conditions on our  multifunction that we selected from, in particular how it behaves off ,

That last part is because all but  of the measure of m was captured by our compact set, so there's very little measure off it. Proceeding a bit further, and using what  is for , particularly how close it is to the true minimum value, we have:

m is fixed for now, and this argument works for all , so we arrive at the conclusion that

So now, we can proceed with our sequence of rewrites (our last point was as follows:)

So we'll pick up from there.

And we're done, we've shown that the expectations line up, so we have the right set form of semidirect product.

Proposition 25: The direct product is associative. 

Proof:

Done.

Proposition 26: If  and  are C-additive, then  and 

That's one direction. For the second,

Done.

Proposition 27: 

Proof of well-definedness of the pushforward: We'll use the same notational conveniences as in the proof of well-definedness of the product.

Our attempted set definition  will be:

Where  equals

The usual considerations about choice functions and weak measurability/strong measurability coinciding, along with the Bochner and Pettis integrals, means we aren't being vague here. The choice function is measurable and doesn't run off to infinity. Fortunately, we don't have to do the weird type conversions and isomorphisms from the semidirect product case, or do measurability arguments.

And now we can go:

Now, hypothetically, if

We could proceed further. And we can run through the exact same argument as from the semidirect product case to establish this equality; I'm not typing it again. So at this point, we're at

And the expectations match up and we're done.

Proposition 28: 

This is an easy one: , so then
