Proofs Section 2.1 (Theorem 1, Lemmas)


post by Diffractor · 2020-08-27T07:54:59.744Z · LW · GW · 0 comments


Fair upfront warning: This is not a particularly readable proof section. There's a bunch of dense notation, logical leaps due to illusion of transparency since I've spent months getting fluent with these concepts, and a relative lack of editing since it's long. If you really want to read this, I'd suggest PM-ing me to get a link to MIRIxDiscord, where I'd be able to guide you through it and answer questions. This post will be recapping the notions and building up an arsenal of lemmas, the next one will show the isomorphism theorem, translation theorems, and behavior of mixing, and the last one is about updates and the decision-theory results. It's advised to have them open in different tabs and go between them as needed.

With that said, let's establish some notation and mental intuition. I'll err on the side of including more stuff because illusion of transparency. First, visualize the tree of alternating actions and observations in an environment. A full policy $π$ can be viewed as that tree with some branches pruned off, specifying every history that's possible with your policy of interest. All our policies are deterministic. A policy stub $π_{s t}$ is a policy tree that's been mostly pruned down (doesn't extend further than some finite time $n$ ). A partial policy $π_{p a}$ is just any policy tree in any state of specification or lack thereof, from tiny stubs to full policies to trees that are infinite down some branches but not others.

$π_{\emptyset}$ denotes the empty policy (a stub) which specifies nothing about what a policy does, and $π_{\neg h}$ is some partial policy which specifies everything (acts like a full policy) everywhere except on history $h$ and afterwards.

There's a distance metric on histories, as well as a distance metric on partial policies. Both of them are of the form $γ^{t}$ where $γ < 1$ , and $t$ is the "time of first difference". For histories, it's "what's the first time these histories differ", for policies, it's "what's the shortest time by which one partial policy is defined and the other is undefined, or where the policies differ on what to do". So, thinking of the distance as getting smaller as the time of first difference gets bigger is a reliable guide.
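To make the "time of first difference" picture concrete, here is a minimal Python sketch of the history metric. The representation is an assumption made for illustration: a history is a tuple of symbols, $γ$ defaults to $0.5$, and when one history is a strict prefix of the other, they count as first differing right where the shorter one ends. The function names are hypothetical.

```python
def first_difference_time(h1, h2):
    """Index of the first position where two histories differ, or None if identical.

    Assumption: if one history is a strict prefix of the other, the first
    difference is taken to be the position right after the shorter one ends.
    """
    for t, (x, y) in enumerate(zip(h1, h2)):
        if x != y:
            return t
    if len(h1) != len(h2):
        return min(len(h1), len(h2))
    return None


def history_distance(h1, h2, gamma=0.5):
    """gamma ** (time of first difference); identical histories are at distance 0."""
    t = first_difference_time(h1, h2)
    return 0.0 if t is None else gamma ** t
```

Histories that agree for longer end up closer together, which is the "later first difference means smaller distance" rule of thumb from the paragraph above.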

The outcome set $F (π_{p a})$ is... take the tree corresponding to $π_{p a}$ , and it's the set of all the observation leaf nodes and infinite paths. No matter what, if you're interacting with an environment and acting according to $π_{p a}$ , the history you get is guaranteed to have, as a prefix, something in $F (π_{p a})$ . $F^{N F} (π_{p a})$ is that same set but minus all the Nirvana observations. Nirvana is a special observation which can occur at any time, counts as infinite reward, and ends the history. This is our compact metric space of interest that we're using to define a-measures and sa-measures. We assume that there's only finitely many discrete actions/observations available at any given point in time.

In this setting, sa-measures and a-measures over $F^{N F} (π_{p a})$ are defined as usual (a pair of a signed measure $m$ and a number $b$ where $b + m^{-} (1) \geq 0$ for sa-measures, and a measure $m$ with no negative parts and $b \geq 0$ , respectively), because there's no infinite reward shenanigans. Sa-measures over $F (π_{p a})$ require a technicality, though, which is that no nirvana event can have negative measure. $λ$ will denote the total amount of measure you have. So, for a probability distribution, $λ$ will be $1$ . We'll just use this for a-measures, and talk freely about the $λ$ and $b$ values of an a-measure. We use the KR-metric for measuring the distance between sa-measures (or a-measures), which is like "if two measures are really similar for a long time and then start diverging at late times, they're pretty similar." It's also equivalent to the earthmover distance, which is "how much effort does it take to rearrange the pile-of-dirt-that-is-this-measure into the pile-of-dirt-that-is-that-measure."
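As a quick sanity check on the two definitions, here is a small Python sketch for a finite outcome set. The representation is an assumption for illustration: a signed measure is a dict from outcome to mass, and $m^{-} (1)$ is the (nonpositive) total mass of the negative part. The helper names are hypothetical.

```python
from fractions import Fraction


def negative_mass(m):
    """m^-(1): total mass of the negative part of a signed measure (a number <= 0)."""
    return sum(min(v, 0) for v in m.values())


def is_sa_measure(m, b):
    """(m, b) is an sa-measure when b is big enough to cancel the negative part."""
    return b + negative_mass(m) >= 0


def is_a_measure(m, b):
    """(m, b) is an a-measure when m has no negative parts and b >= 0."""
    return b >= 0 and all(v >= 0 for v in m.values())


def total_mass(m):
    """lambda: the total amount of measure (1 for a probability distribution)."""
    return sum(m.values())
```

For instance, the measure with mass $\frac{1}{2}$ on one outcome and $-\frac{1}{4}$ on another is an sa-measure when paired with $b = \frac{1}{4}$, but not when paired with $b = 0$, and it is never an a-measure.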

One important note. While $m (f)$ is "what's the expectation of the continuous function $f$ over histories, according to the measure we have", we frequently abuse notation and use $m (h)$ to refer to what technically should be "what's the expectation of the indicator function for "this history has $h$ as a prefix" w.r.t the measure". The reason we can do this is because the indicator function for the finite history $h$ is a continuous function! So we can just view it as "what's the measure assigned to history $h$ ". Similarly, $f ★^{h} g$ is the continuous function that's $f$ on histories with $h$ as a prefix and $g$ on histories without $h$ as a prefix.

For a given $M^{a} (F (π_{p a}))$ or the nirvana-free variant, $N F$ is just the subset of that where the measure components of the a-measures assign 0 measure to Nirvana occurring. They're safe from infinite reward. We suppress the dependency on $π_{p a}$ . Similarly,

$E_{B} (f) = \inf_{(m, b) \in B \cap N F} (m (f) + b)$

because if a Nirvana-containing measure was selected by Murphy, you'd get infinite expected value, so Murphy won't pick anything with Nirvana in it. Keep that in mind.

There's a fiddly thing to take into account about upper completion. We're usually working in the space of a-measures $M^{a} (F (π_{p a}))$ or the nirvana-free equivalent. But the variant of upper completion we impose on our sets is: take the nirvana-free part of your set of interest, take the upper completion w.r.t the cone of nirvana-free sa-measures, then intersect with a-measures again. So, instead of the earlier setting where we could have any old sa-measure in our set and we could add any old sa-measure to them, now, since we're working purely in the space of a-measures and only demanding upper closure of the nirvana-free part, our notion of upper completion is something more like "start with a nirvana-free a-measure, you can add a nirvana-free sa-measure to it, and adding them has to make a nirvana-free a-measure"

Even worse, this is the notion of upper completion we impose, but for checking whether a point counts as minimal, we use the cone of sa-measures (with nirvana). So, for certifying that a point is non-minimal, we have to go "hey, there's another a-measure where we can add an sa-measure and make our point of interest". A different notion of upper completion here.

And, to make matters even worse, sometimes we do arguments involving the cone of sa-measures or nirvana-free sa-measures and don't impose the a-measure restriction. I'll try to clarify which case we're dealing with, but I can't guarantee it'll all be clear or sufficiently edited.

There's a partial ordering on partial policies, which is $π_{p a} \geq π_{p a}^{'}$ if the two policies never disagree on which action to take, and $π_{p a}$ is defined on more histories than $π_{p a}^{'}$ is (is a bigger tree). So, instead of viewing a partial policy as a tree, we can view the set of partial policies as a big poset. The full policies $π$ are at the top, the empty policy $π_{\emptyset}$ is at the bottom. Along with this, we've got two important notions. One is the fundamental sequence of a partial policy. Envisioning it at the tree level, $π_{p a}^{n}$ is "the tree that is $π_{p a}$ , just cut off at level $n$ ". Envisioning it at the poset level, the sequence $π_{p a}^{n}$ is a chain of points in our poset ascending up to the point $π_{p a}$ .

Also synergizing with the partial-order view, we've got functions heading down the partial policy poset. $p r_{*}^{π_{p a}^{h i}, π_{p a}^{l o}}$ is the function that takes an a-measure or sa-measure over $F (π_{p a}^{h i})$ , and is like "ok, everything in $F (π_{p a}^{h i})$ has a unique prefix in $F (π_{p a}^{l o})$ , push your measure component down, keep the $b$ term the same". A good way of intuiting this is that this sort of projection describes what happens when you crunch down a measure over 10-bit-bitstrings to a measure over 8-bit-bitstrings. So view your poset of partial policies as being linked together by a bunch of projection arrows heading down.
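The bitstring analogy can be written out directly. This is only a sketch of the analogy, not of the projection on outcome sets itself: it assumes outcomes are fixed-length bitstrings, a measure is a dict from bitstring to mass, and "unique prefix" just means the first `prefix_len` bits. The name `project` is hypothetical.

```python
from fractions import Fraction


def project(measure, b, prefix_len):
    """Push a measure over bitstrings down to their length-`prefix_len` prefixes.

    Each outcome's mass is added to the mass of its unique prefix, and the
    b term is passed through unchanged.
    """
    pushed = {}
    for outcome, mass in measure.items():
        prefix = outcome[:prefix_len]
        pushed[prefix] = pushed.get(prefix, 0) + mass
    return pushed, b
```

In this toy setting, projecting from 3 bits to 2 bits and then to 1 bit gives the same result as projecting from 3 bits to 1 bit in one go, and the total mass and $b$ value are unchanged along the way.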

There's a function $Θ$ mapping each partial policy $π_{p a}$ to a set of a-measures over $F (π_{p a})$ (or the nirvana-free variant), fulfilling some special properties. Maybe $Θ$ is only defined over policy stubs or full policies, in which case we use $Θ^{s t}$ or $Θ^{ω}$ , respectively. So, the best mental visualization/sketching aid for a lot of the proofs is visualizing your partial policies of interest with an ordering on them where bigger=up and smaller=down, and have a set/point for each one, that organizes things fairly well and is how many of these proofs were created.

Every $Θ$ (or the stub/full policy analogue) is associated with a $λ^{⊙}$ and $b^{⊙}$ value, which is "smallest upper bound on the $λ$ of the minimal points of the $Θ (π_{p a})$ sets" and "smallest upper bound on the $b$ of the minimal points of the $Θ (π_{p a})$ sets". Accordingly, the set $\{\leq ⊙\}$ is defined as $\{(λ μ, b) \mid λ + b \leq λ^{⊙} + b^{⊙}\}$ , and is a way of slicing out a bounded region of a set that contains all minimal points, if we need to do compactness arguments.

Finally, we'll reiterate two ultra-important results from basic inframeasure theory that get used a lot in here and will be tossed around casually for arguments without citing where they came from. There's the Compactness Lemma, which says that if you have a bound on the $λ$ values and the $b$ values of a closed set of a-measures, the set is compact. There's also Theorem 2, which says that you can break down any sa-measure into (minimal point + sa-measure), we use that decomposition a whole lot.

Other results we'll casually use are that projections commute (projecting down and then down again is the same as doing one big projection down), projections are linear (it doesn't matter whether you mix before or after projecting), projections don't expand distance (if two a-measures are $ϵ$ apart before being projected down, they'll be $ϵ$ or less apart after being projected down), if two a-measures are distinct in $M^{a} (F (π_{p a}))$ , then the measure components differ at some finite time (or the $b$ terms differ), so we can project down to some finite $π_{p a}^{n}$ (same thing, just end history at time $n$ ) and they'll still be different, and projections preserve the $λ$ and $b$ value of an a-measure.

One last note, $M^{a} (\infty)$ is the space of a-measures on nirvana-free histories. This is all histories, not just the ones compatible with a specific policy. And a surmeasure $S M$ is like a measure, but it can assign $0^{+}$ value to a nirvana event, marking it as "possible" even though it has 0 (arbitrarily low) measure.

Now, we can begin. Our first order of business is showing how the surtopology/surmetric/surmeasures are made and link together, but the bulk of this is the Isomorphism theorem. Which takes about 20 lemmas to set up first, in order to compress all the tools we need for it, and then the proof itself is extremely long. After that, things go a bit faster.

Lemma 1: $d (x, x^{'}) := max (d_{1} (x, x^{'}), d_{2} (x, x^{'}))$ is a metric if $d_{1}$ is a metric and $d_{2}$ is a pseudometric.

For identity of indiscernibles, $d (x, x^{'}) = 0 \to d_{1} (x, x^{'}) = 0 \to x = x^{'}$ because $d_{1}$ is a metric, and in the reverse direction, if $x = x^{'}$ , then $d_{2} (x, x^{'}) = 0$ (pseudometrics have $0$ distance from a point to itself) and $d_{1} (x, x^{'}) = 0$ , so $d (x, x^{'}) = 0$ .

For symmetry, both metrics and pseudometrics have symmetry, so

$d (x, x^{'}) = max (d_{1} (x, x^{'}), d_{2} (x, x^{'})) = max (d_{1} (x^{'}, x), d_{2} (x^{'}, x)) = d (x^{'}, x)$

For triangle inequality, both metrics and pseudometrics fulfill the triangle inequality, so

$d (x, z) = max (d_{1} (x, z), d_{2} (x, z)) \leq max (d_{1} (x, y) + d_{1} (y, z), d_{2} (x, y) + d_{2} (y, z))$

$\leq max (d_{1} (x, y), d_{2} (x, y)) + max (d_{1} (y, z), d_{2} (y, z)) = d (x, y) + d (y, z)$

And we're done.
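For a concrete feel of Lemma 1, here is a brute-force Python check on a small grid of integer points. The particular $d_{1}$ and $d_{2}$ are assumptions picked for illustration: $d_{1}$ is the taxicab metric on pairs, and $d_{2}$ only looks at the first coordinate, so it is a pseudometric but not a metric.

```python
from itertools import product


def d1(x, y):
    """Taxicab metric on pairs of integers."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def d2(x, y):
    """Pseudometric: ignores the second coordinate, so distinct points can be 0 apart."""
    return 3 * abs(x[0] - y[0])


def d(x, y):
    """The combined distance from Lemma 1."""
    return max(d1(x, y), d2(x, y))


def is_metric_on(points, dist):
    """Check symmetry, identity of indiscernibles, and the triangle inequality."""
    for x, y, z in product(points, repeat=3):
        if dist(x, y) != dist(y, x):
            return False
        if (dist(x, y) == 0) != (x == y):
            return False
        if dist(x, z) > dist(x, y) + dist(y, z):
            return False
    return True


GRID = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
```

On this grid, `d2` alone fails the check (it can't tell apart points with the same first coordinate), while `d` passes, which is what the lemma predicts. A finite check like this is an illustration, not a proof.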

Lemma 2: The surmetric is a metric.

To recap, the surmetric over sa-measures is

$d_{s} ((m, b), (m^{'}, b^{'})) := max (d (m, m^{'}) + | b - b^{'} |, γ^{t (m, m^{'})})$

where $γ < 1$ , and $t (m, m^{'})$ is the minimum length of a Nirvana-containing history that has positive measure according to $m$ and $0$ measure according to $m^{'}$ (or vice-versa). We'll show that $γ^{t (m, m^{'})}$ acts as a pseudometric, and then invoke Lemma 1.

The first three conditions of nonnegativity, $γ^{t (m, m)} = γ^{\infty} = 0$ , and symmetry are immediate. That just leaves checking the triangle inequality. Let $t_{1} := t (m_{1}, m_{2})$ , $t_{2} := t (m_{2}, m_{3})$ , and $t_{3} := t (m_{1}, m_{3})$ .

Assume $t_{3} < min (t_{1}, t_{2})$ . Then, going from $m_{1}$ to $m_{2}$ , all changes in the possibility of a Nirvana-history take place strictly after $t_{3}$ , and going from $m_{2}$ to $m_{3}$ , all changes in the possibility of a Nirvana-history also take place strictly after $t_{3}$ , so $m_{1}$ and $m_{3}$ behave identically (w.r.t. Nirvana-possibilities) up to and including time $t_{3}$ , which is impossible, because of $t_{3}$ being "what's the shortest time where $m_{1}$ and $m_{3}$ disagree on the possibility of a Nirvana-history".
Therefore, this case is ruled out, and $t_{3} \geq min (t_{1}, t_{2})$ .

In one case, either $t_{1}$ or $t_{2}$ is $< t_{3}$ . Without loss of generality, assume it's $t_{1}$ . Then, since $γ < 1$ , $γ^{t_{3}} < γ^{t_{1}} \leq γ^{t_{1}} + γ^{t_{2}}$ and the triangle inequality is shown.

In the other case, $min (t_{1}, t_{2}) = t_{3}$ . Without loss of generality, $t_{1} = t_{3}$ . In that case, $γ^{t_{3}} = γ^{t_{1}} \leq γ^{t_{1}} + γ^{t_{2}}$ and the triangle inequality is shown.

Therefore, $γ^{t (m, m^{'})}$ is a pseudometric. Now, we can invoke Lemma 1 to show that $d_{s}$ is a metric.
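The $γ^{t (m, m^{'})}$ term only depends on which Nirvana-histories each measure considers possible, so it can be sketched in Python by summarizing a measure as that set. This is an illustrative simplification: each measure is replaced by the frozenset of Nirvana-histories it assigns positive measure to, histories are tuples, $γ$ defaults to $0.5$, and all names are hypothetical.

```python
import math
from itertools import combinations, product


def first_disagreement(poss1, poss2):
    """Shortest length of a Nirvana-history possible under one measure but not the other."""
    differing = poss1 ^ poss2  # symmetric difference of the two possibility sets
    return min((len(h) for h in differing), default=math.inf)


def nirvana_term(poss1, poss2, gamma=0.5):
    """gamma ** t, with the convention gamma ** infinity = 0 when there is no disagreement."""
    t = first_disagreement(poss1, poss2)
    return 0.0 if t == math.inf else gamma ** t


def triangle_holds(family, gamma=0.5):
    """Brute-force check of the triangle inequality over every triple in the family."""
    return all(
        nirvana_term(a, c, gamma) <= nirvana_term(a, b, gamma) + nirvana_term(b, c, gamma)
        for a, b, c in product(family, repeat=3)
    )


HISTORIES = [("a", "N"), ("a", "o", "a", "N"), ("a", "o", "b", "N")]
FAMILY = [
    frozenset(c)
    for r in range(len(HISTORIES) + 1)
    for c in combinations(HISTORIES, r)
]
```

Two measures with the same possibility set get distance $0$ from this term even if they are different measures, which is why it is only a pseudometric and needs Lemma 1 to be combined with the usual metric.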

Theorem 1: The surmetric on the space of sa-measures $M^{s a} (F (π_{p a}))$ induces the surtopology. The Cauchy completion of $M^{s a} (F (π_{p a}))$ w.r.t the surmetric is exactly the space of sa-surmeasures.

Proof sketch: First, use the metric to get an entourage (check the Wikipedia page on "Uniform Space"), and use the entourage to get a topology. Then, we go in both directions, and check that entourage-open sets are open according to the surtopology and the surtopology subbasis sets are entourage-open, to conclude that the topology induced by the surmetric is exactly the surtopology. Then, for the Cauchy completion, we'll show a bijection between equivalence classes of Cauchy sequences w.r.t. the surmetric and sa-surmeasures.

The surmetric is $d_{s} ((m, b), (m^{'}, b^{'})) := max (d (m, m^{'}) + | b - b^{'} |, γ^{t (m, m^{'})})$ where $γ < 1$ , and $t$ is the minimum length of a Nirvana-containing history that has positive measure according to $m$ and $0$ measure according to $m^{'}$ (or vice-versa).

From the Wikipedia page on "Uniform Space", a fundamental system of entourages for $M^{s a} (F (π_{p a}))$ is given by

$\{(M, M^{'}) \in M^{s a} (F (π_{p a})) \times M^{s a} (F (π_{p a})) : d_{s} (M, M^{'}) \leq ϵ\}$

A set $O$ is open w.r.t. the uniformity iff for all $M \in O$ , there exists an entourage $V$ where $V [M]$ lies entirely within $O$ (Wikipedia page). Because $V$ is a subset of $M^{s a} (F (π_{p a})) \times M^{s a} (F (π_{p a}))$ , $V [M]$ is the set of all the second components paired with a given sa-measure.
So, let's say $O$ is open w.r.t. the uniformity. Then, for all $M \in O$ , there's an entourage $V$ where $V [M]$ lies entirely within $O$ . A fundamental system of entourages has the property that every entourage is a superset of some set from the fundamental system. Thus, from our earlier definition of the fundamental system, there exists some $ϵ_{M}$ where

$\{M^{'} \in M^{s a} (F (π_{p a})) : d_{s} (M, M^{'}) \leq ϵ_{M}\} \subseteq O$

We'll construct an open set from the surtopology that is a subset of this set and contains $M$ , as follows. First, observe that if $d_{s} (M, M^{'}) \leq ϵ_{M}$ , then $d (M, M^{'}) \leq ϵ_{M}$ , and $γ^{t (m, m^{'})} \leq ϵ_{M}$ . For the latter, there are finitely many nirvana-containing histories with a length less than ${log}_{γ} (ϵ_{M}) + 1$ , and if an $M^{'}$ matches $M$ w.r.t. which nirvana-containing histories of that finite set are possible or impossible, then $γ^{t (m, m^{'})} < ϵ_{M}$ (because $M$ and $M^{'}$ only differ on which Nirvana-histories are possible at very late times).

Accordingly, intersect the following sets:

1: the open ball centered at $M$ with a size of $ϵ_{M}$

2: For all the nirvana-histories $h N$ where $| h N | \leq {log}_{γ} (ϵ_{M}) + 1$ and $m (h N) > 0$ , intersect all the sets of a-measures where that history has positive measure. These are open because they're the complements of "this finite history has zero measure", which is a closed set of sa-measures.

3: For all the nirvana-histories $h N$ where $| h N | \leq {log}_{γ} (ϵ_{M}) + 1$ and $m (h N) = 0$ , intersect all the sets of a-measures where that nirvana-history has 0 measure. These are open because they are subbasis sets for the surtopology.

We intersected finitely many open sets, so the result is open. Due to 2 and 3 and our earlier discussion, any $M^{'}$ in the intersection must have $γ^{t (m, m^{'})} < ϵ_{M}$ . Due to 1, $d (M, M^{'}) < ϵ_{M}$ .

This finite intersection of open sets (in the surtopology) produces an open set that contains $M$ (obviously) and is a subset of $\{M^{'} \in M^{s a} (F (π_{p a})) : d_{s} (M, M^{'}) \leq ϵ_{M}\}$ , which is a subset of $V [M]$ which is a subset of $O$ .

Because this argument can be applied to every point $M \in O$ to get an open set (in the surtopology) that contains $M$ and is a subset of $O$ , we can make $O$ itself by just unioning all our open sets together, which shows that $O$ is open in the surtopology.

In the reverse direction, let's show that all sets in the subbasis of the surtopology are open w.r.t. the uniform structure.

First, we'll address the open balls around a point $M$ . Every point $M^{'}$ in such an open ball has some $ϵ_{M^{'}}$ -sized open ball which fits entirely within the original open ball. Then, we can just consider our entourage $V$ being

$\{(M, M^{''}) \in M^{s a} (F (π_{p a})) \times M^{s a} (F (π_{p a})) : d_{s} (M, M^{''}) \leq \frac{ϵ_{M^{'}}}{2}\}$

And then $V [M^{'}]$ is all points that are $\frac{ϵ_{M^{'}}}{2}$ or less away from $M^{'}$ according to the surmetric, and $d_{s} (M^{'}, M^{''}) \geq d (M^{'}, M^{''})$ so this is a subset of the $ϵ_{M^{'}}$ -sized ball around $M^{'}$ , which is a subset of the ball around $M$ .

The extra measure we added in total on step $n$ is (note that no nirvana-history can have a length of 0, so we start at 1, and $t$ denotes timesteps in the history)

$\sum_{t} \sum_{h N : | h N | = t} \frac{2^{- (n + t)}}{\# (t)} = \sum_{t} \# (t) \frac{2^{- (n + t)}}{\# (t)} \leq \sum_{t} 2^{- (n + t)} = 2^{- n} \sum_{t} 2^{- t} \leq 2^{- n}$

So, as $n$ increases, the deviation of this sequence of sa-measures from the limit sa-surmeasure approaches $0$ w.r.t. the usual metric, and every component in this sequence agrees with the others and the limit sa-surmeasure on which nirvana events are possible or impossible, so it's a Cauchy sequence limiting to the sa-surmeasure of interest.

Thus, all parts have been shown. The surmetric induces the surtopology, and the Cauchy completion of the sa-measures w.r.t. the surmetric is the set of sa-surmeasures. The same proof works if you want a-surmeasures (get it from the a-measures), or surmeasures (get it from the measures).

Alright, now we can begin the lemma setup for the Isomorphism Theorem, which is the Big One. See you again at Lemma 21.

Lemma 3: If $B \subseteq M^{a} (F^{N F} (π_{s t}))$ and $B$ is nonempty, closed, convex, nirvana-free upper-complete, and has bounded-minimals, then $c . h (B^{min}) = c . h (B^{xmin})$

So, first, $B^{xmin}$ refers to the set of extreme minimal points of $B$ . An extreme point of $B$ is one that cannot be written as a mixture of other points in $B$ .

Proof Sketch: One subset direction, $c . h (B^{min}) \supseteq c . h (B^{xmin})$ is immediate. For the other direction, we need a way to write a minimal point as a finite mixture of extreme minimal points. What we do is first show that all extreme points in $B$ must lie below the $λ^{⊙} + b^{⊙}$ bound by crafting a way to write them as a mix of different points with upper completion if they violate the bound. Then, we slice off the top of $B$ to get a compact convex set with all the original minimal (and extreme) points in it. Since $π_{s t}$ is a policy stub, $F^{N F} (π_{s t})$ has finitely many possible outcomes, so we're working in a finite-dimensional vector space. In finite dimensions, a convex compact set is the convex hull of its extreme points, which are all either (extreme points in $B$ originally), or (points on the high hyperplane we sliced at). Further, a minimal point can only be made by mixing together other minimal points. Putting this together, our minimal point of interest can be made by mixing together extreme minimal points, and the other subset direction is immediate from there.

Proof: As stated in the proof sketch, one subset direction is immediate, so we'll work on the other one. To begin with, fix a $M^{e x}$ that is extreme in $B$ . It's an a-measure. If $M^{e x}$ has $λ + b > λ^{⊙} + b^{⊙}$ , then it's not minimal ( $B$ has bounded-minimals) so we can decompose it into a minimal point $M^{min}$ respecting the bound and some nonzero sa-measure $M^{*}$ . $M^{e x} = M^{min} + M^{*}$ . Now, consider the point $M^{min} + (m^{* -}, - m^{* -} (1))$ instead. We're adding on the negative part of $m^{*}$ , and just enough of a $b$ term to compensate, so it's an sa-measure. The sum of these two points is an a-measure, because we already know from $M^{e x}$ being an a-measure that the negative part of $m^{*}$ isn't enough to make any negative parts when we add it to $m^{min}$ .

Anyways, summing the two parts like that saps a bit from the $λ$ value of $M^{min}$ , but adds an equal amount on the $b$ value, so it lies below the $λ^{⊙} + b^{⊙}$ "barrier", and by nirvana-free upper-completeness, it also lies in $B$ . Then, we can express $M^{e x}$ as

$M^{e x} = M^{min} + M^{*} = M^{min} + (m^{*}, b^{*}) = M^{min} + (m^{* -}, - m^{* -} (1)) + (m^{* +}, b^{*} + m^{* -} (1))$

$= 0.5 (M^{min} + (m^{* -}, - m^{* -} (1))) + 0.5 (M^{min} + (m^{* -}, - m^{* -} (1)) + 2 (m^{* +}, b^{*} + m^{* -} (1)))$

Now, we already know that $M^{min} + (m^{* -}, - m^{* -} (1))$ is an a-measure, and $(m^{* +}, b^{*} + m^{* -} (1))$ is an a-measure (no negative parts, end term is $\geq 0$ ). So, we just wrote our extreme point as a mix of two distinct a-measures, so it's not extreme. Contradiction. Therefore, all extreme points have $λ + b \leq λ^{⊙} + b^{⊙}$ .
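Here is the decomposition worked through on a tiny made-up instance in Python, with two outcomes and exact fractions. The specific numbers for $M^{min}$ and $M^{*}$ are assumptions chosen for illustration, and the helper names are hypothetical. A point is a pair (tuple of masses, $b$).

```python
from fractions import Fraction


def add(M1, M2):
    """Add two points (measure, b) componentwise."""
    (m1, b1), (m2, b2) = M1, M2
    return tuple(x + y for x, y in zip(m1, m2)), b1 + b2


def scale(c, M):
    """Scale a point (measure, b) by a constant."""
    m, b = M
    return tuple(c * x for x in m), c * b


def split_sa_measure(M_star):
    """Split (m*, b*) into (m*-, -m*-(1)) and (m*+, b* + m*-(1))."""
    m, b = M_star
    neg = tuple(min(x, 0) for x in m)
    pos = tuple(max(x, 0) for x in m)
    neg_mass = sum(neg)  # m*-(1), a number <= 0
    return (neg, -neg_mass), (pos, b + neg_mass)


def is_a_measure(M):
    m, b = M
    return b >= 0 and all(x >= 0 for x in m)


half = Fraction(1, 2)
M_min = ((half, half), Fraction(0))        # stand-in for the minimal point
M_star = ((half, Fraction(-1, 4)), half)   # a nonzero sa-measure
M_ex = add(M_min, M_star)                  # the supposedly extreme point

neg_part, pos_part = split_sa_measure(M_star)
low = add(M_min, neg_part)                 # M_min + (m*-, -m*-(1))
high = add(low, scale(2, pos_part))        # low + 2 (m*+, b* + m*-(1))
mixture = add(scale(half, low), scale(half, high))
```

In this instance `low` and `high` are two distinct a-measures whose 50/50 mixture is exactly `M_ex`, and `low` has the same $λ + b$ as `M_min`, matching the "saps a bit from $λ$ , adds an equal amount to $b$ " remark.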

Let's resume. From bounded-minimals, we know that $B$ has a suitable bound on $λ + b$ , so the minimal points respect the $λ^{⊙} + b^{⊙}$ bound. Take $B$ and chop it off at some high hyperplane, like $λ + b \leq 2 (λ^{⊙} + b^{⊙})$ . (the constant 2 isn't that important, it just has to be above 1 so we net all the extreme points and minimal points). Call the resulting set $C$ . Due to the bound, and $B$ being closed, $C$ is compact by the Compactness Lemma. It's also convex.

Now, invoke the Krein-Milman theorem (and also, we're in a finite-dimensional space since we're working with a stub, finitely many observation leaf nodes, so we don't need to close afterwards, check the Wikipedia page for Krein-Milman Theorem at the bottom) to go $C = c . h (extreme (C))$ . The only extreme points in $C$ are either points that were originally extreme in $B$ , or points on the high hyperplane that we chopped at.

Fix some $M^{min} \in B^{min}$ . $B^{min} \subseteq C$ , so $M^{min} \in C$ . Thus, $M^{min}$ can be written as a finite mixture of points from $extreme (C)$ . However, because $M^{min}$ is minimal, it can only be a mixture of minimal points, as we will show now.

Decompose $M^{min}$ into $E_{ζ} M_{i}$ , and then decompose the $M_{i}$ into $M_{i}^{min} + M_{i}^{*}$ . To derive a contradiction, assume there exists some $i^{'}$ where $M_{i^{'}}$ isn't minimal, so that $M_{i^{'}}^{*}$ isn't 0. Then,

$M^{min} = E_{ζ} M_{i} = E_{ζ} (M_{i}^{min} + M_{i}^{*}) = E_{ζ} (M_{i}^{min}) + E_{ζ} (M_{i}^{*})$

Thus, we have decomposed our minimal point into another point which is also present in $B$ , and a nonzero sa-measure because $M_{i^{'}}^{*}$ is nonzero, so our original minimal point is actually nonminimal, and we have a contradiction. Therefore, all decompositions of a minimal point into a mix of points must have every component point being minimal as well.

So, when we decomposed $M^{min}$ into a mix of points in $extreme (C)$ , all the extreme points we decomposed it into are minimal, so there's no component on the high hyperplane. $M^{min}$ was arbitrary in $B^{min}$ , establishing that $B^{min} \subseteq c . h (B^{xmin})$ . Therefore, $c . h (B^{min}) \subseteq c . h (B^{xmin})$ .

So we have our desired result.

Lemma 4: If $π_{p a} \geq π_{s t}^{h i} \geq π_{s t}^{l o}$ , and $A \subseteq M^{a} (F (π_{s t}^{h i}))$ and $B \subseteq M^{a} (F (π_{s t}^{l o}))$ (also works with the nirvana-free variants) and $p r_{*}^{π_{s t}^{h i}, π_{s t}^{l o}} (A) \subseteq B$ then $(p r_{*}^{π_{p a}, π_{s t}^{h i}})^{- 1} (A) \subseteq (p r_{*}^{π_{p a}, π_{s t}^{l o}})^{- 1} (B)$ . This works for surmeasures too.

A point $M$ in the preimage of $A$ has $p r_{*}^{π_{p a}, π_{s t}^{h i}} (M) \in A$ , and by projections commuting and projecting down further landing you in $B$ , we get $p r_{*}^{π_{p a}, π_{s t}^{l o}} (M) \in B$ , so $M$ is in the preimage of $B$ too.

Lemma 5: Given a partial policy $π_{p a}$ and stub $π_{s t}$ , if $π_{p a} \geq π_{s t}$ , then $\exists n : π_{p a}^{n} \geq π_{s t}$

$π_{s t}$ is a stub that specifies less about what the policy does than $π_{p a}$ , and because it's a stub it has a minimum time beyond which it's guaranteed to be undefined, so just let that be your $n$ . $π_{p a}^{n}$ then specifies everything that $π_{s t}$ does, and maybe more, because it has all the data of $π_{p a}$ up till time $n$ .
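Lemma 5 is easy to see in a toy encoding. The following Python sketch makes several assumptions for illustration: a (finite) partial policy is a dict from histories to actions, a history is the tuple of observations seen so far, and "cut off at level $n$ " means keeping only histories shorter than $n$ . The function names are hypothetical.

```python
def extends(hi, lo):
    """hi >= lo: hi is defined wherever lo is, and they never disagree on the action."""
    return all(h in hi and hi[h] == a for h, a in lo.items())


def truncate(policy, n):
    """The n-th element of the fundamental sequence: the policy cut off at level n."""
    return {h: a for h, a in policy.items() if len(h) < n}


def stub_depth(stub):
    """A time beyond which the stub is guaranteed to be undefined."""
    return 1 + max((len(h) for h in stub), default=-1)


pi_pa = {(): "a", ("o1",): "b", ("o2",): "a", ("o1", "o1"): "a"}
pi_st = {(): "a", ("o1",): "b"}
n = stub_depth(pi_st)
```

Here `truncate(pi_pa, n)` contains everything `pi_st` specifies, plus the action after `("o2",)` , while cutting off one level earlier loses the stub's second entry.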

Lemma 6: If $π_{p a}$ is a partial policy, and $\forall π_{s t}^{l o}, π_{s t}^{h i} \geq π_{s t}^{l o} : p r_{*}^{π_{s t}^{h i}, π_{s t}^{l o}} (Θ^{s t} (π_{s t}^{h i})) \subseteq Θ^{s t} (π_{s t}^{l o})$ holds, then, for all $m$ , $⋂_{π_{s t} \leq π_{p a}} (p r_{*}^{π_{p a}, π_{s t}})^{- 1} (Θ^{s t} (π_{s t})) = ⋂_{n \geq m} (p r_{*}^{π_{p a}, π_{p a}^{n}})^{- 1} (Θ^{s t} (π_{p a}^{n}))$ . This works for surmeasures too.

First, all the $π_{p a}^{n} \leq π_{p a}$ are stubs, so we get one subset direction immediately.

$⋂_{π_{s t} \leq π_{p a}} (p r_{*}^{π_{p a}, π_{s t}})^{- 1} (Θ^{s t} (π_{s t})) \subseteq ⋂_{n \geq m} (p r_{*}^{π_{p a}, π_{p a}^{n}})^{- 1} (Θ^{s t} (π_{p a}^{n}))$

In the other direction, use Lemma 5 to find a $π_{p a}^{n} \geq π_{s t}$ , with $n \geq m$ , and then pair

$\forall π_{s t}^{l o}, π_{s t}^{h i} \geq π_{s t}^{l o} : p r_{*}^{π_{s t}^{h i}, π_{s t}^{l o}} (Θ^{s t} (π_{s t}^{h i})) \subseteq Θ^{s t} (π_{s t}^{l o})$

with Lemma 4 to deduce that

$(p r_{*}^{π_{p a}, π_{p a}^{n}})^{- 1} (Θ^{s t} (π_{p a}^{n})) \subseteq (p r_{*}^{π_{p a}, π_{s t}})^{- 1} (Θ^{s t} (π_{s t}))$

Due to being able to take any stub preimage and find a smaller preimage amongst the fundamental sequence for $π_{p a}$ (with an initial segment clipped off) we don't need anything other than the preimages of the fundamental sequence (with an initial segment clipped off), which establishes the other direction and thus our result.

Lemma 7: If $M$ is an a-measure and $p r_{*}^{π_{p a}, π_{s t}} (M) = M^{'}$ and $M^{'} = M^{^{'} l o} + M^{^{'} *}$ and $M^{^{'} l o}$ is an a-measure, then there exists a $M^{l o} \in M^{a} (F (π_{p a}))$ (or the nirvana-free variant) s.t. $p r_{*}^{π_{p a}, π_{s t}} (M^{l o}) = M^{^{'} l o}$ and there's an sa-measure $M^{*}$ s.t. $M^{l o} + M^{*} = M$ . This works for a-surmeasures and sa-surmeasures too.

What this essentially says is "let's say we start with a $M$ and project it down to $M^{'}$ , and then find a point $M^{^{'} l o}$ below $M^{'}$ . Can we "go back up" and view $M^{^{'} l o}$ as the projection of some point below $M$ ? Yes". It's advised to sketch out the setup of this one, if not the proof itself.

Proof sketch: To build our $M^{l o}$ and $M^{*}$ , the $b$ components are preserved, but crafting the measure component for them is tricky. They've gotta project down to $M^{^{'} l o}$ and $M^{^{'} *}$ so those two give us our base case to start working from with the measures (and automatically get the "must project down appropriately" requirement) and then we can recursively build up by extending both of them with the conditional probabilities that $M$ gives us. However, we must be wary of division-by-zero errors and accidentally assigning negative measure on Nirvana, which complicates things considerably. Once we've shown how to recursively build up the measure components of our $M^{l o}$ and $M^{*}$ , we then need to check four things. That they're both well formed (sum of measure on 1-step extensions of a history=measure on the history, no semimeasures here), that they sum up to make $M$ , the measure component of $M^{l o}$ can't go negative anywhere (to certify that it's an a-measure), and that the $b$ term attached to $M^{*}$ is big enough to cancel out the negative regions (to certify that it's an sa-measure).

Proof: Let $M^{l o} = (m^{l o}, b^{l o})$ where $b^{l o}$ is the $b$ term of $M^{^{'} l o}$ . Let $M^{*} = (m^{*}, b^{*})$ where $b^{*}$ is the $b$ term of $M^{^{'} *}$ . Recursively define $m^{l o}$ and $m^{*}$ on $h$ that are prefixes of something in $F (π_{p a})$ (or the nirvana-free variant) as follows:

If $h$ is a prefix of something in $F (π_{s t})$ (or the nirvana-free variant), $m^{l o} (h) = m^{^{'} l o} (h)$ and $m^{*} (h) = m^{^{'} *} (h)$ . That defines our base case. Now for how to inductively build up by mutual recursion. Let's use $h a N$ for a nirvana-history and $h a o$ for a non-nirvana history.

If $m^{*} (h) < 0$ , then

$m^{l o} (h a N) = m (h a N), m^{l o} (h a o) = m (h a o) - \frac{m^{*} (h)}{\# o}$

$m^{*} (h a N) = 0, m^{*} (h a o) = \frac{m^{*} (h)}{\# o}$

$\# o$ is the number of non-nirvana observations that can come after $h a$ .

If $m^{*} (h) \geq 0$ and $m (h) > 0$ , then

$m^{l o} (h a o) = \frac{m (h a o)}{m (h)} m^{l o} (h), m^{*} (h a o) = \frac{m (h a o)}{m (h)} m^{*} (h)$

and the same holds for defining $m^{l o} (h a N)$ and $m^{*} (h a N)$ .

If $m^{*} (h) \geq 0$ and $m (h) = 0$ , then $m^{l o} (h a o) = m^{*} (h a o) = 0$
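The three cases of the recursion can be sketched as a single one-step extension function in Python. The encoding is an assumption for illustration: `m_children` maps each observation after $h a$ (with Nirvana written as the key `"N"`) to $m (h a o)$ , masses are exact fractions, and the negative case assumes at least one non-nirvana observation is available. The function name is hypothetical.

```python
from fractions import Fraction


def extend_one_step(m_h, m_lo_h, m_star_h, m_children, nirvana="N"):
    """Extend m_lo and m_star from a history h to its one-step extensions.

    m_h, m_lo_h, m_star_h are the values of m, m_lo, m_star on h.
    Returns two dicts giving m_lo and m_star on each extension.
    """
    lo, star = {}, {}
    if m_star_h < 0:
        # Negative case: spread the negative mass evenly over the non-nirvana
        # observations, and put none of it on Nirvana.
        non_nirvana = [o for o in m_children if o != nirvana]
        share = m_star_h / len(non_nirvana)
        for o, mass in m_children.items():
            if o == nirvana:
                lo[o], star[o] = mass, Fraction(0)
            else:
                lo[o], star[o] = mass - share, share
    elif m_h > 0:
        # Nonnegative case, no division by zero: extend both with the
        # conditional probabilities that m provides.
        for o, mass in m_children.items():
            lo[o] = mass / m_h * m_lo_h
            star[o] = mass / m_h * m_star_h
    else:
        # Zero case: everything past h gets zero measure.
        for o in m_children:
            lo[o], star[o] = Fraction(0), Fraction(0)
    return lo, star
```

In the negative case with $m (h) = \frac{1}{2}$ , $m^{l o} (h) = \frac{3}{4}$ , $m^{*} (h) = - \frac{1}{4}$ , and children with masses $\frac{1}{8}$ on Nirvana, $\frac{1}{4}$ and $\frac{1}{8}$ on two ordinary observations, the extensions sum back to $m$ on each child, their totals are $\frac{3}{4}$ and $- \frac{1}{4}$ again, and Nirvana gets no negative measure.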

We need to verify that these sum up to $m$ , that they're both well-formed signed measures, that $m^{l o}$ has no negative parts, and that the $b$ value for $M^{*}$ is big enough. $m^{l o}$ having no negative parts is immediate by the way we defined it, because it's nonnegative on all the base cases since $m^{^{'} l o}$ came from an a-measure, and $m$ came from an a-measure as well which lets you use induction to transfer that property all the way up the histories.

To verify that they sum up to $m$ , observe that for base-case histories in $F (π_{s t})$ ,

$m^{l o} (h) + m^{*} (h) = m^{^{'} l o} (h) + m^{^{'} *} (h) = m^{'} (h) = m (h)$

For non-base-case histories $h a o$ we can use induction (assume it's true for $h$ ) and go:

Negative case, $m^{*} (h) < 0$ .

$m^{l o} (h a N) + m^{*} (h a N) = m (h a N) + 0 = m (h a N)$

$m^{l o} (h a o) + m^{*} (h a o) = m (h a o) - \frac{m^{*} (h)}{\# o} + \frac{m^{*} (h)}{\# o} = m (h a o)$

Nonnegative case, no division by zero.

$m^{l o} (h a o) + m^{*} (h a o) = \frac{m (h a o)}{m (h)} (m^{l o} (h) + m^{*} (h)) = \frac{m (h a o)}{m (h)} m (h) = m (h a o)$

Zero case: $m (h a o) = 0$ because $m (h) = 0$ and $m$ came from an a-measure and has no negative parts. $m^{l o} (h a o) + m^{*} (h a o) = 0 + 0 = 0 = m (h a o)$

Ok, so we've shown that $m^{l o} + m^{*} = m$ .

What about checking that they're well-formed signed measures? To do this, it suffices to check that summing up their measure-mass over $h a o_{i}$ gets the measure-mass over $h$ . This works over the base case, so we just have to check the induction steps.

In the negative case, for $m^{*}$ ,

$m^{*} (h a N) + \sum_{i} m^{*} (h a o_{i}) = \sum_{i} \frac{m^{*} (h)}{# o} = m^{*} (h)$

and for $m^{l o}$

$m^{l o} (h a N) + \sum_{i} m^{l o} (h a o_{i}) = m (h a N) + \sum_{i} (m (h a o_{i}) - \frac{m^{*} (h)}{# o}) = m (h) - m^{*} (h) = m^{l o} (h)$

In the nonnegative case, no division by zero, then

$m^{l o} (h a N) + \sum_{i} m^{l o} (h a o_{i}) = \frac{m (h a N)}{m (h)} m^{l o} (h) + \sum_{i} \frac{m (h a o_{i})}{m (h)} m^{l o} (h)$

$= \frac{m^{l o} (h)}{m (h)} (m (h a N) + \sum_{i} m (h a o_{i})) = \frac{m^{l o} (h)}{m (h)} m (h) = m^{l o} (h)$

And similar for $m^{*}$ .

In the zero case where $m (h) = 0$ , we need to show that $m^{l o} (h)$ and $m^{*} (h)$ will also be zero. Winding $h$ back, there's some longest prefix $h^{'}$ where $m (h^{'}) > 0$ . Now, knowing that $m (h^{'}) = m^{l o} (h^{'}) + m^{*} (h^{'})$ , we have two possible options here.

In the first case, $m^{*} (h^{'}) \geq 0$ , so $m^{l o} (h^{'} a o)$ (advancing one step) is:

$m^{l o} (h^{'} a o) = \frac{m (h^{'} a o)}{m (h^{'})} m^{l o} (h^{'}) = \frac{0}{m (h^{'})} m^{l o} (h^{'}) = 0$

And similar for $m^{*} (h^{'} a o)$ , so they're both 0, along with $m$ , on $h^{'} a o$ , and then the zero case transfers the "they're both zero" property all the way up to $h$ .

In the second case, $m^{*} (h^{'}) < 0$ and $m^{l o} (h^{'}) > 0$ . Then, proceeding forward, $m^{*} (h^{'} a o) < 0$ , and this keeps holding all the way up to $h$ , so we're actually in the negative case, not the zero case.

So, if $m (h) = 0$ , then $m^{l o} (h) = m^{*} (h) = 0$ as long as we're in the case where $m^{*} (h) \geq 0$ and $m (h) = 0$ . Then, it's easy. $m^{l o} (h a N) + \sum_{i} m^{l o} (h a o_{i}) = 0 = m^{l o} (h)$ and the same for $m^{*}$ .

Also, $m^{*}$ , by the way we defined it, never puts negative measure on a nirvana event, so we're good there, they're both well-formed signed measures. For the $b^{*}$ value being sufficient to compensate for the negative-measure of $m^{*}$ , observe that the way we did the extension, the negative-measure for $m^{*}$ is the same as the negative measure for $m^{^{'} *}$ , and $b^{*} = b^{^{'} *}$ , and the latter is sufficient to cancel out the negative measure for $m^{^{'} *}$ , so we're good there.

We're done now, and this can be extended to a-surmeasures by taking the $0^{+}$ nirvana-events in $m$ and saying that all those nirvana-events have $0^{+}$ measure in $m^{l o}$ .

Lemma 8: Having a $λ + b$ bound on a set of a-surmeasures is sufficient to ensure that it's contained in a compact set w.r.t the surtopology.

This is the analogue of the Compactness Lemma for the sur-case. We'll keep it in the background instead of explicitly invoking it each time we go "there's a bound, therefore compactness". It's important.

Proof sketch: Given a sequence, the bound gets convergence of the measure part by the Compactness Lemma, and then we use Tychonoff to show that we can get a subsequence where the a-surmeasures start agreeing on which nirvana events are possible or impossible, for all nirvana events, so their first time of disagreement gets pushed arbitrarily far out, forcing convergence w.r.t. the surmetric.

Proof: given a sequence of a-surmeasures $S M_{n}$ , and rounding them off to their "standard" part (slicing off the $0^{+}$ probability), we can get a convergent subsequence, where the measure part and $b$ part converges, by the Compactness Lemma since we have a $λ + b$ bound, which translates into bounds on $λ$ and $b$ .

Now, all we need is a subsequence of that subsequence that ensures that, for each nirvana-event, the sequence of a-surmeasures starts agreeing on whether it's possible or impossible. There are countably many finite histories, and each nirvana-history is a finite history, so we index our nirvana events by natural numbers, and we can view our sequence as wandering around within $\{0, 1\}^{ω}$ , where the t'th coordinate keeps track of whether the t'th nirvana event is possible or impossible. $\{0, 1\}^{ω}$ is compact by Tychonoff's Theorem, so we can find a convergent subsequence, which corresponds to a sequence of a-surmeasures that, for any nirvana event, eventually start agreeing on whether it's possible or impossible. There's finitely many nirvana events before a certain finite time, so if we go far enough out in the $n$ , the a-surmeasures agree on what nirvana events are possible or impossible for a very long time, and so the surdistance shrinks to 0 and they converge, establishing that all sequences have a convergent subsequence, so the set is compact.
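The subsequence-extraction step can be pictured as a pigeonhole refinement, one coordinate at a time. The sketch below is only a finite stand-in for the argument above: with an infinite sequence you'd keep a bit value that occurs infinitely often, whereas here (as an assumption made for illustration) we keep the more common bit among a finite list of points. The function name and the example data are made up.

```python
def agreeing_subsequence(points, depth):
    """Return indices of points that agree on the first `depth` coordinates.

    For each coordinate in turn, keep the survivors whose bit is the more
    common one, so at least half of the survivors remain at every step.
    """
    survivors = list(range(len(points)))
    for t in range(depth):
        ones = [n for n in survivors if points[n][t] == 1]
        zeros = [n for n in survivors if points[n][t] == 0]
        survivors = ones if len(ones) >= len(zeros) else zeros
    return survivors

# Toy "sequence": point n records the low 4 bits of n, read as the
# possible/impossible flags of the first 4 nirvana events.
points = [tuple((n >> t) & 1 for t in range(4)) for n in range(64)]
survivors = agreeing_subsequence(points, depth=3)
```

Every surviving index agrees on the first three flags, which is the finite shadow of "the first time of disagreement gets pushed arbitrarily far out".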

Lemma 9: Given a $π_{p a}$ and a sequence of nonempty compact sets $B_{n} \in M^{a} (F (π_{p a}^{n}))$ (or the nirvana-free variant) where $\forall n : p r_{*}^{π_{p a}^{n + 1}, π_{p a}^{n}} (B_{n + 1}) \subseteq B_{n}$ then there is a point $M \in M^{a} (F (π_{p a}))$ (or the nirvana-free variant) where $\forall n : p r_{*}^{π_{p a}, π_{p a}^{n}} (M) \in B_{n}$ . This also works with a-surmeasures.

Sketch this one out. It's essentially "if sets get smaller and smaller, but not empty, as you ascend up the chain $π_{p a}^{n}$ towards $π_{p a}$ , and are nested in each other, then there's something at the $π_{p a}$ level that projects down into all the $π_{p a}^{n}$ "

Proof sketch: Projection preserves $λ$ and $b$ , the Compactness Lemma says that compactness means you have a $λ$ and $b$ bound, so the preimage of a compact set is compact. Then, we just have to verify the finite intersection property to show that the intersection of the preimages is nonempty, which is pretty easy since all our preimages are nested in each other like an infinite onion.

Proof: Consider the intersection $⋂_{n} (p r_{*}^{π_{p a}, π_{p a}^{n}})^{- 1} (B_{n})$ . Because $B_{n}$ are all compact, they have a $λ$ and $b$ bound. Projection preserves the $λ$ and $b$ values, so the preimage of $B_{n}$ has a $λ$ and $b$ bound, therefore lies in a compact set (by Lemma 8 for the sur-case). The preimage of a closed set is also a closed set, so all these preimages are compact. This is then an intersection of a family of compact sets, so we just need to check the finite intersection property. Fixing finitely many $m$ , we can find an $n$ above them all, and pick an arbitrary point in the preimage of $B_{n}$ , and invoke Lemma 4 on $B_{n}$ to conclude that said point lies in all lower preimages, thus demonstrating finite intersection. Therefore, the intersection is nonempty.

Lemma 10: Given a sequence of nonempty closed sets $B_{n}$ where $p r_{*}^{π_{p a}^{n + 1}, π_{p a}^{n}} (B_{n + 1}) \subseteq B_{n}$ , and a sequence of points $M_{n} \in (p r_{*}^{π_{p a}, π_{p a}^{n}})^{- 1} (B_{n})$ , all limit points of the sequence $M_{n}$ (if they exist) lie in $⋂_{n} (p r_{*}^{π_{p a}, π_{p a}^{n}})^{- 1} (B_{n})$ (works in the a-surmeasure case)

Proof: Assume a limit point exists, isolate a subsequence limiting to it. By Lemma 4, the preimages are nested in each other. Also, the preimage of a closed set is closed. Thus, for our subsequence, past $n$ , the points are in the preimage of $B_{n}$ and don't ever leave, so the limit point is in the preimage of $B_{n}$ . This argument works for all $n$ , so the limit point is in the intersection of the preimages.

The next three Lemmas are typically used in close succession to establish nirvana-free upper-completeness for projecting down a bunch of nirvana-free upper complete sets, and taking the closed convex hull of them, which is an operation we use a lot. The first one says that projecting down a nirvana-free upper-complete set is upper-complete, the second one says that convex hull preserves the property, and the third one says that closure preserves the property. The first one requires building up a suitable measure via recursion on conditional probabilities, the second one requires building up a whole bunch of sa-measures via recursion on conditional probabilities and taking limits of them to get suitable stuff to mix together, and the third one also requires building up a whole bunch of sa-measures via recursion on conditional probabilities and then some fanciness with defining a limiting sequence.

Lemma 11: In the nirvana-free setting, a projection of an upper-complete set is upper-complete.

Proof sketch: To be precise about exactly what this says, since we're working with a-measures, it says "if you take the fragment of the upper completion composed of a-measures, and project it down, then the thing you get is: the fragment of the upper completion of (projected set) composed of a-measures". Basically, since we're not working in the full space of sa-measures, and just looking at the a-measure part of the upper completion, that's what makes this one tricky and not immediate.

The proof path is: Take an arbitrary point $M^{l o}$ in the projection of $B$ which has been crafted by projecting down $M^{h i}$ . Given an arbitrary $M^{^{'} l o} := M^{l o} + M^{* l o}$ (assuming it's an a-measure) which lies in the upper completion of the projection of $B$ , we need to certify that it's in the projection of $B$ to show that the projection of $B$ is upper-complete. In order to do this, we craft a $M^{* h i}$ and $M^{^{'} h i}$ (an a-measure) s.t: $M^{h i} + M^{* h i} = M^{^{'} h i}$ (certifying that $M^{^{'} h i}$ is in $B$ since $B$ is upper complete), and $M^{^{'} h i}$ projects down to hit our $M^{^{'} l o}$ point of interest.

These a-measures are crafted by starting with the base case of $M^{* l o}$ and $M^{^{'} l o}$ , and recursively building up the conditional probabilities in accordance with the conditional probabilities of $M^{h i}$ . Then we just verify the basic conditions like the measures being well-formed, $M^{^{'} h i}$ being an a-measure, $M^{* h i}$ having a big enough $b$ term, and $M^{h i} + M^{* h i} = M^{^{'} h i}$ , to get our result. Working in the Nirvana-free case is nice since we don't need to worry about assigning negative measure to Nirvana.

Proof: Let $B \subseteq M^{a} (F^{N F} (π_{p a}^{h i}))$ be our upper-complete set. We want to show that $p r_{*}^{π_{p a}^{h i}, π_{p a}^{l o}} (B)$ is upper-complete. To that end, fix a $M^{l o}$ in the projection of $B$ that's the projection of a $M^{h i} \in B$ . Let $M^{^{'} l o} := M^{l o} + M^{* l o}$ , where $M^{^{'} l o}$ is an a-measure. Can we find an a-measure $M^{^{'} h i}$ in $B$ that projects down to $M^{^{'} l o}$ ? Let's define $M^{* h i}$ and $M^{^{'} h i}$ as follows:

Let $M^{^{'} h i} = (m^{^{'} h i}, b^{^{'} h i})$ where $b^{^{'} h i}$ is $b^{^{'} l o}$ . Let $M^{* h i} = (m^{* h i}, b^{* h i})$ where $b^{* h i}$ is $b^{* l o}$ . Recursively define $m^{^{'} h i}$ and $m^{* h i}$ on $h$ that are prefixes of something in $F^{N F} (π_{p a}^{h i})$ as follows:

If $h$ is a prefix of something in $F (π_{p a}^{l o})$ , $m^{^{'} h i} (h) = m^{^{'} l o} (h)$ and $m^{* h i} (h) = m^{* l o} (h)$ .

Otherwise, recursively define the measure components $m^{^{'} h i}$ and $m^{* h i}$ as:

If $m^{h i} (h) > 0$ , then

$m^{^{'} h i} (h a o) = \frac{m^{h i} (h a o)}{m^{h i} (h)} m^{^{'} h i} (h), m^{* h i} (h a o) = \frac{m^{h i} (h a o)}{m^{h i} (h)} m^{* h i} (h)$

If $m^{h i} (h) = 0$ , then $m^{^{'} h i} (h a o) = m^{* h i} (h a o) = 0$ .
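As a sanity check on this recursion, here is a small sketch that lifts low-level values to the high level by copying the conditional probabilities of $m^{h i}$ . The helper name `lift`, the tuple encoding of histories (actions dropped), and all the numbers are assumptions made for the example, not anything fixed by the proof.

```python
from fractions import Fraction as Fr

def lift(m_hi, base):
    """Extend `base` to every history of m_hi using m_hi's conditionals."""
    out = {}
    for h in sorted(m_hi, key=len):  # parents are processed before children
        parent = h[:-1]
        if h in base:
            out[h] = base[h]  # base case: copy the low-level value
        elif m_hi[parent] > 0:
            out[h] = m_hi[h] / m_hi[parent] * out[parent]
        else:
            out[h] = Fr(0)
    return out

# High-level measure on a depth-2 tree; the low level only sees depth 1.
m_hi = {(): Fr(1), ("x",): Fr(1, 2), ("y",): Fr(1, 2),
        ("x", "p"): Fr(1, 8), ("x", "q"): Fr(3, 8),
        ("y", "p"): Fr(1, 2), ("y", "q"): Fr(0)}
star_lo = {(): Fr(1, 4), ("x",): Fr(-1, 4), ("y",): Fr(1, 2)}
prime_lo = {h: m_hi[h] + star_lo[h] for h in star_lo}  # m'^lo = m^lo + m*^lo

star_hi = lift(m_hi, star_lo)
prime_hi = lift(m_hi, prime_lo)
```

Here the $- \frac{1}{4}$ sitting on `("x",)` is split between its two children in the ratio 1:3, so no new negative mass appears, and `prime_hi` stays nonnegative everywhere.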

We need to verify that $m^{^{'} h i}$ has no negative parts so it's fitting for an a-measure, that $m^{h i} + m^{* h i} = m^{^{'} h i}$ , that the $b$ value for $M^{* h i}$ works, and that they're both well-formed signed measures. The first part is easy to establish: $m^{^{'} h i} (h) = m^{^{'} l o} (h) \geq 0$ in the base cases since $M^{^{'} l o}$ is an a-measure, and a quick induction, together with $m^{h i}$ coming from the a-measure $M^{h i}$ (so no negatives anywhere), establishes $m^{^{'} h i}$ as not having any negative parts.

To verify that they sum up to $m^{^{'} h i}$ , observe that for base-case histories in $F^{N F} (π_{p a}^{l o})$ , $m^{h i} (h) + m^{* h i} (h) = m^{l o} (h) + m^{* l o} (h) = m^{^{'} l o} (h) = m^{^{'} h i} (h)$ . Then, in the general case, we can use induction (assume it's true for $h$ ) and go:

If $m^{h i} (h) > 0$ , then

$m^{h i} (h a o) + m^{* h i} (h a o) = m^{h i} (h a o) + \frac{m^{h i} (h a o)}{m^{h i} (h)} m^{* h i} (h) = \frac{m^{h i} (h a o)}{m^{h i} (h)} (m^{h i} (h) + m^{* h i} (h))$

$= \frac{m^{h i} (h a o)}{m^{h i} (h)} m^{^{'} h i} (h) = m^{^{'} h i} (h a o)$

If $m^{h i} (h) = 0$ , then $m^{h i} (h a o) = 0$ , so

$m^{h i} (h a o) + m^{* h i} (h a o) = 0 + 0 = 0 = m^{^{'} h i} (h a o)$

What about checking that they're well-formed measures? To do this, it suffices to check that summing up their measure-mass over $h a o_{i}$ gets the measure-mass over $h$ . If $m^{h i} (h) > 0$ , then:

$\sum_{i} m^{^{'} h i} (h a o_{i}) = \sum_{i} \frac{m^{h i} (h a o_{i})}{m^{h i} (h)} m^{^{'} h i} (h) = \frac{m^{h i} (h)}{m^{h i} (h)} m^{^{'} h i} (h) = m^{^{'} h i} (h)$

And similar for $m^{* h i}$ .

If $m^{h i} (h) = 0$ , then, when we trace back to some longest prefix $h^{'}$ where $m^{h i} (h^{'}) > 0$ , then $m^{h i} (h^{'} a o) = 0$ , so:

$m^{^{'} h i} (h^{'} a o) = \frac{m^{h i} (h^{'} a o)}{m^{h i} (h^{'})} m^{^{'} h i} (h^{'}) = \frac{0}{m^{h i} (h^{'})} m^{^{'} h i} (h^{'}) = 0$

And the same for $m^{* h i} (h^{'} a o)$ . This extends forward up to $h$ , so $m^{h i} (h) = 0$ implies $m^{^{'} h i} (h) = m^{* h i} (h) = 0$ . And we get:

$\sum_{i} m^{^{'} h i} (h a o_{i}) = 0 = m^{^{'} h i} (h)$

and the same for $m^{* h i} (h)$ .

For the $b$ value being sufficient, observe that the way we did the extension, the negative-measure for $m^{* h i}$ is the same as the negative measure for $m^{* l o}$ , and $b^{* h i} = b^{* l o}$ , and the latter is sufficient to cancel out the negative measure for $m^{* l o}$ , so we're good there. And for projecting down appropriately, observe that $m^{* h i}$ and $m^{^{'} h i}$ copy $m^{* l o}$ and $m^{^{'} l o}$ wherever they get a chance to do so.

Thus, $M^{^{'} h i} = M^{h i} + M^{* h i}$ so $M^{^{'} h i}$ lies in the upper completion of $B$ , so it's in $B$ , and projects down onto $M^{^{'} l o}$ , certifying that $M^{^{'} l o}$ lies in the projection of $B$ , so the projection of an upper-complete set of a-measures is upper-complete.

Lemma 12: In the nirvana-free setting, the convex hull of an upper-closed set is upper-closed.

Proof sketch: Again, this is tricky since we're working with a-measures. We have a point $M$ in the convex hull, which shatters into $E_{ζ} M_{i}$ where the $M_{i}$ are in the (non-convex) set $B$ itself. If $M + M^{*}$ is an a-measure, we want to find $M_{i}^{*}$ s.t. $M_{i} + M_{i}^{*}$ is an a-measure, and mixing these makes $M + M^{*}$ . However, it's really hard to define these $M_{i}^{*}$ directly, so we craft approximations indexed by n where $M_{i} + M_{i, n}^{*}$ is an a-measure, and mixing the $m_{i, n}^{*}$ matches $m^{*}$ perfectly up till time n. We get weak bounds on the amount of positive measure and the $b$ term to invoke the compactness lemma, and then use Tychonoff to get a suitable convergent subsequence for all the i to define our final $M_{i}^{*}$ that, when combined with $M_{i}$ , makes an a-measure. Mixing these together replicates the measure component of $M^{*}$ , but not the $b$ term. However, that's easily fixable by adding a bit of excess to one of the $b$ terms, and we're done. We took an arbitrary $M + M^{*}$ where $M \in c . h (B)$ , and crafted $M_{i}^{*}$ where, for all i, $M_{i} + M_{i}^{*}$ is an a-measure (and in $B$ by its upper-completeness) and it mixes together to make $M + M^{*}$ , certifying that the convex hull is upper-closed.

Our way of constructing the $m_{i, n}^{*}$ is basically: start at length n, use $\frac{m_{i} (h)}{m (h)}$ as your scale factor for filling in the measure of histories, extend down, and then extend up with the conditional probabilities of $M_{i}$ .

Proof: Take a point $M \in c . h (B)$ . It decomposes into $E_{ζ} M_{i}$ for finitely many i, where $M_{i} \in B$ . Fix some arbitrary $M + M^{*}$ (as long as it's an a-measure). We'll craft $M_{i}^{*}$ where $M_{i} + M_{i}^{*} \in B$ by upper completeness of $B$ , and the $M_{i}^{*}$ mix together to make $M^{*}$ itself, certifying that $M + M^{*}$ lies in $c . h (B)$ .

Let $m_{i, n}^{*}$ be defined by: If $h$ is of length n, or lies in $F^{N F} (π_{p a})$ and is shorter than length n,

$m_{i, n}^{*} (h) = \frac{m_{i} (h)}{m (h)} m^{*} (h)$

( $m_{i, n}^{*} (h)$ defaults to $m^{*} (h)$ if $m (h) = 0$ )

And then it's defined for shorter $h$ via:

$m_{i, n}^{*} (h) = \sum_{j} m_{i, n}^{*} (h a o_{j})$

The $b$ value is (we're summing over the base-case histories where $h$ is of length n or lies in $F^{N F} (π_{p a})$ and is shorter than length n) $- \sum_{h} min (0, m_{i, n}^{*} (h))$

And if $m_{i} (h) > 0$ , extend to bigger histories via

$m_{i, n}^{*} (h a o) = \frac{m_{i} (h a o)}{m_{i} (h)} m_{i, n}^{*} (h)$

And if $m_{i} (h) = 0$ , it's $m_{i, n}^{*} (h a o) = \frac{m_{i, n}^{*} (h)}{# o}$

We've got a few things to show. First is showing that $m_{i, n}^{*} (h)$ is a well-defined signed measure. $\sum_{j} m_{i, n}^{*} (h a o_{j}) = m_{i, n}^{*} (h)$ trivially if $h$ is shorter than length n.

Otherwise, (assuming $m_{i} (h) > 0$ )

$\sum_{j} m_{i, n}^{*} (h a o_{j}) = \sum_{j} \frac{m_{i} (h a o_{j})}{m_{i} (h)} m_{i, n}^{*} (h) = \frac{m_{i} (h)}{m_{i} (h)} m_{i, n}^{*} (h) = m_{i, n}^{*} (h)$

If $m_{i} (h) = 0$ , then $\sum_{j} m_{i, n}^{*} (h a o_{j}) = \sum_{j} \frac{m_{i, n}^{*} (h)}{# o} = m_{i, n}^{*} (h)$

Ok, so it's well-defined. Also, past n, it doesn't add any more negative measure. If it's negative on a length n history, it'll stay that negative forevermore and never go positive, so the $b$ value we stuck on it is a critical $b$ value (exactly sufficient to cancel out the areas of negative measure)

We do need to show that $m_{i, n}^{*} + m_{i}$ is an a-measure. For the $h$ of length n or in $F^{N F} (π_{p a})$ and shorter, we can split into two cases. For the case where $m (h) > 0$ ,

$m_{i, n}^{*} (h) + m_{i} (h) = \frac{m_{i} (h)}{m (h)} m^{*} (h) + m_{i} (h) = m_{i} (h) (\frac{m^{*} (h)}{m (h)} + 1)$

And because $m^{*} (h) + m (h) \geq 0$ (they sum to make an a-measure), $m^{*} (h) \geq - m (h)$ , so $\frac{m^{*} (h)}{m (h)} \geq - 1$ , so we get a nonnegative number times a nonnegative number, so $m_{i, n}^{*} (h) + m_{i} (h) \geq 0$ .

Now for the case where $m (h) = 0$ . In that case $m_{i} (h) = 0$ because $m$ came from an a-measure and is the mix of the $m_{i}$ . Also, because $m (h) + m^{*} (h) \geq 0$ due to $M + M^{*}$ being an a-measure, $m^{*} (h) \geq 0$ . Then, $m_{i, n}^{*} (h) + m_{i} (h) = m^{*} (h) \geq 0$

Now for the short histories. By induction down,

$m_{i, n}^{*} (h) + m_{i} (h) = \sum_{j} m_{i, n}^{*} (h a o_{j}) + \sum_{j} m_{i} (h a o_{j}) = \sum_{j} (m_{i, n}^{*} (h a o_{j}) + m_{i} (h a o_{j})) \geq 0$

Now for the long histories. If $m_{i} (h) > 0$ ,

$m_{i, n}^{*} (h a o) + m_{i} (h a o) = m_{i} (h a o) + \frac{m_{i} (h a o)}{m_{i} (h)} m_{i, n}^{*} (h) = m_{i} (h a o) (1 + \frac{m_{i, n}^{*} (h)}{m_{i} (h)})$

And then, by induction up we can assume $m_{i, n}^{*} (h) + m_{i} (h) \geq 0$ , so $m_{i, n}^{*} (h) \geq - m_{i} (h)$ , so $\frac{m_{i, n}^{*} (h)}{m_{i} (h)} \geq - 1$ , so it's a multiplication of a nonnegative number and a nonnegative number, so we're good there on showing nonnegativity.

If $m_{i} (h) = 0$ , then $m_{i} (h a o) = 0$ . Since $m_{i, n}^{*} (h) + m_{i} (h) \geq 0$ , then $m_{i, n}^{*} (h) \geq 0$ . Then, $m_{i, n}^{*} (h a o) + m_{i} (h a o) = \frac{m_{i, n}^{*} (h)}{# o} \geq 0$

Ok, so, for all n, $M_{i} + M_{i, n}^{*}$ is an a-measure.

One last thing we'll want to show is that, for histories $h$ of length n or shorter, $(E_{ζ} m_{i, n}^{*}) (h) = m^{*} (h)$

First, for the histories of length n or in $F^{N F} (π_{p a})$ (assuming $m (h) > 0$ )

$(E_{ζ} m_{i, n}^{*}) (h) = E_{ζ} (m_{i, n}^{*} (h)) = E_{ζ} (\frac{m_{i} (h)}{m (h)} m^{*} (h)) = \frac{m^{*} (h)}{m (h)} E_{ζ} (m_{i} (h)) = \frac{m^{*} (h)}{m (h)} (E_{ζ} m_{i}) (h)$

$= \frac{m^{*} (h)}{m (h)} m (h) = m^{*} (h)$

Assuming $m (h) = 0$ , then $m_{i, n}^{*} (h) = m^{*} (h)$ immediately giving you your result. So, since the mixture of the $m_{i, n}^{*}$ mimics $m^{*}$ on everything of length n or shorter in $F^{N F} (π_{p a})$ , the "sum up the stuff ahead of you" thing makes it mimic $m^{*}$ on all histories of length n or shorter.

Further, $m^{*}$ is nonpositive on a history of length n or a shorter history in $F^{N F} (π_{p a})$ iff for all i, $m_{i, n}^{*}$ is nonpositive on it. So, since the $b$ values for $m_{i, n}^{*}$ are the negative component, $E_{ζ} b_{i, n}^{*}$ is the negative-measure of $m^{*}$ up till time n, which is less negative than the negative-measure of $m^{*}$ , so the mixture of the $b_{i, n}^{*}$ terms undershoots $b^{*}$ .

Consider the sequence $M_{i, n}^{*}$ (the sequence is in n, i is fixed). It's a sequence of sa-measures. To show that there's a limit point, we need a bound on the positive value part, and the negative value part (our $b$ is critical, it can't go any smaller, so bounding the negative value part bounds the $b$ ). Fixing an n, the "boundary" of stuff of length n or shorter and in $F^{N F} (π_{p a})$ suffices to establish what the mass of the negative part and positive part are. We either mimic $m^{*}$ if $m (h) = 0$ , or $m_{i} (h) = 0$ while $m (h) > 0$ so $m_{i, n}^{*} (h) = 0$ , or both quantities are positive so we have a scale term of $\frac{m_{i} (h)}{m (h)} = \frac{m_{i} (h)}{E_{ζ} m_{i} (h)} \leq \frac{m_{i} (h)}{ζ_{i} m_{i} (h)} = \frac{1}{ζ_{i}}$

So, our amount of positive and negative measure on $m_{i, n}^{*}$ on the "n boundary" is at most $\frac{1}{ζ_{i}}$ times the positive and negative measure on $m^{*}$ at the "n boundary", which is less than or equal to the amount of positive and negative measure that $m^{*}$ has overall. So, that gets us our bounds on the positive and negative part of $m_{i, n}^{*}$ of $\frac{m^{* +} (1)}{ζ_{i}}$ and $\frac{m^{* -} (1)}{ζ_{i}}$ , respectively (which bounds our $b_{i, n}^{*}$ terms)
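The boundary-level facts used so far (the mixture of the $m_{i, n}^{*}$ reproduces $m^{*}$ , each $m_{i} + m_{i, n}^{*}$ is nonnegative, and the scale factor never exceeds $\frac{1}{ζ_{i}}$ ) can be checked numerically on a toy example. This is a sketch under made-up assumptions: three boundary histories labelled `"u"`, `"v"`, `"w"`, two components, arbitrary masses, and a helper name `split_star` that isn't from the proof.

```python
from fractions import Fraction as Fr

def split_star(zeta, ms, m_star):
    """Split m_star across components with scale factor m_i(h) / m(h).

    Falls back to m_star(h) itself when m(h) = 0, as in the definition.
    Returns the mixture m and the list of per-component pieces.
    """
    hs = list(m_star)
    m = {h: sum(z * mi[h] for z, mi in zip(zeta, ms)) for h in hs}
    parts = [{h: mi[h] / m[h] * m_star[h] if m[h] > 0 else m_star[h]
              for h in hs}
             for mi in ms]
    return m, parts

zeta = [Fr(1, 4), Fr(3, 4)]
ms = [{"u": Fr(1, 2), "v": Fr(1, 2), "w": Fr(0)},
      {"u": Fr(1, 6), "v": Fr(1, 2), "w": Fr(0)}]
# m_star must satisfy m + m_star >= 0; here m = {u: 1/4, v: 1/2, w: 0}.
m_star = {"u": Fr(-1, 8), "v": Fr(1), "w": Fr(1, 3)}

m, parts = split_star(zeta, ms, m_star)
```

On `"u"` the two pieces come out as $- \frac{1}{4}$ and $- \frac{1}{12}$ , and $\frac{1}{4} \cdot (- \frac{1}{4}) + \frac{3}{4} \cdot (- \frac{1}{12}) = - \frac{1}{8}$ recovers $m^{*}$ there.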

Now, we can consider our sequence $(m_{i, n}^{*}, b_{i, n}^{*})$ as a sequence $\overline{M}_{n}$ in

$\prod_{i} (M^{s a} (F^{N F} (π_{p a})) \cap \{(m, b) | m^{+} (1) \leq \frac{m^{* +} (1)}{ζ_{i}}, b \leq - \frac{m^{* -} (1)}{ζ_{i}}\})$

where $\overline{M}_{n} (i) = (m_{i, n}^{*}, b_{i, n}^{*})$

By the Compactness Lemma, these sets are compact, and by Tychonoff, the product is compact. So, there's a convergent subsequence, and the limit point projects down to the coordinates to make a $M_{i}^{*}$ for each i. The set of sa-measures that, when added to an a-measure, make another a-measure, is closed, and regardless of n, $M_{i} + M_{i, n}^{*}$ is an a-measure, so $M_{i} + M_{i}^{*}$ is an a-measure.

$E_{ζ} m_{i, n}^{*}$ mimics $m^{*}$ up till timestep n, so $E_{ζ} m_{i}^{*} = m^{*}$ . And because, at each step, the mixture of the $b$ values undershoots $b^{*}$ , $E_{ζ} b_{i}^{*} \leq b^{*}$ .

So, our final batch of sa-measures is $M_{i}^{*}$ for $i > 0$ , and for $i = 0$ , it's $M_{0}^{*} + (0, \frac{b^{*} - E_{ζ} b_{i}^{*}}{ζ_{0}})$ . Now, all these $M_{i}^{*}$ are sa-measures that, when added to $M_{i}$ , make a-measures, and one of them has some extra b term on it, which doesn't impede it from being an a-measure. By upper-completeness of $B$ , they're all in $B$ , and mixing them makes $M + M^{*}$ exactly, because

$ζ_{0} (M_{0} + M_{0}^{*} + (0, \frac{b^{*} - E_{ζ} b_{i}^{*}}{ζ_{0}})) + \sum_{i > 0} ζ_{i} (M_{i} + M_{i}^{*}) = E_{ζ} (M_{i}) + E_{ζ} (M_{i}^{*}) + (0, b^{*} - E_{ζ} b_{i}^{*})$

$= M + E_{ζ} (m_{i}^{*}, b_{i}^{*}) + (0, b^{*} - E_{ζ} b_{i}^{*}) = M + (E_{ζ} m_{i}^{*}, E_{ζ} b_{i}^{*}) + (0, b^{*} - E_{ζ} b_{i}^{*})$

$= M + (E_{ζ} m_{i}^{*}, b^{*}) = M + (m^{*}, b^{*}) = M + M^{*}$

$M + M^{*}$ was an arbitrary a-measure above a $M \in c . h (B)$ , and we showed it's a mix of a-measures that lie in $B$ (since $B$ is upper-complete), so $c . h (B)$ is upper-complete.

Lemma 13: In the nirvana-free setting, the closure of an upper-closed set of a-measures is upper-closed.

Proof sketch: We have an $M \in \overline{B}$ , and a sequence $M_{n} \in B$ limiting to $M$ . Let $M^{'} := M + M^{*}$ be an arbitrary a-measure above $M$ . We must craft a sequence limiting to $M^{'}$ . What we do, is make a bunch of $M_{n, j}^{*}$ with the special property that $m_{n} + m_{n, j}^{*}$ perfectly mimics $m^{'}$ up till time j, by basically going "copy $m^{'} - m_{n}$ for time j or before, and complete with the conditional probabilities of $m_{n}$ so $m_{n} + m_{n, j}^{*}$ doesn't go negative". And then the $b$ term is set to mimic the $b$ term of $M^{*}$ , or set to cancel out the amount of negative measure, whichever is greater. The reason we only copy up till time j instead of skipping to the chase and just going "copy $m^{'} - m_{n}$ , stick on whichever $b$ term you need" is it affords us finer control and understanding over what our $b$ terms are doing.

Then, we let j increase as $ρ (n)$ to get a sequence of one variable, $M_{n} + M_{n, ρ (n)}^{*}$ where $ρ (n)$ is selected to diverge to infinity at a suitable rate to get convergence to $M^{'}$ itself. Again, no matter what $ρ (n)$ is, as long as it diverges to infinity as n does, we get convergence of the measure term to the measure term $m^{'}$ , the hard part is selecting $ρ$ to appropriately control what the $b$ term is doing. Once $ρ (n)$ is suitably defined, then we can get upper and lower bounds on how the $b$ term of the sum compares to the $b$ term of $M^{'}$ , and show convergence.

Proof: Ok, so $B$ is upper-closed, and we want to show upper-closure of $\overline{B}$ . Thus, we have an $M \in \overline{B}$ , a sequence of points $M_{n} \in B$ that limit to $M$ , and if $M^{'} = M + M^{*}$ is an a-measure, we want to show that $M^{'}$ is in $\overline{B}$ . This is going to require a rather intricate setup to get our limit of interest. In this case, we'll be using both n and j as limit parameters.

Let $m_{n, j}^{*}$ be defined up till time j by: If $h$ is of length j or shorter, $m_{n, j}^{*} (h) = m^{'} (h) - m_{n} (h)$

Extend to longer histories via (if $m_{n} (h) > 0$ )

$m_{n, j}^{*} (h a o) = \frac{m_{n} (h a o)}{m_{n} (h)} m_{n, j}^{*} (h)$

And if $m_{n} (h) = 0$ , go with $m_{n, j}^{*} (h a o) = \frac{m_{n, j}^{*} (h)}{# o}$

The $b$ value is defined as (we're summing over histories of length j or in $F^{N F} (π_{p a})$ and shorter) $max (b^{*}, - \sum_{h} min (0, m_{n, j}^{*} (h)))$

We've got a few things to show. First is showing that $m_{n, j}^{*} (h)$ is a well-defined signed measure. If $h$ is shorter than length j,

$\sum_{i} m_{n, j}^{*} (h a o_{i}) = \sum_{i} (m^{'} (h a o_{i}) - m_{n} (h a o_{i})) = m^{'} (h) - m_{n} (h) = m_{n, j}^{*} (h)$

Otherwise, (assuming $m_{n} (h) > 0$ )

$\sum_{i} m_{n, j}^{*} (h a o_{i}) = \sum_{i} \frac{m_{n} (h a o_{i})}{m_{n} (h)} m_{n, j}^{*} (h) = \frac{m_{n} (h)}{m_{n} (h)} m_{n, j}^{*} (h) = m_{n, j}^{*} (h)$

Assuming $m_{n} (h) = 0$ , $\sum_{i} m_{n, j}^{*} (h a o_{i}) = \sum_{i} \frac{m_{n, j}^{*} (h)}{# o} = m_{n, j}^{*} (h)$

Ok, so it's well-defined. Also, past j, it doesn't add any more negative measure. If it's negative on a length j history, it'll stay that negative forevermore and never go positive, so the $b$ value we stuck on it is either a critical $b$ value, or greater than that. In particular, this implies that our definition of $b_{n, j}^{*}$ can be reexpressed as: $b_{n, j}^{*} = max (b^{*}, - m_{n, j}^{* -} (1))$

We do need to show that $m_{n, j}^{*} + m_{n}$ is an a-measure. For the $h$ of length j or in $F^{N F} (π_{p a})$ and shorter, we can go: $m_{n, j}^{*} (h) + m_{n} (h) = m^{'} (h) - m_{n} (h) + m_{n} (h) = m^{'} (h) \geq 0$ . This is because $M^{'}$ is an a-measure. This also means that $m_{n} + m_{n, j}^{*}$ perfectly mimics $m^{'}$ up till time j.

Now for the long histories. Assume $m_{n} (h) > 0$

$m_{n, j}^{*} (h a o) + m_{n} (h a o) = m_{n} (h a o) + \frac{m_{n} (h a o)}{m_{n} (h)} m_{n, j}^{*} (h) = m_{n} (h a o) (1 + \frac{m_{n, j}^{*} (h)}{m_{n} (h)})$

And then, by induction up we can assume $m_{n, j}^{*} (h) + m_{n} (h) \geq 0$ , so $m_{n, j}^{*} (h) \geq - m_{n} (h)$ , so $\frac{m_{n, j}^{*} (h)}{m_{n} (h)} \geq - 1$ , so it's a multiplication of a nonnegative number and a nonnegative number, so we're good there.

Assume $m_{n} (h) = 0$ . Then $m_{n} (h a o) = 0$ and $m_{n, j}^{*} (h a o) + m_{n} (h a o) = \frac{m_{n, j}^{*} (h)}{# o}$

And then, since $m_{n, j}^{*} (h) + m_{n} (h) \geq 0$ (induction up), and $m_{n} (h) = 0$ , $m_{n, j}^{*} (h) \geq 0$ , and we get our result, showing that $M_{n} + M_{n, j}^{*}$ is an a-measure, and by upwards closure of $B$ , all the $M_{n} + M_{n, j}^{*}$ lie in $B$ .

There's one more fiddly thing to take care of. What we'll be doing is letting j increase as $ρ (n)$ , to get a function of 1 variable, and showing that $M_{n} + M_{n, ρ (n)}^{*}$ limits to $M^{'}$ . So we should think carefully about what we want out of $ρ$ .

First, let $m^{* j -}$ be the measure gotten by restricting $m^{*}$ to only histories which are in $F^{N F} (π_{p a})$ with a length of <j with negative measure, and histories where their length j prefix has negative measure. This is kinda like a bounded way of slicing out areas with negative measure from $m^{*}$ , falling short of the optimal decomposition $m^{* -}$

Also, $(m_{n, j}^{*})^{-} (1)$ , as n increases and j remains fixed, limits to $m^{* j -} (1)$ . The reason for this is that for histories of length j or shorter,

$m_{n, j}^{*} (h) - m^{*} (h) = (m^{'} (h) - m_{n} (h)) - (m^{'} (h) - m (h)) = m (h) - m_{n} (h)$

and the end term limits to 0 because $m_{n}$ limits to $m$ . So, past a sufficiently large n, $m_{n, j}^{*}$ comes extremely close to mimicking $m^{*}$ for the first j steps. So, dialing up n far enough, the negative-measure of $m_{n, j}^{*}$ comes really close to the negative measure mass in $m^{*}$ as evaluated up till time j, due to the aforementioned mimicry.

Further, $m^{* j -} (1) \geq m^{* -} (1)$ , because only evaluating up till length j isn't as good at slicing out areas of negative measure as the optimal decomposition of $m^{*}$ into positive and negative components.

With all this, our rule for $ρ (n)$ will be:

$ρ (n) := \sup \{j \leq n | \forall n^{'} \geq n : | m_{n^{'}, j}^{* -} (1) - m^{* j -} (1) | \leq 2^{- j}\}$

$ρ (n)$ never decreases. What this is basically doing is going "ok, I'll step up j but only when there's a guarantee that I'll mimic $m^{*}$ up till timestep j (re: amount of negative measure) sufficiently closely forever afterward". $ρ (n)$ eventually diverges to infinity, though it might do so very slowly, because for all the j, $m_{n, j}^{* -} (1)$ limits to $m^{* j -} (1)$ so eventually we get to a large enough n that the defining condition is fulfilled and we can step up j.
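To get a feel for how $ρ$ behaves, here is a finite-horizon sketch of the rule. It departs from the actual definition in two labelled ways: the quantifier over all $n^{'} \geq n$ is truncated to a finite `horizon`, and the gap $| m_{n^{'}, j}^{* -} (1) - m^{* j -} (1) |$ is replaced by a stand-in function `toy_err` that simply decays like $\frac{1}{n^{'} + 1}$ . Both names are hypothetical.

```python
from fractions import Fraction as Fr

def rho(n, err, horizon):
    """Largest j <= n with err(k, j) <= 2**-j for every n <= k < horizon.

    Returns 0 if no j qualifies. The true rule quantifies over all k >= n;
    the finite horizon is an assumption made so this can be computed.
    """
    good = [j for j in range(n + 1)
            if all(err(k, j) <= Fr(1, 2 ** j) for k in range(n, horizon))]
    return max(good, default=0)

def toy_err(k, j):
    # Stand-in for the gap in negative measure at step k (ignores j).
    return Fr(1, k + 1)

values = [rho(n, toy_err, horizon=64) for n in range(8)]
```

With this stand-in, `values` only steps up once the error has dropped below the next power of two, so it grows roughly like $\log_{2} n$ : slowly, but without ever decreasing.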

Now we can finally define our sequence as: $M_{n, ρ (n)}^{*} + M_{n}$ . We'll show that this limits to $M^{'}$ . First, it's always an a-measure, and always in $B$ because it's in the upper completion of $B$ which is upper-complete. For a fixed n, $m_{n, ρ (n)}^{*} + m_{n}$ always flawlessly matches $m^{'}$ up till time $ρ (n)$ . Since $ρ (n)$ diverges to infinity, eventually we get a flawless match up till any finite time we name, so the measure components do converge to $m^{'}$ as n limits to infinity.

But what about the $b$ component? Well, the $b$ component of the sum is: $max (b^{*}, - m_{n, ρ (n)}^{* -} (1)) + b_{n}$ . Let's bound it. Obviously, a lower bound is $b^{*} + b_{n}$ .
An upper bound is a bit more interesting.

$max (b^{*}, - m_{n, ρ (n)}^{* -} (1)) + b_{n} \leq max (b^{*} + 2^{- ρ (n)}, - m_{n, ρ (n)}^{* -} (1)) + b_{n}$

By the way we defined $ρ (n)$ , we can go to $- m^{* ρ (n) -} (1)$ with a small constant overhead, and then swap out $- m^{* ρ (n) -} (1)$ (amount of negative measure up till time $ρ (n)$ ) for $- m^{* -} (1)$ (total amount of negative measure) which is greater.

$max (b^{*} + 2^{- ρ (n)}, - m_{n, ρ (n)}^{* -} (1)) + b_{n} \leq max (b^{*} + 2^{- ρ (n)}, - m^{* ρ (n) -} (1) + 2^{- ρ (n)}) + b_{n}$

$\leq max (b^{*} + 2^{- ρ (n)}, - m^{* -} (1) + 2^{- ρ (n)}) + b_{n} \leq max (b^{*} + 2^{- ρ (n)}, b^{*} + 2^{- ρ (n)}) + b_{n}$

$= b^{*} + 2^{- ρ (n)} + b_{n}$

(because $b^{*}$ must be equal to or exceed the amount of negative measure in $m^{*}$ for $M^{*}$ to be a legit sa-measure)
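As a sanity check on the sandwich, here's a small Python sketch with made-up numbers. All the argument names are hypothetical labels: `neg_total` stands for $- m^{* -} (1)$ , `neg_trunc` for $- m^{* ρ (n) -} (1)$ , and `neg_n_trunc` for $- m_{n, ρ (n)}^{* -} (1)$ , all written as nonnegative amounts of negative measure.

```python
def b_term_with_bounds(b_star, b_n, neg_total, neg_trunc, neg_n_trunc, rho):
    """Return (lower bound, actual b term, upper bound) for the summed b component."""
    # The facts the chain of inequalities relies on:
    # truncated negative mass <= total negative mass <= b*
    assert neg_trunc <= neg_total <= b_star
    # the defining condition of rho(n)
    assert abs(neg_n_trunc - neg_trunc) <= 2.0 ** -rho
    lower = b_star + b_n
    b_term = max(b_star, neg_n_trunc) + b_n
    upper = b_star + 2.0 ** -rho + b_n
    return lower, b_term, upper


# Illustrative values only; none of them come from the post.
lower, b_term, upper = b_term_with_bounds(
    b_star=1.0, b_n=0.5, neg_total=1.0, neg_trunc=0.95, neg_n_trunc=1.05, rho=3
)
```

The gap between `lower` and `upper` is exactly $2^{- ρ (n)}$ , which is the term that vanishes as $ρ (n)$ diverges.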

So, our bounds on the $b$ term for $M_{n, ρ (n)}^{*} + M_{n}$ are $b^{*} + b_{n}$ on the low end, and $b^{*} + 2^{- ρ (n)} + b_{n}$ on the high end. As n limits to infinity, so does $ρ (n)$ , so that term vanishes, and $b_{n}$ limits to $b$ , so our limiting $b$ value is $b^{*} + b = b^{'}$ , and we're done. We built a sequence of a-measures in $B$ limiting to $M^{'}$ , certifying that it's in $\overline{B}$ , and $M^{'}$ was arbitrary above some $M \in \overline{B}$ . Thus, the closure of an upper-complete set is upper-complete.

The next one is a story proof, because I couldn't figure out how to make it formal. It essentially says that given two points near each other, their nirvana-free upper-completions (the set of a-measures, if it was a set of sa-measures, it'd be immediate to show) are close to each other.

Lemma 14: For stubs, in the nirvana-free setting, if $M$ and $M^{'}$ are $ϵ$ apart, then the Hausdorff-distance between ${M}^{u c} \cap M^{a} (F^{N F} (π_{s t}))$ and ${M^{'}}^{u c} \cap M^{a} (F^{N F} (π_{s t}))$ is $ϵ$ or less.

Ok, I don't really know how to make this formal, so all I have is a story-proof. The KR-metric (what's the maximum difference between two measures w.r.t 1-Lipschitz bounded functions) is the same as (or at least within a constant of) the earthmover distance. The earthmover distance is "interpret your measure as piles (or pits) of dirt on various spots. It takes $ϵ$ effort to move 1 unit of dirt $ϵ$ distance. Also, $ϵ$ effort lets you create or destroy $ϵ$ units of dirt. What's the minimum amount of effort it takes to rearrange one pile of dirt into the other pile of dirt?". So our proof will be a story about moving dirt.

Let's just examine the measure components of $M$ and $M^{'}$ . Since the earthmover distance is $ϵ$ (it might be less because of different $b$ ), it takes $ϵ$ effort to rearrange the dirt pile of $m$ into the dirt pile of $m^{'}$ in an optimal way. Let's say $M + M^{*}$ is an a-measure (no negative parts ie no dirt pits). We need some $M^{^{'} *}$ to add to $M^{'}$ to make an a-measure within $ϵ$ of $M + M^{*}$ .

The procedure to construct $M^{^{'} *}$ is as follows: $b^{^{'} *} = b^{*}$ . For the measure component, start with the $m^{*}$ pile (there may be dirt pits ie areas of negative measure). Now, keep a close tab on the process of rearranging $m$ into $m^{'}$ . One crumb of dirt at a time is moved, or dirt is created/destroyed. The rule is:

Let's say a crumb of dirt is moved from $h$ to $h^{'}$ , created at $h$ or destroyed at $h$ . If the pile-being-rearranged into $m^{'}$ has its measure on $h$ being greater than the size of the pit (negative measure) for $h$ for the pile-being-rearranged into $m^{^{'} *}$ , sit around and do nothing. If moving that crumb/destroying it would make $h$ have negative measure (the dirt pile on $h$ for the pile-being-turned-into $m^{'}$ would become smaller than the size of the hole for $h$ for the pile-being-turned-into $m^{^{'} *}$ ), then take the latter pile and move a crumb from $h^{'}$ into the pit for $h$ (at the same expenditure of effort), or create a crumb of dirt there instead. Once you're done, that's your $m^{^{'} *}$ . Keep the $b$ value the same.

Now, this process does several things. First, very little effort was expended ( $ϵ$ or less), to reshuffle $m^{*}$ into $m^{^{'} *}$ because you're either sitting around, or mimicking the same low-effort dirt moving process in reverse. Second, $b^{*}$ stays as a viable bound throughout, because whenever you move a crumb, you're dropping it into a pit, so an increase in negative measure at one spot is balanced out by a decrease in negative measure for the spot you moved the crumb to. Also, you never destroy crumbs, only create them. Also, in the whole process by which we rearrange $m + m^{*}$ into $m^{'} + m^{^{'} *}$ , we always preserve the invariant that (pile on $h$ + pit on $h$ $\geq 0$ ), so $m^{'} + m^{^{'} *}$ is a measure, not a signed measure.

For the final bit, we can imagine reshuffling $m + m^{*}$ into $m^{'} + m^{^{'} *}$ as a whole. Then, either a crumb is moved from point A to point B, or you move a crumb from point A to point B, and a crumb back from point B to point A so you can skip that step. Or, a crumb is created at point A, or a crumb is both destroyed and created at A, so you can skip that step. So, the dirt-moving procedure to turn $m + m^{*}$ into $m^{'} + m^{^{'} *}$ spends as much or less effort than the procedure to turn $m$ into $m^{'}$ , which takes $ϵ$ effort.

Putting it all together, we took an arbitrary point in the upper-completion of $M$ , and it only takes $ϵ$ or less effort to shift the $b$ a little bit and reshuffle the measures to get a point in the upper-completion of $M^{'}$ .

The argument works in reverse, just switch the labels, to establish that the two upper-completions are $ϵ$ apart or less.

For the next one, we have two ways of expressing uniform Hausdorff-continuity for a belief function. As a recap, $M^{a} (\infty)$ is the set of a-measures over all outcomes (regardless of whether or not they could have come from a single policy), and all belief functions have a critical $λ^{⊙} + b^{⊙}$ parameter that controls the $λ$ and $b$ values of the set of minimal points regardless of $π_{p a}$ . ${\leq ⊙}$ is the set $\{ (λ μ, b) \mid λ + b \leq λ^{⊙} + b^{⊙} \}$ . They are:

1: For all nonzero $ϵ$ , there exists a nonzero $δ$ where $d (π_{p a}, π_{p a}^{'}) < δ$ implies $(p r_{*}^{\infty, π_{p a}})^{- 1} (Θ (π_{p a}) \cap N F \cap {\leq ⊙})$ has a Hausdorff-distance of $ϵ$ or less from the corresponding set for $π_{p a}^{'}$ .

2: For all nonzero $ϵ$ , there exists a nonzero $δ$ where $d (π_{p a}, π_{p a}^{'}) < δ$ implies: If $(λ μ, b) \in (p r_{*}^{\infty, π_{p a}})^{- 1} (Θ (π_{p a}) \cap N F)$ , then $(λ μ, b)$ has a distance of $ϵ (λ + 1)$ or less from the set $(p r_{*}^{\infty, π_{p a}^{'}})^{- 1} (Θ (π_{p a}^{'}) \cap N F)$ (and symmetrically for the other set)

Lemma 15: The two ways of expressing the Hausdorff-continuity requirement are equivalent for a belief function $Θ$ or $Θ^{ω}$ obeying nirvana-free nonemptiness, closure, nirvana-free upper-completion, and bounded-minimals.

Proof sketch: We start with the second $λ$ -dependent distance condition and derive the first. Roughly, that ${\leq ⊙}$ restriction means the tail where the $λ$ values are high gets clipped so the two sets are within a constant of $ϵ$ away from each other. In the other direction... Well, we start with a point $M$ in one preimage and do a bunch of projecting points down and finding minimals and taking preimages and using earlier lemmas and our first distance condition, and eventually end up with a fancy diagram, and finish up with an argument that two points are close to each other, so $M$ and one other point are "similarly close". This isn't good exposition, but I've got diagrams to keep a mental picture of the dozen different points and how they relate to each other in working memory.

Folk Result (from Vanessa): if two measures $m$ and $m^{'}$ are $ϵ$ -distance apart in the KR metric, then if you extend $m$ in some way, and extend $m^{'}$ with the same conditional probabilities, then the two resulting measures remain $ϵ$ -apart. We'll be using this in both directions.

Proof direction 1: Ok, we'll show the second way implies the first way, first. Fix some $ϵ$ , and let the $δ$ (distance between two partial policies) be low enough to guarantee that the distance parameter between the two preimages (according to definition 2, which has the $λ$ -dependent distance guarantee) is $\frac{ϵ}{2 (1 + λ^{⊙})} (1 + λ)$ . We can always do this. $λ^{⊙}$ is fixed by our belief function.

Keep this image in mind while reading the following arguments. The upper left set is the preimage of $Θ (π_{p a}) \cap N F$ , the upper right set is the preimage of $Θ (π_{p a}^{'}) \cap N F$ , and the bottom right set is $Θ (π_{p a}^{'}) \cap N F$ itself.

Now, any $M$ in the preimage of $Θ (π_{p a}) \cap N F \cap {\leq ⊙}$ has a $λ^{⊙}$ upper bound on its $λ$ value because projection preserves $λ$ . By the $λ$ -dependent distance condition, it's within $\frac{ϵ}{2 (1 + λ^{⊙})} (1 + λ^{⊙})$ of the preimage of $Θ (π_{p a}^{'}) \cap N F$ , so we can hop over that far and get a point $M^{'}$ .

Admittedly, moving over to the nearby point $M^{'}$ may involve violating the $λ^{⊙}$ bound by $\frac{ϵ}{2 (1 + λ^{⊙})} (1 + λ^{⊙})$ or less, but if that happens, we can project down our $M^{'}$ point to $Θ (π_{p a}^{'}) \cap N F$ making $M^{^{'} l o}$ , find a point $M^{^{'} min}$ (nirvana-free) in $Θ (π_{p a}^{'}) \cap N F \cap {\leq ⊙}$ (bounded-minimals) below $M^{^{'} l o}$ where $M^{^{'} min} + M^{*} = M^{^{'} l o}$ by upper-completion for $Θ (π_{p a}^{'})$ , and then consider the reexpression of $M^{^{'} l o}$ as $M^{^{'} min} + (m^{* -}, - m^{* -} (1)) + (m^{* +}, b^{*} + m^{* -} (1))$ .

The sum of the first two terms is a nirvana-free a-measure (because $M^{^{'} l o}$ is an a-measure, adding on the negative component does nothing) that lies below $M^{^{'} l o}$ and respects the $λ^{⊙} + b^{⊙}$ bound (exactly as much as it adds to $b$ , it takes away from the measure). Then you can add in most of the third term (going just up to the bound) to get a point $M^{^{''} l o}$ only $\frac{ϵ}{2 (1 + λ^{⊙})} (1 + λ^{⊙})$ (at most) away from $M^{^{'} l o}$ , which respects the bound (so it lies in $Θ (π_{p a}^{'}) \cap N F \cap {\leq ⊙}$ ).

Now, you can complete $M^{^{''} l o}$ with the conditional probabilities of the measure part of $M^{'}$ , to make a point $M^{''}$ in the preimage of $Θ (π_{p a}^{'}) \cap N F \cap {\leq ⊙}$ that's $\frac{ϵ}{2 (1 + λ^{⊙})} (1 + λ^{⊙})$ or less distance away from $M^{'}$

Going from $M$ to $M^{'}$ , and $M^{'}$ to $M^{''}$ is $\frac{ϵ}{2 (1 + λ^{⊙})} (1 + λ^{⊙})$ distance both times, so we found a $δ$ distance between any two partial policies $π_{p a}$ and $π_{p a}^{'}$ that ensures the preimages of $Θ (π_{p a}) \cap N F \cap {\leq ⊙}$ and $Θ (π_{p a}^{'}) \cap N F \cap {\leq ⊙}$ are only $ϵ$ apart.

Proof Direction 2: Keep tabs on the following diagram to see how the 12 different points in 5 different sets relate to each other.

Fix some $ϵ$ , and let $δ$ be less than $\frac{ϵ}{2}$ and small enough that the first condition guarantees a Hausdorff-distance of $\frac{ϵ}{2}$ or less between the clipped preimages. This $δ$ gives us an n via: $n = {log}_{γ} (δ)$ which is the first time the partial policies start disagreeing on what to do. The upper left and upper right sets are the preimages of $Θ (π_{p a}) \cap N F$ and $Θ (π_{p a}^{'}) \cap N F$ respectively, the middle-left and middle-right sets are the sets themselves, and the bottom set is: Take the inf of $π_{p a}$ and $π_{p a}^{'}$ , it's another partial policy that's fully defined before time n because those policies agree up till that time, and chop it off at time n, so it's a stub, call this stub $π_{p a}^{l o}$ . The bottom set is $M^{a} (F^{N F} (π_{p a}^{l o}))$ .

Now, follow the diagram in conjunction with the following proof. We start with an arbitrary $M$ in the preimage of $Θ (π_{p a}) \cap N F$ . We project it down to $Θ (π_{p a}) \cap N F$ to make $M^{m i d}$ . Due to bounded-minimals, we can find a minimal point below it, $M^{* m i d}$ which obeys the $λ^{⊙} + b^{⊙}$ bound. Now, we go in two directions. One is projecting $M^{m i d}$ and $M^{* m i d}$ down to make $M^{l o}$ and $M^{* l o}$ , the latter of which lies below $M^{l o}$ . Let's just keep those two in mind. In the other direction, since $M^{* m i d}$ obeys the $λ^{⊙} + b^{⊙}$ bound and lies in $Θ (π_{p a}) \cap N F \cap {\leq ⊙}$ , we can find a point $M^{*}$ in the preimage that obeys the $λ^{⊙} + b^{⊙}$ bound, and so, there's another point

$M^{^{'} *}$ in $(p r_{*}^{\infty, π_{p a}^{'}})^{- 1} (Θ (π_{p a}^{'}) \cap N F \cap {\leq ⊙})$

that's only $\frac{ϵ}{2}$ or less away, by our version of the Hausdorff condition that only works on the clipped version of the preimages. $M^{^{'} *}$ projects down to $Θ (π_{p a}^{'}) \cap N F$ to make $M^{^{'} * m i d}$ , and projects down further to make $M^{^{'} * l o}$ .

Now, projections preserve or contract distances, and $M^{* l o}$ is the projection of $M^{*}$ , and $M^{^{'} * l o}$ is the projection of $M^{^{'} *}$ , and $M^{*}$ and $M^{^{'} *}$ are only $\frac{ϵ}{2}$ apart, so $M^{* l o}$ and $M^{^{'} * l o}$ are only $\frac{ϵ}{2}$ apart, and $M^{l o}$ lies above $M^{* l o}$ . Now, we can invoke Lemma 14 to craft a $M^{^{'} l o}$ that's above $M^{^{'} * l o}$ and within $\frac{ϵ}{2}$ of $M^{l o}$ . Then, we can observe that $Θ (π_{p a}^{'}) \cap N F$ is nirvana-free and nirvana-free upper-complete. So, by Lemma 11, its projection down is nirvana-free and nirvana-free upper complete. $M^{^{'} * l o}$ is the projection down of $M^{^{'} * m i d}$ , and $M^{^{'} l o}$ is above $M^{^{'} * l o}$ , so $M^{^{'} l o}$ is in the projection of $Θ (π_{p a}^{'}) \cap N F$ and we can craft a point $M^{^{'} m i d} \in Θ (π_{p a}^{'}) \cap N F$ that projects down accordingly. And then go a level up to the preimage of $Θ (π_{p a}^{'}) \cap N F$ , and make a preimage point $M^{'}$ by extending $m^{^{'} m i d}$ with the conditional probabilities of $m$ up till time n whenever you get a chance, and then doing whatever, that'll be our $M^{'}$ point of interest. The diagram sure came in handy, huh?

We still need to somehow argue that $M$ and $M^{'}$ are close to each other in a $λ$ (the $λ$ of $M$ ) dependent way. And the only tool we have is that $M^{l o}$ and $M^{^{'} l o}$ are within $\frac{ϵ}{2}$ of each other, and $M$ and $M^{'}$ project down onto them. So how do we do that? Well, notice that before time n, $m^{'}$ and $m$ are either: in a part of the action-observation tree where $π_{p a}^{l o}$ has opinions on, and they're $\frac{ϵ}{2}$ -apart there, or $m^{'}$ is copying the conditional probabilities of $m$ . So, if we were to chop $m$ and $m^{'}$ off at timestep n, the two measures would be within $\frac{ϵ}{2}$ of each other.

However, after timestep n, things go to hell, they both end up diverging and doing their own thing.

Now, we can give the following dirt-reshuffling procedure to turn $m$ into $m^{'}$ . You've got piles of dirt on each history, corresponding to the measure component of $M$ . You can "coarse-grain" and imagine all your distinct and meticulous, but close-together, piles of dirt on histories with a prefix of $h$ , where $| h | = n$ , as just one big pile on $h$ . So, you follow the optimal dirt-reshuffling procedure for turning $m$ (clipped off at length n) into $m^{'}$ (clipped off at length n), which takes $\frac{ϵ}{2}$ effort or less. Then, we un-coarse-grain and go "oh damn, we've gotta sort out all our little close-together-piles now to make $m^{'}$ exactly! We're not done yet!"

But we've got something special. When we're sorting out all our little close-together-piles... said piles are the extensions of a finite history with length n. All those extensions will agree for the first n timesteps. And the distance between histories is $γ^{n}$ where n is the first timestep they disagree, right? And further, n was ${log}_{γ} (δ)$ , so whenever we move a bit of dirt somewhere else to rearrange all our close-together-piles, we're only moving it $δ$ distance! So, in the worst case of doing a complete rearrangement, we've gotta move our whole quantity of dirt $δ$ distance, at a cost of $δ λ^{'}$ effort (total amount of measure for $m^{'}$ )
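The "everything past the shared prefix is within $δ$ " step can be illustrated with a short Python sketch. It's a toy under stated assumptions, not part of the proof: histories are equal-length tuples, the time of first difference is 0-indexed, and `GAMMA`, the prefix, the tails, and `lam_prime` are all made-up values.

```python
from itertools import combinations

GAMMA = 0.5  # assumed discount; any value in (0, 1) would do


def history_distance(h1, h2, gamma=GAMMA):
    """gamma ** t, where t is the first (0-indexed) position at which two
    equal-length histories differ; 0.0 if they never differ."""
    for t, (a, b) in enumerate(zip(h1, h2)):
        if a != b:
            return gamma ** t
    return 0.0


n = 3
prefix = ("a0", "o1", "a1")  # hypothetical history h with |h| = n
delta = GAMMA ** n           # so that n = log_gamma(delta)

# Three hypothetical extensions of h; they all agree on the first n symbols.
tails = [("o0", "a0"), ("o0", "a1"), ("o1", "a0")]
extensions = [prefix + tail for tail in tails]

worst = max(history_distance(x, y) for x, y in combinations(extensions, 2))

lam_prime = 2.5  # hypothetical total mass of m'
second_phase_cost_bound = delta * lam_prime
```

No pair of extensions is further apart than `delta`, so shuffling all `lam_prime` units of dirt among them costs at most `delta * lam_prime`.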

Let's try to bound this, shall we? Our first phase of dirt rearrangement (and adjusting the $b$ values) took $\frac{ϵ}{2}$ effort or less, our second phase took $δ λ^{'}$ effort or less. Now, we can observe two crucial facts. The first is, at the outset, we insisted that $δ$ was $< \frac{ϵ}{2}$ . Our second crucial fact is that $λ^{'}$ and $λ$ can't be more than $\frac{ϵ}{2}$ apart, because projection preserves $λ$ values, and $M$ and $M^{'}$ project down to $M^{l o}$ and $M^{^{'} l o}$ respectively, which are $\frac{ϵ}{2}$ or less apart. So, the total amount of measure they have can't differ by more than $\frac{ϵ}{2}$ . This lets us get:

$d (M, M^{'}) \leq \frac{ϵ}{2} + δ λ^{'} \leq \frac{ϵ}{2} + δ (\frac{ϵ}{2} + λ) < \frac{ϵ}{2} + \frac{ϵ}{2} (\frac{ϵ}{2} + λ) < \frac{ϵ}{2} + \frac{ϵ}{2} (1 + 2 λ)$

$= ϵ + ϵ λ = ϵ (1 + λ)$
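For readers who want to check the chain numerically, here's a minimal Python sketch that evaluates each term for sample values. The inputs are arbitrary illustrations, and $λ^{'}$ is set to its worst case $λ + \frac{ϵ}{2}$ . Note that the last inequality in the chain needs $\frac{ϵ}{2} \leq 1 + λ$ , which holds for instance whenever $ϵ \leq 2$ ; the sketch asserts that explicitly.

```python
import math


def distance_chain(eps, delta, lam):
    """Successive terms of the bound on d(M, M'), left to right."""
    assert 0 < delta < eps / 2   # the requirement placed on delta at the outset
    assert eps / 2 <= 1 + lam    # what the last inequality in the chain uses
    lam_prime = lam + eps / 2    # worst case: lambda' exceeds lambda by eps/2
    return [
        eps / 2 + delta * lam_prime,
        eps / 2 + delta * (eps / 2 + lam),
        eps / 2 + (eps / 2) * (eps / 2 + lam),
        eps / 2 + (eps / 2) * (1 + 2 * lam),
        eps * (1 + lam),
    ]


# Arbitrary sample values, chosen only for illustration.
terms = distance_chain(eps=0.5, delta=0.2, lam=3.0)
```

Each entry of `terms` is no larger than the next, and the last two agree, matching $\frac{ϵ}{2} + \frac{ϵ}{2} (1 + 2 λ) = ϵ (1 + λ)$ .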

And so, given any $ϵ$ , there's a $δ$ where if $d (π_{p a}, π_{p a}^{'}) < δ$ , then for any point $M$ in the preimage of $Θ (π_{p a}) \cap N F$ , there's a point $M^{'}$ in the preimage of $Θ (π_{p a}^{'}) \cap N F$ s.t. $d (M, M^{'}) < ϵ (1 + λ)$ , deriving our second formulation of Hausdorff-continuity from our first one. And we're done! Fortunately, the next one is easier.

Lemma 16: If $M_{n}$ limits to $M$ , and $M_{n}^{l o}$ are all below their corresponding $M_{n}$ and obey a $λ^{⊙} + b^{⊙}$ bound, then all limit points of $M_{n}^{l o}$ lie below $M$ . This works for a-surmeasures too.

Proof sketch: We've got a $λ^{⊙} + b^{⊙}$ bound, so we can use the Compactness Lemma or Lemma 8 to get a convergent subsequence. Now, this is a special proof because we don't have to be as strict as we usually are about working only with a-measures and sa-measures only showing up as intermediate steps. What we do is take a limit point of the low sequence, and add some sa-measure to it that makes the resulting sa-measure close to $M$ , so $M$ is close to the upper completion of our limit point. We can make it arbitrarily close, and the upper completion of a single point is closed, so $M$ actually does lie above our limit point and we're done. To do our distance fiddling argument in the full generality that works for sur-stuff, we do need to take a detour and show that for surmeasures, $d_{s} (x + y, z + y) \leq d_{s} (x, z)$ .

Proof: The $M_{n}^{l o}$ obey the $λ^{⊙} + b^{⊙}$ bound, so convergent subsequences exist by the compactness lemma or Lemma 8. Pick out a convergent subsequence to work in, giving you a limit point $M^{l o}$ . All the $M_{n}$ can be written as $M_{n}^{l o} + M_{n}^{*}$ .

We'll take a brief detour, and observe that if we're just dealing with sa-measures, then, since we're in a Banach space, $d (x + y, z + y) = d (x, z)$ . But what about the surmetric? Well, the surmetric is the max of the usual metric and $γ$ raised to the power of "first time the measure components start disagreeing on what nirvana events are possible or impossible". Since sa-measures and sa-surmeasures can't assign negative probability to Nirvana, adding an sa-surmeasure adds more nirvana spots into both surmeasure components! In particular, they won't disagree more, and may disagree less, since adding that sa-surmeasure in may stick nirvana on a spot that they disagree on, so now they both agree that Nirvana happens there. So, since the standard distance component stays the same and the nirvana-sensitive component says they stayed the same or got closer, $d_{s} (x + y, z + y) \leq d_{s} (x, z)$ . We'll be using this.

Let n be large enough that $d (M_{n}, M) < ϵ$ and $d (M_{n}^{l o}, M^{l o}) < ϵ$ (same for the surmetric). Now, consider the point $M^{l o} + M_{n}^{*}$ . It is an sa-measure or sa-surmeasure that lies above $M^{l o}$ , and we'll show that it's close to $M$ . Whether we're working with the sa-measures or sa-surmeasures,

$d (M^{l o} + M_{n}^{*}, M) \leq d (M^{l o} + M_{n}^{*}, M_{n}) + d (M_{n}, M) < d (M^{l o} + M_{n}^{*}, M_{n}^{l o} + M_{n}^{*}) + ϵ$

$\leq d (M^{l o}, M_{n}^{l o}) + ϵ < 2 ϵ$

So, $M$ is $< 2 ϵ$ distance from the upper completion of $M^{l o}$ in the space of sa-measures/sa-surmeasures, for all $ϵ$ . Said upper completion is the sum of a closed set (cone of sa-measures/sa-surmeasures) and a compact set (a single point) so it's closed, so $M$ (an a-measure/a-surmeasure) lies above $M^{l o}$ (an a-measure/a-surmeasure that was an arbitrary limit point of the $M_{n}^{l o}$ ) and we're done.

The next three, Lemmas 17, 18, and 19, are used to set up the critical Lemma 20 which we use a lot.

Lemma 17: The function $Π \to M^{a} (\infty)$ of the form $π \mapsto Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ has closed graph assuming Hausdorff-continuity for $Θ^{ω}$ on policies, and that $Θ^{ω} (π)$ is closed for all $π$ . Also works for a $Θ$ that fulfills the stated properties.

Let $π_{n}$ limit to $π$ , and let $M_{n} \in Θ^{ω} (π_{n}) \cap N F \cap {\leq ⊙}$ limit to $M$ . We'll show that $M \in Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ (the definition of closed graph). Take some really big n that guarantees that $d (π_{n}, π) < δ$ and $d (M_{n}, M) < ϵ$ . Then we go:

The distance from $M$ to $M_{n}$ is $ϵ$ or less, and since $M_{n} \in Θ^{ω} (π_{n})$ , we can invoke uniform Hausdorff continuity and conclude $M_{n}$ is only $ϵ$ or less away from a point in $Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ . So, the distance from $M$ to $Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ is $\leq 2 ϵ$ . This argument works for all $ϵ$ , so it's at distance 0 from $Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ , and that set is closed because it's an intersection of closed sets, so $M \in Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ , and we have closed-graph.

Lemma 18: $⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F \cap {\leq ⊙})$ is compact as long as $Θ^{ω} (π)$ is closed for all $π$ and $Θ^{ω}$ fulfills the Hausdorff-continuity property on policies. Also works for a $Θ$ that fulfills the stated properties.

The set of $π \geq π_{s t}$ is closed in the topology on $Π$ , because a limit of policies above $π_{s t}$ will still be above $π_{s t}$ . More specifically, because it's a closed subset of a compact space, it's compact. Also, remembering that projection preserves $λ$ and $b$ , we can consider the set ${\leq ⊙}$ (for $M^{a} (\infty)$ ) which is compact.

Take the product of those two compact sets to get a compact set in $Π \times M^{a} (\infty)$ , intersect it with the graph of our function mapping $π$ to $Θ^{ω} (π) \cap N F \cap {\leq ⊙}$ (which is closed by Lemma 17), we get a compact set, project it down to the $M^{a} (\infty)$ coordinate (still compact, projection to a coordinate is continuous), and everything in that will be safe to project down to $M^{a} (F^{N F} (π_{s t}))$ , getting you exactly the set $⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F \cap {\leq ⊙})$ which is compact because it's the image of a compact set through a continuous function.

Lemma 19:

$\overline{c . h} (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F)) = {(c . h (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F \cap {\leq ⊙})))}^{u c}$

Where the upper completion is with respect to the cone of nirvana-free sa-measures and then we intersect with the set of nirvana-free a-measures, and $Θ^{ω} (π)$ is closed and nirvana-free upper-complete for all $π$ and $Θ^{ω}$ fulfills the Hausdorff-continuity property on policies and the bounded-minimals property. Also works for a $Θ$ that fulfills the stated properties.

One direction of this,

$\overline{c . h} (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F)) \supseteq {(c . h (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F \cap {\leq ⊙})))}^{u c}$

is pretty easy. Everything in the convex hull of the clipped projections lies in the closed convex hull of the full projections, and then, from lemmas 11, 12, and 13, the closed convex hull of these projections is nirvana-free upper complete since $Θ^{ω} (π)$ is for all $π$ , so that gets the points added by upper completion as well, establishing one subset direction.

Now for the other direction,

$\overline{c . h} (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F)) \subseteq {(c . h (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F \cap {\leq ⊙})))}^{u c}$

Let $M$ lie in the closed convex hull. There's a sequence $M_{n}$ that limits to it, where all the $M_{n}$ are made by taking $M_{i, n}^{\infty}$ from above, projecting down and mixing. By bounded minimals, we can find some $M_{i, n}^{\infty, min} \in Θ^{ω} (π) \cap N F$ below the $M_{i, n}^{\infty}$ , and they're minimal points so they all obey the $λ^{⊙} + b^{⊙}$ bound. Now, project the $M_{i, n}^{\infty, min}$ down, and mix in the same way, to get an a-measure $M_{n}^{l o}$ below $M_{n}$ , which lies in the convex hull of clipped projections.

From Lemma 16, we can take a limit point of $M_{n}^{l o}$ to get a $M^{l o}$ below $M$ . Now, we just have to show that $M^{l o}$ lies in the convex hull set in order to get $M$ by upper completion. $M^{l o}$ is a limit of points from the convex hull set, so we just have to show that said convex hull set is closed. The thing we're taking the convex hull of is compact (Lemma 18), and in finite dimensions (because we're working in a stub), the convex hull of a compact set is compact. Thus, $M^{l o}$ lies in the convex hull, and is below $M$ , so $M$ lies in the upper completion of the convex hull and we're done.

Lemma 20: $c . h (⋃_{π \geq π_{s t}} p r_{*}^{π, π_{s t}} (Θ^{ω} (π) \cap N F))$ is closed, if $Θ^{ω} (π)$ is closed and nirvana-free upper-complete for all $π$ and $Θ^{ω}$ fulfills the Hausdorff-continuity property on policies and the bounded-minimals property. Also works for a $Θ$ that fulfills the stated properties.

By Lemmas 11 and 12, said convex hull is nirvana-free upper-complete. Any point in the closure of the convex hull, by Lemma 19, lies above some finite mixture of nirvana-free stuff from above that respects the $λ^{⊙} + b^{⊙}$ bound, projected down. However, since the convex hull is upper-complete, our arbitrary point in the closure of the convex hull is captured by the convex hull alone.

Lemma 21: If $Θ$ is consistent and nirvana-free upper-complete for $Θ (π)$ , and obeys the extreme point condition, and obeys the Hausdorff-continuity condition on policies, then $Θ (π_{p a}) \cap N F = \overline{c . h} (⋃_{π \geq π_{p a}} p r_{*}^{π, π_{p a}} (Θ (π) \cap N F))$ and $Θ (π_{s t}) \cap N F = c . h (⋃_{π > π_{s t}} p r_{*}^{π, π_{s t}} (Θ (π) \cap N F))$ . This works in the sur-case too.

Proof sketch: One subset direction is pretty dang easy from Consistency. The other subset direction for stubs (that any nirvana-free point in $Θ (π_{s t})$ lies in the convex hull of projections from above) is done by taking your point $M$ of interest, finding a minimal point below it, using Lemma 3 to split your minimal points into finitely many minimal extreme points, and using the extreme point condition to view them as coming from policies above, so the minimal point has been captured by the convex hull, and then Lemmas 11 and 12 say that the convex hull of those projections is nirvana-free upper-complete, so our $M$ is captured by the convex hull.
Getting it for partial policies is significantly more complex. We take our $M$ and project it down into $Θ (π_{p a}^{n})$ for some very large n. Then, using our result for stubs, we can view our projected point $M_{n}$ as a mix of nirvana-free stuff from policies above $π_{p a}^{n}$ . If n is large enough, $π_{p a}^{n}$ is very close to $π_{p a}$ itself, so we can perturb our points at the infinite level a little bit to be associated with policies above $π_{p a}$ with Hausdorff-Continuity, and then we can project down and mix, and show that this point (in the convex hull of projections of nirvana-free stuff from above) is close to $M$ itself, getting a sequence that limits to $M$ , witnessing that it's in the closed convex hull of projections of nirvana-free stuff from above. It's advised to diagram the partial policy argument, it's rather complicated.

Ok, so one direction is easy, $Θ (π_{p a}) \cap N F \supseteq \overline{c . h} (⋃_{π \geq π_{p a}} p r_{*}^{π, π_{p a}} (Θ (π) \cap N F))$ (and likewise for stubs). Consistency implies that $Θ (π_{p a})$ (or $Θ (π_{s t})$ ) is the closed convex hull of projections down from above, so the closed (or vanilla) convex hulls of the projections of nirvana-free stuff from above are a subset of the nirvana-free part of $Θ (π_{p a})$ (or $Θ (π_{s t})$ ).

For the other direction... we'll show the stub form, that's easier, and build on that. We're shooting for $Θ (π_{s t}) \cap N F \subseteq c . h (⋃_{π > π_{s t}} p r_{*}^{π, π_{s t}} (Θ (π) \cap N F))$

Fix some $M \in Θ (π_{s t}) \cap N F$ . Find a minimal point $M^{min}$ below it, which must be nirvana-free, because you can't make Nirvana vanish by adding sa-measures. Invoke Lemma 3 to write $M^{min}$ as a finite mixture of minimal extreme points in the nirvana-free part of $Θ (π_{s t})$ . These must be minimal and extreme and nirvana-free in $Θ (π_{s t})$ , because you can't mix nirvana-containing points and get a nirvana-free point, nor can you add something to a nirvana-containing point without getting a nirvana-containing point. By the extreme point condition, there are nirvana-free points from above that project down to those extreme points. Mixing them back together witnesses that $M^{min}$ lies in the convex hull of projections of nirvana-free stuff from above. $M$ is nirvana-free and lies above $M^{min}$ , so it's captured by the convex hull (with Lemmas 11 and 12).

Now for the other direction with partial policies, that

$Θ (π_{p a}) \cap N F \subseteq \overline{c . h} (⋃_{π \geq π_{p a}} p r_{*}^{π, π_{p a}} (Θ (π) \cap N F))$

Fix some $M \in Θ (π_{p a}) \cap N F$ . We can project $M$ down to all the $π_{p a}^{n}$ to get $M_{n}$ which are nirvana-free and in $Θ (π_{p a}^{n})$ by Consistency.

Our task is to express $M$ as a limit of some sequence of points that are finite mixtures of nirvana-free points projected down from policies above $π_{p a}$ . Also, remember the link between "time of first difference n" and the $δ$ distance between two partial policies. $δ_{n} = γ^{n}$ where $γ < 1$ . Each $δ_{n}$ induces an $ϵ_{n}$ number for the purposes of Hausdorff-continuity.

First, $π_{p a}^{n}$ is a stub, which, as we have already established has its nirvana-free part equal the convex hull of projections of nirvana-free stuff down from above. So, $M_{n}$ is made by taking finitely many $M_{i, n}^{\infty} \in Θ (π_{i}) \cap N F$ where $π_{i} \geq π_{p a}^{n}$ , projecting down, and mixing. By linearity of projection, we can mix the $M_{i, n}^{\infty}$ before projecting down and hit the same point, call this mix $M_{n}^{\infty}$ .

Since the distance between $π_{p a}^{n}$ and $π_{p a}$ is $δ_{n}$ or less, each policy $π_{i}$ has another policy within $δ_{n}$ that's above $π_{p a}$ , and by uniform Hausdorff-continuity (the variant from Lemma 15) we only have to perturb the $M_{i, n}^{\infty}$ by $ϵ_{n} (1 + λ_{i, n})$ to get $M_{i, n}^{^{'} \infty}$ in $Θ (π_{i}^{'}) \cap N F$ where $π_{i}^{'} \geq π_{p a}$ for all i.

Mixing these in the same proportion makes a $M_{n}^{^{'} \infty}$ within $ϵ_{n} (1 + λ)$ of $M_{n}^{\infty}$ , which projects down to $Θ (π_{p a}) \cap N F$ (because mix-then-project is the same as projecting the $M_{i, n}^{^{'} \infty}$ then mixing, and the convex hull of projections of stuff from above is a subset of $Θ (π_{p a})$ by consistency). The projection of $M_{n}^{^{'} \infty}$ we'll call $M_{n}^{'}$ . It lies in the convex hull of the projections of nirvana-free stuff from above.

Now, to finish up, we just need to show that $M_{n}^{'}$ limit to $M$ , witnessing that $M$ is in the closed convex hull of projections of nirvana-free stuff from above. Since $M_{n}^{'}$ is the projection of $M_{n}^{^{'} \infty}$ , which is $ϵ_{n} (1 + λ)$ away from $M_{n}^{\infty}$ , and projection doesn't increase distance, and the projection of $M_{n}^{\infty}$ is $M_{n}$ , we can go

$d (p r_{*}^{π_{p a}, π_{p a}^{n}} (M_{n}^{'}), M_{n}) = d (p r_{*}^{\infty, π_{p a}^{n}} (M_{n}^{^{'} \infty}), p r_{*}^{\infty, π_{p a}^{n}} (M_{n}^{\infty})) \leq d (M_{n}^{^{'} \infty}, M_{n}^{\infty}) < ϵ_{n} (1 + λ)$

So, we can conclude that, restricting to before time n, the measure components of $M_{n}^{'}$ and $M$ are fairly similar ( $ϵ_{n} (1 + λ)$ distance), and so are the $b$ components. Then some stuff happens after time n. Because our distance evaluation is done with Lipschitz functions, they really don't care that much what happens at late times. So, in the $n \to \infty$ limit, the difference between the $b$ terms vanishes, and the measure components agree increasingly well (limiting to perfect agreement for times before n), and then some other stuff happens, and since the other stuff happens at increasingly late times (n is diverging), the measure components converge.

So, we just built $M$ as a limit of points from the convex hull of the projections of nirvana-free parts down from above, and we're done.

Alright, we're back. We've finally accumulated a large enough lemma toolkit to attack our major theorem, the Isomorphism theorem. Time for the next post! [AF · GW]
