Infra-Bayesian Logic
post by harfe, Yegreg · 2023-07-05T19:16:41.811Z · LW · GW · 2 commentsContents
Introduction Outline and Reading Guide Notation Infra-Bayesian Logic Less Formal Description of Infra-Bayesian Logic Formal Description of Infra-Bayesian Logic Syntax Semantics Constructions in IBL Pushforward Pushforward via a kernel Pushforward via kernel for finite sets Equality of functions Mixing with probability 1/2 A constant function with output bottom A constant function with output top Components of infrakernels for sum types Infra-POMDPs Setup Process Laws Examples Application: an Infra-Bayesian AIXI Analogue Candidates for the Semantics Compact Polish Spaces with Continuous Maps Allowing Discontinuity Upper Semicontinuity Measurable Maps A Different Notion of HUCs Standard setting for the alternative HUC case Allow higher Lipschitz constants for maps Another metric on products Modifying the definition of alternative HUCs Appendix Issues with Continuity Continuity Counterexamples Semi-continuous Pullback is not Compositional Stricter Homogeneity Measurability and Continuity Proofs None 2 comments
This research was produced during SERI MATS. Thanks to Vanessa Kosoy for providing ideas and reading drafts.
Introduction
In this article, we investigate "infra-Bayesian logic" (IBL for short), which is an idea introduced by Vanessa Kosoy. The original idea is outlined in Vanessa Kosoy's Shortform [AF(p) · GW(p)]. Here, we expand with some details and examples. We also try to fix the mentioned issues (e.g. lack of continuity for certain maps) by investigating several candidate categories for semantics. However, we fail to find something that works.
As a theory of beliefs and reasoning, infra-Bayesianism gives rise to semantics for a certain logic, which we might call "infra-Bayesian logic" (IBL for short). Formalizing the syntax for this logic has the advantage of providing a notion of description complexity for infra-POMDPs (partially observable Markov decision processes) expressed via this syntax, and hence could serve as a foundation for an infra-Bayesian analogue of AIXI. Infra-Bayesian logic is not the only way to define an infra-Bayesian analog of AIXI. Such an alternative could be to use oracle machines (with an extra input tape for a source of randomness): For a probabilistic oracle machine, the corresponding IB hypothesis is the closed convex hull of all stochastic environments that result from using the oracle machine with a particular oracle, and then one could use the description complexity of the oracle machine (which can be defined by choosing a universal oracle machine) as a basis for an infra-Bayesian AIXI. The analogue of oracle machines for (non-infra-Bayesian) AIXI would be (probabilistic) Turing machines.
However, infra-Bayesian logic might be an interesting alternative to describing infra-Bayesian hypotheses, and this approach would be specific to infra-Bayesianism (there is, as far as we know, no good analogue of infra-Bayesian logic in (ordinary) Bayesianism).
Note that we are currently unaware of a working semantics for the higher order logic. We discuss some candidates considered in Candidates for the Semantics, and the ways in which they fail. It's unclear whether these issues with the semantics are mostly technical in nature, or whether they point to some deeper phenomena. We can, however, restrict to a first order IBL on finite sets, which, albeit less powerful, is still expressive enough for many applications (in particular covers all the examples in Examples).
See also the paragraph on IBL in the overview of the learning theoretic Agenda [LW · GW].
Outline and Reading Guide
You can skim/skip the Notation section and use it as a reference instead. The main description of IBL is given in Infra-Bayesian Logic. To get an initial and informal idea what IBL looks like, read Less Formal Description of Infra-Bayesian Logic. For a technical description and a complete list of properties, read Formal Description of Infra-Bayesian Logic. Note that the latter description requires some familiarity with category theory.
In Constructions in IBL we list how we can combine basic elements of IBL to express more complicated things.
In Infra-POMDPs we talk about infra-POMDPs as an application of IBL. Some knowledge of infra-Bayesianism is beneficial here, but not required. We highlight the section on Examples, where we describe the IBL terms for concrete Infra-POMDPs.
In Candidates for the Semantics we discuss the various (failed) candidates we tried for the higher-order logic version of IBL. A typical reader is not expected to read this section.
The Appendix is for mostly technical proofs and counterexamples. We recommend to only look these up when you are interested in the technical details. For example, if you wish to understand why conjunction is not continuous in IBL, you can find the details there.
Notation
Let us describe notation needed for using concepts from infra-Bayesianism.
This notation section is partially copied from the Infra-Bayesian Physicalism article [AF(p) · GW(p)], but extended from finite sets to compact Polish spaces, wich requires a bit more measure theory. Also, we use the symbol for homogeneus ultracontributions (instead of as in the IBP article).
The notation from the IBP post is a bit different from some of the other posts on infra-Bayesianism in that it uses ultradistributions/ultracontributions rather than infradistributions.
We denote . We will work a lot with compact Polish spaces.
Compact Polish spaces are spaces that are topologically equivalent to some compact metric space (the metric is not fixed, however, only the topology). An important special case of compact Polish spaces are finite (discrete) sets.
Definition 1. Given a compact Polish space , a contribution on is a (Borel) measure on with . Given a measurable function and , we denote . The space of contributions is denoted , equipped with the weak- topology (i.e. the weakest topology such that the functions are continuous for all continuous functions ).
Naturally, any (probability) distribution is in particular a contribution, so . There is a natural partial order on contributions: when for all measurable .
Definition 2. A homogeneous ultracontribution (HUC) on is non-empty closed convex which is downward closed w.r.t. the partial order on . The space of HUCs on is denoted .
If a HUC has the property s.t. , we call it a homogeneuous ultradistribution.
Sometimes we need a topology on the space of HUCs . In this case we will use the topology induced by the Hausdorff metric, (where we use a metric on that fits with the weak- topology. The topology induced by the Hausdorff metric is also equivalent to the so-called Vietoris topology, which is a topology that only depends on the underlying toplogy on , see here. See also Propositions 42 and 50 in Less Basic Inframeasure Theorey [? · GW] for the same result for infra-distributioins). One can show that is a compact Polish space with this topology when is a compact Polish space, see Lemma 10. Let us introduce more notation.
For a set , denotes its closure.
Given another compact Polish space and a measurable function , is the pushforward by :
where for all measurable . We can omit the in the definition if is continuous.
A map is sometimes referred to as an infrakernel.
Given an infrakernel , is the pushforward by :
where (for measurable )
For a measurable function We also define the pullback operator via
We can omit here if is continuous.
is the projection mapping, is the inclusion map.
Given and an infrakernel , is the semidirect product:
where the measure is defined by
for all measurable , , which can be extended to arbitrary measurable sets in the usual measure-theoretic way.
We will also use the notation for the same HUC with and flipped. And, for , is the semidirect product of with the constant ultrakernel whose value is .
For a closed set , we define (we'll call these sharp ultradistributions). We also define via .
Infra-Bayesian Logic
The formal description in this section assumes some familiarity with type theory and categorical logic. For an informal overview of the syntax, we point the reader to Less Formal Description of Infra-Bayesian Logic.
The infra-Bayesian notion of beliefs gives rise to a fairly weak form of logic, in the sense that the spaces of predicates in infra-Bayesian logic don't form Heyting algebras, and are generally non-distributive and even non-modular as lattices. Also, due to the lack of function types, we cannot use the full syntax of type theory.
We will therefore work with a simplified syntax, still maintaining versions of many of the usual constructions appearing in higher order logic. This language can be used to describe infra-POMDPs (see Infra-POMDPs), which in turn is a useful beginning to something like an infra-AIXI (see Application: an Infra-Bayesian AIXI Analogue).
Less Formal Description of Infra-Bayesian Logic
Infra-Bayesian logic is a logical language. The language has types. Types can come from a set of initial types , or can be combined from other types using sums and products. Also, and are special types (where contains no element, and contains exactly one element, so to speak). There is also a more complicated construction of additional types: If is a type, then we also consider predicates on a type, which we denote by .
Infra-Bayesian logic also has maps between types. The set of maps between types and is denoted by . We will not list every kind of map that we require for infra-Bayesian logic in this informal description. But some examples are and , which correspond to a meaning of "and" and "or".
A map of type is basically the same as something of type , but it allows us to compose this with other maps. Examples are and which correspond to "true" and "false". There are also maps for equality and predicate evaluation. All kinds of different maps can be combined and composed to yield expressions in infra-Bayesian logic.
How does this connect to infra-Bayesianism? A collection of sentences in infra-Bayesian logic can have a model . There are lots of rules for a model to ensure that the symbols of infra-bayesian logic correspond to their intended meaning. A model maps types into topological spaces. In order to make sure the above-mentioned types behave reasonably under the model, we require that is the empty topological space, is the -point topological space, products and sums work as expected ( and ). A special case is the topological space for (the type of predicates on ): We require that is the topological space of homogeneous ultra-contributions over , i.e. . We will also explore alternatives to HUCs over topological spaces, but it should be some general notion of infra-distribution. The model of a map is a map between the topological spaces . We will have conditions that the model of a map is consistent with the meaning. For example, we require that the model of "" maps two HUCs to their intersection.
We will later investigate the precise conditions on toplogical spaces and maps that we would like to have for infra-Bayesian logic.
Formal Description of Infra-Bayesian Logic
Syntax
Given a set of primitive types, we can require that the types form a category , which should be the "freest" category such that:
has finite products and coproducts
supports infra-Bayesian logic (Definition 3).
We won't construct the category here (uniqueness is clear). We'll use the shorthand for the set of morphisms, and write (where is the terminal object).
Definition 3. We say that a category with finite products and coproducts supports infra-Bayesian logic, if it satisfies the following requirements:
There is a functor (recall that is the opposite category of ). For , is the object intended to correspond to predicates on . For , we write instead of to denote pullback of predicates.
There are morphisms . We require that these operations turn into a bounded lattice, and we require the functor to induce a functor into the category of lattices. In particular, we have .
We require that have natural left and right adjoints with respect to the poset structure (of the lattice) above. We also require these adjoints to come from morphisms in , denoted by respectively.
We require the existence of an equality predicate , which is the minimal predicate such that (cf. A constant function with output top).
We require the existence of (predicate evaluation). Given , we can pull back via to get . Note however that we cannot require universality here (i.e. that every predicate in arise this way), due to the failure of the disintegration theorem in infra-Bayesianism. It's not entirely clear currently what the appropriate condition to require from should be on the syntax side, even though the intended semantics can be described (cf. Item 5 in the semantics).
For we use as a syntactic constant symbol corresponding to a coin toss with probability .
Note that in Item 6 we could require the following more general set of symbols, however, the relevant ones can likely be approximated via the existing syntax. For , let (treating as an -point discrete space) be the subset of "describable" ultracontributions under some such notion. We would then introduce constant symbols for each .
Remark 4. In order to construct the syntax for the first order theory, we can instead consider two categories (base types) and (predicate types), with the functor . In this context the predicate functor cannot be iterated, hence we are left with a first order theory. The requirements in Definition 3 remain the same, except for Item 5 being removed. The construction of a pushforward via a kernel in Pushforward via a kernel no longer makes sense in that generality, but we explain how to recover it for finite sets in the first order theory in Pushforward via kernel for finite sets, which is then used when unrolling the time evolution of an infra-POMDP in Process.
Semantics
We require a model to be a functor , which preserves finite products and coproducts.
Moreover, we require to support infra-Bayesian logic (cf. Definition 3), and the model functor to respect the logical structure.
In practice, we will want to be some subcategory of . The motivating candidate is to take to be the category of compact Polish spaces with continuous maps, and the predicate functor on to be . This choice however doesn't work for higher order logic.
We nevertheless spell out what the logical semantics translate to, using as a generic symbol denoting some appropriate notion of "ultracontributions".
(cf. Lemma 10), and is the pullback (see Continuity Counterexamples about issues with continuity in the case of HUCs).
We require to induce a lattice homomorphism , where
is given by convex hull of the union
is given by intersection (see Continuity Counterexamples about issues with continuity in the case of HUCs)
Predicates. Let , , and be projection onto the second factor. The following follow from the adjunction
is the pushforward
For , we have
(see Continuity Counterexamples about issues with continuity in the case of HUCs)
(as a sharp ultradistribution)
If , then , where , and is considered as an infrakernel.
is the crisp ultradistribution corresponding to a coin toss with probability (here is the one point space).
Note in the more general setting of describable ultracontributions, we would require .
Definition 5. A subset of (i.e. a set of sentences) is called a theory. We say that models the theory if for all .
Remark 6. Finding an appropriate category turns out to be harder than first expected. We discuss various candidates considered in Candidates for the Semantics.
In general, we note that for infra-POMDPs with finite state spaces, the transition (infra-)kernels are always continuous, and some of the issues with continuity mentioned in Candidates for the Semantics are less of a problem.
Constructions in IBL
In the following we construct various useful components of IBL using the syntax outlined above. These constructions will also be used to build examples of infra-POMDPs in Infra-POMDPs.
Pushforward
Given , we can construct the pushforward as follows. First, consider the following two maps into :
(this represents the graph of )
.
Composing the product of these two with , we get , and finally post-composing with , we get .
Pushforward via a kernel
Given , we can construct the pushforward as well. This is done similarly to Pushforward, except the element in is replaced with
where is predicate evaluation.
Pushforward via kernel for finite sets
If is a finite set, then we can think of a map as a collection of points , which is meaningful in the first order syntax too, as long as the predicate category has a final object . In this case we can construct the pushforward along the kernel in the first order theory as follows. For each we have the composite
where is the inclusion of , and is the regular pushforward from Pushforward. Now, taking the product of these over , and taking (iterated) disjunction, we have
which is the "graph of ", using which we construct the pushforward analogously to Pushforward and Pushforward via a kernel.
Equality of functions
Given , we want to construct a sentence . This can be done as follows. First, take the composite with the diagonal on
then we have
so is the desired sentence.
Mixing with probability 1/2
Assume that we have two terms , . Then we have . Applying a pushforward yields . Then we can compose with , which assigns a fair coin distribution on .
A constant function with output bottom
We can construct a constant function term whose model is a constant function that maps everything to as follows. Let be the terminal map. Then we can define the term via
This does the right thing: By factoring through the function has to be constant, and by using , the output has to be .
A constant function with output top
This works the same way, just using the symbol instead of :
Components of infrakernels for sum types
For a function , how do we get access to its components, so that we get a term ? We do this by composing with and , i.e.
For the other components this would work essentially in the same way, which would give us functions , , .
Infra-POMDPs
Setup
We can describe infra-POMDPs in the language of infra-Bayesian logic in the following way. Infra-POMDPs consist of the following.
Finite sets of actions and observations
For each observation , a type of states producing the given observation. Let be the type of all states.
An initial state
For each , a transition kernel .
Process
Given a state at time , , and an action , we can apply the kernel pushforward , to end up in a new state . Once receiving an observation , we can do an update on the observed state. In general, this update depends on the loss function, but we can obtain a "naive" update via pull back and push forward through to get the observed state .
Laws
In infra-Bayesianism, laws (also known as belief functions in earlier posts on infra-Bayesianism) are functions , where is the space of possible (deterministic) policies of the IB agent (a technical note: the space is a Polish space, when equipped with the topology generated by cylinder sets). Such a law describes infra-Bayesian beliefs how a policy will lead to certain histories.
Under some conditions, one can convert an infra-POMDP into a corresponding law. We will not explain exactly how, and instead refer to Theorem 4 in "The many faces of infra-beliefs" [? · GW].
Examples
Let us give an example of an infra-POMDP. The first example is basically just and ordinary MDP, rewritten to fit into the above notation for infra-POMDPs.
Example 7. We set , . This example has (functionally) no relevant hidden states, i.e. states are equivalent to observations. Taking action will lead to the same observation as the previous observation. Taking action will lead to the other observation as the previous observation. As for the initial conditions, we start out with probability for each observation/corresponding state.
Let us explain how Example 7 can be described using an IBL theory . We have types and for the states corresponding to the two possible observations. First, we introduce non-logical symbols , . Then, using the construction in Mixing with probability 1/2, we can define a term in that assigns probability to each of , , as desired. Using function equality from Equality of functions, we can equate the initial state with this mixture. We add the equality sentence to our theory . This ensures that every model of the theory satisfies .
Next, we will add sentences that describe the transition kernels. Note that we have two transition kernels , one for each possible action. Then, for each transition kernel, we can decomposition it, like in Components of infrakernels for sum types. This gives us possibilities. For example, describes our beliefs that we will be in state after we take action and are in state . For all the possible combinations, the resulting function should be a constant function, with values or . Again, we can use the equality of functions as described in Equality of functions to ensure that these components of the transition kernels have the correct values, in accordance with the transition kernel as described in the example. We will add these sentences to the theory , so that now has 9 elements.
Example 8. The following is a "fully infra" POMDP.
- , single observation
- , two states
- , single action.
The credal set corresponding to
- is the interval
- is the interval .
We can take downward closure to end up with a homogeneous ultradistribution.
So a single step in the Markov process swaps and , but with some Knightean uncertainty of landing up to away from the target. As we iterate through multiple timesteps, we gain no information through observation (since ), so the intervals of Knightean uncertainty widen (the portion outside the Knightean interval shrinks as ).
We can construct the above infra-POMDP via IBL as follows. Let
be coin tosses with probability of heads equal to and respectively. We could in principle approximate these with iterated fair coin tosses, but we assume them as given for convenience. Let
be the two states. Then we have
and similarly for . We can construct the intervals of length from and :
Finally, we construct the transition kernel .
Application: an Infra-Bayesian AIXI Analogue
A potential application for infra-Bayesian logic is to describe the hypotheses of an AIXI-like infra-Bayesian agent, by using IBL to describe infra-POMDPs. Let us expand a bit on this idea.
Let's assume an infra-Bayesian agent knows that the environment is an infra-POMDP and also happens to know the finitely many actions the agent can take each time step, and the finitely many possible observations the agent could be receiving. However, it does not not know how many states there are. (We will leave rewards out of the picture here. We could include rewards by encoding it in the observations, e.g., by adding a reward for each observation. But for now let's focus only on the agent's model of the environment.)
An infra-Bayesian agent would have a set of hypotheses of how the environment could be. In our setting, each hypothesis has the form of an infra-POMDP. Like an AIXI agent, we would like to have a variant of Occam's razor: simpler hypotheses should receive a higher weight.
There is a potential approach to use a finite theory in IBL to describe a hypothesis for our infra-POMDP, and then the weight of this hypothesis in the prior is based on the length of the description of the IBL theory, e.g. proportional to when is the description length. Note that some IBL theories might have no valid model, so these theories should be discarded from the hypotheses; and there might be other reason why an IBL theory might be excluded from the space of hypotheses, see below.
Each theory in infra-Bayesian logic can have (possibly none or multiple) models. If we use types , and symbols , as in Setup, then each model describes an infra-POMDP: The initial state can be described by , and for each , is the transition kernel.
The question arises what do we do if there are multiple models for the same IBL term. Let us describe two approaches, which both rely on converting the infra-POMDP into a corresponding law, see Laws.
A first approach could be to only allow theories in IBL that produce models which are unique up to equivalence, where we say that two models are equivalent if they produce infra-POMDPs that produce the same laws.
A second approach could be to consider all possible models in a first step, then convert these models into corresponding laws, and then take the disjunction (i.e. closed convex hull of all outputs of the laws) to turn it into the final law.
Candidates for the Semantics
Compact Polish Spaces with Continuous Maps
A first candidate for the category would be compact Polish spaces with continuous maps, and '' being the space of HUCs as defined in Notation.
However, we run into issues with continuity in this category. Namely, the maps , , are not always continuous, see Continuity Counterexamples.
Allowing Discontinuity
Another approach could be to bite the bullet on discontinuity and broaden the set of allowed morphisms in the semantics category. The problem with these approaches generally is that pullbacks really only behave nicely with respect to continuous maps, we typically lose functoriality when allowing discontinuity. That is, the desired property can fail to hold for maps , . This is mostly because for discontinuous functions we have to take the closure in the definition of the pullback (see Notation).
Upper Semicontinuity
This approach is inspired by the fact that while intersection is not continuous on , it is upper semicontinuous (wrt to the set inclusion order) on .
For this approach, we would use the category of compact polish spaces with a closed partial order, and for the maps we require that they are monotone increasing and upper semicontinuous. Here, a function is upper semicontinuous when
holds for all sequences and points , . As for the partial order on , one could consider the partial order based on set inclusions, or additionally combine this with the stochastic order mentioned in Definition 2.1 in IBP [AF · GW] by requiring that elements in are also downwards closed with respect to the stochastic order.
In both cases, the approach with upper semicontinuous maps, runs into the functoriality issues mentioned above, see Semi-continuous Pullback is not Compositional.
Measurable Maps
We could try to broaden the set of morphisms even further, but naturally the issues with functoriality remain.
A Different Notion of HUCs
A possible remedy for the continuity issues could be to use an alternative definition of HUCs, in which the problematic maps become continuous. Roughly speaking, the high level idea here is to require elements of the modified HUCs to "widen" closer to , thus avoiding the discontinuities.
Unfortunately, we were not able to find a working semantics of this type, despite trying different approaches, as described below.
We can require HUCs to be closed under the finer partial order in Definition 9, and modify Definition 2 accordingly. Let denote the space of these alternative HUCs. One downside of this approach is that now the space depends on the metric on , while previously it only depended on the underlying topology.
Definition 9 (downward closure related to 1-Lipschitz functions). Let be a compact metric space. For we define the relation
The space is defined as the set of closed convex subsets of that are downward closed under the partial order .
To see that it achieves some of the desired behavior, we show that is continuous in Lemma 14. This approach also requires that we work with compact metric spaces, instead of merely with compact polish spaces.
We give some technical details in Standard setting for the alternative HUC case. Later we will look at alternatives, mostly to justify the choices made in Standard setting for the alternative HUC case.
Standard setting for the alternative HUC case
The Category of metric spaces is typically equipped with maps that are -Lipschitz continuous, and the metric on a product of metric spaces is given by
We do make some further modifications. First, we only consider compact metric spaces whose metric only has values in . This allows us to consider sums/coproducts of metric spaces (usually the category of metric spaces does not have coproducts). For the metric on a sum (disjoint union) of two metric spaces, we set the distance of two points from different components to .
Let us define the metric we will use on . As a metric on we will use the Kantorovich-Rubenstein metric
which is a metric that is compatible with the weak- topology on . Then we use the Hausdorff metric on . It can be seen that this Hausdorff metric has values in again.
A problem with is that it is not always preserved under the pullback, i.e. can happen for some , . Thus, we will redefine the pullback in this context. For a function between metric spaces we define the modified pullback via . We will use this alternative pullback for the case of .
Under this setting, we run into the issue that is not -Lipschitz continuous, see Example 16. This is despite the fact that is continuous (and probably even Lipschitz continuous).
Allow higher Lipschitz constants for maps
One might wonder what happens, when we pick a higher Lipschitz constant for maps in our category (instead of ). If we have two functions with a Lipschitz constant , then we can only guarantee a Lipschitz constant of for the composition (e.g. with the function ). If we repeat this argument, we cannot guarantee a finite Lipschitz constant, because as .
So, what happens if we have allow maps with Lipschitz constant ? It turns out that then we run into issues with functoriality, as in Allowing Discontinuity. A concrete counterexample is described in Example 15.
Another metric on products
Instead of the choice for in Standard setting for the alternative HUC case, we could consider using the metric
A first problem with this choice is that this is incompatible with any finite bound on , as is the case in Standard setting for the alternative HUC case. This issue can be avoided by considering extended metric spaces, which are metric spaces whose metric can take values in (in that case the distance between disjoint components in a sum is set to ).
A more difficult problem with this approach would be that the diagonal map is not -Lipschitz continuous for other natural choices of metric on products. We do need the diagonal map, see, e.g., Equality of functions.
Modifying the definition of alternative HUCs
We mention that as an extension, the definition for in Definition 9 could be modified to include a constant and use . For smaller , this would make these alternative HUCs sharper. The version with has some drawbacks as a lot of probability mass gets assigned to points nearby: For , we have , so an alternative HUC which contains also has to contain . In the context of infra-Bayesianism, this means we can use these kind of HUCs only in combination together with a smaller class of loss functions (in this case, loss functions that are of type for a nonnegative Lipschitz continuous ).
One might wonder whether increasing might help us decrease the Lipschitz constant for or even choose a such that is -Lipschitz. While the Lipschitz constant does get improved, calculating through Example 16 would show that the Lipschitz constant cannot reach even if we use these modified versions of .
Appendix
Lemma 10. If is a compact Polish space then so is .
Proof. It is known that the space of nonempty compact subsets of is compact in the Hausdorff metric. One can also show that is a closed subset of the space of nonempty compact subsets of .
For the related concepts (but not HUCs) in IB theory, this result can also be found in Proposition 47 in "Less Basic Inframeasure Theory" [? · GW] ◻
Issues with Continuity
Using the topology induced by the Hausdorff metric on (see below Definition 2), we find that in the 'logical' semantics (inverse) limit type constructions (in the category theoretical sense) are generally not continuous (see Continuity Counterexamples for details). This includes the semantics for , , and the pullback .
If itself is finite, then is nice enough that we don't run into these issues. This observation is sufficient for example in the construction of the infra-POMDPs in Examples when the state spaces are finite. However, we quickly run into issues if we're not careful with iterating the construction, since is not finite even if is.
Below is some discussion on a possible approach to trying to fix these issues (we haven't found an entirely satisfactory fix though).
Continuity Counterexamples
Lemma 11. The function , given by is not always continuous.
Proof. Consider the case where is the space with standard topology. This is a Polish space.
We consider sequences with and . We define . Then and are in . (For a point , I understand this notation to be , where is the measure that puts all weight on the single point ).
One can calculate that .
Because of the Hausdorff metric in we have and .
To put it together, we have
Thus, is not continuous. ◻
Lemma 12. The function is not always continuous.
Proof. Consider the case that . We use the notation . We choose the sequence , where denotes the line between points. Then one can show that .
However, we also have . One can show that holds. Thus, we have
i.e. is not continuous. ◻
Lemma 13. There is a continuous function such that is not continuous.
Proof. Pick , . Then , but for the limit we have , and not . ◻
Semi-continuous Pullback is not Compositional
Let be given by
And
Both and are monotone increasing, upper semi-continuous. Let . Then , the closure . Hence , while , so
We can construct a corresponding counterexample where by considering , defined as follows. Let be the sharp HUC on . Then
so
Then
but
which is different.
Stricter Homogeneity
We collect some technical results for A Different Notion of HUCs
With Definition 9 we can prove continuity for , which does not hold for ordinary HUCs.
Lemma 14. The function , given by is continuous.
Proof. We set . As a metric on we will use the Kantorovich-Rubenstein metric
Let , be sequences in with , . We want to show that . We have
For the first term, let be such that is maximized. By compactness, there exists a convergent subsequence of . Wlog let . We have for all , and taking limits implies . The same argument also implies . Thus, we have
For the second term, let be given. Due to , there exists a sequence with and . Similarly, there exists a sequence with and . The idea is now to construct a measure (i.e. a point in ) that is close to and but in .
Wlog we assume and , for all . We define via
where is the diameter of . Clearly, we have . We also define
We want to show and . By symmetry we only need to show . For that purpose, let be an arbitrary -Lipschitz continuous function. We define . We have to show .
We can decompose , where is a constant, and is a nonnegative function that satisfies for some . Since is -Lipschitz continuous, this implies that for all . Thus, is a function that is -Lipschitz continuous and bounded by , i.e. admissible in the definition Eqn. (1). We can use this to obtain
On the other hand, we have
Adding these yields
If we rearrange this inequality, we get . Dividing by yields , Since and was an arbitrary -Lipschitz function this proves . Since and is downwards closed wrt , this implies . Similarly, we can obtain and . From the definition of we also obtain (due to and ). Thus we have as . Since was arbitrary, this implies that the right-hand side in Eqn. (2) converges to . This implies in the Hausdorff metric, which completes the proof. ◻
In the next we will use the notation for the modified pullback, see Standard setting for the alternative HUC case.
Example 15. Consider as a compact metric space with the usual metric. Further, let be given by , for all . Then , i.e. functoriality does not hold.
Proof. Clearly, we have for all .
We define
We claim that holds. Due to for all this simplifies to
We start by observing and .
We also claim . Indeed, for -Lipschitz continuous we have
Due to this implies . Due to we have . This implies .
However, we can also show that . Indeed, lets consider the -Lipschitz continuous function . Then Thus . This completes the proof of Eqn. (3) and hence completes the proof of the lemma. ◻
Example 16. Let be the compact metric space where the metric satisfies . We use the standard setting from Standard setting for the alternative HUC case otherwise. Then the function , given by is not -Lipschitz continuous.
Proof. Consider the measures , , .
Let be the smallest set in that contains , for . Then it can be calculated that
We want to calculate . Due to , we have . Since the supremum of a convex function is attained on extremal points, we only need to consider for the case that is an extremal point of . The extremal points of are , , , . Let us calculate the distances (or bounds on distances) of these points to . We start with . We observe . This implies
On the other hand, we have and for all . Since is admissible in Eqn. (1), this implies for all , and therefore . Thus, .
Next, we calculate a bound on . As we have , we obtain
Similarly, for a bound on , we observe and calculate
Finally, holds due to . In summary, we have shown for all extremal points of that the distances to do not exceed (and obtain the value in at least one case). This implies .
By symmetry we also have .
Now, let us consider the intersection . Using the above descriptions for and we can obtain the subset relation
This implies for all . Recall that we have and . By using in Eqn. (1), it follows that for all . Thus,
Combining all these results shows
which shows that the Lipschitz constant has to be at least . ◻
Measurability and Continuity Proofs
Let us collect proofs that some maps between compact Polish spaces that are mentioned in Semantics are indeed measurable, as required. This concerns , , , .
Lemma 17. The function is continuous and measurable.
Proof. Let and . Recall that . Let be sequences with and .
We need to show for continuity. We do this by showing two things: First, every limmit of a convergent sequence with satisfies . Second, for every point there exists a sequence with .
So, first, let be a sequence with and . We have , and thus there exists , , with . Wlog we have convergences , , . This implies .
For the other part, let be given. Thus, there exist , , with . Due to and there exist sequences with , with , , and , . Then, we have . Taking limits implies . This completes the proof of continuity of . ◻
Lemma 18. If is a continuous function between compact Polish spaces, then is continuous.
In particular, the function is continuous and measurable.
Proof. Let be a sequence in with . We want to show . As before, we do this by showing two things. First, every convergent sequence with implies . Second, for every point we can find a sequence with and .
Note that is continuous if is continuous.
Let be a convergent sequence with . Then there exists with . Wlog for some due to compactness. This implies and . Due to uniqueness of the limit, we have and therefore the desired .
For the remaining part, let be given. Then there exists with . Since , there exists a sequence with and . We define . Then we have and . This completes the proof. ◻
Lemma 19. The function , given by is measurable.
Proof. We use the abbreviation , and use the notation for the metric on and for the induced Hausdorff metric on . To show measurability, it suffices to show that the preimages of open balls are measurable. Thus, we need to show that is measurable for all and .
First, we show an auxiliary claim.
Claim 1: Let be a compact subset. Then is closed and measurable.
Let us show that the claim is true. Let be a sequence in the set with , . Then there exists with and . Wlog for some (because of convergent subsequences). Then and thus . Similar, . Thus, the limit of is in the set . Thus, the set is closed and measurable.
We have .
Let us show measurability of . We have
For each , , the set is closed due to claim 1, because is a compact set. Recall that intersections of closed sets are closed. Thus, the set in question is the countable union of closed sets, and therefore measurable.
Now, let us show the measurability of . We have
Again, since the set is compact, it follows that the above set is measurable by claim 1.
This completes the proof that is measurable. ◻
Lemma 20. Let be compact metric spaces, and be given. Then there exists with and , where is the Wasserstein metric.
Proof. We set . In the following, we will also use the notation , for .
Then, due to the definition of , there exists a measure with , , (due to compactness arguments, the infimum is attained in the definition of ). We choose a measure via
where denotes a density function that satisfies for all measurable , , and is the same for the "mirrored" measure of , i.e. for all measurable . These density functions exist due to Radon-Nikodym. The measure exists due to Caratheodory's theorem and is unique.
The measure has the properties
where is map that projects on components and , and is the diagonal map.
We now choose . Indeed, this satisfies the desired property . As for the Wasserstein metric, is an upper bound on . Let us calculate . We have
This implies , which completes the proof. ◻
Lemma 21. The function is measurable.
Proof. We abbreviate , , . Again, we use the notation for a metric on or and for the induced Hausdorff metric.
Again, we start with an auxiliary claim.
Claim 2: Let be a compact subset. Then is closed and measurable.
Let us show that the claim is true. Note, that the set is the same as . Let be a sequence in the set with . That means, there is a sequence with and . Since is compact, we can wlog assume that (by selecting a convergent subsequence). Let be given with . We need to show that holds (this would show that is in the set mentioned in the claim, i.e. it would prove the claim). Wlog (otherwise follows from definition). This implies , which allows us to wlog assume that for all . Let us define the probability measures , , .
We can apply Lemma 20 to these probability measures, which implies that there exists with and .
It is known that the Wasserstein metric is compatible with the weak- topology on and . Thus, we obtain from . This, in turn, implies .
Next, we define . This satisfies and . Due to the properties of and , this implies . Now the desired property follows by taking limits. This completes the proof of the claim.
To show measurability of , it suffices to show that the preimages of open balls are measurable. Thus, we need to show that is measurable for all and . We proceed similarly to Lemma 19. We have
Let us start by showing measurability of the second set, i.e. . Since is compact, there exists a countable dense set . We have
Applying claim 2 for the compact set . Thus, we have a countable union of an intersection of closed sets. This is measurable.
Now, let us show the measurability of the other set . We have
Again, since the set is compact, claim 2 implies that the above set is measurable. This completes the proof that is measurable. ◻
As for the measurability of , we were not able to prove it for measurable . There might be some relation to the problem that the image of a measurable set under a measurable map is not measurable.
2 comments
Comments sorted by top scores.
comment by Quinn (quinn-dougherty) · 2023-07-05T19:47:40.552Z · LW(p) · GW(p)
is the inclusion map.
what makes a coproduct an "inclusion mapping"? I haven't seen this convention/synonym anywhere before.
Replies from: harfe↑ comment by harfe · 2023-07-05T20:07:09.634Z · LW(p) · GW(p)
"inclusion map" refers to the map , not the coproduct . The map is a coprojection (these are sometimes called "inclusions", see https://ncatlab.org/nlab/show/coproduct).
A simple example in sets: We have two sets , , and their disjoint union . Then the inclusion map is the map that maps (as an element of ) to (as an element of ).