Abstractions as morphisms between (co)algebras
post by Erik Jenner (ejenner) · 2023-01-14T01:51:45.622Z · LW · GW · 1 commentsContents
Motivation (Co)algebras Examples Some potential problems with this formalism None 1 comment
In a previous post [LW · GW], I suggested we should think of abstractions/ontologies as maps that make a certain commutative diagram commute:
The downward arrows are the abstraction, which throws away some information about the world state while keeping other parts. The horizontal arrow at the top is time evolution. The diagram says that there should be a way to "mirror" the time evolution inside the high-level model (that's the dashed arrow at the bottom). Commutativity of this diagram means that we can either do time evolution in the actual world and then abstract, or first abstract and then do time evolution in the model. If this doesn't make sense, the original post [LW · GW] has more details and examples.
It turns out that this idea is very, very far from new.[1] I'll hopefully write a post on some related literature soon, but here I want to discuss one connection in particular: category theory, and more specifically (co)algebras of endofunctors. Using coalgebras for abstractions is an idea from bisimulations in computer science. See nlab if you just want a quick version. This post essentially just adds lots of examples and some reframing—the computer science discussions I've seen tend to be heavily focused on the special case of abstractions of transition systems.
Motivation
There are two main things the (co)algebra framework will give us:
- It could be a useful way to formalize the slightly vague description of abstractions-via-commutative-diagrams from my previous post. (In particular, it addresses one open question that I'll discuss in a moment.)
- The original post had a few examples for commutative diagrams of the type above, but most of them focused on deterministic time evolution. This (co)algebraic framework lets us easily generate tons of examples in a natural way, from group theory to general-purpose optimizers. This gives some support both to this specific framework but also to the more general idea of using these commutative diagrams to define "good abstractions".
I'll briefly elaborate on why the description in my earlier post was a bit vague. I had another example of the commutative diagram above with a somewhat different flavor:
The abstraction maps from real numbers to a representation using 11 decimal digits (the details don't matter here). The horizontal arrow isn't time evolution of a physical system in this example, instead it's addition of real numbers. The commutative diagram essentially says we should also be able to add within our decimal approximation.
A key difference is that in the time evolution example, the horizontal arrow was an endomorphism, i.e. it mapped from the set of possible states to itself, with a type signature like . Addition does not have this type signature, it's a map . This means we can't use the same function for both downward arrows. Intuitively, it's pretty clear that it makes sense to use on the left side, i.e. apply the abstraction to each input independently. But what's the general principle that tells us how the two downward arrows should be related? And what kinds of horizontal arrows can we use besides time evolution and addition? That's what this post gives one potential answer to.
(Co)algebras
I'll assume you know what categories and functors are, and just focus on slightly less well-known concepts. If you don't know category theory, the next section with examples could still be interesting to get a sense of what kinds of things can be described in this abstractions framework.
First, an endofunctor is simply a functor from a category to itself. All of my examples are going to be for since that already gives us plenty. Some examples of endofunctors we'll use:
- The identity endofunctor.
- The power set functor maps each set to its power set. A function is mapped to a function by using the image of subsets, i.e. for a subset .
- The constant endofunctor that maps any set to some specific fixed set and maps any function to the identity on .
- The "duplication endofunctor" maps any set to the cartesian product and any function to (i.e. applied componentwise).
- The "stochastification endofunctor" maps any set to the set of probability measures on , and any function to the pushforward map (which translates probability distributions on to probability distributions on using ).
An algebra of an endofunctor is a morphism for some specific object (i.e. a set in our examples). A coalgebra is the same thing in the other direction, . Some examples:
- An algebra or a coalgebra of the identity endofunctor are both just functions on some set. So time evolution can be interpreted as a (co)algebra of the identity functor.
- An algebra of the "duplication functor" is a map , i.e. a binary operation like the addition example above.
- Many algebraic structures (such as groups) can be described as algebras of endofunctors that generalize this duplication functor. For example, groups are algebras of the functor , see Wikipedia for details.
- A coalgebra of the power set functor is a function , so for each element we need to specify a set of other elements. One interpretation of this is a binary relation: for each , we specify all the with .
- A coalgebra of the constant functor for a set is just a map .
As you can see, both the time evolution example and the binary operation example have a horizontal arrow that can be understood as a (co)algebra. In general, I think any algebra or coalgebra of any endofunctor can be used as a horizontal arrow. The choice of functor and (co)algebra is going to describe which "task" we care about: prediction of next time steps, addition of numbers, etc.
Given a horizontal arrow, i.e. a (co)algebra, we can think of abstractions as morphisms from this (co)algebra to another one. A morphism from an algebra to is defined as a map such that this diagram commutes:
In our context, the morphism is the abstraction, with the low-level/detailed system and the high-level model. is the representation of the task/operation within the high-level model (the dashed arrow in the diagrams above).
This is a generalization of the addition diagram above: there, and . Note how this answers the question of what the relation between downward arrows should be: we choose the abstraction and then get the other arrow by applying the functor , which is part of the specification of the horizontal arrow.
Coalgebras work analogously, except we choose the downward arrow on the left side and then translate it to the right using the functor:
This shows how (co)algebras are one possible formalization of these commutative diagrams. I don't see any strong a priori justification of why they are the correct formalism. The main reason this approach seems interesting to me is that lots of examples work out: abstractions I had thought about before seeing this formalism fit in nicely, and if I plug in endofunctors I hadn't thought about before, I often get good examples of abstractions out. This is what we'll look at next.
Examples
I'll now quickly go through some examples of different endofunctors and how to interpret them in the context of abstractions. These are going to be pretty informal but hopefully still clear (otherwise let me know and I'll be happy to give a more detailed version in the comments).
- We've already seen two examples: the first is time evolution, induced by algebras or coalgebras of the identity endofunctor. The second is addition of real numbers, induced by algebras of the "duplication functor". Of course, this second example can be extended to any binary operation (or even -ary ones, by using a functor that creates more than two copies within a cartesian product).
- As mentioned, coalgebras of the power set functor can be interpreted as binary relations. One way to think about these binary relations is as describing a transition system: for any state , the image under the coalgebra tells us the possible next states. If we interpret morphisms from this coalgebra to another one as abstractions, then the commutative diagram tells us that our abstraction should contain all the necessary information to "run a transition system forward" within the abstraction: given the current abstracted state, we need to be able to compute the possible next abstracted states.
- Instead of just possible next states, we can also get a probability distribution over next states (this could also be seen as a generalization of the time evolution example to stochastic systems). This can be modeled using the "stochastification functor" that maps each set to the set of probability distributions over that set. Coalgebras of this functor are precisely stochastic transition dynamics. An abstraction of such a stochastic transition system needs to contain the necessary information in order to compute the probability distribution over next abstracted states.
- Let's say we want to introduce actions that an agent can take, which influence the next state distribution. In the deterministic setting, we simply use the functor instead of the power set functor, where is the set of actions. In the context of computer science, coalgebras of this functor are called labeled transition systems. We can use a similar trick for the stochastification functor to get a probabilistic version: map each set to the set of conditional distributions on the set. This gives us a simple notion of what "good abstractions" are for an agent that can take actions in a stochastic environment. (This is certainly not a complete definition of "goodness" though, it doesn't take goals into account.)
- Consider image classification (or any other supervised learning setup) as our "task". The horizontal arrow is then a function from images to labels, . We can interpret this as a coalgebra of the constant functor that maps every set to . The commutative diagram for this coalgebra demands that the abstraction of an image should still contain all the necessary information to predict the correct label.
- If we interpret algebraic structures, such as groups, as algebras of an endofunctor on , then the abstractions under this formalism are going to be group homomorphisms out of that group. If we consider two abstractions that contain exactly the same information to be identical, then this means abstractions of a group are exactly quotients of that group.
- Finally, let's look at the type signature of agency [LW · GW], . This is an algebra of the (contravariant) functor , which maps any set to the set of functions . Interestingly, the commutative diagram in this example has one of the arrows pointing upwards because the functor is contravariant. What the diagram says is that the abstraction of needs to contain all the necessary information such that the correct abstracted output can be selected for any "loss function" . This is a reasonable notion for good abstractions of a general-purpose optimizer. As one example, consider gradient descent: it's most naturally defined on real numbers, but implemented in practice using floating point numbers. It's crucial that gradient descent is able to (approximately) select the floating point approximation of the direction of steepest descent, based on the floating point approximations of the (arbitrary) loss function. This is what makes floating-point-gradient-descent a good abstraction of actual gradient descent.
Some potential problems with this formalism
- When actually working with this (e.g. trying to prove general statements about abstractions in this framework), it's quite annoying that there are these two dual versions, algebras and coalgebras. One potential solution: there's apparently something called an algebra for an endo-profunctor, which has algebras and coalgebras of endofunctors as special cases. But I'm not sure if going even further along the abstract non-sense route is advisable.
- All of my examples were in , so do we really need category theory at all? If we are using category theory, if feels a bit clunky to have everything happen in , e.g. to use the set of all probability distributions, and to consider Markov kernels as morphisms in . But maybe this is perfectly normal when working with (co)algebras, I'm not a category theorist.
- Say we have two abstractions and of the same system , such that (so they contain exactly the same information). We might want to treat these as "the same" abstraction. So maybe we shouldn't talk about maps out of at all, and instead just consider equivalence relations on . In fact, I largely buy this objection, but I think it's still often helpful to think of the equivalence relations we want to look at as induced by (co)algebra morphisms.
I don't view any of these as decisive issues. Possible objections that I think could be a bigger deal:
- Maybe this entire way of thinking about abstractions is trivial/misguided/... One reason to be optimistic is that this type of commutative diagram appears in tons of places where people have tried to define what abstractions are. But it's worth noting that John Wentworth's approach is a bit different (I think [LW · GW]).
- Examples of good abstractions that can't be modeled this way would be a pretty big deal. Examples of this framework that don't make sense as abstractions are also interesting but a bit less damaging I think (the claim is certainly not that every commutative diagram of the "right-down = down-right" kind is best thought of as an abstraction!)
- ^
Thanks to Generallyer for pointing out [LW(p) · GW(p)] related work in causality; there are many other examples as well.
1 comments
Comments sorted by top scores.
comment by Shmi (shminux) · 2023-01-14T09:11:02.150Z · LW(p) · GW(p)
Here is a couple of my somewhat related comments from awhile back: https://www.lesswrong.com/posts/NFMPftEhagCcHcmQJ/on-the-role-of-abstraction-in-mathematics-and-the-natural?commentId=GRp8PKJd64Tm5DhtA [LW(p) · GW(p)]
https://www.lesswrong.com/posts/wuJpYLcMEBz4kcgAn/what-is-abstraction-1?commentId=qxnTe3TAYs2ZqzAQL [LW(p) · GW(p)]
I wish I had the background to formalize them the way you have in your post.
In general, I get the importance of keeping in mind the relevant commutative diagram (often more than one) for whatever mapping you are working on, but it is not clear to me that you can get anything actionable out of them. Still, I do think that even if all your examples are in Set, CT is a good framework for thinking about them, without going down the abstraction levels.