Categories: models of models

post by countedblessings · 2019-10-09T02:45:29.617Z · LW · GW · 18 comments

Let me clarify what I mean when I say that math consists of nouns and verbs. Think about elementary school mathematics like addition and subtraction. What you learn to do is take a bunch of nouns—1, 2, 3, etc.—and a bunch of verbs—addition, subtraction—and make sentences. “1 + 2 = 3.”

When you make a sentence like that, what you're doing is taking an object, 1, and observing how it changes when it interacts—specifically, adds—with another object, 2. You observe that it becomes a 3. Just like how you can observe a person (object) bump their head (interaction) into a wall (other object) and the corresponding change (equals sign): the bluish bump emerging on their forehead.

Well, it turns out that no matter how far you go in math, you’re still taking objects and observing how they change when they interact with other objects. You learn about other kinds of numbers like 2.5 and 3.333 repeating, and other kinds of interactions like multiplication and division.

Eventually things start getting more abstract. You learn about matrices and the kinds of interactions they can have. You learn about sets and functions in a deep way. Complex numbers, topologies, etc.

But you never get away from nouns and verbs.

What are we going to do with all of these nouns and verbs? Well, at some point, when you have a bunch of things that feel similar in an important way, you often want to take a step back, lump all of the individual things together, and talk about them in general. Why do we have the concept of fruit? Because we live in a world where apples, oranges, lemons, papayas, etc., exist.

Why do we have the concept of numbers? Because we have a bunch of them, not just 1, 2, and 3, but also 2.5, 3.333 repeating, irrational numbers, and even complex numbers. Why do we have the concept of “operations?” Because we have addition, subtraction, multiplication, and division. Even though all of these give you very different answers for the same input of numbers, (1 + 2 does not equal 1 - 2 does not equal 1 times 2 does not equal 1 divided by 2), they still have something very similar in common, so it’s worth coming up with the concept of “operation” to study them independently of their individual characteristics. Just like how we can talk about “fruit” without having to reference the shape of an apple.

If you lived in a world with only apples, you wouldn’t have the concept of fruit. If you lived in a world where the only kind of thing was a rock, you wouldn’t have the concept of nouns. If you lived in a world where the only kind of math-thing was numbers, category theorists wouldn’t have come up with the concept of objects. (They wouldn't have come up with category theory at all!)

And what is the use of generalization? A bird’s-eye view of something lets you see things from new perspectives, potentially changing your concept of the thing entirely. When you think of numbers as various ways you can count stuff on your fingers, you can inch your way past the natural numbers to negatives, fractions and irrational numbers. But complex numbers will throw you for a total loop—they just don’t seem to be relatable to “amounts of things” in a way that you’re ever likely to see in the world.

In fact, complex numbers made no sense to me until I learned what numbers actually are, at which point everything clicked into place. The generalization helped me understand the specific case—complex numbers are just another type of number, like how apples are just another type of fruit.

Generalization is helpful when life stops throwing convenient examples at you.

For example, say you want to grow new kinds of fruit that have never existed. Having a concept of fruit is necessary to conceiving of that idea. Life's not going to give you examples of fruit that have never existed! You have to explore the conceptual space of all fruit.

(In fact, this experience with complex numbers, years ago, probably inspired these posts. I got the idea that if you just bothered to take a really long time to explain the mathematics, the average person could probably grasp quantum physics. This series is the category-theory version of that original idea.)

Category theory exists for the same reason the concept of fruit does: there are lots of individual things that have certain commonalities that make generalization an appealing idea. Apples and oranges are very different on the surface and very similar deep down. Natural numbers and complex numbers are very different on the surface and very similar deep down.

Category theory goes one step wider. It emerges when people look at entire fields of math like “algebra” and “topology” and notice that, while they’re very different on the surface, they seem to be very similar deep down. Many other fields of mathematics seemed to also share these deep similarities, and so gradually they all became mere examples of the generalization, that generalization being a category. (A category being something we'll define by the end of this post.) Just like how apples and oranges become mere examples of the generalization that is “fruit.”

And one of those commonalities is that all of these superficially disparate fields of mathematics study things and interactions between those things. I.e., they study objects and morphisms.

That might sound really general. And it is! And yet, just like with the general definition of fields that lets us understand complex numbers, we can learn really interesting things from this super-general perspective. (Specifically, the Yoneda lemma and adjunction.)

But right now you are having to take my word for the idea that many different fields of math can be thought of as studying “nouns and verbs.” So let’s look at things from a different perspective.

Even if you don’t know higher maths, you probably know things like, “pouring milk in my cereal will make my cereal soggy.”

So “milk + cereal = soggy cereal.” Seems awfully...mathematical.

Why does math exist, anyway? Well, there’s lots of ways to answer that question, so let’s rephrase it: why do mathematicians exist? Or even better, why do mathematicians get paid? It certainly isn’t for the joy of doing math. Instead, “mathematician” is a job that you can actually get paid to do because math is very useful for modeling our reality.

So why does math boil down to objects and morphisms so often? Probably for the same reason English boils down to nouns and verbs: we use language to discuss reality, and reality seems to boil down to nouns and verbs.

Take birds, for example. They are birds, so they’re nouns (objects). And they do stuff like fly, tweet, lay eggs, eat, etc. I.e., verbs (morphisms).

Whatever you may or may not know of maths, you definitely know a thing or two about reality. You’ve been living in it your whole life.

The rest of this post will take common sense ideas about creating models of our reality, break them down into their most abstract components, and end up with some simple rules that any model should follow. We’ll discover that those components and rules are exactly the components and rules that define a category.

***

You know a lot of models, even if most of them would never make it into a textbook. You use these models to navigate reality.

For example, a sentence like "Alice pushes Bob" is a model of reality. By themselves, those letters in that order are meaningless. But because you can use them to make predictions (specifically, you're inferring the speaker's state of mind), the sentence is correspondingly a model of (that particular part of) reality.

Sentences themselves exist, so we can model them too. You can think of a sentence structure as a way of modeling a sentence. As for models themselves, a model is a way of abstracting away from some details of whatever you're studying so that other features become more salient. For example, you can model a specific cat, Mr. Peanuts, as just a cat, neglecting specific features like his name, his color, etc., so that other, more general features of cats, like the tendency to meow, eat tuna fish, and utterly despise humans become more salient.

That is to say, what will Mr. Peanuts do in [specific situation]? Hard to say unless you know him personally. But if we ask, "What will a cat do in [specific situation]?" you might have a decent guess. The cat's not going to start barking, for one.

You have a model of cats inside your head, and this lets you make predictions, at the expense of specificity.

Sentence structure models specific sentences like "Alice pushes Bob" in terms of their abstract components: "Noun verbs noun."

You might be surprised to learn that you can make predictions with sentence structure. For example, one of the rules of grammar is that a prepositional phrase ends with a noun. So let's say I ask you, "Who will the soccer player kick the ball to?" You don't know the answer—I haven't even given you a list of people to choose from.

Let's also say that you don't know what the word "who" means, so you don't even know what I'm asking. In fact, let's say you don't know what any of the words in the sentence means, only their parts of speech. You certainly aren't going to give the specific answer I'm looking for.

But you do know sentence structure! You know that "to" is a preposition, so the phrase it opens up must end with a noun.

So who is the soccer player going to kick the ball to? Answer: a person, place, or thing—a noun.

This is not a brilliant answer. But it is an answer—a correct one, in fact! This super abstract model of sentences let you figure things out even when you didn't know what any of the specific words meant.

So sentences are models, and sentence structure is a model of a model, and models are really useful: they let you predict things.

But sentence structure is a model of just one kind of model—sentences themselves. What if we wanted a model of all kinds of different models—sentences, sentence structure, scientific models, models from sensory data, mathematical models, everything? What if we wanted to model modeling itself?

Why, we'd turn to category theory.

So what are the general qualities of all models?

First of all, every model has things—objects—and actions that the objects do to other objects—morphisms. For example, you can model a person by seeing them. That is to say, you hit them with some photons and form a picture of them based on the result. Every model is ultimately found in how one object changes another object—hence the term "morphism."

(Notice how sentences, which themselves boil down to nouns and verbs, can be thought of as models of anything. After all, it's hard to imagine any kind of phenomenon that can't be expressed in a sentence, even if you have to make up some new words to do it. Effectively, the reason we use category theory to form models of models of everything instead of just sticking with English is that we can define our category as obeying certain rules or axioms that correspond to how models ought to work, while English can say pretty much anything. The rest of this post discusses these rules.)

So categories—models of models—consist of objects and morphisms. But what are the rules we require our objects and morphisms to obey so that they match up with our ideas of how models of reality ought to work? One is that every object ought to be, in some sense, a perfect model of itself.

Think about what models do: they transform how we view objects. If I tell you someone is a murderer, this changes how you view them because you apply a different model. Loosely, you could say that you apply the "murderer" morphism to the person in question. Well, every object ought to have a model that you can apply that doesn't transform the object at all—not because you aren't transforming, but because there's no effect, like multiplying by 1. For example, if you apply the "Mr. Peanuts" morphism to Mr. Peanuts, this shouldn't change anything: Mr. Peanuts is already Mr. Peanuts. This morphism is, in a certain clear sense, the identity of Mr. Peanuts.

So perhaps unsurprisingly, in category theory, we say that every object has an identity morphism that "does nothing." For an arbitrary object A, its identity morphism is written id_A. That's because it's like multiplying by 1—you are multiplying, it's just that nothing changes. (But just like multiplying both sides by 1 helps you solve equations, the identity morphism is extremely useful.)

What does it actually mean, mathematically, for the identity morphism to "do nothing?" To answer that, let's look at another requirement we'd expect of a general model of models: compositionality.

Say Alice pushes Bob, and Bob bumps into Charles. Let's diagram this as Alice --f--> Bob --g--> Charles. Hopefully the interpretation of this diagram is obvious. We could say that Alice affects Bob, and Bob affects Charles—that is to say, Bob's state is a model of Alice's action on Bob, and Charles's state is a model of Bob's action on Charles. (I know we used a symbol standing for "push" last time, but f and g are the typical "generic morphism" symbols in category theory, and it's better to get used to using generic symbols than trying to name them after things all the time. Here, f is Alice pushing Bob, and g is Bob bumping into Charles.)

But we also might have an intuition that Alice affects Charles. Sure, she does so indirectly through Bob, but clearly Charles's state is a model of Alice's actions—actions which are in some sense actions on Charles.

That is to say, Alice pushed Charles just as clearly as she pushed Bob.

Effect flows through. We're all just models of the original state and laws of the universe, whatever they were. So whenever we have something like Alice --f--> Bob --g--> Charles, we should really expect to also have something like Alice --h--> Charles. Moreover, this morphism h should be equal to what happened when Alice pushed Bob and then Bob bumped into Charles. After all, that is what Alice did to Charles.

Let's use the symbol ∘ to mean "following." So if we write g ∘ f, that reads as "g following f." (Yes, that means you read it right to left, in effect. Get used to it, every field has at least one counterintuitive piece of notation.) We would then say that h = g ∘ f. More to the point, the idea is that whenever you have something that looks like A --f--> B --g--> C, regardless of what these objects and morphisms are meant to stand in for, then you necessarily have h : A --> C such that h = g ∘ f.

This is what composition looks like as a diagram. (The backwards E means "there exists," and the dotted line, redundantly, means the same thing.)

An example of composition that you are hopefully familiar with is function composition. (We'll cover it in another post anyway.) Say you have f(x) = 3x². You should know how to evaluate this: first, square the x, then multiply it by 3.

Let's see how this is exactly like what we did before. You can see it like this: say that g stands for the action of squaring the x (it's a transformation—a morphism). And the other morphism, h, stands for multiplying it by 3.

If you're given the problem of determining what you have if you do h following g, you can write that like h ∘ g. Plugging in the actual equations, you are told to "multiply by 3 following squaring x." Which is exactly what you did.

If you were told to determine g ∘ h instead, you should be able to see that this tells you to "square following multiplying by 3." I.e., (3x)² = 9x².
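If it helps to see this in code, here is a minimal Haskell sketch of the same idea; the two functions (squaring and multiplying by 3) are just illustrative stand-ins.

```haskell
-- Two morphisms on numbers: squaring, and multiplying by 3.
g :: Double -> Double
g x = x * x          -- "square the x"

h :: Double -> Double
h x = 3 * x          -- "multiply it by 3"

-- (.) is Haskell's composition operator and reads right to left,
-- so (h . g) is "h following g": square first, then multiply by 3.
hAfterG :: Double -> Double
hAfterG = h . g      -- hAfterG x == 3 * x^2

-- The other order, (g . h), is "square following multiplying by 3".
gAfterH :: Double -> Double
gAfterH = g . h      -- gAfterH x == (3 * x)^2 == 9 * x^2

main :: IO ()
main = do
  print (hAfterG 2)  -- 12.0
  print (gAfterH 2)  -- 36.0
```

The two orders give genuinely different functions, which is the point: composition is not commutative in general, even though (as we'll see shortly) it is associative.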

The symbol ∘ in fact tells you that you are composing one morphism with another (exactly how composition works is defined by the category), and the requirement that A --f--> B --g--> C implies the existence of some h : A --> C such that h = g ∘ f is called compositionality. Every category has a rule of composition.

Composition is the real action of any category. If you know the rule of composition, you know how the category works internally. (Think about the concept of the laws of physics, the view of the world as a chain of events such that we can predict our current state from the Big Bang. Understanding that rule of composition is the theory of everything, the holy grail of physics.)

Now that we know what composition is, we can use it to rigorously define how the identity morphism "does nothing." Suppose you have an object A and the identity morphism id_A. Say you also have a morphism f : A --> B. Since B is an object, it has an identity morphism as well, id_B.

The identity morphism id_A is a morphism "on A"—what does that mean? It means that id_A goes from A to A. It goes nowhere, in effect—just what we'd expect of an identity morphism.

So we have a morphism id_A : A --> A. We also have a morphism f : A --> B, as stated. Well, look at this! According to the rule of composition, we must have a morphism h : A --> B such that h = f ∘ id_A.

But we already had a morphism going from A to B: f itself. The identity morphism does nothing, so it shouldn't have any effect on what we already had. This is precisely defined by the following condition: f ∘ id_A = f.

That is to say, doing f after doing id_A is the same as just doing f. Composing by id_A is just like multiplying by 1.

And notice we also have id_B : B --> B. And since we still have f : A --> B, we again have the rule of composition saying that we have a morphism going from A to B that is equal to id_B ∘ f.

What is this morphism? Again, id_B does nothing, meaning that id_B ∘ f = f.

Basically, if you give a nut to a bird, and the nut is the nut, that's the same as just giving a nut to a bird. Similarly, if you give a nut to a bird, and the bird is the bird, that's just the same as giving a nut to a bird. Brilliant, I know, but sometimes math is about piling up trivialities until something profound emerges.
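For the programmers, the two identity laws are easy to spot-check with ordinary functions, where Haskell's id plays the role of both id_A and id_B; the particular function f below is a made-up placeholder.

```haskell
-- f is some morphism from A to B; concretely, take A = Int and B = String.
f :: Int -> String
f n = replicate n 'x'

-- Composing with id on either side changes nothing.
leftIdentityHolds :: Int -> Bool
leftIdentityHolds n = (id . f) n == f n    -- id_B ∘ f = f

rightIdentityHolds :: Int -> Bool
rightIdentityHolds n = (f . id) n == f n   -- f ∘ id_A = f
```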

There's one last rule we'd expect any model to obey. Let's take again the sequence of events where Alice pushes Bob, and Bob stumbles into Charles. Temporally, Alice pushed Bob before Bob ran into Charles. But logically, we should be able to study the events out of order: we should be able to study Bob's motion into Charles, and then look backwards in time to study Alice pushing Bob. (Just like how we can study Alice pushing Bob before we look backwards in time to see how the Big Bang pushed Alice into pushing Bob.)

This ability to study the events out of order, as long as you ultimately put things together in the right order, is called associativity. You might be familiar with this in terms of the associativity of multiplication, for example. Because (a × b) × c = a × (b × c), we say that multiplication is associative.

Let's add David into the equation, so that we have a path like Alice --f--> Bob --g--> Charles --h--> David. We have a new composition, h ∘ g. Associativity would tell us that h ∘ (g ∘ f) = (h ∘ g) ∘ f.

Indeed, in any category, the rule of composition must be associative. This should be interpreted as saying that we can study any individual interaction out of order, as long as we don't forget what the order actually was! (Why "study?" Because we're modeling models, and models are for studying reality.)

You could think of associativity this way: Say you're planning a trip from North America to Europe to Asia. Associativity says that you can think about the trip from Europe to Asia before you think about the trip from North America to Europe so long as you remember to leave from North America when it's time to take the trip!
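Again in code, associativity says that the two ways of parenthesizing a three-step chain are the same morphism. A small Haskell sketch, with arbitrary functions standing in for the three pushes:

```haskell
-- Stand-ins for the three morphisms in the Alice -> Bob -> Charles -> David chain;
-- the particular functions don't matter, only the composition structure.
f, g, h :: Int -> Int
f = (+ 1)        -- Alice pushes Bob
g = (* 2)        -- Bob bumps into Charles
h = subtract 3   -- Charles stumbles into David

-- h ∘ (g ∘ f) versus (h ∘ g) ∘ f: the grouping makes no difference.
associativityHolds :: Int -> Bool
associativityHolds x = (h . (g . f)) x == ((h . g) . f) x
```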

And...we're done. We have the rules we'd expect any model to obey. And, not coincidentally, we now have the rules we'd expect any model of models to obey—the rules that define a category.

Definition: a category is a mathematical structure consisting of

i) a bunch of objects, written A, B, C, etc.

ii) a bunch of morphisms, written f, g, h, etc., which obey some rules.

Rule 1: Every morphism goes "from" an object in the category C "to" an object in the category C. The "from" object is called the domain and the "to" object is called the codomain. (This should sound familiar from learning about functions—functions are just a type of morphism; functions have a domain and codomain because all morphisms do.) For example, if you have a morphism f : A --> B, then A is the domain of f and B is the codomain of f.

Rule 2: The category C must have a rule of composition: whenever the codomain of one morphism in C is the domain of another morphism in C, there is a way of composing the two into a single "composite" morphism, which runs from the domain of the first morphism to the codomain of the second and is equal to the second morphism following the first. E.g., if you have morphisms f : A --> B and g : B --> C, then you necessarily have h : A --> C such that h = g ∘ f.

Rule 3: Every object in C has an identity morphism going from itself to itself that does nothing. For an arbitrary object A in C, the identity morphism is written id_A. The identity morphism "does nothing" in the sense that composing it with other morphisms is the same as just doing the other morphism.

Rule 4: Composition is associative. Associativity means that if you have a chain of compositions h ∘ g ∘ f, then it is always true that h ∘ (g ∘ f) = (h ∘ g) ∘ f.
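As an aside for readers who know some Haskell: this definition is encoded almost word for word in the standard library's Control.Category class. The sketch below restates that class with comments tying its parts back to the rules above; the laws sit in comments because the compiler can't enforce them.

```haskell
import Prelude hiding (id, (.))

-- A lightly annotated restatement of the Category class from Control.Category (base).
-- Read `cat a b` as "a morphism from object a to object b".
class Category cat where
  id  :: cat a a                        -- Rule 3: an identity morphism on every object
  (.) :: cat b c -> cat a b -> cat a c  -- Rule 2: g . f is "g following f", a morphism from a to c

-- Laws every instance is expected to satisfy:
--   f . id = f   and   id . f = f      -- Rule 3: the identity "does nothing"
--   h . (g . f) = (h . g) . f          -- Rule 4: composition is associative

-- Ordinary functions form a category: objects are types, morphisms are functions.
instance Category (->) where
  id    = \x -> x
  g . f = \x -> g (f x)
```

Rule 1 shows up in the types: a morphism of type `cat a b` can only be composed with one whose codomain type matches its domain type.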

Why are there so many rules for morphisms, and none really for objects? We'll explore in a few posts from now the idea that categories are all about the morphisms. Basically, we'll explore the idea that an object is totally defined by what it does and what is done to it—i.e., all the morphisms it is a part of.

Coming up next are examples of categories. This next post should hopefully both clarify the claim that category theory generalizes different fields of mathematics and help make the concept of a category much more concrete. Afterward, we'll talk about a very important type of category, the category of sets and functions, and then move on to a series of posts on both conceptual matters and deepening our mathematical understanding of categories.

18 comments

Comments sorted by top scores.

comment by gjm · 2019-10-10T03:06:36.842Z · LW(p) · GW(p)

I'm really not convinced by this framing in terms of "objects doing things to other objects".

Let's take a typical example of a morphism: let's say f : ℤ⁺ → ℝ (note for non-mathematicians: that is, f is a function that takes a positive integer and gives you a real number) given by f(n) = √n. How is it helpful to think about this as ℤ⁺ doing something to ℝ? How is it even slightly like "Alice pushes Bob"? You say "Every model is ultimately found in how one object changes another object" -- are you saying here that the integers change the real numbers? Or vice versa? (After that's done, what have the integers or the real numbers become?)

The only thing here that looks to me like something changing something else is that f (the morphism, not either of the objects) kinda-sorta "changes" an individual positive integer to which it's applied (an element of one of the objects, again not either of the objects) by replacing it with its square root.

But even that much isn't true for many morphisms, because they aren't all functions and the objects of a category don't always have elements to "change". For instance, there's a category whose objects are the positive integers and which has a single morphism from m to n if and only if m ≤ n; when we observe that 5 ≤ 9, is 5 changing 9? or 9 changing 5? No, nothing is changing anything else here.

So far as I can see, the only actual analogy here is with the bare syntactic structure: you can take "A pushes B" and "A has a morphism f to B" and match the pieces up. But the match isn't very good -- the second of those is a really unnatural way of writing it, and really you'd say "f is a morphism from A to B", and the things you can do with morphisms and the things you can do with sentences don't have much to do with one another. (You can say "A pushes B with a stick", and "A will push B", and so forth, and there are no obvious category-theoretic analogues of these; there's nothing grammatical that really corresponds to composition of morphisms; if A pushes B and B eats C, there really isn't any way other than that to describe the relationship between A and C, and indeed most of us wouldn't consider there to be any relationship worth mentioning between A and C in this situation.)

comment by Said Achmiz (SaidAchmiz) · 2019-10-09T17:55:34.220Z · LW(p) · GW(p)

Conceptual question:

In real life, i.e., when dealing with the physical world, there are usually many ways to generalize any given thing or phenomenon.

For example, a tomato is a fruit, but it’s also a vegetable; that is, it belongs to a botanical grouping, but also to a culinary grouping. Neither classification is more ‘real’ or ‘true’ than the other[1]; and indeed there are many other possible categories within which we can put tomatoes (red things, throwable things, round things, soft things, etc.).

Is this also the case in category theory? That is: for anything which we might be tempted to generalize with the aid of category theory, are there multiple ways to generalize it, dictated only by convenience and preference? Or, is there necessary some single canonical generalization for any given mathematical… thing? If the former: how and by what criteria are generalizations selected? If the latter: what pitfalls does this create when using real-world-based analogies to understand category theory?


  1. Recall that taxonomic classifications aren’t written in the heavens somewhere, but are merely a useful way for humans to classify organisms (namely, by putting them into groups arranged by common descent). This is useful for various reasons, but by no means unambiguous or necessary, nor dictated by reality—as “in truth there are only atoms and the void.” ↩︎

Replies from: Gurkenglas, countedblessings
comment by Gurkenglas · 2019-10-09T22:32:33.139Z · LW(p) · GW(p)

Math certainly has ambiguous generalizations. As the image hints, these are also studied in category theory. Usually, when you must select one, the one of interest is the least general one that holds for each of your objects of study. In the image, this is always unique. I'm guessing that's why bicentric has a name. I'll pass on the question of how often this turns out unique in general.

comment by countedblessings · 2019-10-10T01:36:34.763Z · LW(p) · GW(p)

One of the reasons for my own interest in category theory is my interest in the question you raise. I'm hoping that we'll explore the idea that universal properties offer an "objective" way of defining "subjective" categories.

Maybe a more direct answer is that in the very next post in the series, we'll see that sets can be considered the objects of the category of sets and functions, and also the objects of the category of sets and binary relations. Functions are binary relations, so that's not a perfect answer, but yes, you can think of an individual category as a context of sorts through which you view the objects, like how you can view a tomato as a fruit or vegetable depending on the context.

comment by gilch · 2019-10-10T02:10:39.425Z · LW(p) · GW(p)

I think it should be possible to embed images in the post instead of just linking to them.

Replies from: habryka4
comment by habryka (habryka4) · 2019-10-10T05:12:05.161Z · LW(p) · GW(p)

It's totally possible. In the post-editor, just select some empty space and press the "image" button in the toolbar. Or use markdown syntax.

Happy to edit the above post to have images in the post, as opposed to just links to images.

comment by Gordon Seidoh Worley (gworley) · 2019-10-09T19:24:15.555Z · LW(p) · GW(p)

This continues to be a slyly gentle series that eases you into something before you know it. Well done!

As a side note, maybe you or the admins can set these posts up as a sequence so they are linked together.

Replies from: countedblessings
comment by countedblessings · 2019-10-10T01:38:32.805Z · LW(p) · GW(p)

Thank you for the positive feedback. (A very underrated thing in terms of encouraging free content production.) I can go back to each post and add a link to the next one. I am concerned that I may want to add, rearrange, or even delete individual posts at some point, but I suppose that's no reason not to add in the links right now for convenience's sake.

Replies from: avturchin
comment by avturchin · 2019-10-17T15:21:03.993Z · LW(p) · GW(p)

Thanks for this sequence.

comment by Viliam · 2019-10-10T22:34:14.674Z · LW(p) · GW(p)
What you learn to do is take a bunch of nouns—1, 2, 3, etc.—and a bunch of verbs—addition, subtraction—and make sentences. “1 + 2 = 3.”

I still have no idea how to express this in a picture of objects and arrows. I suppose that 1, 2, and 3 are objects. Is the addition an arrow? But an arrow has only one start and one end...

More meta: You have already provided the readers "motivation" in the two introductory articles. It is not necessary to add more hype in each article. Yes, I already heard that you can do everything in category theory, and I am willing to suspend disbelief. Now I am curious how specifically it can be done.

Replies from: philh
comment by philh · 2019-10-11T14:56:13.003Z · LW(p) · GW(p)

It's possible to construct a category where numbers are objects and where the arrows are "plus zero" (identity), "plus one", "plus two" and so on. ("Numbers" here might look like it stands in for "natural numbers". But actually, as described, it would work just as well with "real numbers", "complex numbers", "integers greater than three", "numbers whose fractional part is the same as the fractional part of e to five decimal places"... formally, any set which is "closed under addition of natural numbers". Unless you pick a different way to operationalize "and so on".)

Then the objects in "1 + 2 = 3" are 1 and 3, and the arrow is "plus two".

(If you picked "numbers" above to be "natural numbers", then there's a one-to-one correspondence between objects and "arrows from this object", for any object. But I'm not sure if that's important.)

More normally, "the set of numbers" would be an object all by itself, and the arrows would be the same as above, but all pointing from this one object to itself.

Neither of these sounds like what OP was trying to describe, but I don't have an answer that does.

Replies from: Viliam
comment by Viliam · 2019-10-11T19:44:35.386Z · LW(p) · GW(p)

But then there would be no obvious connection between the number "two" and the arrow "plus two". Also, no obvious connection between the "plus two" arrow going from 1 to 3, and the "plus two" arrow going from 6 to 8. That feels like we can make a diagram that somehow represents the addition of integers, but we can't derive new insights about addition from looking at the diagram, because most information is lost in the translation.

I guess what I meant was: I have no idea how to express 1+2=3 in a useful picture of objects and arrows.

Replies from: Slider
comment by Slider · 2019-10-11T20:35:30.183Z · LW(p) · GW(p)

Knowing Haskell, I think the pattern to turn multi-party relations into two-place relations is R(a,b,c,d,e,f,g) -> R(S(b,c,d,e,f,g)) -> R(S(T(d,e,f,g))) ... R(S(T(U(V(X(Z(g)))))))

The connection between "+2" and 2 would then be a function of +(2)="+2". You might also need =(3)="=3" and then you can have =3(+2(2)) = "2+2=3" and maybe a T?("2+2=3")=False. In another style you would set it up that only true equations could be derived. Then one of the findings would be that any instance of +2(2) could be replaced with 4 and the mappings would still hold (at least on the T? level). Mind you "2+2" could be a different object from "4"

comment by Said Achmiz (SaidAchmiz) · 2019-10-09T16:07:44.546Z · LW(p) · GW(p)

For example, say you want to grow new kinds of fruit that have never existed. Having a concept of fruit is necessary to conceiving of that idea. Life’s not going to give you examples of fruit that have never existed! You have to explore the conceptual space of all fruit.

It’s not, actually. See this old comment of mine [LW(p) · GW(p)]:

Note that under this interpretation, no “general” or “extended” version of the concept is ever created (the template is anonymous, and is discarded as soon as it “goes out of scope”—which is to say, as soon as it has been used to create the new concept). There is thus no need to ask the questions of what this new, “general”/“extended” concept means, to what else it may or may not apply, how to differentiate between uses of it and any specific version, etc.

comment by Gurkenglas · 2019-10-09T11:58:20.681Z · LW(p) · GW(p)

Not every way to model reality defines identity and composition. You can start with a category-without-those G (a quiver) and end up at a category C by defining C-arrows as chains of G-arrows (the quiver's free category), but it doesn't seem necessary or a priori likely to give new insights. Can you justify this choice of rules?

Replies from: countedblessings
comment by countedblessings · 2019-10-10T01:43:21.227Z · LW(p) · GW(p)

Honestly my real justification would be "adjoint functors awesome, and you need categories to do adjoint functors, so use categories." More broadly...as long as it's free to create a category out of whatever you're studying, there's clearly no harm. The question is whether anything's lost by treating the subject as a category, and while I fully expect that there are entire universes of mathematics and reality out there where categories are harmful, I don't think we live in one like that. Categories may not capture everything you can think of, but they can capture so much that I'd be stunned if they didn't yield amazing fruit eventually. I'd acknowledge that novel, groundbreaking theorems are still forthcoming.

Replies from: gjm, Gurkenglas
comment by gjm · 2019-10-10T03:28:22.783Z · LW(p) · GW(p)

Let's take a somewhat-concrete example. Your post mentions birds. OK, so let's consider e.g. a model of birds flying in a flock, how they position themselves relative to one another, and so on. You suggest that we consider the birds as objects: so far, so good. And then you say "they do stuff like fly, tweet, lay eggs, eat, etc. I.e., verbs (morphisms)." For the purpose of a flocking model, the most relevant one of those is flying. How are you going to consider flying as a morphism in a category of birds? If A and B are birds, what is this morphism from A to B that represents flying? I'm not seeing how that could work.

In the context of a flocking model, there are some things involving two birds. E.g., one bird might be following another, tending to fly toward it. Or it might be staying away from another, not getting too close. Obviously you can compose these relations if you want. (You can compose any relations whose types are compatible.) But it's not obvious to me that e.g. "following a bird that stays away from another bird" is actually a useful notion in modelling flocks of birds. It might turn out to be, but I would expect a number of other notions to be more useful: you might be interested in some sort of centre of mass of a whole flock, or the density of birds in the flock; you might want to consider something like a velocity field of which the individual birds' velocities are samples; etc. None of these things feel very categorical to me (though of course e.g. velocities live in a vector space and there is a category of vector spaces).

Maybe flocking was a bad choice of example. Let's try another: let the birds be hens on a farm, kept for breeding and/or egg-laying. We might want to understand how much space to give them, what to feed them, when to collect their eggs, whether and when to kill them, and so on. Maybe we're interested in optimizing taste or profit or chicken-happiness or some combination of those. So, according to your original comment, the birds are again objects in a category, and now when they "lay eggs, etc., etc." these are morphisms. What morphisms? When a bird lays an egg, what are the two objects the morphism goes between? When are we going to compose these morphisms and what good will it do us?

How does it actually help anything to consider birds as objects of a category?

Here's the best I can do. We take the birds, and their eggs, and whatever else, as objects in a category, and we somehow cook up some morphisms relating them. The category will be bizarre and jury-rigged because none of the things we care about are really very categorical, but its structure will somehow correspond to some of the things about the birds that we care about. And then we make whatever sort of mathematical or computational model of the birds we would have made without category theory. So now instead of birds and eggs we have tuples (position, velocity, number of eggs sat on) or objects of C++ classes or something. Now since we've designed our mathematical model to match up, kinda, to what the birds actually do, maybe we can find a morphism between these two jury-rigged categories corresponding to "making a mathematical model of". And then maybe there's some category-theoretic thing we can do with this model and other mathematical models of birds, or something. But I gravely doubt that any of this will actually deliver any insight that we didn't ourselves put into it. I'd be intrigued to be proved wrong.

comment by Gurkenglas · 2019-10-10T11:41:08.526Z · LW(p) · GW(p)

That a construction is free doesn't mean that you lose nothing. It means that if you're going to do some construction anyway, you might as well use the free one, because the free one can get to any other. (Attainable utility anyone?)

Showing that your construction is free means that all you need to show is that constructing some category from our quiver is worthwhile at all. Adjunctions are a fine reason, though I wish we could introduce adjunctions first and then show that we need categories to get them.