Fundamentally Fuzzy Concepts Can't Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

post by VojtaKovarik · 2023-07-21T21:03:21.501Z · LW · GW · 18 comments

Contents

  Summary of the Argument
  Full Version
    Conjecture: Cooperation is Just a Word (ie, not ontologically fundamental)
    Implications
    What to Avoid
    Acknowledgments
    Disclaimers and Qualifications

Epistemic status: I describe an argument that seems plausible, but could also be entirely off. IE, low confidence.

Summary of the Argument

Certain concepts (like "cooperation" or "human values") might be fundamentally fuzzy. This would have two implications: (1) We should not expect to find crisp mathematical definitions of these concepts. (2) If a crisp mathematical definition seems appropriate in one setting, we should not be surprised when it stops being appropriate in other settings.

Full Version

One of the questions discussed in connection with Cooperative AI is "what is cooperation (and collusion)". This is important because if we had clear mathematical definitions of these concepts, we could do things like: incorporate them into loss functions when training AI, use them to design environments and competitions for evaluating AI, and write regulations that promote cooperation or forbid collusion.

However, cooperation (and many other concepts) might be fundamentally fuzzy, such that we should not expect any such crisp definition to exist. Why might this be the case? First, some evidence for this claim is the observation that we haven’t yet managed to agree on such a definition.[1]

Conjecture: Cooperation is Just a Word (ie, not ontologically fundamental)

As a second argument, consider the following story – totally made-up, but hopefully plausible – of how we came to use the word "cooperation": Over the course of history, people interacted with each other in many different situations. Over time, they developed various practices, norms, and tricks to make those interactions go better. Eventually, somebody pointed to some such practice -- in a specific situation -- and called it "cooperation". Then the word started also being applied to several analogous situations and practices. And ultimately, we now use the word "cooperation" for anything that belongs to a certain cluster (or clusters) of points in the "Behaviour x Situation space". (And all of this changes over time, as our language and world-views evolve.)[2]

Because the process that determined our usage of the word "cooperation" is quite messy and somewhat random, the boundaries of the resulting concept end up being quite complicated or "fuzzy".[3] This also means that any mathematical formula that aims to capture this concept needs to be at least complicated enough to explain all the "ad-hoc" parts of the generating process. As a result, any simple definition of "cooperation" is likely to be inappropriate in at least some settings.

Implications

I expect that there is much more to be said about the formation of concepts, their fuzziness, and its implications. For now, I will make two comments.

First, the fuzziness of concepts seems to be a matter of degree: at one end, we have some extremely crisp concepts such as the prime numbers, the law of gravity, or Nash equilibrium. Towards the other end, we have things like 'all animals that live on Earth' (if you had to describe them without being able to point to them), natural languages[4], or human values [LW · GW].[5] I expect concepts such as "cooperation", "intelligence", or "fairness" to be somewhere in the middle.

Second, I think that it is reasonable to try to be more precise about what we mean by cooperation (and many other concepts). More precisely, I think we can come up with concrete definitions -- even mathematical definitions -- in some specific settings. We can then try to identify a broader class of settings where the definition (a) still "makes formal sense" and (b) still captures the concept we wanted. But we should not be surprised when we encounter settings where the definition is unsuitable --- in such settings, we just come up with a different definition, and proceed as before. And by proceeding like this, we gradually cover more and more of the concept's "area" with crisp definitions.

What to Avoid

Finally, I want to explicitly warn against two things:

(1) The fallacy that "since definition D is perfect in setting S, it is suitable everywhere". That is, against (a) extending every definition to all settings where it makes formal sense without (b) checking that it captures the intended concept. I think this causes many pointless disagreements.

(2) Even more extremely, replacing the original concept by its formal definition in places where this is inappropriate, possibly even forgetting that the original concept ever existed. (An example of this is that many game theorists get so used to the concept of Nash equilibrium that they start to honestly believe that defecting against one's identical copy in a prisoner's dilemma would be the smart thing to do.)
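
To make that example concrete, here is a minimal sketch in Python, with made-up payoff numbers: in a one-shot prisoner's dilemma, mutual defection is the only Nash equilibrium, yet an agent whose opponent is an identical copy (so that both are guaranteed to pick the same action) gets a higher payoff from cooperating.

```python
# A minimal sketch (made-up payoff numbers) of the prisoner's dilemma example:
# mutual defection is the unique pure Nash equilibrium, yet when playing against
# an identical copy (both necessarily choosing the same action), cooperating
# gives each copy a higher payoff.

from itertools import product

ACTIONS = ["C", "D"]  # Cooperate, Defect

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def is_nash(profile):
    """A profile is a pure Nash equilibrium if no player gains by deviating unilaterally."""
    a1, a2 = profile
    u1, u2 = payoffs[profile]
    best_dev_1 = max(payoffs[(d, a2)][0] for d in ACTIONS)
    best_dev_2 = max(payoffs[(a1, d)][1] for d in ACTIONS)
    return u1 >= best_dev_1 and u2 >= best_dev_2

nash_profiles = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
print("Pure Nash equilibria:", nash_profiles)  # [('D', 'D')]

# Against an identical copy, both players end up playing the same action,
# so the only reachable profiles are (C, C) and (D, D).
twin_payoffs = {a: payoffs[(a, a)][0] for a in ACTIONS}
print("Payoff when both copies play the same action:", twin_payoffs)  # {'C': 3, 'D': 1}
```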

Acknowledgments

This draft was inspired by conversations with the participants at the Cooperative AI Foundation summer retreat (July 2023). So at best, I only deserve a portion of the credit. It is even quite possible that I heard the full idea from somebody else, forgot it, and then “discovered” it later.

Disclaimers and Qualifications

My reason for writing this is not that I think that CAIF, or other specific people, are confused about this. Rather, I think that many people already know this, and the benefit is in making the ideas common knowledge.
Also, I don’t think I will ever turn this into a polished and engaging piece of writing. So if you find the ideas useful, feel free to rewrite or appropriate them without consulting me.

Finally, I bet that everything I write here has already been described somewhere 80 years ago, except better and in much more detail.[6] So I don’t mean to imply that this is new — only that this is something that I was confused about before, and that I wish I knew about earlier.

  1. ^

    Some weak datapoints in this direction:
    (1) As far as I know, the Cooperative AI Foundation considers finding good “criteria of cooperative behaviour” to be a priority, but they haven’t yet settled on a solution (despite presumably doing a serious literature review, etc).
    (2) Intuitive definitions tend to fail in edge cases. For example, the definition “cooperation is about maximising joint welfare” would include the scenario where two optimal agents act without ever interacting with each other. Adding the requirement of interaction, we would still get actions such as a rich person being forced to redistribute their wealth among the poor, which we could consider altruistic, but probably not cooperative. And we also have cases such as mafia members “cooperating” with each other at the cost of the remainder of society.
    (3) There might be examples of behaviour that is considered "cooperative and expected" in some cultures, but not in others. (Queuing in the UK? The amount of help one is expected to offer to distant family members?) Though note that I haven't done my lit-review due diligence here, so I am not sure whether there actually are such differences, whether they replicate, etc.

  2. ^

    In addition to the origin of concepts being messy in the way that I describe, things can also get complicated because words are ambiguous.

    To illustrate what I mean, imagine the following (totally made-up) example: Suppose that the concept of fairness originally appeared in the context of fair-play in football. Somebody then started talking about fairness in the context of family relationships. And finally person A extended football-fairness to the context of business. But at the same time, person B extended family-fairness to the same business context. We now have two different concepts of fairness in the same context, both of which are simply called "fairness".

    In the "fairness" example above, perhaps the resulting concepts end up being very close to each other, or even identical. But in other cases, this process might result in concepts that are different in important ways while still being close enough to cause a lot of confusion. In such cases, the first step should be to explicitly disambiguate the different uses.

  3. ^

    EDIT: Note that I am trying to make a non-trivial claim here. For many concepts, I would argue that the concept comes first and the word-for-that-concept comes second --- this seems true for, eg, lightning or the-headphones-I-am-wearing-now. I am arguing that, to a large extent, cooperation has it the other way around. That is, the concept of [the various things we mean when saying cooperation] is mostly a function of how we use the word, and doesn't neatly correspond to some thing in the world (or a low-complexity concept in concept space).

  4. ^

    I vaguely remember a quote along the lines of: “We were working on machine translation. And over the years, every time we fired a formal linguist and replaced them with an ML engineer, the accuracy went up”. From the point of view described in this post, this seems unsurprising: if human languages are such messy and organic things, neural networks will have a much easier time with them than rigid formal rules will.

  5. ^

    The complexity of each concept seems related to the process that gave birth to the concept. For example, the complexity of our-notion-of-cooperation seems related to the number of paths that the evolution of the word "cooperation" could have taken. The complexity of human values --- the actual thing we care about in alignment, not the various things that people mean when they use the words "human values" --- seems proportional to the number of different paths that the natural evolution of humans could have taken.

  6. ^

    My guess: some areas of philosophy, and then somebody doing complexity science at Santa Fe Institute, and then a bunch of other people.

18 comments

Comments sorted by top scores.

comment by Caspar Oesterheld (Caspar42) · 2023-07-22T08:15:50.072Z · LW(p) · GW(p)

I think I sort of agree, but...

It's often difficult to prove a negative and I think the non-existence of a crisp definition of any given concept is no exception to this rule. Sometimes someone wants to come up with a crisp definition of a concept for which I suspect no such definition to exist. I usually find that I have little to say and can only wait for them to try to actually provide such a definition. And sometimes I'm surprised by what people can come up with. (Maybe this is the same point that Roman Leventov is making.)

Also, I think there are many different ways in which concepts can be crisp or non-crisp. I think cooperation can be made crisp in some ways and not in others.

For example, I do think that (in contrast to human values) there are approximate characterizations of cooperation that are useful, precise and short. For example: "Cooperation means playing Pareto-better equilibria."
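
As a minimal sketch of that characterization (with made-up payoff numbers): a stag hunt has two pure Nash equilibria, and the characterization picks out the Pareto-better one as the cooperative outcome.

```python
# A minimal sketch (made-up payoffs) of "cooperation = playing Pareto-better
# equilibria": a stag hunt has two pure Nash equilibria, one of which
# Pareto-dominates the other.

from itertools import product

ACTIONS = ["Stag", "Hare"]

# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("Stag", "Stag"): (4, 4),
    ("Stag", "Hare"): (0, 3),
    ("Hare", "Stag"): (3, 0),
    ("Hare", "Hare"): (3, 3),
}

def is_nash(profile):
    """A profile is a pure Nash equilibrium if no player gains by deviating unilaterally."""
    a1, a2 = profile
    u1, u2 = payoffs[profile]
    return (u1 >= max(payoffs[(d, a2)][0] for d in ACTIONS)
            and u2 >= max(payoffs[(a1, d)][1] for d in ACTIONS))

def pareto_better(p, q):
    """p is Pareto-better than q if nobody is worse off and somebody is strictly better off."""
    up, uq = payoffs[p], payoffs[q]
    return all(a >= b for a, b in zip(up, uq)) and any(a > b for a, b in zip(up, uq))

equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
print("Pure Nash equilibria:", equilibria)  # [('Stag', 'Stag'), ('Hare', 'Hare')]

# The "cooperative" outcome under this characterization: the equilibrium that is
# Pareto-better than every other equilibrium.
print("Pareto-better equilibrium:",
      [p for p in equilibria if all(p == q or pareto_better(p, q) for q in equilibria)])
# [('Stag', 'Stag')]
```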

One way in which I think cooperation isn't crisp, is that you can give multiple different sensible definitions that don't fully agree with each other. (For example, some definitions (like the above) will include coordination in fully cooperative (i.e., common-payoff) games, and others won't.) I think in that way it's similar to comparing sets by size, where you can give lots of useful, insightful, precise definitions that disagree with each other. For example, bijection, isomorphism, and the subset relationship can each tell us when one set is larger than or as large as another, but they sometimes disagree and nobody expects that one can resolve the disagreement between the concepts or arrive at "one true definition" of whether one set is larger than another.

When applied to the real world rather than rational agent models, I would think we also inherit fuzziness from the application of the rational agent model to the real world. (Can we call the beneficial interaction between two cells cooperation? Etc.)

Replies from: VojtaKovarik
comment by VojtaKovarik · 2023-07-22T22:00:54.405Z · LW(p) · GW(p)

Yes, I fully agree with all of this except one point, and with that one point I only want to add a small qualification.

Sometimes someone wants to come up with a crisp definition of a concept for which I suspect no such definition to exist. I usually find that I have little to say and can only wait for them to try to actually provide such a definition. And sometimes I'm surprised by what people can come up with.

The quibble I want to make here is that if we somehow knew that the Kolmogorov complexity of the given concept was at least X (and if that was even a sensible thing to say), and somebody was trying to come up with a definition with K-complexity <<X, then we could safely say that this has no chance of working. But then in reality, we do not know anything like this, so the best we can do (as I try to do with this post) is to say "this concept seems kinda complicated, so perhaps we shouldn't be too surprised if crisp definitions end up not working".

Replies from: Caspar42
comment by Caspar Oesterheld (Caspar42) · 2023-07-23T21:25:20.422Z · LW(p) · GW(p)

I mean, translated to algorithmic description land, my claim was: It's often difficult to prove a negative and I think the non-existence of a short algorithm to compute a given object is no exception to this rule. Sometimes someone wants to come up with a simple algorithm for a concept for which I suspect no such algorithm to exist. I usually find that I have little to say and can only wait for them to try to actually provide such an algorithm.

So, I think my comment already contained your proposed caveat. ("The concept has K complexity at least X" is equivalent to "There's no algorithm of length <X that computes the concept.")

Of course, I do not doubt that it's in principle possible to know (with high confidence) that something has high description length. If I flip a coin n times and record the results, then I can be pretty sure that the resulting binary string will take at least ~n bits to describe. If I see the graph of a function and it has 10 local minima/maxima, then I can conclude that I can't express it as a polynomial of degree <10. And so on. 

comment by Richard_Kennaway · 2023-07-22T10:50:10.179Z · LW(p) · GW(p)

I think this is reversing the role of words and concepts. You should not be seeking a crisp definition of a fuzzy concept, you should be seeking a crisp concept or concepts in the neighbourhood of your fuzzy one, that can better do the work of the fuzzy one.

The history of mathematics abounds in examples. Imagine yourself some centuries back and ask "what is a function?" (before the word was even introduced). Or "what is a polyhedron?". Difficulties surfaced in reasoning about these imprecisely defined things, which led to the discovery of more precise concepts that put the knowledge previously built on the fuzzy ones on a clearer and stronger foundation. Mathematicians did not discover what their fuzzy concepts of functions and polyhedra "really were", or what the words "really meant", or discover "the right definitions" for the words. They discovered new concepts that served better.

Replies from: Roman Leventov, VojtaKovarik
comment by Roman Leventov · 2023-07-22T13:01:54.538Z · LW(p) · GW(p)

In the space of psychology (cognition) and systems built of cognitive agents (such as society), i.e., complex systems, crisp concepts "in the neighbourhood of our fuzzy ones" will tend to simplify reality too much, perhaps sometimes in detrimental or even catastrophic ways (cf. "Value is fragile [LW · GW]"), rather than amplify your prediction and reasoning power thanks to formalisation.

I've discussed these problems here [LW · GW] and the tradeoff in capabilities between formalisation and "intuitive"/"fuzzy"/connectionistic reasoning here [? · GW]. See also this recent Jim Rutt show with David Krakauer where they discuss related themes (philosophy of science and understanding in the AI age).

comment by VojtaKovarik · 2023-07-22T21:19:23.917Z · LW(p) · GW(p)

I think I agree with essentially everything you are saying here? Except that I was trying to emphasize something different from what you are emphasizing.

More specifically: I was trying to emphasize the point that [the concept that the word "cooperation" currently points to] is very fuzzy. Because it seemed to me that this was insufficiently clear (or at least not common knowledge). And appreciating this seemed necessary for people to agree that (1) our mission should be to find crisp concepts in the vicinity of the fuzzy one (2) but that we shouldn't be surprised when those concepts fail to fully capture everything we wanted. (And also (3) avoiding unnecessary arguments about which definition is better, at least to the extent that those only stem from (1) + (2).)

Replies from: VojtaKovarik
comment by VojtaKovarik · 2023-07-22T21:30:34.937Z · LW(p) · GW(p)

And to highlight a particular point: I endorse your claim about crisp concepts, but I think it should be amended as follows:

You should not be seeking a crisp definition of a fuzzy concept, you should be seeking a crisp concept or concepts in the neighbourhood of your fuzzy one, that can better do the work of the fuzzy one. However, you should keep in mind that the given collection of crisp concepts might fail to capture some important nuances of the fuzzy concept.

(And it is fine that this difference is there --- as long as we don't forget about it.)

comment by Roman Leventov · 2023-07-21T21:12:38.170Z · LW(p) · GW(p)

Note that some recent developments in category theory aim to recapitulate fuzziness (of concepts, or whatever) and talk formally (mathematically) about it, including in the context of intelligence and ontology.

comment by Jiro · 2023-07-24T14:34:15.244Z · LW(p) · GW(p)

Some concepts have a small degree of fuzziness, but also get treated as fuzzy in situations that aren't very far into the fuzzy range. How do you handle this? Or to put it in another way, if someone said "cooperation is a fuzzy concept, so you have no way to deny that I am cooperating", is it possible to deny that they are cooperating anyway?

(Or to put it yet another way, trans issues.)

Replies from: VojtaKovarik
comment by VojtaKovarik · 2023-07-24T15:25:25.555Z · LW(p) · GW(p)

Do you have a more realistic (and perhaps more specific, and ideally apolitical) example than "cooperation is a fuzzy concept, so you have no way to deny that I am cooperating"? (All instances of this that I managed to imagine were either actually complicated, about something else, or something that I could resolve by replying "I don't care about your language games" and treating you as non-cooperative.)

Replies from: Jiro
comment by Jiro · 2023-07-24T15:58:42.243Z · LW(p) · GW(p)

The actual example I was thinking of is the trans issues (about which we've had a number of posts by someone else). In a sense, "woman" is a fuzzy concept, but the demand "if someone claims to be a woman, you must consider them to be one" isn't limited to the fuzzy areas, but is often justified by reference to the fuzzy areas.

Replies from: VojtaKovarik
comment by VojtaKovarik · 2023-07-24T16:33:39.352Z · LW(p) · GW(p)

Ok, got it. Though, not sure if I have a good answer. With trans issues, I don't know how to decouple the "concepts and terminology" part of the problem from the "political" issues. So perhaps the solution with AI terminology is to establish the precise terminology? And perhaps to establish it before this becomes an issue where some actors benefit from ambiguity (and will therefore resist disambiguation)? [I don't know, low confidence on all of this.]

comment by KNakamura · 2023-07-23T00:04:45.909Z · LW(p) · GW(p)

There is a decent bit in Dugatkin & Reeve 1998 on this (emphasis mine):

I will define cooperation as follows: Cooperation is an outcome that—despite potential costs to individuals—is "good" (measured by some appropriate fitness measure) for the members of a group of two or more individuals and whose achievement requires some sort of collective action. But to cooperate can mean either to achieve that cooperation (something manifest at the group level) or to behave cooperatively—that is, to behave in a manner making cooperation possible (something the individual does), despite the fact that the cooperation will not actually be realized unless other group members have also behaved cooperatively. Here, to cooperate will always mean to behave cooperatively (as in Mesterton-Gibbons & Dugatkin 1992; Dugatkin et al. 1992; also see Stephens and Clements, this volume).

If someone's definition of cooperation includes a proxy for the definition of the word "good" then let's start at the definition of something being "good" in a concrete sense and move forward from there.

To that end I agree the term isn't ontologically basal. Instead I'd work at qualifying the state of the interaction and using cooperation as an inverse to an alternative. Is commensalism a type of cooperation?

These are all in relative context though, so you could layer on complexity until you're trying to dig at the genetic basis of ecological phenotypes - something you can arguably prove - but the model is only as useful as what it predicts.

Therefore, I don't think implication (1) or (2) follow from the premise, even if it is correct.

Replies from: VojtaKovarik
comment by VojtaKovarik · 2023-07-23T10:51:38.067Z · LW(p) · GW(p)

Therefore, I don't think implication (1) or (2) follow from the premise, even if it is correct.

To clarify: what do you mean by the premise and implications (1) and (2) here? (I am guessing that premise = text under the heading "Conjecture: ..." and implications (1) or (2) = text under the heading "Implications".)

Replies from: KNakamura, KNakamura
comment by KNakamura · 2023-07-23T15:59:46.571Z · LW(p) · GW(p)

😆 and just for fun, in relation to your footnote 6, I don't know much about Dugatkin's associations but to the best of my knowledge Reeve is related to the Santa Fe Institute through his collaboration with Bert Hölldobler who is part of the SIRG at ASU

comment by KNakamura · 2023-07-23T14:12:19.387Z · LW(p) · GW(p)

Correct, I am suggesting that fuzzy concepts can and should be strictly defined mathematically, and within the limits of that mathematical definition it should hold true to be generally useful within the scope of what it was constructed for.

To use a loose mathematical analogy, we can use the definition of a limit to arrive at precise constraints, and generate iff theorems like L'Hôpital's rule to make it easier to digest. Cooperation in this case would be an iff theorem, with more basal concepts being the fallback. But for the model to be useful, the hypothesis of the theorem needs to absolutely suggest the conclusion.

Edit: What I asserted in my last sentence isn't strictly true. You could find utility in a faulty model if it is for hypothesis generation and it is very good at it.

comment by Noosphere89 (sharmake-farah) · 2023-07-22T16:06:59.261Z · LW(p) · GW(p)

I think a more likely scenario turns out to be this:

Either there are precise mathematical concepts for what someone is talking about, or it doesn't actually correspond to anything; but the general case is exponential or worse in, say, propositional logic, and the more expressive logics needed to get at more concepts have worse asymptotics very fast. You also need the constants and polynomial functions to play nicely in order for it to be tractable for a finite being, so there's that.

In essence, what I'm claiming is that logic/mathematics is way harder in the general case than people think, and that this explains why we can't use crisp concepts in our everyday lives, or why we can't generalize math/logic everywhere, as it will make everything very difficult to do by default.

Also, let me address this, while you're at it:

(An example of this is that many game theorists get so used to the concept of Nash equilibrium that they start to honestly believe that defecting against one's identical copy in a prisoner's dilemma would be the smart thing to do.)

What are we assuming here? Because your assumptions will dictate how you react.

Logic/mathematics are still more useful than people think, but they are also complicated even when they are tractable.