Interoperable High Level Structures: Early Thoughts on Adjectives

post by johnswentworth, David Lorell · 2024-08-22T21:12:38.223Z · LW · GW · 1 comments

Contents

  The Problem
  Two Previous Models: Naturality Over Objects vs Features
  Referents of Adjectives?
  One Implication: Adjectives Tend To Be Less Natural (Convergent) Than Nouns?
  How This Fits Into The Broader Gameplan

Meta: This post is a relatively rough dump of some recent research thoughts; it’s not one of our more polished posts, in terms of either clarity or rigor. You’ve been warned.

The Interoperable Semantics [LW · GW] post and the Solomonoff Inductor Walks Into A Bar [LW · GW] post each tackled the question of how different agents in the same world can coordinate on an ontology, so that language can work at all given only a handful of example usages of each word (similar to e.g. children learning new words). Both use natural latents [LW · GW] as a central mathematical tool - one in a Bayesian probabilistic framework, the other in a minimum description length framework. Both focus mainly on nouns, i.e. interoperable-across-minds clusters of “objects” in the environment.

… and the two propose totally different models. In one, the interoperability of cluster labels (i.e. nouns) follows from natural latent conditions over different features of each object. In the other, interoperability follows from natural latent conditions across objects, with no mention of features. The two models are not, in general, equivalent; they can't both be correct and complete.

In this post, we’ll propose that while the natural latent conditions over objects still seem to intuitively capture the rough notion of nouns, the natural latent conditions over features seem much better suited to adjectives. We’ll briefly lay out two different potential ways to use natural latents over features as semantic values for adjectives. Then we’ll talk a bit about implications, open threads and how this fits into a broader research gameplan.

The Problem

When children learn language, the cognitive process seems to go roughly:

1. First, the child learns the categories/concepts/ontology of the world around them, mostly without any words attached.
2. Later, upon hearing a new word used in just a handful of examples, the child attaches that word to one of the already-learned categories/concepts.

The crucial point here is that the categories/concepts/ontology are mostly learned before a word is attached; children do not brute-force learn categories/concepts/ontology from “labeled data”. We can tell this is true mainly because it typically takes so few examples to learn the meaning of a new word.

The big puzzle, then, is that different humans mostly learn approximately the same categories/concepts/ontology - i.e. the same “candidates” to which words might point - as required for language to work at all with so few examples. How does that work? Mathematically, what are those “interoperable” categories/concepts/ontology, which different humans mostly convergently learn? How can we characterize them?

Or, somewhat earlier on the tech tree: can we find even a single model capable of accounting for the phenomenon of different minds in the same environment robustly converging on approximately the same categories/concepts/ontology? Forget whether we can find a model which correctly captures the ontology converged upon by humans - can we even find any model capable of accounting for any sort of robust ontological convergence? Can we find such a model for which the convergent ontology even vaguely resembles the sorts of things in human language (nouns, verbs, adjectives, etc)? What would such a model even look like?

That’s roughly the stage we’re at in this post.

Two Previous Models: Naturality Over Objects vs Features

Our main tool is (deterministic) natural latents [LW · GW]. The usage looks like: pick out some collection of lower-level random variables, look for a latent variable which (approximately) satisfies the natural latent conditions over them - roughly, the lower-level variables are independent given the latent (mediation), and the latent is redundantly represented across the lower-level variables (redundancy) - and then attach words to whatever latents (or values of those latents) turn up.
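As a rough sketch of those conditions (our paraphrase, matching the examples below; the precise quantitative statements are in the linked natural latents posts): for lower-level random variables X_1, …, X_n and a candidate latent Λ,

```latex
% Mediation: the lower-level variables are (approximately) independent given the latent
\[ P[X_1, \dots, X_n \mid \Lambda] \;\approx\; \prod_i P[X_i \mid \Lambda] \]

% Redundancy (the strong form at work in the examples below): the latent can be
% (approximately) recovered from any one of the lower-level variables
\[ H(\Lambda \mid X_i) \;\approx\; 0 \quad \text{for each } i \]
```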

The big degrees of freedom here are which lower-level variables (or sets of variables) to look for a natural latent over, and what to assign words to once the natural latent(s) are found (noting that e.g. the fact that a particular apple is red is distinct from the general concept of redness).

One option: suppose the minds already have some convergent way to pick out “objects” (like e.g. the criteria here [LW · GW][2]). We represent each object with one random variable, consisting of the entire low-level geometry of the object - presumably mostly unknown to the mind, which is why it’s a random variable. The mind can then cluster together objects whose geometries share a nontrivial natural latent. For instance: presumably there are some general properties of mai tais, such that most mai tais are approximately informationally independent if one knows all those properties (which, to be clear, nobody does), and those properties can in-principle be well estimated by intensive study of just one or a few mai tais. Think properties like e.g. the chemical composition, to within the amount mai tais typically vary. Those properties would constitute a(n approximate) natural latent across mai tais. And since existence of a natural latent is nontrivial (they don’t always exist, to within any specified precision), a mind could perhaps discover the category of mai tais - along with many other categories - by looking for sets of objects over which a natural latent exists to within a good approximation[3].

That’s the core idea of the “natural latent across objects” approach from Solomonoff Inductor Walks Into A Bar [LW · GW].
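As a toy illustration of that search idea (a sketch of our own, under strong simplifying assumptions: Gaussian “geometries”, the candidate set's mean standing in for the natural latent, and ad-hoc redundancy/mediation scores - not the actual construction from the linked post):

```python
# Toy sketch: objects drawn from a few hidden categories; we score candidate sets of
# objects for how well a shared latent is (approximately) natural over them.
# Assumptions (ours, for illustration only): Gaussian geometries, the set's mean as a
# stand-in for the latent, and ad-hoc redundancy/mediation scores.
import numpy as np

rng = np.random.default_rng(0)
n_features = 20

# Each category has a "prototype" geometry; each object is its prototype plus noise.
prototypes = {"mai_tai": rng.normal(size=n_features),
              "martini": rng.normal(size=n_features)}
objects, labels = [], []
for name, proto in prototypes.items():
    for _ in range(10):
        objects.append(proto + 0.1 * rng.normal(size=n_features))
        labels.append(name)
objects = np.array(objects)

def naturality_score(idx):
    """Lower is better.
    Redundancy: the latent should be recoverable from any single object, so each
      object should be close to the candidate latent (here: the set's mean).
    Mediation: objects should be roughly independent given the latent, so the
      residuals (object minus latent) should be roughly uncorrelated across objects.
    """
    X = objects[idx]
    latent = X.mean(axis=0)           # crude stand-in for the shared natural latent
    residuals = X - latent
    redundancy_err = np.mean(np.linalg.norm(residuals, axis=1))
    corr = np.corrcoef(residuals)     # correlations between objects' residuals
    mediation_err = np.mean(np.abs(corr[np.triu_indices(len(idx), k=1)]))
    return redundancy_err + mediation_err

same_category = [i for i, lab in enumerate(labels) if lab == "mai_tai"][:5]
mixed_set = [0, 1, 10, 11, 12]
print(naturality_score(same_category))  # low: an approximate natural latent exists
print(naturality_score(mixed_set))      # high: no good shared latent
```

The point of the toy is just that sets of objects drawn from one “category” score as having a much better approximate natural latent than mixed sets, so a search over candidate sets by such a score could in principle surface the categories.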

A different option: suppose the minds already have some convergent way to pick out “features” of an object. For instance, maybe spacetime is a convergent concept, and therefore there is a convergent notion of what it means to view the states of “small” - i.e. spatially-localized - chunks of an object’s geometry as “features”. Then the minds might look for natural latents across some of the features. For instance, by looking at any one of many different small parts of a car’s body, one can tell what color the car is. Or, by close study of any one of many different small parts of a tree, one can reconstruct the tree’s genome (i.e. via sequencing). So, a mind might conclude that e.g. the color of a car or the genome of a tree are natural concepts. (Though that still doesn’t tell us how to assign words - the fact that a car is red might be a natural latent over some features of the car, but that natural latent is not itself the general concept of red; it’s an instance of redness.)

That’s the core idea of the “natural latent across features” approach from Interoperable Semantics [LW · GW].
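A sketch in the same notation as above (our specialization, with hypothetical variable names): write F_1, …, F_m for the states of small spatially-localized patches of one particular car's body, and Λ for that car's color. The claim that the color is a natural latent over those features is then roughly:

```latex
% Redundancy: the car's color can be (approximately) read off from any one small patch
\[ H(\Lambda \mid F_j) \;\approx\; 0 \qquad \text{for each patch } j \]

% Mediation: given the color (and whatever other latents end up being conditioned on -
% see the discussion of conditioning below), the patches carry (approximately) no
% further information about one another
\[ P[F_1, \dots, F_m \mid \Lambda] \;\approx\; \prod_j P[F_j \mid \Lambda] \]
```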

Notice that the “natural latent across objects” example - discovering the category of mai tais - sounds like a way of discovering concepts which are assigned noun-words, like e.g. the referent of “mai tai”. On the other hand, the “natural latent across features” examples sound like a way of discovering properties of an object which are natural concepts - the sort of properties which are typically described by adjectives, e.g. color. (Though note that at this point we haven’t suggested how specifically to use adjective-words to describe those properties.)

Referents of Adjectives?

Let’s take natural latents across features as our operationalization of “properties” of an object. Adjectives typically describe an object’s properties. So, can we find some natural candidate adjective-referents to “describe” natural latents over features of many different objects?

The core difficulty in going from “properties of an object = natural latents over features of the object” to a model of adjectives is that each adjective applies to many different objects. There are many different red things, many different smooth things, many different metallic things, etc. But a natural latent over features of one object is of a different type than a natural latent over features of another object; without some additional structure, there’s no “interoperability across objects”, no built-in sense in which the redness of an apple is “the same kind of property as” the redness of a car, or even the redness of a different apple.

We currently have two plausible approaches.

First approach: assume that the different objects already have features which are interoperable across objects, so that it’s meaningful to mix the random variables for each object. We construct a new variable whose features take on all the feature-values of object 1 with probability α_1, the feature-values of object 2 with probability α_2, etc. We can then show that any natural latent over the features of the “mixed” object which robustly remains natural as the mixture coefficients change is also natural over the features of each individual object. [LW · GW] If that latent is e.g. color, then it allows us to talk about “the color of object 1”, “the color of object 2”, etc, as natural latents over their respective objects which all “look at the same thing” (i.e. color) in a meaningful sense. In particular, since those natural latents over different objects all take on values from the same set, it makes sense to talk about e.g. many different objects all being “red”; “red” refers to one of the values which that latent variable can take on.
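A sketch of that construction (our notation): sample an index K with P[K = k] = α_k, then copy all of object K's feature-values jointly into the mixed object:

```latex
% The mixed object: take on object k's feature-values, all copied jointly, with
% probability \alpha_k
\[ X_{\mathrm{mix}} := X_K, \qquad P[K = k] = \alpha_k \]

% Claim (the linked result): if \Lambda is natural over the features of X_mix robustly
% as the mixture weights \alpha vary, then \Lambda is also natural over the features
% of each individual object X_k
```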

Second approach: instead look for sets of features across objects which share a natural latent. The hope is that e.g. all the different features of different objects from which redness can be backed out, all share a single natural latent encoding redness. Then, when we say a bunch of different objects are all “red”, that means that the redness latent is natural over a bunch of different features of each of those different objects.
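A sketch of the hoped-for structure (our notation, hypothetical symbols): write F^(i)_j for the j-th redness-relevant feature of object i, and Λ_red for the hoped-for shared redness latent. The hope is that the natural latent conditions hold over the whole pooled collection of such features across objects:

```latex
% Redundancy: redness can be (approximately) backed out from any one such feature,
% of any of the objects
\[ H(\Lambda_{\mathrm{red}} \mid F^{(i)}_j) \;\approx\; 0 \qquad \text{for each } i, j \]

% Mediation: given the redness latent (possibly conditional on other latents, per the
% conditioning questions below), those features carry (approximately) no further
% information about one another
\[ P[\{F^{(i)}_j\}_{i,j} \mid \Lambda_{\mathrm{red}}] \;\approx\; \prod_{i,j} P[F^{(i)}_j \mid \Lambda_{\mathrm{red}}] \]
```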

For both approaches, we’re still not sure exactly what things should be conditioned on, in what order. For instance: if we’re doing some clustering to identify category types for nouns, should natural latents for adjectives be natural conditional on the noun latents? Insofar as redness is natural over little parts of the skin of an apple, is it natural conditional on the fact that the apple is, y’know, an apple? Alternatively (and not necessarily mutually exclusively) do we choose which objects to consider apples by looking for a cluster of objects which have a natural latent (representing general facts about apples) conditional on properties like redness of each individual apple? In general, which latents are supposed to be natural given which others? We’re pretty uncertain about that right now.

One Implication: Adjectives Tend To Be Less Natural (Convergent) Than Nouns?

When talking about “features” above, we gave the example of small spatially-localized chunks of an object’s geometry. And that is one reasonable and probably-convergent way to break (some kinds of) things up into features. But it’s an obvious guess that humans rely significantly on “features” which are more specific to our sensory modalities - properties like color, texture, sweetness/saltiness/bitterness, temperature, etc, are quite loaded on specific human senses.

“Properties are natural latents across features” still makes sense for these properties - e.g. for most objects most of the time, you will feel approximately the same temperature if you put your hand on different parts of the object, so the temperature is a natural latent over the “features” consisting of heat-sensation from different parts of the object. Another example: there are many different little chunks of our visual field which light up red when looking at a red object.

And for purposes of communication between humans, naturality across such features is fine. However, it’s less obvious whether such concepts will be natural/convergent for other kinds of minds with other sensory modalities; “red” is presumably less natural a concept for e.g. the congenitally blind, let alone aliens or AI with entirely different senses and minds.

This seems less relevant for nouns. Yes, it will sometimes happen that an agent’s sensory apparatus fails to distinguish between two natural clusters, but that seems more corner-case-ish than for adjectives. After all, if we’re identifying noun-clusters based on the existence of natural latents over objects in each cluster, then to fail-to-distinguish two natural clusters we’d have to somehow miss a whole big bundle of information.

How This Fits Into The Broader Gameplan

The point of this semantics theorizing is to get another angle on the types of interoperable data structures - things like natural latents, but also higher-level structures built of natural latents, and potentially entirely different kinds of structures as well. If e.g. ambitious AI interpretability is ever to work, it is such “interoperable” structures which will need to be found and surfaced inside of AI minds: structures which both accurately (though maybe approximately) represent the AI’s internal concepts, and match corresponding concepts in human minds.

  1. ^

    The main loophole to uniqueness of natural latents is divergence between the minds’ models [LW · GW]. Currently, our best way to close that loophole is simply to switch to the minimum description length formulation, which does not seem to suffer from any analogous problem. That said, we will use the probabilistic formulation in this post, not the minimum description length formulation, mainly because the “mixtures” used later in this post are less intuitive in the minimum description length formulation.

  2. ^

    … though we’re not going to restrict our examples in this post to rigid bodies.

  3. ^

    … perhaps conditional on some other convergent information.

1 comment


comment by tailcalled · 2024-08-23T10:25:03.238Z · LW(p) · GW(p)

A distinction between objects (nouns) and features (adjectives) is that objects contain their own potential energy barriers that maintain the states of the objects, whereas features are often either generated by the objects they are features of, or generated by exogenous factors. Any given object can carry a lot of features, but it's still just one object, so with features you have a source separation problem that you don't have for objects.

That said, once you go away from simple objects and up to more advanced objects like people or computer files or organizations, most of the interesting dynamics are generated by features which originate from exogenous factors.

If you just want to predict the flow of features, you can do best by memorizing how well different kinds of features tend to persist or spread. However, if you want to act on these flows, the features themselves are often too big and tough to act on, and instead one needs to identify the root sources.