What program structures enable efficient induction?

post by Daniel C (harper-owen) · 2024-09-05T10:12:14.058Z · LW · GW · 4 comments

Contents

  A simple model of meta/continual learning
  Quick thoughts
  Why this might be relevant for alignment
4 comments

Previously: My decomposition of the alignment problem [LW · GW]

A simple model of meta/continual learning

In the framework of Solomonoff induction, we observe an infinite stream of bits and try to predict the next bit by finding the shortest hypothesis that reproduces our observations (some caveats here [LW(p) · GW(p)]). When we receive an additional bit of observation, we can in principle rule out an infinite number of hypotheses (namely, all programs that didn't predict our observation), which creates an opportunity to speed up our induction process for future observations. Specifically, as we search for the next shortest program that predicts our next bit of observation, we can learn to skip over the programs that have already been falsified by our past observations. The process of "learning how to skip over falsified programs" costs time and computation upfront, but it can yield dividends [LW · GW] in computational efficiency for future induction.
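To make the "skip over falsified programs" idea concrete, here is a minimal toy sketch in Python. It is not Solomonoff induction (which is uncomputable): as a stand-in assumption, hypotheses are repeating bit patterns enumerated in order of length, and all names (`patterns`, `NaiveInducer`, `IncrementalInducer`, `compare`) are hypothetical. The naive learner restarts the search from the shortest hypothesis after every observation, while the incremental learner remembers where it stopped, since anything falsified by a prefix of the observations stays falsified.

```python
from itertools import count, product


def patterns():
    """Enumerate toy hypotheses (bit patterns), shortest first."""
    for n in count(1):
        for bits in product((0, 1), repeat=n):
            yield bits


def consistent(pattern, observations):
    """A pattern 'predicts' bit t as pattern[t % len(pattern)]."""
    return all(pattern[t % len(pattern)] == b for t, b in enumerate(observations))


class NaiveInducer:
    """Re-runs the whole search from the shortest hypothesis every time."""

    def __init__(self):
        self.observations = []
        self.checks = 0  # number of hypotheses tested so far

    def best(self):
        for p in patterns():
            self.checks += 1
            if consistent(p, self.observations):
                return p


class IncrementalInducer:
    """Keeps its position in the enumeration, so falsified hypotheses are never revisited."""

    def __init__(self):
        self.observations = []
        self.checks = 0
        self._candidates = patterns()
        self._current = next(self._candidates)

    def best(self):
        while True:
            self.checks += 1
            if consistent(self._current, self.observations):
                return self._current
            # Everything before _current was falsified by earlier observations.
            self._current = next(self._candidates)


def compare(stream):
    """Feed the same stream to both learners; return (final hypothesis, naive checks, incremental checks)."""
    naive, inc = NaiveInducer(), IncrementalInducer()
    result = None
    for bit in stream:
        naive.observations.append(bit)
        inc.observations.append(bit)
        result = inc.best()
        assert naive.best() == result  # both find the same shortest hypothesis
    return result, naive.checks, inc.checks


if __name__ == "__main__":
    print(compare([0, 1, 1, 0, 1, 1, 0, 1, 1]))
```

Both learners return the same hypothesis at every step, but the incremental one tests fewer hypotheses overall. This only captures the crudest form of skipping (remembering a position in a fixed search order); the post's question is about richer structures that would let the learner reuse more than that.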

This is my mental model for how agents can "learn how to learn efficiently": An agent who has received more observations can usually adapt to new situations more quickly because more incorrect hypotheses have already been ruled out, which means there's a narrower set of remaining hypotheses to choose from.

More generally, an important question to ask is: given that the underlying space of remaining hypotheses is constantly shrinking as we receive new observations, what sorts of data structures for representing hypotheses should we use to exploit that? How should we represent programs if we don't just want to execute them, but also potentially modify them into other plausible hypotheses? If a world model is selected based on its ability to quickly adapt to new environments, what is the type signature of that world model?

Quick thoughts

Why this might be relevant for alignment

Transformative AI systems will often need to modify their ontologies in order to accommodate new observations, which means that if we want to translate our preferences over real world objects to the AI's world model, we need to be able to stably "point" to real world objects despite ontology shifts. If efficient learning relies on specific data structures for representing hypotheses, these structures may reveal properties that remain invariant under ontology shifts. By identifying these invariant properties, we can potentially create robust ways to maintain our preferences within the AI's evolving world model.

Furthermore, insofar as humans utilize a similar data structure to represent their world models, this could provide insights into how our actual preferences remain consistent despite ontology shifts, offering a potential blueprint for replicating this process in AI.

4 comments

Comments sorted by top scores.

comment by lukehmiles (lcmgcd) · 2024-09-08T04:34:33.098Z · LW(p) · GW(p)

Any ideas?

Replies from: harper-owen
comment by Daniel C (harper-owen) · 2024-09-08T23:49:39.880Z · LW(p) · GW(p)

Yes, I plan to write a sequence about it some time in the future, but here are some rough high-level sketches:

  • Basic assumptions: Modularity implies that the program can be broken down into loosely coupled components. For now I'll just assume that each component has some "class definition" which specifies how it interacts with other components; "class definitions" can be reused (i.e. we can instantiate multiple components of the same class); each component can aggregate info from other components, & the info it stores can be used by other components
  • Expressive modularity: A problem with modularity is that it cuts off information flow between certain components, but before we learn about the world we don't know which components are actually independent, & the modularity of the environment might change over time, so we need to account for that.
    • As a basic framework, we can think of each component as having transformer-style attention values over other components; modularity means that we want the "attention values" (mutual info) to be as sparse as possible
    • Expressivity means that those "attention values" should be context dependent (they are functions of aggregate information from other components)
    • A consequence of this is that we can have variables that encode the modularity structure of the environment, which influence the attention values (mutual info) of other variables
      • One example is the Eulerian vs Lagrangian description of fluid flow: the Eulerian description has a fixed modularity structure because each region of space has a fixed Markov blanket, but the Lagrangian description has a dynamic modularity structure because "what particles are directly influenced by what other particles" depends on the positions of the particles, which change over time. We want our program to be able to accommodate both types of descriptions
    • We can get the equivalent of "function calls" by having attention values over "class definitions", so that components can instantiate computations of other components if they need to. This is somewhat similar to the idea of lazy world-modelling [LW(p) · GW(p)]
  • Components that generalize over other components: Given modularity, the main way that we can augment our program to accommodate new observations is to add more components (or tweak existing components). This means that the main way to learn efficiently is to structure our current program such that we can accommodate new observations with as few additional components as possible
    • Since our program is made out of components, this means we want our existing components to adapt to new components in a generalizable way
    • Concretely, if we think of each "component" as a causal node, then each causal node should define a mapping from any other causal node to the causal edge between them. This basically allows each causal node to "generalize" over other causal nodes so that it can use information from them in the right ways
  • Closing the loop: On top of that, we can use a part of our program to store a compressed encoding of additional components (so that components that are more likely will be higher in the search ordering). Implementing the compressed encoding itself requires additional components, which changes the distribution of additional components, & we can augment the compressed encoding to account for that (but that introduces a further change in distribution, and so on...)
  • Relevance to alignment (highly speculative): Accommodating new observations by adding new components while keeping existing structures might allow us to more easily preserve a particular ontology, so that even when the AI augments it to accommodate new observations, we can still map back to the original ontology
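
One way to picture the "expressive modularity" bullets above is the following rough Python sketch. Everything in it is an illustrative assumption rather than a worked-out proposal: `ComponentClass`, `Component`, `attention` and `aggregate` are hypothetical names, and a simple distance-based relevance score stands in for "attention values" (mutual info). Each component attends only to its top-k most relevant neighbours (sparsity), and the scores depend on the components' current states (context dependence), so the modularity structure changes as states change, as in the Lagrangian description.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ComponentClass:
    """Hypothetical reusable 'class definition': how an instance relates to other components."""
    name: str
    relevance: Callable[[float, float], float]  # (own state, other state) -> score


@dataclass
class Component:
    cls: ComponentClass
    state: float


def attention(components: List[Component], i: int, k: int = 2) -> Dict[int, float]:
    """Sparse, state-dependent attention of component i: softmax over its top-k scores."""
    me = components[i]
    scores = {j: me.cls.relevance(me.state, other.state)
              for j, other in enumerate(components) if j != i}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    if not top:
        return {}
    m = max(scores[j] for j in top)  # subtract the max for numerical stability
    z = sum(math.exp(scores[j] - m) for j in top)
    # Components outside the top-k get no entry, i.e. weight exactly zero.
    return {j: math.exp(scores[j] - m) / z for j in top}


def aggregate(components: List[Component], i: int, k: int = 2) -> float:
    """Information component i gathers from the components it attends to."""
    return sum(w * components[j].state for j, w in attention(components, i, k).items())


# Assumption for illustration: "particles" on a line that attend to nearby particles.
particle = ComponentClass("particle", relevance=lambda x, y: -abs(x - y))
world = [Component(particle, x) for x in (0.0, 1.0, 2.0, 10.0, 11.0)]

near_before = set(attention(world, 0))  # neighbours of particle 0 at position 0.0
world[0].state = 10.5                   # the particle moves...
near_after = set(attention(world, 0))   # ...and its neighbourhood changes with it
```

Here the single `particle` class definition is instantiated five times, and its `relevance` function plays the role of the mapping from another node to the edge between them: a new component added to `world` gets sensible edges without changing any existing component.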

Note: I haven't thought of the best framing of these ideas, but hopefully I'll come back with a better presentation at some point in the future

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2024-09-11T00:47:26.690Z · LW(p) · GW(p)

I think that this would make a very nice sequence, and despite all my discussion with you, I'd absolutely like to see this sequence carried out.

Replies from: harper-owen
comment by Daniel C (harper-owen) · 2024-09-11T10:55:56.065Z · LW(p) · GW(p)

Thanks! :)