What makes a theory of intelligence useful?

post by Cole Wyeth (Amyr) · 2025-02-20T19:22:29.725Z · LW · GW · 0 comments

Contents

    Robustness of the definitions
    Self-optimization
    Continuous relaxation
  Discussion 
    Action Theory
    Policy Theory
    Agent Theory
  Conclusions

This post is a sequel to "Action theory is not policy theory is not agent theory." [LW · GW] I think this post is a little better, so if you want to start here you just need to know that I consider an action theory to discuss choosing the best action, a policy theory to discuss the best decision-making policy when the environment may directly read your policy, and an agent theory to consider the best way to build an agent physically (which goes beyond policy theory by even caring about computational boundedness etc.).  

Epistemic status: Attempting to work through confusion about embedded and acausal decision theories. 

It is tempting to search for some underlying core algorithm of intelligence. In the best case, one might hope for (say) an algorithm that fits in half a page of a textbook with appropriate compression into elegant mathematical notation. Arguably, the closest thing we have is called AIXI, and more specifically its approximation-in-the-limit derived by Jan Leike and Marcus Hutter (I think this is better than AIXI-tl). Computational cognitive scientists also have a longstanding quest to understand human intelligence through a unified Bayesian theory (e.g. the free energy principle). I think it's important not to equate these objectives, though - there may be a concise theory of intelligence whether or not the human brain actually uses it (I intend to discuss this distinction at greater length in its own post). Relatedly, there may be a tradeoff curve between the elegance and generality of a theory of intelligence and its practical usefulness. Here I will focus on the usefulness of theories of intelligence, particularly distinguishing their descriptive applicability to superintelligence and their tractability as implementable advice for building AGI.  

As a simple example, the framework of Markov Decision Processes (MDPs) is, in my opinion, much less mathematically elegant and much more limited than AIXI. However, it has probably been far more useful in practice (though this situation may be changing as A.I. becomes more powerful and general). Briefly, MDPs encode independence assumptions that are satisfied exactly for a wide class of interesting tasks such as video games, and these simplifying assumptions allow lots of nice practical algorithms for MDPs to be invented. In some cases these algorithms even work okay on tasks that slightly violate the assumptions (such as robotics applications - though these typically generalize the framework a bit to allow partial observability -> POMDPs). This is the success story of reinforcement learning.
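As a concrete illustration (a minimal sketch with made-up transition and reward tables, nothing resembling a real application), the Markov assumption is exactly what lets value iteration cache a single value per state and converge with simple dynamic programming:

```python
import numpy as np

# Toy MDP with made-up transition and reward tables (purely illustrative).
# The Markov assumption is what lets us cache a single value per state.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.random((n_states, n_actions))                             # R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)        # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy with respect to the converged values
print(V, policy)
```

Without the Markov assumption, the value would have to depend on entire interaction histories and this loop would not exist in any tractable form.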

Personally I have a bit of a flinch response to Markov assumptions - I don't really expect them to be on the path to general intelligence. Though I wouldn't be shocked if the first AGI had a built-in POMDP-solving module, I would expect an AGI to use this framework only insofar as it invented the POMDP abstraction itself and found it useful for higher-level reasons. 

Instead, I tend to draw the line somewhere around AIXI: that is, I believe that Action Theory is an important practical tool for building and understanding AGI. Here I mean Action Theory in my narrow sense which includes CDT and loosely EDT (hopefully you read the prerequisite post [LW · GW]). I recognize that these theories do not solve all possible problems of embedded agency in Tegmark's Level 4 mathematical multi-verse [LW · GW]. Some of those problems may even be important, but I doubt that precise solutions exist at all, and I am pretty confident we don't need to find all of them to build the first aligned AGI. 

As a disclaimer, I may be subject to motivated reasoning in that I worry the situation is not winnable if I am wrong. Also, I may be guilty of a bit of mind-projection here; I am much more confused about FDT than CDT and AIXI, but perhaps they actually have clean theories out there somewhere which I am simply unaware of or do not understand. In fact I am certain this is distorting my perspective to some degree, but I suspect not enough to invalidate the central claims of this post.

The general question of when an elegant theory is useful in practice is a bit beyond my scope, and ties in to the map-territory distinction and pretty much the entirety of the sequences. Here I am concerned specifically with the levels of analysis for theories of intelligence: action theory, policy theory, and agent theory. I want to consider some factors that might make these theories more or less useful as practical and conceptual tools. These factors have slightly different weights for capabilities and alignment applications, but I don't think the difference is actually massive. Also, I am not focused to the point of fixation on philosophical truth here - at the outset, I acknowledge that anything we ever build will be an embedded agent. The risks of ignoring this (or circumventing it with patches) as we scale an AGI to superintelligence are also mostly out of scope for this post, except that it is important to keep them in mind as an originating motivation for policy theory and agent theory.

  Now, here are the considerations I have identified:

Interestingly, good behavior according to each of the latter two standards is somewhat of a solution to deficiencies according to the previous one (that is, powerful self-optimization can correct fragile definitions, and a good continuous relaxation can be a particular solution to self-optimization).  

Robustness of the definitions

This standard asks: do the specific arbitrary choices made by the mathematical model matter?

For example, the Turing machine model is very robust because specific choices like the number of tape symbols, whether the tape is bidirectional, etc. do not affect the resulting notion of computable functions at all. When runtime is neglected (and, uh, possibly-but-probably-not otherwise) even the determinism of the transition rules does not matter. Also, even completely different models such as the lambda calculus turned out to be equivalent to the Turing machine model in the sense of bi-simulation. This is kind of the best-case scenario for robustness (in fact, it even helpfully leaks robustness to algorithmic information theory -> Solomonoff -> AIXI, though unfortunately some has been lost along the way on the right-hand side).

Beyond the model itself being robust to arbitrary changes in definitions, its axioms should also fail gracefully. A high grade here is probably why linear algebra is useful in practice - most things we care to optimize are smooth, and therefore at least locally approximately linear. I think this is philosophically a part of robustness and deserves mention here, but it might be easier to see how it works when we discuss self-optimization below. 

Some degree of robustness is important if doing mathematics is to be useful; otherwise none of your assumptions will be satisfied and you are just pushing symbols around. 

Self-optimization

It should be clear how any under-specified parameters in the model can be learned, at least in principle, through interaction with our world. A theory of intelligence is usually supposed to work flexibly in a wide variety of circumstances (to some extent this is what intelligence is for). However, increased generality comes at the cost of hardcoded narrow abilities - at least for evolution, but I think this is also relevant for models, because if a lot of stuff needs to be hardcoded our models might not have much explanatory power. For me this principle invokes the vibe of uniform versus non-uniform models of computation - a circuit is essentially just a hardcoded computation for a fixed input size, and can probably[1] be more compressed/faster than a program which must work at every input size. For this generality tax to be worth it, there should be some mechanism to dump experience/observations/data into an agent/predictor/algorithm and rapidly tune it to its specific circumstances. Also, you should be able to see how to hardcode some stuff out of the box without breaking the generality. 

A good example of this is Bayesian probability theory. A subjectivist often has trouble specifying priors, but various agreement and merging of opinions results show that different priors tend to converge anyway, and quickly. 
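A minimal numerical sketch of that convergence (a hypothetical coin-flipping setup, not any particular agreement theorem):

```python
import numpy as np

# Two subjectivists with quite different Beta priors over a coin's bias
# (hypothetical setup); after a few hundred flips their posteriors nearly agree.
rng = np.random.default_rng(0)
theta = 0.7                                  # true (unknown) bias
flips = rng.random(500) < theta
heads, tails = flips.sum(), (~flips).sum()

priors = {"optimist": (8.0, 2.0), "pessimist": (2.0, 8.0)}  # Beta(a, b) hyperparameters
for name, (a, b) in priors.items():
    post_mean = (a + heads) / (a + b + heads + tails)
    print(f"{name}: posterior mean {post_mean:.3f}")         # both land near 0.7
```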

Continuous relaxation

Essentially all modern machine learning algorithms rely on routing gradients through things to optimize them. The most available example is deep learning, but it seems that most civilization-carrying industrial optimization algorithms also work this way. My tentative conclusion is that discrete optimization over the real world is computationally intractable but continuous optimization works in practice. I am not sure exactly why this is true at a fundamental level, but if you think it isn't true, can your model win some best paper awards at ICML? I see no signs of this situation changing anytime soon.  
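To make "routing gradients through things" concrete for a discrete-looking problem, here is a minimal sketch (toy 1-D data I made up, not any real system): a hard threshold decision has zero gradient almost everywhere, so it is relaxed to a sigmoid, optimized smoothly, and the discrete rule is recovered at the end.

```python
import numpy as np

# Learning a 1-D threshold classifier (toy data, purely illustrative). The hard
# 0-1 decision step(x - t) has zero gradient almost everywhere, so we relax it to
# a sigmoid and minimize a smooth loss; the learned threshold recovers the rule.
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=200)
y = (x > 1.5).astype(float)            # true labels come from a hard threshold at 1.5

t = 0.0                                # threshold parameter we want to learn
lr = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(x - t)))     # relaxed (soft) decision
    grad_t = np.mean((p - y) * (-1.0))     # gradient of mean cross-entropy w.r.t. t
    t -= lr * grad_t

print(f"learned threshold: {t:.2f} (true: 1.5)")
```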

This standard may be a bit more controversial than the others (and in fact I think it is basically subservient to them), but it strongly informs my intuitions about which agent foundations approaches have a chance of eventually becoming useful. 

Discussion 

To illustrate these standards, I will apply them to the action/policy/agent theory gradation. Notably (as pointed out by @abramdemski [LW · GW]), this is not the only axis along which theories of intelligence can abstract / make simplifying assumptions. For instance, even within the discussion below, I am often forced to distinguish between e.g. purely predictive and "reactive" or truly sequential situations. 

Action Theory

Action theory has already demonstrated its practical usefulness through countless applications. In restricted forms (particularly POMDPs) it is the basis for ~all classical game-playing algorithms, robotics, and recently even LLM agents through (apparently) RLHF. As these approaches become more general, they start to look more like AIXI (with base foundation models as approximations to Solomonoff induction in context and tree-of-thoughts replacing AIXI's argmax tree). MuZero is also notable as pretty obviously inspired by AIXI and highly performant. So how does action theory (particularly its "universal" algorithmic information theory instantiation) fare by our standards?

When restricted to pure prediction, we know that Solomonoff induction has a highly robust definition: it can be formulated as the universal a priori distribution (piping noise through a UTM) or as a universal Bayesian mixture. In either case, it doesn't really matter which UTM you pick (the choice gets washed out quickly as an instance of Bayesian "merging of opinions"). This can also be viewed as satisfying the self-optimization criterion. I would argue (in fact, DeepMind and my group at the University of Waterloo have argued related but distinct points) that transformers succeed because their self-supervised teacher-forcing training method is a non-obvious, stable, continuous relaxation of the prequential problem for which Solomonoff induction is the general solution.  
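For concreteness, here are the two standard formulations in Hutter-style notation (the specific monotone UTM $U$ and enumeration of semimeasures are exactly the arbitrary choices whose effect is bounded by a constant):

$$M(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)} \qquad\qquad \xi(x) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu\, \nu(x), \quad w_\nu = 2^{-K(\nu)},$$

where $U(p)=x*$ means the monotone UTM $U$ outputs a string beginning with $x$ when run on program $p$, and $\mathcal{M}$ is the class of lower-semicomputable semimeasures. The two definitions dominate each other within a multiplicative constant, which is the robustness claim above.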

I consider logical induction (LI) to be a particular bounded approximation of Solomonoff induction which has additional self-optimization guarantees at small finite times, particularly regarding self-consistency/prediction/trust. However, I find it hard to see how it could have a good continuous relaxation.[2] For that reason, though it is conceptually interesting and perhaps useful, I am not sure that it is a load-bearing step on the path to understanding real intelligent systems. I weakly expect that "logical" symbol manipulation is best handled by a decision-theoretic agent operating on its own mental workspace rather than being a built-in feature of an A.G.I. - I can see how that process could be relaxed.

As I have mentioned above, AIXI's choice of UTM can matter if it is sufficiently bad. AIXI has self-optimizing results, but they require its environment class to be restricted. Personally, I suspect that providing extensive interaction histories to AIXI can overcome these limitations - that is, offline reinforcement learning can succeed where setting an agent loose to explore fails. Michael Cohen has made a similar argument for imitation learning. Policy optimization methods like policy-value Monte-Carlo tree search can be viewed as a partial continuous relaxation of the action part of AIXI. Overall, it is clear that the AIXI model is somewhat less successful by each standard than Solomonoff induction, but at least there is some apparent path to meeting these challenges.   
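For concreteness, the expectimax expression defining AIXI's action choice (Hutter's notation, with horizon $m$); the nested alternation of maxes and sums is the "action part" that policy-value search methods relax:

$$a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_t + \cdots + r_m\big] \sum_{q\,:\,U(q,a_{1:m}) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}.$$

The inner sum over programs $q$ is the Solomonoff-style environment model; everything outside it is planning, and that is the part that Monte-Carlo tree search with learned policy/value networks approximates.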

Overall, I think that action theory is unambiguously useful as a conceptual framework, and I think it may be a core part of future AGI systems - though they should be flexible enough to invent and use more sophisticated theories of intelligence themselves.

Policy Theory

As mentioned in the prerequisite post, I am not aware of practical applications for any policy theory[3].

Yudkowsky's preferred form of functional decision theory (FDT) is explicitly a policy theory, though as far as I can tell the main implementation difficulties already arise for indirect (non-causal) impacts of actions, which I would normally classify as lying somewhere between the action/policy levels. I am fairly convinced that within its domain of applicability, FDT is the correct theory of intelligence. However, I would guess that it is difficult to explicitly implement a functional decision theorist, primarily because of issues with tractably determining the logical (subjunctive) connection between one's actions and the actions of another agent in practice, and such connections probably degrade rapidly in complex connectionist systems that are not exactly the same - this issue is related to robustness, but may not be easily addressed through self-optimization. I am not sure how sensitive FDT is to definitions (it seems UDT is one formalization which already has various inequivalent versions). Of course, there is also no known continuous relaxation. But I think that the main obstacle to FDT being useful is that the domain of problems where it outperforms CDT is simply very narrow when agents are not mathematically precise simple computer programs that can be perfectly cloned.
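As a toy illustration of how narrow that domain is (standard prisoner's dilemma payoffs, a deliberately idealized setup): the clean separation from CDT only appears when the opponent is literally an exact copy running the same code.

```python
# Toy twin prisoner's dilemma (standard payoffs, deliberately idealized): the one
# clean case where a policy-level theory and CDT come apart. Against an exact copy
# running the same code, only the diagonal outcomes are reachable.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def play_against_exact_copy(policy):
    action = policy()
    return payoff[(action, action)]   # the copy necessarily outputs the same action

cdt_like = lambda: "D"   # holding the twin's action fixed, defection dominates
fdt_like = lambda: "C"   # choosing the output of the shared policy itself
print(play_against_exact_copy(cdt_like), play_against_exact_copy(fdt_like))  # 1 3
```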

A more concrete implementation, updateless decision theory [LW · GW] (UDT), depends on logical inductors as a core component. I am guessing UDT is probably fairly robust to e.g. the choice of computational model, but I worry that it inherits the "continuous relaxation" issues of LI. Also, I worry that anything inherently designed to operate within Tegmark's Level 4 Multiverse will be difficult to self-optimize to our specific universe.

Apparently infra-Bayesianism (IB) and infra-Bayesian physicalism have types of self-optimization guarantees, at least in environments without traps. I do not know enough about IB to judge its performance against my standards, but I see no a priori reason that it can't satisfy all of them. However, it does seem to import the worst-case obsessed pessimism of frequentism, which I am not sure I philosophically agree with (though see a potential justification from AIT [LW · GW]).  

As a non-expert it is difficult for me to judge today's policy theories, but my intuition is that they will generally under-perform action theory in practice, and probably will not be built into the core logic of any early AGI. However, I wouldn't be surprised if they were discovered and adopted by an A.G.I. as it scales past human intelligence. It's not clear whether this runs a risk of breaking alignment - I mean it's not like humans adopt FDT/TDT and then start acting in insane ways that we wouldn't endorse - okay there is that ONE, err, a few[3] examples, but I'm sure that won't keep happening.

Agent Theory

All of robotics is concerned with agent theory in some sense, but it seems to be more focused on engineering than theory. General agent theories such as space-time embedded intelligence do not seem to have inspired any algorithms. 

In its most extreme form[4], space-time embedded intelligence (STEI) as formulated by Orseau and Ring treats the agent as a collection of bits on a Turing machine's tape and asks for the best way to set those bits (in terms of ~expected lifetime reward). I see no reason to think that the answer is independent of changes in the computational model, even among choices of UTM. The best agent embedded specifically in Conway's Game of Life probably looks totally different still. 

I do not expect self-optimizing guarantees to hold because the strongest overall agent may not work in our universe at all. Similarly, the best program for one computer may not run on a computer with a different OS - so this criticism applies to Stuart Russell's bounded rationality framework in general beyond STEI. Naturally, we would hardcode a lot of initial knowledge into our agent in practice. For instance, we might only search over syntactically correct Python programs for our agent's "core" intelligence. Roboticists carefully select the sensors to provide along with extensive firmware for those sensors. All of these "implementation details" can be viewed as hardcoding facts about our universe. 

Evolutionary algorithms may be viewed as an approximation to agent theory. By running them inside physics simulators, one arguably optimizes directly for the target suggested by agent theory, heavily restricting the class of environments considered by STEI to look a lot more like our universe. Unfortunately, this risks killing a lot of the generality we expect out of intelligent systems - the simulator will never exactly match the real world and this approach doesn't seem to guarantee effective transfer, which is a failure of self-optimization. Also, I am not aware of good continuous relaxations for evolutionary algorithms (that don't start to look a lot less inspired by agent theory). This family of approaches seems to have fallen well behind the state-of-the-art. 

I expect agent theory to be a useful conceptual tool for A.G.I. designers in some cases, but perhaps only as an extreme point which is commonly understood to be too idealized (even beyond AIXI) to serve as a target. I don't expect even superintelligences to derive many load-bearing theorems about agent theory.   

Conclusions

When prioritizing agent foundations research programs, I believe it is important to have a theory of impact that addresses how these standards might eventually be met. When carrying out basic research and "de-confusion," it is okay (and even expected) for practical approximations to appear distant. However, particularly if you believe timelines are short, you should be regularly asking yourself whether your theory of intelligence can be made useful.

In fact, I might even go as far as to say that conceptual algorithmic progress on AI may be net positive for alignment. An LLM is essentially a massive black box that apparently figures out how to do a ton of decision-theoretic stuff by almost completely unknown means - the only facts we know are on the level of "the first outer training loop looks like it is approximating Solomonoff induction, then we hit it with RLHF and who knows what happens." Insofar as decision theory is actually giving us more insight into the workings and principles of artificial intelligence, it should be able to let us carve this black box up into smaller black boxes, and say things like "this is the Solomonoff-approximation module, passing it to the planning layer we distill search into the policy self-prediction module that recursively guides search..." and eventually break those black boxes down further until we actually know how the components work. This is hard and requires real insight because the alignment tax is usually high - there's massive value in allowing one big neural circuit to flexibly integrate, amortize, and combine multiple functions. That means you really need to know what you're doing to make capabilities progress this way, which is actually a good thing - we should only be able to make capabilities progress by knowing what we are doing.    

Perhaps even (most) agent foundations researchers should have a theory->implementation cycle in mind for each project. For instance, decision theory math from five years ago should typically become an ICML paper today (even if the final product is from a different set of authors). Otherwise, it is hard to see how any alignment solutions resting atop a sophisticated decision theory will ever be implemented in time - the decision theory has to be tested first.

  1. ^

    Sorry, our civilization and also I in particular are still pretty ignorant about computational complexity theory, and it's often hard to say anything definitive with certainty. 

  2. ^

    @abramdemski [LW · GW] suggests that LI's self-trust is a form of bootstrapping from its own future beliefs, which is along the lines of dynamic programming.

  3. ^

    Though since the time of that post, the Zizians seem to have done a lot more senseless violence nominally justified (in part?) by TDT. This doesn't prove (this) policy theory wrong or useless, but perhaps it is easy for humans to misapply? Or more likely they are just insane and latched on to TDT as a fixation for circumstantial reasons. 

  4. ^

    I have previously discussed STEI as a policy theory - if I remember correctly, Orseau and Ring formulate a few versions with weaker and weaker "duality" assumptions, so it really spans the policy theory -> agent theory range. 
