Three Levels for Large Language Model Cognition

post by Eleni Angelou · 2025-02-25

Contents

  1. Background
  2. What are the three levels?
  3. What are some examples of explanations?
  4. Why a leaky abstraction?
  5. Concluding thoughts

This is the abridged version of my second dissertation chapter. Read the first here.

Thanks to everyone I've discussed this with, and especially M.A. Khalidi, Lewis Smith, and Aysja Johnson.

 

TL;DR: Applying Marr's three levels to LLMs seems useful, but the framework quickly proves to be a leaky abstraction. Despite the porousness, can we agree on what kinds of explanations we'd expect to find at each level?

 

1. Background

When I think about the alignment problem, I typically ask the following question: what kinds of explanations would we need to say that we understand a system sufficiently well to control it? Because I don't know the answer to that question (and the philosophy of science vortex is trying to devour me), I expect to make some progress if I look at the explanations we have available, taxonomize them, and maybe even find what they're missing. This is where David Marr's framework comes in.

2. What are the three levels?

According to Marr (1982), we can understand a cognitive system by analyzing it at three levels: (1) computational theory, (2) representation and algorithm, and (3) hardware implementation.

[Table of the three levels, from Marr's Vision: A Computational Investigation, 1982, p. 25.]

Marr's problem at the time was that mere descriptions of phenomena, or pointing to parts of the network, could not sufficiently explain cognitive functions such as vision. To be sure, finding something like the "grandmother neuron" was a breakthrough in its own right. But it wasn't an explanation; it said nothing about the how or the why behind the phenomenon. Some attempts in LLM interpretability have a similar flavor. For example, one answer to the question "where do concepts live?" is to point to vectors and their directions in the LLM's network.
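
To make that concrete, here is a minimal sketch of what "pointing to a direction" can look like. The method (a difference of class means) and the arrays (random stand-ins for real residual-stream activations) are my illustration, not something this post or the interpretability literature is committed to:

```python
# Minimal sketch: a "concept direction" as a difference of mean activations.
# The activations below are random placeholders, not real LLM hidden states.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Hypothetical hidden states collected on inputs that do / don't express the concept.
acts_with_concept = rng.normal(loc=0.5, size=(100, d_model))
acts_without_concept = rng.normal(loc=0.0, size=(100, d_model))

# The "direction where the concept lives": normalized difference of class means.
direction = acts_with_concept.mean(axis=0) - acts_without_concept.mean(axis=0)
direction /= np.linalg.norm(direction)

# Projecting a new activation onto the direction gives a "how much concept" score.
# This locates the concept; it does not yet explain how or why the model uses it.
new_activation = rng.normal(size=d_model)
print(new_activation @ direction)
```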

The three levels for LLMs look as follows:

Computational theory level: what is the functional role or goal of the LLM? Commonly discussed goals include:

  1. Modeling semantic relations in text (see the sketch after this list)
  2. Generating text that reliably follows prompt instructions or models user preferences
  3. Performing the specific task requested (e.g., writing code for a website)
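
One way to make the first goal precise (my gloss, not a commitment of the framework) is to treat the LLM as a next-token predictor over a text distribution \(\mathcal{D}\), so that the computational-level task is to minimize the expected negative log-likelihood:

\[
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{x \sim \mathcal{D}}\!\left[\sum_{t} \log p_\theta\!\left(x_t \mid x_{<t}\right)\right]
\]

This specifies what function the system computes without saying anything about how the computation is carried out or on what hardware, which is exactly the division of labor the levels are meant to capture.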

Algorithmic level: at least two kinds of algorithms exist in LLMs:

  1. Training algorithms - Further divided into pre-training (self-supervised learning from large-scale, unlabeled data) and fine-tuning (improving performance on specific tasks with supervised data)
  2. The trained model itself - A new algorithm that encodes the regularities and statistical relationships distilled from the training data (see the toy sketch after this list)
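
To make the distinction vivid, here is a deliberately toy sketch of my own (a bigram counter, nothing like a transformer): train() is a training algorithm that extracts statistical regularities from data, and the table it returns, consumed by generate(), is the trained model acting as a new algorithm in its own right:

```python
# Toy illustration of the two senses of "algorithm" at the algorithmic level.
from collections import Counter, defaultdict
import random

def train(corpus_tokens):
    """Training algorithm: learn bigram statistics (regularities) from data."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts  # the trained model: a lookup table of learned regularities

def generate(model, start, n=10, seed=0):
    """The trained model as a new algorithm: map a context to continuations."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        successors = model.get(out[-1])
        if not successors:
            break
        next_tokens, weights = zip(*successors.items())
        out.append(rng.choices(next_tokens, weights=weights)[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran".split()
model = train(corpus)          # running the training algorithm
print(generate(model, "the"))  # running the trained model
```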

Implementation level: what is the physical substrate or hardware that realizes the computational processes of the LLM? Examples of hardware are hard disk drives (HDDs), central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), high-bandwidth memory (HBM), and random access memory (RAM) sticks. 
 

3. What are some examples of explanations?

At the computational level: 

With some hesitation, I'm willing to argue that scaling laws also offer a kind of computational-level explanation, since their whole point is to characterize the most efficient trade-offs among compute, data, and model size. In addition, scaling laws explain why there is an upper bound on a model's performance given a set of resource constraints. In that sense, they delineate a functional limit on how effective a model can be, suggesting that beyond a certain point, training with more compute, data, or parameters yields diminishing returns in the system's performance.
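
For concreteness, here is a minimal sketch of one well-known scaling law, the Chinchilla parametric form from Hoffmann et al. (2022). The constants are their approximate published fits, used here only to illustrate the diminishing returns and the irreducible-loss floor described above:

```python
# Chinchilla-style parametric scaling law: predicted pre-training loss as a
# function of parameter count N and training tokens D. E is the irreducible
# loss floor no budget can breach; constants are approximate published fits.
def chinchilla_loss(n_params: float, n_tokens: float,
                    E: float = 1.69, A: float = 406.4, B: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling parameters at fixed data buys ever less improvement:
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N={n:.0e}: predicted loss ~ {chinchilla_loss(n, 1e12):.3f}")
```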

At the algorithmic level: 

We mostly have explanations from mechanistic interpretability (MI). There are two paradigmatic categories of MI explanations. 

At the implementation level:

These explanations typically fall into two categories. 

4. Why a leaky abstraction?

The three-level distinction doesn't satisfy the most ambitious vision for explaining LLMs and can be seen as a leaky abstraction for the following reasons:

However, the framework adopts a more modest goal: to point to directions of inquiry (i.e., top-down) that may lead to causal relations, or to show why seeking a specific type of explanation might be epistemically infertile (e.g., the reductionism of the implementation level).

5. Concluding thoughts

Despite the leakiness, the three-level theory is, pragmatically speaking, the most useful, organized, flexible, and theoretically accessible tool for thinking about explanations in an otherwise chaotic research domain.

There are at least two obvious advantages: 

Highlights of what remains unresolved:
