Renormalization Redux: QFT Techniques for AI Interpretability
post by Lauren Greenspan (LaurenGreenspan), Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-18
Introduction: Why QFT?
In a previous post [LW · GW], Lauren offered a take on why a physics way of thinking is so successful at understanding AI systems. In this post, we look in more detail at the potential of quantum field theory (QFT) to be expanded into a more comprehensive framework for this purpose. Interest in this area has been steadily increasing[1], but efforts have yet to condense into a larger-scale, coordinated program. In particular, much of the more theoretical, technically detailed work remains opaque to anyone not well-versed in physics, meaning that its insights[2] are largely disconnected from the AI safety community. The most accessible entry point is Principles of Deep Learning Theory (which we abbreviate “PDLT”), a nearly 500-page book that lays the groundwork for these ideas[3]. While some AI safety research has incorporated QFT-inspired threads[4], we see untapped potential for cross-disciplinary collaborations to unify these disparate directions. With this post – one of several in a series linking physics and AI – we explain some of the high-level ideas we find important, with the goal of generating ideas to be developed later. In particular, we want to encourage more of a dialogue between the physics and AI safety communities, creating a tighter feedback loop between (theoretical) idea generation and AI safety’s epistemic goals and methods (namely: strong empirics).
AI interpretability researchers are increasingly realizing that NNs are less like exact programs and more like big collections of shallow, stochastically interacting heuristics. QFT – a theoretical framework for describing systems with many interacting degrees of freedom – is well suited to study phenomena of this shape, as it captures the collective behavior of particle interactions at varying levels of abstraction set by the scale of the field theory. Briefly, there is a particular scaling limit of neural networks (corresponding roughly to infinite width[5]) in which the neurons become non-interacting and can be modeled by a system of independent particles, known as a free QFT. The width can be thought of as the parameter governing the scale of interactions between neurons, as it governs how sparse (overparameterized) the network is. In the field theory description, we can do a perturbative expansion in the inverse of this scale parameter (1/width) to add more complex interactions (higher-order moments) between particles.
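As a minimal illustration of this free-theory limit, the sketch below (our own toy example, not taken from the papers cited above) samples an ensemble of randomly initialized one-hidden-layer networks at several widths and measures how Gaussian the output distribution is via its excess kurtosis; the architecture, nonlinearity, and widths are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net_outputs(width, n_nets=20000, d_in=10):
    """Outputs of an ensemble of independent, randomly initialized one-hidden-layer
    tanh networks (1/sqrt(width) output scaling), all evaluated on one fixed input."""
    x = rng.standard_normal(d_in)
    W = rng.standard_normal((n_nets, width, d_in)) / np.sqrt(d_in)
    a = rng.standard_normal((n_nets, width))
    h = np.tanh(np.einsum("nwd,d->nw", W, x))        # hidden activations
    return (a * h).sum(axis=1) / np.sqrt(width)      # one scalar output per network

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return (z ** 4).mean() - 3.0                     # vanishes for a Gaussian

for width in [2, 8, 32, 128, 512]:
    print(f"width={width:4d}  excess kurtosis={excess_kurtosis(random_net_outputs(width)):+.3f}")

# The departure from Gaussianity shrinks roughly like 1/width: at large width the
# output distribution is well described by a free (non-interacting) theory, and the
# finite-width corrections are exactly the 'interactions' of the perturbative expansion.
```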
The theoretical framework of a QFT for AI has stayed close to this idealized limit, but some simple QFT-inspired experiments still perform reasonably well at providing mechanistic solutions[6]. Interestingly, these examples indicate that the ‘infinite’ width limit remains a good approximation even at realistic (small) widths (maybe everything is[7] Gaussian after all).
Neural networks exhibit complexities that mirror particle interactions in QFT, opening the door to a systematic understanding of their stochasticity, redundancy, and competing scales. It is unlikely that a “fully reductivist” application of current theoretical techniques will automatically capture the sophisticated data relationships learned by state-of-the-art models. However, we are optimistic that an extension of theoretical QFT methods – and corresponding new experimental techniques – will provide insight that extends to real-world settings.
Renormalization in physics and ML
In QFT, a particle interaction can be pictorially represented by a Feynman diagram. In the one below, two particles collide, two particles emerge, and a mess of intermediate interactions at an infinite range of energies can happen in between. How important each latent interaction is depends on scale: contributions can either damp out quickly, leaving a finite number of important Feynman diagrams, or they can blow up, producing divergent predictions. These divergences are considered ‘unphysical’; they don’t match up with our observations, indicating a problem with the theoretical description. What this means is that the QFT is not appropriately parametrized – or renormalized – given the scale of interactions we care about. A solution comes from one of the most powerful techniques in QFT, renormalization: at each scale, the unimportant degrees of freedom are systematically left out, resulting in a coarse-grained effective field theory (EFT) which represents the physics at that scale.
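To make the divergence-and-absorption story slightly more concrete, here is the textbook example from $\phi^4$ theory (standard QFT material, independent of the neural-network work discussed below, and written schematically up to sign and scheme conventions): the one-loop correction to the four-point coupling grows logarithmically with the cutoff $\Lambda$, and renormalization trades that cutoff dependence for a coupling defined at the measurement scale $\mu$,

$$\lambda(\mu) \;\approx\; \lambda_0 \;-\; \frac{3\lambda_0^2}{16\pi^2}\,\log\frac{\Lambda}{\mu} \;+\; \mathcal{O}(\lambda_0^3),$$

so that physical predictions depend only on $\lambda(\mu)$ – the thing experiments at scale $\mu$ actually measure – and not on the arbitrary cutoff.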
In an analogous picture below[8], a neural network’s input can be run through many different feature interactions to achieve the same output. As anyone who has done causal scrubbing can likely attest, the number of these input-output pathways can quickly blow up, turning into an unholy mess of constructively and destructively interfering phenomena. Renormalization effectively cancels out the irrelevant noise to leave only the meaningful pathways, leading to an effective coarse-graining of the features.
One renormalization technique is known as the renormalization group (RG) flow. The RG flow provides a recipe for renormalization by iteratively filtering out information that doesn’t describe the empirical world at a given scale (e.g. long range vs. short range). The process defines a parameterized ‘flow’ through the space of models as the scale changes, and can lead to fixed points that describe new or interesting behavior like phase transitions. Different QFTs generally make different predictions but can flow to the same fixed point, demonstrating the nice feature of universality: many microscopic distributions can be described by the same macroscopic theory.
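In standard RG language (again generic physics notation, not tied to any particular neural-network construction), the flow of a coupling $g$ with the scale $\mu$ is written as

$$\mu \frac{dg}{d\mu} \;=\; \beta(g), \qquad \beta(g_*) = 0,$$

where the fixed points $g_*$ are the couplings that stop changing under further coarse-graining; microscopically different theories whose couplings flow to the same $g_*$ share the same macroscopic behavior, which is the precise sense of the universality mentioned above.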
In short: renormalization makes a theory predictable at a fixed scale by ensuring its observables match with empirical results, and the RG flow offers a way to interpolate between theories at different scales. The need for renormalization points to the theoretical description’s inability to adequately describe reality, and the corresponding techniques turn this supposed bug into a way to discover important properties of complex systems.
Some Current Applications
In AI safety, one goal is to correctly interpret what an AI system is doing, but it is often difficult to find a mechanistic description that matches the AI’s reality, given the complexity of even simple real-world tasks. Perhaps further application of QFT techniques like the RG flow will offer a path to alignment while closing the theory-practice gap. To date, most applications of QFT to NNs have focused on a theoretical exploration of networks at infinite width. This limit puts us directly in the so-called “NTK regime”, where all parameters become either ‘frozen’ or Gaussian, meaning they do not interact. Moreover, the network becomes linear in its parameters, limiting its ability to learn arbitrary functions of the input. In this limit, the network cannot learn new features because its parameters remain close to their initialization.
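To ground what ‘linear in its parameters’ buys you: in this regime, training is governed by the neural tangent kernel, the Gram matrix of parameter-gradients of the output. The sketch below (our own minimal numpy example, with an NTK-style parameterization chosen for illustration rather than taken from PDLT) computes this empirical kernel for a one-hidden-layer ReLU network.

```python
import numpy as np

rng = np.random.default_rng(1)

def init_params(width, d_in):
    """NTK parameterization: f(x) = (1/sqrt(width)) * a . relu(W x / sqrt(d_in))."""
    return rng.standard_normal((width, d_in)), rng.standard_normal(width)

def grad_f(params, x):
    """Gradient of the scalar output with respect to all parameters, flattened."""
    W, a = params
    width, d_in = W.shape
    pre = W @ x / np.sqrt(d_in)
    h = np.maximum(pre, 0.0)
    df_da = h / np.sqrt(width)
    df_dW = ((a * (pre > 0)) / np.sqrt(width))[:, None] * (x / np.sqrt(d_in))[None, :]
    return np.concatenate([df_dW.ravel(), df_da])

def empirical_ntk(params, X):
    """Theta[i, j] = grad f(x_i) . grad f(x_j), evaluated at the current parameters."""
    G = np.stack([grad_f(params, x) for x in X])
    return G @ G.T

X = rng.standard_normal((5, 8))                 # five toy inputs in eight dimensions
params = init_params(width=4096, d_in=8)
print(np.round(empirical_ntk(params, X), 3))

# In the NTK regime this matrix barely changes during training, so gradient descent
# reduces to kernel regression with a frozen kernel: no new features are learned,
# which is exactly the limitation described above.
```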
From this simplified starting point, renormalization leads to a re-tuning of the scale governing the infinite-width behavior. In the PDLT book, this is the ratio of hyperparameters d/w, where 1 ≪ depth ≪ width. The width is still considered a large parameter, imposing enough sparsity in parameter space that neurons barely interact, but tuning the depth turns on what looks like an RG flow through the network’s layers[9]. At every layer, you get a new, effective description of the network features, having marginalized out the low-level features that are irrelevant at that scale. Heuristically, this is similar to curve detectors coarse-graining into cat detectors in CNNs. At the end of training, you can reach a stable fixed point corresponding to a QFT that matches the ‘physics’ of the neural network ontology. Importantly, these methods still apply (with some caveats) when neurons are allowed to interact, corresponding to corrections in 1/w where non-Gaussianities become relevant[10].
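A rough way to see this ‘interactions turn on with depth’ story empirically (our own crude probe, not a reproduction of the PDLT calculation, and sensitive to details such as how close the initialization is to criticality) is to measure the non-Gaussianity of a final-layer preactivation across an ensemble of random deep tanh networks as the depth grows at fixed width:

```python
import numpy as np

rng = np.random.default_rng(2)

def final_preactivation(depth, width, n_nets=20000, d_in=16):
    """One final-layer preactivation for an ensemble of random tanh MLPs
    (weight variance 1/fan_in, no biases), all fed the same fixed input."""
    x = rng.standard_normal(d_in)
    z = np.tile(x, (n_nets, 1))
    for layer in range(depth):
        fan_in = z.shape[1]
        W = rng.standard_normal((n_nets, width, fan_in)) / np.sqrt(fan_in)
        pre = np.einsum("nwf,nf->nw", W, z)
        z = np.tanh(pre) if layer < depth - 1 else pre   # keep the last preactivation
    return z[:, 0]

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return (z ** 4).mean() - 3.0

width = 32
for depth in [2, 4, 8, 16, 32]:
    k = excess_kurtosis(final_preactivation(depth, width))
    print(f"depth={depth:3d}  depth/width={depth / width:.2f}  excess kurtosis={k:+.3f}")

# The connected four-point function (proxied here by excess kurtosis) grows roughly
# with depth/width: depth accumulates the weak 1/width interactions layer by layer,
# which is the sense in which it drives an RG-like flow away from the free theory.
```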
Halverson et al.[11] consider a different, though related, parametrization from the NTK, known as the Neural Network Gaussian Process (NNGP) limit. This work also aims to construct a precise correspondence between QFT and neural networks with EFTs and renormalization as core tenets, but treats the RG flow in a different direction from PDLT – tied to a scale parameter describing the input space rather than the feature space[12]. Heuristically, if an input image has a ‘natural’ range of pixel brightness (meaning it is not too high contrast), the function the neural network finds to describe that image should be similarly limited to match the input resolution.
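For readers who haven’t met the NNGP limit itself (as distinct from the renormalization scheme Halverson et al. build on top of it), here is a minimal sketch of the standard infinite-width kernel recursion for a fully connected ReLU network; the choice of ReLU and the variance hyperparameters are our own illustrative defaults.

```python
import numpy as np

def relu_gauss_expectation(k11, k12, k22):
    """E[relu(u) relu(v)] for (u, v) ~ N(0, [[k11, k12], [k12, k22]]) (arc-cosine kernel)."""
    rho = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
    theta = np.arccos(rho)
    return np.sqrt(k11 * k22) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

def nngp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Infinite-width (NNGP) kernel of a depth-`depth` fully connected ReLU network."""
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]   # kernel after the input layer
    for _ in range(depth):
        diag = np.diag(K)
        K = sigma_b2 + sigma_w2 * relu_gauss_expectation(diag[:, None], K, diag[None, :])
    return K

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 10))                       # four toy inputs
print(np.round(nngp_kernel(X, depth=3), 3))

# In the NNGP limit the network prior is exactly a Gaussian process with this kernel:
# predictions come from GP regression rather than from any learned features, which is
# the idealized starting point that the renormalization story then corrects.
```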
Another Call to Action
Application of QFT techniques to neural networks sometimes seems like a compromise between waving your hands and theoretical rigor[13]. The examples from the last section provide promising proofs of concept while also pointing out gaps in our understanding that prevent us from coming up with a full QFT framework for AI systems. The fact that there are different ways to conceptualize an RG flow highlights at least three scales of interest in neural networks: the width, the depth, and some characteristic scale of the data distribution. We are not completely sure how to interpret the relationships between these parameters (for example, the initial interaction strength 1/w and the RG cutoff scale d/w) in real-world networks, since terms in the per-layer expansion may become important as the depth is tuned. In other words: each non-Gaussian term (a new field in the effective field theory at the RG scale) has an interaction strength that generally changes with that scale.
It is likely that the ‘physics’ of AI systems is governed by many competing scales that will be difficult to parse without a better sense of what ‘physical’ means for AI systems. Regarding neural network ‘fields’: are they best thought of as particles colliding in a detector, or spins confined to a lattice? Are the interactions between them local (and what defines ‘local’ for NNs)? What is the natural cutoff? In condensed matter systems, there is a ‘natural’ cutoff given at high energy by the lattice spacing of your system (or at long distances, something like the size of the material). These questions give us parameters for measuring the strength of our understanding, and we hope to be able to answer them in the future.
While we’re not the first to say that this could be important, we want to point out some ways in which mechanistic interpretability could leverage QFT techniques. As we will explain in a later post, physicists consistently use renormalization as a way to “fix an ansatz interpretation” of a physical system. Namely, when interpreting experimental data or simplifying a complicated model (such as a lattice model) at macroscopic scales, renormalization techniques allow you to:
- “throw away” overly granular interactions that can be ignored at coarser scales because they don’t match what we see in nature. For AI systems, renormalization can result in cleaner abstractions of neural networks. This is an example of theory being led by empirics.
- discover ‘new physics’. In order for the Standard Model observables to be renormalizable (finite after renormalization), the theory needed an extra field – the Higgs boson – at a certain energy scale. In neural networks, this is like the discovery of an important feature ‘pathway’ that prevents the network from being dominated by noise. To name another example, critical points of the RG flow may also shed light on emergent phenomena in neural networks. These are examples of empirics being led by theory.
In pursuing research combining QFT and mechanistic interpretability, theory and experiment can both iterate toward an adequate description of neural network behavior and model internals.
Some paths forward
In this section, we present some wildly speculative research questions. We welcome feedback in the hopes of discovering QFT techniques for AI interpretability that are both doable and impactful.
There is some hope that real-world networks are not too far from the ideal, and the ratio d/w may be small enough that interesting empirical settings will converge to the theoretical limit at first or second order. Perhaps we can continue to nudge the idealized model toward the real world, incrementally building up the framework to apply to different notions of scale and non-stochastic initializations.
However, the stronger hope is that this framework is general: NNs are QFTs, even when treated non-perturbatively at finite width. It could be that the starting point of “NTK theory” is just too weak to take us very far in interpreting state-of-the-art neural nets, and that jumping into a new conceptualization of the RG flow with an interpretability mentality will help build the ‘right’ QFT for AI systems. If this can be done (for example, by studying local interactions between SAE features and their extensions), perhaps we can use renormalization techniques to universally subtract out ‘unphysical’ noise from destructive interference, leaving only an EFT that represents the features we want in a more computationally compatible way.
To say more about computational compatibility, it may be possible to probe the relationship between ‘human interpretable’ (SAE) features and features from computational mechanics, which are built on natural units of computation. The latter are also non-linear, so it could be that they would agree with effective features at the right level of abstraction (maybe this could even serve as a definition of the ‘right level of abstraction’). Moreover, computational mechanics has a built-in data scale – the degree of resolution that defines how ‘zoomed in’ you are to the fractal simplex [LW · GW]. It would be great if this could shed light on, for example, how different scales (input, feature…) are organized in a neural network field theory, or help us distinguish aspects of NNs that are model agnostic from those that are architecture dependent[14].
- ^
Anecdotally, many physicists (mainly high energy theorists) I have met think this is a promising idea. Among these, many come to the same conclusions somewhat independently, given that the basic insight is pretty low hanging fruit (i.e. Gaussian statistics are universal). On one hand, academic consensus is a signal that this idea at least deserves some further thought. On the other, there is probably some academic bias at play here, and high energy theorists are particularly tempted by the promise of applying their ideas in realistic, useful settings.
- ^
- ^
- ^
- ^
There are a lot of names used to describe this limit, and they are not all the same (infinite width, large N, NTK, NNGP, lazy…). We think this leads to a certain amount of ‘talking past one another’ between research groups, and hope to help unite the masses in this direction by getting everyone on the same page.
- ^
- ^
Roughly, globally, if you squint…
- ^
From this paper.
- ^
If we understand correctly, learning in this way can be thought of as ‘unfreezing’ the previous layers, similar to the recovery of an optimal initialization scale found by work on tensor programs.
- ^
In physics speak, turning on finite width corrections leads to a “weakly interacting” or perturbative QFT.
- ^
Berman et al. follow a similar story, running empirical experiments on MNIST. Their results are promising – in particular the renormalized interpretations have significantly better prediction properties than the unrenormalized NTK limit they are “fixing”. However note that these results are unlikely to scale to models significantly beyond MNIST, namely models which require rich learning. MNIST and similar basic vision classifiers have the oh-so-physical property of being empirically learnable by Gaussian learning – see for example this paper.
- ^
They also give a nice pictorial representation of Feynman diagrams, which could help make this work more accessible to researchers outside of physics (the way Feynman diagrams made particle physics more accessible to experimentalists who had never studied QFT).
- ^
This is not a criticism. Maybe this is the sweet spot of physics, and more work needs to be done to understand the corollaries for the AI universe.
- ^
A similar separation can also be found between terms in the NN kernel.