# Look For Principles Which Will Carry Over To The Next Paradigm

post by johnswentworth · 2022-01-14T20:22:58.606Z · LW · GW · 1 comment

## Contents

- Examples In My Own Work
- Why Not Just Jump To The Next Paradigm?
- Why Not Just Forget About The Next Paradigm?
- How Can We Recognize Principles Which Will Carry Over?

In 1918, Emmy Noether published her famous theorem showing that each symmetry of the laws of physics implies a corresponding conserved quantity. Laws which remain the same even if we move the whole universe left or right a little result in conservation of momentum, laws which remain the same over time result in conservation of energy, and so forth.
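To make the momentum and energy cases concrete, here is the standard one-dimensional textbook sketch. It uses modern Lagrangian notation rather than Noether's original formulation, and the symbols $L$ (Lagrangian), $q$ (position), $p$ (momentum) and $E$ (energy) are introduced here purely for illustration.

$$
\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}
\qquad \text{(equation of motion from minimizing the action } S = \int L(q, \dot q, t)\, dt\text{)}
$$

$$
\frac{\partial L}{\partial q} = 0
\;\Longrightarrow\;
\frac{dp}{dt} = 0, \qquad p \equiv \frac{\partial L}{\partial \dot q}
$$

$$
\frac{\partial L}{\partial t} = 0
\;\Longrightarrow\;
\frac{dE}{dt} = 0, \qquad E \equiv \dot q\,\frac{\partial L}{\partial \dot q} - L
$$

The second line is the "move everything left or right" symmetry: if shifting $q$ leaves $L$ unchanged, momentum is conserved. The third line is the "same over time" symmetry: if $L$ has no explicit time dependence, energy is conserved.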

At the time, Noether’s Theorem was only proven for the sorts of systems used in classical physics - i.e. a bunch of differential equations derived by minimizing an “action”. Over the next few decades, the foundational paradigm shifted from classical to quantum, and Noether’s original proof did not carry over. But the *principle* - the idea that symmetries imply conserved quantities - *did* carry over. Indeed, the principle is arguably simpler and more elegant in quantum mechanics than in classical.
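For comparison, a sketch of the quantum version in standard textbook notation. The symbols are assumptions made for illustration: $H$ is the Hamiltonian, $Q$ is an observable with no explicit time dependence, and $U(s)$ is the family of symmetry transformations which $Q$ generates.

$$
U(s) = e^{-i s Q/\hbar}, \qquad U(s)^\dagger H\, U(s) = H \ \text{ for all } s
\;\Longrightarrow\; [H, Q] = 0
$$

$$
\frac{d}{dt}\langle Q \rangle = \frac{i}{\hbar}\,\langle [H, Q] \rangle = 0
$$

Here the whole argument is two lines: a symmetry of $H$ means $Q$ commutes with $H$, and anything which commutes with $H$ has a constant expectation value.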

This is the sort of thing I look for in my day-to-day research: principles which are simple enough, fundamental enough, and general enough that they’re likely to carry over to the next paradigm. I don’t know what the next paradigm will be, yet; the particulars of a proof or formulation of a problem might end up obsolete. But I look for principles which I expect will survive, even if the foundations shift beneath them.

## Examples In My Own Work

My own day-to-day research focuses on modelling abstraction.

I generally build these models on a framework of probability, information theory, and causal models. I *know* that this framework will not cover all of abstraction - for example, it doesn’t cover mathematical abstractions like “addition” or “linearity”. Those abstractions are built into the structure of logic, and probability theory takes all of logic as given. There may be some way in which the abstraction of linearity lets me answer some broad class of questions more easily, but standard probability and information theory ignore all that by just assuming that all pure-logic questions are answered for free.

… yet I continue to use this probability/information/causality framework, rather than throwing it away and looking for something more general on which to build the theory. Why? Well, I expect that this framework is *general enough* to figure out principles which will carry over to the next paradigm. I can use this framework to talk about things like “throwing away information while still accurately answering queries” or “information relevant far away” or “massively redundant information”; I can show that various notions of “abstraction” end up equivalent; I can mathematically derive the surprising facts implied by various assumptions. For instance, I can prove the __Telephone Theorem__: when transmitted over a sufficiently long distance, all information is either completely lost or arbitrarily perfectly conserved. I expect a version of that principle to carry over to whatever future paradigm comes along, even after the underlying formulations of “information” and “distance” change.
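A toy numerical illustration of the flavor of that claim (a sketch under stated assumptions, not the theorem's proof or its general setting): the message is two bits passed along a chain of identical steps. By assumption, each step copies the first bit exactly and flips the second bit with some probability. All names here (`one_step`, `information_after`, `flip_prob`) are hypothetical, chosen for this example.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mutual_information(channel):
    """I(X0; Xn) in bits, for uniform X0 and channel[i][j] = P(Xn=j | X0=i)."""
    n = len(channel)
    p_x0 = 1.0 / n
    p_xn = [sum(p_x0 * channel[i][j] for i in range(n)) for j in range(n)]
    mi = 0.0
    for i in range(n):
        for j in range(n):
            joint = p_x0 * channel[i][j]
            if joint > 0:
                mi += joint * math.log2(joint / (p_x0 * p_xn[j]))
    return mi


def one_step(flip_prob):
    """One hop: bit `a` is copied exactly, bit `b` flips with flip_prob."""
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    return [[(1.0 if a == a2 else 0.0)
             * ((1 - flip_prob) if b == b2 else flip_prob)
             for (a2, b2) in states]
            for (a, b) in states]


def information_after(steps, flip_prob=0.1):
    """Bits of information about the original message left after `steps` hops."""
    step = one_step(flip_prob)
    n = len(step)
    # Start from the identity channel (zero hops), then compose one hop at a time.
    channel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        channel = mat_mul(channel, step)
    return mutual_information(channel)


if __name__ == "__main__":
    for steps in (0, 1, 5, 20, 100):
        print(steps, round(information_after(steps), 4))
```

In this toy setup the information starts at two bits and falls toward one bit as the chain gets longer: the noisy bit is eventually lost entirely, while the exactly-copied bit is conserved perfectly. Nothing ends up "partly" transmitted in the limit.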

## Why Not Just Jump To The Next Paradigm?

One obvious alternative to looking for such principles is to instead focus on the places where my current foundational framework falls short, and try to find the next foundational framework upfront. Jump right to the next paradigm, as quickly as possible.

The main reason not to do that is that I don’t think I have enough information yet to figure out what the next paradigm is.

Noether’s Theorem and principles like it played a causal role in figuring out quantum mechanics. It was the simple, general principles of classical mechanics which provided constraints on our search for quantum mechanical laws. Without those guideposts, the search space of possible physical laws would have been too wide.

Special relativity provides a particularly clear example here. Nobody would have figured it out without the principles of electrodynamics and Lorentz transformations to guide the way. Indeed, Einstein’s contribution was “just” to put an interpretation on math which was basically there already.

More generally, knowing a few places where the current framework fails is not enough to tell us what the next framework should be. I know that my current foundation for thinking about abstraction is too narrow, but the search space of possible replacements is still too wide. I want simple general principles, principles which capture the relevant parts which I *do* think I understand, in order to guide that search. So, in my day-to-day I use the framework I have - but I look for the sort of principles which I expect to generalize to the next framework, and which can guide the search for that next framework.

This leaves a question: how do we know when it’s time to make the jump to the next paradigm? As a rough model, we’re trying to figure out the constraints which govern the world. Sometimes, the rate-limiting step might be figuring out new constraints, to limit our search. Sometimes, the rate-limiting step might be abandoning (probably implicit) *wrong* constraints already in our models, like the assumption of Galilean relativity implicitly built into pre-special-relativity physics. When finding new constraints is the rate-limiting step, it should feel like exploring a wide-open space, like we’re looking around and noticing patterns and finding simple ways to describe those patterns. When abandoning wrong constraints is the rate-limiting step, it should feel like the space is *too* constrained, like different principles or examples come into conflict with each other.

## Why Not Just Forget About The Next Paradigm?

On the other end of the spectrum, some people argue for just working within the current paradigm and forgetting about the next one. This is a long-term/short-term tradeoff: in the short term, the current paradigm is usually the best we have; building new frameworks takes time. So if our goals are short term - like, say, a startup which needs to show growth in the next six months - then maybe we should just do what we can with what we have.

There are definitely lots of places where this is the right move. On the other hand, I think “long term” is often much, much shorter than people realize.

I worked in startups for about five years. Usually, the companies I was at needed to show results or shut down within ~2 years. On the other hand, the code we wrote usually turned over within a year - the company would pivot or the UI design would change or the code architecture and tech stack would shift, and old code would either be deprecated or rewritten. In that environment, “building for the next paradigm” meant figuring out principles which would leave us better off a year from now, when the current code had mostly turned over. For instance, knowledge about our users (often from A/B tests) typically had lasting value. Sometimes, a smart library design would last. With a runway of ~2 years and a turnover time of ~1 year, the right move is usually to spend that first year on things which will make us better off a year from now, after everything has turned over.

… not that we always did that, mind you, but it was the things which lasted through turnover which were consistently the most important in hindsight. And after five years of this, one can see the patterns in what kinds of things will last.

AI research (and alignment research) in particular is a place where the “long term” is much, much shorter than many people realize. Not in the sense that AGI is right around the corner, but in the sense that the next paradigm is less than 5 years away, not more than 20. Just within the past 10 years, we saw the initial deep learning boom with image classifiers, then a shift to image generators (with the associated shift to GAN architectures), and then the shift to transformers and language models. Even if you think that transformer-based language models are the most probable path to AGI, there will __still likely be major qualitative shifts along the way__. If we’re doing work which is narrowly adapted to the current paradigm, it’s likely to be thrown out, and probably not even very far in the future.

The work done by Chris Olah’s team is a good example here. They did some really cool work on __generative image nets__. Then the shift to transformers came along, and they recently restarted from roughly square zero on __transformer nets__. Presumably *some* illegible skills transferred, but they mostly seem to be figuring things out from scratch, as far as I can tell. When the next shift comes, I expect they’ll be back at roughly square zero again. My advice to someone like Chris Olah would be: figure out the principles which seem likely to generalize. At a bare minimum, look for principles or tools which are useful for both image and text models, both CNNs and transformers. Those are the principles which are likely to still be relevant in 5 years.

## How Can We Recognize Principles Which Will Carry Over?

As an 80/20 solution, I think it’s usually fine to trust your instincts on this one. The important step is just to *actually ask yourself* whether something will carry over. I can look at my own work and say “hmm, this specific notion of ‘redundant information’ probably won’t carry over, but some general notion of ‘abstractions summarize massively redundant information’ probably will, and the principles I've derived from this model probably will”. Similarly, I expect someone in 1920 could look at Noether’s Theorem and think “wow, even if the foundations of physics are totally overturned, I bet some version of this principle will survive”.

If you want a more legible answer than that, then my advice is to introspect on what information is driving your intuitions about what will or will not carry over. I intend to do that going forward, and will hopefully figure out some patterns. For now, simplicity and generality seem like the main factors.

## 1 comment

## comment by Vika · 2022-01-26T18:15:24.358Z · LW(p) · GW(p)

Great post! I don't think Chris Olah's work is a good example of non-transferable principles though. His team was able to make a lot of progress on transformer interpretability in a relatively short time, and I expect that there was a lot of transfer of skills and principles from the work on image nets that made this possible. For example, the idea of circuits and the "universality of circuits" principle seems to have transferred to transformers pretty well.