So You Want To Make Marginal Progress...
post by johnswentworth · 2025-02-07T23:22:19.825Z
Once upon a time, in ye olden days of strange names and before Google Maps, seven friends needed to figure out a driving route from their parking lot in San Francisco (SF) down south to their hotel in Los Angeles (LA).
The first friend, Alice, tackled the “central bottleneck” of the problem: she figured out that they probably wanted to take the I-5 highway most of the way. But it took Alice a little while to figure that out, so in the meantime, the rest of the friends each tried to make some small marginal progress on the route planning.
The second friend, The Subproblem Solver, decided to find a route from Monterey to San Luis Obispo (SLO), figuring that SLO is much closer to LA than Monterey is, so a route from Monterey to SLO would be helpful. Alas, once Alice had figured out that they should take I-5, all The Subproblem Solver’s work was completely useless, because I-5 dodges the whole Monterey-to-SLO area entirely.
The third friend, The Forward Chainer, started from the parking lot in San Francisco and looked for a route which would generally just go south and a little east, toward LA, whenever possible. Alas, once Alice figured out that they should take I-5, The Forward Chainer’s work was completely useless, because it turned out that the fastest strategy was to head east from San Francisco (toward I-5), rather than south.
The fourth friend, The Backward Chainer, started from their hotel in LA and looked for a route which would generally just go north and a little west, toward San Francisco. Alas, this turned out to be completely useless in much the same way as The Forward Chainer’s work… though by dumb luck it was a “near miss”, in some sense, as only a few routes looked better than the I-5 by The Backward Chainer’s “go north and west from hotel” criterion.
The fifth friend, The Very General Helper, bought snacks for the trip. Snacks are great no matter the route. Everybody likes The Very General Helper, she’s great.
The sixth friend, The Clever One, realized that they’d probably need to stop for gas along the way, so she found a gas station about halfway between SF and LA. It was in Paso Robles, near the coast. Alas, once Alice figured out that they should take I-5, The Clever One’s work was also completely useless. The friends just found a different gas station on I-5 rather than make a detour all the way out to Paso Robles.
The seventh friend, The One Who Actually Thought This Through A Bit, realized upfront that once Alice figured out the main route, it would totally change all the subproblems her friends were tackling. So, she squinted at a map a bit, trying to figure out if there was any particular subproblem which would predictably generalize - a subproblem which would be useful to solve for any (or at least most) plausible choices made by Alice, without having to know in advance what main route Alice would go with. And she realized that the mountains just north of LA only had a couple easy passes through them - one for I-5, and another out by the coast (route 1). So, she tried to figure out routes from both I-5 and route 1 to their hotel in LA. Her work turned out to be useful: as soon as Alice figured out that they needed to take I-5 most of the way, The One Who Actually Thought This Through A Bit promptly offered a route from I-5 to their hotel in LA.
The Generalizable Lesson
Making marginal progress on a problem, i.e. tackling a non-bottlenecking subproblem, has a hidden extra requirement: a solution must generalize to work well with the whole wide class of possible solutions to other subproblems (especially the main bottleneck).
When tackling the bottleneck itself, one does not need to worry as much about this generalization requirement. Why? Well, the whole point of a “bottleneck” is that it’s the main hard/costly part; we’re perfectly happy to make other subproblems more difficult (on the margin) in exchange for solving the main bottleneck. So if a solution to the main bottleneck plays poorly with solutions to other subproblems, whatever, it’s cheaper to go find new solutions to those other subproblems than a new solution to the bottleneck.
But when solving a non-bottleneck problem, we have no such “out”. Solutions to our subproblem must generalize enough to work with solutions to other subproblems. And that’s a pretty tricky constraint, when we don’t already know how to solve the main bottleneck(s).
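For concreteness, here is a minimal toy sketch of the parable as a shortest-path problem (the graph structure, node names, and mileages below are invented for illustration): once the cheapest route turns out to run down I-5, the precomputed Monterey-to-SLO leg and the Paso Robles gas stop never appear on the chosen path at all, whereas work anchored at either mountain pass into LA would have been usable under any main route.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts weighted graph; returns (cost, path)."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, miles in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + miles, neighbor, path + [neighbor]))
    return float("inf"), None

# Invented road segments and mileages, loosely following the parable.
roads = {
    "SF":          {"I-5": 60, "Monterey": 120},
    "I-5":         {"Grapevine": 270},                  # inland route
    "Monterey":    {"SLO": 140},                        # the Subproblem Solver's leg
    "SLO":         {"Paso Robles": 30, "Coast pass": 110},
    "Paso Robles": {"Coast pass": 140},                 # the Clever One's gas stop
    "Grapevine":   {"LA hotel": 70},                    # mountain pass reached via I-5
    "Coast pass":  {"LA hotel": 60},                    # mountain pass reached via Route 1
}

cost, route = shortest_path(roads, "SF", "LA hotel")
print(cost, route)  # 400 ['SF', 'I-5', 'Grapevine', 'LA hotel']
# The Monterey->SLO and Paso Robles work is off this path entirely; only the
# two mountain-pass legs into LA show up on every plausible main route.
```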
Application: <Your Research Field>
So let’s say you’re working on <your research field here>[1]. You don’t fancy yourself one of the greatest minds in the field, you’re just trying to make a marginal contribution.
… alas, in this specific way your challenge is actually harder than solving the big hard central problems. Because your marginal contribution needs to be sufficiently generalizable to play well with whatever techniques are ultimately used to solve the big hard central problems… and you have no idea what those techniques might be. Indeed, probably one of the main reasons you’re aiming for a marginal contribution in the first place is to avoid needing to worry about those big hard central problems.
So you have basically two options, corresponding to The Very General Helper and The One Who Actually Thought This Through A Bit in the parable above.
- The Very General Helper strategy: find a way to contribute which will be helpful no matter how the big hard central problems are solved. Remember that big hard problems are often solved in Weird Ways totally orthogonal to anything you ever considered, so your contribution had better be very robustly generalizably valuable.
- The One Who Actually Thought This Through A Bit strategy: think through the big hard central problems enough to find some predictable aspect of them, some particular subproblem which will come up no matter how the big hard central problems are solved. Again, remember that big hard problems are often solved in Weird Ways totally orthogonal to anything you ever considered, so you better be very sure your subproblem will robustly and generalizably show up as a subproblem of any solution to the big hard central problems.
Notice the theme in both of these: robust generalization. If you do not choose robustly generalizable subproblems and find robustly generalizable solutions to them, then most likely, your contribution will not be small; it will be completely worthless. Once the bottleneck problems are sorted out, the problem will predictably look very different, because that’s what happens when a bottleneck is handled; your subproblem and solution need to generalize to that very different scenario in order to actually be useful.
[1] Ok, fine, it's AI alignment/safety.
12 comments
comment by ryan_greenblatt · 2025-02-08T00:48:20.933Z
This post seems to assume that research fields have big, hard central problems that are solved with some specific technique or paradigm.
This isn't always true. Many fields have the property that most of the work is on making small components work slightly better in ways that are very interoperable and don't have complex interactions. For instance, consider the case of making AIs more capable in the current paradigm. There are many different subcomponents which are mostly independent and interact mostly multiplicatively:
- Better training data: This is extremely independent: finding some source of better data or better data filtering can be basically arbitrarily combined with other work on constructing better training data. That's not to say this parallelizes perfectly (given that work on filtering or curation might obsolete some prior piece of work), but just to say that marginal work can often just myopically improve performance.
- Better architectures: This breaks down into a large number of mostly independent categories that typically don't interact non-trivially:
- All of attention, MLPs, and positional embeddings can be worked on independently.
- A bunch of hyperparameters can be better understood in parallel
- Better optimizers and regularization (often insights within a given optimizer like AdamW can be mixed into other optimizers)
- Often larger scale changes (e.g., mamba) can incorporate many or most components from prior architectures.
- Better optimized kernels / code
- Better hardware
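As a rough numerical sketch of what "interact mostly multiplicatively" means here (the improvement factors below are invented for illustration): independent gains that each scale effective performance compose by multiplication, so each component can be improved largely in isolation.

```python
# Invented multipliers for independent lines of work; the point is only that
# they compose multiplicatively rather than competing for the same gains.
gains = {
    "better data filtering": 1.15,
    "architecture tweak": 1.08,
    "tuned hyperparameters": 1.05,
    "faster kernels": 1.10,
}

combined = 1.0
for component, factor in gains.items():
    combined *= factor

print(f"combined multiplier: {combined:.2f}x")  # ~1.43x
```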
Other examples of fields like this include: medicine, mechanical engineering, education, SAT solving, and computer chess.
I agree that paradigm shifts can invalidate large amounts of prior work (and this has occurred at some point in each of the fields I list above), but it isn't obvious whether this will occur in AI safety prior to human obsolescence. In many fields, this doesn't occur very often.
↑ comment by johnswentworth · 2025-02-08T01:04:47.685Z
> This post seems to assume that research fields have big hard central problems that are solved with some specific technique or paradigm.
> This isn't always true. [...]
I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a "big hard central problem", and progress on the smaller subproblems of the current paradigm is unlikely by-default to generalize to whatever new paradigm solves that big hard central problem.
> I agree that paradigm shifts can invalidate large amounts of prior work, but it isn't obvious whether this will occur in AI safety.
I claim it is extremely obvious and very overdetermined that this will occur in AI safety sometime between now and superintelligence. The question which you'd probably find more cruxy is not whether, but when - in particular, does it come before or after AI takes over most of the research?
... but (I claim) that shouldn't be the cruxy question, because we should not be imagining completely handing off the entire alignment-of-superintelligence problem to early transformative AI; that's a recipe for slop. We ourselves need to understand a lot about how things will generalize beyond the current paradigm, in order to recognize when that early transformative AI is itself producing research which will generalize beyond the current paradigm, in the process of figuring out how to align superintelligence. If an AI assistant produces alignment research which looks good to a human user, but won't generalize across the paradigm shifts between here and superintelligence, then that's a very plausible way for us to die.
↑ comment by ryan_greenblatt · 2025-02-08T01:19:51.333Z
> I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
Maybe, but it is interesting to note that:
- A majority of productive work is occurring on small subproblems even if some previous paradigm change was required for this.
- For many fields (e.g., deep learning), many people didn't recognize (and potentially still don't recognize!) that the big hard central problem was already solved. This implies it might be non-obvious whether the central problem has been solved, and that making bets on some existing paradigm which doesn't obviously suffice can be reasonable.
Things feel more continuous to me than your model suggests.
> And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a "big hard central problem", and progress on the smaller subproblems of the current paradigm is unlikely by-default to generalize to whatever new paradigm solves that big hard central problem.
It doesn't seem clear to me this is true in AI safety at all, at least for non-worst-case AI safety.
> I claim it is extremely obvious and very overdetermined that this will occur in AI safety sometime between now and superintelligence.
Yes, I added "prior to human obsolescence" (which is what I meant).
Depending on what you mean by "superintelligence", this isn't at all obvious to me. It's not clear to me we'll have (or will "need") new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks. Doing this handover doesn't directly require understanding whether the AI is specifically producing alignment work that generalizes. For instance, the system might pursue routes other than alignment work, and we might determine its judgement/taste/epistemics/etc. are good enough based on examining things other than alignment research intended to generalize beyond the current paradigm.
If by superintelligence, you mean wildly superhuman AI, it remains non-obvious to me that new paradigms are needed (though I agree they will pretty likely arise prior to this point due to AIs doing vast quantities of research if nothing else). I think thoughtful and laborious implementation of current paradigm strategies (including substantial experimentation) could directly reduce risk from handing off to superintelligence down to perhaps 25% and I could imagine being argued considerably lower.
↑ comment by johnswentworth · 2025-02-08T01:43:09.856Z
> If by superintelligence, you mean wildly superhuman AI, it remains non-obvious to me that new paradigms are needed (though I agree they will pretty likely arise prior to this point due to AIs doing vast quantities of research if nothing else). I think thoughtful and laborious implementation of current paradigm strategies (including substantial experimentation) could directly reduce risk from handing off to superintelligence down to perhaps 25% and I could imagine being argued considerably lower.
I find it hard to imagine such a thing being at all plausible. Are you imagining that jupiter brains will be running neural nets? That their internal calculations will all be differentiable? That they'll be using strings of human natural language internally? I'm having trouble coming up with any "alignment" technique of today which would plausibly generalize to far superintelligence. What are you picturing?
↑ comment by ryan_greenblatt · 2025-02-08T02:27:20.882Z
I think you might first reach wildly superhuman AI via scaling up some sort of machine learning (and most of that is something well described as deep learning). Note that I said "needed". So, I would also count it as acceptable to build the AI with deep learning to allow for current tools to be applied even if something else would be more competitive.
(Note that I was responding to "between now and superintelligence", not claiming that this would generalize to all superintelligences built in the future.)
I agree that literal jupiter brains will very likely be built using something totally different than machine learning.
↑ comment by johnswentworth · 2025-02-08T02:46:51.977Z
Yeah ok. Seems very unlikely to actually happen, and I'm unsure whether it would even work in principle (as e.g. scaling might not take you there at all, or might become more resource-intensive faster than the AIs can produce more resources). But I buy that someone could try to intentionally push today's methods (both AI and alignment) to far superintelligence and simply turn down any opportunity to change paradigm.
↑ comment by johnswentworth · 2025-02-08T01:45:19.276Z
> It's not clear to me we'll have (or will "need") new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks.
If you want to not die to slop, then "fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks" is not a thing which happens at all until the full superintelligence alignment problem is solved. That is how you die to slop.
↑ comment by ryan_greenblatt · 2025-02-08T02:24:58.482Z
"fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks"
Suppose we replace "AIs" with "aliens" (or even, some other group of humans). Do you agree that doesn't (necessarily) kill you due to slop if you don't have a full solution to the superintelligence alignment problem?
↑ comment by johnswentworth · 2025-02-08T02:43:39.069Z
Aliens kill you due to slop; humans depend on the details.
The basic issue here is that the problem of slop (i.e. outputs which look fine upon shallow review but aren't fine), plus the problem of aligning a parent-AI in such a way that its more-powerful descendants will robustly remain aligned, is already the core of the superintelligence alignment problem. You need to handle those problems in order to safely do the handoff, and at that point the core hard problems are done anyway. Same still applies to aliens: in order to safely do the handoff, you need to handle the "slop/nonslop is hard to verify" problem, and you need to handle the "make sure agents the aliens build will also be aligned, and their children, etc." problem.
comment by Martin Randall (martin-randall) · 2025-02-08T03:08:59.034Z
> The fourth friend, Becky the Backward Chainer, started from their hotel in LA and...
Well, no. She started at home with a telephone directory. A directory seems intelligent but is actually a giant look-up table. It gave her the hotel phone number. Ring ring.
Heidi the Hotel Receptionist: Hello?
Becky: Hi, we have a reservation for tomorrow evening. I'm back-chaining here, what's the last thing we'll do before arriving?
Heidi: It's traditional to walk in through the doors to reception. You could park on the street, or we have a parking lot that's a dollar a night. That sounds cheap but it's not because we're in the past. Would you like to reserve a spot?
Becky: Yes please, we're in the past so our car's easy to break into. What's the best way to drive to the parking lot, and what's the best way to get from the parking lot to reception?
Heidi: We have signs from the parking lot to reception. Which way are you driving in from?
Becky: Ah, I don't know, Alice is taking care of that, and she's stepped out to get more string.
Heidi: Oh, sure, can't plan a car trip without string. In the future we'll have pet nanotech spiders that can make string for us, road trips will never be the same. Anyway, you'll probably be coming in via Highway 101, or maybe via the I-5, so give us a buzz when you know.
Becky: Sorry, I'm actually calling from an analogy, so we're planning everything in parallel.
Heidi: No worries, I get stuck in thought experiments all the time. Yesterday my friend opened a box and got a million dollars, no joke. Look, get something to take notes and I'll give you directions from the three main ways you could be coming in.
Becky: Ack! Hang on while I...
Gerald the General Helper: Here's a pen, Becky.
Trevor the Clever: Get off the phone! I need to call a gas station!
Susan the Subproblem Solver: Alice, I found some string and.... Hey, where's Alice?
comment by Ruby · 2025-02-08T03:04:40.801Z
This doesn't seem right. Suppose there are two main candidates for how to get there, I-5 and J-6 (but who knows, maybe we'll be surprised by a K-7), and I don't know which Alice will choose. If I know there's already a Very General Helper and a Kinda Decent Generalizer, then I might say "I assign 65% chance that Alice is going to choose the I-5 and will try to contribute having conditioned on that". This seems like a reasonable thing to do. It might be for naught, but I'd guess in many cases the EV of something definitely helpful if we go down Route A is better than the EV of finding something that's helpful no matter the choice.
One should definitely track the major route they're betting on and make updates and maybe switch, but it seems okay to say your plan is conditioned on some bigger plan.
↑ comment by johnswentworth · 2025-02-08T03:12:36.783Z
Yup, if you actually have enough knowledge to narrow it down to e.g. a 65% chance of one particular major route, then you're good. The challenging case is when you have no idea what the options even are for the major route, and the possibility space is huge.