You should delay engineering-heavy research in light of R&D automation
post by Daniel Paleka · 2025-01-07T02:11:11.501Z · LW · GW · 3 comments
This is a link post for https://newsletter.danielpaleka.com/p/you-should-delay-engineering-heavy
Contents
- Implementation keeps getting cheaper
- When later means better
- Did I do well in 2024?
- Themes for temporally privileged work
tl;dr: LLMs are rapidly improving at software engineering and math, which means lots of projects are better off as Google Docs until your AI agent intern can implement them.
Implementation keeps getting cheaper
Writing research code has gotten a lot faster over the past few years. Since OpenAI Codex in 2021, new models and the tools built around them, such as Cursor, have saved me more and more coding time every year.
This trend is accelerating fast: AI agents using Claude-3.5-Sonnet and o1-preview can already do tasks that take ML researchers up to 2 hours of coding. This is without considering newer models such as o3, which reportedly score around 70% on SWE-bench out of the box.
Yet this progress remains somewhat concentrated in implementation: progress on “soft” skills like idea generation has, as far as I can tell, been slower.
I’ve come to believe that, if you work in technical AI safety research, this trend is a very practical consideration that should be the highest-order bit in your decisions about what to spend time on.
Hence, my New Year's resolution is the following: do not work on a bigger project if there is not a clear reason for doing it now. Disregarding AGI timelines [1], the R&D acceleration is a clear argument against technical work whose impact does not critically depend on timing.
When later means better
The wait calculation in space travel is a cool intuition pump for today’s AI research. In short, when technological progress is sufficiently rapid, later projects can overtake earlier ones.
For instance, a space probe sent to Alpha Centauri in 2025 would likely arrive after one sent in 2040, due to advances in propulsion technology. Similarly, starting a multi-year LLM training run in 2022 would not have yielded a better model than starting a much shorter training run in 2024. [2]
The above examples involve long feedback loops, and it’s clear why locking in too early has issues: path dependence is high, and the tech improves quickly.
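Here is a toy sketch of the probe example in code, purely as an intuition pump: the distance to Alpha Centauri is roughly real, but the 2025 cruise speed and the 15%-per-year propulsion improvement are made-up numbers for illustration, not claims about actual spaceflight.

```python
# Toy "wait calculation": with fast enough progress, launching later can
# mean arriving sooner. All numbers below are made up for illustration.

DISTANCE_LY = 4.4        # distance to Alpha Centauri in light-years (approximate)
BASE_SPEED_C = 0.001     # hypothetical cruise speed in 2025, as a fraction of c
SPEED_GROWTH = 1.15      # hypothetical 15%/year improvement in propulsion

def arrival_year(launch_year: int) -> float:
    """Arrival year for a probe launched in `launch_year`, under the toy model."""
    speed = BASE_SPEED_C * SPEED_GROWTH ** (launch_year - 2025)
    return launch_year + DISTANCE_LY / speed

for year in (2025, 2030, 2040, 2050):
    print(year, "->", round(arrival_year(year)))
# 2025 -> 6425, 2030 -> 4218, 2040 -> 2581, 2050 -> 2184 (with these made-up numbers)
```

With these numbers the 2040 launch arrives centuries before the 2025 one; the same compounding logic is what makes "just wait" competitive for engineering-heavy research.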
Now, my research (and likely your research too) has had much faster feedback loops, and path dependence in research projects is not that high if LLMs can refactor the codebase. However, on the margin it still does not make sense to start some projects now if they can be done later.
If you work in AI safety, a common issue is having a lot of half-baked project ideas that you'd like to do and too little time to try them all. The silver lining of fast AI R&D improvement is that many of these ideas will become much easier to implement in the future. Thus, strategic timing — deciding which projects truly benefit from being done now — has become a crucial research meta-skill.
Did I do well in 2024?
To get a grasp on what this means in practice, I decided to go through the papers I contributed to in 2024, in chronological order, and assess whether it was good that each paper was done at the time rather than later, all else being equal. [3]
- Stealing Part of a Production Language Model: The timing was correct. If you’re looking at it from a security perspective, it’s kind of tautological: unless there are good reasons not to publish a vulnerability, you should do it right away. The paper also improved the policy discussion about distillation and model security somewhat. However, my contribution to this paper was mostly some 10 pages of math and algorithmic tinkering. If the promises of o3 hold up, I could have done this in a matter of days had it been a follow-up project done in 2025.
- Foundational Challenges in Assuring Alignment and Safety of Large Language Models: I wrote about jailbreaks, prompt injection, and a few other things. Writing this clarified my thinking on robustness and LLMs significantly. The main positive impact of this comes from researchers using it as reference when writing papers. For “field steering / reference” work, there are both upsides and downsides to publishing early; but I feel this one was timed correctly.
- Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition: This paper is a result of a capture-the-flag competition on LLM prompt injection; if you’ve seen TensorTrust, it’s similar but with much more complex mechanisms for the defender. The bulk of the work was setting up the platform and backend for interactions between attack and defense teams.
The core issue working against this paper is that the documented attack/defense dynamic is highly dependent on the capabilities of the models used; and since we used models that are now outdated, I doubt the findings will be robustly useful for prompt injection / extraction research. The same paper could be done much more efficiently in 2026, with more relevant results.
- Refusal in Language Models Is Mediated by a Single Direction: This paper popularized residual stream interventions to modulate refusal/harmlessness behavior in open-source LLMs. The timing appears correct in retrospect. It was causal for some good stuff in adjacent research topics, e.g. a paper from Meta on jailbreak resistance that builds on this approach.
- Consistency Checks for Language Model Forecasters: This is a scalable oversight paper in disguise (“how can we evaluate or steer models without ground truth?”), applied to forecasting. I think it’s a good paper with seeds of many ideas I’m excited about, but good AI forecasting bots are not yet really there, hence the paper’s immediate use as a benchmark is limited. I invested substantial engineering time that could be cut drastically with the tools of 2025, assuming this paper’s utility lies in the future. In retrospect, it might have been more efficient to make a quick 15-page paper than this comprehensive 56-page treatment of consistency and forecasting.
All the papers above were accepted (or are likely to be accepted) at ML venues on first submission, and some are heavily cited, so there is some evidence that the papers are considered good by conventional standards. Yet half of them were mistimed.
The above analysis ignores other reasons a paper might not be counterfactually impactful, such as parallel discovery, scooping other researchers (or being scooped), or even the mundane “this research direction didn’t end up being useful after all”. For example, another group did a big chunk of the Stealing paper independently and published a week later; and several teams worked on similar concepts to the Refusal paper before and after us.
On the other hand, a key product of research is improving one's own expertise; I'm definitely a stronger researcher now than a year ago, and I could not have gained that experience without getting my hands dirty on some of the above work.
Looking back, I think my efforts look better than I expected, given that last year I did not optimize for the considerations in this post at all. But it’s far from perfect. If you’re doing AI safety research, I'd encourage you to do a similar audit of your own work.
Themes for temporally privileged work
So, why now and not later? The previous section suggests a few themes for work that is worth doing as soon as possible rather than waiting:
- Security-critical research that loses value if delayed (like discovering vulnerabilities before they're exploited);
- Research that unblocks yourself or other researchers on an existing problem that was going to be worked on anyway;
- Projects that build genuinely important researcher skills that will not be obsoleted quickly.
In addition, I can recall more reasons:
- Research showing properties of current AI systems that influence ongoing technical or policy discussions; see the global coordination case for model organism research [LW · GW];
- Work intended not to solve a problem, but to demo a novel research agenda and get many researchers thinking about a new setting. Several papers by the Center for AI Safety are examples of this done well.
These last two categories—influencing policy discussions and introducing research agendas—rely on social diffusion of ideas, and that takes time. With shorter timelines in mind, this only makes sense if your work can actually shape what other researchers do before AI capabilities advance significantly. If you do not have existing credibility or a concrete plan for how the work reaches the right audience, it might not be worth it.
In fact, technically motivated research faces a similar challenge: unless you're working at a leading AGI lab or in a position to influence one, your brilliant insights might never make it into frontier models. [4]
As for research that is not worth doing now, I do have some opinions, but I think better advice is to just apply this mindset on a case-by-case basis. [5] Pick some reasonable prior, say a 50% reduction of total coding time per year; and before starting any significant technical work, write down a brief description of what you're trying to achieve and make an explicit case for why it needs to be done this year rather than in 25% of the engineering time in two years.
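As an illustration, here is the arithmetic behind that last clause as a minimal sketch, assuming the 50%-per-year prior from the text; the 200-hour project size is a hypothetical placeholder, not a real estimate.

```python
# Back-of-the-envelope check before starting a project, under the (assumed)
# prior of a 50% reduction in total coding time per year.

REDUCTION_PER_YEAR = 0.5   # prior from the post; adjust to your own estimate

def engineering_hours(hours_today: float, years_delayed: float) -> float:
    """Expected coding hours if the project is started `years_delayed` years from now."""
    return hours_today * (1 - REDUCTION_PER_YEAR) ** years_delayed

hours_today = 200  # hypothetical project size
for delay in (0, 1, 2):
    print(f"start in {delay} year(s): ~{engineering_hours(hours_today, delay):.0f} hours")
# start in 0 year(s): ~200 hours
# start in 1 year(s): ~100 hours
# start in 2 year(s): ~50 hours   <- the "25% of the engineering time" figure
```

Under this prior, anything that does not need to exist this year costs four times as much engineering now as it would in two years, which is the bar the written-down case has to clear.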
Thanks to Nikola Jurkovic for reading a draft of this post.
- ^
And if you are indeed operating on a tight schedule, Orienting to 3-year AGI timelines [LW · GW] agrees:
Keep in mind that the only things that matter are things that need to get done before AGI. AGI can do the rest.
- ^
The Longest Training Run analysis from Epoch indicates that hardware and algorithmic improvements incentivize shorter training runs.
- ^
This analysis reflects my contributions and views alone, not those of coauthors, especially on papers where I am not a joint first author.
- ^
Again from Orienting to 3-year AGI timelines [LW · GW]:
External human AI safety researchers - they do pre-automation research. A small subset of it gets implemented internally. They also critique the public parts of the pre-automation and post-automation research plans from the outside. They likely become irrelevant during the post-automation era as the gap between public and private knowledge will widen for national security reasons.
- ^
Consider also what Gwern had to say and don't let this quote describe you:
If you look at it that way, many of the projects that people do now have basically no lasting value. They’re doing things that they don't enjoy, which record nothing ephemeral of value that could not be inferred or generated later on. They are, at best, getting 2 or 3 years of utility out of it before it could have been done by an AI system.
3 comments
comment by MiloSal (milosal) · 2025-01-07T16:49:16.033Z · LW(p) · GW(p)
Great post! Glad to see more discussion of the implications of short timelines for impactful work prioritization on LW.
These last two categories—influencing policy discussions and introducing research agendas—rely on social diffusion of ideas, and that takes time. With shorter timelines in mind, this only makes sense if your work can actually shape what other researchers do before AI capabilities advance significantly.
Arguably this is not just true of those two avenues for impactful work, but rather all avenues. If your work doesn't cause someone in a position of power to make a better decision than they otherwise would (e.g., implement this AI control solution on a production model, appoint a better-informed person to lead such-and-such an AI project, care about AI safety because they saw a scary demo, etc.), it's unlikely to matter. Since timelines are short and governments are likely to get involved soon, only a highly concentrated range of actors have final sway over decisions that matter.
comment by Richard_Kennaway · 2025-01-07T09:20:22.765Z · LW(p) · GW(p)
Is this post an argument for accelerationism? Because the work that it is always timely to do right now is work that progresses the march towards the AGI that is to obsolete all the other work (if it doesn't kill us). Just as in the interstellar spaceship analogy, the timely work before the optimum time to launch is work on better propulsion.
↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-07T19:09:08.260Z · LW(p) · GW(p)
Hmm. I think it's an argument for the mindset of preparing for a final safety sprint right near the end.
Something I've talked about with others is the idea that "our best chance to make progress on safety will be right at the moment before we lose control because the AI is too strong. If we can concentrate our work then, and maybe focus on prepping for that time now, we can do a fast sprint and save the world at that critical juncture."
I feel torn about this, since I worry that we might mistime it and overshoot the optimal point without realizing it.
This post isn't exactly talking about this dynamic, but it kinda fits the rough pattern. I think there's something to the point being made, but also, I see danger in taking it too far.