Impact in AI Safety Now Requires Specific Strategic Insight

post by MiloSal (milosal) · 2024-12-29T00:40:53.780Z · LW · GW · 1 comments

This is a link post for https://ameliorology.substack.com/p/impact-in-ai-safety-now-requires

Contents

  The strategy ball
  Why untargeted career plans fail
    Untargeted technical safety research fails to be impactful
    Untargeted policy and governance work fails to be impactful
    General problems with untargeted planning
  Conclusion

TL;DR: Short AGI timelines, the likelihood of a nationalized AGI project, and the implications thereof mean you should reevaluate your impact plan with these specific considerations in mind. Most AI safety career plans formed without these considerations will not be impactful.

 

The most widespread justification for participation in the Manhattan Project—for those in the know—was to beat the Nazis to the bomb.

One Polish-British scientist, Joseph Rotblat, was among those who had foreseen this strategic priority: “he thought the only way to prevent Nazi Germany from using a nuclear bomb was if Britain had one to act as a deterrent.” He joined the British mission to Los Alamos. It was a strategic motivation he would reevaluate over the course of the war.

At the close of 1944, when it became clear that the Nazis were not, in fact, going to succeed in producing a bomb, he left the Project on ethical grounds and committed much of the rest of his life to advocating for nuclear disarmament. The Nazi surrender in May 1945 severed the final connection between most scientists and their justification for working on the Project. Yet few followed in Rotblat’s footsteps. Most saw development through to its gruesome end—and beyond.

Rotblat’s open-minded pursuit of truth, over and above the biases of the status quo, prestige, and self-importance, led him to a great moral insight. If you are pursuing or established in an AI safety career, you should now reevaluate your impact plan. By doing so, you can reach an equally great strategic insight.

The strategic considerations that led to your existing impact plan are probably outdated given recent evidence that the gap between today’s AIs and an automated AI engineer is not great. A number of further implications should weaken any prior reasoning as well. I’ll spend little time arguing for short timelines here—the skeptical reader should consider reading the top answer here [LW · GW]. I’ll also spend little time arguing for the relevant implications, in particular centralization/nationalization of AGI development in the US, though you can read my thoughts on that here.

Instead, I’ll spend most of this post arguing that AI safety career plans that do not operate under the above considerations are unlikely to be impactful.

The strategy ball

There is an interrelated set of beliefs about the way the future is likely to go that I like to call “the strategy ball.” These views basically come out of short timelines to automated AI R&D, a serious (though in places critical) reading of “Situational Awareness,” and a bit of thinking about implications of actually taking this future seriously. The strategy ball I’m envisioning can be roughly summarized as the following:

If we take the strategy ball seriously—and I think we should—the most promising career plans (really “impact plans,” since 4 years is hardly a career) are starkly different from many that feel like accepted wisdom in AI safety circles. The strategy ball makes it much less likely for you to stumble into impact by roughly aiming towards AI safety. To have an impact, you must plan carefully in accordance with the strategy ball. 

Why untargeted career plans fail

A lot of people planning AI safety careers that they hope will have an impact don’t do a whole lot of planning at all. Most people think a little bit about what they’re good at, what sounds interesting, and what conventional opportunities they’re likely to get, and then choose a well-established (i.e., titled) career path. Think “interpretability researcher,” “AI safety field-builder,” or “AI policy researcher.” 

Although we should expect plans like these to be far from optimal from an impact perspective, their effect is hardly negligible. AI safety as a field is really just a nebulous concoction of these people, who make significant overall contributions. 

These career plans, and others that might include more strategizing but do not consider the strategy ball, are untargeted. Untargeted career plans are unlikely to be impactful. Across the most common buckets, strategy ball considerations always dominate. Plans that whiff the ball strike out. 

Untargeted technical safety research fails to be impactful

Technical safety research career plans fail in a number of ways. For one, many future technical researchers expect to have time to skill up, do a PhD, or “build career capital” at intermediate jobs. The ball says there isn’t time. 

Other technical safety researchers realize that positions at AGI companies are absurdly competitive and want to do independent or small-organization work instead. Unfortunately, it’s already hard to imagine work performed outside of AI companies being incorporated into frontier systems. Once a nationalization happens, such incorporation won’t happen at all.

The suite of research agendas in technical AI safety is wide. Given 2028 (median) timelines and rapid AI R&D automation, alignment of powerful AIs is unlikely to go through any avenue other than safety research automation. The question for research agendas, then, is: What needs to be done to ensure useful, safe alignment work can be elicited from 2028 AIs? Work that doesn’t fall under this narrow criterion—in some sense the scope of the AI control agenda—is unlikely to matter. 

For example, interpretability research that isn’t aimed at control-facing monitoring could simply be produced safely in the first few weeks of 2028 AI automation. One notable exception is model organisms/scary demos work, which may be useful for increasing the safety focus of a nationalized project or accelerating its creation (whether this is good is unclear), though this is arguably more technical governance or technical communications work.

Untargeted policy and governance work fails to be impactful

There are policy careers external to government and ones internal to it. Most external career plans focus on doing important think tank-style research and communications. However, most categories of this research miss the mark, too. For one, any project that focuses on private-sector regulation or policy is highly unlikely to be useful, given that nationalization will likely occur and end all frontier private-sector competition. Indeed, most projects advocated for by researchers attempting to influence government from the outside fall into obsolescence given the strategy ball.

The projects most likely to be impactful from the outside are those in three categories:

Policy careers internal to government are surely among the most impactful that exist under the strategy ball assumptions. If you end up in one of these (rare and difficult) positions, it’s critical that you focus on completing projects and influencing decisionmakers that will be relevant in the next three years, not the next ten. You should also avoid focusing on private-sector regulation, for reasons discussed above. Position yourself well to be relevant when a nationalization happens.

Note also that there is a wide range of different positions that might be seen as relevant for default AI safety plans. The relative impact potential in each type of position changes drastically under strategy ball considerations. For instance, working in an agency that is likely to have a hand in nationalization (such as the Executive Office of the President, Department of Defense, or Department of State) is particularly high leverage. You might want to deprioritize Congressional offices that have historically focused on private-sector regulation of AI.[2]

General problems with untargeted planning

A lot of planning that goes on without reference back to the strategy ball goes wrong in general. For instance, many AI safety career plans route through well-established, traditional intermediate career steps. Given short timelines, it’s worth thinking about ways to circumvent intermediate steps.

In addition, planners who expect to have time for flexibility (say, more than 6 years) often neglect to consider a wide action space. Traditional paths look way less promising given short timelines and massive strategic upheaval. Instead, you should consider more exotic or wacky plans (e.g., AI safety activism or social media influencing).

Because short timelines mean everyone has less time to pivot into careers that require significant training or expertise, you should probe your skillset (along many dimensions) and situation (e.g., student at elite university, child of politician, etc.) for opportunities where you have unique leverage. All of this is especially true for young people, who constitute most of AI safety. (See this post [LW · GW] for more advice on planning under short timelines.)

Conclusion

One of the great challenges of our time is taking an absurd, farcical, consistently shocking world seriously. If you are planning a career in AI safety, you’ve likely already penetrated one veneer—the one that says everything is fine. In the recent past, I might have told you that this realization was enough to have a huge positive impact on the trajectory of the world. Unfortunately, another massive upheaval has thrown a wrench in those plans.

To meet the new strategic landscape—to take a real crack at the strategy ball—requires a new round of planning, with all the unprecedented and unexpected craziness priced in. Don’t expect your old plans to lead you to impact. They probably won’t. 

Instead, take a cue from Ivanka Trump, crack open “Situational Awareness,” and plan for a mad, mad, mad, mad world.

  1. ^

    Perhaps the most effective way to see this is to play a short-timelines AGI wargame with (well-informed) friends. Doing this, I found it almost comical how irrelevant most actors feel once things start heating up. 

  2. ^

    Though this is far from a clear recommendation. For one, the congresspeople currently focused on AI, whether or not that focus is untargeted, might be the most likely to take strategy ball-type considerations seriously in the future.

1 comment


comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-12-29T02:03:45.966Z · LW(p) · GW(p)

I've been arguing this for the past couple years. My bet is on AI R&D automation by mid 2026.