Impact in AI Safety Now Requires Specific Strategic Insight

post by MiloSal (milosal) · 2024-12-29T00:40:53.780Z · LW · GW · 1 comments

This is a link post for https://ameliorology.substack.com/p/impact-in-ai-safety-now-requires

Contents

  The strategy ball
  Why untargeted career plans fail
    Untargeted technical safety research fails to be impactful
    Untargeted policy and governance work fails to be impactful
    General problems with untargeted planning
  Conclusion

TL;DR: Short AGI timelines, the likelihood of a nationalized AGI project, and the implications thereof mean you should reevaluate your impact plan with these specific considerations in mind. Most AI safety career plans formed without these considerations will not be impactful.

 

The most widespread justification for participation in the Manhattan Project—for those in the know—was to beat the Nazis to the bomb.

One Polish-British scientist, Joseph Rotblat, was among those who had foreseen this strategic priority: “he thought the only way to prevent Nazi Germany from using a nuclear bomb was if Britain had one to act as a deterrent.” He joined the British mission to Los Alamos. It was a strategic motivation he would reevaluate over the course of the war.

At the close of 1944, when it became clear that the Nazis were not, in fact, going to succeed in producing a bomb, he left the Project on ethical grounds and committed much of the rest of his life to advocating for nuclear disarmament. The Nazi surrender in May 1945 severed the final connection between most scientists and their justification for working on the Project. Yet few followed in Rotblat’s footsteps. Most saw development through to its gruesome end—and beyond.

Rotblat’s open-minded pursuit of truth, over and above the biases of the status quo, prestige, and self-importance, led him to a great moral insight. If you are pursuing or established in an AI safety career, you should now reevaluate your impact plan. By doing so, you can reach an equally great strategic insight.

The strategic considerations that led to your existing impact plan are probably outdated given recent evidence that the gap between today’s AIs and an automated AI engineer is not great. A number of further implications should weaken any prior reasoning as well. I’ll spend little time arguing for short timelines here—the skeptical reader should consider reading the top answer here [LW · GW]. I’ll also spend little time arguing for the relevant implications, in particular centralization/nationalization of AGI development in the US, though you can read my thoughts on that here.

Instead, I’ll spend most of this post arguing that AI safety career plans that do not operate under the above considerations are unlikely to be impactful.

The strategy ball

There is an interrelated set of beliefs about the way the future is likely to go that I like to call “the strategy ball.” These views basically come out of short timelines to automated AI R&D, a serious (though in places critical) reading of “Situational Awareness,” and a bit of thinking about implications of actually taking this future seriously. The strategy ball I’m envisioning can be roughly summarized as the following:

If we take the strategy ball seriously—and I think we should—the most promising career plans (really “impact plans,” since 4 years is hardly a career) are starkly different from many that feel like accepted wisdom in AI safety circles. The strategy ball makes it much less likely for you to stumble into impact by roughly aiming towards AI safety. To have an impact, you must plan carefully in accordance with the strategy ball. 

Why untargeted career plans fail

A lot of people planning AI safety careers that they hope will have an impact don’t do a whole lot of planning at all. Most people think a little bit about what they’re good at, what sounds interesting, and what conventional opportunities they’re likely to get, and then choose a well-established (i.e., titled) career path. Think “interpretability researcher,” “AI safety field-builder,” or “AI policy researcher.” 

Although we should expect plans like these to be far from optimal from an impact perspective, their effect is hardly negligible. AI safety as a field is really just a nebulous concoction of these people, who make significant overall contributions. 

These career plans, and others that might include more strategizing but do not consider the strategy ball, are untargeted. Untargeted career plans are unlikely to be impactful. Across the most common buckets, strategy ball considerations always dominate. Plans that whiff the ball strike out. 

Untargeted technical safety research fails to be impactful

Technical safety research career plans fail in a number of ways. For one, many future technical researchers expect to have time to skill up, do a PhD, or “build career capital” at intermediate jobs. The ball says there isn’t time. 

Other technical safety researchers realize that positions at AGI companies are absurdly competitive and want to do independent or small-organization work instead. Unfortunately, it’s already hard to imagine work performed outside of AI companies being incorporated into frontier systems. Once a nationalization happens, such incorporation won’t happen at all.

The suite of research agendas in technical AI safety is wide. Given 2028 (median) timelines and rapid AI R&D automation, alignment of powerful AIs is unlikely to go through any avenue other than safety research automation. The question for research agendas, then, is: What needs to be done to ensure useful, safe alignment work can be elicited from 2028 AIs? Work that doesn’t fall under this narrow criterion—in some sense the scope of the AI control agenda—is unlikely to matter. 

For example, interpretability research that isn’t aimed at control-facing monitoring could simply be produced safely in the first few weeks of 2028 AI automation. One notable exception is model organisms/scary demos work, which may be useful for increasing the safety focus of a nationalized project or accelerating its creation (whether this is good is unclear), though this is arguably more technical governance or technical communications work.

Untargeted policy and governance work fails to be impactful

There are policy careers external to government and ones internal to it. Most external career plans focus on doing important think tank-style research and communications. However, most categories of this research miss the mark, too. For one, any project that focuses on private-sector regulation or policy is highly unlikely to be useful, given that nationalization will likely occur and end all frontier private-sector competition. Indeed, most projects advocated for by researchers attempting to influence government from the outside fall into obsolescence given the strategy ball.

The projects most likely to be impactful from the outside are those in three categories:

Policy careers internal to government are surely among the most impactful that exist under the strategy ball assumptions. If you end up in one of these (rare and difficult) positions, it’s critical that you focus on completing projects and influencing decisionmakers that will be relevant in the next three years, not the next ten. You should also avoid focusing on private-sector regulation, for reasons discussed above. Position yourself well to be relevant when a nationalization happens.

Note also that there is a wide range of different positions that might be seen as relevant for default AI safety plans. The relative impact potential in each type of position changes drastically under strategy ball considerations. For instance, working in an agency that is likely to have a hand in nationalization (such as the Executive Office of the President, Department of Defense, or Department of State) is particularly high leverage. You might want to deprioritize Congressional offices that have historically focused on private-sector regulation of AI.[2]

General problems with untargeted planning

A lot of planning that goes on without reference back to the strategy ball goes wrong in general. For instance, many AI safety career plans route through well-established, traditional intermediate career steps. Given short timelines, it’s worth thinking about ways to circumvent intermediate steps.

In addition, planners who expect to have time for flexibility (say, more than 6 years) often neglect to consider a wide action space. Traditional paths look way less promising given short timelines and massive strategic upheaval. Instead, you should consider more exotic or wacky plans (e.g., AI safety activism or social media influencing).

Because short timelines mean everyone has less time to pivot into careers that require significant training or expertise, you should probe your skillset (along many dimensions) and situation (e.g., student at elite university, child of politician, etc.) for opportunities where you have unique leverage. All of this is especially true for young people, who constitute most of AI safety. (See this post [LW · GW] for more advice on planning under short timelines.)

Conclusion

One of the great challenges of our time is taking an absurd, farcical, consistently shocking world seriously. If you are planning a career in AI safety, you’ve likely already penetrated one veneer—the one that says everything is fine. In the recent past, I might have told you that this realization was enough to have a huge positive impact on the trajectory of the world. Unfortunately, another massive upheaval has thrown a wrench in those plans.

To meet the new strategic landscape—to take a real crack at the strategy ball—requires a new round of planning, with all the unprecedented and unexpected craziness priced in. Don’t expect your old plans to lead you to impact. They probably won’t. 

Instead, take a cue from Ivanka Trump, crack open “Situational Awareness,” and plan for a mad, mad, mad, mad world.

  1. ^

    Perhaps the most effective way to see this is to play a short-timelines AGI wargame with (well-informed) friends. Doing this, I found it almost comical how irrelevant most actors feel once things start heating up. 

  2. ^

    Though this is far from a clear recommendation. For one, the congresspeople currently focused on AI, whether or not that focus is untargeted, might be the most likely to take strategy ball-type considerations seriously in the future.

1 comment


comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-12-29T02:03:45.966Z · LW(p) · GW(p)

I've been arguing this for the past couple years. My bet is on AI R&D automation by mid 2026.