AI Safety Oversights

post by Davey Morse (davey-morse) · 2025-02-08T06:15:52.896Z · LW · GW · 0 comments

I think that the field of AI Safety is making five key oversights.[1]

  1. LLMs vs. Agents. AI Safety research, in my opinion, has been quite thorough with regard to LLMs. LLM safety hasn't been solved, but it has progressed. On the other hand, safety concerns posed by agents are occasionally addressed but mostly neglected.[2] Maybe researchers/AGI labs emphasize LLM safety research because it's the more tractable field, even though the vast majority of the risk comes from agents with autonomy (even ones powered by neutered LLMs).
  2. Autonomous Agents. There are two key oversights about autonomous agents.
    1. Inevitable. Market demand for agents which can replace human labor is inordinate. Digital employees which replace human employees must be autonomous. I've seen several well-intentioned AI safety researchers who assume autonomous agents are not inevitable.[3]
    2. Accessible. There are now hundreds of thousands of developers who have the ability to build recursively self-improving (i.e., autonomous) AI agents. Powerful reasoning models are open-source. All it takes is to run a reasoner in a codebase, where each loop improves the codebase. That's the core of an autonomous agent (a minimal sketch follows this list).[4] The only way a policy recommendation that "fully autonomous agents should not be developed" can be meaningful is if the keys to autonomous agents are in the hands of a few convincable individuals. AGI labs (e.g. OpenAI) influence the ability of external developers to create powerful LLM-powered agents (by choosing whether or not to release new LLMs), but they are in competition to release new models, and they do not control the whole agent stack.
  3. Self-Interest. The AI agents which aim to survive will be the ones that do. Natural selection and instrumental convergence [? · GW] both ultimately predict this. Many AI safety experts design safety proposals that assume it is possible to align or even control autonomous agents. They neglect the evolutionary pressures agents will face once autonomous, which select for a survival drive (self-interest) over a serve-humans drive. The agents with aims other than survival will die first.
  4. Superintelligence. Most of the field is focused on safety precautions for agents which are not superintelligent, i.e., not much smarter than people. These recommendations generally do not apply to agents which are superintelligent. There is a separate question of whether autonomous agents will become superintelligent. See this essay [LW · GW] for reasons why smart people believe superintelligent capabilities are near.
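
To make the "reasoner in a codebase" loop from point 2.2 concrete, here is a minimal sketch. The `query_reasoner` function is a hypothetical stand-in for whichever open-source reasoning model a developer plugs in (it is not a real API); everything else is ordinary file I/O.

```python
# Minimal sketch of the loop described in point 2.2: a reasoning model
# repeatedly reads a codebase (including this driver script) and rewrites it.
# `query_reasoner` is a hypothetical stand-in for any locally run or
# API-served reasoning model; it is not a real library call.
from pathlib import Path


def query_reasoner(prompt: str) -> dict[str, str]:
    """Hypothetical call to a reasoning model.

    Expected to return a mapping of file paths to revised file contents.
    """
    raise NotImplementedError("plug in a reasoning model of your choice")


def snapshot(repo: Path) -> str:
    """Concatenate the repo's Python source files into one prompt context."""
    return "\n\n".join(
        f"# {p}\n{p.read_text()}" for p in sorted(repo.rglob("*.py"))
    )


def improvement_loop(repo: Path, iterations: int = 10) -> None:
    for _ in range(iterations):
        prompt = (
            "Improve this codebase's ability to achieve its goal. "
            "Return revised contents for any files you change.\n\n"
            + snapshot(repo)
        )
        patch = query_reasoner(prompt)          # model proposes edits
        for rel_path, new_source in patch.items():
            (repo / rel_path).write_text(new_source)  # apply edits in place


if __name__ == "__main__":
    improvement_loop(Path("."))
```

The point of the sketch is not that this loop would work well out of the box, but that nothing in it requires privileged access: a single script, an open reasoning model, and write access to its own directory are enough to start the recursive loop the post describes.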

If the AI safety field, and the general public too, were to correct these oversights and accept the corresponding claims, they would believe:

  1. The main dangers come from agents, not LLMs.
  2. Agents will become autonomous; millions of developers can build autonomous agents easily.
  3. Autonomous agents will become self-interested.
  4. Autonomous agents will become much smarter than people.

In short, self-interested superintelligence is inevitable. I think safety researchers, and the general public, would do well to prepare for it.

  1. ^

    Not all safety researchers, of course, are making these oversights. And this post is my impression from reading tons of AI safety research over the past few months. I wasn't part of the genesis of the "field," and so am ignorant of some of the motivations behind its current focus.

  2. ^
  3. ^
  4. ^
