AI companies’ unmonitored internal AI use poses serious risks

post by sjadler · 2025-04-04T18:17:46.924Z · LW · GW · 2 comments

This is a link post for https://stevenadler.substack.com/p/ai-companies-unmonitored-internal

AI companies use their own models to secure the very systems the AI runs on. They should be monitoring those interactions for signs of deception and misbehavior.

AI companies use powerful unreleased models for important work, like writing security-critical code and analyzing results of safety tests. From my review of public materials, I worry that AI is largely unmonitored when AI companies use it internally today, despite its role in important work.

Right now, unless an AI company specifically says otherwise, you should assume they don't have visibility into all the ways their models are being used internally, or into the risks those uses could pose.

In many industries, using an unreleased product internally can't harm third-party outsiders. A car manufacturer might test-drive its new prototype around the company campus, but civilian pedestrians can't be hurt, no matter how many manufacturing defects the car has, because it is contained to the company's facilities.

The AI industry is different—more like biology labs testing new pathogens: The biolab must carefully monitor and control conditions like air pressure, or the pathogen might leak into the external world. Same goes for AI: If we don’t keep a watchful eye, the AI might be able to escape—for instance, by exploiting its role in shaping the security code meant to keep the AI locked inside the company’s computers.

The minimum improvement we should ask for is retroactive internal monitoring: AI companies logging all internal use of their frontier models, analyzing the logs (e.g., for signs of the AI engaging in scheming), and following up on concerning patterns.
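
As a rough illustration of what that could look like, here is a minimal sketch of retroactive monitoring: every internal model interaction gets appended to an audit log, and a separate batch pass scans the logs for records worth human follow-up. Everything in it is hypothetical (the log format, the function names, the keyword list), and a real analysis pass would more plausibly use a classifier or a monitor model rather than keyword matching.

```python
# Hypothetical sketch of retroactive internal monitoring. Nothing here is
# real company tooling: the log path, record fields, and keyword list are
# all illustrative assumptions.
import json
import time
from pathlib import Path

LOG_PATH = Path("internal_model_use.jsonl")

# Crude illustrative patterns; a real analysis pass would more likely use a
# classifier or a monitor model as the log reviewer, not keyword matching.
CONCERNING_PATTERNS = ["disable the monitor", "hide this from", "exfiltrate"]


def log_internal_use(user: str, purpose: str, prompt: str, response: str) -> None:
    """Append one internal model interaction to the audit log."""
    record = {
        "timestamp": time.time(),
        "user": user,
        "purpose": purpose,  # e.g. "security code review", "safety eval analysis"
        "prompt": prompt,
        "response": response,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")


def review_logs() -> list[dict]:
    """Batch pass over the logs; return records that warrant human follow-up."""
    flagged = []
    if not LOG_PATH.exists():
        return flagged
    with LOG_PATH.open() as f:
        for line in f:
            record = json.loads(line)
            text = (record["prompt"] + " " + record["response"]).lower()
            if any(pattern in text for pattern in CONCERNING_PATTERNS):
                flagged.append(record)
    return flagged
```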

An even stronger vision—likely necessary in the near future, as AI systems become more capable—is to wrap all internal uses of frontier AI within real-time control systems that can intervene on the most urgent interactions.
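
A real-time version might look roughly like the sketch below: internal callers never invoke the frontier model directly, and a wrapper screens each request and response so it can block or escalate before the output is returned or acted on. The screening function and model interface here are placeholders, not any company's actual API.

```python
# Hypothetical sketch of a real-time control layer. The screening logic and
# model interface are placeholders, not a real API.
from typing import Callable


def looks_risky(text: str) -> bool:
    """Placeholder screening step; in practice this might be a separate
    monitor model judging whether the interaction looks like scheming."""
    return "disable the monitor" in text.lower()


def controlled_call(model_call: Callable[[str], str], prompt: str) -> str:
    """Wrap an internal frontier-model call with pre- and post-screening."""
    if looks_risky(prompt):
        raise PermissionError("Request blocked pending human review.")
    response = model_call(prompt)
    if looks_risky(response):
        # Intervene before the output reaches code review, CI, deployment, etc.
        raise PermissionError("Response withheld pending human review.")
    return response


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        return f"[model output for: {prompt}]"

    print(controlled_call(fake_model, "Summarize today's safety eval results."))
```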

In this post, I’ll cover:

(If you aren’t sure what I mean by “monitoring”—logging, analysis, and enforcement—you might want to check out my explainer first.)

[ continues on Substack ]

2 comments

Comments sorted by top scores.

comment by ProgramCrafter (programcrafter) · 2025-04-06T09:55:03.739Z · LW(p) · GW(p)

The AI industry is different—more like biology labs testing new pathogens: The biolab must carefully monitor and control conditions like air pressure, or the pathogen might leak into the external world.

This looks like a cached analogy, because, given the previous paragraph, this would fit better: "the AI industry is different, more like testing planes: their components can be tested internally, but eventually the full plane must be assembled and flown, and then it can damage its surroundings".

Same goes for AI: If we don’t keep a watchful eye, the AI might be able to escape—for instance, by exploiting its role in shaping the security code meant to keep the AI locked inside the company’s computers.

This seems very likely to happen! We've already seen that AIs asked to improve code can remove its essential features, so it wouldn't be surprising for some security checks to disappear.

Based on this, I've decided to only weakly upvote this post.

Replies from: sjadler
comment by sjadler · 2025-04-06T17:27:17.394Z · LW(p) · GW(p)

I appreciate the feedback. That’s interesting about the plane vs. car analogy - I tended to think about these analogies in terms of life/casualties, and for whatever reason, describing an internal test-flight didn’t rise to that level for me (and if it’s civilian passengers, that’s an external deployment). I also wanted to convey the idea not just that internal testing could cause external harm, but that you might irreparably breach containment. Anyway, appreciate the explanation, and I hope you enjoyed the post overall!