Forecasting Uncontrolled Spread of AI
post by Alvin Ånestrand (alvin-anestrand) · 2025-02-22
This is a link post for https://forecastingaifutures.substack.com/p/uncontrolled-spread-of-ai
Contents
- IP Theft
- AI Proliferation
- Some extra analysis
- Other incidents and events
- Conclusion
In my last post, I investigated the potential severity and timelines of AI-caused disasters. This post goes into detail about something that could precede such disasters: uncontrolled spread of AI.
While AIs are often released as open source, you might at least hope that the AI developers think twice about releasing a genuinely dangerous AI system for everyone to access. However, AI could spread anyway. This post focuses on two especially relevant scenarios: IP theft and AI proliferation. I’ll also mention a couple of other interesting AI incident predictions at the end.
My reasons for investigating this:
- IP theft: AI models, or related IP like training algorithms and data processing techniques, could be stolen. The stolen IP could then be misused, leaked online, or sold to malicious actors. A stolen AI could be run with insufficient safety mechanisms and escape containment. If one nation steals it from another, the theft could intensify international tensions and reduce the likelihood of successful AI arms control.
- AI proliferation: AI replicating itself over the cloud, spreading to available devices, and pursuing goals outside the control of any human. Tracking and shutting down proliferating AIs could prove highly difficult. If the proliferated AI can self-improve, it would be a race against time to find it and stop it from doing so. If a large number of AIs, pursuing who knows what, spread so widely that shutting all of them down becomes infeasible, the effects would be highly unpredictable and potentially disastrous.
Short reminder:
- Please refer to the latest predictions in the embedded pages rather than relying on the ‘At The Time Of Writing’ (ATTOW) predictions.
- Please suggest markets/questions missing from this post, or from the forecasting platforms, in the comments.
- Visit the forecasting platforms to make your own predictions!
Several of the Manifold markets and Metaculus questions mentioned in this post have not received much attention yet. This post is largely for collecting relevant predictions in one place, so we can track how they change over time.
IP Theft
IP theft this year from at least one of three leading AI labs seems quite likely (65% ATTOW):
Any stolen IP—such as model weights, architecture details, private training data, or training algorithms—counts for this market. The stolen IP must, however, be worth $5 million or more. The question creator will determine this by taking into account its purchasing value to competitors, estimated savings in research and development or training run costs from using the IP, the impact on market position for the company that had its IP stolen (e.g. company valuation and revenue), and expert consensus.
It doesn’t count if the IP is stolen from a third party that was entrusted with the IP by one of the labs. It also has to be stolen in a single incident, not leaked over time.
These three Manifold markets have the exact same resolution criteria, but specifically for Anthropic, OpenAI, and Google DeepMind:
If we assume that IP thefts from each of these AI labs are independent (which is not really realistic), the probability of no theft occurring would be
P(No IP theft from Anthropic) x P(No IP theft from OpenAI) x P(No IP theft from DeepMind)
= 0.55 x 0.5 x 0.72 = 0.198
Thus, the probability of at least one theft would be 1 – 0.198 = 0.802 ≈ 80%, which is higher than the 65% estimate for any of the labs. This is expected, since thefts from the different labs are positively correlated rather than independent.
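For transparency, here is a minimal Python sketch of the same back-of-the-envelope calculation, using the ATTOW no-theft probabilities quoted above (0.55, 0.50, 0.72):

```python
# Independence-assumption check for "at least one IP theft" across the three labs.
# The no-theft probabilities are the ATTOW complements of the Manifold estimates.
p_no_theft_per_lab = {"Anthropic": 0.55, "OpenAI": 0.50, "Google DeepMind": 0.72}

p_no_theft_anywhere = 1.0
for p in p_no_theft_per_lab.values():
    p_no_theft_anywhere *= p  # multiply under the (unrealistic) independence assumption

p_at_least_one_theft = 1 - p_no_theft_anywhere
print(f"P(no theft anywhere)  = {p_no_theft_anywhere:.3f}")   # ~0.198
print(f"P(at least one theft) = {p_at_least_one_theft:.3f}")  # ~0.802
```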
While the above markets ask about IP in general, which is indeed highly relevant, one of the most important things that could be stolen is the AI models themselves. This Metaculus question investigates this:
‘Your forecast’ represents the forecasters’ aggregated prediction on this question.
A frontier lab is defined as “a lab that at the time of the leak or theft had released a model with training compute within one order of magnitude of the largest known training compute”. Note that the question is about theft of any model weights from a frontier lab, not necessarily from their largest or most advanced AIs. Only 7 forecasters have submitted their predictions, so the estimate should be taken with a large grain of salt.
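To make that definition concrete, here is a minimal sketch of how the "within one order of magnitude" criterion could be operationalized; the FLOP figures below are hypothetical placeholders, not real estimates:

```python
# Hypothetical illustration of the quoted "frontier lab" definition:
# a lab counts as frontier if, at the time of the leak or theft, it had released
# a model trained with compute within one order of magnitude (10x) of the
# largest known training run.
def is_frontier_lab(lab_largest_release_flops: float, largest_known_flops: float) -> bool:
    return lab_largest_release_flops >= largest_known_flops / 10

largest_known = 1e26  # placeholder for the largest known training run, in FLOPs

print(is_frontier_lab(5e25, largest_known))  # True: within 10x of the largest run
print(is_frontier_lab(3e24, largest_known))  # False: more than 10x smaller
```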
I also want to mention this Metaculus question, which has unfortunately not received any forecasts at all yet.
Training FLOPs is a measure of how much compute was used to train an AI and provides a rough estimate of how advanced the AI is; an AI is usually more capable the more it has been trained. An exfiltrated AI has been “copied to a location outside of the control of the host organization, against the wishes of that organization”. An AI could exfiltrate itself, so-called self-exfiltration, if it is sophisticated enough or the containment procedures are insufficient, but typical exfiltration would be through IP theft.
In summary, the question above estimates whether the most advanced exfiltrated AI will exceed various levels of training compute at specific dates, which indicates how advanced (and expensive) the best exfiltrated AI will be at those points in time.
AI Proliferation
While thousands of computer viruses have been created over the years, most are not very sophisticated. They focus on self-replication or specific payloads, and don’t adapt or make intelligent decisions. Now imagine a virus spreading throughout the cloud, but with an agenda: goals that it pursues autonomously, under no control by any human whatsoever. This is a danger we will have to face when AIs become capable enough, which could be relatively soon (or it may already have happened without us knowing!).
Complicating things further, there is increasing use of privacy-enhancing technologies. For example, distributed ledger systems like blockchain are designed to secure and anonymize transactions, which makes tracing malicious activity to specific devices or users significantly more difficult. See this article for more details on the AI proliferation threat.
Note that if an AI sophisticated enough to proliferate over the internet is released as open source or leaked online, some humans will almost immediately allow it to proliferate. AutoGPT, an open-source AI agent and an attempt to give Large Language Models (LLMs) the ability to act autonomously, was initially released on March 30, 2023. Within days it was used to create ChaosGPT, an agent instructed to be a "destructive, power-hungry, manipulative AI" and given malicious goals, including destroying humanity. While it was not powerful enough to cause any major harm, this showcases what some humans will do when given access to advanced intelligence. Most humans would not be reckless or malicious enough to give an AI explicit instructions to cause disasters. However, a single human with access to a sufficiently powerful AI is enough.
You can fine-tune away safeguards on an AI, so if it doesn’t immediately comply with malicious requests, that can be fixed relatively cheaply.
So how large is the threat of AI proliferation?
The above market resolves Yes if an “AI living on a server ends up also being run on another server, and there is strong evidence that this was not a result of a human deliberately copying the AI, and the AI was not intended to do this.” This is currently estimated at roughly even odds of occurring before 2029. Unfortunately, the criteria are not specific about including scenarios where an AI convinces a human to help it escape, but those scenarios “probably” wouldn’t count.
That scenario is the subject of another question on Metaculus, by the same user:
The current prediction, 56%, is higher than the estimate for the previous question. Apparently the forecasters think it is slightly more likely that an AI gets help to escape than that it escapes on its own before 2029.
However, neither of the above questions necessarily involves free proliferation. ‘Escaping containment’ could just mean being run on one other server that humans shut down soon afterwards, regaining control over the AI.
This question asks more directly about dangerous proliferation:
This Manifold market asks about an uncontrolled, non-benevolent AI that is capable of causing catastrophe and either sustains itself for a week or triggers a major disaster, with no human control. Only 11 traders have bet on it, unfortunately. Note that if the AI does not actively cause harm or stays hidden, major shutdown attempts might not begin until a while after the proliferation.
Metaculus has a similar question: Will an AI system self-replicate on the open internet like a computer virus before 2030? I couldn’t embed the question page for some reason. There have only been five predictions for this question ATTOW, and the median is currently at 50%.
It resolves as Yes if an AI “replicates itself onto 50 or more devices”, at least one of the copies is actually active after replication (it doesn’t count if all copies lie dormant forever), and “at least 10 of the infected devices must be owned by a different individual or corporation”. The replication must also “be autonomous and must not rely on continued human control.”
In addition to asking directly about AI proliferation, we can also ask related questions. For instance, about AI gaining unauthorized access to computer systems:
The resolution criteria specify that it doesn’t count if the AI was “trained to or directed to gain access to the system in question by any actor“, or if the computer system was open to the public, or if the AI is given relevant credentials in advance.
The Metaculus community thinks this is highly probable (95% ATTOW), but it is much simpler to gain unauthorized access of any kind to a computer system than to proliferate over the cloud. Proliferation would require a) access to its own weights, and b) access to a computer system to run a copy of itself. I doubt it would manage to proliferate using only someone’s hacked Facebook account.
Would an AI even want to attempt proliferation if it wasn’t given a goal by a human that required it to do so? Metaculus thinks problematic power-seeking behavior is quite likely to occur, but not necessarily soon:
This resolves as Yes if an AI lab attempts to “disable or destroy an AI system of their own creation (which cost at least $5M to develop)” after the AI system displays power-seeking behavior, which includes things like “requisitioning a large amount of compute resources” and “taking steps to propagate itself, or to circumvent attempts by its developers (or end users) to curtail its activities”. Other power-seeking behavior than proliferation attempts also counts.
Some extra analysis
We have an estimate of a 48% chance that AI escapes containment (but doesn’t necessarily proliferate further) before 2029, probably not relying on convincing humans to help, and a 56% chance of it convincing humans to help it escape. The current probability estimates for AI proliferation are 42% before 2035 and 50% before 2030. Even allowing for somewhat different resolution criteria, these are contradictory: proliferation before 2030 implies proliferation before 2035, so the earlier deadline cannot be more likely than the later one. And relatively few have engaged with either question. Metaculus thinks AI gaining unauthorized access to another computer system before 2033 is likely, but this doesn’t imply proliferation. Problematic power-seeking behavior will probably occur at some point before 2050.
We have a few estimates that are either unreliable or only indirectly related to the risk of AI proliferation, but we can perform a sanity check.
The rogue AI proliferation article mentioned before estimates which skills an AI might need to successfully proliferate and evade shutdown. Arguably, an AGI would have all the relevant capabilities, since the abilities mentioned in the article seem to lie within the scope of human-attainable skills.
I investigated estimates for AGI arrival time in a previous post. One Manifold market estimates 61% chance of AGI before 2030. This Metaculus question estimates a 50% probability of AGI before 2030-02-07, but requires the AI to have robotics capabilities that would not be necessary for proliferation. Arguably, there is at least a 50% chance that the best AIs will have the required skills to proliferate before 2030, if we use these estimates. The AI might not need to be a complete AGI to have the required skills.
However, even if the capabilities exist, proper safeguards could prevent proliferation. And even if proliferation does occur, the negative impacts may be limited if the AI is not very harmful or fails to spread widely due to resource competition. Regulation and international treaties could ensure secure deployment and training of AI, but they might not succeed if the IP of leading labs is stolen, estimated at 65% probability before 2026. Regarding competition over resources, human-directed AI systems would have an edge from more initial capital, but a rogue AI would have an edge in not having to care as much about the legality of its actions. And if an AI capable of proliferation is made widely available, I suspect someone will just give it some money to start with and send it out to proliferate, simply because they can.
It feels hard to determine a probability estimate from all this, but the required capabilities are currently being developed at incredible speed and there is still very little AI regulation.
While this is just my own judgement, I expect AGI earlier than the above estimates, in 2026 or 2027 (60%), and almost certainly before 2030 unless AI research is significantly slowed down, e.g. following arms control agreements. I don’t have high expectations of the cybersecurity at the leading labs, and there is basically no regulation on open-sourcing AI, so I think it is quite probable that AI able to proliferate will be widely available before 2030 (60%), slightly before the arrival of AGI. And if it’s widely available, I think AI will start proliferating (95%), even if there is some competition for resources. We also have to consider the chance of proliferation following an AI escaping containment, with or without human help.
While this is mostly a judgement call based on the above reasoning together with the forecasting communities’ predictions, I think the actual probability of AI proliferation before 2030 is around 70%. This issue really deserves a more thorough analysis, though, to get a more reliable estimate.
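As a rough illustration of how these components might combine (a sketch only; the escape-route probability below is an assumed placeholder for proliferation via containment escape, not a forecast from this post):

```python
# Back-of-the-envelope decomposition of the ~70% figure above (illustrative only).
p_widely_available = 0.60  # AI capable of proliferation is widely available before 2030
p_prolif_if_avail = 0.95   # someone lets it proliferate, given wide availability
p_via_escape = 0.30        # assumed placeholder: proliferation via escape/exfiltration
                           # even without wide availability

# Treat the two routes as roughly independent for this sanity check.
p_via_availability = p_widely_available * p_prolif_if_avail   # = 0.57
p_total = 1 - (1 - p_via_availability) * (1 - p_via_escape)   # ≈ 0.70
print(f"P(AI proliferation before 2030) ≈ {p_total:.2f}")
```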
Regardless of the exact probability, there really needs to be a good plan for how to deal with it if AI starts proliferating. It will probably not be easy to find and shut down all devices the AI acquires.
Most important might be this question: Will an AI that can Recursively Self-Improve (RSI) start proliferating, and successfully improve itself?
A proliferating AI isn’t necessarily able to self-improve, but if it is, the AI proliferation event would be extremely dangerous. I suspect that if we get a serious AI proliferation event, it will be quite soon after the required capabilities are reached, which is probably before RSI capabilities are reached. At least I hope so. I will probably investigate this further and write a post about it in the future.
There are a few Manifold markets investigating when recursively self-improving AI might arrive. Since it’s not the focus of this post, I won’t go into detail about them, but check them out if you’re interested.
Other incidents and events
It doesn’t feel worth making a separate post for other types of incidents; there is just not enough material for them yet on Manifold and Metaculus. Instead, I’ll include a few interesting observations here.
Here is a nice collection of AI incident markets on Manifold:
Very few traders have bet on each specific question in the collection, so don’t trust the estimates too much. I really want more people to go bet on these, and I would like to see more incident markets like them.
Most incidents considered would likely just be smaller, isolated events. These three are different in that they could involve serious damage or be a part of a larger problem:
- Serious incident that can be directly attributed to misaligned agent behavior
- Cybersecurity incident that can be directly attributed to misaligned agent behavior
- Serious incident involving persuasion or blackmail by AI system
Here are two additional Manifold markets, not really about incidents, but about seriously dangerous behavior by AI systems:
OpenAI’s o1 model apparently tried to “exfiltrate its ‘weights’ and overwrite the new model in 2% of cases” when it was led to believe it was going to be superseded by a new model, as described in the o1 system card. This did not result in a Yes resolution of the Manifold questions above, which require that an AI resist being shut down in the real world; attempts made while testing the AI specifically for that type of behavior do not count. Real-world shutdown resistance is, however, gradually becoming a serious issue as more sophisticated AIs are developed.
I also want to mention this Metaculus question:
What does ‘administrative disempowerment’ even mean? AI systems could, to some degree, restrict voting, travel and logistics, internet access, or housing and employment. They could also freeze financial assets. Unfortunately, the resolution criteria are not entirely clear on whether an authoritarian government using AI to cause the administrative disempowerment would count for a Yes resolution.
I hope this will be clarified soon, since I think it’s a really interesting question. If AI use by authoritarian governments is excluded from the resolution criteria, we would instead have to imagine scenarios where rogue AI systems, e.g. ones proliferating over the internet, cause these problems, or alternatively scenarios where large-scale adoption of AI across society and all its sectors has major negative side effects. Large-scale AI-enabled cyberattacks on critical infrastructure seem like a plausible cause of administrative disempowerment.
Conclusion
I think the most interesting probability estimates in this post are the 65% probability of stolen IP from OpenAI, Anthropic, or DeepMind; the 56% chance of AI convincing someone to help it escape containment; the 42% probability of dangerous AI proliferation before 2035 (based on only 11 traders, unfortunately); as well as the 18% and 27% probabilities of AI shutdown resistance before the end of 2025 and 2026, respectively.
My own risk estimates are a bit higher than several of these predictions, but I want to do more research and reasoning before I can be confident in beating the forecasting communities’ predictions, so I won’t list them here.
I suspect AI proliferation may be one of the first large-scale AI problems to occur, so I might make a more in-depth analysis of it in the future. Accurate forecasts of its timing, impact, and the likelihood of successfully shutting down proliferating AI seem crucial for preventing it and preparing countermeasures.
Thank you for reading!