Some of my predictable updates on AI

post by Aaron_Scher · 2023-10-23T17:24:34.720Z · LW · GW · 8 comments

Contents

  Introduction
    tldr
  Some big policy stuff will happen
  AI labs will agree on some safety standards 
  AI labs will compete on some safety things 
  Task-oriented / Agentic LLMs will be a bigger deal 
  Misuse threats will be a big deal 
  Likely major foreign interference in US 2024 elections 
  We’ll make significant progress on “alignment” for current AIs
  Conclusion
8 comments

Introduction

Author note: I’m struggling to write this in a way I’m happy with. Rather than having it sit in my drafts, I’m going to post in an unfinished state now. I don’t think I endorse sharing/upvoting this much, but if you find the content particularly compelling feel free to overrule me.

Epistemic status: speculative and imprecise forecasts

Joe Carlsmith has a lengthy blog post about predictably updating on AI risk [LW · GW]. I skimmed it and found it interesting. This post includes some of my predictions about AI and AI risk in the next year or so, including what I expect to happen and how I think I should update if my predictions are wrong. Partially I’m writing this to force myself to make predictions, partially I’m writing to get feedback from others about my predictions, and partially I’m writing to share these predictions and their associated updates with others concerned about AI existential safety. This is a suspicious activity, partially because it’s really hard to know now how future events should affect my beliefs. For example, I currently expect a thing like election interference to not change my beliefs much, but there may be unpredicted-by-me circumstances which actually make election interference evidence very important; doing this exercise might make it more difficult to change my beliefs properly later.

For each item, I’ll note:

tldr

In short, here are some things I expect: 

Overall, I expect the most significant updates on the table in the next year are related to how seriously AGI labs seem to be taking existential safety (where I expect some positive signs), and how AI alignment research is going (where I expect pretty good results). 

Now let’s get into the specifics.

Some big policy stuff will happen

AI labs will agree on some safety standards 

AI labs will compete on some safety things 

Task-oriented / Agentic LLMs will be a bigger deal 

Misuse threats will be a big deal 

Likely major foreign interference in US 2024 elections 

We’ll make significant progress on “alignment” for current AIs

Conclusion

Writing this list has forced me to think about the future in a way I don’t usually do, which seems useful. On the whole, I expect to see some positive-but-not-amazing signs in the next year. Agentic AIs are probably going to happen soon, plausibly in the next year, and they’re gonna be wild/scary. On the other hand, I expect we’ll make significant progress on alignment and labs will seem to be taking existential risk (including misuse and misalignment) seriously. 


 

8 comments

Comments sorted by top scores.

comment by 1a3orn · 2023-10-23T18:12:37.395Z · LW(p) · GW(p)

LLMs that significantly help with the creation of bio weapons are 2-3 years away, according to Dario Amodei; hacking capabilities are probably around the same or sooner

So, I note that LLMs only significantly increase the risk of bioterrorist attacks if, indeed, such attacks are currently bottlenecked on knowledge of lab procedures, etc, that LLMs could provide. They could be also bottlenecked on any of the other steps involved -- any estimate that LLMs do increase the risk assumes that this is the case.

I am unaware of any paper arguing that such knowledge is indeed the bottleneck.

Note that we also have evidence that such attacks are not currently bottlenecked on such knowledge, but on the (many) other steps involved. Here's a paper from the Future of Humanity Institute that argues that, for instance. So, if the paper is correct, open source LLMs do not contribute to biorisk substantially.

Even apart from the paper, that's also my prior view -- given the relatively weak generalization abilities of LLMs, if they could contribute to knowledge of lab procedures, then the knowledge is already out there and not-too-hard to find.

(The pattern of discourse around biorisk and LLMs looks much more like "ban open source LLMs" was the goal and "use biorisk concerns" was the means, rather than "decrease biorisk" was the goal and "ban open source LLMs" was the means. I'm not saying this about you, to be clear -- I'm saying this about the relevant thought-leaders / think tanks who keep mentioning biorisk.)

It now looks like we’re probably going to get medium warning shots on the scale of hundreds of millions of dollars or hundreds of deaths, due to AI-enabled attacks in the next few years. I’m slightly surprised we haven’t seen effective misuse of current open source LLMs, but this seems like mostly a matter of time.

I don't know what kind of AI-enabled-cyber-attacks causing hundreds of deaths you mean. Right now, if I want to download penetration tools to hack other computers without using any LLM at all I can just do so. What kind of misuse of current open source LLMs, enabling hundreds of deaths, did you expect to have seen?

Replies from: Aaron_Scher, Aaron_Scher
comment by Aaron_Scher · 2023-10-23T22:14:22.392Z · LW(p) · GW(p)

Thanks for your comment!

I appreciate you raising this point about whether LLMs alleviate the key bottlenecks on bioterrorism. I skimmed the paper you linked, thought about my previous evidence, and am happy to say that I'm much less certain than I was before. 

My previous thinking for why I believe LLMs exacerbate biosecurity risks:

  • Kevin Esvelt said so on the 80k podcast, see also the small experiment he mentions. Okay evidence. (I have not listened to the entire episode)
  • Anthropic says so in this blog post. Upon reflection I think this is worse evidence than I previously thought (seems hard to imagine seeing the opposite conclusion from their research, given how vague the blog post is; access to their internal reports would help). 

The Montague 2023 paper you link: the main bottleneck to high consequence biological attacks is actually R&D to create novel threats, especially spread testing which needs to take place in a realistic deployment environment. This requires both being purposeful and having a bunch of resources, so policies need not be focused on decreasing democratic access to biotech and knowledge. 

I don't find the paper's discussion of 'why extensive spread testing is necessary' to be super convincing, but it's reasonable and I'm updating somewhat toward this position. That is, I'm not convinced either way. I would have a better idea if I knew how accurate a priori spread forecasting has been for bioweapons programs in the past. 

I think the "LLMs exacerbate biosecurity risks" claim still goes through even if I mostly buy Montague's arguments, given that those arguments are partially specific to high consequence attacks. Additionally, Montague is mainly arguing that democratization of biotech / info doesn't help with the main barriers, not that it doesn't help at all:

The reason spread-testing has not previously been perceived as a the defining stage of difficulty in the biorisk chain (see Figure 1), eclipsing all others, is that, until recently, the difficulties associated with the preceding steps in the risk chain were so high as to deter contemplation of the practical difficulties beyond them. With the advance of synthetic biology enabled by bioinformatic inferences on ‘omics data, the perception of these prior barriers at earlier stages of the risk chain has receded.

So I think there exists some reasonable position (which likely doesn't warrant focusing on LLM --> biosecurity risks, and is a bit of an epistemic cop out): LLMs increase biosecurity risks marginally even though they don't affect the main blockers for bio weapon development. 

Thanks again for your comment, I appreciate you pointing this out. This was one of the most valuable things I've read in the last 2 weeks. 

comment by Aaron_Scher · 2023-10-23T22:15:41.420Z · LW(p) · GW(p)

Second smaller comment:

I'm not saying LLMs necessarily raise the severity ceiling on either a bio or cyber attack. I think it's quite possible that AIs will do so in the future, but I'm less worried about this on the 2-3 year timeframe. Instead, the main effect is decreasing the cost of these attacks and enabling more actors to execute such attacks. (as noted, it's unclear whether this substantially worsens bio threats) 

if I want to download penetration tools to hack other computers without using any LLM at all I can just do so

Yes, it's possible to launch cyber attacks currently. But with AI assistance it will require less personal expertise and be less costly. I am slightly surprised that we have not seen a much greater amount of standard cybercrime (the bar I was thinking of when I wrote this was not the hundreds of deaths bar, it was more like "statistically significant increase in cybercrime / serious deepfakes / misinformation, in a way that concretely impacts the world, compared to previous years"). 

comment by Vladimir_Nesov · 2023-10-24T01:18:48.786Z · LW(p) · GW(p)

Very complicated ways of facilitating agency seem feasible. There's Imbue doing some CoEm-sounding things (debuggable planning in natural language that's possible to inspect and intervene on), without a clear stance on extinction risk implications. It might turn out that there is a whole operating system's worth of engineering effort that's useful for turning basic LLM-style capabilities into coherent autonomous reasoning.

This is mostly irrelevant for capabilities if scaling gets to AGI on its own, but if that doesn't happen in the next few years [LW(p) · GW(p)], extremely complicated agency engineering efforts might become more important than further scaling, giving CoEm-like or even CAIS [LW · GW]-like systems an opportunity to determine the safety properties of first AGIs.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-10-23T20:11:50.402Z · LW(p) · GW(p)
  • Some big policy stuff will happen, and it will seem neutral or positive
  • AI labs will agree on some safety standards which will seem positive
  • AI labs might compete on some safety things, but I expect a mediocre outcome
  • Task-oriented / agentic LLMs will be a bigger deal, and this will be scary
  • Misuse threats will be a big deal, and many more people will care about this
  • AI will enable interference in the US 2024 elections, but the policy effects are the main part I care about
  • We’ll make significant progress on alignment for current AIs which will look optimistic


Great list. I also expect these things.

comment by IKumar · 2023-10-24T10:12:43.270Z · LW(p) · GW(p)

I like the style of this post, thanks for writing it! Some thoughts:

model scaling stops working

Roughly what probability would you put on this? I see this as really unlikely (perhaps <5%) such that ‘scaling stops working’ isn’t part of my model over the next 1-2yrs.
 

I will be slightly surprised if by end of 2024 there are AI agents running around the internet that are meaningfully in control of their own existence, e.g., are renting their own cloud compute without a human being involved.

Only slightly surprised? IMO being able to autonomously rent cloud compute seems quite significant (technically and legally), and I’d be very surprised if something like this happened on a 1yr horizon. I’d be negatively surprised if the US government didn’t institute regulation on the operation of autonomous agents of this type by the end of 2024, basically due to their potential for misuse and their economic value. It may help to know how you're operationalizing AIs that are ‘meaningfully aware of their own existence’.

Replies from: Aaron_Scher, Aaron_Scher
comment by Aaron_Scher · 2023-10-24T16:02:10.289Z · LW(p) · GW(p)

I think it's pretty unlikely that scaling literally stops working; maybe I'm 5-10% that we soon get to a point where increasing compute yields only very small or negligible improvements. But I'm like 10-20% on some weaker version. 

A weaker version could look like this: there are diminishing returns to performance from scaling compute (as is true), and this makes it very difficult for companies to continue scaling. One mechanism at play is that the marginal improvements from scaling may not be enough to produce the additional revenue needed to cover the scaling costs. This is especially true in a competitive market where it's not clear scaling will put one ahead of their competitors. 
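
To make the diminishing-returns mechanism a bit more concrete, here is a toy sketch, not a forecast. It assumes loss falls as a power law in compute, which is a simplification; the function names and every constant (`a`, `b`, `cost_per_unit`, the compute budgets) are made up for illustration and are not taken from any real model or lab.

```python
# Toy model of diminishing returns to scaling. All numbers are hypothetical.

def loss(compute: float, a: float = 10.0, b: float = 0.05) -> float:
    """Assumed power-law relationship between training compute and loss."""
    return a * compute ** (-b)


def gain_per_dollar(compute: float, cost_per_unit: float = 1.0) -> float:
    """Loss reduction from doubling compute, divided by the extra cost of doubling."""
    improvement = loss(compute) - loss(2 * compute)
    extra_cost = cost_per_unit * compute  # doubling means buying `compute` more units
    return improvement / extra_cost


# Hypothetical compute budgets, each 10x the previous one.
budgets = [1e3, 1e4, 1e5, 1e6]
ratios = [gain_per_dollar(c) for c in budgets]

for c, r in zip(budgets, ratios):
    print(f"compute={c:.0e}  loss={loss(c):.3f}  gain per extra dollar={r:.2e}")
```

Under these assumptions each doubling still lowers loss, but the improvement bought per extra dollar keeps shrinking as the budget grows. Whether that improvement converts into enough revenue to cover the cost is a separate question the sketch doesn't try to answer.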

In the context of the post, I think it's quite unlikely that I see strong evidence in the next year indicating that scaling has stopped (if only because a year of no-progress is not sufficient evidence). I was more so trying to point to how there [sic] are contingencies which would make OpenAI's adoption of an RSP less safety-critical. I stand by the statements that scaling no longer yielding returns would be such a contingency, but I agree that it's pretty unlikely. 

comment by Aaron_Scher · 2023-10-24T16:32:05.340Z · LW(p) · GW(p)

Only slightly surprised?

We are currently at ASL-2 in Anthropic's RSP. Based on the categorization, ASL-3 is "low-level autonomous capabilities". I think ASL-3 systems probably don't meet the bar of "meaningfully in control of their own existence", but they probably meet the thing I think is more likely:

I think it wouldn’t be crazy if there were AI agents doing stuff online by the end of 2024, e.g., running social media accounts, selling consulting services; I expect such agents would be largely human-facilitated like AutoGPT

I think it's currently a good bet (>40%) that we will see ASL-3 systems in 2024. 

I'm not sure how big of a jump it will be from that to "meaningfully in control of their own existence". I would be surprised if it were a small jump, such that we saw AIs renting their own cloud compute in 2024, but this is quite plausible on my models. 

I think the evidence indicates that this is a hard task, but not super hard. e.g., looking at ARC's report on autonomous tasks, one model partially completes the task of setting up GPT-J via a cloud provider (with human help).

I'll amend my position to just being "surprised" without the slightly, as I think this better captures my beliefs — thanks for the push to think about this more. Maybe I'm at 5-10%.
 

It may help to know how you're operationalizing AIs that are ‘meaningfully aware of their own existence’.

shrug, I'm being vague