Public Weights?

post by jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · LW · GW · 19 comments

While this is close to areas I work in, it's a personal post. No one reviewed this before I published it, or asked me to (or not to) write something. All mistakes are my own.

A few days ago, some of my coworkers at SecureBio put out a preprint, "Will releasing the weights of future large language models grant widespread access to pandemic agents?" (Gopal et al. 2023). They took Facebook/Meta's Llama-2-70B large language model (LLM) and (cheaply!) adjusted it to remove the built-in safeguards, after which it was willing to answer questions on how to get infectious 1918 flu. I like a bunch of things about the paper, but I also think it suffers from being undecided on whether it's communicating:

  1. Making LLMs public is dangerous because by publishing the weights you allow others to easily remove safeguards.

  2. Once you remove the safeguards, current LLMs are already helpful in getting at the key information necessary to cause a pandemic.

I think it demonstrates the first point pretty well. The main way we keep LLMs from telling people how to cause harm is to train them on a lot of examples of someone asking how to cause harm and being told "no", and this can easily be reversed by additional training with "yes" examples. So even if you get incredibly good at this, if you make your LLM public you make it very easy for others to turn it into something that compliantly shares any knowledge it contains.

Now, you might think that there isn't actually any dangerous knowledge, at least not within what an LLM could have learned from publicly available sources. I think this is pretty clearly not true: the process of creating infectious 1918 flu is scattered across the internet and hard for most people to assemble. If you had an experienced virologist on call and happy to answer any question, however, they could walk you through it via a mixture of doing things yourself and duping others into doing things. And if they were able to read and synthesize all the virology literature they could tell you how to create things quite a bit worse than this former pandemic.

GPT-4 is already significantly better than Llama-2, and GPT-5 in 2024 is more likely than not. Public models will likely continue to move forward, and while it's unlikely that we get a GPT-4 level Llama-3 in 2024 I do think the default path involves very good public models within a few years. At which point anyone with a good GPU can have their own personal amoral virologist advisor. Which seems like a problem!

But the paper also seems to be trying to get into the question of whether current models are capable of teaching people how to make 1918 flu today. If they just wanted to assess whether the models were willing and able to answer questions on how to create bioweapons, they could have simply asked them. Instead, they ran a hackathon to see whether people could, in one hour, get the no-safeguards model to fully walk them through the process of creating infectious flu. I think the question of whether LLMs have already lowered the bar for causing massive harm through biology is a really important one, and I'd love to see a follow-up that addressed it with a no-LLM control group. That still wouldn't be perfect, since outside the constraints of a hackathon you could take a biology class, read textbooks, or pay experienced people to answer your questions, but it would tell us a lot. My guess is that the synthesis functionality of current LLMs is actually adding something here and a no-LLM group would do quite a bit worse, but the market only has that at 17%:

Even if no-safeguards public LLMs don't lower the bar today (which, given how frustrating Llama-2 can be, wouldn't be too surprising), it seems pretty likely we get to where they do significantly lower the bar within the next few years. Lower it enough, and some troll or committed zealot will go for it. Which, aside from the existential worries, just makes me pretty sad. LLMs with open weights are just getting started in democratizing access to this incredibly transformative technology, and a world in which we all only have access to LLMs through a small number of highly regulated and very conservative organizations feels like a massive loss of potential. But unless we figure out how to create LLMs whose safeguards can't just be trivially removed, I don't see how to avoid this non-free outcome while also avoiding widespread destruction.

(Back in 2017 I asked for examples of risk from AI, and didn't like any of them all that much. Today, "someone asks an LLM how to kill everyone and it walks them through creating a pandemic" seems pretty plausible.)

Comment via: facebook, lesswrong, the EA Forum, mastodon

19 comments

Comments sorted by top scores.

comment by Rana Dexsin · 2023-11-02T07:31:07.090Z · LW(p) · GW(p)

Something I haven't yet personally observed in threads on this broad topic is the difference in risk modeling from the perspective of the potential malefactor. You note that outside a hackathon context, one could “take a biology class, read textbooks, or pay experienced people to answer your questions”—but especially that last one has some big-feeling risks associated with it. What happens if the experienced person catches on to what you're trying to do, stops answering questions, and alerts someone? The biology class is more straightforward, but still involves the risky-feeling action of talking to people and committing in ways that leave a trail. The textbooks have the lowest risk of those options but also require you to do a lot more intellectual work to get from the base knowledge to the synthesized form.

This restraining effect comes only partly in the form of real social risks to doing things that look ‘hinky’, and much more immediately in the form of psychological barriers from imagining such risks. People who are of the mindset to attempt competent social engineering attacks often report them being surprisingly easy, but most people are not master criminals and shy away by reflex from doing things that feel suspicious.

When we move to the LLM-encoded knowledge side of things, we get a different risk profile. Using a centralized, interface-access-only LLM involves some social risk to a malefactor via the possibility of surveillance, especially if the surveillance itself involves powerful automatic classification systems. Content policy violation warnings in ChatGPT are a very visible example of this; many people have of course posted about how to ‘jailbreak’ such systems, but it's also possible that there are other hidden tripwires.

For a published-weights LLM being run on local, owned hardware through generic code that's unlikely to contain relevant hidden surveillance, the social risk of experimenting drops into the negligible range, and someone who understands the technology well enough may also understand this instinctively. Getting a rejection response when you haven't de-safed the model enough isn't potentially making everyone around you more suspicious or adding to a hidden tripwire counter somewhere in a Microsoft server room. You get unlimited retries that are punishment-free from this psychological social-risk-modeling perspective, and they stay punishment-free pretty much up until the point where you start executing on a concrete plan for harm in other ways that are likely to leave suspicious ripples.

Structurally this feels similar to untracked proliferation of other mixed-use knowledge or knowledge-related technology, but it seems worth having the concrete form written out here for potential discussion.

This is the main reason my intuition agrees with you that the accessibility of danger goes up a lot with a published-weights LLM. Emotionally, I also agree with you that it would be sad if this meant it were too dangerous to continue open distribution of such technology. I don't currently have a well-formed policy position based on any of that.

Replies from: ricardo-meneghin-filho
comment by Ricardo Meneghin (ricardo-meneghin-filho) · 2023-11-02T15:28:13.163Z · LW(p) · GW(p)

The vast majority of the risk seems to lie in following through with synthesizing and releasing the pathogen, not in learning how to do it, and I think open-source LLMs change little about that.

comment by aogara (Aidan O'Gara) · 2023-11-02T04:40:03.494Z · LW(p) · GW(p)

Could a virologist actually tell you how to start a pandemic? The paper you're discussing says they couldn't:

Fortunately, the scientific literature does not yet feature viruses that are particularly likely to cause a new pandemic if deliberately released (with the notable exception of smallpox, which is largely inaccessible to non-state actors due to its large genome and complex assembly requirements). Threats from historical pandemic viruses are mitigated by population immunity to modern-day descendants and by medical countermeasures, and while some research agencies actively support efforts to find or create new potential pandemic viruses and share their genome sequences in hopes of developing better defenses, their efforts have not yet succeeded in identifying credible examples.

The real risk would come from biological design tools (BDTs), or other AI systems capable of designing new pathogens that are more lethal and transmissible than existing ones. I'm not aware of any existing BDTs that would allow you to design more capable pathogens, but if they exist or emerge, we could place specific restrictions on those models. This would be far less costly than banning all open source LLMs. 

Replies from: jkaufman
comment by jefftk (jkaufman) · 2023-11-02T13:16:48.054Z · LW(p) · GW(p)

Could a virologist actually tell you how to start a pandemic? The paper you're discussing says they couldn't.

In the post I claim that (a) a virologist could walk you through synthesizing 1918 flu and (b) one that could read and synthesize the literature could tell you how to create a devastating pandemic. I also think (c) some people already know how to create one but are very reasonably not publishing how. I don't see the article contradicting this?

This would be far less costly than banning all open source LLMs.

I'm more pessimistic about being able to restrict BDTs than general LLMs, but I also think this would be very good.

Another part of the problem is that telling people how to cause pandemics is only one example of how AI systems can spread dangerous knowledge (in addition to their benefits!), and when you publish the weights of a model there's no going back.

Replies from: Aidan O'Gara, Aidan O'Gara
comment by aogara (Aidan O'Gara) · 2023-11-02T15:14:07.014Z · LW(p) · GW(p)

I'm more pessimistic about being able to restrict BDTs than general LLMs, but I also think this would be very good.

Why do you think so? LLMs seem far more useful to a far wider group of people than BDTs, so I would expect it to be easier to ban an application-specific technology than a general one. The White House Executive Order requires mandatory reporting for AI trained on biological data at a lower FLOP threshold than for any other kind of data, meaning they're concerned that AI + Bio models are particularly dangerous.

Restricting something that biologists are already doing would create a natural constituency of biologists opposed to your policy. But the same could be said of restricting open source LLMs -- there are probably many more people using open source LLMs than using biological AI models. 

Maybe bio policies will be harder to change because they're more established, whereas open source LLMs are new and therefore a more viable target for policy progress?

comment by aogara (Aidan O'Gara) · 2023-11-02T15:07:47.973Z · LW(p) · GW(p)

I take the following quote from the paper as evidence that virologists today are incapable of identifying pandemic-potential pathogens, even with funding and support from government agencies:

some research agencies actively support efforts to find or create new potential pandemic viruses and share their genome sequences in hopes of developing better defenses, their efforts have not yet succeeded in identifying credible examples.

Corroborating this is Kevin Esvelt's paper Delay, Detect, Defend, which says:

We don't yet know of any credible viruses that could cause new pandemics, but ongoing research projects aim to publicly identify them. Identifying a sequenced virus as pandemic-capable will allow >1,000 individuals to assemble it.

Perhaps these quotes are focusing on global catastrophic biorisks, which would be more destructive than typical pandemics. I think this is an important distinction: we might accept extreme sacrifices (e.g. state-mandated vaccination) to prevent a pandemic from killing billions, without being willing to accept those sacrifices to avoid COVID-19.  

I'd be interested to read any other relevant sources here. 

Replies from: jkaufman, Aidan O'Gara
comment by jefftk (jkaufman) · 2023-11-03T02:43:22.917Z · LW(p) · GW(p)

On the 80k podcast Kevin Esvelt gave a 5% chance of 1918 flu causing a pandemic if released today. In 1918 it killed ~50M of a ~1.8B global population, so today that could be ~225M. Possibly higher today since we're more interconnected, possibly lower given existing immunity (though recall that we're conditioning on it taking off as a pandemic). Then 5% of that is an expected "value" of 11M deaths, a bit more than half what we saw with covid. And 1918 deaths skewed much younger than covid's, so probably a good bit worse in terms of expected life-years lost.
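Spelling out that arithmetic (a rough sketch: the ~8B current world population is a round number I'm assuming, and it ignores the interconnectedness and immunity adjustments above):

```python
# Back-of-the-envelope: scale 1918 flu deaths to today's population,
# then multiply by the assumed 5% chance it takes off if released.

deaths_1918 = 50e6              # ~50M deaths in the 1918 pandemic
population_1918 = 1.8e9         # ~1.8B people alive at the time
population_today = 8e9          # assumed ~8B people alive today
p_pandemic_if_released = 0.05   # Esvelt's hypothetical 5%

fatality_fraction = deaths_1918 / population_1918                    # ~2.8% of everyone alive
deaths_if_pandemic_today = fatality_fraction * population_today      # ~222M ("could be ~225M")
expected_deaths = p_pandemic_if_released * deaths_if_pandemic_today  # ~11M

print(f"{deaths_if_pandemic_today / 1e6:.0f}M if it takes off, "
      f"{expected_deaths / 1e6:.0f}M in expectation")
```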

I think if we get to where someone can be reasonably confident that releasing a specific pathogen would wipe out humanity, the risk would be non-linearly higher, since I think there are more committed people who would see everyone dying as a valuable goal than who would see mass death alone as one. But even the current level is high enough that I think the folks who reconstructed and then published the sequence were reckless, and LLMs have already increased (or will soon increase) the danger further by bringing creation and release within the abilities of more people.

Replies from: Aidan O'Gara
comment by aogara (Aidan O'Gara) · 2023-11-03T06:22:12.348Z · LW(p) · GW(p)

It sounds like it was a hypothetical estimate, not a best guess. From the transcript:

if we suppose that the 1918 strain has only a 5% chance of actually causing a pandemic if it were to infect a few people today. And let’s assume...

Here's another source which calculates that the annual probability of more than 100M influenza deaths is 0.01%, i.e. that we should expect one such pandemic every 10,000 years. This seems to be fitted to historical data which does not include deliberate bioterrorism, so we should revise that estimate upwards, but I'm not sure to what extent the estimate is driven by a low probability of a dangerous strain being reintroduced vs. an expectation of a low death count even with bioterrorism.

From my inside view, it would surprise me if no known pathogens were capable of causing pandemics! But it's stated as fact in the executive summary of Delay, Detect, Defend and in the NTI report, so currently I'm inclined to trust it. I'm trying to build a better nuts-and-bolts understanding of biorisks, so I'd be interested in any other data points here.

Replies from: jkaufman
comment by jefftk (jkaufman) · 2023-11-03T06:46:20.868Z · LW(p) · GW(p)

It sounds like it was a hypothetical estimate, not a best guess

Thanks for checking the transcript! I don't know how seriously you want to take this, but in conversation (in person) he said 5% was one of several different estimates he'd heard from virologists. This is a tricky area because it's not clear we want a bunch of effort going into getting a really good estimate, since (a) if it turns out the probability is high then publicizing that fact likely means increasing the chance we get one, and (b) building general knowledge on how to estimate the pandemic potential of viruses also seems likely net negative.

Here's another source which calculates that the annual probability of more than 100M influenza deaths is 0.01% ...

I think maybe we are talking about estimating different things? The 5% estimate was how likely you are to get a 1918 flu pandemic conditional on release.

Replies from: Aidan O'Gara, Aidan O'Gara
comment by aogara (Aidan O'Gara) · 2023-11-03T07:18:50.040Z · LW(p) · GW(p)

More from the NTI report:

A few experts believe that LLMs could already or soon will be able to generate ideas for simple variants of existing pathogens that could be more harmful than those that occur naturally, drawing on published research and other sources. Some experts also believe that LLMs will soon be able to access more specialized, open-source AI biodesign tools and successfully use them to generate a wide range of potential biological designs. In this way, the biosecurity implications of LLMs are linked with the capabilities of AI biodesign tools.

comment by aogara (Aidan O'Gara) · 2023-11-03T07:05:27.024Z · LW(p) · GW(p)

5% was one of several different estimates he'd heard from virologists.

Thanks, this is helpful. And I agree there's a disanalogy between the 1918 hypothetical and the source. 

it's not clear we want a bunch of effort going into getting a really good estimate, since (a) if it turns out the probability is high then publicizing that fact likely means increasing the chance we get one and (b) building general knowledge on how to estimate the pandemic potential of viruses seems also likely net negative.

This seems like it might be overly cautious. Bioterrorism is already quite salient, especially with Rishi Sunak, the White House, and many mainstream media outlets speaking publicly about it. Even SecureBio is writing headline-grabbing papers about how AI can be used to cause pandemics. In that environment, I don't think biologists and policymakers should refrain from gathering evidence about biorisks and how to combat them. The contribution to public awareness would be relatively small, and the benefits of a better understanding of the risks could lead to a net improvement in biosecurity. 

For example, estimating the probability that known pathogens would cause 100M+ deaths if released is an extremely important question for deciding whether open source LLMs should be banned. If the answer is demonstrably yes, I'd expect the White House to significantly restrict open source LLMs within a year or two. This benefit would be far greater than the cost of raising the issue's salience. 

comment by aogara (Aidan O'Gara) · 2023-11-02T17:08:42.232Z · LW(p) · GW(p)

And from a new NTI report: “Furthermore, current LLMs are unlikely to generate toxin or pathogen designs that are not already described in the public literature, and it is likely they will only be able to do this in the future by incorporating more specialized AI biodesign tools.”

https://www.nti.org/wp-content/uploads/2023/10/NTIBIO_AI_FINAL.pdf

comment by habryka (habryka4) · 2023-11-02T02:59:13.159Z · LW(p) · GW(p)

My guess is that the synthesis functionality of current LLMs is actually adding something here and a no-LLM group would do quite a bit worse, but 83% of people seem to disagree with me:

This is a random nit, but a market with a probability of 17% does not imply that 83% of people disagree with you. I don't know what fraction of people agree with you, I just know that by whatever mechanism Manifold traders are willing to trade, the current price is at 17%.

Replies from: jkaufman, sudo
comment by jefftk (jkaufman) · 2023-11-02T12:10:12.317Z · LW(p) · GW(p)

Sorry, that was sloppy, fixed!

comment by sudo · 2023-11-02T03:11:51.222Z · LW(p) · GW(p)

but a market with a probability of 17% implies that 83% of people disagree with you

Is this a typo?

Replies from: habryka4
comment by habryka (habryka4) · 2023-11-02T03:16:28.956Z · LW(p) · GW(p)

Oops, yep, fixed.

comment by lsanders · 2023-11-04T02:33:12.808Z · LW(p) · GW(p)

(Back in 2017 I asked for examples of risk from AI, and didn't like any of them all that much. Today, "someone asks an LLM how to kill everyone and it walks them through creating a pandemic" seems pretty plausible.)

My impression from the 2017 post is that concerns were framed as “superintelligence risk” at the time.  The intended meaning of that term wasn’t captured in the old post, but it’s not clear to me that an LLM answering questions about how to create a pandemic qualifies as superintelligence?

This contrast seems mostly aligned with my long-standing instinct that folks worried about catastrophic risk from AI have tended to spend too much time worrying about machines achieving agency and not enough time thinking about machines scaling up the agency of individual humans.

comment by Shankar Sivarajan (shankar-sivarajan) · 2023-11-02T16:00:00.817Z · LW(p) · GW(p)

I expect the supposedly dangerous information (that the authors are careful not to actually tell you) is some combination of obvious (to a person of ordinary skill in the art), useless, and wrong, roughly analogous to the following steps for building a nuclear bomb:

  1. Acquire 100 kilograms of highly enriched uranium.
  2. Assemble into a gun-type fission weapon.
  3. Earth-Shattering Kaboom!

This "draw the rest of the fucking owl" kind of advice is good for a laugh, and as fodder for fear-mongering about actually open AI (not to be confused with the duplicitously named OpenAI), but little else.

Replies from: jkaufman
comment by jefftk (jkaufman) · 2023-11-02T17:18:45.845Z · LW(p) · GW(p)

I think this is mostly not true: you can pay other people to do the difficult parts for you, as long as you are careful to keep them from learning what it is you're trying to do.