Non-loss of control AGI-related catastrophes are out of control too

post by Yi-Yang (yiyang), Mo Putera (Mo Nastri), zeshen · 2023-06-12T12:01:26.682Z · LW · GW · 3 comments

Contents

  Executive Summary
  Introduction
  Epistemic status
  Modeling
    Preamble
    Estimating p(AGI-related catastrophe | AGI by 2070) = 26%
    Estimating p(existential catastrophe | LoC) = 22% and p(existential catastrophe | non-LoC) = 23%
      Modeling step 1: scenario tree of AGI-driven catastrophes
      Modeling step 2: assigning transition probabilities
      Modeling step 3: estimating probabilities for LoC vs other factors
      Modeling step 4: estimating recoverability probabilities
    Putting it all together: p(existential catastrophe | LoC) = 5.7% and p(existential catastrophe | non-LoC) = 6.0%
  Conclusion and Recommendations
  Appendix
    Terminologies
    Our model of Open Philanthropy’s views
    What counts as ‘loss of control’
    p(AGI-driven catastrophe | AGI by 2070)
    Competition intensity
      Gap between SOTA and second place
    Polarity
    p(existential catastrophe | non-LoC)
      Engineered pandemic
        How likely will the use of in-control AGI increase the likelihood of an engineered pandemic?
        How unrecoverable would an engineered pandemic enabled by an in-control AGI be?
        Wouldn’t the use of in-control AGI to defend against engineered pandemic reduce p(unrecoverable)?
      Nuclear winter
        How likely will the use of in-control AGI increase the likelihood of a nuclear winter? 

For the Open Philanthropy AI Worldview Contest

Executive Summary

| Failure mode / recoverability of catastrophe | p(LoC → AGI catastrophe) | p(non-LoC → AGI catastrophe) | Total p(AGI catastrophe) |
|---|---|---|---|
| Unrecoverable (i.e. existential) | 5.7% | 6.0% | 11.7% |
| Recoverable | 3.3% | 10.9% | 14.3% |
| Total p(AGI-related catastrophe) | 9.1% | 16.9% | 26.0% |

Table 1

Introduction

We broadly agree with the following picture from Metaculus’ Ragnarok Question Series, even if we disagree with exact estimates:

In the following sections we go into more detail on these claims.  

Epistemic status

Modeling

Preamble

At a high level, we think it is necessary to contextualize the estimation of p(existential catastrophe | loss of control over an AGI system) (henceforth p(existential catastrophe | LoC)) against the backdrop of all possible futures where AGI has been developed by 2070, instead of trying to estimate it directly for the single category of LoC scenarios. 

This makes sense because the total probability across all possible futures must sum to 100%, so the relative likelihoods of different AGI-related catastrophes constrain, in a mutually consistent way, the range of probabilities that loss-of-control scenarios can take. 

This implies two things. First, it makes sense to factorize the following expression and estimate each factor separately:

p(existential catastrophe | LoC) = p(AGI-related catastrophes | AGI by 2070) × p(LoC | AGI-related catastrophes) × p(unrecoverable catastrophe | LoC)

The corresponding Sankey diagram looks like this: 

Note that we use the terms existential catastrophe and unrecoverable catastrophe interchangeably (more).

Secondly, instead of modeling p(existential catastrophe | LoC) along the lines of Carlsmith (2022), which constructs a conjunctive risk model to evaluate the probability of a single risk category, we adopted the following approach: 

  1. Attempt a mutually exclusive and collectively exhaustive (MECE) classification of all AGI-related catastrophes.
  2. Estimate the probability of each category of catastrophe happening, whether due to LoC or non-LoC.
  3. Estimate the probability of recoverability from catastrophe.
  4. Aggregate the granular probability estimates to get the high-level p(existential catastrophe | LoC).
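In symbols, the quantity we ultimately want from steps 1-4 is a sum over catastrophe categories of each category's probability, times the chance it involves LoC, times the chance it is unrecoverable:

$$
p(\text{unrecoverable} \wedge \text{LoC} \mid \text{AGI catastrophe}) \;=\; \sum_{c}\, p(c \mid \text{AGI catastrophe}) \cdot p(\text{LoC} \mid c) \cdot p(\text{unrecoverable} \mid \text{LoC}, c)
$$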

This approach yields two benefits: 

The corresponding Sankey diagram now looks like this:

Estimating p(AGI-related catastrophe | AGI by 2070) = 26%

We estimate the first factor (bolded) below by appealing to a combination of Metaculus predictions and Cotra’s timelines model:

p(existential catastrophe | LoC) = **p(AGI-related catastrophes | AGI by 2070)** × p(LoC | AGI-related catastrophes) × p(unrecoverable catastrophe | LoC)

At the time of writing (May 2023), there were three Metaculus questions relevant to estimating p(AGI-driven catastrophe):

  1. By 2100, will the human population decrease by at least 10% during any period of 5 years? (Answer: 38%)
  2. If a global catastrophe occurs, will it be due to an artificial intelligence failure-mode? (Answer: 46%)
  3. If an artificial intelligence catastrophe occurs, will it reduce the human population by 95% or more? (Answer: 68%)

The first two Metaculus questions are relevant for this section (the third will be relevant later [LW · GW]). However, none of these questions condition on AGI having been developed by 2070 as per Open Phil’s question, so we proxy this by appealing to Cotra’s model to get p(AGI by 2070) > 65%. 

Using these estimates, we arrive at p(AGI-related catastrophe | AGI by 2070) ≈ 26%. 
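As a rough sanity check (this back-of-the-envelope combination is not our full calculation, and it assumes AI-driven catastrophes occur only in worlds where AGI arrives by 2070), the Metaculus and Cotra figures land in the same ballpark:

$$
\frac{p(\text{pop. drop} \ge 10\%) \times p(\text{AI failure} \mid \text{catastrophe})}{p(\text{AGI by 2070})} \approx \frac{0.38 \times 0.46}{0.65} \approx 0.27
$$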

However, it is unclear how the Metaculus forecasters arrived at these estimates, and we expect them to carry a huge degree of uncertainty. This is evident in how different reviewers of Carlsmith (2022) arrived at wildly different estimates (~6 orders of magnitude apart) even given the same framework. There seem to be several classes of disagreement that contribute to these differences (more [LW · GW]).

Estimating p(existential catastrophe | LoC) = 22% and p(existential catastrophe | non-LoC) = 23%

Next, we estimate the second and third factors (bolded) below.

p(existential catastrophe | LoC) = p(AGI-related catastrophes | AGI by 2070) × **p(LoC | AGI-related catastrophes)** × **p(unrecoverable catastrophe | LoC)**

As mentioned in the preamble [LW(p) · GW(p)], looking at p(LoC | AGI-related catastrophes) × p(unrecoverable catastrophe | LoC) naturally suggests its complement p(non-LoC | AGI-related catastrophes) × p(unrecoverable catastrophe | non-LoC); both terms sum to p(unrecoverable catastrophe | AGI-related catastrophe). In a bit more detail:
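This is just the law of total probability applied within AGI-related catastrophes:

$$
\begin{aligned}
p(\text{unrecoverable} \mid \text{AGI catastrophe}) ={}& p(\text{LoC} \mid \text{AGI catastrophe}) \times p(\text{unrecoverable} \mid \text{LoC}) \\
{}+{}& p(\text{non-LoC} \mid \text{AGI catastrophe}) \times p(\text{unrecoverable} \mid \text{non-LoC})
\end{aligned}
$$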

It is plausible that most of the probability mass of AGI-driven catastrophes falls into the ‘recoverable’ bucket – using contrived figures to illustrate:

| Category | Probability |
|---|---|
| Unrecoverable | 10% |
| Recoverable | 16% |
| Total p(AGI catastrophe) | 26% |

Table 2

It’s also plausible that the probability mass of unrecoverable AGI-driven catastrophes primarily arises from LoC. Again using contrived figures below, LoC accounts for 8% / 10% = 80% of the probability mass of unrecoverable scenarios. This would justify a focus on examining LoC scenarios in the context of mitigating AGI-driven x-risk: 

| Category | p(LoC → AGI catastrophe) | p(non-LoC → AGI catastrophe) | Total p(AGI catastrophe) |
|---|---|---|---|
| Unrecoverable | 8% | 2% | 10% |
| Recoverable | 2% | 14% | 16% |
| Total p(AGI catastrophe) | 10% | 16% | 26% |

Table 3.1

But this could go in the other direction: perhaps a comparable share of p(unrecoverable catastrophe) is driven by non-LoC catastrophes, as in the contrived figures below. This would suggest a need to better understand other sources of AGI x-risk rather than focusing only on LoC; it would also mean p(existential catastrophe | LoC) is lower:

| Category | p(LoC → AGI catastrophe) | p(non-LoC → AGI catastrophe) | Total p(AGI catastrophe) |
|---|---|---|---|
| Unrecoverable | 5% | 5% | 10% |
| Recoverable | 5% | 11% | 16% |
| Total p(AGI catastrophe) | 10% | 16% | 26% |

Table 3.2

Our claim is that Table 3.2 is more directionally accurate than Table 3.1, which would suggest a broadening of priorities to include specific non-LoC scenarios that are comparably existentially risky to (if not more than) LoC. 

The modeling process for estimating the probabilities that go into Table 3.2 above proceeds in a few steps:

| Step | Description |
|---|---|
| 1 | Create a MECE scenario tree of AGI-driven catastrophes, generated using (a different subset of) the key variables used in Distinguishing AI takeover scenarios [AF · GW] (go to step 1 [LW · GW]) |
| 2 | Assign “transition probabilities” to leaf nodes, each of which is a conjunctive risk model representing a class of similar catastrophes, and use them to calculate the fraction of AGI-driven catastrophe probability each leaf captures, all summing to 100% (go to step 2 [LW · GW]) |
| 3 | For each leaf node (i.e. class of similar catastrophes), estimate p(LoC | catastrophe), informed by the descriptions in the Survey of potential long-term impacts of AI [EA · GW] (go to step 3 [LW · GW]) |
| 4 | For each leaf node, estimate p(recoverable | LoC) and p(recoverable | non-LoC). Finally, sum over the probabilities to get the estimates in Table 3.2 (go to step 4 [LW · GW]) |

Modeling step 1: scenario tree of AGI-driven catastrophes

In Distinguishing AI takeover scenarios [AF · GW], Sam Clarke and Sammy Martin distinguish and categorize AI takeover scenarios using the following variables, of which the first 3 are key: 

Our scenario tree uses their writeup as a starting point, but differs in the choice of variables used because we aren’t limiting our focus to AI takeover scenarios, but the more general superset of AI-driven catastrophes. The goal is to broaden coverage to include all risk sources described in Classifying sources of AI x-risk [EA · GW], instead of being constrained by the ‘AI takeover’ criterion to the leftmost branch as illustrated below: 

After exploring a few different combinations of variables, we settled on the following 4 as key to achieving this:

The variables can be ordered like so: competitive pressures → takeoff speed → polarity → alignment. (It turns out that the final estimates are robust to choice of variable ordering, so this is not really key to the argument.) This leads to the following scenario tree:

The ‘aligned AI’ leaf nodes in this scenario tree collectively capture the non-takeover risk sources illustrated in the preceding classification chart.

Modeling step 2: assigning transition probabilities

The transition probabilities in the scenario tree are assigned as follows. Note that all of these branches are conditioned upon an AGI-related catastrophe.

Stage 1:

Competition intensity (given: n/a): 60% intense competition, 40% mild competition

Reasoning:

  • We are highly uncertain about this forecast, but we think it’s more likely than not that nation-states (particularly the US and China) will adopt a “win at all costs” strategy (similar to the Space Race or wartime scenarios). 
  • We have a lot of evidence pointing to increasingly intense competition between AI labs, but we are extremely uncertain whether AI labs would adopt a “win at all costs” strategy. 

Find out more about our reasoning here [LW · GW].

Table 4.1

Stage 2:

Takeoff speed, given intense competition: 50% fast takeoff, 50% slow

Reasoning:

  • Davidson mentions that takeoff speed is “probably 1-12 months” (i.e. fast). We interpret this as him having >50% credence in fast takeoff over slow. 
  • We think an intensely competitive state of the world would increase takeoff speeds.
  • We think Davidson makes some valid arguments, but we are still not fully convinced that software and hardware improvements will continue to accelerate (as Davidson himself notes). We think it’s rare to see such extreme rates of improvement; only the cost of genome sequencing seems to have matched it.
  • Hence, we give a forecast of 50:50 between fast and slow takeoff.

Takeoff speed, given mild competition: 30% fast takeoff, 70% slow

Reasoning: 

  • Given the above reasons and a mildly competitive state, we think a forecast of 30:70 seems adequate. 
  • It turns out that the conclusion is robust to parameter choices here, e.g. a 10:90 forecast does not materially change the key takeaway.

Table 4.2

Stage 3:

Polarity, given fast takeoff and intense competition: 60% unipolar, 40% multipolar

Reasoning: 

  • We think a fast takeoff will increase the probability of a single superintelligence dominating the world. 
  • On the other hand, we think intense competition will likely increase the probability of multiple superintelligences dominating the world, albeit to a lesser extent.
  • Hence, given fast takeoff and intense competition, we think a unipolar world is more likely.

Find out more about our thoughts here [LW(p) · GW(p)].

Polarity, given slow takeoff and intense competition: 30% unipolar, 70% multipolar

Reasoning: With a slower takeoff and a more intense competition dynamic, we should expect more possible worlds with multiple superintelligences.

Polarity, given fast takeoff and mild competition: 75% unipolar, 25% multipolar

Reasoning: With a faster takeoff and a less intense competition dynamic, we think there’s a much larger chance of a unipolar world.

Polarity, given slow takeoff and mild competition: 50% unipolar, 50% multipolar

Reasoning: Given slow takeoff and mild competition, we are uncertain which factor has the bigger effect, so we give both outcomes an equal chance.

Table 4.3

Stage 4:

Alignment, given unipolar, fast takeoff, intense competition: 70% misaligned AGI, 30% aligned AGI

Reasoning: 

  • The following are reasons to expect more possible worlds with misaligned AI(s):
    • A world with a faster takeoff means less time to align an AGI.
    • A world with more intense competition means less incentive for AI labs and nation-states to implement safety measures.
  • Given a unipolar world with fast takeoff and intense competition, we think it’s quite likely that we’ll get misaligned AI.

Alignment, given multipolar, fast takeoff, intense competition: 80% misaligned AGI, 20% aligned AGI

Reasoning: 

  • A multipolar world means more AIs to align, and safety measures may not scale to so many AIs, so the probability of misalignment is higher than in the unipolar case.
  • Given a multipolar world with fast takeoff and intense competition, we think it’s very likely that we’ll get misaligned AIs.

Alignment, given unipolar, slow takeoff, intense competition: 60% misaligned AGI, 40% aligned AGI

Reasoning: Given a unipolar world with slow takeoff and intense competition, we think it’s somewhat likely that we’ll get misaligned AI.

Alignment, given multipolar, slow takeoff, intense competition: 65% misaligned AGI, 35% aligned AGI

Reasoning: Given a multipolar world with slow takeoff and intense competition, we think it’s somewhat likely that we’ll get misaligned AIs, again with a higher probability of misalignment in the multipolar case.

Alignment, given unipolar, fast takeoff, mild competition: 65% misaligned AGI, 35% aligned AGI

Reasoning: Given a unipolar world with fast takeoff and mild competition, we think it’s somewhat likely that we’ll get misaligned AI.

Alignment, given multipolar, fast takeoff, mild competition: 70% misaligned AGI, 30% aligned AGI

Reasoning: Given a multipolar world with fast takeoff and mild competition, we think it’s quite likely that we’ll get misaligned AIs.

Alignment, given unipolar, slow takeoff, mild competition: 50% misaligned AGI, 50% aligned AGI

Reasoning: Given a unipolar world with slow takeoff and mild competition, we are unsure which is more likely.

Alignment, given multipolar, slow takeoff, mild competition: 55% misaligned AGI, 45% aligned AGI

Reasoning: Given a multipolar world with slow takeoff and mild competition, we think it’s slightly likely that we’ll get misaligned AIs.

Table 4.4

We then use the estimates (from stages 1 to 4 above) to fill out the scenario tree below. The resulting probabilities for the leaf nodes, calculated by taking the product of transition probabilities along scenario tree branches ending in each leaf node, are as follows:

For reference, here are the aggregated probabilities derived from the scenario tree above. Our tree puts more likelihood on intense competitive pressures, slow takeoff, unipolar scenarios, and misaligned AI: 

| Variable | Probability | Complement | Probability |
|---|---|---|---|
| Intense competitive pressures | 60% | Mild competitive pressures | 40% |
| Fast takeoff | 40% | Slow takeoff | 60% |
| Unipolar | 54% | Multipolar | 46% |
| Misaligned AI | 64% | Aligned AI | 36% |

Table 5
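To make the leaf-node arithmetic explicit, here is a minimal sketch in Python (not our original worksheet): each leaf probability is the product of the transition probabilities along its branch, and the marginals in Table 5 are sums over leaves. Because the inputs below are the rounded figures from Tables 4.1-4.4, the resulting marginals can differ from Table 5 by a few percentage points.

```python
# Sketch of the scenario-tree calculation: leaf probability = product of
# transition probabilities along the branch (competition -> takeoff ->
# polarity -> alignment), all conditional on an AGI-related catastrophe.
from itertools import product

p_competition = {"intense": 0.6, "mild": 0.4}
p_takeoff = {  # p(takeoff | competition)
    "intense": {"fast": 0.5, "slow": 0.5},
    "mild": {"fast": 0.3, "slow": 0.7},
}
p_polarity = {  # p(polarity | takeoff, competition)
    ("fast", "intense"): {"unipolar": 0.60, "multipolar": 0.40},
    ("slow", "intense"): {"unipolar": 0.30, "multipolar": 0.70},
    ("fast", "mild"): {"unipolar": 0.75, "multipolar": 0.25},
    ("slow", "mild"): {"unipolar": 0.50, "multipolar": 0.50},
}
p_misaligned = {  # p(misaligned | polarity, takeoff, competition)
    ("unipolar", "fast", "intense"): 0.70, ("multipolar", "fast", "intense"): 0.80,
    ("unipolar", "slow", "intense"): 0.60, ("multipolar", "slow", "intense"): 0.65,
    ("unipolar", "fast", "mild"): 0.65, ("multipolar", "fast", "mild"): 0.70,
    ("unipolar", "slow", "mild"): 0.50, ("multipolar", "slow", "mild"): 0.55,
}

leaves = {}
for comp, take, pole in product(p_competition, ["fast", "slow"], ["unipolar", "multipolar"]):
    branch = p_competition[comp] * p_takeoff[comp][take] * p_polarity[(take, comp)][pole]
    p_mis = p_misaligned[(pole, take, comp)]
    leaves[(comp, take, pole, "misaligned")] = branch * p_mis
    leaves[(comp, take, pole, "aligned")] = branch * (1 - p_mis)

assert abs(sum(leaves.values()) - 1.0) < 1e-9  # leaf probabilities sum to 100%

# Marginals analogous to Table 5, e.g. overall p(misaligned AI):
total_misaligned = sum(p for (c, t, pl, a), p in leaves.items() if a == "misaligned")
print(f"p(misaligned AI) ~= {total_misaligned:.0%}")  # ~64% with these rounded inputs
```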

Modeling step 3: estimating probabilities for LoC vs other factors

Given a leaf node (i.e. category of similar AGI-related global catastrophes), this modeling step splits the 100% probability into two groups, following the LoC classification [LW · GW] in the Appendix section: 

  1. p(LoC) i.e. scenarios involving LoC over an AGI system, and 
  2. p(non-LoC) comprising every other scenario not included above 

A few high-level remarks:

These are our (unweighted) estimates at a granular level:

Table 6

Rolling up these granular estimates, we get the following summary table, where “scenario-weighted” means weighted by the leaf-node probabilities (i.e. groups of similar scenarios) derived in modeling step 2: 

| AGI-related catastrophes | LoC scenarios | Non-LoC scenarios |
|---|---|---|
| Unweighted | 29% | 71% |
| Scenario-weighted | 35% | 65% |

Table 7

While scenario-weighting shifts probability mass towards LoC scenarios (from 29% to 35%), most of the probability mass of AGI-related catastrophes still comes from non-LoC scenarios.
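Concretely, the scenario-weighted figure is

$$
p(\text{LoC} \mid \text{AGI catastrophe}) \;=\; \sum_{\text{leaf } \ell} p(\ell) \cdot p(\text{LoC} \mid \ell) \;\approx\; 35\%
$$

with p(ℓ) taken from the scenario tree in modeling step 2.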

Modeling step 4: estimating recoverability probabilities

Given a leaf node of the scenario tree and whether it’s classified as LoC over the AGI system or not, this modeling step estimates the recoverability of the catastrophe. 

At a high level, we have considered the following when we came up with estimates:

Firstly, here are the granular recoverability probability estimates for LoC scenarios:

Secondly, here are the same for non-LoC scenarios:

Aggregating these granular estimates, the final probabilities (filling in Table 3.2 above [LW · GW]) are:

| Category | p(LoC → AGI catastrophe) | p(non-LoC → AGI catastrophe) | Total p(AGI catastrophe) |
|---|---|---|---|
| Unrecoverable | 22% | 23% | 45% |
| Recoverable | 13% | 42% | 55% |
| Total p(AGI catastrophe) | 35% | 65% | 100% |

Table 8

Putting it all together: p(existential catastrophe | LoC) = 5.7% and p(existential catastrophe | non-LoC) = 6.0%

Taking Table 8’s outputs and multiplying by the probability of AGI-driven catastrophe conditional on AGI being developed by 2070 (from this section [LW · GW]) yields our finalized estimates, which recaps Table 1:

| Failure mode / recoverability of catastrophe | p(LoC → AGI catastrophe) | p(non-LoC → AGI catastrophe) | Total p(AGI catastrophe) |
|---|---|---|---|
| Unrecoverable (i.e. existential) | 5.7% | 6.0% | 11.7% |
| Recoverable | 3.3% | 10.9% | 14.3% |
| Total p(AGI catastrophe) | 9.1% | 16.9% | 26.0% |

Table 9 (same as Table 1)
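As an arithmetic check, each cell of Table 9 is the corresponding Table 8 cell multiplied by the ~26% estimate of p(AGI-related catastrophe | AGI by 2070) from earlier, e.g.:

$$
0.26 \times 0.22 \approx 5.7\%, \qquad 0.26 \times 0.23 \approx 6.0\%, \qquad 0.26 \times 0.45 \approx 11.7\%
$$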

Our total probability of AGI-related existential catastrophe of 11.7% compares well with other estimates:

Conclusion and Recommendations

Appendix

Terminologies

Our model of Open Philanthropy’s views

These are our best guesses as to Open Philanthropy’s views as they pertain to Question #2: 

What counts as ‘loss of control’

There are many AI x-risk sources and even more existential threat models; calculating p(existential catastrophe | LoC) requires us to classify AI x-risk sources as LoC vs non-LoC. Using Sam Clarke’s writeup [EA · GW] as a starting point, we adopt the following classification:

We broadly agree with the discussion in Section 7 of Carlsmith (2022) that

p(AGI-driven catastrophe | AGI by 2070)

The following are factors that, if true, we think would likely increase p(AGI-driven catastrophe | AGI by 2070):

The following are factors that, if true, we think would likely decrease p(AGI-driven catastrophe | AGI by 2070):

Competition intensity

“Intense competition” refers to scenarios where, for a duration of more than a year, the involved parties adopt a “win at all costs” strategy. “Mild competition” refers to scenarios that don’t involve such a strategy.

We’ve categorized the parties into two groups:

Examples of what it means to “win at all costs”: 

Do we expect AI labs or nation-states to adopt a “win at all costs” strategy in 2070? 

Let’s start with AI labs.

| Metric | Evidence | More confidence that there will likely be intense competition? |
|---|---|---|
| Historical gap between release of the state-of-the-art (SOTA) AI product and second place, in terms of computation used | The median gap is 0.6 and 0.74 years (see more [LW · GW]) | The gap seems pretty small, hence we’re slightly more confident here. |
| Gap between the current most powerful LLMs (e.g. ChatGPT, Claude, Bard) | 0.5 years | The gap seems pretty small, hence we’re slightly more confident here. |
| Number of competitors | 4-6 competitors (e.g. OpenAI/Microsoft, Google/DeepMind, Anthropic, Meta, Baidu, HuggingFace) | The number seems pretty low, hence we’re slightly less confident here. |
| Size of economic moat | A leaked internal memo from Google suggests that open-source AI products might outcompete Google or OpenAI | There is a possibility that the moat is smaller, but we’re not totally convinced, hence we’re slightly more confident here. |
| Funds raised | 26.7% decrease in private investment from 2021 to 2022, but the overall trend seems positive; OpenAI raised $10.3B in the last 4 months; Anthropic raised $0.75B in the last 3 months | We’re slightly more confident here. |
| Adoption rate | ChatGPT reached 100 million users in 2 months (fastest ever); Google announced “Code Red”; there’s also a large historical reference class of businesses that failed to adopt new technology fast enough (e.g. Blockbuster vs Netflix, Borders vs Amazon, Blackberry vs Apple/Google, Kodak vs Canon/Sony) | We’re slightly more confident here. |
| Product release cycles | | Uncertain |
| Profit margin | | Uncertain |
| Working hours | | Uncertain |


Despite the above evidence pointing towards increasing competition between AI labs, we still feel very uncertain about whether AI labs will adopt a “win at all costs” strategy. What would make us more confident? We think we would need to see the following evidence:

Next, let’s explore competition between nation-states.

| Metric | Evidence | More confidence that there will likely be intense competition? |
|---|---|---|
| Government expenditure on AI as % of GDP | The US government shows a trend of increasing spending on: non-defense AI R&D ($0.56B in 2018 to $1.84B in 2022); defense AI R&D ($0.93B in 2020 to $1.1B in 2022); AI-related contracts ($1.3B in 2017 to $3.3B in 2022). There’s a mixed trend in releases of national AI strategies. | We’re slightly confident here. |
| Foreign policy moves | US chip ban on China | We’re slightly confident here. |
| Regulatory moves | The number of AI-related bills passed into law has a positive trend, going from 6 bills in 2016 to 37 in 2022. | We’re slightly confident here. |
| Talent and migration | | Uncertain |
| Research output | | Uncertain |


Despite the uncertainty, we think it’s more likely than not that nation-states (particularly the US and China) will adopt a “win at all costs” strategy when it comes to AI capabilities. We would feel more confident if we had evidence that US and Chinese government expenditure on AI each surpasses 1% of GDP in a single year.

Gap between SOTA and second place

In terms of computation used, in petaFLOP:

| 2nd place | SOTA | Gap in years |
|---|---|---|
| NPLM | Word2Vec (large) | 10.74 |
| Word2Vec (large) | RNNsearch-50* | 0.89 |
| RNNsearch-50* | Seq2Seq LSTM | 0.03 |
| Seq2Seq LSTM | GNMT | 2.08 |
| GNMT | T5-11B | 3.12 |
| T5-11B | GPT-3 175B (davinci) | 0.61 |
| GPT-3 175B (davinci) | AlphaCode | 0.15 |
| AlphaCode | Chinchilla | 0.02 |
| Chinchilla | PaLM (540B) | 0.24 |
| Median gap in years | | 0.74 |
| Median gap in years if NPLM is removed | | 0.60 |

Polarity

In the diagram above, the curved lines represent the rate of capabilities advances of AI systems: the blue line represents the state-of-the-art (SOTA), while the solid green line and the dotted red line represent a potential second-best competitor. 

We posit that when an AI first achieves AGI, it has not yet assumed dominance over the world. Once it reaches superintelligence, which is much more powerful than an AGI, it would likely be able to assume dominance over the world and prevent other AGIs from being developed, if they do not yet exist. In the diagram, the gradient of the curve between ‘AGI’ and ‘undefeatable dominance’ loosely corresponds to takeoff speed. 

In the scenario where the second-best AI (solid green line in the diagram) achieves AGI before the SOTA achieves undefeatable dominance, the world ends up in a multipolar scenario, where more than one AGI system exists simultaneously. On the other hand, in the scenario where the second-best AI (dotted red line in the diagram) is on track to achieve AGI only after the SOTA achieves undefeatable dominance, it will in fact never reach AGI (the dominant system can prevent it), and the world ends up in a unipolar scenario. As an example, in a fast-takeoff world where the SOTA achieves undefeatable dominance <1 year after achieving AGI, while the second-best constantly lags behind by more than a year, the world would likely end up in a unipolar scenario. 
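Put as an inequality (just shorthand for the diagram above, with $\Delta$ the second-place lag and $t_{\mathrm{AGI}}$, $t_{\mathrm{dom}}$ the times at which the SOTA system reaches AGI and undefeatable dominance):

$$
\text{multipolar if } \Delta < t_{\mathrm{dom}} - t_{\mathrm{AGI}}, \qquad \text{unipolar otherwise.}
$$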

p(existential catastrophe | non-LoC)

Engineered pandemic

How likely will the use of in-control AGI increase the likelihood of an engineered pandemic? 

We think this seems very likely. There are already use cases for current ML algorithms in DNA sequencing and DNA expression analysis (O'Brien and Nelson, 2020; Sawaya et al., 2020; Brockmann, 2019; Oliveira, 2019). 

[EDIT: Soice et al, 2023]

We’re unsure whether current ML algorithms could also conduct genetic editing or de novo DNA synthesis, as well as strategic deployment, but we think this might be plausible for more advanced AI systems.

Although this isn’t technically related to genetic engineering, we think AI advances in protein folding (in particular AlphaFold) might plausibly increase this risk too.

How unrecoverable would an engineered pandemic enabled by an in-control AGI be?

We think it’s quite likely that an AGI would be capable enough to do the following:

Furthermore, we think reducing humanity’s population size by 95% is enough to cause civilisational collapse, given the lack of remaining fossil fuels that can be extracted without heavy machinery.

Given the above reasons, we think it’s possible that (even if only a single strain of highly optimized pathogen is released accidentally) p(unrecoverable | non-LoC) is non-trivially high.

Wouldn’t the use of in-control AGI to defend against engineered pandemic reduce p(unrecoverable)?

Yes, we agree, but we think there’s an asymmetry between the “attacking” party and the “defending” party, where the attacking party has a substantial advantage over the defending party.

Nuclear winter

How likely will the use of in-control AGI increase the likelihood of a nuclear winter? 

Rodriguez (2019) [? · GW] estimates that there’s a 0.38% chance of a US-Russia nuclear exchange each year and an 11% chance of a severe nuclear winter given a US-Russia nuclear exchange. Drawing on Rautenbach (2023) [EA · GW] and Aird and Aldred (2023) [EA · GW], we think the use of in-control AGI would likely increase Rodriguez’s estimates considerably.
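For scale, and purely as an illustration (the 30-year horizon is our own arbitrary choice, and this is before any AGI-driven increase), Rodriguez’s base rates compound to roughly

$$
\left(1 - (1 - 0.0038)^{30}\right) \times 0.11 \;\approx\; 0.108 \times 0.11 \;\approx\; 1.2\%
$$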
 

3 comments


comment by Davidmanheim · 2023-07-12T10:31:17.440Z · LW(p) · GW(p)

A few brief notes.

1. Openphil's biosecurity work already focuses on AIxBio quite a bit.

2. re: engineered pandemics, I think the analysis is wrong on several fronts, but primarily, it seems unclear how AI making a pandemic would differ from the types of pandemic we're already defending against, and so assuming no LoC, typical biosecurity approaches seem fine.

3. Everyone is already clear that we want AI away from nuclear systems, and I don't know that there is anything more to say about it, other than focusing on how little we understand Deep Learning systems and ensuring politicians and others are aware.

4. This analysis ignores AIxCybersecurity issues, which also seem pretty important for non-LoC risks.
 

Replies from: yiyang
comment by Yi-Yang (yiyang) · 2023-07-27T02:09:48.300Z · LW(p) · GW(p)

Hey David, thanks for the feedback! 

1. I did look at Openphil’s grant database back in May and found nothing. Could you point where we could find more information about this?

2. Hmm are you saying an AI-engineered pandemic would likely be similar to natural pandemics we’re already defending against? If so, I would probably disagree here. I’m also unsure how AI-engineered pandemics might create highly novel strains, but I think the risk of this happening seems non-trivial.

And wouldn’t AI accelerate such developments as well?

3. Thanks for the info! I don’t think we have access to a lot of personal views of researchers besides what we’ve seen in literature.

4. Ah did you mean AI x Cybersecurity x Non-LOC risks? That sounds right. I don't think we've actually thought about this. 

Reflecting on this, it feels like we could have done better if we spoke to at least a few researchers instead of depending so much on lit reviews. 

Replies from: Davidmanheim
comment by Davidmanheim · 2023-07-29T18:32:45.038Z · LW(p) · GW(p)
  1. The grants are to groups and individuals doing work in this area, so it encompasses much of the grants that are on biorisk in general - as well as several people employed by OpenPhil doing direct work on the topic.
  2. I'm saying that defense looks similar - early detection, quarantines, contact tracing, and rapid production of vaccines all help, and it's unclear how non-superhuman AI could do any of the scariest things people have discussed as threats.
  3. Public statements to the press already discuss this, as does some research.
  4. Yes, and again, I think it's a big deal.

And yes, but expert elicitation is harder than lit reviews.