Comments
Glad to hear that!
I do feel excited about this being used as a sort of "201 level" overview of AI strategy and what work it might be useful to do. And I'm aware of the report being included in the reading lists / curricula for two training programs for people getting into AI governance or related work, which was gratifying.
Unfortunately we did this survey before ChatGPT and various other events that have since majorly changed the landscape of AI governance work to be done, e.g. by opening various policy windows. So I imagine people reading this report today may feel it has some odd omissions / vibes. But I still think it serves as a good 201 level overview despite that. Perhaps we'll run a follow-up in a year or two to provide an updated version.
I'd consider those to be "in-scope" for the database, so the database would include any such estimates that I was aware of and that weren't too private to share in the database.
If I recall correctly, some estimates in the database are decently related to that, e.g. are framed as "What % of the total possible moral value of the future will be realized?" or "What % of the total possible moral value of the future is lost in expectation due to AI risk?"
But I haven't seen many estimates of that type, and I don't remember seeing any that were explicitly framed as "What fraction of the accessible universe's resources will be used in a way optimized for 'the correct moral theory'?"
If you know of some, feel free to comment in the database to suggest they be added :)
...and while I hopefully have your attention: My team is currently hiring for a Research Manager! If you might be interested in managing one or more researchers working on a diverse set of issues relevant to mitigating extreme risks from the development and deployment of AI, please check out the job ad!
The application form should take <2 hours. The deadline is the end of the day on March 21. The role is remote and we're able to hire in most countries.
People with a wide range of backgrounds could turn out to be the best fit for the role. As such, if you're interested, please don't rule yourself out due to thinking you're not qualified unless you at least read the job ad first!
I found this thread interesting and useful, but I feel a key point has been omitted thus far (from what I've read):
- Public, elite, and policymaker beliefs and attitudes related to AI risk aren't just a variable we (members of the EA/longtermist/AI safety communities) have to bear in mind and operate in light of, but instead also a variable we can intervene on.
- And so far I'd say we have (often for very good reasons) done significantly less to intervene on that variable than we could've or than we could going forward.
- So it seems plausible that actually these people are fairly convincible if exposed to better efforts to really explain the arguments in a compelling way.
We've definitely done a significant amount of this kind of work, but I think we've often (a) deliberately held back on doing so or on conveying key parts of the arguments, due to reasonable downside risk concerns, and (b) not prioritized this. And I think there's significantly more we could do if we wanted to, especially after a period of actively building capacity for this.
Important caveats / wet blankets:
- I think there are indeed strong arguments against trying to shift relevant beliefs and attitudes in a more favorable direction, including not just costs and plausibly low upside but also multiple major plausible downside risks.[1]
- So I wouldn't want anyone to take major steps in this direction without checking in with multiple people working on AI safety/governance first.
- And it's not at all obvious to me we should be doing more of that sort of work. (Though I think whether, how, & when we should is an important question and I'm aware of and excited about a couple small research projects that are happening on that.)
All I really want to convey in this comment is what I said in my first paragraph: we may be able to significantly push beliefs and opinions in favorable directions relative to where they are now or would be in future by default.
[1] Due to time constraints, I'll just point to this vague overview.
Personally I haven't thought about how strong the analogy to GoF is, but another thing that feels worth noting is that there may be a bunch of other cases where the analogy is similarly strong and where major government efforts aimed at risk-reduction have occurred. And my rough sense is that that's indeed the case, e.g. some of the examples here.
In general, at least for important questions worth spending time on, it seems very weird to say "You think X will happen, but we should be very confident it won't because in analogous case Y it didn't", without also either (a) checking for other analogous cases or other lines of argument or (b) providing an argument for why this one case is far more relevant evidence than any other available evidence. I do think it totally makes sense to flag the analogous case and to update in light of it, but stopping there and walking away feeling confident in the answer seems very weird.
I haven't read any of the relevant threads in detail, so perhaps the arguments made are stronger than I imply here, but my guess is they weren't. And it seems to me that it's unfortunately decently common for AI risk discussions on LessWrong to involve this pattern I'm sketching here.
(To be clear, all I'm arguing here is that these arguments often seem weak, not that their conclusions are false.)
(This comment is raising an additional point to Jan's, not disagreeing.)
Update: Oh, I just saw that Steve Byrnes also said the following in this thread, which I totally agree with:
"[Maybe one could argue] “It’s all very random—who happens to be in what position of power and when, etc.—and GoF is just one example, so we shouldn’t generalize too far from it” (OK maybe, but if so, then can we pile up more examples into a reference class to get a base rate or something? and what are the interventions to improve the odds, and can we also try those same interventions on GoF?)"
Cool!
Two questions:
- Is it possible to also get something re-formatted via this service? (E.g., porting a Google Doc with many footnotes and tables to LessWrong or the EA Forum.)
- Is it possible to get feedback, proofreading, etc. via this service for things that won't be posts?
- E.g. mildly infohazardous research outputs that will just be shared in the relevant research & policy community but not made public
(Disclaimer: I only skimmed this post, having landed here from Habryka's comment on It could be useful if someone ran a copyediting service. Apologies if these questions are answered already in the post.)
Thanks for this post! This seems like good advice to me.
I made an Anki card on your three "principles that stand out" so I can retain those ideas. (Mainly for potentially suggesting to people I manage or other people I know - I think I already have roughly the sort of mindset this post encourages, but I think many people don't and that me suggesting these techniques sometimes could be helpful.)
It's not sufficient to argue that taking over the world will improve prediction accuracy. You also need to argue that during the training process (in which taking over the world wasn't possible), the agent acquired a set of motivations and skills which will later lead it to take over the world. And I think that depends a lot on the training process.
[...] if during training the agent is asked questions about the internet, but has no ability to edit the internet, then maybe it will have the goal of "predicting the world", but maybe it will have the goal of "understanding the world". The former incentivises control, the latter doesn't.
I agree with your key claim that it's not obvious/guaranteed that an AI system that has faced some selection pressure in favour of predicting/understanding the world accurately would then want to take over the world. I also think I agree that a goal of "understanding the world" is a somewhat less dangerous goal in this context than a goal of "predicting the world". But it seems to me that a goal of "understanding the world" could still be dangerous for basically the same reason as why "predicting the world" could be dangerous. Namely, some world states are easier to understand than others, and some trajectories of the world are easier to maintain an accurate understanding of than others.
E.g., let's assume that the "understanding" is meant to be at a similar level of analysis to that which humans typically use (rather than e.g., being primarily focused at the level of quantum physics), and that (as in humans) the AI sees it as worse to have a faulty understanding of "the important bits" than "the rest". Given that, I think:
- a world without human civilization or with far more homogeneity of its human civilization seems to be an easier world to understand
- a world that stays pretty similar in terms of "the important bits" (not things like distant stars coming into/out of existence), rather than e.g. having humanity spread through the galaxy creating massive structures with designs influenced by changing culture, requires less further effort to maintain an understanding of and has less risk of later being understood poorly
I'd be interested in whether you think I'm misinterpreting your statement or missing some important argument.
(Though, again, I see this just as pushback against one particular argument of yours, and I think one could make a bunch of other arguments for the key claim that was in question.)
Thanks for this series! I found it very useful and clear, and am very likely to recommend it to various people.
Minor comment: I think "latter" and "former" are the wrong way around in the following passage?
By contrast, I think the AI takeover scenarios that this report focuses on have received much more scrutiny - but still, as discussed previously, have big question marks surrounding some of the key premises. However, it’s important to distinguish the question of how likely it is that the second species argument is correct, from the question of how seriously we should take it. Often people with very different perspectives on the latter actually don’t disagree very much on the former.
(I.e., I think you probably mean that, of people who've thought seriously about the question, probability estimates vary wildly but (a) tend to be above (say) 1 percentage point of x-risk from a second species risk scenario and (b) thus tend to suffice to make those people think humanity should put a lot more resources into understanding and mitigating the risk than we currently do. Rather than that people tend to wildly disagree on how much effort to put into this risk yet agree on how likely the risk is. Though I'm unsure, since I'm just guessing from context that "how seriously we should take it" means "how much resources should be spent on this issue", but in other contexts it'd mean "how likely is this to be correct" or "how big a deal is this", which people obviously disagree on a lot.)
FWIW, I feel that this entry doesn't capture all/most of how I see "meta-level" used.
Here's my attempted description, which I wrote for another purpose. Feel free to draw on it here and/or to suggest ways it could be improved.
- Meta-level and object-level = typically, “object-level” means something like “Concerning the actual topic at hand” while “Meta-level” means something like “Concerning how the topic is being tackled/researched/discussed, or concerning more general principles/categories related to this actual topic”
- E.g., “Meta-level: I really appreciate this style of comment; I think you having a policy of making this sort of comment is quite useful in expectation. Object-level: I disagree with your argument because [reasons]”
Thanks for writing this. The summary table is pretty blurry / hard to read for me - do you think you could upload a higher resolution version? Or if for some reason that doesn't work on LessWrong, could you link to a higher resolution version stored elsewhere?
This section of a new post may be more practically useful than this post was: https://forum.effectivealtruism.org/posts/4T887bLdupiNyHH6f/six-takeaways-from-ea-global-and-ea-retreats#Takeaway__2__Take_more__calculated__risks
My Anki cards
Nanda broadly sees there as being 5 main types of approach to alignment research.
Addressing threat models: We keep a specific threat model in mind for how AGI causes an existential catastrophe, and focus our work on things that we expect will help address the threat model.
Agendas to build safe AGI: Let’s make specific plans for how to actually build safe AGI, and then try to test, implement, and understand the limitations of these plans. With an emphasis on understanding how to build AGI safely, rather than trying to do it as fast as possible.
Robustly good approaches: In the long-run AGI will clearly be important, but we're highly uncertain about how we'll get there and what, exactly, could go wrong. So let's do work that seems good in many possible scenarios, and doesn’t rely on having a specific story in mind. Interpretability work is a good example of this.
De-confusion: Reasoning about how to align AGI involves reasoning about complex concepts, such as intelligence, alignment and values, and we’re pretty confused about what these even mean. This means any work we do now is plausibly not helpful and definitely not reliable. As such, our priority should be to do some conceptual work on how to think about these concepts and what we’re aiming for, and trying to become less confused.
I consider the process of coming up with each of the research motivations outlined in this post to be examples of good de-confusion work.
Field-building: One of the biggest factors in how much Alignment work gets done is how many researchers are working on it, so a major priority is building the field. This is especially valuable if you think we’re confused about what work needs to be done now, but will eventually have a clearer idea once we’re within a few years of AGI. When this happens, we want a large community of capable, influential and thoughtful people doing Alignment work.
Nanda focuses on three threat models that he thinks are most prominent and are addressed by most current research:
Power-Seeking AI
You get what you measure
[The case given by Paul Christiano in What Failure Looks Like (Part 1)]
AI Influenced Coordination Failures
[The case put forward by Andrew Critch, eg in What multipolar failure looks like. Many players get AGI around the same time. They now need to coordinate and cooperate with each other and the AGIs, but coordination is an extremely hard problem. We currently deal with this with a range of existing international norms and institutions, but a world with AGI will be sufficiently different that many of these will no longer apply, and we will leave our current stable equilibrium. This is such a different and complex world that things go wrong, and humans are caught in the cross-fire.]
Nanda considers three agendas to build safe AGI to be most prominent:
Iterated Distillation and Amplification (IDA)
AI Safety via Debate
Solving Assistance Games
[This is Stuart Russell’s agenda, which argues for a perspective shift in AI towards a more human-centric approach.]
Nanda highlights 3 "robustly good approaches" (in the context of AGI risk):
Interpretability
Robustness
Forecasting
[I doubt he sees these as exhaustive - though that's possible - and I'm not sure if he sees them as the most important/prominent/most central examples.]
Thanks for this! I found it interesting and useful.
I don't have much specific feedback, partly because I listened to this via Nonlinear Library while doing other things rather than reading it, but I'll share some thoughts anyway since you indicated being very keen for feedback.
- I in general think this sort of distillation work is important and under-supplied
- This seems like a good example of what this sort of distillation work should be like: broken into different posts that can be read separately, starting with an overall overview; each post broken down into clear and logical sections and subsections; use of bold; clarity about terms; and addition of meta notes where relevant
- Maybe it would've been useful to just name & link to sources on threat models, agendas to build safe AGI, and robustly good approaches that you don't discuss in any further detail, rather than not mentioning them at all?
- That could make it easier for people to dive deeper if they want, could help avoid giving the impression that the things you list are the only things in those categories, and could help people understand what you mean by the overall categories by seeing more examples of things within the categories.
- This is assuming you think there are other discernible nameable constituents of those categories which you didn't name - I guess it's possible that you don't think that.
- I'll put in a reply to this comment the Anki cards I made, on the off chance that that's of interest to you as oblique feedback or of interest to other people so they can use the same cards themselves
Adam Binks replied to this list on the EA Forum with:
To add to your list - Subjective Logic represents opinions with three values: degree of belief, degree of disbelief, and degree of uncertainty. One interpretation of this is as a form of second-order uncertainty. It's used for modelling trust. A nice summary here with interactive tools for visualising opinions and a trust network.
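For anyone unfamiliar with the formalism Adam mentions, here's a minimal Python sketch of a binomial opinion in Subjective Logic; the `Opinion` class and the example numbers are my own illustration, not anything from Adam's comment or the linked summary. The key structural idea is that belief, disbelief, and uncertainty mass sum to 1, and the uncertain mass is allocated according to a base rate whenever you need to collapse the opinion into a single probability.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """A binomial opinion in Subjective Logic: (belief, disbelief, uncertainty, base_rate)."""
    belief: float        # evidence-backed support for the proposition
    disbelief: float     # evidence-backed support against it
    uncertainty: float   # uncommitted mass (the second-order uncertainty)
    base_rate: float = 0.5  # prior probability used when projecting to a point estimate

    def __post_init__(self):
        # The three masses must sum to 1.
        assert abs(self.belief + self.disbelief + self.uncertainty - 1.0) < 1e-9

    def expected_probability(self) -> float:
        # Project the opinion onto a single probability.
        return self.belief + self.base_rate * self.uncertainty

# Two opinions with (roughly) the same point estimate but very different uncertainty:
confident = Opinion(belief=0.7, disbelief=0.2, uncertainty=0.1)
ignorant = Opinion(belief=0.1, disbelief=0.0, uncertainty=0.9, base_rate=0.72)
print(confident.expected_probability())  # 0.75
print(ignorant.expected_probability())   # ~0.75, but almost all of it comes from the prior
```

The appeal for modelling trust is that these two opinions behave very differently under Subjective Logic's fusion and discounting operators, even though a plain probability would treat them identically.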
Not sure what you mean by that being unverifiable? The question says:
This question resolves as the total number of nuclear weapons (fission or thermonuclear) reported to be possessed across all states on December 31, 2022. This includes deployed, reserve/ nondeployed, and retired (but still intact) warheads, and both strategic and nonstrategic weapons.
Resolution criteria will come from the Federation of American Scientists (FAS). If they cease publishing such numbers before resolution, resolution will come from the Arms Control Association or any other similar platform.
FAS update their estimates fairly regularly - here are their estimates as of May (that link is also provided earlier in the question text).
Though I do realise now that they're extremely unlikely to update their numbers on December 31 specifically, and maybe not even in December 2022 at all. I'll look into the best way to tweak the question in light of that. If that's what you meant, thanks for the feedback!
(I do expect there'll be various minor issues like that, and we hope the community catches them quickly so we can tweak the questions to fix them. This was also one reason for showing some questions before they "open".)
That makes sense to me.
But it seems like you're just saying the issue I'm gesturing at shouldn't cause mis-calibration or overconfidence, rather than that it won't reduce the resolution/accuracy or the practical usefulness of a system based on X predicting what Y will think?
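To make the calibration-vs-resolution distinction I'm appealing to more concrete, here's a rough Python sketch (my own toy illustration, not anything from the original post) using the standard Murphy decomposition of the Brier score: a forecaster who always predicts the base rate can be perfectly calibrated while having zero resolution, which is roughly the failure mode I'm gesturing at.

```python
import numpy as np

def murphy_decomposition(forecasts, outcomes, n_bins=10):
    """Split the Brier score into reliability (mis-calibration), resolution, and uncertainty."""
    forecasts, outcomes = np.asarray(forecasts, float), np.asarray(outcomes, float)
    base_rate = outcomes.mean()
    bins = np.clip((forecasts * n_bins).astype(int), 0, n_bins - 1)
    reliability = resolution = 0.0
    for k in range(n_bins):
        mask = bins == k
        if mask.any():
            f_k, o_k = forecasts[mask].mean(), outcomes[mask].mean()
            reliability += mask.sum() * (f_k - o_k) ** 2      # penalises mis-calibration
            resolution += mask.sum() * (o_k - base_rate) ** 2  # rewards discriminating cases
    n = len(forecasts)
    return reliability / n, resolution / n, base_rate * (1 - base_rate)

# Toy world: the outcome probability equals a signal that only the "informed" forecaster uses.
rng = np.random.default_rng(0)
signal = rng.random(10_000)
outcomes = (rng.random(10_000) < signal).astype(int)

informed = signal                                   # calibrated AND informative
base_rate_only = np.full(10_000, outcomes.mean())   # calibrated but uninformative

for name, f in [("informed", informed), ("base-rate only", base_rate_only)]:
    rel, res, unc = murphy_decomposition(f, outcomes)
    print(f"{name}: reliability={rel:.3f}, resolution={res:.3f}, uncertainty={unc:.3f}")
# Both show ~0 reliability (i.e. good calibration), but only the informed forecaster
# has meaningful resolution - which is what makes the predictions practically useful.
```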
(Update: I just saw the post Welcome to LessWrong!, and I think that that serves my needs well.)
I think it's good that a page like this exists; I'd want to be able to use it as a go-to link when suggesting people engage with or post on LessWrong, e.g. in my post on Notes on EA-related research, writing, testing fit, learning, and the Forum.
Unfortunately, it seems to me that this page isn't well suited to that purpose. Here are some things that seem like key issues to me (maybe other people would disagree):
- This introduction seems unnecessarily intimidating, non-welcoming, and actually (in my perception) somewhat arrogant. For example:
- "If you have no familiarity with the cultural articles and other themes before you begin interacting, your social experiences are likely to be highly awkward. The rationalist way of thinking and subculture is extremely, extremely complex. To give you a gist of how complex it is and what kind of complexity you'll encounter:"
- This feels to me like saying "We're very special and you need to do your homework to deeply understand us before interacting at all with us, or you're just wasting our time and we'll want you to go away."
- I do agree that the rationalist culture can take some getting used to, but I don't think it's far more complex or unusual than the cultures in a wide range of other subcultures, and I think it's very often easiest to get up to speed with a culture partly just by interacting with it.
- I do agree that reading parts of the Sequences is useful, and that it's probably good to gently encourage new users to do that. But I wouldn't want to make it sound like it's a hard requirement or like they have to read the whole thing. And this passage will probably cause some readers to infer that, even if it doesn't outright say it. (A lot of people lurk more than they should, have imposter syndrome, etc.)
- I started interacting on LessWrong before having finished the Sequences (though I'd read some), and I think I both got and provided value from those interactions.
- Part of this is just my visceral reaction to any group saying their way of thinking and subculture is "extremely, extremely complex", rather than me having explicit reasons to think that that's bad.
- "If you have no familiarity with the cultural articles and other themes before you begin interacting, your social experiences are likely to be highly awkward. The rationalist way of thinking and subculture is extremely, extremely complex. To give you a gist of how complex it is and what kind of complexity you'll encounter:"
- I wrote all of that before reading the next paragraphs, and the next paragraphs very much intensified my emotional feeling of "These folks seem really arrogant and obnoxious and I don't want to ever hang out with them"
- This is despite the fact that I've actually engaged a lot on LessWrong, really value a lot about it, rank the Sequences and HPMOR as among my favourite books, etc.
- Maybe part of this is that this is describing what rationalists aim to be as if all rationalists always hit that mark.
- Rationalists and the rationalist community often do suffer from the same issues other people and communities do. This was in fact one of the really valuable things Eliezer's posts pointed out (e.g., being wary of trending towards cult-hood).
Again, these are just my perceptions. But FWIW, I do feel these things quite strongly.
Here are a couple much less important issues:
- I don't think I'd characterise the Sequences as "mostly like Kahneman, but more engaging, and I guess with a bit of AI etc." From memory, a quite substantial chunk of the sequences - and quite a substantial chunk of their value - had to do with things other than cognitive biases, e.g. what goals one should form, why, how to act on them, etc. Maybe this is partly a matter of instrumental rather than just epistemic rationality.
- Relatedly, I think this page presents a misleading or overly narrow picture of what's distinctive (and good!) about rationalist approaches to forming beliefs and choosing decisions when it says "There are over a hundred cognitive biases that humans are affected by that rationalists aim to avoid. Imagine you added over one hundred improvements to your way of thinking."
- "Kahneman is notoriously dry" feels like an odd thing to say. Maybe he is, but I've never actually heard anyone say this, and I've read one of his books and papers and watched one of his talks and found them all probably somewhat more engaging than similar things from the average scientist. (Though maybe this was more the ideas themselves, rather than the presentation.)
(I didn't read "Website Participation Intro or 'Why am I being downvoted?'", because it was unfortunately already clear that I wouldn't want to link to this page when aiming to introduce people to LessWrong and encourage them to read, comment, and/or post there.)
Authoritarian closed societies probably have an advantage at covert racing, at devoting a larger proportion of their economic pie to racing suddenly, and at artificially lowering prices to do so. Open societies have probably a greater advantage at discovery/the cutting edge and have a bigger pie in the first place (though better private sector opportunities compete up the cost of defense engineering talent).
These are interesting points which I hadn't considered - thanks!
(Your other point also seems interesting and plausible, but I feel I lack the relevant knowledge to immediately evaluate it well myself.)
Interesting post.
You or other readers might also find the idea of epistemic security interesting, as discussed in the report "Tackling threats to informed decisionmaking in democratic societies: Promoting epistemic security in a technologically-advanced world". The report is by researchers at CSER and some other institutions. I've only read the executive summary myself.
There's also a BBC Future article on the topic by some of the same authors.
While I am not sure I agree fully with the panel, an implication to be drawn from their arguments is that from an equilibrium of treaty compliance, maintaining the ability to race can disincentivize the other side from treaty violation: it increases the cost to the other side of gaining advantage, and that can be especially decisive if your side has an economic advantage.
This is an idea/argument I hadn't encountered before, and seems plausible, so it seems valuable that you shared it.
But it seems to me that there's probably an effect pushing in the opposite direction:
- Even from an equilibrium of treaty compliance, if one state has the ability to race, that might incentivise the other side to develop the ability to race as well. That wouldn't necessarily require treaty violation.
- Either or especially both sides having the ability to race can increase risks if they could race covertly until they have gained an advantage, or race so quickly that they gain an advantage before the other side can get properly started, or if the states don't always act as rational cohesive entities (e.g., if leaders are more focused on preventing regime change than preventing millions of deaths in their own country), or probably under other conditions.
- I think the term "arms race stability" captures the sort of thing I'm referring to, though I haven't yet looked into the relevant theoretical work much.
- In contrast, if we could reach a situation where neither side currently had the ability to race, that might be fairly stable. This could be true if building up that ability would take some time and be detectable early enough to be responded to (by sanctions, a targeted strike, the other side building up their own ability, or whatever).
Does this seem accurate to you?
I guess an analogy could be to whether you'd rather be part of a pair of cowboys who both have guns but haven't drawn them (capability but not yet racing), or part of a pair who don't have guns but could go buy one. It seems like we'd have more opportunities for de-escalation, less risk from nerves and hair-triggers, etc. in the latter scenario than the former.
I think this overlaps with some of Schelling's points in The Strategy of Conflict (see also my notes on that), but I can't remember for sure.
Thanks for this thought-provoking post. I found the discussion of how political warfare may have influenced nuclear weapons activism particularly interesting.
Since large yield weapons can loft dust straight to the stratosphere, they don’t even have to produce firestorms to start contributing to nuclear winter: once you get particles that block sunlight to an altitude that heating by the sun can keep them lofted, you’ll block sunlight a very long time and start harming crop yields.
I think it's true that this could "contribute" to nuclear winter, but I don't think I've seen this mentioned as a substantial concern in the nuclear winter papers I've read. E.g., I don't think I've seen any papers suggest that nuclear winter could occur solely due to that effect, without there being any firestorms, or that that effect could make the climate impacts 20% worse than would occur with firestorms alone. Do you have any citations on hand for this claim?
Final thoughts on whether you should read this book
- I found the book useful
- The parts I found most useful were (a) the early chapters on the history of biowarfare and bioterrorism and (b) the later chapters on attempts to use international law to reduce risks from bioterror and biowarfare
- I found parts of the book hard to pay attention to and remember information from
- In particular, the middle chapters on various types and examples of pathogens
- But this might just be a “me problem”. Ever since high school, I’ve continually noticed that I seem to have a harder time paying attention to and remembering information about biology than information from other disciplines. (I don’t understand why that would be the case, and I’m not certain it’s actually true, but it has definitely seemed true.)
- I’m not sure how useful this book would be to someone who already knows a lot about bioterror, biowarfare, and/or chemical weapons
- I’m not sure how useful this book would be to someone who doesn’t have much interest in the topics of bioterror, biowarfare, and/or chemical weapons
- But I’m inclined to think most longtermists should consume at least one book’s worth of content from experts on those topics
- And I think the book could be somewhat useful for understanding WMDs, international relations, and international law more generally
- There might be better books on the topic
- In particular, it’s possible a more recent book would be better?
My Anki cards
Note that:
- It’s possible that some of these cards include mistakes, or will be confusing or misleading out of context.
- I haven’t fact-checked Dando on any of these points.
- Some of these cards are just my own interpretations - rather than definitely 100% parroting what the book is saying
- The indented parts are the questions, the answers are in "spoiler blocks" (hover over them to reveal the text), and the parts in square brackets are my notes-to-self.
Dando says ___ used biological weapons in WW1, but seemingly only against ___.
the Germans and perhaps the French;
draft animals (e.g. horses), not humans
[This was part of sabotage operations, seemingly only/especially in the US, Romania, Norway, and Argentina. The US and Romania were neutral at the time; not sure whether Norway and Argentina were.]
Dando says the 1925 Geneva Protocol prohibits ___, but not ___, of chemical and biological weapons, and that many of the parties to the Protocol entered reservations to their agreement to make it clear that ___.
Use;
Development and stockpiling;
Although they would not use such weapons first, they were prepared to use them in retaliation if such weapons were used first against them
[And a number of offensive bio weapons programs were undertaken by major states in the interwar period. Only later in the 20th century were further arms control restrictions placed on chem and bio weapons.]
Japan's offensive biological warfare program was unique in that ___. The program probably caused ___.
It used human experimentation to test biological agents;
The deaths of thousands of Chinese people
[This program ran from 1931-1945]
Dando mentions 6 countries as having had "vigorous" offensive biological weapons programs during WW2:
Japan, The Soviet Union, France, the UK, the US, Canada
[He doesn't explicitly say these were the only countries with such programs, but does seem to imply that, or at least that no other countries had similarly large programs.
He notes that Germany didn't have such a program.
France's program was interrupted by the German invasion in 1940, but was resumed after WW2.]
Dando suggests that the main or most thoroughly prepared type of British WW2 biological warfare weapon/plan was...
To drop millions of cattle cakes infected with anthrax spores onto German fields, to wipe out cattle and thus deal an economic blow to Germany's overstretched agricultural system
[The British did make 5 million of these cakes.]
Dando says that there are 7 countries which definitely had offensive biological weapons programs in the second half of the 20th century:
The US, the UK, the Soviet Union, Canada, France, South Africa, Iraq
[He also says there've been numerous accusations that other countries had such programs as well, but that there isn't definite information about them.]
Dando says that 3 countries continued to have offensive biological weapons after becoming the depository for, ratifying, and/or signing the BTWC:
Soviet Union, South Africa, and Iraq
[This was then illegal under international law. Prior to the BTWC, having such a program wasn't illegal - only the use of bioweapons was.
I think the other 4 states that had had such programs between WW2 and 1972 stopped at that point or before then.]
During WW2, the US offensive biological weapons program was developing anti-___, anti-___, and anti-___ weapons.
personnel; animal; plant
[And the US was considering using anti-plant weapons against Japanese rice production.]
What major change in high-level US policy regarding chemical and biological weapons does Dando suggest occurred around 1956?
What does he suggest this was partly a reaction to?
Changing from a retaliation-only policy for BW and CW to a policy stating that the US would be prepared to use BW or CW in a general war for the purposes of enhancing military effectiveness [and the decision would be reserved for the president];
Soviet statements in 1956 that chemical and biological weapons would be used in future wars for the purposes of mass destruction
[Dando notes that the retaliation-only policy was in line with the US's signature of the 1925 Geneva protocol, but also that the US didn't actually ratify the Geneva protocol till 1975; until then it was only a signatory.]
Dando says an army report says the origin of the US's shift (under Nixon) to renouncing biological and chemical weapons dates from...
Criticism of US application of chemical herbicides and riot control agent(s) in Vietnam starting in the 1960s
[I think this means criticism/opposition by the public.]
The UK's work on an offensive biological weapons capability had been abandoned by...
1957
[According to a report cited by Dando.
Though Dando later indicates the UK restarted some of this work in 1961, I think particularly/only to find a nonlethal incapacitating chemical weapon.]
Dando says that, at the end of WW2, the UK viewed biological weapons as...
On a par with nuclear weapons
["Only when the UK obtained its own nuclear systems did interest in biological weapons decline.”
I don't know precisely what Dando means by this.]
South Africa had an offensive biological weapons program during...
The later stages of the Apartheid regime
[But it was terminated before the regime change.]
What was the scale of South Africa's offensive biological weapons program? What does its main purpose seem to have been?
Relatively small (e.g. smaller than Iraq's program)
Finding means of assassinating the Apartheid regime's enemies
[Elsewhere, Dando suggests that original motivations for the program - or perhaps for some chemical weapons work? - also included the Angola war and a desire to find crowd control agents.]
What has Iraq stated about authority (as of ~1991) to launch its chemical and biological weapons?
Authority was pre-delegated to regional commanders if Baghdad was hit with nuclear weapons
[UNSCOM has noted that that doesn't exclude other forms of use, and doesn't constitute a proof of a retaliation-only policy.]
The approach to chemical weapons that Iraq pursued was ___, in contrast to a Western approach of ___.
Production and rapid use;
Production and stockpiling
[I'm guessing that this means that Iraq pursued the ability to produce chemical weapons shortly before they were needed, rather than having a pre-made, long-lasting stockpile of more stable versions.
Dando says a similar approach could've been taken towards biological weapons.]
Dando says that the main lesson from the Iraqi biological weapons program is that...
A medium-sized country without great scientific and technical resources was, within a few years, able to reach the stage of weaponising a range of deadly biological agents
What kind of vaccine does Dando say South Africa's biological weapons program tried to find? What does someone who had knowledge of the program say the vaccine might've been used for, if it had been found?
An anti-fertility vaccine;
Administering to black women without their knowledge
Dando lists 6 different types of biological agents that could be used for biological weapons:
Bacteria; Viruses; Toxins; Bioregulators; Protozoa; Fungi
[I'm not sure whether this was meant to be exhaustive, nor whether I'm right to say these are "different types of biological agents".
There's also a chance I forgot one of the types he mentioned.]
Dando says that vaccination during a plague epidemic would not be of much help, because...
Immunity takes a month to build up
[Note that I haven't fact-checked this, and that, for all I know, the situation may be different with other pathogens or newer vaccines.]
In the mid twentieth century, ___ tried to use plague-infected fleas to cause an outbreak among ___.
Japan; the Chinese
Dando notes at least 3 factors that could make the option of biowarfare or bioterrorism against animal agriculture attractive:
1. The animals are densely packed in confined areas
2. The animals reared are often from very limited genetic stock (so that a large percentage of them could succumb to a single strain of a pathogen)
3. Many/all pathogens that would be used don't infect humans (reducing risks to the people involved in producing and using the pathogens)
[Dando implies that that third point is more relevant to bioterrorism than biowarfare, but doesn't say why. I assume it's because terrorists will tend to have fewer skills and resources than military programs, making them more vulnerable to accidents.]
What proportion of state-level offensive biological weapons programs (of which we have knowledge) "carefully investigated anti-plant attacks"?
Nearly all
In the 1990s, the US OTA concluded that the cheapest overt production route for 1 nuclear bomb per year, with no international controls, would cost __.
They also concluded that a chemical weapons arsenal for substantial military capability would cost __.
They concluded that a large biological weapons arsenal may cost __.
~$200 million;
$10s of millions;
Less than $10 million
[I'm unsure precisely what this meant.
I assume the OTA thought a covert route for nuclear weapons, with international controls, would be more expensive than the overt route with no international controls.]
Efforts in the 1990s to strengthen the BWC through agreement of a verification protocol eventually failed in 2001 due to opposition from which country?
The United States
The BTWC was opened for signature in __, and entered into force in __.
1972; 1975
Dando highlights two key deficiencies of the BTWC (at least as of it entering into force in 1975):
1. There was a lack of verification measures
2. No organisation had been put in place to take care of the convention, of its effective implementation, and of its development between review conferences
[Dando notes that, in contrast to 2, there was a large organisation associated with the Chemical Weapons Convention.
Wikipedia suggests that a (very small) Implementation Support Unit for the BTWC was finally created in 2006.]
Dando highlights a US-based stakeholder as being vocally opposed to the ideas that were proposed for verifying compliance with the BTWC:
The huge US pharmaceutical industry and its linked trade associations
[I think Dando might've been talking about opposition to inspections in particular.
Dando implies that this contributed to US executive branch being lukewarm on or sort-of opposed to these verification ideas.]
A final thought that came to mind, regarding the following passage:
It seems possible for person X to predict a fair number of a more epistemically competent person Y’s beliefs -- even before person X is as epistemically competent as Y. And in that case, doing so is evidence that person X is moving in the right direction.
I think that that's a good and interesting point.
But I imagine there would also be many cases in which X develops an intuitive ability to predict Y's beliefs quite well in a given set of domains, but in which that ability doesn't transfer to new domains. It's possible that this would be because X's "black box" simulation of Y's beliefs is more epistemically competent than Y in this new domain. But it seems more likely that Y is somewhat similarly epistemically competent in this new domain as in the old domain, but has to draw on different reasoning processes, knowledge, theories, intuitions, etc., and X's intuitions aren't calibrated for how Y is now thinking.
I think we could usefully think of this issue as a question of robustness to distributional shift.
I think the same issue could probably also occur even if X has a more explicit process for predicting Y's beliefs. E.g., even if X believes they understand what sort of sources of information Y considers and how Y evaluates it and X tries to replicate that (rather than just trying to more intuitively guess what Y will say), the process X uses may not be robust to distributional shift.
But I'd guess that more explicit, less "black box" approaches for predicting what Y will say will tend to either be more robust to distributional shift or more able to fail gracefully, such as recognising that uncertainty is now much higher and there's a need to think more carefully.
(None of this means I disagree with the quoted passage; I'm just sharing some additional thoughts that came to mind when I read it, which seem relevant and maybe useful.)
Here's a second thought that came to mind, which again doesn't seem especially critical to this post's aims...
You write:
Someone who can both predict my beliefs and disagrees with me is someone I should listen to carefully. They seem to both understand my model and still reject it, and this suggests they know something I don’t.
I think I understand the rationale for this statement (though I didn't read the linked Science article), and I think it will sometimes be true and important. But I think that those sentences might overstate the point. In particular, I think that those sentences implicitly presume that this other person is genuinely primarily trying to form accurate beliefs, and perhaps also that they're doing so in a way that's relatively free from bias.
But (almost?) everyone is at least sometimes primarily aiming (perhaps unconsciously) at something other than forming accurate beliefs, even when it superficially looks like they're aiming at forming accurate beliefs. For example, they may be engaging in "ideologically motivated cognition[, i.e.] a form of information processing that promotes individuals’ interests in forming and maintaining beliefs that signify their loyalty to important affinity groups". The linked study also notes that "subjects who scored highest in cognitive reflection were the most likely to display ideologically motivated cognition".
So I think it might be common for people to be able to predict my beliefs and disagree with me, but with their disagreement not being based on knowing more or having a better reasoning process, but rather on finding ways to continue to hold beliefs that they're (in some sense) "motivated" to hold for some other reason.
Additionally, some people may genuinely be trying to form accurate beliefs, but with unusually bad epistemics / unusually major bias. If so, they may be able to predict my beliefs and disagree with me, but with their disagreement not being based on knowing more or having a better reasoning process, but rather being a result of their bad epistemics / biases.
Of course, we should be very careful with assuming that any of the above is why a person disagrees with us! See also this and this.
The claims I'd more confidently agree with are:
Someone who can both predict my beliefs and disagrees with me might be someone I should listen to carefully. They seem to both understand my model and still reject it, and this suggests they might know something I don’t (especially if they seem to be genuinely trying to form accurate beliefs and to do so via a reasonable process).
(Or maybe having that parenthetical at the end would be bad via making people feel licensed to dismiss people who disagree with them as just biased.)
Thanks for this and its companion post; I found the two posts very interesting, and I think they'll usefully inform some future work for me.
A few thoughts came to mind as I read, some of which can sort-of be seen as pushing back against some claims, but in ways that I think aren't very important and that I expect you've already thought about. I'll split these into separate comments.
Firstly, as you note, what you're measuring is how well predictions match a proxy for the truth (the proxy being Elizabeth's judgement), rather than the truth itself. Something I think you don't explicitly mention is that:
- Elizabeth's judgement may be biased in some way (rather than just randomly erring), and
- The network-based forecasters' judgements may be biased in a similar way, and therefore
- This may "explain away" part of the apparent value of the network-based forecasters' predictions, along with part of their apparent superiority over the online crowdworkers' predictions.
E.g., perhaps EA-/rationality-adjacent people are biased towards disagreeing with "conventional wisdom" on certain topics, and this bias is somewhat shared between Elizabeth and the network-based forecasters. (I'm not saying this is actually the case; it's just an example.)
You make a somewhat similar point in the Part 2 post, when you say that the online crowdworkers:
were operating under a number of disadvantages relative to other participants, which means we should be careful when interpreting their performance. [For example, the online crowdworkers] did not know that Elizabeth was the researcher who created the claims and would resolve them, and so they had less information to model the person whose judgments would ultimately decide the questions.
But that point is about participants' ability to successfully focus on predicting what Elizabeth will say, rather than about them accidentally being biased in the same way as Elizabeth when both are trying to make judgements about the ground truth.
In any case, I don't think this matters much. One reason is that this "shared bias" issue probably at most "explains away" a relatively small fraction of the apparent value of the network-based forecasters' predictions, probably without tipping the balance of whether this sort of set-up is worthwhile. Another reason is that there may be ways to mitigate this "shared bias" issue.
Good idea! I didn't know about that feature.
I've now edited the post to use spoiler-blocks (though a bit messily, as I wanted to do it quickly), and will use them for future lazy-Anki-card-notes-posts as well.
I didn't add that tag; some other reader did.
And any reader can indeed downvote any tag, so if you feel that that tag shouldn't be there, you could just downvote it.
Unless you feel that the tag shouldn't be there but aren't very confident about that, and thus wanted to just gently suggest that maybe the tag should be removed - like putting in a 0.5 vote rather than a full one. But that doesn't seem to match the tone of your comment.
That said, it actually does seem to me that this post fairly clearly does match the description for that tag; the exercise is using these Anki cards as Anki cards. People can find a link to download these cards in the Anki card file format here. (I've now added that link in the body of the post itself; I guess I should've earlier.)
---
As a meta comment: For what it's worth, I feel like your comment had an unnecessarily snarky tone, at least to my eye. I think you could've either just downvoted the tag, or said the same thing in a way that sounds less snarky. That said:
- It's very possible (even probable?) that you didn't intend to be snarky, and that this is just a case of tone getting misread on the internet
- And in any case, this didn't personally bug me, partly because I've posted on LessWrong and the EA Forum a lot.
- But I think if I was newer to the sites or to posting, this might leave a bad taste in my mouth and make me less inclined to post in future. (Again, I'm not at all trying to say this was your intent!)
(Edited to add: Btw, I wasn't the person who downvoted your comment, so that appears to be slightly more evidence that your comment was at least liable to be interpreted as unnecessary and snarky - although again I know that that may not have been your intention.)
Yeah, I definitely agree that that's a good idea with any initialisms that won't already be known to the vast majority of one's readers (e.g., I wouldn't bother with US or UK, but would with APA). In this case, I just copied and pasted the post from the EA Forum, where I do think the vast majority of readers would know what "EA" means - but I should've used the expanded form "effective altruism" the first time in the LessWrong version. I've now edited that.
Here's a comment I wrote on the EA Forum version of this post, which I'm copying here as I'd be interested on people's thoughts on the equivalent questions in the context of LessWrong:
Meta: Does this sort of post seem useful? Should there be more posts like this?
I previously asked Should pretty much all content that's EA-relevant and/or created by EAs be (link)posted to the Forum? I found Aaron Gertler's response interesting and useful. Among other things, he said:
Eventually, we'd like it to be the case that almost all well-written EA content exists on the Forum somewhere.
[...]
I meant "quite EA-relevant and well-written". I don't especially care whether the content is written by community members, though I suppose that's slightly preferable (as community members are much more likely to respond to comments on their work).
[...]
A single crosspost with a bit of context from the author -- e.g. a few sentences each of summary/highlights, commentary, and action items/takeaways -- seems better to me than three or four crossposts with no context at all. In my view, the best Forum content tends to give busy people a quick way to decide whether to read further.
And I read a lot of stuff that I think it could be useful for at least some other EAs to read, and that isn't (link)posted to the Forum. So Aaron's comments, combined with my own thinking and some comments from other people, make me think it'd be good for me to make linkposts for lots of that stuff if there was a way to do it that took up very little of my time.
Unfortunately, writing proper book reviews, or even just notes that are geared for public consumption, for all of those things I read would probably take a while.
But, starting about a month ago, I now make Anki cards for myself anyway during most of the reading I do. So maybe I should just make posts sort-of like this one for most particularly interesting things I read? And maybe other people could start doing that too?
A big uncertainty I have is how often the cards I make myself would be able to transmit useful ideas even to people who (a) aren't me and (b) didn't read the thing I read, and how often they'd do that with an efficiency comparable to people just finding and reading useful sources themselves directly. Another, related uncertainty is whether there'd be any demand for posts like this.
So I'd be interested in people's thoughts on the above.
Note: If you found this post interesting, you may also be interested in my Notes on "The Bomb: Presidents, Generals, and the Secret History of Nuclear War" (2020), or (less likely) Notes on The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous. (The latter book has a very different topic; I just mention it as the style of post is the same.)
To your first point...
My impression is that there is indeed substantially less literature on misuse risk and structural risk, compared to accident risk, in relation to AI x-risk. (I'm less confident when it comes to a broader set of negative outcomes, not just x-risks, but that's also less relevant here and less important to me.) I do think that that might make the sort of work this post does less interesting if done in relation to those less-discussed types of risks, since fewer disagreements have been revealed there, so there's less to analyse and summarise.
That said, I still expect interesting stuff along these lines could be done on those topics. It just might be a quicker job with a smaller output than this post.
I collected a handful of relevant sources and ideas here. I think someone reading those things and providing a sort of summary, analysis, and/or mapping could be pretty handy, and might even be doable in just a day or so of work. It might also be relatively easy to provide more "novel ideas" in the course of that work than it would've been for your post, since misuse/structural risks seem like less charted territory.
(Unfortunately I'm unlikely to do this myself, as I'm currently focused on nuclear war risk.)
---
A separate point is that I'd guess that one reason why there's less work on misuse/structural AI x-risk than on accidental AI x-risk is that a lot of people aren't aware of those other categories of risks, or rarely think about them, or assume the risks are much smaller. And I think one reason for that is that people often write or talk about "AI x-risk" while actually only mentioning accidental AI x-risk. That's part of why I say "So, personally, I think I’d have made that choice of scope even more explicit."
(But again, I do very much like this post overall. And as a target of this quibble of mine, you're in good company - I have the same quibble with The Precipice. I think one of the quibbles I most often have with posts I like is "This post seems to imply, or could be interpreted as implying, that it covers [topic]. But really it covers [some subset of that topic]. That's fair enough and still very useful, but I think it'd be good to be clearer about what the scope is.")
---
I know some people working on expanded and more in-depth models like this post. It would be great to get your thoughts when they're ready.
Sounds very cool! Yeah, I'd be happy to have a look at that work when it's ready.
Thanks for this post; this does seem like a risk worth highlighting.
I've just started reading Thomas Schelling's 1960 book The Strategy of Conflict, and noticed a lot of ideas in chapter 2 that reminded me of many of the core ideas in this post. My guess is that that sentence is an uninteresting, obvious observation, and that Daniel and most readers were already aware (a) that many of the core ideas here were well-trodden territory in game theory and (b) that this post's objectives were to:
- highlight these ideas to people on LessWrong
- highlight their potential relevance to AI risk
- highlight how this interacts with updateless decision theory and acausal trade
But maybe it'd be worth people who are interested in this problem reading that chapter of The Strategy of Conflict, or other relevant work in standard academic game theory, to see if there are additional ideas there that could be fruitful here.
Problems in AI risk that economists could potentially contribute to
List(s) of relevant problems
- What can the principal-agent literature tell us about AI risk? (and this comment)
- Many of the questions in Technical AGI safety research outside AI
- Many of the questions in The Centre for the Governance of AI’s research agenda
- Many of the questions in Cooperation, Conflict, and Transformative Artificial Intelligence (a research agenda of the Center on Long-Term Risk)
- At least a couple of the questions in 80,000 Hours' Research questions that could have a big social impact, organised by discipline
- Longtermist AI policy projects for economists (this doc was originally just made for Risto Uuk's own use, so the ideas shouldn't be taken as high-confidence recommendations to anyone else)
Context
I intend for this to include both technical and governance problems, and problems relevant to a variety of AI risk scenarios (e.g., AI optimising against humanity, AI misuse by humans, AI extinction risk, AI dystopia risk...)
Wei Dai’s list of Problems in AI Alignment that philosophers could potentially contribute to made me think that it could be useful to have a list of problems in AI risk that economists could potentially contribute to. So I began making such a list.
But:
- I’m neither an AI researcher nor an economist
- I spent hardly any time on this, and just included things I’ve stumbled upon, rather than specifically searching for these things
So I’m sure there’s a lot I’m missing.
Please comment if you know of other things worth mentioning here. Or if you’re better placed to make a list like this than I am, feel very free to do so; you could take whatever you want from this list and then comment here to let people know where to find your better thing.
(It’s also possible another list like this already exists. And it's also possible that economists could contribute to such a large portion of AI risk problems that there’s no added value in making a separate list for economists specifically. If you think either of those things is true, please comment to say so!)
It occurs to me that all of the hypotheses, arguments, and approaches mentioned here (though not necessarily the scenarios) seem to be about the “technical” side of things. There are two main things I mean by that statement:
First, this post seems to be limited to explaining something along the lines of “x-risks from AI accidents”, rather than “x-risks from misuse of AI”, or “x-risk from AI as a risk factor” (e.g., how AI could potentially increase risks of nuclear war).
I do think it makes sense to limit the scope that way, because:
- no one post can cover everything
- you don’t want to make the diagram overwhelming
- there’s a relatively clear boundary between what you’re covering and what you’re not
- what you’re covering seems like the most relevant thing for technical AI safety researchers, whereas the other parts are perhaps more relevant for people working on AI strategy/governance/policy
And the fact that this post's scope is limited in that way is somewhat signalled by the post framing itself as being about AI alignment (whereas misuse could occur even with a system aligned to some human’s goals), and by the line “The idea is closely connected to the problem of artificial systems optimizing adversarially against humans.”
But I think misuse and “risk factor”/“structural risk” issues are also quite important, that they should be on technical AI safety researchers’ radars to some extent, and that they probably interact in some ways with technical AI safety/alignment. So, personally, I think I’d have made that choice of scope even more explicit.
I’d also be really excited to see a post that takes the same approach as this one, but for those other classes of AI risks.
---
The second thing I mean by the above statement is that this post seems to exclude non-technical factors that seem like they’d also affect the technical side of things, or the level of risk from AI accidents.
One crux of this type would be “AI researchers will be cautious/sensible/competent “by default””. Here are some indications that that’s an “important and controversial hypothes[is] for AI alignment”:
- AI Impacts summarised some of Rohin’s comments as “AI researchers will in fact correct safety issues rather than hacking around them and redeploying. Shah thinks that institutions developing AI are likely to be careful because human extinction would be just as bad for them as for everyone else.”
- But my impression is that many people at MIRI would disagree with that, and are worried that people will merely “patch” issues in ways that don’t adequately address the risks.
- And I think many would argue that institutions won’t be careful enough, because they bear only a portion of the cost of extinction; reducing extinction risk is a transgenerational global public good (see Todd and this comment, and the toy sketch after this list).
- And I think views on these matters influence how much researchers would be happy with the approach of “Use feedback loops to course correct as we go”. I think the technical things influence how easily we theoretically could do that, while the non-technical things influence how much we realistically can rely on people to do that.
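As a toy sketch of that public-good point (my own stylised framing, with the symbols invented purely for illustration, not anything taken from the sources cited): suppose an institution captures only a share s of the total benefit B of reducing extinction risk, and being careful costs it c. Then it will only be careful when

```latex
s \cdot B > c
```

even though caution is worthwhile for the world whenever B > c. Since most of B accrues to other countries and to future generations who can't compensate the institution, s is far below 1, so we'd expect under-investment in caution "by default".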
So it seems to me that a crux like that could perhaps fit well in the scope of this post. And I thus think it’d be cool if someone could either (1) expand this post to include cruxes like that, or (2) make another post with a similar approach, but covering non-technical cruxes relevant to AI safety.
Thanks for this post! This seems like a really great way of visually representing how these different hypotheses, arguments, approaches, and scenarios interconnect. (I also think it’d be cool to see posts on other topics which use a similar approach!)
It seems that AGI timelines aren’t explicitly discussed here. (“Discontinuity to AGI” is mentioned, but I believe that's a somewhat distinct matter.) Was that a deliberate choice?
It does seem like several of the hypotheses/arguments mentioned here would feed into or relate to beliefs about timelines - in particular, Discontinuity to AGI, Discontinuity from AGI, Recursive self-improvement, ML scales to AGI, and Deep insights needed (or maybe not that last one, as that means “needed” for alignment in particular). But I don’t think beliefs about timelines would be fully accounted for by those hypotheses/arguments - beliefs about timelines could also involve cruxes like whether “Intelligence is a huge collection of specific things” or whether “There’ll be another AI winter before AGI”.
I’m not sure to what extent beliefs about timelines (aside from beliefs about discontinuity) would influence which of the approaches people should/would take, out of the approaches you list. But I imagine that beliefs that timelines are quite short might motivate work on ML or prosaic alignment rather than (Near) proof-level assurance of alignment or Foundational or “deconfusion” research. This would be because people might then think the latter approaches would take too long, such that our only shot (given these people’s beliefs) is doing ML or prosaic alignment and hoping that’s enough. (See also.)
And it seems like beliefs about timelines would feed into decisions about other approaches you don’t mention, like opting for investment or movement-building rather than direct, technical work. (That said, it seems reasonable for this post’s scope to just be what a person should do once they have decided to work on AI alignment now.)
Thanks for this post; I found it useful.
The US policy has never ruled out the possibility of escalation to full countervalue targeting and is unlikely to do so.
But the 2013 DoD report says "The United States will not intentionally target civilian populations or civilian objects". That of course doesn't prove that the US actually wouldn't engage in countervalue targeting, but doesn't it indicate that US policy at that time ruled out engaging in countervalue targeting?
This is a genuine rather than rhetorical question. I feel I might be just missing something, because, as you note, the paper you cited says:
Did this mean that the United States was discarding its ultimate assured destruction threat for deterring nuclear war? Clearly not. The guidance was carefully drafted. "Does not rely on" is different from "will not resort to"
...and yet, as far as I can see, the paper just doesn't address the "will not intentionally target" line. So I feel confused by the paper's analysis. (Though I haven't read the paper in full.)
If I had to choose between a AW treaty and some treaty governing powerful AI, the latter (if it made sense) is clearly more important. I really doubt there is such a choice and that one helps with the other, but I could be wrong here. [emphasis added]
Did you mean something like "and in fact I think that one helps with the other"?
I don't think I know of any person who's demonstrated this who thinks risk is under, say, 10%
If you mean risk of extinction or existential catastrophe from AI at the time AI is developed, it seems really hard to say, as I think that that's been estimated even less often than other aspects of AI risk (e.g. risk this century) or x-risk as a whole.
I think the only people (maybe excluding commenters who don't work on this professionally) who've clearly given a greater than 10% estimate for this are:
- Buck Shlegeris (50%)
- Stuart Armstrong (33-50% chance humanity doesn't survive AI)
- Toby Ord (10% existential risk from AI this century, but 20% for when the AI transition happens)
Meanwhile, people who I think have effectively given <10% estimates for that (judging from estimates that weren't conditioning on when AI was developed; all from my database):
- Very likely MacAskill (well below 10% for extinction as a whole in the 21st century)
- Very likely Ben Garfinkel (0-1% x-catastrophe from AI this century)
- Probably the median FHI 2008 survey respondent (5% for AI extinction in the 21st century)
- Probably Pamlin & Armstrong in a report (0-10% for unrecoverable collapse or extinction from AI this century)
- But then Armstrong separately gave a higher estimate
- And I haven't actually read the Pamlin & Armstrong report
- Maybe Rohin Shah (some estimates in a comment thread)
(Maybe Hanson would also give <10%, but I haven't seen explicit estimates from him, and his relative lack of focus on, and "doominess" about, AI may be because he thinks timelines are longer and that other things may happen first.)
I'd personally consider all the people I've listed to have demonstrated at least a fairly good willingness and ability to reason seriously about the future, though there's perhaps room for reasonable disagreement here. (With the caveat that I don't know Pamlin and don't know precisely who was in the FHI survey.)
Mostly I only start paying attention to people's opinions on these things once they've demonstrated that they can reason seriously about weird futures
[tl;dr This is an understandable thing to do, but does seem to result in biasing one's sample towards higher x-risk estimates]
I can see the appeal of that principle. I partly apply such a principle myself (though in the form of giving less weight to some opinions, not ruling them out).
But what if it turns out the future won't be weird in the ways you're thinking of? Or what if it turns out that, even if it will be weird in those ways, influencing it is too hard, or just isn't very urgent (i.e., the "hinge of history" is far from now), or is already likely enough to turn out well "by default" (perhaps because future actors will also have mostly good intentions and will be more informed)?
Under such conditions, it might be that the smartest people with the best judgement won't demonstrate that they can reason seriously about weird futures, even if they hypothetically could, because it's just not worth their time to do so. In the same way, I haven't demonstrated my ability to reason seriously about tax policy, because I think reasoning seriously about the long-term future is a better use of my time. Someone who starts off believing tax policy is an overwhelmingly big deal could then say "Well, Michael thinks the long-term future is what we should focus on instead, but why should I trust Michael's view on that when he hasn't demonstrated he can reason seriously about the importance and consequences of tax policy?"
(I think I'm being inspired here by Trammell's interesting post "But Have They Engaged With The Arguments?" There's some LessWrong discussion - which I haven't read - of an early version here.)
I in fact do believe we should focus on long-term impacts, and am dedicating my career to doing so, as influencing the long-term future seems sufficiently likely to be tractable, urgent, and important. But I think there are reasonable arguments against each of those claims, and I wouldn't be very surprised if they all turned out to be wrong. (That said, only a very small part of humanity has been working intensely and strategically on this topic, and only for roughly 15 years, so it seems too early to conclude there's nothing we can usefully do here.)
And if so, it would be better to try to improve the short-term future (which people further in the future can't help us with), in which case it would make sense for the smart people with good judgement not to demonstrate their ability to think seriously about the long-term future. So under such conditions, the people left in the sample you pay attention to aren't the smartest people with the best judgement, and are skewed towards unreasonably high estimates of the tractability, urgency, and/or importance of influencing the long-term future.
To emphasise: I really do want way more work on existential risks and longtermism more broadly! And I do think that, when it comes to those topics, we should pay more attention to "experts" who've thought a lot about those topics than to other people (even if we shouldn't only pay attention to them). I just want us to be careful about things like echo chamber effects and biasing the sample of opinions we listen to.
I'm not sure which of these estimates are conditional on superintelligence being invented. To the extent that they're not, and to the extent that people think superintelligence may not be invented, that means they understate the conditional probability that I'm using here.
Good point. I'd overlooked that.
I think lowish estimates of disaster risks might be more visible than high estimates because of something like social desirability, but who knows.
(I think it's good to be cautious about bias arguments, so take the following with a grain of salt, and note that I'm not saying any of these biases are necessarily the main factor driving estimates. I raise the following points only because the possibility of bias has already been mentioned.)
I think social desirability bias could easily push the opposite way as well, especially if we're including non-academics who dedicate their jobs or much of their time to x-risks (which I think covers the people you're considering, except that Rohin is sort-of in academia). I'd guess the main people listening to these people's x-risk estimates are other people who think x-risks are a big deal, and higher x-risk estimates would tend to make such people feel more validated in their overall interests and beliefs.
I can see how something like a bias towards saying things that people take seriously and that don't seem crazy (which is perhaps a form of social desirability bias) could also push estimates down. I'd guess that that effect is stronger the closer one gets to academia or policy. I'm not sure what the net effect of these social-desirability-type pressures would be on people like MIRI, Paul, and Rohin.
I'd guess that the stronger bias would be selection effects in who even makes these estimates. I'd guess that people who work on x-risks have higher x-risk estimates than people who don't work on them but have still thought somewhat explicitly about the odds. (I think a lot of people just wouldn't have even a vague guess in mind, and could swing from casually saying extinction is likely in the next few decades to seeing that idea as crazy, depending on when you ask them.)
Quantitative x-risk estimates tend to come from the former group rather than the latter, because the former group cares enough to bother estimating this. And we'd be less likely to pay attention to estimates from the latter group anyway, if they existed, because those people don't seem like experts - they haven't spent much time thinking about the issue. But they haven't spent much time thinking about it because they don't think the risk is high, so we're effectively selecting whose estimates we listen to based in part on what their estimates would be.
I'd still do something similar myself - I'd pay attention to the x-risk "experts" rather than other people. And I don't think we need to massively adjust our own estimates in light of this. But this does seem like a reason to expect the estimates to be biased upwards, compared to the estimates we'd get from a similarly intelligent and well-informed group of people who hadn't been pre-selected for a predisposition to think the risk is somewhat high.
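To make that selection-effect worry concrete, here's a toy simulation (every number in it is invented purely for illustration; it's a sketch of the mechanism, not a claim about the actual size of the effect):

```python
import random

random.seed(0)

# Toy model: everyone in some informed population has a "true" x-risk estimate,
# but people with higher estimates are more likely to work on x-risk and
# therefore more likely to publish a quantitative estimate at all.
population = [random.uniform(0.0, 0.2) for _ in range(100_000)]
published = [p for p in population if random.random() < p / 0.2]

print(f"Mean estimate across everyone:         {sum(population) / len(population):.1%}")
print(f"Mean estimate among those who publish: {sum(published) / len(published):.1%}")
# The second number comes out higher purely because of who self-selects into
# making estimates, not because anyone in the model is reasoning badly.
```

In this toy version, the published average ends up a few percentage points above the population average, which is the flavour of bias I have in mind.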
That does seem interesting and concerning.
Minor: The link didn’t work for me; in case others have the same problem, here is (I believe) the correct link.
Yeah, totally agreed.
I also think it's easier to forecast extinction in general, partly because it's a much clearer threshold, whereas there are some scenarios that some people might count as an "existential catastrophe" and others might not. (E.g., Bostrom's "plateauing — progress flattens out at a level perhaps somewhat higher than the present level but far below technological maturity".)
Conventional risks are events that already have a background chance of happening (as of 2020 or so) and does not include future technologies.
Yeah, that aligns with how I'd interpret the term. I asked about advanced biotech because I noticed it was absent from your answer unless it was included in "super pandemic", so I was wondering whether you were counting it as a conventional risk (which seemed odd) or excluding it from your analysis (which also seems odd to me, personally, but at least now I understand your short-AI-timelines-based reasoning for that!).
I am going read through the database of existential threats though, does it include what you were referring too?
Yeah, I think all the things I'd consider most important are in there. Or at least "most" - I'd have to think for longer in order to be sure about "all".
There are scenarios that I think aren't explicitly addressed in any estimates in that database, like things to do with whole-brain emulation or brain-computer interfaces, but these are arguably covered by other estimates. (I also don't have a strong view on how important WBE or BCI scenarios are.)
The overall risk was 9.2% for the community forecast (with 7.3% for AI risk). To convert this to a forecast for existential risk (100% dead), I assumed 6% risk from AI, 1% from nuclear war, and 0.4% from biological risk
I think this implies you think:
- AI is ~4 or 5 times (6% vs 1.3%) as likely to kill 100% of people as to kill between 95 and 100% of people
- Everything other than AI is roughly equally likely (1.5% vs 1.4%) to kill 100% of people as to kill between 95% and 100% of people
Does that sound right to you? And if so, what was your reasoning?
I ask out of curiosity, not because I disagree. I don't have a strong view here, except perhaps that AI is the risk with the highest ratio of "chance it causes outright extinction" to "chance it causes major carnage" (and this seems to align with your views).
Very interesting, thanks for sharing! This seems like a nice example of combining various existing predictions to answer a new question.
a forecast for existential risk (100% dead)
It seems worth highlighting that extinction risk (risk of 100% dead) is a (big) subset of existential risk (risk of permanent and drastic destruction of humanity's potential), rather than those two terms being synonymous. If your forecast was for extinction risk only, then the total existential risk should presumably be at least slightly higher, due to risks of unrecoverable collapse or unrecoverable dystopia.
(I think it's totally ok and very useful to "just" forecast extinction risk. I just think it's also good to be clear about what one's forecast is of.)
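To put the same point a bit more formally (a rough sketch, treating the three routes as mutually exclusive and setting aside any other routes to existential catastrophe):

```latex
P(\text{existential catastrophe}) = P(\text{extinction}) + P(\text{unrecoverable collapse}) + P(\text{unrecoverable dystopia}) \geq P(\text{extinction})
```

So an extinction forecast can be read as something like a lower bound on total existential risk.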
Thanks for those responses :)
MIRI people and Wei Dai for pessimism (though I'm not sure it's their view that it's worse than 50/50), Paul Christiano and other researchers for optimism.
It does seem odd to me that, if you aimed to do something like averaging over these people's views (or maybe taking a weighted average, weighting based on the perceived reasonableness of their arguments), you'd end up with a 50% credence on existential catastrophe from AI. (Although now I notice you actually just said "weight it by the probability that it turns out badly instead of well"; I'm assuming by that you mean "the probability that it results in existential catastrophe", but feel free to correct me if not.)
One MIRI person (Buck Shlegeris) has indicated they think there's a 50% chance of that. One other MIRI-adjacent person gives estimates for similar outcomes in the range of 33-50%. I've also got general pessimistic vibes from other MIRI people's writings, but I'm not aware of any other quantitative estimates from them or from Wei Dai. So my point estimate for what MIRI people think would be around 40-50%, not well above 50%.
And I think MIRI is widely perceived as unusually pessimistic (among AI and x-risk researchers; not necessarily among LessWrong users). And people like Paul Christiano give something more like a 10% chance of existential catastrophe from AI. (Precisely what he was estimating was a little different, but similar.)
So averaging across these views would seem to give us something closer to 30%.
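To show the rough arithmetic behind that ~30% figure (a minimal sketch: the point estimates are my own rough reads of these people's public statements rather than numbers they've endorsed, and the equal weighting of the two "camps" is purely for illustration):

```python
# My rough reads of the two "camps" (not figures anyone has endorsed):
estimates = {
    "MIRI-ish view": 0.45,             # midpoint of the ~40-50% read above
    "Paul Christiano-ish view": 0.10,  # roughly the ~10% mentioned above
}

# A simple unweighted average over the two camps:
average = sum(estimates.values()) / len(estimates)
print(f"{average:.0%}")  # prints "28%", i.e. much closer to 30% than to 50%
```

Different weights would of course move this around, but it seems hard to get to 50% without putting nearly all the weight on the most pessimistic views.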
Personally, I'd also probably include various other people who seem thoughtful on this and are actively doing AI or x-risk research - e.g., Rohin Shah, Toby Ord - and these people's estimates seem to usually be closer to Paul than to MIRI (see also). But arguing for doing that would be arguing for a different reasoning process, and I'm very happy with you using your independent judgement to decide who to defer to; I intend this comment to instead just express confusion about how your stated process reached your stated output.
(I'm getting these estimates from my database of x-risk estimates. I'm also being slightly vague because I'm still feeling a pull to avoid explicitly mentioning other views and thereby anchoring this thread.)
(I should also note that I'm not at all saying to not worry about AI - something like a 10% risk is still a really big deal!)