Please take a survey on the quality/impact of things I've written 2020-08-29T10:39:33.033Z
Good and bad ways to think about downside risks 2020-06-11T01:38:46.646Z
Failures in technology forecasting? A reply to Ord and Yudkowsky 2020-05-08T12:41:39.371Z
Database of existential risk estimates 2020-04-20T01:08:39.496Z
[Article review] Artificial Intelligence, Values, and Alignment 2020-03-09T12:42:08.987Z
Feature suggestion: Could we get notifications when someone links to our posts? 2020-03-05T08:06:31.157Z
Understandable vs justifiable vs useful 2020-02-28T07:43:06.123Z
Memetic downside risks: How ideas can evolve and cause harm 2020-02-25T19:47:18.237Z
Information hazards: Why you should care and what you can do 2020-02-23T20:47:39.742Z
Mapping downside risks and information hazards 2020-02-20T14:46:30.259Z
What are information hazards? 2020-02-18T19:34:01.706Z
[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? 2020-02-16T19:56:15.963Z
Value uncertainty 2020-01-29T20:16:18.758Z
Using vector fields to visualise preferences and make them consistent 2020-01-28T19:44:43.042Z
Risk and uncertainty: A false dichotomy? 2020-01-18T03:09:18.947Z
Can we always assign, and make sense of, subjective probabilities? 2020-01-17T03:05:57.077Z
MichaelA's Shortform 2020-01-16T11:33:31.728Z
Moral uncertainty: What kind of 'should' is involved? 2020-01-13T12:13:11.565Z
Moral uncertainty vs related concepts 2020-01-11T10:03:17.592Z
Morality vs related concepts 2020-01-07T10:47:30.240Z
Making decisions when both morally and empirically uncertain 2020-01-02T07:20:46.114Z
Making decisions under moral uncertainty 2019-12-30T01:49:48.634Z


Comment by michaela on Why those who care about catastrophic and existential risk should care about autonomous weapons · 2020-11-20T07:25:15.162Z · LW · GW

If I had to choose between a AW treaty and some treaty governing powerful AI, the latter (if it made sense) is clearly more important. I really doubt there is such a choice and that one helps with the other, but I could be wrong here. [emphasis added]

Did you mean something like "and in fact I think that one helps with the other"?

Comment by michaela on Forecasting Thread: Existential Risk · 2020-10-09T06:57:11.159Z · LW · GW

I don't think I know of any person who's demonstrated this who thinks risk is under, say, 10%

If you mean risk of extinction or existential catastrophe from AI at the time AI is developed, it seems really hard to say, as I think that that's been estimated even less often than other aspects of AI risk (e.g. risk this century) or x-risk as a whole. 

I think the only people (maybe excluding commenters who don't work on this professionally) who've clearly given a greater than 10% estimate for this are: 

  • Buck Shlegeris (50%)
  • Stuart Armstrong (33-50% chance humanity doesn't survive AI)
  • Toby Ord (10% existential risk from AI this century, but 20% for when the AI transition happens)

Meanwhile, people who I think have effectively given <10% estimates for that (judging from estimates that weren't conditioning on when AI was developed; all from my database):

  • Very likely MacAskill (well below 10% for extinction as a whole in the 21st century)
  • Very likely Ben Garfinkel (0-1% x-catastrophe from AI this century)
  • Probably the median FHI 2008 survey respondent (5% for AI extinction in the 21st century)
  • Probably Pamlin & Armstrong in a report (0-10% for unrecoverable collapse extinction from AI this century)
    • But then Armstrong separately gave a higher estimate
    • And I haven't actually read the Pamlin & Armstrong report
  • Maybe Rohin Shah (some estimates in a comment thread)

(Maybe Hanson would also give <10%, but I haven't seen explicit estimates from him, and his reduced focus on and "doominess" from AI may be because he thinks timelines are longer and other things may happen first.)

I'd personally consider all the people I've listed to have demonstrated at least a fairly good willingness and ability to reason seriously about the future, though there's perhaps room for reasonable disagreement here. (With the caveat that I don't know Pamlin and don't know precisely who was in the FHI survey.)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-10-09T06:35:02.114Z · LW · GW

Mostly I only start paying attention to people's opinions on these things once they've demonstrated that they can reason seriously about weird futures

[tl;dr This is an understandable thing to do, but does seem to result in biasing one's sample towards higher x-risk estimates]

I can see the appeal of that principle. I partly apply such a principle myself (though in the form of giving less weight to some opinions, not ruling them out).

But what if it turns out the future won't be weird in the ways you're thinking of? Or what if it turns out that, even if it will be weird in those ways, influencing it is too hard, or just isn't very urgent (i.e., the "hinge of history" is far from now), or is already too likely to turn out well "by default" (perhaps because future actors will also have mostly good intentions and will be more informed)?

Under such conditions, it might be that the smartest people with the best judgement won't demonstrate that they can reason seriously about weird futures, even if they hypothetically could, because it's just not worth their time to do so. In the same way, I haven't demonstrated my ability to reason seriously about tax policy, because I think reasoning seriously about the long-term future is a better use of my time. Someone who starts off believing tax policy is an overwhelmingly big deal could then say "Well, Michael thinks the long-term future is what we should focus on instead, but why should I trust Michael's view on that when he hasn't demonstrated he can reason seriously about the importance and consequences of tax policy?"

(I think I'm being inspired here by Trammell's interesting post "But Have They Engaged With The Arguments?" There's some LessWrong discussion - which I haven't read - of an early version here.)

I in fact do believe we should focus on long-term impacts, and am dedicating my career to doing so, as influencing the long-term future seems sufficiently likely to be tractable, urgent, and important. But I think there are reasonable arguments against each of those claims, and I wouldn't be very surprised if they turned out to all be wrong. (But I think currently we've only had a very small part of humanity working intensely and strategically on this topic for just ~15 years, so it would seem too early to assume there's nothing we can usefully do here.)

And if so, it would be better to try to improve the short-term future, which further future people can't help us with, and then it would make sense for the smart people with good judgement to not demonstrate their ability to think seriously about the long-term future. So under such conditions, the people left in the sample you pay attention to aren't the smartest people with the best judgement, and are skewed towards unreasonably high estimates of the tractability, urgency, and/or importance of influencing the long-term future.

To emphasise: I really do want way more work on existential risks and longtermism more broadly! And I do think that, when it comes to those topics, we should pay more attention to "experts" who've thought a lot about those topics than to other people (even if we shouldn't only pay attention to them). I just want us to be careful about things like echo chamber effects and biasing the sample of opinions we listen to.

Comment by michaela on Forecasting Thread: Existential Risk · 2020-10-08T06:56:26.788Z · LW · GW

I'm not sure which of these estimates are conditional on superintelligence being invented. To the extent that they're not, and to the extent that people think superintelligence may not be invented, that means they understate the conditional probability that I'm using here.

Good point. I'd overlooked that.

I think lowish estimates of disaster risks might be more visible than high estimates because of something like social desirability, but who knows.

(I think it's good to be cautious about bias arguments, so take the following with a grain of salt, and note that I'm not saying any of these biases are necessarily the main factor driving estimates. I raise the following points only because the possibility of bias has already been mentioned.)

I think social desirability bias could easily push the opposite way as well, especially if we're including non-academics who dedicate their jobs or much of their time to x-risks (which I think covers the people you're considering, except that Rohin is sort-of in academia). I'd guess the main people listening to these people's x-risk estimates are other people who think x-risks are a big deal, and higher x-risk estimates would tend to make such people feel more validated in their overall interests and beliefs. 

I can see how something like a bias towards saying things that people take seriously and that don't seem crazy (which is perhaps a form of social desirability bias) could also push estimates down. I'd guess that this effect is stronger the closer one gets to academia or policy. I'm not sure what the net effect of the social desirability bias type stuff would be on people like MIRI, Paul, and Rohin.

I'd guess that the stronger bias would be selection effects in who even makes these estimates. I'd guess that people who work on x-risks have higher x-risk estimates than people who don't and who have thought about odds of x-risk somewhat explicitly. (I think a lot of people just wouldn't have even a vague guess in mind, and could swing from casually saying extinction is likely in the next few decades to seeing that idea as crazy depending on when you ask them.) 

Quantitative x-risk estimates tend to come from the first group, rather than the latter, because the first group cares enough to bother to estimate this. And we'd be less likely to pay attention to estimates from the latter group anyway, if they existed, because they don't seem like experts - they haven't spent much time thinking about the issue. But they haven't spent much time thinking about it because they don't think the risk is high, so we're effectively selecting who to listen to the estimates of based in part on what their estimates would be.

I'd still do similar myself - I'd pay attention to the x-risk "experts" rather than other people. And I don't think we need to massively adjust our own estimates in light of this. But this does seem like a reason to expect the estimates are biased upwards, compared to the estimates we'd get from a similarly intelligent and well-informed group of people who haven't been pre-selected for a predisposition to think the risk is somewhat high.
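
A toy simulation can illustrate the selection effect described above. This is only a sketch under made-up assumptions: the uniform spread of private estimates and the rule that the chance of publishing an estimate is proportional to the estimate itself are both hypothetical, chosen purely to show the direction of the bias, not its real size.

```python
import random


def simulate_selection(n_people=100_000, seed=0):
    """Toy model: people with higher private risk estimates are more likely to publish one."""
    rng = random.Random(seed)
    # Hypothetical population of private x-risk estimates, uniform on [0, 0.5].
    private = [rng.uniform(0.0, 0.5) for _ in range(n_people)]
    # Assumed selection rule: the chance of working on x-risk (and so publishing
    # an estimate) is proportional to one's own private estimate.
    published = [p for p in private if rng.random() < p / 0.5]
    population_mean = sum(private) / len(private)
    published_mean = sum(published) / len(published)
    return population_mean, published_mean


population_mean, published_mean = simulate_selection()
print(f"population mean: {population_mean:.3f}, published mean: {published_mean:.3f}")
```

In this toy setup the mean of the published estimates comes out noticeably higher than the mean across the whole population, even though nobody's individual estimate is distorted. The gap comes entirely from who ends up in the sample.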

Comment by michaela on Thoughts on Human Models · 2020-09-26T14:33:50.871Z · LW · GW

That does seem interesting and concerning.

Minor: The link didn’t work for me; in case others have the same problem, here is (I believe) the correct link.

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-25T06:35:34.107Z · LW · GW

Yeah, totally agreed. 

I also think it's easier to forecast extinction in general, partly because it's a much clearer threshold, whereas there are some scenarios that some people might count as an "existential catastrophe" and others might not. (E.g., Bostrom's "plateauing — progress flattens out at a level perhaps somewhat higher than the present level but far below technological maturity".)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-24T06:41:47.906Z · LW · GW

Conventional risks are events that already have a background chance of happening (as of 2020 or so) and does not include future technologies. 

Yeah, that aligns with how I'd interpret the term. I asked about advanced biotech because I noticed it was absent from your answer unless it was included in "super pandemic", so I was wondering whether you were counting it as a conventional risk (which seemed odd) or excluding it from your analysis (which also seems odd to me, personally, but at least now I understand your short-AI-timelines-based reasoning for that!).

I am going read through the database of existential threats though, does it include what you were referring too?

Yeah, I think all the things I'd consider most important are in there. Or at least "most" - I'd have to think for longer in order to be sure about "all".

There are scenarios that I think aren't explicitly addressed in any estimates in that database, like things to do with whole-brain emulation or brain-computer interfaces, but these are arguably covered by other estimates. (I also don't have a strong view on how important WBE or BCI scenarios are.)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T20:16:34.680Z · LW · GW

The overall risk was 9.2% for the community forecast (with 7.3% for AI risk). To convert this to a forecast for existential risk (100% dead), I assumed 6% risk from AI, 1% from nuclear war, and 0.4% from biological risk

I think this implies you think: 

  • AI is ~4 or 5 times (6% vs 1.3%) as likely to kill 100% of people as to kill between 95 and 100% of people
  • Everything other than AI is roughly equally likely (1.5% vs 1.4%) to kill 100% of people as to kill between 95% and 100% of people

Does that sound right to you? And if so, what was your reasoning?

I ask out of curiosity, not because I disagree. I don't have a strong view here, except perhaps that AI is the risk with the highest ratio of "chance it causes outright extinction" to "chance it causes major carnage" (and this seems to align with your views).
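
For what it's worth, the AI comparison above can be reproduced from the quoted figures with a quick calculation. This is a rough sketch that assumes the 7.3% community figure covers all AI outcomes killing at least 95% of people, so that subtracting the assumed 6% extinction figure leaves the "95 to 100%, but not everyone" slice. The variable names are just illustrative.

```python
# Figures quoted in the comment above (all in percent).
community_ai_risk = 7.3      # community forecast for AI risk
assumed_ai_extinction = 6.0  # forecaster's assumed risk of AI killing 100% of people

# Assumption: the remainder is the chance AI kills at least 95% of people
# without killing everyone.
ai_non_extinction = community_ai_risk - assumed_ai_extinction
ratio = assumed_ai_extinction / ai_non_extinction

print(f"AI non-extinction: {ai_non_extinction:.1f}%, ratio: {ratio:.1f}")
# prints "AI non-extinction: 1.3%, ratio: 4.6"
```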

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T20:10:18.842Z · LW · GW

Very interesting, thanks for sharing! This seems like a nice example of combining various existing predictions to answer a new question.

a forecast for existential risk (100% dead)

It seems worth highlighting that extinction risk (risk of 100% dead) is a (big) subset of existential risk (risk of permanent and drastic destruction of humanity's potential), rather than those two terms being synonymous. If your forecast was for extinction risk only, then the total existential risk should presumably be at least slightly higher, due to risks of unrecoverable collapse or unrecoverable dystopia.

(I think it's totally ok and very useful to "just" forecast extinction risk. I just think it's also good to be clear about what one's forecast is of.)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T20:04:02.427Z · LW · GW

Thanks for those responses :)

MIRI people and Wei Dai for pessimism (though I'm not sure it's their view that it's worse than 50/50), Paul Christiano and other researchers for optimism. 

It does seem odd to me that, if you aimed to do something like average over these people's views (or maybe taking a weighted average, weighting based on the perceived reasonableness of their arguments), you'd end up with a 50% credence on existential catastrophe from AI. (Although now I notice you actually just said "weight it by the probability that it turns out badly instead of well"; I'm assuming by that you mean "the probability that it results in existential catastrophe", but feel free to correct me if not.)

One MIRI person (Buck Shlegeris) has indicated they think there's a 50% chance of that. One other MIRI-adjacent person gives estimates for similar outcomes in the range of 33-50%. I've also got general pessimistic vibes from other MIRI people's writings, but I'm not aware of any other quantitative estimates from them or from Wei Dai. So my point estimate for what MIRI people think would be around 40-50%, and not well above 50%.

And I think MIRI is widely perceived as unusually pessimistic (among AI and x-risk researchers; not necessarily among LessWrong users). And people like Paul Christiano give something more like a 10% chance of existential catastrophe from AI. (Precisely what he was estimating was a little different, but similar.)

So averaging across these views would seem to give us something closer to 30%. 
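
The averaging here can be sketched in a few lines. The inputs are rough: 0.45 is just the midpoint of the 40-50% range above (an assumption on my part), 0.10 is the approximate figure attributed to Paul Christiano, and the 9:1 weighting is a purely hypothetical example.

```python
from statistics import mean

# Rough point estimates for existential catastrophe from AI, taken from the
# comment above. The 0.45 midpoint of "40-50%" is an assumption.
miri_estimate = 0.45
christiano_estimate = 0.10

unweighted = mean([miri_estimate, christiano_estimate])


def weighted_average(estimates, weights):
    """Weighted mean of the estimates; weights need not sum to 1."""
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)


# Hypothetical weighting that leans very heavily (9:1) towards the pessimistic view.
heavily_pessimistic = weighted_average([miri_estimate, christiano_estimate], [9, 1])

print(f"unweighted: {unweighted:.3f}, 9:1 towards MIRI: {heavily_pessimistic:.3f}")
```

With these inputs the unweighted average is a bit under 30%, and since a weighted average can never exceed its largest input, no choice of weights gets above 45%. That is the source of my confusion about a 50% output.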

Personally, I'd also probably include various other people who seem thoughtful on this and are actively doing AI or x-risk research - e.g., Rohin Shah, Toby Ord - and these people's estimates seem to usually be closer to Paul than to MIRI (see also). But arguing for doing that would be arguing for a different reasoning process, and I'm very happy with you using your independent judgement to decide who to defer to; I intend this comment to instead just express confusion about how your stated process reached your stated output.

(I'm getting these estimates from my database of x-risk estimates. I'm also being slightly vague because I'm still feeling a pull to avoid explicitly mentioning other views and thereby anchoring this thread.)

(I should also note that I'm not at all saying to not worry about AI - something like a 10% risk is still a really big deal!)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T12:46:13.938Z · LW · GW

(Just a heads up that the link leads back to this thread, rather than to your Elicit snapshot :) )

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T12:44:46.723Z · LW · GW

(Minor & meta: I'd suggest people take screenshots which include the credence on "More than 2120-01-01" on the right, as I think that's a quite important part of one's prediction. But of course, readers can still find that part of your prediction by reading your comment or clicking the link - it's just not highlighted as immediately.)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T12:37:53.599Z · LW · GW

I do not think any conventional threat such as nuclear war, super pandemic or climate change is likely to be an ER

Are you including risks from advanced biotechnology in that category? To me, it would seem odd to call that a "conventional threat"; that category sounds to me like it would refer to things we have a decent amount of understanding of and experience with. (Really this is more of a spectrum, and our understanding of and experience with risks from nuclear war and climate change is of course limited in key ways as well. But I'd say it's notably less limited than is the case with advanced biotech or advanced AI.)

with the last <1% being from more unusual threats such as simulation being turned off, false vacuum collapse, or hostile alien ASI. But also, for unforeseen or unimagined threats.

It appears to me that there are some important risks that have been foreseen and imagined which you're not accounting for. Let me know if you want me to say more; I hesitate merely because I'm wary of pulling independent views towards community views in a thread like this, not for infohazard reasons (the things I have in mind are widely discussed and non-exotic). 

Note: I made this prediction before looking at the Effective Altruism Database of Existential Risk Estimates.

I think it's cool that you made this explicit, to inform how and how much people update on your views if they've already updated on views in that database :)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T12:32:32.325Z · LW · GW

Interesting, thanks for sharing. 

an uncertain but probably short delay for a major x-risk factor (probably superintelligence) to appear as a result

I had a similar thought, though ultimately was too lazy to try to actually represent it. I'd be interested to hear what size of delay you used, and what your reasoning for that was.

averaging to about 50% because of what seems like a wide range of opinions among reasonable well-informed people

Was your main input into this parameter your perceptions of what other people would believe about this parameter? If so, I'd be interested to hear whose beliefs you perceive yourself to be deferring to here. (If not, I might not want to engage in that discussion, to avoid seeming to try to pull an independent belief towards average beliefs of other community members, which would seem counterproductive in a thread like this.)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-23T07:37:27.758Z · LW · GW

I'll also hesitantly mention my database of existential risk estimates

I hesitate because I suspect it's better if most people who are willing to just make a forecast here do so without having recently looked at the predictions in that database, so we get a larger collection of more independent views.

But I guess people can make their own decision about whether to look at the database, perhaps for cases where:

  • People just feel too unsure where to start with forecasting this to bother trying, but if they saw other people's forecasts they'd be willing to come up with their own forecast that does more than just totally parroting the existing forecasts
    • And it's necessary to do more than just parroting, as the existing forecasts are about % chance by a given date, not the % chance at each date over a period
    • People could perhaps come up with clever ways to decide how much weight to give each forecast and how to translate them into an Elicit snapshot
  • People make their own forecast, but then want to check the database and consider making tweaks before posting it here (ideally also showing here what their original, independent forecast was)

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-22T06:28:24.759Z · LW · GW

Here are a couple sources people might find useful for guiding how they try to break this question down and reason about it:

Comment by michaela on Forecasting Thread: Existential Risk · 2020-09-22T06:21:11.402Z · LW · GW

Thanks for making this thread!

I should say that I'd give very little weight to both my forecast and my reasoning. Reasons for that include that:

  • I'm not an experienced forecaster
  • I don't have deep knowledge on relevant specifics (e.g., AI paradigms, state-of-the-art in biotech)
  • I didn't spend a huge amount of time on my forecast, and used pretty quick-and-dirty methods
  • I drew on existing forecasts to some extent (in particular, the LessWrong Elicit AI timelines thread and Ord's x-risk estimates). So if you updated on those forecasts and then also updated on my forecast as if it was independent of them, you'd be double-counting some views and evidence

So I'm mostly just very excited to see other people's forecasts, and even more excited to see how they reason about and break down the question!

Comment by michaela on MichaelA's Shortform · 2020-09-04T08:51:21.298Z · LW · GW

If anyone reading this has read anything I’ve written on LessWrong or the EA Forum, I’d really appreciate you taking this brief, anonymous survey. Your feedback is useful whether your opinion of my work is positive, mixed, lukewarm, meh, or negative.

And remember what mama always said: If you’ve got nothing nice to say, self-selecting out of the sample for that reason will just totally bias Michael’s impact survey.

(If you're interested in more info on why I'm running this survey and some thoughts on whether other people should do similar, I give that here.)

Comment by michaela on Please take a survey on the quality/impact of things I've written · 2020-08-29T10:40:39.457Z · LW · GW

Why I’m running this survey

I think that getting clear feedback on how well one is doing, and how much one is progressing, tends to be somewhat hard in general, but especially when it comes to:

  • Research
    • And especially relatively big-picture/abstract research, rather than applied research
  • Actually improving the world compared to the counterfactual
    • Rather than, e.g., getting students’ test scores up, meeting an organisation’s KPIs, or publishing a certain number of papers
  • Longtermism

And I’ve primarily been doing big-picture/abstract research aimed at improving the world, compared to the counterfactual, from a longtermist perspective. So, yeah, I’m a tad in the dark about how it’s all been going…[1]

I think some of the best metrics by which to judge research are whether people:

  • are bothering to pay attention to it
  • think it’s interesting
  • think it’s high-quality/rigorous/well-reasoned
  • think it addresses important topics
  • think it provides important insights
  • think they’ve actually changed their beliefs, decisions, or plans based on that research
  • etc.

I think this data is most useful if these people have relevant expertise, are in positions to make especially relevant and important decisions, etc. But anyone can at least provide input on things like how well-written or well-reasoned some work seems to have been. And whoever the respondents are, whether the research influenced them probably provides at least weak evidence regarding whether the research influenced some other set of people (or whether it could, if that set of people were to read it).

This year, I’ve gathered a decent amount of data about the above-listed metrics. But more data would be useful. And the data I’ve gotten so far has usually been non-anonymous, and often resulted from people actively reaching out to me. Both of those factors likely bias the responses in a positive direction. 

So I’ve created this survey in order to get additional - and hopefully less biased - data, as an input into my thinking about: 

  1. whether EA-aligned research and/or writing is my comparative advantage (as I’m also actively considering a range of alternative pathways)
  2. which topics, methodologies, etc. within research and/or writing are my comparative advantage
  3. specific things I could improve about my research and/or writing (e.g., topic choice, how rigorous vs rapid-fire my approach should be, how concise I should be)

But there’s also another aim of this survey. The idea of doing this survey, and many of the questions, was inspired partly by Rethink Priorities’ impact survey. But I don’t recall seeing evidence that individual researchers/writers (or even other organisations) run such surveys.[2] And it seems plausible to me that they’d benefit from doing so. 

So this is also an experiment to see how feasible and useful this is, to inform whether other people should run their own surveys of this kind. I plan to report back here in September with info like how many responses I got and how useful this seemed to be.

[1] I’m not necessarily saying that that type of research is harder to do than e.g. getting students’ test scores up. I’m just saying it’s harder to get clear feedback on how well one is doing.

[2] Though I have seen various EAs provide links to forms for general anonymous feedback. I think that’s also a good idea, and I’ve copied the idea in my own forum bio.

Comment by michaela on MichaelA's Shortform · 2020-08-23T10:41:45.312Z · LW · GW

See also Open Philanthropy Project's list of different kinds of uncertainty (and comments on how we might deal with them) here

Comment by michaela on MichaelA's Shortform · 2020-08-04T12:02:28.487Z · LW · GW

See also EA reading list: cluelessness and epistemic modesty.

Comment by michaela on MichaelA's Shortform · 2020-06-26T00:03:50.072Z · LW · GW

Ok, so it sounds like Legg and Hutter's definition works given certain background assumptions / ways of modelling things, which they assume in their full paper on their own definition. 

But in the paper I cited, Legg and Hutter give their definition without mentioning those assumptions / ways of modelling things. And they don't seem to be alone in that, at least given the out-of-context quotes they provide, which include: 

  • "[Performance intelligence is] the successful (i.e., goal-achieving) performance of the system in a complicated environment"
  • "Achieving complex goals in complex environments"
  • "the ability to solve hard problems."

These definitions could all do a good job capturing what "intelligence" typically means if some of the terms in them are defined certain ways, or if certain other things are assumed. But they seem inadequate by themselves, in a way Legg and Hutter don't note in their paper. (Also, Legg and Hutter don't seem to indicate that that paper is just or primarily about how intelligence should be defined in relation to AI systems.)

That said, as I mentioned before, I don't actually think this is a very important oversight on their part.

Comment by michaela on MichaelA's Shortform · 2020-06-25T00:28:30.091Z · LW · GW

Firstly, I'll say that, given that people already have a pretty well-shared intuitive understanding of what "intelligence" is meant to mean, I don't think it's a major problem for people to give explicit definitions like Legg and Hutter's. I think people won't then go out and assume that wealth, physical strength, etc. count as part of intelligence - they're more likely to just not notice that the definitions might imply that.

But I think my points do stand. I think I see two things you might be suggesting:

  • Intelligence is the only thing that increases an agent’s ability to achieve goals across all environments.
  • Intelligence is an ability, which is part of the agent, whereas things like wealth are resources, and are part of the environment.

If you meant the first of those things, I'd agree that "“Intelligence” might help in a wider range of environments than those [other] capabilities or resources help in". E.g., a billion US dollars wouldn't help someone at any time before 1700CE (or whenever) or probably anytime after 3000CE achieve their goals, whereas intelligence probably would. 

But note that Legg and Hutter say "across a wide range of environments." A billion US dollars would help anyone, in any job, any country, and any time from 1900 to 2020 achieve most of their goals. I would consider that a "wide" range of environments, even if it's not maximally wide.

And there are aspects of intelligence that would only be useful in a relatively narrow set of environments, or for a relatively narrow set of goals. E.g., factual knowledge is typically included as part of intelligence, and knowledge of the dates of birth and death of US presidents will be helpful in various situations, but probably in fewer situations and for fewer goals than a billion dollars.

If you meant the second thing, I'd point in response to the other capabilities, rather than the other resources. For example, it seems to me intuitive to speak of an agent's charisma or physical strength as a property of the agent, rather than of the state. And I think those capabilities will help it achieve goals in a wide (though not maximally wide) range of environments.

We could decide to say an agent's charisma and physical strength are properties of the state, not the agent, and that this is not the case for intelligence. Perhaps this is useful when modelling an AI and its environment in a standard way, or something like that, and perhaps it's typically assumed (I don't know). If so, then combining an explicit statement of that with Legg and Hutter's definition may address my points, as that might explicitly slice all other types of capabilities and resources out of the definition of "intelligence". 

But I don't think it's obvious that things like charisma and physical strength are more a property of the environment than intelligence is - at least for humans, for whom all of these capabilities ultimately just come down to our physical bodies (assuming we reject dualism, which seems safe to me).

Does that make sense? Or did I misunderstand your points?

Comment by michaela on TurnTrout's shortform feed · 2020-06-24T03:05:06.050Z · LW · GW

This seems right to me, and I think it's essentially the rationale for the idea of the Long Reflection.

Comment by michaela on MichaelA's Shortform · 2020-06-24T03:02:32.837Z · LW · GW

“Intelligence” vs. other capabilities and resources

Legg and Hutter (2007) collect 71 definitions of intelligence. Many, perhaps especially those from AI researchers, would actually cover a wider set of capabilities or resources than people typically want the term “intelligence” to cover. For example, Legg and Hutter’s own “informal definition” is: “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” But if you gave me a billion dollars, that would vastly increase my ability to achieve goals in a wide range of environments, even if it doesn’t affect anything we’d typically want to refer to as my “intelligence”.

(Having a billion dollars might lead to increases in my intelligence, if I use some of the money for things like paying for educational courses or retiring so I can spend all my time learning. But I can also use money to achieve goals in ways that don’t look like “increasing my intelligence”.)

I would say that there are many capabilities or resources that increase an agent’s ability to achieve goals in a wide range of environments, and intelligence refers to a particular subset of these capabilities or resources. Some of the capabilities or resources which we don’t typically classify as “intelligence” include wealth, physical strength, connections (e.g., having friends in the halls of power), attractiveness, and charisma. 

“Intelligence” might help in a wider range of environments than those capabilities or resources help in (e.g., physical strength seems less generically useful). And some of those capabilities or resources might be related to intelligence (e.g., charisma), be “exchangeable” for intelligence (e.g., money), or be attainable via intelligence (e.g., higher intelligence can help one get wealth and connections). But it still seems a useful distinction can be made between “intelligence” and other types of capabilities and resources that also help an agent achieve goals in a wide range of environments.

I’m less sure how to explain why some of those capabilities and resources should fit within “intelligence” while others don’t. At least two approaches to this can be inferred from the definitions Legg and Hutter collect (especially those from psychologists): 

  1. Talk about “mental” or “intellectual” abilities
    • But then of course we must define those terms. 
  2. Gesture at examples of the sorts of capabilities one is referring to, such as learning, thinking, reasoning, or remembering.
    • This second approach seems useful, though not fully satisfactory.

An approach that I don’t think I’ve seen, but which seems at least somewhat useful, is to suggest that “intelligence” refers to the capabilities or resources that help an agent (a) select or develop plans that are well-aligned with the agent’s values, and (b) implement the plans the agent has selected or developed. In contrast, other capabilities and resources (such as charisma or wealth) primarily help an agent implement its plans, and don’t directly provide much help in selecting or developing plans. (But as noted above, an agent could use those other capabilities or resources to increase their intelligence, which then helps the agent select or develop plans.)

For example, both (a) becoming more knowledgeable and rational and (b) getting a billion dollars would help one more effectively reduce existential risks. But, compared to getting a billion dollars, becoming more knowledgeable and rational is much more likely to lead one to prioritise existential risk reduction.

I find this third approach useful, because it links to the key reason why I think the distinction between intelligence and other capabilities and resources actually matters. This reason is that I think increasing an agent’s “intelligence” is more often good than increasing an agent’s other capabilities or resources. This is because some agents are well-intentioned yet currently have counterproductive plans. Increasing the intelligence of such agents may help them course-correct and drive faster, whereas increasing their other capabilities and resources may just help them drive faster down a harmful path. 

(I plan to publish a post expanding on that last idea soon, where I’ll also provide more justification and examples. There I’ll also argue that there are some cases where increasing an agent’s intelligence would be bad yet increasing their “benevolence” would be good, because some agents have bad values, rather than being well-intentioned yet misguided.)

Comment by michaela on Good and bad ways to think about downside risks · 2020-06-13T11:16:55.580Z · LW · GW

Yes, this seems plausible to me. What I was saying is that that would be a reason why the EV of arbitrary actions might often be negative, rather than directly being a reason why people will overestimate the EV of arbitrary actions. The claim "People should take the pure EV perspective" is consistent with the claim "A large portion of actions have negative EV and shouldn't be taken". This is because taking the pure EV perspective would involve assessing both the benefits and risks (which could include adjusting for the chance of many unknown unknowns that would lead to harm), and then deciding against doing actions that appear negative.

Comment by michaela on Good and bad ways to think about downside risks · 2020-06-13T02:13:29.866Z · LW · GW

I find the idea in those first two paragraphs quite interesting. It seems plausible, and isn't something I'd thought of before. It sounds like it's essentially applying the underlying idea of the optimiser's/winner's/unilateralist's curse to one person evaluating a set of options, rather than to a set of people evaluating one option? 
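A minimal sketch of that idea, under assumptions that are not from the original discussion: each option has a true value drawn from a standard normal distribution, the evaluator sees that value plus independent normal noise, and the evaluator picks the option with the highest estimate. The function name and all parameter values are hypothetical. The simulation reports how much, on average, the estimate for the chosen option exceeds its true value.

```python
import random
import statistics


def mean_overestimate(n_options=10, n_trials=2000, noise_sd=1.0, seed=0):
    """Average (estimate - true value) for the option with the highest estimate.

    Assumes true values ~ N(0, 1) and independent estimation noise ~ N(0, noise_sd).
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        # The evaluator chooses whichever option *looks* best.
        best = max(range(n_options), key=lambda i: estimates[i])
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)


# One person choosing among many options tends to overestimate the chosen one...
print(mean_overestimate(n_options=10))
# ...whereas with a single option there is nothing to select on, so the gap averages near zero.
print(mean_overestimate(n_options=1))
```

In this toy setup the overestimate comes purely from selecting on noisy estimates, not from any psychological bias, which is why it resembles the optimiser's curse rather than confirmation bias.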

I also think confirmation bias or related things will tend to bias people towards thinking options they've picked, or are already leaning towards picking, are good. Though it's less clear that confirmation bias will play a role when a person has only just begun evaluating the options.

Most systems in our modern world are not anti-fragile and suffer if you expose them to random noise. 

This sounds more like a reason why many actions (or a "random action") will make things worse (which seems quite plausible to me), rather than a reason why people would be biased to overestimate benefits and underestimate harms from actions. Though I guess perhaps people's failure to recognise this reason why many/random actions may make things worse, despite this reason being real, will then lead to them systematically overestimating how positive actions will be.

In any case, I can also think of biases that could push in the opposite direction. E.g., negativity bias and status quo bias. My guess would be there are some people and domains where, on net, there tends to be a bias towards overestimating the value of actions, and some people and domains where the opposite is true. And I doubt we could get a strong sense of how it all plays out just by theorising; we'd need some empirical work. (Incidentally, Convergence should also be releasing a somewhat related post soon, which will outline 5 potential causes of too little caution about information hazards, and 5 potential causes of too much caution.)

Finally, it seems worth noting that, if we do have reason to believe that, by default, people tend to overestimate the benefits and underestimate the harms that an action will cause, that wouldn't necessarily mean we should abandon the pure EV perspective. Instead, we could just incorporate an adjustment to our naive EV assessments to account for that tendency/bias, in the same way we should adjust for the unilateralist's curse in many situations. And the same would be true if it turned out that, by default, people had the opposite bias. (Though if there are these biases, that could mean it'd be unwise to promote the pure EV perspective without also highlighting the bias that needs adjusting for.)

Comment by michaela on Good and bad ways to think about downside risks · 2020-06-11T23:24:27.802Z · LW · GW

Good point. I intended "compared to the counterfactual" to be implicit throughout this article, as that's really what "impact" should always mean. I also briefly alluded to it in saying "such as harms from someone less qualified being interviewed". 

But it's true that many people don't naturally interpret "impact" as "compared to the counterfactual", and that it's often worth highlighting explicitly that that's the relevant comparison. 

To address that, I've now sprinkled in a few mentions of "compared to the counterfactual". Thanks for highlighting this :)

Comment by michaela on Good and bad ways to think about downside risks · 2020-06-11T23:20:58.916Z · LW · GW

Very good point! Thanks for raising it. I think this was an important oversight, and one I'm surprised I made, as I think the unilateralist's curse is a very useful concept and I've previously collected some sources on it.

To rectify that, I've now added two mentions of the curse (with links) in the section on the pure EV perspective.

Comment by michaela on Information hazards: Why you should care and what you can do · 2020-06-06T01:52:38.605Z · LW · GW

Here are some relevant thoughts from Andrew Critch on a FLI podcast episode I just heard (though it was released in 2017):

what if what you discover is not a piece of technology, but a piece of prediction, like Anthony said? What if you discover that it seems quite likely, based on the aggregate opinion of a bunch of skilled predictors, that artificial general human intelligence will be possible within 10 years? Well that, yeah, that has some profound implications for the world, for policy, for business, for military. There’s no denying that. I feel sometimes there’s a little bit of an instinct to kind of pretend like no one’s going to notice that AGI is really important. I don’t think that’s the case.

I had friends in the 2010 vicinity, who thought, surely no one in government will recognize the importance of superintelligence in the next decade. I was almost convinced. I had a little more faith than my friends, so I would have won some bets, but I still was surprised to see Barack Obama talking about superintelligence on an interview. I think the first thing is not to underestimate the possibility that, if you’ve made this prediction, maybe somebody else is about to make it, too.

That said, if you’re Metaculus, maybe you just know who’s running prediction markets, who is studying good prediction aggregation systems, and you just know no one’s putting in the effort, and you really might know that you’re the only people on earth who have really made this prediction, or maybe you and only a few other think tanks have managed to actually come up with a good prediction about when superintelligent AI will be produced, and, moreover, that it’s soon. If you discovered that, I would tell you the same thing I would tell anyone who discovers a potentially dangerous idea, which is not to write a blog post about it right away.

I would say, find three close, trusted individuals that you think reason well about human extinction risk, and ask them to think about the consequences and who to tell next. Make sure you’re fair-minded about it. Make sure that you don’t underestimate the intelligence of other people and assume that they’ll never make this prediction, but … [...]

Then do a rollout procedure. In software engineering, you developed a new feature for your software, but it could crash the whole network. It could wreck a bunch of user experiences, so you just give it to a few users and see what they think, and you slowly roll it out. I think a slow rollout procedure is the same thing you should do with any dangerous idea, any potentially dangerous idea. You might not even know the idea is dangerous. You may have developed something that only seems plausibly likely to be a civilizational scale threat, but if you zoom out and look at the world, and you imagine all the humans coming up with ideas that could be civilizational scale threats.

Maybe they’re a piece of technology, maybe they’re dangerous predictions, but no particular prediction or technology is likely to be a threat, so no one in particular decides to be careful with their idea, and whoever actually produces the dangerous idea is no more careful than anyone else, and they release their idea, and it falls into the wrong hands or it gets implemented in a dangerous way by mistake. Maybe someone accidentally builds Skynet. Somebody accidentally releases replicable plans for a cheap nuclear weapon.

If you zoom out, you don’t want everyone to just share everything right away, and you want there to be some threshold of just a little worry that’s just enough to have you ask your friends to think about it first. If you’ve got something that you think is 1% likely to pose an extinction threat, that seems like a small probability, and if you’ve done calibration training, you’ll realize that that’s supposed to feel very unlikely. Nonetheless, if 100 people have a 1% chance of causing human extinction, well someone probably has a good chance of doing it.

If you just think you’ve got a small chance of causing human extinction, go ahead, be a little bit worried. Tell your friends to be a little bit worried with you for like a day or three. Then expand your circle a little bit. See if they can see problems with the idea, see dangers with the idea, and slowly expand, roll out the idea into an expanding circle of responsible people until such time as it becomes clear that the idea is not dangerous, or you manage to figure out in what way it’s dangerous and what to do about it, because it’s quite hard to figure out something as complicated as how to manage a human extinction risk all by yourself or even by a team of three or maybe even ten people. You have to expand your circle of trust, but, at the same time, you can do it methodically like a software rollout, until you come up with a good plan for managing it. As for what the plan will be, I don’t know. That’s why I need you guys to do your slow rollout and figure it out.
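The "100 people with a 1% chance each" arithmetic in the quote can be checked with a short calculation. This is only a sketch, and it assumes the 100 risks are independent, which the quote does not state; the function name is hypothetical.

```python
def prob_at_least_one(n_people: int, p_each: float) -> float:
    """Probability that at least one of n independent events occurs,
    when each has probability p_each (independence is an assumption)."""
    return 1.0 - (1.0 - p_each) ** n_people


p_any = prob_at_least_one(100, 0.01)
print(f"{p_any:.3f}")  # roughly 0.634
```

So under that independence assumption, many individually "very unlikely" risks add up to something more likely than not, which is the motivation the quote gives for a modest threshold of caution.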

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T23:34:23.995Z · LW · GW

I actually quite like your four dot points, as summaries of some distinguishing features of these cases. (Although with Rutherford, I'd also highlight the point about whether or not the forecast is likely to reflect genuine beliefs, and perhaps more specifically whether or not a desire to mitigate attention hazards may be playing a role.)

And I think "Too many degrees of freedom to find some reason we shouldn't count them as "serious" predictions" gets at a good point. And I think it's improved my thinking on this a bit.

Overall, I think that your comment would be a good critique of this post if this post was saying or implying that these case studies provide no evidence for the sorts of claims Ord and Yudkowsky want to make. But my thesis was genuinely just that "I think those cases provide less clear evidence [not no evidence] than those authors seem to suggest". And I genuinely just aimed to "Highlight ways in which those cases may be murkier than Ord and Yudkowsky suggest" (and also separately note the sample size and representativeness points).

It wasn't the case that I was using terms like "less clear" and "may be murkier" to be polite or harder-to-criticise (in a motte-and-bailey sort of way), while in reality I harboured or wished to imply some stronger thesis; instead, I genuinely just meant what I said. I just wanted to "prod at each suspicious plank on its own terms", not utterly smash each suspicious plank, let alone bring the claims resting atop them crashing down.

That may also be why I didn't touch on what you see as the true crux (though I'm not certain, as I'm not certain I know precisely what you mean by that crux). This post had a very specific, limited scope. As I noted, "this post is far from a comprehensive discussion on the efficacy, pros, cons, and best practices for long-range or technology-focused forecasting."

To sort-of restate some things and sort-of address your points: I do think each of the cases provides some evidence in relation to the question (let's call it Q1) "How overly 'conservative' (or poorly-calibrated) do experts' quantitative forecasts of the likelihood or timelines of technology tend to be, under 'normal' conditions?" I think the cases provide clearer evidence in relation to questions like how overly 'conservative' (or poorly-calibrated) do experts' forecasts of the likelihood or timelines of technology tend to be, when...

  • it seems likelier than normal that the forecasts themselves could change likelihoods or timelines
    • I'm not actually sure what we'd base that on. Perhaps unusually substantial prominence or publicity of the forecaster? Perhaps a domain in which there's a wide variety of goals that could be pursued, and which one is pursued has sometimes been decided partly to prove forecasts wrong? AI might indeed be an example; I don't really know.
  • it seems likelier than normal that the forecaster isn't actually giving their genuine forecast (and perhaps more specifically, that they're partly aiming to mitigate attention hazards)
  • cutting-edge development on the relevant tech is occurring in highly secretive or militarised ways

...as well as questions about poor communication of forecasts by experts.

I think each of those questions other than Q1 is also important. And I'd agree that, in reality, we often won't know much about how far conditions differ from "normal conditions", or what "normal conditions" are really like (e.g., maybe forecasts are usually not genuine beliefs). These are both reasons why the "murkiness" I highlight about these cases might not be that big a deal in practice, or might do something more like drawing our attention to specific factors that should make us wary of expert predictions, rather than just making us wary in general.

In any case, I think the representativeness issue may actually be more important. As I note in footnote 4, I'd update more on these same cases (holding "murkiness" constant) if they were the first four cases drawn randomly, rather than through what I'd guess was a somewhat "biased" sampling process (which I don't mean as a loaded/pejorative term).

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T09:54:27.739Z · LW · GW

I do want to point out how small sample sizes are incredibly useful.

Yeah, I think that point is true, valuable, and relevant. (I also found How To Measure Anything very interesting and would recommend it, or at least this summary by Muehlhauser, to any readers of this comment who haven't read those yet.)

In this case, I think the issue of representativeness is more important/relevant than sample size. On reflection, I probably should've been clearer about that. I've now edited that section to make that clearer, and linked to this comment and Muehlhauser's summary post. So thanks for pointing that out!

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T09:37:41.586Z · LW · GW

Minor thing: did you mean to refer to Fermi rather than to Rutherford in that last paragraph?

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T04:14:50.866Z · LW · GW

Oh, good point, thanks! I had assumed Truman was VP for the whole time FDR was in office. I've now (a) edited the post to swap "during his years as Vice President" with "during his short time as Vice President", and (b) learned a fact I'm a tad embarrassed I didn't already know!

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T01:12:59.561Z · LW · GW

I think this comment raises some valid and interesting points. But I'd push back a bit on some points.

(Note that this comment was written quickly, so I may say things a bit unclearly or be saying opinions I haven't mulled over for a long time.)

More generally, on a strategic level there is very little difference between a genuinely incorrect forecast and one that is "correct", but communicated so poorly as to create a wrong impression in the mind of the listener.

There's at least some truth to this. But it's also possible to ask experts to give a number, as Fermi was asked. If the problem is poor communication, then asking experts to give a number will resolve at least part of the problem (though substantial damage may have been done by planting the verbal estimate in people's minds). If the problem is poor estimation, then asking for an explicit estimate might make things worse, as it could give a more precise incorrect answer for people to anchor on. (I don't know of specific evidence that people anchor more on numerical than verbal probability statements, but it seems likely to me. Also, to be clear, despite this, I think I'm generally in favour of explicit probability estimates in many cases.)

If the state of affairs is such that anyone who privately believes there is a 10% chance of AGI is incentivized to instead report their assessment as "remote", the conclusion of Ord/Yudkowsky holds, and it remains impossible to discern whether AGI is imminent by listening to expert forecasts.

I think this is true if no one asks the experts for explicit numerical estimates, or if the incentives to avoid giving such estimates are strong enough that experts will refuse when asked. I think both of those conditions hold to a substantial extent in the real world and in relation to AGI, and that that is a reason why the Fermi case has substantial relevance to the AGI case. But it still seems useful to me to be aware of the distinction between failures of communication vs of estimation, as it seems we could sometimes get evidence that discriminates between which of those is occurring/common, and that which is occurring/common could sometimes be relevant.

Furthermore, and more importantly, however: I deny that Fermi's 10% somehow detracts from the point that forecasting the future of novel technologies is hard.

I definitely wasn't claiming that forecasting the future of novel technologies is easy, and I didn't interpret ESRogs as doing so either. What I was exploring was merely whether this case is a clear case of an expert's technology forecast being "wrong" (and, if so, "how wrong"), and what this reflects about the typical accuracy of expert technology forecasts. They could conceivably be typically accurate even if very very hard to make, if experts are really good at it and put in lots of effort. But I think more likely they're often wrong. The important question is essentially "how often", and this post bites off the smaller question "what does the Fermi case tell us about that".

As for the rest of the comment, I think both the point estimates and the uncertainty are relevant, at least when judging estimates (rather than making decisions based on them). This is in line with my understanding from e.g. Tetlock's work. I don't think I'd read much into an expert saying 1% rather than 10% for something as hard to forecast as an unprecedented tech development, unless I had reason to believe the expert was decently calibrated. But if they have given one of those numbers, and then we see what happens, then which number they gave makes a difference to how calibrated vs uncalibrated I should see them as (which I might then generalise in a weak way to experts more widely).
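To make concrete why the stated number matters once the outcome is known, here is a sketch using two standard scoring rules, the Brier score and the logarithmic score (lower is better for both as written here). The choice of scoring rules and the function names are illustrative assumptions, not something taken from the post; a single forecast is of course weak evidence about calibration.

```python
import math


def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between the stated probability and the 0/1 outcome."""
    return (forecast - outcome) ** 2


def log_score(forecast: float, outcome: int) -> float:
    """Negative log of the probability assigned to what actually happened."""
    p_assigned = forecast if outcome == 1 else 1.0 - forecast
    return -math.log(p_assigned)


# Suppose the event (e.g. the technology being developed) did happen: outcome = 1.
for forecast in (0.01, 0.10):
    print(forecast, round(brier_score(forecast, 1), 4), round(log_score(forecast, 1), 3))
```

Both rules penalise the 1% forecast more heavily than the 10% forecast when the event occurs, and the log score does so more sharply, which matches the intuition that the two answers should shift one's view of the forecaster by different amounts.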

That said, I do generally think uncertainty of estimates is very important, and think the paper you linked to makes that point very well. And I do think one could easily focus too much on point estimates; e.g., I wouldn't plug Ord's existential risk estimates into a model as point estimates without explicitly representing a lot of uncertainty too.

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:41:32.825Z · LW · GW

So to summarize that case study criticism: everything you factchecked was accurate and you have no evidence of any kind that the Fermi story does not mean what O/Y interpret it as.

I find this a slightly odd sentence. My "fact-check" was literally just quoting and thinking about Ord's own footnote. So it would be very odd if that resulted in discovering that Ord was inaccurate. This connects back to the point I make in my first comment response: this post was not a takedown.

My point here was essentially that:

  • I think the main text of Ord's book (without the footnote) would make a reader think Fermi's forecast was very very wrong.
  • But in reality it is probably better interpreted as very very poorly communicated (which is itself relevant and interesting), and either somewhat wrong or well-calibrated but unlucky.

I do think the vast majority of people would think "remote possibility" means far less than 10%.

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:40:26.450Z · LW · GW

? If you are trying to make the point that technology is unpredictable, an example of a 'direct connection' and backfiring is a great example because it shows how fundamentally unpredictable things are: he could hardly have expected that his dismissal would spur an epochal discovery and that seems extremely surprising; this supports Ord & Yudkowsky, it doesn't contradict them. And if you're trying to make a claim that forecasts systematically backfire, that's even more alarming than O/Y's claims, because it means that expert forecasts will not just make a nontrivial number of errors (enough to be an x-risk concern) but will be systematically inversely correlated with risks and the biggest risks will come from the ones experts most certainly forecast to not be risks...

I think this paragraph makes valid points, and have updated in response (as well as in response to ESRogs' indication of agreement). Here are my updated thoughts on the relevance of the "direct connection":

  • I may be wrong about the "direct connection" slightly weakening the evidence this case provides for Ord and Yudkowsky's claims. I still feel like there's something to that, but I find it hard to explain it precisely, and I'll take that, plus the responses from you and ESRogs, as evidence that there's less going on here than I think.
  • I guess I'd at least stand by my literal phrasings in that section, which were just about my perceptions. But perhaps those perceptions were erroneous or idiosyncratic, and perhaps to the point where they weren't worth raising.
  • That said, it also seems possible to me that, even if there's no "real" reason why a lack of direct connection should make this more "surprising", many people would (like me) erroneously feel it does. This could perhaps be why Ord writes "the very next morning" rather than just "the next morning".
  • Perhaps what I should've emphasised more is the point I make in footnote 2 (which is also in line with some of what you say):

This may not reduce the strength of the evidence this case provides for certain claims. One such claim would be that we should put little trust in experts’ forecasts of AGI being definitely a long way off, and this is specifically because such forecasts may themselves annoy other researchers and spur them to develop AGI faster. But Ord and Yudkowsky didn’t seem to be explicitly making claims like that.

  • Interestingly, Yudkowsky makes a similar point in the essay this post partially responds to: "(Also, Demis Hassabis was present, so [people at a conference who were asked to make a particular forecast] all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it [and thereby make their forecast inaccurate].)" (Also, again, as I noted in this post, I do like that essay.)

  • I think that that phenomenon would cause some negative correlation between forecasts and truth, in some cases. I expect that, for the most part, that'd get largely overwhelmed by a mixture of random inaccuracies and a weak tendency towards accuracy. I wouldn't claim that, overall, "forecasts systematically backfire".

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:36:12.923Z · LW · GW

Firstly, I think I should say that this post was very much not intended as anything like a scathing takedown of Ord and Yudkowsky's claims or evidence. Nor did I mean to imply I'm giving definitive arguments that these cases provide no evidence for the claims made. I mean this to have more of a collaborative than combative spirit in relation to Ord and Yudkowsky's projects.

My aim was simply to "prod at each suspicious plank on its own terms, and update incrementally." And my key conclusion is that the authors, "in my opinion, imply these cases support their claims more clearly than they do" - not that the cases provide no evidence. It seems to me healthy to question evidence we have - even for conclusions we do still think are right, and even when our questions don't definitively cut down the evidence, but rather raise reasons for some doubt.

It's possible I could've communicated that better, and I'm open to suggestions on that front. But from re-reading the post again, especially the intro and conclusion, it does seem I repeatedly made explicit statements to this effect. (Although I did realise after going to bed last night that the "And I don’t think we should update much..." sentence was off, so I've now made that a tad clearer.)

I've split my response about the Rutherford and Fermi cases into different comments.

Of the 4 case studies you criticize, your claim actually supports them in the first one, you agree the second one is accurate, and you provide only speculations and no actual criticisms in the third and fourth.

Again, I think this sentence may reflect interpreting this post as much more strident and critical than it was really meant to be. I may be wrong about the "direct connection" thing (discussed in a separate comment), but I do think I raise plausible reasons for at least some doubt about (rather than outright dismissal of) the evidence each case provides, compared to how a reader might initially interpret them.

I'm also not sure what "only speculations and no actual criticisms" would mean. If you mean e.g. that I don't have evidence that a lot of Americans would've believed nuclear weapons would exist someday, then yes, that's true. I don't claim otherwise. But I point out a potentially relevant disanalogy between nuclear weapons development and AI development. And I point out that "the group of people who did know about nuclear weapons before the bombing of Hiroshima, or who believed such weapons may be developed soon, was (somewhat) larger than one might think from reading Yudkowsky’s essay." And I do give some evidence for that, as well as pointing out that I'm not aware of evidence either way for one relevant point.

Also, I don't really claim any of this post to be "criticism", at least in the usual fairly negative sense, just "prod[ding] at each suspicious plank on its own terms". I'm explicitly intending to make only relatively weak claims, really.

And then the "Sample size and representativeness" section provides largely separate reasons why it might not make much sense to update much on these cases (at least from a relatively moderate starting point) even ignoring those reasons for doubt. (Though see the interesting point 3 in Daniel Kokotajlo's comment.)

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-08T23:49:23.039Z · LW · GW


  1. Yes, I think that'd be very interesting. If this post could play a tiny role in prompting something like that, I'd be very happy. And that's the case whether or not it supports some of Ord and Yudkowsky's stronger claims/implications (i.e., beyond just that experts are sometimes wrong about these things) - it just seems it'd be good to have some clearer data, either way. ETA: But I take this post by Muehlhauser as indirect evidence that it'd be hard to do at least certain versions of this.

  2. Interesting point. I think that, if we expect AGI research to be closed during or shortly before really major/crazy AGI advances, then the nuclear engineering analogy would indeed have more direct relevance, from that point on. But it might not make the analogy stronger until those advances start happening. So perhaps we wouldn't necessarily strongly expect major surprises about when AGI development starts having major/crazy advances, but then expect a closing up and major surprises from that point on. (But this is all just about what that one analogy might suggest, and we obviously have other lines of argument and evidence too.)

  3. That's a good point; I hadn't really thought about that explicitly, and if I had I think I would've noted it in the post. But that's about how well the cases provide evidence about the likely inaccuracy of expert forecasts (or surprisingness) of the most important technology developments, or something like that. This is what Ord and Yudkowsky (and I) primarily care about in this context, as their focus when they make these claims is AGI. But they do sometimes (at least in my reading) make the claims as if they apply to technology forecasts more generally.

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-08T06:22:47.632Z · LW · GW

I've started collecting estimates of existential/extinction/similar risk from various causes (e.g., AI risk, biorisk). Do you know of a quick way I could find estimates of that nature (quantified and about extreme risks) in your spreadsheet? It seems like an impressive piece of work, but my current best idea for finding this specific type of thing in it would be to search for "%", for which there were 384 results...

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-04T04:20:56.820Z · LW · GW

I think I get what you're saying. Is it roughly the following?

"If an AI race did occur, maybe similar issues to what we saw in MAD might occur; there may well be an analogy there. But there's a disanalogy between the nuclear weapon case and the AI risk case with regards to the initial race, such that the initial nuclear race provides little/no evidence that a similar AI race may occur. And if a similar AI race doesn't occur, then the conditions under which MAD-style strategies may arise would not occur. So it might not really matter if there's an analogy between the AI risk situation if a race occurred and the MAD situation."

If so, I think that makes sense to me, and it seems an interesting/important argument. Though it seems to suggest something more like "We may be more ok than people might think, as long as we avoid an AI race, and we'll probably avoid an AI race", rather than simply "We may be more ok than people might think". And that distinction might e.g. suggest additional value to strategy/policy/governance work to avoid race dynamics, or to investigate how likely they are. (I don't think this is disagreeing with you, just highlighting a particular thing a bit more.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-04T04:13:43.472Z · LW · GW

Interesting (again!).

So you've updated your unconditional estimate from ~5% (1 in 20) to ~9%? If so, people may have to stop citing you as an "optimist"... (which was already perhaps a tad misleading, given what the 1 in 20 was about)

(I mean, I know we're all sort-of just playing with incredibly uncertain numbers about fuzzy scenarios anyway, but still.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T05:01:28.870Z · LW · GW

That seems reasonable to me. I think what I'm thinking is that that's a disanalogy between a potential "race" for transformative AI, and the race/motivation for building the first nuclear weapons, rather than a disanalogy between the AI situation and MAD.

So it seems like this disanalogy is a reason to think that the evidence "we built nuclear weapons" is weaker evidence than one might otherwise think for the claim "we'll build dangerous AI" or the claim "we'll build AI in an especially 'racing'/risky way". And that seems an important point.

But it seems like "MAD strategies have been used" remains however strong evidence it previously was for the claim "we'll do dangerous things with AI". E.g., MAD strategies could still serve as some evidence for the general idea that countries/institutions are sometimes willing to do things that are risky to themselves, and that pose very large negative externalities of risks to others, for strategic reasons. And that general idea still seems to apply at least somewhat to AI.

(I'm not sure this is actually disagreeing with what you meant/believe.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T04:52:30.441Z · LW · GW

Quite interesting. Thanks for that response.

And yes, this does seem quite consistent with Ord's framing. E.g., he writes "my estimates above incorporate the possibility that we get our act together and start taking these risks very seriously." So I guess I've seen it presented this way at least that once, but I'm not sure I've seen it made explicit like that very often (and doing so seems useful and retrospectively-obvious).

But if we just exerted a lot more effort (i.e. "surprisingly much action"), the extra effort probably doesn't help much more than the initial effort, so maybe... 1 in 25? 1 in 30?

Are you thinking roughly that (a) returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low hanging fruit that currently remain, such that even more ramping up would face steeply diminishing returns?

That's a vague question, and may not be very useful. The motivation for it is that I was surprised you saw the gap between business as usual and "surprisingly much action" as being as small as you did, and wonder roughly what portion of that is about you thinking additional people working on this won't be very useful, vs thinking that very useful additional people will eventually jump aboard "by default".

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T01:41:41.890Z · LW · GW

I was already interpreting your comment as "if you deploy a dangerous AI system, that affects you too". I guess I'm just not sure your condition 2 is actually a key ingredient for the MAD doctrine. From the name, the start of Wikipedia's description, my prior impressions of MAD, and my general model of how it works, it seems like the key idea is that neither side wants to do the thing, because if they do the thing they get destroyed too.

The US doesn't want to nuke Russia, because then Russia nukes the US. This seems the same phenomenon as some AI lab not wanting to develop and release a misaligned superintelligence (or whatever), because then the misaligned superintelligence would destroy them too. So in the key way, the analogy seems to me to hold. Which would then suggest that, however incautious or cautious society was about nuclear weapons, this analogy alone (if we ignore all other evidence) suggests we may do similar with AI. So it seems to me to suggest that there's not an important disanalogy that should update us towards expecting safety (i.e., the history of MAD for nukes should only make us expect AI safety to the extent we think MAD for nukes was handled safely).

Condition 2 does seem important for the initial step of the US developing the first nuclear weapon, and other countries trying to do so. Because it did mean that the first country who got it would get an advantage, since it could use it without being destroyed itself, at that point. And that doesn't apply for extreme AI accidents.

So would your argument instead be something like the following? "The initial development of nuclear weapons did not involve MAD. The first country who got them could use them without being itself harmed. However, the initial development of extremely unsafe, extremely powerful AI would substantially risk the destruction of its creator. So the fact we developed nuclear weapons in the first place may not serve as evidence that we'll develop extremely unsafe, extremely powerful AI in the first place."

If so, that's an interesting argument, and at least at first glance it seems to me to hold up.

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T01:30:41.949Z · LW · GW

Thanks for this reply!

Perhaps I should've been clear that I didn't expect that what I was saying would be things you hadn't heard. (I mean, I think I watched an EAG video of you presenting on 80k's ideas, and you were in The Precipice's acknowledgements.)

I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic. Which seemed mildly potentially important for someone to mention at some point, as I've seen this cited as an example of AI researcher optimism. (Though of course I acknowledge your comments were off the cuff and not initially intended for public consumption, and any such interview will likely contain moments that are imperfectly phrased or open to misinterpretation.)

Also, re: Precipice, it's worth noting that Toby and I don't disagree much -- I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let's say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20[...] (emphasis added)

I find this quite interesting. Is this for existential risk from AI as a whole, or just "adversarial optimisation"/"misalignment" type scenarios? E.g., does it also include things like misuse and "structural risks" (e.g., AI increasing risks of nuclear war by forcing people to make decisions faster)?

I'm not saying it'd be surprisingly low if it does include those things. I'm just wondering, as estimates like this are few and far between, so now that I've stumbled upon one I want to understand its scope and add it to my outside view.

Also, I bolded conditioned and unconditional, because that seems to me to suggest that you also currently expect the level of longtermist intervention that would reduce the risk to 1 in 20 to happen. Like, for you, "there's no action from longtermists" would be a specific constraint you have to add to your world model? That also makes sense; I just feel like I've usually not seen things presented that way.

I imagine you could also condition on something like "surprisingly much action from longtermists", which would reduce your estimated risk further?
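
To make the conditional vs unconditional distinction concrete, here's a rough sketch of the arithmetic. It uses the figures from the quote (1 in 10 with no longtermist action, and action halving the risk), but `p_action`, the probability that the relevant action actually happens, is purely my illustrative assumption and not something stated in the interview:

```python
def unconditional_risk(risk_no_action, reduction_factor, p_action):
    """Law of total probability over 'longtermists act' vs 'they don't'.

    risk_no_action:   risk conditioned on no longtermist action
    reduction_factor: multiplier applied to the risk if action happens
    p_action:         assumed probability that the action happens (hypothetical)
    """
    risk_with_action = risk_no_action * reduction_factor
    return p_action * risk_with_action + (1 - p_action) * risk_no_action


# Quoted figures: 1 in 10 with no action; action halves the risk.
# The unconditional estimate only comes out at 1 in 20 if action is treated as certain.
print(unconditional_risk(0.10, 0.5, p_action=1.0))

# With a hypothetical 80% chance of action, the estimate sits between 1 in 20 and 1 in 10.
print(unconditional_risk(0.10, 0.5, p_action=0.8))
```

Under these assumptions, an unconditional estimate of 1 in 20 coincides with the estimate conditioned on action, which is why the phrasing seems to imply that the action is expected to happen.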

Comment by michaela on How special are human brains among animal brains? · 2020-04-03T01:18:14.758Z · LW · GW

I think that makes sense. This seems similar to Vaniver's interpretation (if I'm interpreting the interpretation correctly). But as I mention in my reply to that comment, that looks to me like a different argument to the OP's one, and seems disjointed from "Since we shouldn’t expect to see more than one dominant species at a time".

Comment by michaela on How special are human brains among animal brains? · 2020-04-03T01:10:57.400Z · LW · GW

(Not sure the following makes sense - I think I find anthropics hard to think about.)

Interesting. This sounds to me like a reason why the anthropic principle suggests language may have been harder to evolve than one might think, because we think we've got a data point of it evolving (which we do) and that this suggests it was likely to evolve by now and on Earth, but in fact it's just that we wouldn't be thinking about the question until/unless it evolved. So it could be that in the majority of cases it wouldn't have evolved (or not yet?), but we don't "observe" those.

But I thought the OP was using anthropics in the other direction, since that paragraph follows:

If language isn’t a particularly difficult cognitive capacity to acquire, why don’t we see more animal species with language? (emphasis added)

Basically, I interpreted the argument as something like "This is why the fact no other species has evolved language may be strong evidence that language is difficult." And it sounds like you're providing an interesting argument like "This is why the fact that we evolved language may not provide strong evidence that language is (relatively) easy."

Perhaps the OP was indeed doing similar, though; perhaps the idea was "Actually, it's not the case that language isn't a particularly difficult cognitive capacity to acquire."

But this all still seems disjointed from "Since we shouldn’t expect to see more than one dominant species at a time", which is true, but in context seems to imply that the argument involves the idea that we shouldn't expect to see a second species evolve language while we have it. Which seems like a separate matter.

Comment by michaela on How special are human brains among animal brains? · 2020-04-02T16:15:53.784Z · LW · GW

(I may be misunderstanding you or the OP. Also, I'm writing this when sleepy.)

I think that that's true. But I don't think that that's an anthropic explanation for why we got there first, or an anthropic explanation for why there's no other species with language. Instead, that argument seems itself premised on language being hard and unlikely in any given timestep. Given that, it's unlikely that two species will develop language within a few tens of thousands of years of each other. But it seems like that'd be the "regular explanation", in a sense, and seems to support that language is hard or unlikely.

It seemed like the OP was trying to make some other anthropic argument that somewhat "explains away" the apparent difficulty of language. (The OP also said "Since we shouldn’t expect to see more than one dominant species at a time", which in that context seems to imply that a second species developing language would topple us or be squashed by us and that that was important to the argument.)

This is why I said:

If this is the case, then it seems like the fact we're the only species that has mastered language remains as strong evidence as it seemed at first of the "difficulty" of mastering language (though I'm not sure how strong it is as evidence for that). (emphasis added)

Perhaps the idea is something like "Some species had to get there first. That species will be the 'first observer', in some meaningful sense. Whenever that happened, and whatever species became that first observer, there'd likely be a while in which no other species had language, and that species wondered why that was so."

But again, this doesn't seem to me to increase or decrease the strength (whatever it happens to have been) of the evidence that "the gap we've observed with no second species developing language" provides for the hypothesis "language is hard or computationally expensive or whatever to develop".

Perhaps the argument is something like that many species may be on separate pathways that will get to language, and humans just happened to get there first, and what this anthropic argument "explains away" (to some extent) is the idea that the very specific architecture of the human brain was very especially equipped for language?

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-02T05:53:10.216Z · LW · GW

I hope someone else answers your question properly, but here are two vaguely relevant things from Rob Wiblin.