Failures in technology forecasting? A reply to Ord and Yudkowsky 2020-05-08T12:41:39.371Z · score: 46 (22 votes)
Database of existential risk estimates 2020-04-20T01:08:39.496Z · score: 22 (7 votes)
[Article review] Artificial Intelligence, Values, and Alignment 2020-03-09T12:42:08.987Z · score: 13 (6 votes)
Feature suggestion: Could we get notifications when someone links to our posts? 2020-03-05T08:06:31.157Z · score: 33 (12 votes)
Understandable vs justifiable vs useful 2020-02-28T07:43:06.123Z · score: 11 (5 votes)
Memetic downside risks: How ideas can evolve and cause harm 2020-02-25T19:47:18.237Z · score: 15 (5 votes)
Information hazards: Why you should care and what you can do 2020-02-23T20:47:39.742Z · score: 15 (8 votes)
Mapping downside risks and information hazards 2020-02-20T14:46:30.259Z · score: 15 (4 votes)
What are information hazards? 2020-02-18T19:34:01.706Z · score: 26 (10 votes)
[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? 2020-02-16T19:56:15.963Z · score: 24 (7 votes)
Value uncertainty 2020-01-29T20:16:18.758Z · score: 17 (7 votes)
Using vector fields to visualise preferences and make them consistent 2020-01-28T19:44:43.042Z · score: 40 (16 votes)
Risk and uncertainty: A false dichotomy? 2020-01-18T03:09:18.947Z · score: 3 (1 votes)
Can we always assign, and make sense of, subjective probabilities? 2020-01-17T03:05:57.077Z · score: 10 (6 votes)
MichaelA's Shortform 2020-01-16T11:33:31.728Z · score: 3 (1 votes)
Moral uncertainty: What kind of 'should' is involved? 2020-01-13T12:13:11.565Z · score: 16 (4 votes)
Moral uncertainty vs related concepts 2020-01-11T10:03:17.592Z · score: 25 (6 votes)
Morality vs related concepts 2020-01-07T10:47:30.240Z · score: 28 (7 votes)
Making decisions when both morally and empirically uncertain 2020-01-02T07:20:46.114Z · score: 14 (4 votes)
Making decisions under moral uncertainty 2019-12-30T01:49:48.634Z · score: 18 (8 votes)


Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T23:34:23.995Z · score: 1 (1 votes) · LW · GW

I actually quite like your four dot points, as summaries of some distinguishing features of these cases. (Although with Rutherford, I'd also highlight the point about whether or not the forecast is likely to reflect genuine beliefs, and perhaps more specifically whether or not a desire to mitigate attention hazards may be playing a role.)

And I think "Too many degrees of freedom to find some reason we shouldn't count them as "serious" predictions" gets at a good point. And I think it's improved my thinking on this a bit.

Overall, I think that your comment would be a good critique of this post if this post was saying or implying that these case studies provide no evidence for the sorts of claims Ord and Yudkowsky want to make. But my thesis was genuinely just that "I think those cases provide less clear evidence [not no evidence] than those authors seem to suggest". And I genuinely just aimed to "Highlight ways in which those cases may be murkier than Ord and Yudkowsky suggest" (and also separately note the sample size and representativeness points).

It wasn't the case that I was using terms like "less clear" and "may be murkier" to be polite or harder-to-criticise (in a motte-and-bailey sort of way), while in reality I harboured or wished to imply some stronger thesis; instead, I genuinely just meant what I said. I just wanted to "prod at each suspicious plank on its own terms", not utterly smash each suspicious plank, let alone bring the claims resting atop them crashing down.

That may also be why I didn't touch on what you see as the true crux (though I'm not certain, as I'm not certain I know precisely what you mean by that crux). This post had a very specific, limited scope. As I noted, "this post is far from a comprehensive discussion on the efficacy, pros, cons, and best practices for long-range or technology-focused forecasting."

To sort-of restate some things and sort-of address your points: I do think each of the cases provides some evidence in relation to the question (let's call it Q1) "How overly 'conservative' (or poorly-calibrated) do experts' quantitative forecasts of the likelihood or timelines of technology tend to be, under "normal" conditions?" I think the cases provide clearer evidence in relation to questions like how overly 'conservative' (or poorly-calibrated) do experts' forecasts of the likelihood or timelines of technology tend to be, when...

  • it seems likelier than normal that the forecasts themselves could change likelihoods or timelines
    • I'm not actually sure what we'd base that on. Perhaps unusually substantial prominence or publicity of the forecaster? Perhaps a domain in which there's a wide variety of goals that could be pursued, and which one is pursued has sometimes been decided partly to prove forecasts wrong? AI might indeed be an example; I don't really know.
  • it seems likelier than normal that the forecaster isn't actually giving their genuine forecast (and perhaps more specifically, that they're partly aiming to mitigate attention hazards)
  • cutting-edge development on the relevant tech is occurring in highly secretive or militarised ways

...as well as questions about poor communication of forecasts by experts.

I think each of those questions other than Q1 is also important. And I'd agree that, in reality, we often won't know much about how far conditions differ from "normal conditions", or what "normal conditions" are really like (e.g., maybe forecasts are usually not genuine beliefs). These are both reasons why the "murkiness" I highlight about these cases might not be that big a deal in practice, or might do something more like drawing our attention to specific factors that should make us wary of expert predictions, rather than just making us wary in general.

In any case, I think the representativeness issue may actually be more important. As I note in footnote 4, I'd update more on these same cases (holding "murkiness" constant) if they were the first four cases drawn randomly, rather than through what I'd guess was a somewhat "biased" sampling process (which I don't mean as a loaded/pejorative term).
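To make the representativeness point concrete, here is a minimal Python sketch. Everything in it is a hypothetical assumption for illustration: the two candidate failure rates, the 50/50 prior, and the idea that a fully "biased" process only ever surfaces failures. It is not a model of how Ord or Yudkowsky actually picked their cases; it just shows why the same four cases would warrant a bigger update if they had been drawn randomly.

```python
def posterior_high(prior_high, rate_high, rate_low, n_failures, random_sample):
    """Posterior probability that experts fail often, after seeing n failures.

    random_sample=True: the cases were drawn at random, so each observed
    failure is rate_high / rate_low times likelier under the "high" hypothesis.
    random_sample=False: the cases were picked *because* they were failures
    (an assumed extreme of biased sampling), so they are equally likely under
    both hypotheses and carry no evidence.
    """
    if random_sample:
        likelihood_high = rate_high ** n_failures
        likelihood_low = rate_low ** n_failures
    else:
        likelihood_high = likelihood_low = 1.0
    numerator = prior_high * likelihood_high
    return numerator / (numerator + (1 - prior_high) * likelihood_low)


# Hypothetical inputs: experts are wrong 50% of the time vs 10% of the time.
print(posterior_high(0.5, 0.5, 0.1, n_failures=4, random_sample=True))
print(posterior_high(0.5, 0.5, 0.1, n_failures=4, random_sample=False))
```

Under these made-up numbers, four randomly drawn failures move the posterior from 0.5 to nearly 1, while four failures that were selected for being failures leave it at 0.5. Real sampling processes presumably sit somewhere between those two extremes.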

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T09:54:27.739Z · score: 3 (2 votes) · LW · GW

I do want to point out how small sample sizes are incredibly useful.

Yeah, I think that point is true, valuable, and relevant. (I also found How To Measure Anything very interesting and would recommend it, or at least this summary by Muehlhauser, to any readers of this comment who haven't read those yet.)

In this case, I think the issue of representativeness is more important/relevant than sample size. On reflection, I probably should've been clearer about that. I've now edited that section to make that clearer, and linked to this comment and Muehlhauser's summary post. So thanks for pointing that out!

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T09:37:41.586Z · score: 1 (1 votes) · LW · GW

Minor thing: did you mean to refer to Fermi rather than to Rutherford in that last paragraph?

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T04:14:50.866Z · score: 1 (1 votes) · LW · GW

Oh, good point, thanks! I had assumed Truman was VP for the whole time FDR was in office. I've now (a) edited the post to swap "during his years as Vice President" with "during his short time as Vice President", and (b) learned a fact I'm a tad embarrassed I didn't already know!

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T01:12:59.561Z · score: 1 (1 votes) · LW · GW

I think this comment raises some valid and interesting points. But I'd push back a bit on some points.

(Note that this comment was written quickly, so I may say things a bit unclearly or be saying opinions I haven't mulled over for a long time.)

More generally, on a strategic level there is very little difference between a genuinely incorrect forecast and one that is "correct", but communicated so poorly as to create a wrong impression in the mind of the listener.

There's at least some truth to this. But it's also possible to ask experts to give a number, as Fermi was asked. If the problem is poor communication, then asking experts to give a number will resolve at least part of the problem (though substantial damage may have been done by planting the verbal estimate in people's minds). If the problem is poor estimation, then asking for an explicit estimate might make things worse, as it could give a more precise incorrect answer for people to anchor on. (I don't know of specific evidence that people anchor more on numerical than verbal probability statements, but it seems likely to me. Also, to be clear, despite this, I think I'm generally in favour of explicit probability estimates in many cases.)

If the state of affairs is such that anyone who privately believes there is a 10% chance of AGI is incentivized to instead report their assessment as "remote", the conclusion of Ord/Yudkowsky holds, and it remains impossible to discern whether AGI is imminent by listening to expert forecasts.

I think this is true if no one asks the experts for an explicit numerical estimate, or if the incentives to avoid giving such estimates are strong enough that experts will refuse when asked. I think both of those conditions hold to a substantial extent in the real world and in relation to AGI, and that that is a reason why the Fermi case has substantial relevance to the AGI case. But it still seems useful to me to be aware of the distinction between failures of communication vs of estimation, as it seems we could sometimes get evidence that discriminates between which of those is occurring/common, and that which is occurring/common could sometimes be relevant.

Furthermore, and more importantly, however: I deny that Fermi's 10% somehow detracts from the point that forecasting the future of novel technologies is hard.

I definitely wasn't claiming that forecasting the future of novel technologies is easy, and I didn't interpret ESRogs as doing so either. What I was exploring was merely whether this case is a clear case of an expert's technology forecast being "wrong" (and, if so, "how wrong"), and what this reflects about the typical accuracy of expert technology forecasts. They could conceivably be typically accurate even if very very hard to make, if experts are really good at it and put in lots of effort. But I think more likely they're often wrong. The important question is essentially "how often", and this post bites off the smaller question "what does the Fermi case tell us about that".

As for the rest of the comment, I think both the point estimates and the uncertainty are relevant, at least when judging estimates (rather than making decisions based on them). This is in line with my understanding from e.g. Tetlock's work. I don't think I'd read much into an expert saying 1% rather than 10% for something as hard to forecast as an unprecedented tech development, unless I had reason to believe the expert was decently calibrated. But if they have given one of those numbers, and then we see what happens, then which number they gave makes a difference to how calibrated vs uncalibrated I should see them as (which I might then generalise in a weak way to experts more widely).
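As a rough illustration of why the number given matters once the outcome is known, here is a minimal Python sketch using two standard scoring rules (the Brier score and the logarithmic score). The 1% and 10% forecasts come from the paragraph above; the function names are just illustrative, and a single resolved forecast is of course very weak evidence about calibration.

```python
import math


def brier_score(forecast, outcome):
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2


def log_score(forecast, outcome):
    """Negative log of the probability assigned to what actually happened (lower is better)."""
    probability_of_outcome = forecast if outcome == 1 else 1 - forecast
    return -math.log(probability_of_outcome)


# Suppose the technology did get developed (outcome = 1).
for forecast in (0.01, 0.10):
    print(forecast, brier_score(forecast, 1), log_score(forecast, 1))
```

Both rules penalise the 1% forecast more heavily than the 10% forecast when the event happens, with the log score doing so more sharply. How much that should shift one's view of the forecaster depends on how many resolved forecasts one has from them.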

That said, I do generally think uncertainty of estimates is very important, and think the paper you linked to makes that point very well. And I do think one could easily focus too much on point estimates; e.g., I wouldn't plug Ord's existential risk estimates into a model as point estimates without explicitly representing a lot of uncertainty too.

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:41:32.825Z · score: 6 (4 votes) · LW · GW

So to summarize that case study criticism: everything you factchecked was accurate and you have no evidence of any kind that the Fermi story does not mean what O/Y interpret it as.

I find this a slightly odd sentence. My "fact-check" was literally just quoting and thinking about Ord's own footnote. So it would be very odd if that resulted in discovering that Ord was inaccurate. This connects back to the point I make in my first comment response: this post was not a takedown.

My point here was essentially that:

  • I think the main text of Ord's book (without the footnote) would make a reader think Fermi's forecast was very very wrong.
  • But in reality it is probably better interpreted as very very poorly communicated (which is itself relevant and interesting), and either somewhat wrong or well-calibrated but unlucky.

I do think the vast majority of people would think "remote possibility" means far less than 10%.

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:40:26.450Z · score: 6 (4 votes) · LW · GW

? If you are trying to make the point that technology is unpredictable, an example of a 'direct connection' and backfiring is a great example because it shows how fundamentally unpredictable things are: he could hardly have expected that his dismissal would spur an epochal discovery and that seems extremely surprising; this supports Ord & Yudkowsky, it doesn't contradict them. And if you're trying to make a claim that forecasts systematically backfire, that's even more alarming than O/Y's claims, because it means that expert forecasts will not just make a nontrivial number of errors (enough to be an x-risk concern) but will be systematically inversely correlated with risks and the biggest risks will come from the ones experts most certainly forecast to not be risks...

I think this paragraph makes valid points, and have updated in response (as well as in response to ESRogs's indication of agreement). Here are my updated thoughts on the relevance of the "direct connection":

  • I may be wrong about the "direct connection" slightly weakening the evidence this case provides for Ord and Yudkowsky's claims. I still feel like there's something to that, but I find it hard to explain it precisely, and I'll take that, plus the responses from you and ESRogs, as evidence that there's less going on here than I think.
  • I guess I'd at least stand by my literal phrasings in that section, which were just about my perceptions. But perhaps those perceptions were erroneous or idiosyncratic, and perhaps to the point where they weren't worth raising.
  • That said, it also seems possible to me that, even if there's no "real" reason why a lack of direct connection should make this more "surprising", many people would (like me) erroneously feel it does. This could perhaps be why Ord writes "the very next morning" rather than just "the next morning".
  • Perhaps what I should've emphasised more is the point I make in footnote 2 (which is also in line with some of what you say):

This may not reduce the strength of the evidence this case provides for certain claims. One such claim would be that we should put little trust in experts’ forecasts of AGI being definitely a long way off, and this is specifically because such forecasts may themselves annoy other researchers and spur them to develop AGI faster. But Ord and Yudkowsky didn’t seem to be explicitly making claims like that.

  • Interestingly, Yudkowsky makes a similar point in the essay this post partially responds to: "(Also, Demis Hassabis was present, so [people at a conference who were asked to make a particular forecast] all knew that if they named something insufficiently impossible, Demis would have DeepMind go and do it [and thereby make their forecast inaccurate].)" (Also, again, as I noted in this post, I do like that essay.)

  • I think that that phenomenon would cause some negative correlation between forecasts and truth, in some cases. I expect that, for the most part, that'd get largely overwhelmed by a mixture of random inaccuracies and a weak tendency towards accuracy. I wouldn't claim that, overall, "forecasts systematically backfire".

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:36:12.923Z · score: 7 (5 votes) · LW · GW

Firstly, I think I should say that this post was very much not intended as anything like a scathing takedown of Ord and Yudkowsky's claims or evidence. Nor did I mean to imply I'm giving definitive arguments that these cases provide no evidence for the claims made. I mean this to have more of a collaborative than combative spirit in relation to Ord and Yudkowsky's projects.

My aim was simply to "prod at each suspicious plank on its own terms, and update incrementally." And my key conclusion is that the authors, "in my opinion, imply these cases support their claims more clearly than they do" - not that the cases provide no evidence. It seems to me healthy to question evidence we have - even for conclusions we do still think are right, and even when our questions don't definitively cut down the evidence, but rather raise reasons for some doubt.

It's possible I could've communicated that better, and I'm open to suggestions on that front. But from re-reading the post again, especially the intro and conclusion, it does seem I repeatedly made explicit statements to this effect. (Although I did realise after going to bed last night that the "And I don’t think we should update much..." sentence was off, so I've now made that a tad clearer.)

I've split my response about the Rutherford and Fermi cases into different comments.

Of the 4 case studies you criticize, your claim actually supports them in the first one, you agree the second one is accurate, and you provide only speculations and no actual criticisms in the third and fourth.

Again, I think this sentence may reflect interpreting this post as much more strident and critical than it was really meant to be. I may be wrong about the "direct connection" thing (discussed in a separate comment), but I do think I raise plausible reasons for at least some doubt about (rather than outright dismissal of) the evidence each case provides, compared to how a reader might initially interpret them.

I'm also not sure what "only speculations and no actual criticisms" would mean. If you mean e.g. that I don't have evidence that a lot of Americans would've believed nuclear weapons would exist someday, then yes, that's true. I don't claim otherwise. But I point out a potentially relevant disanalogy between nuclear weapons development and AI development. And I point out that "the group of people who did know about nuclear weapons before the bombing of Hiroshima, or who believed such weapons may be developed soon, was (somewhat) larger than one might think from reading Yudkowsky’s essay." And I do give some evidence for that, as well as pointing out that I'm not aware of evidence either way for one relevant point.

Also, I don't really claim any of this post to be "criticism", at least in the usual fairly negative sense, just "prod[ding] at each suspicious plank on its own terms". I'm explicitly intending to make only relatively weak claims, really.

And then the "Sample size and representativeness" section provides largely separate reasons why it might not make much sense to update much on these cases (at least from a relatively moderate starting point) even ignoring those reasons for doubt. (Though see the interesting point 3 in Daniel Kokotajlo's comment.)

Comment by michaela on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-08T23:49:23.039Z · score: 7 (4 votes) · LW · GW


  1. Yes, I think that'd be very interesting. If this post could play a tiny role in prompting something like that, I'd be very happy. And that's the case whether or not it supports some of Ord and Yudkowsky's stronger claims/implications (i.e., beyond just that experts are sometimes wrong about these things) - it just seems it'd be good to have some clearer data, either way. ETA: But I take this post by Muehlhauser as indirect evidence that it'd be hard to do at least certain versions of this.

  2. Interesting point. I think that, if we expect AGI research to be closed during or shortly before really major/crazy AGI advances, then the nuclear engineering analogy would indeed have more direct relevance, from that point on. But it might not make the analogy stronger until those advances start happening. So perhaps we wouldn't necessarily strongly expect major surprises about when AGI development starts having major/crazy advances, but then expect a closing up and major surprises from that point on. (But this is all just about what that one analogy might suggest, and we obviously have other lines of argument and evidence too.)

  3. That's a good point; I hadn't really thought about that explicitly, and if I had I think I would've noted it in the post. But that's about how well the cases provide evidence about the likely inaccuracy of expert forecasts (or surprisingness) of the most important technology developments, or something like that. This is what Ord and Yudkowsky (and I) primarily care about in this context, as their focus when they make these claims is AGI. But they do sometimes (at least in my reading) make the claims as if they apply to technology forecasts more generally.

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-08T06:22:47.632Z · score: 1 (1 votes) · LW · GW

I've started collecting estimates of existential/extinction/similar risk from various causes (e.g., AI risk, biorisk). Do you know of a quick way I could find estimates of that nature (quantified and about extreme risks) in your spreadsheet? It seems like an impressive piece of work, but my current best idea for finding this specific type of thing in it would be to search for "%", for which there were 384 results...

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-04T04:20:56.820Z · score: 3 (2 votes) · LW · GW

I think I get what you're saying. Is it roughly the following?

"If an AI race did occur, maybe similar issues to what we saw in MAD might occur; there may well be an analogy there. But there's a disanalogy between the nuclear weapon case and the AI risk case with regards to the initial race, such that the initial nuclear race provides little/no evidence that a similar AI race may occur. And if a similar AI race doesn't occur, then the conditions under which MAD-style strategies may arise would not occur. So it might not really matter if there's an analogy between the AI risk situation if a race occurred and the MAD situation."

If so, I think that makes sense to me, and it seems an interesting/important argument. Though it seems to suggest something more like "We may be more ok than people might think, as long as we avoid an AI race, and we'll probably avoid an AI race", rather than simply "We may be more ok than people might think". And that distinction might e.g. suggest additional value to strategy/policy/governance work to avoid race dynamics, or to investigate how likely they are. (I don't think this is disagreeing with you, just highlighting a particular thing a bit more.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-04T04:13:43.472Z · score: 1 (1 votes) · LW · GW

Interesting (again!).

So you've updated your unconditional estimate from ~5% (1 in 20) to ~9%? If so, people may have to stop citing you as an "optimist"... (which was already perhaps a tad misleading, given what the 1 in 20 was about)

(I mean, I know we're all sort-of just playing with incredibly uncertain numbers about fuzzy scenarios anyway, but still.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T05:01:28.870Z · score: 1 (1 votes) · LW · GW

That seems reasonable to me. I think what I'm thinking is that that's a disanalogy between a potential "race" for transformative AI, and the race/motivation for building the first nuclear weapons, rather than a disanalogy between the AI situation and MAD.

So it seems like this disanalogy is a reason to think that the evidence "we built nuclear weapons" is weaker evidence than one might otherwise think for the claim "we'll build dangerous AI" or the claim "we'll build AI in an especially 'racing'/risky way". And that seems an important point.

But it seems like "MAD strategies have been used" remains however strong evidence it previously was for the claim "we'll do dangerous things with AI". E.g., MAD strategies could still serve as some evidence for the general idea that countries/institutions are sometimes willing to do things that are risky to themselves, and that pose very large negative externalities of risks to others, for strategic reasons. And that general idea still seems to apply at least somewhat to AI.

(I'm not sure this is actually disagreeing with what you meant/believe.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T04:52:30.441Z · score: 1 (1 votes) · LW · GW

Quite interesting. Thanks for that response.

And yes, this does seem quite consistent with Ord's framing. E.g., he writes "my estimates above incorporate the possibility that we get our act together and start taking these risks very seriously." So I guess I've seen it presented this way at least that once, but I'm not sure I've seen it made explicit like that very often (and doing so seems useful and retrospectively-obvious).

But if we just exerted a lot more effort (i.e. "surprisingly much action"), the extra effort probably doesn't help much more than the initial effort, so maybe... 1 in 25? 1 in 30?

Are you thinking roughly that (a) returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low hanging fruit that currently remain, such that even more ramping up would face steeply diminishing returns?

That's a vague question, and may not be very useful. The motivation for it is that I was surprised you saw the gap between business as usual and "surprisingly much action" as being as small as you did, and wonder roughly what portion of that is about you thinking additional people working on this won't be very useful, vs thinking very super useful additional people will eventually jump aboard "by default".

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T01:41:41.890Z · score: 1 (1 votes) · LW · GW

I was already interpreting your comment as "if you deploy a dangerous AI system, that affects you too". I guess I'm just not sure your condition 2 is actually a key ingredient for the MAD doctrine. From the name, the start of Wikipedia's description, my prior impressions of MAD, and my general model of how it works, it seems like the key idea is that neither side wants to do the thing, because if they do the thing they get destroyed too.

The US doesn't want to nuke Russia, because then Russia nukes the US. This seems the same phenomenon as some AI lab not wanting to develop and release a misaligned superintelligence (or whatever), because then the misaligned superintelligence would destroy them too. So in the key way, the analogy seems to me to hold. Which would then suggest that, however incautious or cautious society was about nuclear weapons, this analogy alone (if we ignore all other evidence) suggests we may do similar with AI. So it seems to me to suggest that there's not an important disanalogy that should update us towards expecting safety (i.e., the history of MAD for nukes should only make us expect AI safety to the extent we think MAD for nukes was handled safely).

Condition 2 does seem important for the initial step of the US developing the first nuclear weapon, and other countries trying to do so. Because it did mean that the first country who got it would get an advantage, since it could use it without being destroyed itself, at that point. And that doesn't apply for extreme AI accidents.

So would your argument instead be something like the following? "The initial development of nuclear weapons did not involve MAD. The first country who got them could use them without being itself harmed. However, the initial development of extremely unsafe, extremely powerful AI would substantially risk the destruction of its creator. So the fact we developed nuclear weapons in the first place may not serve as evidence that we'll develop extremely unsafe, extremely powerful AI in the first place."

If so, that's an interesting argument, and at least at first glance it seems to me to hold up.

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-03T01:30:41.949Z · score: 3 (2 votes) · LW · GW

Thanks for this reply!

Perhaps I should've been clear that I didn't expect what I was saying was things you hadn't heard. (I mean, I think I watched an EAG video of you presenting on 80k's ideas, and you were in The Precipice's acknowledgements.)

I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic. Which seemed mildly potentially important for someone to mention at some point, as I've seen this cited as an example of AI researcher optimism. (Though of course I acknowledge your comments were off the cuff and not initially intended for public consumption, and any such interview will likely contain moments that are imperfectly phrased or open to misinterpretation.)

Also, re: Precipice, it's worth noting that Toby and I don't disagree much -- I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let's say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20[...] (emphasis added)

I find this quite interesting. Is this for existential risk from AI as a whole, or just "adversarial optimisation"/"misalignment" type scenarios? E.g., does it also include things like misuse and "structural risks" (e.g., AI increasing risks of nuclear war by forcing people to make decisions faster)?

I'm not saying it'd be surprisingly low if it does include those things. I'm just wondering, as estimates like this are few and far between, so now that I've stumbled upon one I want to understand its scope and add it to my outside view.

Also, I bolded conditioned and unconditional, because that seems to me to suggest that you also currently expect the level of longtermist intervention that would reduce the risk to 1 in 20 to happen. Like, for you, "there's no action from longtermists" would be a specific constraint you have to add to your world model? That also makes sense; I just feel like I've usually not seen things presented that way.

I imagine you could also condition on something like "surprisingly much action from longtermists", which would reduce your estimated risk further?
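The relationship between the conditional and unconditional figures quoted above can be sketched with the law of total probability. This is a minimal Python illustration: the 1 in 10 and "halve the risk" figures come from the quoted comment, while the function name and the probability-of-action inputs are assumptions added here to show how the pieces fit together.

```python
def unconditional_risk(p_action, risk_if_no_action, risk_if_action):
    """Law of total probability over whether longtermists act."""
    return p_action * risk_if_action + (1 - p_action) * risk_if_no_action


risk_if_no_action = 0.10                  # "1 in 10 conditioned on no action"
risk_if_action = risk_if_no_action / 2    # "action from longtermists can halve the risk"

# The unconditional estimate only equals 1 in 20 if action is treated as certain.
print(unconditional_risk(1.0, risk_if_no_action, risk_if_action))

# A hypothetical 50% chance of action would instead give a figure in between.
print(unconditional_risk(0.5, risk_if_no_action, risk_if_action))
```

On this reading, an unconditional estimate of 1 in 20 implicitly treats the relevant level of longtermist action as (near) certain, which is the point the bolding was meant to draw out. Conditioning on "surprisingly much action" would correspond to plugging in a lower `risk_if_action`.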

Comment by michaela on How special are human brains among animal brains? · 2020-04-03T01:18:14.758Z · score: 1 (1 votes) · LW · GW

I think that makes sense. This seems similar to Vaniver's interpretation (if I'm interpreting the interpretation correctly). But as I mention in my reply to that comment, that looks to me like a different argument to the OP's one, and seems disjointed from "Since we shouldn’t expect to see more than one dominant species at a time".

Comment by michaela on How special are human brains among animal brains? · 2020-04-03T01:10:57.400Z · score: 1 (1 votes) · LW · GW

(Not sure the following makes sense - I think I find anthropics hard to think about.)

Interesting. This sounds to me like a reason why the anthropic principle suggests language may have been harder to evolve than one might think, because we think we've got a data point of it evolving (which we do) and that this suggests it was likely to evolve by now and on Earth, but in fact it's just that we wouldn't be thinking about the question until/unless it evolved. So it could be that in the majority of cases it wouldn't have evolved (or not yet?), but we don't "observe" those.

But I thought the OP was using anthropics in the other direction, since that paragraph follows:

If language isn’t a particularly difficult cognitive capacity to acquire, why don’t we see more animal species with language? (emphasis added)

Basically, I interpreted the argument as something like "This is why the fact no other species has evolved language may be strong evidence that language is difficult." And it sounds like you're providing an interesting argument like "This is why the fact that we evolved language may not provide strong evidence that language is (relatively) easy."

Perhaps the OP was indeed doing similar, though; perhaps the idea was "Actually, it's not the case that language isn't a particularly difficult cognitive capacity to acquire."

But this all still seems disjointed from "Since we shouldn’t expect to see more than one dominant species at a time", which is true, but in context seems to imply that the argument involves the idea that we shouldn't see a second species to evolve language while we have it. Which seems like a separate matter.

Comment by michaela on How special are human brains among animal brains? · 2020-04-02T16:15:53.784Z · score: 1 (1 votes) · LW · GW

(I may be misunderstanding you or the OP. Also, I'm writing this when sleepy.)

I think that that's true. But I don't think that that's an anthropic explanation for why we got there first, or an anthropic explanation for why there's no other species with language. Instead, that argument seems itself premised on language being hard and unlikely in any given timestep. Given that, it's unlikely that two species will develop language within a few tens of thousands of years of each other. But it seems like that'd be the "regular explanation", in a sense, and seems to support that language is hard or unlikely.

It seemed like the OP was trying to make some other anthropic argument that somewhat "explains away" the apparent difficulty of language. (The OP also said "Since we shouldn’t expect to see more than one dominant species at a time", which in that context seems to imply that a second species developing language would topple us or be squashed by us and that that was important to the argument.)

This is why I said:

If this is the case, then it seems like the fact we're the only species that has mastered language remains as strong evidence as it seemed at first of the "difficulty" of mastering language (though I'm not sure how strong it is as evidence for that). (emphasis added)

Perhaps the idea is something like "Some species had to get there first. That species will be the 'first observer', in some meaningful sense. Whenever that happened, and whatever species became that first observer, there'd likely be a while in which no other species had language, and that species wondered why that was so."

But again, this doesn't seem to me to increase or decrease the strength (whatever it happens to have been) of the evidence that "the gap we've observed with no second species developing language" provides for the hypothesis "language is hard or computationally expensive or whatever to develop".

Perhaps the argument is something like that many species may be on separate pathways that will get to language, and humans just happened to get there first, and what this anthropic argument "explains away" (to some extent) is the idea that the very specific architecture of the human brain was very especially equipped for language?

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-02T05:53:10.216Z · score: 1 (1 votes) · LW · GW

I hope someone else answers your question properly, but here are two vaguely relevant things from Rob Wiblin.

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-02T05:42:08.565Z · score: 1 (1 votes) · LW · GW
You could imagine a situation where for some reason the US and China are like, “Whoever gets to AGI first just wins the universe.” And I think in that scenario maybe I’m a bit worried, but even then, it seems like extinction is just worse, and as a result, you get significantly less risky behavior? But I don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.

My interpretation of what Rohin is saying there is:

  • 1) Extinction is an extremely bad outcome.
  • 2) It's much worse than 'losing' an international competition to 'win the universe'.
  • 3) Countries/institutions/people will therefore be significantly inclined to avoid risking extinction, even if doing so would increase the chances of 'winning' an international competition to 'win the universe'.

I agree with claim 1.

I agree with some form of claim 3, in that:

  • I think the badness of extinction will reduce the risks people are willing to take
  • I also "don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning."
  • But I don't think the risks will be reduced anywhere near as much as they should be. (That said, I also believe that odds are in favour of things "going well by default", just not as much in favour of that as I'd like).

This is related to my sense that claim 2 is somewhat tricky/ambiguous. Are we talking about whether it is worse, or whether the relevant actors will perceive it as worse? One common argument for why existential risks are neglected is that it's basically a standard market failure. The vast majority of the harm from x-risks is an externality, and x-risk reduction is a global public good. Even if we consider deaths/suffering in the present generation, even China and India absorb less than half of that "cost", and most countries absorb less than 1% of it. And I believe most people focused on x-risk reduction are at least broadly longtermist, so they'd perceive the overwhelming majority of the costs to be to future generations, and thus also externalities.

So it seems like, unless we expect the relevant actors to act in accordance with something close to impartial altruism, we should expect them to take some care to avoid existential risks (or extinction specifically), but far less than they really should. (Roughly this argument is made in The Precipice, and I believe by 80k.)
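As a rough illustration of the externality point, here's a small sketch of how much of the present-generation harm each country would bear itself, if harm were simply proportional to population. The population figures are approximate, circa-2020 numbers I'm assuming for illustration (not sourced data), and the names `POPULATIONS` and `internalised_share` are hypothetical.

```python
# Approximate populations in billions (circa 2020); illustrative assumptions, not sourced data.
WORLD_POPULATION = 7.8
POPULATIONS = {"China": 1.44, "India": 1.38, "USA": 0.33, "Australia": 0.026}

def internalised_share(population, world=WORLD_POPULATION):
    """Fraction of present-generation harm falling on a country's own people,
    assuming harm is proportional to population (a simplifying assumption)."""
    return population / world

shares = {name: internalised_share(pop) for name, pop in POPULATIONS.items()}

for name, share in shares.items():
    print(f"{name}: bears {share:.1%} itself; {1 - share:.1%} is external to it")
```

On these assumed numbers, even the largest countries bear well under half of the harm themselves, and that's before counting any costs to future generations.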

(Rohin also discusses right after that quote why he doesn't "think that differences in who gets to AGI first are going to lead to you win the universe or not", which I do think somewhat bolsters the case for claim 2.)

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-02T05:40:31.463Z · score: 1 (1 votes) · LW · GW

Interesting interview, thanks for sharing it!

Asya Bergal: It seems like people believe there’s going to be some kind of pressure for performance or competitiveness that pushes people to try to make more powerful AI in spite of safety failures. Does that seem untrue to you or like you’re unsure about it?
Rohin Shah: It seems somewhat untrue to me. I recently made a comment about this on the Alignment Forum. People make this analogy between AI x-risk and risk of nuclear war, on mutually assured destruction. That particular analogy seems off to me because with nuclear war, you need the threat of being able to hurt the other side whereas with AI x-risk, if the destruction happens, that affects you too. So there’s no mutually assured destruction type dynamic.

I find this statement very confusing. I wonder if I'm misinterpreting Rohin. Wikipedia says "Mutual(ly) assured destruction (MAD) is a doctrine of military strategy and national security policy in which a full-scale use of nuclear weapons by two or more opposing sides would cause the complete annihilation of both the attacker and the defender (see pre-emptive nuclear strike and second strike)."

A core part of the idea of MAD is that the destruction would be mutual. So "with AI x-risk, if the destruction happens, that affects you too" seems like a reason why MAD is a good analogy, and why the way we engaged in MAD might suggest people would engage in similar brinkmanship or risks with AI x-risk, even if the potential for harm to people's "own side" would be extreme. There are other reasons why the analogy is imperfect, but the particular feature Rohin mentions seems like a reason why an analogy could be drawn.

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-02T04:20:50.710Z · score: 4 (3 votes) · LW · GW

FWIW I think I've only ever heard "nuclear arms race" used to refer to the buildup of more and more weapons, more advancements, etc., not a race to create the first nuclear weapon. And the Wikipedia article by that name opens with:

The nuclear arms race was an arms race competition for supremacy in nuclear warfare between the United States, the Soviet Union, and their respective allies during the Cold War.

This page uses the phrase 'A "Race" for the bomb' (rather than "nuclear arms race") to describe the US and Nazi Germany's respective efforts to create the first nuclear weapon. My impression is that this "race" was a key motivation in beginning the Manhattan Project and in the early stages, but I'm not sure to what extent that "race" remained "live" and remained a key motivation for the US (as opposed to the US just clearly being ahead, and now being motivated by having invested a lot and wanting a powerful weapon to win the war sooner). That page says "By 1944, however, the evidence was clear: the Germans had not come close to developing a bomb and had only advanced to preliminary research."

Comment by michaela on Rohin Shah on reasons for AI optimism · 2020-04-01T11:23:54.061Z · score: 2 (2 votes) · LW · GW

That makes sense. Though if AI Impacts does more conversations like these in future, I’d be very interested in listening to them via a podcast app.

Comment by michaela on How special are human brains among animal brains? · 2020-04-01T09:51:46.247Z · score: 1 (1 votes) · LW · GW

A source you may find interesting or useful for this topic is the Open Philanthropy Project's 2017 Report on Consciousness and Moral Patienthood. Obviously the primary topic of that is different, but it contains some discussion of the general discussion/sophistication of nonhuman animals compared to humans, including specifically in relation to language (e.g., in this section). Although you may have already encountered the most relevant sources drawn on in that report.

Comment by michaela on How special are human brains among animal brains? · 2020-04-01T09:47:51.977Z · score: 5 (3 votes) · LW · GW

Interesting post.

One possibility is that the first species that masters language, by virtue of being able to access intellectual superpowers inaccessible to other animals, has a high probability of becoming the dominant species extremely quickly. (Humans underwent the agricultural revolution within 50,000 years of behavioral modernity—a blink of an eye on evolutionary timescales—after which their dominance as a species became unquestionable.) Since we shouldn’t expect to see more than one dominant species at a time, this would imply a simple anthropic argument for our unique capacities for language: we shouldn’t expect to see more than one species at a time with mastery of language, and we just happen to be the species that made it there first.

I agree with the first two sentences of that passage, but I'm not sure I see the logic behind the third sentence. Depending on how we define a "dominant species", perhaps we necessarily can only see one at a time, or should expect to only see one at a time. But the prior sentences were about how the first species to master language will become dominant. If another species now mastered language, we'd have a very strong lead on them in terms of cultural institutions and technology, so it seems exceedingly unlikely that they'd become dominant. So on that front, it seems like we could see another species master language, without anthropic issues arising.

The other question is whether we'd allow another species to master language. I've never considered this question before, but my guess is that we would. From examples so far where individual animals have appeared to get a handle on aspects of language, people seem fascinated and delighted, rather than afraid that we'll be overthrown. And species that are able to at least imitate human communication, like parrots, seem to often be kept as pets specifically for that ability, because some humans enjoy it.

So I'd guess that if we discovered that another species was mastering language, we'd become fascinated and/or delighted, and study them a lot, and make extra efforts to preserve them if necessary (e.g., if they were endangered). I think we'd quite reasonably not be afraid, because that species' abilities, culture, power, etc. would be so far behind ours. I think if that species started becoming especially capable, we might limit their advancements or even wipe them out, but that would likely happen years to millennia after mastery of language, not immediately.

This means that it seems to me totally plausible that a dominant species could witness another species coming to gradually master language, without any anthropic issues arising, because neither species is necessarily wiped out in the process. If this is the case, then it seems like the fact we're the only species that has mastered language remains as strong evidence as it seemed at first of the "difficulty" of mastering language (though I'm not sure how strong it is as evidence for that).

Is there a way I'm misinterpreting you or missing something?

Comment by michaela on My current framework for thinking about AGI timelines · 2020-04-01T09:30:19.565Z · score: 11 (5 votes) · LW · GW

Interesting post - I look forward to reading the rest of this series! (Have you considered making it into a "sequence"?)

Summary of my comment: It seems like this post lists variables that should inform views on how hard developing an AGI will be, but omits variables that should inform views on how much effort will be put into that task at various points, and how conducive the environment will be to those efforts. And it seems to me that AGI timelines are a function of all three of those high-level factors.

(Although note that I'm far from being an expert on AI timelines myself. I'm also not sure if the effort and conduciveness factors can be cleanly separated.)

Detailed version: I was somewhat surprised to see that the "background variables" listed seemed to all be fairly focused on things like neuroscience/biology, without any seeming focused on other economic, scientific, or cultural trends that might impact AI R&D or its effectiveness. By the latter, I mean things like (I spitballed these quickly just now, and some might overlap somewhat):

  • whether various Moore's-law-type trends will continue, or slow down, or speed up, and when
    • relatedly, whether there'll be major breakthroughs in technologies other than AI which feed into (or perhaps reduce the value of) AI R&D
  • whether investment (including e.g. government funding) in AI R&D will increase, decrease, or remain roughly constant
  • whether we'll see a proliferation of labs working on "fundamental" AI research, or a consolidation, or not much change
  • whether there'll be government regulation on AI research that slows down research, and how much this slows it down
  • whether AI will come to be strongly seen as a key military technology, and/or governments nationalise AI labs, and/or governments create their own major AI labs
  • whether there'll be another "AI winter"

I don't have any particular reason to believe that views on those specific things I've mentioned would do a better job at explaining disagreements about AGI timelines than the variables mentioned in this post would. Perhaps most experts already agree about the things I mentioned, or see them as not very significant. But I'd at least guess that there are things along those lines which either do or should inform views on AGI timelines.

I'd also guess that factors like those I've listed would seem increasingly important as we consider increasingly long timelines, and as we consider "slow" or "moderate" takeoff scenarios (like the scenarios in what failure looks like). E.g., I doubt there'd be huge changes in interest in, funding for, or regulation of AI over the next 10 years (though it's very hard to say), if AI doesn't become substantially more influential over that time. But over the next 50 years, or if we start seeing major impacts of AI before we reach something like AGI, it seems easy to imagine changes in those factors occurring.

Comment by michaela on Robin Hanson on the futurist focus on AI · 2020-04-01T06:33:02.047Z · score: 1 (1 votes) · LW · GW

From the transcript:

Robin Hanson: Well, even that is an interesting thing if people agree on it. You could say, “You know a lot of people who agree with you that AI risk is big and that we should deal with something soon. Do you know anybody who agrees with you for the same reasons?”
It’s interesting, so I did a poll, I’ve done some Twitter polls lately, and I did one on “Why democracy?” And I gave four different reasons why democracy is good. And I noticed that there was very little agreement, that is, relatively equal spread across these four reasons. And so, I mean that’s an interesting fact to know about any claim that many people agree on, whether they agree on it for the same reasons. And it would be interesting if you just asked people, “Whatever your reason is, what percentage of people interested in AI risk agree with your claim about it for the reason that you do?” Or, “Do you think your reason is unusual?”
Because if most everybody thinks their reason is unusual, then basically there isn’t something they can all share with the world to convince the world of it. There’s just the shared belief in this conclusion, based on very different reasons. And then it’s more on their authority of who they are and why they as a collective are people who should be listened to or something.

I think there's something to this idea. It also reminds me of the principle that one should beware surprising and suspicious convergence, as well as of the following passage from Richard Ngo:

What should we think about the fact that there are so many arguments for the same conclusion? As a general rule, the more arguments support a statement, the more likely it is to be true. However, I’m inclined to believe that quality matters much more than quantity - it’s easy to make up weak arguments, but you only need one strong one to outweigh all of them. And this proliferation of arguments is (weak) evidence against their quality: if the conclusions of a field remain the same but the reasons given for holding those conclusions change, that’s a warning sign for motivated cognition (especially when those beliefs are considered socially important). This problem is exacerbated by a lack of clarity about which assumptions and conclusions are shared between arguments, and which aren’t.

(I don't think those are the points Robin Hanson is making there, but they seem somewhat related.)

But I think another point should be acknowledged, which is that it seems at least possible that a wide range of people could actually "believe in" the exact same set of arguments, yet all differ in which argument they find most compelling. E.g., you can only vote for one option in a Twitter poll, so it might be that all of Hanson's followers believed in all four reasons why democracy is good, but just differed in which one seemed strongest to them.

Likewise, the answer to “Whatever your reason is, what percentage of people interested in AI risk agree with your claim about it for the reason that you do?” could potentially be "very low" even if a large percentage of people interested in AI risk would broadly accept that reason, or give it pretty high credence, because it might not be "the reason" they agree with that claim about it.

It's sort of like there might appear to be strong divergence of views in terms of the "frontrunner" argument, whereas approval voting would indicate that there's some subset of arguments that are pretty widely accepted. And that subset as a collective may be more important to people than the specific frontrunner they find most individually compelling.
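Here's a minimal sketch of that contrast, using made-up voters. The credences (0.9 for a voter's favourite argument, 0.7 for the others), the 0.5 approval threshold, and the names `make_voter`, `plurality`, and `approval` are all assumptions for illustration; they aren't drawn from Hanson's actual poll.

```python
from collections import Counter

ARGUMENTS = ["A", "B", "C", "D"]

def make_voter(favourite):
    """Hypothetical voter: finds every argument fairly plausible, one slightly more so."""
    return {arg: (0.9 if arg == favourite else 0.7) for arg in ARGUMENTS}

# 100 hypothetical voters whose favourites are spread evenly across the four arguments.
voters = [make_voter(ARGUMENTS[i % len(ARGUMENTS)]) for i in range(100)]

def plurality(voters):
    """Each voter picks only the single argument they find most compelling (like a Twitter poll)."""
    return Counter(max(v, key=v.get) for v in voters)

def approval(voters, threshold=0.5):
    """Each voter approves every argument whose credence meets the threshold."""
    return Counter(arg for v in voters for arg, credence in v.items() if credence >= threshold)

print("Plurality:", dict(plurality(voters)))
print("Approval: ", dict(approval(voters)))
```

In this toy setup the plurality tally splits evenly across the four arguments, which looks like strong disagreement, while the approval tally shows every voter accepting every argument.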

Indeed, AI risk seems to me like precisely the sort of topic where I'd expect it to potentially make sense for people to find a variety of somewhat related, somewhat distinct arguments somewhat compelling, and not be 100% convinced by any of them, but still see them as adding up to good reason to pay significant attention to the risks. (I think that expectation of mine is very roughly based on how hard it is to predict the impacts of a new technology, but us having fairly good reason to believe that AI will eventually have extremely big impacts.)

“Do you think your reason is unusual?” seems like it'd do a better job than the other question for revealing whether people really disagree strongly about the arguments for the views, rather than just about which particular argument seems strongest. But I'm still not certain it would do that. I think it'd be good to explicitly ask questions about what argument people find most compelling, and separately what arguments they see as substantially plausible and that inform their overall views at least somewhat.

Comment by michaela on Robin Hanson on the futurist focus on AI · 2020-04-01T06:12:28.083Z · score: 1 (1 votes) · LW · GW
It would be interesting if you went into more detail on how long-termists should allocate their resources at some point; what proportion of resources should go into which scenarios, etc. (I know that you've written a bit on such themes.)

That was also probably my main question when listening to this interview.

I also found it interesting to hear that statement you quoted now that The Precipice has been released, and now that there are two more books on the horizon (by MacAskill and Sandberg) that I believe are meant to be broadly on longtermism but not specifically on AI. The Precipice has 8 chapters, with roughly a quarter of 1 chapter specifically on AI, and a bunch of other scenarios discussed, so it seems quite close to what Hanson was discussing. Perhaps at least parts of the longtermist community have shifted (were already shifting?) more towards the sort of allocation of attention/resources that Hanson was envisioning.

I share the view that research on the supposed "crying wolf effect" would be quite interesting. I think its results have direct implications for longtermist/EA/x-risk strategy and communication.

Comment by michaela on Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk” · 2020-03-31T04:58:54.860Z · score: 5 (3 votes) · LW · GW

This is great - thanks for posting it!

A possibly useful framing for future investigations might be, “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous”, where the definition of “dangerous” is free to vary based on how serious of a risk we are concerned about. One complication here is that this is a highly contextual question – with a superintelligence we can assume that the AGI may get basically omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk, depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage to humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.

That made me properly realise something that I now feel should've been blindingly obvious to me already: Work to reduce humanity's/civilization's "vulnerabilities" in general may also help with a range of global catastrophic or existential risk scenarios where AI risk is the "trigger".

I imagine I must've already been aware of that in some sense, and I think it's implicit in various other things such as discussions of how AI could interact with e.g. nuclear weapons tech. But I don't think I'd previously thought explicitly about situations in which risk from an agenty AI pursuing its own goals (rather than e.g. AI just making automated launch decisions) could be exacerbated or mitigated by e.g. work on biorisk, because other major risks could be what the AI would harmfully use as "tools".

I'm not sure whether or not this should actually lead to a substantial update in e.g. how valuable I think biorisk work or work to reduce numbers of nuclear weapons is. But I'm glad to at least have that question explicitly in mind now.

Comment by michaela on A model I use when making plans to reduce AI x-risk · 2020-03-31T03:17:39.647Z · score: 1 (1 votes) · LW · GW

(Very late to this thread)

unlike the case of the nuclear war where the quality of the threat was visible to politicians and the public alike - alignment seems to be a problem which not even all AI researchers understand is worth mentioning. That in itself probably excludes the possibility of a direct political solution.

The failure to recognise/understand/appreciate the problem does seem an important factor. And if it were utterly unchangeable, maybe that would mean all efforts need to just go towards technical solutions. But it's not utterly unchangeable; in fact, it's a key variable which "political" (or just "not purely technical") efforts could intervene on to reduce AI x-risk.

E.g., a lot of EA movement building, outreach by AI safety researchers, Stuart Russell's book Human-Compatible, etc., is partly targeted at getting more AI researchers (and/or the broader public or certain elites like politicians) to recognise/understand/appreciate the problem. And by doing so, it could have other benefits like increasing the amount of technical work on AI safety, influencing policies that reduce risks of different AI groups rushing to the finish line and compromising on safety, etc.

I think this was actually somewhat similar in the case of nuclear war. Of course, the basic fact that nuclear weapons could be very harmful is a lot more obvious than the fact AGI/superintelligence could be very harmful. But the main x-risk from nuclear war is nuclear winter, and that's not immediately obvious - it requires quite some modelling, and is something unlike things people have seen in their lifetimes, really. And according to Toby Ord in The Precipice (page 65):

The discovery that atomic weapons may trigger a nuclear winter influenced both Ronald Reagan and Mikhail Gorbachev to reduce their country's arms to avoid war.

So in that case, technical work on understanding the problem was communicated to politicians (this communication being a non-technical intervention), and helped make the potential harms clearer to politicians, which helped lead to a political (partial) solution.

Basically, I think that technical and non-technical interventions are often intertwined or support each other, and that we should see current levels of recognition that AI risk is a big deal as something we can and should intervene to change, not as something fixed.

Comment by michaela on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2020-03-30T12:11:27.276Z · score: 1 (1 votes) · LW · GW

I have no citation for that being a big deal. But there's some discussion of the matter (which I haven't read) in the comments on this post, and it was also discussed on an episode of the 80k podcast:

Paul Christiano: I think the current state of the literature on carbon dioxide and cognition is absurd. I probably complained about this last time I was here.


Robert Wiblin: Yes, talk about the carbon dioxide one for a minute because this is one that’s also been driving me mad the last few months just to see that carbon dioxide potentially has enormous effects on people’s intelligence and in offices but you eventually just have extremely– And lecture halls especially just have potentially incredibly elevated CO2 levels that are dumbing us all down when we most need to be smart.

Paul Christiano: Yes. I reviewed the literature a few years ago and I’ve only been paying a little bit of attention since then, but I think the current state of play is, there was one study with preposterously large effect sizes from carbon dioxide in which the methodology was put people in rooms, dump some gas into all the rooms. Some of the gases were very rich in carbon dioxide and the effect sizes were absurdly large.

They were like, if you compare it to the levels of carbon dioxide that occur in my house or in the house I just moved out of, the most carbon dioxide-rich bedroom in that house had one standard deviation effect amongst Berkeley students on this test or something, which is absurd. That’s totally absurd. That’s almost certainly–

Robert Wiblin: It’s such a large effect that you should expect that people, when they walk into a room with carbon dioxide which has elevated carbon dioxide levels, they should just feel like idiots at that point or they should feel like noticeably dumber in their own minds.

Paul Christiano: Yes, you would think that. To be clear, the rooms that have levels that high, people can report it feels stuffy and so part of the reason that methodology and the papers like just dumping in carbon dioxide is to avoid like if you make a room naturally that CO2 rich, it’s going to also just be obvious that you’re in the intervention group instead of the control.

Although to be fair, even if I don’t know, at that point, like even a placebo effect maybe will do something. I think almost certainly that seems wrong to me. Although maybe this is not a good thing to be saying publicly on a podcast. There’s a bunch of respected researchers on that paper. Anyway, it would be great to see a replication of that. There was subsequently replication with exactly the same design which also had p = 0.0001.

Now, we’ve got the two precise replications with p = 0.0001. That’s where we’re at. Also the effects are stupidly large. So large. You really, really need to care about ventilation effects. This room probably is, this is madness. Well, this building is pretty well ventilated but still, we’re at least a third of a standard deviation dumber.

Robert Wiblin: Yes, I’m sure dear listeners you can hear us getting dumber over the course of this conversation as we fill this room with poison. Yes, I guess potentially the worst case would be in meeting rooms or boardrooms where people are having very long– Yes prolonged discussions about difficult issues. They’re just getting progressively dumber as the room fills up with carbon dioxide and it’s going to be more irritable as well.

Paul Christiano: Yes, it would be pretty serious and I think that people have often cited this in attempts to improve ventilation, but I think people do not take it nearly as seriously as they would have if they believed it. Which I think is right because I think it’s almost certainly, the effect is not this large. If it was this large, you’d really want to know and then–

Robert Wiblin: This is like lead poisoning or something?

Paul Christiano: Yes, that’s right.

Robert Wiblin: Well, this has been enough to convince me to keep a window open whenever I’m sleeping. I really don’t like sleeping in a room that has no ventilation or no open door or window. Maybe I just shouldn’t worry because at night who really cares how smart I’m feeling while I’m dreaming?

Paul Christiano: I don’t know what’s up. I also haven’t looked into it as much as maybe I should have. I would really just love to be able to stay away, it’s not that hard. The facts are large enough but it’s also short term enough to just like extremely easy to check. In some sense, it’s like ”What are you asking for, there’s already been a replication”, though, I don’t know, the studies they use are with these cognitive batteries that are not great.

If the effects are real you should be able to detect them in very– Basically with any instrument. At some point, I just want to see the effect myself. I want to actually see it happen and I want to see the people in the rooms.

Robert Wiblin: Seems like there’s a decent academic incentive to do this, you’d think, because you’d just end up being famous if you pioneer this issue that turns out to be extraordinarily important and then causes buildings to be redesigned. I don’t know, it could just be a big deal. I mean, even if you can’t profit from it in a financial sense, wouldn’t you just want the kudos for like identifying this massive unrealized problem?

Paul Christiano: Yes, I mean to be clear, I think a bunch of people work on the problem and we do have– At this point there’s I think there’s the original– The things I’m aware of which is probably out of date now is the original paper, a direct replication and a conceptual replication all with big looking effects but all with slightly dicey instruments. The conceptual replication is funded by this group that works on ventilation unsurprisingly.

Robert Wiblin: Oh, that’s interesting.

Paul Christiano: Big air quality. Yes, I think that probably the take of academics, insofar as there’s a formal consensus process in academia, I think it would be to the effect that this is real, it’s just that no one is behaving as if the effect of that size actually existed and I think they’re right to be skeptical of the process, in academia. I think that does make– The situation is a little bit complicated in terms of what you exactly get credit for.

I think people that would get credit should be and rightfully would be the people who’ve been investigating it so far. This is sort of more like checking it out more for– Checking it out for people who are skeptical. Although everyone is implicitly skeptical given how much they don’t treat it like an emergency when carbon dioxide levels are high.

Robert Wiblin: Yes, including us right now. Well, kudos to you for funding that creatine thing [discussed elsewhere in the episode]. It would be good if more people took the initiative to really insist on funding replications for issues that seemed important where they’re getting neglected.

Paul Christiano: Yes, I think a lot of it’s great– I feel like there are lots of good things for people to do. I feel like people are mostly at the bottleneck just like people who have the relevant kinds of expertise and interests. This is one category where I feel people could go far and I’m excited to see how that goes.


(Not quoting people anymore)

That's all the knowledge I have on the matter.

But I'll just add that I'm quite skeptical about the suggestion that "we might imagine that our hopes of finding better carbon sequestration technologies after that dumbing point may plummet." It seems unclear whether increased CO2 leads to e.g. a drop of several IQ points. And on top of that, it's also not clear to me that, if it did so globally (which would definitely be a very big deal), that would cause a "plummeting" in our chances of finding some particular tech. (Though I guess it might.)

Comment by michaela on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2020-03-30T11:51:15.067Z · score: 10 (3 votes) · LW · GW

I might be misunderstanding you, but I feel like this is sort of missing a key point. It seems like there could be situations in which the AI does indeed, as you point out, require "a bunch of safeguards to stop it destroying *itself*", in order to advance to a high level of capabilities. These could be built by its engineers, or developed by the AI itself, perhaps through trial and error.

But that doesn't seem to mean it'd have safeguards to not destroy other things we value, or in some more abstract sense "destroy" our future potential (e.g., by colonising space and "wasting" the resources optimising for something that we don't/barely care about, even if it doesn't harm anything on Earth). It seems possible for an AI to get safeguards like how to not have its robotic manifestation jump off things too high or disassemble itself, and thereby be "safe enough" itself to become more capable, but to not have the sort of "safeguards" that e.g. Russell cares about.

Indeed, this seems to be related to the core point of ideas like instrumentally convergent subgoals and differential progress. We or the AI might get really good at building its capabilities and building safeguards that allow it to become more capable or avoid harm to itself or its own current "goals", without necessarily getting good at building safeguards to protect "what we truly value".

But here are two things you might have meant that would be consistent with what I've said:

  • It is only when you expect a system to radically gain capability without needing any safeguards to protect a particular thing that it makes sense to expect there to be a dangerous AI created by a team with no experience of safeguards to protect that particular thing or how to embed them. This may inform LeCun's views, if he's focusing on safeguards for the AI's own ability to operate in the world, since these will have to be developed in order for the AI to become more capable. But Russell may be focusing on the fact that a system really could radically gain capability without needing safeguards to protect what we value.
  • It is only when you expect a system to radically gain capability without needing any safeguards of any type that it makes sense to expect there to be a dangerous AI created by a team with no experience of safeguards in general or how to embed them. Since AI designers will have to learn how to develop and embed some types of safeguard, they're likely to pick up general skills for that, which could then also be useful for building safeguards to protect what we value.

If what you meant is the latter, then I don't think I'm comfortable resting on the assumption that lessons from developing/embedding "capability safeguards" (so to speak) will transfer to a high degree to "safety safeguards". Although I haven't looked into it a great deal.

Is one of those things what you meant?

Comment by michaela on MichaelA's Shortform · 2020-03-30T10:42:21.377Z · score: 7 (4 votes) · LW · GW

Collection of discussions of epistemic modesty, "rationalist/EA exceptionalism", and similar

These are currently in reverse-chronological order.

Epistemic learned helplessness - Scott Alexander, 2019

AI, global coordination, and epistemic humility - Jaan Tallinn, 2018

In defence of epistemic modesty - Greg Lewis, 2017

Inadequate Equilibria - Eliezer Yudkowsky, 2017

Common sense as a prior - Nick Beckstead, 2013

From memory, I think a decent amount of Rationality: A-Z by Eliezer Yudkowsky is relevant

Philosophical Majoritarianism - Hal Finney, 2007

Somewhat less relevant/substantial

This comment/question - Michael Aird (i.e., me), 2020

Naming Beliefs - Hal Finney, 2008

Likely relevant, but I'm not yet sure how relevant as I haven't yet read it

Are Disagreements Honest? - Cowen & Hanson, 2004

Uncommon Priors Require Origin Disputes - Robin Hanson, 2006

Aumann's agreement theorem - Wikipedia

I intend to add to this list over time. If you know of other relevant work, please mention it in a comment.

Comment by michaela on What can the principal-agent literature tell us about AI risk? · 2020-03-30T09:47:54.811Z · score: 1 (1 votes) · LW · GW

(This rambly comment is offered in the spirit of Socratic grilling.)

I hadn't noticed I should be confused about the agency rent vs monopoly rent distinction till I saw Wei Dai's comment, but now I realise I'm confused. And the replies don't seem to clear it up for me. Tom wrote:

Re the difference between Monopoly rents and agency rents: monopoly rents would be eliminated by competition between firms whereas agency rents would be eliminated by competition between workers. So they're different in that sense.

That's definitely one way in which they're different. Is that the only way? Are they basically the same concept, and it's just that you use one label (agency rents) when focusing on rents the worker can extract due to lack of competition between workers, and the other (monopoly rents) when focusing on rents the firms can extract due to lack of competition between firms? But everything is the same on an abstract/structural level?

Could we go a little further, and in fact describe the firm as an agent, with consumers as its principal? The agent (the firm) can extract agency rents to the extent that (a) its activities at least somewhat align with those of the principal (e.g., it produces a product that the public prefers to nothing, and that they're willing to pay something for), and (b) there's limited competition (e.g., due to a patent). I.e., are both types of rents due to one actor (a) optimising for something other than what the other actor wants, and (b) being able to get away with it?

That seems consistent with (but not stated in) most of the following quote from you:

Re monopoly rents vs agency rents: Monopoly rents refer to the opposite extreme with very little competition, and in the economics literature is used when talking about firms, while agency rents are present whenever competition and monitoring are imperfect. Also, agency rents refer specifically to the costs inherent to delegating to an agent (e.g. an agent making investment decisions optimising for commission over firm profit) vs the rents from monopoly power (e.g. being the only firm able to use a technology due to a patent). But as you say, it's true that lack of competition is a cause of both of these.

What my proposed framing seems to not account for is that discussion of agency rents involves mention of imperfect monitoring as well as imperfect competition. But I think I share Wei Dai's confusion there. If the principal had no other choice (i.e., there's no competition), then even with perfect monitoring, wouldn't there still be agency rents, as long as the agent is optimising for something at least somewhat correlated with the principal's interests? Is it just that imperfect monitoring increases how much the agent can "get away with", at any given level of correlation between its activities and the principal's interests?

And could we say a similar thing for monopoly rents - e.g., a monopolistic firm, or one with little competition, may be able to extract somewhat more rents if it's especially hard to tell how valuable its product is in advance?

Note that I don't have a wealth of econ knowledge and didn't take the option of doing a bunch of googling to try to figure this out for myself. No one is obliged to placate my lethargy with a response :)

Comment by michaela on What can the principal-agent literature tell us about AI risk? · 2020-03-30T09:28:13.208Z · score: 1 (1 votes) · LW · GW

Very interesting post!

Furthermore, if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs; so we should expect contracts to be enforceable conditional upon AIs being used.

I could easily be wrong, but this strikes me as a plausible but debatable statement, rather than a certainty. It seems like more argument would be required even to establish that it's likely, and much more to establish we can say "people will promptly realise..." It also seems like that statement is sort of assuming part of precisely what's up for debate in these sorts of discussions.

Some fragmented thoughts that feed into those opinions:

  • As you note just before that: "The assumption [of contract enforceability] isn’t plausible in pessimistic scenarios where human principals and institutions are insufficiently powerful to punish the AI agent, e.g. due to very fast take-off." So the Bostrom/Yudkowsky scenario is precisely one in which contracts aren't enforceable, for very similar reasons to why that scenario could lead to existential catastrophe.
  • Very relatedly - perhaps this is even just the same point in different words - you say "then people will promptly realise and stop using AIs". This assumes some possibility of at least some trial-and-error, and thus assumes that there'll be neither a very discontinuous capability jump towards decisive strategic advantage, nor deception followed by a treacherous turn.
  • As you point out, Paul Christiano's "Part 1" scenario might be one in which all or most humans are happy, and increasingly wealthy, and don't have motivation to stop using the AIs. You quote him saying "humans are better off in absolute terms unless conflict leaves them worse off (whether military conflict or a race for scarce resources). Compare: a rising China makes Americans better off in absolute terms. Also true, unless we consider the possibility of conflict....[without conflict] humans are only worse off relative to AI (or to humans who are able to leverage AI effectively). The availability of AI still probably increases humans’ absolute wealth. This is a problem for humans because we care about our fraction of influence over the future, not just our absolute level of wealth over the short term."
    • Similarly, it seems to me that we could have a scenario in which people realise they can't enforce contracts with AIs, but the losses that result from that are relatively small, and are outweighed by the benefits of the AI, so people continue using the AIs despite the lack of enforceability of the contracts.
    • And then this could still lead to existential catastrophe due to black swan events people didn't adequately account for, competitive dynamics, or "externalities" e.g. in relation to future generations.

I'm not personally sure how likely I find any of the above scenarios. I'm just saying that they seem to reveal reasons to have at least some doubts that "if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs".

Although I think it would still be true that the possibilities of trial-and-error, recognition of lack of enforceability, and people's concerns about that are at least some reason to assume that, if AIs are used, contracts will be enforceable.

Comment by michaela on A shift in arguments for AI risk · 2020-03-30T07:13:59.976Z · score: 3 (2 votes) · LW · GW

I found the linked post very interesting, and seemingly useful. Thanks for cross-posting it! And shame that the author didn't get the time to pursue the project further.

One quibble:

For long-termists, I see three plausible attitudes:

They prioritise AI because of arguments that rely on a discontinuity, and they think a discontinuous scenario is probable. The likelihood of a discontinuity is a genuine crux of their decision to prioritise AI.
They prioritise AI for reasons that do not rely on a discontinuity
They prioritise AI because of possibility of discontinuity, but its likelihood is not a genuine crux, because they see no plausible other ways of affecting the long-term future.

The author does provide hedges, such as that "these are three stylised attitudes. It’s likely that many people have an intermediate view that attaches some credence to each of these stories." But one thing that struck me as notably missing was the following variant of the first attitude:

They prioritise AI because of arguments that rely on a discontinuity, and they think a discontinuous scenario has sufficiently high probability to be worth serious attention. The likelihood of a discontinuity is a genuine crux of their decision to prioritise AI.

Indeed, my impression is that a large portion of people motivated by the discontinuity-based arguments actually see a discontinuity as less than 50% likely, perhaps even very unlikely, but not extremely unlikely. And they thus see it as a risk worth preparing for. (I don't have enough knowledge of the community to say how large that "large portion" is.)

And this isn't the same as the third attitude, really, because it may be that these people would shift their priorities to something else if they came to see a discontinuity as even less likely. It might not be the only lever they see as potentially worth pulling to affect the long-term future, and they might not be in properly Pascalian territory, just expected value territory.

That said, this is sort of like a blend between the first and third attitudes shown. And perhaps by "probable" the author actually meant something like "plausible", rather than "more likely than not". But this point still seemed to me worth mentioning, particularly as I think it's related to the general pattern of people outside of the existential risk community assuming that those within it see x-risks as likely, whereas most seem to see them as unlikely but still a really, really big deal.

Comment by michaela on A list of good heuristics that the case for AI x-risk fails · 2020-03-29T13:14:33.466Z · score: 1 (1 votes) · LW · GW

Also, one other heuristic/proposition that, as far as I'm aware, is simply factually incorrect (rather than "flawed but in debatable ways" or "actually pretty sound") is "AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies." So there it may also be worth pointing out in some manner that, in reality, quite early on prominent AI researchers raised concerns somewhat similar to those discussed now.

E.g., I. J. Good apparently wrote in 1959:

Whether [an intelligence explosion] will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.
Comment by michaela on A list of good heuristics that the case for AI x-risk fails · 2020-03-29T12:27:46.574Z · score: 1 (1 votes) · LW · GW

I think this list is interesting and potentially useful, and I think I'm glad you put it together. I also generally think it's a good and useful norm for people to seriously engage with the arguments they (at least sort-of/overall) disagree with.

But I'm also a bit concerned about how this is currently presented. In particular:

  • This is titled "A list of good heuristics that the case for AI x-risk fails".
  • The heuristics themselves are stated as facts, not as something like "People may believe that..." or "Some claim that..." (using words like "might" could also help).
    • A comment of yours suggests you've already noticed this. But I think it'd be pretty quick to fix.
  • Your final paragraph, a very useful caveat, comes after listing all the heuristics as facts.

I think these things will have relatively small downsides, given the likely quite informed and attentive audience here. But a bunch of psychological research I read a while ago (2015-2017) suggests there could be some degree of downsides. E.g.:

Information that initially is presumed to be correct, but that is later retracted or corrected, often continues to influence memory and reasoning. This occurs even if the retraction itself is well remembered. The present study investigated whether the continued influence of misinformation can be reduced by explicitly warning people at the outset that they may be misled. A specific warning--giving detailed information about the continued influence effect (CIE)--succeeded in reducing the continued reliance on outdated information but did not eliminate it. A more general warning--reminding people that facts are not always properly checked before information is disseminated--was even less effective. In an additional experiment, a specific warning was combined with the provision of a plausible alternative explanation for the retracted information. This combined manipulation further reduced the CIE but still failed to eliminate it altogether.

And also:

Information presented in news articles can be misleading without being blatantly false. Experiment 1 examined the effects of misleading headlines that emphasize secondary content rather than the article’s primary gist. [...] We demonstrate that misleading headlines affect readers’ memory, their inferential reasoning and behavioral intentions, as well as the impressions people form of faces. On a theoretical level, we argue that these effects arise not only because headlines constrain further information processing, biasing readers toward a specific interpretation, but also because readers struggle to update their memory in order to correct initial misconceptions.

Based on that sort of research (for a tad more info on it, see here), I'd suggest:

  • Renaming this to something like "A list of heuristics that suggest the case for AI x-risk is weak" (or even "fails", if you've said something like "suggest" or "might")
  • Rephrasing the heuristics to be stated as disputable (or even false) claims, rather than facts. E.g., "Some people may believe that this concern is being voiced exclusively by non-experts like Elon Musk, Stephen Hawking, and the talkative crazy guy next to you on the bus." ETA: Putting them in quote marks might be another option for that.
  • Moving what's currently the final paragraph caveat to before the list of heuristics.
  • Perhaps also adding sub-points about the particularly disputable dot points. E.g.:
    • "(But note that several AI experts have now voiced concern about the possibility of major catastrophes from advanced AI systems, although there's still no consensus on this.)"

I also recognise that several of the heuristics really do seem good, and probably should make us at least somewhat less concerned about AI. So I'm not suggesting trying to make them all sound stupid. Just perhaps being more careful not to end up with some readers' brains, on some level, automatically processing all of these heuristics as definite truths that definitely suggest AI x-risk isn't worthy of attention.

Sorry for the very unsolicited advice! It's just that preventing gradual slides into false beliefs (including from well-intentioned efforts that do contain the truth in them somewhere!) is sort of a hobby-horse of mine.

Comment by michaela on Section 7: Foundations of Rational Agency · 2020-03-28T11:45:50.767Z · score: 1 (1 votes) · LW · GW
Agents may tend to make the decisions on some reference class of decision problems. (That is, for some probability distribution on decision contexts C, P(Agent 1’s decision in context C=Agent 2’s decision in context C) is high.)

Should this say "make the same decisions" (i.e., is the word "same" missing)? (Asking partly in case I'm misunderstanding what possibility is being described there.)
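To make the reading with "same" concrete, here is a minimal sketch of the quantity in the quoted parenthetical: the probability, under a distribution over decision contexts, that two agents make the same decision. The contexts, probabilities, policies, and the function name are made up purely for illustration and do not come from the post.

```python
def agreement_probability(context_probs, policy_1, policy_2):
    """P(Agent 1's decision in C == Agent 2's decision in C), with C drawn from context_probs."""
    return sum(
        prob
        for context, prob in context_probs.items()
        if policy_1[context] == policy_2[context]
    )


# Hypothetical distribution over decision contexts (probabilities sum to 1).
context_probs = {"c1": 0.5, "c2": 0.25, "c3": 0.25}

# Hypothetical policies: each maps a context to the decision the agent makes there.
agent_1 = {"c1": "cooperate", "c2": "defect", "c3": "cooperate"}
agent_2 = {"c1": "cooperate", "c2": "defect", "c3": "defect"}

# The agents agree in c1 and c2 and disagree in c3.
print(agreement_probability(context_probs, agent_1, agent_2))  # 0.75
```

On this reading, "high" would mean the printed value is close to 1 for the reference class of decision problems in question.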

Comment by michaela on Sections 5 & 6: Contemporary Architectures, Humans in the Loop · 2020-03-28T11:44:31.465Z · score: 2 (2 votes) · LW · GW

Inconsequential heads up: At least on my screen, it seemed there were symbols missing at the ends of each of the following two sentences:

Define the cooperative policies as .


An extreme case of a punishment policy is the one in which an agent commits to minimizing their counterpart's utility once they have defected: .
Comment by michaela on Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms · 2020-03-28T11:41:34.404Z · score: 1 (1 votes) · LW · GW

Interesting post!

see also the lengthy discussions in Garfinkel and Dafoe (2019) and Kroll et al. (2016), discussed in Section 1 Footnote 1

Should that say "Section 2 footnote 3", or "Section 1 & 2 footnote 3", or something like that? And should that be Garfinkel (2018), rather than Garfinkel and Dafoe (2019)?

Comment by michaela on Sections 1 & 2: Introduction, Strategy and Governance · 2020-03-28T11:41:04.258Z · score: 1 (1 votes) · LW · GW

Interesting research agenda, thanks for posting it!

Very minor question, mainly to see whether I'm misunderstanding something:

we will say that a two-player normal-form game with payoffs denoted as in Table 1 is a social dilemma if the payoffs satisfy these criteria:
2R>T+S (Mutual cooperation is better than randomizing between cooperation and defection);

But it looks to me like, in Table 1, "Chicken" doesn't satisfy that criterion. 2*0 = 0, and -1 + 1 = 0, so 2R=T+S, rather than 2R being greater. Am I missing something? Or should that have been an "equal to or greater than" symbol?
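The arithmetic above can be checked mechanically. This is a minimal sketch, assuming only the Chicken payoffs used in this comment's calculation (R = 0, with T and S equal to 1 and -1; which of the two is T is an assumption here and does not affect the sum). The function names are hypothetical.

```python
def satisfies_strict(R, T, S):
    """Criterion as quoted from the post: 2R > T + S."""
    return 2 * R > T + S


def satisfies_weak(R, T, S):
    """Weakened criterion suggested in this comment: 2R >= T + S."""
    return 2 * R >= T + S


# Chicken payoffs as used in the comment's arithmetic (labelling of T and S assumed).
R, T, S = 0, 1, -1

print(satisfies_strict(R, T, S))  # False: 2*0 is not greater than 1 + (-1)
print(satisfies_weak(R, T, S))    # True: 2*0 equals 1 + (-1)
```

So, with these payoffs, Chicken would count as a social dilemma only under the "equal to or greater than" version of the criterion.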

Also, I spotted an apparent typo, in "Under what circumstances does mutual transparency or exacerbate commitment race dynamics". I believe a word like "mitigate" is missing there.

Comment by michaela on MichaelA's Shortform · 2020-03-28T08:37:24.483Z · score: 1 (1 votes) · LW · GW

Ah yes, meant to add that but apparently missed it. Added now. Thanks!

Comment by michaela on MichaelA's Shortform · 2020-03-28T07:42:49.724Z · score: 14 (6 votes) · LW · GW

Collection of discussions of key cruxes related to AI safety/alignment

These are works that highlight disagreements, cruxes, debates, assumptions, etc. about the importance of AI safety/alignment, about which risks are most likely, about which strategies to prioritise, etc.

I've also included some works that attempt to clearly lay out a particular view in a way that could be particularly helpful for others trying to see where the cruxes are, even if the work itself doesn't spend much time addressing alternative views. I'm not sure precisely where to draw the boundaries in order to make this collection maximally useful.

These are ordered from most to least recent.

I've put in bold those works that very subjectively seem to me especially worth reading.

General, or focused on technical work

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics - James Fodor, 2020; this received pushback from Rohin Shah, which resulted in a comment thread worth adding here in its own right

Fireside Chat: AI governance - Ben Garfinkel & Markus Anderljung, 2020

My personal cruxes for working on AI safety - Buck Shlegeris, 2020

What can the principal-agent literature tell us about AI risk? - Alexis Carlier & Tom Davidson, 2020

Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society - Carina Prunkl & Jess Whittlestone, 2020 (commentary here)

Interviews with Paul Christiano, Rohin Shah, Adam Gleave, and Robin Hanson - AI Impacts, 2019 (summaries and commentary here and here)

Brief summary of key disagreements in AI Risk - iarwain, 2019

A list of good heuristics that the case for AI x-risk fails - capybaralet, 2019

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More - 2019

Clarifying some key hypotheses in AI alignment - Ben Cottier & Rohin Shah, 2019

A shift in arguments for AI risk - Tom Sittler, 2019 (summary and discussion here)

The Main Sources of AI Risk? - Wei Dai & Daniel Kokotajlo, 2019

Current Work in AI Alignment - Paul Christiano, 2019 (key graph can be seen at 21:05)

What failure looks like - Paul Christiano, 2019 (critiques here and here; counter-critiques here; commentary here)

Disentangling arguments for the importance of AI safety - Richard Ngo, 2019

Reframing superintelligence - Eric Drexler, 2019 (I haven't yet read this; maybe it should be in bold)

Prosaic AI alignment - Paul Christiano, 2018

How sure are we about this AI stuff? - Ben Garfinkel, 2018 (it's been a while since I watched this; maybe it should be in bold)

AI Governance: A Research Agenda - Allan Dafoe, 2018

Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk” - Kaj Sotala, 2018 (full paper here)

A model I use when making plans to reduce AI x-risk - Ben Pace, 2018

Interview series on risks from AI - Alexander Kruel (XiXiDu), 2011 (or 2011 onwards?)

Focused on takeoff speed/discontinuity/FOOM specifically

Discontinuous progress in history: an update - Katja Grace, 2020 (also some more comments here)

My current framework for thinking about AGI timelines (and the subsequent posts in the series) - zhukeepa, 2020

What are the best arguments that AGI is on the horizon? - various authors, 2020

The AI Timelines Scam - jessicat, 2019 (I also recommend reading Scott Alexander's comment there)

Double Cruxing the AI Foom debate - agilecaveman, 2018

Quick Nate/Eliezer comments on discontinuity - 2018

Arguments about fast takeoff - Paul Christiano, 2018

Likelihood of discontinuous progress around the development of AGI - AI Impacts, 2018

The Hanson-Yudkowsky AI-Foom Debate - various works from 2008-2013

Focused on governance/strategy work

My Updating Thoughts on AI policy - Ben Pace, 2020

Some cruxes on impactful alternatives to AI policy work - Richard Ngo, 2018

Somewhat less relevant

A small portion of the answers here - 2020

I intend to add to this list over time. If you know of other relevant work, please mention it in a comment.

Comment by michaela on MichaelA's Shortform · 2020-03-28T02:48:56.201Z · score: 1 (1 votes) · LW · GW


Alatalo, R. V., Mappes, J., & Elgar, M. A. (1997). Heritabilities and paradigm shifts. Nature, 385(6615), 402-403. doi:10.1038/385402a0

Anderson, D. R., Burnham, K. P., Gould, W. R., & Cherry, S. (2001). Concerns about finding effects that are actually spurious. Wildlife Society Bulletin, 29(1), 311-316.

Anderson, M. S., Martinson, B. C., & Vries, R. D. (2007). Normative dissonance in science: Results from a national survey of U.S. scientists. Journal of Empirical Research on Human Research Ethics: An International Journal, 2(4), 3-14. doi:10.1525/jer.2007.2.4.3

Anderson, M. S., Ronning, E. A., Vries, R. D., & Martinson, B. C. (2010). Extending the Mertonian norms: Scientists' subscription to norms of research. The Journal of Higher Education, 81(3), 366-393. doi:10.1353/jhe.0.0095

Asendorpf, J. B., Conner, M., Fruyt, F. D., Houwer, J. D., Denissen, J. J., Fiedler, K., … Wicherts, J. M. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27(2), 108-119. doi:10.1002/per.1919

Bakker, M., Hartgerink, C. H., Wicherts, J. M., & van der Maas, H. L. J. (2016). Researchers' intuitions about power in psychological research. Psychological Science, 27(8), 1069-1077. doi:10.1177/0956797616647519

Bandura, A. (2002). Environmental sustainability by sociocognitive deceleration of population growth. In P. Schmuck & W. P. Schultz (Eds.), Psychology of sustainable development (pp. 209-238). New York, NY: Springer.

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425. doi:10.1037/a0021524

Bennett, C. M., Miller, M. B., & Wolford, G. L. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: an argument for multiple comparisons correction. Neuroimage, 47(Suppl 1), S125. doi:10.1016/s1053-8119(09)71202-9

Berezow, A. B. (2012, July 13). Why psychology isn't science. Los Angeles Times. Retrieved from

Bringmann, L. F., & Eronen, M. I. (2015). Heating up the measurement debate: What psychologists can learn from the history of physics. Theory & Psychology, 26(1), 27-43. doi:10.1177/0959354315617253

Burke, D. (2014). Why isn't everyone an evolutionary psychologist? Frontiers in Psychology, 5. doi:10.3389/fpsyg.2014.00910

Campbell, H. (2012, July 17). A biologist and a psychologist square off over the definition of science. Science 2.0. Retrieved from

Chomsky, N. (1971). The case against BF Skinner. The New York Review of Books, 17(11), 18-24.

Cleland, C. E., & Brindell, S. (2013). Science and the messy, uncontrollable world of nature. In M. Pigliucci & M. Boudry (Eds.), The philosophy of pseudoscience (pp. 183-202). Chicago, IL: University of Chicago Press.

Confer, J. C., Easton, J. A., Fleischman, D. S., Goetz, C. D., Lewis, D. M., Perilloux, C., & Buss, D. M. (2010). Evolutionary psychology: Controversies, questions, prospects, and limitations. American Psychologist, 65(2), 110-126. doi:10.1037/a0018413

Dagher, Z. R., & Erduran, S. (2016). Reconceptualizing nature of science for science education: Why does it matter? Science & Education, 25, 147-164. doi:10.1007/s11191-015-9800-8

Delprato, D. J., & Midgley, B. D. (1992). Some fundamentals of B. F. Skinner's behaviorism. American Psychologist, 47(11), 1507-1520. doi:10.1037//0003-066x.47.11.1507

Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290. doi:10.1177/1745691611406920

Ecker, U. K., Lewandowsky, S., & Apai, J. (2011). Terrorists brought down the plane!—No, actually it was a technical fault: Processing corrections of emotive information. The Quarterly Journal of Experimental Psychology, 64(2), 283-310. doi:10.1080/17470218.2010.497927

Fanelli, D. (2010). “Positive” results increase down the hierarchy of the sciences. PLoS ONE, 5(4). doi:10.1371/journal.pone.0010068

Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead theories: Publication bias and psychological science's aversion to the null. Perspectives on Psychological Science, 7(6), 555-561. doi:10.1177/1745691612459059

Francis, G. (2012). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review, 19(2), 151-156. doi:10.3758/s13423-012-0227-9

Galak, J., LeBoeuf, R. A., Nelson, L. D., & Simmons, J. P. (2012). Correcting the past: Failures to replicate psi. Journal of Personality and Social Psychology, 103(6), 933-948. doi:10.1037/a0029709

Godin, G., Conner, M., & Sheeran, P. (2005). Bridging the intention-behaviour gap: The role of moral norm. British Journal of Social Psychology, 44(4), 497-512. doi:10.1348/014466604x17452

Hansson, S. O. (2013). Defining pseudoscience and science. In M. Pigliucci & M. Boudry (Eds.), The philosophy of pseudoscience (pp. 61-77). Chicago, IL: University of Chicago Press.

Ioannidis, J. P., Munafò, M. R., Fusar-Poli, P., Nosek, B. A., & David, S. P. (2014). Publication and other reporting biases in cognitive sciences: Detection, prevalence, and prevention. Trends in Cognitive Sciences, 18(5), 235-241. doi:10.1016/j.tics.2014.02.010

Irzik, G., & Nola, R. (2011). A family resemblance approach to the nature of science for science education. Science & Education, 20(7), 591-607. doi:10.1007/s11191-010-9293-4

Irzik, G., & Nola, R. (2014). New directions for nature of science research. In M. R. Matthews (Ed.), International Handbook of Research in History, Philosophy and Science Teaching (pp. 999-1021). Dordrecht: Springer.

John, L., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science, 23(5), 524-532. doi:10.1177/0956797611430953

Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217. doi:10.1207/s15327957pspr0203_4

Kahneman, D. (2014). A new etiquette for replication. Social Psychology, 45(4), 310-311.

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, S., Bernstein, M. J., Bocian, K., … Nosek, B. (2014a). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142-152. doi:10.1027/a000001

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, S., Bernstein, M. J., Bocian, K., … Nosek, B. (2014b). Theory building through replication: Response to commentaries on the “many labs” replication project. Social Psychology, 45(4), 299-311. doi:10.1027/1864-9335/a000202

LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15(4), 371-379. doi:10.1037/a0025172

Lilienfeld, S. O. (2011). Distinguishing scientific from pseudoscientific psychotherapies: Evaluating the role of theoretical plausibility, with a little help from Reverend Bayes. Clinical Psychology: Science and Practice, 18(2), 105-112. doi:10.1111/j.1468-2850.2011.01241.x

Lilienfeld, S. O., Ritschel, L. A., Lynn, S. J., Cautin, R. L., & Latzman, R. D. (2013). Why many clinical psychologists are resistant to evidence-based practice: Root causes and constructive remedies. Clinical Psychology Review, 33(7), 883-900. doi:10.1016/j.cpr.2012.09.008

Mahner, M. (2013). Science and pseudoscience: How to demarcate after the (alleged) demise of the demarcation problem. In M. Pigliucci & M. Boudry (Eds.), The philosophy of pseudoscience (pp. 29-43). Chicago, IL: University of Chicago Press.

McNutt, M. (2014). Reproducibility. Science, 343(6168), 229. doi:10.1126/science.1250475

Michell, J. (2013). Constructs, inferences, and mental measurement. New Ideas in Psychology, 31(1), 13-21. doi:10.1016/j.newideapsych.2011.02.004

Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K. M., Gerber, A., … Laan, M. V. (2014). Promoting transparency in social science research. Science, 343(6166), 30-31. doi:10.1126/science.1245317

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., … & Contestabile, M. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425.

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Popper, K. (1957). Philosophy of science: A personal report. In C. A. Mace (Ed.), British Philosophy in Mid-Century (pp. 155-160). London: Allen and Unwin.

Pigliucci, M. (2013). The demarcation problem: A (belated) response to Laudan. In M. Pigliucci & M. Boudry (Eds.), The philosophy of pseudoscience (pp. 9-28). Chicago, IL: University of Chicago Press.

Rhodes, R. E., & de Bruijn, G.-J. (2013). How big is the physical activity intention-behaviour gap? A meta-analysis using the action control framework. British Journal of Health Psychology, 18(2), 296-309. doi:10.1111/bjhp.12032

Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s “retroactive facilitation of recall” effect. PLoS ONE, 7(3), e33423. doi:10.1371/journal.pone.0033423

Sarewitz, D. (2012). Beware the creeping cracks of bias. Nature, 485(7397), 149.

Service, R. F. (2002). Scientific misconduct: Bell Labs fires star physicist found guilty of forging data. Science, 298(5591), 30-31. doi:10.1126/science.298.5591.30

Sheeran, P. (2002). Intention—behavior relations: A conceptual and empirical review. European Review of Social Psychology, 12(1), 1-36. doi:10.1080/14792772143000003

Skinner, B. F. (1987). Whatever happened to psychology as the science of behavior? American Psychologist, 42(8), 780-786. doi:10.1037/0003-066x.42.8.780

Stricker, G. (1997). Are science and practice commensurable? American Psychologist, 52(4), 442-448. doi:10.1037//0003-066x.52.4.442

Wagenmakers, E., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432. doi:10.1037/a0022790

Wagenmakers, E., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638. doi:10.1177/1745691612463078

Zimbardo, P. G. (2012). Does psychology make a significant difference in our lives? In Applied Psychology (pp. 39-64). Psychology Press.

Comment by michaela on MichaelA's Shortform · 2020-03-28T02:46:38.490Z · score: 1 (1 votes) · LW · GW

Psychology: An Imperfect and Improving Science

This is an essay I wrote in 2017 as coursework for the final year of my Psychology undergrad degree. (That was a year before I learned about EA and the rationalist movement.)

I’m posting this as a shortform comment, rather than as a full post, because it’s now a little outdated, it’s just one of many things that people have written on this topic, and I don’t think the topic is of central interest to a massive portion of LessWrong readers. But I do think it holds up well, is pretty clear, and makes some points that generalise decently beyond psychology (e.g., about drawing boundaries between science and pseudoscience, evaluating research fields, and good research practice).

I put the references in a “reply” to this.

Psychology's scientific status has been denied or questioned by some (e.g., Berezow, 2012; Campbell, 2012). Evaluating such critiques and their rebuttals requires defining “science”, considering what counts as psychology, and exploring how unscientific elements within a field influence the scientific standing of that field as a whole. This essay presents a conception of “science” that consolidates features commonly seen as important into a family resemblance model. Using this model, I argue psychology is indeed a science, despite unscientific individuals, papers, and practices within it. However, these unscientific practices make psychology less scientific than it could be. Thus, I outline their nature and effects, and how psychologists are correcting these issues.

Addressing whether psychology is a science requires specifying what is meant by “science”. This is more difficult than some writers seem to recognise. For example, Berezow (2012) states we can “definitively” say psychology is non-science “[b]ecause psychology often does not meet the five basic requirements for a field to be considered scientifically rigorous: clearly defined terminology, quantifiability, highly controlled experimental conditions, reproducibility and, finally, predictability and testability.” However, there are fields that do not meet those criteria whose scientific status is generally unquestioned. For example, astronomy and earthquake science do not utilise experiments (Irzik & Nola, 2014). Furthermore, Berezow leaves unmentioned other features associated with science, such as data-collection and inference-making (Irzik & Nola, 2011). Many such features have been noted by various writers, though some are contested by others or only present or logical in certain sciences. For example, direct observation of the matters of interest has been rightly noted as helping make fields scientific, as it reduces issues like the gap between self-reported intentions and the behaviours researchers seek to predict (Godin, Conner, & Sheeran, 2005; Rhodes & de Bruijn, 2013; Sheeran, 2002; Skinner, 1987). However, self-reported intentions are still useful predictors of behaviour and levers for manipulating it (Godin et al., 2005; Rhodes & de Bruijn, 2013; Sheeran, 2002), and science often productively investigates constructs such as gravity that are not directly observable (Bringmann & Eronen, 2016; Chomsky, 1971; Fanelli, 2010; Michell, 2013). Thus, definitions of science would benefit from noting the value of direct observation, but cannot exclude indirect measures or unobservable constructs. This highlights the difficulty – or perhaps impossibility – of defining science by way of a list of necessary and sufficient conditions for scientific status (Mahner, 2013).

An attractive solution is instead constructing a family resemblance model of science (Dagher & Erduran, 2016; Irzik & Nola, 2011, 2014; Pigliucci, 2013). Family resemblance models are sets of features shared by many but not all examples of something. To demonstrate, three characteristics common in science are experiments, double-blind trials, and the hypothetico-deductive method (Irzik & Nola, 2014). A definition of science omitting these would be missing something important. However, calling these “necessary” excludes many sciences; for example, particle physics would be rendered unscientific for lack of double-blind trials (Cleland & Brindell, 2013; Irzik & Nola, 2014). Thus, a family resemblance model of science only requires a field to have enough scientific features, rather than requiring the field to have all such features. The full list of features this model should include, the relative importance of each feature, and what number or combination is required for something to be a “science” could all be debated. However, for showing that psychology is a science, it will suffice to provide a rough family resemblance model incorporating some particularly important features, which I shall now outline.

Firstly, Berezow's (2012) “requirements”, while not actually necessary for scientific status, do belong in a family resemblance model of science. That is, when these features can be achieved, they make a field more scientific. The importance of reproducibility is highlighted also by Kahneman (2014) and Klein et al. (2014a, 2014b), and that of testability or falsifiability is also mentioned by Popper (1957) and Ferguson and Heene (2012). These features are related to the more fundamental idea that science should be empirical; claims should be required to be supported by evidence (Irzik & Nola, 2011; Pigliucci, 2013). Together, these features allow science to be self-correcting, incrementally progressing towards truth by accumulation of evidence and peer-review of ideas and findings (Open Science Collaboration, 2015). This is further supported by scientists' methods and results being made public and transparent (Anderson, Martinson, & De Vries, 2007, 2010; Nosek et al., 2015; Stricker, 1997). Additionally, findings and predictions should logically cohere with established theories, including those from other sciences (Lilienfeld, 2011; Mahner, 2013). These features all support science's ultimate aims to benefit humanity by explaining, predicting, and controlling phenomena (Hansson, 2013; Irzik & Nola, 2014; Skinner, cited in Delprato & Midgley, 1992). Each feature may not be necessary for scientific status, and many other features could be added, but the point is that each feature a field possesses makes that field more scientific. Thus, armed with this model, we are nearly ready to productively evaluate the scientific status of psychology.

However, two further questions must first be addressed: What is psychology, and how do unscientific occurrences within psychology affect the scientific status of the field as a whole? For example, it can generally be argued parapsychology is not truly part of psychology, for reasons such as its lack of support from mainstream psychologists. However, there are certain more challenging instances, such as the case of a paper by Bem (2011) claiming to find evidence for precognition. This used accepted methodological and analytical techniques, was published in a leading psychology journal, and was written by a prominent, mainstream psychologist. Thus, one must accept that this paper is, to a substantial extent, part of psychology. It therefore appears important to determine whether Bem's paper exemplifies science. It certainly has many scientific features, such as use of experiments and evidence. However, it lacks other features, such as logical coherence with the established principle of causation only proceeding forwards in time.

But it is unnecessary here to determine whether the paper is non-science, insufficiently scientific, or bad science, because, regardless, this episode shows psychology as a field being scientific. This is because scientific features such as self-correction and reproducibility are most applicable to a field as a whole, rather than to an individual scientist or article, and these features are visible in psychology's response to Bem's (2011) paper. Replication attempts were produced and supported the null hypothesis; namely, that precognition does not occur (Galak, LeBoeuf, Nelson, & Simmons, 2012; Ritchie, Wiseman, & French, 2012; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). Furthermore, publicity, peer-review, and self-correction of findings and ideas were apparent in those failed replications and in commentary on Bem's paper (Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011; Francis, 2012; LeBel & Peters, 2011). Peers discussed many issues with Bem's article, such as several variables having been recorded by Bem's experimental program yet not mentioned in the study (Galak et al., 2012; Ritchie et al., 2012), suggesting that the positive results reported may have been false positives emerging by chance from many, mostly unreported analyses. Wagenmakers et al. (2011) similarly noted other irregularities and unexplained choices in data transformation and analysis, and highlighted that Bem had previously recommended to psychologists: “If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. […] Go on a fishing expedition for something—anything—interesting” (Bem, cited in Wagenmakers et al., 2011). These responses to Bem’s study by psychologists highlight that, while the scientific status of that study is highly questionable, isolated events such as that need not overly affect the scientific status of the entire field of psychology.

Indeed, psychology's response to Bem's (2011) paper exemplifies ways in which the field in general fits the family resemblance model of science outlined earlier. This model captures how different parts of psychology can each be scientific, despite showing different combinations of scientific features. For example, behaviourists may use more direct observation and clearly defined terminology (see Delprato & Midgley, 1992; Skinner, 1987), while evolutionary psychologists better integrate their theories and findings with established theories from other sciences (see Burke, 2014; Confer et al., 2010). These features make subfields that have them more scientific, but lacking one feature does not make a subfield non-science. Similarly, while much of psychology utilises controlled experiments, those parts that do not, like longitudinal studies of the etiology of mental disorders, can still be scientific if they have enough other scientific features, such as accumulation of evidence to increase our capacity for prediction and intervention.

Meanwhile, other scientific features are essentially universal in psychology. For example, all psychological claims and theories are expected to be based on or confirmed by evidence, and are rejected or modified if found not to be. Additionally, psychological methods and findings are made public by publication, with papers being peer-reviewed before this and open to critique afterwards, facilitating self-correction. Such self-correction can be seen in the response to Bem's (2011) paper, as well as in how most psychological researchers now reject the untestable ideas of early psychoanalysis (see Cioffi, 2013; Pigliucci, 2013). Parts of psychology vary in their emphasis on basic versus applied research; for example, some psychologists investigate the processes underlying sadness while others conduct trials of specific cognitive therapy techniques for depression. However, these various branches can support each other, and all psychological research ultimately pursues benefitting humanity by explaining, predicting, and controlling phenomena. Indeed, while there is much work to be done and precision is rarely achieved, psychology can already make predictions much more accurate than chance or intuition in many areas, and thus provides benefits as diverse as anxiety-reduction via exposure therapy and HIV-prevention via soap operas informed by social-cognitive theories (Bandura, 2002; Lilienfeld, Ritschel, Lynn, Cautin, & Latzman, 2013; Zimbardo, 2004). All considered, most of psychology exemplifies most important scientific features, and thus psychology should certainly be considered a science.

However, psychology is not as scientific as it could be. Earlier I noted that isolated papers reporting inaccurate findings and utilising unscientific practices, as Bem (2011) seems highly likely to have, should not significantly affect psychology's scientific status, as long as the field self-corrects adequately. However, as several commentators on Bem's paper noted, more worrying is what that paper reflects regarding psychology more broadly, given that it largely met or exceeded psychology's methodological, analytical, and reporting standards (Francis, 2012; LeBel & Peters, 2011; Wagenmakers et al., 2011). The fact Bem met these standards, yet still “discovered” and got published results that seem to violate fundamental principles about how causation works, highlights the potential prevalence of spurious findings in psychological literature. These findings could result from various flaws and biases, yet might fail to be recognised or countered in the way Bem's report was if they are not as clearly false; indeed, they may be entirely plausible, yet inaccurate (LeBel & Peters, 2011). Thus, I will now discuss how critiques regarding Bem's paper apply to much of mainstream psychology.

Firstly, the kind of “fishing expedition” recommended by Bem (cited in Wagenmakers et al., 2011) is common in psychology. Researchers often record many variables, and have flexibility in which variables, interactions, participants, data transformations, and statistics they use in their analyses (John, Loewenstein, & Prelec, 2012). Wagenmakers et al. (2012) note that such practices are not inherently problematic, and indeed such explorations are useful for suggesting hypotheses to test in a confirmatory manner. The issue is that often these explorations are inadequately reported and are presented as confirmatory themselves, despite the increased risk of false positives when conducting multiple comparisons (Asendorpf et al., 2013; Wagenmakers et al., 2012). Neuropsychological studies can be particularly affected by failures to control for multiple comparisons, even if all analyses are reported, because analysis of brain activity makes huge numbers of comparisons the norm. Thus, without statistical controls, false positives are almost guaranteed (Bennett, Baird, Miller, & Wolford, 2009). The issue of uncontrolled multiple comparisons, whether reported or not, causing false positives can be compounded by hindsight bias making results seem plausible and predictable in retrospect (Wagenmakers et al., 2012). This can cause overconfidence in findings and make researchers feel comfortable writing articles as if these findings were hypothesised beforehand (Kerr, 1998). These practices inflate the number of false discoveries and spurious confirmations of theories in psychological literature.
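To make the arithmetic behind that last point concrete, here is a minimal sketch of how quickly false positives accumulate. It rests on simplifying assumptions that are mine rather than those of the cited papers: every null hypothesis is true, the tests are independent, and each uses a 0.05 threshold. The function names are hypothetical, and the Bonferroni correction is just one common control, not necessarily the one any cited author recommends.

```python
# Illustrative sketch only: assumes independent tests and that every null hypothesis is true.

def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across `num_tests` independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise rate at or below `alpha`."""
    return alpha / num_tests


if __name__ == "__main__":
    for num_tests in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(num_tests)
        corrected = familywise_error_rate(num_tests, bonferroni_alpha(num_tests))
        print(f"{num_tests:>3} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```

Under these assumptions, running 20 uncorrected tests already gives roughly a 64% chance of at least one spurious "finding", which is why unreported exploratory analyses are so corrosive.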

This is compounded by publication bias. Journals are more likely to publish novel and positive results than replications or negative results (Ferguson & Heene, 2012; Francis, 2012; Ioannidis, Munafò, Fusar-Poli, Nosek, & David, 2014; Kerr, 1998). One reason for this is that, despite the importance of self-correction and incremental progress, replications or negative results are often treated as not showing anything substantially interesting (Klein et al., 2014b). Another reason is the idea that null results are hard to interpret or overly likely to be false negatives (Ferguson & Heene, 2012; Kerr, 1998). Psychological studies regularly have insufficient power; their sample sizes mean that, even if an effect of the expected size does exist, the chance of not finding it is substantial (Asendorpf et al., 2013; Bakker, Hartgerink, Wicherts, & van der Maas, 2016). Further, the frequentist statistics typically used by psychologists cannot clearly quantify the support data provides for null hypotheses; these statistics have difficulty distinguishing between powerful evidence for no effect and simply a failure to find evidence for an effect (Dienes, 2011). While concerns about the interpretability of null results are thus often reasonable, they distort the psychological literature's representation of reality (see Fanelli, 2010; Kerr, 1998). Publication bias also takes the form of researchers being more likely to submit for publication those studies that revealed positive results (John et al., 2012). This can occur because researchers themselves also often find negative results difficult to interpret, and know they are less likely to be published or to lead to incentives like grants or prestige (Kerr, 1998; Open Science Collaboration, 2015). Thus, flexibility in analysis, failure to control for or report multiple comparisons, presentation of exploratory results as confirmatory, publication bias, low power, and difficulty interpreting null results are interrelated issues. These issues in turn make psychology less scientific by reducing the transparency of methods and findings.
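The point about insufficient power can be illustrated with a rough calculation. The sketch below uses a normal approximation to a two-sided, two-group comparison; the effect size (a standardised difference of 0.5) and the sample sizes are hypothetical values I chose for illustration, not figures from the cited studies, and an exact t-test calculation would give slightly lower power at small sample sizes.

```python
from statistics import NormalDist

# Illustrative sketch only: normal approximation to the power of a two-sided,
# two-group comparison. Effect size and sample sizes are hypothetical.

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate chance of detecting a true standardised effect of `effect_size`."""
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls beyond either critical value.
    return (standard_normal.cdf(expected_z - critical_z)
            + standard_normal.cdf(-expected_z - critical_z))


if __name__ == "__main__":
    for n_per_group in (20, 64, 200):
        print(f"n = {n_per_group:>3} per group: power ~ {approx_power(0.5, n_per_group):.2f}")
```

On this approximation, a study with 20 participants per group would miss a genuine medium-sized effect about two times in three, which helps explain why an isolated null result is often hard to interpret.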

These issues also undermine other scientific features. The Open Science Collaboration (2015) conducted replications of 100 studies from leading psychological journals, finding that less than half replicated successfully. This low level of reproducibility in itself makes psychology less scientific, and provides further evidence of the likely high prevalence and impact of the issues noted above (Asendorpf et al., 2013; Open Science Collaboration, 2015). Together, these problems impede self-correction, and make psychology's use of evidence and testability of theories less meaningful, as replications and negative tests are often unreported (Ferguson & Heene, 2012). This undermines psychology's ability to benefit humanity by explaining, predicting, and controlling phenomena.

However, while these issues make psychology less scientific, they do not make it non-science. Other sciences, including “hard sciences” like physics and biology, also suffer from issues like publication bias and low reproducibility and transparency (Alatalo, Mappes, & Edgar, 1997; Anderson, Burnham, Gould, & Cherry, 2001; McNutt, 2014; Miguel et al., 2014; Sarewitz, 2012; Service, 2002). Their presence is problematic and demands a response in any case, and may be more pronounced in psychology than in “harder” sciences, but it is not necessarily damning (see Fanelli, 2010). For example, the Open Science Collaboration (2015) did find a large portion of effects replicated, particularly effects whose initial evidence was stronger. Meanwhile, Klein et al. (2014a) found a much higher rate of replication for more established effects, compared to the Open Science Collaboration's quasi-random sample of recent findings. Both results highlight that, while psychology certainly has work to do to become more reliable, the field also has the capacity to scientifically progress towards truth and is already doing so to a meaningful extent.

Furthermore, psychologists themselves are highlighting these issues and researching and implementing solutions for them. Bakker et al. (2016) discuss the problem of low power and how to overcome it with larger sample sizes, reinforced by researchers habitually running power analyses prior to conducting studies and reviewers checking these analyses have been conducted. Nosek et al. (2015) proposed guidelines for promoting transparency by changing what journals encourage or require, such as replications, better reporting and sharing of materials and data, and pre-registration of studies and analysis plans. Pre-registration side-steps confirmation and hindsight bias and unreported, uncorrected multiple comparisons, as expectations and analysis plans are on record before data is gathered (Wagenmakers et al., 2012). Journals can also conditionally accept studies for publication based on pre-registered plans, minimising bias against null results by both journals and researchers. Such proposals still welcome exploratory analyses, but prevent these analyses being presented as confirmatory (Miguel et al., 2014). Finally, psychologists have argued for, outlined how to use, and adopted Bayesian statistics as an alternative to frequentist statistics (Ecker, Lewandowsky, & Apai, 2011; Wagenmakers et al., 2011). Bayesian statistics provide clear quantification of evidence for null hypotheses, combatting one source of publication bias and making testability of psychological claims more meaningful (Dienes, 2011; Francis, 2012). These proposals are beginning to take effect. For example, many journals and organisations are signatories to Nosek et al.'s guidelines. Additionally, the Center for Open Science, led by the psychologist Brian Nosek, has set up online tools for researchers to routinely make their data, code, and pre-registered plans public (Miguel et al., 2014). This shows psychology self-correcting its practices, not just individual findings, to become more scientific.
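As a rough illustration of what "quantification of evidence for null hypotheses" can look like, the sketch below computes a simple Bayes factor for a guessing task of the kind used in precognition studies. It is a toy example under my own assumptions, not the analysis used by Wagenmakers et al. (2011) or Dienes (2011): the null hypothesis fixes the hit rate at chance, the alternative places a uniform prior on the hit rate, and the function name and trial counts are hypothetical.

```python
from math import comb

# Illustrative sketch only: Bayes factor for "hit rate equals chance" (H0)
# against "hit rate is anywhere from 0 to 1, all values equally likely" (H1).

def bf01_binomial(hits: int, trials: int, chance: float = 0.5) -> float:
    """Ratio of how well H0 predicts the data to how well H1 predicts it."""
    likelihood_null = comb(trials, hits) * chance ** hits * (1 - chance) ** (trials - hits)
    # Under a uniform prior, every hit count from 0 to `trials` is equally probable.
    likelihood_alternative = 1.0 / (trials + 1)
    return likelihood_null / likelihood_alternative


if __name__ == "__main__":
    for hits in (50, 55, 60, 70):
        print(f"{hits} hits in 100 trials: BF01 = {bf01_binomial(hits, 100):.3f}")
```

Values above 1 favour the null and values below 1 favour the alternative. With exactly 50 hits in 100 trials, the data are about eight times more probable under chance performance than under this alternative, which is positive evidence for no effect rather than merely a failure to reject it.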

I have argued here that claims that psychology is non-scientific may often reflect unworkable definitions of science and ignorance of what psychology actually involves. A family resemblance model of science overcomes the former issue by outlining features that sciences do not have to possess to be science, but do become more scientific by possessing. This model suggests psychology is a science because it generally exemplifies most scientific features; most importantly, it accumulates evidence publicly, incrementally, and self-critically to benefit humanity by explaining, predicting, and controlling phenomena. However, psychology is not as scientific as it could be. A variety of interrelated issues with researchers' and journals' practices and incentive structures impede the effectiveness and meaningfulness of psychology's scientific features. But failure to be perfectly scientific is not unique to psychology; it is universal among sciences. Science has achieved what it has because of its constant commitment to incremental improvement and self-correction of its own practices. In keeping with this, psychologists are researching and discussing psychology's issues and their potential solutions, and such solutions are being put into action. More work must be done, and more researchers and journals must act on and push for these discussions and solutions, but already it is clear both that psychology is a science and that it is actively working to become more scientific.

Comment by michaela on [Article review] Artificial Intelligence, Values, and Alignment · 2020-03-10T08:28:33.437Z · score: 1 (1 votes) · LW · GW
This seems incorrect - if we don't have "the one true theory" (assuming it exists), then how do we know it can't be reliably communicated?

To be fair to the paper, I'm not sure that that specifically is as strong an argument as it might look. E.g., I don't have a proof for [some as-yet-unproven mathematical conjecture], but I feel pretty confident that if I did come up with such a proof, I wouldn't be able to reliably communicate it to just any given random person.

But note that there I'm saying "I feel pretty confident", and "I wouldn't be able to". So I think the issue is more in the "can't", and the implication that we couldn't fix that "can't" even if we tried, rather than in the fact these arguments are being applied to something we haven't discovered yet.

That said, I do think it's an interesting and valid point that the fact we haven't found that theory yet (again, assuming it exists) adds at least a small extra reason to believe it's possible we could communicate it reliably. For example, my second-hand impression is that some philosophers think "the true moral theory" would be self-evidently true, once discovered, and would be intrinsically motivating, or something like that. That seems quite unlikely to me, and I wouldn't want to rely on it at all, but I guess it is yet another reason why it's possible the theory could be reliably communicated.

And I guess even if the theory was not quite "self-evidently true" or "intrinsically motivating", it might still be shockingly simple, intuitive, and appealing, making it easier to reliably communicate than we'd otherwise expect.

Perhaps given that we don't know that it can be reliably communicated, we shouldn't rely on that.

Yes, I'd strongly agree with that. I sort-of want us to make as few assumptions on philosophical matters as possible, though I'm not really sure precisely what that means or what that looks like.

"Designing AI in accordance with a single moral doctrine would therefore involve imposing a set of values and judgments on other people who did not agree with them"
Unless the correct moral theory doesn't involve doing that?

To again be fair to the paper, I believe the argument is that, given the assumption (which I contest) that we definitely couldn't reliably convince everyone of the "correct moral theory", if we wanted to align an AI with that theory we'd effectively end up imposing that theory on people who didn't sign up for it.

You might have been suggesting that such an imposition might be explicitly prohibited by the correct moral theory, or something like that. But in that case, I think the problem is instead that we wouldn't be able to align the AI with that theory, without at least some contradictions, if people couldn't be convinced of the theory (which, again, I don't see as certain).

Comment by michaela on Understandable vs justifiable vs useful · 2020-03-09T07:44:23.907Z · score: 1 (1 votes) · LW · GW

Just saw the following in SSC's review of The Seven Principles for Making Marriage Work:

Apart from whatever other exercise you’re doing each day, Gottman recommends a ritual of checking in after work and exchanging stories about your days. This time is a Designated Support Zone, no criticism allowed. You take your spouse’s side whether you secretly disagree with them or not. If your spouse gets angry that a police officer gave them a ticket for driving 110 mph through a 25 mph school zone, you are obligated by the terms of your marriage contract to shake your head and say “I know, cops these days have no respect.”
Gottman is slightly less strict in other situations, but he still thinks it’s very important that you take your spouse’s side in conflicts.

This made me wonder if the approach I often take, and that I suggest here, is unwise. That seems possible. But on reflection, I think this mostly just reemphasises what I already knew, which is that taking your partner's side is a good, simple way to stay on good terms with them. This fact is part of why I try to use the approach I suggest here, rather than simply saying exactly what I think is true at all times, in whatever way feels most natural, with no consideration of tact and tone. And it's part of why I wrote that this approach allows me to "Do 4 without having to be too obnoxious or contrarian in the process, and while still letting people have space to vent, which I think is valuable in itself and to maintain relationships".

I'm guessing the difference between my rationale for my approach and Gottman's rationale for his advice (based on SSC's summary) is that I strongly value both staying on good terms with my partner and having good epistemic standards. And it seems that this approach allows me to achieve both quite well, most of the time, though there can certainly be tradeoffs and slip-ups along the way, and I'd advise proceeding with caution.

(Also, part III of SSC's review reveals major questions around the effectiveness and evidence-base of Gottman's advice, so maybe it shouldn't be given much thought anyway. But I do suspect Gottman is at least right in his apparent view about the value of taking your spouse's side for the goal of staying on good terms with them.)

Comment by michaela on Memetic downside risks: How ideas can evolve and cause harm · 2020-03-04T07:58:29.918Z · score: 3 (3 votes) · LW · GW

Yeah, that seems correct.

Another example that comes to mind is popular understandings of Maslow's hierarchy of needs. It was a few years ago that I read Maslow's actual paper, but I remember being surprised to find he was pretty clear that this was somewhat speculative and would need testing, and that there are cases where people pursue higher-level needs without having met all their lower-level ones yet. I'd previously criticised the hierarchy for overlooking those points, but it turned out that what overlooked them was the popularised version of the hierarchy, which had moved towards simplicity, apparent usefulness, and lack of caveats.