Four mindset disagreements behind existential risk disagreements in ML

post by Rob Bensinger (RobbBB) · 2023-04-11T04:53:48.427Z · LW · GW · 12 comments

Comments sorted by top scores.

comment by teradimich · 2023-04-11T13:08:49.341Z · LW(p) · GW(p)

The level of concern and seriousness I see from ML researchers discussing AGI on any social media platform or in any mainstream venue seems wildly out of step with "half of us think there's a 10+% chance of our work resulting in an existential catastrophe".

In fairness, this is not quite half of the researchers. It is half of those who agreed to take the survey.

'We contacted approximately 4271 researchers who published at the conferences NeurIPS or ICML in 2021. [...] We received 738 responses, some partial, for a 17% response rate'.

I expect that worried researchers are more likely to agree to participate in the survey.
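As a toy illustration of how much that could matter (all the numbers below are made up for illustration; they are not from the survey):

```python
# Toy illustration of response bias (all numbers are hypothetical, not from the survey).
# Suppose the true fraction of "worried" researchers (P(catastrophe) >= 10%) is p_true,
# and worried researchers respond at a higher rate than unworried ones.

def observed_fraction(p_true, rate_worried, rate_unworried):
    """Fraction of survey respondents who are worried, given differential response rates."""
    worried_responders = p_true * rate_worried
    unworried_responders = (1 - p_true) * rate_unworried
    return worried_responders / (worried_responders + unworried_responders)

# If only a third of contacted researchers were worried but they responded twice as often,
# about half of the respondents would be worried (and the overall response rate would
# still come out near 17%):
print(observed_fraction(p_true=1/3, rate_worried=0.26, rate_unworried=0.13))  # ~0.50
```

So "48% of respondents" is compatible with a noticeably lower fraction among everyone contacted.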

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2023-04-11T14:00:16.016Z · LW(p) · GW(p)

I recall that they tried to advertise / describe the survey in a way that would minimize response bias—like, they didn’t say “COME TAKE OUR SURVEY ABOUT AI DOOM”. That said, I am nevertheless still very concerned about response bias, and I strongly agree that the OP’s wording “48% of researchers” is a mistake that should be corrected.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2023-04-11T15:35:20.890Z · LW(p) · GW(p)

I figured this would be obvious enough, and both surveys discuss this issue; but phrasing things in a way that encourages keeping selection bias in mind does seem like a good idea to me. I've tweaked the phrasing to say "In a survey, X".

comment by David Bravo (davidbravocomas) · 2023-04-11T19:13:34.576Z · LW(p) · GW(p)

I like this model, much of which I would encapsulate in the tendency to extrapolate from past evidence, not only because it resonates with the image I have of the people who are reluctant to take existential risks seriously, but because it is more fertile ground for actionable advice than the simple explanation of "because they haven't sat down to think deeply about it". This latter explanation might hold some truth, but tackling it would be unlikely to make them take more action towards reducing existential risks unless they were aware of, and able to fix, possible failure modes in their thinking, and understood that AGI is fundamentally different and that extrapolating from past evidence is unhelpful.

I advocate shattering the Overton window and spreading arguments on the fundamental distinctions between AGI and our natural notions of intelligence, and these 4 points offer good, reasonable directions for addressing that. But the difficulty also lies in getting those arguments across to people outside specific or high-end communities like LW; in building a bridge between the ideas created at LessWrong, and the people who need to learn about them but are unlikely to come across LessWrong.

comment by Signer · 2023-04-11T06:46:49.335Z · LW(p) · GW(p)

But at the decision-making level, you should be “conservative” in a very different sense, by not gambling the future on your technology being low-impact.

What's the technical (like, with numbers) explanation for "why"? And to what degree? It's a common objection that being conservative to the extent of "what if AI invents nanotechnology" is like worrying that your bridge will accelerate your traffic a million times.

Replies from: RobbBB, gjm, jesper-norregaard-sorensen
comment by Rob Bensinger (RobbBB) · 2023-04-11T15:52:01.705Z · LW(p) · GW(p)

This is why I said in the post:

Some people do have confident beliefs that imply "things will go well"; I disagree there, but I expect some amount of disagreement like that.

... and focused on the many people who don't have a confident objection to nanotech.

I and others have given lots of clear arguments for why relatively early AGI systems will plausibly be vastly smarter than humans. Eric Drexler has given lots of clear arguments for why nanotechnology is probably fairly easy to build.

None of this constitutes a proof that early AGI systems will be able to solve the inverse protein folding problem, etc., but it should at least raise the scenario to consideration and cause it to be taken seriously, for people who don't have specific reasons to dismiss the scenario.

I'll emphasize again this point I made in the OP:

Note that I'm not arguing "an AGI-mediated extinction event is such a big deal that we should make it a top priority even if it's very unlikely". 

And this one:

My own view is that extreme disaster scenarios are very likely, not just a tail risk to hedge against. I actually expect AGI systems to achieve Drexler-style nanotechnology within anywhere from a few months to a few years of reaching human-level-or-better ability to do science and engineering work. At this point, I'm looking for any hope of us surviving at all [LW · GW], not holding out hope for a "conservative" scheme (sane as that would be).

So I'm not actually calling for much "conservatism" here. "Conservative" would be hedging against 1-in-a-thousand risks (or more remote tail risks of the sort that we routinely take into account when designing bridges or automobiles). I'm calling for people to take seriously their own probabilities insofar as they assign middling-ish probabilities to scenarios (e.g., 1-in-10 rather than 1-in-1000).

Another example would be that in 2018, Paul Christiano said he assigned around 30% probability to hard takeoff. But when I have conversations with others who seem to be taking Paul's views and running with them, I generally neither see them seriously engaging with hard takeoff as though they think it has a medium-ish probability, nor see them say anything about why they disagree with 2018-Paul about the plausibility of hard takeoff.

I don't think it's weird that there's disagreement here, but I do think it's weird how people are eliding the distinction between "these sci-fi scenarios aren't that implausible, but they aren't my mainline prediction" and "these sci-fi scenarios are laughably unlikely and can be dismissed". I feel like I rarely see pushback that's concrete and explicit enough to even distinguish those two possibilities. (Which probably contributes to cascades of over-updating among people who reasonably expect more stuff to be said about nanotech if it's not obviously a silly sci-fi scenario.)

Replies from: Signer
comment by Signer · 2023-04-11T19:54:09.263Z · LW(p) · GW(p)

To be clear, I very much agree with being careful with technologies that have a 10% chance of causing existential catastrophe. But I don't see how the part of the OP about conservatism connects to it. I think it's more likely that being conservative about impact would generate probabilities much less than 10%. And if anyone says that their probability is 10%, then maybe it's a case of people only having enough resolution for three kinds of probabilities, and they think it's less than 50%. Or they are already trying not to be very certain and explicitly widen their confidence intervals (maybe after getting a probability from someone more confident), but they actually believe in being conservative more than they believe in their stated probability. So then it becomes about why it is at least 10% - why being conservative in that direction is wrong in general, or what your clear arguments are and how we are supposed to weigh them against "it's hard to make impact".

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2023-04-11T23:27:40.845Z · LW(p) · GW(p)

I think it's more likely that being conservative about impact would generate probabilities much less than 10%.

I don't know what you mean by "conservative about impact". The OP distinguishes three things:

  • conservatism in decision-making and engineering: building in a safety buffer, erring on the side of caution.
  • non-conservatism in decision-making and engineering that at least doesn't shrug at things like "10% risk of killing all humans".
  • non-conservatism that does shrug at medium-probability existential risks.

It separately distinguishes these two things:

  • forecasting "conservatism", in the sense of being rigorous and circumspect in your predictions.
  • forecasting pseudo-conservatism ('assuming without argument that everything will be normal and familiar indefinitely').

It sounds like you're saying "being rigorous and circumspect in your predictions will tend to yield probabilities much less than 10%"? I don't know why you think that, and I obviously disagree, as do 91+% of the survey respondents in https://www.lesswrong.com/posts/QvwSr5LsxyDeaPK5s/existential-risk-from-ai-survey-results. [LW · GW] See e.g. AGI Ruin [LW · GW] for a discussion of why the risk looks super high to me.

Replies from: Signer
comment by Signer · 2023-04-12T19:51:35.009Z · LW(p) · GW(p)

I don’t know what you mean by “conservative about impact”

I mean predicting modest impact for the reasons a futurist maybe should predict modest impacts (like "existential catastrophes have never happened before" or "novel technologies always plateau", or a whole cluster of similar heuristics in opposition to "building in a safety buffer").

It sounds like you’re saying “being rigorous and circumspect in your predictions will tend to yield probabilities much less than 10%”?

Not necessarily "rigorous" - I'm not saying such thinking is definitely correct. I just can't visualize a thought process that arrives at 50% before correction, then applies a conservative adjustment because it's all crazy, still gets 10%, and proceeds to "then it's fine". So if survey respondents have higher probabilities and no complicated plan, then I don't actually believe that the opposite-of-engineering-conservatism mindset applies to them. Yes, maybe you mostly said things about not being the decision-maker, but then what's the point of that quote about bridges?

comment by gjm · 2023-04-11T10:49:23.328Z · LW(p) · GW(p)

I'm not sure that a technical explanation is called for; "conservative" just means different things in different contexts. But how about this?

  • The general meaning of "conservative" that covers both these cases is something like "takes the worst case duly into account when making decisions".
  • When you are engaging in futurism, your most important real goal is not accurate prediction but sounding impressive, and accordingly the worst case is where you say something that sounds stupid and/or is spectacularly refuted by actual events.
    • Therefore, to be "conservative" when engaging in futurism, you make "cautious" predictions in the sense of ones that don't sound stupid and that when they're wrong are highly excusably wrong. Which generally means predicting not too much change.
  • When you are trying to decide policy for dealing with some possible huge risk, your most important real goal is having things actually turn out well, and accordingly the worst case is where your decision leads to millions of deaths or economic catastrophe or something.
    • Therefore, to be "conservative" when making large-scale-risk policy, you make "cautious" predictions in the sense of ones that take into account ways that things could go very badly. This means, on the one hand, that you don't ignore risks just because they sound weird; and, on the other hand, that you don't commit all your resources to addressing Weird Risk One when you might actually need them for Weird Risk Two, or for Making Normality Go Well.
  • If you want a version of the above with numbers, think in terms of expected utilities.
    • If you don't actually anticipate what you say having much impact on anything other than what people think of you, and you think there's a 10% chance that runaway AI destroys everything of value to the human race, then your calculation goes something like this. If you remain calm, downplay the risks of catastrophe, etc., but maybe mention the catastrophes as unlikely possibilities, then you pass up a 10% chance of looking prophetic when everything of value is destroyed (but in that case, well, everything of value has been destroyed so it doesn't much matter) and whatever shock value it might have to say "we're all doomed!!!!!111"; in exchange you get to look wise and reasonable and measured. Maybe being successfully prophetic would be +100 units of utility for you, except that shortly afterwards everyone is dead, so let's call it +1 instead. Maybe shock value is +5 by drawing public attention to you, but looking crazy is -5, so that balances out. And looking wrong when AI hasn't killed us yet in 5 years' time is -5. That's 0.1 × (+1) + 0.9 × (-5) = -4.4 units of expected utility from being alarmist, versus what would have been, say, -200 with probability 0.1 but is actually only -1 because, again, we are all dead, plus +1 when AI continues to not kill us for 5 years: 0.1 × (-1) + 0.9 × (+1) = +0.8 units. 0.8 is better than -4.4, so don't be alarmist. (A small worked version of this calculation and the next one appears after this list.)
    • If you do anticipate what you say having an impact, and you think there's a 10% chance of catastrophe if we don't take serious action, and that if you are alarmist it'll raise the chance of a meaningful response from 2% to 2.5%, and that if catastrophe happens / would otherwise happen that meaningful response gives us a 20% chance to survive, and you reckon the survival of the human race is enough more important than whether or not you look like an idiot or a prophet, then you completely ignore all the calculations in the previous paragraph, and essentially the only thing that matters is that being alarmist means an extra 20% chance of survival 0.5% of the time in a situation that happens 10% of the time, so an extra 0.01% chance that the entire human race survives, which is a very big deal. (If you're being carefully conservative you should also be considering the danger of taking resources away from other huge risks, or of crying wolf and actually making a meaningful response less likely if the catastrophe isn't very close, but probably these are second-order considerations.)
    • I am not sure that the numbers really add anything much to the informal discussion above.
  • My account of what "conservative" means for futurists takes a cynical view where futurists are primarily interested in looking impressive. There is another perspective that can be called "conservative", which observes that futurists' predictions are commonly overdramatic and accordingly says that they should be moderated for the sake of accuracy. But I assume that when you arrive at your (say) 10% probability of catastrophe, that's the best estimate you can come up with after taking into account things like whatever tendency you might have to overdramatize or to extrapolate short-term trends too enthusiastically.
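Here is a minimal sketch of the two calculations above in code, using the same illustrative numbers (it adds nothing beyond restating the toy utilities already given):

```python
# Toy expected-utility check of the two cases above, using the same illustrative numbers.
p_doom = 0.1  # assumed 10% chance that runaway AI destroys everything of value

# Case 1: what you say only affects your reputation.
# Alarmist: +1 if doom (being "prophetic" is worth little once everyone is dead),
# -5 if no doom within 5 years (you look wrong); shock value (+5) and looking crazy (-5) cancel.
ev_alarmist = p_doom * 1 + (1 - p_doom) * (-5)   # -4.4
# Calm: -1 if doom (would have been -200, but everyone is dead), +1 if no doom.
ev_calm = p_doom * (-1) + (1 - p_doom) * 1       # +0.8
print(ev_alarmist, ev_calm)  # reputation alone favours staying calm

# Case 2: what you say can actually move policy.
p_response_gain = 0.005          # alarmism raises the chance of a meaningful response from 2% to 2.5%
p_survival_given_response = 0.2  # that response gives a 20% chance of survival
extra_survival = p_doom * p_response_gain * p_survival_given_response
print(extra_survival)            # 0.0001, i.e. an extra 0.01% chance the human race survives
```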
Replies from: Signer
comment by Signer · 2023-04-12T19:09:14.993Z · LW(p) · GW(p)

Thank you. Your explanation fits the "futurist/decision-maker" distinction, but I just don't feel that calling the decision-maker behavior "conservative" is appropriate. If your probability is already 10%, then treating it like 10% without adjustments is not worst-case thinking. It's certainly not the (only) kind of conservatism that Eliezer's quote talks about.

There is another perspective that can be called “conservative”, which observes that futurists’ predictions are commonly overdramatic and accordingly says that they should be moderated for the sake of accuracy.

This is the perspective I'm mostly interested in. And this is where I would like to see numbers that balance caution about being overdramatic against having a safety margin.

comment by JNS (jesper-norregaard-sorensen) · 2023-04-11T07:34:00.455Z · LW(p) · GW(p)

Those are not the same at all.

We have tons of data on how traffic develops over time for bridges, and besides, they are engineered to withstand being packed completely with vehicles (bumper to bumper).

And even if we didn't, we still know what vehicles look like and can do worst-case calculations that look nothing like sci-fi scenarios (heavy trucks bumper to bumper in all lanes).
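As a toy version of that worst-case calculation (all numbers here are made-up assumptions, not real design values):

```python
# Toy worst-case load bound for a bridge (made-up illustrative numbers, not real design values).
span_m = 300          # assumed bridge span in metres
lanes = 4             # assumed number of lanes
truck_length_m = 12   # assumed truck length, parked bumper to bumper
truck_mass_t = 36     # assumed fully loaded truck mass in tonnes

trucks = lanes * (span_m // truck_length_m)
worst_case_load_t = trucks * truck_mass_t
print(trucks, worst_case_load_t)  # 100 trucks, 3600 tonnes -- a finite, checkable number
```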

On the other hand:

What are we building? Ask 10 people and get 10 different answers.

What does the architecture look like? We haven't built it yet, and nobody knows (with certainty).

Name one thing it can do: <Sci-Fi sounding thing goes here>, or ask 10 people and get 10 very different answers (number 5 will shock you).

I'll give you my personal take on those three:

  1. We are building something that can "do useful things we don't know how to do"
  2. I don't know, but given the current trajectory, very likely something involving neural networks (though unlikely to be exclusively neural networks).
  3. Design (and possibly build) the technology necessary for making a molecular-level duplicate of a strawberry, with the ability to identify and correct cellular damage and abnormalities.