Existential Risk Persuasion Tournament
post by PeterMcCluskey · 2023-07-17T18:04:02.794Z · LW · GW · 1 comment
This is a link post for https://bayesianinvestor.com/blog/index.php/2023/07/17/existential-risk-persuasion-tournament/
Contents
Incentives
Quality of the Questions
Persuasion
Persistent Disagreement about AGI
Concluding Thoughts
I participated last summer in Tetlock's Existential Risk Persuasion Tournament (755(!) page paper here).
Superforecasters and "subject matter experts" engaged in a hybrid between a prediction market and debates, to predict catastrophic and existential risks this century.
I signed up as a superforecaster. My impression was that I knew as much about AI risk as any of the subject matter experts with whom I interacted (the tournament was divided up so that I was only aware of a small fraction of the 169 participants).
I didn't notice anyone with substantial expertise in machine learning. Experts were apparently chosen based on having some sort of respectable publication related to AI, nuclear, climate, or biological catastrophic risks. Those experts were more competent, in one of those fields, than news media pundits or politicians. I.e. they're likely to be more accurate than random guesses. But maybe not by a large margin.
That expertise leaves much to be desired. I'm unsure whether there was a realistic way for the sponsors to attract better experts; there doesn't seem to be enough money or prestige involved to attract the very best ones.
Incentives
The success of the superforecasting approach depends heavily on forecasters having decent incentives.
It's tricky to give people incentives to forecast events that will be evaluated in 2100, or evaluated after humans go extinct.
The tournament provided a fairly standard scoring rule for questions that resolve by 2030. That's a fairly safe way to get parts of the tournament to work well.
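As a concrete illustration (my reconstruction, not the tournament's exact rule), a standard proper scoring rule such as the Brier score can be applied once a question resolves:

```python
def brier_score(forecast_prob: float, outcome: bool) -> float:
    """Squared error between a probability forecast and the 0/1 outcome.

    Lower is better; as a proper scoring rule it rewards reporting
    one's honest probability rather than hedging strategically.
    """
    return (forecast_prob - (1.0 if outcome else 0.0)) ** 2

# Example: a 20% forecast on a question that resolves "no" by 2030.
print(brier_score(0.20, False))  # ~0.04
```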
The other questions were scored by how well the forecast matched the median forecast of other participants (excluding participants that the forecasters interacted with). It's hard to tell whether that incentive helped or hurt the accuracy of the forecasts. It's easy to imagine that it discouraged forecasters from relying on evidence that is hard to articulate, or hard to verify. It provided an incentive for groupthink. But the overall incentives were weak enough that altruistic pursuit of accuracy might have prevailed. Or ideological dogmatism might have prevailed. It will take time before we have even weak evidence as to which was the case.
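Here is a minimal sketch of how scoring against the peer median might work; the squared-distance metric is my assumption, not the tournament's published formula:

```python
import statistics

def median_match_score(my_forecast: float, peer_forecasts: list[float]) -> float:
    """Hypothetical metric: squared distance from the median of other
    participants' forecasts (lower is better). In the tournament, forecasts
    from participants one had interacted with were excluded from the pool.
    """
    peer_median = statistics.median(peer_forecasts)
    return (my_forecast - peer_median) ** 2

# Example: a 6% forecast scored against peers whose median is 1%.
print(median_match_score(0.06, [0.004, 0.005, 0.01, 0.02, 0.03]))  # ~0.0025
```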
One incentive that occurred to me toward the end of the tournament was the possibility of getting a verified long-term forecasting track record. Suppose that in 2050 they redo the scores based on evidence available then, and I score in the top 10% of tournament participants. That would likely mean that I'm one of maybe a dozen people in the world with a good track record for forecasting 28 years into the future. I can imagine that being valuable enough for someone to revive me from cryonic suspension when I'd otherwise be forgotten.
There were some sort of rewards for writing comments that influenced other participants. I didn't pay much attention to those.
Quality of the Questions
There were many questions loosely related to AGI timelines, none of them quite satisfying my desire for something closely related to extinction risk that could be scored before it's too late to avoid the risk.
One question was based on a Metaculus forecast for an advanced AI. It seems to represent clear progress toward the kind of AGI that could cause dramatic changes. But I expect important disagreements over how much progress it represents: What scale should we use to decide how close such an AI is to a dangerous AI? Does the Turing test use judges who have expertise in finding the AI's weaknesses?
Another question was about when Nick Bostrom will decide that an AGI exists. Or if he doesn't say anything clear, then a panel of experts will guess what Bostrom would say. That's pretty close to a good question to forecast. Can we assume that it implicitly resolves as yes if an AI-related catastrophe kills all the relevant experts? It's not completely clear that such a catastrophe implies the existence of an AGI. What happens if experts decide that AGI is a confused concept, and replace it with several similar names that roughly correspond to "software that radically transforms the world"?
Two questions asked about GDP growth exceeding 15% per year. These do a good job of quantifying whether something has increased human activity as dramatically as did the industrial revolution. I think such growth can be accomplished with somewhat less than what I'd call AGI.
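For a sense of scale, here is the arithmetic behind treating 15% annual growth as industrial-revolution-level change (the ~3% baseline is my assumption for comparison):

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for output to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

print(round(doubling_time_years(0.03), 1))  # ~23.4 years at a typical ~3% rate
print(round(doubling_time_years(0.15), 1))  # ~5.0 years at the 15% threshold
```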
Also, the questions only count the human part of the economy, for a narrow definition of human. That means these questions could leave us in a perverse situation where some Age of Em cities are doubling every month, yet the economy of biological humans stagnates. (That specific scenario seems far-fetched.)
A question where the definition of human mattered more was question 12:
Just as these two groups disagree strongly about how likely humanity is to go extinct by 2100, they also disagree about other indicators of human flourishing. We asked forecasters in the XPT how many future human births there will be and by what year humanity is 50% likely to go extinct. The AI-concerned's median is 100 billion future human births, while the AI skeptics' median is just over 725 billion, with 78% of the AI skeptics above the median forecast of the AI-concerned.
My forecast for this question was a 50% chance of reaching 30 billion births. But if I had instead used my notion of transhuman beings that I consider to be descendants of humans, I'd likely have put something closer to 10^50.
The report notes: "Work has already begun to develop formal metrics for what constitutes a high-probative-value question and we plan to build a database of strong candidate questions."
Persuasion
The initial round of persuasion was likely moderately productive. The persuasion phases dragged on for nearly 3 months. We mostly reached drastically diminishing returns on discussion after a couple of weeks.
We devoted a fair amount of time toward the end to writing team wikis, then reading wikis from other teams. This seemed to add a negligible amount of new insight. It seemed rare for other teams to think of anything important that my team had overlooked.
The persuasion seemed to be spread too thinly over 59 questions. In hindsight, I would have preferred to focus on core cruxes, such as when AGI would become dangerous if not aligned, and how suddenly AGI would transition from human levels to superhuman levels. That would have required ignoring the vast majority of those 59 questions during the persuasion stages. But the organizers asked us to focus on at least 15 questions that we were each assigned, and encouraged us to spread our attention to even more of the questions.
The user interface was too complex. We had to repeatedly work our way through many different pages to find all the new discussion.
Data entry was harder than it should have been. It was common for people's forecasts not to be visible to others because they'd done something in the wrong order.
I'm unsure whether I could have done much better at designing the user interface. I would likely have aimed for something simpler, erring in the direction of having unrelated discussions get jumbled together.
The most surprising update for me involved the two questions on non-anthropogenic risks. I initially relied heavily on Ord's The Precipice to say that these risks are pretty low.
Then someone pointed out that alien invasion might be a risk. The organizers confirmed (after a surprisingly long delay in responding) that they would classify this as a non-anthropogenic risk.
Among the few participants who took this seriously, there was very little agreement about how to model the risk. My main concern is that there's some sort of police force within maybe 10 light years of Earth. They're watching us, and will get nervous sometime this century about our ability to colonize other systems. I increased my estimate of non-anthropogenic extinction risk from 0.01% (by 2100) to 2%.
I briefly thought of trying to include in my forecasts the risk that we're in a simulation which will be shut down soon. I decided that was too hard for me to handle, and ended up ignoring it.
It seems likely that superforecasters were more persuasive than the experts. I'm guessing superforecasters had more experience at articulating how to convert general knowledge into probabilistic forecasts.
The report indicates that superforecasters were more persistent. The tournament was designed to have a fairly even balance between superforecasters and experts. Yet the fraction of superforecasters answering any single question was generally above 55%, sometimes much higher. E.g. question 45 (Maximum Compute Used in an AI Experiment, optional for most participants) had 33 superforecasters, 2 domain experts, and 5 other experts. Question 4 (AI Extinction Risk, required for all) had 88 superforecasters, 27 domain experts, and 44 other experts. I expect that generated something like mild peer pressure against the domain expert position.
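For reference, the superforecaster shares implied by those participation counts (simple arithmetic on the numbers quoted above):

```python
participation = {
    "Q45 (Maximum Compute)": {"superforecasters": 33, "domain_experts": 2, "other_experts": 5},
    "Q4 (AI Extinction Risk)": {"superforecasters": 88, "domain_experts": 27, "other_experts": 44},
}

for question, counts in participation.items():
    share = counts["superforecasters"] / sum(counts.values())
    print(f"{question}: {share:.0%} superforecasters")  # roughly 82% and 55%
```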
Persistent Disagreement about AGI
Many superforecasters suspected that recent progress in AI was the same kind of hype that led to prior disappointments with AI. I didn't find a way to get them to look closely enough to understand why I disagreed.
My main success in that area was with someone who thought there was a big mystery about how an AI could understand causality. I pointed him to Pearl, which led him to imagine that the problem might be solvable. But he likely had other similar cruxes which he didn't get around to describing.
That left us with large disagreements about AI risk this century.
I'm guessing that something like half of that was due to a large disagreement about how powerful AI will be this century.
I find it easy to understand how someone who gets their information about AI from news headlines, or from layman-oriented academic reports, would see a fairly steady pattern of AI being overhyped for 75 years, with AI always looking like it was about 30 years in the future. It's unusual for an industry to quickly switch from decades of overhyping progress to underhyping it. Yet that's what I'm saying has happened.
I've been spending enough time on LessWrong that I mostly forgot the existence of smart people who thought recent AI advances were mostly hype. I was unprepared to explain why I thought AI was underhyped in 2022.
Today, I can point to evidence that OpenAI is devoting almost as much effort to suppressing abilities (e.g. napalm recipes and privacy violations) as it devotes to making AIs powerful. But in 2022, I had much less evidence that I could reasonably articulate.
What I wanted was a way to quantify what fraction of human cognition has been superseded by the most general-purpose AI at any given time. My impression is that that has risen from under 1% a decade ago, to somewhere around 10% in 2022, with a growth rate that looks faster than linear. I've failed so far at translating those impressions into solid evidence.
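To illustrate the kind of formalization I have in mind (the two data points below are my rough impressions, not measurements), here's a sketch that fits an exponential trend to them and extrapolates:

```python
# Subjective data points: fraction of human cognition superseded by the
# most general-purpose AI (my impressions, not measured quantities).
year0, frac0 = 2012, 0.01
year1, frac1 = 2022, 0.10

# Assume exponential growth in that fraction and solve for the annual rate.
annual_rate = (frac1 / frac0) ** (1 / (year1 - year0)) - 1  # ~26% per year

def projected_fraction(year: int) -> float:
    """Extrapolate the fitted trend, capped at 100%."""
    return min(1.0, frac0 * (1 + annual_rate) ** (year - year0))

for year in (2025, 2030, 2035):
    print(year, round(projected_fraction(year), 2))  # ~0.20, ~0.63, 1.0
```

Of course, an extrapolation like this just restates the impressions it was fed; it isn't the solid evidence I was looking for.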
Skeptics pointed to memories of other technologies, such as the internet, that had less impact (e.g. on GDP growth) than predicted. That generates a presumption that the people who predict the biggest effects from a new technology tend to be wrong.
From the report:
Superforecasters' doubts about AI risk relative to the experts isn't primarily driven by an expectation of another "AI winter" where technical progress slows. ... That said, views on the likelihood of artificial general intelligence (AGI) do seem important: in the postmortem survey, conducted in the months following the tournament, we asked several conditional forecasting questions. The median superforecaster's unconditional forecast of AI-driven extinction by 2100 was 0.38%. When we asked them to forecast again, conditional on AGI coming into existence by 2070, that figure rose to 1%.
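A rough decomposition shows what those two medians would jointly imply if they came from a single coherent forecaster, assuming AI-driven extinction risk is negligible without AGI (and glossing over the mismatch between the 2070 condition and the 2100 horizon):

```python
# P(ext) = P(AGI) * P(ext | AGI) + (1 - P(AGI)) * P(ext | no AGI)
p_ext = 0.0038            # median unconditional forecast of AI-driven extinction by 2100
p_ext_given_agi = 0.01    # same group, conditional on AGI existing by 2070
p_ext_given_no_agi = 0.0  # assumption: negligible AI extinction risk without AGI

implied_p_agi = (p_ext - p_ext_given_no_agi) / (p_ext_given_agi - p_ext_given_no_agi)
print(round(implied_p_agi, 2))  # ~0.38, i.e. roughly a 38% implied chance of AGI by 2070
```

Medians don't actually combine this way across respondents, so treat this only as a rough indication that much of the superforecasters' skepticism is about whether AGI arrives at all.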
There was also little or no separation between the groups on the three questions about 2030 performance on AI benchmarks (MATH, Massive Multitask Language Understanding, QuALITY).
This suggests that a good deal of the disagreement is over whether measures of progress represent optimization for narrow tasks, versus symptoms of more general intelligence.
Also from the report:
Predictions about risk are highly correlated across topic areas: for example, participants who are more concerned about AI are also more concerned about pandemics and nuclear weapons.
That indicates that there was some selection for participants who are generally pessimistic.
Evaluating AGI is hard. That means that modest biases can have large effects on forecasts.
There were few questions on which the AI-concerned and skeptics disagreed in ways that are likely to be resolved this decade. The most promising one seems to be question 46 (the cost of compute for the largest AI experiment): skeptics say $100 million in 2030, the AI-concerned say $156 million. My 50% forecast for 2024 was $300 million (and $12 billion for 2030). I now see an estimate for Anthropic's claude-next at "as much as $150 million". Is that the claude-2 that was just released? I'd say the main uncertainty about when the skeptics will be proven wrong on this question is the time it takes to get a clear report on spending.
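For context, the implied growth rate between my own 2024 and 2030 medians is simple compound-growth arithmetic:

```python
cost_2024 = 300e6  # my 50% forecast for the largest experiment in 2024
cost_2030 = 12e9   # my 50% forecast for 2030

implied_annual_growth = (cost_2030 / cost_2024) ** (1 / 6) - 1
print(f"{implied_annual_growth:.0%}")  # roughly 85% per year
```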
Forecasts on this question converged more after the persuasion stages than with most other AI questions, with domain experts dropping their prediction from $10 billion to $180 million. This likely indicates something important, but I'm unclear what.
Still, question 46 seems far enough from a major crux that I don't expect anyone to update their catastrophic risk beliefs a lot on it.
Concluding Thoughts
There was some convergence on AI risk during the persuasion stages between domain experts and superforecasters. But that was in spite of superforecasters becoming generally more skeptical during that period (i.e. moving away from expert opinion).
Oddly, general x-risk experts showed no clear pattern as to whether they updated toward the mean or toward more concern.
That superforecaster trend seems to be clear evidence for AI skepticism. How much should I update on it? I don't know. I didn't see much evidence that either group knew much about the subject that I didn't already know. So maybe most of the updates during the tournament were instances of the blind leading the blind.
None of this seems to be as strong evidence as the changes, since the tournament, in opinions of leading AI researchers, such as Hinton and Bengio.
Comments
comment by denkenberger · 2023-07-21T03:24:14.491Z · LW(p) · GW(p)
Nice summary! My subjective experience participating as an expert was that I was able to convince quite a few people to update towards greater risk by giving them some considerations that they had not thought of (and also by clearing up misinterpretations of the questions). But I guess in the scheme of things, it was not that much overall change.
"What I wanted was a way to quantify what fraction of human cognition has been superseded by the most general-purpose AI at any given time. My impression is that that has risen from under 1% a decade ago, to somewhere around 10% in 2022, with a growth rate that looks faster than linear. I've failed so far at translating those impressions into solid evidence."
This is similar to my question of what percent of tasks AI is superhuman at. Then I was thinking that if we have some idea what percent of tasks AI will become superhuman at in the next generation (e.g. GPT5), and how many tasks the AI would need to be superhuman at in order to take over the world, we might be able to get some estimate of the risk of the next generation.