Creating a database for base rates

post by nikos (followtheargument) · 2022-12-12T10:09:15.778Z · LW · GW · 1 comments

This is a link post for https://forum.effectivealtruism.org/posts/H7xWzvwvkyywDAEkL/creating-a-database-for-base-rates

Contents

  TLDR
  Project Summary
  Introduction
  Project outline
    Goal
    What we'll do
    Categories that we want to look into
    Specific research questions
  How you can help
    Suggesting new categories
    Providing feedback
1 comment

TLDR

We are creating a database to collect base rates for various categories of events. You can find the database here and can suggest new base rate categories for us to look into here.

Project Summary

The base rate database project collects base rates for different categories of events and makes them available to researchers, forecasters and philanthropic organisations. Its main goals are to develop better intuitions about the potential and limitations of reference class forecasting and to provide useful information to the public. The data will enable research into the circumstances in which reference class forecasting is a promising approach, which methods of reference class forecasting work best, how to construct reasonable reference classes, and what the potential caveats and pitfalls are. In addition to the raw data, we will collect qualitative feedback on individual reference classes and on the overall process of building a base rate database. This adds context to the data and develops knowledge to build upon in the future. We aim to select categories of base rates so that the information we collect is useful to decision makers and philanthropic organisations.

 

Introduction

If one wants to predict whether some event will happen in the future, it is often helpful to look at the past. One can ask: "Ignoring all the specifics of the current event I'm trying to predict, what would I predict just by looking at the base rate of similar events happening in the past?". This is called reference class forecasting and helps forecasters to obtain an 'outside view' on the forecasting question at hand. This outside view, of course, is usually complemented by the 'inside view': what are the specifics of the current event at hand that distinguish it from other events? 
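To make the 'outside view' concrete, here is a minimal sketch of how a base rate might be computed from a reference class. The counts are made up for illustration, and the function names `base_rate` and `laplace_rate` are hypothetical. The smoothed estimate uses Laplace's rule of succession, which is one common choice for small samples and not necessarily the method used in the database.

```python
def base_rate(occurrences: int, opportunities: int) -> float:
    """Observed frequency of the event within the reference class."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= occurrences <= opportunities:
        raise ValueError("occurrences must lie between 0 and opportunities")
    return occurrences / opportunities


def laplace_rate(occurrences: int, opportunities: int) -> float:
    """Laplace's rule of succession: (k + 1) / (n + 2).

    Pulls the estimate towards 0.5 when there is little data, so a
    reference class with zero observed events does not yield exactly 0.
    """
    if opportunities < 0 or not 0 <= occurrences <= opportunities:
        raise ValueError("need 0 <= occurrences <= opportunities")
    return (occurrences + 1) / (opportunities + 2)


if __name__ == "__main__":
    # Hypothetical reference class: 50 comparable past cases,
    # in 3 of which the event of interest occurred.
    k, n = 3, 50
    print(f"raw base rate:      {base_rate(k, n):.3f}")
    print(f"smoothed base rate: {laplace_rate(k, n):.3f}")
```

Either number is only a starting point. The 'inside view' would then adjust it for the specifics of the event being forecast.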

Reference class forecasting is widely used among forecasters. To date, however, there has been little systematic research into how effective base rates are for forecasting future events, how they can best be used and what limitations apply. We aim to facilitate this research.

 

Project outline

Goal

The main goal of this project is to develop a better understanding of the merits and limitations of reference class forecasting. 

A secondary goal is to collect information that may be useful for forecasters and EA stakeholders in the future. 

 

What we'll do

We want to achieve our goals by

 

Categories that we want to look into

We intend to look into categories as diverse as 

You can find a list of all the categories on our radar here. You can suggest new categories here.

 

Specific research questions

The database is meant to be a resource for anyone who is interested in reference class forecasting. Please do feel free to use it for your own research as well as to reach out to us. 

So far, we have thought of the following quantitative analyses we think may be promising: 

We also aim to obtain a better qualitative understanding of reference class forecasting by asking the forecasters who collect the base rates to reflect on the process as well as on the individual base rate categories, for example

How you can help

Suggesting new categories

You can suggest new categories to include in the database here. Suggested categories should ideally be at least one of the following: 

Providing feedback

If you have thoughts on anything presented here, please let us know in the comments or get in touch directly.  

1 comment

Comments sorted by top scores.

comment by mruwnik · 2022-12-12T19:45:47.236Z · LW(p) · GW(p)

This seems like a worthwhile resource to have, but I worry that it'll result in something more like a database of impactful events, rather than a database of base rates. E.g. in the case of lab leaks, you could use it to estimate , but not  or the base . Unfortunately I don't have a good idea of how to get round this, as I'm guessing a whole lot of problems go unreported.

That being said, even if this can't be easily used for base rate estimations, a database for large risk sources has a lot of potential worth and so is a very good idea.