Introducing Metaforecast: A Forecast Aggregator and Search Tool

post by NunoSempere (Radamantis), ozziegooen · 2021-03-07T19:03:35.920Z · LW · GW · 6 comments

Contents

  Introduction
  Metaforecast
  Select Search Screenshots
  Data Sources
  Future work
  Challenges
  Source code
6 comments

Introduction

The last few years have seen a proliferation of forecasting platforms. These platforms differ in many ways, and provide different experiences, filters, and incentives for forecasters. Some platforms, like Metaculus and Hypermind, use volunteers with prizes; others, like PredictIt and Smarkets, are formal betting markets.

Forecasting is a public good, providing information to the public. While the diversity among platforms has been great for experimentation, it also fragments information, making the outputs of forecasting far less useful. For instance, different platforms ask similar questions using different wordings. The questions may or may not be organized, and the outputs may be distributions, odds, or probabilities.

Fortunately, most of these platforms either have APIs or can be scraped. We’ve experimented with pulling their data to put together a listing of most of the active forecasting questions and most of their current estimates in a coherent and more easily accessible platform.
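As one illustration of the normalization this involves, a betting market that quotes odds can be put on the same footing as a platform that reports probabilities. The sketch below assumes decimal odds (an assumption; platforms quote prices in different ways) and uses a hypothetical helper name, `implied_probabilities`. It is not taken from the Metaforecast code.

```python
from typing import List


def implied_probabilities(decimal_odds: List[float]) -> List[float]:
    """Convert decimal odds for mutually exclusive outcomes into probabilities.

    Hypothetical helper for illustration only. The raw implied probability of
    each outcome is 1 / odds; dividing by the sum of the raw values is one
    simple way to strip the market's margin so the results add up to 1.
    """
    if not decimal_odds:
        raise ValueError("need at least one outcome")
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]
```

Proportional rescaling is only one of several ways to handle the margin; it is used here because it is the simplest to state.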

Metaforecast

Metaforecast is a free & simple app that shows predictions and summaries from 10+ forecasting platforms. It shows simple summaries of the key information: just the immediate forecasts, no history. Data is fetched daily. There’s a simple string search, and you can open the advanced options for some configurability. Currently, across all of the indexed platforms, we track ~2100 active forecasting questions, ~1200 (~55%) of which are on Metaculus. There are also 17,000 public models from Guesstimate.
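To make the idea of a simple string search over a unified listing concrete, here is a minimal sketch. The `Forecast` record, its field names, and the `min_stars` option are all hypothetical and chosen for illustration; this is not necessarily how Metaforecast stores or searches its data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Forecast:
    """Hypothetical unified record for one question from any platform."""
    title: str
    platform: str
    probability: float  # current estimate for a binary question
    stars: int          # 1-5 reputability rating


def search(forecasts: List[Forecast], query: str, min_stars: int = 1) -> List[Forecast]:
    """Case-insensitive substring search on titles, best-rated results first."""
    q = query.casefold()
    hits = [f for f in forecasts
            if q in f.title.casefold() and f.stars >= min_stars]
    # sorted() is stable, so equally rated results keep their input order.
    return sorted(hits, key=lambda f: f.stars, reverse=True)
```

Calling `search(questions, "germany", min_stars=3)` on a list of such records would return only the matching questions rated 3 stars or more.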

One obvious issue that arose was the challenge of comparing questions among platforms. Some questions have results that seem more reputable than others. Obviously a Metaculus question with 2000 predictions seems more robust than one with 3 predictions, but less obvious is how a Metaculus question with 3 predictions compares to one from Good Judgement Superforecasters where the number of forecasters is not clear, or to estimates from a Smarkets question with £1,000 traded. We believe that this is an area that deserves substantial research and design experimentation. In the meantime we use a star rating system. We created a function that estimates reputability as “stars” on a 1-5 system using the forecasting platform, forecast count, and liquidity for prediction markets. The estimation came from volunteers acquainted with the various forecasting platforms. We’re very curious for feedback here, both on what the function should be, and how to best explain and show the results. 

Metaforecast is being treated as an experimental endeavor of QURI. We have spent a few weeks on it so far, after developing technologies and skill sets that made it fairly straightforward. We currently expect to support it for at least a year and provide minor updates. We’re curious to see what interest is like and will respond accordingly.

Metaforecast is led by Nuño Sempere.

Select Search Screenshots

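[Screenshot: search results for “Biden”]

[Screenshot: search results for “Germany”]

[Screenshot: search results for “COVID”]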

Data Sources

| Platform | URL | Information used in Metaforecast | Robustness |
| --- | --- | --- | --- |
| Metaculus | https://www.metaculus.com | Active questions only. The current aggregate is shown for binary questions, but not for continuous questions. | 2 stars if it has fewer than 100 forecasts, 3 stars when between 101 and 300, 4 stars if over 300 |
| Foretell (CSET) | https://www.cset-foretell.com/ | All active questions | 1 star if a question has fewer than 100 forecasts, 2 stars if it has more |
| Hypermind | https://www.hypermind.com | Questions on various dashboards | 3 stars |
| Good Judgement | https://goodjudgment.io/ | We use various superforecaster dashboards. You can see them here and here | 4 stars |
| Good Judgement Open | https://www.gjopen.com/ | All active questions | 2 stars if a question has fewer than 100 forecasts, 3 stars if it has more |
| Smarkets | https://smarkets.com/ | Only take the political markets, not sports or others. | 2 stars |
| PredictIt | https://www.predictit.org/ | All active questions | 2 stars |
| PolyMarket | https://polymarket.com/ | All active questions | 3 stars if they have more than $1000 of liquidity, 2 stars otherwise |
| Elicit | https://elicit.org/ | All active questions | 1 star |
| Foretold | https://www.foretold.io/ | Selected communities | 2 stars |
| Omen | https://www.fsu.gr/en/fss/omen | All active questions | 1 star |
| Guesstimate | https://www.getguesstimate.com/ | All public models. These aren’t exactly forecasts, but some of them are, and many are useful for forecasts. | 1 star |
| GiveWell | https://www.givewell.org/ | Publicly listed forecasts | 2 stars |
| Open Philanthropy Project | https://www.openphilanthropy.org/ | Publicly listed forecasts | 2 stars |

Since the initial version, the star rating has been improved by aggregating the judgment of multiple people, which mostly just increased Polymarket’s rating. However, the fact that we are aggregating different perspectives makes the star rating more difficult to summarize, and the numbers shown on the table are just those of Nuño’s perspective. 
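The rules in the table above can be written down as a small function. This is a sketch of the single-perspective numbers listed in the table, not the aggregated rating and not the actual Metaforecast source. The function name, its parameters, and the handling of boundary cases the table leaves open (for example, exactly 100 forecasts) are assumptions.

```python
from typing import Optional

# Platforms whose rating in the table does not depend on question activity.
FIXED_STARS = {
    "Hypermind": 3,
    "Good Judgement": 4,
    "Smarkets": 2,
    "PredictIt": 2,
    "Elicit": 1,
    "Foretold": 2,
    "Omen": 1,
    "Guesstimate": 1,
    "GiveWell": 2,
    "Open Philanthropy Project": 2,
}


def stars(platform: str,
          num_forecasts: Optional[int] = None,
          liquidity_usd: Optional[float] = None) -> int:
    """Return the table's star rating for one question (hypothetical helper)."""
    n = num_forecasts or 0
    if platform == "Metaculus":
        # Table: 2 stars under 100, 3 stars for 101-300, 4 stars over 300.
        # Exactly 100 is not covered; counting it as 3 stars is an assumption.
        if n > 300:
            return 4
        return 3 if n >= 100 else 2
    if platform == "Foretell (CSET)":
        # Assumption: exactly 100 forecasts counts as "more".
        return 1 if n < 100 else 2
    if platform == "Good Judgement Open":
        # Assumption: exactly 100 forecasts counts as "more".
        return 2 if n < 100 else 3
    if platform == "PolyMarket":
        return 3 if (liquidity_usd or 0.0) > 1000 else 2
    # Raises KeyError for platforms that are not in the table.
    return FIXED_STARS[platform]
```

For example, `stars("Metaculus", num_forecasts=2000)` gives 4, while `stars("Metaculus", num_forecasts=3)` gives 2. The fixed ratings of platforms such as Good Judgement or Smarkets show how much the scheme leans on platform identity when forecast counts or liquidity are unavailable.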

Future work

Challenges

Doing this project exposed just how many platforms and questions there are. At this point there are thousands of questions and it's almost impossible to keep track of all of them. Almost all of the question names are rather ad-hoc. Metaforecast helps, but is limited. 

Most public forecasting platforms seem to optimize their questions and user interfaces for forecasters and narrow interest groups, not public onlookers. There are some public dashboards, but they are few compared to the number of existing forecasting questions, and they often aren’t particularly well done. It seems like there is a lot of design work and figuring out left to do, both to reveal and organize information for intelligent consumers and to do the same for broader public audiences.

Overall, this is early work for what seems like a fairly obvious and important area. We encourage others to either contribute to Metaforecast, or make other websites using this as inspiration.

Source code

The source code for the webpage is here, and the source code for the library used to fetch the probabilities is here. Pull requests or new issues with complaints or feature suggestions are welcome. 


Thanks to David Manheim, Jaime Sevilla, @meerpirat, Pablo Melchor, and Tamay Besiroglu for various comments, to Luke Muehlhauser for feature suggestions, and to Metaculus for graciously allowing us to use their forecasts.

6 comments

Comments sorted by top scores.

comment by D0TheMath · 2021-03-08T14:07:05.850Z · LW(p) · GW(p)

This is a very cool platform and I look forward to using it to supplement my own forecasts and decision making.

Do the active questions shown from Metaculus include just open questions, or open & closed (but not resolved) questions as well?

Replies from: Radamantis
comment by NunoSempere (Radamantis) · 2021-03-08T16:04:44.266Z · LW(p) · GW(p)

Thanks! It should just include open questions, not those which have closed and are yet to resolve. But this is easy to change.

Replies from: D0TheMath
comment by D0TheMath · 2021-03-08T18:14:35.568Z · LW(p) · GW(p)

I think it'd probably be better to include closed forecasts as well (maybe decreasing the star rating to indicate it may not be up to date on information), just because a lot of the time Metaculus will close a question long before it actually resolves. For instance, here https://www.metaculus.com/questions/6320/usas-gdpc-growth-in-2020-2029/ the question is closed, but will resolve at the end of 2029.

Replies from: Radamantis
comment by NunoSempere (Radamantis) · 2021-03-09T10:04:59.856Z · LW(p) · GW(p)

This is changed now. As a bonus, it also resolves a previous bug.

Replies from: ChristianKl
comment by ChristianKl · 2021-03-09T19:39:46.608Z · LW(p) · GW(p)

On Metaculus, closed questions get a Metaculus prediction, which is weighted to give users with a good forecasting record higher weight. This differs from the unweighted community prediction, which is shown while questions aren't yet closed.

Replies from: Radamantis
comment by NunoSempere (Radamantis) · 2021-03-09T19:50:51.626Z · LW(p) · GW(p)

Mmh. OTOH, they lose the ability to incorporate new information. Do you have a sense of which factor dominates?