Metaculus Introduces New Forecast Scores, New Leaderboard & Medals

post by ChristianWilliams · 2023-11-20T20:33:57.119Z · LW · GW · 2 comments

This is a link post for https://www.metaculus.com/questions/20025/new-scores-new-leaderboard-new-medals/


comment by Odd anon · 2023-11-21T04:20:59.354Z · LW(p) · GW(p)

It's good that Metaculus is trying to tackle the answer-many/answer-accurately balance, but I don't know if this solution is going to work. Couldn't one just get endless baseline points by predicting the Metaculus average on every question?
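To make the worry concrete, here's a rough sketch. The binary Baseline score formula assumed below, 100 * (log2(p assigned to the outcome) + 1), is my reading of Metaculus's scoring docs rather than anything stated in this post, so treat the exact numbers as illustrative.

```python
import math

def baseline_score(p_yes: float, resolved_yes: bool) -> float:
    """Binary Baseline score vs. a 50% 'chance' forecast (assumed formula)."""
    p_outcome = p_yes if resolved_yes else 1.0 - p_yes
    return 100.0 * (math.log2(p_outcome) + 1.0)

# A forecaster who always copies a well-calibrated community prediction, e.g.
# questions where the CP is 80% and the event really happens 80% of the time:
p_cp = 0.80
expected = p_cp * baseline_score(p_cp, True) + (1 - p_cp) * baseline_score(p_cp, False)
print(f"Expected Baseline score per question when copying a calibrated 80% CP: {expected:.1f}")
# ~= +27.8 points per question, so total points just grow with every question answered.
```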

Also, there's no way to indicate "confidence" (like, outside-level confidence) in a prediction. If someone knows a lot about a particular topic, and spends a lot of time researching a particular question, but also occasionally predicts their best guess on random other questions outside their area of expertise, then the point-based "incentives" become messy. That's something I like about Manifold that's missing from Metaculus, and I wonder whether it might be possible to work in something like that while keeping Metaculus's general system.

Replies from: ChristianWilliams
comment by ChristianWilliams · 2023-11-24T20:20:25.452Z · LW(p) · GW(p)

Hi @Odd anon, thanks for the feedback and questions.

1. To your point about copying the Community Prediction: It's true that if you copied the CP at all times, you would indeed receive a high Baseline Accuracy score. The CP is generally a great forecast! That said, CP hidden periods do mitigate this issue somewhat. We are monitoring user behavior on this front and will address it if it becomes an issue. We do have some ideas in our scoring trade-offs doc for further ways to address CP copying, e.g.:

We could have a leaderboard that only considers the last prediction made before the hidden period ends to calculate Peer scores. This largely achieves the goal above: it rewards judgement, and it does not require constant updating or tracking the news. It does not reward finding stale questions.

Have a look here, and let us know what you think! (We also have some ideas we're tinkering with that are not listed in that doc, like accuracy metrics that exclude forecasts at the CP or within +/- some delta of it.)
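As a rough sketch of the quoted leaderboard idea: for each (user, question), only the last forecast made before the hidden period ends would count toward Peer scores. All names, fields, and the abstract peer_score_fn below are illustrative placeholders, not Metaculus's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Tuple

@dataclass
class Forecast:
    user: str
    question_id: int
    p_yes: float
    made_at: datetime

def last_forecasts_before_hidden_period_end(
    forecasts: List[Forecast],
    hidden_period_end: Dict[int, datetime],
) -> List[Forecast]:
    """Keep, per (user, question), only the latest forecast made before the
    hidden period ends; later updates would not affect this leaderboard."""
    latest: Dict[Tuple[str, int], Forecast] = {}
    for f in forecasts:
        if f.made_at >= hidden_period_end[f.question_id]:
            continue  # made after the CP was revealed; ignore for this board
        key = (f.user, f.question_id)
        if key not in latest or f.made_at > latest[key].made_at:
            latest[key] = f
    return list(latest.values())

def leaderboard_scores(
    eligible: List[Forecast],
    peer_score_fn: Callable[[Forecast], float],  # actual Peer score left abstract
) -> Dict[str, float]:
    """Sum each user's Peer scores over their eligible forecasts."""
    totals: Dict[str, float] = {}
    for f in eligible:
        totals[f.user] = totals.get(f.user, 0.0) + peer_score_fn(f)
    return totals
```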

2. On indicating confidence: You'll see in the trade-offs doc that we're also considering letting users exclude a particular forecast from their Peer score (Idea #3), which could somewhat address this. (Interestingly, indicating confidence was tried at the Good Judgment Project, but it ultimately didn't work and was abandoned.)
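A minimal sketch of what such an opt-out could look like (again, the names and the abstract scoring function are assumptions for illustration only, not how this would actually be implemented):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class UserForecast:
    question_id: int
    p_yes: float
    excluded_from_peer_score: bool = False  # the proposed opt-out flag

def peer_score_total(forecasts: List[UserForecast],
                     peer_score_fn: Callable[[UserForecast], float]) -> float:
    """Sum Peer scores over only the forecasts the user chose to keep in."""
    return sum(peer_score_fn(f) for f in forecasts
               if not f.excluded_from_peer_score)
```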

We're continuing to develop ideas on the above, and we'd definitely welcome further feedback!