LLMs can teach themselves to better predict the future

post by Ben Turtel (ben-turtel) · 2025-02-13T01:01:12.175Z · LW · GW · 1 comment

This is a link post for https://arxiv.org/abs/2502.05253


LLMs can teach themselves to better predict the future: no human examples or curation required.

In this paper, we explore whether AI can improve its forecasts via self-play and real-world outcomes:

- Dataset: 12,100 questions and outcomes from Polymarket (politics, sports, crypto, science, etc.)
- The base model generates multiple distinct reasoning traces and predictions per question
- Rank predictions by how close they were to the actual outcome
- Fine-tune with DPO on the ranked traces & predictions (see the sketch after this list)
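
To make the loop concrete, here is a minimal Python sketch of the ranking-and-pairing step. The `Trace` dataclass, the Brier-score ranking, and the all-pairs construction are illustrative assumptions, not the paper's exact recipe; the actual prompts, scoring rule, and pair-selection details may differ.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Trace:
    reasoning: str  # chain-of-thought produced by the base model
    prob: float     # predicted probability that the question resolves YES

def brier(prob: float, outcome: int) -> float:
    """Squared error between a predicted probability and the 0/1 outcome."""
    return (prob - outcome) ** 2

def build_dpo_pairs(traces: list[Trace], outcome: int) -> list[tuple[Trace, Trace]]:
    """Rank self-play traces by closeness to the realized outcome and emit
    (chosen, rejected) preference pairs for DPO fine-tuning."""
    ranked = sorted(traces, key=lambda t: brier(t.prob, outcome))
    pairs = []
    # ranked[i] is at least as accurate as ranked[j] for i < j, so each
    # combination is already ordered; skip ties, which carry no preference.
    for chosen, rejected in itertools.combinations(ranked, 2):
        if brier(chosen.prob, outcome) < brier(rejected.prob, outcome):
            pairs.append((chosen, rejected))
    return pairs
```

For a question that resolved YES (outcome = 1), traces with probabilities 0.9, 0.4, and 0.2 would yield the pairs (0.9, 0.4), (0.9, 0.2), and (0.4, 0.2), each fed to DPO as (chosen, rejected).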

Result: +7-10% accuracy over the control, bringing two small (14B) models on par with GPT-4o (over 10x larger).

1 comment


comment by JosephSBoyle (josephsboyle) · 2025-02-13T02:00:41.198Z · LW(p) · GW(p)

Interesting result; I'd be curious to see some qualitative analysis of the reasoning CoT of their fine-tuned models vs. the base ones.

It seems to me that these approaches are not yet data-saturated, and that better performance could be reached with a better fine-tuning dataset.

Naturally the space of things you could  forecast is very large, but plausibly one might continuously generate new forecasting questions using an LM and then use the self-play DPO used in this paper to improve your forecaster LM.  I guess I doubt that Polymarket has sufficient data to surpass human performance and expertise (or at least waiting for enough questions to be resolved seems likely to be a slower than necessary data generation process!)