LLMs can teach themselves to better predict the future
post by Ben Turtel (ben-turtel) · 2025-02-13T01:01:12.175Z · LW · GW · 1 comment
This is a link post for https://arxiv.org/abs/2502.05253
LLMs can teach themselves to better predict the future - no human examples or curation required.
In this paper, we explore if AI can improve its forecasts via self-play and real-world outcomes:
- Dataset: 12,100 questions and resolved outcomes from Polymarket (politics, sports, crypto, science, etc.)
- Base model generates multiple distinct reasoning traces and predictions per question
- Rank predictions by how close they were to the actual outcome
- Fine-tune with DPO on the ranked traces & predictions
Result: a 7-10% accuracy gain over the control, bringing two small (14B) models on par with GPT-4o (over 10x larger).
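The data-generation step above can be sketched in a few lines: sample several reasoning traces and probability estimates per question, rank them by distance to the realized 0/1 outcome, and pair the best trace against each worse one for DPO. This is a minimal illustration, not the paper's code; the names `Sample` and `build_dpo_pairs` and the squared-error ranking are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    reasoning: str  # chain-of-thought trace from the base model
    prob: float     # predicted probability the question resolves "yes"

def build_dpo_pairs(samples, outcome):
    """Rank samples by squared error against the 0/1 outcome and return
    (chosen, rejected) pairs: the best trace vs. each worse one."""
    ranked = sorted(samples, key=lambda s: (s.prob - outcome) ** 2)
    best = ranked[0]
    return [(best, worse) for worse in ranked[1:]]

# Toy example: three sampled predictions for a question that resolved "yes".
samples = [
    Sample("Polls are tightening; base rates favor no.", 0.35),
    Sample("Strong incumbent advantage points to yes.", 0.80),
    Sample("Too uncertain to call.", 0.50),
]
pairs = build_dpo_pairs(samples, outcome=1.0)
# Each (chosen, rejected) pair then becomes one DPO training example.
```

In practice the chosen/rejected pairs would be fed to a standard DPO trainer; the point is that the preference signal comes from real-world outcomes rather than human labels.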
1 comment
comment by JosephSBoyle (josephsboyle) · 2025-02-13T02:00:41.198Z · LW(p) · GW(p)
Interesting result; I'd be curious to see some qualitative analysis of the reasoning CoT of the fine-tuned models vs. the base ones.
It seems to me that these approaches are not yet data-saturated, and that better performance could be reached with a better fine-tuning dataset.
Naturally, the space of things you could forecast is very large, but plausibly one could continuously generate new forecasting questions using an LM and then apply the self-play DPO used in this paper to improve the forecaster LM. I doubt that Polymarket alone has enough data to surpass human performance and expertise (or at least, waiting for enough questions to resolve seems like a slower-than-necessary data-generation process!).