Lech Mazur's Shortform
post by Lech Mazur (lechmazur) · 2023-12-12T01:23:28.267Z · LW · GW · 4 comments
comment by Lech Mazur (lechmazur) · 2024-08-28T14:56:54.357Z · LW(p) · GW(p)
I've created an ensemble model that uses techniques like multi-step reasoning to establish what should be considered the real current state of the art in LLMs. It substantially outperforms the highest-scoring individual models and subjectively feels smarter:
- MMLU-Pro 0-shot CoT: 78.2 vs 75.6 for GPT-4o
- NYT Connections, 436 questions: 34.9 vs 26.5 for GPT-4o
- GPQA 0-shot CoT: 56.0 vs 52.5 for Claude 3.5 Sonnet
I might make it publicly accessible if there's enough interest. Of course, there are expected tradeoffs: it's slower and more expensive to run.
comment by Lech Mazur (lechmazur) · 2023-12-12T01:23:28.638Z · LW(p) · GW(p)
I'm a fan of prediction markets, but they're limited to pre-set bets and aren't ideal for long-shot, longer-term predictions, mainly because betting against such a prediction ties money up for a long time, so even a winning bet can return less than risk-free bonds would have. Therefore, I'd like to fund a 2024 Long-Shot Prediction Contest offering up to three $500 prizes. However, I need volunteers to act as judges and to help publicize it.
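To make the bond comparison concrete, here is a rough Python sketch of the expected excess return from betting against a long shot. It is an illustration under stated assumptions, not anything from a specific market: a binary NO share that pays 1.0 if the event does not happen, no fees, annual compounding, and illustrative numbers that are not from the post. The function name `no_bet_excess_return` is hypothetical.

```python
def no_bet_excess_return(price_no: float, prob_yes: float,
                         risk_free_rate: float, years: float) -> float:
    """Expected return of buying a NO share minus a risk-free bond's return.

    Hypothetical helper for illustration. Assumes:
    - a NO share costs `price_no` and pays 1.0 if the event does NOT happen,
    - `prob_yes` is the bettor's own estimate of the event's probability,
    - the stake is locked up for `years`, with no trading fees.
    """
    expected_payoff = 1.0 - prob_yes                      # expected payout per NO share
    expected_return = expected_payoff / price_no - 1.0    # over the whole lock-up
    bond_return = (1.0 + risk_free_rate) ** years - 1.0   # compounded annually
    return expected_return - bond_return


if __name__ == "__main__":
    # Market prices a long shot at 3% (NO costs 0.97); the bettor thinks it is only 1%.
    excess = no_bet_excess_return(price_no=0.97, prob_yes=0.01,
                                  risk_free_rate=0.05, years=1.0)
    print(f"Excess return vs. bonds: {excess:.4f}")
```

With these made-up numbers the NO bet trails a 5% bond by about 2.9 percentage points, even though the bettor believes the market overprices the long shot threefold, which illustrates why betting against long shots is unattractive when money is tied up.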
- Entrants will submit one prediction for 2024 on any topic or event
- Volunteer judges and I will vote on the likelihood of each prediction and how "interesting" it is, forming a ranked list
- In January 2025, judges will determine which predictions came true, and winners will get their prizes
For the contest to go ahead with one $500 prize, I need at least two volunteer judges and at least 10 predictions (judges cannot enter). If it receives, say, 50+ predictions, there will be two prizes; with 200+ predictions, three prizes.
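The prize rules above reduce to a small threshold function. This is a Python sketch with a hypothetical function name; it reads "50+" and "200+" as inclusive cutoffs and treats them as the approximate figures the post gives ("let's say"), not as fixed rules.

```python
def number_of_prizes(num_judges: int, num_predictions: int) -> int:
    """Number of $500 prizes under the thresholds stated in the post.

    Hypothetical helper. `num_predictions` should exclude judges, who
    cannot enter. The 50 and 200 cutoffs were given loosely in the post.
    """
    if num_judges < 2 or num_predictions < 10:
        return 0  # minimum not met: the contest does not go ahead
    if num_predictions >= 200:
        return 3
    if num_predictions >= 50:
        return 2
    return 1


if __name__ == "__main__":
    for judges, entries in [(1, 30), (2, 9), (2, 10), (3, 75), (4, 250)]:
        prizes = number_of_prizes(judges, entries)
        print(f"{judges} judges, {entries} predictions -> "
              f"{prizes} prize(s), ${500 * prizes} total")
```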
Interested in judging or have any suggestions? Let me know.
↑ comment by papetoast · 2023-12-12T04:53:37.329Z · LW(p) · GW(p)
Suggestions:
- Duplicate this to the open thread to increase visibility.
- I don't know exactly how you plan to form the ranked list, but I worry that if you (for example) simply sort from lowest to highest likelihood, it will encourage people to submit only very-low-probability predictions.
↑ comment by Lech Mazur (lechmazur) · 2023-12-12T05:34:00.278Z · LW(p) · GW(p)
- Will do.
- Entering an extremely unlikely prediction as a strategy to maximize EV only makes sense if there's a huge number of entrants, which seems improbable unless this contest goes viral. The inclusion of an "interesting" factor in the ranking criteria should deter spamming with low-quality entries.
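A toy model makes the entrant-count argument concrete. This Python sketch rests on simplifying assumptions that are not the contest's actual rules: a single prize, entries ranked purely by judged likelihood (least likely first, ignoring the "interesting" factor), the highest-ranked entry that comes true wins, outcomes are independent, and judged likelihoods equal true probabilities. The function name `win_probability` and the rival probabilities are hypothetical.

```python
def win_probability(my_prob: float, other_probs: list[float]) -> float:
    """Chance that my entry wins in a toy likelihood-only ranking.

    I win if my prediction comes true and no entry ranked above mine
    (i.e. judged less likely than mine) also comes true. Ties are ignored.
    """
    prob = my_prob
    for q in other_probs:
        if q < my_prob:          # this rival outranks my entry...
            prob *= 1.0 - q      # ...so it must fail for me to win
    return prob


if __name__ == "__main__":
    few = [0.10] * 10     # 10 hypothetical rivals, each judged 10% likely
    many = [0.02] * 200   # 200 hypothetical rivals, each judged 2% likely
    for p in (0.001, 0.01, 0.09, 0.20):
        print(f"p={p:<5}  few rivals: {win_probability(p, few):.4f}  "
              f"many rivals: {win_probability(p, many):.4f}")
```

In this toy model the best entry sits just below the least likely rival: with 10 rivals at 10%, an entry judged 9% likely wins 90 times as often as one judged 0.1% likely. Extremely unlikely predictions only start to pay once many rivals have crowded into low-probability territory.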