Lech Mazur's Shortform

lechmazur

post by Lech Mazur (lechmazur) · 2023-12-12T01:23:28.267Z · LW · GW · 4 comments

5 comments

4 comments

Comments sorted by top scores.

comment by Lech Mazur (lechmazur) · 2024-08-28T14:56:54.357Z · LW(p) · GW(p)

I've created an ensemble model that uses techniques such as multi-step reasoning to establish what should be considered the real current state of the art in LLMs. It scores substantially higher than the best individual models and subjectively feels smarter:

MMLU-Pro 0-shot CoT: 78.2 vs 75.6 for GPT-4o

NYT Connections, 436 questions: 34.9 vs 26.5 for GPT-4o

GPQA 0-shot CoT: 56.0 vs 52.5 for Claude 3.5 Sonnet

I might make it publicly accessible if there's enough interest. Of course, there are expected tradeoffs: it's slower and more expensive to run.
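To put the three comparisons on a common footing, here is a minimal sketch that computes the absolute and relative gains from the scores quoted above. The numbers come straight from this comment; the function name `gains` and the one-decimal rounding are illustrative choices, not part of the original evaluation.

```python
# Scores quoted in the comment: (ensemble, best single model, model name).
results = {
    "MMLU-Pro 0-shot CoT": (78.2, 75.6, "GPT-4o"),
    "NYT Connections": (34.9, 26.5, "GPT-4o"),
    "GPQA 0-shot CoT": (56.0, 52.5, "Claude 3.5 Sonnet"),
}


def gains(ensemble, baseline):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = ensemble - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for name, (ensemble, baseline, model) in results.items():
    absolute, relative = gains(ensemble, baseline)
    print(f"{name}: +{absolute} points ({relative}% relative) over {model}")
```

On these figures the largest relative gain is on NYT Connections, and the smallest is on MMLU-Pro.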

comment by Lech Mazur (lechmazur) · 2023-12-12T01:23:28.638Z · LW(p) · GW(p)

I'm a fan of prediction markets, but they're limited to pre-set bets and not ideal for long-shot, longer-term predictions, mainly because betting against such a prediction ties up money that would earn more in risk-free bonds. Therefore, I'd like to fund a 2024 Long-Shot Prediction Contest offering up to three $500 prizes. However, I need volunteers to act as judges and to help get this publicized.
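The opportunity-cost point can be made concrete with a little arithmetic. This is a minimal sketch under stated assumptions: a binary contract that pays 1 if it resolves in your favor, no fees, and hypothetical figures (a 2% market price, a 5% risk-free rate, one year to resolution) that are not taken from any real market.

```python
def return_betting_against(market_prob):
    """Return on capital from betting against an event priced at market_prob,
    in the case where the event does not happen (stake 1 - p, collect 1)."""
    stake = 1.0 - market_prob
    return market_prob / stake


def excess_over_bonds(market_prob, risk_free_rate, years):
    """Best-case return of the bet minus the return from holding
    risk-free bonds over the same period (annual compounding assumed)."""
    bond_return = (1.0 + risk_free_rate) ** years - 1.0
    return return_betting_against(market_prob) - bond_return


# Hypothetical numbers: a long shot priced at 2%, bonds at 5%, resolving in 1 year.
print(round(return_betting_against(0.02), 4))
print(round(excess_over_bonds(0.02, 0.05, 1), 4))
```

Under these assumptions, even a bettor who is certain the long shot will fail earns less than the bond yield, so the market price has little reason to fall toward the true probability.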

Entrants will submit one prediction for 2024 on any topic or event
Volunteer judges and I will vote on the likelihood of each prediction and how "interesting" it is, forming a ranked list
In January 2025, judges will determine which predictions came true, and winners will get their prizes

To start with a single $500 prize, I need at least two people to volunteer as judges and a minimum of 10 predictions (judges cannot enter). If the contest receives, let's say, 50+ predictions, there will be two prizes. For 200+ predictions, three prizes.
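The prize tiers above can be sketched as a small function. The thresholds are the tentative ones from this comment ("let's say"), and the function name `number_of_prizes` is hypothetical.

```python
PRIZE_AMOUNT = 500  # dollars per prize, as stated in the comment


def number_of_prizes(num_predictions, num_judges):
    """Number of $500 prizes for a given turnout, using the tentative tiers."""
    # The contest needs at least two volunteer judges and 10 predictions to run.
    if num_judges < 2 or num_predictions < 10:
        return 0
    if num_predictions >= 200:
        return 3
    if num_predictions >= 50:
        return 2
    return 1


for n in (5, 10, 50, 200):
    prizes = number_of_prizes(n, num_judges=2)
    print(f"{n} predictions: {prizes} prize(s), ${prizes * PRIZE_AMOUNT} total")
```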

Interested in judging or have any suggestions? Let me know.

Replies from: papetoast

↑ comment by papetoast · 2023-12-12T04:53:37.329Z · LW(p) · GW(p)

suggestions:

Duplicate this to the open thread to increase visibility
I don't know your exact implementation for forming the ranked list, but I worry that if you (for example) simply sort from low likelihood to high likelihood, it encourages people to only submit very low probability predictions.

Replies from: lechmazur

↑ comment by Lech Mazur (lechmazur) · 2023-12-12T05:34:00.278Z · LW(p) · GW(p)

Will do.
Entering an extremely unlikely prediction as a strategy to maximize EV only makes sense if there's a huge number of entrants, which seems improbable unless this contest goes viral. The inclusion of an "interesting" factor in the ranking criteria should deter spamming with low-quality entries.
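A toy model illustrates the expected-value argument. All of it is an assumption made for illustration, not the contest's actual rules: the ranking is purely from least to most likely (the case papetoast raised), a single prize goes to the least likely prediction that comes true, outcomes are independent, every other entrant submits a prediction with the same probability `q`, and the "interesting" factor is ignored.

```python
def win_probability(p, q, n_others):
    """Chance that an entrant with a prediction of probability p wins the prize
    when n_others entrants each submit a prediction of probability q.
    Ties (p == q) are resolved against the entrant in this toy model."""
    if p < q:
        # Ranked first: the entrant wins whenever their prediction comes true.
        return p
    # Ranked last: the entrant wins only if their prediction comes true
    # and none of the other predictions do.
    return p * (1.0 - q) ** n_others


# Hypothetical numbers: others submit 5% predictions; the entrant chooses
# between an extreme long shot (0.1%) and a moderate long shot (20%).
for n in (10, 200):
    extreme = win_probability(0.001, 0.05, n)
    moderate = win_probability(0.2, 0.05, n)
    better = "extreme" if extreme > moderate else "moderate"
    print(f"{n} other entrants: {better} long shot has the higher win chance")
```

In this model the extreme long shot only becomes the better entry once the field is large enough that some other prediction is almost sure to come true.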

comment by Lech Mazur (lechmazur) · 2025-01-26T23:07:06.942Z · LW(p) · GW(p)
