Defender7762's Shortform

post by Defender7762 · 2025-04-17T12:09:02.984Z · 2 comments

comment by Defender7762 · 2025-04-17T12:09:02.983Z

Results of the anti-fitting generalized reasoning test for o3 (high) and o4-mini (high): https://llm-benchmark.github.io/ (write-up: https://www.lesswrong.com/posts/CEHsJzBCmuhEDdNxg/debunk-the-myth-testing-the-generalized-reasoning-ability-of)

Disappointing. I expected it to do much better than Grok; this version does not seem to be the one ARC-AGI showcased in mid-December.

comment by Defender7762 · 2025-04-17T12:10:21.787Z

Click the to expand all questions and answers for all models.