Defender7762's Shortform
post by Defender7762 · 2025-04-17T12:09:02.984Z · LW · GW · 2 comments
2 comments
comment by Defender7762 · 2025-04-17T12:09:02.983Z · LW(p) · GW(p)
Anti-fitting generalized reasoning test for o3-high / o4-mini-high: https://llm-benchmark.github.io/ https://www.lesswrong.com/posts/CEHsJzBCmuhEDdNxg/debunk-the-myth-testing-the-generalized-reasoning-ability-of
Disappointing. I thought it would be much better than Grok. It seems that this version cannot be the one shown in the ARC-AGI results in mid-December.
Replies from: Defender7762
↑ comment by Defender7762 · 2025-04-17T12:10:21.787Z · LW(p) · GW(p)
Click to expand all questions and answers for all models.