Comments

Comment by Asta7k (elias-schneider) on Reactions to METR task length paper are insane · 2025-04-15T04:09:16.960Z

What are your current AGI timelines?

Comment by Asta7k (elias-schneider) on ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3 · 2025-04-14T22:02:11.573Z

Are you aware of the recent METR paper, which measured AI ability to complete long tasks and found that the length of tasks AI can complete doubles roughly every 7 months?

Comment by Asta7k (elias-schneider) on METR: Measuring AI Ability to Complete Long Tasks · 2025-04-13T13:06:24.502Z

But then again, it seems we couldn't create a perfectly accurate plot from any set of models, since models are inherently different and each one has slight architectural variations. Even the 2024–2025 plot isn't entirely accurate, because the models it includes also differ to some extent. Comparing LLMs to LRMs (Large Reasoning Models) is reasonable, since LRMs are simply a natural step in the evolution of these models, and they will keep developing.


Comment by Asta7k (elias-schneider) on ≤10-year Timelines Remain Unlikely Despite DeepSeek and o3 · 2025-04-13T09:29:33.377Z

When do you expect agents or other AI systems to accelerate AI R&D by a significant margin, for example 2x relative to the current pace?

Comment by Asta7k (elias-schneider) on METR: Measuring AI Ability to Complete Long Tasks · 2025-04-12T20:14:46.240Z

Yes, they used a 50% success-rate threshold, and even then, as the graph shows, some sub-10-minute tasks still give LLMs trouble. But I think this will improve as well as the algorithms get better.