AI Rapidly Gets Smarter, And Makes Some of Us Dumber

post by Evan_Gaensbauer · 2025-02-26T22:33:43.688Z · LW · GW · 3 comments

This is a link post for https://youtu.be/ipLA7E-X7Lk?si=hAOojfv5PyY24Gw4

Contents

  Grok 3
    Performance
    Application
  Google
  OpenAI
  Microsoft
3 comments

Sabine Hossenfelder is a theoretical physicist and science communicator who provides analysis and commentary on a variety of science and technology topics. I mention that upfront for anyone who isn't already familiar, since I understand a link post to some video full of hot takes on AI from some random YouTuber wouldn't be appreciated.

Even more than usual, 2025 has so far seen a rapid series of developments in the performance of AI agents and programs relative to that of humans, so in this video Hossenfelder summarizes some of the most significant recent breakthroughs and findings.

Here's a summary of the recent developments in AI reviewed in the video.

Grok 3

Performance

xAI released its most recent model, Grok 3, a week ago. Grok 3 outperformed the current iterations of competing models (DeepSeek, OpenAI's models, etc.) on most benchmarks, including mathematics, coding, and scientific reasoning. In the last year, the rate of improvement in the performance of Grok models has outpaced that of OpenAI's and Anthropic's models, and is now more comparable to that of DeepSeek's. One advantage Grok 3 has over OpenAI and DeepSeek is access to more training data, namely exclusive data from Twitter/X. Grok 3 is also the first general-purpose AI model to exceed 10^25 FLOP in training compute. This exceeds a threshold set in the European Union AI Act, so Grok 3 will now be subject to more safety tests to remain usable in the EU.

Application

A current disadvantage of Grok 3 in application is that the now-standard generative AI function and the more novel 'reasoning' function can't be used at the same time, e.g., to answer a single query. Grok 3 still has the same problems as previous LLMs, including hallucinations and being easy to jailbreak, to the point of providing instructions for building bombs or unconventional weapons.

Google

Last week Google announced an AI super-agent, which the company is calling an 'AI co-scientist,' designed to help scientists discover or search for new research hypotheses and topics for potential grant proposals. The AI co-scientist, as a super-agent, supervises six sub-agents, each assigned a role in the hypothesis-generation procedure (e.g., idea generation, criticism, modification, etc.). Google is still trialing this model with scientists, and it hasn't yet been made publicly available.

OpenAI

OpenAI CEO Sam Altman announced on X on February 12th that GPT-4.5 will be released "soon," with the release of GPT-5 coming within a few weeks or months after that. More specific dates or timelines weren't provided.

Microsoft

Microsoft published the results of a recently completed study of how frequent, day-to-day use of AI affects critical thinking, conducted with 319 subjects who work in a variety of knowledge-based/technical fields. Those with high confidence in the capabilities of AI used their own critical thinking skills less. Those with less confidence in AI relative to their confidence in themselves, e.g., in identifying proper answers to queries, used critical thinking skills more. (This is the part of the video referring to how AI "makes some of us dumber," which is sort of a clickbait-y way of describing it, though it seems like it could nonetheless be a notable finding.) The linked video doesn't cover or summarize how 'confidence' or 'critical thinking' were operationalized in the study by Microsoft.

3 comments

Comments sorted by top scores.

comment by Raemon · 2025-02-26T22:38:58.871Z · LW(p) · GW(p)

It'd be nice to have the key observations/evidence in the tl;dr here. I'm worried about this but would like to stay grounded in how bad it is exactly.

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-27T01:52:47.885Z · LW(p) · GW(p)

I've now summarized those details as they were presented in the video. 'Staying more grounded in how bad it is' with more precision would require you, or anyone else, to learn more about these developments from the respective companies on your own, though the summaries I've now provided can hopefully serve as a starting point for doing so.

Replies from: Raemon
comment by Raemon · 2025-02-27T02:10:32.547Z · LW(p) · GW(p)

Yep, thank you!