"AI Rapidly Gets Smarter, And Makes Some of Us Dumber," from Sabine Hossenfelder

post by Evan_Gaensbauer · 2025-02-26T22:33:43.688Z · LW · GW · 6 comments

This is a link post for https://youtu.be/ipLA7E-X7Lk?si=hAOojfv5PyY24Gw4

Contents

  Grok 3
    Performance
    Application
  Google
  OpenAI
  Microsoft
6 comments

Sabine Hossenfelder is a theoretical physicist and science communicator who provides analysis and commentary on a variety of science and technology topics. I mention that upfront for anyone who isn't already familiar, since I understand a link post to some video full of hot takes on AI from some random YouTuber wouldn't be appreciated.

Even more than usual, 2025 has so far seen a rapid series of developments in the performance of AI agents and programs compared to that of humans, so in this video Hossenfelder summarizes some of the most significant recent breakthroughs and findings.

Here's a summary of the reviews of recent developments in AI covered in the video.

Grok 3

Performance

xAI released its most recent model, Grok 3, a week ago. Grok 3 outperformed the current iterations of competing models (DeepSeek, OpenAI's models, etc.) on most benchmarks, including mathematics, coding, and scientific reasoning. In the last year, the rate of improvement in the performance of Grok models has outpaced that of OpenAI's and Anthropic's models and is now more comparable to that of DeepSeek. One advantage Grok 3 now has over OpenAI and DeepSeek in access to training data is its more exclusive access to data from Twitter/X. Grok 3 is also the first general-purpose AI model to exceed 10^25 FLOP in training compute. This exceeds a threshold set in the European Union AI Act, so Grok 3 will now need to undergo more safety tests to remain usable in the EU.

Application

A current disadvantage of Grok 3 in its application is that the now-standard generative AI function and the more novel 'reasoning' function can't be used at the same time, e.g., to answer queries. Grok 3 still has the same problems as previous LLMs, including hallucinations and being easy to jailbreak, which can include providing instructions to build bombs or unconventional weapons.

Google

Last week Google announced an AI super-agent that the company is calling an 'AI co-scientist,' designed specifically to help scientists discover or search for new research hypotheses and topics for potential grant proposals. The AI co-scientist, as a super-agent, supervises six sub-agents, each assigned a role based on a step in the hypothesis-generation procedure (e.g., idea generation, criticism, modification, etc.). Google is still trialing this model with scientists, and it hasn't yet been made publicly available.

OpenAI

OpenAI CEO Sam Altman announced on X on February 12th that GPT-4.5 will be released "soon," with the release of GPT-5 coming within a few weeks or months after that. More specific dates or timelines weren't provided.

Microsoft

Microsoft published the results of a recently completed study tracking the impact on critical thinking of frequent, day-to-day use of AI among 319 subjects who work in a variety of knowledge-based/technical fields. Those with high confidence in the capabilities of AI used their own critical thinking skills less. Those with less confidence in AI relative to their confidence in themselves, e.g., in identifying proper answers to queries, used critical thinking skills more. (This is the part of the video referring to how AI "makes some of us dumber," which is sort of a clickbait-y way of describing it, though it seems like it could nonetheless be a notable finding.) The linked video doesn't cover or summarize how 'confidence' or 'critical thinking' were operationalized in the study by Microsoft.

6 comments


comment by Raemon · 2025-02-26T22:38:58.871Z · LW(p) · GW(p)

It'd be nice to have the key observations/evidence in the tl;dr here. I'm worried about this but would like to stay grounded in how bad it is exactly.

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-27T01:52:47.885Z · LW(p) · GW(p)

I've now summarized those details as they were presented in the video. 'Staying more grounded in how bad it is' with more precision would require you, or whoever else, to learn more about these developments from the respective companies on your own, though the summaries I've now provided can hopefully serve as a starting point for doing so.

Replies from: Raemon
comment by Raemon · 2025-02-27T02:10:32.547Z · LW(p) · GW(p)

Yep, thank you!

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-27T23:07:31.449Z · LW(p) · GW(p)

I appreciate that, though it seems since yesterday my post may have been downvoted even more. I wouldn't mind as much except nobody has explained why when I bothered putting in the effort. I could think maybe it's because of the clickbait-y title, or on account of the fact that it's a YouTube video meant to convey important info about AI to, like, normies in a mainstream way, and is therefore assumed to be of super low quality.

Yet that'd be in spite of the facts that:
1. I clarified this is from a theoretical physicist and science communicator who's trying to inform the public in an approachable way, which is something I figure others on LessWrong could appreciate.
2. I have now summarized the details as best as I can so that others on LessWrong don't need to bother with digesting the info through a medium they don't prefer, in spite of the fact that having that available in a multimedia format is more preferred by others, so it could be recognized how it's constructive that there are multiple options.

I suspect part of it might just be a latent preference on LessWrong for the sort of lengthy blog posts in a style they're accustomed to, which is valid, but a tendency to presume the same sort of info they like being exposed to but delivered in a different way means it must be lower quality. That could be a bias I might question, though it's fair enough if others just disagree. I'm just hoping you can offer insight into whether I should keep bothering with the effort of posts like this because I'm the one who's off here, or others just have superficial reactions. 

Replies from: gwern, lechmazur
comment by gwern · 2025-02-28T03:06:08.549Z · LW(p) · GW(p)

I suspect part of it might just be a latent preference on LessWrong for the sort of lengthy blog posts in a style they're accustomed to, which is valid, but a tendency to presume the same sort of info they like being exposed to but delivered in a different way means it must be lower quality

You wrote a low quality summary of a low quality video of no particular importance by a talking head whose expertise has little to do with AI, about events described more informatively in other secondary sources like Zvi's newsletter, where you failed to follow up on basic details, like failing to name or link the study in the final item, and even pointing out how low-quality your own summary is & how you added nothing (despite praising your own "effort" repeatedly):

The linked video doesn't cover or summarize how 'confidence' or 'critical thinking' were operationalized in the study by Microsoft.

I do not think you really have to ask why your post is not being upvoted to the skies.

I wouldn't mind as much except nobody has explained why when I bothered putting in the effort...I'm just hoping you can offer insight into whether I should keep bothering with the effort of posts like this because I'm the one who's off here.

If you are spending a lot of "effort" on posts like this and you are upset by the reception, I would suggest that this is not your forte, and you are better off finding something that plays to your strengths (or is at least more your comparative advantage).

comment by Lech Mazur (lechmazur) · 2025-02-28T03:14:50.344Z · LW(p) · GW(p)

It's a video by an influencer who has repeatedly shown no particular insight in any field other than her own. For example, her video about the simulation hypothesis was atrocious. I gave this one a chance, and it's just a high-level summary of some recent developments, nothing interesting.