"AI Rapidly Gets Smarter, And Makes Some of Us Dumber," from Sabine Hossenfelder

post by Evan_Gaensbauer · 2025-02-26T22:33:43.688Z · LW · GW · 9 comments

This is a link post for https://youtu.be/ipLA7E-X7Lk?si=hAOojfv5PyY24Gw4

Contents

  Grok 3
    Performance
    Application
  Google
  OpenAI
  Microsoft
9 comments

Sabine Hossenfelder is a theoretical physicist and science communicator who provides analysis and commentary on a variety of science and technology topics. I mention that upfront for anyone who isn't already familiar, since I understand a link post to some video full of hot takes on AI from some random YouTuber wouldn't be appreciated.

Even more than usual, 2025 has so far seen rapid developments in the performance of AI agents and programs relative to that of humans, so in this video Hossenfelder summarizes some of the most significant recent breakthroughs and findings.

Here's a summary of the recent developments in AI covered in the video.

Grok 3

Performance

xAI released its most recent model, Grok 3, a week ago. Grok 3 outperformed the current iterations of competing models (DeepSeek, OpenAI's models, etc.) on most benchmarks, including mathematics, coding, and scientific reasoning. Over the last year, the rate of improvement of Grok models has outpaced that of OpenAI's and Anthropic's, and is now more comparable to DeepSeek's. One advantage in access to training data that Grok 3 now has over OpenAI and DeepSeek is exclusive data from Twitter/X. Grok 3 is also the first general-purpose AI model to exceed 10^25 FLOP in training compute. This crosses a threshold set in the European Union's AI Act, so Grok 3 will now be subject to additional safety tests to remain usable in the EU.

Application

A current disadvantage of Grok 3 in application is that the now more standard generative function and the more novel 'reasoning' function can't be used at the same time, e.g., to answer a single query. Grok 3 still exhibits the same problems as previous LLMs, including hallucinations and ease of jailbreaking, e.g., it can be induced to provide instructions for building bombs or unconventional weapons.

Google

Last week Google announced an AI super-agent, which the company calls an 'AI co-scientist,' designed to help scientists discover or search for new research hypotheses and topics for potential grant proposals. The AI co-scientist, as a super-agent, supervises six sub-agents, each assigned a role in the procedure for hypothesis generation (e.g., idea generation, criticism, modification). Google is still trialing this model with scientists; it hasn't yet been made publicly available.
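To make the super-agent/sub-agent division of labor concrete, here is a minimal sketch of how such a supervisory loop could be organized. The role names, the `Supervisor` and `SubAgent` classes, and the control flow are illustrative assumptions based only on the description above, not Google's actual implementation.

```python
# Minimal sketch of a supervisor ("super-agent") coordinating role-based
# sub-agents for hypothesis generation. Roles, prompts, and control flow
# are hypothetical placeholders, not Google's implementation.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    critiques: list = field(default_factory=list)

class SubAgent:
    def __init__(self, role: str):
        self.role = role  # e.g., "generate", "criticize", "modify"

    def run(self, hyp: Hypothesis) -> Hypothesis:
        # Placeholder for an LLM call scoped to this agent's role.
        if self.role == "criticize":
            hyp.critiques.append(f"[{self.role}] possible confound noted")
        elif self.role == "modify":
            hyp.text += " (revised to address critiques)"
        return hyp

class Supervisor:
    """Routes each step of the hypothesis pipeline to a dedicated sub-agent."""
    def __init__(self, roles: list[str]):
        self.agents = [SubAgent(r) for r in roles]

    def propose(self, research_goal: str, rounds: int = 2) -> Hypothesis:
        hyp = Hypothesis(text=f"Initial hypothesis for: {research_goal}")
        for _ in range(rounds):
            for agent in self.agents:
                hyp = agent.run(hyp)
        return hyp

# Six hypothetical roles, loosely mirroring the described pipeline.
supervisor = Supervisor(["generate", "criticize", "modify",
                         "rank", "compare", "review"])
print(supervisor.propose("mechanisms of antibiotic resistance transfer").text)
```

The point of the pattern is that each sub-agent only ever sees its own narrow task, while the supervisor owns the overall iteration, which is presumably what lets the system refine a hypothesis over multiple criticize/modify rounds.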

OpenAI

OpenAI CEO Sam Altman announced on X on February 12th that GPT-4.5 will be released "soon," with the release of GPT-5 coming within a few weeks or months after that. More specific dates or timelines weren't provided.

Microsoft

Microsoft published the results of a recently completed study tracking the impact of frequent, day-to-day use of AI on the critical thinking of 319 subjects who work in a variety of knowledge-based/technical fields. Those with high confidence in the capabilities of AI exercised their own critical-thinking skills less. Those with less confidence in AI relative to their confidence in their own abilities, e.g., at identifying proper answers to queries, exercised critical-thinking skills more. (This is the part of the video referring to how AI "makes some of us dumber," which is sort of a clickbait-y way of describing it, though it seems like it could nonetheless be a notable finding.) The linked video doesn't cover or summarize how 'confidence' or 'critical thinking' were operationalized in the study by Microsoft.

9 comments

Comments sorted by top scores.

comment by Raemon · 2025-02-26T22:38:58.871Z · LW(p) · GW(p)

It'd be nice to have the key observations/evidence in the tl;dr here. I'm worried about this but would like to stay grounded in how bad it is exactly.

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-27T01:52:47.885Z · LW(p) · GW(p)

I've now summarized those details as they were presented in the video. 'Staying more grounded in how bad it is' with more precision would require you, or whoever, to learn more about these developments from the respective companies on your own, though the summaries I've now provided can hopefully serve as a starting point for doing so.

Replies from: Raemon
comment by Raemon · 2025-02-27T02:10:32.547Z · LW(p) · GW(p)

Yep, thank you!

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-27T23:07:31.449Z · LW(p) · GW(p)

I appreciate that, though it seems since yesterday my post may have been downvoted even more. I wouldn't mind as much except nobody has explained why when I bothered putting in the effort. I could think maybe it's because of the clickbait-y title, or on account of the fact that it's a YouTube video meant to convey important info about AI to, like, normies in a mainstream way, and is therefore assumed to be of super low quality.

Yet that'd be in spite of facts that:
1. I clarified this is from a theoretical physicist and science communicator who's trying to inform the public in an approachable way, which is something I figure others on LessWrong could appreciate.
2. I have now summarized the details as best as I can so that others on LessWrong don't need to digest the info through a medium they don't prefer, while the multimedia format remains available for those who do prefer it, so it could be recognized that having multiple options is constructive.

I suspect part of it might just be a latent preference on LessWrong for the sort of lengthy blog posts in a style they're accustomed to, which is valid, but a tendency to presume that the same sort of info they like being exposed to, delivered in a different way, must be lower quality. That could be a bias I might question, though it's fair enough if others just disagree. I'm just hoping you can offer insight into whether I should keep bothering with the effort of posts like this because I'm the one who's off here, or others just have superficial reactions. 

Replies from: gwern, lechmazur
comment by gwern · 2025-02-28T03:06:08.549Z · LW(p) · GW(p)

I suspect part of it might just be a latent preference on LessWrong for the sort of lengthy blog posts in a style they're accustomed to, which is valid, but a tendency to presume that the same sort of info they like being exposed to, delivered in a different way, must be lower quality

You wrote a low quality summary of a low quality secondary-source video of no particular importance by a talking head whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus), about events described more informatively in other secondary sources like Zvi's newsletter, where you added no original information or thought, and failed to follow up on basic details, like failing to name or link the study in the final item, and even pointing out how low-quality your own summary is & how you added nothing (despite praising your own "effort" repeatedly):

The linked video doesn't cover or summarize how 'confidence' or 'critical thinking' were operationalized in the study by Microsoft.

I do not think you really have to ask why your post is not being upvoted to the skies.

I wouldn't mind as much except nobody has explained why when I bothered putting in the effort...I'm just hoping you can offer insight into whether I should keep bothering with the effort of posts like this because I'm the one who's off here.

If you are spending a lot of "effort" on posts like this and you are upset by the reception, I would suggest that this sort of tertiary source writing is not your forte, and you are better off finding something that plays to your strengths (or is at least more your comparative advantage).

To be honest, I was surprised to read your comment complaining about your human effort not being appreciated, because I had assumed, when reading it originally, that a post this derivative had to have been written by a low-end LLM whose use you had chosen not to disclose.

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-28T21:33:26.706Z · LW(p) · GW(p)

You wrote a low quality summary of a low quality secondary-source video of no particular importance by a talking head whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus)

You're right that I was probably exaggerating when I said it was the best effort I could provide. It was more like what I expected would be considered a basic, accurate summary I could generate in a brief period of time. 

low quality secondary-source video of no particular importance by a talking head whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus)

That the source itself is not considered to be of particularly significant quality or importance makes sense, given that my post is only lightly downvoted relative to the number of votes it has received. While of course her expertise isn't that relevant to AI, the fact that she has expertise in a sufficiently technical field seemed to me worth clarifying, to indicate she shouldn't be strongly suspected of presenting the information based on a wild misunderstanding. I wasn't aware that Gary Marcus has previously criticized the quality of her coverage of these sorts of issues, or whatnot, so I'll keep in mind for the future that she shouldn't be regarded as a reliable source. 

That she, or anyone else, might apparently be a "talking head whose expertise has little to do with AI" doesn't by itself seem like it'd be a strong argument against taking the person seriously as a source, relative to the standards on LW, given that the same could be said of many whose contributions are frequently well-received on LW. Similar criticisms have frequently been leveled at Eliezer Yudkowsky, or could be at Scott Alexander. I'd be unmoved by many such criticisms for the same reason as anyone else, though that criticism of others who've for longer been well-received on LW could be warranted. There are also many with expertise in AI who among rationalists are often dismissed as talking heads.

where you added no original information or thought, and failed to follow up on basic details, like failing to name or link the study in the final item

I wasn't aware that additional info or analysis/commentary beyond the contents of the source was expected. Anyone could follow up on basic details as easily as I could if they were curious to learn more, and I'm technically not obliged to do so myself; then again, I'm also not entitled to be well received if I don't bother citing other sources, so it seems fair enough that others would be nonplussed by that. 

I do not think you really have to ask why your post is not being upvoted to the skies.

I agree, which is why I didn't. I asked why it was being downvoted.

(despite praising your own "effort" repeatedly)

I didn't praise my effort but mentioned that I put in any. I didn't mean to use the word 'effort' in any exaggerated sense. There's no need to diminish it as though it's not technically true. If someone takes two minutes to brush their teeth, I'd consider saying they put in two minutes of effort to be as appropriate a way of describing that as any other. 

If you are spending a lot of "effort" on posts like this and you are upset by the reception

I wasn't so much upset by the reaction as frustrated that nobody had bothered to explain why it was mostly receiving downvotes. I was aware the post might be banal, though it also seemed innocuous enough that I didn't expect it to be mostly downvoted, as though it somehow particularly subtracts from the quality of content on LW. I now understand the reasons better, so thanks for explaining.

I mentioned above:

I'm just hoping you can offer insight into whether I should keep bothering with the effort of posts like this because I'm the one who's off here, or others just have superficial reactions. 

You've offered enough that I understand that the answer to my own question is that it was mostly the former--that I was the one who was off--so I'm satisfied by this response. 

comment by Lech Mazur (lechmazur) · 2025-02-28T03:14:50.344Z · LW(p) · GW(p)

It's a video by an influencer who has repeatedly shown no particular insight in any field other than her own. For example, her video about the simulation hypothesis was atrocious. I gave this one a chance, and it's just a high-level summary of some recent developments, nothing interesting.

Replies from: Evan_Gaensbauer
comment by Evan_Gaensbauer · 2025-02-28T21:35:55.100Z · LW(p) · GW(p)

I peruse her content occasionally, but I wasn't aware that the quality of her analysis/commentary is widely recognized as varying so wildly, and as often being particularly lacklustre outside of her own field. Gwern mentioned that Gary Marcus has apparently said as much in the past when it comes to her coverage of AI topics. I'll refrain from citing her as a source in the future. 

Replies from: gwern
comment by gwern · 2025-02-28T22:36:42.788Z · LW(p) · GW(p)

I didn't mean Marcus had said anything about Sabine. What I meant by "whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus)" is that 'a Gary Marcus' is 'regarded as' having 'expertise [much] to do with AI' and that is why, even though Marcus has been wrong about pretty much everything and has very little genuine expertise about AI these days, ie. DL scaling (and is remarkably inept at even the most basic entry-level use of LLMs) and his writings are intrinsically not worth the time it takes to read them, he is still popular and widely-regarded-as-an-expert and so it is useful to keep tabs on 'oh great, what's Marcus saying now that everyone is going to repeat for years to come?' You can read someone because they are right & informative, or you can read someone because they are wrong & uninformative but everyone else reads them; but you shouldn't read someone who is neither right nor read. So, you grit your teeth and wade into the Marcus posts that go viral...