LessWrong 2.0 Reader

[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (26)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (20)

One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)

[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (6)

To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (7)

The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (5)

Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)

[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)

Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (2)

OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak (tomek-korbak) · 2025-04-14T16:45:46.584Z · comments (0)

Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)

A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (20)

D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (2)

[link] Unbendable Arm as Test Case for Religious Belief
Ivan Vendrov (ivan-vendrov) · 2025-04-14T01:57:12.013Z · comments (30)

[link] Nucleic Acid Observatory Updates, April 2025
jefftk (jkaufman) · 2025-04-15T18:58:29.839Z · comments (0)

[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (4)

Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)

The Last Light
Bridgett Kay (bridgett-kay) · 2025-04-14T15:41:02.745Z · comments (0)

Offer: Team Conflict Counseling for AI Safety Orgs
Severin T. Seehrich (sts) · 2025-04-14T15:17:00.835Z · comments (1)

Ctrl-Z: Controlling AI Agents via Resampling
abhatt349 · 2025-04-16T16:21:23.781Z · comments (0)

[link] The real reason AI benchmarks haven’t reflected economic impacts
Noosphere89 (sharmake-farah) · 2025-04-15T13:44:06.225Z · comments (0)

[link] Slopworld 2035: The dangers of mediocre AI
titotal (lombertini) · 2025-04-14T13:14:08.390Z · comments (6)

A Talmudic Rationalist Cautionary Tale
Noah Birnbaum (daniel-birnbaum) · 2025-04-15T04:11:16.972Z · comments (1)

[link] Should AIs be Encouraged to Cooperate?
PeterMcCluskey · 2025-04-15T21:57:06.096Z · comments (1)

Risers for Foot Percussion
jefftk (jkaufman) · 2025-04-15T11:10:08.577Z · comments (0)

What empirical research directions has Eliezer commented positively on?
Chris_Leong · 2025-04-15T08:53:41.677Z · comments (1)

[link] Can LLM-based models do model-based planning?
jylin04 · 2025-04-16T12:38:00.793Z · comments (1)

[link] Top OpenAI Catastrophic Risk Official Steps Down Abruptly
garrison · 2025-04-16T16:04:28.115Z · comments (0)

$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project
Ramon Gonzalez (ramon-gonzalez) · 2025-04-15T00:46:10.637Z · comments (0)

[link] AISN #51: AI Frontiers
Corin Katzke (corin-katzke) · 2025-04-15T16:01:56.701Z · comments (1)

The Mirror Problem in AI: Why Language Models Say Whatever You Want
RobT · 2025-04-15T18:40:02.793Z · comments (2)

Luna Lovegood and the Chamber of Secrets, Part 5
Kongo Landwalker (kongo-landwalker) · 2025-04-14T00:10:36.028Z · comments (0)

[link] 3M Subscriber YouTube Account 'Channel 5' Reporting On Rationalism
sakraf · 2025-04-15T13:02:33.736Z · comments (0)

Sam Altman's sister claims Sam sexually abused her -- Part 8: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:42:53.705Z · comments (0)

Some OthelloGPT Circuits
Alfred Wong (alfred-wong) · 2025-04-15T18:41:36.216Z · comments (0)

What if there was a nuke in Manhattan and why that could be a good thing
Ratburn · 2025-04-15T00:19:41.844Z · comments (10)

How Logic "Really" Works: An Engineering Perspective
Daniil Strizhov (mila-dolontaeva) · 2025-04-16T05:34:09.443Z · comments (0)

Gamify life from BayesianMind
P. João (gabriel-brito) · 2025-04-16T16:17:49.284Z · comments (0)

Creating 'Making God': a Feature Documentary on risks from AGI
Connor Axiotes (connor-axiotes-1) · 2025-04-15T02:56:09.206Z · comments (0)

How to Defend the Indefensible
Alex Beyman (alexbeyman) · 2025-04-15T07:45:15.971Z · comments (1)

What happens when LLMs learn new things? & Continual learning forever.
sunchipsster · 2025-04-15T18:38:35.166Z · comments (0)

Sam Altman's sister claims Sam sexually abused her -- Part 7: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:43:28.897Z · comments (0)

Opportunity to learn more about AI Innovation & Security Policy
PolicyTakes · 2025-04-16T01:35:27.203Z · comments (0)

[link] AI is advancing fast
Vishakha (vishakha-agrawal) · 2025-04-16T08:17:06.055Z · comments (0)

[link] Human-level is not the limit
Vishakha (vishakha-agrawal) · 2025-04-16T08:33:15.498Z · comments (2)

[link] AI may attain human level soon
Vishakha (vishakha-agrawal) · 2025-04-16T08:28:55.592Z · comments (0)

An artistic illustration of Scalable Oversight - "A world apart, neither gods nor mortals"
Marius Adrian Nicoară · 2025-04-16T12:41:44.874Z · comments (0)

[link] The road from human-level to superintelligent AI may be short
Vishakha (vishakha-agrawal) · 2025-04-16T08:35:54.376Z · comments (0)

Recent comments

dalcy on Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

By the way, Gemini 2.5 Pro and o3-mini-high are good at tic-tac-toe. I was surprised because the last time I tested this, on o1-preview, it did quite terribly.

thane-ruthenis on johnswentworth's Shortform

The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most

... or, if we do live in a world in which LLMs are not AGI-complete, it might accelerate timelines. After all, this would force the capabilities people to turn their brains on again instead of mindlessly scaling, and that might lead to them stumbling on something which is AGI-complete. And it would, due to a design constraint, need much less compute for committing omnicide.

How likely would that be? Companies/people able to pivot like this would need to be live players, capable of even conceiving of new ideas that aren't "scale LLMs". Naturally, that means 90% of the current AI industry would be out of the game. But then, 90% of the current AI industry aren't really pushing the frontier today either; that wouldn't be much of a loss.

To what extent are the three AGI labs alive vs. dead players, then?

OpenAI was certainly alive back in 2022. Maybe the coup and the exoduses killed it and it's now a corpse whose apparent movement is just inertial (the reasoning models were invented prior to the coup, if Q* rumors are to be trusted, so they're little evidence that OpenAI was still alive in 2024). But maybe not.
Anthropic houses a bunch of the best OpenAI researchers now, and it's apparently capable of inventing some novel tricks (whatever's the mystery behind Sonnet 3.5 and 3.6).
DeepMind is even now consistently outputting some interesting non-LLM research.

I think there's a decent chance that they're alive enough. Currently, they're busy eating the best AI researchers and turning them into LLM researchers. If they stop focusing people's attention on the potentially-doomed paradigm, if they're forced to correct the mistake (on this model) that they're making...

This has always been my worry about all the proposals to upper-bound FLOPs, complicated by my uncertainty regarding whether LLMs are or are not AGI-complete after all.

One major positive effect this might have is memetic. It might create the impression of an (artificially created) AI Winter, causing people to reflexively give up. In addition, not having an (apparent) in-paradigm roadmap to AGI would likely dissolve the race dynamics, both between AGI companies and between geopolitical entities. If you can't produce straight-line graphs suggesting godhood by 2027, and are reduced to "well we probably need a transformer-sized insight here...", it becomes much harder to generate hype and alarm that would be legible to investors and politicians.

But then, in worlds in which LLMs are not AGI-complete, how much actual progress to AGI is happening due to the race dynamic? Is it more or less progress than would be produced by a much-downsized field in the counterfactual in which LLM research is banned? And how much downsizing would it actually cause, now that the ideas of AGI and the Singularity have gone mainstream-ish?

On balance, upper-bounding FLOPs is probably still a positive thing to do. But I'm not really sure.

adam-karvonen on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

I do agree that it looks like there has been a lack of data to address this ability. That being said, I'm pretty surprised at how terrible models are, and there's a hierarchy of problems to be addressed here before models are actually useful in the physical world. Each step feels much more difficult than the step before, and all models are completely terrible at steps 2-4.

1. First, simply look at a part and identify features / if a part is symmetric / etc. This requires basically no spatial reasoning ability, yet almost all models are completely terrible. Even Gemini is very bad. I'm pretty surprised that this ability didn't just fall out of scaling on data, but it does seem like this could be easily addressed with synthetic data.
2. Have some basic spatial reasoning ability where you can propose operations that are practical and aren't physically impossible. This is much more challenging. First, it could be difficult to automatically generate practical solutions. Secondly, it may require moving beyond text chain of thought - when I walk through a setup, I don't use language at all and just visualize everything.
3. Have an understanding of much of the tacit knowledge in machining, or rederive everything from first principles. Getting data could be especially challenging here.
4. Once you can create a single part correctly, now propose multiple different ways to manufacture the part. Evaluate all of the different plans and choose the best combination of cost, simplicity, and speed. This is the part of the job that's actually challenging.

ann-brown on Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

DeepSeek-R1 is currently the best model at creative writing as judged by Sonnet 3.7 (https://eqbench.com/creative_writing.html). This doesn't necessarily correlate with human preferences, including coherence preferences. But having interacted with DeepSeek-v3 (original flavor), DeepSeek-R1-Zero, and DeepSeek-R1, I personally think R1's unique flavor in creative outputs slipped in when the thinking process got RL'd for legibility. This isn't a particularly intuitive way to solve for creative writing with reasoning capability, but it gestures at the potential in "solving for writing", given that some feedback on writing style (even orthogonal feedback) seems to have a significant impact on creative tasks.

Edit: Another (cheaper to run) comparison for creative capability in reasoning models is QwQ-32B vs Qwen2.5-32B (the base model) and Qwen2.5-32B-Instruct (original instruct tune, not clear if in the ancestry of QwQ). Basically I do not consider 3.7 currently a "reasoning" model at the same fundamental level as R1 or QwQ, even though they have learned to make use of reasoning better than they would have without training on it, and evidence from them about reasoning models is weaker.

lc on Shortform

Works now for me

ryan_greenblatt on To be legible, evidence of misalignment probably has to be behavioral

I'm not claiming that internals-based techniques aren't useful, just that internals-based techniques probably aren't that useful for specifically producing legible evidence of misalignment. Detecting misalignment with internals-based techniques could be useful for other reasons (which I list in the post) and internals-based techniques could be used for applications other than detecting misalignment (e.g. better understanding some misaligned behavior).

If internals-based techniques are useful for further investigating misalignment, that seems good. And I think I agree that if we first find legible evidence of misalignment behaviorally and internals-based methods pick this up (without known false positives), then this will make future evidence with internals-based techniques more convincing. However, I think it might not end up being that much more convincing in practice unless this happens many times with misalignment which occurs in production models.

jonas-hallgren on ASI existential risk: Reconsidering Alignment as a Goal

I will check it out! Thanks!

johnswentworth on johnswentworth's Shortform

This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another [...]

Nope!

Even if the base models are improving, it can still be true that most of the progress measured on the benchmarks is fake, and has basically-nothing to do with the real improvements.

Even if the base models are improving, it can still be true that the dramatic-sounding papers and toy models are fake, and have basically-nothing to do with the real improvements.

Even if the base models are improving, the propaganda about it can still be overblown and mostly fake, and have basically-nothing to do with the real improvements.

Even if the base models are improving, the people who feel real scared and just want to be validated can still be doing fake work and in fact be mostly useless, and their dynamic can still have basically-nothing to do with the real improvements.

Just because the base models are in fact improving does not mean that all this other stuff is actually coupled to the real improvement.

johnswentworth on johnswentworth's Shortform

Sure, they are more-than-zero helpful. Heck, in a relative sense, they'd be one of the biggest wins in AI safety to date. But alas, reality does not grade on a curve.

One has to bear in mind that the words on that snapshot do not all accurately describe reality in the world where SB1047 passes. "Implement shutdown ability" would not in fact be operationalized in a way which would ensure the ability to shut down an actually-dangerous AI, because nobody knows how to do that. "Implement reasonable safeguards to prevent societal-scale catastrophes" would in fact be operationalized as checking a few boxes on a form and maybe writing some docs, without changing deployment practices at all, because the rules for the board responsible for overseeing these things made that board pretty easy for the labs to capture.

When I discussed the bill with some others at the time, the main takeaway was that the actually-substantive part was just putting any bureaucracy in place at all to track which entities are training models over 10^26 FLOP/$100M. The bill seemed unlikely to do much of anything beyond that.

Even if the bill had been much more substantive, it would still run into the standard problems of AI regulation: we simply do not have a way to reliably tell which models are and are not dangerous, so the choice is to either ban a very large class of models altogether, or allow models which will predictably be dangerous sooner or later. The most commonly proposed substantive proxy is to ban models over a certain size, which would likely slow down timelines by a factor of 2-3 at most, but definitely not slow down timelines by a factor of 10 or more.

davidmanheim on Can LLM-based models do model-based planning?

Very interesting work. One question I've had about this is whether humans can do such planning 'natively', i.e. in our heads, or if we're using tools in ways that are essentially the same as doing "model-based planning inefficiently, with... bottleneck being a potential need to encode intermediate states."