LessWrong 2.0 Reader


[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (12)
[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)
[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)
[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)
PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (6)
Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (28)
Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)
The principle of genomic liberty
TsviBT · 2025-03-19T14:27:57.175Z · comments (51)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (7)
100+ concrete projects and open problems in evals
Marius Hobbhahn (marius-hobbhahn) · 2025-03-22T15:21:40.970Z · comments (1)
[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)
I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (2)
Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (6)
AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)
Will compute bottlenecks prevent a software intelligence explosion?
Tom Davidson (tom-davidson-1) · 2025-04-04T17:41:37.088Z · comments (2)
AI CoT Reasoning Is Often Unfaithful
Zvi · 2025-04-04T14:50:05.538Z · comments (4)
Selective modularity: a research agenda
cloud · 2025-03-24T04:12:44.822Z · comments (2)
Going Nova
Zvi · 2025-03-19T13:30:01.293Z · comments (14)
LLM AGI will have memory, and memory changes alignment
Seth Herd · 2025-04-04T14:59:13.070Z · comments (9)
Renormalization Roadmap
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:34:16.352Z · comments (7)
Feedback loops for exercise (VO2Max)
Elizabeth (pktechgirl) · 2025-03-18T00:10:06.827Z · comments (9)
Apply to MATS 8.0!
Ryan Kidd (ryankidd44) · 2025-03-20T02:17:58.018Z · comments (4)
[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)
FrontierMath Score of o3-mini Much Lower Than Claimed
YafahEdelman (yafah-edelman-1) · 2025-03-17T22:41:06.527Z · comments (7)
Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)
[link] Sentinel's Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.
NunoSempere (Radamantis) · 2025-03-17T19:34:01.850Z · comments (3)
[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)
[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (32)
Solving willpower seems easier than solving aging
Yair Halberstadt (yair-halberstadt) · 2025-03-23T15:25:40.861Z · comments (28)
Socially Graceful Degradation
Screwtape · 2025-03-20T04:03:41.213Z · comments (9)
Housing Roundup #11
Zvi · 2025-04-01T16:30:03.694Z · comments (1)
How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)
Consider showering
bohaska (Bohaska) · 2025-04-01T23:54:26.714Z · comments (15)
Alignment faking CTFs: Apply to my MATS stream
joshc (joshua-clymer) · 2025-04-04T16:29:02.070Z · comments (0)
On Google’s Safety Plan
Zvi · 2025-04-11T12:51:12.112Z · comments (6)
OpenAI Responses API changes models' behavior
Jan Betley (jan-betley) · 2025-04-11T13:27:29.942Z · comments (6)
My "infohazards small working group" Signal Chat may have encountered minor leaks
Linch · 2025-04-02T01:03:05.311Z · comments (0)
Gemini 2.5 is the New SoTA
Zvi · 2025-03-28T14:20:03.176Z · comments (1)
Reframing AI Safety as a Neverending Institutional Challenge
scasper · 2025-03-23T00:13:48.614Z · comments (12)
Notes on countermeasures for exploration hacking (aka sandbagging)
ryan_greenblatt · 2025-03-24T18:39:36.665Z · comments (6)
On MAIM and Superintelligence Strategy
Zvi · 2025-03-14T12:30:07.451Z · comments (2)
AI #110: Of Course You Know…
Zvi · 2025-04-03T13:10:05.674Z · comments (9)
Against Yudkowsky's evolution analogy for AI x-risk [unfinished]
Fiora Sunshine (Fiora from Rosebloom) · 2025-03-18T01:41:06.453Z · comments (18)
We’re not prepared for an AI market crash
Remmelt (remmelt-ellen) · 2025-04-01T04:33:55.040Z · comments (12)
AI "Deep Research" Tools Reviewed
sarahconstantin · 2025-03-24T18:40:03.864Z · comments (5)
Introducing BenchBench: An Industry Standard Benchmark for AI Strength
Jozdien · 2025-04-02T02:11:41.555Z · comments (0)
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)
The vision of Bill Thurston
TsviBT · 2025-03-28T11:45:14.297Z · comments (34)
A collection of approaches to confronting doom, and my thoughts on them
Ruby · 2025-04-06T02:11:31.271Z · comments (18)
Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Czynski (JacobKopczynski) · 2025-03-29T02:51:29.786Z · comments (36)