LessWrong 2.0 Reader

Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (7)

Thread for Sense-Making on Recent Murders and How to Sanely Respond
Ben Pace (Benito) · 2025-01-31T03:45:48.201Z · comments (15)

Catastrophe through Chaos
Marius Hobbhahn (marius-hobbhahn) · 2025-01-31T14:19:08.399Z · comments (6)

[link] Steering Gemini with BiDPO
TurnTrout · 2025-01-31T02:37:55.839Z · comments (3)

[link] The Failed Strategy of Artificial Intelligence Doomers
Ben Pace (Benito) · 2025-01-31T18:56:06.784Z · comments (26)

Some articles in “International Security” that I enjoyed
Buck · 2025-01-31T16:23:27.061Z · comments (1)

In response to critiques of Guaranteed Safe AI
Nora_Ammann · 2025-01-31T01:43:05.787Z · comments (2)

DeepSeek: Don’t Panic
Zvi · 2025-01-31T14:20:08.264Z · comments (4)

[link] Takeaways from sketching a control safety case
joshc (joshua-clymer) · 2025-01-31T04:43:45.917Z · comments (0)

Review: The Lathe of Heaven
dr_s · 2025-01-31T08:10:58.673Z · comments (0)

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
Stuart_Armstrong · 2025-01-31T15:36:01.050Z · comments (1)

Re: Taste
lsusr · 2025-02-01T03:34:10.918Z · comments (0)

[question] Is weak-to-strong generalization an alignment technique?
cloud · 2025-01-31T07:13:03.332Z · answers+comments (0)

5,000 calories of peanut butter every week for 3 years straight
Declan Molony (declan-molony) · 2025-01-31T17:29:35.190Z · comments (4)

2024 was the year of the big battery, and what that means for solar power
transhumanist_atom_understander · 2025-02-01T06:27:39.082Z · comments (0)

[question] Strong, Stable, Open: Choose Two - in search of an article
Eli_ · 2025-01-31T14:48:21.438Z · answers+comments (0)

Proposal: Safeguarding Against Jailbreaking Through Iterative Multi-Turn Testing
jacquesallen · 2025-01-31T23:00:42.665Z · comments (0)

[link] Interviews with Moonshot AI's CEO, Yang Zhilin
Cosmia_Nebula · 2025-01-31T09:19:36.561Z · comments (0)

Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World
sweenesm · 2025-01-31T01:00:55.064Z · comments (0)

Safe Search is off: root causes of AI catastrophic risks
Jemal Young (ghostwheel) · 2025-01-31T18:22:43.947Z · comments (0)

Thoughts about Policy Ecosystems: The Missing Links in AI Governance
Echo Huang (echo-huang) · 2025-02-01T01:54:54.333Z · comments (0)

[question] How do biological or spiking neural networks learn?
Dom Polsinelli (dom-polsinelli) · 2025-01-31T16:03:38.425Z · answers+comments (0)

Can 7B-8B LLMs judge their own homework?
dereshev · 2025-02-01T08:29:32.639Z · comments (0)

Recent comments

tailcalled on The Failed Strategy of Artificial Intelligence Doomers

Correspondingly the importance I assign to increasing the intelligence of humans has drastically increased.

I feel like human intelligence enhancement would increase capabilities development faster than alignment development, maybe unless you've got a lot of discrimination in favor of only increasing the intelligence of those involved with alignment.

mitchell_porter on The Failed Strategy of Artificial Intelligence Doomers

I can't bring myself to read it properly. The author has an ax to grind, he wants interplanetary civilization and technological progress for humanity, and it's inconvenient to that vision if progress in one form of technology (AI) has the natural consequence of replacing humanity, or at the very least removing it from the driver's seat. So he simply declares "There is No Reason to Think Superintelligence is Coming Soon", and the one doomer strategy he does approve of - the enhancement of human biological intelligence - happens to be one that once again involves promoting a form of technological progress.

If there is a significant single failure behind getting to where we are now, perhaps it is the dissociation between "progress in AI" and "humanity being surpassed and replaced by AI" that has occurred. It should be common sense that the latter is the natural outcome of creating superhuman AI.

lc on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I know you're not endorsing the quoted claim, but just to make this super duper extra stupid explicit: running terrorist organizations is illegal, so this is the type of thing you would also say if Ziz was leading a terrorist organization, and you didn't want to see Ziz arrested.

juggins on Superintelligent AI will make mistakes

I think it was my error as I realise now the first paragraph was a confusing setup. I've trimmed it a bit so hopefully it won't be so any more!

vladimir_nesov on MiloSal's Shortform

DeepSeek-R1 ... Run RL to convergence

Not to convergence, the graphs in the paper keep going up. Which across the analogy might explain some of the change from o1 to o3 (the graphs in the o1 post also keep going up), though new graders coded for additional verifiable problems are no doubt a large part of it as well.

o3-mini has the same knowledge cutoff date as 4o and o1 (late 2023)

It seems like o1-mini is its own thing, might even start with a base model that's unrelated to GPT-4o-mini (it might be using its own specialized pretraining data mix). So a clue about o3-mini data doesn't obviously transfer to o3.

if it used GPT-5 as a base model

The numbering in GPT-N series advances with roughly 100x in raw compute at a time. If original GPT-4 is 2e25 FLOPs, then a GPT-5 would need 2e27 FLOPs, and a 100K H100s training system (like the Microsoft/OpenAI system at the site near the Goodyear airport) can only get you 3e26 FLOPs or so (in BF16 in 3 months). The initial Stargate training system at Abilene site, after it gets 300K B200s, will be 7x stronger than that, so will be able to get 2e27 FLOPs. Thus I expect GPT-5 in 2026 if OpenAI keeps following the naming convention, while the new 100K H100s model this year will be GPT-4.5o or something like that.
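The arithmetic in this comment can be checked with a rough sketch. The GPT-4 figure, the 100x step, the 3-month run and the 7x multiplier come from the comment; the per-GPU throughput (about 1e15 dense BF16 FLOP/s for an H100) and the 40% utilization are assumptions added here for illustration, not numbers the comment states.

```python
SECONDS_PER_DAY = 86_400


def training_flops(num_gpus, peak_flops_per_gpu, utilization, days):
    """Total FLOPs delivered by a cluster over a training run."""
    return num_gpus * peak_flops_per_gpu * utilization * days * SECONDS_PER_DAY


# Figures taken from the comment.
GPT4_FLOPS = 2e25          # "If original GPT-4 is 2e25 FLOPs"
GENERATION_STEP = 100      # "roughly 100x in raw compute at a time"
STARGATE_MULTIPLIER = 7    # "7x stronger than that"

# Assumptions for illustration only (not from the comment).
H100_PEAK_BF16 = 1e15      # approximate dense BF16 FLOP/s per H100
UTILIZATION = 0.4          # assumed fraction of peak actually achieved
RUN_DAYS = 90              # "in 3 months"

gpt5_target = GPT4_FLOPS * GENERATION_STEP
h100_run = training_flops(100_000, H100_PEAK_BF16, UTILIZATION, RUN_DAYS)
stargate_run = STARGATE_MULTIPLIER * h100_run

print(f"GPT-5 target:      {gpt5_target:.1e} FLOPs")
print(f"100K H100s, 90 d:  {h100_run:.1e} FLOPs")
print(f"7x that system:    {stargate_run:.1e} FLOPs")
print("100K H100s reach target:", h100_run >= gpt5_target)
print("7x system reaches target:", stargate_run >= gpt5_target)
```

Under these assumptions the 100K H100 run lands near the 3e26 figure and falls short of the 2e27 target, while the 7x system clears it. A different utilization or run length would shift both estimates proportionally.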

habryka4 on The Failed Strategy of Artificial Intelligence Doomers

I feel like intelligence enhancement being pretty solidly in the near-term technological horizon provides a strong argument for future governance being much better. There are also maybe 3-5 other technologies that seem likely to be achieved in the next 30 years bar AGI that would all hugely improve future AGI governance.

And then a lot of the post seems to make really quite bad arguments against forecasting AI timelines and other technologies, doing so with... I really don't know, a rejection of bayesianism? A random invocation of an asymmetric burden of proof? If anyone learned anything useful from its section on timelines or technological forecasting, please tell me, since it really is among the worst things I have heard Ben Landau Taylor write, who I respect a lot. The stuff as written really makes no sense. I am personally on the longer end of timelines, but none of my reasoning looks anything like that.

Seriously, what are the technological forecasts in this essay:

While there is no firm ground for any prediction as to how long it will take before any technological breakthrough [to substantial intelligence enhancement], if ever, it seems more likely that such a regime would have to last worldwide for a century or several centuries before such technology were created.

I will very gladly take all your bets that intelligence augmentation will not take "several centuries". What is the basis of this claim? Like, IDK, I see no methodology that suggests anything remotely as long as this, and so many forms of trend extrapolation, first principles argument, reference class forecasting and so many other things that suggest things happen faster than that.

I really don't get the worldview that writes this essay. A worldview in which not-even-particularly-sci-fi technologies should by default be assumed to take centuries (centuries!!!) to be developed. A worldview in which even as AI systems destroy every single benchmark anyone has ever come up with, the hypothesis that AI might be soon gets dismissed because... I really don't know. Because the author wants to maintain authority over reference classes and therefore vaguely implied it can't happen soon.

There is no obvious prior over technological developments or progress. There not being proofs around doesn't support that things won't happen soon. And I would so gladly take this worldview's money if it's willing to actually draw some probability distributions that are spread out enough to put large fractions of its probability mass on centuries away.

benito on The Failed Strategy of Artificial Intelligence Doomers

I wrote that this "is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen." The line has some disagree reacts inline; I expect this is primarily an expression that the disagree-ers have a low quality assessment of the article, but I would be curious to see links to any other articles or posts that attempt something similar to this one, in order to compare whether they do better/worse/different. I actually can't easily think of any (which is why I felt it was not that bold to say this was the best).

Edit: I've expanded the opening paragraph, to not confuse my comment for me agreeing with the object level assessment of the article.

vladimir_nesov on Nathan Helm-Burger's Shortform

The fact that RL seems to be working well on LLMs now, without special tricks, as reported by many replications of r1, suggests to me that AGI is indeed not far off.

Still, at least as long as base model effective training compute isn't scaled another 1,000x (which is 2028-2029), this kind of RL training probably won't generalize far enough without neural (LLM) rewards, which for now don't let RL scale as much as with explicitly coded verifiers.

benito on The Failed Strategy of Artificial Intelligence Doomers

I'm not particularly resolute on this question. But I get this sense when I look at (a) the best agent foundations work that's happened over ~10 years of work on the matter, and (b) the work output of scaling up the number of people working on 'alignment' by ~100x.

For the first, trying to get a better understanding of the basic concepts like logical induction and corrigibility and low-impact and ontological updates, while I feel like there's been progress (in timeless decision theory taking a clear step forward in figuring out how to think about decision-makers as algorithms; in logical induction as moving forward on how to think about logical uncertainty; notably in the Embedded Agency sequence outlining many basic confusions; and in various writings like Radical Probabilism and Geometric Rationality in finding the breaking edges of expected utility maximization) I don't feel like the work done over the last 10 years is on track to be a clear ~10% of the work needed.

I'm not confident it makes sense to try to count it linearly. But I don't know that there's enough edges here or new results to feel good about, given 10x as much time to think about it, a new paradigm / set of concepts falling into place.

For the second, I think mostly there's been (as Wentworth would say) a lot of street-lighting, and a lot of avoiding of actually working on the problem. I mean, there's definitely been a great amount of bias introduced by ML labs having billions of dollars and setting incentives, but I don't feel confident that good things would happen in the absence of that. I'd guess that most ideas for straightforwardly increasing the number of people working on these problems will result in them bouncing off and doing unrelated things.

I think partly I'm also thinking that very few researchers cared about these problems in the last few decades before AGI seemed like a big deal, and still very few researchers seem to care about them, and when I've seen researchers like Bengio and Sutskever talk about it, it's looked to me like they bounce off / become very confident they've solved the problems while missing obvious things, so my sense is that it will continue to be a major uphill battle to get the real problems actually worked on.

Perhaps I should focus on a world where I get to build such a field and scale it slowly and set a lot of the culture. I'm not exactly sure how ideal of a setup I should be imagining. Given 100 years, I would give it my best shot. My gut right now says I'd have maybe a 25% chance of success, though if I have to deal with as much random bullshit as we have so far in this timeline (random example: my CEO being unable to do much leadership of Lightcone due to 9 months of litigation from the FTX fallout) then I am less confident.

My guess is that given 100 years I would be slightly more excited to try out the human intelligence enhancement storyline. But I've not thought about that one much, I might well update against it as I learn more of the details.

anonymousacquaintance on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I believe it was Teresa Youngblut, not Ophelia (Felix), who first pulled a gun and opened fire. Only after the shooting between Teresa and the border patrol started did Ophelia (Felix) pull a gun.