LessWrong 2.0 Reader

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (40)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (11)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (21)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (18)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (19)


Recent comments

aprilsr on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I think it is worth knowing that—I haven't heard of any examples of people who have been radicalizing in a Zizianish direction, lately, who are unaccounted for. I and people I know thought about it when we heard about the border patrol shootout, and the only person we came up with was Audere / Maximilian Snyder, who is now under arrest for the murder of Curtis Lind.

Seeing the one person you and your partner have been kind of worried about for a while... end up being the one who did a murder... it's, well, a hell of an observation to have to update on. Apparently a ball was dropped.

I haven't made a particular point of going around thoroughly asking everyone who might plausibly know someone, but—all three of the people who recently got in conflicts were known by someone or another I've spoken with, to at least plausibly be at risk. So I think there's some chance that we mostly do collectively have eyes on the "new people becoming Zizian" part.

Of course, maybe it becoming a national news story entirely changes the dynamics there, I don't know what the situation will look like in a year. But—despite there having been three new people here that haven't been discussed in any previous community alerts on Zizians, which maybe most people around hadn't heard of at all, I don't currently worry much that there's some substantial number of unknown Zizians out there or something.

milan-w on The Failed Strategy of Artificial Intelligence Doomers

Yes, "we are against Omnicidal AI" is better marketing than "we are for AI Notkilleveryoneism".

meiren on o3

What score would it take for you to update your p(LLMs scale to AGI) above 50%?

vladimir_nesov on MiloSal's Shortform

they write: "We then apply RL training on the fine-tuned model until it achieves convergence on reasoning tasks."

Ah, I failed to take note of that when reading the paper. My takeaway was the opposite. In Figure 2 for R1-Zero, the first impression is convergence, both from saturation of the benchmark and from the graph apparently leveling off. But if replotted in log-steps instead of linear steps, there isn't even any leveling off for pass@1, despite near-saturation of the benchmark for cons@16: accuracy for pass@1 is 0.45 after 2K steps, 0.55 (+0.10) after 4K steps, then 0.67 (+0.12) after 8K steps; it just keeps going up by roughly +0.10 with every doubling in training steps. And the plots-that-don't-level-off in the o1 post are in log-steps. Also, the average number of reasoning steps for R1-Zero in Figure 3 is a straight line that's probably good for something if it goes further up. So I guess I might disagree with the authors even, in characterizing step 10K as "at convergence", though your quote is about R1 rather than R1-Zero for which there are plots in the paper...

your analysis of GPT-5--which is worrying for short-term scaling

Well, I mostly argued about naming, not facts, though the recent news seems to be suggesting that the facts are a bit better than I expected only a month ago, namely 1 GW training systems might only get built in 2026 rather than in 2025, except possibly at Google. And as a result even Google might feel less pressure to actually get this done in 2025.
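
The log-steps reading in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the three pass@1 values are the approximate figures quoted in the comment (read off a plot, not taken from a table), and the helper name `gains_per_doubling` is made up for illustration.

```python
import math

# Approximate pass@1 accuracies quoted in the comment (read off a figure,
# so treat them as rough values, not exact data).
steps = [2000, 4000, 8000]
accuracy = [0.45, 0.55, 0.67]


def gains_per_doubling(steps, accuracy):
    """Accuracy gained per doubling of training steps, for each consecutive pair."""
    gains = []
    for i in range(len(steps) - 1):
        # Number of doublings between the two checkpoints (1.0 when steps double).
        doublings = math.log2(steps[i + 1] / steps[i])
        gains.append((accuracy[i + 1] - accuracy[i]) / doublings)
    return gains


gains = gains_per_doubling(steps, accuracy)
for g in gains:
    print(f"{g:.2f} accuracy per doubling of steps")
```

If the curve were leveling off in log-steps, the gain per doubling would shrink from one pair to the next; with these three points it does not, which is the comment's argument. Three points read off a plot are thin evidence either way.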

sil-ver on o3

o3-mini-high gets 3/10; this is essentially the same as DeepSeek (there were two where DeepSeek came very close, this is one of them). I'm still slightly more impressed with DeepSeek despite the result, but it's very close.

tup99 on How to (hopefully ethically) make money off of AGI

There are some scenarios where having control, rather than ownership/profit, could be important.

I'm curious what kind of scenarios you're thinking about. Having actual control, yes, that could be important. But having 0.001% of control of Google does not seem like it would have any effect on either Google or me, under any scenario.

siebe on The Failed Strategy of Artificial Intelligence Doomers

The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.

Bad faith

There are also factions concerned about “misinformation” and “algorithmic bias,” which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.

Bad faith

AI Doomer coalition abandoned the name “AI safety” and rebranded itself to “AI alignment.”

Seems wrong

vale on 5,000 calories of peanut butter every week for 3 years straight

Speaking as a fellow Declan, I'm wondering if an unhealthy love for peanut butter is a "Declan-thing"...

richard_kennaway on Thread for Sense-Making on Recent Murders and How to Sanely Respond

I take seriously radical animal-suffering-is-bad-ism, but we would only save a small portion of animals by trading ourselves off 1-for-1 against animal eaters, and just convincing one of them to go vegan would prevent at least as many torturous animal lives in expectation, while being legal.

That is a justification for not personally being Ziz. But obviously it would have cut no ice with Ziz. And why should it? An individual must choose whether to pursue peaceful or violent action, because if you are Taking the Ideas Seriously then either one will demand your whole life, and you can’t do everything. A movement, on the other hand, can divide its efforts, fighting on all fronts while maintaining a more or less plausible deniability of any connection between them. This is a common strategy. For example, Sinn Fein and the IRA, respectively the legal and illegal wings of one side of the conflict over Northern Ireland.

It doesn’t even have to be explicitly organised. Some will take the right-hand path and some the left. And so here we are.

milosal on MiloSal's Shortform

Thanks for your comments!

Not to convergence, the graphs in the paper keep going up.

On page 10, when describing the training process for R1, they write: "We then apply RL training on the fine-tuned model until it achieves convergence on reasoning tasks." I refer to this.

I basically agree with your analysis of GPT-5--which is worrying for short-term scaling, as I tried to argue.