LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (43)

[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (38)

AI-enabled coups: a small group could use AI to seize power
Tom Davidson (tom-davidson-1) · 2025-04-16T16:51:29.561Z · comments (16)

Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)

How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)

Three Months In, Evaluating Three Rationalist Cases for Trump
Arjun Panickssery (arjun-panickssery) · 2025-04-18T08:27:27.257Z · comments (10)

One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)

[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)

Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (29)

Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-18T12:27:35.795Z · comments (2)

Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)

On Google’s Safety Plan
Zvi · 2025-04-11T12:51:12.112Z · comments (6)

How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)

A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (29)

OpenAI Responses API changes models' behavior
Jan Betley (jan-betley) · 2025-04-11T13:27:29.942Z · comments (6)

To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (11)

Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (4)

Four Types of Disagreement
silentbob · 2025-04-13T11:22:38.466Z · comments (2)

The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (6)

Vestigial reasoning in RL
Caleb Biddulph (caleb-biddulph) · 2025-04-13T15:40:11.954Z · comments (7)

Youth Lockout
Xavi CF (xavi-cf) · 2025-04-11T15:05:54.441Z · comments (6)

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)

[link] College Advice For People Like Me
henryj · 2025-04-12T14:36:46.643Z · comments (5)

Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)

ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs
denkenberger · 2025-04-16T21:47:40.687Z · comments (8)

[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)

Paper
dynomight · 2025-04-11T12:20:04.200Z · comments (12)

D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (8)

Handling schemers if shutdown is not an option
Buck · 2025-04-18T14:39:18.609Z · comments (0)

Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)

[link] The Russell Conjugation Illuminator
TimmyM (timmym) · 2025-04-17T19:33:06.924Z · comments (13)

OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)

Why do misalignment risks increase as AIs get more capable?
ryan_greenblatt · 2025-04-11T03:06:50.928Z · comments (6)

[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (6)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak (tomek-korbak) · 2025-04-14T16:45:46.584Z · comments (1)

[link] Unbendable Arm as Test Case for Religious Belief
Ivan Vendrov (ivan-vendrov) · 2025-04-14T01:57:12.013Z · comments (45)

What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (7)

Thoughts on the Double Impact Project
Mati_Roy (MathieuRoy) · 2025-04-13T19:07:57.687Z · comments (10)

MONA: Three Month Later - Updates and Steganography Without Optimization Pressure
David Lindner · 2025-04-12T23:15:07.964Z · comments (0)

AI #112: Release the Everything
Zvi · 2025-04-17T15:10:02.029Z · comments (6)

Scaffolding Skills
Screwtape · 2025-04-18T17:39:25.634Z · comments (0)

[link] Understanding and overcoming AGI apathy
Dhruv Sumathi (dhruv-sumathi) · 2025-04-17T01:04:53.853Z · comments (1)

GPT-4.1 Is a Mini Upgrade
Zvi · 2025-04-16T19:00:03.181Z · comments (6)

[link] Nucleic Acid Observatory Updates, April 2025
jefftk (jkaufman) · 2025-04-15T18:58:29.839Z · comments (0)

Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)

[link] Currency Collapse
prue (prue0) · 2025-04-11T03:48:01.469Z · comments (3)

o3 Will Use Its Tools For You
Zvi · 2025-04-18T21:20:02.566Z · comments (1)

Prodromes and Biomarkers in Chronic Disease
sarahconstantin · 2025-04-16T21:30:02.978Z · comments (2)

Understanding Trust: Overview Presentations
abramdemski · 2025-04-16T18:08:31.064Z · comments (0)

The Internal Model Principle: A Straightforward Explanation
Alfred Harwood · 2025-04-12T10:58:51.479Z · comments (1)

next page (older posts) →

^{^}

A group of humans that compromises on making the ASI loyal to humanity is likely more realistic than a group of humans which is actually loyal to humanity. E. g., because the group has some psychopaths and some idealists, and all psychopaths have to individually LARP being prosocial in order to not end up with the idealists ganging up against them, with this LARP then being carried far enough to end up in the ASI's goals. But this still involves that small group having ultimate power; still involves the future being determined by how the dynamics within that small group shake out.

^{^}

Rather than keeping him in the dark or playing him, which reduces to Scenario 1.

LessWrong 2.0 Reader

Archive

Recent comments