LessWrong 2.0 Reader


[link] Playing in the Creek
Hastings (hastings-greer) · 2025-04-10T17:39:28.883Z · comments (6)
[link] Thoughts on AI 2027
Max Harms (max-harms) · 2025-04-09T21:26:23.926Z · comments (47)
Short Timelines Don't Devalue Long Horizon Research
Vladimir_Nesov · 2025-04-09T00:42:07.324Z · comments (23)
Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes (john-hughes) · 2025-04-08T17:32:55.315Z · comments (16)
[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (36)
Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (39)
Learned pain as a leading cause of chronic pain
SoerenMind · 2025-04-09T11:57:58.523Z · comments (13)
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)
AI-enabled coups: a small group could use AI to seize power
Tom Davidson (tom-davidson-1) · 2025-04-16T16:51:29.561Z · comments (14)
AI 2027: Responses
Zvi · 2025-04-08T12:50:02.197Z · comments (3)
Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)
Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (28)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)
[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)
Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (7)
AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)
[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)
Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)
[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (32)
On Google’s Safety Plan
Zvi · 2025-04-11T12:51:12.112Z · comments (6)
How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)
OpenAI Responses API changes models' behavior
Jan Betley (jan-betley) · 2025-04-11T13:27:29.942Z · comments (6)
To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (10)
Reactions to METR task length paper are insane
Cole Wyeth (Amyr) · 2025-04-10T17:13:36.428Z · comments (41)
Four Types of Disagreement
silentbob · 2025-04-13T11:22:38.466Z · comments (2)
The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (6)
Vestigial reasoning in RL
Caleb Biddulph (caleb-biddulph) · 2025-04-13T15:40:11.954Z · comments (7)
A collection of approaches to confronting doom, and my thoughts on them
Ruby · 2025-04-06T02:11:31.271Z · comments (18)
Youth Lockout
Xavi CF (xavi-cf) · 2025-04-11T15:05:54.441Z · comments (6)
[link] College Advice For People Like Me
henryj · 2025-04-12T14:36:46.643Z · comments (5)
OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)
Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)
[link] American College Admissions Doesn't Need to Be So Competitive
Arjun Panickssery (arjun-panickssery) · 2025-04-07T17:35:26.791Z · comments (18)
Paper
dynomight · 2025-04-11T12:20:04.200Z · comments (12)
The first AI war will be in your computer
Viliam · 2025-04-08T09:28:53.191Z · comments (9)
[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)
ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs
denkenberger · 2025-04-16T21:47:40.687Z · comments (7)
[link] The case for AGI by 2030
Benjamin_Todd · 2025-04-09T20:35:55.167Z · comments (6)
Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (4)
D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (7)
A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (28)
[link] Existing Safety Frameworks Imply Unreasonable Confidence
Joe Rogero · 2025-04-10T16:31:50.240Z · comments (1)
[link] Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]
elifland · 2025-04-10T23:10:23.063Z · comments (0)
Austin Chen on Winning, Risk-Taking, and FTX
Elizabeth (pktechgirl) · 2025-04-07T19:00:08.039Z · comments (3)
Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)