LessWrong 2.0 Reader

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)
[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (36)
[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (38)
What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (19)
The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (152)
What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (36)
[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (94)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (42)
Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (25)
[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (65)
o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (155)
[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)
Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (28)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (16)
Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (7)
What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)
Hire (or Become) a Thinking Assistant
Raemon · 2024-12-23T03:58:42.061Z · comments (43)
[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)
2024 in AI predictions
jessicata (jessica.liu.taylor) · 2025-01-01T20:29:49.132Z · comments (2)
The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)
The Plan - 2024 Update
johnswentworth · 2024-12-31T13:29:53.888Z · comments (27)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
Why I'm Moving from Mechanistic to Prosaic Interpretability
Daniel Tan (dtch1997) · 2024-12-30T06:35:43.417Z · comments (34)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (5)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (7)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
[link] Parkinson's Law and the Ideology of Statistics
Benquo · 2025-01-04T15:49:21.247Z · comments (1)
A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (7)
Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (8)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)
How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (3)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (73)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)
My AGI safety research—2024 review, ’25 plans
Steven Byrnes (steve2152) · 2024-12-31T21:05:19.037Z · comments (4)
Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (27)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
Circling as practice for “just be yourself”
Kaj_Sotala · 2024-12-16T07:40:04.482Z · comments (5)
What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (17)
The subset parity learning problem: much more than you wanted to know
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-03T09:13:59.245Z · comments (17)
Is "VNM-agent" one of several options, for what minds can grow up into?
AnnaSalamon · 2024-12-30T06:36:20.890Z · comments (45)
[link] The Intelligence Curse
lukedrago · 2025-01-03T19:07:43.493Z · comments (26)
Remap your caps lock key
bilalchughtai (beelal) · 2024-12-15T14:03:33.623Z · comments (17)
Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)
Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (13)
Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (37)