LessWrong 2.0 Reader

The Case Against AI Control Research
johnswentworth · 2025-01-21T16:03:10.143Z · comments (69)
Mechanisms too simple for humans to design
Malmesbury (Elmer of Malmesbury) · 2025-01-22T16:54:37.601Z · comments (40)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (48)
[link] Quotes from the Stargate press conference
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-22T00:50:14.793Z · comments (7)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (4)
“Sharp Left Turn” discourse: An opinionated review
Steven Byrnes (steve2152) · 2025-01-28T18:47:04.395Z · comments (4)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (2)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (13)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (5)
Ten people on the inside
Buck · 2025-01-28T16:41:22.990Z · comments (16)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (11)
My supervillain origin story
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27T12:20:46.101Z · comments (0)
Planning for Extreme AI Risks
joshc (joshua-clymer) · 2025-01-29T18:33:14.844Z · comments (2)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
The Game Board has been Flipped: Now is a good time to rethink what you’re doing
Alex Lintz (alex-lintz) · 2025-01-28T23:36:18.106Z · comments (18)
[link] The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke (corin-katzke) · 2025-01-21T16:57:00.998Z · comments (7)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (29)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (51)
[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (36)
Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)
Should you go with your best guess?: Against precise Bayesianism and related views
Anthony DiGiovanni (antimonyanthony) · 2025-01-27T20:25:26.809Z · comments (8)
Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (7)
Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (17)
Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)
[link] Paper: Open Problems in Mechanistic Interpretability
Lee Sharkey (Lee_Sharkey) · 2025-01-29T10:25:54.727Z · comments (0)
Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (2)
On polytopes
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-25T13:56:35.681Z · comments (5)
Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (17)
On DeepSeek’s r1
Zvi · 2025-01-22T19:50:17.168Z · comments (1)
Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (27)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (4)
Logits, log-odds, and loss for parallel circuits
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-20T09:56:26.031Z · comments (3)
DeepSeek Panic at the App Store
Zvi · 2025-01-28T19:30:07.555Z · comments (13)
Against blanket arguments against interpretability
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-22T09:46:23.486Z · comments (4)
Things I have been using LLMs for
Kaj_Sotala · 2025-01-20T14:20:02.600Z · comments (5)
[link] We don't want to post again "This might be the last AI Safety Camp"
Remmelt (remmelt-ellen) · 2025-01-21T12:03:33.171Z · comments (17)
Brainrot
Jesse Hoogland (jhoogland) · 2025-01-26T05:35:35.396Z · comments (0)
Sleep, Diet, Exercise and GLP-1 Drugs
Zvi · 2025-01-21T12:20:06.018Z · comments (4)
Worries about latent reasoning in LLMs
CBiddulph (caleb-biddulph) · 2025-01-20T09:09:02.335Z · comments (3)
Evolution and the Low Road to Nash
Aydin Mohseni (aydin-mohseni) · 2025-01-22T07:06:32.305Z · comments (2)
Why care about AI personhood?
Francis Rhys Ward (francis-rhys-ward) · 2025-01-26T11:24:45.596Z · comments (6)
Why We Need More Shovel-Ready AI Notkilleveryoneism Megaproject Proposals
Peter Berggren (peter-berggren) · 2025-01-20T22:38:26.593Z · comments (1)
Kitchen Air Purifier Comparison
jefftk (jkaufman) · 2025-01-22T03:20:03.224Z · comments (2)
Writing experiments and the banana escape valve
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-23T13:11:24.215Z · comments (1)
Monthly Roundup #26: January 2025
Zvi · 2025-01-20T15:30:08.680Z · comments (15)
Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (2)
[link] Dario Amodei: On DeepSeek and Export Controls
Zach Stein-Perlman · 2025-01-29T17:15:18.986Z · comments (2)
Theory of Change for AI Safety Camp
Linda Linsefors · 2025-01-22T22:07:10.664Z · comments (3)
Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (3)