LessWrong 2.0 Reader


The Case Against AI Control Research
johnswentworth · 2025-01-21T16:03:10.143Z · comments (69)
Mechanisms too simple for humans to design
Malmesbury (Elmer of Malmesbury) · 2025-01-22T16:54:37.601Z · comments (34)
[link] Quotes from the Stargate press conference
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-22T00:50:14.793Z · comments (6)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (2)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (12)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (3)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (42)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (1)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
[link] The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke (corin-katzke) · 2025-01-21T16:57:00.998Z · comments (6)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (19)
Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (41)
Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (7)
[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (27)
Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (17)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (9)
On polytopes
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-25T13:56:35.681Z · comments (5)
Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)
Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (15)
Logits, log-odds, and loss for parallel circuits
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-20T09:56:26.031Z · comments (2)
Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (27)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (3)
Against blanket arguments against interpretability
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-22T09:46:23.486Z · comments (4)
Things I have been using LLMs for
Kaj_Sotala · 2025-01-20T14:20:02.600Z · comments (5)
On DeepSeek’s r1
Zvi · 2025-01-22T19:50:17.168Z · comments (1)
[link] We don't want to post again "This might be the last AI Safety Camp"
Remmelt (remmelt-ellen) · 2025-01-21T12:03:33.171Z · comments (17)
Sleep, Diet, Exercise and GLP-1 Drugs
Zvi · 2025-01-21T12:20:06.018Z · comments (4)
Worries about latent reasoning in LLMs
CBiddulph (caleb-biddulph) · 2025-01-20T09:09:02.335Z · comments (3)
Evolution and the Low Road to Nash
Aydin Mohseni (aydin-mohseni) · 2025-01-22T07:06:32.305Z · comments (2)
Why We Need More Shovel-Ready AI Notkilleveryoneism Megaproject Proposals
Peter Berggren (peter-berggren) · 2025-01-20T22:38:26.593Z · comments (1)
Brainrot
Jesse Hoogland (jhoogland) · 2025-01-26T05:35:35.396Z · comments (0)
Kitchen Air Purifier Comparison
jefftk (jkaufman) · 2025-01-22T03:20:03.224Z · comments (2)
Why care about AI personhood?
Francis Rhys Ward (francis-rhys-ward) · 2025-01-26T11:24:45.596Z · comments (4)
Writing experiments and the banana escape valve
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-23T13:11:24.215Z · comments (1)
Monthly Roundup #26: January 2025
Zvi · 2025-01-20T15:30:08.680Z · comments (15)
Theory of Change for AI Safety Camp
Linda Linsefors · 2025-01-22T22:07:10.664Z · comments (3)
Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (2)
Agents don't have to be aligned to help us achieve an indefinite pause.
Hastings (hastings-greer) · 2025-01-25T18:51:03.523Z · comments (0)
[Cross-post] Every Bay Area "Walled Compound"
davekasten · 2025-01-23T15:05:08.629Z · comments (3)
Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (2)
Arbitrage Drains Worse Markets to Feed Better Ones
Cedar (xida-ren) · 2025-01-21T03:44:46.111Z · comments (1)
[link] Counterintuitive effects of minimum prices
dynomight · 2025-01-24T23:05:26.099Z · comments (0)
[link] You Have Two Brains
Eneasz · 2025-01-23T00:52:43.063Z · comments (5)
[question] Is the output of the softmax in a single transformer attention head usually winner-takes-all?
Linda Linsefors · 2025-01-27T15:33:28.992Z · answers+comments (0)
14+ AI Safety Advisors You Can Speak to – New AISafety.com Resource
Bryce Robertson (bryceerobertson) · 2025-01-21T17:34:02.170Z · comments (0)
Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)
[link] When does capability elicitation bound risk?
joshc (joshua-clymer) · 2025-01-22T03:42:36.289Z · comments (0)