LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (158)
Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (83)
[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)
[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (114)
Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)
[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (59)
Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (5)
Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (76)
OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)
[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (93)
[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (21)
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (18)
Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (41)
On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (14)
Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (34)
[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)
LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)
My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (31)
RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)
[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)
A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (11)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)
Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)
A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (5)
[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)
Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (62)
A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)
Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (7)
A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)
Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (12)
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (8)
Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)
Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)
[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)
[link] The 2nd Demographic Transition
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-06T14:10:13.095Z · comments (17)
Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (19)
[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (7)
[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)
Generalized Stat Mech: The Boltzmann Approach
David Lorell · 2024-04-12T17:47:31.880Z · comments (7)
AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)
Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (15)
How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (11)
Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)
next page (older posts) →