LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (64)
[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (98)
Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)
[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (58)
Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)
[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (21)
[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (89)
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (17)
Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (52)
[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)
On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (10)
[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)
[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (2)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (10)
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)
Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (8)
Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)
[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)
A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)
Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (12)
[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (8)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (61)
Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (19)
[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)
[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)
[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)
Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (7)
Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)
Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (8)
On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)
[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (25)
AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (2)
Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)
Ruby · 2024-04-23T03:58:43.443Z · comments (15)
So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)
Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (10)
Experiment on repeating choices
KatjaGrace · 2024-04-19T04:20:03.992Z · comments (1)
[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)
[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)
Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)
[link] Let's Design A School, Part 1
Sable · 2024-04-23T21:50:20.937Z · comments (1)
LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (15)
Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (3)
[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)
Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (11)
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (3)
[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (23)
Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (1)
AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)
I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (12)
[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)
next page (older posts) →