LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (156)
Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (79)
[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)
The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (123)
[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (103)
Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)
[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (59)
Failures in Kindness
silentbob · 2024-03-26T21:30:11.052Z · comments (27)
Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (89)
Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)
My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (186)
[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)
OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)
Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (66)
[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (90)
[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (7)
[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (21)
Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (40)
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (18)
Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (26)
[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)
Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (33)
LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)
On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (11)
RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)
My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (31)
[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)
My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (15)
A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (11)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (17)
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (11)
SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (15)
Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)
Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)
Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)
A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (5)
[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (24)
[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (14)
[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)
Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)
[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)
A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (62)
ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)
Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (7)
[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)
next page (older posts) →