LessWrong 2.0 Reader

LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (156)

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (79)

[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)

The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (123)

[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (103)

Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)

[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (59)

Failures in Kindness
silentbob · 2024-03-26T21:30:11.052Z · comments (27)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (89)

Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (186)

[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (66)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (90)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (7)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (21)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (40)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (18)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (26)

[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (33)

LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (11)

RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)

My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (31)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)

My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (15)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (11)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (17)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (11)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (15)

Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)

Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (5)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (24)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (14)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)

Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)

When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (62)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Coherence of Caches and Agents
johnswentworth · 2024-04-01T23:04:31.320Z · comments (7)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

I'm not sure this is true in the limit (e.g. it seems plausible to me that the Solomonoff prior is malign). But it's most likely true in the next few years and plausibly true in all practical cases that we might consider.
