LessWrong 2.0 Reader


Attitudes about Applied Rationality
Camille Berger · 2024-02-03T14:42:22.770Z · comments (18)
The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (25)
The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (22)
[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)
[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (43)
[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)
A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)
New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (63)
Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)
2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (34)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)
Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)
[link] A Chess-GPT Linear Emergent World Representation
karvonenadam · 2024-02-08T04:25:15.222Z · comments (14)
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)
A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (10)
Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (177)
[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)
General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)
Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)
Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)
[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (53)
[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (25)
Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)
[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)
[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)
[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
Simple versus Short: Higher-order degeneracy and error-correction
Daniel Murfet (dmurfet) · 2024-03-11T07:52:46.307Z · comments (5)
Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)
[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)
Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)
[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)
Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)
OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)
Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (58)
On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)
OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)
How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (10)
[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)
Everything Wrong with Roko's Claims about an Engineered Pandemic
EZ97 · 2024-02-22T15:59:08.439Z · comments (10)
[link] New report: Safety Cases for AI
joshc (joshua-clymer) · 2024-03-20T16:45:27.984Z · comments (13)
[link] Introducing METR's Autonomy Evaluation Resources
Megan Kinniment (megan-kinniment) · 2024-03-15T23:16:59.696Z · comments (0)
[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (23)
Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)
Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)
story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)
SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (15)
On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)