LessWrong 2.0 Reader

There is way too much serendipity
Malmesbury (Elmer of Malmesbury) · 2024-01-19T19:37:57.068Z · comments (56)
[link] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
evhub · 2024-01-12T19:51:01.021Z · comments (94)
Gentleness and the artificial Other
Joe Carlsmith (joekc) · 2024-01-02T18:21:34.746Z · comments (33)
The case for ensuring that powerful AIs are controlled
ryan_greenblatt · 2024-01-24T16:11:51.354Z · comments (66)
MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)
Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (17)
This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (33)
The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (63)
Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)
Apologizing is a Core Rationalist Skill
johnswentworth · 2024-01-02T17:47:35.950Z · comments (42)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
What good is G-factor if you're dumped in the woods? A field report from a camp counselor.
Hastings (hastings-greer) · 2024-01-12T13:17:23.829Z · comments (22)
Deep atheism and AI risk
Joe Carlsmith (joekc) · 2024-01-04T18:58:47.745Z · comments (22)
Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)
Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (7)
The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (14)
An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)
A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)
[link] Steering Llama-2 with contrastive activation additions
Nina Rimsky (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)
Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)
Four visions of Transformative AI success
Steven Byrnes (steve2152) · 2024-01-17T20:45:46.976Z · comments (22)
[link] Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings)
tailcalled · 2024-01-03T17:07:13.990Z · comments (25)
[link] Gender Exploration
sapphire (deluks917) · 2024-01-14T18:57:32.893Z · comments (23)
Being nicer than Clippy
Joe Carlsmith (joekc) · 2024-01-16T19:44:23.893Z · comments (23)
' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (15)
The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (25)
2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (34)
Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)
[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)
[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)
On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)
The Aspiring Rationalist Congregation
maia · 2024-01-10T22:52:54.298Z · comments (23)
Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (18)
Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (5)
An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)
Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)
[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)
Some Vacation Photos
johnswentworth · 2024-01-04T17:15:01.187Z · comments (0)
Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)
When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)
[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)
[link] [Repost] The Copenhagen Interpretation of Ethics
mesaoptimizer · 2024-01-25T15:20:08.162Z · comments (3)
Epistemic Hell
rogersbacon · 2024-01-27T17:13:09.578Z · comments (20)
[link] InterLab – a toolkit for experiments with multi-agent interactions
Tomáš Gavenčiak (tomas-gavenciak) · 2024-01-22T18:23:35.661Z · comments (0)
OpenAI's Preparedness Framework: Praise & Recommendations
Akash (akash-wasil) · 2024-01-02T16:20:04.249Z · comments (1)
[link] The Perceptron Controversy
Yuxi_Liu · 2024-01-10T23:07:23.341Z · comments (18)
[question] Will quantum randomness affect the 2028 election?
Thomas Kwa (thomas-kwa) · 2024-01-24T22:54:30.800Z · answers+comments (48)
We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)