LessWrong 2.0 Reader

The case for becoming a black-box investigator of language models
Buck · 2022-05-06T14:35:24.630Z · comments (20)
Shared reality: a key driver of human behavior
kdbscott · 2022-12-24T19:35:51.126Z · comments (25)
Insights from Euclid's 'Elements'
TurnTrout · 2020-05-04T15:45:30.711Z · comments (17)
Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems
Vaniver · 2023-02-17T20:11:39.255Z · comments (12)
A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)
Niceness is unnatural
So8res · 2022-10-13T01:30:02.046Z · comments (20)
AI Safety "Success Stories"
Wei Dai (Wei_Dai) · 2019-09-07T02:54:15.003Z · comments (27)
One Minute Every Moment
abramdemski · 2023-09-01T20:23:56.391Z · comments (23)
[link] Gene drives: why the wait?
Metacelsus · 2022-09-19T23:37:17.595Z · comments (50)
Baking is Not a Ritual
Sisi Cheng (sisi-cheng) · 2020-05-25T18:08:24.836Z · comments (28)
An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)
From fear to excitement
Richard_Ngo (ricraz) · 2023-05-15T06:23:18.656Z · comments (9)
A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (8)
Selection Theorems: A Program For Understanding Agents
johnswentworth · 2021-09-28T05:03:19.316Z · comments (28)
[link] Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!”
eukaryote · 2022-08-04T20:37:59.388Z · comments (15)
[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)
Current AIs Provide Nearly No Data Relevant to AGI Alignment
Thane Ruthenis · 2023-12-15T20:16:09.723Z · comments (157)
[link] Steering Llama-2 with contrastive activation additions
Nina Panickssery (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)
"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)
[link] Bayesian Injustice
Kevin Dorst · 2023-12-14T15:44:08.664Z · comments (10)
Parasites (not a metaphor)
lemonhope (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (17)
[link] When discussing AI risks, talk about capabilities, not intelligence
Vika · 2023-08-11T13:38:48.844Z · comments (7)
There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
Taran · 2023-02-19T12:25:52.212Z · comments (34)
The Wicked Problem Experience
HoldenKarnofsky · 2022-03-02T17:50:18.621Z · comments (6)
Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-06-19T21:11:03.505Z · comments (70)
Transcript: "You Should Read HPMOR"
TurnTrout · 2021-11-02T18:20:53.161Z · comments (12)
Deconfusing Direct vs Amortised Optimization
beren · 2022-12-02T11:30:46.754Z · comments (19)
A Longlist of Theories of Impact for Interpretability
Neel Nanda (neel-nanda-1) · 2022-03-11T14:55:35.356Z · comments (41)
My Effortless Weightloss Story: A Quick Runthrough
CuoreDiVetro · 2023-09-30T23:02:45.128Z · comments (78)
High schoolers can apply to the Atlas Fellowship: $50k scholarship + summer program
sydney (sydney-von-arx) · 2022-04-03T00:53:05.397Z · comments (18)
Explaining the Twitter Postrat Scene
Jacob Falkovich (Jacobian) · 2022-04-05T22:23:27.125Z · comments (28)
[question] What do coherence arguments actually prove about agentic behavior?
sunwillrise (andrei-alexandru-parfeni) · 2024-06-01T09:37:28.451Z · answers+comments (35)
[link] Introducing the Center for AI Policy (& we're hiring!)
Thomas Larsen (thomas-larsen) · 2023-08-28T21:17:11.703Z · comments (50)
Reward Is Not Enough
Steven Byrnes (steve2152) · 2021-06-16T13:52:33.745Z · comments (19)
Induction heads - illustrated
CallumMcDougall (TheMcDouglas) · 2023-01-02T15:35:20.550Z · comments (10)
[link] The 300-year journey to the covid vaccine
jasoncrawford · 2020-11-09T23:06:45.790Z · comments (9)
$250 prize for checking Jake Cannell's Brain Efficiency
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-04-26T16:21:06.035Z · comments (170)
[link] Did ChatGPT just gaslight me?
TW123 (ThomasWoodside) · 2022-12-01T05:41:46.560Z · comments (45)
In Defense of Chatbot Romance
Kaj_Sotala · 2023-02-11T14:30:05.696Z · comments (52)
BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)
Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (18)
Natural Latents: The Math
johnswentworth · 2023-12-27T19:03:01.923Z · comments (40)
Movable Housing for Scalable Cities
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2020-05-15T21:21:05.395Z · comments (28)
Deep Forgetting & Unlearning for Safely-Scoped LLMs
scasper · 2023-12-05T16:48:18.177Z · comments (30)
Law of No Evidence
Zvi · 2021-12-20T13:50:01.189Z · comments (20)
Book review: The Checklist Manifesto
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2021-09-17T23:09:09.590Z · comments (13)
How bad a future do ML researchers expect?
KatjaGrace · 2023-03-09T04:50:05.122Z · comments (8)
Book review: "A Thousand Brains" by Jeff Hawkins
Steven Byrnes (steve2152) · 2021-03-04T05:10:44.929Z · comments (18)
Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (22)