LessWrong 2.0 Reader

(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen (thomas-larsen) · 2022-08-29T01:23:58.073Z · comments (89)
DeepMind alignment team opinions on AGI ruin arguments
Vika · 2022-08-12T21:06:40.582Z · comments (37)
[link] A Mechanistic Interpretability Analysis of Grokking
Neel Nanda (neel-nanda-1) · 2022-08-15T02:41:36.245Z · comments (47)
Two-year update on my personal AI timelines
Ajeya Cotra (ajeya-cotra) · 2022-08-02T23:07:48.698Z · comments (60)
Common misconceptions about OpenAI
Jacob_Hilton · 2022-08-25T14:02:26.257Z · comments (142)
What do ML researchers think about AI in 2022?
KatjaGrace · 2022-08-04T15:40:05.024Z · comments (33)
Worlds Where Iterative Design Fails
johnswentworth · 2022-08-30T20:48:29.025Z · comments (30)
Language models seem to be much better than humans at next-token prediction
Buck · 2022-08-11T17:45:41.294Z · comments (59)
How To Go From Interpretability To Alignment: Just Retarget The Search
johnswentworth · 2022-08-10T16:08:11.402Z · comments (33)
Some conceptual alignment research projects
Richard_Ngo (ricraz) · 2022-08-25T22:51:33.478Z · comments (15)
Shard Theory: An Overview
David Udell · 2022-08-11T05:44:52.852Z · comments (34)
Nate Soares' Life Advice
CatGoddess · 2022-08-23T02:46:43.369Z · comments (41)
Your posts should be on arXiv
JanB (JanBrauner) · 2022-08-25T10:35:12.087Z · comments (44)
What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?
johnswentworth · 2022-08-15T22:48:38.671Z · comments (18)
The Parable of the Boy Who Cried 5% Chance of Wolf
KatWoods (ea247) · 2022-08-15T14:33:21.649Z · comments (24)
How might we align transformative AI if it’s developed very soon?
HoldenKarnofsky · 2022-08-29T15:42:08.985Z · comments (55)
Externalized reasoning oversight: a research direction for language model alignment
tamera · 2022-08-03T12:03:16.630Z · comments (23)
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth · 2022-08-08T18:05:11.982Z · comments (12)
[link] Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!”
eukaryote · 2022-08-04T20:37:59.388Z · comments (15)
Taking the parameters which seem to matter and rotating them until they don't
Garrett Baker (D0TheMath) · 2022-08-26T18:26:47.667Z · comments (48)
Meditation course claims 65% enlightenment rate: my review
KatWoods (ea247) · 2022-08-01T11:25:37.017Z · comments (33)
[link] The lessons of Xanadu
jasoncrawford · 2022-08-07T17:59:57.839Z · comments (20)
Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie (naimenz) · 2022-08-24T18:37:00.419Z · comments (4)
The alignment problem from a deep learning perspective
Richard_Ngo (ricraz) · 2022-08-10T22:46:46.752Z · comments (15)
How likely is deceptive alignment?
evhub · 2022-08-30T19:34:25.997Z · comments (28)
Announcing Encultured AI: Building a Video Game
Andrew_Critch · 2022-08-18T02:16:26.726Z · comments (26)
[link] everything is okay
Tamsin Leake (carado-1) · 2022-08-23T09:20:33.250Z · comments (22)
Oversight Misses 100% of Thoughts The AI Does Not Think
johnswentworth · 2022-08-12T16:30:24.060Z · comments (50)
Introducing Pastcasting: A tool for forecasting practice
Sage Future (aaron-ho-1) · 2022-08-11T17:38:06.474Z · comments (10)
Survey advice
KatjaGrace · 2022-08-24T03:10:21.424Z · comments (11)
Rant on Problem Factorization for Alignment
johnswentworth · 2022-08-05T19:23:24.262Z · comments (51)
Less Threat-Dependent Bargaining Solutions?? (3/2)
Diffractor · 2022-08-20T02:19:11.405Z · comments (7)
How to do theoretical research, a personal perspective
Mark Xu (mark-xu) · 2022-08-19T19:41:21.562Z · comments (6)
[question] Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout · 2022-08-11T22:22:32.198Z · answers+comments (42)
High Reliability Orgs, and AI Companies
Raemon · 2022-08-04T05:45:34.928Z · comments (7)
[link] Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika · 2022-08-12T15:17:38.304Z · comments (4)
I’m mildly skeptical that blindness prevents schizophrenia
Steven Byrnes (steve2152) · 2022-08-15T23:36:59.003Z · comments (9)
[link] Most Ivy-smart students aren't at Ivy-tier schools
Aaron Bergman (aaronb50) · 2022-08-07T03:18:02.298Z · comments (7)
[link] Paper is published! 100,000 lumens to treat seasonal affective disorder
Fabienne · 2022-08-20T19:48:29.687Z · comments (3)
«Boundaries», Part 2: trends in EA's handling of boundaries
Andrew_Critch · 2022-08-06T00:42:48.744Z · comments (14)
The Loire Is Not Dry
jefftk (jkaufman) · 2022-08-20T13:40:01.237Z · comments (2)
What's the Least Impressive Thing GPT-4 Won't be Able to Do
Algon · 2022-08-20T19:48:14.811Z · comments (125)
Human Mimicry Mainly Works When We’re Already Close
johnswentworth · 2022-08-17T18:41:18.140Z · comments (16)
AI strategy nearcasting
HoldenKarnofsky · 2022-08-25T17:26:28.455Z · comments (4)
Evolution is a bad analogy for AGI: inner alignment
Quintin Pope (quintin-pope) · 2022-08-13T22:15:57.223Z · comments (15)
How (not) to choose a research project
Garrett Baker (D0TheMath) · 2022-08-09T00:26:37.045Z · comments (11)
The Core of the Alignment Problem is...
Thomas Larsen (thomas-larsen) · 2022-08-17T20:07:35.157Z · comments (10)
Announcing the Introduction to ML Safety course
Dan H (dan-hendrycks) · 2022-08-06T02:46:00.295Z · comments (6)
Discovering Agents
zac_kenton (zkenton) · 2022-08-18T17:33:43.317Z · comments (11)
[question] COVID-19 Group Testing Post-mortem?
gwern · 2022-08-05T16:32:55.157Z · answers+comments (6)