LessWrong 2.0 Reader

2019 AI Alignment Literature Review and Charity Comparison
Larks · 2019-12-19T03:00:54.708Z · comments (18)
A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble
Kaj_Sotala · 2020-05-05T19:09:44.484Z · comments (40)
Why We Launched LessWrong.SubStack
Ben Pace (Benito) · 2021-04-01T06:34:00.907Z · comments (44)
Basic Facts about Language Model Internals
beren · 2023-01-04T13:01:35.223Z · comments (18)
A mechanistic model of meditation
Kaj_Sotala · 2019-11-06T21:37:03.819Z · comments (11)
Why Not Subagents?
johnswentworth · 2023-06-22T22:16:55.249Z · comments (36)
Evaluations (of new AI Safety researchers) can be noisy
LawrenceC (LawChan) · 2023-02-05T04:15:02.117Z · comments (10)
Externalized reasoning oversight: a research direction for language model alignment
tamera · 2022-08-03T12:03:16.630Z · comments (23)
Confused why a "capabilities research is good for alignment progress" position isn't discussed more
Kaj_Sotala · 2022-06-02T21:41:44.784Z · comments (27)
Clarifying and predicting AGI
Richard_Ngo (ricraz) · 2023-05-04T15:55:26.283Z · comments (42)
Wolf Incident Postmortem
jefftk (jkaufman) · 2023-01-09T03:20:03.723Z · comments (13)
Orexin and the quest for more waking hours
ChristianKl · 2022-09-24T19:54:56.207Z · comments (39)
The feeling of breaking an Overton window
AnnaSalamon · 2021-02-17T05:31:40.629Z · comments (29)
Response to Quintin Pope's Evolution Provides No Evidence For the Sharp Left Turn
Zvi · 2023-10-05T11:39:02.393Z · comments (29)
Misgeneralization as a misnomer
So8res · 2023-04-06T20:43:33.275Z · comments (22)
Self-sacrifice is a scarce resource
mingyuan · 2020-06-28T05:08:05.010Z · comments (18)
Graphical tensor notation for interpretability
Jordan Taylor (Nadroj) · 2023-10-04T08:04:33.341Z · comments (11)
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth · 2022-08-08T18:05:11.982Z · comments (12)
Seemingly Popular Covid-19 Model is Obvious Nonsense
Zvi · 2020-04-11T23:10:00.594Z · comments (28)
[Closed] Job Offering: Help Communicate Infrabayesianism
abramdemski · 2022-03-23T18:35:16.790Z · comments (22)
My current thoughts on the risks from SETI
Matthew Barnett (matthew-barnett) · 2022-03-15T17:18:19.722Z · comments (27)
[link] [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
Vika · 2023-03-07T11:55:01.131Z · comments (13)
[link] Tales from Prediction Markets
ike · 2021-04-03T23:38:22.728Z · comments (15)
Tools for keeping focused
benkuhn · 2020-08-05T02:10:08.707Z · comments (26)
Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)
Only Asking Real Questions
jefftk (jkaufman) · 2022-04-14T15:50:02.970Z · comments (45)
Intergenerational trauma impeding cooperative existential safety efforts
Andrew_Critch · 2022-06-03T08:13:25.439Z · comments (29)
Third Time: a better way to work
bfinn · 2022-01-07T21:15:57.789Z · comments (74)
My Overview of the AI Alignment Landscape: A Bird's Eye View
Neel Nanda (neel-nanda-1) · 2021-12-15T23:44:31.873Z · comments (9)
The 99% principle for personal problems
Kaj_Sotala · 2023-10-02T08:20:07.379Z · comments (20)
I left Russia on March 8
avturchin · 2022-03-10T20:05:59.650Z · comments (16)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (75)
COVID Skepticism Isn't About Science
jaspax · 2021-12-29T17:53:43.354Z · comments (76)
A Longlist of Theories of Impact for Interpretability
Neel Nanda (neel-nanda-1) · 2022-03-11T14:55:35.356Z · comments (35)
"Pivotal Acts" means something specific
Raemon · 2022-06-07T21:56:00.574Z · comments (23)
Clarifying AI X-risk
zac_kenton (zkenton) · 2022-11-01T11:03:01.144Z · comments (24)
Luna Lovegood and the Chamber of Secrets - Part 3
lsusr · 2020-12-01T12:43:42.647Z · comments (11)
Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (7)
Don't Dismiss Simple Alignment Approaches
Chris_Leong · 2023-10-07T00:35:26.789Z · comments (8)
[link] Introducing Fatebook: the fastest way to make and track predictions
Adam B (adam-b) · 2023-07-11T15:28:13.798Z · comments (34)
The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (14)
On the Diplomacy AI
Zvi · 2022-11-28T13:20:00.884Z · comments (29)
Insights from Euclid's 'Elements'
TurnTrout · 2020-05-04T15:45:30.711Z · comments (17)
Long covid: probably worth avoiding—some considerations
KatjaGrace · 2022-01-16T11:46:52.087Z · comments (88)
[link] The Hubinger lectures on AGI safety: an introductory lecture series
evhub · 2023-06-22T00:59:27.820Z · comments (0)
Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning
Zack_M_Davis · 2020-06-07T07:52:09.143Z · comments (16)
[link] FLI open letter: Pause giant AI experiments
Zach Stein-Perlman · 2023-03-29T04:04:23.333Z · comments (123)
[link] Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave · 2023-07-20T17:31:35.814Z · comments (21)
Shared reality: a key driver of human behavior
kdbscott · 2022-12-24T19:35:51.126Z · comments (25)
ARC is hiring theoretical researchers
paulfchristiano · 2023-06-12T18:50:08.232Z · comments (12)