LessWrong 2.0 Reader

2019 AI Alignment Literature Review and Charity Comparison
Larks · 2019-12-19T03:00:54.708Z · comments (18)
A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble
Kaj_Sotala · 2020-05-05T19:09:44.484Z · comments (40)
Why We Launched LessWrong.SubStack
Ben Pace (Benito) · 2021-04-01T06:34:00.907Z · comments (44)
Basic Facts about Language Model Internals
beren · 2023-01-04T13:01:35.223Z · comments (18)
A mechanistic model of meditation
Kaj_Sotala · 2019-11-06T21:37:03.819Z · comments (11)
Why Not Subagents?
johnswentworth · 2023-06-22T22:16:55.249Z · comments (36)
Evaluations (of new AI Safety researchers) can be noisy
LawrenceC (LawChan) · 2023-02-05T04:15:02.117Z · comments (10)
Externalized reasoning oversight: a research direction for language model alignment
tamera · 2022-08-03T12:03:16.630Z · comments (23)
Confused why a "capabilities research is good for alignment progress" position isn't discussed more
Kaj_Sotala · 2022-06-02T21:41:44.784Z · comments (27)
Clarifying and predicting AGI
Richard_Ngo (ricraz) · 2023-05-04T15:55:26.283Z · comments (42)
Wolf Incident Postmortem
jefftk (jkaufman) · 2023-01-09T03:20:03.723Z · comments (13)
Orexin and the quest for more waking hours
ChristianKl · 2022-09-24T19:54:56.207Z · comments (39)
The feeling of breaking an Overton window
AnnaSalamon · 2021-02-17T05:31:40.629Z · comments (29)
Response to Quintin Pope's Evolution Provides No Evidence For the Sharp Left Turn
Zvi · 2023-10-05T11:39:02.393Z · comments (29)
Misgeneralization as a misnomer
So8res · 2023-04-06T20:43:33.275Z · comments (22)
Self-sacrifice is a scarce resource
mingyuan · 2020-06-28T05:08:05.010Z · comments (18)
Graphical tensor notation for interpretability
Jordan Taylor (Nadroj) · 2023-10-04T08:04:33.341Z · comments (11)
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth · 2022-08-08T18:05:11.982Z · comments (12)
Seemingly Popular Covid-19 Model is Obvious Nonsense
Zvi · 2020-04-11T23:10:00.594Z · comments (28)
[Closed] Job Offering: Help Communicate Infrabayesianism
abramdemski · 2022-03-23T18:35:16.790Z · comments (22)
My current thoughts on the risks from SETI
Matthew Barnett (matthew-barnett) · 2022-03-15T17:18:19.722Z · comments (27)
[link] [Linkpost] Some high-level thoughts on the DeepMind alignment team's strategy
Vika · 2023-03-07T11:55:01.131Z · comments (13)
[link] Tales from Prediction Markets
ike · 2021-04-03T23:38:22.728Z · comments (15)
Tools for keeping focused
benkuhn · 2020-08-05T02:10:08.707Z · comments (26)
Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)
Only Asking Real Questions
jefftk (jkaufman) · 2022-04-14T15:50:02.970Z · comments (45)
Intergenerational trauma impeding cooperative existential safety efforts
Andrew_Critch · 2022-06-03T08:13:25.439Z · comments (29)
Third Time: a better way to work
bfinn · 2022-01-07T21:15:57.789Z · comments (74)
My Overview of the AI Alignment Landscape: A Bird's Eye View
Neel Nanda (neel-nanda-1) · 2021-12-15T23:44:31.873Z · comments (9)
The 99% principle for personal problems
Kaj_Sotala · 2023-10-02T08:20:07.379Z · comments (20)
I left Russia on March 8
avturchin · 2022-03-10T20:05:59.650Z · comments (16)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (75)
COVID Skepticism Isn't About Science
jaspax · 2021-12-29T17:53:43.354Z · comments (76)
A Longlist of Theories of Impact for Interpretability
Neel Nanda (neel-nanda-1) · 2022-03-11T14:55:35.356Z · comments (35)
"Pivotal Acts" means something specific
Raemon · 2022-06-07T21:56:00.574Z · comments (23)
Clarifying AI X-risk
zac_kenton (zkenton) · 2022-11-01T11:03:01.144Z · comments (24)
Luna Lovegood and the Chamber of Secrets - Part 3
lsusr · 2020-12-01T12:43:42.647Z · comments (11)
Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (7)
Don't Dismiss Simple Alignment Approaches
Chris_Leong · 2023-10-07T00:35:26.789Z · comments (8)
[link] Introducing Fatebook: the fastest way to make and track predictions
Adam B (adam-b) · 2023-07-11T15:28:13.798Z · comments (34)
The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (14)
On the Diplomacy AI
Zvi · 2022-11-28T13:20:00.884Z · comments (29)
Insights from Euclid's 'Elements'
TurnTrout · 2020-05-04T15:45:30.711Z · comments (17)
Long covid: probably worth avoiding—some considerations
KatjaGrace · 2022-01-16T11:46:52.087Z · comments (88)
[link] The Hubinger lectures on AGI safety: an introductory lecture series
evhub · 2023-06-22T00:59:27.820Z · comments (0)
Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning
Zack_M_Davis · 2020-06-07T07:52:09.143Z · comments (16)
[link] FLI open letter: Pause giant AI experiments
Zach Stein-Perlman · 2023-03-29T04:04:23.333Z · comments (123)
[link] Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave · 2023-07-20T17:31:35.814Z · comments (21)
Shared reality: a key driver of human behavior
kdbscott · 2022-12-24T19:35:51.126Z · comments (25)
ARC is hiring theoretical researchers
paulfchristiano · 2023-06-12T18:50:08.232Z · comments (12)