LessWrong 2.0 Reader

[link] Simulators
janus · 2022-09-02T12:45:33.723Z · comments (161)
The Redaction Machine
Ben (ben-lang) · 2022-09-20T22:03:15.309Z · comments (46)
Losing the root for the tree
Adam Zerner (adamzerner) · 2022-09-20T04:53:53.435Z · comments (30)
You Are Not Measuring What You Think You Are Measuring
johnswentworth · 2022-09-20T20:04:22.899Z · comments (44)
Why I think strong general AI is coming soon
porby · 2022-09-28T05:40:38.395Z · comments (139)
Announcing Balsa Research
Zvi · 2022-09-25T22:50:00.626Z · comments (64)
The shard theory of human values
Quintin Pope (quintin-pope) · 2022-09-04T04:28:11.752Z · comments (66)
How I buy things when Lightcone wants them fast
jacobjacob · 2022-09-26T05:02:09.003Z · comments (21)
How my team at Lightcone sometimes gets stuff done
jacobjacob · 2022-09-19T05:47:06.787Z · comments (43)
7 traps that (we think) new alignment researchers often fall into
Akash (akash-wasil) · 2022-09-27T23:13:46.697Z · comments (10)
Do bamboos set themselves on fire?
Malmesbury (Elmer of Malmesbury) · 2022-09-19T15:34:13.574Z · comments (14)
Most People Start With The Same Few Bad Ideas
johnswentworth · 2022-09-09T00:29:12.740Z · comments (30)
The Onion Test for Personal and Institutional Honesty
chanamessinger (cmessinger) · 2022-09-27T15:26:34.567Z · comments (31)
Public-facing Censorship Is Safety Theater, Causing Reputational Damage
Yitz (yitz) · 2022-09-23T05:08:14.149Z · comments (42)
AI coordination needs clear wins
evhub · 2022-09-01T23:41:48.334Z · comments (16)
Takeaways from our robust injury classifier project [Redwood Research]
dmz (DMZ) · 2022-09-17T03:55:25.868Z · comments (12)
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Diffractor · 2022-09-28T01:20:11.605Z · comments (19)
Understanding Infra-Bayesianism: A Beginner-Friendly Video Series
Jack Parker · 2022-09-22T13:25:04.254Z · comments (6)
Interpreting Neural Networks through the Polytope Lens
Sid Black (sid-black) · 2022-09-23T17:58:30.639Z · comments (29)
Monitoring for deceptive alignment
evhub · 2022-09-08T23:07:03.327Z · comments (8)
Orexin and the quest for more waking hours
ChristianKl · 2022-09-24T19:54:56.207Z · comments (39)
An Update on Academia vs. Industry (one year into my faculty job)
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2022-09-03T20:43:37.701Z · comments (18)
[link] Gene drives: why the wait?
Metacelsus · 2022-09-19T23:37:17.595Z · comments (50)
LW Petrov Day 2022 (Monday, 9/26)
Ruby · 2022-09-22T02:56:19.738Z · comments (111)
Quintin's alignment papers roundup - week 1
Quintin Pope (quintin-pope) · 2022-09-10T06:39:01.773Z · comments (6)
Announcing $5,000 bounty for (responsibly) ending malaria
lc · 2022-09-24T04:28:22.189Z · comments (40)
Rejected Early Drafts of Newcomb's Problem
zahmahkibo · 2022-09-06T19:04:54.284Z · comments (5)
Petrov Day Retrospective: 2022
Ruby · 2022-09-28T22:16:20.325Z · comments (41)
Understanding Conjecture: Notes from Connor Leahy interview
Akash (akash-wasil) · 2022-09-15T18:37:51.653Z · comments (23)
[link] My emotional reaction to the current funding situation
Sam F. Brown (sam-4) · 2022-09-09T22:02:46.301Z · comments (36)
Ukraine Post #12
Zvi · 2022-09-22T14:40:03.753Z · comments (3)
Funding is All You Need: Getting into Grad School by Hacking the NSF GRFP Fellowship
hapanin · 2022-09-22T21:39:15.399Z · comments (9)
Evaluations project @ ARC is hiring a researcher and a webdev/engineer
Beth Barnes (beth-barnes) · 2022-09-09T22:46:47.569Z · comments (7)
[link] [Linkpost] A survey on over 300 works about interpretability in deep networks
scasper · 2022-09-12T19:07:09.156Z · comments (7)
[link] Inverse Scaling Prize: Round 1 Winners
Ethan Perez (ethan-perez) · 2022-09-26T19:57:01.367Z · comments (16)
The ethics of reclining airplane seats
braces · 2022-09-04T17:59:51.347Z · comments (70)
[link] Linkpost: Github Copilot productivity experiment
Daniel Kokotajlo (daniel-kokotajlo) · 2022-09-08T04:41:41.496Z · comments (4)
[link] Why we're not founding a human-data-for-alignment org
L Rudolf L (LRudL) · 2022-09-27T20:14:45.393Z · comments (5)
Let's Terraform West Texas
blackstampede · 2022-09-04T16:24:15.151Z · comments (33)
Nearcast-based "deployment problem" analysis
HoldenKarnofsky · 2022-09-21T18:52:22.674Z · comments (2)
Towards deconfusing wireheading and reward maximization
leogao · 2022-09-21T00:36:43.244Z · comments (7)
[link] Dath Ilan's Views on Stopgap Corrigibility
David Udell · 2022-09-22T16:16:07.467Z · comments (19)
AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022
Sam Bowman (sbowman) · 2022-09-01T19:15:40.713Z · comments (2)
Builder/Breaker for Deconfusion
abramdemski · 2022-09-29T17:36:37.725Z · comments (9)
Bugs or Features?
qbolec · 2022-09-03T07:04:09.702Z · comments (9)
Ambiguity in Prediction Market Resolution is Harmful
aphyer · 2022-09-26T16:22:48.809Z · comments (17)
Alignment Org Cheat Sheet
Akash (akash-wasil) · 2022-09-20T17:36:58.708Z · comments (8)
Solar Blackout Resistance
jefftk (jkaufman) · 2022-09-08T13:30:01.207Z · comments (32)
[link] Toy Models of Superposition
evhub · 2022-09-21T23:48:03.072Z · comments (4)
Stop Discouraging Microwave Formula Preparation
jefftk (jkaufman) · 2022-09-02T02:10:01.185Z · comments (12)