LessWrong 2.0 Reader

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)
[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)
When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)
[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)
AI Safety is Dropping the Ball on Clown Attacks
trevor (TrevorWiesinger) · 2023-10-22T20:09:31.810Z · comments (73)
[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)
AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)
Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)
Some Rules for an Algebra of Bayes Nets
johnswentworth · 2023-11-16T23:53:11.650Z · comments (30)
Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)
FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)
AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)
We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)
Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)
Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)
LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)
Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)
Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc (joshua-clymer) · 2023-11-15T19:00:41.908Z · comments (2)
Related Discussion from Thomas Kwa's MIRI Research Experience
Raemon · 2023-10-07T06:25:00.994Z · comments (140)
Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)
[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)
[link] [Repost] The Copenhagen Interpretation of Ethics
mesaoptimizer · 2024-01-25T15:20:08.162Z · comments (4)
If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (65)
Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)
Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)
“Artificial General Intelligence”: an extremely brief FAQ
Steven Byrnes (steve2152) · 2024-03-11T17:49:02.496Z · comments (6)
[Valence series] 3. Valence & Beliefs
Steven Byrnes (steve2152) · 2023-12-11T20:21:30.570Z · comments (11)
How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (4)
[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)
Update on Chinese IQ-related gene panels
Lao Mein (derpherpize) · 2023-12-14T10:12:21.212Z · comments (7)
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)
Instruction-following AGI is easier and more likely than value aligned AGI
Seth Herd · 2024-05-15T19:38:03.185Z · comments (25)
Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (12)
Eliezer's example on Bayesian statistics is wr... oops!
Zane · 2023-10-17T18:38:18.327Z · comments (13)
[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)
Epistemic Hell
rogersbacon · 2024-01-27T17:13:09.578Z · comments (20)
Linking Alt Accounts
jefftk (jkaufman) · 2023-10-06T17:00:09.802Z · comments (33)
Flagging Potentially Unfair Parenting
jefftk (jkaufman) · 2023-12-26T12:40:05.099Z · comments (1)
AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt
DanielFilan · 2024-04-11T21:30:04.244Z · comments (10)
[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)
Grokking, memorization, and generalization — a discussion
Kaarel (kh) · 2023-10-29T23:17:30.098Z · comments (11)
[link] Former OpenAI Superalignment Researcher: Superintelligence by 2030
Julian Bradshaw · 2024-06-05T03:35:19.251Z · comments (30)
[link] Paper: Understanding and Controlling a Maze-Solving Policy Network
TurnTrout · 2023-10-13T01:38:09.147Z · comments (0)
[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)
[link] InterLab – a toolkit for experiments with multi-agent interactions
Tomáš Gavenčiak (tomas-gavenciak) · 2024-01-22T18:23:35.661Z · comments (0)
[link] Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis
simeon_c (WayZ) · 2024-02-01T21:30:44.090Z · comments (17)
Multiplex Gene Editing: Where Are We Now?
sarahconstantin · 2024-07-16T20:50:04.590Z · comments (6)
[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)
How We Picture Bayesian Agents
johnswentworth · 2024-04-08T18:12:48.595Z · comments (14)
[link] A framing for interpretability
Nina Panickssery (NinaR) · 2023-11-14T16:14:15.713Z · comments (5)