LessWrong 2.0 Reader

State of Generally Available Self-Driving
jefftk (jkaufman) · 2023-08-22T18:50:01.166Z · comments (6)
Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)
[link] Open Problems and Fundamental Limitations of RLHF
scasper · 2023-07-31T15:31:28.916Z · comments (6)
Stephen Wolfram on AI Alignment
Bill Benzon (bill-benzon) · 2023-08-20T19:49:28.953Z · comments (15)
How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)
Managing risks of our own work
Beth Barnes (beth-barnes) · 2023-08-18T00:41:30.832Z · comments (0)
Preventing model exfiltration with upload limits
ryan_greenblatt · 2024-02-06T16:29:33.999Z · comments (21)
AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)
[link] AI Safety Hub Serbia Soft Launch
DusanDNesic · 2023-10-20T07:11:48.389Z · comments (1)
[link] How LDT helps reduce the AI arms race
Tamsin Leake (carado-1) · 2023-12-10T16:21:44.409Z · comments (13)
[question] Will quantum randomness affect the 2028 election?
Thomas Kwa (thomas-kwa) · 2024-01-24T22:54:30.800Z · answers+comments (52)
Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (4)
METR is hiring!
Beth Barnes (beth-barnes) · 2023-12-26T21:00:50.625Z · comments (1)
Implementing activation steering
Annah (annah) · 2024-02-05T17:51:55.851Z · comments (7)
How a chip is designed
YM (Yannick_Muehlhaeuser) · 2024-06-28T08:04:27.392Z · comments (4)
List of how people have become more hard-working
Chi Nguyen · 2023-09-29T11:30:38.802Z · comments (7)
Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)
AI #29: Take a Deep Breath
Zvi · 2023-09-14T12:00:03.818Z · comments (21)
[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)
AI Regulation May Be More Important Than AI Alignment For Existential Safety
otto.barten (otto-barten) · 2023-08-24T11:41:54.690Z · comments (39)
[link] The Perceptron Controversy
Yuxi_Liu · 2024-01-10T23:07:23.341Z · comments (18)
[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)
2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (10)
Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (22)
Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)
Interpreting and Steering Features in Images
Gytis Daujotas (gytis-daujotas) · 2024-06-20T18:33:59.512Z · comments (6)
“Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them
Nora_Ammann · 2023-08-20T09:13:34.225Z · comments (4)
[link] So you want to save the world? An account in paladinhood
Tamsin Leake (carado-1) · 2023-11-22T17:40:33.048Z · comments (19)
Aumann-agreement is common
tailcalled · 2023-08-26T20:22:03.738Z · comments (31)
[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)
How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)
[link] The Gods of Straight Lines
Richard_Ngo (ricraz) · 2023-10-14T04:10:50.020Z · comments (13)
[link] GPT-4 for personal productivity: online distraction blocker
Sergii (sergey-kharagorgiev) · 2023-09-26T17:41:31.031Z · comments (12)
Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)
[link] Understanding strategic deception and deceptive alignment
Marius Hobbhahn (marius-hobbhahn) · 2023-09-25T16:27:47.357Z · comments (16)
Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (11)
[link] A free to enter, 240 character, open-source iterated prisoner's dilemma tournament
Isaac King (KingSupernova) · 2023-11-09T08:24:43.277Z · comments (19)
Interpretability Externalities Case Study - Hungry Hungry Hippos
Magdalena Wache · 2023-09-20T14:42:44.371Z · comments (22)
[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)
A Social History of Truth
Vaniver · 2023-07-31T22:49:23.209Z · comments (2)
[link] What Does a Marginal Grant at LTFF Look Like? Funding Priorities and Grantmaking Thresholds at the Long-Term Future Fund
Linch · 2023-08-11T03:59:51.757Z · comments (0)
A to Z of things
KatjaGrace · 2023-11-17T05:20:03.134Z · comments (6)
Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)
Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 2023-11-25T15:57:08.078Z · comments (2)
a rant on politician-engineer coalitional conflict
bhauth · 2023-09-04T17:15:25.765Z · comments (12)
On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)
Ideas for improving epistemics in AI safety outreach
mic (michael-chen) · 2023-08-21T19:55:45.654Z · comments (6)
"Is There Anything That's Worth More"
Zack_M_Davis · 2023-08-02T03:28:16.116Z · comments (6)
Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)
On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)