LessWrong 2.0 Reader

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)
Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)
An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)
[link] Takeaways from sketching a control safety case
joshc (joshua-clymer) · 2025-01-31T04:43:45.917Z · comments (0)
Is AI Alignment Enough?
Aram Panasenco (panasenco) · 2025-01-10T18:57:48.409Z · comments (6)
Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)
Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)
[link] Impact in AI Safety Now Requires Specific Strategic Insight
MiloSal (milosal) · 2024-12-29T00:40:53.780Z · comments (1)
[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)
[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)
[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)
Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)
First Solo Bus Ride
jefftk (jkaufman) · 2024-12-03T12:20:02.344Z · comments (1)
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)
Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)
the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)
Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)
AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)
Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)
Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)
[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)
[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)
[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)
Infra-Bayesian haggling
hannagabor (hanna-gabor) · 2024-05-20T12:23:30.165Z · comments (0)
[link] What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:42:07.215Z · comments (6)
Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)
Probably Not a Ghost Story
George Ingebretsen (george-ingebretsen) · 2024-06-12T22:55:26.264Z · comments (4)
[link] The Takeoff Speeds Model Predicts We May Be Entering Crunch Time
johncrox · 2025-02-21T02:26:31.768Z · comments (0)
Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)
Monthly Roundup #27: February 2025
Zvi · 2025-02-17T14:10:06.486Z · comments (3)
Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)
Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)
Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (29)
Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (2)
An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)
Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (3)
Chicanery: No
Screwtape · 2025-02-06T05:42:45.095Z · comments (10)
Knitting a Sweater in a Burning House
CrimsonChin · 2025-02-15T19:50:33.275Z · comments (2)
Celtic Knots on a hex lattice
Ben (ben-lang) · 2025-02-14T14:29:08.223Z · comments (10)
Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)
Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)
[link] Altman blog on post-AGI world
Julian Bradshaw · 2025-02-09T21:52:30.631Z · comments (10)
Export Surplusses
lsusr · 2025-02-24T05:53:23.422Z · comments (16)
A City Within a City
Declan Molony (declan-molony) · 2025-02-24T15:51:19.118Z · comments (1)
The Foraging (Ex-)Bandit [Ruleset & Reflections]
abstractapplic · 2024-11-14T20:16:21.535Z · comments (3)
Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)
Towards building blocks of ontologies
Daniel C (harper-owen) · 2025-02-08T16:03:29.854Z · comments (0)
Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)
[link] debating buying NVDA in 2019
bhauth · 2025-01-04T05:06:54.047Z · comments (0)
[link] Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain · 2024-12-24T09:57:55.111Z · comments (4)