LessWrong 2.0 Reader

[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (4)
Maintaining Alignment during RSI as a Feedback Control Problem
beren · 2025-03-02T00:21:43.432Z · comments (4)
Open problems in emergent misalignment
Jan Betley (jan-betley) · 2025-03-01T09:47:58.889Z · comments (3)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (2)
[question] Will LLM agents become the first takeover-capable AGIs?
Seth Herd · 2025-03-02T17:15:37.056Z · answers+comments (6)
[link] Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis (adam-scherlis) · 2025-03-01T02:11:56.313Z · comments (5)
Cautions about LLMs in Human Cognitive Loops
Alice Blair (Diatom) · 2025-03-02T19:53:10.253Z · comments (3)
[question] Share AI Safety Ideas: Both Crazy and Not
ank · 2025-03-01T19:08:25.605Z · answers+comments (22)
Open Thread Spring 2025
Ben Pace (Benito) · 2025-03-02T02:33:16.307Z · comments (1)
[link] Historiographical Compressions: Renaissance as An Example
adamShimi · 2025-03-01T18:21:42.586Z · comments (2)
AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
DanielFilan · 2025-03-01T01:20:04.778Z · comments (0)
Saving Zest
jefftk (jkaufman) · 2025-03-02T12:00:41.732Z · comments (1)
Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week
spencerg · 2025-03-02T18:01:32.880Z · comments (0)
Real-Time Gigstats
jefftk (jkaufman) · 2025-03-01T14:10:41.060Z · comments (0)
[question] Request for Comments on AI-related Prediction Market Ideas
PeterMcCluskey · 2025-03-02T20:52:41.114Z · answers+comments (0)
Not-yet-falsifiable beliefs?
Benjamin Hendricks (benjamin-hendricks) · 2025-03-02T14:11:07.121Z · comments (4)
[question] Examples of self-fulfilling prophecies in AI alignment?
Chipmonk · 2025-03-03T02:45:51.619Z · answers+comments (3)
[question] help, my self image as rational is affecting my ability to empathize with others
KvmanThinking (avery-liu) · 2025-03-02T02:06:36.376Z · answers+comments (8)
[question] What nation did Trump prevent from going to war (Feb. 2025)?
James Camacho (james-camacho) · 2025-03-01T01:46:58.929Z · answers+comments (3)
[link] AI Safety Policy Won't Go On Like This – AI Safety Advocacy Is Failing Because Nobody Cares.
henophilia · 2025-03-01T20:15:16.645Z · comments (0)
Positional kernels of attention heads
Alex Gibson · 2025-03-03T01:40:13.014Z · comments (0)
Meaning Machines
appromoximate (antediluvian) · 2025-03-01T19:16:08.539Z · comments (0)