LessWrong 2.0 Reader

[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (9)
Methods for strong human germline engineering
TsviBT · 2025-03-03T08:13:49.414Z · comments (3)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (7)
Maintaining Alignment during RSI as a Feedback Control Problem
beren · 2025-03-02T00:21:43.432Z · comments (4)
[question] Will LLM agents become the first takeover-capable AGIs?
Seth Herd · 2025-03-02T17:15:37.056Z · answers+comments (10)
On GPT-4.5
Zvi · 2025-03-03T13:40:05.843Z · comments (5)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-03T20:08:31.539Z · comments (3)
Cautions about LLMs in Human Cognitive Loops
Alice Blair (Diatom) · 2025-03-02T19:53:10.253Z · comments (9)
Saving Zest
jefftk (jkaufman) · 2025-03-02T12:00:41.732Z · comments (1)
Middle School Choice
jefftk (jkaufman) · 2025-03-03T16:10:03.163Z · comments (0)
[question] Request for Comments on AI-related Prediction Market Ideas
PeterMcCluskey · 2025-03-02T20:52:41.114Z · answers+comments (0)
[link] Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers
Nikola Jurkovic (nikolaisalreadytaken) · 2025-03-03T19:05:31.212Z · comments (0)
Open Thread Spring 2025
Ben Pace (Benito) · 2025-03-02T02:33:16.307Z · comments (1)
[question] Examples of self-fulfilling prophecies in AI alignment?
Chipmonk · 2025-03-03T02:45:51.619Z · answers+comments (3)
Takeaways From Our Recent Work on SAE Probing
Josh Engels (JoshEngels) · 2025-03-03T19:50:16.692Z · comments (0)
Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week
spencerg · 2025-03-02T18:01:32.880Z · comments (0)
[link] Why People Commit White Collar Fraud (Ozy linkpost)
sapphire (deluks917) · 2025-03-03T19:33:15.609Z · comments (0)
The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset.
keltan · 2025-03-03T23:15:44.495Z · comments (0)
Not-yet-falsifiable beliefs?
Benjamin Hendricks (benjamin-hendricks) · 2025-03-02T14:11:07.121Z · comments (4)
Positional kernels of attention heads
Alex Gibson · 2025-03-03T01:40:13.014Z · comments (0)
Coalescence - Determinism In Ways We Care About
vitaliya · 2025-03-03T13:20:44.408Z · comments (0)
Identity Alignment (IA) in AI
Davey Morse (davey-morse) · 2025-03-03T06:26:12.015Z · comments (1)
[link] AI Safety at the Frontier: Paper Highlights, February '25
gasteigerjo · 2025-03-03T22:09:37.845Z · comments (0)
[question] help, my self image as rational is affecting my ability to empathize with others
KvmanThinking (avery-liu) · 2025-03-02T02:06:36.376Z · answers+comments (9)
Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing
racinkc1 · 2025-03-03T19:23:20.687Z · comments (0)
[question] Ask Me Anything - Samuel
samuelshadrach (xpostah) · 2025-03-03T19:24:44.316Z · answers+comments (0)