LessWrong 2.0 Reader

AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
DanielFilan · 2025-03-01T01:20:04.778Z · comments (0)
[question] What nation did Trump prevent from going to war (Feb. 2025)?
James Camacho (james-camacho) · 2025-03-01T01:46:58.929Z · answers+comments (3)
[link] Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis (adam-scherlis) · 2025-03-01T02:11:56.313Z · comments (10)
Open problems in emergent misalignment
Jan Betley (jan-betley) · 2025-03-01T09:47:58.889Z · comments (13)
Real-Time Gigstats
jefftk (jkaufman) · 2025-03-01T14:10:41.060Z · comments (0)
[link] Historiographical Compressions: Renaissance as An Example
adamShimi · 2025-03-01T18:21:42.586Z · comments (4)
[question] Share AI Safety Ideas: Both Crazy and Not
ank · 2025-03-01T19:08:25.605Z · answers+comments (28)
Meaning Machines
appromoximate (antediluvian) · 2025-03-01T19:16:08.539Z · comments (0)
[link] AI Safety Policy Won't Go On Like This – AI Safety Advocacy Is Failing Because Nobody Cares.
henophilia · 2025-03-01T20:15:16.645Z · comments (1)
Maintaining Alignment during RSI as a Feedback Control Problem
beren · 2025-03-02T00:21:43.432Z · comments (6)
[question] help, my self image as rational is affecting my ability to empathize with others
KvmanThinking (avery-liu) · 2025-03-02T02:06:36.376Z · answers+comments (13)
Open Thread Spring 2025
Ben Pace (Benito) · 2025-03-02T02:33:16.307Z · comments (22)
Saving Zest
jefftk (jkaufman) · 2025-03-02T12:00:41.732Z · comments (1)
Not-yet-falsifiable beliefs?
Benjamin Hendricks (benjamin-hendricks) · 2025-03-02T14:11:07.121Z · comments (4)
[question] Will LLM agents become the first takeover-capable AGIs?
Seth Herd · 2025-03-02T17:15:37.056Z · answers+comments (10)
Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week
spencerg · 2025-03-02T18:01:32.880Z · comments (0)
[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (27)
Cautions about LLMs in Human Cognitive Loops
Alice Blair (Diatom) · 2025-03-02T19:53:10.253Z · comments (9)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (26)
[question] Request for Comments on AI-related Prediction Market Ideas
PeterMcCluskey · 2025-03-02T20:52:41.114Z · answers+comments (1)
[question] Examples of self-fulfilling prophecies in AI alignment?
Chipmonk · 2025-03-03T02:45:51.619Z · answers+comments (6)
Methods for strong human germline engineering
TsviBT · 2025-03-03T08:13:49.414Z · comments (28)
Coalescence - Determinism In Ways We Care About
vitaliya · 2025-03-03T13:20:44.408Z · comments (0)
On GPT-4.5
Zvi · 2025-03-03T13:40:05.843Z · comments (12)
Middle School Choice
jefftk (jkaufman) · 2025-03-03T16:10:03.163Z · comments (10)
[link] Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers
jleibowich · 2025-03-03T19:05:31.212Z · comments (1)
Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing
racinkc1 · 2025-03-03T19:23:20.687Z · comments (0)
[question] Ask Me Anything - Samuel
samuelshadrach (xpostah) · 2025-03-03T19:24:44.316Z · answers+comments (0)
[link] Why People Commit White Collar Fraud (Ozy linkpost)
sapphire (deluks917) · 2025-03-03T19:33:15.609Z · comments (1)
Takeaways From Our Recent Work on SAE Probing
Josh Engels (JoshEngels) · 2025-03-03T19:50:16.692Z · comments (0)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-03T20:08:31.539Z · comments (19)
[link] AI Safety at the Frontier: Paper Highlights, February '25
gasteigerjo · 2025-03-03T22:09:37.845Z · comments (0)
The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset.
keltan · 2025-03-03T23:15:44.495Z · comments (10)
The Milton Friedman Model of Policy Change
JohnofCharleston · 2025-03-04T00:38:56.778Z · comments (17)
[question] shouldn't we try to get media attention?
KvmanThinking (avery-liu) · 2025-03-04T01:39:06.596Z · answers+comments (1)
[question] How much should I worry about the Atlanta Fed's GDP estimates?
Brendan Long (korin43) · 2025-03-04T02:03:58.835Z · answers+comments (2)
[link] Observations About LLM Inference Pricing
Aaron_Scher · 2025-03-04T03:03:09.141Z · comments (2)
The Semi-Rational Militar Firefighter
P. João (gabriel-brito) · 2025-03-04T12:23:37.253Z · comments (10)
On Writing #1
Zvi · 2025-03-04T13:30:06.103Z · comments (2)
Formation Research: Organisation Overview
alamerton · 2025-03-04T15:03:33.196Z · comments (0)
[link] Progress links and short notes, 2025-03-03
jasoncrawford · 2025-03-04T15:20:35.619Z · comments (0)
For scheming, we should first focus on detection and then on prevention
Marius Hobbhahn (marius-hobbhahn) · 2025-03-04T15:22:06.105Z · comments (7)
Validating against a misalignment detector is very different to training against one
mattmacdermott · 2025-03-04T15:41:04.692Z · comments (4)
[question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Thane Ruthenis · 2025-03-04T16:23:39.296Z · answers+comments (51)
2028 Should Not Be AI Safety's First Foray Into Politics
Jesse Richardson (SharkoRubio) · 2025-03-04T16:46:37.370Z · comments (0)
Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource
Bryce Robertson (bryceerobertson) · 2025-03-04T17:01:18.758Z · comments (2)
Distillation of Meta's Large Concept Models Paper
NickyP (Nicky) · 2025-03-04T17:33:40.116Z · comments (3)
Energy Markets Temporal Arbitrage with Batteries
NickyP (Nicky) · 2025-03-04T17:37:56.804Z · comments (3)
What is the best / most proper definition of "Feeling the AGI" there is?
Annapurna (jorge-velez) · 2025-03-04T20:13:40.946Z · comments (5)
[link] Could this be an unusually good time to Earn To Give?
TomGardiner (HorusXVI) · 2025-03-04T21:51:19.148Z · comments (0)