LessWrong 2.0 Reader



[question] Request for Comments on AI-related Prediction Market Ideas
PeterMcCluskey · 2025-03-02T20:52:41.114Z · answers+comments (0)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (1)
Cautions about LLMs in Human Cognitive Loops
Alice Blair (Diatom) · 2025-03-02T19:53:10.253Z · comments (0)
[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (1)
Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week
spencerg · 2025-03-02T18:01:32.880Z · comments (0)
[question] Will LLM agents become the first takeover-capable AGIs?
Seth Herd · 2025-03-02T17:15:37.056Z · answers+comments (6)
Not-yet-falsifiable beliefs?
Benjamin Hendricks (benjamin-hendricks) · 2025-03-02T14:11:07.121Z · comments (4)
Saving Zest
jefftk (jkaufman) · 2025-03-02T12:00:41.732Z · comments (1)
Open Thread Spring 2025
Ben Pace (Benito) · 2025-03-02T02:33:16.307Z · comments (1)
[question] help, my self image as rational is affecting my ability to empathize with others
KvmanThinking (avery-liu) · 2025-03-02T02:06:36.376Z · answers+comments (8)
Maintaining Alignment during RSI as a Feedback Control Problem
beren · 2025-03-02T00:21:43.432Z · comments (4)
[link] AI Safety Policy Won't Go On Like This – AI Safety Advocacy Is Failing Because Nobody Cares.
henophilia · 2025-03-01T20:15:16.645Z · comments (0)
Meaning Machines
appromoximate (antediluvian) · 2025-03-01T19:16:08.539Z · comments (0)
[question] Share AI Safety Ideas: Both Crazy and Not
ank · 2025-03-01T19:08:25.605Z · answers+comments (20)
[link] Historiographical Compressions: Renaissance as An Example
adamShimi · 2025-03-01T18:21:42.586Z · comments (1)
Real-Time Gigstats
jefftk (jkaufman) · 2025-03-01T14:10:41.060Z · comments (0)
Open problems in emergent misalignment
Jan Betley (jan-betley) · 2025-03-01T09:47:58.889Z · comments (3)
[link] Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis (adam-scherlis) · 2025-03-01T02:11:56.313Z · comments (5)
[question] What nation did Trump prevent from going to war (Feb. 2025)?
James Camacho (james-camacho) · 2025-03-01T01:46:58.929Z · answers+comments (3)
AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
DanielFilan · 2025-03-01T01:20:04.778Z · comments (0)
TamperSec is hiring for 3 Key Roles!
Jonathan_H (JonathanH) · 2025-02-28T23:10:31.540Z · comments (0)
Do we want alignment faking?
Florian_Dietz · 2025-02-28T21:50:48.891Z · comments (2)
Few concepts mixing dark fantasy and science fiction
Marek Zegarek (marek-zegarek) · 2025-02-28T21:03:35.307Z · comments (0)
Latent Space Collapse? Understanding the Effects of Narrow Fine-Tuning on LLMs
tenseisoham · 2025-02-28T20:22:17.721Z · comments (0)
How to Contribute to Theoretical Reward Learning Research
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:27:52.552Z · comments (0)
Other Papers About the Theory of Reward Learning
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:26:11.490Z · comments (0)
Defining and Characterising Reward Hacking
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:25:42.777Z · comments (0)
Misspecification in Inverse Reinforcement Learning - Part II
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:24:59.570Z · comments (0)
STARC: A General Framework For Quantifying Differences Between Reward Functions
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:24:52.965Z · comments (0)
Misspecification in Inverse Reinforcement Learning
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:24:49.204Z · comments (0)
Markdown Object Notation
bhauth · 2025-02-28T19:24:25.422Z · comments (2)
Partial Identifiability in Reward Learning
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:23:30.738Z · comments (0)
The Theoretical Reward Learning Research Agenda: Introduction and Motivation
Joar Skalse (Logical_Lunatic) · 2025-02-28T19:20:30.168Z · comments (2)
[link] An Open Letter To EA and AI Safety On Decelerating AI Development
kenneth_diao · 2025-02-28T17:21:42.826Z · comments (0)
Dance Weekend Pay II
jefftk (jkaufman) · 2025-02-28T15:10:02.030Z · comments (0)
Existentialists and Trolleys
David Gross (David_Gross) · 2025-02-28T14:01:49.509Z · comments (2)
On Emergent Misalignment
Zvi · 2025-02-28T13:10:05.973Z · comments (5)
[link] Do safety-relevant LLM steering vectors optimized on a single example generalize?
Jacob Dunefsky (jacob-dunefsky) · 2025-02-28T12:01:12.514Z · comments (1)
[link] Tetherware #2: What every human should know about our most likely AI future
Jáchym Fibír · 2025-02-28T11:12:59.033Z · comments (0)
Notes on Superwisdom & Moral RSI
welfvh · 2025-02-28T10:34:54.767Z · comments (4)
Cycles (a short story by Claude 3.7 and me)
Knight Lee (Max Lee) · 2025-02-28T07:04:46.602Z · comments (0)
January-February 2025 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2025-02-28T03:10:01.909Z · comments (1)
Exploring unfaithful/deceptive CoT in reasoning models
Lucy Wingard (lucy-wingard) · 2025-02-28T02:54:43.481Z · comments (0)
Weirdness Points
lsusr · 2025-02-28T02:23:56.508Z · comments (16)
[link] Do clients need years of therapy, or can one conversation resolve the issue?
Chipmonk · 2025-02-28T00:06:29.276Z · comments (10)
[New Jersey] HPMOR 10 Year Anniversary Party 🎉
🟠UnlimitedOranges🟠 (mr-mar) · 2025-02-27T22:30:26.009Z · comments (0)
[link] OpenAI releases GPT-4.5
Seth Herd · 2025-02-27T21:40:45.010Z · comments (12)
AEPF_OpenSource is Live – A New Open Standard for Ethical AI
ethoshift · 2025-02-27T20:40:18.997Z · comments (0)
The Elicitation Game: Evaluating capability elicitation techniques
Teun van der Weij (teun-van-der-weij) · 2025-02-27T20:33:24.861Z · comments (0)
For the Sake of Pleasure Alone
Greenless Mirror (mikhail-2) · 2025-02-27T20:07:54.852Z · comments (9)
next page (older posts) →