LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (26)
Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (20)
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (6)
To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (7)
The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (5)
Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)
OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)
[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)
Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (2)
OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak (tomek-korbak) · 2025-04-14T16:45:46.584Z · comments (0)
Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)
A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (20)
D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (2)
[link] Unbendable Arm as Test Case for Religious Belief
Ivan Vendrov (ivan-vendrov) · 2025-04-14T01:57:12.013Z · comments (30)
[link] Nucleic Acid Observatory Updates, April 2025
jefftk (jkaufman) · 2025-04-15T18:58:29.839Z · comments (0)
[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (4)
Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)
The Last Light
Bridgett Kay (bridgett-kay) · 2025-04-14T15:41:02.745Z · comments (0)
Offer: Team Conflict Counseling for AI Safety Orgs
Severin T. Seehrich (sts) · 2025-04-14T15:17:00.835Z · comments (1)
Ctrl-Z: Controlling AI Agents via Resampling
abhatt349 · 2025-04-16T16:21:23.781Z · comments (0)
[link] The real reason AI benchmarks haven’t reflected economic impacts
Noosphere89 (sharmake-farah) · 2025-04-15T13:44:06.225Z · comments (0)
[link] Slopworld 2035: The dangers of mediocre AI
titotal (lombertini) · 2025-04-14T13:14:08.390Z · comments (6)
A Talmudic Rationalist Cautionary Tale
Noah Birnbaum (daniel-birnbaum) · 2025-04-15T04:11:16.972Z · comments (1)
[link] Should AIs be Encouraged to Cooperate?
PeterMcCluskey · 2025-04-15T21:57:06.096Z · comments (1)
Risers for Foot Percussion
jefftk (jkaufman) · 2025-04-15T11:10:08.577Z · comments (0)
What empirical research directions has Eliezer commented positively on?
Chris_Leong · 2025-04-15T08:53:41.677Z · comments (1)
[link] Can LLM-based models do model-based planning?
jylin04 · 2025-04-16T12:38:00.793Z · comments (1)
[link] Top OpenAI Catastrophic Risk Official Steps Down Abruptly
garrison · 2025-04-16T16:04:28.115Z · comments (0)
$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project
Ramon Gonzalez (ramon-gonzalez) · 2025-04-15T00:46:10.637Z · comments (0)
[link] AISN #51: AI Frontiers
Corin Katzke (corin-katzke) · 2025-04-15T16:01:56.701Z · comments (1)
The Mirror Problem in AI: Why Language Models Say Whatever You Want
RobT · 2025-04-15T18:40:02.793Z · comments (2)
Луна Лавгуд и Комната Тайн, Часть 5
Kongo Landwalker (kongo-landwalker) · 2025-04-14T00:10:36.028Z · comments (0)
[link] 3M Subscriber YouTube Account 'Channel 5' Reporting On Rationalism
sakraf · 2025-04-15T13:02:33.736Z · comments (0)
Sam Altman's sister claims Sam sexually abused her -- Part 8: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:42:53.705Z · comments (0)
Some OthelloGPT Circuits
Alfred Wong (alfred-wong) · 2025-04-15T18:41:36.216Z · comments (0)
What if there was a nuke in Manhattan and why that could be a good thing
Ratburn · 2025-04-15T00:19:41.844Z · comments (10)
How Logic "Really" Works: An Engineering Perspective
Daniil Strizhov (mila-dolontaeva) · 2025-04-16T05:34:09.443Z · comments (0)
Gamify life from BayesianMind
P. João (gabriel-brito) · 2025-04-16T16:17:49.284Z · comments (0)
Creating 'Making God': a Feature Documentary on risks from AGI
Connor Axiotes (connor-axiotes-1) · 2025-04-15T02:56:09.206Z · comments (0)
How to Defend the Indefensible
Alex Beyman (alexbeyman) · 2025-04-15T07:45:15.971Z · comments (1)
What happens when LLMs learn new things? & Continual learning forever.
sunchipsster · 2025-04-15T18:38:35.166Z · comments (0)
Sam Altman's sister claims Sam sexually abused her -- Part 7: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:43:28.897Z · comments (0)
Opportunity to to learn more about AI Innovation & Security Policy
PolicyTakes · 2025-04-16T01:35:27.203Z · comments (0)
[link] AI is advancing fast
Vishakha (vishakha-agrawal) · 2025-04-16T08:17:06.055Z · comments (0)
[link] Human-level is not the limit
Vishakha (vishakha-agrawal) · 2025-04-16T08:33:15.498Z · comments (2)
[link] AI may attain human level soon
Vishakha (vishakha-agrawal) · 2025-04-16T08:28:55.592Z · comments (0)
An artistic illustration of Scalable Oversight - "A world apart, neither gods nor mortals"
Marius Adrian Nicoară · 2025-04-16T12:41:44.874Z · comments (0)
[link] The road from human-level to superintelligent AI may be short
Vishakha (vishakha-agrawal) · 2025-04-16T08:35:54.376Z · comments (0)
next page (older posts) →