LessWrong 2.0 Reader


Should AI safety be a mass movement?
mhampton · 2025-03-13T20:36:59.284Z · comments (1)
Auditing language models for hidden objectives
Sam Marks (samuel-marks) · 2025-03-13T19:18:32.638Z · comments (15)
Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-13T19:09:43.620Z · comments (40)
Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (26)
[link] A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
simeon_c (WayZ) · 2025-03-13T18:29:52.776Z · comments (0)
Creating Complex Goals: A Model to Create Autonomous Agents
theraven · 2025-03-13T18:17:58.519Z · comments (1)
[link] Habermas Machine
NicholasKees (nick_kees) · 2025-03-13T18:16:50.453Z · comments (7)
The Other Alignment Problem: Maybe AI Needs Protection From Us
Peterpiper · 2025-03-13T18:03:43.086Z · comments (0)
AI #107: The Misplaced Hype Machine
Zvi · 2025-03-13T14:40:05.318Z · comments (10)
[link] Intelsat as a Model for International AGI Governance
rosehadshar · 2025-03-13T12:58:11.692Z · comments (0)
[link] Stacity: a Lock-In Risk Benchmark for Large Language Models
alamerton · 2025-03-13T12:08:47.329Z · comments (0)
The prospect of accelerated AI safety progress, including philosophical progress
Mitchell_Porter · 2025-03-13T10:52:13.745Z · comments (0)
[link] The "Reversal Curse": you still aren't anthropomorphising enough.
lumpenspace (lumpen-space) · 2025-03-13T10:24:45.965Z · comments (0)
Formalizing Space-Faring Civilizations Saturation: concepts and metrics
Maxime Riché (maxime-riche) · 2025-03-13T09:40:03.465Z · comments (0)
The Economics of p(doom)
Jakub Growiec (jakub-growiec) · 2025-03-13T07:33:50.940Z · comments (0)
Social Media: How to fix them before they become the biggest news platform
Sam G (sam-g) · 2025-03-13T07:28:51.487Z · comments (2)
Penny Whistle in E?
jefftk (jkaufman) · 2025-03-13T02:40:02.653Z · comments (1)
Anthropic, and taking "technical philosophy" more seriously
Raemon · 2025-03-13T01:48:54.184Z · comments (29)
LW/ACX Social Meetup
Stefan (stefan-1) · 2025-03-12T23:13:43.163Z · comments (0)
I grade every NBA basketball game I watch based on enjoyability
proshowersinger · 2025-03-12T21:46:26.791Z · comments (2)
Kairos is hiring a Head of Operations/Founding Generalist
agucova · 2025-03-12T20:58:49.661Z · comments (0)
[link] USAID Outlook: A Metaculus Forecasting Series
ChristianWilliams · 2025-03-12T20:34:03.495Z · comments (0)
[link] What is instrumental convergence?
Vishakha (vishakha-agrawal) · 2025-03-12T20:28:35.556Z · comments (0)
Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs
Sanyu Rajakumar (sanyu-rajakumar) · 2025-03-12T17:56:31.910Z · comments (0)
Why Obedient AI May Be the Real Catastrophe
G~ (gal-1) · 2025-03-12T17:50:09.577Z · comments (2)
[link] Your Communication Preferences Aren’t Law
Jonathan Moregård (JonathanMoregard) · 2025-03-12T17:20:11.117Z · comments (4)
Reflections on Neuralese
Alice Blair (Diatom) · 2025-03-12T16:29:31.230Z · comments (0)
Field tests of semi-rationality in Brazilian military training
P. João (gabriel-brito) · 2025-03-12T16:14:12.590Z · comments (0)
[link] Many life-saving drugs fail for lack of funding. But there’s a solution: desperate rich people
Mvolz (mvolz) · 2025-03-12T15:24:46.889Z · comments (0)
The Most Forbidden Technique
Zvi · 2025-03-12T13:20:04.732Z · comments (9)
[link] You don't actually need a physical multiverse to explain anthropic fine-tuning.
Fraser · 2025-03-12T07:33:43.278Z · comments (8)
[link] AI Can't Write Good Fiction
JustisMills · 2025-03-12T06:11:57.786Z · comments (19)
Existing UDTs test the limits of Bayesianism (and consistency)
Cole Wyeth (Amyr) · 2025-03-12T04:09:11.615Z · comments (20)
[link] (Anti)Aging 101
George3d6 · 2025-03-12T03:59:21.859Z · comments (2)
[link] The Grapes of Hardness
adamShimi · 2025-03-11T21:01:14.963Z · comments (0)
Don't over-update on FrontierMath results
David Matolcsi (matolcsid) · 2025-03-11T20:44:04.459Z · comments (5)
Response to Scott Alexander on Imprisonment
Zvi · 2025-03-11T20:40:06.250Z · comments (4)
[link] Paths and waystations in AI safety
Joe Carlsmith (joekc) · 2025-03-11T18:52:57.772Z · comments (1)
Meridian Cambridge Visiting Researcher Programme: Turn AI safety ideas into funded projects in one week!
Meridian Cambridge · 2025-03-11T17:46:29.656Z · comments (0)
Elon Musk May Be Transitioning to Bipolar Type I
Cyborg25 · 2025-03-11T17:45:06.599Z · comments (22)
Scaling AI Regulation: Realistically, What Can (and Can't) Be Regulated?
Katalina Hernandez (katalina-hernandez) · 2025-03-11T16:51:41.651Z · comments (1)
[link] How Language Models Understand Nullability
Anish Tondwalkar (anish-tondwalkar) · 2025-03-11T15:57:28.686Z · comments (0)
Forethought: a new AI macrostrategy group
Max Dalton (max-dalton) · 2025-03-11T15:39:25.086Z · comments (0)
[link] Preparing for the Intelligence Explosion
fin · 2025-03-11T15:38:29.524Z · comments (17)
stop solving problems that have already been solved
dhruvmethi · 2025-03-11T15:30:41.896Z · comments (3)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
When is it Better to Train on the Alignment Proxy?
dil-leik-og (samuel-buteau) · 2025-03-11T13:35:51.152Z · comments (0)
[link] A different take on the Musk v OpenAI preliminary injunction order
TFD · 2025-03-11T12:46:23.497Z · comments (0)
[link] Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases
Fabien Roger (Fabien) · 2025-03-11T11:52:38.994Z · comments (23)
A Hogwarts Guide to Citizenship
WillPetillo · 2025-03-11T05:50:02.768Z · comments (1)