LessWrong 2.0 Reader

14+ AI Safety Advisors You Can Speak to – New AISafety.com Resource
Bryce Robertson (bryceerobertson) · 2025-01-21T17:34:02.170Z · comments (0)

Navigation by Moonlight
Jacob Falkovich (Jacobian) · 2025-04-07T15:32:17.353Z · comments (39)

Come join Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-15T22:10:02.166Z · comments (0)

The non-tribal tribes
PatrickDFarley · 2025-02-26T17:22:59.949Z · comments (4)

Whether governments will control AGI is important and neglected
Seth Herd · 2025-03-14T09:48:34.062Z · comments (2)

Bike Lights are Cheap Enough to Give Away
jefftk (jkaufman) · 2025-03-14T02:10:02.482Z · comments (0)

MATS Spring 2024 Extension Retrospective
HenningB (HenningBlue) · 2025-02-12T22:43:58.193Z · comments (1)

[link] Nucleic Acid Observatory Updates, April 2025
jefftk (jkaufman) · 2025-04-15T18:58:29.839Z · comments (0)

Logical Correlation
niplav · 2025-02-10T23:29:10.518Z · comments (6)

Saving Zest
jefftk (jkaufman) · 2025-03-02T12:00:41.732Z · comments (1)

Against podcasts
Adam Zerner (adamzerner) · 2025-04-05T19:20:00.716Z · comments (19)

[question] What faithfulness metrics should general claims about CoT faithfulness be based upon?
Rauno Arike (rauno-arike) · 2025-04-08T15:27:20.346Z · answers+comments (0)

Explaining the Joke: Pausing is The Way
WillPetillo · 2025-04-04T09:04:38.847Z · comments (2)

I grade every NBA basketball game I watch based on enjoyability
proshowersinger · 2025-03-12T21:46:26.791Z · comments (2)

[link] New Report: Multi-Agent Risks from Advanced AI
Lewis Hammond (lewis-hammond-1) · 2025-02-23T00:32:29.534Z · comments (0)

Export Surplusses
lsusr · 2025-02-24T05:53:23.422Z · comments (21)

The present perfect tense is ruining your life
PatrickDFarley · 2025-01-27T16:14:48.843Z · comments (14)

Interesting ACX 2024 Book Review Entries
jenn (pixx) · 2025-04-20T18:10:04.973Z · comments (1)

How to mitigate sandbagging
Teun van der Weij (teun-van-der-weij) · 2025-03-23T17:19:07.452Z · comments (0)

Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)

[link] Forging A New AGI Social Contract
Deric Cheng (deric-cheng) · 2025-04-10T13:41:11.817Z · comments (3)

What is a circuit? [in interpretability]
Yudhister Kumar (randomwalks) · 2025-02-14T04:40:42.978Z · comments (1)

[link] Currency Collapse
prue (prue0) · 2025-04-11T03:48:01.469Z · comments (3)

[link] Notes on the Presidential Election of 1836
Arjun Panickssery (arjun-panickssery) · 2025-02-13T23:40:23.224Z · comments (0)

Review: The Lathe of Heaven
dr_s · 2025-01-31T08:10:58.673Z · comments (0)

Two flaws in the Machiavelli Benchmark
TheManxLoiner · 2025-02-12T19:34:35.241Z · comments (0)

A model of the final phase: the current frontier AIs as de facto CEOs of their own companies
Mitchell_Porter · 2025-03-08T22:15:35.260Z · comments (2)

[question] LessWrong merch?
Brendan Long (korin43) · 2025-04-03T21:51:47.190Z · answers+comments (2)

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability
DanielFilan · 2025-03-28T18:40:01.856Z · comments (0)

Prodromes and Biomarkers in Chronic Disease
sarahconstantin · 2025-04-16T21:30:02.978Z · comments (2)

The Last Light
Bridgett Kay (bridgett-kay) · 2025-04-14T15:41:02.745Z · comments (2)

A Bunch of Matryoshka SAEs
chanind · 2025-04-04T14:53:56.805Z · comments (0)

The Leapfrogging Terminus and the Fuzzy Cut
Jim Pivarski (jim-pivarski) · 2025-03-31T04:08:24.023Z · comments (6)

[link] AI Tools for Existential Security
Lizka · 2025-03-14T18:38:06.110Z · comments (4)

[link] Why People Commit White Collar Fraud (Ozy linkpost)
sapphire (deluks917) · 2025-03-03T19:33:15.609Z · comments (1)

[link] The Peeperi (unfinished) - By Katja Grace
Nathan Young · 2025-02-17T19:33:29.894Z · comments (0)

so you have a chronic health issue
agencypilled · 2025-01-26T19:00:29.972Z · comments (9)

Notes on handling non-concentrated failures with AI control: high level methods and different regimes
ryan_greenblatt · 2025-03-24T01:00:38.222Z · comments (3)

Doing principle-of-charity better
Sniffnoy · 2025-03-27T05:19:52.195Z · comments (1)

[question] Does the AI control agenda broadly rely on no FOOM being possible?
Noosphere89 (sharmake-farah) · 2025-03-29T19:38:23.971Z · answers+comments (3)

[question] Examples of self-fulfilling prophecies in AI alignment?
Chipmonk · 2025-03-03T02:45:51.619Z · answers+comments (6)

[question] Is weak-to-strong generalization an alignment technique?
cloud · 2025-01-31T07:13:03.332Z · answers+comments (1)

The Uses of Complacency
sarahconstantin · 2025-04-21T18:50:02.725Z · comments (1)

Seven sources of goals in LLM agents
Seth Herd · 2025-02-08T21:54:20.186Z · comments (3)

Opportunity Space: Renormalization for AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:55:52.155Z · comments (0)

Grok3 On Kant On AI Slavery
JenniferRM · 2025-04-01T04:10:48.093Z · comments (3)

Ruling Out Lookup Tables
Alfred Harwood · 2025-02-04T10:39:34.899Z · comments (11)

Understanding Trust: Overview Presentations
abramdemski · 2025-04-16T18:08:31.064Z · comments (0)

Introduction to Representing Sentences as Logical Statements
Towards_Keeperhood (Simon Skade) · 2025-04-05T20:35:31.422Z · comments (9)

[link] Published report: Pathways to short TAI timelines
Zershaaneh Qureshi (zershaaneh-qureshi) · 2025-02-20T22:10:12.276Z · comments (0)

Of course, moving a pass@400 capability to pass@1 isn't nothing, but it clearly falls astronomically short of the Singularity-enabling technique that RL-on-CoTs is touted as.
