LessWrong 2.0 Reader

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)
Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (29)
[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)
Some lessons from the OpenAI-FrontierMath debacle
7vik (satvik-golechha) · 2025-01-19T21:09:17.990Z · comments (9)
[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)
[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)
[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)
[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)
Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)
Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)
[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)
AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)
Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)
[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)
[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)
RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)
What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)
Alignment can be the ‘clean energy’ of AI
Cameron Berg (cameron-berg) · 2025-02-22T00:08:30.391Z · comments (7)
A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)
[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)
[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)
Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)
Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)
[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)
"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)
Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)
[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)
A Problem to Solve Before Building a Deception Detector
Eleni Angelou (ea-1) · 2025-02-07T19:35:23.307Z · comments (8)
MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)
ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)
What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)
[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)
[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)
Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)
A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)
There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)
Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)
Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)
A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)
Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)
Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)
[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)
[link] Gary Marcus now saying AI can't do things it can already do
Benjamin_Todd · 2025-02-09T12:24:11.954Z · comments (12)
[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)
AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)
[link] How do we solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:27:27.712Z · comments (8)
5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)
[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)
[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)
Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)