LessWrong 2.0 Reader


Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)
What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (9)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (22)
o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (142)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (41)
What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (10)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (0)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)
Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (16)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (4)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (44)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (2)
🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)
When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (12)
Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)
[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (16)
AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)
Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)
Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)
[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (4)
A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (19)
[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)
Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)
Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)
Trying to translate when people talk past each other
Kaj_Sotala · 2024-12-17T09:40:02.640Z · comments (12)
A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)
[link] Announcing the Q1 2025 Long-Term Future Fund grant round
Linch · 2024-12-20T02:20:22.448Z · comments (0)
[link] Moderately Skeptical of "Risks of Mirror Biology"
Davidmanheim · 2024-12-20T12:57:31.824Z · comments (3)
Vegans need to eat just enough Meat - empirically evaluate the minimum amount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (21)
[link] What I expected from this site: A LessWrong review
Nathan Young · 2024-12-20T11:27:39.683Z · comments (5)
[link] A progress policy agenda
jasoncrawford · 2024-12-19T18:42:37.327Z · comments (1)
You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)
Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)
Elevating Air Purifiers
jefftk (jkaufman) · 2024-12-17T01:40:05.401Z · comments (0)
[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)
Good Reasons for Alts
jefftk (jkaufman) · 2024-12-21T01:30:03.113Z · comments (2)
AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)
The Second Gemini
Zvi · 2024-12-17T15:50:06.373Z · comments (0)
Non-Obvious Benefits of Insurance
jefftk (jkaufman) · 2024-12-23T03:40:02.184Z · comments (3)
Elon Musk and Solar Futurism
transhumanist_atom_understander · 2024-12-21T02:55:28.554Z · comments (27)
[link] Forecast 2025 With Vox's Future Perfect Team — $2,500 Prize Pool
ChristianWilliams · 2024-12-20T23:00:35.334Z · comments (0)
subfunctional overlaps in attentional selection history imply momentum for decision-trajectories
Emrik (Emrik North) · 2024-12-22T14:12:49.027Z · comments (1)
[link] Can o1-preview find major mistakes amongst 59 NeurIPS '24 MLSB papers?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-18T14:21:03.661Z · comments (0)
Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (0)
Proof Explained for "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-22T15:06:16.880Z · comments (0)
AGI with RL is Bad News for Safety
Nadav Brandes (nadav-brandes) · 2024-12-21T19:36:03.970Z · comments (16)