LessWrong 2.0 Reader

Crime and Punishment #1
Zvi · 2025-04-21T15:30:06.420Z · comments (10)
Cautions about LLMs in Human Cognitive Loops
Alice Blair (Diatom) · 2025-03-02T19:53:10.253Z · comments (9)
AI #104: American State Capacity on the Brink
Zvi · 2025-02-20T14:50:06.375Z · comments (9)
Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format
Roland Pihlakas (roland-pihlakas) · 2025-03-16T23:23:30.989Z · comments (6)
LessOnline 2025: Early Bird Tickets On Sale
Ben Pace (Benito) · 2025-03-18T00:22:02.653Z · comments (4)
They Took MY Job?
Zvi · 2025-03-21T13:30:38.507Z · comments (4)
[link] Three Types of Intelligence Explosion
rosehadshar · 2025-03-17T14:47:46.696Z · comments (8)
[link] Existing Safety Frameworks Imply Unreasonable Confidence
Joe Rogero · 2025-04-10T16:31:50.240Z · comments (2)
We need (a lot) more rogue agent honeypots
Ozyrus · 2025-03-23T22:24:52.785Z · comments (12)
Boots theory and Sybil Ramkin
philh · 2025-03-18T22:10:08.855Z · comments (17)
On Writing #1
Zvi · 2025-03-04T13:30:06.103Z · comments (2)
AI #113: The o3 Era Begins
Zvi · 2025-04-24T13:40:06.043Z · comments (4)
Grok Grok
Zvi · 2025-02-24T14:20:08.877Z · comments (2)
ParaScopes: Do Language Models Plan the Upcoming Paragraph?
NickyP (Nicky) · 2025-02-21T16:50:20.745Z · comments (2)
The Rise of Hyperpalatability
Jack (jack-3) · 2025-04-02T20:18:04.407Z · comments (10)
Worries About AI Are Usually Complements Not Substitutes
Zvi · 2025-04-25T20:00:03.421Z · comments (3)
Announcing EXP: Experimental Summer Workshop on Collective Cognition
Jan_Kulveit · 2025-03-15T20:14:47.972Z · comments (2)
[Cross-post] Every Bay Area "Walled Compound"
davekasten · 2025-01-23T15:05:08.629Z · comments (3)
Extended analogy between humans, corporations, and AIs.
Daniel Kokotajlo (daniel-kokotajlo) · 2025-02-13T00:03:13.956Z · comments (2)
2024 was the year of the big battery, and what that means for solar power
transhumanist_atom_understander · 2025-02-01T06:27:39.082Z · comments (1)
[question] Will LLM agents become the first takeover-capable AGIs?
Seth Herd · 2025-03-02T17:15:37.056Z · answers+comments (10)
Any-Benefit Mindset and Any-Reason Reasoning
silentbob · 2025-03-15T17:10:14.682Z · comments (9)
Meditation and Reduced Sleep Need
niplav · 2025-04-04T14:42:54.792Z · comments (8)
Scaffolding Skills
Screwtape · 2025-04-18T17:39:25.634Z · comments (8)
[link] Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]
elifland · 2025-04-10T23:10:23.063Z · comments (0)
Call for Collaboration: Renormalization for AI safety 
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T21:01:56.500Z · comments (0)
Operator
Zvi · 2025-01-28T20:00:08.374Z · comments (1)
System 2 Alignment
Seth Herd · 2025-02-13T19:17:56.868Z · comments (0)
[link] SuperBabies podcast with Gene Smith
Eneasz · 2025-02-19T19:36:49.852Z · comments (1)
[link] Introducing MASK: A Benchmark for Measuring Honesty in AI Systems
Richard Ren (RichardR) · 2025-03-05T22:56:46.155Z · comments (5)
Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens
Florian_Dietz · 2025-03-10T16:07:45.215Z · comments (3)
[link] AI Can't Write Good Fiction
JustisMills · 2025-03-12T06:11:57.786Z · comments (19)
[link] Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale (govind-pimpale) · 2025-02-24T16:51:32.022Z · comments (0)
Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)
[link] Well-foundedness as an organizing principle of healthy minds and societies
Richard_Ngo (ricraz) · 2025-04-07T00:31:34.098Z · comments (7)
ARENA 5.0 - Call for Applicants
JamesH (AtlasOfCharts) · 2025-01-30T13:18:27.052Z · comments (2)
AI #106: Not so Fast
Zvi · 2025-03-06T15:40:05.919Z · comments (4)
Why Are The Human Sciences Hard? Two New Hypotheses
Aydin Mohseni (aydin-mohseni) · 2025-03-18T15:45:52.239Z · comments (14)
[link] Hunting for AI Hackers: LLM Agent Honeypot
Reworr R (reworr-reworr) · 2025-02-12T20:29:32.269Z · comments (0)
Austin Chen on Winning, Risk-Taking, and FTX
Elizabeth (pktechgirl) · 2025-04-07T19:00:08.039Z · comments (3)
Writing experiments and the banana escape valve
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-23T13:11:24.215Z · comments (1)
Avoid the Counterargument Collapse
marknm · 2025-03-26T03:19:58.655Z · comments (3)
Everything I Know About Semantics I Learned From Music Notation
J Bostock (Jemist) · 2025-03-09T18:09:11.789Z · comments (2)
Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?
Alex Mallen (alex-mallen) · 2025-03-24T17:55:59.358Z · comments (0)
More Fun With GPT-4o Image Generation
Zvi · 2025-04-03T02:10:02.317Z · comments (3)
Reasons-based choice and cluelessness
JesseClifton · 2025-02-07T22:21:47.232Z · comments (0)
FLAKE-Bench: Outsourcing Awkwardness in the Age of AI
annas (annasoli) · 2025-04-01T17:08:25.092Z · comments (0)
OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)
[link] Meta: Frontier AI Framework
Zach Stein-Perlman · 2025-02-03T22:00:17.103Z · comments (2)
DeepSeek: Lemon, It’s Wednesday
Zvi · 2025-01-29T15:00:07.914Z · comments (0)