LessWrong 2.0 Reader


How much progress actually happens in theoretical physics?
ChristianKl · 2025-04-04T23:08:00.633Z · comments (32)
Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource
Bryce Robertson (bryceerobertson) · 2025-03-04T17:01:18.758Z · comments (2)
What Uniparental Disomy Tells Us About Improper Imprinting in Humans
Morpheus · 2025-03-28T11:24:47.133Z · comments (1)
Goodhart Typology via Structure, Function, and Randomness Distributions
JustinShovelain · 2025-03-25T16:01:08.327Z · comments (0)
Most Questionable Details in 'AI 2027'
scarcegreengrass · 2025-04-05T00:32:54.896Z · comments (4)
The Upcoming PEPFAR Cut Will Kill Millions, Many of Them Children
omnizoid · 2025-01-27T16:03:51.214Z · comments (2)
Field tests of semi-rationality in Brazilian military training
P. João (gabriel-brito) · 2025-03-12T16:14:12.590Z · comments (0)
Llama Does Not Look Good 4 Anything
Zvi · 2025-04-09T13:20:01.799Z · comments (1)
Non-Monotonic Infra-Bayesian Physicalism
Marcus Ogren · 2025-04-02T12:14:19.783Z · comments (0)
[link] What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:42:07.215Z · comments (6)
An overview of areas of control work
ryan_greenblatt · 2025-03-25T22:02:16.178Z · comments (0)
When the Wannabe Rambo Comedian Cried
P. João (gabriel-brito) · 2025-03-31T14:47:50.660Z · comments (0)
AI #105: Hey There Alexa
Zvi · 2025-02-27T14:30:08.038Z · comments (3)
Knocking Down My AI Optimist Strawman
tailcalled · 2025-02-08T10:52:33.183Z · comments (3)
Chicanery: No
Screwtape · 2025-02-06T05:42:45.095Z · comments (10)
Monthly Roundup #28: March 2025
Zvi · 2025-03-17T12:50:03.097Z · comments (8)
On the Implications of Recent Results on Latent Reasoning in LLMs
Rauno Arike (rauno-arike) · 2025-03-31T11:06:23.939Z · comments (6)
[link] How prediction markets can create harmful outcomes: a case study
B Jacobs (Bob Jacobs) · 2025-04-02T15:37:09.285Z · comments (2)
Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (8)
[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (6)
Who wants to bet me $25k at 1:7 odds that there won't be an AI market crash in the next year?
Remmelt (remmelt-ellen) · 2025-04-08T08:31:59.900Z · comments (15)
How Close We Are to a Complete List of Imprinted Genes
Morpheus · 2025-04-19T18:37:57.074Z · comments (1)
Meetups Notes (Q1 2025)
jenn (pixx) · 2025-03-31T01:12:11.774Z · comments (2)
Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (3)
AI #103: Show Me the Money
Zvi · 2025-02-13T15:20:07.057Z · comments (9)
Prospects for Alignment Automation: Interpretability Case Study
Jacob Pfau (jacob-pfau) · 2025-03-21T14:05:51.528Z · comments (4)
Why you maybe should lift weights, and How to.
samusasuke · 2025-02-12T05:15:32.011Z · comments (29)
Nonpartisan AI safety
Yair Halberstadt (yair-halberstadt) · 2025-02-10T14:55:50.913Z · comments (4)
[link] A High Level Closed-Door Session Discussing DeepSeek: Vision Trumps Technology
Cosmia_Nebula · 2025-01-30T09:53:16.152Z · comments (1)
[link] Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis (adam-scherlis) · 2025-03-01T02:11:56.313Z · comments (10)
EIS XV: A New Proof of Concept for Useful Interpretability
scasper · 2025-03-17T20:05:30.580Z · comments (2)
Takeaways From Our Recent Work on SAE Probing
Josh Engels (JoshEngels) · 2025-03-03T19:50:16.692Z · comments (0)
[link] Anthropic CEO calls for RSI
Andrea_Miotti (AndreaM) · 2025-01-29T16:54:24.943Z · comments (10)
Deep sparse autoencoders yield interpretable features too
Armaan A. Abraham (armaanabraham) · 2025-02-23T05:46:59.189Z · comments (8)
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak (tomek-korbak) · 2025-04-14T16:45:46.584Z · comments (1)
[Linkpost] Visual roadmap to strong human germline engineering
TsviBT · 2025-04-05T22:22:57.744Z · comments (0)
Notes on Occam via Solomonoff vs. hierarchical Bayes
JesseClifton · 2025-02-10T17:55:14.689Z · comments (7)
Agents don't have to be aligned to help us achieve an indefinite pause.
Hastings (hastings-greer) · 2025-01-25T18:51:03.523Z · comments (0)
[link] Altman blog on post-AGI world
Julian Bradshaw · 2025-02-09T21:52:30.631Z · comments (10)
Towards building blocks of ontologies
Daniel C (harper-owen) · 2025-02-08T16:03:29.854Z · comments (0)
Validating against a misalignment detector is very different to training against one
mattmacdermott · 2025-03-04T15:41:04.692Z · comments (4)
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng (ccstan99) · 2025-02-07T03:57:30.904Z · comments (0)
Selection Pressures on LM Personas
Raymond D · 2025-03-28T20:33:09.918Z · comments (0)
Deference and Decision-Making
ben_levinstein (benlev) · 2025-01-27T22:02:17.578Z · comments (2)
MONA: Three Months Later - Updates and Steganography Without Optimization Pressure
David Lindner · 2025-04-12T23:15:07.964Z · comments (0)
AI #112: Release the Everything
Zvi · 2025-04-17T15:10:02.029Z · comments (6)
[link] Reasoning models don't always say what they think
Joe Benton · 2025-04-09T19:48:58.733Z · comments (4)
[link] Takeaways from sketching a control safety case
joshc (joshua-clymer) · 2025-01-31T04:43:45.917Z · comments (0)
[link] Smelling Nice is Good, Actually
Gordon Seidoh Worley (gworley) · 2025-03-18T16:54:43.324Z · comments (8)
How much does it cost to back up solar with batteries?
jasoncrawford · 2025-03-25T16:35:52.834Z · comments (6)