LessWrong 2.0 Reader

AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (2)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (0)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (34)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (17)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (16)
[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (25)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (3)
Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (26)
On polytopes
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-25T13:56:35.681Z · comments (0)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (4)
Writing experiments and the banana escape valve
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-23T13:11:24.215Z · comments (1)
Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (2)
[Cross-post] Every Bay Area "Walled Compound"
davekasten · 2025-01-23T15:05:08.629Z · comments (3)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (0)
[link] Counterintuitive effects of minimum prices
dynomight · 2025-01-24T23:05:26.099Z · comments (0)
Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (2)
[link] You Have Two Brains
Eneasz · 2025-01-23T00:52:43.063Z · comments (5)
Agents don't have to be aligned to help us achieve an indefinite pause.
Hastings (hastings-greer) · 2025-01-25T18:51:03.523Z · comments (0)
Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)
[link] Insights from "The Manga Guide to Physiology"
TurnTrout · 2025-01-24T05:18:57.772Z · comments (2)
A hierarchy of disagreement
Adam Zerner (adamzerner) · 2025-01-23T03:17:59.051Z · comments (4)
[question] How useful would alien alignment research be?
Donald Hobson (donald-hobson) · 2025-01-23T10:59:22.330Z · answers+comments (5)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (1)
QFT and neural nets: the basic idea
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-24T13:54:45.099Z · comments (0)
[question] Do you consider perfect surveillance inevitable?
samuelshadrach (xpostah) · 2025-01-24T04:57:48.266Z · answers+comments (24)
[link] Is there such a thing as an impossible protein?
Abhishaike Mahajan (abhishaike-mahajan) · 2025-01-24T17:12:01.174Z · comments (3)
Monet: Mixture of Monosemantic Experts for Transformers Explained
CalebMaresca (caleb-maresca) · 2025-01-25T19:37:09.078Z · comments (1)
Why I'm Pouring Cold Water in My Left Ear, and You Should Too
Maloew (maloew-valenar) · 2025-01-24T23:13:52.340Z · comments (0)
Contra Dances Getting Shorter and Earlier
jefftk (jkaufman) · 2025-01-23T23:30:03.595Z · comments (0)
[link] Uncontrollable: A Surprisingly Good Introduction to AI Risk
PeterMcCluskey · 2025-01-24T04:30:37.499Z · comments (0)
AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI
DanielFilan · 2025-01-24T23:00:07.562Z · comments (0)
Liron Shapira vs Ken Stanley on Doom Debates. A review
TheManxLoiner · 2025-01-24T18:01:56.646Z · comments (0)
[link] What are the differences between AGI, transformative AI, and superintelligence?
Vishakha (vishakha-agrawal) · 2025-01-23T10:03:31.886Z · comments (3)
[link] AISN #46: The Transition
Corin Katzke (corin-katzke) · 2025-01-23T18:09:36.858Z · comments (0)
In the future, language models will be our interface to the world
Daniel Tan (dtch1997) · 2025-01-24T23:16:49.999Z · comments (0)
[question] A Floating Cube - Rejected HLE submission
Shankar Sivarajan (shankar-sivarajan) · 2025-01-25T04:52:22.194Z · answers+comments (0)
Brainrot
Jesse Hoogland (jhoogland) · 2025-01-26T05:35:35.396Z · comments (0)
What does success look like?
Raymond D · 2025-01-23T17:48:35.618Z · comments (0)
[link] Notes on Argentina
Annapurna (jorge-velez) · 2025-01-26T03:51:15.393Z · comments (0)
Empirical Insights into Feature Geometry in Sparse Autoencoders
Jason Boxi Zhang (jason-boxi-zhang) · 2025-01-24T19:02:19.167Z · comments (0)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
[question] Recommendations for Recent Posts/Sequences on Instrumental Rationality?
Benjamin Hendricks (benjamin-hendricks) · 2025-01-26T00:41:08.577Z · answers+comments (0)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
Starting Thoughts on RLHF
Michael Flood (michael-flood) · 2025-01-23T22:16:49.793Z · comments (0)
Updating and Editing Factual Knowledge in Language Models
Dhananjay Ashok (dhananjay-ashok) · 2025-01-23T19:34:37.121Z · comments (2)
Locating and Editing Knowledge in LMs
Dhananjay Ashok (dhananjay-ashok) · 2025-01-24T22:53:40.559Z · comments (0)
[link] Ideas for CoT Models: A Geometric Perspective on Latent Space Reasoning
Rohan Ganapavarapu (rohan-ganapavarapu) · 2025-01-24T19:01:47.339Z · comments (0)
[question] AI Safety in secret
Michael Flood (michael-flood) · 2025-01-25T18:16:03.181Z · answers+comments (0)