LessWrong 2.0 Reader

Against Almost Every Theory of Impact of Interpretability
Charbel-Raphaël (charbel-raphael-segerie) · 2023-08-17T18:44:41.099Z · comments (83)
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research
evhub · 2023-08-08T01:30:10.847Z · comments (26)
Dear Self; we need to talk about ambition
Elizabeth (pktechgirl) · 2023-08-27T23:10:04.720Z · comments (25)
My current LK99 questions
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-08-01T22:48:00.733Z · comments (38)
[link] Large Language Models will be Great for Censorship
Ethan Edwards · 2023-08-21T19:03:55.323Z · comments (14)
[link] OpenAI API base models are not sycophantic, at any size
nostalgebraist · 2023-08-29T00:58:29.007Z · comments (19)
Feedbackloop-first Rationality
Raemon · 2023-08-07T17:58:56.349Z · comments (65)
A list of core AI safety problems and how I hope to solve them
davidad · 2023-08-26T15:12:18.484Z · comments (26)
[link] ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Beth Barnes (beth-barnes) · 2023-08-01T18:30:57.068Z · comments (12)
The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate
Adam David Long (adam-david-long-1) · 2023-08-01T00:08:30.908Z · comments (30)
6 non-obvious mental health issues specific to AI safety
Igor Ivanov (igor-ivanov) · 2023-08-18T15:46:09.938Z · comments (24)
Password-locked models: a stress case for capabilities evaluation
Fabien Roger (Fabien) · 2023-08-03T14:53:12.459Z · comments (14)
Responses to apparent rationalist confusions about game / decision theory
Anthony DiGiovanni (antimonyanthony) · 2023-08-30T22:02:12.218Z · comments (14)
Inflection.ai is a major AGI lab
nikola (nikolaisalreadytaken) · 2023-08-09T01:05:54.604Z · comments (13)
The U.S. is becoming less stable
lc · 2023-08-18T21:13:11.909Z · comments (66)
[link] Ten Thousand Years of Solitude
agp (antonio-papa) · 2023-08-15T17:45:34.556Z · comments (17)
Book Launch: "The Carving of Reality," Best of LessWrong vol. III
Raemon · 2023-08-16T23:52:12.518Z · comments (22)
Invulnerable Incomplete Preferences: A Formal Statement
Sami Petersen (sami-petersen) · 2023-08-30T21:59:36.186Z · comments (32)
[link] Report on Frontier Model Training
YafahEdelman (yafah-edelman-1) · 2023-08-30T20:02:46.317Z · comments (21)
[link] Introducing the Center for AI Policy (& we're hiring!)
Thomas Larsen (thomas-larsen) · 2023-08-28T21:17:11.703Z · comments (50)
[link] When discussing AI risks, talk about capabilities, not intelligence
Vika · 2023-08-11T13:38:48.844Z · comments (7)
Assume Bad Faith
Zack_M_Davis · 2023-08-25T17:36:32.678Z · comments (52)
Summary of and Thoughts on the Hotz/Yudkowsky Debate
Zvi · 2023-08-16T16:50:02.808Z · comments (47)
Biosecurity Culture, Computer Security Culture
jefftk (jkaufman) · 2023-08-30T16:40:03.101Z · comments (10)
A Theory of Laughter
Steven Byrnes (steve2152) · 2023-08-23T15:05:59.694Z · comments (13)
What's A "Market"?
johnswentworth · 2023-08-08T23:29:24.722Z · comments (16)
[link] Biological Anchors: The Trick that Might or Might Not Work
Scott Alexander (Yvain) · 2023-08-12T00:53:30.159Z · comments (3)
[link] LTFF and EAIF are unusually funding-constrained right now
Linch · 2023-08-30T01:03:30.321Z · comments (24)
Problems with Robin Hanson's Quillette Article On AI
DaemonicSigil · 2023-08-06T22:13:43.654Z · comments (33)
We Should Prepare for a Larger Representation of Academia in AI Safety
Leon Lang (leon-lang) · 2023-08-13T18:03:19.799Z · comments (13)
[question] Exercise: Solve "Thinking Physics"
Raemon · 2023-08-01T00:44:48.975Z · answers+comments (23)
Dating Roundup #1: This is Why You’re Single
Zvi · 2023-08-29T12:50:04.964Z · comments (27)
My checklist for publishing a blog post
Steven Byrnes (steve2152) · 2023-08-15T15:04:56.219Z · comments (6)
Decomposing independent generalizations in neural networks via Hessian analysis
Dmitry Vaintrob (dmitry-vaintrob) · 2023-08-14T17:04:40.071Z · comments (3)
Stepping down as moderator on LW
Kaj_Sotala · 2023-08-14T10:46:58.163Z · comments (1)
Long-Term Future Fund: April 2023 grant recommendations
abergal · 2023-08-02T07:54:49.083Z · comments (3)
AI pause/governance advocacy might be net-negative, especially without focus on explaining the x-risk
Mikhail Samin (mikhail-samin) · 2023-08-27T23:05:01.718Z · comments (9)
The Low-Hanging Fruit Prior and sloped valleys in the loss landscape
Dmitry Vaintrob (dmitry-vaintrob) · 2023-08-23T21:12:58.599Z · comments (1)
The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)
moyamo · 2023-08-29T18:28:54.015Z · comments (70)
The God of Humanity, and the God of the Robot Utilitarians
Raemon · 2023-08-24T08:27:57.396Z · comments (12)
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces
Georg Lange (GeorgLange) · 2023-08-29T01:04:18.688Z · comments (4)
Computational Thread Art
CallumMcDougall (TheMcDouglas) · 2023-08-06T21:42:30.306Z · comments (2)
Digital brains beat biological ones because diffusion is too slow
GeneSmith · 2023-08-26T02:22:25.014Z · comments (21)
A plea for more funding shortfall transparency
porby · 2023-08-07T21:33:11.912Z · comments (4)
[link] A Proof of Löb's Theorem using Computability Theory
jessicata (jessica.liu.taylor) · 2023-08-16T18:57:41.048Z · comments (0)
3 levels of threat obfuscation
HoldenKarnofsky · 2023-08-02T14:58:32.506Z · comments (14)
[link] Barriers to Mechanistic Interpretability for AGI Safety
Connor Leahy (NPCollapse) · 2023-08-29T10:56:45.639Z · comments (13)
Modulating sycophancy in an RLHF model via activation steering
Nina Rimsky (NinaR) · 2023-08-09T07:06:50.859Z · comments (20)
Managing risks of our own work
Beth Barnes (beth-barnes) · 2023-08-18T00:41:30.832Z · comments (0)
State of Generally Available Self-Driving
jefftk (jkaufman) · 2023-08-22T18:50:01.166Z · comments (6)