LessWrong 2.0 Reader

On Emergent Misalignment
Zvi · 2025-02-28T13:10:05.973Z · comments (1)
Weirdness Points
lsusr · 2025-02-28T02:23:56.508Z · comments (1)
[link] OpenAI releases GPT-4.5
Seth Herd · 2025-02-27T21:40:45.010Z · comments (7)
[link] How to Corner Liars: A Miasma-Clearing Protocol
ymeskhout · 2025-02-27T17:18:36.028Z · comments (9)
AI #105: Hey There Alexa
Zvi · 2025-02-27T14:30:08.038Z · comments (1)
Space-Faring Civilization density estimates and models - Review
Maxime Riché (maxime-riche) · 2025-02-27T11:44:21.101Z · comments (0)
January-February 2025 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2025-02-28T03:10:01.909Z · comments (1)
The Elicitation Game: Evaluating capability elicitation techniques
Teun van der Weij (teun-van-der-weij) · 2025-02-27T20:33:24.861Z · comments (0)
Kingfisher Tour February 2025
jefftk (jkaufman) · 2025-02-27T02:20:04.988Z · comments (0)
Dance Weekend Pay II
jefftk (jkaufman) · 2025-02-28T15:10:02.030Z · comments (0)
Cycles (a short story by Claude 3.7 and me)
Knight Lee (Max Lee) · 2025-02-28T07:04:46.602Z · comments (0)
[link] Do safety-relevant LLM steering vectors optimized on a single example generalize?
Jacob Dunefsky (jacob-dunefsky) · 2025-02-28T12:01:12.514Z · comments (0)
Universal AI Maximizes Variational Empowerment: New Insights into AGI Safety
Yusuke Hayashi (hayashiyus) · 2025-02-27T00:46:46.989Z · comments (0)
You should use Consumer Reports
KvmanThinking (avery-liu) · 2025-02-27T01:52:17.235Z · comments (3)
[link] An Open Letter To EA and AI Safety On Decelerating AI Development
kenneth_diao · 2025-02-28T17:21:42.826Z · comments (0)
Existentialists and Trolleys
David Gross (David_Gross) · 2025-02-28T14:01:49.509Z · comments (0)
[link] Market Capitalization is Semantically Invalid
Zero Contradictions · 2025-02-27T11:27:47.765Z · comments (9)
For the Sake of Pleasure Alone
Greenless Mirror (mikhail-2) · 2025-02-27T20:07:54.852Z · comments (6)
[New Jersey] HPMOR 10 Year Anniversary Party 🎉
🟠UnlimitedOranges🟠 (mr-mar) · 2025-02-27T22:30:26.009Z · comments (0)
Short & long term tradeoffs of strategic voting
kaleb (geomaturge) · 2025-02-27T04:25:04.304Z · comments (0)
[link] Recursive alignment with the principle of alignment
hive · 2025-02-27T02:34:37.940Z · comments (0)
Economic Topology, ASI, and the Separation Equilibrium
mkualquiera · 2025-02-27T16:36:48.098Z · comments (11)
Notes on Superwisdom & Moral RSI
welfvh · 2025-02-28T10:34:54.767Z · comments (2)
Exploring unfaithful/deceptive CoT in reasoning models
Lucy Wingard (lucy-wingard) · 2025-02-28T02:54:43.481Z · comments (0)
[link] Tetherware #2: What every human should know about our most likely AI future
Jáchym Fibír · 2025-02-28T11:12:59.033Z · comments (0)
The Illusion of Iterative Improvement: Why AI (and Humans) Fail to Track Their Own Epistemic Drift
Andy E Williams (andy-e-williams) · 2025-02-27T16:26:52.718Z · comments (2)
[link] Keeping AI Subordinate to Human Thought: A Proposal for Public AI Conversations
syh · 2025-02-27T20:00:26.150Z · comments (0)
[link] Do clients need years of therapy, or can one conversation resolve the issue?
Chipmonk · 2025-02-28T00:06:29.276Z · comments (7)
Proposing Human Survival Strategy based on the NAIA Vision: Toward the Co-evolution of Diverse Intelligences
Hiroshi Yamakawa (hiroshi-yamakawa) · 2025-02-27T05:18:05.369Z · comments (0)
AEPF_OpenSource is Live – A New Open Standard for Ethical AI
ethoshift · 2025-02-27T20:40:18.997Z · comments (0)