LessWrong 2.0 Reader

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (0)
Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (3)
Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)
AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (4)
[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)
In the Name of All That Needs Saving
pleiotroth · 2024-11-07T15:26:12.252Z · comments (2)
The Logistics of Distribution of Meaning
Sahil · 2024-11-07T05:27:20.276Z · comments (0)
Fundamental Uncertainty: Chapter 9 - How do we live with uncertainty?
Gordon Seidoh Worley (gworley) · 2024-11-07T18:15:45.049Z · comments (2)
Curriculum of Ascension
andrew sauer (andrew-sauer) · 2024-11-07T23:54:18.983Z · comments (0)
[question] What are the primary drivers that caused selection pressure for intelligence in humans?
Towards_Keeperhood (Simon Skade) · 2024-11-07T09:40:20.275Z · answers+comments (3)
Agency overhang as a proxy for Sharp left turn
Eris (anton-zheltoukhov) · 2024-11-07T12:14:24.333Z · comments (0)
Quantum Immortality: A Perspective if AI Doomers are Probably Right
avturchin · 2024-11-07T16:06:08.106Z · comments (17)
[link] Markets Are Information - Beating the Sportsbooks at Their Own Game
JJXW · 2024-11-07T20:58:43.389Z · comments (0)
[link] The Case Against Moral Realism
Zero Contradictions · 2024-11-07T10:14:26.269Z · comments (4)