LessWrong 2.0 Reader

The 2024 Petrov Day Scenario
Ben Pace (Benito) · 2024-09-26T08:08:32.495Z · comments (82)
How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (4)
AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (11)
[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (14)
Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (2)
[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (12)
[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (3)
[link] Stanislav Petrov Quarterly Performance Review
Ricki Heicklen (bayesshammai) · 2024-09-26T21:20:11.646Z · comments (2)
[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (13)
[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (1)
Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)
Alignment by default: the simulation hypothesis
gb (ghb) · 2024-09-25T16:26:00.552Z · comments (17)
[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (0)
[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)
Chevy Bolt Review
jefftk (jkaufman) · 2024-09-26T13:40:05.456Z · comments (2)
Source Control for Prototyping and Analysis
jefftk (jkaufman) · 2024-09-26T01:50:04.145Z · comments (0)
Self location for LLMs by LLMs: Self-Assessment Checklist.
weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · comments (0)
Gell-Mann checks
Cleo Scrolls (cleo-scrolls) · 2024-09-26T22:45:43.569Z · comments (0)
A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)
[link] Comparing Forecasting Track Records for AI Benchmarking and Beyond
ChristianWilliams · 2024-09-25T21:01:15.975Z · comments (0)
The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (0)
[link] Four Levels of Voting Methods
hive · 2024-09-26T18:15:00.565Z · comments (0)
[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)
[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (2)
AIS Hungary Operations Officer role, Deadline: 2024 October 6th
gergogaspar (gergo-gaspar) · 2024-09-25T13:54:25.077Z · comments (0)
Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility
OwenChen · 2024-09-25T20:38:22.928Z · comments (0)
[link] Climate Change And Global Warming
Zero Contradictions · 2024-09-25T19:13:09.508Z · comments (0)
[question] Non-human centric view of existence
ZY (AliceZ) · 2024-09-25T05:47:07.480Z · answers+comments (12)
[link] How to Live Well: My Philosophy of Life
Philosofer123 · 2024-09-25T01:13:37.952Z · comments (0)