LessWrong 2.0 Reader

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (22)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (7)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (15)

[link] Understanding Shapley Values with Venn Diagrams
agucova · 2024-12-06T21:56:43.960Z · comments (8)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (14)

The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)

Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (19)

[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)

A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (12)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (7)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

Correct my H5N1 research ($reward)
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (15)

[link] SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can (Can Rager) · 2024-12-11T06:30:37.076Z · comments (0)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (2)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

Causal Undertow: A Work of Seed Fiction
Daniel Murfet (dmurfet) · 2024-12-08T21:41:48.132Z · comments (0)

The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (0)

Litigate-for-Impact: Preparing Legal Action against an AGI Frontier Lab Leader
Sonia Joseph (redhat) · 2024-12-07T21:42:29.038Z · comments (7)

[link] A car journey with conservative evangelicals - Understanding some British political-religious beliefs
Nathan Young · 2024-12-06T11:22:45.563Z · comments (7)

Childhood and Education Roundup #7
Zvi · 2024-12-09T13:10:05.588Z · comments (10)

[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (0)

Algebraic Linguistics
abstractapplic · 2024-12-07T19:18:39.935Z · comments (27)

Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)

Mask and Respirator Intelligibility Comparison
jefftk (jkaufman) · 2024-12-07T03:20:01.585Z · comments (5)

Alternatives to Masks for Infectious Aerosols
jefftk (jkaufman) · 2024-12-08T14:00:01.670Z · comments (9)

Second-Time Free
jefftk (jkaufman) · 2024-12-11T03:30:01.289Z · comments (4)

LessWrong audio: help us choose the new voice
PeterH · 2024-12-11T02:24:37.026Z · comments (0)

[link] Announcement: AI for Math Fund
sarahconstantin · 2024-12-05T18:33:13.556Z · comments (2)

Higher and lower pleasures
Chris_Leong · 2024-12-05T13:13:46.526Z · comments (3)

minifest
Austin Chen (austin-chen) · 2024-12-07T03:50:38.573Z · comments (1)

Why Isn't Tesla Level 3?
jefftk (jkaufman) · 2024-12-11T14:50:01.159Z · comments (5)

Most Minds are Irrational
Davidmanheim · 2024-12-10T09:36:33.144Z · comments (4)

Historical Net Worth
jefftk (jkaufman) · 2024-12-07T23:10:01.519Z · comments (0)

[link] Frontier AI systems have surpassed the self-replicating red line
aproteinengine · 2024-12-11T03:06:14.927Z · comments (4)

Low-effort review of "AI For Humanity"
Charlie Steiner · 2024-12-11T09:54:42.871Z · comments (0)

Computational functionalism probably can't explain phenomenal consciousness
EuanMcLean (euanmclean) · 2024-12-10T17:11:28.044Z · comments (13)

The first AGI may be a good engineer but bad strategist
Knight Lee (Max Lee) · 2024-12-09T06:34:54.082Z · comments (2)

EC2 Scripts
jefftk (jkaufman) · 2024-12-10T03:00:01.906Z · comments (1)

Post-Quantum Investing: Dump Crypto for Index Funds and Real Estate?
G (g-1) · 2024-12-11T11:59:11.062Z · comments (2)

Re Hanson's Grabby Aliens: Humanity is not a natural anthropic sample space
Lorec · 2024-12-09T18:07:23.510Z · comments (17)

Rethink Wellbeing’s Year 2 Update: Foster Sustainable High Performance for Ambitious Altruists
Inga G. (inga-g) · 2024-12-08T14:32:39.902Z · comments (1)

Escape Plan: Brain Preservation ("Cryonics" sort of), Digitization, Metaverse, Off-Planet Hardware, Backups
amelia (314159) · 2024-12-11T18:05:52.453Z · comments (0)

Purging Corrupted Capabilities across Language Models
Amirali Abdullah (amirali-abdullah) · 2024-12-06T22:56:33.519Z · comments (0)

A good way to build many air filters on the cheap
winstonBosan · 2024-12-08T01:47:58.236Z · comments (5)

[link] o1 tried to avoid being shut down
Raelifin · 2024-12-05T19:52:03.620Z · comments (5)

I guess you actually wrote 'they don't notice flaws', which is ambiguous between 'they approve' and 'they don't find affirmative failure cases'.

It's understandable, because we do have to refer to humans to call something unintuitive.
