LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (25)

[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)

Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (19)

Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (5)

[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (52)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (9)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)

[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (26)

Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (71)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (17)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (62)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (10)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (15)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (7)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (78)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (13)

[link] Video lectures on the learning-theoretic agenda
Vanessa Kosoy (vanessa-kosoy) · 2024-10-27T12:01:32.777Z · comments (0)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

[link] Cost, Not Sacrifice
Joe Rogero · 2024-11-20T21:32:26.281Z · comments (13)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (11)

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (19)

A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (12)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (16)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (9)

The 2023 LessWrong Review: The Basic Ask
Raemon · 2024-12-04T19:52:40.435Z · comments (20)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (3)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

[link] "Map of AI Futures" - An interactive flowchart
swante · 2024-11-27T21:31:40.269Z · comments (3)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (36)

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

← previous page (newer posts) · next page (older posts) →

^{^}

ig you actually wrote 'they dont notice flaws', which is ambiguously between 'they approve' and 'they don't find affirmative failure cases'. and maybe the latter was your intent all along.

it's understandable because we do have to refer to humans to call something unintuitive.

LessWrong 2.0 Reader

Archive

Recent comments