LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

[link] New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters
Jesse Hoogland (jhoogland) · 2024-11-27T22:06:12.914Z · comments (4)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (8)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

Brief analysis of OP Technical AI Safety Funding
22tom (thomas-barnes) · 2024-10-25T19:37:41.674Z · comments (5)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (13)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (15)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

← previous page (newer posts) · next page (older posts) →

^{^}

ig you actually wrote 'they dont notice flaws', which is ambiguously between 'they approve' and 'they don't find affirmative failure cases'. and maybe the latter was your intent all along.

it's understandable because we do have to refer to humans to call something unintuitive.

LessWrong 2.0 Reader

Archive

Recent comments