LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (14)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (10)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (49)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (21)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)

You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)

DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)

Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (31)

Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)

The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)

The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (19)

MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (1)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (58)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (12)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)

2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (42)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (11)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)

(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

← previous page (newer posts) · next page (older posts) →

^{^}

ig you actually wrote 'they dont notice flaws', which is ambiguously between 'they approve' and 'they don't find affirmative failure cases'. and maybe the latter was your intent all along.

it's understandable because we do have to refer to humans to call something unintuitive.

LessWrong 2.0 Reader

Archive

Recent comments