LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

[link] Contra Acemoglu on AI
Maxwell Tabarrok (maxwell-tabarrok) · 2024-06-28T13:13:15.796Z · comments (0)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (13)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

← previous page (newer posts) · next page (older posts) →

^{^}

"Curated", a term which here means "This just got emailed to 30,000 people, of whom typically half open the email, and it gets shown at the top of the frontpage to anyone who hasn't read it for ~1 week."

LessWrong 2.0 Reader

Archive

Recent comments