LessWrong 2.0 Reader


Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)
5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)
Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)
[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)
If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (129)
A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)
Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)
JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)
[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)
Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)
Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)
$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?
johnswentworth · 2025-04-21T20:19:30.808Z · comments (12)
OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (25)
OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)
Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (40)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
We should try to automate AI safety work asap
Marius Hobbhahn (marius-hobbhahn) · 2025-04-26T16:35:43.770Z · comments (9)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (58)
Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)
[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (60)
Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)
[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)
AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)
[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (7)
AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (20)
Effective Evil's AI Misalignment Plan
lsusr · 2024-12-15T07:39:34.046Z · comments (9)
[link] SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can (Can Rager) · 2024-12-11T06:30:37.076Z · comments (6)
[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (54)
AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)
[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
o3 Is a Lying Liar
Zvi · 2025-04-23T20:00:05.429Z · comments (19)
Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (5)
3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)
On the OpenAI Economic Blueprint
Zvi · 2025-01-15T14:30:06.773Z · comments (2)
[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)
Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)
Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)
[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)
Microplastics: Much Less Than You Wanted To Know
jenn (pixx) · 2025-02-15T19:08:14.561Z · comments (8)
You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)
Open problems in emergent misalignment
Jan Betley (jan-betley) · 2025-03-01T09:47:58.889Z · comments (13)
I'm offering free math consultations!
Gurkenglas · 2025-01-14T16:30:40.115Z · comments (7)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (29)
No one has the ball on 1500 Russian olympiad winners who've received HPMOR
Mikhail Samin (mikhail-samin) · 2025-01-12T11:43:36.560Z · comments (21)
[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (12)
What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (23)