LessWrong 2.0 Reader

I changed my mind about orca intelligence
Towards_Keeperhood (Simon Skade) · 2025-03-18T10:15:29.860Z · comments (24)
Equations Mean Things
abstractapplic · 2025-03-19T08:16:35.312Z · comments (10)
On (Not) Feeling the AGI
Zvi · 2025-03-25T14:30:02.215Z · comments (25)
A collection of approaches to confronting doom, and my thoughts on them
Ruby · 2025-04-06T02:11:31.271Z · comments (9)
Metacognition Broke My Nail-Biting Habit
Rafka · 2025-03-16T12:36:47.437Z · comments (20)
Interpreting Complexity
Maxwell Adam (intern) · 2025-03-14T04:52:32.103Z · comments (7)
[link] Intelsat as a Model for International AGI Governance
rosehadshar · 2025-03-13T12:58:11.692Z · comments (0)
[question] Why do many people who care about AI Safety not clearly endorse PauseAI?
humnrdble · 2025-03-30T18:06:32.426Z · answers+comments (39)
We Have No Plan for Preventing Loss of Control in Open Models
Andrew Dickson · 2025-03-10T15:35:12.597Z · comments (11)
Silly Time
jefftk (jkaufman) · 2025-03-21T12:30:08.560Z · comments (2)
Alignment faking CTFs: Apply to my MATS stream
joshc (joshua-clymer) · 2025-04-04T16:29:02.070Z · comments (0)
Tabula Bio: towards a future free of disease (& looking for collaborators)
mpoon (michael-poon) · 2025-03-23T16:30:15.523Z · comments (14)
AI #108: Straight Line on a Graph
Zvi · 2025-03-20T13:50:00.983Z · comments (5)
Notes on countermeasures for exploration hacking (aka sandbagging)
ryan_greenblatt · 2025-03-24T18:39:36.665Z · comments (4)
An Advent of Thought
Kaarel (kh) · 2025-03-17T14:21:08.765Z · comments (8)
[link] Automated Researchers Can Subtly Sandbag
gasteigerjo · 2025-03-26T19:13:26.879Z · comments (0)
Follow me on TikTok
lsusr · 2025-04-01T08:22:29.521Z · comments (8)
AI #109: Google Fails Marketing Forever
Zvi · 2025-03-27T14:50:01.825Z · comments (12)
Response to Scott Alexander on Imprisonment
Zvi · 2025-03-11T20:40:06.250Z · comments (4)
[link] Paths and waystations in AI safety
Joe Carlsmith (joekc) · 2025-03-11T18:52:57.772Z · comments (1)
Analyzing long agent transcripts (Docent)
jsteinhardt · 2025-03-24T20:49:54.472Z · comments (2)
[link] Map of all 40 copyright suits v. AI in U.S.
Remmelt (remmelt-ellen) · 2025-03-26T07:57:58.976Z · comments (3)
An overview of control measures
ryan_greenblatt · 2025-03-24T23:16:49.400Z · comments (0)
SHIFT relies on token-level features to de-bias Bias in Bios probes
Tim Hua · 2025-03-19T21:29:15.974Z · comments (2)
We need (a lot) more rogue agent honeypots
Ozyrus · 2025-03-23T22:24:52.785Z · comments (11)
They Took MY Job?
Zvi · 2025-03-21T13:30:38.507Z · comments (4)
LessOnline 2025: Early Bird Tickets On Sale
Ben Pace (Benito) · 2025-03-18T00:22:02.653Z · comments (3)
Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format
Roland Pihlakas (roland-pihlakas) · 2025-03-16T23:23:30.989Z · comments (6)
Announcing EXP: Experimental Summer Workshop on Collective Cognition
Jan_Kulveit · 2025-03-15T20:14:47.972Z · comments (2)
Any-Benefit Mindset and Any-Reason Reasoning
silentbob · 2025-03-15T17:10:14.682Z · comments (9)
[link] Three Types of Intelligence Explosion
rosehadshar · 2025-03-17T14:47:46.696Z · comments (8)
Meditation and Reduced Sleep Need
niplav · 2025-04-04T14:42:54.792Z · comments (7)
[link] AI Can't Write Good Fiction
JustisMills · 2025-03-12T06:11:57.786Z · comments (19)
Boots theory and Sybil Ramkin
philh · 2025-03-18T22:10:08.855Z · comments (17)
Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens
Florian_Dietz · 2025-03-10T16:07:45.215Z · comments (3)
Avoid the Counterargument Collapse
marknm · 2025-03-26T03:19:58.655Z · comments (3)
Everything I Know About Semantics I Learned From Music Notation
J Bostock (Jemist) · 2025-03-09T18:09:11.789Z · comments (2)
Why Are The Human Sciences Hard? Two New Hypotheses
Aydin Mohseni (aydin-mohseni) · 2025-03-18T15:45:52.239Z · comments (14)
More Fun With GPT-4o Image Generation
Zvi · 2025-04-03T02:10:02.317Z · comments (2)
The Rise of Hyperpalatability
Jack (jack-3) · 2025-04-02T20:18:04.407Z · comments (10)
FLAKE-Bench: Outsourcing Awkwardness in the Age of AI
annas (annasoli) · 2025-04-01T17:08:25.092Z · comments (0)
[link] Center on Long-Term Risk: Summer Research Fellowship 2025 - Apply Now
Tristan Cook · 2025-03-26T17:29:14.797Z · comments (0)
Is instrumental convergence a thing for virtue-driven agents?
mattmacdermott · 2025-04-02T03:59:20.064Z · comments (37)
Goodhart Typology via Structure, Function, and Randomness Distributions
JustinShovelain · 2025-03-25T16:01:08.327Z · comments (0)
More on Various AI Action Plans
Zvi · 2025-03-24T13:10:05.637Z · comments (0)
Field tests of semi-rationality in Brazilian military training
P. João (gabriel-brito) · 2025-03-12T16:14:12.590Z · comments (0)
When the Wannabe Rambo Comedian Cried
P. João (gabriel-brito) · 2025-03-31T14:47:50.660Z · comments (0)
An overview of areas of control work
ryan_greenblatt · 2025-03-25T22:02:16.178Z · comments (0)
On the Implications of Recent Results on Latent Reasoning in LLMs
Rauno Arike (rauno-arike) · 2025-03-31T11:06:23.939Z · comments (6)
Monthly Roundup #28: March 2025
Zvi · 2025-03-17T12:50:03.097Z · comments (8)