LessWrong 2.0 Reader

AI 2027 is a Bet Against Amdahl's Law
snewman · 2025-04-21T03:09:40.751Z · comments (26)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (49)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
C'mon guys, Deliberate Practice is Real
Raemon · 2025-02-05T22:33:59.069Z · comments (25)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)
Timaeus in 2024
Jesse Hoogland (jhoogland) · 2025-02-20T23:54:56.939Z · comments (1)
Reviewing LessWrong: Screwtape's Basic Answer
Screwtape · 2025-02-05T04:30:34.347Z · comments (18)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
Impact, agency, and taste
benkuhn · 2025-04-19T21:10:06.960Z · comments (1)
How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)
Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (41)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)
[link] Elite Coordination via the Consensus of Power
Richard_Ngo (ricraz) · 2025-03-19T06:56:44.825Z · comments (15)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (55)
[link] Towards a scale-free theory of intelligent agency
Richard_Ngo (ricraz) · 2025-03-21T01:39:42.251Z · comments (24)
Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (11)
We should start looking for scheming "in the wild"
Marius Hobbhahn (marius-hobbhahn) · 2025-03-06T13:49:39.739Z · comments (4)
How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (10)
[link] Wired on: "DOGE personnel with admin access to Federal Payment System"
Raemon · 2025-02-05T21:32:11.205Z · comments (45)
[link] Anthropic releases Claude 3.7 Sonnet with extended thinking mode
LawrenceC (LawChan) · 2025-02-24T19:32:43.947Z · comments (8)
On Emergent Misalignment
Zvi · 2025-02-28T13:10:05.973Z · comments (5)
Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red
Julian Bradshaw · 2025-04-21T03:52:34.759Z · comments (10)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-03T20:08:31.539Z · comments (19)
The Risk of Gradual Disempowerment from AI
Zvi · 2025-02-05T22:10:06.979Z · comments (15)
Voting Results for the 2023 Review
Raemon · 2025-02-06T08:00:37.461Z · comments (3)
How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)
Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (26)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (58)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
Is Gemini now better than Claude at Pokémon?
Julian Bradshaw · 2025-04-19T23:34:43.298Z · comments (11)
Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (4)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (29)
Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)
Open problems in emergent misalignment
Jan Betley (jan-betley) · 2025-03-01T09:47:58.889Z · comments (13)
Microplastics: Much Less Than You Wanted To Know
jenn (pixx) · 2025-02-15T19:08:14.561Z · comments (8)
You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)
What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (22)
[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik (lucy.fa) · 2025-02-26T12:50:04.204Z · comments (8)
Elon Musk May Be Transitioning to Bipolar Type I
Cyborg25 · 2025-03-11T17:45:06.599Z · comments (22)
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)
Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-03T17:01:06.004Z · comments (1)