LessWrong 2.0 Reader


Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (19)
[link] Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-02-06T15:46:53.024Z · comments (9)
Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (11)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (25)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (49)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
C'mon guys, Deliberate Practice is Real
Raemon · 2025-02-05T22:33:59.069Z · comments (25)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)
Timaeus in 2024
Jesse Hoogland (jhoogland) · 2025-02-20T23:54:56.939Z · comments (1)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)
Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (41)
We probably won't just play status games with each other after AGI
Matthew Barnett (matthew-barnett) · 2025-01-15T04:56:38.330Z · comments (21)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (55)
[link] Towards a scale-free theory of intelligent agency
Richard_Ngo (ricraz) · 2025-03-21T01:39:42.251Z · comments (22)
[link] Elite Coordination via the Consensus of Power
Richard_Ngo (ricraz) · 2025-03-19T06:56:44.825Z · comments (15)
Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (11)
Three Months In, Evaluating Three Rationalist Cases for Trump
Arjun Panickssery (arjun-panickssery) · 2025-04-18T08:27:27.257Z · comments (11)
Agent Foundations 2025 at CMU
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-19T23:48:22.569Z · comments (10)
Thoughts on the conservative assumptions in AI control
Buck · 2025-01-17T19:23:38.575Z · comments (5)
Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (14)
Implications of the inference scaling paradigm for AI safety
Ryan Kidd (ryankidd44) · 2025-01-14T02:14:53.562Z · comments (70)
We should start looking for scheming "in the wild"
Marius Hobbhahn (marius-hobbhahn) · 2025-03-06T13:49:39.739Z · comments (4)
[link] Five Recent AI Tutoring Studies
Arjun Panickssery (arjun-panickssery) · 2025-01-19T03:53:47.714Z · comments (0)
[link] Anthropic releases Claude 3.7 Sonnet with extended thinking mode
LawrenceC (LawChan) · 2025-02-24T19:32:43.947Z · comments (8)
On Emergent Misalignment
Zvi · 2025-02-28T13:10:05.973Z · comments (5)
[link] Wired on: "DOGE personnel with admin access to Federal Payment System"
Raemon · 2025-02-05T21:32:11.205Z · comments (45)
How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (10)
Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-18T12:27:35.795Z · comments (2)
How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)
Voting Results for the 2023 Review
Raemon · 2025-02-06T08:00:37.461Z · comments (3)
The Risk of Gradual Disempowerment from AI
Zvi · 2025-02-05T22:10:06.979Z · comments (15)
Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (26)
[link] The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke (corin-katzke) · 2025-01-21T16:57:00.998Z · comments (11)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-03T20:08:31.539Z · comments (19)
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (58)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)
On the OpenAI Economic Blueprint
Zvi · 2025-01-15T14:30:06.773Z · comments (2)
Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (4)