LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

I would have shit in that alley, too
Declan Molony (declan-molony) · 2024-06-18T04:41:06.545Z · comments (135)
Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)
Andrew_Critch · 2024-06-14T00:16:47.850Z · comments (38)
My AI Model Delta Compared To Yudkowsky
johnswentworth · 2024-06-10T16:12:53.179Z · comments (103)
Getting 50% (SoTA) on ARC-AGI with GPT-4o
ryan_greenblatt · 2024-06-17T18:44:01.039Z · comments (50)
SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)
LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (119)
Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)
My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)
Two easy things that maybe Just Work to improve AI discourse
Bird Concept (jacobjacob) · 2024-06-08T15:51:18.078Z · comments (35)
Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton · 2024-06-25T15:40:03.535Z · comments (11)
The Incredible Fentanyl-Detecting Machine
sarahconstantin · 2024-06-28T22:10:01.223Z · comments (26)
0. CAST: Corrigibility as Singular Target
Max Harms (max-harms) · 2024-06-07T22:29:12.934Z · comments (12)
How it All Went Down: The Puzzle Hunt that took us way, way Less Online
A* (agendra) · 2024-06-02T08:01:40.109Z · comments (5)
Loving a world you don’t trust
Joe Carlsmith (joekc) · 2024-06-18T19:31:36.581Z · comments (13)
Why I don't believe in the placebo effect
transhumanist_atom_understander · 2024-06-10T02:37:07.776Z · comments (22)
The Standard Analogy
Zack_M_Davis · 2024-06-03T17:15:42.327Z · comments (28)
[question] What do coherence arguments actually prove about agentic behavior?
[deleted] · 2024-06-01T09:37:28.451Z · answers+comments (37)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner (ejenner) · 2024-06-04T15:50:47.475Z · comments (14)
AI catastrophes and rogue deployments
Buck · 2024-06-03T17:04:51.206Z · comments (16)
Anthropic's Certificate of Incorporation
Zach Stein-Perlman · 2024-06-12T13:00:30.806Z · comments (7)
The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)
Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)
Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)
In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)
[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (9)
Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)
On Dwarksh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)
[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)
Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)
Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)
OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)
Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)
[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (4)
On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)
[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)
Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)
Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)
Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)
I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)
[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)
[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)
[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)
Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)
[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)
Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)
next page (older posts) →