LessWrong 2.0 Reader

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)
[link] President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
Tristan Williams (tristan-williams) · 2023-10-30T11:15:38.422Z · comments (39)
Thomas Kwa's MIRI research experience
Thomas Kwa (thomas-kwa) · 2023-10-02T16:42:37.886Z · comments (53)
Defunding My Mistake
ymeskhout · 2023-09-04T14:43:14.274Z · comments (41)
Evaluating the historical value misspecification argument
Matthew Barnett (matthew-barnett) · 2023-10-05T18:34:15.695Z · comments (142)
This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)
Thoughts on the AI Safety Summit company policy requests and responses
So8res · 2023-10-31T23:54:09.566Z · comments (14)
2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)
Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (35)
[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)
Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)
The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (21)
A list of core AI safety problems and how I hope to solve them
davidad · 2023-08-26T15:12:18.484Z · comments (29)
Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)
[link] The King and the Golem
Richard_Ngo (ricraz) · 2023-09-25T19:51:22.980Z · comments (16)
RSPs are pauses done right
evhub · 2023-10-14T04:06:02.709Z · comments (70)
Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)
Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (33)
How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)
Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (8)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
When can we trust model evaluations?
evhub · 2023-07-28T19:42:21.799Z · comments (9)
[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
jacobjacob · 2023-10-20T21:04:32.645Z · comments (30)
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 2024-05-21T20:15:36.502Z · comments (16)
What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)
[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)
Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton · 2024-06-25T15:40:03.535Z · comments (11)
Password-locked models: a stress case for capabilities evaluation
Fabien Roger (Fabien) · 2023-08-03T14:53:12.459Z · comments (14)
You can just spontaneously call people you haven't met in years
lc · 2023-11-13T05:21:05.726Z · comments (21)
Announcing Dialogues
Ben Pace (Benito) · 2023-10-07T02:57:39.005Z · comments (52)
Is being sexy for your homies?
Valentine · 2023-12-13T20:37:02.043Z · comments (92)
My thoughts on the social response to AI risk
Matthew Barnett (matthew-barnett) · 2023-11-01T21:17:08.184Z · comments (37)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (54)
Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (25)
Dyslucksia
Shoshannah Tekofsky (DarkSym) · 2024-05-09T19:21:33.874Z · comments (45)
[link] "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation
titotal (lombertini) · 2023-09-29T14:01:15.453Z · comments (79)
[link] Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
sudo · 2023-10-29T23:09:56.730Z · comments (22)
[link] ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Beth Barnes (beth-barnes) · 2023-08-01T18:30:57.068Z · comments (12)