LessWrong 2.0 Reader

A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]
Dan H (dan-hendrycks) · 2022-05-09T17:18:53.978Z · comments (8)
Rationality !== Winning
Raemon · 2023-07-24T02:53:59.764Z · comments (51)
A Personal (Interim) COVID-19 Postmortem
Davidmanheim · 2020-06-25T18:10:40.885Z · comments (41)
Outline of Galef's "Scout Mindset"
Rob Bensinger (RobbBB) · 2021-08-10T00:16:59.050Z · comments (17)
Be less scared of overconfidence
benkuhn · 2022-11-30T15:20:07.738Z · comments (22)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
A transparency and interpretability tech tree
evhub · 2022-06-16T23:44:14.961Z · comments (11)
The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (90)
The 2021 Less Wrong Darwin Game
lsusr · 2021-09-24T21:16:35.356Z · comments (102)
Gradient hacking is extremely difficult
beren · 2023-01-24T15:45:46.518Z · comments (22)
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
Bird Concept (jacobjacob) · 2023-10-20T21:04:32.645Z · comments (30)
Small and Vulnerable
sapphire (deluks917) · 2021-05-03T04:55:52.149Z · comments (17)
[link] Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout · 2019-12-05T02:33:34.321Z · comments (39)
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Diffractor · 2022-09-28T01:20:11.605Z · comments (19)
The Onion Test for Personal and Institutional Honesty
chanamessinger (cmessinger) · 2022-09-27T15:26:34.567Z · comments (31)
Secure homes for digital people
paulfchristiano · 2021-10-10T15:50:02.697Z · comments (37)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
Rereading Atlas Shrugged
Vaniver · 2020-07-28T18:54:45.272Z · comments (36)
RAISE post-mortem
[deleted] · 2019-11-24T16:19:05.163Z · comments (12)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
ITT-passing and civility are good; "charity" is bad; steelmanning is niche
Rob Bensinger (RobbBB) · 2022-07-05T00:15:36.308Z · comments (36)
The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)
The Dial of Progress
Zvi · 2023-06-13T13:40:06.354Z · comments (119)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
Logical induction for software engineers
Alex Flint (alexflint) · 2022-12-03T19:55:35.474Z · comments (8)
[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)
o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)
Saving Time
Scott Garrabrant · 2021-05-18T20:11:14.651Z · comments (20)
[link] Pseudorandomness contest: prizes, results, and analysis
Eric Neyman (UnexpectedValues) · 2021-01-15T06:24:15.317Z · comments (22)
Agentized LLMs will change the alignment landscape
Seth Herd · 2023-04-09T02:29:07.797Z · comments (102)
Jailbreaking GPT-4's code interpreter
Nikola Jurkovic (nikolaisalreadytaken) · 2023-07-13T18:43:54.484Z · comments (22)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)
Repeal the Foreign Dredge Act of 1906
Zvi · 2022-05-05T15:20:01.739Z · comments (16)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
My research methodology
paulfchristiano · 2021-03-22T21:20:07.046Z · comments (38)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
DeepMind's "​​Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)
Curing insanity with malaria
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2021-08-04T02:28:11.731Z · comments (8)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (8)
Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)
«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch · 2022-07-26T23:03:55.941Z · comments (33)
[link] What would a compute monitoring plan look like? [Linkpost]
[deleted] · 2023-03-26T19:33:46.896Z · comments (10)
[Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
Steven Byrnes (steve2152) · 2022-01-26T15:23:22.429Z · comments (19)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)
Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (87)