LessWrong 2.0 Reader

Gradient hacking is extremely difficult
beren · 2023-01-24T15:45:46.518Z · comments (22)
Small and Vulnerable
sapphire (deluks917) · 2021-05-03T04:55:52.149Z · comments (17)
The 2021 Less Wrong Darwin Game
lsusr · 2021-09-24T21:16:35.356Z · comments (102)
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Diffractor · 2022-09-28T01:20:11.605Z · comments (19)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
Secure homes for digital people
paulfchristiano · 2021-10-10T15:50:02.697Z · comments (37)
RAISE post-mortem
[deleted] · 2019-11-24T16:19:05.163Z · comments (12)
The Dial of Progress
Zvi · 2023-06-13T13:40:06.354Z · comments (119)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
Rereading Atlas Shrugged
Vaniver · 2020-07-28T18:54:45.272Z · comments (36)
ITT-passing and civility are good; "charity" is bad; steelmanning is niche
Rob Bensinger (RobbBB) · 2022-07-05T00:15:36.308Z · comments (36)
[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)
And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)
Saving Time
Scott Garrabrant · 2021-05-18T20:11:14.651Z · comments (20)
Logical induction for software engineers
Alex Flint (alexflint) · 2022-12-03T19:55:35.474Z · comments (8)
Agentized LLMs will change the alignment landscape
Seth Herd · 2023-04-09T02:29:07.797Z · comments (102)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
Jailbreaking GPT-4's code interpreter
Nikola Jurkovic (nikolaisalreadytaken) · 2023-07-13T18:43:54.484Z · comments (22)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (8)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
Repeal the Foreign Dredge Act of 1906
Zvi · 2022-05-05T15:20:01.739Z · comments (16)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
DeepMind's "​​Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)
My research methodology
paulfchristiano · 2021-03-22T21:20:07.046Z · comments (38)
Curing insanity with malaria
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2021-08-04T02:28:11.731Z · comments (8)
Wireless is a trap
benkuhn · 2020-06-07T15:30:02.352Z · comments (13)
Why all the fuss about recursive self-improvement?
So8res · 2022-06-12T20:53:42.392Z · comments (62)
«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch · 2022-07-26T23:03:55.941Z · comments (33)
[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)
Godzilla Strategies
johnswentworth · 2022-06-11T15:44:16.385Z · comments (71)
[link] What would a compute monitoring plan look like? [Linkpost]
Akash (akash-wasil) · 2023-03-26T19:33:46.896Z · comments (10)
Slack matters more than any outcome
Valentine · 2022-12-31T20:11:02.287Z · comments (56)
[Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
Steven Byrnes (steve2152) · 2022-01-26T15:23:22.429Z · comments (19)
Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)
[link] Pseudorandomness contest: prizes, results, and analysis
Eric Neyman (UnexpectedValues) · 2021-01-15T06:24:15.317Z · comments (22)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (89)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)
AI doom from an LLM-plateau-ist perspective
Steven Byrnes (steve2152) · 2023-04-27T13:58:10.973Z · comments (24)
My computational framework for the brain
Steven Byrnes (steve2152) · 2020-09-14T14:19:21.974Z · comments (26)
Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc
johnswentworth · 2022-06-04T05:41:56.713Z · comments (55)
Biology-Inspired AGI Timelines: The Trick That Never Works
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-12-01T22:35:28.379Z · comments (142)
My thoughts on the social response to AI risk
Matthew Barnett (matthew-barnett) · 2023-11-01T21:17:08.184Z · comments (37)
Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (36)
[link] grey goo is unlikely
bhauth · 2023-04-17T01:59:57.054Z · comments (120)
[link] Tuning your Cognitive Strategies
Raemon · 2023-04-27T20:32:06.337Z · comments (57)