LessWrong 2.0 Reader

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth · 2022-08-08T18:05:11.982Z · comments (12)
Covid 1/21: Turning the Corner
Zvi · 2021-01-21T16:40:00.941Z · comments (41)
EA orgs' legal structure inhibits risk taking and information sharing on the margin
Elizabeth (pktechgirl) · 2023-11-05T19:13:56.135Z · comments (17)
[Completed] The 2024 Petrov Day Scenario
Ben Pace (Benito) · 2024-09-26T08:08:32.495Z · comments (114)
A mechanistic model of meditation
Kaj_Sotala · 2019-11-06T21:37:03.819Z · comments (11)
"Rationalist Discourse" Is Like "Physicist Motors"
Zack_M_Davis · 2023-02-26T05:58:29.249Z · comments (153)
Read the Roon
Zvi · 2024-03-05T13:50:04.967Z · comments (6)
[question] LessWrong Coronavirus Agenda
Elizabeth (pktechgirl) · 2020-03-18T04:48:56.769Z · answers+comments (65)
Contra EY: Can AGI destroy us without trial & error?
Nikita Sokolsky (nikita-sokolsky) · 2022-06-13T18:26:09.460Z · comments (72)
Four mindset disagreements behind existential risk disagreements in ML
Rob Bensinger (RobbBB) · 2023-04-11T04:53:48.427Z · comments (12)
[link] Ten Thousand Years of Solitude
agp (antonio-papa) · 2023-08-15T17:45:34.556Z · comments (19)
[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)
Carrying the Torch: A Response to Anna Salamon by the Guild of the Rose
moridinamael · 2022-07-06T14:20:14.847Z · comments (16)
[link] Neuronpedia
Johnny Lin (hijohnnylin) · 2023-07-26T16:29:28.884Z · comments (51)
Five Ways To Prioritize Better
lynettebye · 2020-06-27T18:40:26.600Z · comments (7)
Debate update: Obfuscated arguments problem
Beth Barnes (beth-barnes) · 2020-12-23T03:24:38.191Z · comments (24)
LessWrong Now Has Dark Mode
jimrandomh · 2022-05-10T01:21:44.065Z · comments (31)
On Bounded Distrust
Zvi · 2022-02-03T14:50:00.883Z · comments (19)
Don't Dismiss Simple Alignment Approaches
Chris_Leong · 2023-10-07T00:35:26.789Z · comments (9)
Monitoring for deceptive alignment
evhub · 2022-09-08T23:07:03.327Z · comments (8)
Possible takeaways from the coronavirus pandemic for slow AI takeoff
Vika · 2020-05-31T17:51:26.437Z · comments (36)
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau · 2022-10-27T01:32:44.750Z · comments (14)
The 99% principle for personal problems
Kaj_Sotala · 2023-10-02T08:20:07.379Z · comments (20)
2018 Review: Voting Results!
Ben Pace (Benito) · 2020-01-24T02:00:34.656Z · comments (59)
Loving a world you don’t trust
Joe Carlsmith (joekc) · 2024-06-18T19:31:36.581Z · comments (13)
Integrity in AI Governance and Advocacy
habryka (habryka4) · 2023-11-03T19:52:33.180Z · comments (57)
Message Length
Zack_M_Davis · 2020-10-20T05:52:56.277Z · comments (25)
Limitations on Formal Verification for AI Safety
Andrew Dickson · 2024-08-19T23:03:52.706Z · comments (60)
Mechanistic anomaly detection and ELK
paulfchristiano · 2022-11-25T18:50:04.447Z · comments (22)
A non-mystical explanation of insight meditation and the three characteristics of existence: introduction and preamble
Kaj_Sotala · 2020-05-05T19:09:44.484Z · comments (40)
[link] How to slow down scientific progress, according to Leo Szilard
jasoncrawford · 2023-01-05T18:26:12.121Z · comments (18)
How LLMs are and are not myopic
janus · 2023-07-25T02:19:44.949Z · comments (16)
You have a place to stay in Sweden, should you need it.
Dojan · 2022-02-27T01:21:19.552Z · comments (3)
Brainstorm of things that could force an AI team to burn their lead
So8res · 2022-07-24T23:58:16.988Z · comments (8)
Wolf Incident Postmortem
jefftk (jkaufman) · 2023-01-09T03:20:03.723Z · comments (13)
[question] Will COVID-19 survivors suffer lasting disability at a high rate?
jimrandomh · 2020-02-11T20:23:50.664Z · answers+comments (11)
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda (neel-nanda-1) · 2024-07-07T17:39:35.064Z · comments (16)
How to evaluate (50%) predictions
Rafael Harth (sil-ver) · 2020-04-10T17:12:02.867Z · comments (50)
[question] Forecasting Thread: AI Timelines
Amandango · 2020-08-22T02:33:09.431Z · answers+comments (98)
Pretraining Language Models with Human Preferences
Tomek Korbak (tomek-korbak) · 2023-02-21T17:57:09.774Z · comments (20)
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak · 2022-11-22T18:57:29.604Z · comments (97)
Apocalypse insurance, and the hardline libertarian take on AI risk
So8res · 2023-11-28T02:09:52.400Z · comments (40)
Modal Fixpoint Cooperation without Löb's Theorem
Andrew_Critch · 2023-02-05T00:58:40.975Z · comments (34)
Invulnerable Incomplete Preferences: A Formal Statement
SCP (sami-petersen) · 2023-08-30T21:59:36.186Z · comments (38)
The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (47)
How it All Went Down: The Puzzle Hunt that took us way, way Less Online
A* (agendra) · 2024-06-02T08:01:40.109Z · comments (5)
On saying "Thank you" instead of "I'm Sorry"
Michael Cohn (michael-cohn) · 2024-07-08T03:13:50.663Z · comments (16)
LessWrong is paying $500 for Book Reviews
Ruby · 2021-09-14T00:24:23.507Z · comments (25)
Nuclear war is unlikely to cause human extinction
Jeffrey Ladish (jeff-ladish) · 2020-11-07T05:42:24.380Z · comments (48)
Demand offsetting
paulfchristiano · 2021-03-21T18:20:05.090Z · comments (41)