LessWrong 2.0 Reader


[link] Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?
gwern · 2023-07-03T00:48:47.131Z · comments (54)
Alignment Grantmaking is Funding-Limited Right Now
johnswentworth · 2023-07-19T16:49:08.811Z · comments (68)
Accidentally Load Bearing
jefftk (jkaufman) · 2023-07-13T16:10:00.806Z · comments (18)
Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)
Self-driving car bets
paulfchristiano · 2023-07-29T18:10:01.112Z · comments (44)
[link] Cultivating a state of mind where new ideas are born
Henrik Karlsson (henrik-karlsson) · 2023-07-27T09:16:42.566Z · comments (21)
Ways I Expect AI Regulation To Increase Extinction Risk
1a3orn · 2023-07-04T17:32:48.047Z · comments (32)
Consciousness as a conflationary alliance term for intrinsically valued internal experiences
Andrew_Critch · 2023-07-10T08:09:48.881Z · comments (54)
My "2.9 trauma limit"
Raemon · 2023-07-01T19:32:14.805Z · comments (31)
Towards Developmental Interpretability
Jesse Hoogland (jhoogland) · 2023-07-12T19:33:44.788Z · comments (10)
Grant applications and grand narratives
Elizabeth (pktechgirl) · 2023-07-02T00:16:25.129Z · comments (22)
Cryonics and Regret
MvB (martin-von-berg) · 2023-07-24T09:16:01.456Z · comments (35)
[link] [Linkpost] Introducing Superalignment
beren · 2023-07-05T18:23:18.419Z · comments (69)
Rationality !== Winning
Raemon · 2023-07-24T02:53:59.764Z · comments (51)
When can we trust model evaluations?
evhub · 2023-07-28T19:42:21.799Z · comments (10)
Why it's so hard to talk about Consciousness
Rafael Harth (sil-ver) · 2023-07-02T15:56:05.188Z · comments (210)
Jailbreaking GPT-4's code interpreter
Nikola Jurkovic (nikolaisalreadytaken) · 2023-07-13T18:43:54.484Z · comments (22)
OpenAI Launches Superalignment Taskforce
Zvi · 2023-07-11T13:00:06.232Z · comments (40)
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-07-24T11:30:10.602Z · comments (12)
[link] The Goddess of Everything Else - The Animation
Writer · 2023-07-13T16:26:25.552Z · comments (4)
The Seeker’s Game – Vignettes from the Bay
Yulia · 2023-07-09T19:32:58.717Z · comments (19)
Going Crazy and Getting Better Again
Evenstar · 2023-07-02T18:55:25.790Z · comments (13)
How LLMs are and are not myopic
janus · 2023-07-25T02:19:44.949Z · comments (16)
[link] Neuronpedia
Johnny Lin (hijohnnylin) · 2023-07-26T16:29:28.884Z · comments (51)
[link] Introducing Fatebook: the fastest way to make and track predictions
Adam B (adam-b) · 2023-07-11T15:28:13.798Z · comments (41)
Ten Levels of AI Alignment Difficulty
Sammy Martin (SDM) · 2023-07-03T20:20:21.403Z · comments (24)
[link] Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave · 2023-07-20T17:31:35.814Z · comments (22)
Reducing sycophancy and improving honesty via activation steering
Nina Panickssery (NinaR) · 2023-07-28T02:46:23.122Z · comments (18)
Views on when AGI comes and on strategy to reduce existential risk
TsviBT · 2023-07-08T09:00:19.735Z · comments (56)
Why was the AI Alignment community so unprepared for this moment?
Ras1513 · 2023-07-15T00:26:29.769Z · comments (65)
“Reframing Superintelligence” + LLMs + 4 years
Eric Drexler · 2023-07-10T13:42:09.739Z · comments (9)
[link] Introducing bayescalc.io
Adele Lopez (adele-lopez-1) · 2023-07-07T16:11:12.854Z · comments (29)
[link] Winners of AI Alignment Awards Research Contest
Orpheus16 (akash-wasil) · 2023-07-13T16:14:38.243Z · comments (4)
QAPR 5: grokking is maybe not *that* big a deal?
Quintin Pope (quintin-pope) · 2023-07-23T20:14:33.405Z · comments (15)
Measuring and Improving the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-07-18T16:36:34.473Z · comments (15)
A transcript of the TED talk by Eliezer Yudkowsky
Mikhail Samin (mikhail-samin) · 2023-07-12T12:12:34.399Z · comments (13)
[link] Priorities for the UK Foundation Models Taskforce
Andrea_Miotti (AndreaM) · 2023-07-21T15:23:34.029Z · comments (4)
Consider Joining the UK Foundation Model Taskforce
Zvi · 2023-07-10T13:50:05.097Z · comments (12)
Anthropic Observations
Zvi · 2023-07-25T12:50:03.178Z · comments (1)
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (19)
Fixed Point: a love story
Richard_Ngo (ricraz) · 2023-07-08T13:56:54.807Z · comments (2)
When Someone Tells You They're Lying, Believe Them
ymeskhout · 2023-07-14T00:31:48.168Z · comments (3)
"Justice, Cherryl."
Zack_M_Davis · 2023-07-23T16:16:40.835Z · comments (21)
BCIs and the ecosystem of modular minds
beren · 2023-07-21T15:58:27.081Z · comments (14)
Apollo Neuro Results
Elizabeth (pktechgirl) · 2023-07-30T18:40:05.213Z · comments (17)
[question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?
lukemarks (marc/er) · 2023-07-08T11:42:38.625Z · answers+comments (28)
Underwater Torture Chambers: The Horror Of Fish Farming
omnizoid · 2023-07-26T00:27:15.490Z · comments (50)
[link] A $10k retroactive grant for VaccinateCA
Austin Chen (austin-chen) · 2023-07-27T18:14:44.305Z · comments (0)
Sapient Algorithms
Valentine · 2023-07-17T16:30:01.350Z · comments (15)
[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open
Jana Meixnerová (Epistea) · 2023-07-11T11:02:28.705Z · comments (7)