LessWrong 2.0 Reader


Responses to apparent rationalist confusions about game / decision theory
Anthony DiGiovanni (antimonyanthony) · 2023-08-30T22:02:12.218Z · comments (13)
Invulnerable Incomplete Preferences: A Formal Statement
Sami Petersen (sami-petersen) · 2023-08-30T21:59:36.186Z · comments (13)
[link] Report on Frontier Model Training
YafahEdelman (yafah-edelman-1) · 2023-08-30T20:02:46.317Z · comments (18)
An adversarial example for Direct Logit Attribution: memory management in gelu-4l
Can Rager · 2023-08-30T17:36:59.034Z · comments (None)
A Letter to the Editor of MIT Technology Review
Jeffs · 2023-08-30T16:59:14.906Z · comments (None)
Biosecurity Culture, Computer Security Culture
jefftk (jkaufman) · 2023-08-30T16:40:03.101Z · comments (9)
Why I hang out at LessWrong and why you should check-in there every now and then
Bill Benzon (bill-benzon) · 2023-08-30T15:20:44.439Z · comments (5)
"Wanting" and "liking"
Mateusz Bagiński (mateusz-baginski) · 2023-08-30T14:52:04.571Z · comments (2)
Open Call for Research Assistants in Developmental Interpretability
Jesse Hoogland (jhoogland) · 2023-08-30T09:02:59.781Z · comments (11)
[link] LTFF and EAIF are unusually funding-constrained right now
Linch · 2023-08-30T01:03:30.321Z · comments (24)
[link] Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy
Neel Nanda (neel-nanda-1) · 2023-08-29T22:07:04.059Z · comments (1)
An OV-Coherent Toy Model of Attention Head Superposition
LaurenGreenspan · 2023-08-29T19:44:11.242Z · comments (None)
The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)
moyamo · 2023-08-29T18:28:54.015Z · comments (70)
The Epistemic Authority of Deep Learning Pioneers
Dylan Bowman (dylan-bowman) · 2023-08-29T18:14:12.244Z · comments (2)
[link] Democratic Fine-Tuning
Joe Edelman (joe-edelman) · 2023-08-29T18:13:16.684Z · comments (2)
Should rationalists (be seen to) win?
Will_Pearson · 2023-08-29T18:13:09.629Z · comments (7)
Frankfurt meetup
sultan · 2023-08-29T18:10:23.925Z · comments (None)
Istanbul meetup
sultan · 2023-08-29T18:10:23.909Z · comments (None)
[link] Broken Benchmark: MMLU
awg · 2023-08-29T18:09:02.907Z · comments (5)
[link] AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities
aogara (Aidan O'Gara) · 2023-08-29T15:07:03.215Z · comments (None)
Loft Bed Fan Guard
jefftk (jkaufman) · 2023-08-29T13:30:02.682Z · comments (3)
Dating Roundup #1: This is Why You’re Single
Zvi · 2023-08-29T12:50:04.964Z · comments (26)
Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]
Bill Benzon (bill-benzon) · 2023-08-29T11:33:56.065Z · comments (None)
[link] Barriers to Mechanistic Interpretability for AGI Safety
Connor Leahy (NPCollapse) · 2023-08-29T10:56:45.639Z · comments (13)
Newcomb Variant
lsusr · 2023-08-29T07:02:58.510Z · comments (22)
[question] Incentives affecting alignment-researcher encouragement
NicholasKross · 2023-08-29T05:11:59.729Z · answers+comments (3)
Anyone want to debate publicly about FDT?
omnizoid · 2023-08-29T03:45:54.239Z · comments (31)
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Simon Goldstein (simon-goldstein) · 2023-08-29T01:29:50.916Z · comments (3)
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces
Georg Lange (GeorgLange) · 2023-08-29T01:04:18.688Z · comments (1)
[link] OpenAI API base models are not sycophantic, at any size
nostalgebraist · 2023-08-29T00:58:29.007Z · comments (16)
Paradigms and Theory Choice in AI: Adaptivity, Economy and Control
particlemania · 2023-08-28T22:19:11.167Z · comments (0)
[question] Humanities In A Post-Conscious AI World?
Netcentrica · 2023-08-28T21:59:57.848Z · answers+comments (1)
[link] Introducing the Center for AI Policy (& we're hiring!)
Thomas Larsen (thomas-larsen) · 2023-08-28T21:17:11.703Z · comments (49)
[question] 45% to 55% vs. 90% to 100%
yhoiseth · 2023-08-28T19:15:18.524Z · answers+comments (8)
Information warfare historically revolved around human conduits
trevor (TrevorWiesinger) · 2023-08-28T18:54:27.169Z · comments (7)
The Evidence for Question Decomposition is Weak
niplav · 2023-08-28T15:46:31.529Z · comments (7)
ACX Meetup Anywhere, Bratislava, Slovakia
David Varga (david-varga) · 2023-08-28T15:40:48.155Z · comments (None)
The Anthropic Principle Tells Us That AGI Will Not Be Conscious
nem · 2023-08-28T15:25:28.569Z · comments (8)
No More Freezer Pucks
jefftk (jkaufman) · 2023-08-28T15:20:02.929Z · comments (7)
The mind as a polyviscous fluid
Bill Benzon (bill-benzon) · 2023-08-28T14:38:26.937Z · comments (None)
[question] Who can most reduce X-Risk?
sudhanshu_kasewa · 2023-08-28T14:38:07.188Z · answers+comments (12)
Drinks at a bar
yakimoff · 2023-08-28T03:13:06.438Z · comments (None)
Dear Self; we need to talk about ambition
Elizabeth (pktechgirl) · 2023-08-27T23:10:04.720Z · comments (18)
AI pause/governance advocacy might be net-negative, especially without focus on explaining the x-risk
Mikhail Samin (mikhail-samin) · 2023-08-27T23:05:01.718Z · comments (9)
[link] Will issues are quite nearly skill issues
dkl9 · 2023-08-27T16:42:11.264Z · comments (1)
Xanadu, GPT, and Beyond: An adventure of the mind
Bill Benzon (bill-benzon) · 2023-08-27T16:19:58.916Z · comments (None)
High level overview on how to go about estimating "p(doom)" or the like
Aryeh Englander (alenglander) · 2023-08-27T16:01:07.467Z · comments (None)
Trying a Wet Suit
jefftk (jkaufman) · 2023-08-27T15:00:08.728Z · comments (5)
Apply to a small iteration of MLAB in Oxford
RP (Complex Bubble Tea) · 2023-08-27T14:54:46.864Z · comments (None)
Apply to a small iteration of MLAB to be run in Oxford
RP (Complex Bubble Tea) · 2023-08-27T14:21:18.310Z · comments (None)