LessWrong 2.0 Reader

The Talk: a brief explanation of sexual dimorphism
Malmesbury (Elmer of Malmesbury) · 2023-09-18T16:23:56.073Z · comments (72)
Inside Views, Impostor Syndrome, and the Great LARP
johnswentworth · 2023-09-25T16:08:17.040Z · comments (53)
Sharing Information About Nonlinear
Ben Pace (Benito) · 2023-09-07T06:51:11.846Z · comments (323)
[link] EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem
Elizabeth (pktechgirl) · 2023-09-28T23:30:03.390Z · comments (247)
[link] Sum-threshold attacks
TsviBT · 2023-09-08T17:13:37.044Z · comments (52)
[link] AI presidents discuss AI alignment agendas
TurnTrout · 2023-09-09T18:55:37.931Z · comments (22)
What I would do if I wasn’t at ARC Evals
LawrenceC (LawChan) · 2023-09-05T19:19:36.830Z · comments (8)
UDT shows that decision theory is more puzzling than ever
Wei Dai (Wei_Dai) · 2023-09-13T12:26:09.739Z · comments (51)
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB (JanBrauner) · 2023-09-28T18:53:58.896Z · comments (37)
A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX
jacobjacob · 2023-09-01T04:03:41.067Z · comments (23)
There should be more AI safety orgs
Marius Hobbhahn (marius-hobbhahn) · 2023-09-21T14:53:52.779Z · comments (25)
Defunding My Mistake
ymeskhout · 2023-09-04T14:43:14.274Z · comments (41)
[link] The King and the Golem
Richard_Ngo (ricraz) · 2023-09-25T19:51:22.980Z · comments (15)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (7)
[link] "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation
titotal (lombertini) · 2023-09-29T14:01:15.453Z · comments (81)
Meta Questions about Metaphilosophy
Wei Dai (Wei_Dai) · 2023-09-01T01:17:57.578Z · comments (78)
One Minute Every Moment
abramdemski · 2023-09-01T20:23:56.391Z · comments (23)
[link] Paper: LLMs trained on “A is B” fail to learn “B is A”
lberglund (brglnd) · 2023-09-23T19:55:53.427Z · comments (73)
[link] The smallest possible button (or: moth traps!)
Neil (neil-warren) · 2023-09-02T15:24:20.453Z · comments (17)
Interpreting OpenAI's Whisper
EllenaR · 2023-09-24T17:53:44.955Z · comments (10)
[link] Paper: On measuring situational awareness in LLMs
Owain_Evans · 2023-09-04T12:54:20.516Z · comments (16)
[link] ActAdd: Steering Language Models without Optimization
technicalities · 2023-09-06T17:21:56.214Z · comments (3)
PSA: The community is in Berkeley/Oakland, not "the Bay Area"
maia · 2023-09-11T15:59:47.132Z · comments (7)
[link] Cohabitive Games so Far
mako yass (MakoYass) · 2023-09-28T15:41:27.986Z · comments (118)
[link] Reproducing ARC Evals' recent report on language model agents
Thomas Broadley (thomas-broadley) · 2023-09-01T16:52:17.147Z · comments (17)
[link] Explaining grokking through circuit efficiency
Vikrant Varma (amrav) · 2023-09-08T14:39:23.910Z · comments (10)
Closing Notes on Nonlinear Investigation
Ben Pace (Benito) · 2023-09-15T22:44:58.488Z · comments (47)
“X distracts from Y” as a thinly-disguised fight over group status / politics
Steven Byrnes (steve2152) · 2023-09-25T15:18:18.644Z · comments (14)
Announcing FAR Labs, an AI safety coworking space
bgold · 2023-09-29T16:52:37.753Z · comments (0)
[link] Atoms to Agents Proto-Lectures
johnswentworth · 2023-09-22T06:22:05.456Z · comments (14)
[link] Logical Share Splitting
DaemonicSigil · 2023-09-11T04:08:32.350Z · comments (16)
AI #31: It Can Do What Now?
Zvi · 2023-09-28T16:00:01.919Z · comments (6)
[link] Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-09-19T15:09:27.235Z · comments (23)
Making AIs less likely to be spiteful
Nicolas Macé (NicolasMace) · 2023-09-26T14:12:06.202Z · comments (2)
[link] I compiled an ebook of `Project Lawful` for eBook readers
OrwellGoesShopping · 2023-09-15T18:09:31.703Z · comments (4)
[link] Benchmarks for Detecting Measurement Tampering [Redwood Research]
ryan_greenblatt · 2023-09-05T16:44:48.032Z · comments (18)
Highlights: Wentworth, Shah, and Murphy on "Retargeting the Search"
RobertM (T3t) · 2023-09-14T02:18:05.890Z · comments (4)
Navigating an ecosystem that might or might not be bad for the world
habryka (habryka4) · 2023-09-15T23:58:00.389Z · comments (20)
Memory bandwidth constraints imply economies of scale in AI inference
Ege Erdil (ege-erdil) · 2023-09-17T14:01:34.701Z · comments (33)
[question] How have you become more hard-working?
Chi Nguyen · 2023-09-25T12:37:39.860Z · answers+comments (40)
AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo
Zvi · 2023-09-21T12:00:06.616Z · comments (8)
Text Posts from the Kids Group: 2023 I
jefftk (jkaufman) · 2023-09-05T02:00:04.118Z · comments (3)
Find Hot French Food Near Me: A Follow-up
aphyer · 2023-09-06T12:32:02.844Z · comments (19)
Luck based medicine: angry eldritch sugar gods edition
Elizabeth (pktechgirl) · 2023-09-19T04:40:06.334Z · comments (13)
[question] How to talk about reasons why AGI might not be near?
Kaj_Sotala · 2023-09-17T08:18:31.100Z · answers+comments (19)
A quick update from Nonlinear
KatWoods (ea247) · 2023-09-07T21:28:26.569Z · comments (23)
Contra Yudkowsky on Epistemic Conduct for Author Criticism
Zack_M_Davis · 2023-09-13T15:33:14.987Z · comments (38)
Influence functions - why, what and how
Nina Rimsky (NinaR) · 2023-09-15T20:42:08.653Z · comments (6)
Would You Work Harder In The Least Convenient Possible World?
Firinn · 2023-09-22T05:17:05.148Z · comments (93)
High-level interpretability: detecting an AI's objectives
Paul Colognese (paul-colognese) · 2023-09-28T19:30:16.753Z · comments (4)