LessWrong 2.0 Reader

Is AI Physical?
Lauren Greenspan (LaurenGreenspan) · 2025-01-14T21:21:39.999Z · comments (3)
[link] It looks like there are some good funding opportunities in AI safety right now
Benjamin_Todd · 2024-12-22T12:41:02.151Z · comments (0)
[link] Does natural selection favor AIs over humans?
cdkg · 2024-10-03T18:47:43.517Z · comments (1)
Lab governance reading list
Zach Stein-Perlman · 2024-10-25T18:00:28.346Z · comments (3)
[link] Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders
PaulPauls · 2024-11-24T05:45:20.124Z · comments (3)
[link] Announcement: AI for Math Fund
sarahconstantin · 2024-12-05T18:33:13.556Z · comments (9)
[question] What is the alpha in one bit of evidence?
J Bostock (Jemist) · 2024-10-22T21:57:09.056Z · answers+comments (13)
Gwerns
Tomás B. (Bjartur Tómas) · 2024-11-16T14:31:57.791Z · comments (2)
AI Can be “Gradient Aware” Without Doing Gradient hacking.
Sodium · 2024-10-20T21:02:10.754Z · comments (0)
[link] I read every major AI lab’s safety plan so you don’t have to
sarahhw · 2024-12-16T18:51:38.499Z · comments (0)
Latent Adversarial Training (LAT) Improves the Representation of Refusal
alexandraabbas · 2025-01-06T10:24:53.419Z · comments (6)
[link] Update on the Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-11-04T19:22:06.540Z · comments (9)
[link] Fragile, Robust, and Antifragile Preference Satisfaction
adamShimi · 2024-11-02T17:25:55.986Z · comments (0)
[link] To Be Born in a Bag
Niko_McCarty (niko-2) · 2024-10-06T17:21:00.605Z · comments (1)
Balsa Research 2024 Update
Zvi · 2024-12-03T12:30:06.829Z · comments (0)
Open Thread Winter 2024/2025
habryka (habryka4) · 2024-12-25T21:02:41.760Z · comments (10)
subfunctional overlaps in attentional selection history implies momentum for decision-trajectories
Emrik (Emrik North) · 2024-12-22T14:12:49.027Z · comments (1)
Proof Explained for "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-22T15:06:16.880Z · comments (0)
minifest
Austin Chen (austin-chen) · 2024-12-07T03:50:38.573Z · comments (1)
Whistleblowing Twitter Bot
Mckiev · 2024-12-26T04:09:45.493Z · comments (5)
Economics Roundup #4
Zvi · 2024-10-15T13:20:06.923Z · comments (4)
Review: “The Case Against Reality”
David Gross (David_Gross) · 2024-10-29T13:13:29.643Z · comments (9)
Bridging the VLM and mech interp communities for multimodal interpretability
Sonia Joseph (redhat) · 2024-10-28T14:41:41.969Z · comments (5)
Definition of alignment science I like
quetzal_rainbow · 2025-01-06T20:40:38.187Z · comments (0)
[link] Why OpenAI’s Structure Must Evolve To Advance Our Mission
stuhlmueller · 2024-12-28T04:24:19.937Z · comments (1)
[link] Can o1-preview find major mistakes amongst 59 NeurIPS '24 MLSB papers?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-18T14:21:03.661Z · comments (0)
Turning up the Heat on Deceptively-Misaligned AI
J Bostock (Jemist) · 2025-01-07T00:13:28.191Z · comments (16)
AGI with RL is Bad News for Safety
Nadav Brandes (nadav-brandes) · 2024-12-21T19:36:03.970Z · comments (22)
Write Good Enough Code, Quickly
Oliver Daniels (oliver-daniels-koch) · 2024-12-15T04:45:56.797Z · comments (10)
[link] Forecast 2025 With Vox's Future Perfect Team — $2,500 Prize Pool
ChristianWilliams · 2024-12-20T23:00:35.334Z · comments (0)
Higher and lower pleasures
Chris_Leong · 2024-12-05T13:13:46.526Z · comments (3)
[link] Chess As The Model Game
criticalpoints · 2024-11-17T19:45:26.499Z · comments (0)
Really radical empathy
MichaelStJules · 2025-01-06T17:46:31.269Z · comments (0)
Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]
Jason Gross (jason-gross) · 2025-01-06T04:22:12.633Z · comments (0)
Can we rescue Effective Altruism?
Elizabeth (pktechgirl) · 2025-01-09T16:40:02.405Z · comments (0)
D/acc AI Security Salon
Allison Duettmann (allison-duettmann) · 2024-10-19T22:17:57.067Z · comments (0)
Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (0)
[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (49)
[link] GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
ChengCheng (ccstan99) · 2024-11-01T00:10:50.718Z · comments (0)
[link] AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:08:56.604Z · comments (0)
[link] AI & wisdom 2: growth and amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:07:39.449Z · comments (0)
[link] A primer on machine learning in cryo-electron microscopy (cryo-EM)
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-22T15:11:58.860Z · comments (0)
Monthly Roundup #25: December 2024
Zvi · 2024-12-23T14:20:04.682Z · comments (3)
[link] From the Archives: a story
Richard_Ngo (ricraz) · 2024-12-27T16:36:50.735Z · comments (1)
Beliefs and state of mind into 2025
RussellThor · 2025-01-10T22:07:01.060Z · comments (9)
Announcing the CLR Foundations Course and CLR S-Risk Seminars
JamesFaville (elephantiskon) · 2024-11-19T01:18:10.085Z · comments (0)
Word Spaghetti
Gordon Seidoh Worley (gworley) · 2024-10-23T05:39:20.105Z · comments (9)
[link] AI safety content you could create
Adam Jones (domdomegg) · 2025-01-06T15:35:56.167Z · comments (0)
[link] Genesis
PeterMcCluskey · 2024-12-31T22:01:17.277Z · comments (0)
We need a universal definition of 'agency' and related words
CstineSublime · 2025-01-11T03:22:56.623Z · comments (1)