LessWrong 2.0 Reader

Seven sources of goals in LLM agents
Seth Herd · 2025-02-08T21:54:20.186Z · comments (2)
Longtermist implications of aliens Space-Faring Civilizations - Introduction
Maxime Riché (maxime-riche) · 2025-02-21T12:08:42.403Z · comments (0)
[NSFW] The BDSM Path to Zen
lsusr · 2025-02-24T13:05:09.624Z · comments (9)
Come join Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-15T22:10:02.166Z · comments (0)
Wiki on Suspects in Lind, Zajko, and Maland Killings
Rebecca_Records · 2025-02-08T04:16:08.589Z · comments (4)
Don't go bankrupt, don't go rogue
Nathan Young · 2025-02-06T10:31:14.312Z · comments (1)
AI Strategy Updates that You Should Make
Alice Blair (Diatom) · 2025-01-27T21:10:41.838Z · comments (2)
[link] Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)
aggliu · 2025-02-08T15:51:43.143Z · comments (0)
Less Laptop Velcro
jefftk (jkaufman) · 2025-02-09T03:30:03.403Z · comments (0)
List of most interesting ideas I encountered in my life, ranked
Lucien (lucien) · 2025-02-23T12:36:48.158Z · comments (3)
[link] Poetic Methods I: Meter as Communication Protocol
adamShimi · 2025-02-01T18:22:39.676Z · comments (0)
[link] What are the "no free lunch" theorems?
Vishakha (vishakha-agrawal) · 2025-02-04T02:02:18.423Z · comments (4)
QFT and neural nets: the basic idea
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-24T13:54:45.099Z · comments (0)
Moral Hazard in Democratic Voting
lsusr · 2025-02-12T23:17:39.355Z · comments (8)
Monet: Mixture of Monosemantic Experts for Transformers Explained
CalebMaresca (caleb-maresca) · 2025-01-25T19:37:09.078Z · comments (2)
Ruling Out Lookup Tables
Alfred Harwood · 2025-02-04T10:39:34.899Z · comments (11)
[link] When should we worry about AI power-seeking?
Joe Carlsmith (joekc) · 2025-02-19T19:44:25.062Z · comments (0)
[Job ad] LISA CEO
Ryan Kidd (ryankidd44) · 2025-02-09T00:18:35.254Z · comments (4)
Inefficiencies in Pharmaceutical Research Practices
ErioirE (erioire) · 2025-02-22T04:43:09.147Z · comments (2)
[link] Published report: Pathways to short TAI timelines
Zershaaneh Qureshi (zershaaneh-qureshi) · 2025-02-20T22:10:12.276Z · comments (0)
Training AI to do alignment research we don’t already know how to do
joshc (joshua-clymer) · 2025-02-24T19:19:43.067Z · comments (0)
[link] Notes on Argentina
Annapurna (jorge-velez) · 2025-01-26T03:51:15.393Z · comments (5)
Efficiency spectra and “bucket of circuits” cartoons
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-29T15:06:50.768Z · comments (0)
[question] How useful would alien alignment research be?
Donald Hobson (donald-hobson) · 2025-01-23T10:59:22.330Z · answers+comments (5)
Seeing Through the Eyes of the Algorithm
silentbob · 2025-02-22T11:54:35.782Z · comments (1)
The memorization-generalization spectrum and learning coefficients
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-28T16:53:24.628Z · comments (0)
A City Within a City
Declan Molony (declan-molony) · 2025-02-24T15:51:19.118Z · comments (1)
Undergrad AI Safety Conference
JoNeedsSleep (joanna-j-1) · 2025-02-19T03:43:47.969Z · comments (0)
Blackpool Applied Rationality Unconference 2025
Henry Prowbell · 2025-02-01T13:04:12.774Z · comments (0)
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio (yoshua-bengio) · 2025-02-24T18:31:48.580Z · comments (1)
[link] Conference Report: Threshold 2030 - Modeling AI Economic Futures
Deric Cheng (deric-cheng) · 2025-02-24T18:56:51.682Z · comments (0)
Half-baked idea: a straightforward method for learning environmental goals?
Q Home · 2025-02-04T06:56:31.813Z · comments (7)
[link] Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)
Archimedes · 2025-02-04T02:55:44.401Z · comments (0)
$300 Fermi Model Competition
ozziegooen · 2025-02-03T19:47:09.270Z · comments (14)
[link] Training Data Attribution (TDA): Examining Its Adoption & Use Cases
Deric Cheng (deric-cheng) · 2025-01-22T15:40:13.393Z · comments (0)
Test of the Bene Gesserit
lsusr · 2025-02-23T11:51:10.279Z · comments (1)
[link] Rationalist Movie Reviews
Nicholas / Heather Kross (NicholasKross) · 2025-02-01T23:10:53.184Z · comments (2)
5,000 calories of peanut butter every week for 3 years straight
Declan Molony (declan-molony) · 2025-01-31T17:29:35.190Z · comments (8)
Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
Stuart_Armstrong · 2025-01-31T15:36:01.050Z · comments (2)
[question] Do you consider perfect surveillance inevitable?
samuelshadrach (xpostah) · 2025-01-24T04:57:48.266Z · answers+comments (34)
6 (Potential) Misconceptions about AI Intellectuals
ozziegooen · 2025-02-14T23:51:44.983Z · comments (11)
Detecting AI Agent Failure Modes in Simulations
Michael Soareverix (michael-soareverix) · 2025-02-11T11:10:26.030Z · comments (0)
November-December 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2025-01-22T01:20:00.868Z · comments (0)
[link] The Geometry of Linear Regression versus PCA
criticalpoints · 2025-02-23T21:01:33.415Z · comments (2)
How different LLMs answered PhilPapers 2020 survey
Satron · 2025-01-27T21:41:12.334Z · comments (1)
Studies of Human Error Rate
tin482 · 2025-02-13T13:43:30.717Z · comments (3)
[link] Is there such a thing as an impossible protein?
Abhishaike Mahajan (abhishaike-mahajan) · 2025-01-24T17:12:01.174Z · comments (3)
[link] Lazy Hasselback Pommes Anna
Brendan Long (korin43) · 2025-01-26T21:30:36.587Z · comments (18)
[link] "Self-Blackmail" and Alternatives
jessicata (jessica.liu.taylor) · 2025-02-09T23:20:19.895Z · comments (12)
[link] Won't vs. Can't: Sandbagging-like Behavior from Claude Models
Joe Benton · 2025-02-19T20:47:06.792Z · comments (0)