LessWrong 2.0 Reader

Detecting AI Agent Failure Modes in Simulations
Michael Soareverix (michael-soareverix) · 2025-02-11T11:10:26.030Z · comments (0)
Chaos Investments v0.31
Screwtape · 2025-02-08T06:53:22.959Z · comments (1)
[link] "Self-Blackmail" and Alternatives
jessicata (jessica.liu.taylor) · 2025-02-09T23:20:19.895Z · comments (12)
Studies of Human Error Rate
tin482 · 2025-02-13T13:43:30.717Z · comments (3)
[link] Visual Reference for Frontier Large Language Models
kenakofer · 2025-02-11T05:14:24.752Z · comments (0)
[link] Systematic Sandbagging Evaluations on Claude 3.5 Sonnet
farrelmahaztra · 2025-02-14T01:22:46.695Z · comments (0)
Rational Utopia, Multiversal AI Alignment, Steerable ASI, Ultimate Human Freedom (V. 2: Place ASI, some pictures)
ank · 2025-02-11T03:21:40.899Z · comments (7)
Hopeful hypothesis, the Persona Jukebox.
Donald Hobson (donald-hobson) · 2025-02-14T19:24:35.514Z · comments (4)
[link] A Bearish Take on AI, as a Treat
rats (cartier-gucciscarf) · 2025-02-10T19:22:30.593Z · comments (0)
[link] Forecasting newsletter #2/2025: Forecasting meetup network
NunoSempere (Radamantis) · 2025-02-09T18:07:51.514Z · comments (0)
I'm making a ttrpg about life in an intentional community during the last year before the Singularity
bgaesop · 2025-02-13T21:54:09.002Z · comments (2)
[link] OpenAI lied about SFT vs. RLHF
sanxiyn · 2025-02-10T03:24:16.625Z · comments (2)
[question] A Simulation of Automation economics?
qbolec · 2025-02-10T08:11:04.424Z · answers+comments (1)
Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-13T00:49:48.854Z · comments (0)
[link] Inside the dark forests of the internet
Itay Dreyfus (itay-dreyfus) · 2025-02-12T10:20:59.426Z · comments (0)
[link] What About The Horses?
Maxwell Tabarrok (maxwell-tabarrok) · 2025-02-11T13:59:36.913Z · comments (17)
AXRP Episode 38.7 - Anthony Aguirre on the Future of Life Institute
DanielFilan · 2025-02-09T01:10:05.683Z · comments (0)
SWE Automation Is Coming: Consider Selling Your Crypto
A_donor · 2025-02-13T20:17:59.227Z · comments (6)
[link] The current AI strategic landscape: one bear's perspective
Matrice Jacobine · 2025-02-15T09:49:13.120Z · comments (0)
[link] Introduction to Expected Value Fanaticism
Petra Kosonen · 2025-02-14T19:05:26.556Z · comments (4)
If Neuroscientists Succeed
Mordechai Rorvig (mordechai-rorvig) · 2025-02-11T15:33:09.098Z · comments (6)
The Structure of Professional Revolutions
SebastianG (JohnBuridan) · 2025-02-09T13:23:01.059Z · comments (0)
Come join Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-15T22:10:02.166Z · comments (0)
Sleeping Beauty: an Accuracy-based Approach
glauberdebona · 2025-02-10T15:40:29.619Z · comments (2)
Goals don't necessarily start to crystallize the moment AI is capable enough to fake alignment
Mikhail Samin (mikhail-samin) · 2025-02-08T23:44:46.081Z · comments (0)
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros (ana-kapros) · 2025-02-12T19:12:07.592Z · comments (0)
Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)
[link] AI Safety at the Frontier: Paper Highlights, January '25
gasteigerjo · 2025-02-11T16:14:16.972Z · comments (0)
Knitting a Sweater in a Burning House
CrimsonChin · 2025-02-15T19:50:33.275Z · comments (0)
[question] p(s-risks to contemporary humans)?
mhampton · 2025-02-08T21:19:53.821Z · answers+comments (5)
[question] Should I Divest from AI?
OKlogic · 2025-02-10T03:29:33.582Z · answers+comments (4)
Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald (oliver-oswald) · 2025-02-10T19:19:36.233Z · comments (6)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)
[link] Teaching AI to reason: this year's most important story
Benjamin_Todd · 2025-02-13T17:40:02.869Z · comments (0)
[link] How do you make a 250x better vaccine at 1/10 the cost? Develop it in India.
Abhishaike Mahajan (abhishaike-mahajan) · 2025-02-09T03:53:17.050Z · comments (5)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (2)
Cross-Layer Feature Alignment and Steering in Large Language Model
dlaptev · 2025-02-08T20:18:20.331Z · comments (0)
Response to the US Govt's Request for Information Concerning Its AI Action Plan
Davey Morse (davey-morse) · 2025-02-14T06:14:08.673Z · comments (0)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
AI Safety Oversights
Davey Morse (davey-morse) · 2025-02-08T06:15:52.896Z · comments (0)
Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · 2025-02-11T18:15:33.082Z · comments (0)
Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang (weibing-wang) · 2025-02-11T14:01:39.167Z · comments (0)
LW/ACX social meetup
Stefan (stefan-1) · 2025-02-10T21:12:39.092Z · comments (0)
How identical twin sisters feel about nieces vs their own daughters
Dave Lindbergh (dave-lindbergh) · 2025-02-09T17:36:25.830Z · comments (19)
Sparse Autoencoder Feature Ablation for Unlearning
aludert · 2025-02-13T19:13:48.388Z · comments (0)
[link] Probability of AI-Caused Disaster
Alvin Ånestrand (alvin-anestrand) · 2025-02-12T19:40:11.121Z · comments (2)
Intrinsic Dimension of Prompts in LLMs
Karthik Viswanathan (vkarthik095) · 2025-02-14T19:02:49.464Z · comments (0)
Arguing for the Truth? An Inference-Only Study into AI Debate
denisemester · 2025-02-11T03:04:58.852Z · comments (0)
[link] Claude is More Anxious than GPT; Personality is an axis of interpretability in language models
future_detective · 2025-02-10T19:19:28.005Z · comments (2)
Quantifying the Qualitative: Towards a Bayesian Approach to Personal Insight
Pruthvi Kumar (pruthvi-kumar) · 2025-02-15T19:50:42.550Z · comments (0)