LessWrong 2.0 Reader


OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (0)
Studies of Human Error Rate
tin482 · 2025-02-13T13:43:30.717Z · comments (0)
the dumbest theory of everything
lostinwilliamsburg · 2025-02-13T07:57:38.842Z · comments (0)
Skepticism towards claims about the views of powerful institutions
tlevin (trevor) · 2025-02-13T07:40:52.257Z · comments (1)
Virtue signaling, and the "humans-are-wonderful" bias, as a trust exercise
lc · 2025-02-13T06:59:17.525Z · comments (4)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (3)
Not all capabilities will be created equal: focus on strategically superhuman agents
benwr · 2025-02-13T01:24:46.084Z · comments (0)
[link] LLMs can teach themselves to better predict the future
Ben Turtel (ben-turtel) · 2025-02-13T01:01:12.175Z · comments (0)
Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-13T00:49:48.854Z · comments (0)
Extended analogy between humans, corporations, and AIs.
Daniel Kokotajlo (daniel-kokotajlo) · 2025-02-13T00:03:13.956Z · comments (1)
Moral Hazard in Democratic Voting
lsusr · 2025-02-12T23:17:39.355Z · comments (5)
MATS Spring 2024 Extension Retrospective
HenningB (HenningBlue) · 2025-02-12T22:43:58.193Z · comments (0)
[link] Hunting for AI Hackers: LLM Agent Honeypot
Reworr R (reworr-reworr) · 2025-02-12T20:29:32.269Z · comments (0)
[link] Probability of AI-Caused Disaster
Alvin Ånestrand (alvin-anestrand) · 2025-02-12T19:40:11.121Z · comments (2)
Two flaws in the Machiavelli Benchmark
TheManxLoiner · 2025-02-12T19:34:35.241Z · comments (0)
Gradient Anatomy's - Hallucination Robustness in Medical Q&A
DieSab (diego-sabajo) · 2025-02-12T19:16:58.949Z · comments (0)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (1)
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros (ana-kapros) · 2025-02-12T19:12:07.592Z · comments (0)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (16)
[link] Inside the dark forests of the internet
Itay Dreyfus (itay-dreyfus) · 2025-02-12T10:20:59.426Z · comments (0)
[link] Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Matrice Jacobine · 2025-02-12T09:15:07.793Z · comments (28)
Why you maybe should lift weights, and How to.
samusasuke · 2025-02-12T05:15:32.011Z · comments (22)
[question] how do the CEOs respond to our concerns?
KvmanThinking (avery-liu) · 2025-02-11T23:39:27.654Z · answers+comments (3)
Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · 2025-02-11T18:15:33.082Z · comments (0)
[link] AI Safety at the Frontier: Paper Highlights, January '25
gasteigerjo · 2025-02-11T16:14:16.972Z · comments (0)
If Neuroscientists Succeed
Mordechai Rorvig (mordechai-rorvig) · 2025-02-11T15:33:09.098Z · comments (6)
The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (14)
Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang (weibing-wang) · 2025-02-11T14:01:39.167Z · comments (0)
[link] What About The Horses?
Maxwell Tabarrok (maxwell-tabarrok) · 2025-02-11T13:59:36.913Z · comments (16)
On Deliberative Alignment
Zvi · 2025-02-11T13:00:07.683Z · comments (2)
Detecting AI Agent Failure Modes in Simulations
Michael Soareverix (michael-soareverix) · 2025-02-11T11:10:26.030Z · comments (0)
World Citizen Assembly about AI - Announcement
Camille Berger (Camille Berger) · 2025-02-11T10:51:56.948Z · comments (0)
[link] Visual Reference for Frontier Large Language Models
kenakofer · 2025-02-11T05:14:24.752Z · comments (0)
Rational Utopia & Multiversal AI Alignment: Steerable ASI for Ultimate Human Freedom
ank · 2025-02-11T03:21:40.899Z · comments (6)
Arguing for the Truth? An Inference-Only Study into AI Debate
denisemester · 2025-02-11T03:04:58.852Z · comments (0)
[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)
Positive Directions
G Wood (geoffrey-wood) · 2025-02-11T00:00:11.426Z · comments (0)
Logical Correlation
niplav · 2025-02-10T23:29:10.518Z · comments (4)
Proof idea: SLT to AIT
Lucius Bushnaq (Lblack) · 2025-02-10T23:14:24.538Z · comments (6)
LW/ACX social meetup
Stefan (stefan-1) · 2025-02-10T21:12:39.092Z · comments (0)
[link] A Bearish Take on AI, as a Treat
rats (cartier-gucciscarf) · 2025-02-10T19:22:30.593Z · comments (0)
Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald (oliver-oswald) · 2025-02-10T19:19:36.233Z · comments (6)
[link] Claude is More Anxious than GPT; Personality is an axis of interpretability in language models
future_detective · 2025-02-10T19:19:28.005Z · comments (2)
Notes on Occam via Solomonoff vs. hierarchical Bayes
JesseClifton · 2025-02-10T17:55:14.689Z · comments (5)
Sleeping Beauty: an Accuracy-based Approach
glauberdebona · 2025-02-10T15:40:29.619Z · comments (2)
Political Idolatry
Arturo Macias (arturo-macias) · 2025-02-10T15:26:30.686Z · comments (5)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
Nonpartisan AI safety
Yair Halberstadt (yair-halberstadt) · 2025-02-10T14:55:50.913Z · comments (4)
Opinion Article Scoring System
ciaran · 2025-02-10T14:32:19.030Z · comments (0)
Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (0)