LessWrong 2.0 Reader

Two flaws in the Machiavelli Benchmark
TheManxLoiner · 2025-02-12T19:34:35.241Z · comments (0)
[link] Notes on the Presidential Election of 1836
Arjun Panickssery (arjun-panickssery) · 2025-02-13T23:40:23.224Z · comments (0)
[link] The Peeperi (unfinished) - By Katja Grace
Nathan Young · 2025-02-17T19:33:29.894Z · comments (0)
MATS Spring 2024 Extension Retrospective
HenningB (HenningBlue) · 2025-02-12T22:43:58.193Z · comments (0)
System 2 Alignment
Seth Herd · 2025-02-13T19:17:56.868Z · comments (0)
[question] What are the surviving worlds like?
KvmanThinking (avery-liu) · 2025-02-17T00:41:49.810Z · answers+comments (1)
Come join Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-15T22:10:02.166Z · comments (0)
Moral Hazard in Democratic Voting
lsusr · 2025-02-12T23:17:39.355Z · comments (8)
Longtermist implications of aliens Space-Faring Civilizations - Introduction
Maxime Riché (maxime-riche) · 2025-02-21T12:08:42.403Z · comments (0)
Undergrad AI Safety Conference
JoNeedsSleep (joanna-j-1) · 2025-02-19T03:43:47.969Z · comments (0)
[link] When should we worry about AI power-seeking?
Joe Carlsmith (joekc) · 2025-02-19T19:44:25.062Z · comments (0)
6 (Potential) Misconceptions about AI Intellectuals
ozziegooen · 2025-02-14T23:51:44.983Z · comments (11)
Studies of Human Error Rate
tin482 · 2025-02-13T13:43:30.717Z · comments (3)
[link] Ascetic hedonism
dkl9 · 2025-02-17T15:56:30.267Z · comments (9)
Literature Review of Text AutoEncoders
NickyP (Nicky) · 2025-02-19T21:54:14.905Z · comments (1)
[link] Systematic Sandbagging Evaluations on Claude 3.5 Sonnet
farrelmahaztra · 2025-02-14T01:22:46.695Z · comments (0)
The Takeoff Speeds Model Predicts We May Be Entering Crunch Time
johncrox · 2025-02-21T02:26:31.768Z · comments (0)
MAISU - Minimal AI Safety Unconference
Linda Linsefors · 2025-02-21T11:36:25.202Z · comments (0)
I'm making a ttrpg about life in an intentional community during the last year before the Singularity
bgaesop · 2025-02-13T21:54:09.002Z · comments (2)
Hopeful hypothesis, the Persona Jukebox.
Donald Hobson (donald-hobson) · 2025-02-14T19:24:35.514Z · comments (4)
Using Prompt Evaluation to Combat Bio-Weapon Research
Stuart_Armstrong · 2025-02-19T12:39:00.491Z · comments (0)
[link] US AI Safety Institute will be 'gutted,' Axios reports
Matrice Jacobine · 2025-02-20T14:40:13.049Z · comments (0)
[link] The current AI strategic landscape: one bear's perspective
Matrice Jacobine · 2025-02-15T09:49:13.120Z · comments (0)
[link] Inside the dark forests of the internet
Itay Dreyfus (itay-dreyfus) · 2025-02-12T10:20:59.426Z · comments (0)
Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-13T00:49:48.854Z · comments (0)
[link] DeepSeek Made it Even Harder for US AI Companies to Ever Reach Profitability
garrison · 2025-02-19T21:02:42.879Z · comments (1)
[link] Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap
Molly (hickman-santini) · 2025-02-19T22:42:39.055Z · comments (0)
Human-AI Relationality is Already Here
bridgebot (puppy) · 2025-02-20T07:08:22.420Z · comments (0)
[link] Published report: Pathways to short TAI timelines
Zershaaneh Qureshi (zershaaneh-qureshi) · 2025-02-20T22:10:12.276Z · comments (0)
SWE Automation Is Coming: Consider Selling Your Crypto
A_donor · 2025-02-13T20:17:59.227Z · comments (8)
[link] Introduction to Expected Value Fanaticism
Petra Kosonen · 2025-02-14T19:05:26.556Z · comments (8)
Call for Applications: XLab Summer Research Fellowship
JoNeedsSleep (joanna-j-1) · 2025-02-18T19:19:20.155Z · comments (0)
Talking to laymen about AI development
David Steel · 2025-02-17T18:42:23.289Z · comments (0)
What makes a theory of intelligence useful?
Cole Wyeth (Amyr) · 2025-02-20T19:22:29.725Z · comments (0)
[link] Won't vs. Can't: Sandbagging-like Behavior from Claude Models
Joe Benton · 2025-02-19T20:47:06.792Z · comments (0)
[link] Progress links and short notes, 2025-02-17
jasoncrawford · 2025-02-17T19:18:29.422Z · comments (0)
[link] Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2025-02-18T22:16:14.449Z · comments (2)
THE ARCHIVE
Jason Reid (jason-reid) · 2025-02-17T01:12:41.486Z · comments (0)
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros (ana-kapros) · 2025-02-12T19:12:07.592Z · comments (0)
[link] Cooperation for AI safety must transcend geopolitical interference
Matrice Jacobine · 2025-02-16T18:18:01.539Z · comments (6)
[link] The Dilemma’s Dilemma
James Stephen Brown (james-brown) · 2025-02-19T23:50:47.485Z · comments (8)
What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)
AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)
Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)
Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (0)
There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)
[link] Teaching AI to reason: this year's most important story
Benjamin_Todd · 2025-02-13T17:40:02.869Z · comments (0)
Make Superintelligence Loving
Davey Morse (davey-morse) · 2025-02-21T06:07:17.235Z · comments (0)