LessWrong 2.0 Reader


[link] Playing in the Creek
Hastings (hastings-greer) · 2025-04-10T17:39:28.883Z · comments (6)
[link] Thoughts on AI 2027
Max Harms (max-harms) · 2025-04-09T21:26:23.926Z · comments (47)
Short Timelines Don't Devalue Long Horizon Research
Vladimir_Nesov · 2025-04-09T00:42:07.324Z · comments (23)
Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes (john-hughes) · 2025-04-08T17:32:55.315Z · comments (16)
[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (36)
Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (39)
Learned pain as a leading cause of chronic pain
SoerenMind · 2025-04-09T11:57:58.523Z · comments (13)
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)
AI-enabled coups: a small group could use AI to seize power
Tom Davidson (tom-davidson-1) · 2025-04-16T16:51:29.561Z · comments (14)
AI 2027: Responses
Zvi · 2025-04-08T12:50:02.197Z · comments (3)
Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)
Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (28)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)
[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)
Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (7)
AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)
[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)
Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)
[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (32)
On Google’s Safety Plan
Zvi · 2025-04-11T12:51:12.112Z · comments (6)
How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)
OpenAI Responses API changes models' behavior
Jan Betley (jan-betley) · 2025-04-11T13:27:29.942Z · comments (6)
To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (10)
Reactions to METR task length paper are insane
Cole Wyeth (Amyr) · 2025-04-10T17:13:36.428Z · comments (41)
Four Types of Disagreement
silentbob · 2025-04-13T11:22:38.466Z · comments (2)
The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (6)
Vestigial reasoning in RL
Caleb Biddulph (caleb-biddulph) · 2025-04-13T15:40:11.954Z · comments (7)
A collection of approaches to confronting doom, and my thoughts on them
Ruby · 2025-04-06T02:11:31.271Z · comments (18)
Youth Lockout
Xavi CF (xavi-cf) · 2025-04-11T15:05:54.441Z · comments (6)
[link] College Advice For People Like Me
henryj · 2025-04-12T14:36:46.643Z · comments (5)
OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)
Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)
[link] American College Admissions Doesn't Need to Be So Competitive
Arjun Panickssery (arjun-panickssery) · 2025-04-07T17:35:26.791Z · comments (18)
Paper
dynomight · 2025-04-11T12:20:04.200Z · comments (12)
The first AI war will be in your computer
Viliam · 2025-04-08T09:28:53.191Z · comments (9)
[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)
ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs
denkenberger · 2025-04-16T21:47:40.687Z · comments (7)
[link] The case for AGI by 2030
Benjamin_Todd · 2025-04-09T20:35:55.167Z · comments (6)
Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (4)
D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (7)
A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (28)
[link] Existing Safety Frameworks Imply Unreasonable Confidence
Joe Rogero · 2025-04-10T16:31:50.240Z · comments (1)
[link] Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]
elifland · 2025-04-10T23:10:23.063Z · comments (0)
Austin Chen on Winning, Risk-Taking, and FTX
Elizabeth (pktechgirl) · 2025-04-07T19:00:08.039Z · comments (3)
Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)