LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

System 2 Alignment
Seth Herd · 2025-02-13T19:17:56.868Z · comments (0)
Come join Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-15T22:10:02.166Z · comments (0)
Longtermist implications of aliens Space-Faring Civilizations - Introduction
Maxime Riché (maxime-riche) · 2025-02-21T12:08:42.403Z · comments (0)
The case for the death penalty
Yair Halberstadt (yair-halberstadt) · 2025-02-21T08:30:41.182Z · comments (32)
[link] When should we worry about AI power-seeking?
Joe Carlsmith (joekc) · 2025-02-19T19:44:25.062Z · comments (0)
Undergrad AI Safety Conference
JoNeedsSleep (joanna-j-1) · 2025-02-19T03:43:47.969Z · comments (0)
6 (Potential) Misconceptions about AI Intellectuals
ozziegooen · 2025-02-14T23:51:44.983Z · comments (11)
[link] Won't vs. Can't: Sandbagging-like Behavior from Claude Models
Joe Benton · 2025-02-19T20:47:06.792Z · comments (0)
Studies of Human Error Rate
tin482 · 2025-02-13T13:43:30.717Z · comments (3)
Literature Review of Text AutoEncoders
NickyP (Nicky) · 2025-02-19T21:54:14.905Z · comments (1)
[link] Ascetic hedonism
dkl9 · 2025-02-17T15:56:30.267Z · comments (9)
[link] Systematic Sandbagging Evaluations on Claude 3.5 Sonnet
farrelmahaztra · 2025-02-14T01:22:46.695Z · comments (0)
MAISU - Minimal AI Safety Unconference
Linda Linsefors · 2025-02-21T11:36:25.202Z · comments (0)
[link] The current AI strategic landscape: one bear's perspective
Matrice Jacobine · 2025-02-15T09:49:13.120Z · comments (0)
I'm making a ttrpg about life in an intentional community during the last year before the Singularity
bgaesop · 2025-02-13T21:54:09.002Z · comments (2)
Hopeful hypothesis, the Persona Jukebox.
Donald Hobson (donald-hobson) · 2025-02-14T19:24:35.514Z · comments (4)
Using Prompt Evaluation to Combat Bio-Weapon Research
Stuart_Armstrong · 2025-02-19T12:39:00.491Z · comments (1)
[link] US AI Safety Institute will be 'gutted,' Axios reports
Matrice Jacobine · 2025-02-20T14:40:13.049Z · comments (0)
Human-AI Relationality is Already Here
bridgebot (puppy) · 2025-02-20T07:08:22.420Z · comments (0)
[link] DeepSeek Made it Even Harder for US AI Companies to Ever Reach Profitability
garrison · 2025-02-19T21:02:42.879Z · comments (1)
[link] Published report: Pathways to short TAI timelines
Zershaaneh Qureshi (zershaaneh-qureshi) · 2025-02-20T22:10:12.276Z · comments (0)
[link] Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap
Molly (hickman-santini) · 2025-02-19T22:42:39.055Z · comments (0)
Dovetail's agent foundations fellowship talks & discussion
Alex_Altair · 2025-02-13T00:49:48.854Z · comments (0)
[link] Introduction to Expected Value Fanaticism
Petra Kosonen · 2025-02-14T19:05:26.556Z · comments (8)
SWE Automation Is Coming: Consider Selling Your Crypto
A_donor · 2025-02-13T20:17:59.227Z · comments (8)
Call for Applications: XLab Summer Research Fellowship
JoNeedsSleep (joanna-j-1) · 2025-02-18T19:19:20.155Z · comments (0)
What makes a theory of intelligence useful?
Cole Wyeth (Amyr) · 2025-02-20T19:22:29.725Z · comments (0)
[link] Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2025-02-18T22:16:14.449Z · comments (2)
[link] Progress links and short notes, 2025-02-17
jasoncrawford · 2025-02-17T19:18:29.422Z · comments (0)
Talking to laymen about AI development
David Steel · 2025-02-17T18:42:23.289Z · comments (0)
[link] The Dilemma’s Dilemma
James Stephen Brown (james-brown) · 2025-02-19T23:50:47.485Z · comments (8)
[link] Cooperation for AI safety must transcend geopolitical interference
Matrice Jacobine · 2025-02-16T18:18:01.539Z · comments (6)
THE ARCHIVE
Jason Reid (jason-reid) · 2025-02-17T01:12:41.486Z · comments (0)
Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)
What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)
There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)
AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)
Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (0)
Make Superintelligence Loving
Davey Morse (davey-morse) · 2025-02-21T06:07:17.235Z · comments (0)
[link] Neural Scaling Laws Rooted in the Data Distribution
aribrill (Particleman) · 2025-02-20T21:22:10.306Z · comments (0)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
Closed-ended questions aren't as hard as you think
electroswing · 2025-02-19T03:53:11.855Z · comments (0)
[link] Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
Lukas Petersson (lukas-petersson-1) · 2025-02-21T15:45:00.146Z · comments (0)
[link] Teaching AI to reason: this year's most important story
Benjamin_Todd · 2025-02-13T17:40:02.869Z · comments (0)
Safe Distillation With a Powerful Untrusted AI
Alek Westover (alek-westover) · 2025-02-20T03:14:04.893Z · comments (1)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (3)
A fable on AI x-risk
bgaesop · 2025-02-18T20:15:24.933Z · comments (0)