LessWrong 2.0 Reader

AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)

Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (10)

[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)

Implications of the inference scaling paradigm for AI safety
Ryan Kidd (ryankidd44) · 2025-01-14T02:14:53.562Z · comments (69)

Is "VNM-agent" one of several options, for what minds can grow up into?
AnnaSalamon · 2024-12-30T06:36:20.890Z · comments (54)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)

[link] On Eating the Sun
jessicata (jessica.liu.taylor) · 2025-01-08T04:57:20.457Z · comments (93)

(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)

Agent Foundations 2025 at CMU
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-19T23:48:22.569Z · comments (10)

Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)

[link] Five Recent AI Tutoring Studies
Arjun Panickssery (arjun-panickssery) · 2025-01-19T03:53:47.714Z · comments (0)

Scaling Sparse Feature Circuit Finding to Gemma 9B
Diego Caples (diego-caples) · 2025-01-10T11:08:11.999Z · comments (10)

Circling as practice for “just be yourself”
Kaj_Sotala · 2024-12-16T07:40:04.482Z · comments (5)

🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (2)

Voting Results for the 2023 Review
Raemon · 2025-02-06T08:00:37.461Z · comments (3)

[link] The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke (corin-katzke) · 2025-01-21T16:57:00.998Z · comments (11)

The Risk of Gradual Disempowerment from AI
Zvi · 2025-02-05T22:10:06.979Z · comments (16)

Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

Remap your caps lock key
bilalchughtai (beelal) · 2024-12-15T14:03:33.623Z · comments (18)

JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)

[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)

My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (41)

Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (39)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

[link] SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can (Can Rager) · 2024-12-11T06:30:37.076Z · comments (6)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)

[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (54)

Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)

Effective Evil's AI Misalignment Plan
lsusr · 2024-12-15T07:39:34.046Z · comments (9)

On the OpenAI Economic Blueprint
Zvi · 2025-01-15T14:30:06.773Z · comments (2)

Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (12)

Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (10)

Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)

I'm offering free math consultations!
Gurkenglas · 2025-01-14T16:30:40.115Z · comments (6)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)

The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)

Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (0)

[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (21)

Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (72)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)

[link] Best-of-N Jailbreaking
John Hughes (john-hughes) · 2024-12-14T04:58:48.974Z · comments (5)

2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (21)

The 2023 LessWrong Review: The Basic Ask
Raemon · 2024-12-04T19:52:40.435Z · comments (25)

[link] Wired on: "DOGE personnel with admin access to Federal Payment System"
Raemon · 2025-02-05T21:32:11.205Z · comments (45)

Human study on AI spear phishing campaigns
Simon Lermen (dalasnoin) · 2025-01-03T15:11:14.765Z · comments (8)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (15)

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)

Recent comments

archimedes on The case for the death penalty

We would obviously have to significantly streamline the process, such that people are executed within 6 months of being caught or so.

This is one of the biggest hurdles, IMO. How do you significantly streamline the process without destroying due process? In the US, this would require a complete overhaul of the criminal justice system to be feasible.

lgs on How to Make Superbabies

Your OP is completely misleading if you're using plain GWAS!

GWAS is an association -- that's what the A stands for. Association is not causation. Anything that correlates with IQ (e.g., melanin) can show up in a GWAS for IQ. You're gonna end up editing embryos to have lower melanin and claiming their IQ is 150

thane-ruthenis on Vladimir_Nesov's Shortform

So the rumors about GPT-5 in late May 2025 either represent change in the naming convention

In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.

I think he's pretty plainly saying that this "GPT-5" will be a completely different thing from a 100x'd GPT-4.

archimedes on How might we safely pass the buck to AI?

I think the misunderstanding came from Eliezer's reference to a perpetual motion machine. The point was that people suggesting how to build them often have complicated schemes that tend to not adequately address the central difficulty of creating one. That's where the analogy ends. From thermodynamics, we have strong reasons to believe such a thing is not just difficult but impossible whereas we have no corresponding theory to rule out verifiably safe AI.

Habryka's analogy to nuclear reactor plans is similar except we know that building one of those is difficult but actually possible.

james-camacho on The case for the death penalty

So, you're making two rather large claims here that I don't agree with.

When you look at the history of societies that punish people by mutilation, you find that mutilation goes hand in hand (no pun intended) with bad justice systems--dictatorship, corruption, punishment that varies between social classes, lack of due process, etc.

This seems more a quirk of scarcity than due to having a bad justice system. Historically, it wasn't just the tyrannical, corrupt governments that punished people with mutilation, it was every civilization on the planet! I think it's due to a combination of (1) hardly having enough food and shelter for the general populace, let alone resources for criminals, and (2) a lower-information, lower-trust society where there's no way to check for a prior criminal history, or to prevent criminals from committing more crimes after they leave jail. Chopping off a hand or branding them was a cheap way to dole out punishment and warn others to be extra cautious in their vicinity.

Actual humans aren't capable of implementing a justice system which punishes by mutilation but does so in a way that you could argue is fair.

Obviously it isn't possible for imperfectly rational agents to be perfectly fair, but I don't see why you're applying this only to a mutilative justice system. This is true of our current justice system or when you buy groceries at the store. The issue isn't making mistakes, the issue is the frequency of mistakes. They create an entropic force that pushes you out of good equilibriums, which is why it's good to have systems that fail gracefully.

I don't see what problems mutilative justice would have over incarcerative. We could have the exact same court procedures, just change the law on the books from 3–5 years to 3–5 fingers. Is the issue that bodily disfigurement is more visible than incarceration? People would have to actually see how they're ruining other people's lives in retribution? Or are you just stating, without any justification, that when we move from incarceration to mutilation, our judges, jurors, and lawyers will suddenly become wholly irrational beings? That it's just "human nature"? To put it in your words: that opinion is bizarre.

archimedes on How might we safely pass the buck to AI?

Using something as a validation metric to iterate methods doesn’t cause overfitting at anything like the level of directly training on it.

Validation is certainly less efficient at overfitting but it seems a bit like using an evolutionary algorithm rather than gradient descent. You aren't directly optimizing according to the local gradient, but that doesn't necessarily mean you'll avoid Goodharting--just that you're less likely to immediately fall into a bad local optimum.

The likelihood of preventing Goodharting feels like it depends heavily on assumptions about the search space. The "validation" filters the search space to areas where scheming isn't easily detectable, but what portion of this space is safe (and how can we tell)? We don't actually have a true globally accurate validator oracle--just a weak approximation of one.

stephen-fowler on Thermodynamic entropy = Kolmogorov complexity

These recordings I watched were actually from 2022.

shankar-sivarajan on The case for the death penalty

Do you have an example in mind of a legal system that doesn't have "corruption, punishment that varies between social classes, lack of due process, etc."?

danielechlin on On Overconfidence

I know I'm writing in 2025 but this is the first Codex piece I didn't like. People don't know about or like AI experts so they ignore them like all us rationalists ignore astrology experts. There's no fallacy. There's a crisis in expert trust; let's not conflate that with people's inability to distinguish between 1% and 5% chances.

jiro on The case for the death penalty

I would agree that eight years of imprisonment can be as bad as or worse than mutilation. But the problem is that punishing people by mutilation has different incentives than punishing them with jail--at least among actual human punishers. When you look at the history of societies that punish people by mutilation, you find that mutilation goes hand in hand (no pun intended) with bad justice systems--dictatorship, corruption, punishment that varies between social classes, lack of due process, etc. Actual humans aren't capable of implementing a justice system which punishes by mutilation but does so in a way that you could argue is fair.