LessWrong 2.0 Reader

[link] Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2025-02-18T22:16:14.449Z · comments (2)
If Neuroscientists Succeed
Mordechai Rorvig (mordechai-rorvig) · 2025-02-11T15:33:09.098Z · comments (6)
What makes a theory of intelligence useful?
Cole Wyeth (Amyr) · 2025-02-20T19:22:29.725Z · comments (0)
Talking to laymen about AI development
David Steel · 2025-02-17T18:42:23.289Z · comments (0)
[link] Progress links and short notes, 2025-02-17
jasoncrawford · 2025-02-17T19:18:29.422Z · comments (0)
[link] Cooperation for AI safety must transcend geopolitical interference
Matrice Jacobine · 2025-02-16T18:18:01.539Z · comments (6)
[link] The Dilemma’s Dilemma
James Stephen Brown (james-brown) · 2025-02-19T23:50:47.485Z · comments (8)
THE ARCHIVE
Jason Reid (jason-reid) · 2025-02-17T01:12:41.486Z · comments (0)
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros (ana-kapros) · 2025-02-12T19:12:07.592Z · comments (0)
Sleeping Beauty: an Accuracy-based Approach
glauberdebona · 2025-02-10T15:40:29.619Z · comments (2)
What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)
Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (0)
There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)
AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)
[link] AI Safety at the Frontier: Paper Highlights, January '25
gasteigerjo · 2025-02-11T16:14:16.972Z · comments (0)
Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald (oliver-oswald) · 2025-02-10T19:19:36.233Z · comments (7)
Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)
[question] Should I Divest from AI?
OKlogic · 2025-02-10T03:29:33.582Z · answers+comments (4)
Make Superintelligence Loving
Davey Morse (davey-morse) · 2025-02-21T06:07:17.235Z · comments (0)
[link] Teaching AI to reason: this year's most important story
Benjamin_Todd · 2025-02-13T17:40:02.869Z · comments (0)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
Closed-ended questions aren't as hard as you think
electroswing · 2025-02-19T03:53:11.855Z · comments (0)
Safe Distillation With a Powerful Untrusted AI
Alek Westover (alek-westover) · 2025-02-20T03:14:04.893Z · comments (1)
[link] Neural Scaling Laws Rooted in the Data Distribution
aribrill (Particleman) · 2025-02-20T21:22:10.306Z · comments (0)
[link] Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
Lukas Petersson (lukas-petersson-1) · 2025-02-21T15:45:00.146Z · comments (0)
Build a Metaculus Forecasting Bot in 30 Minutes: A Practical Guide
ChristianWilliams · 2025-02-22T03:52:14.753Z · comments (0)
A fable on AI x-risk
bgaesop · 2025-02-18T20:15:24.933Z · comments (0)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (3)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
[link] Demonstrating specification gaming in reasoning models
Matrice Jacobine · 2025-02-20T19:26:20.563Z · comments (0)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
Response to the US Govt's Request for Information Concerning Its AI Action Plan
Davey Morse (davey-morse) · 2025-02-14T06:14:08.673Z · comments (0)
Inefficiencies in Pharmaceutical Research Practices
ErioirE (erioire) · 2025-02-22T04:43:09.147Z · comments (0)
Undesirable Conclusions and Origin Adjustment
Jerdle (daniel-amdurer) · 2025-02-19T18:35:23.732Z · comments (0)
Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · 2025-02-11T18:15:33.082Z · comments (0)
Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang (weibing-wang) · 2025-02-11T14:01:39.167Z · comments (0)
Sparse Autoencoder Feature Ablation for Unlearning
aludert · 2025-02-13T19:13:48.388Z · comments (0)
LW/ACX social meetup
Stefan (stefan-1) · 2025-02-10T21:12:39.092Z · comments (0)
Artificial Static Place Intelligence: Guaranteed Alignment
ank · 2025-02-15T11:08:50.226Z · comments (2)
Intrinsic Dimension of Prompts in LLMs
Karthik Viswanathan (vkarthik095) · 2025-02-14T19:02:49.464Z · comments (0)
Arguing for the Truth? An Inference-Only Study into AI Debate
denisemester · 2025-02-11T03:04:58.852Z · comments (0)
arch-anarchist reading list
Peter lawless · 2025-02-16T22:47:00.273Z · comments (1)
[link] Probability of AI-Caused Disaster
Alvin Ånestrand (alvin-anestrand) · 2025-02-12T19:40:11.121Z · comments (2)
Fun, endless art debates v. morally charged art debates that are intrinsically endless
danielechlin · 2025-02-21T04:44:22.712Z · comments (0)
[link] New LLM Scaling Law
wrmedford · 2025-02-19T20:21:17.475Z · comments (0)
[question] Why do we have the NATO logo?
KvmanThinking (avery-liu) · 2025-02-19T22:59:41.755Z · answers+comments (4)
Quantifying the Qualitative: Towards a Bayesian Approach to Personal Insight
Pruthvi Kumar (pruthvi-kumar) · 2025-02-15T19:50:42.550Z · comments (0)