LessWrong 2.0 Reader



Ten people on the inside
Buck · 2025-01-28T16:41:22.990Z · comments (17)
[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)
Human takeover might be worse than AI takeover
Tom Davidson (tom-davidson-1) · 2025-01-10T16:53:27.043Z · comments (53)
2024 in AI predictions
jessicata (jessica.liu.taylor) · 2025-01-01T20:29:49.132Z · comments (3)
The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)
The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)
DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (26)
The Plan - 2024 Update
johnswentworth · 2024-12-31T13:29:53.888Z · comments (27)
Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)
Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
Why I'm Moving from Mechanistic to Prosaic Interpretability
Daniel Tan (dtch1997) · 2024-12-30T06:35:43.417Z · comments (34)
Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)
The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)
[link] Aristocracy and Hostage Capital
Arjun Panickssery (arjun-panickssery) · 2025-01-08T19:38:47.104Z · comments (7)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (7)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (5)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)
[link] The Intelligence Curse
lukedrago · 2025-01-03T19:07:43.493Z · comments (27)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (49)
My AGI safety research—2024 review, ’25 plans
Steven Byrnes (steve2152) · 2024-12-31T21:05:19.037Z · comments (4)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (11)
The purposeful drunkard
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-12T12:27:51.952Z · comments (13)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)
Planning for Extreme AI Risks
joshc (joshua-clymer) · 2025-01-29T18:33:14.844Z · comments (2)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)
My supervillain origin story
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-27T12:20:46.101Z · comments (0)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (74)
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (25)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (32)
We probably won't just play status games with each other after AGI
Matthew Barnett (matthew-barnett) · 2025-01-15T04:56:38.330Z · comments (20)
Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (15)
Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)
A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)
[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
The subset parity learning problem: much more than you wanted to know
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-03T09:13:59.245Z · comments (18)
Matryoshka Sparse Autoencoders
Noa Nabeshima (noa-nabeshima) · 2024-12-14T02:52:32.017Z · comments (15)
The Game Board has been Flipped: Now is a good time to rethink what you’re doing
Alex Lintz (alex-lintz) · 2025-01-28T23:36:18.106Z · comments (18)
Is "VNM-agent" one of several options, for what minds can grow up into?
AnnaSalamon · 2024-12-30T06:36:20.890Z · comments (54)
Tips On Empirical Research Slides
James Chua (james-chua) · 2025-01-08T05:06:44.942Z · comments (4)
(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)
Agent Foundations 2025 at CMU
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-19T23:48:22.569Z · comments (10)
Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)
Thoughts on the conservative assumptions in AI control
Buck · 2025-01-17T19:23:38.575Z · comments (5)