LessWrong 2.0 Reader

[link] AI takeoff and nuclear war
owencb · 2024-06-11T19:36:24.710Z · comments (6)
[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)
[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik (lucy.fa) · 2025-02-26T12:50:04.204Z · comments (8)
Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (51)
Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)
Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-03T17:01:06.004Z · comments (1)
[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)
LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)
Elon Musk May Be Transitioning to Bipolar Type I
Cyborg25 · 2025-03-11T17:45:06.599Z · comments (22)
Heritability: Five Battles
Steven Byrnes (steve2152) · 2025-01-14T18:21:17.756Z · comments (21)
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)
Human study on AI spear phishing campaigns
Simon Lermen (dalasnoin) · 2025-01-03T15:11:14.765Z · comments (8)
Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (27)
In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (28)
Instruction-following AGI is easier and more likely than value aligned AGI
Seth Herd · 2024-05-15T19:38:03.185Z · comments (28)
[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)
Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)
[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (13)
Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (31)
[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)
[link] Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (72)
Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (27)
Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)
[link] Preparing for the Intelligence Explosion
fin · 2025-03-11T15:38:29.524Z · comments (17)
[link] Best-of-N Jailbreaking
John Hughes (john-hughes) · 2024-12-14T04:58:48.974Z · comments (5)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (21)
[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (38)
[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)
My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (9)
The 2023 LessWrong Review: The Basic Ask
Raemon · 2024-12-04T19:52:40.435Z · comments (25)
ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)
Evaluating “What 2026 Looks Like” So Far
Jonny Spicer (jonnyspicer) · 2025-02-24T18:55:27.373Z · comments (5)
2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (21)
Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)
Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (62)
PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (6)
[New Feature] Your Subscribed Feed
Ruby · 2024-06-11T22:45:00.000Z · comments (13)
[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)
[link] The machine has no mouth and it must scream
zef (uzpg) · 2025-03-08T16:40:46.755Z · comments (1)
LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)
The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)
Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)
The principle of genomic liberty
TsviBT · 2025-03-19T14:27:57.175Z · comments (51)
[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)
Anti-Slop Interventions?
abramdemski · 2025-02-04T19:50:29.127Z · comments (33)
Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (15)
Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)
Brief analysis of OP Technical AI Safety Funding
22tom (thomas-barnes) · 2024-10-25T19:37:41.674Z · comments (5)
Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (82)
On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)