LessWrong 2.0 Reader

Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)

Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)

Comment on "Death and the Gorgon"
Zack_M_Davis · 2025-01-01T05:47:30.730Z · comments (27)

A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)

Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai (beelal) · 2025-01-05T14:49:53.529Z · comments (12)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)

AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)

(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)

Introducing Squiggle AI
ozziegooen · 2025-01-03T17:53:42.915Z · comments (15)

The subset parity learning problem: much more than you wanted to know
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-03T09:13:59.245Z · comments (17)

[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)

Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)

Circling as practice for “just be yourself”
Kaj_Sotala · 2024-12-16T07:40:04.482Z · comments (5)

Is "VNM-agent" one of several options, for what minds can grow up into?
AnnaSalamon · 2024-12-30T06:36:20.890Z · comments (49)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

[link] The Intelligence Curse
lukedrago · 2025-01-03T19:07:43.493Z · comments (26)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (5)

[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)

JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)

Remap your caps lock key
bilalchughtai (beelal) · 2024-12-15T14:03:33.623Z · comments (17)

AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)

[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (53)

Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (37)

Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (26)

Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (72)

[link] Best-of-N Jailbreaking
John Hughes (john-hughes) · 2024-12-14T04:58:48.974Z · comments (5)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)

Effective Evil's AI Misalignment Plan
lsusr · 2024-12-15T07:39:34.046Z · comments (9)

Human study on AI spear phishing campaigns
Simon Lermen (dalasnoin) · 2025-01-03T15:11:14.765Z · comments (8)

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (53)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (62)

The 2023 LessWrong Review: The Basic Ask
Raemon · 2024-12-04T19:52:40.435Z · comments (25)

Matryoshka Sparse Autoencoders
Noa Nabeshima (noa-nabeshima) · 2024-12-14T02:52:32.017Z · comments (15)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (15)

The Packaging and the Payload
Screwtape · 2024-11-12T03:07:37.209Z · comments (1)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (15)

2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (18)

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (19)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (79)

🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)

[link] Video lectures on the learning-theoretic agenda
Vanessa Kosoy (vanessa-kosoy) · 2024-10-27T12:01:32.777Z · comments (0)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (13)

Brief analysis of OP Technical AI Safety Funding
22tom (thomas-barnes) · 2024-10-25T19:37:41.674Z · comments (5)

[link] Cost, Not Sacrifice
Joe Rogero · 2024-11-20T21:32:26.281Z · comments (13)

Recent comments

nathan-helm-burger on 2. Skim the Manual: Intelligent Voluntary Cooperation

I've read and appreciate the entire series, but this was the chapter that really stood out to me. I think that improved systems of voluntary contracts with decentralized enforcement might be our best hope out of the multiple Pareto-suboptimal races-to-doom that humanity seems stuck in.

I think this is potentially a place where we could use our biggest risk factor (AI) to save us before it dooms us. Could we manage to use AI to speed-run the invention and deployment of such subsidiarity governance systems? I think the biggest challenge to this is how fast it would need to move in order to take effect in time. For a system that needs extremely broad buy-in from a large number of heterogenous actors, speed of implementation and adoption is a key weak point.

Imagine, though, that a really good system was designed, one which you felt confident a supermajority of humanity would sign onto if they had it personally explained to them (along with a convincing explanation of the counterfactuals). How might we get this personalized explanation accomplished at scale? Well, LLMs are still bad at certain things, but giving personalized interactive explanations of complex legal docs seems well within their near-term capabilities. It would still be a huge challenge to actually present nearly everyone on Earth with the opportunity to have this interaction, and all within a short deadline... But not beyond belief.

This is the first non-destructive non-coercive pivotal act that I've considered plausibly sufficient to save us from the current crisis. I've thought up and heard of a lot of different bad plans over the past ten years, so it's really nice to finally find a good one!

Existential Hope indeed!

alex_altair on Shallow review of technical AI safety, 2024

Some small corrections/additions to my section ("Altair agent foundations"). I'm currently calling it "Dovetail research". That's not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.

Which orthodox alignment problems could it help with?: 9. Humans cannot be first-class parties to a superintelligent value handshake

I wouldn't put number 9. It's not intended to "solve" most of these problems, but to help make progress on understanding the nature of the problems through formalization, so that they can be avoided or postponed, or more effectively solved by other research agendas.

Target case: worst-case

definitely not worst-case, more like pessimistic-case

Some names: Alex Altair, Alfred Harwood, Daniel C, Dalcy K

Add "José Pedro Faustino"

Estimated # FTEs: 1-10

I'd call it 2, averaged throughout 2024.

Some outputs in 2024: mostly [LW · GW] exposition [LW · GW] but it’s early days

Basically right; I'd add this post [LW · GW] and this post [LW · GW].

raemon on On Eating the Sun

Richard said "I don't think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that." He didn't say it'd be a long-term project; I think he just meant he didn't change his beliefs about it due to this post.

sweenesm on How can humanity survive a multipolar AGI scenario?

Here's what Claude 3.5 Sonnet said in response to this prompt:

Me:

What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGIs are aligned to what their human users want to do, and at least one human user orders them to do bad things?

Claude:

Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:

A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:

International coordination and governance frameworks that place limits on allowable AI system capabilities and uses, similar to arms control treaties. However, enforcement would be challenging.
Technical measures like capability limitations built into AI systems' training that prevent them from being used for certain catastrophic actions, regardless of user intent. The challenge is defining and implementing such limitations without preventing beneficial uses.
Focused work on improving human values and cooperation through better institutions, education, and mechanisms for resolving conflicts peacefully. This could reduce the likelihood of actors wanting to cause catastrophic harm.
Developing AI systems with some form of common good or humanitarian principles that apply regardless of user instructions, though this involves difficult philosophical and technical questions about how to define and implement such principles.

What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.

___________________________________

I would add to this defensive acceleration (d/acc), as described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).

jfw01 on Where I agree and disagree with Eliezer

implement a relatively limited policy

I read this as Libertarian; the hope that there could be a very stiff, strong government that was also small, and did only a subset of the things in the short-term interest of its supporters.

raemon on On Eating the Sun

So, I'm with you on "hey guys, uh, this is pretty horrifying, right? Uh, what's with the missing mood about that."

The thing is that not doing it is also horrifying; see also All Possible Views About Humanity's Future Are Wild [LW · GW]. To not eat the sun is to throw away orders of magnitude more resources than anyone has ever thrown away before. Is it percentage-wise "a small fraction of the cosmos"? Maybe. But (quickly checks Claude, which wrote up a Fermi code snippet before answering; I can share the work if you want to double-check it yourself), a two-year delay would be... 200 galaxies lost, long-term.

When the alternative is "the Amish get a Sun Replica that doesn't change their experience", the question "Is it worth throwing away 80 trillion stars to have the real thing?" is, like, not a trivial question.
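
A minimal sketch of a consistency check on the two figures above (200 galaxies, 80 trillion stars). This is not the Fermi snippet mentioned in the comment, and it assumes the 80 trillion stars are simply the contents of those 200 galaxies; the variable and function names are illustrative. Under that assumption, the comment implies roughly 400 billion stars per galaxy.

```python
# Figures quoted in the comment above.
GALAXIES_LOST = 200        # galaxies lost to a two-year delay
STARS_LOST = 80e12         # "80 trillion stars"


def implied_stars_per_galaxy(stars_lost: float, galaxies_lost: float) -> float:
    """Average stars per galaxy implied by the two quoted figures.

    Assumption: the star count refers to the same galaxies as the galaxy count.
    """
    return stars_lost / galaxies_lost


stars_per_galaxy = implied_stars_per_galaxy(STARS_LOST, GALAXIES_LOST)
print(f"Implied stars per galaxy: {stars_per_galaxy:.1e}")  # 4.0e+11
```

The two numbers are mutually consistent only if each lost galaxy is taken to hold about 4 × 10^11 stars; a different per-galaxy figure would scale the star count proportionally.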

IMO there isn't an option that isn't at least a bit horrifying in some sense that one could have a missing mood about. And while I still feel unsettled about it, I think if I have to grieve something, it makes more sense to grieve in the direction of "don't throw away 80 trillion stars worth of resources."

I think you're also maybe just not appreciating how much would change in 10,000 years? Like, there is no single culture that has survived 10,000 years. (Maybe one of those small tribes in the Amazon? I'd still bet on there having been a lot of cultural drift there, but not confidently.) The Amish are only a few hundred years old. I can imagine doing a lot of moral reflection and coming to the conclusion that the sun shouldn't be eaten until all human cultures have decided it's the right thing to do, but I do really doubt that process takes 10,000 years.

fread2281 on Is there software to practice reading expressions?

I just made an Anki deck for this, using GPT-4o mini to label images of faces. I’ll try to report back in a month or two on how well it works.

nathan-helm-burger on How can humanity survive a multipolar AGI scenario?

My current best guess: subsidiarity. I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series: Gaming the Future [LW · GW]

The gist of the idea is to create clever systems of decentralized control and voluntary interaction which can still manage to coordinate on difficult risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.

I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.

jfw01 on Where I agree and disagree with Eliezer

Alignment isn’t like that; it was chosen to be an important problem

Like medicine.

This was specifically commented on in a book whose preface I read as a child. It was called something like "Medicine: from science to magic", and I have not found a clear link back to it.

charlie-steiner on Aristocracy and Hostage Capital

No, you're right that aristocracy is more complicated. There were lots of pressures that shaped the form of it. Certainly more than how good of managers aristocrats made!

An invalid syllogism: "The rules of aristocracy were shaped by forces. Avoiding poor management is a force. Therefore, the rules of aristocracy will be all about avoiding poor management."

Aristocrats were also selected for how well they could extract rents from those below, and how well they could resist rent-extraction from above, both alone and collectively. Nor was the top-down pressure all about making aristocrats into productive managers - rent-extraction has been mentioned, and also weakening the aristocracy to secure central power, allowing advancement via marriage and alliance, various human status games, and the need for a legislative arm of government.

I don't want to hear the One Pressure That Explains Everything (but only qualitatively, and if you squint). I'll want to hear when they have the dozen pressures that make up a model that can be quantitatively fit to past data by tuning some parameters, including good retrodictive accuracy over a held-out time period.