LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (238)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (85)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (36)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (40)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (19)

What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (38)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (152)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (58)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (96)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (42)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (34)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (25)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

My motivation and theory of change for working in AI healthtech
Andrew_Critch · 2024-10-12T00:36:30.925Z · comments (37)

Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (32)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (65)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (155)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (33)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (6)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (17)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (7)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (4)

What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (39)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)

next page (older posts) →

Archive

Recent comments

bryce-robertson on Bryce Robertson's Shortform

I've put it on our list of possible future pages, and added some of the things from that doc to our Funders page. Thanks Chris!

nathan-helm-burger on 2. Skim the Manual: Intelligent Voluntary Cooperation

I've read and appreciate the entire series, but this was the chapter that really stood out to me. I think that improved systems of voluntary contracts with decentralized enforcement might be our best hope out of the multiple Pareto-suboptimal races-to-doom that humanity seems stuck in.

I think this is potentially a place where we could use our biggest risk factor (AI) to save us before it dooms us. Could we manage to use AI to speed-run the invention and deployment of such subsidiarity governance systems? I think the biggest challenge to this is how fast it would need to move in order to take effect in time. For a system that needs extremely broad buy-in from a large number of heterogenous actors, speed of implementation and adoption is a key weak point.

Imagine though that a really good system was designed which you felt confident that a supermajority of humanity would sign onto if they had it personally explained to them (along with a convincing explanations of the counterfactuals). How might we get this personalized explanation accomplished at scale? Welll, LLMs are still bad at certain things, but giving personalized interactive explanations of complex legal docs seems well within their near-term capabilities. It would still be a huge challenge to actually present nearly everyone on Earth with the opportunity to have this interaction, and all within a short deadline... But not beyond belief.

This is the first non-destructive non-coercive pivotal act that I've considered plausibly sufficient to save us from the current crisis. I've thought up and heard of a lot of different bad plans over the past ten years, so it's really nice to finally find a good one!

Existential Hope indeed!

alex_altair on Shallow review of technical AI safety, 2024

Some small corrections/additions to my section ("Altair agent foundations"). I'm currently calling it "Dovetail research". That's not publicly written anywhere yet, but if it were listed as that here, it might help people who are searching for it later this year.

Which orthodox alignment problems could it help with?: 9. Humans cannot be first-class parties to a superintelligent value handshake

I wouldn't put number 9. Not intended to "solve" most of these problems, but is intended to help make progress on understanding the nature of the problems through formalization, so that they can be avoided or postponed, or more effectively solved by other research agenda.

Target case: worst-case

definitely not worst-case, more like pessimistic-case

Some names: Alex Altair, Alfred Harwood, Daniel C, Dalcy K

Add "José Pedro Faustino"

Estimated # FTEs: 1-10

I'd call it 2, averaged throughout 2024.

Some outputs in 2024: mostly [LW · GW] exposition [LW · GW] but it’s early days

Basically right; I'd add this post [LW · GW] and this post [LW · GW].

raemon on On Eating the Sun

Richard said "I don't think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that." He didn't say it'd be a longterm project, I think he just meant he didn't change his beliefs about it due to thist post.

sweenesm on How can humanity survive a multipolar AGI scenario?

Here's what Claude 3.5 Sonnet said in response to this prompt:

Me:

What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGI's are aligned to what their human users want to do, and at least one human user order them to do bad things?

Claude:

Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:

A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:

International coordination and governance frameworks that place limits on allowable AI system capabilities and uses, similar to arms control treaties. However, enforcement would be challenging.
Technical measures like capability limitations built into AI systems' training that prevent them from being used for certain catastrophic actions, regardless of user intent. The challenge is defining and implementing such limitations without preventing beneficial uses.
Focused work on improving human values and cooperation through better institutions, education, and mechanisms for resolving conflicts peacefully. This could reduce the likelihood of actors wanting to cause catastrophic harm.
Developing AI systems with some form of common good or humanitarian principles that apply regardless of user instructions, though this involves difficult philosophical and technical questions about how to define and implement such principles.

What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.

___________________________________

I would add to this Defensive acceleration (d/acc) described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).

jfw01 on Where I agree and disagree with Eliezer

implement a relatively limited policy

I read this as Libertarian; the hope that there could be a very stiff, strong government that was also small, and did only a subset of the things in the short-term interest of its supporters.

raemon on On Eating the Sun

So, I'm with you on "hey guys, uh, this is pretty horrifying, right? Uh, what's with the missing mood about that."

The thing is that not doing it is also horrifying. i.e. see also All Possible Views About Humanity's Future Are Wild [LW · GW]. To not eat the sun is to throw away orders of magnitude more resources than anyone has ever thrown away before. Is it percentage-wise "a small fraction of the cosmos?". Maybe. But, (quickly checks Claude, which wrote up a fermi code snippet before answering, I can share the work if you want to doublecheck yourself), a two year delay would be... 200 galaxies lost, longterm.

When you compare "the Amish get a Sun Replica that doesn't change their experience", the question "Is it worth throwing away 80 trillion stars to have the real thing" is, like, not a trivial question.

IMO there isn't an option that isn't at least a bit horrifying in some sense that one could have a missing mood about. And while I still feel unsettled about it, I think if I have to grieve something, makes more sense to grieve in the direction of "don't throw away 80 trillion stars worth of resources."

I think you're also maybe just not appreciating how much would change in 10,000 years? Like, there is no single culture that has survived 10,000 years. (Maybe one of those small tribes in the amazon? I'd still bet on there having been a lot of cultural drift there but not confidently). The Amish are only a few hundred years old. I can imagine doing a lot of moral reflection and coming to the conclusion the sun shouldn't be eaten until all human cultures have decided it's the right thing to do, but I do really doubt that process takes 10,000 years.

fread2281 on Is there software to practice reading expressions?

I just made an anki deck for this, using gpt 4o-mini to label images of faces. I’ll try to report back in a month or two how well it works.

nathan-helm-burger on How can humanity survive a multipolar AGI scenario?

My current best guess: Subsidiarity I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series: Gaming the Future [LW · GW]

The gist of the idea is to create clever systems of decentralized control and voluntary interaction which can still manage to coordinate on difficult risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.

I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.

jfw01 on Where I agree and disagree with Eliezer

Alignment isn’t like that; it was chosen to be an important problem

Like medicine.

This was specifically commented on in a book whose preface I read as a child. It was called something like "Medicine: from science to magic", and I have not found a clear link back to it.