LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (241)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (85)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (36)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (40)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (19)

What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (39)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (152)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (59)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (97)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (44)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (34)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (25)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (32)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (65)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (155)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (33)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (18)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (6)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (7)

How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (12)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (22)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)

What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (48)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

Recent comments

karl-krueger on johnswentworth's Shortform

I see a lot of discussion of AI doom stemming from research, business, and government / politics (including terrorism). Not a lot about AI doom from crime. Criminals don't stay in the box; the whole point of crime is to benefit yourself by breaking the rules and harming others. Intentional creation of intelligent cybercrime tools — ecosystems of AI malware, exploit discovery, spearphishing, ransomware, account takeovers, etc. — seems like a path to uncontrolled evolution of explicitly hostile AGI, where a maxim of "discover the rules; break them; profit" is designed-in.

sam-marks on Scaling Sparse Feature Circuit Finding to Gemma 9B

Good work! A few questions:

Where do the edges you draw come from? IIUC, this method should result in a collection of features but not say what the edges between them are.
IIUC, the binary masking technique here is the same as the subnetwork probing baseline from the ACDC paper, where it seemed to work about as well as ACDC (which in turn works a bit worse than attribution patching). Do you know why you're finding something different here? Some ideas:
1. The SP vs. ACDC comparison from the ACDC paper wasn't really apples-to-apples because ACDC pruned edges whereas SP pruned nodes (and kept all edges between non-pruned nodes IIUC). If Syed et al. had compared attribution patching on nodes vs. subnetwork probing, they would have found that subnetwork probing was better.
2. There's something special about SAE features which changes which subnetwork discovery technique works best.
  1. I'd be a bit interested in seeing your experiments repeated for finding subnetworks of neurons (instead of subnetworks of SAE features); does the comparison between attribution patching/integrated gradients and training a binary mask still hold in that case?

vladimir_nesov on The Golden Opportunity for American AI

The $25-40bn figure is an estimate for about 1 GW worth of GB200s. SemiAnalysis expects 1 GW training systems for Google in 2025 and something comparable for Microsoft/OpenAI. This is discussed by Dylan Patel publicly on Dwarkesh Podcast, claiming that there is a 300K B200s cluster and 500K-700K B200s worth of compute in total currently being constructed, possibly networked into a single training system. So if planned Microsoft capex was $60bn, that would've been surprising, too little for this project without cutting something else, but $80bn fits this story, that's my takeaway.

With Stargate, $100bn is still too much for the training systems of 2024-2025, so it's either not about what's being built in 2024-2025 at all, or a larger project that has current activities as part (which wouldn't fit building a big training system using a specific generation of hardware). Musk's 100K H100s Colossus tells me that building a training system in a year is feasible, even though it normally takes longer. The preliminary steps (land, power, permits, buildings) are much cheaper, but securing power and permits can require starting years in advance. So talking about a $100bn Stargate in 2024 is consistent with building it in late 2026, once there is a plot with 3-5 GW of power and datacenter permits; most of the expense will then be in 2026 (Nvidia Rubin probably).

rationalelf on Human takeover might be worse than AI takeover

I mean humans with strong AGIs under their control might function as if they don't need sleep, might become immortal, will probably build up superhuman protections from assassination, etc.

benquo on Guilt, Shame, and Depravity

Different example - I said "instead"

If you look back, you'll see I was specifically responding to the hypothetical scenario about public admission in that comment. For your points about private shame, you might want to check my other comment replying to you where I addressed how internal shame and self-image maintenance connect to social dynamics.

I notice you're attributing positions to me that I haven't taken and expressing confusion about points I've already addressed in detail. It would be helpful if you could engage more carefully with what I've carefully written.

so if the musician openly admits and apologize for only being average they are ashamed because they are afraid of the reaction of the fan who clearly loved their performance (not their failure to abstain from what they believe is the cause of their average performance?)

You're introducing new elements that weren't in your original scenario. But more importantly: you described the show as "a hit" where "everyone loves them." Calling this performance "only average" isn't revealing accurate adverse information - it's a lie.

but if they don't mention it to anyone (therefore are committing neither a dominance nor submission gesture) they are also ashamed?

In my other reply to you, I explained how private shame often involves maintaining conflicting mental models - one that enables confident performance and another that tracks specific flaws for improvement. Even when no one would directly know or care about staying up late drinking, the performer may feel shame because they've invested in an identity as a "professional musician" or "disciplined performer" - an identity that others care about and grant certain privileges to. The shame comes from violating the requirements of this identity, which serves as a proxy for social approval and professional opportunities. This creates internal pressure toward shame even without a specific idea of someone else who would directly condemn the behavior or trait in question.

Are you telling me there is no conceivable circumstance where any human being feels shame for something which is totally alone, none at all?

What I'm suggesting is that shame inherently involves at least a tacit social component - some imagined perspective by which we are condemned. This is consistent with Smith's and Hume's moral sentiments theory, where moral judgments fundamentally involve taking up imagined perspectives of others. This doesn't mean the shame isn't genuinely felt or that any specific others would actually condemn us. But in my experience people can frequently unravel particular cases of such shame by honestly examining what specific others would actually think if they knew, which is some experimental validation for this view.

lsusr on Open Thread Winter 2024/2025

Is anyone else on this website making YouTube videos? Less Wrong is great, but if you want to broadcast to a larger audience, video seems like the place to be. I know Rational Animations makes videos. Is there anyone else? Are you, personally, making any?

gwern on The Golden Opportunity for American AI

Stargate was reported in 2024, and that reporting specified that the Stargate $100b phase hadn't started yet because MS was still building the previous phase, with "in excess of $115b" for all the phases, implying a large ramp up. And since Stargate was intended for OA, while MS of course has its own knitting to tend to, that implies much larger datacenter capex total. Given how vague the reporting is, how large the numbers are, and that sooner is better, $80b in FY 2025 doesn't clearly tell me that there must be some mystery $25-40bn training system which is a big surprise. You don't build a Stargate overnight, and if it is to be finished and fully operational "as soon as 2028", you're going to need to be spending a lot of money 3 years beforehand.

sharmake-farah on When is a mind me?

I've answered a question on this discussion. My short answer is that I basically agree with the post, mostly because I think computationalism is closest to accurate as a model of identity in the general case, with physicalism being a special case of it (with caveats). But I definitely think you were pretty epistemically terrible during your interactions, and I don't blame @andeslodes and @sunwillrise for disagreeing with the post. The way you handled disagreements here does not make me confident that LW thought leaders will reliably go in truth-tracking directions.

Answer is below:

https://www.lesswrong.com/posts/yoAhc7ZhQZfGqrzif/what-are-the-actual-arguments-in-favor-of-computationalism#KTWgPbomupmwE2TFb

General comments on consciousness:

https://www.lesswrong.com/posts/TkahaFu3kb6NhZRue/quick-general-thoughts-on-suffering-and-consciousness#FaMEMcpa6mXTybarG

https://www.lesswrong.com/posts/TkahaFu3kb6NhZRue/quick-general-thoughts-on-suffering-and-consciousness#WEmbycP2ppDjuHAH2

russellthor on Beliefs and state of mind into 2025

Perhaps, depends how it is. I think we could do worse than just have Anthropic have a 2 year lead etc. I don't think they would need to prioritize profit as they would be so powerful anyway - the staff would be more interested in getting it right and wouldn't have financial pressure. WBE is a bit difficult; there need to be clear expectations, i.e. leave weaker people alone and make your own world.
https://www.lesswrong.com/posts/o8QDYuNNGwmg29h2e/vision-of-a-positive-singularity
There is no reason why super AI would need to exploit normies. Whatever we decide, we need some kind of clear expectations and values regarding what WBE are before they become common. Are they benevolent super-elders, AI gods banished to "just" the rest of the galaxy, the natural life progression of first world humans now?

rob-lucas on On Eating the Sun

I agree that it's plausible just from priors that ASI could find a way to eat the sun. The matter is there, and while it's strongly gravitationally bound in a way that's inconvenient, there's nothing physically impossible about getting it out of that arrangement into one that's more convenient for use in fusion reactors or something.

But an analysis of how plausible the scenario is would certainly have made the post more valuable. There are plausible proposals for how to get the fuel present in the sun out such that it could be used more efficiently, and while it may be possible that an ASI might come up with a more elegant or efficient plan, there are some fundamental physical limits on exactly how efficient the process could be made.

Wikipedia has some discussion of possible methods: https://en.m.wikipedia.org/wiki/Star_lifting

That article says: "This energy could be collected by a Dyson sphere; using 10% of the Sun's total power output would allow 5.9 × 10^21 kilograms of matter to be lifted per year (0.0000003% of the Sun's total mass)", but this doesn't take account of the possibility of using the collected mass to fuel fusion reactions that are then used to power the mass collection. What are the constraints on that process? (My first thought is you have to worry about heat if you try to get the total power too high.)

10,000 years sounds like enough time if you can get an exponential process going that uses the fuel harvested from the sun to collect more fuel. But any process will have some constraints, such as max temperature at which the various parts of your system can function, or the specific materials which your system is made of (do you have to build your fusion reactors out of materials harvested from metal rich bodies? can you use carbon converted into diamondoid nanomachines? can you get enough of those materials out of the fusion of hydrogen to keep the process going once it's started?). Even if your fuel harvesters and fusion reactors can stand up to the high temperatures necessary to eat the sun in that time frame, what about everything else in the solar system? Does this process sterilize the earth of biological life?

Once I consider that there will be some sort of physical constraints on the process and also remember the fact that the sun is really big, it's not obvious that even an exponential process of fuel harvesting from the sun will be completed in a 10,000 year time frame.
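
As a rough sanity check on the numbers in this comment, here is a minimal Python sketch. It takes the 5.9 × 10^21 kg/year lifting rate quoted above from the Wikipedia article and assumes a solar mass of about 1.989 × 10^30 kg (a standard value, not given in the thread). It then asks two questions: how long lifting would take at that constant rate, and how short the doubling time of the lifting rate would have to be for an idealized exponential process to finish in 10,000 years. The function names are hypothetical, and the model ignores heat, materials, and every other constraint raised above.

```python
import math

# Assumption: standard approximate solar mass, not stated in the thread.
SUN_MASS_KG = 1.989e30
# Figure quoted in the comment from the Wikipedia "Star lifting" article.
LIFT_RATE_KG_PER_YEAR = 5.9e21
TARGET_YEARS = 10_000


def years_at_constant_rate(mass_kg, rate_kg_per_year):
    """Time to lift the whole mass if the lifting rate never grows."""
    return mass_kg / rate_kg_per_year


def total_lifted(doubling_time, initial_rate, years):
    """Mass lifted when the rate is initial_rate * 2**(t / doubling_time).

    This is the integral of the rate from t = 0 to t = years.
    """
    x = years * math.log(2) / doubling_time
    return initial_rate * years * math.expm1(x) / x


def required_doubling_time(mass_kg, initial_rate, years, lo=20.0, hi=1e6):
    """Bisect for the doubling time that lifts exactly mass_kg in `years`.

    total_lifted decreases as the doubling time grows, so if the midpoint
    lifts too much, the doubling time can be longer.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if total_lifted(mid, initial_rate, years) > mass_kg:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    constant = years_at_constant_rate(SUN_MASS_KG, LIFT_RATE_KG_PER_YEAR)
    doubling = required_doubling_time(
        SUN_MASS_KG, LIFT_RATE_KG_PER_YEAR, TARGET_YEARS
    )
    print(f"constant rate: {constant:.3g} years")
    print(f"required doubling time: {doubling:.0f} years")
```

Under these assumptions the constant-rate figure comes out to a few hundred million years, while the 10,000-year target needs the lifting rate to double every several hundred years. Neither number says whether sustaining that growth is physically achievable, which is the open question.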