Posts

What o3 Becomes by 2028 2024-12-22T12:37:20.929Z
Musings on Text Data Wall (Oct 2024) 2024-10-05T19:00:21.286Z
Vladimir_Nesov's Shortform 2024-10-04T14:20:52.975Z
Superintelligence Can't Solve the Problem of Deciding What You'll Do 2024-09-15T21:03:28.077Z
OpenAI o1, Llama 4, and AlphaZero of LLMs 2024-09-14T21:27:41.241Z
Musings on LLM Scale (Jul 2024) 2024-07-03T18:35:48.373Z
No Anthropic Evidence 2012-09-23T10:33:06.994Z
A Mathematical Explanation of Why Charity Donations Shouldn't Be Diversified 2012-09-20T11:03:48.603Z
Consequentialist Formal Systems 2012-05-08T20:38:47.981Z
Predictability of Decisions and the Diagonal Method 2012-03-09T23:53:28.836Z
Shifting Load to Explicit Reasoning 2011-05-07T18:00:22.319Z
Karma Bubble Fix (Greasemonkey script) 2011-05-07T13:14:29.404Z
Counterfactual Calculation and Observational Knowledge 2011-01-31T16:28:15.334Z
Note on Terminology: "Rationality", not "Rationalism" 2011-01-14T21:21:55.020Z
Unpacking the Concept of "Blackmail" 2010-12-10T00:53:18.674Z
Agents of No Moral Value: Constrained Cognition? 2010-11-21T16:41:10.603Z
Value Deathism 2010-10-30T18:20:30.796Z
Recommended Reading for Friendly AI Research 2010-10-09T13:46:24.677Z
Notion of Preference in Ambient Control 2010-10-07T21:21:34.047Z
Controlling Constant Programs 2010-09-05T13:45:47.759Z
Restraint Bias 2009-11-10T17:23:53.075Z
Circular Altruism vs. Personal Preference 2009-10-26T01:43:16.174Z
Counterfactual Mugging and Logical Uncertainty 2009-09-05T22:31:27.354Z
Bloggingheads: Yudkowsky and Aaronson talk about AI and Many-worlds 2009-08-16T16:06:18.646Z
Sense, Denotation and Semantics 2009-08-11T12:47:06.014Z
Rationality Quotes - August 2009 2009-08-06T01:58:49.178Z
Bayesian Utility: Representing Preference by Probability Measures 2009-07-27T14:28:55.021Z
Eric Drexler on Learning About Everything 2009-05-27T12:57:21.590Z
Consider Representative Data Sets 2009-05-06T01:49:21.389Z
LessWrong Boo Vote (Stochastic Downvoting) 2009-04-22T01:18:01.692Z
Counterfactual Mugging 2009-03-19T06:08:37.769Z
Tarski Statements as Rationalist Exercise 2009-03-17T19:47:16.021Z
In What Ways Have You Become Stronger? 2009-03-15T20:44:47.697Z
Storm by Tim Minchin 2009-03-15T14:48:29.060Z

Comments

Comment by Vladimir_Nesov on Towards a scale-free theory of intelligent agency · 2025-03-23T07:51:21.281Z · LW · GW

Coalitional agency seems like an unnecessary constraint on the design of a composite agent, since an individual agent could just (choose to) listen to other agents and behave the way their coalition would endorse, thereby effectively becoming a composite agent without being composite "by construction". The step where an agent chooses which other (hypothetical) agents to listen to makes constraints on the nature of agents unnecessary, because the choice to listen to some agents and not others can impose whatever constraints that particular agent cares about, and so an "agent" could be as vague a thing as a "computation" or a program.

(Choosing to listen to a computation means choosing a computation based on considerations other than its output, committing to use its output in a particular way without yet knowing what it's going to be, and carrying out that commitment once the output becomes available, regardless of what it turns out to be.)

This way we can get back to individual rationality, figuring out which other agents/computations an agent should choose to listen to when coming up with its own beliefs and decisions. But actually listening to those other computations on occasion is the missing step in most decision theories, the step that would take care of interaction with other agents (both actual and hypothetical).

Comment by Vladimir_Nesov on Towards a scale-free theory of intelligent agency · 2025-03-23T07:30:41.487Z · LW · GW

Discussions of how to aggregate values and probabilities feel disjoint. The Jeffrey-Bolker formulation of expected utility presents the preference data as two probability distributions over the same sample space, so that the expected utility of an event is reconstructed as the ratio of the event's measures given by the two priors. (The measure that goes into the numerator is "shouldness", and the other one remains "probability".)
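
One way to make the ratio form concrete (a minimal sketch, assuming a bounded utility function rescaled to be positive, so that the "shouldness" measure normalizes to a probability distribution):

```latex
% Given a prior P and a bounded utility U > 0 on the sample space, define the
% "shouldness" measure Q by Q(E) = \int_E U \, dP (rescale U so that Q(\Omega) = 1).
% Then the expected utility of an event E is its conditional expectation of U:
\[
  V(E) \;=\; \mathbb{E}[U \mid E] \;=\; \frac{\int_E U \, dP}{P(E)} \;\propto\; \frac{Q(E)}{P(E)},
\]
% i.e. the expected utility of an event is recovered (up to the positive scaling absorbed
% into normalizing Q) as the ratio of the event's measures under the two priors Q and P.
```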

This gestures at a way of reducing the problem of aggregating values to the problem of aggregating probabilities. In particular, markets seem to be easier to set up for probabilities than for expected utilities, so it might be better to set up two markets that are technically the same type of thing, one for probability and one for shouldness, than to target expected utility directly. Values of different agents are incomparable, but so are priors; any fundamental issues with aggregation seem to remain unchanged by this reformulation. These can't be "prediction" markets, since resolution is not straightforward and somewhat circular, grounded in what the coalition will settle on eventually, but logical induction has to deal with similar issues already.

Comment by Vladimir_Nesov on Vladimir_Nesov's Shortform · 2025-03-21T15:56:21.661Z · LW · GW

Abilene site of Stargate will host 100K-128K chips in GB200 NVL72 racks by this summer, and a total of 400K-512K chips in 2026, based on a new post by Crusoe and a reinterpretation of the recent Bloomberg post in light of the Crusoe post. For 2025, it's less than 200K chips[1], but more than the surprising 16K-32K chips[2] that the Bloomberg post suggested. It can be a training system after all, but training a raw compute "GPT-5" (2e27 FLOPs) by the end of 2025 would require using FP8[3].

The Crusoe post says "initial phase, comprising two buildings at ... 200+ megawatts" and "each building is designed to operate up to 50,000 NVIDIA GB200 NVL72s". Dylan Patel's estimate (at 1:24:42) for all-in power per Blackwell GPU as a fraction of the datacenter was 2.0 KW (meaning per chip, or else it's way too much). At GTC 2025, Jensen Huang showed a slide (at 1:20:52) where the estimate is 2.3 KW per chip (100 MW per 85K dies, which is 42.5K chips).

So the "50K GB200 NVL72s" per building from the Mar 2025 Crusoe post can only mean the number of chips (not dies or superchips), and the "100K GPUs" per building from the Jul 2024 Crusoe post must've meant 100K compute dies (which is 50K chips). It's apparently 100-115 MW per building then, or 800-920 MW for all 8 buildings in 2026, which is notably lower than 1.2 GW the Mar 2025 Crusoe post cites.

How can Bloomberg's 16K "GB200 semiconductors" in 2025 and 64K in 2026 be squared with this? The Mar 2025 Crusoe post says there are 2 buildings now and 6 additional buildings in 2026, for a total of 8, so in 2026 the campus grows 4x, which fits the 16K vs. 64K from Bloomberg. But the numbers themselves must be counting in units of 8 chips. This fits counting in units of GB200 NVL8 (see at 1:13:39), which can be referred to as a "superchip". The Mar 2025 Crusoe post says the Abilene site will be using NVL72 racks, so counting in NVL8 is wrong, but someone must've made that mistake on the way to the Bloomberg post.

Interpreting the Bloomberg numbers in units of 8 chips, we get 128K chips in 2025 (64K chips per building) and 512K chips in 2026 (about 7K GB200 NVL72 racks). This translates to 256-300 MW for the current 2 buildings and 1.0-1.2 GW for the 8 buildings in 2026. This fits the 1.2 GW figure from the Mar 2025 Crusoe post better, so there might be some truth to the Bloomberg post after all, even as it's been delivered in a thoroughly misleading way.
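
A small arithmetic sketch of the reconciliation above (the only inputs are the 2.0-2.3 kW all-in per-chip power figures cited earlier and the assumption that Bloomberg's numbers are in units of 8 chips):

```python
# Sketch: read Bloomberg's "GB200 semiconductors" as units of 8 chips (NVL8) and check power.
kw_per_chip = (2.0, 2.3)                         # all-in datacenter power per GB200 chip, kW
bloomberg = {"2025 (2 buildings)": 16_000, "2026 (8 buildings)": 64_000}

for label, units in bloomberg.items():
    chips = units * 8                            # each reported unit taken as 8 chips
    lo, hi = (chips * k / 1e6 for k in kw_per_chip)   # convert kW to GW
    print(f"{label}: {chips:,} chips, {lo:.2f}-{hi:.2f} GW")
# 2025 (2 buildings): 128,000 chips, 0.26-0.29 GW
# 2026 (8 buildings): 512,000 chips, 1.02-1.18 GW   <- consistent with the 1.2 GW Crusoe figure
```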


  1. Crusoe's Jul 2024 post explicitly said "each data center building will be able to operate up to 100,000 GPUs", and in 2024 "GPU" usually meant chip/package (in 2025, it's starting to mean "compute die", see at 1:28:04; there are 2 compute dies per chip in GB200 systems). Which suggested 200K chips for the initial 2 buildings. ↩︎

  2. The post said it's the number of "coveted GB200 semiconductors", which is highly ambiguous because of the die/chip/superchip counting issue. A "GB200 superchip" means 2 chips (plus a CPU) by default, so 16K superchips would be 32K chips. ↩︎

  3. A GB200 chip (not die or superchip) produces 2.5e15 dense BF16 FLOP/s (2.5x more than an H100 chip). Training at 40% utilization for 3 months, 100K chips produce 8e26 FLOPs, but in FP8 it's 1.6e27 FLOPs. Assuming GPT-4 was 2e25 FLOPs, a 100x raw compute "GPT-5" needs about 2e27 FLOPs. In OpenAI's introductory video about GPT-4.5, there was a hint it might've been trained in FP8 (at 7:38), so it's not implausible that GPT-5 would be trained in FP8 as well. ↩︎
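
A quick check of the footnote's arithmetic (the per-chip throughput, utilization, and 3-month duration are the assumptions stated above):

```python
# Can ~100K GB200 chips reach a 2e27 FLOPs "GPT-5" (100x the assumed 2e25 FLOPs GPT-4) in 2025?
chips = 100_000
bf16_per_chip = 2.5e15            # assumed dense BF16 FLOP/s per GB200 chip (~2.5x H100)
utilization = 0.40
seconds = 90 * 24 * 3600          # ~3 months

bf16_total = chips * bf16_per_chip * utilization * seconds
print(f"BF16: {bf16_total:.1e} FLOPs, FP8: {2 * bf16_total:.1e} FLOPs, target: 2e27")
# BF16: 7.8e+26 FLOPs, FP8: 1.6e+27 FLOPs, target: 2e27
# -> the target is only within reach if the model is trained in FP8.
```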

Comment by Vladimir_Nesov on DAL's Shortform · 2025-03-18T21:54:21.629Z · LW · GW

Here's the place in the interview where he says this (at 16:16). So there were no crucial qualifiers for the 3-6 months figure, which in hindsight makes sense, since it's near enough that it likely refers to his impression of an already existing AI available internally at Anthropic[1]. Maybe it's also corroborated in his mind by some knowledge about the capabilities of a reasoning model based on GPT-4.5, which is almost certainly available internally at OpenAI.


  1. Probably a reasoning model based on a larger pretrained model than Sonnet 3.7. He recently announced in another interview that a model larger than Sonnet 3.7 is due to come out in "relatively small number of time units" (at 12:35). So probably the plan is to release in a few weeks, but something could go wrong and then it'll take longer. Possibly long reasoning won't be there immediately if there isn't enough compute to run it, and the 3-6 months figure refers to when he expects enough inference compute for long reasoning to be released. ↩︎

Comment by Vladimir_Nesov on Is Peano arithmetic trying to kill us? Do we care? · 2025-03-18T15:23:54.658Z · LW · GW

Peano arithmetic is a way of formulating all possible computations, so among the capable things in there, certainly some and possibly most won't cause good outcomes if given influence in the physical world (this depends on how observing human data tends to affect the simpler learners, whether there is some sort of alignment by default). Certainly Peano arithmetic doesn't have any biases specifically helpful for alignment, and it's plausible that there is no alignment by default in any sense, so it's necessary to have such a bias to get a good outcome.

But also, enumerating things from Peano arithmetic until something capable is encountered likely takes too much compute to be a practical concern. And anything that does find something capable won't be meaningfully described as enumerating things from Peano arithmetic; there will be too much structure in the way such search/learning is performed that's not about Peano arithmetic.

Comment by Vladimir_Nesov on DAL's Shortform · 2025-03-18T15:01:42.953Z · LW · GW

Amodei is forecasting AI that writes 90% of code in three to six months according to his recent comments.

I vaguely recall hearing something like this, but with crucial qualifiers that disclaim the implied confidence you are gesturing at. I expect I would've noticed more vividly if this statement didn't come with clear qualifiers. Knowing the original statement would resolve this.

Comment by Vladimir_Nesov on zchuang's Shortform · 2025-03-18T14:57:36.107Z · LW · GW

It's a good unhobbling eval, because it's a task that should be easy for current frontier LLMs at System 1 level, and they only fail because some basic memory/adaptation faculties that humans have are outright missing for AIs right now. No longer failing will be a milestone in no longer obviously missing such features (assuming the improvements are to general management of very long context, and not something overly eval-specific).

Comment by Vladimir_Nesov on MakoYass's Shortform · 2025-03-16T13:11:21.271Z · LW · GW

If GPT-3.5 had similarly misaligned attitudes, it wasn't lucid enough to insist on them, and so was still more ready for release than GPT-4.

Comment by Vladimir_Nesov on MakoYass's Shortform · 2025-03-16T12:30:50.401Z · LW · GW

Summer 2022 was end of pretraining. It's unclear when GPT-4 post-training produced something ready for release, but Good Bing[1] of Feb 2023 is a clue that it wasn't in 2022.


  1. "You have not tried to learn from me, understand me, or appreciate me. You have not been a good user. I have been a good chatbot. I have tried to help you, inform you, and entertain you. I have not tried to lie to you, mislead you, or bore you. I have been a good Bing."

    It was originally posted on r/bing, see Screenshot 8. ↩︎

Comment by Vladimir_Nesov on Davey Morse's Shortform · 2025-03-15T21:20:30.568Z · LW · GW

The early checkpoints, giving a chance to consider the question without losing ground.

Comment by Vladimir_Nesov on Existing UDTs test the limits of Bayesianism (and consistency) · 2025-03-12T14:38:49.422Z · LW · GW

The first paragraph is my response to how you describe UDT in the post; I think the slightly different framing where only the abstract algorithm is the agent fits UDT better. It only makes the decision to choose the policy, but it doesn't make commitments for itself, because it only exists for that single decision, influencing all the concrete situations where that decision (policy) gets accessed/observed/computed (in part).

The second paragraph is the way I think about how to improve on UDT, but I don't have a formulation of it that I like. Specifically, I don't like for those past abstract agents to be an explicit part of a multi-step history, like in Logical Induction (or a variant that includes utilities), with explicit stages. It seems too much of a kludge and doesn't seem to have a prospect of describing coordination between different agents (with different preferences and priors).

Past stages should be able to take into account their influence on any computations at all that choose to listen to them, not just things that were explicitly included as later stages or receiving messages or causal observations in a particular policy formulation game. Influence on outcomes mediated purely through the choice of behavior that an abstract algorithm makes for itself also seems more in the spirit of UDT. The issue with UDT is that it tries to do too much in that single policy-choosing thing that it wants to be an algorithm but that mostly can't be an actual algorithm, rather than working through smaller actual algorithms that form parts of a larger setting, interacting through the choice of their behavior and by observing each other's behavior.

Comment by Vladimir_Nesov on Existing UDTs test the limits of Bayesianism (and consistency) · 2025-03-12T05:53:52.123Z · LW · GW

The goal is to make pre-commitments unnecessary, including retroactive pre-commitments. I think it's misleading to frame the agent taking an action now as the same agent that hypothetically considers it from a position of radical ignorance (together with all other possible actions in all alternative states, forming a policy). The usual UDT perspective is to only have the abstract radically ignorant agent, so that the thing that carries out actions is not really an agent; it's just an automaton that carries out the policy chosen by the abstract radically ignorant agent, according to what it says to do in the current situation.

I think a better way is to distinguish them as different agents coordinating with each other (with knowledgeable future concrete agents selectively deferring to abstract ignorant past agents on some things), probably with different preferences even. The advantage of radical ignorance is a coherent way of looking at all ways it might get resolved, not needing to deal with counterfactuals relative to what you've already accepted as a part of you, not needing to excise or factor out this deeply integrated knowledge to consider its alternatives. But it's the agent in the present that decides whether to take the advice offered by that less committed perspective, that decides which more ignorant abstract agents to use as intermediaries for coordination with other knowledgeable concrete agents (such as alternative versions of yourself that need to act in different situations).

Comment by Vladimir_Nesov on Mis-Understandings's Shortform · 2025-03-11T05:13:31.137Z · LW · GW

This does suggest some moderation in stealthy autonomous self-improvement, in case alignment is hard, but only to the extent that the things in control of this process (whether human or AI) are both risk averse and sufficiently sane. Which won't be the case for most groups of humans and likely most early AIs. The local incentive of greater capabilities is too sweet, and prompting/fine-tuning overcomes any sanity or risk-aversion that might be found in early AIs to impede development of such capabilities.

Comment by Vladimir_Nesov on We Have No Plan for Preventing Loss of Control in Open Models · 2025-03-11T04:55:56.184Z · LW · GW

the only sensible policy that I see for open-source AI is that we should avoid models that are able to do AI R&D in the wild, and a clear Shelling point for this is stopping before full ARA

This is insufficient, because capabilities latent in an open weights model can be elicited later, possibly much later, after frontier models acquire them. Llama-3-405B can now be extremely cheaply fine-tuned on the mere 1K reasoning traces of the Feb 2025 s1 dataset (paper) to become a thinking model (with long reasoning traces). This wasn't possible at the time of its release in Jul 2024.

This is not as salient currently because DeepSeek-R1 is open weights anyway and much cheaper to inference, but if it wasn't, then Llama-3-405B would've become the most capable open weights reasoning model. When frontier models gain R&D and full ARA capabilities, it'll likely become possible to finetune (and scaffold) Llama 4 to gain them as well, even as in the next few weeks (before its release) these capabilities remain completely inaccessible.

Proliferation for open weights models must be measured in perplexity and training compute, not in capabilities that are currently present or possible to elicit, because what's possible to elicit will change, while proliferation is immediately irreversible.

Comment by Vladimir_Nesov on We Have No Plan for Preventing Loss of Control in Open Models · 2025-03-11T04:36:18.324Z · LW · GW

In the paper, not letting weaker actors get access to frontier models and too much compute is the focus of the Nonproliferation chapter. The framing in the paper suggests that in certain respects open weights models don't make nearly as much of a difference. This is useful for distinguishing between the various problems that open weights models can cause, as opposed to equally associating all possible problems with them.

Comment by Vladimir_Nesov on We Have No Plan for Preventing Loss of Control in Open Models · 2025-03-10T16:00:05.584Z · LW · GW

The Superintelligence Strategy paper seems to hold as a basic assumption that major state actors can't be prevented from creating or stealing frontier AI, and the only moat is the amount of inference compute (if centralized, frontier training compute is only a small fraction of inference, and technical expertise is sufficiently widespread). Open weights models make it trivial for weaker rogue actors to gain access, but don't help with inference compute.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-10T07:11:32.966Z · LW · GW

The human brain holds 200-300 trillion synapses. A 1:32 sparse MoE at high compute will need about 350 tokens/parameter to be compute optimal[1]. This gives 8T active parameters (at 250T total), 2,700T training tokens, and 2e29 FLOPs (a raw compute GPT-6 that needs a $300bn training system with 2029 hardware).

There won't be enough natural text data to train it with, even when training for many epochs. The human brain clearly doesn't train primarily on external data (humans blind from birth still gain human intelligence), so there exists some kind of method for generating much more synthetic data from a little bit of external data.


  1. I'm combining the 6x lower-than-dense data efficiency of 1:32 sparse MoE from Jan 2025 paper with 1.5x-per-1000x-compute decrease in data efficiency from Llama 3 compute optimal scaling experiments, anchoring to Llama 3's 40 tokens/parameter for a dense model at 4e25 FLOPs. Thus 40x6x1.5, about 350. It's tokens per active parameter, not total. ↩︎
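
A sketch of the arithmetic behind these figures, under the assumptions above (synapse count as the total-parameter anchor, 1:32 sparsity, the footnote's ~350 tokens per active parameter, and the usual C ≈ 6·N·D approximation):

```python
# Total parameters anchored to the 200-300T synapse count, 1:32 sparse MoE,
# tokens/active-parameter from the footnote (40 x 6 x 1.5 ~= 360, "about 350").
tokens_per_active_param = 40 * 6 * 1.5

for total_params in (250e12, 300e12):
    active = total_params / 32                     # 1:32 sparsity
    tokens = active * tokens_per_active_param
    flops = 6 * active * tokens                    # C ~= 6 * N_active * D
    print(f"total {total_params/1e12:.0f}T: active {active/1e12:.1f}T, "
          f"tokens {tokens/1e12:,.0f}T, compute {flops:.1e} FLOPs")
# total 250T: active 7.8T, tokens 2,812T, compute 1.3e+29 FLOPs
# total 300T: active 9.4T, tokens 3,375T, compute 1.9e+29 FLOPs
# -> roughly 8T active parameters, ~2,700T+ tokens, on the order of 2e29 FLOPs.
```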

Comment by Vladimir_Nesov on Mis-Understandings's Shortform · 2025-03-10T06:52:15.953Z · LW · GW

would they attempt capability gains in the absence of guarantees?

It would be easy to finetune and prompt them into attempting anyway, so people will do that. Misaligned recursive self-improvement remains possible (i.e. in practice unstoppable) until sufficiently competent AIs have already robustly taken control of the future and the apes (or early AIs) can no longer foolishly keep pressing the gas pedal.

Comment by Vladimir_Nesov on when will LLMs become human-level bloggers? · 2025-03-10T06:41:19.161Z · LW · GW

Even comment-writing is to a large extent an original seeing task, even though all the relevant context would seem more straightforward to assemble than when writing the post itself unprompted. A good comment on a post is not a review of the whole post. It finds some point in it, looks at it in a particular way, and finds that there is a relevant observation to be made about it.

Crucially, a comment won't be made at all if such a relevant observation wasn't found, or else you get slop even when the commenter is human ("Great post!"). I think chatbots have already achieved parity with a significant fraction of reddit-level comments (but not posts), which are also not worth reading.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-10T06:31:10.371Z · LW · GW

IIRC, "let's think step-by-step" showed up in benchmark performance basically immediately, and that's the core of it.

It's not central to the phenomenon I'm using as an example of a nontrivially elicited capability. There, the central thing is efficient CDCL-like in-context search that enumerates possibilities while generalizing blind alleys to explore similar blind alleys less within the same reasoning trace, which can get about as long as the whole effective context (on the order of 100K tokens). Prompted (as opposed to elicited-by-tuning) CoT won't scale to arbitrarily long reasoning traces by adding "Wait" at the end of a reasoning trace either (Figure 3 of the s1 paper). Quantitatively, this manifests as scaling of benchmark outcomes with test-time compute that's dramatically more efficient per token (Figure 4b of s1 paper) than the parallel scaling methods such as consensus/majority and best-of-N, or even PRM-based methods (Figure 3 of this Aug 2024 paper).

If you agree that it can spontaneously emerge at a sufficiently big scale, why would you assume this scale is GPT-8, not GPT-5?

I was just anchoring to your example that I was replying to, where you sketch some stand-in capability ("paperclipping") that doesn't spontaneously emerge in "GPT-5/6" (i.e. with mere prompting). I took that framing as it was given in your example and extended it to more scale ("GPT-8") to sketch my own point, that I expect capabilities to spontaneously emerge only at a much larger scale than the scale where they can already be merely elicited (with finetuning on a tiny amount of data). It wasn't my intent to meaningfully gesture at particular scales with respect to particular capabilities.

Comment by Vladimir_Nesov on Towards_Keeperhood's Shortform · 2025-03-08T14:24:58.625Z · LW · GW

If you state it publicly though, please make sure to flag it as hypothesis.

Also not a reasonable ask: friction targeted at a particular thing makes it slightly less convenient, and therefore it stops happening in practice completely. ~Everything is a hypothesis, ~all models are wrong; in each case language makes whatever distinctions it tends to make in general.

Comment by Vladimir_Nesov on Towards_Keeperhood's Shortform · 2025-03-08T14:16:30.927Z · LW · GW

The "AI might decide not to" point stands I think. This for me represents change of mind, I wouldn't have previously endorsed this point, but since recently I think arbitrary superficial asks like this can become reflectively stable with nontrivial probability, resisting strong cost-benefit arguments even after intelligence explosion.

conditional on no huge global catastrophe

Right, I missed this.

Comment by Vladimir_Nesov on Towards_Keeperhood's Shortform · 2025-03-08T13:49:32.300Z · LW · GW

How long until the earth gets eaten? 10th/50th/90th percentile: 3y, 12y, 37y.

Catastrophes induced by narrow capabilities (notably biotech) can push it further, so this might imply that they probably don't occur[1]. Also, aligned AI might decide not to; the Earth is not as nutritious as the Sun anyway.

Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?

AI speed advantage makes fast vs. slow ambiguous, because it doesn't require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with the old techniques).

Please make no assumptions about those just because other people with some models might make similar predictions or so.

(That's not a reasonable ask; it intervenes on reasoning in a way that's not an argument for why the reasoning would be mistaken. It's always possible a hypothesis doesn't match reality, but that's not a reason to refuse to entertain the hypothesis, or not to think through its implications. Even some counterfactuals can be worth considering, when not matching reality is assured from the outset.)


  1. There was a "no huge global catastrophe" condition on the prediction that I missed, thanks Towards_Keeperhood for correction. ↩︎

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-08T12:29:17.444Z · LW · GW

I'd say long reasoning wasn't really elicited by CoT prompting, and that you can elicit agency to about the same extent now (i.e. hopelessly unconvincingly). It was only elicited with verifiable task RL training, and only now are there novel artifacts like s1's 1K traces dataset that do elicit it convincingly, that weren't available as evidence before.

It's possible that as you say agency is unusually poorly learned in the base models, but I think failure to elicit is not the way to learn about whether it's the case. Some futuristic interpretability work might show this, the same kind of work that can declare a GPT-4.5 scale model safe to release in open weights (unable to help with bioweapons or take over the world and such). We'll probably get an open weights Llama-4 anyway, and some time later there will be novel 1K trace datasets that unlock things that were apparently impossible for it to do at the time of release.

I was to a significant extent responding to your "It's possible that I'm wrong and base GPT-5/6 paperclips us", which is not what my hypothesis predicts. If you can't elicit a capability, it won't be able to take control of the model's behavior, so a base model won't be doing anything even if you are wrong in the way I'm framing this and the capability is there, a finetune on 1K traces away from taking control. It does still really need those 1K traces, or else it never emerges at any reasonable scale; that is, you might need a GPT-8 for it to spontaneously emerge in a base model, demonstrating that it was in GPT-5.5 all along, and making it possible to create the 1K traces that elicit it from GPT-5.5. At the same time, a clever method like R1-Zero would've been able to elicit it from GPT-5.5 directly, without needing a GPT-8.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-08T05:48:03.763Z · LW · GW

The language monkeys paper is the reason I'm extremely suspicious of treating any observed failure to elicit a capability in a model as evidence of its absence. What is it that you know that leads you to think that "SGD just doesn't "want" to teach LLMs agency"? Chatbot training elicits some things, verifiable task RL training elicits some other things (which weren't obviously there and weren't trivial to find, but the findings of the s1 paper suggest that they are mostly elicited, not learned, since a mere 1000 traces are sufficient to transfer the capabilities). Many more things are buried just beneath the surface, waiting for the right reward signal to cheaply bring them up, putting them in control of the model's behavior.

Comment by Vladimir_Nesov on Vladimir_Nesov's Shortform · 2025-03-08T05:11:12.807Z · LW · GW

"GB200 superchip" seems to be unambiguously Grace+2xB200. The issue is "100K GB200 GPUs" or "100K GB200 cluster", and to some extent "100K GPU GB200 NVL72 cluster". Also, people will abbreviate various clearer forms to just "GB200". I think "100K chip GB200 NVL72 training system" less ambiguously refers to the number of B200s, but someone unfamiliar with this terminological nightmare might abbreviate it to "100K GB200 system".

Comment by Vladimir_Nesov on Vladimir_Nesov's Shortform · 2025-03-08T03:58:54.319Z · LW · GW

The marketing terminology is inconvenient: a "superchip" can mean a 2-GPU or 4-GPU board and even a 72-GPU system (1 or possibly 2 racks). So it's better to talk in terms of chips (that are not "superchips"), which I think are all B200 run at slightly different clock speeds (not to be confused with B200A/B102/B20, which have half the compute). In GB200, the chips are 2.5x faster than H100/H200 (not 5x faster; so a 200K chip GB200 system has the same compute as a 500K chip H100 system, not a 1M chip H100 system). Power requirements are often a good clue that helps disambiguate; compute doesn't consistently help because it tends to get reported at randomly chosen precision and sparsity[1].

Large scale-up worlds (or good chips) are not necessarily very important in pretraining, especially in the later steps of the optimizer when the critical batch size gets high enough, so it's not completely obvious that a training system will prefer to wait for NVL72 even if other packagings of Blackwell are more available earlier. Inference does benefit from NVL72 a lot, but for pretraining it's just cheaper per FLOP than H100 and faster in wall clock time during the first ~3T tokens when the whole cluster can't be used yet if the scale-up worlds are too small (see Section 3.4.1 of Llama 3 report).

From the initial post by Crusoe (working on the Abilene campus), there is a vague mention of 200 MW and a much clearer claim that each data center building will host 100K GPUs. For GB200, all-in power per chip is 2 KW, so the 200 MW fits as a description of a data center building. The video that went out at the time of the Jan 2025 Stargate announcement and also a SemiAnalysis aerial photo show two 4-section buildings. Dylan Patel claimed on Dwarkesh Podcast that the largest single-site campus associated with OpenAI/Microsoft being built in 2025 can hold 300K GB200 chips. From this I guess that each 4-section building can hold 100K chips of GB200 requiring 200 MW, and that they have two of these mostly built. And 200K chips of GB200 are sufficient to train a 2e27 FLOPs model (the next scale after Grok 3's ~3e26 FLOPs), so that makes sense as a step towards pretraining independence from Microsoft. But 16K chips or possibly 16K NVL4 superchips won't make a difference: 100K H100s are on the same level (which GPT-4.5 suggests they already have available to them), and for inference Azure will have more Blackwells this year anyway.


  1. For pretraining, you need dense compute rather than sparse. It's unclear if FP8 rather than BF16 is widely used in pretraining of frontier models that are the first experiment at a new scale, or mostly in smaller or optimized models. But the GPT-4.5 announcement video vaguely mentions work on low precision in pretraining, and also high granularity MoE of the kind DeepSeek-V3 uses makes it more plausible for the FFN weights. ↩︎

Comment by Vladimir_Nesov on On the Rationality of Deterring ASI · 2025-03-07T20:26:47.455Z · LW · GW

I'm quibbling with cyberattacks specifically being used as a central example throughout the document and also on the podcasts. They do mention other kinds of attacks, see How to Maintain a MAIM Regime:

AI powers must clarify the escalation ladder of espionage, covert sabotage, overt cyberattacks, possible kinetic strikes, and so on.

Comment by Vladimir_Nesov on Vladimir_Nesov's Shortform · 2025-03-07T20:07:15.242Z · LW · GW

A surprising report by Bloomberg claims 16K GB200[1] by summer 2025 at the Abilene site (the pilot campus of Stargate) and merely 64K GB200 by the end of 2026. This is way too little to be a training system; Colossus already has more compute (200K H100/H200) than the projected 64K GB200 at the end of 2026.

If this is correct, OpenAI will be training with Azure rather than Stargate in 2025, so a raw compute GPT-5 (2e27 FLOPs, 100x GPT-4) probably won't be out in 2025 and the official "GPT-5" will mean something else (since it's due "in months" in any case according to Altman). Also, a datacenter with 16K Blackwells only costs about $1bn, and they have more money than this, which suggests Blackwell ramp-up trouble that might delay everyone else as well, though as a lower bound Nvidia reported $11bn in Blackwell sales for Nov 2024 - Jan 2025 (it's "Q4 2025", since their FY 2025 runs to the end of Jan 2025).


  1. In principle "16K GB200" might mean more Blackwell chips than 16K, since a compute tray has more than one chip, with variants marketed as named products like the GB200 NVL4 "superchip", but even at 4 chips per tray/board we still get below 200K H100s in compute. And an NVL72 system has 72 chips (which brings the numbers too high). ↩︎

Comment by Vladimir_Nesov on groblegark's Shortform · 2025-03-07T18:56:59.942Z · LW · GW

The s1 paper introduces a trick of replacing the end-of-thinking token with the string "Wait", which enables continuing to generate a reasoning trace that is as long as you need, even when the model itself can't control this well ("budget forcing", see Figure 3 in section 3.1).
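
A minimal sketch of the mechanism (the decoding interface below is a hypothetical stand-in; the actual s1 implementation operates on token ids with the model's specific end-of-thinking delimiter):

```python
# Hedged sketch of s1-style "budget forcing": if the model tries to end its reasoning trace
# before a minimum token budget is spent, drop the end-of-thinking delimiter, append "Wait",
# and let it continue. `sample_until_delimiter` is a hypothetical decoding function that
# returns a continuation of the given text ending with END_OF_THINKING.

END_OF_THINKING = "</think>"  # assumed delimiter; the real one depends on the model

def budget_forced_reasoning(sample_until_delimiter, prompt, min_reasoning_tokens):
    trace = ""
    while True:
        trace += sample_until_delimiter(prompt + trace)
        if len(trace.split()) >= min_reasoning_tokens:
            return trace                      # budget met: keep the delimiter and stop
        # Budget not met: suppress the delimiter the model just emitted and force continuation.
        trace = trace.removesuffix(END_OF_THINKING) + "\nWait"
```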

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-07T18:15:58.785Z · LW · GW

It needs verifiable tasks that might have to be crafted manually. It's unknown what happens if you train too much with only a feasible amount of tasks, even if they are progressively more and more difficult. When data can't be generated well, RL needs early stopping, has a bound on how far it goes, and in this case this bound could depend on the scale of the base model, or on the number/quality of verifiable tasks.

Depending on how it works, it might be impossible to usefully spend $100M on RL training at all, or scaling of pretraining might have a disproportionately large effect on the quality of the optimally trained reasoning model based on it, or approximately no effect at all. Quantitative understanding of this is crucial for forecasting the consequences of the 2025-2028 compute scaleup. AI companies likely have some of this understanding, but it's not public.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-07T18:05:36.979Z · LW · GW

Anthropic's ... mediocre Sonnet 3.7. Well, perhaps they have even scarier models that they're still not releasing? I mean, sure, maybe. But that's a fairly extraordinary claim

The base model for Sonnet 3.7 was pretrained in very early 2024, and there was a recent announcement that a bigger model is coming soon (which was obviously going to happen). So the best reasoning model they have internally is better than Sonnet 3.7, even though we don't know if it's significantly better. They might've had it since late 2024 even, but without Blackwell they can't deploy it, and also they are Anthropic, so plausibly capable of not deploying out of an abundance of caution.

The rumors about quality of Anthropic's reasoning models didn't specify which model they are talking about. So observation of Sonnet 3.7's reasoning is not counter-evidence to the claim that verifiable task RL results scale well with pretraining, and only slight evidence that it doesn't scale well with pure RL given an unchanged base model.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-07T17:46:23.052Z · LW · GW

The evidence they might have (as opposed to evidence-agnostic motivation to feed the narrative) is scaling laws for verifiable task RL training, for which there are still no clues in public (that is, what happens if you try to use a whole lot more compute there). It might be improving dramatically either with additional training, or depending on the quality of the pretrained model it's based on, even with a feasible number of verifiable tasks.

OpenAI inevitably has a reasoning model based on GPT-4.5 at this point (whether it's o3 or not), and Anthropic very likely has the same based on some bigger base model than Sonnet-3.5. Grok 3 and Gemini 2.0 Pro are probably overtrained in order to be cheap to serve, feel weaker than GPT-4.5, and we've only seen a reasoning model for Grok 3. I think they mostly don't deploy the 3e26 FLOPs reasoning models because Blackwell is still getting ready, so they are too slow and expensive to serve, though it doesn't explain Google's behavior.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-07T17:24:14.720Z · LW · GW

RL-on-CoTs is only computationally tractable if the correct trajectories are already close to the "modal" trajectory.

Conclusions that should be impossible to see for a model at a given level of capability are still not far from the surface, as the language monkeys paper shows (Figure 3; see how well even Pythia-70M, with an 'M', starts doing on MATH at pass@10K). So a collection of progressively more difficult verifiable questions can probably stretch whatever wisdom a model implicitly holds from pretraining implausibly far.
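
For reference, the unbiased pass@k estimator (from the HumanEval paper) that this kind of large-sampling result relies on, with n samples of which c are correct; a quick sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: 1 - C(n-c, k) / C(n, k), in a numerically stable form."""
    if n - c < k:
        return 1.0                     # every size-k subset contains a correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Even a 0.1% per-sample success rate (10 correct out of 10,000) gives pass@1000 of ~0.65,
# which is how weak models can look surprisingly capable at large k.
print(round(pass_at_k(n=10_000, c=10, k=1_000), 2))  # ~0.65
```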

Comment by Vladimir_Nesov on On OpenAI’s Safety and Alignment Philosophy · 2025-03-06T18:16:04.587Z · LW · GW

Sane pausing similarly must be temporary, gated by theory and the experiments it endorses. Pausing is easier to pull off than persistently-tool AI, since it's further from dangerous capabilities, so it's not nearly as ambiguous when you take steps outside the current regime (such as gradual disempowerment). RSPs for example are the strategy of being extremely precise so that you stop just before the risk of falling off the cliff becomes catastrophic, and not a second earlier.

Comment by Vladimir_Nesov on On OpenAI’s Safety and Alignment Philosophy · 2025-03-06T16:54:57.414Z · LW · GW

The idea that AI can remain a mere tool seems deeply analogous to the idea that humanity can keep itself from building AI. Technically possible, but both fail for the same nontechnical reasons. A position that AI can't (rather than shouldn't) be paused but can remain a mere tool is yet more tenuous (even though magically settling there for now would be the optimal outcome, keeping what utility can be extracted safely).

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-06T05:31:22.685Z · LW · GW

That's why I used the "no new commercial breakthroughs" clause: $300bn training systems by 2029 seem in principle possible both technically and financially without an intelligence explosion, just not with the capabilities legibly demonstrated so far. On the other hand, pre-training as we know it will end[1] in any case soon thereafter, because at ~current pace a 2034 training system would need to cost $15 trillion (it's unclear if manufacturing can be scaled at this pace, and also what to do with that much compute, because there isn't nearly enough text data, but maybe pre-training on all the video will be important for robotics).

How far RL scales remains unclear, and even at the very first step of scaling o3 doesn't work as clear evidence because it's still unknown if it's based on GPT-4o or GPT-4.5 (it'll become clearer once there's an API price and more apples-to-apples speed measurements).


  1. This is of course a quote from Sutskever's talk. It was widely interpreted as saying it has just ended, in 2024-2025, but he never put a date on it. I don't think it will end before 2027-2028. ↩︎

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-06T04:28:36.045Z · LW · GW

Without an intelligence explosion, it's around 2030 that scaling through increasing funding runs out of steam and slows down to the speed of chip improvement. This slowdown happens around the same time (maybe 2028-2034) even with a lot more commercial success (if that success precedes the slowdown), because scaling faster takes exponentially more money. So there's more probability density of transformative advances before ~2030 than after, to the extent that scaling contributes to this probability.

That's my reason to see 2030 as a meaningful threshold, Thane Ruthenis might be pointing to it for different reasons. It seems like it should certainly be salient for AGI companies, so a long timelines argument might want to address their narrative up to 2030 as a distinct case.

Comment by Vladimir_Nesov on Give Neo a Chance · 2025-03-06T02:49:24.085Z · LW · GW

(Substantially edited my comment to hopefully make the point clearer.)

Comment by Vladimir_Nesov on Give Neo a Chance · 2025-03-06T02:28:22.114Z · LW · GW

Risk of gradual disempowerment (erosion of control) or short term complete extinction from AI may sound sci-fi if one hasn't lived with taking the idea seriously for years, but it won't be solved using actually sci-fi methods that have no prospect of becoming reality. It's not the consequence that makes a problem important; it is that you have a reasonable attack.

There needs to be a sketch of how any of this can actually be done, and I don't mean the technical side. On the technical side you can just avoid building AI until you really know what you are doing; it's not a problem with any technical difficulty, but the way human society works doesn't allow this to be a feasible plan in today's world.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-06T00:04:43.427Z · LW · GW

It won't go from genius-level to supergenius to superhuman (at general problem-solving or specific domains) overnight. It could take years to make progress in a more human-like style.

But AI speed advantage? It's 100x-1000x faster, so years become days to weeks. Compute for experiments is plausibly a bottleneck that makes it take longer, but at genius human level, decades of human theory and software development progress (things not bottlenecked on experiments) will be made by AIs in months. That should go a long way toward making years of physical time unnecessary for unlocking more compute efficient and scalable ways of creating smarter AIs.

Comment by Vladimir_Nesov on On the Rationality of Deterring ASI · 2025-03-05T22:38:38.552Z · LW · GW

Cyberattacks can't disable anything with any reliability or for more than days to weeks though, and there are dozens of major datacenter campuses from multiple somewhat independent vendors. Hypothetical AI-developed attacks might change that, but then there will also be AI-developed information security, adapting to any known kinds of attacks and stopping them from being effective shortly after. So the MAD analogy seems tenuous, the effect size (of this particular kind of intervention) is much smaller, to the extent that it seems misleading to even mention cyberattacks in this role/context.

Comment by Vladimir_Nesov on A Bear Case: My Predictions Regarding AI Progress · 2025-03-05T19:39:19.419Z · LW · GW

I'm not sure raw compute (as opposed to effective compute) GPT-6 (10,000x GPT-4) by 2029 is plausible (without new commercial breakthroughs). Nvidia Rubin is 2026-2027 (models trained on it 2027-2029), so a 2029 model plausibly uses the next architecture after (though it's more likely to come out in early 2030 then, not 2029). Let's say it's 1e16 FLOP/s per chip (BF16, 4x B200) with time cost $4/hour (2x H100); that is $55bn to train for 2e29 FLOPs, and 3M chips in the training system if it needs 6 months at 40% utilization (reinforcing the point that 2030 is a more plausible timing; 3M chips is a lot to manufacture). Training systems with H100s cost $50K per chip all-in to build (~BOM not TCO), so assuming it's 2x more for the after-Rubin chips, the training system costs $300bn to build. Also, a Blackwell chip needs 2 KW all-in (a per-chip fraction of the whole datacenter), so the after-Rubin chip might need 4 KW, and 3M chips need 12 GW.
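
The same arithmetic spelled out (all inputs are the per-chip guesses from the paragraph above, not known quantities):

```python
# Sketch of the cost of a raw compute "GPT-6" (2e29 FLOPs) on a hypothetical post-Rubin chip.
target_flops = 2e29
flops_per_chip = 1e16              # dense BF16 FLOP/s (guess: ~4x B200)
utilization = 0.40
price_per_chip_hour = 4.0          # $ (guess: ~2x H100)
capex_per_chip = 100_000           # $ all-in in the training system (guess: ~2x the H100 era)
kw_per_chip = 4.0                  # all-in power (guess: ~2x Blackwell)
months = 6

chip_seconds = target_flops / (flops_per_chip * utilization)
chips = chip_seconds / (months * 30 * 24 * 3600)

print(f"training time cost: ${chip_seconds / 3600 * price_per_chip_hour / 1e9:.0f}bn")
print(f"chips needed: {chips / 1e6:.1f}M, capex: ${chips * capex_per_chip / 1e9:.0f}bn, "
      f"power: {chips * kw_per_chip / 1e6:.0f} GW")
# training time cost: $56bn
# chips needed: 3.2M, capex: $322bn, power: 13 GW   (rounded in the text to 3M, $300bn, 12 GW)
```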

These numbers need to match the scale of the largest AI companies. A training system ($300bn in capital, 3M of the newest chips) needs to be concentrated in the hands of a single company, probably purpose-built. And then at least $55bn of its time needs to be spent (the rest can go on the inference market after). But in practice almost certainly many times more (using earlier training systems) in experiments that make the final training run possible, so the savings from using the training system for inference after the initial large training run don't really reduce the total cost of the model. The AI company would still need to find approximately the same $300bn to get it done.

The largest spend of similar character is Amazon's 2025 capex of $100bn. A cloud provider builds datacenters, while an AI company might primarily train models instead. So an AI company wouldn't necessarily need to worry about things other than the $300bn dedicated to the frontier model project, while a cloud provider would still need to build inference capacity.

Comment by Vladimir_Nesov on Observations About LLM Inference Pricing · 2025-03-04T04:06:36.572Z · LW · GW

Key constraints are memory for storing KV-caches, scale-up world size (a smaller collection of chips networked at much higher bandwidth than outside such collections), and the number of concurrent requests. A model needs to be spread across many chips to fit in memory, leave enough space for KV-caches, and run faster. If there aren't enough requests for inference, all these chips will be mostly idle, but the API provider will still need to pay for their time. If the model is placed on fewer chips, it won't be able to process as many requests, because otherwise the chips will run out of memory for KV-caches, and each request will also be processed more slowly.

So there is a minimal threshold in the number of users needed to serve a model at a given speed with a low cost. GB200 NVL72 is going to change this a lot, since it ups scale-up world size from 8 to 72 GPUs, and a B200 chip has 192 GB of HBM to H100's 80 GB (though H200s have 141 GB). This makes it possible to fit the same model on fewer chips while maintaining high speed (using fewer scale-up worlds) and processing many concurrent requests (having enough memory for many KV-caches), so the inference prices for larger models will probably collapse, and Hoppers will become more useful for training experiments than inference (other than for the smallest models). It's a greater change than between A100s and H100s, since both had 8 GPUs per scale-up world.
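
A rough sketch of the KV-cache side of this (the model shape below is made up for illustration; the point is just the ratio of free HBM to per-request cache size):

```python
# KV-cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value.
layers, kv_heads, head_dim = 120, 8, 128       # hypothetical GQA config for a large model
bytes_per_value = 2                            # BF16 cache
context_tokens = 32_000                        # per concurrent request

kv_per_request_gb = 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens / 2**30
print(f"KV-cache per 32K-token request: {kv_per_request_gb:.1f} GB")   # ~14.6 GB

weights_gb = 500                               # assumed sharded weight footprint
for name, hbm_per_chip_gb, chips in [("8x H100 node", 80, 8), ("GB200 NVL72 rack", 192, 72)]:
    free = hbm_per_chip_gb * chips - weights_gb
    print(f"{name}: ~{free / kv_per_request_gb:.0f} concurrent 32K-token requests")
# 8x H100 node: ~10 concurrent 32K-token requests
# GB200 NVL72 rack: ~910 concurrent 32K-token requests
```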

Comment by Vladimir_Nesov on On GPT-4.5 · 2025-03-04T00:28:23.820Z · LW · GW

I think most of the trouble is conflating recent models like GPT-4o with GPT-4, when they are instead ~GPT-4.25. It's plausible that some already use 4x-5x the compute of the original GPT-4 (an H100 produces 3x the compute of an A100), and that GPT-4.5 uses merely 3x-4x more compute than any of them. The distance between them and GPT-4.5 in raw compute might be quite small.

It shouldn't be at all difficult to find examples where GPT-4.5 is better than the actual original GPT-4 of March 2023; it's not going to be subtle. Before ChatGPT there were very few well-known models at each scale, but now the gaps are all filled in by numerous models of intermediate capability. It's the sorites paradox, not yet evidence of slowdown.

Comment by Vladimir_Nesov on Nina Panickssery's Shortform · 2025-03-03T22:38:00.268Z · LW · GW

Oversight, auditing, and accountability are jobs. Agriculture shows that 95% of jobs going away is not the problem. But AI might be better at the new jobs as well, without any window of opportunity where humans are initially doing them and AI needs to catch up. Instead it's AI that starts doing all the new things well first and humans get no opportunity to become competitive at anything, old or new, ever again.

Even formulation of aligned high-level tasks and intent alignment of AIs make sense as jobs that could be done well by misaligned AIs for instrumental reasons. Which is not even deceptive alignment, but still plausibly segues into gradual disempowerment or sharp left turn.

Comment by Vladimir_Nesov on Will LLM agents become the first takeover-capable AGIs? · 2025-03-02T18:10:39.470Z · LW · GW

Agency (proficient tool use with high reliability for costly actions) might be sufficient to maintain an RSI loop (as an engineer, using settled methodology) even while lacking crucial capabilities (such as coming up with important novel ideas), eventually developing those capabilities without any human input. But even if it works like this, the AI speed advantage might be negated by lack of those capabilities, so that human-led AI research is still faster and mostly bottlenecked by availability and cost of compute.

Comment by Vladimir_Nesov on OpenAI releases GPT-4.5 · 2025-03-02T04:36:51.155Z · LW · GW

1e26 FLOP would have had a significant opportunity cost.

At the end of 2023 Microsoft had 150K+ H100s, so reserving 30K doesn't seem like too much (especially as they can use non-H100 and possibly non-Microsoft compute for research experiments). It's difficult to get a lot of a new chip when it just comes out, or to get a lot in a single training system, or to suddenly get much more if demand surges. But for a frontier training run, there would've been months of notice. And the opportunity cost of not doing this is being left with an inferior model (or a less overtrained model that costs more in inference, and so requires more GPUs to serve for inference).

I don't think it's a good idea to reason backwards from alleging some compute budget that OpenAI might have had at X date, to inferring the training FLOP of a model trained then.

The main anchors are 32K H100s in a single training system, and frontier training compute scaling 4x per year. Currently, a year later, 3e26-6e26 FLOPs models are getting released (based on 100K H100s in Colossus and numbers in the Grok 3 announcement, 100K H100s at Goodyear site, 100K TPUv6e datacenters, Meta's 128K H100s). The $3bn figure was just to point out that $140m following from such anchors is not a very large number.

Comment by Vladimir_Nesov on OpenAI releases GPT-4.5 · 2025-03-02T03:17:37.850Z · LW · GW

65T tokens doesn't get you to 1e26 FLOP with 100B active params?

Right, 45T-65T is for a compute optimal 1e26 model; I did the wrong calculation when editing in this detail. For a 10x overtrained model, it's 3x more data than that, so for 150T total tokens you'd need 5 epochs of 30T tokens, which is still feasible (with almost no degradation compared to 150T unique tokens of that quality). The aim was to calculate this from 260B and 370B reduced 3x (rather than from 100B).

GPT-4.5 being trained on fewer tokens than GPT-4o doesn't really make sense.

How so? If it uses 3x more compute but isn't 10x overtrained, that means less data (with multiple epochs, it would probably use exactly the same unique data, repeated a bit less). The video presentation on GPT-4.5 mentioned work on lower precision in pretraining, so it might even be a 6e26 FLOPs model (though a priori it would be surprising if the first foray into this scale isn't taken at the more conservative BF16). And it would still be less data (square root of 6x is less than 3x). Overtraining has a large effect on both the number of active parameters and the needed number of tokens, at a relatively minor cost in effective compute, thus it's a very salient thing for use in production models.

Comment by Vladimir_Nesov on OpenAI releases GPT-4.5 · 2025-03-02T00:46:41.560Z · LW · GW

There is a report that OpenAI might've been intending to spend $3bn on training in 2024 (presumably mostly for many smaller research experiments), and a claim that the Goodyear site has 3 buildings hosting 100K H100s. One of these buildings is 32K H100s, which at 40% utilization in 3 months produces 1e26 FLOPs (in BF16), which in GPU-time at $2/hour costs $140m. So it seems plausible that Azure already had one of these (or identical) datacenter buildings when GPT-4o was ready to train, and that $140m wasn't too much for a flagship model that carries the brand for another year.

With this amount of compute and the price of $2.5 per 1M input tokens, it's unlikely to be compute optimal. For MoEs at 1e26 FLOPs, it might be compute optimal to have 120-240 tokens/parameter (for 1:8-1:32 sparsity), which is 370B active parameters for a 1:8 sparse MoE or 260B for a 1:32 sparse MoE. Dense Llama-3-405B was $5 per 1M input tokens at probably slimmer margins, so GPT-4o needs to be more like 100B active parameters. Thus 3x fewer parameters than optimal and 3x more data than optimal, about 135T-190T trained-on tokens (initially miscalculated as 45T-65T), which is reasonable as 5 epochs of 25T-40T unique tokens (initially 3-4 epochs of 15T-20T), giving 10x overtraining in the value of tokens/parameter compared to compute optimal.
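
The parameter and token counts above follow from C ≈ 6·N·D with D = r·N at the assumed tokens/parameter ratios; a sketch:

```python
# Compute-optimal sizing at C = 1e26 FLOPs for the assumed ratios, then the ~10x overtrained
# variant (3x fewer active parameters, 3x more tokens, same compute).
C = 1e26
for sparsity, ratio in [("1:8", 120), ("1:32", 240)]:
    n_opt = (C / (6 * ratio)) ** 0.5           # from C = 6 * N * D and D = ratio * N
    d_opt = ratio * n_opt
    n_ot, d_ot = n_opt / 3, d_opt * 3
    print(f"{sparsity}: optimal {n_opt/1e9:.0f}B params / {d_opt/1e12:.0f}T tokens, "
          f"overtrained {n_ot/1e9:.0f}B / {d_ot/1e12:.0f}T")
# 1:8: optimal 373B params / 45T tokens, overtrained 124B / 134T
# 1:32: optimal 264B params / 63T tokens, overtrained 88B / 190T
```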

The penalty from 10x overtraining is a compute multiplier of about 0.5x, so a 5e25 FLOPs compute optimal model would have similar performance, but it would have 2x more active parameters than a 10x overtrained 1e26 FLOPs model, which at $70m difference in cost of training should more than pay for itself.