Comments

Comment by Josh You (scrafty) on Chris_Leong's Shortform · 2025-04-18T12:23:46.416Z · LW · GW

But this only works if those less worried about AI risks who join such a collaboration don't use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. It is incredibly damaging to trust within the community.

...This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place.

(note: I work at Epoch) This attitude feels like a recipe for creating an intellectual bubble. Of course people will use the knowledge they gain in collaboration with you for the purposes that they think are best. I think it would be pretty bad for the AI safety community if it just relied on forecasting work from card-carrying AI safety advocates.

Comment by Josh You (scrafty) on METR: Measuring AI Ability to Complete Long Tasks · 2025-03-27T15:39:28.619Z · LW · GW

I think there are two models you measured time horizons for, Claude 3 Opus and GPT-4 Turbo, that didn't make it onto the main figure. Is that right? There are 13 models in Figure 5, which shows the time horizon curves for a bunch of models across the full test suite, but only 11 dots on Figure 1.

Comment by Josh You (scrafty) on DAL's Shortform · 2025-03-18T15:56:36.329Z · LW · GW

AI has probably increased valuations for Big Tech (particularly Nvidia) by at least a few trillion dollars over the past two years. So part of this is that investors think OpenAI/Anthropic will only capture around 10% of total AI profits.

Comment by Josh You (scrafty) on OpenAI releases GPT-4.5 · 2025-03-02T02:42:01.743Z · LW · GW

65T tokens doesn't get you to 1e26 FLOP with 100B active params? You'd need well over 100T tokens: 6 * 100 billion * 65 trillion is 3.9e25 FLOP.
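As a quick sanity check with the standard 6ND approximation for dense-transformer pretraining compute (the parameter and token counts below are the figures under discussion, not confirmed values):

```python
# Dense-transformer pretraining compute: C ≈ 6 * N * D,
# where N = active parameters and D = training tokens.
N = 100e9   # 100B active params (the figure under discussion, not confirmed)
D = 65e12   # 65T tokens

print(f"{6 * N * D:.2e} FLOP")         # 3.90e+25, well short of 1e26

# Tokens needed to actually reach 1e26 FLOP at 100B active params:
print(f"{1e26 / (6 * N):.2e} tokens")  # 1.67e+14, i.e. ~167T tokens
```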

GPT-4.5 being trained on fewer tokens than GPT-4o doesn't really make sense. GPT-4.5 only having 5x more active params than GPT-4o doesn't quite make sense either, though I'm not as confident that's wrong.

1e26 FLOP would have had a significant opportunity cost. Remember that OpenAI was and is very GPU-constrained and may have valued GPU-hours in a large-scale cluster at a lot more than $2/hour. It would be worth it to make your flagship model good, but not worth it if it barely has any effect on your flagship model. I don't think it's a good idea to reason backwards from an alleged compute budget OpenAI might have had at a given date to the training FLOP of a model trained then.
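To put a rough dollar figure on that opportunity cost, here is a minimal sketch; the throughput, utilization, and price inputs are all assumptions for illustration, not known OpenAI figures:

```python
# Rough cost of a 1e26 FLOP training run at a given GPU-hour price.
# All inputs are illustrative assumptions, not known OpenAI figures.
target_flop = 1e26
peak_flops_per_gpu = 1e15     # ~H100-class BF16 throughput, FLOP/s (assumed)
utilization = 0.4             # assumed model FLOP utilization
price_per_gpu_hour = 2.0      # $/hour, the figure from the parent comment

effective_flop_per_gpu_hour = peak_flops_per_gpu * utilization * 3600
gpu_hours = target_flop / effective_flop_per_gpu_hour
print(f"{gpu_hours:.2e} GPU-hours")               # ~6.9e+07
print(f"${gpu_hours * price_per_gpu_hour:,.0f}")  # ~$139M at $2/hour
```

At internal valuations well above $2/hour, the effective cost scales up proportionally, which is the opportunity-cost point.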

Comment by Josh You (scrafty) on OpenAI releases GPT-4.5 · 2025-03-01T23:55:18.559Z · LW · GW

I don't think GPT-4o was trained on 1e26 FLOP or particularly close to it. Overtraining is common but GPT-4o being overtrained by 10x for 1e26 FLOP is kind of a strong and surprising claim (some models like Llama 3 8b are extremely overtrained but they're small so this overtraining is cheap). I think a more natural explanation is that it improves on GPT-4 because of superior post-training and other innovations.
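To make "overtrained by 10x" concrete: under the same 6ND approximation, and taking 10x overtrained to mean ten times the ~20-tokens-per-parameter Chinchilla ratio (both rules of thumb, not measurements), the claim implies a fairly hefty model:

```python
import math

# What a 10x-overtrained 1e26 FLOP model would look like under C = 6 * N * D,
# taking "10x overtrained" as D = 10 * 20 * N (Chinchilla-optimal is D ≈ 20N).
C = 1e26
tokens_per_param = 10 * 20

# Solve 6 * N * (tokens_per_param * N) = C for N:
N = math.sqrt(C / (6 * tokens_per_param))
D = tokens_per_param * N
print(f"N ≈ {N:.2e} active params")  # ~2.9e+11 (~290B)
print(f"D ≈ {D:.2e} tokens")         # ~5.8e+13 (~58T)
```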

Comment by Josh You (scrafty) on Daniel Kokotajlo's Shortform · 2025-03-01T23:40:49.719Z · LW · GW

The high cost and slow speed of GPT-4.5 seem like a sign OpenAI is facing data constraints, though we don't actually know the parameters and OpenAI might be charging a bigger margin than usual (it's a "research preview", not a flagship commercial product). If data were more abundant, wouldn't GPT-4.5 be more overtrained and have fewer parameters?

edit: FWIW, Artificial Analysis measures GPT-4.5 at a not-that-bad 50 tokens per second, whereas I've been experiencing a painfully slow 10-20 tokens/second in the chat app. So it may just be growing pains until they get more inference GPUs online. But OpenAI does call it a "chonky" model, implying significant parameter scaling.

Comment by Josh You (scrafty) on Vladimir_Nesov's Shortform · 2025-02-21T23:08:39.096Z · LW · GW

if OpenAI follows the usual naming convention of roughly 100x in raw compute.

I doubt this is a real convention. I think OpenAI wanted to call Orion GPT-5 if they thought it was good enough to deserve the name.

Comment by Josh You (scrafty) on Implications of the inference scaling paradigm for AI safety · 2025-01-15T15:52:57.046Z · LW · GW

In Holden Karnofsky's "AI Could Defeat All Of Us Combined" a plausible existential risk threat model is described, in which a swarm of human-level AIs outmanoeuvre humans due to AI's faster cognitive speeds and improved coordination, rather than qualitative superintelligence capabilities. This scenario is predicated on the belief that "once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each." If the first AGIs are as expensive to run as o3-high (costing ~$3k/task), this threat model seems much less plausible.
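A back-of-the-envelope sketch of why the ~$3k/task figure undercuts that scenario; the task length, work-year, and copy-count inputs are all illustrative assumptions:

```python
# Cost to run Karnofsky's "several hundred million copies for a year each"
# if each AGI were as expensive as o3-high. All inputs are assumptions.
cost_per_task = 3_000          # ~$3k/task, the o3-high figure above
hours_per_task = 2             # assumed: one "task" ≈ 2 hours of human work
work_hours_per_year = 2_000    # assumed full-time human work year
copies = 300e6                 # "several hundred million copies"

tasks_per_copy_year = work_hours_per_year / hours_per_task  # 1,000 tasks
cost_per_copy_year = cost_per_task * tasks_per_copy_year    # $3M per copy
print(f"${cost_per_copy_year * copies:.1e} per year")       # ~$9.0e+14
```

Hundreds of trillions of dollars per year is far beyond any plausible compute budget, which is the sense in which the threat model weakens.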

I wonder how different the reasoning paradigm is, actually, from the picture presented here. After all, running a huge number of AI copies in parallel is... scaling up test-time compute. 

The overhang argument is a rough analogy anyway. I think you are invoking the intuition of replacing the AI equivalent of a very large group of typical humans with the AI equivalent of a small number of ponderous geniuses, but those analogies are going to be highly imperfect in practice.

Comment by Josh You (scrafty) on Daniel Tan's Shortform · 2024-12-23T17:59:31.878Z · LW · GW

By several reports (e.g. here and here), OpenAI is throwing enormous amounts of training compute at o-series models. And if the new RL paradigm involves more decentralized training compute than the pretraining paradigm, that could lead to more consolidation into a few players, not less, because pretraining* is bottlenecked by the size of the largest cluster while decentralized RL training is bottlenecked by total compute. E.g. OpenAI's biggest single compute cluster is similar in size to xAI's, even though OpenAI has access to much more compute overall. But if it's just about who has the most compute, then the biggest players will win.

*though pretraining will probably shift to distributed training eventually

Comment by Josh You (scrafty) on The Great Data Integration Schlep · 2024-09-25T20:41:46.422Z · LW · GW

AI systems can presumably be given at least as much access to company data as human employees at that company. So if rapidly scaling up the number and quality of human workers at a given company would be transformative, AI agents with >= human-level intelligence can also be transformative.

Comment by Josh You (scrafty) on things that confuse me about the current AI market. · 2024-08-30T14:40:48.325Z · LW · GW

I think a little more explanation is required on why there isn't already a model with 5-10x* more compute than GPT-4 (which would be "4.5 level" given that GPT version numbers have historically gone up by 1 for every two OOMs, though I think the model literally called GPT-5 will only be a roughly 10x scale-up). 

You'd need around 100,000 H100s (or maybe somewhat fewer; Llama 3.1 was 2x GPT-4 and was trained using 16,000 H100s) to train a model at 10x GPT-4; rough arithmetic below. This has been available to the biggest hyperscalers since sometime last year. Naively it might take ~9 months from taking delivery of chips to releasing a model (perhaps 3 months to set up the cluster, 3 months for pre-training, and 3 months for post-training, evaluations, etc.). But most likely the engineering challenges of building an unprecedentedly large cluster, and perhaps high demand for inference, have prevented them from concentrating that much compute into one training run in time to release a model by now.

*I'm not totally sure the 5x threshold (1e26 FLOP) hasn't been breached but most people think it hasn't. 
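A sanity check on the 100,000-H100 figure; the peak throughput, utilization, and the ~2e25 FLOP estimate for GPT-4 are all assumptions, not measured values:

```python
# Does 100,000 H100s for ~3 months get you to ~10x GPT-4 (~2e26 FLOP)?
# Throughput, utilization, and the GPT-4 estimate are assumptions.
num_gpus = 100_000
peak_flops = 1e15         # ~H100-class BF16 dense throughput, FLOP/s
mfu = 0.4                 # assumed model FLOP utilization
seconds = 90 * 24 * 3600  # ~3 months of pretraining

total_flop = num_gpus * peak_flops * mfu * seconds
print(f"{total_flop:.1e} FLOP")           # ~3.1e+26
print(f"{total_flop / 2e25:.0f}x GPT-4")  # ~16x, if GPT-4 ≈ 2e25 FLOP
```

Coming in above 10x is consistent with "or maybe somewhat fewer" GPUs sufficing.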

Comment by Josh You (scrafty) on things that confuse me about the current AI market. · 2024-08-30T14:28:26.379Z · LW · GW

Llama 405B was trained on a bunch of synthetic data in post-training for coding, long-context prompts, and tool use (see section 4.3 of the paper).

Comment by Josh You (scrafty) on johnswentworth's Shortform · 2024-06-30T23:25:19.291Z · LW · GW

AI that can rewrite CUDA is a ways off. It's possible that it won't be that far away in calendar time, but it is far away in terms of AI market growth and hype cycles. If GPT-5 does well, Nvidia will reap the gains more than AMD or Google.

Comment by Josh You (scrafty) on What's the status of third vaccine doses? · 2021-08-04T21:14:16.702Z · LW · GW

The US is currently donating doses to other countries in large quantities. Domestically, it has around 54m doses distributed but not used right now. (https://covid.cdc.gov/covid-data-tracker/#vaccinations). Some but certainly not all of those are at risk of expiration. If US authorities recommended booster shots for the general population then that would easily use up the currently unused supply and reduce vaccine exports.

Comment by Josh You (scrafty) on 2014 Less Wrong Census/Survey · 2014-10-30T02:11:38.104Z · LW · GW

I did it, I did it, I did it, yay!

Comment by Josh You (scrafty) on Proportional Giving · 2014-03-03T04:34:02.823Z · LW · GW

A compromise that I find appealing and might implement for myself is giving a fixed percentage over a fixed amount, with that fixed percentage being relatively high (well above ten percent). You could also have multiple "donation brackets" with an increased marginal donation rate as your income increases.
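A minimal sketch of the bracket idea, with made-up thresholds and rates; it works exactly like marginal tax brackets:

```python
# Marginal "donation brackets": each rate applies only to the slice of
# income above its threshold, like tax brackets. Numbers are made up.
BRACKETS = [
    (50_000, 0.10),   # 10% of income between $50k and $100k
    (100_000, 0.25),  # 25% of income between $100k and $200k
    (200_000, 0.50),  # 50% of income above $200k
]

def donation(income: float) -> float:
    total = 0.0
    for i, (threshold, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > threshold:
            total += rate * (min(income, upper) - threshold)
    return total

print(donation(150_000))  # 0.10*50k + 0.25*50k = $17,500
```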

Comment by Josh You (scrafty) on How big of an impact would cleaner political debates have on society? · 2014-02-07T01:24:49.755Z · LW · GW

I doubt an IQ test would be useful at all. One has to be quite intelligent to be a serious candidate for the presidency.

Comment by Josh You (scrafty) on Arthur Chu: Jeopardy! champion through exemplary rationality · 2014-02-03T05:53:01.313Z · LW · GW

He also likes arguing with Jeff Kaufman about effective altruism.

Comment by Josh You (scrafty) on Physics grad student: how to build employability in programming & finance · 2014-01-10T02:32:24.304Z · LW · GW

Probably shouldn't say someone "probably" has an IQ between 145 and 160 unless you have pretty good evidence.

Comment by Josh You (scrafty) on [LINK] Why I'm not on the Rationalist Masterlist · 2014-01-06T05:14:00.706Z · LW · GW

I think it makes a big difference if the preferred theory is gender/racial equality as opposed to fundamentalist Christianity, and whether the opposition to those perceived challenges result from emotional sensitivity as opposed to blind faith. At the very least, the blog post doesn't indicate that the author would be irrational about issues other than marginalization.

Comment by Josh You (scrafty) on Fascists and Rakes · 2014-01-06T00:33:47.416Z · LW · GW

I don't see how the fact that the permissiveness principle is based on only one (two, actually, counting the third) of the six foundations would imply that it's not a widely-held intuition.

Comment by Josh You (scrafty) on December Monthly Bragging Thread · 2013-12-04T02:08:48.493Z · LW · GW

How risk-averse are you? But even if you aren't, I suspect that right now bitcoins aren't a great investment strictly in expected-value terms due to the high risk that they will decline in value by a lot. No one really knows what will happen, though.

Comment by Josh You (scrafty) on A critique of effective altruism · 2013-12-03T01:50:22.731Z · LW · GW

Another possible critique is that the philosophical arguments for ethical egoism are (I think) at least fairly plausible. The extent to which this is a critique of EA is debatable (since people within the movement state that it's compatible with non-utilitarian ethical theories and that it appeals to people who want to donate for self-interested reasons) but it's something which merits consideration.

Comment by Josh You (scrafty) on Some thoughts on relations between major ethical systems · 2013-11-26T03:56:48.220Z · LW · GW

Ehh, I think that's pretty much what rule util means, though I'm not that familiar with the nuances of the definition so take my opinion with a grain of salt. Rule util posits that we follow those rules with the intent of promoting the good; that's why it's called rule utilitarianism.

Comment by Josh You (scrafty) on Some thoughts on relations between major ethical systems · 2013-11-25T18:40:18.371Z · LW · GW

That would be a form of deontology, yes. I'm not sure which action neo-Kantians would actually endorse in that situation, though.

Comment by Josh You (scrafty) on Some thoughts on relations between major ethical systems · 2013-11-25T17:24:53.061Z · LW · GW

I think that's accurate, though maybe not, because the programming jargon is unnecessarily obfuscating. The basic point is that following the rule is good in and of itself. You shouldn't kill people because there is a value in not killing that is independent of the outcome of that choice.

Comment by Josh You (scrafty) on Some thoughts on relations between major ethical systems · 2013-11-25T05:54:14.797Z · LW · GW

Your description of deontological ethics sounds closer to rule consequentialism, which is a different concept. Deontology means that following certain rules is good in and of itself, not because they lead to better decisionmaking (in terms of promoting some other good) in situations of uncertainty.

Comment by Josh You (scrafty) on 2013 Less Wrong Census/Survey · 2013-11-22T17:51:06.478Z · LW · GW

Survey taken. Defected since I'm neutral as to whether the money goes to Yvain or a random survey-taker, but would prefer the money going to me over either of those two.