How to Model the Future of Open-Source LLMs?

post by Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z

This is a question post.

I previously expected open-source LLMs to lag far behind the frontier because they're very expensive to train and naively it doesn't make business sense to spend on the order of $10M to (soon?) $1B to train a model only to give it away for free.

But this has been repeatedly challenged, most recently by Meta's Llama 3. They seem to be pursuing something like a commoditize-your-complement strategy: https://twitter.com/willkurt/status/1781157913114870187 .

As models become orders of magnitude more expensive to train, can we expect companies to continue to open-source them?

In particular, can we expect this of Meta?

Answers

answer by gwern · 2024-04-20T20:03:33.239Z

Yes. Commoditize-your-complement dynamics do not come with any set number. They can justify an expense of thousands of dollars, or of billions - it all depends on the context. If you are in a big enough industry, and the profits at stake are large enough, and the investment in question is critical enough, you can justify any number as +EV. (Think of it less as 'investment' and more as 'buying insurance'. Meta's market cap (ticker META) is ~$1,230 billion right now; how much insurance should its leaders buy against the periodic emergence of new platforms or possible paradigm shifts? Definitely at least in the single billions, one would think...)
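To make the 'buying insurance' framing concrete, here is a toy expected-value check. It is a sketch, not anything from the answer: the disruption probability, the fraction of value lost, and how much a weights release reduces that risk are all hypothetical placeholders; only the ~$1,230 billion market cap comes from the answer itself.

```python
"""Toy expected-value check for the 'buying insurance' framing.

Every probability and fraction below is a hypothetical placeholder; only the
market-cap figure is taken from the answer.
"""

MARKET_CAP_USD = 1_230e9  # ~$1,230 billion, as quoted in the answer


def expected_loss_avoided(p_disruption: float, fraction_lost: float,
                          risk_reduction: float,
                          market_cap: float = MARKET_CAP_USD) -> float:
    """Expected value protected by a spend that lowers disruption risk.

    p_disruption:   chance a new platform or paradigm shift hits the business
    fraction_lost:  share of market cap lost if it does
    risk_reduction: share of that risk the spend removes
    """
    return p_disruption * fraction_lost * market_cap * risk_reduction


def is_positive_ev(spend: float, p_disruption: float, fraction_lost: float,
                   risk_reduction: float) -> bool:
    """True if the expected loss avoided exceeds what the 'insurance' costs."""
    return expected_loss_avoided(p_disruption, fraction_lost, risk_reduction) > spend


if __name__ == "__main__":
    # Hypothetical: 10% chance of disruption, 20% of value lost, release cuts risk by 10%.
    protected = expected_loss_avoided(0.1, 0.2, 0.1)
    print(f"expected loss avoided: ${protected / 1e9:.2f}B")
    print("$1B weights release is +EV:", is_positive_ev(1e9, 0.1, 0.2, 0.1))
```

With these made-up inputs the protected value comes to about $2.5 billion, so a $1 billion release clears the bar. The point is only that at this market cap even modest probabilities can justify billion-dollar spends, not that these particular inputs are right.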

And investments of $10m are highly routine and ordinary, and people have already released weights (note: most of these AI releases are not 'open source', including Llama-3) for models with easily $10m of investment before. (Given that a good ML researcher-engineer could have a fully-loaded cost of $1m/year, if you have a small team of 10 and they release a model per year, then you already hit $10m spent the first year.) Consider Linux: if you wanted to make a Linux kernel replacement today, one that is as battle-tested and supports as many things as Linux does, that would probably cost you at least $10 billion, and the creation of Linux has been principally bankrolled by many companies collectively paying for development (for a myriad of reasons and ways). Or consider Android Linux. (Or go through my list and think about how much money it must take to do things like RISC-V.)

If Zuckerberg feels that LLMs are enough of a threat to the Facebook advertising model, or could enable a new social media platform which could potentially supersede Facebook (like Instagram and WhatsApp were), then he certainly could justify throwing a billion dollars of compute at a weights release in order to shatter the potential competition into a commoditized race-to-the-bottom. (He's already blown much, much more on VR.)

The main prediction, I think, of commoditize-your-complement is that there is not much benefit to creating the leading-edge model or surpassing the SOTA by a lot. Your motivation is to release the cheapest model which serves as a spoiler model. So Llama-3 doesn't have to be better than GPT-4 to spoil the market for OA: it just needs to be 'good enough'. If you can do that by slightly beating GPT-4, then great. (But there's no appetite to do some amazing moonshot far surpassing SOTA.)

However, because LLMs are moving so fast, this isn't necessarily too useful to point out: Zuckerberg's goal with Llama-3 is not to spoil GPT-4 (which has already been accomplished by Claude-3 and Databricks and some others, I think), but to spoil GPT-5 as well as Claude-4 and unknown competitors. You have to skate to where the puck will be, because if you wait for GPT-5 to fully come out before you start spinning up your commoditizer model, your teams will have gone stale and your infrastructure will have rotted, you'll lose a lot of time, and who knows what will happen with GPT-5 before you finally catch up.

The real killer of Facebook investment would be the threat disappearing and permanent commoditization setting in, perhaps by LLMs sigmoiding hard and starting to look like a fad like 3D TVs. For example, if GPT-5 came out and it was barely distinguishable from GPT-4 and nothing else impressive happened and "DL hit a wall" at long last, then Llama-4 would probably still happen at full strength - since Zuck already bought all those GPUs - but then I would expect a Llama-5 to be much less impressive and be coasting on fumes and not receive another 10 or 100x scaleup, and Facebook DL R&D would return to normal conditions.

EDIT: see https://thezvi.wordpress.com/2024/04/22/on-llama-3-and-dwarkesh-patels-podcast-with-zuckerberg/

answer by Aaron_Scher · 2024-04-19T21:16:55.598Z

Yeah, I think we should expect much more powerful open source AIs than we have now. I've been working on a blog post about this, maybe I'll get it out soon. Here are what seem like the dominant arguments to me:

Scaling curves show strongly diminishing returns to $ spend: A $100m model might not be that far behind a $1b model, performance wise.
There are numerous (maybe 7) actors in the open source world who are at least moderately competent and want to open source powerful models. There is a niche in the market for powerful open source models, and they hurt your closed-source competitors.
I expect there is still tons of low-hanging fruit available in LLM capabilities land. You could call this "algorithmic progress" if you want. This will decrease the compute cost necessary to get a given level of performance, thus raising the AI capability level accessible to less-resourced open-source AI projects. [edit: but not exclusively open-source projects (this will benefit closed developers too). This argument is about the absolute level of capabilities available to the public, not about the gap between open and closed source.]

↑ comment by p.b. · 2024-04-21T14:12:35.109Z

> Scaling curves show strongly diminishing returns to $ spend: A $100m model might not be that far behind a $1b model, performance wise.

What's your argument for that?

↑ comment by Seth Herd · 2024-04-22T03:29:45.218Z

> I expect there is still tons of low-hanging fruit available in LLM capabilities land. You could call this "algorithmic progress" if you want. This will decrease the compute cost necessary to get a given level of performance, thus raising the AI capability level accessible to less-resourced open-source AI projects.

Don't you expect many of those improvements to remain closed-source from here on out, benefitting the teams that developed them at great (average) expense? And even the ones that are published freely will benefit the leaders just as much as their open-source chasers.

↑ comment by Aaron_Scher · 2024-04-22T18:05:42.684Z

Um, looking at the scaling curves and seeing diminishing returns? I think this pattern is very clear for metrics like general text prediction (cross-entropy loss on large texts), less clear for standard capability benchmarks, and to-be-determined for complex tasks which may be economically valuable.

General text prediction: see Chinchilla, Fig 1 of the GPT-4 technical report
Capability benchmarks: see epoch post, the ~4th figure here
Complex tasks: See GDM dangerous capability evals (Fig 9, which indicates Ultra is not much better than Pro, despite likely being trained on >5x the compute, though training details not public)

To be clear, I'm not saying that a $100m model will be very close to a $1b model. I'm saying that the trends indicate they will be much closer than you would think if you only thought about how big a 10x difference in training compute is, without being aware of the empirical trends of diminishing returns. The empirical trends indicate this will be a relatively small difference, but we don't have nearly enough data for economically valuable tasks / complex tasks to be confident about this.
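The diminishing returns in loss can be illustrated with the parametric scaling law fitted in the Chinchilla paper (Hoffmann et al., 2022), L(N, D) = E + A/N^α + B/D^β, with training compute approximated as C ≈ 6ND. The sketch below uses that paper's fitted constants, quoted approximately, and two hypothetical compute budgets 10x apart as stand-ins for a "$100m" and a "$1b" run; the dollar-to-FLOP mapping is an assumption, not something from the thread.

```python
"""Diminishing returns in loss under a Chinchilla-style scaling law.

Constants approximate the parametric fit reported by Hoffmann et al. (2022);
the two compute budgets are hypothetical stand-ins, 10x apart.
"""
from typing import Tuple

E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted coefficients
ALPHA, BETA = 0.34, 0.28        # exponents for parameters and data


def compute_optimal_split(flops: float) -> Tuple[float, float]:
    """Closed-form params N and tokens D minimising loss subject to C = 6*N*D."""
    k = flops / 6.0
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n = g * k ** (BETA / (ALPHA + BETA))
    d = k / n
    return n, d


def loss(n: float, d: float) -> float:
    """Parametric loss L(N, D) = E + A/N^alpha + B/D^beta."""
    return E + A / n**ALPHA + B / d**BETA


def optimal_loss(flops: float) -> float:
    return loss(*compute_optimal_split(flops))


if __name__ == "__main__":
    small, large = 1e25, 1e26  # hypothetical budgets, 10x apart
    for c in (small, large):
        n, d = compute_optimal_split(c)
        print(f"C={c:.0e}  N={n:.2e}  D={d:.2e}  loss={optimal_loss(c):.3f}")
    kept = (optimal_loss(large) - E) / (optimal_loss(small) - E)
    print(f"share of reducible loss left after 10x compute: {kept:.2f}")
```

Under this fit the reducible part of the loss at the compute-optimal split scales as C^(-αβ/(α+β)), so a 10x budget still leaves roughly 70% of it. This is a statement about loss only; it says nothing directly about benchmark or economically valuable capabilities.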

↑ comment by p.b. · 2024-04-22T20:05:26.616Z

Diminishing returns in loss are not diminishing returns in capabilities. And benchmarks tend to saturate, so diminishing returns are baked in if you look at those.

I am not saying that there aren't diminishing returns to scale, but I just haven't seen anything definitive yet.

↑ comment by Tristan Wegner (tristan-wegner) · 2024-04-22T10:42:35.785Z

I agree with the premise, but not the conclusion, of your last point. Any open-source development that significantly lowers the resource requirements can also be used by closed models to just increase their model/training size for the same cost, thus keeping the gap.

↑ comment by Aaron_Scher · 2024-04-22T17:48:03.468Z

Yeah, these developments benefit closed-source actors too. I think my wording was not precise, and I'll edit it. This argument about algorithmic improvement is an argument that we will have powerful open-source models (and powerful closed-source models), not that the gap between these will necessarily shrink. I think both the gap and the absolute level of capabilities which are open-source are important facts to be modeling. And this argument is mainly about the latter.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-04-23T05:26:07.417Z

Unless there is a 'peak-capabilities wall' that current architectures hit and that the combined effects of compute-efficiency-improving algorithmic progress don't overcome. In that case, the gap would close, because any big companies that tried to get ahead by naively increasing compute and keeping a few hidden algorithmic advantages would be unable to get very far ahead because of the 'peak-capabilities wall'. It would get cheaper to get to the wall, but once there, extra money/compute/data would be wasted. Thus, a shrinking-gap world.

I'm not sure if there will be a 'peak-capabilities wall' in this way, or if the algorithmic advancements will be creative enough to get around it. The shape of the future in this regard seems highly uncertain to me. I do think it's theoretically possible to get substantial improvements in peak capabilities and also in training/inference efficiencies. Will such improvements keep arriving relatively gradually as they have been? Will there be a sudden glut at some point when the models hit a threshold where they can be used to seek and find algorithmic improvements? Very unclear.
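The two positions in this exchange (algorithmic progress keeps the gap vs. a wall closes it) can be contrasted in a toy model. This is a hypothetical illustration, not something either commenter proposed: capability is scored as log10 of "effective compute" (raw compute times an algorithmic-efficiency multiplier that both sides share), the closed lab spends 10x what the open one does, and an optional wall caps the score. All numbers are made up.

```python
"""Toy model of the open/closed capability gap. All numbers are hypothetical.

Capability is scored as log10(effective compute), where effective compute is
raw training FLOPs times an algorithmic-efficiency multiplier shared by both
sides. An optional 'peak-capabilities wall' caps the score regardless of spend.
"""
import math
from typing import List, Optional


def capability(raw_flops: float, algo_multiplier: float,
               wall: Optional[float] = None) -> float:
    score = math.log10(raw_flops * algo_multiplier)
    return score if wall is None else min(score, wall)


def gap_over_time(years: int, open_flops: float, closed_flops: float,
                  algo_growth_per_year: float,
                  wall: Optional[float] = None) -> List[float]:
    """Closed-minus-open capability for each year 0..years."""
    gaps = []
    for t in range(years + 1):
        mult = algo_growth_per_year ** t  # algorithmic gains reach both sides
        closed = capability(closed_flops, mult, wall)
        open_ = capability(open_flops, mult, wall)
        gaps.append(closed - open_)
    return gaps


if __name__ == "__main__":
    # Hypothetical: open side trains at 1e25 FLOPs, closed at 1e26,
    # and shared algorithmic progress gives 3x effective compute per year.
    no_wall = gap_over_time(5, 1e25, 1e26, 3.0)
    walled = gap_over_time(5, 1e25, 1e26, 3.0, wall=27.0)
    print("no wall:  ", [round(g, 2) for g in no_wall])
    print("with wall:", [round(g, 2) for g in walled])
```

Without a wall the gap stays at one order of magnitude of effective compute every year, which is the "keeping the gap" case; with a wall at 27 the closed lab hits the cap first and the open side catches up, so the gap shrinks to zero, which is the shrinking-gap world. Whether such a wall exists is exactly the open question.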
