How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
post by Thane Ruthenis · 2025-03-04T16:23:39.296Z · LW · GW · 1 comment · This is a question post.
Contents
Answers: Davidmanheim (7) · faul_sname (6) · CBiddulph (5) · Noosphere89 (4) · Cole Wyeth (4) · wachichornia (1) · chezdiogenes (1) · 1 comment
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their productivity, up to 5x'ing/10x'ing it.
It seems clear that this multiplier isn't field-wide, at least. There's no corresponding increase in output, after all.
This would make sense. If you're doing anything nontrivial (i. e., anything other than adding minor boilerplate features to your codebase), LLM tools are fiddly. Out-of-the-box solutions don't Just Work for that purpose. You need to significantly adjust your workflow to make use of them, if that's even possible. Most programmers wouldn't know how to do that/wouldn't care to bother.
It's therefore reasonable to assume that a 5x/10x greater output, if it exists, is unevenly distributed, mostly affecting power users/people particularly talented at using LLMs.
Empirically, we likewise don't seem to be living in the world where the whole software industry is suddenly 5-10 times more productive. It'll have been the case for 1-2 years now, and I, at least, have felt approximately zero impact. I don't see 5-10x more useful features in the software I use, or 5-10x more software that's useful to me, or that the software I'm using is suddenly working 5-10x better, etc.
However, I'm also struggling to see the supposed 5-10x'ing anywhere else. If power users are experiencing this much improvement, what projects were enabled by it?
Previously, I'd assumed I didn't know just because I'm living under a rock. So I've tried to get Deep Research to fetch me an overview, and it... also struggled to find anything concrete. Judge for yourself: one, two. The COBOL refactor counts, but that's about it. (Maybe I'm bad at prompting it?)
Even the AGI labs' customer-facing offerings aren't an endless trove of rich features for interfacing with their LLMs in sophisticated ways – even though you'd assume there'd be an unusual concentration of power users there. You have a dialogue box and can upload PDFs to it, that's about it. You can't get the LLM to interface with an ever-growing list of arbitrary software and data types, there isn't an endless list of QoL features that you can turn on/off on demand, etc.[1]
So I'm asking LW now: What's the real-world impact? What projects/advancements exist now that wouldn't have existed without LLMs? And if none of that is publicly attributed to LLMs, what projects have appeared suspiciously fast, such that, on sober analysis, they couldn't have been spun up this quickly in the dark pre-LLM ages? What slice through the programming ecosystem is experiencing 10x growth, if any?
And if we assume that this is going to proliferate, with all programmers attaining the same productivity boost as the early adopters are experiencing now, what would be the real-world impact?
To clarify, what I'm not asking for is:
- Reports full of vague hype about 10x'ing productivity, with no clear attribution regarding what project this 10x'd productivity enabled. (Twitter is full of those, but light on useful stuff actually being shipped.)
- Abstract economic indicators that suggest X% productivity gains. (This could mean anything, including an LLM-based bubble.)
- Abstract indicators to the tune of "this analysis shows Y% more code has been produced in the last quarter". (This can just indicate AI producing code slop/bloat).
- Abstract economic indicators that suggest Z% of developers have been laid off/junior devs can't find work anymore. (Which may be mostly a return to the pre-COVID normal trends.)
- Useless toy examples like "I used ChatGPT to generate the 1000th clone of Snake/of this website!".
- New tools/functions that are LLM wrappers, as opposed to being created via LLM help. (I'm not looking for LLMs-as-a-service, I'm looking for "mundane" outputs that were produced much faster/better due to LLM help.)
I. e.: I want concrete, important real-life consequences.
From the fact that I've observed none of them so far, and in the spirit of Cunningham's Law, here's a tentative conspiracy theory: LLMs mostly do not actually boost programmer productivity on net. Instead:
- N hours that a programmer saves by generating code via an LLM are then re-wasted fixing/untangling that code.
- At a macro-scale, this sometimes leads to "climbing up where you can't get down", where you use an LLM to generate a massive codebase, then it gets confused once a size/complexity threshold is passed, and then you have to start from scratch because the LLM made atrocious/alien architectural decisions. This likewise destroys (almost?) all apparent productivity gains.
- Inasmuch as LLMs actually do lead to people creating new software, it's mostly one-off trinkets/proofs of concept that nobody ends up using and which didn't need to exist. But it still "feels" like your productivity has skyrocketed.
- Inasmuch as LLMs actually do increase the amount of code that goes into useful applications, it mostly ends up spent on creating bloatware/services that don't need to exist. I. e., it actually makes the shipped software worse, because it's written more lazily.
- People who experience LLMs improving their workflows are mostly fooled by the magical effect of asking an LLM to do something in natural language and then immediately getting kinda-working code in response. They fail to track how much they spend integrating and fixing this code, and/or how much the code is actually used.
I don't fully believe this conspiracy theory; it feels like it can't possibly be true. But it suddenly seems very compelling.
I expect LLMs have definitely been useful for writing minor features, or for helping people inexperienced with programming/with a specific library/with a specific codebase get started more easily and learn faster. They've been useful for me in those capacities. But it's probably like a 10-30% overall boost, plus flat cost reductions for starting in new domains and for some rare one-off projects like "do a trivial refactor".
And this is mostly where it'll stay unless AGI labs actually crack long-horizon agency/innovations; i. e., basically until genuine AGI is actually there.
Prove me wrong, I guess.
[1] Just as some concrete examples: Anthropic took ages to add LaTeX support, and why weren't RL-less Deep Research clones offered as a default option by literally everyone 1.5 years ago?
Answers
My understanding of the situation, from speaking to people in normal firms who code, is that this is all about the theory of constraints. As a simplified example, if you previously needed one business analyst, one QA tester, and one programmer, a day each, to do a task, and the programmer's efficiency doubles, or quintuples, the impact on output is zero, because the firm isn't set up to go much faster.
Firms need to rebuild their processes around this to take advantage, and that's only starting to happen, and only at some firms.
↑ comment by Thane Ruthenis · 2025-03-04T18:01:05.407Z · LW(p) · GW(p)
That makes perfect sense to me.
However, shouldn't we expect an immediate impact on startups led by early adopters, which could be designed from the ground up around exploiting LLMs? Similarly with AGI labs: you'd expect them to rapidly reform around that.
Yet, I haven't heard of a single example of a startup shipping a five-year/ten-year project in one year, or a new software firm eating the lunch of ossified firms ten times its size.
↑ comment by Mis-Understandings (robert-k) · 2025-03-04T18:54:06.556Z · LW(p) · GW(p)
There is no reorganization that can increase the tempo of other organizations (the pace of customer feedback), which is often the key bottleneck in software already. This speed dynamic is not new; it is just in sharper focus.
Inasmuch as LLMs actually do lead to people creating new software, it's mostly one-off trinkets/proofs of concept that nobody ends up using and which didn't need to exist. But it still "feels" like your productivity has skyrocketed.
I've personally found that the task of "build UI mocks for the stakeholders", which was previously ~10% of the time I spent on my job, has gotten probably 5x faster with LLMs. That said, the amount of time I spend doing that part of my job hasn't really gone down, it's just that the UI mocks are now a lot more detailed and interactive and go through more iterations, which IMO leads to considerably better products.
"This code will be thrown away" is not the same thing as "there is no benefit in causing this code to exist".
The other notable area I've seen benefits is in finding answers to search-engine-proof questions - saying "I observe this error within a task running on xyz stack, here is how I have kubernetes configured; what concrete steps can I take to debug the system?"
But it's probably like a 10-30% overall boost, plus flat cost reductions for starting in new domains and for some rare one-off projects like "do a trivial refactor".
Sounds about right - "10-30% overall productivity boost, higher at the start of projects, lower for messy tangled legacy stuff" aligns with my observations, with the nuance that it's not that I am 10-30% more effective at all of my tasks, but rather that I am many times more effective at a few of my tasks and have no notable gains on most of them.
And this is mostly where it'll stay unless AGI labs actually crack long-horizon agency/innovations; i. e., basically until genuine AGI is actually there.
FWIW I think the bottleneck here is mostly context management rather than agency.
↑ comment by RussellThor · 2025-03-04T20:01:48.773Z · LW(p) · GW(p)
Yes to much of this. For small tasks, or where I don't have specialist knowledge, I can get a 10x speed increase - on average I would put it at 20%. Smart autocomplete like Cursor is undoubtedly a speedup with no apparent downside. The LLM is still especially weak where I am doing data science or algorithm-type work, where you need to plot the results and look at the graph to know if you are making progress.
Yeah, 5x or 10x productivity gains from AI for any one developer seem pretty high, and maybe implausible in most cases. However, note that if 10 people in a 1,000-person company get a 10x speedup, that's only a ~10% overall speedup, which is significant but not enough that you'd expect to be able to clearly point at the company's output and say "wow, they clearly sped up because of AI."
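A quick back-of-the-envelope check of that figure, under the simplifying assumption that developer output just adds linearly across the company:

$$\frac{990 \cdot 1 + 10 \cdot 10}{1000} = \frac{1090}{1000} \approx 1.09,$$

i.e. roughly a 9-10% company-wide speedup.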
For me, I'd say a lot of my gains come from asking AI questions rather than generating code directly. Generating code is also useful though, especially for small snippets like regex. At Google, we have something similar to Copilot that autocompletes code, which is one of the most useful AI features IMO since the generated code is always small enough to understand. 25% of code at Google is now generated by AI, a statistic which probably mostly comes from that feature.
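As a hypothetical illustration of the scale of snippet in question (this date-extraction example is made up, not something from my actual work): a few lines that are quick to generate and just as quick to verify by eye.

```python
import re

# Small, self-contained snippet of the kind an LLM can generate reliably:
# pull ISO dates (YYYY-MM-DD) out of free-form text.
ISO_DATE = re.compile(r"\b(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b")

text = "Released 2025-03-04; follow-up planned for 2025-12-31."
print(ISO_DATE.findall(text))  # [('2025', '03', '04'), ('2025', '12', '31')]
```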
There are a few small PRs I wrote for my job which were probably 5-10x faster than the counterfactual where I couldn't use AI, where I pretty much had the AI write the whole thing and edited from there. But these were one-offs where I wasn't very familiar with the language (SQL, HTML), which means it would have been unusually slow without AI.
↑ comment by Thane Ruthenis · 2025-03-04T18:14:11.135Z · LW(p) · GW(p)
However, note that if 10 people in a 1,000-person company get a 10x speedup, that's only a ~10% overall speedup
Plausible. Potential counter-argument: software engineers aren't equivalent, "10x engineers" are a thing, and if we assume they're high-agency people always looking to streamline and improve their workflows, we should expect them to be precisely the people who get a further 10x boost from LLMs. Have you observed any specific people suddenly becoming 10x more prolific?
But these were one-offs where I wasn't very familiar with the language (SQL, HTML)
Flat cost reductions, yeah. Though, uniformly slashing the costs on becoming proficient in new sub-domains of programming perhaps could have nontrivial effects on the software industry as a whole...
For example, perhaps the actual impact of LLMs should instead be modeled as all (competent) programmers effectively becoming able to use any and all programming languages/tools at offer (plus knowing of the existence of these tools)? Which, in idealized theory, should lead to every piece of a software project being built using the best tools available for it, rather than being warped by what the specific developer happened to be proficient in.
Re AI coding, some interesting thoughts on this come from Ajeya Cotra's talks (short form: there are a lot of weaknesses, but real-world programmer productivity gains are surprisingly high for coding tasks and very low outside of coding tasks, which is why AI's impact is limited so far):
https://x.com/ajeya_cotra/status/1894821432854749456
https://x.com/ajeya_cotra/status/1895161774376436147
Re this:
And this is mostly where it'll stay unless AGI labs actually crack long-horizon agency/innovations; i. e., basically until genuine AGI is actually there.
Prove me wrong, I guess.
My main takeaway is actually kind of different: there's less of a core of generality on the default path than people originally thought, and while there is some entanglement between capabilities, there can also be weird spikes and deficits, which makes the term AGI a lot less useful than people thought.
As you probably know, I have been endorsing this "conspiracy theory" for some time, e.g. roughly here: https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms [LW · GW]
My current job is to develop PoCs and iterate over user feedback. It is a lot of basic and boilerplate work. I am handling three projects at the same time, when before Cursor I would have been managing one and taking longer. I suck at UI, and Cursor simply solved this for me. We have shipped one of the tools and are finalizing shipping the second one, but they are indeed LLM wrappers, designed to summarize or analyze text for customer support purposes and GTR-related stuff. UI iteration, however, has been immensely helped and accelerated.
1 comment
comment by Tao Lin (tao-lin) · 2025-03-04T18:18:54.198Z · LW(p) · GW(p)
Empirically, we likewise don't seem to be living in the world where the whole software industry is suddenly 5-10 times more productive. It'll have been the case for 1-2 years now, and I, at least, have felt approximately zero impact. I don't see 5-10x more useful features in the software I use, or 5-10x more software that's useful to me, or that the software I'm using is suddenly working 5-10x better, etc.
Diminishing returns! Scaling laws! One concrete version of "5x productivity" is "as much productivity as 5 copies of me in parallel", and we know that 5x-ing most inputs - training compute and data, # of employees, etc. - more often scales output logarithmically than linearly.
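A minimal sketch of that intuition, assuming output scales roughly logarithmically with an input of size x:

$$\text{output}(5x) \approx \log(5x) = \log(x) + \log 5,$$

so 5x-ing the input adds a fixed increment to the output rather than multiplying it by 5.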