Is this the beginning of the end for LLMs [as the royal road to AGI, whatever that is]?

post by Bill Benzon (bill-benzon) · 2023-08-24T14:50:19.312Z · LW · GW · 15 comments

It’s hard to tell, but it sure is...shall we say...interesting.

Back in the summer of 2020 when GPT-3 was unveiled I wrote a working paper, GPT-3: Waterloo or Rubicon? Here be Dragons. My objective was to convince myself that the underlying technology wasn’t just some weird statistical fluke, that there was in fact something going on of substantial interest and value. To my mind, I succeeded in that. But I was skeptical as well.

Here's what I put on the first page of that working paper, even before the abstract:

GPT-3 is a significant achievement.
But I fear the community that has created it may, like other communities have done before – machine translation in the mid-1960s, symbolic computing in the mid-1980s, triumphantly walk over the edge of a cliff and find itself standing proudly in mid-air.
This is not necessary and certainly not inevitable.
A great deal has been written about GPTs and transformers more generally, both in the technical literature and in commentary of various levels of sophistication. I have read only a small portion of this. But nothing I have read indicates any interest in the nature of language or mind. Interest seems relegated to the GPT engine itself. And yet the product of that engine, a language model, is opaque. I believe that, if we are to move to a level of accomplishment beyond what has been exhibited to date, we must understand what that engine is doing so that we may gain control over it. We must think about the nature of language and of the mind.

I didn’t expect that anyone with any influence in these matters would pay any attention to me – though one can always hope – but that’s no reason not to write.

That was 2020 and GPT-3. Two years later ChatGPT was launched to great acclaim, and justly so. I certainly spent a great deal of time playing with it, investigating it, and writing about it. But I didn’t forget my cautionary remarks from 2020.

Now we’re hearing rumblings that things aren’t working out so well. Back on August 12 the ever skeptical Gary Marcus posted, What if Generative AI turned out to be a Dud? Some possible economic and geopolitical implications. His first two paragraphs:

With the possible exception of the quick to rise and quick to fall alleged room-temperature superconductor LK-99, few things I have ever seen have been more hyped than generative AI. Valuations for many companies are in the billions, coverage in the news is literally constant; it’s all anyone can talk about from Silicon Valley to Washington DC to Geneva.
But, to begin with, the revenue isn’t there yet, and might never come. The valuations anticipate trillion dollar markets, but the actual current revenues from generative AI are rumored to be in the hundreds of millions. Those revenues genuinely could grow by 1000x, but that’s mighty speculative. We shouldn’t simply assume it.

And his last:

If hallucinations aren’t fixable, generative AI probably isn’t going to make a trillion dollars a year. And if it probably isn’t going to make a trillion dollars a year, it probably isn’t going to have the impact people seem to be expecting. And if it isn’t going to have that impact, maybe we should not be building our world around the premise that it is.

FWIW, I believe, and have been saying time and again, that hallucinations seem to me to be inherent in the technology. They aren’t fixable.

Now, yesterday, Ted Gioia, a culture critic with an interest in technology and experience in business, posted Ugly Numbers from Microsoft and ChatGPT Reveal that AI Demand is Already Shrinking. Where Marcus has a professional interest in AI technology and has intellectual skin in the tech game, Gioia is just a sophisticated and interested observer. Near the end of his post, after many links to unfavorable stories, Gioia observes:

... we can see that the real tech story of 2023 is NOT how AI made everything great. Instead this will be remembered as the year when huge corporations unleashed a half-baked and dangerous technology on a skeptical public—and consumers pushed back.
Here’s what we now know about AI:
Consumer demand is low, and already appears to be shrinking.
Skepticism and suspicion are pervasive among the public.
Even the companies using AI typically try to hide that fact—because they’re aware of the backlash.
The areas where AI has been implemented make clear how poorly it performs.
AI potentially creates a situation where millions of people can be fired and replaced with bots—so a few people at the top continue to promote it despite all these warning signs.
But even these true believers now face huge legal, regulatory, and attitudinal obstacles
In the meantime, cheaters and criminals are taking full advantage of AI as a tool of deception.

Marcus has just updated his earlier post with a followup: The Rise and Fall of ChatGPT?

The situation is very volatile. I certainly don’t know how to predict how things are going to unfold. In the long run, I remain convinced that if we are to move to a level of accomplishment beyond what has been exhibited to date, we must understand what these engines are doing so that we may gain control over them. We must think about the nature of language and of the mind.

Stay tuned.

Cross-posted from New Savanna.

15 comments

Comments sorted by top scores.

comment by Quintin Pope (quintin-pope) · 2023-08-24T16:40:06.090Z · LW(p) · GW(p)

Out of curiosity, I skimmed the Ted Gioia linked article and encountered this absolutely wild sentence:

AI is getting more sycophantic and willing to agree with false statements over time.

which is just such a complete misunderstanding of the results from Discovering Language Model Behaviors with Model-Written Evaluations. Instantly disqualified the author from being someone I'd pay attention to for AI-related analysis.

comment by frontier64 · 2023-08-24T20:56:27.743Z · LW(p) · GW(p)

I don't think the body of this post is related to the title. Whether a framework outlines a path to AGI has little to do with consumer takeup of an earlier product based on the same framework.

comment by Paul Tiplady (paul-tiplady) · 2023-08-24T19:47:10.885Z · LW(p) · GW(p)

While of course this is easy to rationalize post hoc, I don’t think falling user count of ChatGPT is a particularly useful signal. There is a possible world where it is useful; something like “all of the value from LLMs will come from people entering text into ChatGPT”. In that world, users giving up shows that there isn’t much value.

In this world, I believe most of the value is (currently) gated behind non-trivial amounts of software scaffolding, which will take man-years of development time to build. Things like UI paradigms for coding assistants, experimental frameworks and research for medical or legal AI, and integrations with existing systems.

There are supposedly north of 100 AI startups in the current Y Combinator batch; the fraction of those that turn into unicorns would be my proposal for a robust metric to pay attention to. Even if it’s par for startups that’s still a big deal, since there was just a major glut in count of startups founded. But if the AI hype is real, more of these than normal will be huge.

Another similar proxy would be VC investment dollars; if that falls off a cliff you could tell a story that even the dumb money isn’t convinced anymore.

Replies from: bill-benzon

↑ comment by Bill Benzon (bill-benzon) · 2023-08-24T20:37:11.310Z · LW(p) · GW(p)

While of course this is easy to rationalize post hoc, I don’t think falling user count of ChatGPT is a particularly useful signal.

I agree with that. Perhaps those who've dropped off were casual users and have become bored. But there are other complaints. The continued existence of confabulation seems more troublesome. OTOH, I can imagine that coding assistance will prove viable. As I said, the situation is quite volatile.

Replies from: Nate Showell, paul-tiplady

↑ comment by Nate Showell · 2023-08-26T02:09:53.496Z · LW(p) · GW(p)

Some other possible explanations for why ChatGPT usage has decreased:

The quality of the product has declined over time
People are using its competitors instead

Replies from: gwern

↑ comment by gwern · 2023-08-26T15:55:03.237Z · LW(p) · GW(p)

There's a lot one could say about this claim:

Recall that the numbers here are substantially fake. The best they can tell you is roughly "this is very big" or "this is very small". If you want to go much beyond that, you are reading sheep entrails. They are not coming from OpenAI but web traffic measurements. Such measurements are notoriously both noisy and highly biased and the biases change over time, and so unsurprisingly, at the time, OAers were saying it had overestimated the actual users by like 100%.
The numbers have lots of ways to be misleading. For example, because Chinese DL was, and still is, so inferior, there was a whole cottage industry of Chinese companies pirating accounts and black markets in credentials. This also applied to all the Third World or embargoed or difficult countries OA has denied access to, either for paying accounts or just period. Then you have people abusing it or sexing with it and getting banned and figuring out how to create new accounts, or moving on.

Just a huge amount of whac-a-mole going on. This obviously causes issues for interpreting any user metrics: a huge decrease in user count might actually reflect a huge increase in users, and vice-versa, depending on how the security arms race is going.
Summer vacation. If you look at the graph, you'll notice a remarkable correlation with the Western academic calendar... Someone confidently proclaiming that ChatGPT use is crashing before seeing the September–November 2023 numbers is giving hostage to fortune.

EDIT: as of 1 October, SimilarWeb and other sources are reporting increasing traffic.
Many, many alternatives spinning up, many of which are using OA as the backend, particularly after the large price drops. These would count in a naive web-traffic approach as OA 'losing users', rather than gaining them.

More broadly, if they are going to true competitors like Claude-2 and do not count as OA users by any definition, that's still damaging to Gioia's thesis that 'generative AI is useless' - sure, maybe it's not great for OA but it shows that the users are getting value out of generative AI, and just that they found a better way to get that value.

Personally, I would say that a simple test of OA users/activity would be to look at how they act. OA is presumably getting regular large shipments of GPUs installed into datacenters as fast as MS money can buy them; if OA usage is flat for several months - never mind crashing! - then they should be 'enjoying' a glut of GPUs by now and acting accordingly. Does OA look like it has an embarrassment of GPUs? Or does it look like it is struggling to add capacity as necessary to keep up with constant user growth, and holding back major improvements because it can't afford them, and focusing on optimizing models (even to the detriment of quality) to get more out of its existing GPUs?

Replies from: bill-benzon

↑ comment by Bill Benzon (bill-benzon) · 2023-08-26T18:01:31.757Z · LW(p) · GW(p)

Thanks for this. Very useful.

↑ comment by Paul Tiplady (paul-tiplady) · 2023-08-25T00:42:54.793Z · LW(p) · GW(p)

Confabulation is a dealbreaker for some use-cases (e.g. customer support), and potentially tolerable for others (e.g. generating code when tests / ground-truth is available). I think it's essentially down to whether you care about best-case performance (discarding bad responses) or worst-case performance.

But agreed, a lot of value is dependent on solving that problem.
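To make the best-case versus worst-case distinction concrete, here is a minimal sketch, assuming the code-generation setting mentioned above where tests are available. The candidate strings stand in for model outputs, and the names `CANDIDATES`, `passes_tests`, `first_passing`, and `every_candidate_passes` are hypothetical, invented for this illustration rather than taken from any real tool.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-ins for several model responses to "write add(a, b)".
CANDIDATES = [
    "def add(a, b):\n    return a - b",  # confabulated: wrong operator
    "def add(a, b):\n    return a + b",  # correct
    "def add(a, b):\n    return a * b",  # confabulated: wrong operator
]


def passes_tests(source: str) -> bool:
    """Ground-truth check: run the candidate and compare against known answers."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable only for trusted toy input
        add = namespace["add"]
        return add(1, 2) == 3 and add(-1, 1) == 0
    except Exception:
        return False


def first_passing(candidates: Iterable[str],
                  passes: Callable[[str], bool]) -> Optional[str]:
    """Best-case view: discard bad responses, keep the first that passes."""
    for candidate in candidates:
        if passes(candidate):
            return candidate
    return None


def every_candidate_passes(candidates: Iterable[str],
                           passes: Callable[[str], bool]) -> bool:
    """Worst-case view: one bad response is enough to fail."""
    return all(passes(candidate) for candidate in candidates)


best = first_passing(CANDIDATES, passes_tests)
reliable = every_candidate_passes(CANDIDATES, passes_tests)
```

With tests available, the bad responses are filtered out and `best` is usable, which is the tolerable case. A customer-support setting has no such filter, so it is the `reliable` check that matters there, and it fails as soon as any single response is wrong.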

Replies from: bill-benzon

↑ comment by Bill Benzon (bill-benzon) · 2023-08-25T13:24:43.128Z · LW(p) · GW(p)

As sort of an aside, in some ways I think confabulation is the default mode of human language. We make stuff up all the time. But we have to coordinate with others too, so that places constraints on what we say. Those constraints can be so binding that we've come to think of this socially constrained discourse as 'ground truth' and free of the confabulation impulse. But that's not quite so.

comment by Quintin Pope (quintin-pope) · 2023-08-24T15:44:24.851Z · LW(p) · GW(p)

Seems contradictory to argue both that generative AI is useless and that it could replace millions of jobs.

Replies from: bill-benzon, MattJ

↑ comment by Bill Benzon (bill-benzon) · 2023-08-24T16:35:46.396Z · LW(p) · GW(p)

You're probably right. I note, however, that this is territory that's not been well-charted. So it's not obvious to me just what to make of the inconsistency. It doesn't (strongly) contradict Gioia's main point, which is that LLMs seem to be in trouble in the commercial sphere.

↑ comment by MattJ · 2023-08-24T15:51:39.664Z · LW(p) · GW(p)

I think the author meant that there was a perception that it could replace millions of jobs, and so an incentive for businesses to press forward with their implementation plans, but that this would eventually backfire if the hallucination problem is insoluble.

Replies from: quintin-pope

↑ comment by Quintin Pope (quintin-pope) · 2023-08-24T16:25:04.490Z · LW(p) · GW(p)

Perhaps, but that's not the literal meaning of the text.

Here’s what we now know about AI:
[...]
AI potentially creates a situation where millions of people can be fired and replaced with bots [...]

Replies from: MattJ, bill-benzon

↑ comment by MattJ · 2023-08-24T16:41:01.629Z · LW(p) · GW(p)

Yes, but that “generative AI can potentially replace millions of jobs” is not contradictory to the statement that it eventually “may turn out to be a dud”.

I initially reacted in the same way as you to the exact same passage but came to the conclusion that it was not illogical. Maybe I’m wrong but I don’t think so.

↑ comment by Bill Benzon (bill-benzon) · 2023-08-24T16:38:48.236Z · LW(p) · GW(p)

You're right, and I don't know what Gioia would say if pressed. But it might be something like: "Millions of people will be replaced by bots and then the businesses will fall apart because the bots don't behave as advertised. So now millions are out of jobs and the businesses that used to employ them are in trouble."

comment by Noosphere89 (sharmake-farah) · 2023-08-25T01:44:35.990Z · LW(p) · GW(p)

Is this the beginning of the end for LLMS [as the royal road to AGI, whatever that is]?
