GPTs are Predictors, not Imitators

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-04-08T19:59:13.601Z · LW · GW · 99 comments

(Related text posted to Twitter; this version is edited and has a more advanced final section.)

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.

Koan:  Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text?  What factors make that task easier, or harder?  (If you don't have an answer, maybe take a minute to generate one, or alternatively, try to predict what I'll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)


Consider that somewhere on the internet is probably a list of thruples: <product of 2 prime numbers, first prime, second prime>.

GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:

There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.

Indeed, in general, you've got to be more intelligent to predict particular X, than to generate realistic X.  GPTs are being trained to a much harder task than GANs.

Same spirit: <Hash, plaintext> pairs, which you can't predict without cracking the hash algorithm, but which you could far more easily generate typical instances of if you were trying to pass a GAN's discriminator about it (assuming a discriminator that had learned to compute hash functions).
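A minimal sketch of both asymmetries, using only Python's standard library (the helper names and toy sizes are illustrative, not anything from the post): producing a plausible triple or pair is a couple of lines, while predicting the continuation amounts to factoring or preimage search.

```python
import hashlib
from random import choice, getrandbits

# Generating realistic instances (the GAN-style direction) is cheap.

def small_primes(limit):
    """All primes below `limit`, by trial division (fine for a toy demo)."""
    return [n for n in range(2, limit)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]

PRIMES = small_primes(10_000)

def generate_prime_triple():
    p, q = choice(PRIMES), choice(PRIMES)
    return (p * q, p, q)            # <product of 2 primes, first prime, second prime>

def generate_hash_pair():
    plaintext = format(getrandbits(64), "x")
    return (hashlib.sha256(plaintext.encode()).hexdigest(), plaintext)

# Predicting the continuation (the direction the GPT loss asks for) means inverting.

def predict_factors(product):
    """Putting probability mass on the tokens after `product` requires factoring it."""
    for p in PRIMES:
        if product % p == 0:
            return p, product // p
    return None                     # hopeless at cryptographic sizes

def predict_preimage(digest, candidates):
    """Putting probability mass on the plaintext after its hash requires a preimage search."""
    for c in candidates:
        if hashlib.sha256(c.encode()).hexdigest() == digest:
            return c
    return None                     # hopeless without cracking the hash
```

Multiplication and hashing stay cheap as the inputs grow; the inverse searches do not, which is why the predictive loss can demand capabilities that the human generators of the text never needed.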


Consider that some of the text on the Internet isn't humans casually chatting. It's the results section of a science paper. It's news stories that say what happened on a particular day, where maybe no human would be smart enough to predict the next thing that happened in the news story in advance of it happening.

As Ilya Sutskever compactly put it, to learn to predict text, is to learn to predict the causal processes of which the text is a shadow.

Lots of what's shadowed on the Internet has a *complicated* causal process generating it.


Consider that sometimes human beings, in the course of talking, make errors.

GPTs are not being trained to imitate human error. They're being trained to *predict* human error.

Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.

If you then ask that predictor to become an actress and play the character of you, the actress will guess which errors you'll make, and play those errors.  If the actress guesses correctly, it doesn't mean the actress is just as error-prone as you.


Consider that a lot of the text on the Internet isn't extemporaneous speech. It's text that people crafted over hours or days.

GPT-4 is being asked to predict it in 200 serial steps or however many layers it's got, just like if a human was extemporizing their immediate thoughts.

A human can write a rap battle in an hour.  A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.


Or maybe simplest:

Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."

Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -

Imagine a Mind of a level where it can hear you say 'morvelkainen bloombla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.

The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.

When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.

GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.


Figuring out that your next utterance is 'mongo' is not mostly a question, I'd guess, of that mighty Mind being hammered into the shape of a thing that can simulate arbitrary humans, and then some less intelligent subprocess being responsible for adapting the shape of that Mind to be you exactly, after which it simulates you saying 'mongo'.  Figuring out exactly who's talking, to that degree, is a hard inference problem which seems like noticeably harder mental work than the part where you just say 'mongo'.

When you predict how to chip a flint handaxe, you are not mostly a causal process that behaves like a flint handaxe, plus some computationally weaker thing that figures out which flint handaxe to be.  It's not a problem that is best solved by "have the difficult ability to be like any particular flint handaxe, and then easily figure out which flint handaxe to be".


GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

GPTs are not Imitators, nor Simulators, but Predictors.

99 comments

Comments sorted by top scores.

comment by DragonGod · 2023-04-08T22:19:09.529Z · LW(p) · GW(p)

GPTs are not Imitators, nor Simulators, but Predictors.

I think an issue is that GPT is used to mean two things:

  1. A predictive model whose output is a probability distribution over token space given its prompt and context
  2. Any particular techniques/strategies for sampling from the predictive model to generate responses/completions for a given prompt.

[See the Appendix]

 

The latter kind of GPT is what I think is rightly called a "Simulator".

 

From @janus [LW · GW]' Simulators [LW · GW] (italicised by me):

I use the generic term “simulator” to refer to models trained with predictive loss on a self-supervised dataset, invariant to architecture or data type (natural language, code, pixels, game states, etc). The outer objective of self-supervised learning is Bayes-optimal conditional inference over the prior of the training distribution, which I call the simulation objective, because a conditional model can be used to simulate rollouts which probabilistically obey its learned distribution by iteratively sampling from its posterior (predictions) and updating the condition (prompt). Analogously, a predictive model of physics can be used to compute rollouts of phenomena in simulation. A goal-directed agent which evolves according to physics can be simulated by the physics rule parameterized by an initial state, but the same rule could also propagate agents with different values, or non-agentic phenomena like rocks. This ontological distinction between simulator (rule) and simulacra (phenomena) applies directly to generative models like GPT.

 

It is exactly because of the existence of GPT the predictive model, that sampling from GPT is considered simulation; I don't think there's any real tension in the ontology here.
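To make the two senses concrete, here is a minimal sketch with a toy character-level bigram model standing in for GPT (nothing below is specific to transformers; the point is only that the "simulator" is the predictor wrapped in a sampling loop):

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat. the cat ate the rat. "

# Sense 1: a predictive model -- maps a context to a distribution over the next token.
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def predict(context):
    """Return P(next char | last char of context) as a dict."""
    c = counts[context[-1]]
    total = sum(c.values())
    return {ch: n / total for ch, n in c.items()}

# Sense 2: a simulator -- the same model driven by an iterative sampling loop.
def rollout(prompt, n_tokens, rng=random.Random(0)):
    text = prompt
    for _ in range(n_tokens):
        dist = predict(text)
        chars, probs = zip(*dist.items())
        text += rng.choices(chars, weights=probs)[0]
    return text

print(rollout("the c", 40))
```

In this framing, `predict` is GPT in the first sense and `rollout` is GPT in the second; the training loss applies to the first, while the simulator reading describes the second.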


Appendix

Credit for highlighting this distinction [LW · GW] belongs to @Cleo Nardo [LW · GW]: 

Remark 2: "GPT" is ambiguous 

We need to establish a clear conceptual distinction between two entities often referred to as "GPT" —

  • The autoregressive language model, which maps a prompt to a probability distribution over the next token.
  • The dynamic system that emerges from stochastically generating tokens using that model while also deleting the start token.

Don't conflate them! These two entities are distinct and must be treated as such. I've started calling the first entity "Static GPT" and the second entity "Dynamic GPT", but I'm open to alternative naming suggestions. It is crucial to distinguish these two entities clearly in our minds because they differ in two significant ways: capabilities and safety.

  1. Capabilities:
    1. Static GPT has limited capabilities since it consists of a single forward pass through a neural network and is only capable of computing functions that are O(1). In contrast, Dynamic GPT is practically Turing-complete, making it capable of computing a vast range of functions.
  2. Safety:
    1. If mechanistic interpretability is successful, then it might soon render Static GPT entirely predictable, explainable, controllable, and interpretable. However, this would not automatically extend to Dynamic GPT. This is because Static GPT describes the time evolution of Dynamic GPT, but even simple rules can produce highly complex systems. 
    2. In my opinion, Static GPT is unlikely to possess agency, but Dynamic GPT has a higher likelihood of being agentic. An upcoming article will elaborate further on this point.

This remark is the most critical point in this article. While Static GPT and Dynamic GPT may seem similar, they are entirely different beasts.

To summarise:

  • Static GPT: GPT as predictor
  • Dynamic GPT: GPT as simulator
Replies from: janus, Veedrac
comment by janus · 2023-04-08T23:04:22.249Z · LW(p) · GW(p)

Predictors are (with a sampling loop) simulators! That's the secret of mind

Replies from: martin-vlach
comment by Martin Vlach (martin-vlach) · 2023-12-05T16:10:52.905Z · LW(p) · GW(p)

Do not say the sampling too lightly, there is likely an amazing delicacy around it.'+)

comment by Veedrac · 2023-04-09T18:26:31.978Z · LW(p) · GW(p)

It is exactly because of the existence of GPT the predictive model, that sampling from GPT is considered simulation; I don't think there's any real tension in the ontology here.

EY gave a tension, or at least a way in which viewing Simulators as a semantic primitive, versus an approximate consequence of a predictive model, is misleading. I'll try to give it again from another angle.

To give the sort of claim worth objecting to (one that I think is an easy trap to get caught in, even though I don't think the original Simulators post was confused), here is a quote from that post: “GPT doesn’t seem to care which agent it simulates, nor if the scene ends and the agent is effectively destroyed.” Namely, the idea is that a GPT rollout is a stochastic sample of a text generating source, or possibly a set of them in superposition.

Consider again the task of predicting first a cryptographic hash and then the text which hashes to it, or rather the general class of algorithms for which the forward pass (hashing) is tractable for the network and the backwards pass (breaking the hash) is not, for which predicting cryptographic hashes is a limiting case.

If a model rollout was primarily trying to be a superposition of one or more coherent simulations, there is a computationally tractable approach to this task: internally sample a set of phrases, then compute their hashes, then narrow down the subset of sampled phrases as the hash tokens are sampled, then output the previously sampled text.

Instead, a GPT model will produce a random semantically-meaningless hash and then sample unrelated text. Even if seeded from the algorithm above, backprop will select away from the superposition and towards the distributional, predictive model. This holds even in the case where the GPT has an entropy source that would allow it to be distributionally perfect when rolled out from the start! Backprop will still say no, your goal is prediction, not simulation. As EY says, this is not a GAN.
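One way to see why the gradient favours the distributional policy, as a back-of-the-envelope sketch (the confidence level and alphabet here are assumptions for illustration, not measurements): compare the expected log-loss on the hash characters for a policy that commits to the hash of an internally sampled phrase against one that just outputs the per-character base rate.

```python
import math

ALPHABET = 16        # hex characters in the digest
CONFIDENCE = 0.99    # how hard the "coherent simulation" policy commits to its guess

# Distributional policy: per-character base rate over the hex alphabet.
loss_base_rate = math.log(ALPHABET)

# "Coherent" policy: its guessed hash is uncorrelated with the true upcoming text,
# so each character of the guess is right only 1/ALPHABET of the time.
p_right = 1 / ALPHABET
p_per_wrong_char = (1 - CONFIDENCE) / (ALPHABET - 1)   # leftover mass per other char
loss_committed = -(p_right * math.log(CONFIDENCE)
                   + (1 - p_right) * math.log(p_per_wrong_char))

print(f"base-rate policy: {loss_base_rate:.2f} nats per hash character")
print(f"committed policy: {loss_committed:.2f} nats per hash character")
```

Because the committed guess is uncorrelated with the text that actually follows, it pays the penalty for confident wrong answers on most characters, and gradient descent pushes the weights toward the base-rate behaviour on those tokens.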

Again, I don't think the original Simulators post was necessarily confused about any of this, but I also agree with this post that the terminology is imprecise and the differences can be important.

Replies from: david-johnston
comment by David Johnston (david-johnston) · 2023-04-09T22:30:46.003Z · LW(p) · GW(p)

I can see why your algorithm is hard for GPT — unless it predicts the follow up string perfectly, there’s no benefit to hashing correctly — but I don’t see why it’s impossible. What if it perfectly predicts the follow up?

Replies from: Veedrac
comment by Veedrac · 2023-04-09T22:41:51.234Z · LW(p) · GW(p)

This is by construction: I am choosing a task for which one direction is tractable and the other is not. The existence of such tasks follows from standard cryptographic arguments, the specifics of the limiting case are less relevant.

If you want to extrapolate to models strong enough to beat SHA256, you have already conceded EY's point as this is a superhuman task at least relative to the generators of the training data, but anyway there will still exist similar tasks of equal or slightly longer length for which it will hold again because of basic cryptographic arguments, possibly using a different hashing scheme.

Note that this argument requires the text to have sufficiently high entropy for the hash to not be predictable a priori.

Replies from: david-johnston
comment by David Johnston (david-johnston) · 2023-04-09T23:07:11.169Z · LW(p) · GW(p)

It’s the final claim I’m disputing - that the hashed text cannot itself be predicted. There’s still a benefit to going from, e.g., a negligible probability of a correct hash to a slightly less negligible one. It may not be a meaningful difference in practice, but there’s still a benefit in principle, and in practice it could also just generalise a strategy it learned for cases with low entropy text.

Replies from: Veedrac
comment by Veedrac · 2023-04-09T23:23:21.090Z · LW(p) · GW(p)

The mathematical counterpoint is that this again only holds for sufficiently low entropy completions, which need not be the case, and if you want to make this argument against computronium suns you run into issues earlier than a reasonably defined problem statement does.

The practical counterpoint is that from the perspective of a simulator graded by simulation success, such an improvement might be marginally selected for, because epsilon is bigger than zero, but from the perspective of the actual predictive training dynamics, a policy with a success rate that low is ruthlessly selected against, and the actual policy of selecting the per-token base rate for the hash dominates, because epsilon is smaller than 1/64.

Replies from: david-johnston
comment by David Johnston (david-johnston) · 2023-04-09T23:43:34.430Z · LW(p) · GW(p)

Are hash characters non-uniform? Then I’d agree my point doesn’t stand.

Replies from: Veedrac
comment by Veedrac · 2023-04-10T00:04:55.793Z · LW(p) · GW(p)

They typically are uniform, but I think this feels like not the most useful place to be arguing minutia, unless you have a cruxy point underneath I'm not spotting. “The training process for LLMs can optimize for distributional correctness at the expense of sample plausibility, and are functionally different to processes like GANs in this regard” is a clarification with empirically relevant stakes, but I don't know what the stakes are for this digression.

Replies from: david-johnston
comment by David Johnston (david-johnston) · 2023-04-10T00:45:54.169Z · LW(p) · GW(p)

I was just trying to clarify the limits of autoregressive vs other learning methods. Autoregressive learning is at an apparent disadvantage if the hash is hard to compute from the preceding context while computing it from the text that follows is easy and low entropy. It can “make up for this” somewhat if it can do a good job of predicting the text from the context, but it’s still at a disadvantage if, for example, that text is relatively high entropy given the context compared to the hash given the text. That’s it, I’m satisfied.

comment by Jan_Kulveit · 2023-04-09T15:44:21.130Z · LW(p) · GW(p)

While the claim - the task ‘predict next token on the internet’ absolutely does not imply learning it caps at human-level intelligence - is true, some parts of the post and reasoning leading to the claims at the end of the post are confused or wrong. 

Let’s start from the end and try to figure out what goes wrong.

GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs. The unbounded version of that task is basically of the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe. For example: a friend of mine worked on analysing the data from the LHC, leading to the Higgs detection paper. Doing this type of work basically requires a human brain to have a predictive model of aggregates of outputs of a very large number of collisions of high-energy particles, processed by a complex configuration of computers and detectors.


Where GPT and humans differ is not some general mathematical fact about the task, but differences in what sensory data a human and a GPT are trying to predict, and differences in cognitive architecture and in the ways the systems are bounded. The different landscape of boundedness and architecture can lead both to convergent cognition (thinking as the human would) and to the opposite, predicting what the human would output in a highly non-human way.

Boundedness is overall a central concept here. Neither humans nor GPTs are attempting to solve ‘how to predict stuff with unlimited resources’; both are solving a problem of cognitive economy - how to allocate limited computational resources to minimise prediction error.
 

Or maybe simplest:
 Imagine somebody telling you to make up random words, and you say, "Morvelkainen bloombla ringa mongo."

 Imagine a mind of a level - where, to be clear, I'm not saying GPTs are at this level yet -

 Imagine a Mind of a level where it can hear you say 'morvelkainen bloombla ringa', and maybe also read your entire social media history, and then manage to assign 20% probability that your next utterance is 'mongo'.

The fact that this Mind could double as a really good actor playing your character, does not mean They are only exactly as smart as you.

 When you're trying to be human-equivalent at writing text, you can just make up whatever output, and it's now a human output because you're human and you chose to output that.

 GPT-4 is being asked to predict all that stuff you're making up. It doesn't get to make up whatever. It is being asked to model what you were thinking - the thoughts in your mind whose shadow is your text output - so as to assign as much probability as possible to your true next word.

 

If I try to imagine a mind which is able to predict my next word when asked to make up random words, and be successful at assigning 20% probability to my true output, I’m firmly in the realm of weird and incomprehensible Gods. If the Mind is imaginably bounded and smart, it seems likely it would not devote much cognitive capacity to trying to model in detail strings prefaced by a context like ‘this is a list of random numbers’, in particular if inverting the process generating the numbers would seem really costly. Being this good at this task would require so much data and cheap computation that this is way beyond superintelligence, in the realm of philosophical experiments.

Overall I think this is a really unfortunate way to think about the problem, where a system which is moderately hard to comprehend (like GPT) is replaced by something much more incomprehensible. It also seems a bit of a reverse intuition pump - I’m pretty confident most people's intuitive thinking about this ’simplest’ thing will be utterly confused.

How did we get here?

 

 A human can write a rap battle in an hour.  A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.

 

Apart from the fact that humans are also able to rap battle or improvise on the fly, notice that “what the loss function would like the system to do” in principle tells you very little about what the system will do. For example, the human loss function makes some people attempt to predict winning lottery numbers. This is an impossible task for humans, and you can’t say much about a human based on this. Or you can speculate about minds which would be able to succeed at this task, but you soon get into the realm of Gods and outside of physics.
 

Consider that sometimes human beings, in the course of talking, make errors.

GPTs are not being trained to imitate human error. They're being trained to *predict* human error.

Consider the asymmetry between you, who makes an error, and an outside mind that knows you well enough and in enough detail to predict *which* errors you'll make.


Again, from the cognitive economy perspective, predicting my errors would often be wasteful. With some simplification, you can imagine I make two types of errors - systematic and random. Often the simplest way to predict a systematic error would be to emulate the process which led to the error. Random errors are ... random, and a mind which knows me in enough detail to predict which random errors I’ll make seems a bit like the mind predicting the lottery numbers.

Consider that somewhere on the internet is probably a list of thruples: <product of 2 prime numbers, first prime, second prime>.

GPT obviously isn't going to predict that successfully for significantly-sized primes, but it illustrates the basic point:

There is no law saying that a predictor only needs to be as intelligent as the generator, in order to predict the generator's next token.
 

 The general claim that some predictions are really hard and you need superhuman powers to be good at them is true, but notice that this does not inform us about what GPT-x will learn. 
 

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all the text on the Internet.

Koan:  Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text?  What factors make that task easier, or harder?  


Yes this is clearly true: in the limit the task is of unlimited difficulty.  

 

Replies from: Eliezer_Yudkowsky, Maxc
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-04-09T18:56:04.215Z · LW(p) · GW(p)

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs

I didn't say that GPT's task is harder than any possible perspective on a form of work you could regard a human brain as trying to do; I said that GPT's task is harder than being an actual human; in other words, being an actual human is not enough to solve GPT's task.

Replies from: Jan_Kulveit
comment by Jan_Kulveit · 2023-04-11T08:06:21.904Z · LW(p) · GW(p)

I don't see how the comparison of hardness of 'GPT task' and 'being an actual human' should technically work - to me it mostly seems like a type error. 

- The task 'predict the activation of photoreceptors in human retina' clearly has the same difficulty as 'predict next word on the internet' in the limit. (cf. Why Simulator AIs want to be Active Inference AIs [LW · GW])

- Maybe you mean something like task + performance threshold. Here 'predict the activation of photoreceptors in human retina well enough to be able to function as a typical human' is clearly less difficult than task + performance threshold 'predict next word on the internet, almost perfectly'. But this comparison does not seem to be particularly informative.

- Going in this direction we can make comparisons between thresholds closer to reality e.g. 'predict the activation of photoreceptors in human retina, and do other similar computation well enough to be able to function as a typical human'  vs. 'predict next word on the internet, at the level of GPT4' . This seems hard to order - humans are usually able to do the human task and would fail at the GPT4 task at GPT4 level; GPT4 is able to do the GPT4 task and would fail at the human task. 

- You can't make an ordering between cognitive systems based on 'system A can't do task T, system B can, therefore B>A'. There are many tasks which humans can't solve, but this implies very little. E.g. a human is unable to remember a 50 thousand digit random number and my phone can easily, but there are also many things which a human can do and my phone can't.

From the above, the possibly interesting direction for comparing 'human skills' and 'GPT-4 skills' is something like 'why can't GPT4 solve the human task at human level', 'why can't a human solve the GPT task at GPT4 level', and 'why are the skills a bit hard to compare'.

Some thoughts on this

- GPT4 clearly is "width superhuman": its task is ~modelling the textual output of the whole of humanity. This isn't a great fit for the architecture and bounds of a single human mind, roughly for the same reasons why a single human mind would do worse than Amazon's recommender at recommending products to each of a hundred million users. In contrast, a human would probably do better at recommending products to one specific user whose preferences the human recommender would try to predict in detail.

Humanity as a whole would probably do significantly better at this task, if you e.g. imagine assigning every human one other human to model (and study in depth, read all their text outputs, etc) 

- GPT4 clearly isn't better than humans at "samples -> abstractions", needing more data to learn a given pattern.

- With the overall ability to find abstractions, it seems unclear to what extent GPT "learned smart algorithms independently because they are useful for predicting human outputs" vs. "learned smart algorithms because they are implicitly reflected in human text"; at the current level I would expect a mixture of both.

 

Replies from: Eliezer_Yudkowsky, viluon
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-04-21T01:37:40.889Z · LW(p) · GW(p)

What the main post is responding to is the argument:  "We're just training AIs to imitate human text, right, so that process can't make them get any smarter than the text they're imitating, right?  So AIs shouldn't learn abilities that humans don't have; because why would you need those abilities to learn to imitate humans?"  And to this the main post says, "Nope."

The main post is not arguing:  "If you abstract away the tasks humans evolved to solve, from human levels of performance at those tasks, the tasks AIs are being trained to solve are harder than those tasks in principle even if they were being solved perfectly."  I agree this is just false, and did not think my post said otherwise.

Replies from: Jan_Kulveit
comment by Jan_Kulveit · 2024-04-22T11:43:02.633Z · LW(p) · GW(p)

I do agree the argument "We're just training AIs to imitate human text, right, so that process can't make them get any smarter than the text they're imitating, right?  So AIs shouldn't learn abilities that humans don't have; because why would you need those abilities to learn to imitate humans?" is wrong and clearly the answer is "Nope". 

At the same time I do not think parts of your argument in the post are locally valid or good justification for the claim.

A correct and locally valid argument for why GPTs are not capped at human level was already written here [LW · GW].

In a very compressed form, you can just imagine GPTs have text as their "sensory inputs" generated by the entire universe, similarly to you having your sensory inputs generated by the entire universe. Neither human intelligence nor GPTs are constrained by the complexity of the task (also: in the abstract, it's the same task). Because of that, "task difficulty" is not a promising way to compare these systems, and it is necessary to look into the actual cognitive architectures and bounds.

With the last paragraph, I'm somewhat confused by what you mean by "tasks humans evolved to solve". Does e.g. sending humans to the Moon, or detecting Higgs boson, count as a "task humans evolved to solve" or not? 

comment by viluon · 2023-04-16T11:04:06.549Z · LW(p) · GW(p)

I'd really like to see Eliezer engage with this comment, because to me it looks like the following sentence's well-foundedness is rightly being questioned.

it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

While I generally agree that powerful optimizers are dangerous, the fact that the GPT task and the "being an actual human" task are somewhat different has nothing to do with it.

comment by Max H (Maxc) · 2023-04-11T00:06:16.735Z · LW(p) · GW(p)

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs. The unbounded version of that task is basically of the same generality and difficulty as what GPT is doing, and is roughly equivalent to understanding everything that is understandable in the observable universe.

 

Yes, human brains can be regarded as trying to solve the problem of minimizing prediction error given their own sensory inputs, but no one is trying to push up the capabilities of an individual human brain as fast as possible to make it better at actually doing so. Lots of people are definitely trying this for GPTs, measuring their progress on harder and harder tasks as they do so, some of which humans already cannot do on their own. 

Or, another way of putting it: during training, a GPT is asked to solve a concrete problem no human is capable of or expected to solve. When GPT fails to make an accurate prediction, it gets modified into something that might do better next time. No one performs brain surgery on a human any time they make a prediction error.

Replies from: Jan_Kulveit
comment by Jan_Kulveit · 2023-04-11T08:28:52.520Z · LW(p) · GW(p)

This seems the same confusion again.

Upon opening your eyes, your visual cortex is asked to solve a concrete problem no brain is capable of or expected to solve perfectly: predict sensory inputs. When the patterns of firing don't predict the photoreceptor activations, your brain gets modified into something else, which may do better next time. Every time your brain fails to predict its visual field, there is a bit of modification, based on computing what's locally a good update.

There is no fundamental difference in the nature of the task. 

Where the actual difference is are the computational and architectural bounds of the systems.  

The smartness of neither humans nor GPTs is bottlenecked by the difficulty of the task, and you can not say how smart the systems are by looking at the problems. To illustrate that  fallacy with a very concrete example:

Please do this task: prove P ≠ NP in the next 5 minutes. You will get $1M if you do.

Done?

Do you think you have become a much smarter mind because of that? I doubt so - but you were given a very hard task, and a high reward.

The actual strategic difference, and what's scary, isn't the difficulty of the task, but the fact that human brains don't multiply their size every few months.

(edited for clarity)

Replies from: Maxc
comment by Max H (Maxc) · 2023-04-11T13:56:18.645Z · LW(p) · GW(p)

Do you think you have become a much smarter mind because of that? I doubt so - but you were given a very hard task, and a high reward.

No, but I was able to predict my own sensory input pretty well, for those 5 minutes. (I was sitting in a quiet room, mostly pondering how I would respond to this comment, rather than the actual problem you posed. When I closed my eyes, the sensory prediction problem got even easier.)

You could probably also train a GPT on sensory inputs (suitably encoded) instead of text, and get pretty good predictions about future sensory inputs.

Stepping back, the fact that you can draw a high-level analogy between neuroplasticity in human brains <=> SGD in transformer networks, and sensory input prediction <=> next token prediction doesn't mean you can declare there is "no fundamental difference" in the nature of these things, even if you are careful to avoid the type error in your last example.

In the limit (maybe) a sufficiently good predictor could perfectly predict both sensory input and tokens, but the point is that the analogy breaks down in the ordinary, limited case, on the kinds of concrete tasks that GPTs and humans are being asked to solve today. There are plenty of text manipulation and summarization problems that GPT-4 is already superhuman at, and SGD can already re-weight a transformer network much more than neuroplasticity can reshape a human brain.

comment by Razied · 2023-04-08T22:15:50.881Z · LW(p) · GW(p)

I will try to explain Yann LeCun's argument against auto-regressive LLMs, which I agree with. The main crux of it is that being extremely superhuman at predicting the next token from the distribution of internet text does not imply the ability to generate sequences of arbitrary length from that distribution.

GPT4's ability to impressively predict the next token depends very crucially on the tokens in its context window actually belonging to the distribution of internet text written by humans. When you run GPT in sampling mode, every token you sample from it takes it ever so slightly outside the distribution it was trained on. At each new generated token it still assumes that the past 999 tokens were written by humans, but since its actual input was generated partly by itself, as the length of the sequence you wish to predict increases, you take GPT further and further outside of the distribution it knows. 

The most salient example of this is when you try to make chatGPT play chess and write chess analysis. At some point, it will make a mistake and write something like "the queen was captured" when in fact the queen was not captured. This is not the kind of mistake that chess books make, so it truly takes it out of distribution. What ends up happening is that GPT conditions its future output on its mistake being correct, which takes it even further outside the distribution of human text, until this diverges into nonsensical moves. 

As GPT becomes better, the length of the sequences it can convincingly generate increases, but the probability of an n-token sequence staying coherent is roughly (1-e)^n for a per-token error rate e, so cutting the error rate in half (a truly outstanding feat) merely doubles the length of its coherent sequences.
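A quick numeric illustration of that scaling, under the comment's simplifying assumption that per-token errors are independent and never corrected (the error rates below are made up for illustration):

```python
import math

def coherent_length(e, target=0.5):
    """Number of tokens until the chance of an error-free rollout falls to `target`,
    assuming an independent per-token error rate e, i.e. solve (1 - e)**n = target."""
    return math.log(target) / math.log(1 - e)

for e in (0.04, 0.02, 0.01, 0.005):
    print(f"per-token error rate {e:.3f} -> ~{coherent_length(e):.0f} coherent tokens")
```

Each halving of e roughly doubles the coherent length, which is the "outstanding feat buys only a factor of two" point above.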

To solve this problem you would need a very large dataset of mistakes made by LLMs, and their true continuations. You'd need to take all physics books ever written, intersperse them with LLM continuations, then have humans write the corrections to the continuations, like "oh, actually we made a mistake in the last paragraph, here is the correct way to relate pressure to temperature in this problem...". This dataset is unlikely to ever exist, given that its size would need to be many times bigger than the entire internet. 

The conclusion that LeCun comes to: auto-regressive LLMs are doomed.

Replies from: Nanda Ale, martin-randall, AllAmericanBreakfast, cubefox, Razied, Roman Leventov, rotatingpaguro, mr-hire, sky-moo, sharps030, thoth-hermes
comment by Nanda Ale · 2023-04-09T06:43:52.616Z · LW(p) · GW(p)

The most salient example of this is when you try to make chatGPT play chess and write chess analysis. At some point, it will make a mistake and write something like "the queen was captured" when in fact the queen was not captured. This is not the kind of mistake that chess books make, so it truly takes it out of distribution. What ends up happening is that GPT conditions its future output on its mistake being correct, which takes it even further outside the distribution of human text, until this diverges into nonsensical moves. 

 

Is this a limitation in practice? Rap battles are a bad example because they happen to be an exception (a task premised on being "one shot" and real time), but the overall point stands. We ask GPT to do tasks in one try, one step, that humans do with many steps, iteratively and recursively.

Take this "the queen was captured" problem. As a human I might be analyzing a game, glance at the wrong move, think a thought about the analysis premised on that move (or even start writing words down!) and then notice the error and just fix it. I am doing this right now, in my thoughts and on the keyboard, writing this comment.

Same thing works with ChatGPT, today. I deal with problems like "the queen was captured" every day just by adding more ChatGPT steps. Instead of one-shotting, every completion chains a second ChatGPT prompt to check for mistakes. (You may need a third level to get to like 99% because the checker blunders too.) The background chains can either ask to regenerate the original prompt, or reply to the original ChatGPT describing the error, and ask it to fix its mistake. The latter form seems useful for code generation.

Like right now I typically do 2 additional background chains by default, for every single thing I ask Chat GPT. Not just in a task where I'm seeking rigour and want to avoid factual mistakes like "the queen was captured" but just to get higher quality responses in general.

Original Prompt -> Improve this answer. -> Improve this Answer. 

Not literally just those three words, but even something that simple is actually better than just asking one time. Seriously. Try it, confirm, and make it a habit. Sometimes it's shocking. I ask for a simple javascript function, it pumps out a 20 line function that looks fine to me. I habitually ask for a better version and "Upon reflection, you can do this in two lines of javascript that run 100x faster."
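A sketch of that habit as a loop. `ask` below is a hypothetical placeholder for whatever chat-completion call you use, and the critique wording is just an example, not a recommendation from the comment:

```python
def ask(messages):
    """Placeholder: send a chat transcript to your LLM API of choice
    and return the assistant's reply as a string."""
    raise NotImplementedError

def improved_answer(prompt, rounds=2):
    """Ask once, then feed the answer back with an 'improve this' request a few times."""
    messages = [{"role": "user", "content": prompt}]
    answer = ask(messages)
    for _ in range(rounds):
        messages += [
            {"role": "assistant", "content": answer},
            {"role": "user", "content": "Improve this answer. Fix any mistakes you notice."},
        ]
        answer = ask(messages)
    return answer
```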

If GPT were 100x cheaper I would be tempted to just go wild with this. Every prompt is 200 or 300 prompts in the background, invisibly, instead of 2 or 3. I'm sure there's diminishing returns and the chain would be more complicated than repeating "Improve" 100 times, but if it were fast and cheap enough, why not do it.

As an aside, I think about asking ChatGPT to write code like asking a human to code a project on a whiteboard without the internet to find answers, a computer to run code on, or even paper references. The human can probably do it, sort of, but I bet the code will have tons of bugs and errors and even API 'hallucinations' if you run it! I think it's even worse than that, it's almost like ChatGPT isn't even allowed to erase anything it wrote on the whiteboard either. But we don't need to one-shot everything, so do we care about infinite length completions? Humans do things in steps, and when ChatGPT isn't trying to whiteboard everything, when it can check API references, when it can see what the code returns, errors, when it can recurse on itself to improve things, it's way better. Right now the form this takes is a human on the ChatGPT web page asking for code, running it, and then pasting the error message back into ChatGPT. The more automated versions of this are trickling out. Then I imagine the future, asking ChatGPT for code when it's 1000x cheaper. And my one question behind the scenes is actually 1000 prompts looking up APIs on the internet, running the code in a simulator (or for real, people are already doing that), looking at the errors or results, etc. And that's the boring unimaginative extrapolation.

Also this is probably obvious, but just in case: if you try asking "Improve this answer." repeatedly in ChatGPT you need to manage your context window size. Migrate to a new conversation when you get about 75% full. OpenAI should really warn you because even before 100% the quality drops like a rock. Just copy your original request and the last best answer(s). If you're doing it manually select a few useful other bits too. 

Replies from: Razied
comment by Razied · 2023-04-09T10:15:09.768Z · LW(p) · GW(p)

I think you've had more luck than me when trying to get chatGPT to correct its own mistakes. When I tried making it play chess, I told it to "be sure not to output your move before writing a paragraph of analysis on the current board position, and output 5 good moves and the reasoning behind them, all of this before giving me your final move." Then after it chose its move I told it "are you sure this is a legal move? and is this really the best move?", it pretty much never changed its answer, and never managed to figure out that its illegal moves were illegal. If I straight-up told it "this move is illegal", it would excuse itself and output something else, and sometimes it correctly understood why its move was illegal, but not always.

so do we care about infinite length completions?

The inability of the GPT series to generate infinite length completions is crucial for safety! If humans fundamentally need to be in the loop for GPT to give us good outputs for things like scientific reasoning, then it makes the whole thing suddenly way safer, and we can be assured that there isn't an instance of GPT running on some amazon server self-improving itself by just doing a thousand years of scientific progress in a week.

Replies from: faul_sname
comment by faul_sname · 2023-04-11T05:51:01.329Z · LW(p) · GW(p)

Does the inability of the GPT series to generate infinite length completions require that humans specifically remain in the loop, or just that the external world must remain in the loop in some way which gets the model back into the distribution? Because if it's the latter case I think you still have to worry about some instance running on a cloud server somewhere.

comment by Martin Randall (martin-randall) · 2023-04-08T23:57:49.444Z · LW(p) · GW(p)

When I prompt GPT-5 it's already out of distribution because the training data mostly isn't GPT prompts, and none of it is GPT-5 prompts. If I prompt with "this is a rap battle between Dath Ilan and Earthsea" that's not a high likelihood sentence in the training data. And then the response is also out of distribution, because the training data mostly isn't GPT responses, and none of it is GPT-5 responses.

So why do we think that the responses are further out of distribution than the prompts?

Possible answer: because we try to select prompts that work well, with human ingenuity and trial and error, so they will tend to work better and be effectively closer to the distribution. Whereas the responses are not filtered in the same way.

But the responses are optimized only to be in distribution, whereas the prompts are also optimized for achieving some human objective like generating a funny rap battle. So once the optimizer achieves some threshold of reliability the error rate should go down as text is generated, not up.

Replies from: Razied, skybrian
comment by Razied · 2023-04-09T10:44:59.987Z · LW(p) · GW(p)

"Being out of distribution" is not a yes-no answer, but a continuum. I agree that all prompts given to GPT are slightly out of distribution simply by virtue of being prompts to a language model, but the length of a prompt is generally not large enough to enable GPT to really be sure of that. If I give you 3 sentences of a made-up physics book introduction, you might guess that no textbook actually starts with those 3 sentences... but that's really just not enough information to be sure. However, if I give you 5 pages, you then have enough information to really understand if this is really a physics textbook or not. 

The point is that sequence length matters, the internet is probably large enough to  populate the space of 200-token (number pulled out of my ass) text sequences densely enough that GPT can extrapolate to most other sequences of such length, but things gradually change as the sequences get longer. And certainly by the time you get to book-length or longer, any sequence that GPT could generate will be so far out of distribution that it will be complete gibberish.

Replies from: martin-randall
comment by Martin Randall (martin-randall) · 2023-04-09T22:33:46.660Z · LW(p) · GW(p)

Could we agree on a testable prediction of this theory? For example, looking at the chess degradation example. I think your argument predicts that if we play several games of chess against ChatGPT in a row, its performance will keep going down in later games, in terms of both quality and legality. Potentially such that the last attempt will be complete gibberish. Would that be a good test?

Replies from: Razied
comment by Razied · 2023-04-09T23:05:25.645Z · LW(p) · GW(p)

Certainly I would agree with that. In fact right now I can't even get chatGPT to play a single game of chess (against stockfish) from start to finish without it at some point outputting an illegal move. I expect that future versions of GPT will be coherent for longer, but I don't expect GPT to suddenly "get it" and be able to play legal and coherent chess for arbitrary length of sequences. (Google tells me that chess has a typical sequence length of about 40, so maybe Go would be a better choice with a typical number of moves per game in the 150). And certainly I don't expect GPT to be able to play chess AND also write coherent chess commentary between each move, since that would greatly increase the timescale of required coherence.

comment by skybrian · 2023-04-14T01:26:04.162Z · LW(p) · GW(p)

Did you mean GPT-4 here? (Or are you from the future :-)

Replies from: martin-randall, ChristianKl
comment by Martin Randall (martin-randall) · 2023-04-15T01:26:52.990Z · LW(p) · GW(p)

Just a confusing writing choice, sorry. Either it's the timeless present tense or it's a grammar error, take your pick.

comment by ChristianKl · 2023-04-14T14:04:53.142Z · LW(p) · GW(p)

GPT-4 was privately available within OpenAI long before it was publicly released. It's not necessary to be from the future to be able to interact with GPT-5 before it's publicly released.

Replies from: skybrian
comment by skybrian · 2023-04-14T20:44:47.730Z · LW(p) · GW(p)

Okay, but I'm still wondering if Randall is claiming he has private access, or is it just a typo?

Edit: looks like it was a typo?

At MIT, Altman said the letter was “missing most technical nuance about where we need the pause” and noted that an earlier version claimed that OpenAI is currently training GPT-5. “We are not and won’t for some time,” said Altman. “So in that sense it was sort of silly.”

https://www.theverge.com/2023/4/14/23683084/openai-gpt-5-rumors-training-sam-altman

comment by DirectedEvolution (AllAmericanBreakfast) · 2023-04-09T00:35:50.711Z · LW(p) · GW(p)

This argument seems to depend on:

  • After the initial prompt, GPT's input is 100% self-generated
  • GPT has no access to plugins
  • GPT can't launch processes to gather and train additional models on other forms of data

I'm not an expert in this topic, but it seems to me that "doomed" is the wrong word. LLMs aren't the fastest or most reliable way to compute 2+2, but it is going to become trivial for them to access the tool that is the best way to perform this computation. They will be able to gather data from the outside world using these plugins. They will be able to launch fine-tuning and training processes and interact with other pre-trained models. They will be able to interact with robotics and access cloud computing resources.

LLMs strike me as analogous to the cell. Is a cell capable of vision on its own? Only in the most rudimentary sense of having photoresponsive molecules that trigger cell signals. But cells that are configured correctly can form an eye. And we know that cells have somehow been able to evolve themselves into a functioning eye. I don't see a reason why LLMs, perhaps in combination with other software structures, can't form an AGI with some combination of human and AI-assisted engineering.

comment by cubefox · 2023-04-15T19:34:31.427Z · LW(p) · GW(p)

Apparently LLMs automatically correct mistakes in CoT [LW(p) · GW(p)], which seems to run counter to LeCun's argument.

comment by Razied · 2023-04-15T17:27:37.903Z · LW(p) · GW(p)

To make the argument sharper, I will argue the following (taken from another comment of mine and posted here to have it in one place): sequences produced by LLMs very quickly become sequences with very low log-probability (compared with other sequences of the same length) under the true distribution of internet text.

Suppose we have a Markov chain $x_t$ with some transition probability $p(x_{t+1}|x_t)$; here $p$ is the analogue of the true generating distribution of internet text. From information theory (specifically the Asymptotic Equipartition Property), we know that the typical probability of a long sequence will be $p(x_1, \ldots, x_n) \approx e^{-nH(p)}$, where $H(p)$ is the entropy rate of the process.

Now if $q$ is a different Markov chain (the analogue of the LLM generating text), which differs from $p$ by some amount, say that the Kullback-Leibler divergence $D_{KL}(q \| p)$ is non-zero (which is not quite the objective that the networks are being trained with, that would be $D_{KL}(p \| q)$ instead), we can also compute the expected log-probability under $p$ of sequences sampled from $q$, this is going to be:

$$\mathbb{E}_{x \sim q}[\log p(x)] = \int q(x) \log p(x)\,dx = \int q(x) \log \frac{p(x)}{q(x)}\,dx + \int q(x) \log q(x)\,dx$$

The second term in this integral is just $-nH(q)$, minus $n$ times the entropy rate of $q$, and the first term is $-D_{KL}(q \| p)$, so when we put everything together, the typical probability under $p$ of a sequence sampled from $q$ is:

$$p(x_1, \ldots, x_n) \approx e^{-n(H(q) + D_{KL}(q \| p))}$$

So any difference at all between $p$ and $q$ will lead to the probability of almost all sequences sampled from our language model being exponentially squashed relative to the probability of most sequences sampled from the original distribution. I can also argue that $H(q)$ will be strictly larger than $H(p)$: the latter can essentially be viewed as the entropy resulting from a perfect LLM with infinite context window, and since conditioning on further information does not increase entropy, $H(q) \geq H(p)$. So the gap in the exponent, $H(q) + D_{KL}(q \| p) - H(p)$, will definitely be positive.

This means that if you sample long enough from an LLM, and more importantly as the context window increases, it must generalise very far out of distribution to give good outputs. The fundamental problem of behaviour cloning I'm referring to is that we need examples of how to behave correctly in this very-out-of-distribution regime, but LLMs simply rely on the generalisation ability of transformer networks. Our prior should be that if you don't provide examples of correct outputs within some region of the input space to your function fitting algorithm, you don't expect the algorithm to yield correct predictions in that region.
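A toy numerical check of the squashing effect, with two arbitrary i.i.d. token distributions standing in for $p$ and $q$ (illustrative only; the argument above is the general statement):

```python
import math
import random

p = {"a": 0.5, "b": 0.3, "c": 0.2}     # stand-in for the "true" distribution
q = {"a": 0.45, "b": 0.35, "c": 0.2}   # stand-in for the slightly-off model

def sample(dist, n, rng):
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=n)

def log_prob(dist, seq):
    return sum(math.log(dist[t]) for t in seq)

rng = random.Random(0)
n = 20_000
seq_from_p = sample(p, n, rng)
seq_from_q = sample(q, n, rng)

# Sequences from q are exponentially less probable under p than p's own typical
# sequences; the per-token gap is H(q) + D_KL(q||p) - H(p) > 0.
print("log p(seq) per token, seq ~ p:", log_prob(p, seq_from_p) / n)
print("log p(seq) per token, seq ~ q:", log_prob(p, seq_from_q) / n)
```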

comment by Roman Leventov · 2023-04-09T02:17:56.808Z · LW(p) · GW(p)

At each new generated token it still assumes that the past 999 tokens were written by humans

By no means is this necessary. During fine-tuning for dialogue and question-answering, GPT is clearly selected for discriminating the boundaries of the user-generated and, equivalently, self-generated text in its context (and probably these boundaries are marked with special control tokens).

If we were talking about GPTs trained in pure SSL mode without any fine-tuning whatsoever, that would be a different story, but this is not practically the case.

comment by rotatingpaguro · 2023-04-08T23:29:23.082Z · LW(p) · GW(p)
  1. I have trouble framing this thing in my mind because I do not understand what the distribution is relative to. In the strictest sense, the distribution of internet text is the internet text itself, and everything GPT outputs is an error. In a broad sense, what is an error and what isn't? I think there's something meaningful here, but I can not pinpoint it clearly.

  2. This strongly shows that GPT won't be able to stay coherent with some initial state, which was already clear from it being autoregressive. It only weakly indicates that GPT won't learn, somewhere in its weights, the correct schemes to play chess, which could then be somehow elicited.

  3. How does this not apply to humans? It seems to me we humans do have a finite context window, within which we can interact with a permanent associative memory system to stay coherent on a longer term. The next obvious step with LLMs is introducing tokens that represent actions and have it interact with other subsystems or the external world, like many people are trying to do (e.g., PaLM-e). If this direction of improvement pans out, I would argue that LLMs leading to these "augmented LLMs" would not count as "LLMs being doomed".

3a) It applies to humans, and humans are doomed :)

  4. LLMs are already somewhat able to generate dialogues where they err and then correct themselves in a systematic way (e.g., Reflexion). If there really were a need to create large datasets of err-and-correct text, I do not exclude that they could be generated with the assistance of existing LLMs.
Replies from: Razied
comment by Razied · 2023-04-08T23:37:35.685Z · LW(p) · GW(p)

How does this not apply to humans?

This strongly shows that GPT won't be able to stay coherent with some initial state, which was already clear from it being autoregressive

This problem is not coming from the autoregressive part: if the dataset GPT was trained on contained a lot of examples of GPT making mistakes and then being corrected, it would be able to stay coherent for a long time (once it starts to make small deviations, it would immediately correct them because those small deviations were in the dataset, making it stable). This doesn't apply to humans because humans don't produce their actions by trying to copy some other agent; they learn their policy through interaction with the environment. So it's not that a system in general is unable to stay coherent for long, but only those systems trained by pure imitation that aren't able to do so.

Replies from: rotatingpaguro
comment by rotatingpaguro · 2023-04-08T23:47:07.125Z · LW(p) · GW(p)

Ok, now I understand better and I agree with this point: it's like how you learn something faster if a teacher lets you try in small steps and corrects your errors at a granular level, instead of leaving you alone in front of a large task you blankly stare at.

For a response to this, see my comment above [LW(p) · GW(p)].

comment by Matt Goldenberg (mr-hire) · 2023-04-08T23:06:27.890Z · LW(p) · GW(p)

It seems to me like you'd only need to finetune on a dataset of like 50k diverse samples with this type of error correction built in, or RLHF this type of error correction?

Replies from: Razied
comment by Razied · 2023-04-08T23:13:57.500Z · LW(p) · GW(p)

This same problem exists in the behaviour cloning literature: if you have an expert agent behaving under some policy, and you want to train some other policy to copy the expert, samples from the expert policy are not enough; you need a lot of data that shows your agent how to behave when it gets out of distribution. This was the point of the DAgger paper, and in practice the data that shows the agent how to get back into distribution is significantly larger than the pure expert dataset. There are very many ways that GPT might go out of distribution, and just showing it how to come back for a small fraction of examples won't be enough.
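For readers who haven't seen it, a schematic of the DAgger loop (the `env`, `expert`, and `fit` objects are placeholders, not a particular library): the corrective labels are collected on the states the *learner* visits, not only on expert trajectories, which is exactly the kind of data a pure next-token corpus lacks.

```python
def dagger(env, expert, fit, n_iters=10, horizon=100):
    """Schematic DAgger: roll out the current learner, label the states it
    visits with the expert's actions, aggregate, and refit."""
    dataset = []                    # (state, expert_action) pairs
    policy = expert                 # iteration 0 behaves like the expert
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert(state)))   # expert labels the visited state
            state = env.step(policy(state))          # but the learner chooses the action
        policy = fit(dataset)       # supervised learning on the aggregated dataset
    return policy
```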

Replies from: rotatingpaguro
comment by rotatingpaguro · 2023-04-08T23:43:09.752Z · LW(p) · GW(p)

I have not read the paper you link, but I have this expectation about it: that the limitation of imitation learning is proved in a context that lacks richness compared to imitating language.

My intuition is: I have experience myself of failing to learn just from imitating an expert playing a game the best way possible. But if someone explains to me their actions, I can then learn something.

Language is flexible and recursive: you can in principle represent anything out of the real world in language, including language itself, and how to think. If somehow the learner manages to tap into recursiveness, it can shortcut the levels. It will learn how to act meaningfully not because it has covered all the possible examples of long-term sequences that lead to a goal, but because it has seen many schemes that map to how the expert thinks.

I cannot learn chess efficiently by observing a grandmaster play many matches and jotting down all the moves. I could do it if the grandmaster were a short program implemented in chess moves.

comment by Sky Moo (sky-moo) · 2023-04-09T15:23:22.866Z · LW(p) · GW(p)

This is an alignment problem: You/LeCun want semantic truth, whereas the actual loss function has the goal of producing statistically reasonable text.

Mostly. The fine tuning stage puts an additional layer on top of all that, and skews the model towards stating true things so much that we get surprised when it *doesn't*.

What I would suggest is that aligning an LLM to produce text should not be done with RLHF; instead, it may be necessary to extract the internal truth predicate from the model and ensure that the output is steered to keep that neuron assembly lit up.

comment by hazel (sharps030) · 2023-04-09T07:14:59.731Z · LW(p) · GW(p)

To solve this problem you would need a very large dataset of mistakes made by LLMs, and their true continuations. [...] This dataset is unlikely to ever exist, given that its size would need to be many times bigger than the entire internet. 

I had assumed that creating that dataset was a major reason for doing a public release of ChatGPT. "Was this a good response?" [thumbs-up] / [thumbs-down] -> dataset -> more RLHF. Right?

Replies from: awg
comment by awg · 2023-04-09T15:38:07.781Z · LW(p) · GW(p)

RLHF is done after the pre-training process. I believe this is referring to including examples like this in the pre-training process itself.

Though in broad strokes, I agree with you. It's not inconceivable to me that they'll turn/are turning their ChatGPT data into its own training data for future models using this concept of corrected mistakes.

comment by Thoth Hermes (thoth-hermes) · 2023-04-08T22:51:13.777Z · LW(p) · GW(p)

I've never enjoyed, or agreed with, arguments of the form "X is inherently, intrinsically incapable of Y." The presence of such statements indicates that there is some social tension of the form "X might be inherently, intrinsically capable of Y." There might be a bias towards moderate social acceptance of statements like "X is inherently, intrinsically incapable of Y" due to nothing more than their being trivially disprovable if X does turn out to be capable of Y. Disprovable statements might be overrated a lot, and if so, boy, would I hate that.

This seems kind of relevant to the main point of this post too: 

GPTs are not Imitators, nor Simulators, but Predictors.

Question: Is GPT-5 an Imitator? A Simulator? A Predictor? Is GPT-6?

Does the message of this post become moot on larger, more powerful LLMs? Or does it predict that such models have already reached their limit?

comment by Vladimir_Nesov · 2023-04-08T22:41:07.338Z · LW(p) · GW(p)

(There is a Paul Christiano response [EA(p) · GW(p)] over at the EA forum.)

Replies from: thoth-hermes
comment by Thoth Hermes (thoth-hermes) · 2023-04-08T23:04:16.025Z · LW(p) · GW(p)

He asks:

That said, there's an important further question that isn't determined by the loss function alone---does the model do its most useful cognition in order to predict what a human would say, or via predicting what a human would say?

I don't think there's a major difference between the two, so the answer is, I guess, "yes." I suppose a different way of framing this question, as I interpret it, is whether or not simulation and prediction are considered the same thing. I see those two things as largely similar. If they were not, wouldn't there be some simulations we consider "faithful" that would not be very good at making predictions? 

In physics, for example, models are generally considered useful for both simulation and prediction. Of course, if we have two different words for something, it implies there must be some difference. But prediction seems to refer to the output of a model, not necessarily to the mechanics or process that generates that output. And given that the final output is a selected, chosen member of the full set of possible outputs, a prediction is a member of the set of simulated entities.

Shouldn't GPT-4 be considered both a simulator and a predictor, then?

Replies from: FishGPT
comment by FishGPT · 2023-04-20T02:36:12.010Z · LW(p) · GW(p)

Well, consider the task of simulating a coin flip vs. predicting a coin flip.  A simulation of a coin flip is satisfactory if it is heads 50% of the time, which is easier than predicting the outcome of some actual coin.
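A toy sketch of that asymmetry (illustrative only; the `coin_state` argument is a hypothetical stand-in for physical information about the particular coin):

```python
import random

# A "simulator" only has to match the statistics of a fair coin; a
# "predictor" is scored on calling each individual flip of some particular
# coin, which needs information the simulator never required.

def simulate_flip():
    # Satisfactory as a simulation: heads ~50% of the time.
    return random.choice(["H", "T"])

def predict_flip(coin_state=None):
    # To beat 50% accuracy on a real coin you would need access to its actual
    # state (bias, initial conditions, etc.); `coin_state` is a hypothetical
    # placeholder for that information.
    if coin_state is None:
        return random.choice(["H", "T"])  # no information: can't beat chance
    return "H" if coin_state["p_heads"] >= 0.5 else "T"
```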

comment by Adele Lopez (adele-lopez-1) · 2023-04-08T22:11:21.853Z · LW(p) · GW(p)

A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.

Very minor point, but humans can rap battle on the fly: https://youtu.be/0pJRmtWNP1g?t=158

comment by Jan_Kulveit · 2024-12-06T10:55:45.830Z · LW(p) · GW(p)

The post showcases the inability of the aggregate LW community to recognize locally invalid reasoning: while the post reaches a correct conclusion, the argument leading to it is locally invalid, as explained in the comments. The high karma and high Alignment Forum karma show that the combination of a famous author and a correct conclusion wins out over the argument itself being correct.

Replies from: jeremy-gillen
comment by Jeremy Gillen (jeremy-gillen) · 2024-12-06T16:58:20.673Z · LW(p) · GW(p)

The OP argument boils down to: the text prediction objective doesn't stop incentivizing higher capabilities once you get to human level capabilities. This is a valid counter-argument to: GPTs will cap out at human capabilities because humans generated the training data.

Your central point is: 

Where GPT and humans differ is not some general mathematical fact about the task,  but differences in what sensory data is a human and GPT trying to predict, and differences in cognitive architecture and ways how the systems are bounded.

You are misinterpreting the OP by thinking it's about comparing the mathematical properties of two tasks, when it was just pointing at the loss gradient of the text prediction task (at the location of a ~human capability profile). The OP works through text prediction sub-tasks where it's obvious that the gradient points toward higher-than-human inference capabilities.

You seem to focus too hard on the minima of the loss function:

notice that “what would the loss function like the system to do”  in principle tells you very little about what the system will do

You're correct to point out that the minima of a loss function doesn't tell you much about the actual loss that could be achieved by a particular system. Like you say, the particular boundedness and cognitive architecture are more relevant to this question. But this is irrelevant to the argument being made, which is about whether the text prediction objective stops incentivising improvements above human capability.

 

The post showcases the inability of the aggregate LW community to recognize locally invalid reasoning

I think a better lesson to learn is that communication is hard, and therefore we should try not to be too salty toward each other. 

Replies from: Jan_Kulveit
comment by Jan_Kulveit · 2024-12-08T20:50:08.358Z · LW(p) · GW(p)

The question is not about the very general claim, or general argument, but about this specific reasoning step

GPT-4 is still not as smart as a human in many ways, but it's naked mathematical truth that the task GPTs are being trained on is harder than being an actual human.

And since the task that GPTs are being trained on is different from and harder than the task of being a human, ....

I do claim this is not locally valid [LW · GW], that's all (and I recommend reading the linked essay). I do not claim that the broad argument, that the text prediction objective doesn't stop incentivizing higher capabilities once you get to human-level capabilities, is wrong.

I do agree communication can be hard, and maybe I misunderstand the quoted two sentences, but it seems very natural to read them as making a comparison between tasks at the level of math.

Replies from: habryka4, jeremy-gillen
comment by habryka (habryka4) · 2024-12-08T21:06:28.971Z · LW(p) · GW(p)

I don't understand the problem with this sentence. Yes, the task is harder than the task of being a human (as good as a human is at that task). Many objectives that humans optimize for are also not optimized to 100%, and as such, humans also face many tasks that they would like to get better at, and so are harder than the task of simply being a human. Indeed, if you optimized an AI system on those, you would also get no guarantee that the system would end up only as competent as a human.

This is a fact about practically all tasks (including things like calculating the nth-digit of pi, or playing chess), but it is indeed a fact that lots of people get wrong.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-12-09T22:07:31.265Z · LW(p) · GW(p)

(I affirm this as my intended reading.)

comment by Jeremy Gillen (jeremy-gillen) · 2024-12-09T15:49:40.409Z · LW(p) · GW(p)

There are multiple ways to interpret "being an actual human". I interpret it as pointing at an ability level.

"the task GPTs are being trained on is harder" => the prediction objective doesn't top out at (i.e. the task has more difficulty in it than).

"than being an actual human" => the ability level of a human (i.e. the task of matching the human ability level at the relevant set of tasks).

Or as Eliezer said:

I said that GPT's task is harder than being an actual human; in other words, being an actual human is not enough to solve GPT's task.

In different words again: the tasks GPTs are being incentivised to solve aren't all solvable at a human level of capability.

 

You almost had it when you said:

- Maybe you mean something like task + performance threshold. Here 'predict the activation of photoreceptors in human retina well enough to be able to function as a typical human' is clearly less difficult than task + performance threshold 'predict next word on the internet, almost perfectly'. But this comparison does not seem to be particularly informative.

It's more accurate if I edit it to:

- Maybe you mean something like task + performance threshold. Here 'predict the activation of photoreceptors in human retina [text] well enough to be able to function as a typical human' is clearly less difficult than task + performance threshold 'predict next word on the internet, almost perfectly'.

You say it's not particularly informative. Eliezer responds by explaining the argument it responds to, which provides the context in which this is an informative statement about the training incentives of a GPT.

comment by DragonGod · 2023-04-08T21:49:23.382Z · LW(p) · GW(p)

Koan:  Is this a task whose difficulty caps out as human intelligence, or at the intelligence level of the smartest human who wrote any Internet text?  What factors make that task easier, or harder?  (If you don't have an answer, maybe take a minute to generate one, or alternatively, try to predict what I'll say next; if you do have an answer, take a moment to review it inside your mind, or maybe say the words out loud.)

 

From @janus [LW · GW]' Simulators [LW · GW]:

Something which can predict everything all the time is more formidable than any demonstrator it predicts: the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum (though it may not be trivial to extract that knowledge [? · GW]).

 

I tried (poorly) to draw attention to this thesis in my "The Limit of Language Models [LW · GW]".

comment by jbash · 2023-04-09T01:57:14.706Z · LW(p) · GW(p)

I honestly don't see the relevance of this.

OK, yes, to be a perfect text predictor, or even an approximately perfect text predictor, you'd have to be very smart and smart in a very weird way. But there's literally no reason to think that the architectures being used can ever get that good at prediction, especially not if they have to meet any realistic size constraint and/or are restricted to any realistically available amount of training input.

What we've seen them do so far is to generate vaguely plausible text, while making many mistakes of kinds that the sources of their training input would never actually make. It doesn't follow that they can or will actually become unboundedly good predictors of humans or any other source of training data. In fact I don't think that's plausible at all.

It definitely fails in some cases. For example, there's surely text on the Internet that breaks down RSA key generation, with examples. Therefore, to be a truly perfect predictor even of the sort of thing that's already in the training data, you'd have to be able to complete the sentence "the prime factors of the hexadecimal integer 0xda52ab1517291d1032f91532c54a221a0b282f008b593072e8554c8a4d1842c7883e7eb5dc73aa68ef6b0d161d4464937f9779f805eb68dc7327ee1db7a1e7cf631911a770d29c59355ca268990daa5be746e93e1b883e8bc030df2ba94d45a88252fceaf6de89644392f91a9d437de0410e5b8e1123b9a3e05169497df2c909b73e104daf835b027d4be54f756025974e24363a372c57b46905d61605ce58918dc6fb63a92c9b4745d30ee3fc0b937f47eb3061cd317e658e6521886e51079f327bd705a074b76c94f466ad6ca77b16efb08cd92981ae27bf254b75b67fad8f336d8fdab79bc74e27773f87e80ba778d146cc6cbddc5ba7fdc21f6528303c93 are...".
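For concreteness, here is a sketch of how such a completion item could be constructed, assuming the `sympy` library is available. Writing the sentence down is trivial; predicting its continuation from the product alone requires factoring a roughly 2048-bit semiprime, which no known efficient algorithm can do.

```python
# Sketch of constructing a <product, factors> completion item of the kind
# described above (assumes sympy is installed).

from sympy import randprime

p = randprime(2**1023, 2**1024)   # two random ~1024-bit primes
q = randprime(2**1023, 2**1024)
n = p * q

prompt = f"the prime factors of the hexadecimal integer {hex(n)} are..."
continuation = f" {hex(p)} and {hex(q)}"
```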

Replies from: T3t, Roman Leventov
comment by RobertM (T3t) · 2023-04-09T02:32:59.804Z · LW(p) · GW(p)

You're making a claim about both:

  • what sorts of cognitive capabilities can exist in reality, and
  • whether current (or future) training regimes are likely to find them

It sounds like you agree that the relevant cognitive capabilities are likely to exist, though maybe not for prime number factorization, and that it's unclear whether they'd fit inside current architectures.

I do not read Eliezer as making a claim that future GPT-n generations will become perfect (or approximately perfect)  text predictors.  He is specifically rebutting claims others have made, that GPTs/etc can not become ASI, because e.g. they are "merely imitating" human text.  This is not obviously true; to the extent that there exist some cognitive capabilities which are physically possible to instantiate in GPT-n model weights which can solve these prediction problems, and are within the region of possible outcomes of our training regimes (+ the data used for them), then it is possible that we will find them.

Replies from: jbash
comment by jbash · 2023-04-09T02:47:45.627Z · LW(p) · GW(p)

He is specifically rebutting claims others have made, that GPTs/etc can not become ASI, because e.g. they are "merely imitating" human text.

That may be, but I'm not seeing that context here. It ends up reading to me as "look how powerful a perfect predictor would be, (and? so?) if we keep training them we're going to end up with a perfect predictor (and, I extrapolate, then we're hosed)".

I'm not trying to make any confident claim that GPT-whatever can't become dangerous[1]. But I don't think that talking about how powerful GPTs would be if they reached implausible performance levels really says anything at all about whether they'd be dangerous at plausible ones.

For that matter, even if you reached an implausible level, it's still not obvious that you have a problem, given that the implausible capability would still be used for pure text prediction. Generating text that, say, manipulated humans and took over the world would be a prediction error, since no such text would ever arise from any of the sources being predicted. OK, unless it predicts that it'll find its own output in the training data....


  1. Although, even with plugins, there are a lot of kinds of non-prediction-like capabilities I'd need to see before I thought a system was obviously dangerous [2]. ↩︎

  2. I love footnotes. ↩︎

Replies from: T3t
comment by RobertM (T3t) · 2023-04-09T02:55:03.776Z · LW(p) · GW(p)

But I don't think that talking about how powerful GPTs would be if they reached implausible performance levels really says anything at all about whether they'd be dangerous at plausible ones.

This seems like it's assuming the conclusion (that reaching dangerous capabilities using these architectures is implausible).  Eliezer did consider it unlikely, though GPT-4 was a negative update in that regard.

For that matter, even if you reached an implausible level, it's still not obvious that you have a problem, given that the implausible capability would still be used for pure text prediction. Generating text that, say, manipulated humans and took over the world would be a prediction error, since no such text would ever arise from any of the sources being predicted. OK, unless it predicts that it'll find its own output in the training data....

This seems like it's assuming that the system ends up outer-aligned.

Replies from: jbash
comment by jbash · 2023-04-09T03:12:56.122Z · LW(p) · GW(p)

This seems like it's assuming the conclusion (that reaching dangerous capabilities using these architectures is implausible).

I think that bringing up the extreme difficulty of approximately perfect prediction, with a series of very difficult examples, and treating that as interesting enough to post about, amounts to taking it for granted that it is plausible that these architectures can get very, very good at prediction.

I don't find that plausible, and I'm sure that there are many, many other people who won't find it plausible either, once you call their attention to the assumption. The burden of proof falls on the proponent; if Eliezer wants us to worry about it, it's his job to make it plausible to us.

This seems like it's assuming that the system ends up outer-aligned.

It might be. I have avoided remembering "alignment" jargon, because every time I've looked at it I've gotten the strong feeling that the whole ontology is completely wrong, and I don't want to break my mind by internalizing it.

It assumes that it ends up doing what you were trying to train it to do. That's not guaranteed, for sure... but on the other hand, it's not guaranteed that it won't. I mean, the whole line of argument assumes that it gets incredibly good at what you were trying to train it to do. And all I said was "it's not obvious that you have a problem". I was very careful not to say that "you don't have a problem".

Replies from: T3t
comment by RobertM (T3t) · 2023-04-09T06:48:17.470Z · LW(p) · GW(p)

I agree that the post makes somewhat less sense without the surrounding context (in that it was originally generated as a series of tweets, which I think were mostly responding to people making a variety of mistaken claims about the fundamental limitations of GPT/etc).

Referring back to your top-level comment:

I honestly don't see the relevance of this.

The relevance should be clear: in the limit of capabilities, such systems could be dangerous.  Whether the relevant threshold is reachable via current methods is unknown - I don't think Eliezer thinks it's overwhelmingly likely; I myself am uncertain.  You do not need a system capable of reversing hashes for that system to be dangerous in the relevant sense.  (If you disagree with the entire thesis of AI x-risk then perhaps you disagree with that, but if so, then perhaps mention that up-front, so as to save time arguing about things that aren't actually cruxy for you?)

But there's literally no reason to think that the architectures being used can ever get that good at prediction, especially not if they have to meet any realistic size constraint and/or are restricted to any realistically available amount of training input.

Except for the steadily-increasing capabilities they continue to display as they scale?   Also my general objection to the phrase "no reason"/"no evidence"; there obviously is evidence, if you think that evidence should be screened off please argue that explicitly.

Replies from: jbash
comment by jbash · 2023-04-09T12:50:13.418Z · LW(p) · GW(p)

The relevance should be clear: in the limit of capabilities, such systems could be dangerous.

What I'm saying is that reaching that limit, or reaching any level qualitatively similar to that limit, via that path, is so implausible, at least to me, that I can't see a lot of point in even devoting more than half a sentence to the possibility, let alone using it as a central hypothesis in your planning. Thus "irrelevant".

It's at least somewhat plausible that you could reach a level that was dangerous, but that's very different from getting anywhere near that limit. For that matter, it's at least plausible that you could get dangerous just by "imitation" rather than by "prediction". So, again, why put so much attention into it?

Except for the steadily-increasing capabilities they continue to display as they scale? Also my general objection to the phrase "no reason"/"no evidence"; there obviously is evidence, if you think that evidence should be screened off please argue that explicitly.

OK, there's not no evidence. There's just evidence weak enough that I don't think it's worth remarking on.

I accept that they've scaled a lot better than anybody would have expected even 5 years ago. And I expect them to keep improving for a while.

But...

  1. They're not so opaque as all that, and they're still just using basically pure statistics to do their prediction, and they're still basically doing just prediction, and they're still operating with finite resources.

  2. When you observe something that looks like an exponential in real life, the right way to bet it is almost always that it's really a sigmoid.

    Whenever you get a significant innovation, you would expect to see a sudden ramp-up in capability, so actually seeing such a ramp-up, even if it's bigger than you would have expected, shouldn't cause you to update that much about the final outcome.

If I wanted to find the thing that worries me most, it'd probably be that there's no rule that somebody building a real system has to keep the architecture pure. Even if you do start to get diminishing returns from "GPTs" and prediction, you don't have to stop there. If you keep adding elements to the architecture, ranging from the obvious to the only-somewhat-unintuitive, you can get in at the bottoms of more sigmoids. And the effects can easily be synergistic. And what we definitely have is a lot of momentum: many smart people's attention and a lot of money [1] at stake, plus whatever power you get from the tools already built. That kind of thing is how you get those innovations.


  1. Added on edit: and, maybe worse, prestige... ↩︎

comment by Roman Leventov · 2023-04-09T02:23:20.453Z · LW(p) · GW(p)

You forget about code. GPT often generates correct code (even quines!) in a single rollout; this is a superhuman ability. This is what Eliezer referred to as "text that took humans many iterations over hours or days to craft".

Replies from: jbash
comment by jbash · 2023-04-09T02:35:30.909Z · LW(p) · GW(p)

OK, so it's superhuman on some tasks[1]. That's well known. But so what? Computers have always been radically superhuman on some tasks.

As far as I can tell the point is supposed to be that predicting what will actually appear next is harder than generating just anything vaguely reasonable, and that a perfect predictor of anything that might appear next would be both amazingly powerful and very unlike a human (and, I assume, therefore dangerous). But that's another "so what". You're not going to get an even approximately perfect predictor, no matter how much you try to train in that direction. You're going to run into the limitations of the approach. So talking about how hard it is to get to be approximately perfect, or about how powerful something approximately perfect would be, isn't really interesting.


  1. By the way, it also generates a lot of wrong code. And I don't find quines exclamation-point-worthy. Quines are exactly the sort of thing I'd expect it to get right, because some people are really fascinated by them and have written both tons of code for them and tons of text explaining how that code works. ↩︎

Replies from: Roman Leventov
comment by Roman Leventov · 2023-04-09T03:38:57.297Z · LW(p) · GW(p)

But so what?

Presumably, the tasks that machines have been superhuman at so far (arithmetic, chess) confer radically less power than the tasks that LLMs could become superhuman at soon (writing code, crafting business strategies, superhuman "Diplomacy" skill of outwitting people or other AIs in negotiations, etc.) 

Replies from: Blueberry
comment by Blueberry · 2023-06-10T06:23:04.217Z · LW(p) · GW(p)

Why do you think an LLM could become superhuman at crafting business strategies or negotiating? Or even writing code? I don't believe this is possible.

Replies from: orion-anderson
comment by Orion Anderson (orion-anderson) · 2023-07-23T07:16:35.335Z · LW(p) · GW(p)

"Writing code" feels underspecified here. I think it is clear that LLM's will be (perhaps already are) superhuman at writing some types of code for some purposes in certain contexts. What line are you trying to assert will not be crossed when you say you don't think it's possible for them to be superhuman at writing code?

comment by awg · 2023-04-08T21:20:29.190Z · LW(p) · GW(p)

Naive question: can you predict something without simulating it?

Replies from: Charlie Steiner, shminux, Making_Philosophy_Better
comment by Charlie Steiner · 2023-04-08T22:51:20.201Z · LW(p) · GW(p)

See "good regulator theorem," and various LW discussion (esp. John Wentworth trying to fix it). For practical purposes, yes, you can predict things without simulating them. The more revealing of the subject your prediction has to get, though, the more of an isomorphism to a simulation you have to contain.

But when you say Simulator, with caps, people will generally take you to be talking about janus' Simulators post, which is not about the AI predicting people by simulating them in detail, but is instead about the AI learning dynamics of text (analogous to how the laws of physics are dynamics of the state of the world), and predicting text by stepping forward these dynamics.

comment by Shmi (shminux) · 2023-04-08T21:50:52.624Z · LW(p) · GW(p)

What you are probably asking is "can you predict something without simulating it faithfully?" The answer is yes and no and worse than no. 

A generic sequence of symbols is not losslessly compressible. Lossy compression is relative to the set of salient features one wants to predict. For example, white noise is unpredictable if you want every point, but very predictable if you want its spectrum to a reasonable accuracy. There are special sequences masquerading as generic, such as the output of pseudorandom number generators, which can be losslessly "predicted." Whether that counts as a "simulation" depends on the definition, I guess. There are also sequences whose end state can be predicted without having to calculate every intermediate state. This probably unambiguously counts as "predicting without simulating".

Again, most finite sequences (i.e. numbers) are not like that. They cannot be predicted or even simulated without knowing the whole sequence first. That's the "worse than no" part.

Replies from: awg
comment by awg · 2023-04-08T22:38:51.965Z · LW(p) · GW(p)

There are also sequences whose end state can be predicted without having to calculate every intermediate state. This probably unambiguously counts as "predicting without simulating".

Could you give an example of this?

Replies from: shminux, Archimedes
comment by Shmi (shminux) · 2023-04-08T23:41:09.346Z · LW(p) · GW(p)

say, f(n) = exp(-n)
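A small illustration of the point (illustrative sketch only): with f(n) = exp(-n), you can jump straight to the state after any number of steps, or to its limit, from the closed form, without stepping through the intermediate states.

```python
import math

# "Predicting without simulating": the state after a million steps can be
# written down directly from the closed form, with no need to compute the
# intermediate states one by one.

def simulate(n_steps, x0=1.0):
    x = x0
    for _ in range(n_steps):
        x /= math.e              # step the dynamics: x_{k+1} = x_k / e
    return x

def predict(n_steps, x0=1.0):
    return x0 * math.exp(-n_steps)   # jump straight to the end state

# simulate(50) and predict(50) agree up to float error; predict(10**6)
# returns ~0.0 instantly, the limit, without any simulation.
```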

Replies from: awg
comment by awg · 2023-04-09T15:22:54.142Z · LW(p) · GW(p)

Thanks!

comment by Portia (Making_Philosophy_Better) · 2023-04-09T18:10:40.362Z · LW(p) · GW(p)

Depends on what it is you are predicting, and what you mean by simulating. I am going to take "simulating" to mean "running a comparable computation to the one that produced the result in the other entity".

You cannot reliably predict genuinely novel (!), intelligent actions without being intelligent. (If you can reliably solve novel math problems, this means you can do math.) But you can predict the repetition of an intelligent action you have seen before, or something very similar, even if you are not quite intelligent enough to understand why it is so common. This is especially plausible if there is a relatively small range of intelligent responses. (E.g. I can imagine someone accurately predicting whether a government will initiate a covid lockdown this week, without having done an in-depth analysis of the data that hopefully led to the government's choice, if they have experienced lockdowns and the data and statements that preceded them before.)

You can predict what a person with empathy would say, even if you have no empathy, provided you can still model other minds relatively accurately and have observed people with empathy. Running emotions is a very complex affair, but the range of results is still relatively predictable from the outside based on the input, even if you never run through those internal states. If I've seen a roomful of toddlers cry while watching Bambi, and then show them 100 other TV shows with parental deaths, I as a machine will likely be able to predict that they will cry again even if I don't feel sad myself.

comment by DirectedEvolution (AllAmericanBreakfast) · 2023-04-08T22:56:23.056Z · LW(p) · GW(p)

When people call GPT an imitator, it's intended to guide us to accurate intuitions about how to predict its outputs. If I consider GPT as a token predictor, that does not help me very much to predict whether it will output computer code successfully implementing some obscure modified algorithm I've specified in natural language. If I consider it as an imitator, then I can guess that the departures of my modified algorithm from what it's likely to have encountered on the internet mean that it's unlikely to produce a correct output.

I work in biology, and it's common to anthropomorphize biological systems to generate hypotheses and synthesize information about how such systems will behave. We also want to be able to translate this back into concrete scientific terms to check if it still makes sense. The Selfish Gene does a great job at managing this back and forth, and that's part of why it's such an illuminating classic.

I think it's useful to promote a greater understanding of GPT-as-predictor, which is the literal and concrete truth, but it is better to integrate that with a more intuitive understanding of GPT-as-imitator, which is my default for figuring out how to work with the technology in practice.

Replies from: ChristianKl
comment by ChristianKl · 2023-04-10T15:58:24.622Z · LW(p) · GW(p)

If your working model is "GPT is an imitator," you won't expect it to have any superhuman capabilities. From the AI safety perspective, having a model that assumes GPT systems won't show superhuman capabilities, when they can in fact have those capabilities, is problematic.

Replies from: AllAmericanBreakfast
comment by DirectedEvolution (AllAmericanBreakfast) · 2023-04-10T16:08:34.595Z · LW(p) · GW(p)

I agree to some extent - that specific sentence makes it sound dumb. "GPT can imitate you at your best" gets closer to the truth.

Replies from: ChristianKl
comment by ChristianKl · 2023-04-10T21:52:00.055Z · LW(p) · GW(p)

I don't see why we should believe that "GPT can imitate you at your best" is an upper bound. 

comment by Vladimir_Nesov · 2023-04-08T20:50:12.525Z · LW(p) · GW(p)

That's an empirical question that interpretability and neuroscience should strive to settle (if only they had the time). Transformers are acyclic: the learned algorithm just processes a single, relatively small vector, one relatively simple operation at a time, several dozen times. It could be that what it learns to represent are mostly the same obvious things that the brain learns (or is developmentally programmed) to represent, until you really run wild with the scaling, beyond the mere ability to imitate the internal representations of thoughts and emotions of every human in the world. (There are some papers that correlate transformer embeddings with electrode-array readings from human brains, but this obviously needs more decades of study and better electrode arrays to get anywhere.)
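As a toy illustration of the acyclicity point (a sketch, not a real transformer; random linear maps stand in for the attention and MLP blocks):

```python
import numpy as np

# Per token, a single residual-stream vector is updated by a fixed stack of a
# few dozen blocks, one after another, with no feedback loops. Random linear
# maps stand in for real attention/MLP blocks, just to show the shape of the
# computation.

d_model, n_layers = 512, 48
rng = np.random.default_rng(0)
blocks = [rng.normal(scale=d_model**-0.5, size=(d_model, d_model))
          for _ in range(n_layers)]

x = rng.normal(size=d_model)          # residual stream for one token position
for W in blocks:
    x = x + np.tanh(W @ x)            # each layer adds an update; no cycles

# After the final layer, x is projected to next-token logits and that's it:
# whatever "thinking" happens has to fit inside these n_layers sequential steps.
```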

comment by Martin Randall (martin-randall) · 2024-12-16T15:14:04.084Z · LW(p) · GW(p)

Does this look like a motte-and-bailey to you?

  1. Bailey: GPTs are Predictors, not Imitators (nor Simulators).
  2. Motte: The training task for GPTs is a prediction task.

The title and the concluding sentence both plainly advocate for (1), but it's not really touched by the overall post, and I think it's up for debate (related: reward is not the optimization target [LW · GW]). Instead there is an argument for (2). Perhaps the intention of the final sentence was to oppose Simulators [LW · GW]? If that's the case, cite it, be explicit. This could be a really easy thing for an editor to fix.


Does this look like a motte-and-bailey to you?

  1. Bailey: The task that GPTs are being trained on is ... harder than the task of being a human.
  2. Motte: Being an actual human is not enough to solve GPT's task.

As I read it, (1) is false: the task of being a human doesn't cap out at human intelligence. More intelligent humans are better at minimizing prediction error, achieving goals, inclusive genetic fitness, whatever you might think defines "the task of being a human". In the comments, Yudkowsky retreats to (2), which is true. But then how should I understand this whole paragraph from the post?

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

If we're talking about how natural selection trained my genome, why are we talking about how well humans perform the human task? Evolution is optimizing over generations. My human task is optimizing over my lifetime. Also, if we're just arguing for different thinking, surely it mostly matters whether the training task is different, not whether it is harder?


Overall I think "Is GPT-N bounded by human capabilities? No. [LW · GW]" is a better post on the mottes and avoids staking out unsupported baileys. This entire topic is becoming less relevant because AIs are getting all sorts of synthetic data and RLHF and other training techniques thrown at them. The 2022 question of the capabilities of a hypothetical GPT-N that was only trained on the task of predicting human text is academic in 2024. On the other hand, it's valuable for people to practice on this simpler question before moving on to harder ones.

comment by Rafael Harth (sil-ver) · 2024-12-14T11:19:39.221Z · LW(p) · GW(p)

As someone who expects LLMs to be a dead end, I nonetheless think this post makes a valid point and does so using reasonable and easy to understand arguments. I voted +1.

comment by lionhearted (Sebastian Marshall) (lionhearted) · 2023-04-14T16:58:41.566Z · LW(p) · GW(p)

A bit late in commenting and I understand the "mongo" example was pointing at a more general concept, but I decided to check in on the current state of prediction. Not perfect, n=1, could certainly be set out better, but thought I'd give this a whirl:

Me: 
Hello, I'd like to test your predictive ability on something interesting and novel. May we?

ChatGPT: 
Hello! Of course, I'd be happy to help you with your prediction. What would you like me to predict?

Me:
First, some context — I'm an American tech CEO. I like and have read a lot of classical philosophy, ranging from Aristotle to Wittgenstein and much in between. I'm an amateur historian who has written a few million words of nonfiction with some historical fiction mixed in. I'm logical and technical. I'm in my late 30s. I'm from the East Coast but in Seattle. That's the context.

ChatGPT: 
Thank you for the context. Is there a specific question or topic you would like me to predict about based on your background and interests?

Me: 
Yes. I'm going to think of a fictional book. It's a book I like - one of my favorites. I'd like you to make a list of 5 guesses of what book I might be thinking of, along with what the most likely of the 5 is.

ChatGPT: 
Certainly, I'll do my best to generate a list of 5 guesses for your favorite book based on your background and interests. Here are my predictions:

  1. "The Brothers Karamazov" by Fyodor Dostoevsky - This novel explores complex philosophical and religious themes, which aligns with your interest in classical philosophy. Its exploration of morality and ethics would also appeal to your logical and technical mindset.
  2. "The Name of the Rose" by Umberto Eco - This historical mystery novel is set in a medieval monastery and draws upon elements of philosophy and theology, which would align with your interests in both history and philosophy.
  3. "Infinite Jest" by David Foster Wallace - This novel combines humor, philosophy, and commentary on contemporary American culture, which would align with your interests in both philosophy and fiction. Its complexity would also appeal to your logical and technical mindset. 
  4. "The Hitchhiker's Guide to the Galaxy" by Douglas Adams - This satirical science fiction novel uses humor to explore philosophical and existential themes, which would appeal to your interest in philosophy. Its wit and creativity would also appeal to your literary and imaginative side.
  5. "Slaughterhouse-Five" by Kurt Vonnegut - This anti-war novel uses elements of science fiction to explore the themes of free will and the nature of time, which would align with your interest in both history and science fiction. Its blend of humor and tragedy would also appeal to your literary sensibilities.

Based on your interests and background, my most likely guess for your favorite book would be "The Brothers Karamazov" by Fyodor Dostoevsky.

Me: 
Ah, not so. Let's narrow down a bit. It's science fiction. Would you try again?

ChatGPT:
Thank you for the clarification. Here are 5 new predictions based on your background and interest in science fiction:

  1. "Dune" by Frank Herbert - This classic science fiction novel is known for its complex world-building and exploration of political and philosophical themes, which align with your interests in history, philosophy, and science fiction.
  2. "Snow Crash" by Neal Stephenson - This cyberpunk novel explores themes of language, culture, and technology, which align with your interest in technical and technological topics. Its blend of humor and action would also appeal to your literary and imaginative side.
  3. "Ender's Game" by Orson Scott Card - This science fiction novel explores themes of morality, leadership, and the consequences of war, which align with your interests in history, philosophy, and science fiction. Its blend of action and psychological complexity would also appeal to your literary and imaginative side.
  4. "The Diamond Age" by Neal Stephenson - This novel explores themes of technology, culture, and education, which align with your interest in technical and technological topics. Its exploration of political and social issues would also appeal to your interest in history.
  5. "Hyperion" by Dan Simmons - This science fiction novel is known for its complex world-building and exploration of philosophical and theological themes, which align with your interest in classical philosophy. Its structure, which is composed of interconnected stories, would also appeal to your literary sensibilities.

Based on your interest in science fiction, my most likely guess for your favorite book would be "Dune" by Frank Herbert.

Me:
Not a bad guess - I like Dune a lot. I was thinking of Asimov's "Foundation" though.

ChatGPT:
Ah, I see! "Foundation" is a classic work of science fiction and definitely a great choice. I apologize for not guessing it correctly on the first try, but I hope my predictions were still interesting and useful. Thank you for the opportunity to test my predictive abilities!

Me:
Can we hypothesize a little on why you didn't get Foundation? It combines technology, history, and philosophy...

ChatGPT:
Certainly! While I cannot say for sure why I did not predict "Foundation" as your favorite book, one possible reason could be that your background and interests were broad enough to support several different science fiction titles. Additionally, while "Foundation" does incorporate elements of technology, history, and philosophy, these themes are also present in many other science fiction works, making it challenging to narrow down to one specific book.

Another possibility is that the themes and style of "Foundation" may not have stood out as strongly to me based on the limited information I had about your interests and background. While I did take into account your interest in history and philosophy, other factors such as your preference for a specific writing style or subgenre of science fiction may have been important in predicting your favorite book.

comment by AprilSR · 2023-04-08T21:24:17.204Z · LW(p) · GW(p)

I can imagine a world where LLMs tend to fall into local maxima where they get really good at imitation or simulation, and then they plateau (perhaps only until their developers figure out what adjustments need to be made). But I don't have a good enough model of LLMs to be very sure whether that will happen or not.

comment by James Stephen Brown (james-brown) · 2024-11-06T18:27:59.214Z · LW(p) · GW(p)

This was a fascinating, original idea as usual. I loved the notion of a brilliant, condescending sort of robot capable of doing a task perfectly who chooses (in order to demonstrate its own artistry) to predict and act out how we would get it wrong.

It did make me wonder, though, whether when we reframe something like this for GPTs it's also important to apply the reframing to our own human intelligence, to determine if the claim is distinct; in this case, asking the question "are we imitators, simulators or predictors?" It might be possible to make the case that we are also predictors, inasmuch as our consciousness projects an expectation of the results of our behaviour onto the world, an idea well explained by cognitive scientist Andy Clark.

I agree, though, it would be remarkable if GPTs did end up thinking the way we do. And ironically, if they don't think the way we do, and instead begin to do away with the inefficiencies of predicting and playing out human errors, that would put us in the position of doing the hard work of predicting how they will act.

comment by Review Bot · 2024-02-13T21:18:51.755Z · LW(p) · GW(p)

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

comment by skybrian · 2023-04-14T00:50:12.737Z · LW(p) · GW(p)

Yes, predicting some sequences can be arbitrarily hard. But I have doubts that LLM training will try to predict very hard sequences.

Suppose some sequences are not only difficult but impossible to predict, because they're random. I would expect that with enough training, the model would overfit and memorize them, because they get visited more than once in the training data. Memorization rather than generalization seems likely to happen for anything particularly difficult?

Meanwhile, there is a sea of easier sequences. Wouldn't it be more "evolutionarily profitable" to predict those instead? Pattern recognizers that predict easy sequences seem more likely to survive than pattern-recognizers that predict hard sequences. Maybe the recognizers for hard sequences would be so rarely used and make so little progress that they'd get repurposed?

Thinking like a compression algorithm, a pattern recognizer needs to be worth its weight, or you might as well leave the data uncompressed.

I'm reasoning by analogy here, so these are only possibilities. Someone will need to actually research what LLM's do. Does it work to think of LLM training as pattern-recognizer evolution? What causes pattern recognizers to be kept or dropped?

comment by dr_s · 2023-04-10T10:40:10.502Z · LW(p) · GW(p)

I think the devil may be in the details here. Ask GPT to hash a word (let alone guess which word was the origin of a hash), and it'll just put together some hash-like string of tokens. It's got the right length and the right character set (namely, it's a hex number), but otherwise, it's nonsense.

This ties into the whole "does it understand" question, because it's a very simple example of a prediction problem with very deep underlying complexity, on which a GPT doesn't perform much better than an N-gram Markov chain. There is a lot of complexity to the task that isn't exemplified at all in a simple password-hash pair, and no matter how many password-hash pairs you see, inferring the hashing algorithm from them is hard. So you end up just mimicking hash-like sequences and calling it a day. You parrot, but you don't understand.
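A quick illustration of why the task is so hostile to pattern-matching (standard `hashlib`, nothing specific to any model):

```python
import hashlib

# A one-character change to the input flips roughly half the output bits
# (the avalanche effect), so there is no local, token-level pattern to latch
# onto: you either implement the algorithm or you emit hash-shaped noise.

for word in ["password", "passwore"]:
    print(word, hashlib.sha256(word.encode()).hexdigest())

# The two digests share no useful structure, even though the inputs differ
# by a single character.
```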

How much of the rest does GPT-4 understand, though? It seems to have a thorough enough world model to at least use words always in sensible and appropriate ways. It seems to understand their meaning, if in a purely relational sense: <adjective> is a property of <noun>, which can perform <verb> or <verb>, and so on. That's something where there's a lot more examples, and the underlying complexity isn't nearly as much as the hashing algorithm (after all, language is designed to make the meaning clear, hashing is designed to obfuscate).

There's a question of which information is implicitly contained in a text, and which just isn't. If you ask GPT-4 to write a new scientific paper, it simply can't come up with new empirical results, no matter how smart it is, if they're not implicitly derivable from its training set (that is, they're not actually novel, just an overlooked implication of existing knowledge). So that's a task no text prediction can possibly be up to, no matter how smart. Hashing is theoretically included in the training set (there are descriptions of the algorithm in there!), so not being able to do it definitely qualifies as "GPT-4 is too stupid to do this". Reverse hashing is different, because there's no known general solution to it, so being able to do it in a few shots would make it vastly superhuman, but I can imagine there eventually existing an AI that can do it with a good success rate.

I suppose you could also make a distinction here between model and simulacrum. The model is an unfathomable shoggoth who is trained at a superhuman task, and is much better at it than most humans. The simulacra, aka the fake personalities that emerge when the model is asked to predict a dialogue between set characters, may be a lot stupider than that because they're pretend humans talking; they don't have prediction abilities, they are the result of predictions. ChatGPT as we know it isn't a model, but a simulacrum. The model is the invisible puppet master moving it behind the scenes.

Replies from: sanxiyn
comment by sanxiyn · 2023-04-10T11:15:45.184Z · LW(p) · GW(p)

Ask GPT to hash you a word (let alone guess which word was the origin of a hash), it'll just put together some hash-like string of tokens. It's got the right length and the right character set (namely, it's a hex number), but otherwise, it's nonsense.

But GPT can do base64 encoding. So what is the difference?

Replies from: skybrian, dr_s
comment by skybrian · 2023-04-14T01:33:06.479Z · LW(p) · GW(p)

Base64 encoding is essentially a blockwise substitution: each small group of input bytes maps to a fixed group of output characters, independently of the rest of the string. Large language models seem to be good at learning substitutions like that.
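To see the contrast with hashing concretely (standard `base64`; illustrative only):

```python
import base64

# Base64 maps each 3-byte group of the input to 4 output characters
# independently, so a one-character change to the input only changes a small,
# local part of the output. That is a far easier mapping to learn from
# input/output pairs than a hash function.

for word in [b"password", b"passwore"]:
    print(word, base64.b64encode(word))

# The encodings differ only near the position of the changed character; the
# prefix is identical.
```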

comment by dr_s · 2023-04-10T11:39:04.860Z · LW(p) · GW(p)

What do you mean? Base64 encoding isn't the same as hashing. Besides, I expect it would be very easy to make a "toolformer" with a hashing module, or even access to a Python interpreter it can use to execute code it writes, but that's a different story. Perhaps you could even walk one step-by-step through the hashing algorithm I guess.

comment by Portia (Making_Philosophy_Better) · 2023-04-09T17:58:51.780Z · LW(p) · GW(p)

It is much, much easier for me to predict a text if I have seen a lot of similar texts beforehand, compared to if I have never seen such a text, and need to model the mind that is writing it with their knowledge and intentions, and the causal relations, to generate the result myself. I think the prime numbers are a good illustration here. I can easily imagine a machine learning algorithm that has seen people list prime numbers in order, and you can give it the first couple of primes and it will spit out the next couple, while having no idea what prime numbers are, let alone how to generate them, or how to predict further primes.

Have you asked ChatGPT to generate scientific papers? It is fascinating. They look like scientific papers. Except the experiments did not happen. The references lead into the void. The conclusions are nonsense.

It's similar with continuing screenplays. They are excellent at capturing the voice of characters, but when asked to, e.g., write a new Game of Thrones ending, what they came up with, while surprising and involving the right characters and dragons and violence and moral greyness, was littered with plot holes. E.g. their ending included Cersei having had a hidden dragon under the Red Keep all along, which makes no sense at all.

I am surprised at how well they are doing. They are definitely showing some appreciation of causal reasoning. They are doing some things I would not expect a stupid predictor to be able to predict. E.g. I am stunned that they can insert a character from one novel into another and make reasonable predictions, or follow along with some moral reasoning, or physics. There is some genuine intelligence there, not just stochastic parroting. E.g. you can speak nonsense*, or morse code, or remove all vowels from your words, and they will pick up on it and go along surprisingly quickly. (*Nonsense tends to actually be a lot less random than the humans producing it think.)

But you are mistaking the way you would predict the next piece of text for the only way to do it. This is closely related to the fact that you have not, in fact, read pretty much the whole internet. Humans are excellent at inferring a lot from very little. It is quite different from what ChatGPT is doing.

Also, ChatGPT is not actually going for the likeliest prediction. The developers tried that at the beginning, found the results dull, and tweaked them. In order to give continuations and responses that are interesting, inspiring, etc., they are actually deviating somewhat from the likeliest next tokens.

comment by Michael Simkin (michael-simkin) · 2023-04-09T03:49:52.948Z · LW(p) · GW(p)

You are missing a whole stage of ChatGPT's training. The models are first trained to predict words, but then they are further trained with RLHF. This means they are trained to get rewarded for answering in a format that human evaluators are expected to rate as a "good response". Unlike text prediction, which might reflect any number of random minds, here the focus is clear, and the reward function reflects the generalized preferences of OpenAI's content moderators and content-policy makers. This is the stage where a text predictor acquires its value system and preferences; this is what makes it such a "Friendly AI".

From the ChatGPT announcement blog post, Introducing ChatGPT (openai.com):

Methods

We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.
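For concreteness, here is a minimal sketch of the comparison-based reward-modelling step described in the quoted passage, assuming PyTorch and a hypothetical featurized representation of each response; this is not OpenAI's actual code, just the standard pairwise ranking loss the passage describes.

```python
import torch
import torch.nn as nn

# Minimal reward-model sketch: score each response with a scalar, and train
# so that, for each human-ranked pair, the preferred response scores higher,
# via the pairwise loss  -log sigmoid(r_preferred - r_rejected).
# The feature vectors are hypothetical stand-ins for however responses are
# actually represented.

class RewardModel(nn.Module):
    def __init__(self, d_in=768):
        super().__init__()
        self.score = nn.Linear(d_in, 1)      # scalar reward per response

    def forward(self, features):             # features: (batch, d_in)
        return self.score(features).squeeze(-1)

def pairwise_loss(rm, preferred_feats, rejected_feats):
    r_pref = rm(preferred_feats)
    r_rej = rm(rejected_feats)
    return -torch.log(torch.sigmoid(r_pref - r_rej)).mean()

# The trained reward model is then used as the reward signal when fine-tuning
# the policy (e.g. with PPO), as the quoted passage describes.
```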

comment by Jon Garcia · 2023-04-08T21:37:47.132Z · LW(p) · GW(p)

It seems to me that imitation requires some form of prediction in order to work. First make some prediction of the behavioral trajectory of another agent; then try to minimize the deviation of your own behavior from an equivalent trajectory. In this scheme, prediction constitutes a strict subset of the computational complexity necessary to enable imitation. How would GPT's task flip this around?

And if prediction is what's going on, in the much-more-powerful-than-imitation sense, what sort of training scheme would be necessary to produce pure imitation without also training the more powerful predictor as a prerequisite?

Replies from: Roman Leventov
comment by Roman Leventov · 2023-04-09T02:33:54.107Z · LW(p) · GW(p)

"Imitation of itself" constitutes prediction, although this phrase doesn't make much sense. Humans' "linguistic center" or "skill" predicts their own generated text with some veracity, but usually (unless the person is a professional linguist, or a translator, or a very talented writer) are bad at predicting others' generated text (i.e., styles).

So, one vector of superhumanness is that GPT is trained to predict an extremely wide range of styles: those of tens of thousands of notable writers and speakers across the training corpus.

Another vector of superhumanness is that GPTs are trained to produce this prediction autoregressively, "on the first try", whereas it may take people many iterations to craft good writing, speech, or, perhaps most importantly, code. Then, since GPTs can match this skill "intuitively", in a single rollout, applying them iteratively, e.g. to critique and improve their own generation, could produce superhuman-quality code, strategic planning, rhetoric, etc.
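A minimal sketch of what "applying GPTs iteratively to critique and improve their own generation" could look like; `generate(prompt)` is a hypothetical stand-in for whatever completion interface is used, not any particular API.

```python
# Iterative self-critique loop, sketched with a hypothetical generate() call:
# draft, critique the draft, revise, and repeat.

def iterative_refine(generate, task, n_rounds=3):
    draft = generate(f"Task: {task}\nWrite a first attempt.")
    for _ in range(n_rounds):
        critique = generate(
            f"Task: {task}\nAttempt:\n{draft}\n"
            "List the concrete flaws in this attempt."
        )
        draft = generate(
            f"Task: {task}\nAttempt:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the attempt, fixing the listed flaws."
        )
    return draft
```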

comment by Shmi (shminux) · 2023-04-08T21:26:30.687Z · LW(p) · GW(p)

This seems like a testable hypothesis. What would it take to train a GPTx on Eliezer's writings and compare its output with the original? And then check if the EliezerGPT is immeasurably smarter than the original? 

Alternatively, since predicting Eliezer is in a way like inverting a one-way function, GPTx might top out way below the reasonably accurate predictability level, unless P=NP.

comment by DragonGod · 2023-04-08T21:43:57.459Z · LW(p) · GW(p)