A learned agent is not the same as a learning agent

post by Ben Amitay (unicode-70) · 2022-12-16T17:27:28.037Z · LW · GW · 5 comments


[Edit: after reading the comments and thinking more about in-context learning, I don't endorse most of what's written here. I explain why at the end of the post.]

I notice a common confusion when people talk about deep learning. Before I try to describe it in general, let’s start with an example.

Like everyone, I have lately had many conversations with friends about ChatGPT. A friend of mine said that while ChatGPT is indeed impressive, it highlights how amazing the ability of human children to learn language from much less data is. While I strongly share the sentiment, I think that the comparison is wrong. As Chomsky hypothesized, children’s ability to learn language from experience is so amazing that we should doubt that this is really what happens. A child hearing language is probably merely fine-tuning to the shallow differences between human languages – building on very strong prior knowledge about the general shape of language. The prior knowledge is mostly based on evolution’s “experience” with your great-great aunt who misunderstood instructions and ate poisonous mushrooms as a result – and many similar circumstances. That experience was in turn aggregated with that of many humanoid great-great uncles who died in fights for dominance after mistaking their opponent’s threats for a bluff. And with the birth and death of earlier creatures, with brains that are something between genius controllers and very stupid agents, exploiting simple patterns in their environments.[1]

What was my friend’s source of confusion? Probably that he did not properly distinguish the algorithm that generated ChatGPT as its output from the algorithm that is ChatGPT itself. If you want to compare ChatGPT itself to a human, you should compare its training loop to evolution, not to a human. Did training ChatGPT require more data than human evolution? I’m honestly not sure. But if I want to evaluate ChatGPT’s own learning abilities rather than those of the training loop, I should focus on how it uses information from earlier in a long conversation, as “earlier in the conversation” is its equivalent of “earlier in life”, not on information from its training data, which is equivalent to the unfortunate death of my great-great aunt. Even with this interpretation of ChatGPT “learning”, it probably doesn’t compare that well with humans – but for different reasons that I’ll mention below.
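To make the distinction concrete, here is a minimal toy sketch in Python. It is not how ChatGPT works; every name in it (`Weights`, `train`, `predict`, the slope-fitting task) is a hypothetical stand-in chosen for illustration. The outer function `train` plays the role of the training loop (or evolution) and produces frozen weights. The inner function `predict` plays the role of the deployed model, which can only "learn" from what it has seen earlier in the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Everything the outer loop leaves behind; never modified after training."""
    prior_slope: float


def train(task_slopes, steps=200, lr=0.05):
    """Outer loop (the analogue of evolution): gradient descent on the prior.

    Each task is a line y = slope * x; the loss is the squared error of the
    prior's prediction at x = 1, averaged over tasks.
    """
    slope = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (slope - s) for s in task_slopes) / len(task_slopes)
        slope -= lr * grad
    return Weights(prior_slope=slope)


def predict(weights, context, x):
    """Inner process (the analogue of a lifetime): the weights stay frozen.

    `context` is a list of (x, y) pairs seen "earlier in the conversation".
    The prior counts as one pseudo-observation against the context's
    least-squares slope estimate.
    """
    den = sum(cx * cx for cx, _ in context)
    if den == 0:
        return weights.prior_slope * x  # nothing usable in context: prior only
    context_slope = sum(cx * cy for cx, cy in context) / den
    n = len(context)
    slope = (weights.prior_slope + n * context_slope) / (1 + n)
    return slope * x


weights = train([1.0, 2.0, 3.0])          # "evolution" sees many tasks
conversation = [(1.0, 5.0), (2.0, 10.0)]  # a new task with true slope 5
print(predict(weights, [], 1.0))            # prediction from the prior alone
print(predict(weights, conversation, 1.0))  # prediction after in-context evidence
```

Comparing the data efficiency of `train` with that of a child would be the confusion described above. The fair comparison for the child is how quickly `predict` improves as `conversation` grows, while `weights` never changes.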

Instead of using the confusing analogy that "artificial neural networks are somewhat similar to neural networks in the brain," it would be more accurate to use the following analogy:

Human (brain) | Trained model
DNA | Weights of the network
Developmental processes translating DNA to a grown human | Network architecture (?)
Evolution | Training loop
Evolutionary pressure | (The gradient of?) the loss function
Human knowledge | Vector representation of past interactions since the model’s deployment in the environment. In the case of ChatGPT – since the beginning of the conversation.

To make sure that the analogy is fruitful, let me finish with some shorter points that derive from it:

Edit: what did I miss that may change the conclusion?


[1] And on much non-linguistic information about the world that the language intends to describe, and that ChatGPT has no access to – but this is a story for another day.

5 comments

Comments sorted by top scores.

comment by Adam Scherlis (adam-scherlis) · 2022-12-16T19:56:42.331Z · LW(p) · GW(p)

I think this is partly true but mostly wrong.

A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information can be stored or how much information it takes to specify synaptic strength.

There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture and innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.

Humans also spend a lot of time learning and have long childhoods, after which they have tons of knowledge that (I assert) could never have been crammed into a few dozen or hundred megabytes.

So I think something like 99.9% of what humans "know" (in the sense of their synaptic strengths) is learned during their lives, from their experiences.

This makes them basically disanalogous to neural nets.
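As a rough back-of-the-envelope check of the orders of magnitude above, here is a small Python sketch. The specific figures are illustrative assumptions rather than measurements: about 3 billion base pairs, 2 bits per base pair (four possible nucleotides), the low end of "trillions" of synapses, and one byte per synapse as a stand-in for "roughly one parameter". The function names are hypothetical.

```python
BASE_PAIRS = 3e9         # assumed: "billions of total base pairs"
BITS_PER_BASE_PAIR = 2   # four nucleotides -> log2(4) = 2 bits
SYNAPSES = 1e12          # assumed: conservative low end of "trillions"
BYTES_PER_SYNAPSE = 1    # assumed: roughly one low-precision parameter


def genome_bytes(base_pairs=BASE_PAIRS, bits=BITS_PER_BASE_PAIR):
    """Upper bound on raw genome capacity in bytes, ignoring redundancy."""
    return base_pairs * bits / 8


def learned_fraction(genome_capacity_bytes,
                     synapses=SYNAPSES,
                     bytes_per_synapse=BYTES_PER_SYNAPSE):
    """Share of synaptic information that cannot have come from the genome."""
    synaptic_bytes = synapses * bytes_per_synapse
    return 1.0 - min(genome_capacity_bytes / synaptic_bytes, 1.0)


# Whole genome as the budget (generous to the "innate" side).
print(genome_bytes() / 1e6, "MB")
print(learned_fraction(genome_bytes()))

# Only the brain-relevant part, taken here as 100 MB.
print(learned_fraction(100e6))
```

Even with the whole genome counted and the synapse count set at the low end, the genome can account for well under 0.1% of the synaptic information under these assumptions. Larger synapse counts or a smaller brain-relevant genome budget only push the learned share higher.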

Neural net (LLM):

  • Extremely concise architecture (kB's of code) contains inductive biases
  • Lots of pretraining (billions of tokens or optimizer steps) produces 100s of billions of parameters of pretrained knowledge e.g. Lincoln
  • Smaller fine-tuning stage produces more specific behavior e.g. ChatGPT's distinctive "personality", stored in the same parameters
  • Tiny amount of in-context learning (hundreds or thousands of tokens) involves things like induction heads and lets the model incorporate information from anywhere in the prompt in its response

Humans:

  • Enormous amount of evolution (thousands to millions of lifetimes?) produces a relatively small genome (millions of base pairs, or maybe a billion)
  • Much shorter amount of experience in childhood (and later) produces many trillions of synapses' worth of knowledge and learned skills
  • Short-term memory, the phonological loop, etc. let humans make use of temporary information from the recent environment

You're analogizing pretraining to evolution, which seems wrong to me (99.9% of human synaptic information comes from our own experiences); I'd say it's closer to inductive bias from the architecture, but neural nets don't have a bottleneck analogous to the genome.

In-context learning seems even more disanalogous to a human lifetime of experiences, because the pretrained weights of a neural net massively dwarf the context window or residual stream in terms of information content, which seems closer to the situation with total human synaptic strengths vs short-term memory (rather than genome vs learned synaptic strengths).

I would be more willing to analogize human experiences/childhood/etc to fine-tuning, but I think the situation is just pretty different with regard to relative orders of magnitude, because of the gene bottleneck.

Replies from: unicode-70, unicode-70, lahwran
comment by Ben Amitay (unicode-70) · 2023-01-29T07:26:10.499Z · LW(p) · GW(p)

I was eventually convinced of most of your points, and added a long list of mistakes at the end of the post. I would really appreciate comments on the list, as I don't feel fully converged on the subject yet.

comment by Ben Amitay (unicode-70) · 2022-12-18T10:39:06.022Z · LW(p) · GW(p)

I think we have much more disagreement about psychology than about AI, though I admit to low certainty about the psychology too.

About AI, my point was that in understanding the problem, the training loop takes roughly the role of evolution and the model takes that of the evolved agent, with implications for comparisons of success, and possibly for identifying what's missing. I did refer to the fact that algorithmically we took ideas from the human brain to the training loop, and it therefore makes sense for it to be algorithmically more analogous to the brain. Given that clarification, do you still mostly disagree? (If not, how do you recommend changing the post to make it clearer?)

Adding "short-term memory" to the picture is interesting, but then is there any mechanism for it to become long-term?

About the psychology: I do find the genetic bottleneck argument intuitively convincing, but think that we have reasons to distrust this intuition. There is often a huge disparity between data in its most condensed form and data in a form that is convenient to use in deployment. Think about the difference in length between code written in a functional/declarative language and its assembly code. I have literally no intuition as to what can be done with 10 megabytes of condensed Python, but I guess that it is more than enough to automate a human, if you know what code to write. While there probably is a lot of redundancy in the genome, it seems at least as likely that there is huge redundancy in synapses, as their use is not just to store information, but mostly to fit the needed information manipulations.

comment by the gears to ascension (lahwran) · 2022-12-16T23:42:41.885Z · LW(p) · GW(p)

yeah.
evolution = grad student descent, automl, etc
dna = training code
epigenetics = hyperparams
gestation = weight init, with a lot of built-in preset weights, plus a huge mostly-tiled neocortex
Developmental processes = training loop

comment by Ben Amitay (unicode-70) · 2022-12-16T17:30:14.351Z · LW(p) · GW(p)

Thanks to my girlfriend and to ChatGPT for editing advice