[Linkpost] Large Language Models Converge on Brain-Like Word Representations

post by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-06-11T11:20:09.078Z · LW · GW · 12 comments

This is a linkpost for https://arxiv.org/abs/2306.01930

One of the greatest puzzles of all time is how understanding arises from neural mechanics. Our brains are networks of billions of biological neurons transmitting chemical and electrical signals along their connections. Large language models are networks of millions or billions of digital neurons, implementing functions that read the output of other functions in complex networks. The failure to see how meaning would arise from such mechanics has led many cognitive scientists and philosophers to various forms of dualism -- and many artificial intelligence researchers to dismiss large language models as stochastic parrots or jpeg-like compressions of text corpora. We show that human-like representations arise in large language models. Specifically, the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging.
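To make "structurally similar" concrete: a standard way of comparing a model's representational space with brain measurements is representational similarity analysis (RSA), which correlates the pattern of pairwise dissimilarities among word representations in the model with the pattern among brain responses to the same words. The sketch below is illustrative only; the array shapes and metric choices are assumptions, not necessarily the paper's exact pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_reprs: np.ndarray, brain_reprs: np.ndarray) -> float:
    """Representational similarity analysis (RSA).

    model_reprs: (n_words, d_model) LLM activations for n words.
    brain_reprs: (n_words, d_brain) brain responses (e.g. fMRI voxels or
                 MEG sensors) for the same n words, in the same order.
    """
    # Pairwise dissimilarity between words, computed separately in each space.
    model_rdm = pdist(model_reprs, metric="correlation")
    brain_rdm = pdist(brain_reprs, metric="correlation")
    # Structural similarity = rank correlation between the two
    # representational dissimilarity matrices.
    rho, _ = spearmanr(model_rdm, brain_rdm)
    return float(rho)

# Toy usage with random arrays; real inputs would be model activations and
# brain recordings for a shared word list.
rng = np.random.default_rng(0)
print(rsa_score(rng.normal(size=(50, 768)), rng.normal(size=(50, 200))))
```

A higher rank correlation means the two spaces organize the same set of words in more similar ways, which is the sense in which larger models are reported to look more brain-like.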

12 comments


comment by Vladimir_Nesov · 2023-06-11T21:46:40.827Z · LW(p) · GW(p)

Prediction/compression seems to be working out as a path to general intelligence: it implicitly represents situations in terms of their key legible features, making it easy to formulate policies appropriate for a wide variety of instrumental objectives in a wide variety of situations, without having to adapt the representation to particular kinds of objectives or situations. To the extent that brains engage in predictive processing, they are plausibly going to compute related representations. (This doesn't ensure alignment, as there are many different ways of making use of these features, of acting differently in the same world.)

Replies from: bogdan-ionut-cirstea
comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-06-12T22:09:33.649Z · LW(p) · GW(p)

Yes, predictive processing has been put forward in a few papers as the reason behind the related representations, e.g. The neural architecture of language: Integrative modeling converges on predictive processing. There's also some pushback against this interpretation, though, e.g. Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data.

comment by Max H (Maxc) · 2023-06-11T18:04:06.015Z · LW(p) · GW(p)

A possible explanation: both brains and LLMs are somehow solving the symbol grounding problem [? · GW]. It may be that the most natural solutions to this problem share commonalities, or even that all solutions are necessarily isomorphic to each other.

Anyone who has played around with LLMs for a while can see that they are not just "stochastic parrots", but I think it's a pretty big leap to call anything within them "human-like" or "brain-like". 

If an AI (perhaps a GOFAI or just an ordinary computer program) implements addition using the standard algorithm for multi-digit addition that humans learn in elementary school, does that make the AI human-like? Maybe a little, but it seems less misleading to say that the method itself is just a natural way of solving the same underlying problem. The fact that AIs are becoming capable of solving more complex problems that were previously only solvable by human brains seems more like a fact about a general increase in AI capabilities than a result of AI systems getting more "brain-like".
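For concreteness, the "standard algorithm" in question is just the right-to-left, carry-based procedure; here is a toy sketch of it (my own illustration, not something from the paper or the post):

```python
def schoolbook_add(a: str, b: str) -> str:
    """Elementary-school addition on decimal strings: add column by
    column from the right, carrying into the next column."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

assert schoolbook_add("478", "934") == str(478 + 934)  # "1412"
```

Whether a child or a program executes these steps, the procedure is the same, which is the point of the analogy.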

To say that any system which solves a problem via methods similar to humans' is brain-like seems like it is unfairly privileging the specialness/uniqueness of the brain. Claims like that (IMO wrongly) suggest that those solutions somehow "belong" to the brain, simply because that is where we first observed them.

Replies from: o-o, sharmake-farah, jake-jenks
comment by O O (o-o) · 2023-06-11T22:08:53.224Z · LW(p) · GW(p)

To say that any system which solves a problem via methods similar to humans' is brain-like seems like it is unfairly privileging the specialness/uniqueness of the brain. Claims like that (IMO wrongly) suggest that those solutions somehow "belong" to the brain, simply because that is where we first observed them.

The brain isn't exactly some arbitrary set of parameters picked from a mindspace; it's the general intelligence most statistically likely to form from evolutionary mechanisms acting on a mammalian brain. Presumably the processes it uses are the simplest to build bottom-up, so the claim is misguided, but it isn't entirely wrong.

comment by Noosphere89 (sharmake-farah) · 2023-06-11T18:31:51.138Z · LW(p) · GW(p)

Anyone who has played around with LLMs for a while can see that they are not just "stochastic parrots", but I think it's a pretty big leap to call anything within them "human-like" or "brain-like".

To a large extent this describes my new views on LLM capabilities too, especially transformers: they're missing important aspects of human cognition, but they're not the useless stochastic parrots that some of the more dismissive people claim they are.

comment by jakej (jake-jenks) · 2023-06-12T17:07:38.394Z · LW(p) · GW(p)

To me, it really looks like brains and LLMs are both using embedding spaces to represent information. Embedding spaces ground symbols by automatically relating all concepts they contain, including the grammar for manipulating these concepts.
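A minimal sketch of what "automatically relating all concepts" can look like geometrically, using toy, hand-picked vectors standing in for real learned embeddings: similarity between concepts falls out of the space itself, with no explicit rule linking them.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: how aligned two concept vectors are."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy, hand-picked 3-d "embeddings"; a real LLM would supply
# high-dimensional vectors learned from data.
emb = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "cat": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.3]),
}

print(cosine(emb["dog"], emb["cat"]))  # ~0.98: the space treats them as related
print(cosine(emb["dog"], emb["car"]))  # ~0.21: much less related
```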

comment by Ilio · 2023-06-12T17:00:37.815Z · LW(p) · GW(p)

Big achievement, even if nobody should be surprised (the analogous result has been known for vision for a decade or so).

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003963

@anyone To those who believe a future AGI might pick its values at random: don't you think this result suggests it should restrict its pick to something that human language and visuospatial cognition push for?

Replies from: bogdan-ionut-cirstea
comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-06-12T22:24:34.385Z · LW(p) · GW(p)

Yes, there are similar results in a bunch of other domains, including vision; see, for a review, e.g. The neuroconnectionist research programme [LW · GW].

I wouldn't interpret this as necessarily limiting the space of AI values, but rather (somewhat conservatively) as shared (linguistic) features between humans and AIs, some/many of which are probably relevant for alignment [LW · GW].

Replies from: Ilio
comment by Ilio · 2023-06-13T00:59:34.812Z · LW(p) · GW(p)

wouldn't interpret this as necessarily limiting the space of AI values, but rather (somewhat conservatively) as shared (linguistic) features between humans and AIs

I fail to see how the latter could arise without the former. Would you mind connecting these dots?

Replies from: bogdan-ionut-cirstea
comment by Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-06-13T07:31:15.365Z · LW(p) · GW(p)

AIs could have representations of human values without being motivated to pursue them; also, their representations could be a superset of human representations.

(In practice, I do think having overlapping representations with human values likely helps, for reasons related to e.g. Predicting Inductive Biases of Pre-Trained Models and Alignment with human representations supports robust few-shot learning.)

Replies from: Ilio
comment by Ilio · 2023-06-13T17:18:23.004Z · LW(p) · GW(p)

Indeed, their representations could form a superset of human representations, and that's why it's not random. Or, equivalently, it's random, but not under a uniform prior.

(Yes, these further works are more evidence for « it's not random at all », as if LLMs were discovering (some of) the same set of principles that allows our brains to construct/use our language, rather than creating completely new cognitive structures. That's actually reminiscent of AlphaZero converging toward human style without training on human input.)