LLMs are likely not conscious

post by research_prime_space · 2024-09-29T20:57:26.111Z · 4 comments

I think the sparse autoencoder (SAE) line of interpretability work provides somewhat convincing evidence that LLMs are not conscious.

In order for me to consciously take in some information (e.g. "the house is yellow"), I need to store not only the content of the statement but also some aspect of my conscious experience of it. That is, I need to store more than the minimal number of bits it would take to represent "the house is yellow".

The sparse autoencoder line of work appears to suggest that LLMs essentially store "bits" that represent "themes" in the text they're processing, and close to nothing beyond that (at least as measured by the L2 norm of what the features fail to reconstruct). Furthermore, this holds at every layer. Thus, there doesn't appear to be any residual "space" left over for storing aspects of consciousness.
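
A minimal sketch of the picture I have in mind, assuming a standard ReLU sparse autoencoder over residual-stream activations (the dimensions, weights, and names below are made up for illustration, not taken from any particular SAE paper or codebase):

```python
import torch

d_model, n_features = 512, 4096   # residual-stream width, dictionary size (illustrative)
W_enc = torch.randn(d_model, n_features) / d_model ** 0.5     # encoder weights (random stand-ins)
W_dec = torch.randn(n_features, d_model) / n_features ** 0.5  # decoder rows = fixed feature directions
b_enc = torch.zeros(n_features)

def sae_decompose(activation: torch.Tensor):
    """Split an activation into sparse 'theme' features plus whatever is left over."""
    feature_acts = torch.relu(activation @ W_enc + b_enc)  # sparse feature strengths
    reconstruction = feature_acts @ W_dec                   # weighted sum of fixed feature directions
    residual = activation - reconstruction                  # what the features fail to explain
    return feature_acts, residual

x = torch.randn(d_model)        # stand-in for one token's activation vector
acts, resid = sae_decompose(x)

# The claim above, restated: for SAEs trained on real LLM activations, this
# ratio is reported to be small, i.e. little is stored beyond the theme
# features. (With the random weights here it is of course not small; the
# code only shows what quantity is being measured.)
print((resid.norm() / x.norm()).item())
```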

4 comments

comment by Gunnar_Zarncke · 2024-09-29T21:56:37.882Z

I would remove that last paragraph. It doesn't add to your point and gives the impression that you might have a specific agenda.

Replies from: research_prime_space
comment by research_prime_space · 2024-09-29T22:21:23.545Z

I removed it. I don't have an agenda; I just included it because it changed my priors on the mechanism for human consciousness, which in turn affected my prior on whether or not AI could be conscious.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-30T01:57:01.811Z

I agree that this is some evidence, but perhaps not very strong evidence. We don't know for sure that the SAE latent we have chosen to label 'yellow' captures only an objective representation of yellow rather than both an objective and a subjective one.

What is consciousness?  

What are its related (component? overlapping?) concepts like subjective point of view, self-awareness, and qualia? 

What do these look like in a model's weights?

Might these things be spread through many different concepts?

 I do think that the conclusion that current LLMs are not conscious is correct. However, I worry that this might not hold for long as architectures evolve. I expect that architectures which enable consciousness will be shown to have useful properties, and there will thus be pressure to develop and use them. I know some researchers are explicitly working on this already.

I support creating evals for consciousness so that we can determine empirically whether future models are conscious or not. Unfortunately, to objectively establish this we may need to learn more about the human brain and human consciousness, and/or deliberately create conscious models in order to study them. Such work, if mishandled, invites moral catastrophe.

Replies from: research_prime_space
comment by research_prime_space · 2024-09-30T04:51:17.785Z

It can't represent a subjective sense of yellow, because SAE latents are just fixed linear directions in activation space; if one of them carried the subjective experience, consciousness would be a linear function of the activations. That seems somewhat ridiculous, because I would experience a story about a "dog" differently based on the context.

Furthermore, LLMs scale "features" by how strongly they appear (e.g. the positive-sentiment vector is scaled up if the text is very positive). So the LLM's conscious processing of positive sentiment would have to be linearly proportional to how positive the text is, which also seems ridiculous.
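
To spell out the scaling point with a toy sketch (mine, with a made-up direction and numbers): in the SAE picture, a feature's contribution to an activation is just a scalar strength times one fixed direction, so "more positive" text changes only the scalar.

```python
import torch

d_model = 512
positive_sentiment_direction = torch.randn(d_model)  # hypothetical fixed feature direction

mildly_positive = 0.7 * positive_sentiment_direction  # weakly positive text
very_positive = 2.1 * positive_sentiment_direction    # strongly positive text

# The two contributions differ only by a scalar factor (here exactly 3x),
# which is what makes the "linearly scaled experience" conclusion so odd.
ratio = very_positive / mildly_positive
print(torch.allclose(ratio, torch.full_like(ratio, 3.0)))  # True
```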

I don't expect consciousness to have any useful properties. Let's say you have a deterministic function y = f(x). You can encode just y = f(x), or a version of f that also includes conscious representations in its intermediate layers. The latter does not help you achieve increased training accuracy in the slightest. Neural networks also have a strong simplicity bias towards low-frequency functions (this "spectral bias" has been shown mathematically), and f(x) without consciousness is much simpler and lower-frequency to encode than f(x) with consciousness.
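
As a toy illustration of one part of this argument (my sketch, not anything from the interpretability literature): the training objective is a function of the model's outputs alone, so internal computation that doesn't change the outputs receives no gradient signal whatsoever.

```python
import torch
import torch.nn as nn

class FWithExtraInternalState(nn.Module):
    """y = f(x), plus extra intermediate computation that never reaches the
    output -- a crude stand-in for 'conscious representations'."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 16)
        self.layer2 = nn.Linear(16, 1)
        self.extra = nn.Linear(16, 64)   # hypothetical extra internal structure

    def forward(self, x):
        h = torch.relu(self.layer1(x))
        _ = self.extra(h)                # computed, but unused by the output
        return self.layer2(h)

model = FWithExtraInternalState()
x, y = torch.randn(32, 4), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

print(model.layer2.weight.grad is None)  # False: the output path is trained
print(model.extra.weight.grad is None)   # True: the loss never "sees" the extra structure
```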