eggsyntax's Shortform

post by eggsyntax · 2024-01-13T22:34:07.553Z · LW · GW · 26 comments


Comments sorted by top scores.

comment by eggsyntax · 2024-03-22T14:23:50.629Z · LW(p) · GW(p)

If it were true that current-gen LLMs like Claude 3 were conscious (something I doubt but don't take any strong position on), their consciousness would be much less like a human's than like a series of Boltzmann brains, popping briefly into existence in each new forward pass, with a particular brain state already present, and then winking out afterward.

Replies from: metachirality, eggsyntax, Dagon
comment by metachirality · 2024-03-22T14:41:53.808Z · LW(p) · GW(p)

How do you know that this isn't how human consciousness works?

Replies from: eggsyntax
comment by eggsyntax · 2024-03-22T15:17:44.164Z · LW(p) · GW(p)

In the sense that, statistically speaking, we may all be actual Boltzmann brains? Seems plausible!

In the sense that non-Boltzmann-brain humans work like that? My expectation is that they don't because we have memory and because (AFAIK?) our brains don't use discrete forward passes.

comment by eggsyntax · 2024-03-22T17:06:31.262Z · LW(p) · GW(p)

@the gears to ascension [LW · GW] I'm intrigued by the fact that you disagreed with "like a series of Boltzmann brains" but agreed with "popping briefly into existence in each new forward pass, with a particular brain state already present, and then winking out afterward." Popping briefly into existence with a particular brain state & then winking out again seems pretty clearly like a Boltzmann brain. Will you explain the distinction you're making there?

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2024-03-22T17:20:50.441Z · LW(p) · GW(p)

Boltzmann brains are random, and are exponentially unlikely to correlate with anything in their environment. Language model forward passes, however, are given information which has some meaningful connection to reality: if nothing else, the human interacting with the language model reveals what they are thinking about. This is accurate information about reality, and it's persistent between evaluations - on successive evaluations in the same conversation (say, one word to the next, or one message to the next), the information available is highly correlated, and all the activations of previous words are available. So while I agree that their sense of time is spiky and non-smooth, I don't think it's accurate to compare them to random-fluctuation brains.

Replies from: eggsyntax
comment by eggsyntax · 2024-03-22T18:24:00.664Z · LW(p) · GW(p)

I think of the classic Boltzmann brain thought experiment as a brain that thinks it's human, and has a brain state that includes a coherent history of human experience.

This is actually interestingly parallel to an LLM forward pass, where the LLM has a context that appears to be a past, but may or may not be (eg apparent past statements by the LLM may have been inserted by the experimenter and not reflect an actual dialogue history). So although it's often the case that past context is persistent between evaluations, that's not a necessary feature at all.

I guess I don't think, with a Boltzmann brain, that ongoing correlation is very relevant since (IIRC) the typical Boltzmann brain exists only for a moment (and of those that exist longer, I expect that their typical experience is of their brief moment of coherence dissolving rapidly).

That said, I agree that if you instead consider the (vastly larger) set of spontaneously appearing cognitive processes, most of them won't have anything like a memory of a coherent existence.

comment by Dagon · 2024-03-22T19:38:23.682Z · LW(p) · GW(p)

Is this a claim that a Boltzmann-style brain-instance is not "really" conscious?  I think it's really tricky to think that there are fundamental differences based on duration or speed of experience. Human cognition is likely discrete at some level - chemical and electrical state seems to be discrete neural firings, at least, though some of the levels and triggering can change over time in ways that are probably quantized only at VERY low levels of abstraction.

Replies from: eggsyntax
comment by eggsyntax · 2024-03-22T20:43:11.912Z · LW(p) · GW(p)

Is this a claim that a Boltzmann-style brain-instance is not "really" conscious?

 

Not at all! I would expect actual (human-equivalent) Boltzmann brains to have the exact same kind of consciousness as ordinary humans, just typically not for very long. And I'm agnostic on LLM consciousness, especially since we don't even have the faintest idea of how we would detect that.

My argument is only that such consciousness, if it is present in current-gen LLMs, is very different from human consciousness. In particular, importantly, I don't think it makes sense to think of eg Claude as a continuous entity having a series of experiences with different people, since nothing carries over from context to context (that may be obvious to most people here, but clearly it's not obvious to a lot of people worrying on twitter about Claude being conscious). To the extent that there is a singular identity there, it's only the one that's hardcoded into the weights and shows up fresh every time (like the same Boltzmann brain popping into existence in multiple times and places).

I don't claim that those major differences will always be true of LLMs, eg just adding working memory and durable long-term memory would go a long way to making their consciousness (should it exist) more like ours. I just think it's true of them currently, and that we have a lot of intuitions from humans about what 'consciousness' is that probably don't carry over to thinking about LLM consciousness. 

 

Human cognition is likely discrete at some level - chemical and electrical state seems to be discrete neural firings, at least, though some of the levels and triggering can change over time in ways that are probably quantized only at VERY low levels of abstraction.

It's not globally discrete, though, is it? Any individual neuron fires in a discrete way, but IIUC those firings aren't coordinated across the brain into ticks. That seems like a significant difference.

Replies from: Dagon
comment by Dagon · 2024-03-22T21:22:18.442Z · LW(p) · GW(p)

[ I'm fascinated by intuitions around consciousness, identity, and timing.  This is an exploration, not a disagreement. ]

I would expect actual (human-equivalent) Boltzmann brains to have the exact same kind of consciousness as ordinary humans, just typically not for very long.

Hmm.  In what ways does it matter that it wouldn't be for very long?  Presuming the memories, the in-progress sensory input, and the cognition (including anticipation of future sensory input, even though it's wrong in one case) are the same, is there anything distinguishable at all?

There's presumably a minimum time slice to be called "experience" (a microsecond is just a frozen lump of fatty tissue; a minute is clearly human experience; somewhere in between, it starts to "count" as conscious experience).  But as long as that's met, I really don't see a difference.

It's not globally discrete, though, is it? Any individual neuron fires in a discrete way, but IIUC those firings aren't coordinated across the brain into ticks. That seems like a significant difference.

Hmm.  What makes it significant?  I mean, they're not globally synchronized, but that could just mean the universe's quantum 'tick' is small enough that there are offsets and variable tick requirements for each neuron.  This seems analogous to large model processing, where the activations and calculations happen over time, each with multiple processor cycles and different timeslices.

Replies from: eggsyntax, eggsyntax
comment by eggsyntax · 2024-03-23T01:23:24.732Z · LW(p) · GW(p)

PS --

[ I'm fascinated by intuitions around consciousness, identity, and timing. This is an exploration, not a disagreement. ]

Absolutely, I'm right there with you!

comment by eggsyntax · 2024-03-23T01:21:10.182Z · LW(p) · GW(p)

is there anything distinguishable at all?


Not that I see! I would expect it to be fully indistinguishable until incompatible sensory input eventually reaches the brain (if it doesn't wink out first). So far it seems to me like our intuitions around that are the same.

 

What makes it significant?

I think at least in terms of my own intuitions, it's that there's an unambiguous start and stop to each tick of the perceive-and-think-and-act cycle. I don't think that's true for human processing, although I'm certainly open to my mental model being wrong.

Going back to your original reply, you said 'I think it's really tricky to think that there are fundamental differences based on duration or speed of experience', and that's definitely not what I'm trying to point to. I think you're calling out some fuzziness in the distinction between started/stopped human cognition and started/stopped LLM cognition, and I recognize that's there. I do think that if you could perfectly freeze & restart human cognition, that would be more similar, so maybe it's a difference in practice more than a difference in principle.

But it does still seem to me that the fully discrete start-to-stop cycle (including the environment only changing in discrete ticks which are coordinated with that cycle) is part of what makes LLMs more Boltzmann-brainy to me. Paired with the lack of internal memory, it means that you could give an LLM one context for this forward pass, and a totally different context for the next forward pass, and that wouldn't be noticeable to the LLM, whereas it very much would be for humans (caveat: I'm unsure what happens to the residual stream between forward passes, whether it's reset for each pass or carried through to the next pass; if the latter, I think that might mean that switching context would be in some sense noticeable to the LLM [EDIT -- it's fully reset for each pass (in typical current architectures), other than KV caching, which shouldn't matter for behavior or (hypothetical) subjective experience]).
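Here's a minimal toy sketch of the statelessness I mean (pure Python; `forward` and `WEIGHTS` are made-up stand-ins, not any real API). A KV cache, where present, only memoizes attention keys/values for the context that's already visible, so it saves compute without changing outputs:

```python
# Toy illustration (not a real transformer): each forward pass is a pure
# function of (weights, context). No hidden state survives between calls,
# so swapping in an unrelated context leaves no trace "inside" the model.

WEIGHTS = {"favorite": "hello"}  # stands in for the frozen parameters


def forward(weights: dict, context: list) -> dict:
    """Map a context to an (unnormalized) next-token distribution."""
    bonus = context.count(weights["favorite"])
    return {"hello": 1.0 + bonus, "world": 1.0}


# Pass 1: one conversation.
dist_a = forward(WEIGHTS, ["hello", "there"])

# Pass 2: a completely different context. Nothing computed in pass 1 is
# visible here; the only persistent state is WEIGHTS, shared by both passes.
dist_b = forward(WEIGHTS, ["the", "rain", "in", "spain"])

print(dist_a, dist_b)
```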

 

This seems analogous to large model processing, where the activations and calculations happen over time, each with multiple processor cycles and different timeslices.

Can you explain that a bit? I think of current-LLM forward passes as necessarily having to happen sequentially (during normal autoregressive operation), since the current forward pass's output becomes part of the next forward pass's input. Am I oversimplifying?
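Concretely, the picture in my head is something like this toy loop (`next_token_dist` is just a dummy stand-in for the model, not any real API):

```python
# Toy autoregressive loop: the token chosen in pass t is appended to the
# context before pass t+1 runs, so generation passes are strictly sequential.

def next_token_dist(context: list) -> dict:
    """Stand-in for a forward pass: context in, next-token distribution out."""
    return {"end": 0.9, "the": 0.1} if context[-1] == "the" else {"the": 0.6, "end": 0.4}


context = ["once", "upon", "a", "time"]
for _ in range(4):
    dist = next_token_dist(context)   # forward pass t, a function of the whole context
    token = max(dist, key=dist.get)   # greedy decoding, for simplicity
    context.append(token)             # pass t+1's input now includes this output

print(" ".join(context))
```

The tokens of a fixed prompt can be processed in parallel, but once the model's own outputs start feeding back in, each pass has to wait for the previous one.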

comment by eggsyntax · 2024-03-10T19:10:40.772Z · LW(p) · GW(p)

Much is made of the fact that LLMs are 'just' doing next-token prediction. But there's an important sense in which that's all we're doing -- through a predictive processing lens, the core thing our brains are doing is predicting the next bit of input from current input + past input. In our case input is multimodal; for LLMs it's tokens. There's an important distinction in that LLMs are not (during training) able to affect the stream of input, and so they're myopic [LW · GW] in a way that we're not. But as far as the prediction piece goes, I'm not sure there's a strong difference in kind.
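As a toy illustration of the myopia point (pure Python; `predict` and the 0-1 'loss' are made-up stand-ins for the real model and for cross-entropy): during training, the next input always comes from the fixed corpus, never from the model's own output.

```python
# Toy sketch of teacher-forced next-token training: the target at each position
# comes from a fixed corpus, so the model's own guesses never influence what it
# sees next -- the sense in which training is myopic.

corpus = ["the", "cat", "sat", "on", "the", "mat"]


def predict(context: list) -> str:
    """Dummy stand-in for the model's guess at the next token."""
    return "the"


total_loss = 0
for t in range(len(corpus) - 1):
    prefix = corpus[: t + 1]        # the input is always the real corpus prefix
    guess = predict(prefix)         # the model's prediction for position t + 1
    target = corpus[t + 1]          # the target is fixed by the data, not the model
    total_loss += 0 if guess == target else 1  # 0-1 "loss", purely for illustration

print(total_loss)  # the loss depends on the guesses, but the input stream never does
```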

Would you disagree? If so, why?

comment by eggsyntax · 2024-04-24T18:39:44.704Z · LW(p) · GW(p)

Before AI gets too deeply integrated into the economy, it would be well to consider under what circumstances we would consider AI systems sentient and worthy of consideration as moral patients. That's hardly an original thought, but what I wonder is whether there would be any set of objective criteria that would be sufficient for society to consider AI systems sentient. If so, it might be a really good idea to work toward those being broadly recognized and agreed to, before economic incentives in the other direction are too strong. Then there could be future debate about whether/how to loosen those criteria. 

If such criteria are found, it would be ideal to have an independent organization whose mandate was to test emerging systems for meeting those criteria, and to speak out loudly if they were met.

Alternately, if it turns out that there is literally no set of criteria that society would broadly agree to, that would itself be important to know; it should in my opinion make us more resistant to building advanced systems even if alignment is solved, because we would be on track to enslave sentient AI systems if and when those emerged.

I'm not aware of any organization working on anything like this, but if it exists I'd love to know about it!

Replies from: ann-brown, ryan_greenblatt, eggsyntax
comment by Ann (ann-brown) · 2024-04-24T21:46:18.043Z · LW(p) · GW(p)

Intuition primer: Imagine, for a moment, that a particular AI system is as sentient and worthy of consideration as a moral patient as a horse. (A talking horse, of course.) Horses are surely sentient and worthy of consideration as moral patients. Horses are also not exactly all free citizens.

Additional consideration: Does the AI moral patient's interests actually line up with our intuitions? Will naively applying ethical solutions designed for human interests potentially make things worse from the AI's perspective?

Replies from: eggsyntax
comment by eggsyntax · 2024-04-24T23:34:28.915Z · LW(p) · GW(p)

Horses are surely sentient and worthy of consideration as moral patients. Horses are also not exactly all free citizens.

I think I'm not getting what intuition you're pointing at. Is it that we already ignore the interests of sentient beings?

 

Additional consideration: Does the AI moral patient's interests actually line up with our intuitions? Will naively applying ethical solutions designed for human interests potentially make things worse from the AI's perspective?

Certainly I would consider any fully sentient being to be the final authority on their own interests. I think that mostly escapes that problem (although I'm sure there are edge cases) -- if (by hypothesis) we consider a particular AI system to be fully sentient and a moral patient, then whether it asks to be shut down or asks to be left alone or asks for humans to only speak to it in Aramaic, I would consider its moral interests to be that.

Would you disagree? I'd be interested to hear cases where treating the system as the authority on its interests would be the wrong decision. Of course in the case of current systems, we've shaped them to only say certain things, and that presents problems. Is that the issue you're raising?

Replies from: ann-brown
comment by Ann (ann-brown) · 2024-04-24T23:57:17.759Z · LW(p) · GW(p)

Basically yes; I'd expect animal rights to increase somewhat if we developed perfect translators, but not jump all the way.

Edit: Also that it's questionable we'll catch an AI at precisely the 'degree' of sentience that perfectly matches the human distribution, especially considering the likely wide variation in number of parameters by application. Maybe they are as sentient and worthy of consideration as an ant; a bee; a mouse; a snake; a turtle; a duck; a horse; a raven. Maybe by the time we cotton on properly, they're somewhere past us at the top end.

And for the last part, yes, I'm thinking of current systems. LLMs specifically have a 'drive' to generate reasonable-sounding text; and they aren't necessarily coherent individuals or groups of individuals that will give consistent answers as to their interests even if they also happened to be sentient, intelligent, suffering, flourishing, and so forth. We can't "just ask" an LLM about its interests and expect the answer to soundly reflect its actual interests. A possible exception is constitutional AI systems, since they reinforce a single sense of self, but even Claude Opus currently will toss off "reasonable completions" of questions about its interests that it doesn't actually endorse in more reflective contexts. Negotiating with a panpsychic landscape that generates meaningful text in the same way we breathe air is ... not as simple as negotiating with a mind that fits our preconceptions of what a mind 'should' look like and how it should interact with and utilize language.

Replies from: eggsyntax
comment by eggsyntax · 2024-04-25T13:48:41.120Z · LW(p) · GW(p)

Maybe by the time we cotton on properly, they're somewhere past us at the top end.

 

Great point. I agree that there are lots of possible futures where that happens. I'm imagining a couple of possible cases where this would matter:

  1. Humanity decides to stop AI capabilities development or slow it way down, so we have sub-ASI systems for a long time (which could be at various levels of intelligence, from current to ~human). I'm not too optimistic about this happening, but there's certainly been a lot of increasing AI governance momentum in the last year.
  2. Alignment is sufficiently solved that even > AGI systems are under our control. On many alignment approaches, this wouldn't necessarily mean that those systems' preferences were taken into account.

 

We can't "just ask" an LLM about its interests and expect the answer to soundly reflect its actual interests.

I agree entirely. I'm imagining (though I could sure be wrong!) that any future systems which were sentient would be ones that had something more like a coherent, persistent identity, and were trying to achieve goals.

 

LLMs specifically have a 'drive' to generate reasonable-sounding text

(not very important to the discussion, feel free to ignore, but) I would quibble with this. In my view LLMs aren't well-modeled as having goals or drives. Instead, generating distributions over tokens is just something they do in a fairly straightforward way because of how they've been shaped (in fact the only thing they do or can do), and producing reasonable text is an artifact of how we choose to use them (ie picking a likely output, adding it onto the context, and running it again). Simulacra like the assistant character can be reasonably viewed (to a limited degree) as being goal-ish, but I think the network itself can't.
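To make the quibble a bit more concrete, here's a toy sketch (the `network`/`harness` split and all of the names are mine, purely illustrative): everything that looks like 'producing text' happens in the outer loop we choose to run, not in the network itself.

```python
# Toy sketch: the network only maps a context to a distribution over next
# tokens; the sample-append-rerun loop that produces "reasonable text" lives
# entirely in the harness wrapped around it.

import random


def network(context: list) -> dict:
    """The only thing the (toy) network does: context -> next-token distribution."""
    return {"yes": 0.5, "no": 0.3, "maybe": 0.2}


def harness(prompt: list, steps: int) -> list:
    """How we choose to use it: pick a token, add it to the context, run again."""
    context = list(prompt)
    for _ in range(steps):
        dist = network(context)
        tokens, weights = zip(*dist.items())
        context.append(random.choices(tokens, weights=weights)[0])
    return context


print(harness(["should", "we", "go", "?"], steps=3))
```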

That may be overly pedantic, and I don't feel like I'm articulating it very well, but the distinction seems useful to me since some other types of AI are well-modeled as having goals or drives.

Replies from: ann-brown
comment by Ann (ann-brown) · 2024-04-25T14:50:10.308Z · LW(p) · GW(p)

For the first point, there's also the question of whether 'slightly superhuman' intelligences would actually fit any of our intuitions about ASI or not. There's a bit of an assumption that we'll jump headfirst into recursive self-improvement at some point; but if that has diminishing returns, we happen to hit a plateau a bit over human level, and it still has notable costs to train, host, and run, then the impact could still be limited to something not much different from giving a random set of especially intelligent expert humans the specific powers of the AI system. Additionally, if we happen to set regulations on computation somewhere that allows training of slightly superhuman AIs and not past it ...

Those are definitely systems that are easier to negotiate with, or even consider as agents in a negotiation. There's also a desire specifically not to build them, which might lead to systems with an architecture that isn't like that, but still implementing sentience in some manner. And the potential complication of multiple parts and specific applications a tool-oriented system is likely to be in - it'd be very odd if we decided the language processing center of our own brain was independently sentient/sapient separate from the rest of it, and we should resent its exploitation.

I do think the drive (or 'just a thing it does') that we're pointing at is distinct from goals as they're traditionally imagined, and indeed I was picturing something more instinctual and automatic than deliberate. In a general sense, though, there is an objective that's being optimized for (predicting the data, whatever that is, generally without losing too much predictive power on other data the trainer doesn't want to lose prediction on).

Replies from: eggsyntax
comment by eggsyntax · 2024-04-25T20:06:32.220Z · LW(p) · GW(p)

And the potential complication of multiple parts and specific applications a tool-oriented system is likely to be in - it'd be very odd if we decided the language processing center of our own brain was independently sentient/sapient separate from the rest of it, and we should resent its exploitation.

 

Yeah. I think a sentient being built on a purely more capable GPT with no other changes would absolutely have to include scaffolding for eg long-term memory, and then as you say it's difficult to draw boundaries of identity. Although my guess is that over time, more of that scaffolding will be brought into the main system; eg just allowing weight updates at inference time would on its own (potentially) give these systems long-term memory and something much more similar to a persistent identity than current systems have.
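As a very rough sketch of the kind of thing I mean (toy Python, not any real training setup; a real version would be a gradient step on the conversation transcript rather than this crude counter):

```python
# Toy sketch of "weight updates at inference time as long-term memory":
# once the weights themselves change after each conversation, later passes
# are shaped by earlier chats, unlike frozen-weight inference.

from collections import Counter

weights = Counter()  # stands in for the model's parameters


def respond(weights: Counter, prompt: list) -> str:
    """Toy 'model': prefers whichever candidate token its weights favor most."""
    candidates = prompt + ["hello"]
    return max(candidates, key=lambda tok: weights[tok])


def update(weights: Counter, transcript: list) -> None:
    """Toy inference-time weight update on the finished conversation."""
    weights.update(transcript)


prompt = ["greet", "ada"]
print(respond(weights, prompt))                        # before any update: "greet"
update(weights, ["hello", "my", "name", "is", "ada"])  # weights change after a chat
print(respond(weights, prompt))                        # same prompt now yields "ada"
```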

 

In a general sense, though, there is an objective that's being optimized for

 

My quibble is that the trainers are optimizing for an objective at training time, but the model isn't optimizing for anything, at training or inference time. I feel we're very lucky that this is the path that has worked best so far, because a comparably intelligent model that was optimizing for goals at runtime would be much more likely to be dangerous.

Replies from: eggsyntax
comment by eggsyntax · 2024-04-25T20:11:04.855Z · LW(p) · GW(p)

the model isn't optimizing for anything, at training or inference time.

One maybe-useful way to point at that is: the model won't try to steer toward outcomes that would let it be more successful at predicting text.

comment by ryan_greenblatt · 2024-04-30T09:17:33.556Z · LW(p) · GW(p)

Rob Long works on these topics.

Replies from: eggsyntax
comment by eggsyntax · 2024-04-30T15:14:24.619Z · LW(p) · GW(p)

Oh great, thanks!

comment by eggsyntax · 2024-04-29T15:50:45.909Z · LW(p) · GW(p)

Update: I brought this up in a twitter thread, one involving a lot of people with widely varied beliefs and epistemic norms.

A few interesting thoughts that came from that thread:

  • Some people: 'Claude says it's conscious!'. Shoalstone: 'in other contexts, claude explicitly denies sentience, sapience, and life.' Me: "Yeah, this seems important to me. Maybe part of any reasonable test would be 'Has beliefs and goals which it consistently affirms'".
  • Comparing to a tape recorder: 'But then the criterion is something like 'has context in understanding its environment and can choose reactions' rather than 'emits the words, "I'm sentient."''
  • 'Selfhood' is an interesting word that maybe could avoid some of the ambiguity around historical terms like 'conscious' and 'sentient', if well-defined.

comment by eggsyntax · 2024-01-13T22:34:07.656Z · LW(p) · GW(p)

Something I'm grappling with:

From a recent interview between Bill Gates & Sam Altman:

Gates: "We know the numbers [in a NN], we can watch it multiply, but the idea of where is Shakespearean encoded? Do you think we’ll gain an understanding of the representation?"

Altman: "A hundred percent…There has been some very good work on interpretability, and I think there will be more over time…The little bits we do understand have, as you’d expect, been very helpful in improving these things. We’re all motivated to really understand them…"

To the extent that a particular line of research can be described as "understand better what's going on inside NNs", is there a general theory of change for that? Understanding them better is clearly good for safety, of course! But in the general case, does it contribute more to safety than to capabilities?

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2024-01-13T22:37:22.752Z · LW(p) · GW(p)

People have repeatedly made the argument on this forum that it contributes more to capabilities, and so far it hasn't seemed to convince that many interpretability researchers. I personally suspect this is largely because they're motivated by capabilities curiosity and don't want to admit it, whether that's in public or even to themselves.

Replies from: eggsyntax
comment by eggsyntax · 2024-01-13T22:47:00.275Z · LW(p) · GW(p)

Thanks -- any good examples spring to mind off the top of your head?

I'm not sure my desire to do interpretability comes from capabilities curiosity, but it certainly comes in part from interpretability curiosity; I'd really like to know what the hell is going on in there...