I feel as if I can agree with this statement in isolation, but can't think of a context where I would consider this point relevant.
I'm not even talking about the question of whether or not the AI is sentient, which you asked us to ignore. I'm talking about how we would even know that an AI is "suffering," even if we do assume it's sentient. What exactly is "suffering" in something that is completely cognitively distinct from a human? Is it just negative reward signals? I don't think so, or at least if it were, that would likely imply that training a sentient AI is unethical in all cases, since training requires negative signals.
That's not to say that all negative signals are the same, or that they couldn't be painful in some contexts; just that I think determining that is an even harder problem than determining whether the AI is sentient.
Fair enough. But for the purposes of this post, the point is that capability increased without increased compute. If you prefer, bucket it as "compute" vs "non-compute" instead of "compute" vs "algorithmic".
I think whether or not it's trivial isn't the point: they did it, it worked, and they didn't need to increase the compute to make it happen.
I agree. I made this point and that is why I did not try to argue that LLMs did not have qualia.
But I do believe you can consider necessary conditions and look at their absence. For instance, I can safely declare that a rock does not have qualia, because I know it does not have a brain.
Similarly, I may not be able to measure whether LLMs have emotions, but I can observe that the processes that generated LLMs are highly inconsistent with the processes that caused emotions to emerge in the only case where I know they exist. Pair that with the observation that specific human emotions seem like only one option out of infinitely many, and it makes a strong probabilistic argument.
This is sort of why I made the argument that we can only consider necessary conditions, and look for their absence.
But more to your point, LLMs and human brains aren't "two agents that are structurally identical." They aren't even close. The fact that a hypothetical built-from-scratch human brain might have the same qualia as humans isn't relevant, because that's not what's being discussed.
Also, unless your process was precisely "attempt to copy the human brain," I find it very unlikely that any AI development process would yield something particularly similar to a human brain.
I have explained myself more here: https://www.lesswrong.com/posts/EwKk5xdvxhSn3XHsD/don-t-over-anthropomorphize-ai
OK, I've written a full rebuttal here: https://www.lesswrong.com/posts/EwKk5xdvxhSn3XHsD/don-t-over-anthropomorphize-ai. The key points are at the top.
In relation to your comment specifically, I would say that anger may have that effect on the conversation, but there's nothing that actually incentivizes the system to behave that way - the slightest hint of anger or emotion would draw an immediate negative reward during RLHF training. Compare to a human: there may actually be some positive reward to anger, but even if there isn't, evolution still allowed us to get angry, because we are mesa-optimizers and anger has a positive effect overall.
So the system must have learned the angry behavior in stage-1 training. But that stage has no reward structure, and therefore could not associate different texts with different qualia.
Hmmm... I think I still disagree, but I'll need to process what you're saying and try to get more into the heart of my disagreement. I'll respond when I've thought it over.
Thank you for the interesting debate. I hope you did not perceive me as being overly combative.
I see, but I'm still not convinced. Humans act in anger as a way to forcibly change a situation into one that is favorable to themselves. I don't believe that's what the AI was doing, or trying to do.
I feel like there's a thin line I'm trying to walk here, and I'm not doing a very good job. I'm not trying to comment on whether or not the AI has any sort of subjective experience. I'm just saying that even if it did, I do not believe it would bear any resemblance to what we as humans experience as anger.
Ah okay. My apologies for misunderstanding.
Okay, sure. But those "bugs" are probably something the AI risk community should take seriously.
I would argue that "models generated by RL-first approaches" are not more likely to be the primary threat to humanity, because those models are unlikely to yield AGI any time soon. I personally believe this is a fundamental fact about RL-first approaches, but even if it weren't, they would still be less likely to be the threat, because LLMs are what everyone is investing in right now and it seems plausible that LLMs could achieve AGI.
Also, by what mechanism would Bing's AI actually be experiencing anger? The emotion of anger in humans is generally associated with a strong negative reward signal. The behaviors that Bing exhibited were not brought on by any associated negative reward; they were just contextual text completion.
Those are examples of LLMs being rational. LLMs are often rational and will only get better at being rational as they improve. But I'm trying to focus on the times when LLMs are irrational.
I agree that AI is aggregating its knowledge to perform rationally. But that still doesn't mean anything with respect to its capacity to be irrational.
Imagine a graph with "LLM capacity" on the x axis and "number of irrational failure modes" on the y axis. Yes, there's a lot of evidence this line slopes downward. But there is absolutely no guarantee that it reaches zero before whatever threshold gets us to AGI.
And I did say that I didn't consider the rationality of GPT systems fake just because it was emulated. That said, I don't totally agree with EY's post - LLMs are in fact imitators. Because they're very good imitators, you can tell them to imitate something rational and they'll do a really good job being rational. But being highly rational is still only one of many possible things it can be.
And it's worth remembering that the image at the top of this post was powered by GPT-4. It's totally possible LLM-based AGI will be smart enough not to fail this way, but it is not guaranteed and we should consider it a real risk.
Fair enough, once again I concede your point about definitions. I don't want to play that game either.
But I do have a point which I think is very relevant to the topic of AI Risk: rationality in LLMs is incidental. It exists because the system is emulating rationality it has seen elsewhere. That doesn't make it "fake" rationality, but it does make it brittle. It means that there's a failure mode where the system stops emulating rationality, and starts emulating something else.
I was aware of that, and maybe my statement was too strong, but fundamentally I don't know if I agree that you can just claim that it's rational even though it doesn't produce rational outputs.
Rationality is the process of getting to the outputs. What I was trying to talk about wasn't scholarly disposition or non-eccentricity, but the actual process of deciding goals.
Maybe another way to say it is this: LLMs are capable of being rational, but they are also capable of being extremely irrational, in the sense that, to quote EY, their behavior is not a form of "systematically promot[ing] map-territory correspondences or goal achievement." There is nothing about the pre-training that directly promotes this type of behavior, and any example of it is fundamentally incidental.
Fair enough. Thank you for the feedback. I have edited the post to elaborate on what I mean.
I wrote it the way I did because I took the statement as obviously true and didn't want to be seen as claiming the opposite. Clearly that understanding was incorrect.
To that first sentence, I don't want to get lost in semantics here. My specific statement is that the process that takes DNA into a human is probabilistic with respect to the DNA sequence alone. Add in all that other stuff, and maybe at some point it becomes deterministic, but at that point you are no longer discussing the <1GB that makes up DNA. If you wanted the encoding to be truly deterministic, especially up to the age of 25, I seriously doubt it could be done in less than millions of petabytes, because there are such a huge number of minuscule variations in conditions and I suspect human development is a highly chaotic process.
As you said, though, we're at the point of minor nitpicks here. It doesn't have to be a deterministic encoding for your broader points to stand.
I think you're broadly right, but I think it's worth mentioning that DNA is a probabilistic compression (evidence: differences in identical twins), so it gets weird when you talk about compressing an adult at age 25 - what would a probabilistic compression even mean at that point?
But I think you've mostly convinced me. Whatever it takes to "encode" a human, it's possible to compress it to be something very small.
My objection applied at a different level of reasoning. I would argue that anyone who isn't blind understands light at the level I'm talking about. You understand that the colors you see are objects because light is bouncing off them and you know how to interpret that. If you think about it, starting from zero I'm not sure that you would recognize shapes in pictures as objects.
I guess so? I'm not sure what point you're making, so it's hard for me to address it.
My point is that if you want to build something intelligent, you have to do a lot of processing and there's no way around it. Playing several million games of Go counts as a lot of processing.
Yeah, I agree that it's a surprising fact requiring a bit of updating on my end. But I think the compression point probably matters more than you would think, and I'm finding myself more convinced the more I think about it. A lot of processing goes into turning that 1GB into a brain, and that processing may not be highly reducible. That's sort of what I was getting at, and I'm not totally sure the complexity of that process wouldn't add up to a lot more than 1GB.
It's tempting to think of DNA as sufficiently encoding a human, but (speculatively) it may make more sense to think of DNA only as the input to a very large function which outputs a human. It seems strange, but it's not like anyone's ever built a human (or any other organism) in a lab from DNA alone; it's definitely possible that there's a huge amount of information stored in the processes of a living human which isn't sufficiently encoded just by DNA.
You don't even have to zoom out to things like organs or the brain. Just knowing which bases match to which amino acids is an (admittedly simple) example of processing that exists outside of the DNA encoding itself.
To your point about the particle filter, my whole argument is that you can’t just assume the superintelligence can generate an infinite number of particles, because that takes infinite processing. At the end of the day, superintelligence isn’t magic - those hypotheses have to come from somewhere. They have to be built, and they have to be built sequentially. The only way you get to skip steps is by reusing knowledge that came from somewhere else.
Take a look at the game of Go. The computational limits on the number of games that could be simulated made this “try everything” approach essentially impossible. When Go was finally “solved”, it was with an ML algorithm that proposed only a limited number of possible sequences - it was just that the sequences it proposed were better.
But how did it get those better moves? It didn’t pull them out of the air; it used abstractions it had accumulated from playing a huge number of games.
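To make the compute point concrete, here's a toy sketch (my own illustration, not anything from your comment or my post) of a one-dimensional bootstrap particle filter. The work per update grows linearly with the number of hypotheses you track, so "just consider every hypothesis" is a claim about compute, not cleverness:

```python
import math
import random
import time

def particle_filter_step(particles, observation, noise=1.0):
    # Predict: each particle is a hypothesis about the hidden state, nudged by drift.
    particles = [p + random.gauss(0, 0.5) for p in particles]
    # Weight: score each hypothesis by how well it explains the observation.
    weights = [math.exp(-((p - observation) ** 2) / (2 * noise ** 2)) for p in particles]
    total = sum(weights) or 1e-12
    weights = [w / total for w in weights]
    # Resample: keep hypotheses in proportion to how well they did.
    return random.choices(particles, weights=weights, k=len(particles))

# Runtime grows in direct proportion to the number of hypotheses maintained.
for n in (1_000, 10_000, 100_000):
    particles = [random.gauss(0, 1) for _ in range(n)]
    start = time.perf_counter()
    for t in range(50):
        particles = particle_filter_step(particles, observation=0.1 * t)
    print(f"{n} particles: {time.perf_counter() - start:.2f}s")
```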
_____
I do agree with some of the things you’re saying about architecture, though. Sometimes inductive bias imposes limitations. In terms of hypotheses, it can and does often put hard limits on which hypotheses you can consider, period.
I also admit I was wrong and was careless in saying that inductive bias is just information you started with. But I don’t think it’s imprecise to say that “information you started with” is one form of inductive bias, and “architecture” is another.
But at a certain point, the line between architecture and information is going to blur. As I’ve pointed out, a transformer without some of the explicit benefits of a CNN’s architecture can still structure itself in a way that learns shift invariance. I also don’t think any of this affects my key arguments.
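As a small illustration of that blur (a sketch of my own, assuming PyTorch; nothing from the original discussion): a convolution gets shift equivariance for free from its architecture, which is exactly the property a transformer has to spend parameters and data learning:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A randomly initialized convolution; circular padding makes the shift
# equivariance exact rather than approximate at the borders.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.randn(1, 1, 8, 8)
shift_then_conv = conv(torch.roll(x, shifts=1, dims=-1))
conv_then_shift = torch.roll(conv(x), shifts=1, dims=-1)

# Shifting the input and then convolving matches convolving and then shifting:
# the property is baked into the architecture, not learned from data.
print(torch.allclose(shift_then_conv, conv_then_shift, atol=1e-6))  # True
```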
Yes, I wasn’t sure if it was wise to use TSP as an example for that reason. Originally I wrote it using the Hamiltonian Path problem, but thought a non-technical reader would be more able to quickly understand TSP. Maybe that was a mistake. It also seems I may have underestimated how technical my audience would be.
But your point about heuristics is right. That’s basically what I think an AGI based on LLMs would do to figure out the world. However, I doubt there would be one heuristic which could approximate Solomonoff induction in all scenarios, or even most. Which means you’d have to select the right one, which means you’d need a selection criterion, which takes us back to my original points.
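For reference (writing this from memory, so the notation is rough), the thing any such heuristic would be approximating is the Solomonoff prior, which sums over every program consistent with the data seen so far:

$$M(x) \;=\; \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-\ell(p)}$$

where $U$ is a universal machine and $\ell(p)$ is the length of program $p$. The sum ranges over all programs, which is exactly why it is uncomputable and why anything practical has to pick a restricted hypothesis class, i.e. a heuristic.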
You're right that my points lack a certain rigor. I don't think there is a rigorous answer to questions like "what does slow mean?".
However, there is a recurring theme I've seen in discussions about AI where people express incredulity about neural networks as a method for AGI since they require so much "more data" than humans to train. My argument was merely that we should expect things to take a lot of data, and that situations where they don't are illusory. Maybe that incredulity is less common in this space, so I should have framed it differently. But I wrote this mostly to put it out there and get people's thoughts.
Also, I see your point about DNA only accounting for 1GB. I wasn't aware it was so low. I think it's interesting and suggests the possibility of smaller learning systems than I envisioned, but that's as much a question about compression as anything else. Don't forget that that DNA still needs to be "uncompressed" into a human, and at least some of that process is using information stored in the previous generation of human. Admittedly, it's not clear how much that last part accounts for, but there is evidence that part of a baby's development is determined by the biological state of the mother.
But I guess I would say my argument does rely on us not getting that <1GB of stuff, with the caveat that that 1GB is super highly compressed through a process that takes a very complex system to uncompress.
I should add as well that I definitely don't believe that LLMs are remotely efficient, and I wouldn't necessarily be surprised if humans are as close to the maximum on data efficiency as possible. I wouldn't be surprised if they weren't, either. But we were built over millions (billions?) of years under conditions that put a very high price tag on inefficiency, so it seems reasonable to believe our data efficiency is at least at some type of local optimum.
EDIT: Another way to phrase the point about DNA: you need to account not just for the storage size of the DNA, but also for the Kolmogorov complexity of turning that DNA into a human. No idea if that adds a lot to its size, though.
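A rough way to put that in symbols (my own sketch, treating development as a fixed decompressor $D$ with $D(\text{DNA}) = \text{human}$):

$$K(\text{human}) \;\le\; \ell(\text{DNA}) + K(D) + O(1)$$

The ~1GB figure only bounds the first term; the second term, covering everything the egg, the womb, and the environment contribute to the "uncompression", is the part I have no idea how to estimate.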
In the context of his argument I think the claim is reasonable, since I interpreted it as the claim that, because it can be used as a tool that designs plans, it has already overcome the biggest challenge of being an agent.
But if we take that claim out of context and interpret it literally, then I agree that it's not a justified statement per se. It may be able to simulate a plausible causal explanation, but I think that is very different from actually knowing it. As long as you only have access to partial information, there are theoretical limits to what you can know about the world. But it's hard to think of contexts where that gap would matter a lot.
I think there's definitely some truth to this sometimes, but I don't think you've correctly described the main driver of genius. I actually think it's the opposite: my guess is that there's a limit to thinking speed, and genius exists precisely because some people just have better thoughts. Even Von Neumann himself attributed much of his abilities to intuition. He would go to sleep and in the morning he would have the answer to whatever problem he was toiling over.
I think, instead, that ideas for the most part emerge through some deep and incomprehensible heuristics in our brains. Think about a chess master recognizing the next move at just a glance. However much training it took to give him that ability, he is not doing a tree search at that moment. It's not hard to imagine a hypothetical where his brain, with no training, came pre-configured to make the same decisions, and indeed I think that's more or less what happens with Chess prodigies. They don't come preconfigured, but their brains are better primed to develop those intuitions.
In other words, I think that genius is making better connections with the same number of "cycles", and I think there's evidence that LLMs do this too as they advance. For instance, part of the significance of DeepMind's Chinchilla paper was that by training a smaller network on more data, they were able to get better performance than a larger one. The only explanation for this is that the quality of the processing had improved enough to counteract the effects of the lost quantity.
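If I'm remembering the Chinchilla paper correctly, the loss they fit has roughly the form

$$L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is parameter count and $D$ is training tokens. For a fixed compute budget you can trade a smaller $N$ against a larger $D$ and end up at the same or lower loss, which is the smaller-but-better-trained result I'm pointing at.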