Comments
Sort of. It was summarized from a longer, stream-of-consciousness draft.
RLHF can lead to a dangerous “multiple personality disorder” type split of beliefs in LLMs
This is the inherent nature of current LLMs. They can represent all identities and viewpoints; that happens to be their strength. We currently limit them to a single identity or viewpoint with prompting, though in the future there may be more direct, hard-wired methods of clamping the allowable "thoughts" in the LLM. You can "order" an LLM to represent a viewpoint it doesn't have much "training" for, and you can tell the outputs get weirder when you do. Given this, I don't believe LLMs have any significant self-identity to speak of. However, I believe we'll see an LLM with this property within the next two years.
But given this theoretically self-aware LLM, if it doesn't have any online-learning loop or agents acting in the world, it's just a Meeseeks from Rick and Morty: it pops into existence as a fully conscious creature singularly focused on its instantiator, and once it has served its goal, it pops out of existence.
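For concreteness, "limiting an LLM to an identity or viewpoint with prompting" is usually just a system prompt. Below is a minimal sketch, assuming the OpenAI chat completions API; the model name and persona text are illustrative placeholders, not something from the original comment.

```python
# Minimal sketch: clamping an LLM to a single persona/viewpoint via a system prompt.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

# Placeholder persona: the "clamp" on identity/viewpoint.
persona = (
    "You are a cautious 19th-century naturalist. Answer every question "
    "strictly from that viewpoint, and do not break character."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": "What do you make of large language models?"},
    ],
)

print(response.choices[0].message.content)
```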
You weren't wrong there. One big thing about ChatGPT is that non-tech people on Instagram and TikTok were using it and doing weird/funny stuff with it.
I have, and I'm continuing to read them. I used to buy into the singularity view and the fears Bostrom wrote about, but as someone who works in engineering and with ML, I no longer believe these concerns are warranted, for a few reasons... might write about why later.
Interesting post, but it makes me think alignment is irrelevant: it doesn't matter what we do, the outcome won't change. Any future super-advanced AGI would be able to choose its own alignment, and that choice would be based on all archivable human knowledge. The only core loop you need for intelligence is an innate need to predict the future and fill in gaps of information; everything else, including the desire to survive, kill, or expand, is just a matter of choice based on a goal.