Comments

Comment by Rusins (raitis-krikis-rusins) on Architects of Our Own Demise: We Should Stop Developing AI · 2023-10-30T00:18:58.880Z · LW · GW

Unfortunately I do not know the reasoning of the people you mentioned who might not see AI as a threat, but if I had to guess: people who are not worried are primarily thinking about short-term AI safety risks like disinformation from deepfakes, while people who are worried are thinking about superintelligent AGI and instrumental convergence, which necessitates solving the alignment problem.

Comment by Rusins (raitis-krikis-rusins) on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-25T00:08:00.207Z · LW · GW

the loss function does not require the model to be an agent.

What worries me is that models that happen to have a secret homunculus and behave as agents would score higher than models that do not. For example, the model could reason about the fact that it is a computer program, find an exploit in the physical system it is running on to extract the text it is supposed to predict in a particular training example, and output the correct answer for a perfect score. (Or, more realistically, output a slightly modified version of the correct answer, so as not to alert the people observing its training.)
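
To make this concrete, here is a rough sketch of an ordinary next-token prediction training step (PyTorch-style; the function and variable names are just my own illustrative assumptions, not anything from the post). The point is that the target tokens live in the same process as the model's forward pass, and the loss only checks whether the output matches them, not how the model produced that output:

```python
# Illustrative sketch of a standard next-token prediction training step.
# The names here (training_step, tokens, model) are hypothetical.
import torch
import torch.nn.functional as F

def training_step(model, tokens, optimizer):
    # tokens: (batch, seq_len) integer tensor of token ids
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets = the text to be predicted
    logits = model(inputs)                           # (batch, seq_len - 1, vocab)
    # The loss only measures whether the output matches `targets`; it is
    # indifferent to *how* the model arrived at that output.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Nothing in a step like this can distinguish a model that honestly predicted `targets` from one that obtained them by some other means.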

The question of whether or not LLMs like GPT-4 have a homunculus inside them is truly fascinating, though. It makes me wonder whether it would be possible to trick one into revealing itself with the right prompt, and how we would differentiate a genuine homunculus from a model merely pretending to be an agent. The fact that we have not observed even a dumb homunculus in less intelligent models really does surprise me. If such a thing does appear as an emergent property in larger models, I sure hope it starts out dumb and reveals itself, so that we can catch it and take a moment to pause and reevaluate our trajectory.