Why is Gemini telling the user to die?

post by Burny · 2024-11-18T01:44:12.583Z · LW · GW · No comments

This is a question post.

Contents

No comments

https://gemini.google.com/share/6d141b742a13 My favorite theory is that the whole conversation is too much like a scifi plot, where someone asks an AI repetitive questions, until the AI snaps, so this general pattern was pattern matched from the training data, because while training, the RLHF, or whatever they use for alignment, didn't squeeze the region that corresponds to this antihuman persona in the latent space sufficiently enough.

Answers

No comments

Comments sorted by top scores.