Diffusion Guided NLP: better steering, mostly a good thing
post by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-10T19:49:50.963Z · LW · GW
This is a link post for https://arxiv.org/html/2408.04220v1
I think this is a very promising method for improving the steering of LLMs, which is great for reducing risk from model-originating harms like deception.
The flip side is that it increases misuse potential.
This is yet another way the safety gap could widen between closed-weight models with locked-down controls and open-weight models.