Diffusion Guided NLP: better steering, mostly a good thing

post by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-10T19:49:50.963Z · LW · GW · 0 comments

This is a link post for https://arxiv.org/html/2408.04220v1

I think this is a very promising method for improving the steering of LLMs, which is great for reducing risk from model-originating harms like deception.

The flipside is that it increases misuse potential.

This is yet another way the safety gap could widen between closed-weight models with locked-down controls and open-weight models.
