Comment by
Eric Wallace (eric-wallace) on
SolidGoldMagikarp (plus, prompt generation) ·
2023-02-06T16:25:52.570Z
You may also want to check out Universal Adversarial Triggers https://arxiv.org/abs/1908.07125, an academic paper from 2019 that does the same thing as the post above: the authors craft the optimal worst-case prompt to feed into a model, then use that prompt to analyze GPT-2 and other models.
Comment by
Eric Wallace (eric-wallace) on
Adam Scherlis's Shortform ·
2023-02-04T00:59:00.412Z
This is cool! You may also be interested in Universal Triggers https://arxiv.org/abs/1908.07125. These are also short nonsense phrases that wreak havoc on a model.