Posts
A comparison of causal scrubbing, causal abstractions, and related methods
2023-06-08T23:40:34.475Z
Comments
Comment by Egor Zverev on Robustness of Contrast-Consistent Search to Adversarial Prompting · 2023-11-03T13:22:12.345Z
Thanks for the post! I think an interesting direction for future work would be to replace the manual engineering of suffixes with a gradient-based / greedy search, such as the one in https://arxiv.org/abs/2307.15043.