Posts

A comparison of causal scrubbing, causal abstractions, and related methods · 2023-06-08T23:40:34.475Z

Comments

Comment by Egor Zverev on Robustness of Contrast-Consistent Search to Adversarial Prompting · 2023-11-03T13:22:12.345Z

Thanks for the post! An interesting direction for future work could be to replace the manual engineering of suffixes with a gradient-based or greedy search, such as the one in https://arxiv.org/abs/2307.15043.