Comment by
Jatin Nainani (jatin-nainani) on
SAEs are highly dataset dependent: a case study on the refusal direction ·
2024-11-10T21:16:18.033Z ·
LW ·
GW
Makes sense, thanks! In that case, we could potentially reduce the SAE width, which (along with a smaller dataset) might help scale SAEs to understanding mechanisms in big models?
Comment by
Jatin Nainani (jatin-nainani) on
SAEs are highly dataset dependent: a case study on the refusal direction ·
2024-11-07T23:38:32.449Z ·
LW ·
GW
Great work! Is there such a thing as too narrow a dataset? For refusal, what do you think happens if we train specifically on a bunch of examples that show signs of refusal?