Posts
Evolutionary prompt optimization for SAE feature visualization
2024-11-14T13:06:49.728Z
SAE features for refusal and sycophancy steering vectors
2024-10-12T14:54:48.022Z
Extracting SAE task features for in-context learning
2024-08-12T20:34:13.747Z
Self-explaining SAE features
2024-08-05T22:20:36.041Z
Comments
Comment by Dmitrii Kharlapenko (dmitrii-kharlapenko) on Self-explaining SAE features · 2024-08-06T13:56:43.849Z
By "input features", do you mean the SAE encoder weights? We did not look into them.
Comment by Dmitrii Kharlapenko (dmitrii-kharlapenko) on Self-explaining SAE features · 2024-08-06T13:55:07.530Z
Thanks! We did try to use it in the repeat setting to make the model produce more than a single token, but it did not work well.
As far as I remember, it also did not improve the meaning prompt much.