Posts

Extracting SAE task features for in-context learning 2024-08-12T20:34:13.747Z
Self-explaining SAE features 2024-08-05T22:20:36.041Z

Comments

Comment by Dmitrii Kharlapenko (dmitrii-kharlapenko) on Self-explaining SAE features · 2024-08-06T13:56:43.849Z

By "input features", do you mean the SAE encoder weights? We did not look into them.

Comment by Dmitrii Kharlapenko (dmitrii-kharlapenko) on Self-explaining SAE features · 2024-08-06T13:55:07.530Z

Thanks! We did try to use it in the repeat setting to make the model produce more than a single token, but it did not work well.

And as far as I remember, it also did not improve the meaning prompt much.