Comment by daniel kornis (daniel-kornis) on Sparse Autoencoders Work on Attention Layer Outputs · 2024-01-16T19:54:38.784Z
If you used dropout during training, that might help explain why redundancy of roles is so common: you are zeroing out some features on every training step, so the network is pushed to encode the same role in several places.
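A minimal sketch of the mechanism being suggested (not code from the post; the dropout rate and array sizes are arbitrary): with inverted dropout, each step a random subset of features is masked to zero, so no downstream weight can depend on any single feature surviving.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5):
    # Each training step a random subset of features is zeroed,
    # so no single feature can be relied on; surviving features
    # are rescaled by 1/(1-p) to keep the expected value unchanged.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1 - p)

acts = np.ones(8)
out = dropout(acts, p=0.5)
# Entries are either 0.0 (dropped) or 2.0 (kept and rescaled).
print(out)
```

Because any feature can vanish on a given step, gradient descent tends to spread a useful role across multiple features, which is one plausible source of the redundancy observed.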