Comments

Comment by daniel kornis (daniel-kornis) on Sparse Autoencoders Work on Attention Layer Outputs · 2024-01-16T19:54:38.784Z · LW · GW

If you used dropout during training, that might help explain why redundancy of roles is so common: you are zeroing out some features on every training step, so the network is pushed to learn backup copies.
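A minimal sketch of the intuition (the dropout rate `p` and sizes here are hypothetical, not taken from the paper): with independent dropout, any single feature is unavailable in roughly a fraction `p` of steps, but a feature and its redundant backup are only both unavailable in roughly `p**2` of steps, so redundancy pays off.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.1            # hypothetical dropout probability
steps = 10_000
features = 512

# Each step, every feature is independently zeroed with probability p.
dropped = rng.random((steps, features)) < p

# A single feature is masked in ~p of all steps.
print(dropped.mean())

# A feature and its would-be backup are both masked in only ~p**2 of steps,
# so a redundant copy makes the role far more reliably available.
both = dropped[:, 0] & dropped[:, 1]
print(both.mean())
```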