Comment by Erik Garrison (erik-garrison) on Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream · 2024-09-08T04:50:12.126Z · LW · GW
Could this affect distributed training schemes that assume rotational invariance?
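
To make the question concrete, here is a minimal sketch of what rotational invariance means for an optimizer step. It is an illustration only, not code from the post: the function names `sgd_step` and `adam_first_step`, the 2D gradient, and the 30-degree rotation are all assumptions chosen for the example. It checks whether rotating the gradient and then taking a step gives the same result as taking the step and then rotating it.

```python
import numpy as np

def sgd_step(grad, lr=0.1):
    # Plain SGD update: linear in the gradient.
    return -lr * grad

def adam_first_step(grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # First Adam update starting from zero moment estimates (hypothetical helper).
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad**2
    m_hat = m / (1 - beta1)   # bias correction at t = 1
    v_hat = v / (1 - beta2)
    # Per-coordinate normalization: this is the basis-dependent part.
    return -lr * m_hat / (np.sqrt(v_hat) + eps)

# An arbitrary rotation and gradient, chosen only for illustration.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
g = np.array([1.0, 0.2])

# Equivariance check: step(R g) == R step(g)?
sgd_equivariant = np.allclose(sgd_step(R @ g), R @ sgd_step(g))
adam_equivariant = np.allclose(adam_first_step(R @ g), R @ adam_first_step(g))

print("SGD commutes with rotation: ", sgd_equivariant)
print("Adam commutes with rotation:", adam_equivariant)
```

In this toy setup the SGD step commutes with the rotation and the Adam step does not, because Adam rescales each coordinate separately. Whether any particular distributed training method actually relies on that property is the open part of the question.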