Posts

Comments

Comment by Erik Garrison (erik-garrison) on Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream · 2024-09-08T04:50:12.126Z · LW · GW

Could this affect distributed training schemes that might assume rotational invariance?