Comment by joswald (johannes-oswald) on Paper: Transformers learn in-context by gradient descent · 2022-12-17T09:48:52.662Z
Hi there - I am the first author! Thanks for this very nice write-up. Regarding "mechanistically understand the inner workings of optimized Transformers that learn in-context": it's definitely fair to say that we do this only for self-attention-only Transformers! I also try to be careful and (hopefully consistently) claim this only for the simple problems we studied. I'm working on v2, which will include language experiments, and I'm also trying to find a way to verify the hypotheses in pretrained models. Thanks again!