Arthur Conmy's Shortform
post by Arthur Conmy (arthur-conmy) · 2022-11-01T21:35:29.449Z · LW · GW · 1 comment
comment by Arthur Conmy (arthur-conmy) · 2022-11-01T21:35:29.950Z · LW(p) · GW(p)
Has anyone tried reproducing double descent [https://openai.com/blog/deep-double-descent/] in transformers they have trained themselves (or, better, in GPT-like transformers)? Since grokking can be somewhat understood through transformer interpretability [https://openreview.net/forum?id=9XFSbDPmdW], this seems like a possibly tractable direction.
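
For anyone who wants a cheap baseline before touching transformers, here is a minimal sketch of a model-wise double descent sweep. It is not a transformer experiment: it swaps in random ReLU features with a minimum-norm least-squares fit, a toy setting where a test-error peak near the interpolation threshold (width close to the number of training points) is commonly reported. The function name `double_descent_sweep` and all parameter values (`n_train`, `d`, `noise`, the width grid) are illustrative assumptions, not taken from either linked source.

```python
import numpy as np


def double_descent_sweep(widths, n_train=40, n_test=500, d=10, noise=0.5, seed=0):
    """Return the test MSE of a random-ReLU-feature regressor at each width.

    Hypothetical toy setup: a noisy linear target in d dimensions, fitted by
    least squares on `width` fixed random ReLU features.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    x_train = rng.normal(size=(n_train, d))
    x_test = rng.normal(size=(n_test, d))
    # Label noise on the training set only; test targets are noise-free.
    y_train = x_train @ w_true + noise * rng.normal(size=n_train)
    y_test = x_test @ w_true

    errors = []
    for width in widths:
        # Fixed random first layer; only the linear readout is fitted.
        proj = rng.normal(size=(d, width)) / np.sqrt(d)
        phi_train = np.maximum(x_train @ proj, 0.0)
        phi_test = np.maximum(x_test @ proj, 0.0)
        # lstsq returns the minimum-norm solution when width > n_train.
        coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
        errors.append(float(np.mean((phi_test @ coef - y_test) ** 2)))
    return errors


if __name__ == "__main__":
    widths = [5, 10, 20, 30, 40, 50, 80, 160, 320]
    for width, err in zip(widths, double_descent_sweep(widths)):
        print(f"width={width:4d}  test_mse={err:.3f}")
```

Plotting test error against width, and averaging over several seeds, would show whether the curve has a second descent in this toy setting. The same sweep structure (vary model size, hold data fixed, record test loss) is what a transformer reproduction would need, with the least-squares fit replaced by actual training.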