Arthur Conmy's Shortform

post by Arthur Conmy (arthur-conmy) · 2022-11-01T21:35:29.449Z · LW · GW · 1 comments

comment by Arthur Conmy (arthur-conmy) · 2022-11-01T21:35:29.950Z · LW(p) · GW(p)


Has anyone reproduced double descent [https://openai.com/blog/deep-double-descent/] on transformers they have trained themselves (or, better, on GPT-like transformers)? Since grokking can be partly understood through transformer interpretability [https://openreview.net/forum?id=9XFSbDPmdW], this seems like a possibly tractable direction.