Comment by
Mark Goodhead (mark-goodhead) on
Basic facts about language models during training ·
2023-02-25T07:05:26.352Z
Have you tried fitting a Student's t distribution? The nice thing about that distribution is that the nu (degrees of freedom) parameter completely controls the shape of the tails, and the distribution converges to the Gaussian in the limit as nu goes to infinity. This would allow you to plot a cool graph of nu against checkpoint steps to get an easy visualisation of exactly how the shape of the tails changes over time.
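
To make the suggestion concrete, here is a rough, standard-library-only sketch of what fitting nu per checkpoint could look like. Everything in it is an assumption for illustration: the helper names (`fit_nu`, `fit_loc_scale`, `sample_t`), the coarse `NU_GRID`, and the synthetic samples standing in for whatever quantity is measured at each checkpoint. It fits location and scale by EM for each candidate nu and keeps the nu with the highest log-likelihood; in practice `scipy.stats.t.fit` would do the same job with a continuous nu.

```python
import math
import random

# Coarse candidate grid for nu (an arbitrary choice for this sketch).
NU_GRID = [1, 2, 3, 4, 6, 8, 12, 16, 32, 64, 128]

def t_log_likelihood(xs, nu, mu, sigma):
    """Log-likelihood of xs under a location-scale Student's t."""
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return sum(const - (nu + 1) / 2 * math.log1p(((x - mu) / sigma) ** 2 / nu)
               for x in xs)

def fit_loc_scale(xs, nu, iters=30):
    """EM updates for location and scale with nu held fixed."""
    n = len(xs)
    mu = sorted(xs)[n // 2]  # start from the (upper) median
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    for _ in range(iters):
        # E-step: points far out in the tails get smaller weights.
        w = [(nu + 1) / (nu + ((x - mu) / sigma) ** 2) for x in xs]
        # M-step: weighted mean, then weighted scale around the new mean.
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        sigma = math.sqrt(sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n)
    return mu, sigma

def fit_nu(xs, grid=NU_GRID):
    """Return the grid value of nu with the highest fitted log-likelihood."""
    best_ll, best_nu = None, None
    for nu in grid:
        mu, sigma = fit_loc_scale(xs, nu)
        ll = t_log_likelihood(xs, nu, mu, sigma)
        if best_ll is None or ll > best_ll:
            best_ll, best_nu = ll, nu
    return best_nu

def sample_t(rng, nu, n):
    """Draw n Student's t samples as normal / sqrt(chi2 / nu)."""
    return [rng.gauss(0, 1) / math.sqrt(rng.gammavariate(nu / 2, 2) / nu)
            for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up stand-in data: checkpoint step -> samples with a chosen true nu.
    checkpoints = {1000: sample_t(rng, 30, 1000),
                   10000: sample_t(rng, 6, 1000),
                   100000: sample_t(rng, 3, 1000)}
    for step, xs in checkpoints.items():
        print(step, fit_nu(xs))  # (step, nu) pairs are what you would plot
```

A large fitted nu means the tails are close to Gaussian and a small one means they are heavy, so the (step, nu) pairs are the points to plot. The grid caps nu at 128, which is indistinguishable from Gaussian at these sample sizes.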