Comment by Mark Goodhead (mark-goodhead) on Basic facts about language models during training · 2023-02-25T07:05:26.352Z

Have you tried fitting a Student's t distribution? The nice thing about it is that the degrees-of-freedom parameter nu alone controls how heavy the tails are, and the distribution converges to a Gaussian as nu goes to infinity. That would let you plot a cool graph of nu against checkpoint step, giving an easy visualisation of exactly how the shape of the tails changes over training.
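A minimal sketch of what that could look like, assuming each checkpoint gives you a flat list of values to fit (whichever per-checkpoint distribution you're plotting). In practice `scipy.stats.t.fit` returns `(df, loc, scale)` directly; this stdlib-only version just makes the mechanics visible. It profiles the likelihood over a log-spaced grid of nu values, fitting loc and scale by the standard EM updates for a t distribution at each nu. All names here (`fit_student_t`, `fit_loc_scale`, `nu_by_checkpoint`, `samples_by_step`) are hypothetical, and the grid bounds are arbitrary choices.

```python
import math
import statistics


def t_logpdf(x, nu, loc, scale):
    """Log-density of a location-scale Student's t distribution."""
    z = (x - loc) / scale
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(scale)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def fit_loc_scale(xs, nu, loc, scale, max_iter=200, tol=1e-8):
    """EM updates for loc and scale with nu held fixed."""
    n = len(xs)
    for _ in range(max_iter):
        # E-step: points far from loc get small weights, which is what
        # lets the fit accommodate heavy tails.
        ws = [(nu + 1) / (nu + ((x - loc) / scale) ** 2) for x in xs]
        # M-step: weighted mean, then weighted variance around it.
        new_loc = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
        new_var = sum(w * (x - new_loc) ** 2 for w, x in zip(ws, xs)) / n
        new_scale = math.sqrt(new_var)
        done = abs(new_loc - loc) < tol and abs(new_scale - scale) < tol * scale
        loc, scale = new_loc, new_scale
        if done:
            break
    return loc, scale


def fit_student_t(xs, nu_min=0.5, nu_max=1000.0, grid_size=40):
    """Return (nu, loc, scale) maximising the likelihood over a nu grid.

    A nu at the top of the grid means the data look Gaussian.
    """
    # Robust starting point: median and median absolute deviation.
    loc = statistics.median(xs)
    scale = statistics.median(abs(x - loc) for x in xs) or 1.0
    best = None
    ratio = nu_max / nu_min
    for k in range(grid_size):
        nu = nu_min * ratio ** (k / (grid_size - 1))
        # Warm-start EM from the previous grid point's loc/scale.
        loc, scale = fit_loc_scale(xs, nu, loc, scale)
        ll = sum(t_logpdf(x, nu, loc, scale) for x in xs)
        if best is None or ll > best[0]:
            best = (ll, nu, loc, scale)
    _, nu, loc, scale = best
    return nu, loc, scale


def nu_by_checkpoint(samples_by_step):
    """Hypothetical helper: {step: values} -> {step: fitted nu}, sorted by step."""
    return {step: fit_student_t(xs)[0] for step, xs in sorted(samples_by_step.items())}
```

Plotting the values of `nu_by_checkpoint(...)` against their keys gives the nu-vs-step curve: falling nu means the tails are getting heavier, and nu pinned at the grid maximum means the distribution is effectively Gaussian. A grid is used instead of a gradient-based optimiser because the likelihood is very flat in nu once the data look Gaussian.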