Comments

Comment by JD on How does GPT-3 spend its 175B parameters? · 2024-09-12T03:26:29.851Z

(Re: open question one)

> The GPT-3 table must have a typo in the number of parameters or the size hyperparameters. Am I wrong, or is that a typo in the GPT-3 paper?

I independently suspect the table is erroneous for GPT-3 XL. When I ran the numbers, I concluded the most likely error is that n_heads should be 16 (not 24 as listed). I believe that is the only single-value correction that makes n_heads * d_head = d_model while remaining consistent with the reported n_params.
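
For what it's worth, here is a quick sanity check. This is a minimal sketch, not anything from the paper itself: the `12 * n_layers * d_model**2` estimate for non-embedding parameters and the 50257-token vocabulary are standard assumptions for GPT-style decoder-only transformers.

```python
# Sanity check of the GPT-3 XL row in Table 2.1 of the GPT-3 paper.
# Values below are as reported in that table.

n_params = 1.3e9   # reported total parameter count
n_layers = 24
d_model  = 2048
n_heads  = 24      # value as listed; suspected typo
d_head   = 128

# Check 1: attention heads must tile the residual stream.
print(n_heads * d_head == d_model)   # False: 24 * 128 = 3072 != 2048
print(16 * d_head == d_model)        # True:  16 * 128 = 2048

# Check 2: parameter count. Each block contributes ~12 * d_model^2
# parameters (4 * d_model^2 for attention + 8 * d_model^2 for the MLP);
# adding the token embedding (~50257 * d_model) recovers ~1.3B.
approx = 12 * n_layers * d_model**2 + 50257 * d_model
print(f"{approx/1e9:.2f}B vs reported {n_params/1e9:.1f}B")  # ~1.31B vs 1.3B
```

Note that changing n_heads alone leaves the parameter count untouched (the attention projections total 4 * d_model^2 regardless of how they are split across heads), whereas changing d_model or n_layers would break the ~1.3B figure. That is why n_heads = 16 looks like the only single-value fix.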