Comment by JD on How does GPT-3 spend its 175B parameters? · 2024-09-12T03:26:29.851Z
(Re: open question one)
> The GPT-3 table must have a typo in the number of parameters or the size hyperparameters. Am I wrong, or is that a typo in the GPT-3 paper?
I independently suspect the table is erroneous for GPT-3 XL. As listed, n_heads * d_head = 24 * 128 = 3072, which does not equal d_model = 2048. When I ran the numbers, I concluded the most likely fix is n_heads = 16 (not 24 as listed), since 16 * 128 = 2048. I believe that is the only single-value adjustment that makes n_heads * d_head = d_model while remaining consistent with the listed n_params, since changing d_model or n_layers instead would shift the parameter count away from 1.3B.
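For concreteness, here is a minimal sketch of that consistency check in Python, assuming the standard GPT-3 parameter accounting (roughly 12 * n_layers * d_model^2 for the transformer blocks, plus token and learned position embeddings). The helper names and the vocab/context constants (50257, 2048) are my assumptions, not from the comment or the paper's table:

```python
# Consistency check for the GPT-3 XL row of Table 2.1 (a sketch, assuming
# the standard accounting: ~12 * n_layers * d_model^2 per-block parameters
# plus token and position embeddings). Note the parameter estimate depends
# only on n_layers and d_model, not on how d_model is split into heads.

N_VOCAB = 50257   # GPT-2/GPT-3 BPE vocabulary size (assumed)
N_CTX = 2048      # GPT-3 context length (assumed)

def approx_params(n_layers: int, d_model: int) -> int:
    """Attention (4*d_model^2) + MLP (8*d_model^2, since d_ff = 4*d_model)
    per layer, plus token and position embeddings."""
    blocks = 12 * n_layers * d_model**2
    embeddings = (N_VOCAB + N_CTX) * d_model
    return blocks + embeddings

def check_row(name, n_params, n_layers, d_model, n_heads, d_head):
    heads_ok = n_heads * d_head == d_model
    print(f"{name}: n_heads * d_head = {n_heads * d_head} "
          f"(d_model = {d_model}) -> {'OK' if heads_ok else 'MISMATCH'}")
    print(f"  approx n_params = {approx_params(n_layers, d_model):,} "
          f"(listed: {n_params:,})")

# Row as printed in the paper: 24 heads of width 128 gives 3072 != 2048.
check_row("GPT-3 XL (as listed)", 1_300_000_000, 24, 2048, n_heads=24, d_head=128)
# Proposed single-value fix: n_heads = 16, so 16 * 128 = 2048 = d_model,
# while the parameter estimate still lands near the listed 1.3B.
check_row("GPT-3 XL (n_heads=16)", 1_300_000_000, 24, 2048, n_heads=16, d_head=128)
```

Running this prints a MISMATCH for the row as listed and an OK for the n_heads = 16 variant, with the same ~1.32B parameter estimate in both cases, which is the sense in which the fix stays consistent with n_params.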