What are the most important papers/post/resources to read to understand more of GPT-3?

post by adamShimi · 2020-08-02T20:53:30.913Z · score: 25 (12 votes) · LW · GW · No comments

This is a question post.



I'm far more used to thinking about weird maths, distributed algorithms, or abstract philosophical problems than about concrete machine learning architectures. But based on everything I see about GPT-3, it seems like a good idea to learn more about it, even if only to participate in the discussion without spouting nonsense.

So I'm asking: what do you think are the must-reads on GPT-3 specifically, and what are the prerequisites for understanding them?


answer by Peter Jin · 2020-08-03T01:26:49.643Z · score: 13 (8 votes) · LW(p) · GW(p)

nostalgebraist's blog is a must-read on GPT-x, including GPT-3. Perhaps start here ("the transformer... 'explained'?"), which helps contextualize GPT-x within the history of machine learning.

(Though, I should note that nostalgebraist holds a contrarian "bearish" position on GPT-3 in particular; for the "bullish" case instead, read Gwern.)

comment by adamShimi · 2020-08-03T18:39:01.546Z · score: 3 (2 votes) · LW(p) · GW(p)

Thanks for the answer! I knew about the "transformer explained" post, but I was not aware of its author's position on GPT-3.

answer by Juraj Vitko · 2020-08-04T10:51:54.248Z · score: 7 (3 votes) · LW(p) · GW(p)

Here's a list of resources that may be of use to you. The GPT-3 paper isn't very specific on implementation details, because the changes that led to it were rather incremental (especially relative to GPT-2, and more so the further back we look in the Transformer lineage). So the background needed to understand GPT-3 is broader than one might expect.

comment by adamShimi · 2020-08-10T15:03:22.134Z · score: 1 (1 votes) · LW(p) · GW(p)

Thanks! I'll try to read that.
