What are the most important papers/post/resources to read to understand more of GPT-3?
post by adamShimi · 2020-08-02T20:53:30.913Z
This is a question post.
I'm far more used to thinking about weird maths, distributed algorithms, or abstract philosophical problems than about concrete machine learning architectures. But given everything I see about GPT-3, it seems like a good idea to learn more about it, if only to participate in the discussion without spouting nonsense.
So I'm asking: what do you think are the must-reads on GPT-3 specifically, and what are the prerequisites for understanding them?
Answers
Answer by Peter Jin
nostalgebraist's blog is a must-read on GPT-x, including GPT-3. Perhaps start here ("the transformer... 'explained'?"), which helps contextualize GPT-x within the history of machine learning.
(Though I should note that nostalgebraist holds a contrarian "bearish" position on GPT-3 in particular; for the "bullish" case, read Gwern instead.)
Answer by Juraj Vitko
Here's a list of resources that may be of use to you. The GPT-3 paper isn't very specific about implementation details, because the changes that led to it were incremental (especially relative to GPT-2, and more so the farther back you look in the Transformer lineage). So the reading required to understand GPT-3 is broader than one might expect.
- https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/nlp/01_Exploring_Word_Embeddings.ipynb
- http://www.peterbloem.nl/blog/transformers
- http://jalammar.github.io/illustrated-transformer/
- https://amaarora.github.io/2020/02/18/annotatedGPT2.html
- http://jalammar.github.io/illustrated-gpt2/
- http://jalammar.github.io/how-gpt3-works-visualizations-animations/
- https://arxiv.org/pdf/1409.0473.pdf (the original attention paper)
- https://arxiv.org/pdf/1706.03762.pdf (Attention Is All You Need)
- http://nlp.seas.harvard.edu/2018/04/03/attention.html (The Annotated Transformer)
- https://www.arxiv-vanity.com/papers/1904.02679/ (visualizing attention)
- https://stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-and-values-in-attention-mechanisms (keys, queries, and values; see the code sketch after this list)
- https://arxiv.org/pdf/1807.03819.pdf (Universal Transformers)
- https://arxiv.org/pdf/2007.14062.pdf (Big Bird; see the appendices)
- https://www.reddit.com/r/MachineLearning/comments/hxvts0/d_breaking_the_quadratic_attention_bottleneck_in/
- https://www.tensorflow.org/tutorials/text/transformer
- https://www.tensorflow.org/tutorials/text/nmt_with_attention
- https://cdn.openai.com/blocksparse/blocksparsepaper.pdf
- https://openai.com/blog/block-sparse-gpu-kernels/
- https://github.com/pbloem/former/blob/master/former/transformers.py (minimal transformer, companion to the peterbloem.nl post above)
- https://github.com/openai/blocksparse/blob/master/examples/transformer/enwik8.py (block-sparse transformer example)
- https://github.com/google/trax/blob/master/trax/models/transformer.py (Trax transformer)
- https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py (Hugging Face GPT-2 implementation)
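To make the keys/queries/values question above concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of "Attention Is All You Need" and the GPT lineage. It's illustrative only: the function name, the `causal` flag, and the toy shapes are my own for this sketch, not taken from any of the linked implementations.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    """q, k: (n, d_k); v: (n, d_v). Returns (n, d_v)."""
    d_k = q.shape[-1]
    # Every query is scored against every key, giving an (n, n) matrix;
    # this is the quadratic bottleneck that Big Bird and the block-sparse
    # kernels linked above try to break.
    scores = q @ k.T / np.sqrt(d_k)
    if causal:
        # GPT-style decoder: token i may only attend to tokens <= i.
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each output is a weighted average of the values

# Toy self-attention: 4 tokens of dimension 8; q = k = v = x for brevity
# (real models first project x through learned weight matrices).
x = np.random.default_rng(0).normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x, causal=True).shape)  # (4, 8)
```

In a real transformer, q, k, and v are learned linear projections of the input and several such "heads" run in parallel; the annotated implementations above fill in those details.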