comment by Diziet · 2023-05-10T20:53:18.928Z
A couple more takeaways I jotted down:
PaLM 2 closely followed Chinchilla-optimal scaling. There is no explicit mention of the number of parameters, and the data are withheld. They claim performance generally equivalent to GPT-4. Chain-of-thought reasoning is called out explicitly quite a bit.
Claims of longer context length, but no specific size in the technical report. From the API page: "75+ tokens per second and a context window of 8,000 tokens."
"The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute."
"The pre-training corpus is significantly larger than the corpus used to train PaLM [which was 780B tokens]."
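For context on the "smaller model, more compute" point: the Chinchilla result (Hoffmann et al., 2022) suggests roughly ~20 training tokens per parameter as compute-optimal, with training compute approximated by C ≈ 6·N·D FLOPs. A minimal sketch of that arithmetic, using PaLM's published figures (540B parameters, 780B tokens) — the 20:1 ratio is a common approximation, not a number from the PaLM 2 report:

```python
# Rough Chinchilla-style compute-optimal sizing (Hoffmann et al., 2022).
# The ~20 tokens-per-parameter ratio is an approximation, not a figure
# taken from the PaLM 2 technical report.

TOKENS_PER_PARAM = 20  # approximate compute-optimal tokens-to-parameters ratio


def optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard dense-transformer estimate: C ~ 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens


# PaLM: 540B parameters trained on 780B tokens -- far fewer tokens than
# the ~10.8T this heuristic would suggest, so a smaller model trained on
# a much larger corpus (as PaLM 2 claims) can use *more* total compute.
print(f"Chinchilla-optimal tokens for 540B params: {optimal_tokens(540e9):.2e}")
print(f"PaLM training compute estimate: {training_flops(540e9, 780e9):.2e} FLOPs")
```

This is only back-of-the-envelope arithmetic, but it illustrates why "significantly smaller but more training compute" is consistent with a significantly larger pre-training corpus.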