comment by Diziet · 2023-05-10T20:53:18.928Z
A couple more takeaways I jotted down:
PaLM 2 closely followed Chinchilla-optimal scaling. There is no explicit mention of the number of parameters, and the data are withheld. They claim performance generally equivalent to GPT-4. Chain-of-thought reasoning is called out explicitly quite a bit.
Claims of longer context length, but no specific size in the technical report. From the API page: "75+ tokens per second and a context window of 8,000 tokens."
"The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute."
"The pre-training corpus is significantly larger than the corpus used to train PaLM [which was 780B tokens]."
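For context on the "smaller model, more compute" point: the Chinchilla result (Hoffmann et al., 2022) suggests roughly ~20 training tokens per parameter as compute-optimal, with training compute approximated by C ≈ 6·N·D FLOPs. A minimal sketch of that arithmetic, using PaLM's published figures (540B parameters, 780B tokens) — the 20:1 ratio is a common approximation, not a number from the PaLM 2 report:

```python
# Rough Chinchilla-style compute-optimal sizing (Hoffmann et al., 2022).
# The ~20 tokens-per-parameter ratio is an approximation, not a figure
# taken from the PaLM 2 technical report.

TOKENS_PER_PARAM = 20  # approximate compute-optimal tokens-to-parameters ratio


def optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard dense-transformer estimate: C ~ 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens


# PaLM: 540B parameters trained on 780B tokens -- far fewer tokens than
# the ~10.8T this heuristic would suggest, so a smaller model trained on
# a much larger corpus (as PaLM 2 claims) can use *more* total compute.
print(f"Chinchilla-optimal tokens for 540B params: {optimal_tokens(540e9):.2e}")
print(f"PaLM training compute estimate: {training_flops(540e9, 780e9):.2e} FLOPs")
```

This is only back-of-the-envelope arithmetic, but it illustrates why "significantly smaller but more training compute" is consistent with a significantly larger pre-training corpus.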