[Link] PaLM 2 Technical Report

post by marc/er · 2023-05-10T20:28:16.060Z

This is a link post for https://ai.google/static/documents/palm2techreport.pdf

We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

PaLM 2 outperforms PaLM across all datasets and achieves results competitive with GPT-4.

Estimated optimal parameter size at a given number of FLOPs.
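The caption above refers to a plot of compute-optimal model size as a function of training FLOPs. Below is a minimal sketch of that kind of calculation. It assumes the common approximation C ≈ 6ND (N = parameters, D = training tokens) and a Chinchilla-style ratio of roughly 20 tokens per parameter. Both are assumptions made for illustration, not the coefficients fitted in the PaLM 2 report.

```python
import math

# Assumption: training compute C (FLOPs) ~= 6 * N * D for a dense Transformer,
# where N is the parameter count and D the number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: Chinchilla-style rule of thumb of ~20 training tokens per parameter.
# This is NOT the coefficient fitted in the PaLM 2 report.
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(flops: float,
                               tokens_per_param: float = TOKENS_PER_PARAM) -> tuple[float, float]:
    """Return (params, tokens) that spend `flops` with D = tokens_per_param * N.

    Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N**2,
    so N = sqrt(C / (6 * r)) and D = r * N. Both grow as C**0.5.
    """
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    # Print the implied allocation for a few illustrative compute budgets.
    for flops in (1e22, 1e23, 1e24):
        n, d = compute_optimal_allocation(flops)
        print(f"C={flops:.0e} FLOPs -> N~{n / 1e9:.1f}B params, D~{d / 1e9:.0f}B tokens")
```

Under these assumptions, parameters and tokens grow in equal proportion: a 100x larger budget gives a 10x larger model trained on 10x more tokens.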

Pass rates for PaLM and PaLM-2 experiments on BabelCode.
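The caption does not say how the pass rates were sampled. One common way to compute such rates is the unbiased pass@k estimator used in OpenAI's Codex evaluation. The sketch below assumes that metric, which may differ from the report's exact setup. With one sample per problem (n = k = 1), the result reduces to the fraction of problems whose generated program passes all of its tests.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that passed all tests, k: sample budget.
    Computes 1 - C(n - c, k) / C(n, k): the probability that at least one of
    k samples drawn without replacement from the n is correct.
    """
    if k > n:
        raise ValueError("k must not exceed n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_rate(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```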

PaLM 2 outperforms PaLM across all exams and achieves a passing grade for every language, demonstrating language proficiency across all evaluated languages.


See here for a higher-level overview: https://ai.google/discover/palm2 



comment by Diziet · 2023-05-10T20:53:18.928Z

A couple more takeaways I jotted down:

PaLM 2 closely followed Chinchilla-optimal scaling. There is no explicit mention of the number of parameters; that data is withheld. They claim performance is generally equivalent to GPT-4. Chain-of-thought reasoning is called out explicitly quite a bit.

The technical report claims a longer context length but gives no specific size. From the API page: "75+ tokens per second and a context window of 8,000 tokens."
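Those two figures translate into a simple budget check. The sketch below takes the quoted numbers as illustrative constants. It also assumes that prompt and output share a single window. Real limits depend on the model version and API, and some APIs cap input and output separately.

```python
# Illustrative constants taken from the API-page quote; actual limits vary by model/API.
CONTEXT_WINDOW_TOKENS = 8_000
DECODE_TOKENS_PER_SECOND = 75.0  # quoted as "75+", i.e. a minimum throughput


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if prompt plus requested output fit in one shared window (assumption)."""
    return prompt_tokens + max_output_tokens <= window


def estimated_decode_seconds(output_tokens: int,
                             rate: float = DECODE_TOKENS_PER_SECOND) -> float:
    """Decode time at the quoted minimum rate.

    If throughput really is at least `rate`, this is an upper estimate for the
    decode phase alone; it ignores network latency and prompt processing.
    """
    return output_tokens / rate
```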

"The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute" "The pre-training corpus is significantly larger than the corpus used to train PaLM [which was 780B tokens]"
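Taken together, the two quotes imply a much larger token count. Under the common approximation C ≈ 6ND (an assumption, not a formula quoted from the report), D = C / (6N). Shrinking N while raising C therefore makes D grow faster than compute. The sketch below uses PaLM's published 540B parameters and 780B tokens as the baseline. The ratios for PaLM 2-L are made up, because the report does not disclose its size or compute.

```python
# Assumption: training FLOPs C ~= 6 * N * D for a dense Transformer.
def implied_token_ratio(compute_ratio: float, param_ratio: float) -> float:
    """Return D_new / D_old implied by C = 6ND.

    compute_ratio = C_new / C_old and param_ratio = N_new / N_old.
    Since D = C / (6N), D_new / D_old = compute_ratio / param_ratio.
    """
    if compute_ratio <= 0 or param_ratio <= 0:
        raise ValueError("ratios must be positive")
    return compute_ratio / param_ratio


PALM_PARAMS = 540e9  # PaLM (2022): 540B parameters
PALM_TOKENS = 780e9  # PaLM (2022): 780B training tokens
print(f"PaLM approx. training compute: {6 * PALM_PARAMS * PALM_TOKENS:.2e} FLOPs")

# Hypothetical illustration only: suppose PaLM 2-L had half of PaLM's
# parameters and 1.5x its training compute.
ratio = implied_token_ratio(compute_ratio=1.5, param_ratio=0.5)
print(f"tokens grow {ratio:.1f}x -> ~{PALM_TOKENS * ratio / 1e12:.2f}T tokens")
```

Any combination of "smaller model" and "more compute" yields a token ratio above the compute ratio. This is consistent with the report's statement that the corpus is significantly larger.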