AlphaGeometry: An Olympiad-level AI system for geometry
post by alyssavance · 2024-01-17T17:17:30.913Z · LW · GW · 9 comments
This is a link post for https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
[Published today by DeepMind]
Our AI system surpasses the state-of-the-art approach for geometry problems, advancing AI reasoning in mathematics
Reflecting the Olympic spirit of ancient Greece, the International Mathematical Olympiad is a modern-day arena for the world's brightest high-school mathematicians. The competition not only showcases young talent, but has emerged as a testing ground for advanced AI systems in math and reasoning.
In a paper published today in Nature, we introduce AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold-medalist - a breakthrough in AI performance. In a benchmarking test of 30 Olympiad geometry problems, AlphaGeometry solved 25 within the standard Olympiad time limit. For comparison, the previous state-of-the-art system solved 10 of these geometry problems, and the average human gold medalist solved 25.9 problems.
The links to the paper from the blog post appear broken, but here is a working one:
https://www.nature.com/articles/s41586-023-06747-5
Interesting that the transformer used is tiny. From the paper:
We use the Meliad library for transformer training with its base settings. The transformer has 12 layers, embedding dimension of 1,024, eight heads of attention and an inter-attention dense layer of dimension 4,096 with ReLU activation. Overall, the transformer has 151 million parameters, excluding embedding layers at its input and output heads. Our customized tokenizer is trained with ‘word’ mode using SentencePiece and has a vocabulary size of 757. We limit the maximum context length to 1,024 tokens and use T5-style relative position embedding. Sequence packing is also used because more than 90% of our sequences are under 200 in length.
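For intuition about just how small this is, the quoted hyperparameters are enough to roughly reproduce the 151M figure. Here's a back-of-the-envelope sketch in Python; it assumes standard transformer blocks (untied Q/K/V/output projections plus a two-matrix ReLU feed-forward) and ignores biases, layer norms, and relative-position parameters, so it's an illustrative estimate rather than the authors' exact accounting:

```python
# Back-of-the-envelope parameter count for the AlphaGeometry transformer,
# using only the hyperparameters quoted above. Assumes standard transformer
# blocks; ignores biases, layer norms, relative-position parameters, and
# (as the paper does) the input/output embeddings.

n_layers = 12   # transformer layers
d_model = 1024  # embedding dimension
d_ff = 4096     # inter-attention dense layer dimension

# Attention: Q, K, V, and output projections, each a d_model x d_model matrix.
attn = 4 * d_model * d_model   # 4,194,304 per layer

# Feed-forward: d_model -> d_ff -> d_model, with ReLU in between.
ffn = 2 * d_model * d_ff       # 8,388,608 per layer

total = n_layers * (attn + ffn)
print(f"{total:,}")            # 150,994,944, i.e. ~151M as reported
```

That this lands almost exactly on the reported 151 million suggests the excluding-embeddings count is essentially just the attention and feed-forward weight matrices.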
9 comments
Comments sorted by top scores.
comment by ryan_greenblatt · 2024-01-17T17:44:41.392Z · LW(p) · GW(p)
Another interesting fact is that without any NN, using just the rest of the approach from the paper, their method gets 18/30 correct. The NN boosts this to 25/30. The prior SOTA was 10/30 (also without an NN).
So arguably about half of the action is just improvements in the non-AI components: of the 15 problems gained over the prior SOTA, 8 come from the symbolic components alone.
Replies from: Spencer Becker-Kahn, yair-halberstadt
↑ comment by carboniferous_umbraculum (Spencer Becker-Kahn) · 2024-01-18T12:36:59.220Z · LW(p) · GW(p)
Hmm, it might be questionable to call that part "non-AI", though. It's based on symbolic and algebraic deduction engines, and as far as I can tell it sounds like the sort of thing that used to be very much mainstream "AI", i.e. symbolic AI plus some hard-coded human heuristics.
Replies from: ryan_greenblatt
↑ comment by ryan_greenblatt · 2024-01-18T18:46:03.631Z · LW(p) · GW(p)
Sure, it just seems like a very non-central example of AI from the typical perspective of LW readers.
↑ comment by Yair Halberstadt (yair-halberstadt) · 2024-01-18T07:38:38.117Z · LW(p) · GW(p)
Interesting - how does the non-AI portion work?
comment by Vladimir_Nesov · 2024-01-18T15:55:37.366Z · LW(p) · GW(p)
This is another example of how matching specialized human reasoning skill seems routinely feasible with search guided by 100M-scale networks trained for a task a human would spend years mastering. These tasks seem specialized, but it's plausible that the whole breadth of human activity can be covered with a reasonable number of such areas of specialization. What's currently missing is automation of the formulation and training of systems specialized in any given skill.
The often-touted surprisingly good human sample efficiency might just mean that, when training is set up correctly, it's sufficient to train models of a size comparable to the amount of external text data a human would need to master a skill, rather than models of a size comparable to a human brain. This doesn't currently work for training systems that autonomously do research and produce new AI experiments and papers, and in practice the technology might take a very different route. But once it does work, surpassing human performance might fail to require millions of GPUs even for training.
comment by O O (o-o) · 2024-01-17T17:54:44.875Z · LW(p) · GW(p)
Is the IMO Grand Challenge plausibly solved, or soon to be solved?
Replies from: ryan_greenblatt
↑ comment by ryan_greenblatt · 2024-01-17T18:07:25.620Z · LW(p) · GW(p)
I think geometry problems are well known to be very easy for machines relative to humans. See e.g. here [LW(p) · GW(p)]. So this doesn't seem like most of the difficulty.
Replies from: localdeity
↑ comment by localdeity · 2024-01-17T19:57:12.166Z · LW(p) · GW(p)
Though perhaps that's only relative to a subset of humans:
Chapter 8: Geometry for Americans
We call this chapter "Geometry for Americans" instead of "Geometry for Dummies" so as not to offend. The sad truth is that most mathematically inclined Americans know very little geometry, in contrast to their luckier peers in Eastern Europe and Asia. But it is never too late to learn. Geometry is a particularly fun topic to study, because you are compelled to draw lots and lots of pictures.
(30% serious here)
comment by Lech Mazur (lechmazur) · 2024-04-14T05:01:40.236Z · LW(p) · GW(p)
https://arxiv.org/abs/2404.06405
"Essentially, this classic method solves just 4 problems less than AlphaGeometry and establishes the first fully symbolic baseline strong enough to rival the performance of an IMO silver medalist. (ii) Wu's method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu's method we set a new state-of-the-art for automated theorem proving on IMO-AG-30, solving 27 out of 30 problems, the first AI method which outperforms an IMO gold medalist."