DeepMind AI Matches Top Students in Solving Math Olympiad Problems

Google’s DeepMind announced that its AI system has matched the performance of some of the world’s top students in solving math problems at the prestigious International Mathematical Olympiad (IMO).

After mastering highly complex games like Go and various strategy board games, beating all human champions at each game, DeepMind’s AI has now tackled complex mathematical challenges. The London-based machine-learning company revealed that its AI had solved four out of six problems presented at this year’s IMO in Bath, UK.

The AI’s solutions were thoroughly evaluated by top mathematicians, earning a score of 28 out of 42 — just one point shy of a gold medal but still good enough to win a silver medal. Last week, 58 competitors took home gold at the competition, while 123 others took home silver.

A Milestone in AI

Joseph Myers, a mathematician from Cambridge, UK, told Nature that this achievement is “a very substantial advance.” Myers, alongside Fields Medal-winner Tim Gowers, vetted the AI’s solutions and helped select the original problems for the IMO.

We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈

It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵 https://t.co/U0OFXBia8n pic.twitter.com/h2mcLLRJjk
— Google DeepMind (@GoogleDeepMind) July 25, 2024

The ultimate goal of math-solving AI is to have these machines solve some of the most challenging open questions in the field. Among these is the Riemann hypothesis, a problem relating to the distribution of prime numbers that has stood unsolved for nearly 160 years. A proof would not only win the $1 million reward that comes for solving one of the seven Millennium Prize Problems established by the Clay Mathematics Institute in 2000, but it could also have applications in predicting prime numbers, important in cryptography.

Illustration of Problem 4 from the Math Olympiad. — Illustration of Problem 4, which asks to prove the sum of ∠KIL and ∠XPY equals 180°. AlphaGeometry 2 proposed to construct E, a point on the line BI so that ∠AEB = 90°. Point E helps give purpose to the midpoint L of AB, creating many pairs of similar triangles such as ABE ~ YBI and ALE ~ IPC needed to prove the conclusion. Credit: DeepMind.

However, current AI systems are a far way from approaching these long-standing math research problems. DeepMind’s latest accomplishment shows there’s good progress toward attending the required capabilities for cracking tough math problems because the IMO has always been a benchmark for such AI advancements.

This year, DeepMind’s AlphaGeometry2 solved the geometry problem in under 20 seconds. For algebra and number theory problems, the team developed a new system called AlphaProof, which successfully solved three problems over three days. Meanwhile, human participants are offered only two sessions of 4.5 hours each. Despite this progress, AlphaProof could not solve the two combinatorics problems.

Combining AI Techniques for Better Results

Popular AI technologies people are familiar with, such as ChatGPT, are large language models (LLMs). While LLMs are somewhat capable of solving math, they’re very much hit or miss. If you ask an LLM to explain a mathematical concept to you, like how to multiply two matrices together, chances are it will tell you how to do it correctly. However, if you ask it to multiply two matrices together, chances are the answer will be wrong.

AlphaProof leverages a combination of language models and reinforcement learning, employing the successful AlphaZero engine previously used in games like Go, which won 499 out of 500 games against top competitors like Crazy Stone and Zen in single-machine matches.

This approach enables the AI to learn through trial-and-error. However, AlphaProof needs a framework that tells it when it makes an error or is on the wrong track. For this purpose, the researchers added another subsystem that can read and write proofs in a formal mathematical language known as Lean.

AI Problem-solving

Since not enough math problems were available, the team trained an additional network to translate a million problems written in natural language into Lean. Although many translations were imperfect, they provided a foundation for AlphaProof’s reinforcement-learning cycles.

IMO problems are highly challenging because they often require creative leaps to solve. You can’t just brute force these problems. According to Gowers, many IMO problems have a “magic-key” property, and once you figure out the right angle to tackle the problem it can become trivial to solve.

AlphaProof’s ability to find these “magic keys” suggests it can make creative leaps in problem-solving, reminiscent of the famous “move 37” made by DeepMind’s AlphaGo in 2016. AlphaGo landed a surprise on the right-hand side of the 19-by-19 board that flummoxed even the world’s best Go players, including Lee Sedol. “That’s a very strange move,” said one commentator during a live stream, himself a nine dan Go player, the highest rank there is. “I thought it was a mistake,” said the other.

While DeepMind’s AI systems are proving to be highly competent in solving challenging math problems, the leap to solving research-level mathematical questions remains uncertain. Nevertheless, DeepMind’s AI has reached a point where it can solve problems that challenge the world’s best young mathematicians.

“We’re at the point where they can prove not open research problems, but at least problems that are very challenging to the very best young mathematicians in the world,” said DeepMind computer scientist David Silver, one of the architects of AlphaGo.