homehome Home chatchat Notifications


DeepMind AI Matches Top Students in Solving Math Olympiad Problems

DeepMind's AI achieves medal-level performance in the International Mathematical Olympiad

Tibi Puiu
July 29, 2024 @ 9:08 pm

share Share

Credit: DeepMind.

Google’s DeepMind announced that its AI system has matched the performance of some of the world’s top students in solving math problems at the prestigious International Mathematical Olympiad (IMO).

After mastering highly complex games like Go and various strategy board games, beating all human champions at each game, DeepMind’s AI has now tackled complex mathematical challenges. The London-based machine-learning company revealed that its AI had solved four out of six problems presented at this year’s IMO in Bath, UK.

The AI’s solutions were thoroughly evaluated by top mathematicians, earning a score of 28 out of 42 — just one point shy of a gold medal but still good enough to win a silver medal. Last week, 58 competitors took home gold at the competition, while 123 others took home silver.

A Milestone in AI

Joseph Myers, a mathematician from Cambridge, UK, told Nature that this achievement is “a very substantial advance.” Myers, alongside Fields Medal-winner Tim Gowers, vetted the AI’s solutions and helped select the original problems for the IMO.

The ultimate goal of math-solving AI is to have these machines solve some of the most challenging open questions in the field. Among these is the Riemann hypothesis, a problem relating to the distribution of prime numbers that has stood unsolved for nearly 160 years. A proof would not only win the $1 million reward that comes for solving one of the seven Millennium Prize Problems established by the Clay Mathematics Institute in 2000, but it could also have applications in predicting prime numbers, important in cryptography.

Illustration of Problem 4 from the Math Olympiad.
Illustration of Problem 4, which asks to prove the sum of ∠KIL and ∠XPY equals 180°. AlphaGeometry 2 proposed to construct E, a point on the line BI so that ∠AEB = 90°. Point E helps give purpose to the midpoint L of AB, creating many pairs of similar triangles such as ABE ~ YBI and ALE ~ IPC needed to prove the conclusion. Credit: DeepMind.

However, current AI systems are a far way from approaching these long-standing math research problems. DeepMind’s latest accomplishment shows there’s good progress toward attending the required capabilities for cracking tough math problems because the IMO has always been a benchmark for such AI advancements.

This year, DeepMind’s AlphaGeometry2 solved the geometry problem in under 20 seconds. For algebra and number theory problems, the team developed a new system called AlphaProof, which successfully solved three problems over three days. Meanwhile, human participants are offered only two sessions of 4.5 hours each. Despite this progress, AlphaProof could not solve the two combinatorics problems.

Combining AI Techniques for Better Results

Popular AI technologies people are familiar with, such as ChatGPT, are large language models (LLMs). While LLMs are somewhat capable of solving math, they’re very much hit or miss. If you ask an LLM to explain a mathematical concept to you, like how to multiply two matrices together, chances are it will tell you how to do it correctly. However, if you ask it to multiply two matrices together, chances are the answer will be wrong.

AlphaProof leverages a combination of language models and reinforcement learning, employing the successful AlphaZero engine previously used in games like Go, which won 499 out of 500 games against top competitors like Crazy Stone and Zen in single-machine matches.

This approach enables the AI to learn through trial-and-error. However, AlphaProof needs a framework that tells it when it makes an error or is on the wrong track. For this purpose, the researchers added another subsystem that can read and write proofs in a formal mathematical language known as Lean.

AI Problem-solving

Since not enough math problems were available, the team trained an additional network to translate a million problems written in natural language into Lean. Although many translations were imperfect, they provided a foundation for AlphaProof’s reinforcement-learning cycles.

IMO problems are highly challenging because they often require creative leaps to solve. You can’t just brute force these problems. According to Gowers, many IMO problems have a “magic-key” property, and once you figure out the right angle to tackle the problem it can become trivial to solve.

AlphaProof’s ability to find these “magic keys” suggests it can make creative leaps in problem-solving, reminiscent of the famous “move 37” made by DeepMind’s AlphaGo in 2016. AlphaGo landed a surprise on the right-hand side of the 19-by-19 board that flummoxed even the world’s best Go players, including Lee Sedol. “That’s a very strange move,” said one commentator during a live stream, himself a nine dan Go player, the highest rank there is. “I thought it was a mistake,” said the other. 

While DeepMind’s AI systems are proving to be highly competent in solving challenging math problems, the leap to solving research-level mathematical questions remains uncertain. Nevertheless, DeepMind’s AI has reached a point where it can solve problems that challenge the world’s best young mathematicians.

“We’re at the point where they can prove not open research problems, but at least problems that are very challenging to the very best young mathematicians in the world,” said DeepMind computer scientist David Silver, one of the architects of AlphaGo.

share Share

Denmark could become the first country to ban deepfakes

Denmark hopes to pass a law prohibiting publishing deepfakes without the subject's consent.

The perfect pub crawl: mathematicians solve most efficient way to visit all 81,998 bars in South Korea

This is the longest pub crawl ever solved by scientists.

This Film Shaped Like Shark Skin Makes Planes More Aerodynamic and Saves Billions in Fuel

Mimicking shark skin may help aviation shed fuel—and carbon

China Just Made the World's Fastest Transistor and It Is Not Made of Silicon

The new transistor runs 40% faster and uses less power.

University of Zurich Researchers Secretly Deployed AI Bots on Reddit in Unauthorized Study

The revelation has sparked outrage across the internet.

Teen Influencer Watches Her Bionic Hand Crawl Across a Table on Its Own

The future of prosthetics is no longer science fiction.

Meet the Indian Teen Who Can Add 100 Numbers in 30 Second and Broke 6 Guinness World Records for Mental Math

The Indian teenager is officially the world's fastest "human calculator".

NASA Captured a Supersonic Jet Breaking the Sound Barrier and the Image Is Unreal

The coolest thing about this flight is that there was no sonic boom.

Fully Driverless Trucks Hit Texas Highways (This Time With No Human Oversight)

Driverless trucks will haul freight in Texas without a human behind the wheel.

A Woman Asked ChatGPT for a Palm Reading and It Flagged a Mole That Might Be Cancer

A viral TikTok recounts the story of a young woman who turned to ChatGPT for love advice but received an unsolicited medical advice instead.