

DeepMind AI Matches Top Students in Solving Math Olympiad Problems

DeepMind's AI achieves medal-level performance in the International Mathematical Olympiad

Tibi Puiu
July 29, 2024 @ 9:08 pm


Credit: DeepMind.

Google’s DeepMind announced that its AI system has matched the performance of some of the world’s top students in solving math problems at the prestigious International Mathematical Olympiad (IMO).

After mastering Go and other strategy board games, and beating the top human players at each, DeepMind’s AI has now tackled complex mathematical challenges. The London-based machine-learning company revealed that its AI had solved four out of six problems presented at this year’s IMO in Bath, UK.

The AI’s solutions were thoroughly evaluated by top mathematicians. Each IMO problem is worth seven points, so four complete solutions earned a score of 28 out of 42, just one point shy of a gold medal but still good enough for silver. Last week, 58 competitors took home gold at the competition, while 123 others earned silver.

A Milestone in AI

Joseph Myers, a mathematician from Cambridge, UK, told Nature that this achievement is “a very substantial advance.” Myers, alongside Fields Medal-winner Tim Gowers, vetted the AI’s solutions and helped select the original problems for the IMO.

The ultimate goal of math-solving AI is to have these machines solve some of the most challenging open questions in the field. Among these is the Riemann hypothesis, a problem relating to the distribution of prime numbers that has stood unsolved for more than 160 years. A proof would not only win the $1 million reward offered for solving one of the seven Millennium Prize Problems established by the Clay Mathematics Institute in 2000, but could also deepen our understanding of how prime numbers, which are important in cryptography, are distributed.

Illustration of Problem 4, which asks for a proof that the sum of ∠KIL and ∠XPY equals 180°. AlphaGeometry 2 proposed constructing E, a point on the line BI such that ∠AEB = 90°. Point E helps give purpose to the midpoint L of AB, creating many pairs of similar triangles, such as ABE ~ YBI and ALE ~ IPC, needed to prove the conclusion. Credit: DeepMind.

However, current AI systems are still a long way from tackling these long-standing research problems. DeepMind’s latest accomplishment shows good progress toward attaining the capabilities required for cracking tough math problems, since the IMO has long served as a benchmark for such AI advancements.

This year, DeepMind’s AlphaGeometry 2 solved the geometry problem in under 20 seconds. For algebra and number theory problems, the team developed a new system called AlphaProof, which successfully solved three problems over three days. Human participants, by contrast, get only two sessions of 4.5 hours each. Despite this progress, DeepMind’s systems could not solve the two combinatorics problems.

Combining AI Techniques for Better Results

Popular AI technologies people are familiar with, such as ChatGPT, are large language models (LLMs). While LLMs are somewhat capable of solving math, they’re very much hit or miss. If you ask an LLM to explain a mathematical concept to you, like how to multiply two matrices together, chances are it will tell you how to do it correctly. However, if you ask it to multiply two matrices together, chances are the answer will be wrong.
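To make the matrix example concrete, here is a minimal Python sketch of the procedure an LLM can usually describe correctly: each entry of the product is the dot product of a row of the first matrix with a column of the second. The function name `matmul` and the sample matrices are illustrative choices, not anything taken from DeepMind's work.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    # The number of columns in `a` must equal the number of rows in `b`.
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    rows, inner, cols = len(a), len(b), len(b[0])
    # Entry (i, j) is the dot product of row i of `a` and column j of `b`.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

A deterministic routine like this returns the same correct answer every time, which is the reliability a language model carrying out the arithmetic on its own does not guarantee.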

AlphaProof combines language models with reinforcement learning, employing the AlphaZero engine that DeepMind previously applied to games like Go. An earlier DeepMind Go program, AlphaGo, won 499 out of 500 games against top competing programs like Crazy Stone and Zen in single-machine matches.

This approach enables the AI to learn through trial-and-error. However, AlphaProof needs a framework that tells it when it makes an error or is on the wrong track. For this purpose, the researchers added another subsystem that can read and write proofs in a formal mathematical language known as Lean.
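For readers who have not seen Lean, the snippet below gives a rough idea of what a machine-checkable statement and proof look like. It is a toy example written for illustration in Lean 4 using only the core library, not one of AlphaProof's proofs; the theorem name `add_comm_example` is made up here.

```lean
-- A formal statement: addition of natural numbers is commutative.
-- Lean accepts the theorem only if the proof after `:=` type-checks.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A concrete arithmetic fact that Lean can check by direct computation.
example : 2 + 2 = 4 := rfl
```

If a proof step is wrong, Lean rejects it with an error rather than accepting a plausible-looking argument, and that kind of automatic verdict is what a trial-and-error learner needs as feedback.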

AI Problem-solving

Since too few math problems were available in Lean, the team trained an additional network to translate a million problems written in natural language into the formal language. Although many translations were imperfect, they provided a foundation for AlphaProof’s reinforcement-learning cycles.

IMO problems are highly challenging because they often require creative leaps to solve. You can’t just brute-force them. According to Gowers, many IMO problems have a “magic-key” property: once you figure out the right angle of attack, the problem can become trivial to solve.

AlphaProof’s ability to find these “magic keys” suggests it can make creative leaps in problem-solving, reminiscent of the famous “move 37” made by DeepMind’s AlphaGo in 2016. AlphaGo placed a surprise stone on the right-hand side of the 19-by-19 board that flummoxed even the world’s best Go players, including Lee Sedol. “That’s a very strange move,” said one commentator during a live stream, himself a nine-dan Go player, the highest rank there is. “I thought it was a mistake,” said the other.

While DeepMind’s AI systems are proving to be highly competent in solving challenging math problems, the leap to solving research-level mathematical questions remains uncertain. Nevertheless, DeepMind’s AI has reached a point where it can solve problems that challenge the world’s best young mathematicians.

“We’re at the point where they can prove not open research problems, but at least problems that are very challenging to the very best young mathematicians in the world,” said DeepMind computer scientist David Silver, one of the architects of AlphaGo.

