An AI Just Took Gold at the World’s Hardest Math Contest and It Wasn't Even Trained For It

AI-generated image shared by Alexander Wei.

The International Mathematical Olympiad (IMO) is a brainy battleground where the world’s most talented teenage mathematicians wrestle with devilishly difficult problems. It has long been considered a showcase of exceptional human talent. But now, an experimental AI from OpenAI has solved five of the six problems, a score high enough for a gold medal.

You may be tempted to think this is owed to powerful, brute-force computation or to searching through large mathematical databases. That’s not the case. These problems can’t be solved through raw calculation; they’re designed to force the solver to think outside the box. It’s exactly the kind of logical and creative reasoning we once thought was exclusive to the human mind, and the AI nailed it.

AI Can Do Some Real Thinking

Math Olympiad problems aren’t about plugging numbers into formulas. They’re more like complex obstacle courses: they look deceptively simple, but they require several layers of cleverness and intuition. It’s not uncommon for participants to solve only some of the problems, or to earn only partial credit even when they find the right approach. Until recently, large language models (like those behind ChatGPT) struggled with this kind of task.

But that has changed. An unreleased model from OpenAI earned 35 out of 42 points, placing it among roughly the top 10% of human contestants worldwide. Each of the six problems is worth seven points, so five complete solutions add up to 35. That’s equivalent to a gold medal performance, the highest achievement in the IMO. For AI, that’s a shift into new territory: sustained, multi-step, deductive reasoning at the highest level. In simple terms, the machine didn’t just learn math. It learned how to think about math.

Alexander Wei, a research scientist at OpenAI working on LLMs and reasoning, explained on X how the evaluation was run.

“We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.”

“In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold!”

This Was a General Model, Not a Math Model

It gets even more impressive. This was a general-purpose large language model. According to Wei, it wasn’t built specifically to solve Olympiad problems. It was trained more broadly, and then its capacity to reason carefully was scaled up by letting it spend more computation while working through a problem.

In 2021, Wei predicted that by 2025, AI might reach 30% accuracy on a math benchmark far easier than the IMO. That was considered bold at the time. It’s a reminder of how fast this field is moving: from playing chess to mastering Go, and now to cracking the world’s toughest math tests.

This is a big step toward machines that can make scientific discoveries, generate legal arguments, debug complex code, or explain physics to a child. They would do that not because they memorized the answers, but because they understand the rules well enough to derive new ones. If this trend continues, it may not be long before AIs start making stunning discoveries on their own and potentially overhauling scientific research.

That’s powerful. And also… a little unsettling.

Even AI skeptics are taking note. Gary Marcus, a longtime critic of AI hype, called the performance “genuinely impressive,” while urging caution around questions of training, cost, and generalizability.

Despite the buzz, OpenAI isn’t releasing this model any time soon. GPT-5, the company’s next flagship model, is expected shortly, but it won’t be the Olympiad champ. It’s unclear when, or whether, this model will be released to the public at all.