

Google's MuZero chess AI reached superhuman performance without even knowing the rules

This gives it a surprisingly human-like intuition.

Mihai Andrei
October 8, 2021 @ 10:06 pm


Artificial Intelligence is becoming more and more intelligent — and more and more human-like.

Image credits: DeepMind

A lot has changed in modern chess, but the most important change is the hegemony of computers. Take Magnus Carlsen, the uncontested world chess champion of the past decade: he can’t really claim to be the best chess player, only the best human player.

Chess algorithms have long surpassed human ability at the game, for a very simple reason: they can memorize and calculate far better than we can. But when AIs started entering the scene, chess algorithms were also in for a revolution.

Traditionally, chess algorithms were built in a very straightforward way: they were taught the rules of the game, fed a huge database of games, taught how to calculate, and off they went. Google’s AlphaZero takes a very different approach. It has become, arguably, the best chess-playing entity in the world without studying a single human game. Instead, it was only taught the rules of the game and allowed to play against itself over and over.

Intriguingly, this not only enabled it to achieve remarkable prowess, but also to develop a style of its own. Unlike traditional algorithms, which play very concrete, grinding games, AlphaZero tends to play in a very conceptual and creative way (though the word ‘creative’ will surely annoy some readers). For instance, AlphaZero would often sacrifice a piece with no immediate reward in sight, without necessarily calculating all the outcomes. Instead of playing moves that it can fully calculate to be better, which is what most algorithms do, AlphaZero plays moves that seem better.

It’s a surprisingly human way to approach the game, although many of AlphaZero’s moves seem distinctly inhuman.
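To get a feel for what "only taught the rules and allowed to play against itself" means, here is a minimal sketch on a much simpler game. It is not AlphaZero's method: AlphaZero pairs a neural network with tree search, while this toy uses a plain lookup table and the game of Nim (take 1 to 3 stones, whoever takes the last stone wins). The function names and settings (`self_play_train`, `epsilon`, `lr`) are assumptions made up for this illustration.

```python
import random

def legal_moves(stones):
    """Rules of the toy game: take 1, 2, or 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def self_play_train(start=10, episodes=5000, epsilon=0.2, lr=0.1, seed=0):
    """Learn move values purely from games the program plays against itself."""
    rng = random.Random(seed)
    value = {}  # (stones, move) -> estimated result for the player making that move
    for _ in range(episodes):
        stones, history = start, []
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # occasionally explore a random move
            else:
                move = max(moves, key=lambda m: value.get((stones, m), 0.0))
            history.append((stones, move))
            stones -= move
        result = 1.0  # the last mover took the final stone and won
        for key in reversed(history):
            old = value.get(key, 0.0)
            value[key] = old + lr * (result - old)
            result = -result  # players alternate, so flip the sign each step back
    return value

def best_move(value, stones):
    """Pick the move the learned table currently rates highest."""
    return max(legal_moves(stones), key=lambda m: value.get((stones, m), 0.0))

if __name__ == "__main__":
    learned = self_play_train()
    print({stones: best_move(learned, stones) for stones in range(1, 11)})
```

No human games are involved anywhere: the table starts empty, and every number in it comes from the outcomes of the program's own games.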

Now, Google’s researchers have taken things to the next level with MuZero.

Unlike AlphaZero, MuZero wasn’t even told the rules of chess. It wasn’t allowed to make any illegal moves, but it was allowed to ponder them. This allows the algorithm to think in a more human way, considering threats and possibilities even when they might not be apparent or possible at a given time. For instance, the threat of losing an exposed piece might always be present in the back of a human player’s mind, even when that piece is not under attack at the moment.

Researchers say that this also allows MuZero to develop an internal intuition regarding the rules of the game.
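The idea of planning without being told the rules can be sketched roughly as follows. In the preprint, MuZero learns its own model of the game as neural networks and searches through that model rather than through the real rules. In this sketch, the hand-written functions `represent`, `dynamics`, and `value` merely stand in for such learned networks, and a small exhaustive lookahead replaces the tree search MuZero actually uses. All names and the toy "reach 5" task are assumptions for illustration only.

```python
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[float, ...]

def plan(state: State,
         actions: Sequence[int],
         dynamics: Callable[[State, int], Tuple[float, State]],
         value: Callable[[State], float],
         depth: int,
         discount: float = 1.0) -> Tuple[float, Optional[int]]:
    """Search only through the model: return (best estimated return, best first action)."""
    if depth == 0:
        return value(state), None  # fall back on the model's value estimate
    best_return, best_action = float("-inf"), None
    for action in actions:
        reward, next_state = dynamics(state, action)  # imagined step, no real rules consulted
        future, _ = plan(next_state, actions, dynamics, value, depth - 1, discount)
        total = reward + discount * future
        if total > best_return:
            best_return, best_action = total, action
    return best_return, best_action

# Hand-written stand-ins for what would be learned networks (hypothetical toy task:
# add 1 or 2 to a counter, with a reward for landing exactly on 5).
ACTIONS = (1, 2)

def represent(observation: int) -> State:
    """Turn a raw observation into the model's internal state."""
    return (float(observation),)

def dynamics(state: State, action: int) -> Tuple[float, State]:
    """Predict the reward and the next internal state for an imagined action."""
    next_value = state[0] + action
    reward = 1.0 if next_value == 5.0 else 0.0
    return reward, (next_value,)

def value(state: State) -> float:
    """Estimate how promising a state is; this stub is deliberately uninformed."""
    return 0.0

if __name__ == "__main__":
    print(plan(represent(3), ACTIONS, dynamics, value, depth=1))
```

The point of the design is that `plan` never calls anything that knows the true rules. If the three functions were learned from experience, the same search would run unchanged.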

The Elo evaluation of MuZero throughout training in chess, shogi, Go, and Atari. Image Credit: DeepMind

This led to remarkably good performances. Although the details that researchers presented are sparse, they claim that MuZero achieved the same performance as AlphaZero. But it gets even better.

Researchers didn’t only train the engine in chess; they also trained it in Go, shogi, and 57 Atari games commonly used in this sort of study.

The most impressive results came from Go, a game that is unfathomably more complex than chess. MuZero slightly exceeded the performance of AlphaZero despite using less overall computation, which seems to indicate that MuZero has a deeper understanding of the game and the positions it was playing. Similar performances were reported in the Atari games, where MuZero outperformed state-of-the-art engines in 42 out of 57 games.

Of course, there is much more to this than just chess, Go, or Pac-Man. The lessons learned here can be applied to artificial intelligence in very practical settings.

“Many of the breakthroughs in artificial intelligence have been based on either high-performance planning or model-free reinforcement learning methods,” wrote the researchers. “In this paper we have introduced a method that combines the benefits of both approaches. Our algorithm, MuZero, has both matched the superhuman performance of high-performance planning algorithms in their favored domains (logically complex board games such as chess and Go) and outperformed state-of-the-art model-free [reinforcement learning] algorithms in their favored domains (visually complex Atari games).”

The study can be read in a preprint on arXiv.
