

MIT makes an AI that can predict what the future looks and sounds like

Artificial intelligence is learning in seconds what took humans a lifetime to master.

Tibi Puiu
June 21, 2016 @ 6:48 pm


As humans, we’re pretty good at anticipating things, but that’s much harder for robots. If you walk down the street and see two people meeting in front of a café, you know they’ll shake hands or even hug a few seconds later, depending on how close their relationship is. We humans sense this sort of thing very easily, and now some robots can too.

Left: still frame given to the algorithm, which had to predict what happened next. Right: next frames from the video. Credit: MIT CSAIL

MIT used deep-learning algorithms — neural networks that teach computers to find patterns on their own in an ocean of data — to build an artificial intelligence that can predict what action will occur next, starting from nothing but a still frame. Each of these networks was trained to classify an upcoming action as either a hug, handshake, high-five, or kiss, and their outputs are then merged to predict what happens next. The demonstration video below is telling.
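The researchers' exact architecture isn't spelled out here, but the basic idea of merging several action classifiers can be sketched in a few lines. The probabilities below are invented for illustration; this is not MIT's model, just a toy example of averaging per-network outputs and picking the most likely action:

```python
# Minimal sketch (not MIT's code): merging the outputs of several action
# classifiers into a single prediction over the four labels in the article.
import numpy as np

ACTIONS = ["hug", "handshake", "high-five", "kiss"]

# Hypothetical softmax outputs from three independently trained networks,
# each given the same still frame (values made up for illustration).
network_outputs = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.15, 0.55, 0.25, 0.05],
    [0.20, 0.50, 0.20, 0.10],
])

# Merge by averaging the per-network probabilities, then take the
# most likely action as the prediction for what happens next.
merged = network_outputs.mean(axis=0)
predicted_action = ACTIONS[int(np.argmax(merged))]
print(predicted_action)  # -> "handshake"
```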

To train the algorithm, the researchers fed it more than 600 hours of video footage from YouTube, along with complete runs of “The Office” and “Desperate Housewives.” When tested, it correctly predicted which action would happen in the next second 43 percent of the time.

In a second test, the algorithm was shown a frame from a video and asked to predict what object would appear five seconds later. For instance, a frame might show a person reaching to open a microwave; inside might be some leftover pasta from last night, a coffee mug, and so on.

The algorithm correctly predicted what object appeared 11 percent of the time. That might not sound impressive, but it is still 30 percent better than the baseline measurement, and some of its guesses were better than those made by some people. Humans given the same test were right only 71 percent of the time.
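Read as a relative improvement, the “30 percent better than the baseline” figure implies a baseline accuracy a little below the algorithm's 11 percent. A rough back-of-the-envelope check, assuming the numbers are top-1 accuracy and the improvement is relative (the paper's exact evaluation protocol may differ):

```python
# Back-of-the-envelope check of the reported figures; assumptions only,
# not the paper's actual evaluation protocol.
algorithm_accuracy = 0.11   # object predicted correctly 11% of the time
human_accuracy = 0.71       # human subjects on the same test

# If the algorithm is "30 percent better than the baseline" in relative
# terms, the implied baseline accuracy would be:
baseline_accuracy = algorithm_accuracy / 1.30
print(f"implied baseline accuracy ~ {baseline_accuracy:.3f}")  # ~0.085
```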

“There’s a lot of subtlety to understanding and forecasting human interactions,” says Carl Vondrick, a PhD student at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “We hope to be able to work off of this example to be able to soon predict even more complex tasks.”

“I’m excited to see how much better the algorithms get if we can feed them a lifetime’s worth of videos,” Vondrick added. “We might see some significant improvements that would get us closer to using predictive-vision in real-world situations.”

For now, there’s no practical application in mind for this deep-learning algorithm, but the researchers say their work might one day lead to household robots that cooperate with humans more smoothly, shedding some of the inherent awkwardness.

Credit: MIT CSAIL

From the same lab came another predictive artificial intelligence — this time for sound: shown a video of an object being hit, it produces the sound the impact would make.

The generated sounds are realistic enough to fool a human viewer. When asked whether a sound was real or fake, human participants picked the algorithm's fake sound over the real one twice as often as they picked a baseline algorithm's.
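One simple way such a system could turn features predicted from video into an audible result is to retrieve the closest-matching clip from a library of recorded impacts. The sketch below is purely illustrative: the feature vectors and sound library are made up, and MIT's actual pipeline is far more sophisticated:

```python
# Toy sketch: pick the recorded impact sound whose features best match a
# feature vector predicted from video. The features and the library are
# invented for illustration; this is not MIT's pipeline.
import numpy as np

sound_library = {
    "wooden drumstick on metal": np.array([0.9, 0.1, 0.3]),
    "drumstick on a puddle":     np.array([0.2, 0.8, 0.6]),
    "finger on a wine glass":    np.array([0.1, 0.2, 0.9]),
}

predicted_features = np.array([0.25, 0.75, 0.55])  # hypothetical model output

# Nearest neighbour by Euclidean distance between feature vectors.
best_match = min(
    sound_library,
    key=lambda name: np.linalg.norm(sound_library[name] - predicted_features),
)
print(best_match)  # -> "drumstick on a puddle"
```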

“When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,” says CSAIL PhD student Andrew Owens, who was lead author of the paper describing MIT’s CSAIL sound algorithm. “An algorithm that simulates such sounds can reveal key information about objects’ shapes and material types, as well as the force and motion of their interactions with the world.”

In other words, both algorithms are now learning in seconds what most humans take a lifetime to learn: how people react, social norms, even what sound a drumstick makes when it hits a puddle of water.
