MIT makes an AI that can predict what the future looks and sounds like

Artificial intelligence is learning in seconds what took humans a lifetime to master.

by Tibi Puiu
June 21, 2016
in News, Technology

As humans, we’re pretty good at anticipating things, but that’s much harder for robots. If you walk down the street and see two people meeting in front of a café, you know they’ll shake hands or even hug a few seconds later, depending on how close their relationship is. We humans sense this sort of thing very easily, and now some robots can too.

Left: still frame given to the algorithm, which had to predict what happened next. Right: next frames from the video. Credit: MIT CSAIL

MIT used deep-learning algorithms, neural networks that teach computers to find patterns for themselves in an ocean of data, to build an artificial intelligence that can predict what action will occur next from nothing but a still frame. Each network was trained to classify the upcoming action as a hug, handshake, high-five, or kiss, and their outputs were then merged into a single prediction of what happens next. The demonstration video below is telling.
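
To make the setup a bit more concrete, here is a minimal, hypothetical sketch in PyTorch of an ensemble of frame classifiers whose outputs are merged by averaging. It is not MIT's code; the tiny network, the averaging step, and all layer sizes are assumptions made purely for illustration.

# Illustrative sketch only, not MIT CSAIL's model. Several independently
# trained classifiers each score a single still frame against the four
# actions, and their softmax outputs are averaged into one prediction.
import torch
import torch.nn as nn

ACTIONS = ["hug", "handshake", "high-five", "kiss"]

class FrameClassifier(nn.Module):
    """A small CNN that maps one RGB frame to scores over the four actions."""
    def __init__(self, num_actions=len(ACTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_actions)

    def forward(self, frame):                # frame: (batch, 3, H, W)
        x = self.features(frame).flatten(1)  # (batch, 32)
        return self.head(x)                  # unnormalised action scores

def predict_next_action(frame, classifiers):
    """Average the classifiers' softmax outputs and pick the likeliest action."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(c(frame), dim=1) for c in classifiers])
        merged = probs.mean(dim=0)           # merge the ensemble's predictions
    return ACTIONS[merged.argmax(dim=1).item()]

# Example: four untrained networks voting on one random 224x224 "frame".
ensemble = [FrameClassifier() for _ in range(4)]
frame = torch.rand(1, 3, 224, 224)
print(predict_next_action(frame, ensemble))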


To train the algorithm, the researchers fed it more than 600 hours of video footage from YouTube, along with complete runs of “The Office” and “Desperate Housewives.” When tested, the algorithm correctly predicted 43 percent of the time what action would happen in the next second.

In a second test, the algorithm was shown a frame from a video and asked to predict what object would appear five seconds later. For instance, a frame might show a person attempting to open a microwave. Inside might be some leftover pasta from last night, a coffee mug and so on.

The algorithm correctly predicted what object would appear 11 percent of the time. That might not sound impressive, but it is still 30 percent better than the baseline measurement, and some of its individual guesses beat those made by human participants. When the same test was given to people, the subjects were right 71 percent of the time.
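
As a rough sanity check, and assuming “30 percent better than the baseline” refers to a relative improvement in accuracy (the article does not spell this out), the baseline would have scored somewhere around 8.5 percent:

# Back-of-the-envelope arithmetic only; reading "30 percent better" as a
# relative improvement over the baseline's accuracy is an assumption.
algorithm_accuracy = 0.11
relative_improvement = 0.30
baseline_accuracy = algorithm_accuracy / (1 + relative_improvement)
print(f"Implied baseline accuracy: {baseline_accuracy:.1%}")  # about 8.5%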


“There’s a lot of subtlety to understanding and forecasting human interactions,” says Carl Vondrick, a PhD student at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “We hope to be able to work off of this example to be able to soon predict even more complex tasks.”

“I’m excited to see how much better the algorithms get if we can feed them a lifetime’s worth of videos,” Vondrick added. “We might see some significant improvements that would get us closer to using predictive-vision in real-world situations.”

For now, there’s no practical application in mind for this deep-learning algorithm, but the researchers say their present work might one day lead to better household robots that cooperate more smoothly with humans by overcoming some of the awkwardness inherent in human-robot interaction.

Credit: MIT CSAIL

From the same Computer Science and Artificial Intelligence Laboratory (CSAIL) came another predictive artificial intelligence, this time for sound: shown a video of an object being hit, it produces the sound the impact would make.

The generated sounds are realistic enough to fool human viewers. When asked whether a sound was real or fake, participants picked the algorithm’s fake sound over the real one twice as often as they did for a baseline algorithm.
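
As a rough illustration of what such a video-to-sound model can look like, here is a hypothetical PyTorch sketch: a small CNN summarises each video frame, an LSTM reads the frame sequence, and a linear layer predicts a vector of sound features per frame, which a separate synthesis step would turn into an audible waveform. The layer sizes and the 42-band sound representation are assumptions for illustration, not details taken from the CSAIL paper.

# Illustrative sketch only, not the CSAIL system. It regresses a per-frame
# sound-feature vector from a short silent video clip.
import torch
import torch.nn as nn

class SoundPredictor(nn.Module):
    def __init__(self, sound_bands=42):
        super().__init__()
        # Per-frame visual encoder: a tiny CNN that turns a frame into 32 numbers.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Temporal model over the frame sequence, plus a sound-feature head.
        self.temporal = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.to_sound = nn.Linear(64, sound_bands)

    def forward(self, video):                            # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.frame_encoder(video.flatten(0, 1))  # (batch * time, 32)
        hidden, _ = self.temporal(feats.view(b, t, -1))  # (batch, time, 64)
        return self.to_sound(hidden)                     # predicted sound features

# Example: predict sound features for 15 frames of a random "clip".
model = SoundPredictor()
clip = torch.rand(1, 15, 3, 112, 112)
print(model(clip).shape)  # torch.Size([1, 15, 42])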

“When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it,” says CSAIL PhD student Andrew Owens, who was lead author of the paper describing MIT’s CSAIL sound algorithm. “An algorithm that simulates such sounds can reveal key information about objects’ shapes and material types, as well as the force and motion of their interactions with the world.”

In other words, both algorithms are now learning in seconds what most humans have to learn in a lifetime: how people react, social norms, even what sound a drumstick makes when it hits a puddle of water.

Tags: artificial intelligence
Tibi Puiu

Tibi is a science journalist and co-founder of ZME Science. He writes mainly about emerging tech, physics, climate, and space. In his spare time, Tibi likes to make weird music on his computer and groom felines.
