Making walls talk - new technique extracts audio from video

Researchers have demonstrated a simple yet effective optical technique that turns video of a vibrating object, such as a piece of paper, into audio. To do this, they exploited a basic principle: sound waves cause the objects in their path to vibrate. If you reverse engineer those vibrations, you can effectively decode the sound source and play it back. In principle, the technique could be used to extract audio from silent video, such as remote surveillance footage, giving the walls ears. Of course, the video needs to be shot at high speed, since the technique can’t work without many frames per second at its disposal. The demonstrations are also far from conclusive, but considering it’s a first version, I found it rather impressive.

Turning video into sound

Graffiti artists can make walls come to life and speak to people’s hearts through art. Researchers have now given new meaning to the phrase “walls can speak”. Image: Theater Fever

Sound is nothing but vibration. When we hear, what we’re actually sensing is air displaced in a characteristic pattern by a mechanical pressure wave, which eventually reaches specialized receptor cells in the inner ear. These cells convert the vibration into nerve signals that are relayed to the brain, where they are decoded. Because sound is a pressure wave, it can also make objects in its path vibrate. These vibrations are usually so tiny that we hardly notice them unless the speakers are really powerful, although even with a home stereo you might notice the effect on objects that are small enough. Despite their small amplitudes, the vibrations can be detected and analyzed algorithmically, and the audio can be reconstructed from those calculations.
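Because the camera measures the vibration only once per frame, the recovered audio effectively has a sample rate equal to the video frame rate. By the Nyquist-Shannon sampling theorem, frequencies above half that rate cannot be recovered faithfully, which is why high-speed video is needed. The Python sketch below illustrates this under simplifying assumptions: NumPy is available, the object vibrates as a pure sinusoid, and its displacement is measured perfectly in every frame. The function name `dominant_frequency` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def dominant_frequency(signal_hz, frame_rate, duration=1.0, amplitude=1e-3):
    """Simulate filming a sinusoidal vibration and return the strongest
    frequency (in Hz) present in the frame-by-frame displacement signal.

    Assumes an idealized, noise-free displacement measurement per frame.
    """
    n_frames = int(round(frame_rate * duration))
    t = np.arange(n_frames) / frame_rate  # capture time of each frame
    displacement = amplitude * np.sin(2 * np.pi * signal_hz * t)
    spectrum = np.abs(np.fft.rfft(displacement))  # magnitude spectrum
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    return freqs[np.argmax(spectrum)]


if __name__ == "__main__":
    # A 440 Hz tone filmed well above the Nyquist rate (2 x 440 = 880 fps).
    print(round(dominant_frequency(440, 2000)))  # 440
    # The same tone filmed at 600 fps, whose Nyquist limit is only 300 Hz.
    print(round(dominant_frequency(440, 600)))   # 160
```

At 2,000 frames per second the 440 Hz tone is recovered correctly, while at 600 frames per second it aliases to a spurious 160 Hz tone. Intelligible speech carries important content up to a few kilohertz, so ordinary 30 or 60 fps video falls far short.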

The researchers, from the Catholic University of America, used a thin sheet of paper for their tests. A matrix of points was overlaid on the image of the paper so that the sound-induced vibrations recorded by a high-speed camera could be mapped. The Gauss-Newton algorithm and a few other processing steps were applied to the images, and a simple model then reconstructed the original audio from the measured vibrations.
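The article does not detail the paper’s processing pipeline beyond its use of the Gauss-Newton algorithm, so the following is only an illustrative sketch of how Gauss-Newton can track sub-pixel motion, not the authors’ method. It assumes, hypothetically, that each tracked point appears as a Gaussian-shaped bright dot along one row of pixels with a known width and brightness. Gauss-Newton fits the dot’s center in every frame, and the frame-by-frame motion of that center is treated as the raw audio signal. The names `marker_profile`, `locate_marker` and `displacement_signal`, and the constants `SIGMA` and `AMPLITUDE`, are hypothetical.

```python
import numpy as np

SIGMA = 2.0      # hypothetical marker width in pixels
AMPLITUDE = 1.0  # hypothetical peak brightness of the marker


def marker_profile(x, center):
    """Brightness along one pixel row through a Gaussian-shaped marker dot."""
    return AMPLITUDE * np.exp(-((x - center) ** 2) / (2 * SIGMA ** 2))


def locate_marker(pixels, initial_guess, iterations=20, tol=1e-10):
    """Fit the marker center (to sub-pixel precision) with Gauss-Newton.

    Minimizes the sum of squared differences between the model profile
    and the observed pixel intensities, with the center as the only unknown.
    """
    x = np.arange(pixels.size, dtype=float)
    center = float(initial_guess)
    for _ in range(iterations):
        model = marker_profile(x, center)
        residual = model - pixels
        jacobian = model * (x - center) / SIGMA ** 2  # d(model)/d(center)
        # Gauss-Newton step for a single parameter: -(J^T r) / (J^T J)
        step = -np.dot(jacobian, residual) / np.dot(jacobian, jacobian)
        center += step
        if abs(step) < tol:
            break
    return center


def displacement_signal(frames, initial_guess):
    """Track the marker through all frames and return its zero-mean motion,
    i.e. the raw (unfiltered) audio signal sampled at the frame rate."""
    centers = []
    guess = initial_guess
    for frame in frames:
        guess = locate_marker(frame, guess)  # warm-start from previous frame
        centers.append(guess)
    centers = np.array(centers)
    return centers - centers.mean()
```

Starting each frame from the previous frame’s estimate keeps the initial guess close to the true position, which Gauss-Newton needs in order to converge. In real footage, sensor noise, lighting changes and motion blur make the estimates far less clean than in a noise-free simulation.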

If you have a video of two people talking, you can sometimes tell what their conversation is about by lipreading. If you can’t see their faces, though, that doesn’t work. This technique might thus be useful for discerning conversations from the vibrations of objects around the speakers.

“One of the intriguing aspects of the paper is the ability to recover spoken words from a video of objects in the room,” said journal Associate Editor Reiner Eschbach, a Research Fellow at Xerox Corp. “The paper shows that the sound creates minute vibrations in objects and that these vibrations ― given the right equipment ― can be picked up from a video signal. This is an interesting foray into a new application space and will, in my view, trigger interesting research in the field.”

Of course, the technique needs further refinement before anyone can eavesdrop through vibrating bricks. The paper was published in the journal Optical Engineering.
