How an empty crisp packet could be used to listen in on a conversation

Researchers from MIT, Microsoft and Adobe have developed an algorithm that can detect vibrations in objects and convert them into sound.

RESEARCHERS AT MIT, Microsoft and Adobe have created a way to reconstruct sound from a video of an object, turning it into a makeshift microphone.

Instead of using specialist equipment such as laser microphones, which measure minute vibrations in reflective surfaces, the team developed a specialised algorithm to analyse the tiny vibrations of everyday objects filmed with a high-speed camera.

In one experiment, the researchers were able to recover intelligible speech from the vibrations of a crisp packet photographed 15 feet away through soundproof glass.

In other experiments, they extracted audio signals from videos of aluminium foil, the surface of a glass of water, and even the leaves of a potted plant, recovering a recording of the nursery rhyme ‘Mary had a Little Lamb’.

According to the research team, sound hitting an object causes it to vibrate very subtly. This motion is usually invisible to the naked eye, but by analysing high-speed video, the algorithm is able to pick out the vibrations and convert them into audio.
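The idea in the paragraph above can be illustrated with a toy sketch. This is not the researchers' algorithm, which the article does not spell out; it swaps in a much cruder technique (average brightness per frame) and runs on a made-up one-row "video" of a soft edge that shifts by a hundredth of a pixel. The function names, the edge model and every parameter are assumptions chosen purely for illustration.

```python
import math

def synthetic_frames(tone_hz, fps, n_frames, width=32, amplitude=0.01):
    """Hypothetical stand-in for high-speed video: each frame is one row of
    pixels showing a soft edge that a tone nudges by a fraction of a pixel."""
    centre = width / 2
    frames = []
    for i in range(n_frames):
        shift = amplitude * math.sin(2 * math.pi * tone_hz * i / fps)
        # Brightness ramps from 0 to 1 across four pixels around the edge;
        # moving the edge changes how bright the ramp pixels are.
        row = [min(1.0, max(0.0, 0.5 + (x - centre + shift) / 4))
               for x in range(width)]
        frames.append(row)
    return frames

def recover_signal(frames):
    """Crude recovery: mean brightness of each frame, constant offset removed."""
    means = [sum(row) / len(row) for row in frames]
    baseline = sum(means) / len(means)
    return [m - baseline for m in means]

# A 440 Hz tone "filmed" at 2,000 frames per second for a tenth of a second.
frames = synthetic_frames(tone_hz=440, fps=2000, n_frames=200)
recovered = recover_signal(frames)
```

In this idealised setting the recovered values rise and fall in step with the tone, even though no single frame looks different to the eye. Real footage adds noise, lighting changes and camera shake, which is why a far more careful analysis is needed in practice.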

The researchers go into greater detail about how the algorithm works:

Reconstructing audio from video requires that the frequency of the video samples (the number of frames of video captured per second) be higher than the frequency of the audio signal. In some of their experiments, the researchers used a high-speed camera that captured 2,000 to 6,000 frames per second. That’s much faster than the 60 frames per second possible with some smartphones, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second.
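To put numbers on the frame rates quoted above, here is a minimal sketch based on the standard Nyquist sampling rule, which is stricter than "higher than": under plain uniform sampling, only frequencies below half the sampling rate can be captured faithfully. The function name is hypothetical, and the sketch assumes each frame counts as exactly one evenly spaced sample.

```python
def max_recoverable_hz(frames_per_second):
    """Nyquist limit: the highest frequency that uniform sampling at this
    rate can represent is half the sampling rate."""
    return frames_per_second / 2

# Frame rates mentioned in the article.
for fps in (60, 2000, 6000, 100000):
    print(f"{fps:>7} fps -> up to {max_recoverable_hz(fps):g} Hz")
```

By this rule, 2,000 frames per second covers tones up to 1,000 Hz and 6,000 frames per second covers up to 3,000 Hz, while 60 frames per second reaches only 30 Hz.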

(If that explanation has gone over your head, this video will help clear things up).

Abe Davis's Research / YouTube

In one experiment, the team was able to extract sound from video shot with an ordinary digital camera at the standard 60 frames per second.

While the audio reconstruction wasn’t as accurate as in the previous experiments, it was good enough for researchers to identify the gender of a speaker and the number of speakers in a room, and even to capture enough of the sound of a speaker’s voice to help identify them.

While the researchers see obvious applications for the technique in areas like law enforcement and forensics, they are more excited about developing a new kind of imaging, one that aims to determine the material and structural properties of objects from their visible response to short bursts of sound.

The researchers will present their findings at the computer graphics conference, Siggraph, which is taking place next week.
