

AI turns speech into portraits -- and it's eerily accurate sometimes

It's the racist uncle of computers.

Tibi Puiu
April 5, 2022 @ 8:16 pm


Some law enforcement and intelligence agencies across the world are experimenting with artificial intelligence to identify people using voice recordings and big data. But what if there’s no match? AI systems can still provide valuable input. For instance, researchers at MIT have demonstrated a novel machine-learning algorithm that can synthesize a person’s portrait based solely on a short audio recording of their voice.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) first trained their deep neural network by feeding it millions of videos scraped from YouTube showing people speaking in front of the camera. This huge swath of data trained the AI to correlate certain sound characteristics with facial features corresponding to age, gender, or ethnicity.

Apart from setting up the initial task of correlating voice with facial features, the researchers did not intervene in this process. The AI learned all of this by itself with no external supervision, not even the labeling of subsets of data.
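To make the "no labels" idea concrete, here is a minimal sketch of learning from paired data alone. It is a toy linear model on synthetic numbers, not the deep network the MIT team built, and every name in it (`make_clip`, `train`, the feature sizes, the hidden `trait`) is an assumption made up for illustration. The only training signal is that a voice and a face appear in the same clip.

```python
import random


def make_clip(rng):
    """Return one (voice_features, face_features) pair from a synthetic clip.

    A hidden speaker trait drives both signals, which is the only reason
    they are correlated. The trait is never shown to the model, so no
    human-provided label is involved.
    """
    trait = rng.uniform(-1.0, 1.0)
    voice = [trait + rng.gauss(0, 0.05), -0.5 * trait + rng.gauss(0, 0.05)]
    face = [2.0 * trait + rng.gauss(0, 0.05)]
    return voice, face


def predict(weights, bias, voice):
    """Predict face features from voice features with a linear map."""
    return [sum(w * v for w, v in zip(row, voice)) + b
            for row, b in zip(weights, bias)]


def mean_squared_error(weights, bias, clips):
    """Average squared gap between predicted and actual face features."""
    total = 0.0
    for voice, face in clips:
        pred = predict(weights, bias, voice)
        total += sum((p - f) ** 2 for p, f in zip(pred, face))
    return total / len(clips)


def train(clips, epochs=200, lr=0.1):
    """Fit the linear map by stochastic gradient descent on paired clips."""
    n_in, n_out = len(clips[0][0]), len(clips[0][1])
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        for voice, face in clips:
            pred = predict(weights, bias, voice)
            for j in range(n_out):
                err = pred[j] - face[j]
                for i in range(n_in):
                    weights[j][i] -= lr * err * voice[i]
                bias[j] -= lr * err
    return weights, bias


rng = random.Random(0)
train_clips = [make_clip(rng) for _ in range(200)]
test_clips = [make_clip(rng) for _ in range(50)]

# Error of an all-zero model versus the trained one, on unseen clips.
untrained_error = mean_squared_error([[0.0, 0.0]], [0.0], test_clips)
weights, bias = train(train_clips)
trained_error = mean_squared_error(weights, bias, test_clips)
print(f"untrained: {untrained_error:.4f}  trained: {trained_error:.4f}")
```

The trained model should predict the face features of unseen clips far better than the untrained one, even though nobody ever told it what the hidden trait was. The real system works with far richer inputs and a much larger model.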

To test the system, the researchers designed a face decoder that reconstructs a speaker’s face from a still frame, regardless of its lighting or pose. This digital reconstruction was then compared to portraits synthesized solely from the speaker’s voice, with striking results.

The synthesized faces are generic: unlike the images produced by the decoder, they do not depict specific individuals. Nevertheless, they still manage to capture a speaker’s basic facial features, such as skin color, gender, and age. The longer the voice recording, the more accurate the synthesized portrait proved to be.

But there were also plenty of mismatches. High-pitched voices were often identified as female, even when they came from males, such as young boys. Asian men who spoke American English had portraits resembling white males, but this did not happen when an Asian speaker spoke Chinese.

The AI may remind some readers of their racist uncle. The researchers are aware of these biases and are looking to overcome them. Improving the system’s accuracy is a matter of providing more training data that is representative of the general population.

Until these limitations are addressed, real-world applications of this AI system should be treated with care. One possible use could be to make interactions between humans and machines more appealing. Machine-generated voices used by home devices and virtual assistants could now be given an appropriate face. Law enforcement could use this neural network to generate a portrait of a suspect when the only evidence is a voice recording. However, any government use is sure to be met with criticism surrounding privacy and ethics.

“We believe that generating faces, as opposed to predicting specific attributes, may provide a more comprehensive view of voice-face correlations and can open up new research opportunities and applications,” the MIT authors wrote in the description of their project, called Speech2Face.

[via PetaPixel]
