ZME Science

The future of AI voice is here: new AI has emotionally intelligent synthetic speech

This AI knows how to sound like you or anyone else.

by Rupendra Brahambhatt
January 20, 2023
in Future, Inventions, News, Tech, Technology

Last year’s AI releases suggest that AI is not coming only for low-skill jobs. If you are an artist, and especially a voice artist, you have reason to be worried. A recently published research paper from Microsoft describes VALL-E, an AI model that can reproduce a person’s voice from just a three-second voice sample.

A little toy robot (not VALL-E). Image credits: Rock’n Roll Monkey/Unsplash

Previously, we reported that Chinese company Tencent Music has been using AI voices to release songs in the voices of real artists. Tencent claims it mostly uses its AI engine to produce songs in the voices of legendary singers who have died, but it’s quite possible the engine will eventually become an alternative to human singers. After all, few record labels would want to spend millions of dollars on human singers if software could do the same job at a fraction of the cost.

Apart from being a major software company, Microsoft is also one of the world’s leading gaming companies, and it is in the process of acquiring Activision Blizzard for over $68 billion. If the deal goes through, it will be the biggest video game acquisition ever. You might be wondering what connects Tencent Music’s AI engine, Microsoft’s gaming business, and VALL-E.

VALL-E will raise AI’s voice

Microsoft’s revenue from gaming stood at $16.23 billion in 2022. The company has published some of the biggest game franchises, including Gears of War and Halo, and it presumably spends a lot of money on the artists who voice the characters in these games.

Unlike Tencent, it doesn’t have to hire singers, but it does hire a lot of voice artists. There is no official data on how much Microsoft spends on voice actors, but the figure is likely large given the company’s gaming revenue. Although this is only speculation, it seems possible that, like Tencent, Microsoft plans to use AI to voice its games in the future.

There could be various other reasons why Microsoft is working on VALL-E. To understand those, let’s first look at what VALL-E is.

VALL-E is a neural codec language model that can mimic a human voice and the emotional tone that accompanies it. It’s not an ordinary voice synthesis program: along with the voice, it also captures the specific style in which a person speaks, and all it needs to do that is a three-second voice sample of the speaker.


For example, imagine you have a friend, Carlos, who always sounds angry when he speaks. You are an animator who creates short animated films, and you need Carlos to voice a character in one of them. Unfortunately, Carlos also happens to be the friend who drinks a lot and makes a scene wherever he goes.

You want Carlos’ voice, but you can’t take him to the studio. With access to an AI model like VALL-E, you could voice your character from just a three-second sample of Carlos’ voice (which you could record even in a car), and Carlos would never need to come to the studio.

Imagine what a company like Microsoft could do with VALL-E. The team at Microsoft suggests that once fully developed, VALL-E could be adopted for voice-editing and premium-quality text-to-speech applications. In addition to imitating the voice and emotional tone, this neural codec model can also simulate the acoustic environment in its output. 

If the input voice sample was taken from a tape recorder, the output sample from VALL-E will have the ambiance of a tape recorder. The authors of the VALL-E research paper wrote:

“VALL-E significantly outperforms the state-of-the-art zero-shot TTS (text-to-speech) system in terms of speech naturalness and speaker similarity. In addition, we find VALL-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”

Microsoft’s VALL-E can disrupt everything

A report from Ars Technica mentions that VALL-E builds on EnCodec, a deep-learning-based audio codec model that Meta released in 2022. EnCodec breaks a voice sample down into a sequence of discrete codes (compact tokens that represent the compressed audio). VALL-E is trained to generate such codes from text and a short voice prompt, and the codes are then decoded back into a waveform.
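To make the idea of turning audio into discrete pieces more concrete, here is a toy sketch in Python. It is not EnCodec or VALL-E: the frame size, the hand-written codebook, and the sample waveform are all hypothetical, and real neural codecs learn their codebooks from data. The sketch only illustrates the general principle of mapping short chunks of a waveform to code numbers and back.

```python
# Toy vector quantizer: maps short audio frames to discrete code indices.
# All values below are illustrative assumptions, not taken from EnCodec.

FRAME_SIZE = 4  # samples per frame (real codecs work on much longer frames)

# Hypothetical codebook: each entry is a prototype frame.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],      # code 0: silence
    [0.5, 0.5, 0.5, 0.5],      # code 1: steady positive level
    [-0.5, -0.5, -0.5, -0.5],  # code 2: steady negative level
    [0.0, 0.5, 0.0, -0.5],     # code 3: one oscillation
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(samples):
    """Split samples into frames and return the nearest code index for each."""
    codes = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        nearest = min(
            range(len(CODEBOOK)),
            key=lambda i: squared_distance(frame, CODEBOOK[i]),
        )
        codes.append(nearest)
    return codes


def decode(codes):
    """Rebuild an approximate waveform by concatenating prototype frames."""
    samples = []
    for code in codes:
        samples.extend(CODEBOOK[code])
    return samples


# Hypothetical 12-sample waveform: near-silence, a steady tone, one wiggle.
waveform = [0.02, -0.01, 0.0, 0.01,
            0.45, 0.55, 0.5, 0.48,
            0.05, 0.52, -0.03, -0.47]

codes = encode(waveform)
reconstructed = decode(codes)
print(codes)  # [0, 1, 3]
```

The twelve samples shrink to three small integers, and decoding them gives back an approximation of the original signal. A model that works on such codes can treat speech much like a language model treats words.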

Moreover, VALL-E has been trained on Libri-light, an open-source audio library curated by Meta. It contains 60,000 hours of English speech from over 7,000 speakers, drawn from LibriVox audiobook recordings. For now, Microsoft’s AI can only mimic a voice well if it closely matches the audio on which the model was trained.
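For a rough sense of scale, the short Python calculation below compares the training data with a single three-second prompt. It uses only the rounded figures quoted above, so the results are approximate; since the library has "over 7,000" speakers, the per-speaker average is best read as an upper estimate.

```python
# Back-of-the-envelope scale check using the rounded figures from the article.
TRAINING_HOURS = 60_000   # hours of audio in Libri-light
SPEAKERS = 7_000          # "over 7,000", so the true average is a bit lower
PROMPT_SECONDS = 3        # length of the voice sample VALL-E needs

hours_per_speaker = TRAINING_HOURS / SPEAKERS
training_seconds = TRAINING_HOURS * 3600
prompts_equivalent = training_seconds // PROMPT_SECONDS

print(f"Average audio per speaker: about {hours_per_speaker:.1f} hours")
print(f"Training audio equals {prompts_equivalent:,} three-second prompts")
```

In other words, the model hears roughly 72 million prompt-lengths of speech during training, which helps explain how it can generalize to a new speaker from only three seconds.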

You can read about VALL-E and listen to some of its audio samples on GitHub. However, unlike DALL-E mini and ChatGPT, the program is not yet available for public use because of the serious implications audio deepfakes might have. Some people would love to send each other messages in the voices of politicians and celebrities, but criminals and scammers could also use VALL-E to sow chaos.

Microsoft also obviously wouldn’t want its competitors to use its AI voice model for free. The company might even have its own plans to surprise the gaming industry by using VALL-E as a voice actor in its games.

In the future, Microsoft might use this technology to let gamers choose any voice they want for their character. Who knows? Maybe you’d be able to make a game character sound like you using VALL-E.

The time has also come for voice actors to consider copyrighting their voices because, with a program like VALL-E, they could be replaced at any point in the future. Whether you believe it or not, the AI revolution has begun.

The preprint paper is available on arXiv. 

Tags: AI, deep learning, machine learning

Rupendra Brahambhatt

Rupendra Brahambhatt is an experienced journalist and filmmaker covering culture, science, and entertainment news for the past five years. With a background in Zoology and Communication, he has been actively working with some of the most innovative media agencies in different parts of the globe.


© 2007-2025 ZME Science - Not exactly rocket science. All Rights Reserved.
