ZME Science

The future of AI voice is here: new AI has emotionally intelligent synthetic speech

This AI knows how to sound like you or anyone else.

by Rupendra Brahambhatt
January 20, 2023
in Future, Inventions, News, Tech, Technology

The AI releases of the past year suggest that low-skill jobs are not the only ones AI could take over. If you are an artist, you should be worried, especially if you are a voice artist. A recently published research paper from Microsoft describes VALL-E, an AI model that can reproduce a speaker’s voice from a voice sample just three seconds long.

[Audio: a 3-second speaker prompt and the corresponding VALL-E synthesis]
A little toy robot (not VALL-E). Image credits: Rock’n Roll Monkey/Unsplash

We previously reported that the Chinese company Tencent Music has been using AI voices to release songs in the voices of real artists. Although Tencent says it mostly uses its AI engine to produce songs in the voices of legendary singers who have died, the engine could well become an alternative to human singers for the company in the future. After all, few record labels would want to spend millions of dollars on human singers if software could do the same job at almost no cost.

Besides being a major software company, Microsoft is also one of the world’s leading gaming companies, and it is in the process of acquiring Activision Blizzard for $68.7 billion. If the deal goes through, it will be the biggest video game acquisition ever. You might be wondering what connects Tencent Music’s AI engine, Microsoft’s gaming business, and VALL-E.

VALL-E will raise AI’s voice

Microsoft’s gaming revenue reached a whopping $16.23 billion in 2022 alone. The company owns some of the biggest game franchises, including Halo and Gears of War, and it likely spends a lot of money on the voice actors who voice the characters in these games.

Unlike Tencent, Microsoft doesn’t need to hire singers, but it does hire a lot of voice actors. There is no official data on how much Microsoft spends on voice actors, but given the company’s huge gaming revenue, the figure is probably substantial. It is only speculation, but it seems possible that, like Tencent, Microsoft plans to use AI to voice its games in the future.

There could be various other reasons why Microsoft is working on VALL-E. To understand them, let’s first look at what VALL-E is.

VALL-E is a neural codec language model that can mimic a human voice along with the emotional tone that accompanies it. It is not ordinary voice-synthesis software: besides the voice itself, it captures the particular style in which a speaker talks, and all it needs to do that is a three-second sample of the speaker’s voice.
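To put the three-second figure in perspective: according to the VALL-E paper, the model does not generate raw waveforms but discrete codes from Meta’s EnCodec codec, configured for 24 kHz audio with 75 codec frames per second and 8 codebooks of 1,024 entries each. Taking those settings as assumptions, the short sketch below works out how much data a three-second prompt actually hands the model. The constant and function names are illustrative and are not taken from any released VALL-E code.

```python
# Assumed EnCodec settings as reported in the VALL-E paper; illustrative constants only.
SAMPLE_RATE_HZ = 24_000   # input audio sample rate
HOP_SAMPLES = 320         # one codec frame per 320 input samples
N_CODEBOOKS = 8           # residual codebooks, i.e. codes per frame
CODEBOOK_SIZE = 1024      # entries per codebook

FRAME_RATE_HZ = SAMPLE_RATE_HZ // HOP_SAMPLES      # 75 frames per second
# log2 of the codebook size; exact only because 1024 is a power of two.
BITS_PER_CODE = CODEBOOK_SIZE.bit_length() - 1     # 10 bits


def prompt_frames(seconds: float) -> int:
    """Number of codec frames (time steps) in a prompt of the given length."""
    return round(seconds * FRAME_RATE_HZ)


def prompt_tokens(seconds: float) -> int:
    """Total discrete codes in the prompt: one per codebook per frame."""
    return prompt_frames(seconds) * N_CODEBOOKS


def bitrate_bps() -> int:
    """Bits per second needed to transmit every code of every frame."""
    return FRAME_RATE_HZ * N_CODEBOOKS * BITS_PER_CODE


if __name__ == "__main__":
    print(prompt_frames(3))   # 225
    print(prompt_tokens(3))   # 1800
    print(bitrate_bps())      # 6000
```

Under these assumptions, a three-second sample amounts to 225 time steps of 8 codes each, 1,800 codes in total, at a bitrate of 6 kbps.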

[Audio: a 3-second speaker prompt delivered in a ‘sleepy’ tone and the corresponding VALL-E synthesis]

For example, imagine you have a friend, Carlos, who always sounds angry when he speaks. You are an animator who makes short animated films, and you need Carlos to voice a character in one of them. Unfortunately, Carlos is also the friend who drinks too much and makes a scene wherever he goes.

You want Carlos’s voice, but you can’t bring him into the studio. With access to an AI model like VALL-E, you could voice your character from just a three-second sample of Carlos’s voice, one you could record even in a car, and Carlos would never have to come to the studio.

Imagine what a company like Microsoft could do with VALL-E. The Microsoft team suggests that, once fully developed, VALL-E could be used for speech editing and high-quality text-to-speech applications. Besides imitating a speaker’s voice and emotional tone, the model can also reproduce the acoustic environment of the input sample in its output.

For instance, if the input voice sample was recorded on a tape recorder, VALL-E’s output will carry the ambiance of a tape recorder too. The authors of the VALL-E research paper wrote:

“VALL-E significantly outperforms the state-of-the-art zero-shot TTS (text-to-speech) system in terms of speech naturalness and speaker similarity. In addition, we find VALL-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.”

Microsoft’s VALL-E can disrupt everything

A report from Ars Technica notes that VALL-E builds on EnCodec, a deep-learning-based audio codec (a program that compresses and decompresses data) that Meta released last year. EnCodec compresses a voice sample into a sequence of discrete audio tokens, compact codes from which the sound can be reconstructed. VALL-E is trained on these tokens: given a piece of text and a short voice sample, it generates new tokens, which EnCodec then decodes back into speech in that speaker’s voice.

A diagrammatic representation of the VALL-E AI model. Image credits: VALL-E, Microsoft/GitHub

VALL-E was trained on Libri-light, an open-source audio library curated by Meta. It contains 60,000 hours of English speech from more than 7,000 speakers, mostly drawn from public-domain audiobooks on LibriVox. For now, the model mimics a voice well only if that voice closely resembles the voices in its training data.

You can read more about VALL-E and listen to some of its audio samples on GitHub. However, unlike DALL-E mini and ChatGPT, the model is not yet available for public use because of the serious implications audio deepfakes could have. Some people would love to send each other messages in the voices of politicians and celebrities, but criminals and scammers could also use VALL-E to sow chaos.

Microsoft, for its part, obviously wouldn’t want its competitors to use its AI voice model for free. The company might even have secret plans to shake up the gaming industry by using VALL-E as a voice actor in its games.

In the future, Microsoft might use this technology to let gamers choose any voice they want for their characters. Who knows, maybe VALL-E will one day let you make a game character sound like you.

It may also be time for voice actors to think about legally protecting their voices, because a program like VALL-E could replace them at some point in the future. Whether you believe it or not, the AI revolution has begun.

The preprint paper is available on arXiv. 

Tags: AI, deep learning, machine learning

© 2007-2021 ZME Science - Not exactly rocket science. All Rights Reserved.
