AI for speech recognition is nearing a watershed moment

Computers will soon be able to understand what we say.

Mihai Andrei
November 21, 2022 @ 6:22 am

Artificial Intelligence (AI) is one of the most hyped technologies at the moment, if not the most hyped. While some of this hype is undoubtedly exaggerated (the name itself is something of a misnomer, since it’s not exactly intelligent), it’s already making quite a mark, and it feels like we’re only seeing the tip of the iceberg. But while the world has been buzzing about AI that creates images, another type of algorithm has been making quite a stir: speech recognition.

In the 1950s, three researchers from the legendary Bell Labs set out to work on speech recognition and, despite not having access to computers, made notable progress. Later, another AI pioneer, Raj Reddy, picked up the topic at Stanford University and developed the first system capable of recognizing continuous speech (until then, speakers had to leave small pauses between words for the system to work). Reddy saw in speech recognition (and automated translation) a way of making people’s lives better, especially for those in lower socioeconomic conditions. He saw this technology as something that can “move the plateau” and improve the lives of the people who need it most. “The technology we’ve created in the past ten years, with things like translation, have moved the [socioeconomic] plateau up by a significant amount,” Reddy noted in a recent panel at the Heidelberg Laureate Forum.

But up until a few years ago, despite all this progress, automatic transcriptions were still pretty bad. The problem is not an easy one by any measure: you have to recognize people’s speech, account for their accents and different ways of pronouncing words, compensate for pitch, and so on. At some point, though, AI transcription and captioning started improving dramatically, and new models now seem to come along every day.

For communicators such as ourselves, this has been a boon. It often happens that transcribing an interview can take longer than the actual interview, and having tools (often free or at relatively low prices) that can perform speech recognition automatically is of great help. But this goes far beyond just transcribing interviews.

AI can be used for speech recognition in a number of ways, ranging from transcription to translation. It can play a role in everything from teaching and healthcare to tourism — heck, even food companies are now using speech recognition fridges. The market is expected to grow over $45 billion over the next decade, and pretty much all the big companies want a piece of the pie.

Just a month ago, Google announced its own speech-to-speech AI translation model called Translation Hub, and not long after that, Meta claimed its own breakthrough by presenting an AI that can recognize and translate to and from Hokkien, a Taiwanese language that lacks a widely used standard written form. Then Nvidia also joined the race, and the fact that all of this happened within less than two months is telling of how fast the industry is growing.

For consumers, this is pretty good news. Many speech-to-text application programming interfaces (APIs) already boast 92% accuracy, which is fairly close to human performance. Recent strides in machine learning research, along with advances in computing power and better availability of training data, have also made AI speech recognition not just better, but also more affordable.
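
To make a figure like "92% accuracy" more concrete: vendors do not always say how they measure it, but a common convention (assumed here, not stated by any particular API) is to report one minus the word error rate (WER), where WER is the number of word substitutions, deletions and insertions needed to turn the machine transcript into a human reference transcript, divided by the number of words in the reference. The function name and sample sentences below are illustrative only, a minimal sketch rather than any provider's actual scoring method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of ten in the reference.
reference = "computers will soon be able to understand what we say"
transcript = "computers will soon be able to understand what he say"
wer = word_error_rate(reference, transcript)
print(f"WER: {wer:.0%}, accuracy (1 - WER): {1 - wer:.0%}")
```

Under this assumed convention, 92% accuracy would correspond to roughly eight word errors per hundred words of reference text. Real evaluations usually also normalize punctuation and numbers before comparing, which this sketch skips.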

Of course, this technology has also been accelerated by other AI capabilities. For instance, the ability of AIs to summarize (reducing audio transcripts to logical parts) and to identify different voices has both improved performance and expanded the range of applications for speech recognition.

But while AI speech recognition seems to be entering a new phase, it is not without its own shortcomings and problems.

One such shortcoming is equity. By far, the best-supported language for this type of application is English, and the reasons for that are twofold. First, you need manually labeled data to train the models, and this is easiest to obtain in English (where a lot of data is available). Second, that’s where the money is. Sure, there’s a market for speech recognition in Korean or Portuguese, but it is smaller than the English one.

There are also potential security risks in all of this. Voice-controlled devices are becoming increasingly common, and attackers are gaining new ways to get hold of your personal information through this type of speech recognition service. An attacker could, perhaps, confuse speech recognition systems and get them to perform unwanted actions, or access your private messages and documents by eavesdropping on what your device is saying.

Ultimately, AI speech recognition is a tool — and a pretty useful one at that. It’s got plenty of potential, but it’s up to us as a society to use it responsibly.
