It’s no secret that the science of speech recognition has come a long way since IBM introduced its first speech recognition machine in 1962. As the technology has evolved, speech recognition has become increasingly embedded in our everyday lives through voice-driven applications like Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, and the many voice-responsive features of Google. Whether it is a phone, a computer, a watch, or even a refrigerator, each new voice-interactive device that we bring into our lives deepens our dependence on artificial intelligence (AI) and machine learning.
Artificial Intelligence and Machine Learning
The term “artificial intelligence”, coined in 1956 by John McCarthy, can be defined as “human intelligence exhibited by machines”. Where it was first used to analyze and quickly compute data, artificial intelligence now allows computers to perform tasks that were once possible only for humans.
Machine learning, a subset of artificial intelligence, refers to systems that can learn by themselves. It involves teaching a computer to recognize patterns, rather than programming it with specific rules. The training process involves feeding large amounts of data to the algorithm and allowing it to learn from that data and identify patterns. In the early days, programmers had to write code for every object they wanted to recognize (e.g. human vs. dog); now one system can learn to recognize both when it is shown many examples of each. As a result, these systems can keep improving as they are given more data, with little human intervention.
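To make the "examples instead of rules" idea concrete, here is a minimal sketch in Python. It is not how any production system works: the two features (height and weight), the sample data, and the function names `train` and `predict` are all assumptions chosen for illustration, and the technique shown is a simple nearest-centroid classifier rather than a neural network.

```python
from math import dist
from statistics import mean


def train(examples):
    """Learn one centroid (average feature vector) per label from labeled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    # Average each feature column separately to get the centroid for a label.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training data: (height in cm, weight in kg) -> label.
examples = [
    ((170, 70), "human"),
    ((160, 55), "human"),
    ((182, 85), "human"),
    ((60, 25), "dog"),
    ((45, 12), "dog"),
    ((70, 35), "dog"),
]

model = train(examples)
print(predict(model, (175, 78)))  # human
print(predict(model, (50, 18)))   # dog
```

Notice that nothing in the code says what a human or a dog looks like. The distinction comes entirely from the examples, so adding a third category only requires more labeled data, not new rules.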
There are many different techniques and approaches to machine learning. One of those approaches is artificial neural networks, which are used, for example, for product recommendations. E-commerce companies often use artificial neural networks to show you products you are more likely to purchase. They do this by ingesting data from all of their users’ browsing experiences and using that information to make effective product recommendations.
Some other common applications of artificial intelligence today are object recognition, translation, speech recognition, and natural language processing. Rev’s automatic transcription is powered by automatic speech recognition (ASR) and natural language processing (NLP). ASR is the conversion of spoken words into text, while NLP is the processing of that text to derive its meaning. Since humans often speak in colloquialisms, abbreviations, and acronyms, it takes extensive computer analysis of natural language to produce accurate transcription.
Challenges With Speech Recognition Technology
The challenges with speech recognition technology are numerous, but they are diminishing. They include poor recording equipment, background noise, difficult accents and dialects, and the varied pitches of people’s voices.
Teaching a machine to understand spoken language as humans do is something that hasn’t yet been perfected. Listening to and understanding what a person says is so much more than hearing the words the person speaks. As humans, we also read the person’s eyes, their facial expressions, body language, and the tones and inflections in their voice. Another nuance of speech is the human tendency to shorten certain words and phrases (e.g. “I don’t know” becomes “dunno”); we have said these phrases for so long that we no longer pronounce them as precisely as when we first learned them. This human disposition poses yet another challenge for machine learning in speech recognition.
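As a rough illustration of the shortened-word problem, the sketch below expands a few reduced forms in text that has already been transcribed. The lookup table and the function name `expand_reduced_forms` are hypothetical, and this is not a description of how Rev or any other ASR system handles the issue; real systems have to deal with the reduction in the audio itself, which is far harder than a text lookup. Because speakers usually still say the "I" in "I dunno", the table maps "dunno" to "don’t know" only.

```python
import re

# Hypothetical lookup table of reduced spoken forms and their fuller spellings.
REDUCED_FORMS = {
    "dunno": "don't know",
    "gonna": "going to",
    "wanna": "want to",
    "gimme": "give me",
}

# One pattern that matches any listed form as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, REDUCED_FORMS)) + r")\b",
    re.IGNORECASE,
)


def expand_reduced_forms(text):
    """Replace each known reduced form in the text with its fuller spelling."""
    return _PATTERN.sub(lambda m: REDUCED_FORMS[m.group(0).lower()], text)


print(expand_reduced_forms("I dunno if we're gonna make it"))
# I don't know if we're going to make it
```

A fixed table like this only covers the forms someone thought to list, which is exactly why learning such patterns from large amounts of speech data is attractive.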
Machines are learning to “listen” to accents, emotions and inflections, but there is still quite a way to go. As the technology becomes more sophisticated and algorithms are trained on more data, those challenges are quickly being overcome.
The technology to support voice-powered interfaces is incredibly powerful. With the advancements in artificial intelligence and the copious amounts of speech data that can be easily mined for machine learning purposes, it would not be surprising if voice becomes the next dominant user interface.
Speech Recognition and Rev
At Rev, we have leveraged decades of research and development in speech recognition to create an automated transcription service that is fast, easy to use, and affordable. We would not have been able to build Rev speech without the foundations in speech recognition laid by other companies.