Our voices are as unique as each individual snowflake that falls from the sky. This distinction is based not only on the shape and size of our vocal cords but also on the build of our body and the structure of our vocal tract. All of these components affect the way we articulate speech, create sounds, and set pitch and tone. Our voices are one of a kind, inimitable, and forever ours, which also makes them ideal passwords.
Timelines In Speech Recognition
Speech recognition systems have been around since the 1950s, but how quickly has voice recognition technology progressed? “Baby Talk”, the first on the market, could comprehend only numbers, and the “Audrey System” of 1952 could understand digits spoken by a single voice.
But it wasn’t until the 1970s that voice recognition gained renewed attention and significant development. Machines became better at recognizing speech by linking sounds with the probability that those sounds form particular words. The Hidden Markov Model (HMM) approach, developed by Lenny Baum of Princeton University, was the catalyst for these advances: Baum shared his progress with the Advanced Research Projects Agency (ARPA), and the work attracted the attention of International Business Machines (IBM).
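The HMM idea of scoring how probable it is that a sequence of sounds forms a given word can be made concrete with a small sketch of the forward algorithm, the standard way to compute that probability for an HMM. This is a toy illustration under loudly stated assumptions, not Baum's or IBM's actual system: the two word models, the phoneme-like hidden states, the coarse acoustic labels ("glide", "vowel", "hiss", "nasal"), all probabilities, and the names `forward_likelihood`, `recognize`, and `MODELS` are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple

# A word model: (start probabilities, transition probabilities, emission probabilities).
WordModel = Tuple[
    Dict[str, float], Dict[str, Dict[str, float]], Dict[str, Dict[str, float]]
]


def forward_likelihood(obs: List[str], model: WordModel) -> float:
    """Return P(obs | model), computed with the HMM forward algorithm."""
    start, trans, emit = model
    # alpha[s]: probability of emitting obs[0..t] and being in hidden state s at step t
    alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in start}
    for symbol in obs[1:]:
        # Sum over every previous state, then weight by the chance of emitting this symbol
        alpha = {
            s: sum(alpha[p] * trans[p].get(s, 0.0) for p in alpha)
            * emit[s].get(symbol, 0.0)
            for s in start
        }
    return sum(alpha.values())


def recognize(obs: List[str], models: Dict[str, WordModel]) -> str:
    """Pick the word whose model makes the observed sounds most probable."""
    return max(models, key=lambda word: forward_likelihood(obs, models[word]))


# Hypothetical toy models: hidden states are phoneme-like units, observations are
# coarse acoustic labels. All probabilities are invented for illustration.
MODELS: Dict[str, WordModel] = {
    "yes": (
        {"y": 1.0, "eh": 0.0, "s": 0.0},
        {"y": {"y": 0.5, "eh": 0.5}, "eh": {"eh": 0.5, "s": 0.5}, "s": {"s": 1.0}},
        {
            "y": {"glide": 0.8, "vowel": 0.2},
            "eh": {"vowel": 0.9, "glide": 0.1},
            "s": {"hiss": 0.9, "vowel": 0.1},
        },
    ),
    "no": (
        {"n": 1.0, "oh": 0.0},
        {"n": {"n": 0.5, "oh": 0.5}, "oh": {"oh": 1.0}},
        {"n": {"nasal": 0.8, "vowel": 0.2}, "oh": {"vowel": 1.0}},
    ),
}

if __name__ == "__main__":
    sounds = ["glide", "vowel", "hiss"]
    for word, model in MODELS.items():
        print(word, forward_likelihood(sounds, model))
    print("recognized:", recognize(sounds, MODELS))
```

Real recognizers work on continuous acoustic feature vectors, use log probabilities to avoid numerical underflow on long utterances, and search over whole word sequences rather than isolated words; the sketch keeps only the core scoring idea.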
In the early 1980s, IBM began work on Tangora, a machine that by the mid-1980s could recognize more than 20,000 spoken words. In the late 1980s and early 1990s, a wave of speech-enabled toys, products, and research systems appeared, including the Worlds of Wonder Julie doll, Dragon Dictate, and Sphinx-II. It wasn’t until the mid-1990s and into the 21st century that voice recognition software began to make its mark on the industry, with IBM’s MedSpeak and products from Microsoft and Google that could recognize continuous speech and offered analysis capabilities.
Speech recognition software in the 21st century has seen massive upgrades, including machines able to identify spoken words and convert them into machine-readable formats. Voice recognition is a subgroup of speech recognition: it is the tool used to identify a person based on their voice alone. The last decade has seen rapid development among the global leaders in this space, cemented in October 2011 by Apple’s introduction of Siri with the iPhone 4S, followed by Microsoft’s Cortana and Amazon’s Echo in 2014. These digital personal assistants understand what is being asked of them, and they act accordingly.
What’s The Risk?
Many U.S. households now rely on voice activation software to some degree, whether through a mobile device or a smart speaker, and many of us think nothing of asking Alexa to turn down the music while the baby sleeps, telling Cortana to add eggs to the shopping list, or asking Siri where to find the nearest takeout burger. But has this leading-edge technology ushered in a new wave of cyber threats?
Security systems and people can be easily fooled, and with just a snippet of replayable recorded audio, attackers have all the tools they need to claim someone else’s identity by voice. As many families juggle commitments at home, with children, work, and the like, it’s not surprising that gadgets which effortlessly record audio of our lives have exposed us to attack. Quite simply, we often forget these devices exist, or, stranger still, we have embraced these named appliances as members of the family.
On average, an adult speaks 14,000 words per day, so imagine how much information could be gleaned from that many words if they were recorded, and what could be done with it. Consider the Oregon-based family who recently fell victim to an Alexa instruction error that caused the device to record a confidential conversation held in the privacy of their home and send it to a contact in their address book. Voice recognition also carries the risk that stolen voice recordings could be used to personalize advertising.
As technology advances, voice recognition capabilities, like any Artificial Intelligence (AI) or Machine Learning (ML) system, will continue to be honed. And as they improve, existing risks will be mitigated even as new threats emerge. Consider this, though: with so much of our everyday lives captured as audio recordings through smart speakers, mobile devices, and telephone calls, it raises a moral question as to whether society can resist monetizing such an appealing source of data.
Keeping Perspective And Staying Safe
Listening in on conversations takes time. Attackers who target voice recognition software need clear objectives and defined targets: before cybercriminals invest time, effort, and energy into hijacking your voice, they need a clear idea of what they will do with it and why. As a result, hackers usually reserve this kind of behavior for large-scale attacks. Rest assured, the vast majority of us are safe.
That said, there are always extra measures you can take to keep yourself safe; here are two simple tips to remember:
- Turn off your device microphones when you are not using them, and make sure all devices have antivirus software installed.
- Take advantage of your device settings: password-protect the sites you use for purchasing, and turn off one-click purchasing. (This advice is particularly helpful if you use Amazon Echo for voice shopping.)
Researchers at the University at Buffalo are developing an app that identifies voice replay attacks. In short, the app can measure the distance between the phone and the source of the sound and use that measurement to make an informed judgment about whether the voice is authentic. Voice authentication can only be considered secure if organizations can adequately defend against voice replay attacks. Remember: your voice is as unique as each snowflake that falls from the sky. Long may it stay that way.