As intelligent voice assistants spread from homes into companies, we look into their technological evolution and their dark side
Hey, Alexa. Do you know Siri?
Only by reputation.
Hey, Alexa. What about Bixby?
I am partial to all AI.
The world is filling up with AI, and voice concierges are becoming a community of their own. For decades, these intelligent voice assistants (IVAs) have evolved steadily on the back of technological investment.
The Voice Family Tree
It all started in the 1960s. IBM demonstrated Shoebox, a voice-activated calculator, at the 1962 World's Fair, and in 1966 Joseph Weizenbaum, an MIT professor, created ELIZA, the first natural language processing computer program. A decade of voice recognition software followed. Harpy, developed at Carnegie Mellon in the 1970s, mastered about 1,000 words and could understand sentences roughly as well as a three-year-old.
There was also Tangora, IBM’s voice-recognising typewriter with a vocabulary of 20,000 words. Finally, the first real virtual assistant arrived with Simon, created by IBM around digital speech recognition technology. Programmed with cognitive computing technologies, including AI, machine learning (ML), and voice recognition, today’s IVAs identify and learn from data inputs and can predict user needs. Today the world has Siri, Alexa, Bixby, Nina, Viv, and Mycroft, among many others.
Tracing the Technological Roots
There was a time when manual punch cards were used to store data and instruct machines. With the advent of the programming era, the destiny of the IVA was written. Early machines were built around vacuum tubes; the inventions of the transistor and the microprocessor gave them a decisive boost.
The Cognitive Computing (CC) phase brought the simulation of human thought process into a computerised model. Leveraging self-learning systems, CC used natural language processing, pattern recognition, and data mining to try and imitate the human thinking process.
By emulating aspects of the human brain, such as parallel processing and associative memory, CC demonstrated pattern recognition, robotic control, and emotional intelligence. It was a lauded technological feat, but one that raised a twinge of fear as well.
By developing context-based hypotheses with ML, the engineering behind CC gave the industry an in-depth understanding of how these voice assistants are built and how they interact with human beings.
Making of the Voice
An IVA converts the user’s voice to text, interprets the command with NLP against its datasets, composes a reply, and converts it back into voice. As demand for IVAs skyrocketed, developers took a step further and built large Deep Neural Networks (DNNs) to address the then-new challenges.
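The command-to-reply loop can be sketched as a toy pipeline. Every function below is an illustrative stub, not any real assistant’s API; a production system would run acoustic models, intent classifiers, and a TTS engine at each stage.

```python
# Toy sketch of the IVA pipeline: speech -> text -> intent -> reply -> speech.
# All functions are illustrative stubs, not a real assistant's API.

def speech_to_text(audio: str) -> str:
    # A real system would run acoustic + language models here;
    # we pretend the "audio" is already a transcript.
    return audio.lower().strip()

def interpret(text: str) -> str:
    # Stand-in for NLP intent classification over known datasets.
    if "weather" in text:
        return "intent:weather"
    return "intent:unknown"

def generate_reply(intent: str) -> str:
    # Stand-in for NLG: map the intent to a canned response.
    replies = {
        "intent:weather": "It looks sunny today.",
        "intent:unknown": "Sorry, I didn't catch that.",
    }
    return replies[intent]

def text_to_speech(text: str) -> bytes:
    # Stand-in for a TTS engine: just encode the text.
    return text.encode("utf-8")

def handle_command(audio: str) -> bytes:
    return text_to_speech(generate_reply(interpret(speech_to_text(audio))))
```

Each stage maps onto one of the technologies the article describes: speech recognition, NLP, NLG, and speech synthesis.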
An open infrastructure model called DjiNN was developed to offer DNNs as a service, alongside Tonic Suite, a set of seven end-to-end applications spanning image, speech, and language processing.
Apart from NLP, a lesser-known AI technology powers these intelligent assistants: Natural Language Generation (NLG), the algorithm that creates the text and speech responses of IVAs like Alexa, Siri, and Google Assistant.
The next ML model to give an IVA a sense of emotion from the tone of a human voice was HEIM (Hybrid Emotion Inference Model). It used Latent Dirichlet Allocation (LDA) to extract text features and a Long Short-Term Memory (LSTM) network to analyse acoustic features and infer the emotions behind human voices, letting assistants answer with more confidence. Experts believe that IVAs will soon simulate human psychology and make deeper connections with human cognition. A revolutionary algorithm, it also rekindled fears about how intelligent IVAs are becoming.
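The acoustic side of such a model can be sketched in a few lines: an LSTM steps through the frames of a voice recording and compresses them into a single feature vector that an emotion classifier would then consume. This is a minimal NumPy sketch of a generic LSTM cell, not HEIM’s actual code; the weights here are random, where a real model would learn them from labelled emotional speech.

```python
import numpy as np

# Minimal sketch of how an LSTM summarises acoustic frames into one
# feature vector. Random weights stand in for learned parameters.
rng = np.random.default_rng(0)
n_in, n_hid = 13, 8                 # e.g. 13 MFCC features per frame

W = rng.normal(0, 0.1, (4 * n_hid, n_in + n_hid))   # stacked gate weights
b = np.zeros(4 * n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_summarise(frames):
    """Run one LSTM layer over (T, n_in) frames; return the final hidden state."""
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    for x in frames:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)          # input, forget, output gates + candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # update long-term cell memory
        h = o * np.tanh(c)                   # emit short-term hidden state
    return h

frames = rng.normal(size=(50, n_in))         # 50 fake acoustic frames
features = lstm_summarise(frames)            # vector an emotion classifier would consume
```

The final hidden state carries a summary of the whole utterance, which is why LSTMs suit tone-of-voice analysis: emotion unfolds over time rather than sitting in any single frame.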
Are they listening?
Recently, Siri let the cat out of the bag. Quizzed by reporters about Apple’s next event, Siri declared, “the special event is on Tuesday, April 20th, at Apple Park in Cupertino, CA”, before any official announcement from Apple. The brand made no immediate comment on Siri’s revelation.
Admirable as IVAs are, as they become more human-like every day, people are increasingly concerned about privacy, even more than security. Those familiar with Ultron from the Avengers franchise can understand the seriousness of it. Sci-fi aside, the thought of an IVA becoming a highly self-aware AI is scary.
Will an IVA be able to make life-changing decisions for a human being without their knowledge and consent? In the distant future, it is very much possible.
Another invasive example dates back to 2018, when a family in Portland reported that their Amazon Echo had recorded a private conversation and sent it to a random person from their phone contact list. Google Assistant has drawn its own share of privacy complaints.
The truth is that it is in every IVA’s nature to collect and understand data, especially voice data. Experts believe users do not realise that for an IVA to grow, hold better conversations, and be helpful, providers need to review the voice data it collects.
Research conducted by Loup Ventures showed that of the 800 questions asked, Google Assistant answered over 92 per cent correctly, Alexa 79 per cent, and Siri 83 per cent. In 2019, the figures had been 86, 61, and 79 per cent, respectively. The research suggests that IVAs will soon understand nearly all reasonable questions.
Meanwhile, to streamline the process and ease people’s worry, the GDPR, which came into force in 2018, requires all IVAs to obtain user consent. Users also have the right to be informed about the data collected, as well as the rights to rectification and erasure.
Google launched a federated learning model in 2019 to limit the accidental awakening of Google Assistant on Android. The system asks the user’s permission to save audio and speech so the assistant can learn over time. With the federated model, the audio stays encrypted on the device and the model is trained locally, instead of the raw data being processed in the cloud or on Google’s servers.
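In general, federated learning works by keeping raw data on each device and sharing only model updates, which a server then averages. The sketch below shows that idea with a toy least-squares model; it illustrates the general technique (federated averaging), not Google’s implementation.

```python
import numpy as np

# Minimal federated-averaging sketch: each device fits a local model on its
# own (private) data and shares only the fitted weights; the server combines
# them with a sample-weighted average. Raw data never leaves the device.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])          # ground truth all devices observe noisily

def local_update(n_samples):
    """One device: least-squares fit on data that stays on the device."""
    X = rng.normal(size=(n_samples, 2))
    y = X @ true_w + rng.normal(0, 0.01, n_samples)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, n_samples                  # only the weights are shared

updates = [local_update(n) for n in (100, 200, 50)]   # three devices

# Server side: weighted average of the local weights.
total = sum(n for _, n in updates)
global_w = sum(w * n for w, n in updates) / total
```

The aggregated `global_w` lands close to the underlying model even though no device ever uploaded its samples, which is exactly the privacy property the federated approach is after.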
Voice assistants have come a long way, from the programming era through cognitive computing to the AI era. Data availability, the decreasing cost of computing power, and better algorithms will keep enhancing their intelligence. At the moment, Microsoft is acquiring Nuance to dive deeper into natural language processing and AI that can respond to humans, while Google and Amazon are working to eliminate ‘wake’ words entirely.
Meanwhile, Amazon and Google are selling their smart speakers as modern alarm clocks, and the Echo Look and the Echo Spot are intended to get cameras into the bedroom. With forces being marshalled to push us toward voice, and as assistants grow in sophistication, we want our personal digital assistants sassy enough to be engaging, but not so friendly that they invade our privacy.