Several findings have indicated that text recommendation systems remain insufficiently nuanced to reflect the subtleties of real-world scenarios
If you recently wrote a text message or an email, chances are you came across an AI that suggested different synonyms, phrases, or ways to start or finish a sentence. The rise of AI-powered autosuggestion tools has coincided with the digital transformation of enterprise communications, which now largely live online. Studies suggest a typical worker replies to about 40 emails each day and sends more than 200 Slack messages per week.
Messaging consumes a significant portion of the workday: on average, workers spend 15.5 hours a week answering emails. That is a death knell for productivity. Research from the University of California and Humboldt University shows that workers can lose up to 23 minutes on a task every time they are interrupted, stretching the workday even further.
Autosuggestion tools can help save time by streamlining impromptu message writing and replying. For instance, Google’s Smart Reply suggests quick responses to emails that would otherwise take several minutes to type. However, the AI behind these tools has a handful of shortcomings that can introduce bias or influence how language is used in messages in undesirable ways.
The Rise Of Autosuggestion And Text Autocompletion
Autosuggestion and text autocompletion systems use a technology known as predictive text. One of the first widely used systems, T9, let users form words with a single keypress per letter and became standard on many cellphones in the late ’90s. The advent of more sophisticated, scalable AI techniques in language processing later led to leaps in the quality and breadth of autosuggestion tools.
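To make that mechanism concrete, here is a minimal Python sketch of how a T9-style system resolves one keypress per letter into candidate words. The key mapping is the standard phone keypad, but the dictionary and matching logic are illustrative assumptions rather than any real product’s implementation.

```python
# A minimal sketch of T9-style predictive text: each digit maps to several
# letters, and a typed digit sequence is matched against a small dictionary.
# The dictionary below is illustrative, not taken from any real product.

T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def word_to_digits(word: str) -> str:
    """Convert a word to the digit sequence a user would type for it."""
    letter_to_digit = {ch: d for d, letters in T9_KEYS.items() for ch in letters}
    return "".join(letter_to_digit[ch] for ch in word.lower())

DICTIONARY = ["home", "good", "gone", "hood", "hello", "gold"]

def suggest(digits: str) -> list[str]:
    """Return dictionary words whose key sequence matches the typed digits."""
    return [w for w in DICTIONARY if word_to_digits(w) == digits]

print(suggest("4663"))  # ['home', 'good', 'gone', 'hood'] — one keypress per letter
```

A real phone would rank these candidates by usage frequency and let the user cycle through them, but the core lookup is this simple.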
Google launched Smart Reply for Gmail in 2017 and later brought it to other Google services, including Google Chat and numerous third-party apps. Google says the AI behind Smart Reply generates reply suggestions based on the complete context of a conversation rather than a single message, resulting in more timely and relevant suggestions. A year later, the company launched Smart Compose, which suggests complete sentences, first in Gmail and soon afterwards in Google Docs. A similar feature, named suggested replies, came to Microsoft Outlook in 2018 and Teams in 2020.
Several academic circles refer to the technology behind these new autosuggestion tools as “AI-mediated communication”. The AI models underpinning such systems are trained on billions of example emails and run in the cloud on custom accelerator hardware. Many of these systems take a “hierarchical approach” to suggestions, inspired by how humans understand languages and concepts.
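Google has not published its exact pipeline here, but a rough sketch can convey what a “hierarchical approach” might look like in practice: first classify the broad intent of an incoming message, then rank candidate replies within that intent. The intents, keyword scoring, and canned replies below are invented for illustration and are not Smart Reply’s actual architecture.

```python
# A hedged, highly simplified sketch of a two-stage ("hierarchical") reply
# suggestion flow: guess the broad intent of the message, then pick the
# top-ranked replies for that intent. All data here is illustrative.

INTENT_KEYWORDS = {
    "scheduling": {"meeting", "tomorrow", "schedule", "calendar", "call"},
    "thanks": {"thanks", "thank", "appreciate", "grateful"},
    "approval": {"approve", "review", "sign", "confirm"},
}

REPLIES_BY_INTENT = {
    "scheduling": ["Works for me.", "Can we do Thursday instead?", "I'll send an invite."],
    "thanks": ["You're welcome!", "Happy to help.", "Any time."],
    "approval": ["Approved.", "Looks good to me.", "I'll take a look today."],
}

def suggest_replies(message: str, k: int = 3) -> list[str]:
    """Two-stage suggestion: pick the best-matching intent, then its top replies."""
    words = set(message.lower().split())
    # Stage 1: score each intent by keyword overlap with the message.
    intent = max(INTENT_KEYWORDS, key=lambda i: len(words & INTENT_KEYWORDS[i]))
    # Stage 2: return the highest-ranked canned replies for that intent.
    return REPLIES_BY_INTENT[intent][:k]

print(suggest_replies("Thanks so much for the quick turnaround!"))
# ["You're welcome!", 'Happy to help.', 'Any time.']
```

Production systems replace the keyword overlap with learned neural models at both stages, but the idea of narrowing down from coarse meaning to specific wording is the same.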
But as with all technologies, even the most capable autosuggestion tools available today are susceptible to flaws that can crop up during development and deployment.
Mistakes To Learn From
In 2016, Google Search’s autocomplete feature suggested hateful and offensive completions for certain queries, such as “are jews evil?” for the phrase “are jews”. The company later clarified that the fault lay with an algorithmic system that updates suggestions based on what other users have recently searched for. While Google implemented a fix, it took several more years to rein in autocompletion suggestions for controversial political statements, including false claims about voting requirements and the legitimacy of elections.
Smart Reply was also found to offer the “person wearing turban” emoji in response to messages that included a gun emoji, and Apple’s autocomplete on iOS suggested only male emoji for executive roles, including CEO, COO, and CTO.
Why Does It Occur?
Such flaws in autocompletion and autosuggestion systems often arise from biased data. The billions of examples the systems learn from can easily be tainted with text from toxic websites and scripts that associate certain genders, races, ethnicities, and religions with hurtful concepts.
Annotations in the data can introduce a new set of problems or exacerbate existing ones. Because many models also learn from labels that indicate whether a word, sentence, paragraph, or document has specific characteristics, such as a positive or negative sentiment, companies and researchers recruit teams of human annotators from crowdsourcing platforms such as Amazon Mechanical Turk to label the examples. Annotators bring their own perspectives and biases to the table, sometimes unknowingly and unintentionally.
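A toy sketch can show how those crowdsourced judgements become training labels, and why annotators’ leanings end up baked into the data; the sentences, annotators, and votes below are invented purely for illustration.

```python
# A toy illustration of how annotator labels shape what a model learns.
# Real pipelines aggregate thousands of crowdsourced judgements; the
# examples and votes here are invented.

from collections import Counter

# Each example gets labels ("positive"/"negative") from several annotators.
raw_annotations = {
    "The service was fine, I guess.": ["negative", "positive", "negative"],
    "Great turnaround on the report!": ["positive", "positive", "positive"],
    "He was assertive in the meeting.": ["positive", "negative", "negative"],
}

def majority_label(votes: list[str]) -> str:
    """Aggregate annotator votes into a single training label by majority vote."""
    return Counter(votes).most_common(1)[0][0]

training_data = {text: majority_label(votes) for text, votes in raw_annotations.items()}
print(training_data)
# Borderline sentences collapse into a single hard label, so whatever leanings
# the annotators had are baked into the data the model is trained on.
```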
Bias can also be introduced intentionally, as a matter of vernacular trade-offs. Writer, a startup developing an AI assistant for content generation, says it prioritises “business English” in its writing suggestions.
Influence Of Autocompletion
Intentional or not, these shortcomings make their way into autocompletion and autosuggestion systems, influencing and changing the way we write and express ourselves. The enormous scale at which these systems operate behind the scenes makes it challenging to avoid mishaps completely. In one of the more comprehensive audits of autocompletion tools, a team of Microsoft researchers interviewed volunteers about their impressions of the auto-generated replies in Outlook. The interviewees found some of the replies over-positive, a few wrong in their assumptions, and others too impolite for particular contexts. Experiments during the study also showed that users were more likely to favour the short, positive, and polite replies that Outlook suggested.
Another study, from Harvard, found that when people writing about a restaurant were shown “positive” autocomplete suggestions, their reviews tended to be more positive than when they were shown negative suggestions.
Although an all-encompassing solution to the problem of harmful autocompletion has yet to be found, tech giants are now paying closer attention to it.
Google has opted to block gender-based pronoun suggestions in Smart Compose, since the system was a poor predictor of recipients’ sexes or gender identities. LinkedIn, owned by Microsoft, now also avoids gendered pronouns in Smart Replies, its predictive messaging tool, to prevent potential blunders.
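Neither company has detailed its implementation, but a simple post-processing guardrail along these lines could achieve the same effect; the pronoun list and filtering logic below are illustrative assumptions, not Google’s or LinkedIn’s actual code.

```python
# A simplified sketch of a post-processing guardrail that drops reply
# suggestions containing gendered pronouns. Illustrative only; not any
# vendor's actual implementation.

import re

GENDERED_PRONOUNS = {"he", "him", "his", "she", "her", "hers"}

def filter_suggestions(suggestions: list[str]) -> list[str]:
    """Keep only suggestions that contain no gendered pronouns."""
    safe = []
    for text in suggestions:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if not (tokens & GENDERED_PRONOUNS):
            safe.append(text)
    return safe

print(filter_suggestions([
    "I'll follow up with her tomorrow.",
    "I'll follow up with them tomorrow.",
    "Sounds good, see you then!",
]))
# ["I'll follow up with them tomorrow.", "Sounds good, see you then!"]
```

The trade-off is bluntness: the filter also removes perfectly appropriate suggestions, which is why vendors accept fewer suggestions rather than risk a wrong pronoun.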
Several findings indicate that current text recommendation systems remain insufficiently nuanced to reflect the subtleties of real-world scenarios and the communication that social relationships demand. Perhaps, with future developments in neural networks, we will see systems intricate enough to show as much EQ as IQ.
So the next time an autocompletion system on your phone gives you a suggestion, give it a slow clap, as it has been through a lot to be what it is today.