
Limits of the Voice Assistant Model

Collin Borns

May 09, 2022


Voice Assistants ushered in a wave of excitement around Voice Technology, but years later the limits of Voice Assistants are clear.


Adoption of Voice Assistants started to flatten in 2022, at around 50-60% of the US population for smart speakers, smartphones, and in-car use. Voice Assistants face various challenges to continued growth, but the main inhibitor lies in the User Experience itself. Users do not want to have “Conversations” with their technology; they want to get tasks done. That limitation is the focus of this post.

Voice Assistant Challenges

The turn-based conversations favored by popular voice assistants such as Amazon Alexa, Google Assistant, and Apple Siri can sometimes be more convenient than manual input, but they are often slow and tedious. The assistant must wait for you to finish speaking before it can determine your request, and it then defaults to a spoken response. You in turn must wait for the assistant to stop talking before moving on to the next step.

This makes for a terrible User Experience when you realize that the assistant has misunderstood your intent (what you are trying to accomplish with the Voice Command) but must still wait for it to finish speaking before you can correct it. Barging into the "conversation" while the voice assistant is speaking to clarify the original request is rarely successful. Instead, you typically wind up starting over with your initial command, with a slight tweak in how you phrase it.

So it is no surprise that the most popular use cases for voice assistants on smart speakers and smartphones are predominantly user-directed requests. Smart speaker users ask questions, ask to play music, ask for the weather, or ask to set a timer or alarm. Smartphone voice assistant users ask to initiate a call, get directions, or play music. Many pose questions expecting a straightforward answer. These are all single-turn interactions, meaning no back-and-forth conversation is required or desired. A simple request is spoken and the device completes the task. That's it.

What does this tell us?

We rarely need to converse with our digital devices through multiple turns to complete a task. The real focus should be on Spoken Commands. When pairing Spoken Commands with a website or mobile app, think of “Voice as a Feature” rather than as an Assistant, and you are likely to deliver a better user experience.

This insight is why, at Speechly, we developed the model of building Voice UIs as a Feature that blends with existing ways of interacting with a screen, such as typing, tapping, or swiping.
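To make the idea concrete, here is a minimal sketch of a single-turn spoken command updating the same screen state that a tap would. It is illustrative only and is not Speechly's API: `ProductFilter`, `apply_spoken_command`, the color list, and the "under N" phrase pattern are all hypothetical names and assumptions, and the transcript is assumed to arrive as plain text from some speech recognizer.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical vocabulary for this sketch; a real app would define its own.
COLORS = ("red", "blue", "green", "black")


@dataclass
class ProductFilter:
    """Hypothetical screen state for a shopping app's filter panel."""
    color: Optional[str] = None
    max_price: Optional[int] = None
    history: List[str] = field(default_factory=list)

    def set_color(self, color: str, source: str) -> None:
        # The same handler serves taps and voice; only the source differs.
        self.color = color
        self.history.append(f"{source}:color={color}")

    def set_max_price(self, price: int, source: str) -> None:
        self.max_price = price
        self.history.append(f"{source}:max_price={price}")


def apply_spoken_command(state: ProductFilter, transcript: str) -> bool:
    """Apply one spoken command to the screen state in a single turn.

    Returns True if any part of the transcript was understood. There is no
    spoken reply and no follow-up question: the screen itself shows the result.
    """
    text = transcript.lower()
    handled = False

    # Pick up the first known color mentioned as a whole word.
    for color in COLORS:
        if re.search(rf"\b{color}\b", text):
            state.set_color(color, source="voice")
            handled = True
            break

    # Pick up a price ceiling phrased as "under <number>".
    price = re.search(r"under (\d+)", text)
    if price:
        state.set_max_price(int(price.group(1)), source="voice")
        handled = True

    return handled


if __name__ == "__main__":
    state = ProductFilter()
    state.set_color("red", source="tap")  # the user taps a color chip
    apply_spoken_command(state, "Show me blue ones under 50 dollars")
    print(state.color, state.max_price, state.history)
```

In this sketch, voice is just another input source feeding the existing handlers, so a misrecognized command can be corrected with a tap instead of restarting a conversation.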

Aligning Limitations with Expectations

Academic research backs up this observation. Many people are familiar with the uncanny valley, which describes the relationship between how humanlike an artifact is and a person's emotional response to it. The emotional response grows more positive as human likeness increases, up to a point where it suddenly plummets.

[Figure: Uncanny Valley]

During a 2006 IEEE workshop, Roger K. Moore of the University of Sheffield's Speech and Hearing Research Group cited materials from Mike Phillips that showed a similar pattern for conversational capability. The model compares the usability of a voice interactive solution with its flexibility in conversational dialogue. The closer solutions get to humanlike conversational ability, the more usable they are, up to a point where the assistants can no longer meet user requirements. Moore calls this the “Habitability Gap” and commented:

"There appears to be a non-linear relationship between flexibility and usability... As flexibility increases with advancing technology, so usability increases until users no longer know what they can and cannot say, at which point usability tumbles and interaction falls apart."

[Figure: Habitability Gap]

Voice UIs as a Feature vs Conversational Voice UIs

The Voice Assistant model has limitations, but there are tangible opportunities for Voice as a Feature in web and mobile applications. If these opportunities interest you, consider reading our full white paper, “Voice UIs as a Feature vs Conversational Voice UIs”.


Cover photo by Sigmund on Unsplash
