voice design

Designing Voice UIs

Ottomatias Peura

Sep 09, 2020

3 min read

Designing voice-first applications requires new approaches to UX and UI design. In this post, we'll go through some best practices for designing voice-driven user interfaces.

  • Copy link

  • Mail

  • LinkedIn

  • Facebook

  • Twitter

Back in the early times of digitalization, human-computer-interaction meant white (or green!) box blinking on a black screen. We have come far from that. First revolution came in the form of mouses on graphical user interfaces (GUI), then with touch and mobile phones. Now we are entering the era of voice user interfaces (VUI). What does it mean from a UI designer's point of view?

In this blog post, I'll share you some do's and don'ts you should consider when designing a voice user interface. You can find more tips for voice design in our guide.

1 Don't try to imitate human conversation

Conversational AI is a hot buzzword, but people most often don't want to conversate with their computers. Most if not all of us have better friends, even though we do spend more time with our computers and mobile phones than any of them.

Most often conversational AI refer to chatbots. Google defines chatbots as "a computer program designed to simulate conversation with human users, especially over the Internet". These are voice user interfaces for sure, but most often you don't need the bi-directional "talking" that these chatbots offer.

2 Update visual UI as the user speaks

Most voice user interfaces are based on a simple question-answer pattern. The user asks something, the system waits and then something happens. Often this something is voice assistant answering back in speech.

This is a problem, as voice is an interrupting channel. If you've ever had a conversation, you'll know that both parties should not speak at the same time. This leads into a noise where neither understands each other.

However, we humans can easily speak and digest visual information simulatenously. When designing voice UIs, this should be taken advantage of.

When the user says something, the user interface should continously update to reflect the user commands. For example, if the user says something like "Show me red t-shirts from Hugo Boss in size large", the UI can be updated three times: first showing t-shirts, then showing t-shirts from Hugo Boss and last showing large t-shirts from Hugo Boss.

This encourages the user to continue with the voice experience and enables the user to correct themselves naturally.

3 Show transcript

Transcript is the most important feedback that the user needs when using a voice UI. If the transcript is not shown, the user can't be sure whether they are being listened at all. And most importantly, without the transcript there's no way for the user to understand why they were not understood correctly, if that's the case.

Always show the transcript in the field of vision. When the user activates the microphone, the transcript should be clearly visible in a natural place, either close to the microphone or in the top of the screen.

4 Give visual clue on what the user can say

One big issue with voice assistants is that it's impossible for the user to know what they can and can't achieve with the device without trying.

If you ask the assistant to give an alert, that works great. They also know when Michael Jackson died. But does a Google smart speaker know how many steps you took yesterday on Google Fit? It seems no. There's no way a user can know it beforehand, because the smart speaker doesn't give any visual tips on what is possible.

In a real-world application one simple way where this works great is a form. Let's say you want to book a flight and you are presented with a form that has an input fields for from, to, date, class and some other information.

Now it's very clear for the user what they should say and what is the context in which they should be commanding the system.

About Speechly

Speechly is a YC backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), super accurate custom models for any domain, privacy and scalability for hundreds of thousands of hours of audio.

Latest blog posts

voice tech

Voice Chat Toxicity in Games: What the Data Say

Speechly recently released the Voice Chat Toxicity Report for Online Games - a consumer survey of over 1000 online gamers on consumers perspective and sentiment towards toxicity on video games. In this interview, Otto Söderlund (Co-Founder and CEO at Speechly) and Bret Kinsella (Founder of sat down to dig into the key results.

Collin Borns

Mar 30, 2023

1 min read

voice tech

Four Perspectives on How to Combat Voice Chat Toxicity in Games: A Look Back at GDC 2023

The Game Developers Conference (GDC) was back in full force for 2023. This came with new product announcements, plenty of content to consume, and nearly 30k gaming industry professionals all gathered in San Francisco. This also gave me the opportunity to gather a wide ranging perspective on the persistent issue of toxicity in online games.

Otto Söderlund

Mar 28, 2023

4 min read

case study

Combating Voice Chat Toxicity in VR Games: Speechly and Gym Class

Gym Class VR is a basketball game that was preparing to launch on Meta Quest after a very successful Beta. Voice chat is an important social element of the game, but the team noticed evidence of toxic behavior emerging. After trying speech recognition from cloud service providers, they quickly learned this was a cost-prohibitive approach and turned to Speechly.

Collin Borns

Mar 20, 2023

5 min read