voice design

Designing Voice UIs

Ottomatias Peura

Sep 09, 2020

3 min read

Designing voice-first applications requires new approaches to UX and UI design. In this post, we'll go through some best practices for designing voice-driven user interfaces.

In the early days of computing, human-computer interaction meant a white (or green!) box blinking on a black screen. We have come a long way since. The first revolution came with the mouse and the graphical user interface (GUI), the next with touchscreens and mobile phones. Now we are entering the era of voice user interfaces (VUI). What does this mean from a UI designer's point of view?

In this blog post, I'll share some dos and don'ts to consider when designing a voice user interface. You can find more tips for voice design in our guide.

1 Don't try to imitate human conversation

Conversational AI is a hot buzzword, but most people don't want to converse with their computers. Most if not all of us have better friends, even though we spend more time with our computers and mobile phones than with any of them.

Conversational AI most often refers to chatbots. Google defines a chatbot as "a computer program designed to simulate conversation with human users, especially over the Internet". These can certainly be voice user interfaces, but most often you don't need the bi-directional "talking" that these chatbots offer.

2 Update visual UI as the user speaks

Most voice user interfaces are based on a simple question-answer pattern. The user asks something, the system waits, and then something happens. Often that something is a voice assistant answering back in speech.

This is a problem, because voice is an interrupting channel. If you've ever had a conversation, you know that both parties should not speak at the same time. The result is noise in which neither party understands the other.

However, we humans can easily speak and digest visual information simultaneously. Voice UI design should take advantage of this.

When the user says something, the user interface should continuously update to reflect the user's commands. For example, if the user says something like "Show me red t-shirts from Hugo Boss in size large", the UI can be updated three times: first showing red t-shirts, then red t-shirts from Hugo Boss, and finally red t-shirts from Hugo Boss in size large.

This encourages the user to continue with the voice experience and enables the user to correct themselves naturally.
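The incremental update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary lists, the `Filters` type, and the `parse_filters` and `ui_updates` helpers are hypothetical names made up for this example, and a real application would get structured results from its speech recognition service instead of matching keywords. The point is the pattern: re-parse every partial transcript and redraw the UI whenever the result changes.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence

# Hypothetical vocabulary for this sketch only.
PRODUCTS = ("t-shirts", "jeans")
COLORS = ("red", "blue", "black")
BRANDS = ("hugo boss", "levi's")
SIZES = ("small", "medium", "large")


@dataclass(frozen=True)
class Filters:
    """The search filters the visual UI is currently showing."""
    product: Optional[str] = None
    color: Optional[str] = None
    brand: Optional[str] = None
    size: Optional[str] = None


def _first_match(text: str, options: Sequence[str]) -> Optional[str]:
    """Return the first option that appears in text as a whole word or phrase."""
    for option in options:
        if re.search(rf"\b{re.escape(option)}\b", text):
            return option
    return None


def parse_filters(partial_transcript: str) -> Filters:
    """Extract whatever filters can be recognized so far."""
    text = partial_transcript.lower()
    return Filters(
        product=_first_match(text, PRODUCTS),
        color=_first_match(text, COLORS),
        brand=_first_match(text, BRANDS),
        size=_first_match(text, SIZES),
    )


def ui_updates(partials: Iterable[str]) -> Iterator[Filters]:
    """Yield a new filter state each time a partial transcript changes it."""
    shown = Filters()
    for partial in partials:
        current = parse_filters(partial)
        if current != shown:
            shown = current
            yield current  # a real UI would re-render the product list here


if __name__ == "__main__":
    # Partial transcripts as they might arrive while the user is speaking.
    stream = [
        "Show me",
        "Show me red t-shirts",
        "Show me red t-shirts from Hugo Boss",
        "Show me red t-shirts from Hugo Boss in size large",
    ]
    for state in ui_updates(stream):
        print(state)
```

Because each partial transcript is parsed from scratch, a correction such as "no, blue" only needs to arrive as a new transcript for the visible filters to follow it.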

3 Show transcript

The transcript is the most important feedback the user needs when using a voice UI. If the transcript is not shown, the user can't be sure whether they are being listened to at all. Most importantly, without the transcript the user has no way of understanding why they were misunderstood, if that happens.

Always show the transcript in the user's field of vision. When the user activates the microphone, the transcript should be clearly visible in a natural place, either close to the microphone or at the top of the screen.

4 Give visual clues on what the user can say

One big issue with voice assistants is that the user cannot know what they can and can't achieve with the device without trying.

If you ask the assistant to set an alert, that works great. It also knows when Michael Jackson died. But does a Google smart speaker know how many steps you took yesterday according to Google Fit? It seems not. There's no way for a user to know this beforehand, because the smart speaker doesn't give any visual tips on what is possible.

In a real-world application, one simple place where this works well is a form. Let's say you want to book a flight and you are presented with a form that has input fields for from, to, date, class and some other information.

Now it is very clear to the user what they should say and in which context they are commanding the system.
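To make the form idea concrete, here is a minimal Python sketch, assuming a flight booking form with the four fields mentioned above. The `VoiceField` type, the `remaining_hints` helper, and the example phrases are all hypothetical and exist only for this illustration. Each field carries an example phrase, and the UI shows hints for the fields the user has not filled in yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceField:
    """One form field that can be filled by voice (hypothetical type)."""
    label: str                   # text shown next to the input field
    example: str                 # example phrase shown as a visual clue
    value: Optional[str] = None  # None until the user has filled the field


def remaining_hints(fields: List[VoiceField]) -> List[str]:
    """Build one visual hint per field that is still empty."""
    return [
        f'{field.label}: try saying "{field.example}"'
        for field in fields
        if field.value is None
    ]


if __name__ == "__main__":
    form = [
        VoiceField("From", "from Helsinki"),
        VoiceField("To", "to New York"),
        VoiceField("Date", "next Friday"),
        VoiceField("Class", "in business class"),
    ]
    form[0].value = "Helsinki"  # the user has already said the departure city
    for hint in remaining_hints(form):
        print(hint)
```

The hints shrink as the form fills up, so the screen always shows what the user can still say.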
