UI Components for Voice UIs in the Web

Ari Nykänen

Oct 06, 2021

Ready-made UI components make development of Voice UIs faster.

The term Voice User Interface (Voice UI) usually refers to a UI that uses voice for both user input and output. Voice UIs are typically built to make the user experience more efficient. However, voice-only UIs frequently run into problems that leave users confused and frustrated.

At Speechly, we believe that many of the problems with Voice UIs today can be mitigated or completely eliminated by adopting a multi-modal design philosophy. This means leveraging all the modalities available in the user's context (voice, visual, touch) to make the interaction as easy and smooth as possible. One of the most fascinating platforms for multi-modal Voice UIs is the web, but if you look for design patterns for adding voice features to web applications, you will quickly find that quality resources are scarce.

To make designing and developing Voice UIs on the web easier, we are excited to release some of our research on this topic as a set of ready-made UI components. These components give users visual cues that the Voice UI is working as expected.

4 UI Components for Voice UIs

  • Push-To-Talk Button is a holdable switch for controlling the Voice User Interface.
  • Big Transcript is an overlay-style component that displays the real-time speech-to-text transcript and feedback to the user.
  • Transcript Drawer is an alternative to Big Transcript that slides down from the top of the viewport. It displays usage tips along with the real-time speech-to-text transcript and feedback.
  • Intro Popup is an overlay-style popup that is automatically displayed when the user first interacts with Push-To-Talk Button. It displays a customizable introduction text that briefly explains which voice features the microphone permission is needed for. Intro Popup also appears automatically to help the user recover from common problems.

Our Multi-Modal Design Philosophy helps design better voice-enabled user interfaces

We believe most of the problems that Voice UIs face can be overcome with a multi-modal design philosophy. Below is the multi-modal design philosophy we follow at Speechly. This Design Philosophy Guide should be used as a complementary resource alongside the UI components above when designing or developing a Voice UI.

Chapter 1: Setting the right context

  • Resist the temptation to build an assistant
  • Design the interactions around command & control, not conversation
  • Give visual guidance on what the user can say
  • Use voice ONLY for the tasks it is good for

Chapter 2: Receiving commands from the user

  • Onboard the user
  • When a pressable button is available, a wake word is not needed
  • Prefer a push-to-talk button mechanism
  • Signal clearly when the microphone button is pushed down
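
As an illustration of the push-to-talk guidance above, here is a minimal sketch of a hold-to-talk controller. It is not Speechly's implementation or API: the names PushToTalkController, PttCallbacks, onStart and onStop are hypothetical and chosen only for this example. The controller tracks whether the button is held down so the UI can reflect that state clearly, and it ignores repeated press events such as keyboard auto-repeat.

```typescript
// Hypothetical sketch: these names are illustrative, not part of any Speechly library.
interface PttCallbacks {
  onStart: () => void; // e.g. start streaming microphone audio
  onStop: () => void;  // e.g. stop streaming and finalize the transcript
}

class PushToTalkController {
  private listening = false;
  private readonly callbacks: PttCallbacks;

  constructor(callbacks: PttCallbacks) {
    this.callbacks = callbacks;
  }

  // True while the button is held down; use it to drive the visual "pressed" cue.
  isListening(): boolean {
    return this.listening;
  }

  // Call when the button is pressed down.
  press(): void {
    if (this.listening) return; // ignore repeats while already held
    this.listening = true;
    this.callbacks.onStart();
  }

  // Call when the button is released or the press is cancelled.
  release(): void {
    if (!this.listening) return; // ignore a release without a matching press
    this.listening = false;
    this.callbacks.onStop();
  }
}
```

In a browser, press() could be wired to the button's pointerdown event and release() to both pointerup and pointercancel, so that an interrupted press still closes the microphone.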

Chapter 3: Giving feedback to the user

Chapter 4: Recovering from mistakes

  • Show the text transcript in real time
  • Enable corrections both verbally and by using touch
  • Offer an alternative way to complete the task using touch
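
To make the real-time transcript guideline concrete, the sketch below shows one way a transcript view could merge streaming updates. It assumes a recognizer that emits words one at a time and may revise a word until it is marked final; the TranscriptWord shape and the applyUpdate and renderTranscript functions are hypothetical, and real speech-to-text APIs will differ.

```typescript
// Hypothetical word shape; not taken from any specific speech-to-text API.
interface TranscriptWord {
  index: number;    // position of the word within the utterance
  text: string;
  isFinal: boolean; // false while the recognizer may still revise the word
}

// Merge one incoming update into the words seen so far. A word at an
// already-known position replaces the earlier, tentative version.
function applyUpdate(words: TranscriptWord[], update: TranscriptWord): TranscriptWord[] {
  const next = words.filter((w) => w.index !== update.index);
  next.push(update);
  return next.sort((a, b) => a.index - b.index);
}

// Render the transcript as plain text. Tentative words are prefixed with "~"
// so a UI layer can style them differently, for example dimmed.
function renderTranscript(words: TranscriptWord[]): string {
  return words.map((w) => (w.isFinal ? w.text : `~${w.text}`)).join(" ");
}
```

For example, after a final "show" and a tentative "blew", renderTranscript returns "show ~blew"; once the recognizer revises the second word to a final "blue", it returns "show blue". Showing tentative words immediately keeps the feedback real-time, while the marker tells the user the text may still change.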

Free Voice UI Components for Download

You can find more information about these UI components in our documentation. If you would like to access the Speechly UI component design files, they are now available for download in Figma and Sketch formats.

If you have any questions on how to best take advantage of our Voice UI components, please feel free to reach out to the team at
