Speechly is a tool for enhancing touch user interfaces with a voice modality. In addition to touching and clicking, the end user can use the most natural way of interacting with the application – voice.
This blog post explains why you should use the Speechly React client for building your next multimodal user experience. If you're already convinced, you can jump directly to our Get started with React Client tutorial and start developing.
So far developers and designers have been limited in the services they can use for voice user interfaces and only simple question-answer based interactions have been feasible. On the other hand, while touch screens are great, there are a lot of use cases that could be improved. Something as common as filling a form or changing search filters requires quite a lot of tapping and swiping.
We don't believe smart speakers and voice assistants are the answer, so we have worked hard to improve both the current touch screen experience and the current voice experience with our technology.
Problems with current touch screen experience
React is a great library for building user interfaces and it makes building UI components fast and easy. However, while scrolling and tapping is fun and even addictive, there are tasks that could be completed a lot more easily by voice. Two common tasks that could use a better UX are filling forms and filtering search results.
A typical form has a few text input fields, possibly with autocomplete, one or two dropdowns, maybe a multiselect and a few radio buttons and a date picker. To fill them all, the user needs to tap to select the input field, type something, select something and move to the next field. And because there are multiple ways to implement each of these fields, some amount of confusion is a given.
Search is a typical feature in almost all applications. A good search can double the conversion rate on an ecommerce site. But the more filters you add, the more cluttered the UI gets and the harder it is for the user to find the filters they are looking for.
Users can only touch what they can see
As a touch interface requires the user to touch something, the buttons they need to touch have to be in view. This is a familiar issue for all designers, as they need to think of ways for the user to interact with all the great features they have added to their application. However, there's only so much screen real estate available, so the buttons need to either be very small or be tucked away in some kind of nested menu, where the less frequently used buttons are hidden.
Problems with current voice user interfaces
Current voice solutions are built for turn-based voice assistant experiences. The end user's speech input is processed after they stop speaking and the answer is usually given by speech output. This works well for some use cases, but it can't be used for enhancing current applications. Speechly is built from the ground up for multimodal touch screen experiences.
Recovery from problems is cumbersome
The user says something like "Show me flights from New York to London, departing tomorrow", the machine waits for a while and shows flights from New Jersey to Longmore. With our React client, the user interface updates in real time and it's easy to see whether the system makes any mistakes and fix them either by using voice or by using touch.
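To make the idea of real-time correction concrete, here is a minimal sketch of how a flight search state could be updated as recognised entities arrive. The `Entity` and `FlightSearch` types and the `applyEntities` function are hypothetical names made up for this illustration; they are not taken from the Speechly client API.

```typescript
// Hypothetical shapes for illustration only; not the Speechly client API.
interface Entity {
  type: string;     // e.g. "from", "to", "date"
  value: string;    // e.g. "New York"
  isFinal: boolean; // false while the recogniser may still revise the value
}

interface FlightSearch {
  from?: string;
  to?: string;
  date?: string;
}

// Merge the entities received so far into the search state.
// A later entity of the same type overwrites the earlier one, so a
// correction (spoken or typed) simply replaces the previous value.
function applyEntities(state: FlightSearch, entities: Entity[]): FlightSearch {
  const next: FlightSearch = { ...state };
  for (const entity of entities) {
    switch (entity.type) {
      case "from":
        next.from = entity.value;
        break;
      case "to":
        next.to = entity.value;
        break;
      case "date":
        next.date = entity.value;
        break;
      default:
        // Unknown entity types are ignored in this sketch.
        break;
    }
  }
  return next;
}

// Simulated stream: a tentative mis-recognition followed by the corrected result.
let search: FlightSearch = {};
search = applyEntities(search, [
  { type: "from", value: "New Jersey", isFinal: false },
]);
search = applyEntities(search, [
  { type: "from", value: "New York", isFinal: true },
  { type: "to", value: "London", isFinal: true },
]);
```

Because the state is rebuilt on every update, the UI can render each intermediate value immediately, and the user sees a wrong guess as soon as it appears instead of after the whole utterance has been processed.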
Ambiguous endpointing and failing wake word detection lead to extra latency
If you have ever used a voice assistant, you'll know that a lot of the queries start with "Hey provider, turn on... Hey provider, turn on the... HEY PROVIDER, TURN ON THE LIGHTS". And once you stop speaking, it takes an unpredictable amount of time for the lights to actually turn on. With Speechly, the feedback is instantaneous and it's always clear when the service is listening.
Voice is a slow output channel
While most of us speak faster than we can write (especially on a touch screen device), we read faster than we listen. And if you don't hear a small detail in the middle of the sentence, it's hard to go back. For most tasks, it's better to see the result rather than hear it. That's why Speechly has been built to be multimodal from the ground up, meaning it supports all modalities simultaneously: touch, vision and voice.
Speechly React client
We've released a React client that helps developers and designers solve these issues when building React apps. You can find the source code on GitHub and the package on npm. We've also published a short tutorial to get you started with it, so go ahead and check it out!
When building the client, we've tried to make it easy to use with modern React concepts like Context and Hooks, which should make it easy to integrate into your React app. But if you're not interested in using functional components and Hooks, it isn't any more difficult: you can still use the regular Context consumer approach!
Developing on Speechly requires you to first create your voice UI configuration by using our web Dashboard or our command line tool. The configuration is done by providing example utterances that your end users would use to interact with the application. After you have configured the application, you can try it out in our Playground and finally integrate it with your application.
Once you have verified that you get the correct intents and entities for your utterances, add the React client to your application. You can use our React tutorial to get started.
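As a rough illustration of what an application might do with intents and entities once they arrive, the sketch below routes a recognised utterance to a handler by intent name. The `Segment` shape, the intent names `search_flights` and `clear_filters`, and the `handleSegment` function are all assumptions invented for this example; the actual data structures are described in the client documentation.

```typescript
// Hypothetical shapes for illustration only; not the Speechly client API.
interface Segment {
  intent: string;                               // e.g. "search_flights"
  entities: { type: string; value: string }[];  // recognised entities
  isFinal: boolean;                             // true once the utterance is complete
}

type Handler = (entities: Record<string, string>) => string;

// One handler per intent defined in the (assumed) voice UI configuration.
const handlers: Record<string, Handler> = {
  search_flights: (e) => `Searching flights from ${e.from} to ${e.to}`,
  clear_filters: () => "Filters cleared",
};

// Run the matching handler for a finished segment.
// Returns undefined for unfinished segments and unknown intents.
function handleSegment(segment: Segment): string | undefined {
  if (!segment.isFinal) {
    return undefined;
  }
  const handler = handlers[segment.intent];
  if (!handler) {
    return undefined;
  }
  // Index entities by type so handlers can look them up by name.
  const byType: Record<string, string> = {};
  for (const { type, value } of segment.entities) {
    byType[type] = value;
  }
  return handler(byType);
}

const result = handleSegment({
  intent: "search_flights",
  entities: [
    { type: "from", value: "New York" },
    { type: "to", value: "London" },
  ],
  isFinal: true,
});
```

Keeping the handlers in a plain lookup table means that adding a new intent to the configuration only requires adding one entry here.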