By Ottomatias Peura
October 23, 2019

Voice is not “getting big”; it’s already huge. Twenty percent of mobile searches are already made by voice, and Google Home and Alexa devices are slowly creeping into almost everyone’s living rooms. Over 50 million Americans already own a dedicated smart speaker, and many more use a voice assistant built into their mobile phone. In fact, when you think of voice user interfaces, you are probably thinking of smart speakers and voice assistants.

Voice assistants can be pretty cool. They can answer basic questions and handle simple tasks, such as switching lights on and off or starting a robot vacuum, just fine. For the end user, voice assistants often just work. When they don’t, however, the experience can be pretty confusing. Thankfully, most queries do work and we get our answer. And remember, these devices have existed for only about five years!

Instead of or in addition to this blog post, you can also watch this short video by our CEO Otto Söderlund.

Voice assistants for developers

However, from the perspective of someone who’s going to build services on top of these assistants, things are not so simple. While the Alexa Skill Store can boast more than 100,000 voice apps, most of these apps are not actually used. Skill discovery is hard, and users are not accustomed to looking for apps for smart speakers.

When you develop services on top of a third-party smart speaker platform, you are also asking your customers to repeat someone else’s brand words every time before interacting with your app. That means you are putting your services and brand inside someone else’s ecosystem, where your brand easily gets lost among competitors and competing services.

Besides, you are giving away your data for someone else to monetize. And the major tech giants running the voice assistant show love to monetize others’ data. There might be privacy issues, too.

Another crucial data issue is that, while you might get some data back from the voice assistant provider, you don’t get data for unsuccessful queries. Say you have a skill for buying flights. If the assistant hears the user’s request for flights as “lights”, tough luck. How often does that happen for your users? You’ll never know.

With voice UIs, these unsuccessful queries are often more important than the successful ones. If the speaker understood that the user asked for a flight to London and answered with a flight to London, there’s not much to learn from the interaction. It’s the failures that developers should be focusing on, but that’s not possible with current solutions.
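To make the point about failures concrete, here is a minimal sketch of what owning the query data could look like: every utterance is recorded, including the ones that resolve to no intent, so the failure rate can actually be measured. The `VoiceQuery`, `QueryLog` and `resolve_intent` names are hypothetical and made up for this illustration, and the keyword matcher is a toy stand-in for real language understanding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceQuery:
    transcript: str          # what the recognizer heard
    intent: Optional[str]    # None when no intent could be resolved


@dataclass
class QueryLog:
    entries: List[VoiceQuery] = field(default_factory=list)

    def record(self, query: VoiceQuery) -> None:
        # Keep every query, successful or not.
        self.entries.append(query)

    def failures(self) -> List[VoiceQuery]:
        return [q for q in self.entries if q.intent is None]

    def failure_rate(self) -> float:
        if not self.entries:
            return 0.0
        return len(self.failures()) / len(self.entries)


def resolve_intent(transcript: str) -> Optional[str]:
    # Toy keyword matcher for a hypothetical flight-booking app.
    if "flight" in transcript.lower():
        return "book_flight"
    return None


log = QueryLog()
# The second utterance simulates "flight" being misheard as "light".
for heard in ["book a flight to London", "book a light to London"]:
    log.record(VoiceQuery(heard, resolve_intent(heard)))

print(log.failure_rate())
print([q.transcript for q in log.failures()])
```

With a log like this, the misheard “light” query shows up as a failure that can be inspected, rather than disappearing inside a third-party platform.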

The APIs are also restrictive. It’s possible to deep-link to your app and present custom content in the assistant app, but leveraging all interaction modalities (touch, vision, voice) is not possible. Developers can’t extend their current apps’ features with voice; they can only bring some of those features into a third-party ecosystem.

Image: Speechly used in a mobile application

Custom voice user interfaces

At Speechly, we envision a different kind of future for voice. We think voice should be a feature, not an application. We think developers should be able to add amazing voice features to their existing apps and branded experiences without relying on third-party platforms. We think application developers should be able to keep users within their own ecosystem and keep full control of their own data.

Think of touch. When the iPhone launched, touch was available in all the apps. It was not the case that most apps had to use a keypad while only a few Apple-approved things worked in touch mode. No. When the App Store arrived, touch was available to all developers. We want the same for voice.

That’s why we are building Speechly. Our tool combines speech recognition and natural language understanding into a natural, real-time spoken language understanding API. Speechly is available for mobile applications, games and web applications. It can be used for a wide variety of use cases, from controlling virtual reality environments or games to ecommerce and mobile applications.

Image: Current smart speakers transform audio first to text and from text to intent.

Image: Speechly combines speech recognition and natural language processing for faster operation.
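The difference between the two diagrams can be sketched with a toy example. This is not Speechly’s implementation: it only illustrates, under simplifying assumptions, why updating the intent while the words are still arriving can respond sooner than waiting for a finished transcript. Recognized words stand in for audio, and `KEYWORD_INTENTS`, `two_stage` and `streaming` are hypothetical names invented for this sketch.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical keyword -> intent table, for illustration only.
KEYWORD_INTENTS = {"flight": "book_flight", "milk": "add_to_cart"}


def intent_from_words(words: Iterable[str]) -> Optional[str]:
    # Return the intent of the first known keyword, if any.
    for word in words:
        intent = KEYWORD_INTENTS.get(word.lower())
        if intent is not None:
            return intent
    return None


def two_stage(words: List[str]) -> Tuple[Optional[str], int]:
    """Build the full transcript first, then classify it."""
    transcript = " ".join(words)                     # stage 1: speech -> text (simulated)
    intent = intent_from_words(transcript.split())   # stage 2: text -> intent
    return intent, len(words)                        # every word was consumed first


def streaming(words: List[str]) -> Tuple[Optional[str], int]:
    """Check for an intent after every incoming word."""
    for consumed, word in enumerate(words, start=1):
        intent = intent_from_words([word])
        if intent is not None:
            return intent, consumed                  # stop as soon as the intent is known
    return None, len(words)


utterance = "book a flight to London tomorrow".split()
print(two_stage(utterance))   # ('book_flight', 6)
print(streaming(utterance))   # ('book_flight', 3)
```

Both functions reach the same intent, but the streaming version knows it after three words instead of six, which is the kind of responsiveness a combined, real-time approach aims for.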

App developers have full access to user data and can train their models for specific acoustic environments and vocabularies. This ensures fast, well-working and natural voice user interfaces, which would not be possible with current solutions.

For example, in grocery ecommerce we have already proven that our technology allows retailers to create shopping experiences that are five to ten times faster than current solutions based on online stores or smart speakers.

If you are interested in building something with Speechly, contact us at hello [at] speechly.com or apply for an application ID from our front page!