voice tech

Why Are Smart Speakers Not the Future of Voice?

Ottomatias Peura

Oct 23, 2019

3 min read

While smart speakers handle simple question-and-answer queries just fine, they are not the future of voice. What is?

Voice is not "getting big"; it's already huge. 20% of mobile searches are already made by voice, and Google Home and Amazon Alexa devices are slowly creeping into almost everyone's living room. Over 50 million Americans already own a dedicated smart speaker, and many more use a voice assistant built into their mobile phone. In fact, when you think of voice user interfaces, you are probably thinking of smart speakers and voice assistants.

Voice assistants can be pretty cool. They can answer basic questions and perform simple tasks, such as switching lights on and off or starting a robot vacuum, just fine. For the end user, voice assistants often just work. When they don't, however, the experience can be pretty confusing. Thankfully, most queries do work and we get our answer. And remember, these devices have existed for only about five years!

Instead of or in addition to this blog post, you can also watch this short video by our CEO Otto Söderlund.

Voice assistants for developers

From the perspective of someone who builds services on top of these assistants, however, things are not so simple. While the Alexa Skills Store boasts more than 100,000 voice apps, most of them see little actual use. Skill discovery is hard, and users are not accustomed to looking for apps for their smart speakers.

When you develop services on top of a third-party smart speaker platform, you are also asking your customers to say someone else's brand name (the wake word, such as "Alexa") every time before they interact with your app. That means you are placing your services and brand inside someone else's ecosystem, where your brand easily gets lost among competitors and competing services.

Besides, you are giving away your data for someone else to monetize, and the tech giants running the voice assistant show love to monetize other companies' data. There may be privacy issues, too.

Another crucial data-related issue is that while you might get some data back from the voice assistant provider, you get no data for unsuccessful queries. Say you have a skill for buying flights. If the assistant hears the user's request for "flights" as "lights", tough luck. How often does that happen to your users? You'll never know.

With voice UIs, these unsuccessful queries are often more important than the successful ones. If the user asked for a flight to London and the machine answered with a flight to London, there's not much to learn from the interaction. Developers should be focusing on the failures, but that's not possible with current solutions.

The APIs are also restrictive. It's possible to deep-link to your app and present custom content in the assistant app, but leveraging all interaction modalities (touch, vision, voice) is not. Developers can't extend their current apps' features with voice; they can only bring some of those features into a third-party ecosystem.

Speechly used in a mobile application

Custom voice user interfaces

At Speechly, we envision a different kind of future for voice. We think voice should be a feature, not an application. We think developers should be able to add amazing voice features to their existing apps and branded experiences without relying on third-party platforms. We think application developers should be able to keep users within their own ecosystem and keep full control of their own data.

Think of touch. When the iPhone launched, touch was available in all apps. It was not the case that other apps had to use a keypad while only a few Apple-approved features worked with touch. No. When the App Store arrived, touch was available to all developers. We want the same for voice.

That's why we are building Speechly. Our tool combines speech recognition and natural language understanding into a natural, real-time spoken language understanding API. Speechly is available for mobile applications, games and web applications. It can be used for a wide variety of use cases, from controlling virtual reality environments or games to e-commerce and mobile applications.

Current smart speakers transform audio first to text and from text to intent.

Speechly combines speech recognition and natural language processing for faster operation.
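The latency difference between the two designs can be illustrated with a toy sketch. This is not Speechly's implementation or API, just an illustration under loud assumptions: audio is simulated as a stream of already-recognized words, and a hypothetical keyword table stands in for a real language understanding model. The two-stage function waits for the full transcript before classifying it, while the streaming function re-evaluates the intent after every recognized word, so an application could react before the user has finished speaking.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical keyword table standing in for a real NLU model (assumption).
INTENT_KEYWORDS = {
    "flight": "book_flight",
    "flights": "book_flight",
    "lights": "switch_lights",
    "vacuum": "start_vacuum",
}


def classify(words: List[str]) -> Optional[str]:
    """Return the intent of the first known keyword, or None if none matches."""
    for word in words:
        if word in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[word]
    return None


def two_stage(recognized_words: Iterable[str]) -> Optional[str]:
    """Audio -> full text first, then text -> intent."""
    transcript = list(recognized_words)  # stage 1: wait for the whole utterance
    return classify(transcript)          # stage 2: classify the finished text


def streaming(recognized_words: Iterable[str]) -> Iterator[Tuple[int, Optional[str]]]:
    """Re-classify after every recognized word, yielding (words_heard, intent)."""
    heard: List[str] = []
    for word in recognized_words:
        heard.append(word)
        yield len(heard), classify(heard)


if __name__ == "__main__":
    utterance = ["book", "a", "flight", "to", "london"]
    print(two_stage(utterance))
    for words_heard, intent in streaming(utterance):
        print(words_heard, intent)
```

For the utterance "book a flight to london", the two-stage version only produces book_flight once all five words are in, whereas the streaming version already yields it after the third word. A real system would also revise its guess as more words arrive; the sketch simply recomputes it from scratch each time.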

App developers have full access to their user data and can train their models for specific acoustic environments and vocabularies. This ensures fast, well-working and natural voice user interfaces that would not be possible with current solutions.

For example, in grocery e-commerce we have already proven that our technology allows retailers to create shopping experiences that are five to ten times faster than current solutions based on online stores or smart speakers.

If you are interested in building something with Speechly, contact us at hello [at] or apply for an application ID from our front page!
