What is Speechly?

Ottomatias Peura

Nov 03, 2019

Speechly is a company, but also a developer tool for building real-time voice user interfaces for any application on any platform. Speechly is fast, smart and easy for developers to deploy, and its models are easy to train and adapt to specific use cases.

What is Speechly? That's one of the search phrases people use to find our website, and no wonder. Something like us is not easy to define, but let's try.

Speechly as a company

First of all, Speechly is a startup based in Helsinki, Finland. At the time of writing, it has about a dozen employees and co-founders and a brand new office in central Helsinki. Speechly was founded in 2017 by machine learning specialists who had been building voice assistants such as Siri and Alexa for major tech companies and realized that the technology could be developed into something a lot better than current smart speakers.

As a company, we believe in small, highly professional teams that are fully capable of organizing their work independently. We believe that a small team of motivated and smart people working on a big problem can achieve miracles and challenge the incumbents.

We believe that modern professionals know best how to plan their work: where to work, what tools to use, and so on. We also believe that our open and fair culture allows people to develop themselves and advance their careers, fueled by our fast growth, challenging work assignments and super-smart co-workers. We believe that everyone should laugh every day at work, as work should always be fun and motivating.

Working at Speechly is fun, rewarding and challenging. We are constantly looking for skilled speech scientists, senior software developers and other professionals. See our careers page for job openings.

Speechly as a technology

Speechly is also a technology: the tool our founders started to build in 2017. It's a set of tools and APIs that enable developers to build voice-enabled user interfaces for their apps and services on any platform. It can be used in [eCommerce](https://www.youtube.com/watch?v=MUAwpSDDd_Q), gaming, digital health, VR, point-of-sale terminals and more. We have ready-made SDKs for iOS, Android, Unreal Engine and the web. We call our tool a spoken language understanding tool.

With our technology, developers can add voice functionality to any app. You want to use voice to add products to the shopping cart? Speechly's got it. You want to use voice to control cameras or change strategies in a sports game? Speechly's got it. You want to create a self-service check-in desk where customers can use voice to register? Speechly's got it. You need voice search? We've got it. And it can power far more complex voice tasks, too.

One of the most distinctive ideas in our technology is that it combines speech recognition and natural language understanding. Most other providers do these tasks separately. Combining them allows our tool to be faster (almost real-time!) and more accurate than other providers. What does that mean?

When the user starts talking, most other providers start listening to the audio stream and transcribe the speech into text. Many of them do this pretty well, and after the user stops talking, they output a text string that represents the sentence the user just said. This is called speech recognition.

Once we have the sentence, we need to make sense of it. This step is called natural language understanding (NLU). Let's say we have two sentences, "turn off the lamp" and "shut the lights". Even though they look different as text and sound very different when said out loud, they mean the same thing.
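To make the NLU step concrete, here is a minimal sketch of mapping differently worded sentences to one intent. It is a toy keyword matcher, not how Speechly or any real NLU service works; the function name `extract_intent`, the intent label `turn_off` and the keyword sets are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical keyword sets, chosen only for this illustration.
TURN_OFF_PHRASES = {"turn off", "shut", "switch off"}
LIGHT_WORDS = {"lamp", "lamps", "light", "lights"}


def extract_intent(sentence: str) -> Optional[dict]:
    """Return an intent dict if the sentence asks to turn off a light, else None."""
    text = sentence.lower()
    # A phrase counts as present if it appears anywhere in the sentence.
    has_turn_off = any(phrase in text for phrase in TURN_OFF_PHRASES)
    # A light word must appear as a whole word.
    has_light = any(word in LIGHT_WORDS for word in text.split())
    if has_turn_off and has_light:
        return {"intent": "turn_off", "device": "light"}
    return None


if __name__ == "__main__":
    print(extract_intent("turn off the lamp"))
    print(extract_intent("shut the lights"))
```

Both example sentences produce the same intent dict, even though they share almost no words. A real system would rely on a trained model rather than hand-written keyword lists.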

Current smart speakers transform audio first to text and from text to intent.

Traditional providers work in two sequential steps. They first listen to the audio stream, and once that finishes, they send the resulting text to another service that extracts meaning from it. Once we have the meaning, we know what the user wants to do and can fulfill the request.

This sounds simple, but it's not how we humans talk. We don't wait for others to finish speaking and only then start working out what they meant. We listen and understand simultaneously.

If we hear someone saying "Olive, the other reindeer", we might guess that we misheard and that what we were supposed to hear is actually "All of the other reindeer" (at least if we were discussing Christmas carols or Rudolph the Red-Nosed Reindeer). Understanding and context make hearing easier. We also give feedback to others about our hearing and understanding as they speak: we nod, say "a-ha" or look confused based on what we have just heard (or rather what we think we heard).

Our technology does something similar. When the user starts talking, Speechly starts evaluating the intent of the user. When Speechly hears "Turn off --" it can already guess that the user intent is about shutting off something, maybe a light or other device somewhere. When the user finishes "-- living room TV", Speechly has all the information it needs and can start proceeding with the actual task of shutting it down.
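The idea of evaluating intent while the user is still talking can be sketched roughly as follows. This is a simplified illustration that works on a stream of already recognized words rather than audio, and it is not Speechly's actual implementation or API; the names `incremental_understanding`, `INTENTS` and `DEVICES` are assumptions made up for the example.

```python
from typing import Iterable, Iterator, List, Optional

# Hypothetical vocabularies, for illustration only.
INTENTS = {("turn", "off"): "turn_off", ("turn", "on"): "turn_on"}
DEVICES = {("living", "room", "tv"): "living_room_tv", ("lamp",): "lamp"}


def _find(words: List[str], table: dict) -> Optional[str]:
    """Return the value of the first key found as a contiguous run in words."""
    for key, value in table.items():
        n = len(key)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == key:
                return value
    return None


def incremental_understanding(word_stream: Iterable[str]) -> Iterator[dict]:
    """Yield an updated hypothesis after every incoming word."""
    words: List[str] = []
    for word in word_stream:
        words.append(word.lower())
        yield {
            "heard": " ".join(words),
            "intent": _find(words, INTENTS),
            "device": _find(words, DEVICES),
        }


if __name__ == "__main__":
    for hypothesis in incremental_understanding("Turn off living room TV".split()):
        print(hypothesis)
```

In this sketch the intent is already known after the second word, while the device is filled in only when the last word arrives. An application could use the early intent to prepare its response before the user has finished speaking.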

Speechly combines the mechanics of speech recognition and natural language understanding for a faster and more natural user experience.

Another feature of Speechly is multi-modality. By multi-modality we mean that users should be able to interact with their apps using different modalities in different situations. Say you want to order a pizza. You don't want to listen to a long list of the pizzas on the menu; you would rather see the list and select the one that looks best. But when it's time to give your delivery address, it's easier to say it out loud than to type it on a clumsy mobile phone screen. Multi-modality is the option to choose the best interaction type for each task. Our technology is built for multi-modality from the ground up.

Now our technology is getting ready for prime time. We have already done some amazing projects with companies such as the virtual reality company ZOAN and the major Nordic grocery retailer SOK. We have proven that the technology works, and now we are packaging the tool and documentation so that any developer can start using it.

Our public documentation will be launched in late 2019 or early 2020. You can already apply for a developer ID by emailing hello@speechly.com or filling in the form on our front page. And if you still don't know what Speechly is or what it does, please email us, too.

P.S. If you are coming to Slush, find us in the matchmaking tool and come and say hello!
