What is Speechly?

Ottomatias Peura

Nov 03, 2019

5 min read

Speechly is a company, but also a developer tool for creating real-time voice user interfaces for any application on any platform. Speechly is fast, smart and easy for developers to deploy, and its models are easy to train and adapt to specific use cases.


What is Speechly? That's one of the phrases people use to find our website, and no wonder. It's not easy to define something like us, but let's try.

Speechly as a company

First of all, Speechly is a startup based in Helsinki, Finland. At the time of writing, it has about a dozen employees and co-founders and a brand new office in central Helsinki. Speechly was founded in 2017 by machine learning specialists who had been building voice assistants such as Siri and Alexa for major tech companies. They realized that the technology could be developed into something a lot better than current smart speakers.

As a company, we believe in small super professional teams who are fully capable of organizing their work independently. We believe that a small team of motivated and smart people working on solving a big problem can achieve miracles and challenge the incumbents.

We believe that modern professionals know best how to plan their work: where to work, what tools to use and so on. We also believe that our open and fair culture allows people to develop themselves and advance their careers, fueled by our fast growth, challenging work assignments and super-smart co-workers. We believe that everyone should laugh every day at work, as work should always be fun and motivating.

Working at Speechly is fun, rewarding and challenging. We are constantly looking for skilled speech scientists, senior software developers and other professionals. See our careers page for job openings.

Speechly as a technology

Speechly is also a technology: the tool our founders started to build in 2017. It's a set of tools and APIs that enable developers to build voice-enabled user interfaces for their apps and services on any platform. It can be used in [eCommerce](https://www.youtube.com/watch?v=MUAwpSDDd_Q), gaming, digital health, VR, point-of-sale terminals and more. We have ready-made SDKs for iOS, Android, Unreal Engine and web. We call our tool a spoken language understanding tool.

With our technology, developers can add voice functionality to any app. You want to use voice to add products to the shopping cart? Speechly's got it. You want to use voice to control cameras or change strategies in a sports game? Speechly's got it. You want to create a self-service check-in desk where customers can use voice to check in? Speechly's got it. You need voice search? We've got it. And it can be used to power far more complex voice tasks, too.

One of the most distinctive ideas in our technology is that it combines speech recognition and natural language understanding. Most other providers do these tasks separately. Combining them allows our tool to be faster (almost real-time!) and more accurate than other providers. What does this mean?

When the user starts talking, most other providers start listening to the audio stream and transcribe the speech into text. Many of them do this pretty well, and after the user stops talking, they output a text string that represents the sentence the user just said. This is called speech recognition.

After we have the sentence, we need to make sense of it. This step is called natural language understanding (NLU). Let's say we have two sentences, "turn off the lamp" and "shut the lights". Even though they look different as text and sound very different when said out loud, they mean the same thing.
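To make the NLU step concrete, here is a minimal sketch in Python. It is a toy keyword matcher, not Speechly's model or API: the pattern tables, the intent and device labels, and the `understand` function are all assumptions made up for illustration. It only shows how two differently worded sentences can end up with the same meaning.

```python
import re

# Hypothetical toy rules, not Speechly's model: each label lists
# alternative keyword patterns that signal it.
INTENT_PATTERNS = {
    "turn_off": [r"\bturn off\b", r"\bshut\b", r"\bswitch off\b"],
    "turn_on": [r"\bturn on\b", r"\bswitch on\b"],
}
DEVICE_PATTERNS = {
    "light": [r"\blamps?\b", r"\blights?\b"],
    "tv": [r"\btv\b", r"\btelevision\b"],
}


def first_match(text, table):
    """Return the first label whose patterns match the text, else None."""
    for label, patterns in table.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return label
    return None


def understand(sentence):
    """Map an already recognized sentence to an (intent, device) pair."""
    text = sentence.lower()
    return first_match(text, INTENT_PATTERNS), first_match(text, DEVICE_PATTERNS)


# Different wording, same meaning.
print(understand("turn off the lamp"))
print(understand("shut the lights"))
```

Both calls return the same pair, which is the point of the step: the application reacts to the meaning, not to the exact words. A real NLU system would learn these mappings from data rather than rely on hand-written patterns.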

Current smart speakers transform audio first to text and from text to intent.

Traditional providers work in two separate steps. They first listen to the audio stream, and once that finishes, they send the resulting text to another service that extracts the meaning from it. Once we have the meaning, we know what the user wants to do and can fulfill the request.

This sounds simple, but it's not how we humans talk. We don't wait for others to finish what they want to say and only then start thinking about what they meant. We listen and understand simultaneously.

If we hear someone saying "Olive, the other reindeer", we might guess that we misheard and that what we were supposed to hear was actually "All of the other reindeer" (at least if we were having a discussion about Christmas carols or Rudolph the Red-Nosed Reindeer). Understanding and context make hearing easier. We also give feedback to others about our hearing and understanding as they speak: we nod, say "a-ha" or look confused based on what we have just heard (or rather what we think we heard).

Our technology does something similar. When the user starts talking, Speechly starts evaluating the intent of the user. When Speechly hears "Turn off --" it can already guess that the user intent is about shutting off something, maybe a light or other device somewhere. When the user finishes "-- living room TV", Speechly has all the information it needs and can start proceeding with the actual task of shutting it down.
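The idea of understanding while listening can be sketched in a few lines of Python. This is a simplified illustration under loud assumptions: words arrive one at a time as text (a real system works on audio), and the `INTENTS`, `TARGETS` and `FILLERS` tables and the `streaming_understand` function are hypothetical names, not part of Speechly's SDK.

```python
# Hypothetical lookup tables for illustration only.
INTENTS = {("turn", "off"): "turn_off", ("turn", "on"): "turn_on"}
TARGETS = {"living room tv", "kitchen lights"}
FILLERS = {"the", "a", "an"}


def streaming_understand(words):
    """Yield (partial_transcript, intent, target) after every incoming word."""
    heard = []
    intent = None
    target = None
    for word in words:
        heard.append(word.lower())
        if intent is None:
            # Guess the intent as soon as the first two words are known.
            intent = INTENTS.get(tuple(heard[:2]))
        elif target is None:
            # Keep refining the target from the words after the intent phrase.
            content = [w for w in heard[2:] if w not in FILLERS]
            candidate = " ".join(content)
            if candidate in TARGETS:
                target = candidate
        yield " ".join(heard), intent, target


# The hypothesis is updated word by word instead of once at the end.
for transcript, intent, target in streaming_understand(
    "Turn off the living room TV".split()
):
    print(f"{transcript!r}: intent={intent}, target={target}")
```

In this sketch the intent is already available after "Turn off", so an application could start giving feedback before the user finishes the sentence, and it can act as soon as the target is known.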

Speechly combines the mechanics of speech recognition and natural language understanding for a faster and more natural user experience.

Another feature of Speechly is multi-modality. By multi-modality we mean that users should be able to interact with their apps using different modalities in different situations. Say you want to order a pizza. You don't want to hear a long list of the pizzas on the menu, but rather see the list and select the one that looks best. But when it's time to give your delivery address, it's easier to say it out loud than to type it on a clumsy mobile phone screen. Multi-modality is the option to choose the best interaction type for each task. Our technology is built for multi-modality from the ground up.

Now our technology is getting ready for prime time. We have already done some amazing projects with companies such as the virtual reality company ZOAN and the major Nordic grocery retailer SOK. We have proven that the technology works, and now we are packaging the tool and documentation so that any developer can start using it.

Our public documentation will be launched in late 2019 or early 2020. You can already apply for a developer ID by sending an email to hello@speechly.com or filling in the form on our front page. And if you still don't know what Speechly is or what it does, please send us an email, too.

P.S. If you are coming to Slush, find us in the matchmaking tool and come and say hello!

About Speechly

Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise and on-device), highly accurate custom models for any domain, and the privacy and scalability needed for hundreds of thousands of hours of audio.
