What is Speechly? That’s one of the keywords people use to find our website, and it’s no wonder. It’s not easy to define something like us, but let’s try.
Speechly as a company
First of all, Speechly is a startup based in Helsinki, Finland. As of writing this post, it has about a dozen skilled employees and co-founders and a brand new office in central Helsinki. Speechly was founded in 2017 by machine learning specialists who had been building voice assistants such as Siri and Alexa for major tech companies and realized that the technology could be developed into something a lot better than current smart speakers.
As a company, we believe in small, highly professional teams that are fully capable of organizing their work independently. We believe that a small team of motivated and smart people working on a big problem can achieve miracles and challenge the incumbents. We believe that modern professionals know best how to plan their work: where to work, what tools to use, and so on. We also believe that our open and fair culture allows people to develop themselves and advance their careers, fueled by our fast growth, challenging work assignments and super-smart co-workers. And we believe that everyone should laugh every day at work, as work should always be fun and motivating.
Working at Speechly is fun, rewarding and challenging. We are constantly looking for skilled speech scientists, senior software developers, and other professionals. See our careers page for job openings.
Speechly as a technology
Speechly is also a technology: the tool our founders started to build in 2017. It’s a set of tools and APIs that enable developers to build voice-enabled user interfaces for their apps and services on any platform. It can be used in eCommerce, gaming, digital health, VR, point-of-sale terminals and more. We have ready-made SDKs for iOS, Android, Unreal Engine and the web. We call it a spoken language understanding tool.
With our technology, developers can add voice functionality to any app. You want to use voice to add products to the shopping cart? Speechly’s got it. You want to use voice to control cameras or change strategies in a sports game? Speechly’s got it. You want to create a self-service check-in desk where customers can use voice to register? Speechly’s got it. And it can be used to power far more complex voice tasks.
One of the most distinctive ideas in our technology is that it combines speech recognition and natural language understanding. Most other providers do these tasks separately. Combining them allows our tool to be faster (almost real-time!) and more accurate than other providers. What does that mean?
When the user starts talking, most other providers start listening to the audio stream and translate the speech into text. Many of them do this pretty well, and after the user stops talking, they give out a text string that represents the sentence the user just said. This is called speech recognition.
After we have the sentence, we need to make sense of it. This step is called natural language understanding (NLU). Let’s say we have two sentences, “turn off the lamp” and “shut the lights”. Even though they look different as text and sound very different when said out loud, they mean the same thing.
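To make the NLU step concrete, here is a minimal sketch of mapping both example sentences to the same meaning. It is a toy built on hand-written keyword rules; the function name `understand`, the phrase lists and the output fields are all made up for illustration, and it is neither how a real NLU model works nor Speechly's API.

```python
from typing import Optional

# Hypothetical keyword lists, chosen only to cover the two example sentences.
TURN_OFF_PHRASES = ("turn off", "switch off", "shut")
LIGHT_WORDS = ("lamp", "lamps", "light", "lights")


def understand(sentence: str) -> Optional[dict]:
    """Map a sentence to a structured meaning, or None if no rule matches."""
    # Normalize case and whitespace, then pad so only whole words match.
    padded = " " + " ".join(sentence.lower().split()) + " "
    wants_off = any(f" {phrase} " in padded for phrase in TURN_OFF_PHRASES)
    about_light = any(f" {word} " in padded for word in LIGHT_WORDS)
    if wants_off and about_light:
        return {"intent": "turn_off", "device": "light"}
    return None


first = understand("turn off the lamp")
second = understand("shut the lights")
print(first)
print(first == second)
```

The two sentences share no words except "the", yet both come out as the same intent and device, which is what the app actually needs in order to act.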
Traditional providers work as shown in the picture. They first listen to the audio stream and, once that finishes, send the data to another service that extracts meaning out of it. Once we have the meaning, we know what the user wants to do and can fulfill the request.
This sounds simple, but it’s not how we humans talk. We don’t wait for others to finish what they want to say and only then think about what they were trying to say. We listen and understand simultaneously.
If we hear someone saying “Olive, the other reindeer”, we might guess that we misheard and that what we were supposed to hear is actually “All of the other reindeer” (at least if we were having a discussion about Christmas carols or Rudolph the Red-Nosed Reindeer). Understanding and context make hearing easier. We also give feedback to others about our hearing and understanding as they speak: we nod, say “a-ha” or look confused based on what we have just heard (or rather what we think we heard).
Our technology does something similar. When the user starts talking, Speechly starts evaluating the user’s intent. When Speechly hears “Turn off –” it can already guess that the intent is about shutting off something, maybe a light or another device somewhere. When the user finishes with “– living room TV”, Speechly has all the information it needs and can proceed with the actual task of turning it off.
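The idea of understanding while listening can be sketched in a few lines. The example below assumes the words arrive one at a time and re-evaluates a guess after each one. The function `interpret_stream`, the device list and the prefix rule are hypothetical stand-ins for illustration only; they say nothing about how Speechly is implemented, and a real system works on audio rather than ready-made words.

```python
from typing import Iterable, Iterator

# Hypothetical device names; more specific names are listed first.
DEVICES = ("living room tv", "tv", "lights", "light", "lamp")


def interpret_stream(words: Iterable[str]) -> Iterator[dict]:
    """Yield an updated guess after every word instead of waiting for the end."""
    heard = []
    for word in words:
        heard.append(word.lower())
        text = " ".join(heard)
        # Toy rule: an utterance that starts with "turn off" is a turn_off intent.
        intent = "turn_off" if text.startswith("turn off") else None
        device = None
        if intent:
            # Pick the first known device mentioned so far, if any.
            device = next((d for d in DEVICES if d in text), None)
        yield {"heard": text, "intent": intent, "device": device}


for update in interpret_stream("Turn off living room TV".split()):
    print(update)
```

In this sketch the intent is already known after the second word, while the device stays empty until the last word arrives. An app could use those early updates to give feedback while the user is still speaking.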
Another feature of Speechly is multi-modality. By multi-modality we mean that users should be able to interact with their apps using different modalities in different situations. Say you want to order a pizza. You don’t want to hear a long list of the pizzas on the menu; you would rather see the list and select the one that looks the best. But when it’s time to give your address for delivery, it’s easier to say it out loud than to type it on a clumsy mobile phone screen. Multi-modality is the option to choose the best interaction type for each task. Our technology is built for multi-modality from the ground up.
Now our technology is getting ready for prime time. We have already done some amazing projects with companies such as the virtual reality company ZOAN and the major Nordic grocery retailer SOK. We have proven that the technology works, and now we are packaging the tool and documentation so that any developer can start using it.
Our public documentation will be launched in late 2019 or early 2020. You can already apply for a developer ID by sending an email to firstname.lastname@example.org or filling in the form on our front page. And if you still don’t know what Speechly is or what it does, please send us an email, too.
P.S. If you are coming to Slush, find us in the matchmaking tool and come and say hello!