Our team made some great progress in 2020 in enabling all developers to become voice developers.
Speechly has now existed for about five years. We are a team of 13 experienced software developers and machine learning experts, and for most of those five years we’ve been operating in stealth mode, focusing on building our core technologies. Now it’s time to share what we’ve achieved so far.
We are building a developer tool for improving the touch screen user experience with voice functionality. We don’t believe that smart speakers and voice assistants are the best use case for voice. Instead, voice should be thought of as an add-on to the user interfaces of current mobile applications and websites. Voice is a modality, not a complete user interface.
Touch screen user interfaces definitely need improvement: while selecting from a few options is easy, selecting, for example, 30 items from an inventory of 20,000 is pretty cumbersome.
Typing is notoriously hard, too. Most humans speak about three times faster than they type, and with fewer errors. In short, voice is a great solution for information-heavy tasks. While there are good solutions for speech recognition, there are really no tools that enable developers to build the kind of user interfaces we’ve envisioned for voice.
2020 was the first year we really published something out in the wild. We’ve been building our technology for the past five years, and Speechly is finally at a stage where a developer can configure a model, integrate it into their application and build an awesome voice user interface. In this post, I’ll summarize our achievements.
1 Spoken Language Understanding accuracy matching Google
We run our own ASR and NLU technologies that provide both a transcript and its meaning (intents and entities) in real time. During 2020 we achieved significant increases in both ASR and NLU accuracy.
We evaluate the accuracy of our engine by transcribing the data we receive with both our own engine and the Google Cloud Speech API. Based on our results, our Spoken Language Understanding is 15% more accurate than Google in a typical voice user interface task.
Because ASR is a hard task, this is not to claim that our technology is better than Google’s in all cases. It means that when building voice user interfaces, Speechly outperforms Google in most cases, even without training the model separately for a certain use case.
In a real case, Speechly can further be optimized by using the actual user data for retraining the model. This improves the accuracy typically by another 10-15%.
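As an illustration of how two engines can be compared on the same audio, here is a minimal sketch. It assumes word error rate (WER) as the metric and a human-checked reference transcript for each utterance; the post does not state which metric or ground truth is actually used, so the function names and the example numbers below are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    WER is an assumed metric here; the post does not name the one it uses.
    """
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein dynamic programme, computed over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction by which the candidate lowers the baseline's error rate."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - candidate_wer) / baseline_wer
```

With these helpers, a hypothetical baseline WER of 0.20 and a candidate WER of 0.17 would correspond to a relative error reduction of about 15%. This is only one of several ways a "percent more accurate" figure can be defined.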
2 Client libraries for most important web and mobile platforms
During 2020, we published three client libraries that make integrating Speechly into an application simple and fast. Handling the gRPC API, real-time audio streaming and, of course, parsing the results is a cumbersome task, and the client libraries take most of that workload off developers.
Our browser client can be used in any web application running in a modern browser, and the React client makes development with the React framework even easier.
For iOS, we released the iOS client, and our Android client will be published very soon. After that, developers can easily build a unified voice user interface on all major platforms.
We have created a simple tutorial application for each of the client libraries to provide a gradual learning curve on every platform.
3 Demos showcasing our technology
Speechly is a tool for building real-time voice functionality that integrates seamlessly into existing touch or web user interfaces.
We don’t think smart speakers or “voice-only” solutions are the best way to use voice. Instead, we advocate multimodality and real-time visual feedback.
4 Speechly Annotation Language features for configuring voice user interfaces
Our Speechly Annotation Language (SAL) is a syntax we use to annotate example utterances that are used to train our models. In 2020 we added many new features to SAL:
Canonical entities for easy handling of synonyms
Lookup tables for handling large inventories
With these features, developers and designers can create complex voice user interfaces with a minimal amount of example utterances. Because the same model can be used on all platforms, the user experience is unified.
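The idea behind canonical entities and lookup tables can be sketched in plain Python. This is not SAL syntax and not how Speechly implements these features; the inventory, the function names and the normalization rule are all hypothetical, chosen only to show how many spoken variants can map to one canonical value.

```python
from typing import Dict, List, Optional

# Hypothetical inventory: canonical value -> spoken synonyms.
INVENTORY: Dict[str, List[str]] = {
    "t-shirt": ["tee", "t shirt", "tee shirt"],
    "trainers": ["sneakers", "running shoes"],
}


def build_lookup(inventory: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten the inventory into a surface-form -> canonical-value table."""
    lookup: Dict[str, str] = {}
    for canonical, synonyms in inventory.items():
        lookup[canonical.lower()] = canonical
        for synonym in synonyms:
            lookup[synonym.lower()] = canonical
    return lookup


def canonicalize(spoken: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the canonical value for a spoken entity, or None if unknown."""
    # Lower-case and collapse whitespace so "Running  Shoes" still matches.
    key = " ".join(spoken.lower().split())
    return lookup.get(key)


LOOKUP = build_lookup(INVENTORY)
```

Here `canonicalize("Running  Shoes", LOOKUP)` resolves to `"trainers"`, so application code only ever has to handle the canonical value, however the user phrased it.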
5 Improved latency in our gRPC API
When the iPhone nailed the user experience with the touch screen, one of the key features was a very responsive user interface that reacted immediately to user input. This is a key issue for voice user interfaces, too.
We improved our latency significantly in 2020, and we can now proudly say that our API is real-time, with tail latency under 200 milliseconds.
Low latency is the key to an intuitive user experience in two ways: first, it enables users to correct themselves naturally by voice, and second, it encourages them to continue with the voice experience.
Compare this to the traditional smart speaker experience, which starts with uttering a wake word, a step that sometimes fails. Once the wake word is recognized and the user starts speaking, they’ll know whether they were understood only after they have stopped speaking and the system has processed the input. If the answer is wrong, the user needs to start again from the beginning.
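To make the tail-latency figure above concrete, here is a minimal sketch of checking measured response times against a 200 millisecond budget. It assumes "tail latency" means the 99th percentile and uses the nearest-rank method; the post specifies neither, and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a set of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_budget(samples_ms: Sequence[float],
                         budget_ms: float = 200.0,
                         pct: float = 99.0) -> bool:
    """True if the chosen percentile (p99 by assumption) is under the budget."""
    return percentile(samples_ms, pct) < budget_ms
```

Looking at a high percentile rather than the average matters for voice interfaces: a handful of slow responses is enough to make the interface feel unresponsive, even when the mean looks fine.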
6 Speechly Dashboard
In March we published the first version of the Speechly Dashboard, a web application for building and configuring Spoken Language Understanding models with the Speechly Annotation Language.
The Dashboard supports nearly all Speechly features, and it’s the fastest way to get up to speed with our technology. Hundreds of developers have already created their models and tried them out in the Speechly Playground.
We renewed our website to better position our product and hired many new developers and machine learning experts. Our founders have been interviewed on many industry-leading podcasts, and we were nominated as one of Europe’s Hottest Startups.
If you want to work with us and build awesome developer tools for next-generation voice user interfaces, please check out our careers page.
Overall, we are pretty happy with our 2020. We’ve now built a technology stack that enables efficient user interfaces that improve user experience significantly. In 2021 we will focus on showing the world some cool examples of our technology.