
When to Run Speech to Text On-Device or On-Premise vs in the Cloud

Antti Ukkonen

Sep 06, 2022


When deciding whether to deploy Speech to Text technology On-Device or in the Cloud, you should consider Cost, Speed, and Privacy.


Speech to Text technology can be deployed in the Cloud, On-Device, or On-Premise (on a server or in a private cloud). Each option has Pros and Cons that affect the Cost, Speed, and Privacy of the experience you build. In this post, we cover the differences between Cloud, On-Device, and On-Premise Speech to Text deployment, and the scenarios where you should consider ditching the Cloud for an On-Device or On-Premise deployment.

Speech to Text: On-Device vs On-Premise vs Cloud

Whether you run Speech to Text On-Device, On-Premise, or in the Cloud, the core outcome remains the same: Speech to Text lets developers convert audio to text. Transcription for Video Calls and Moderation for Video Game chats are two examples among many use cases.

Speech to Text can be deployed in multiple ways. The most common is in the Cloud, where audio is converted into text with the help of a cloud provider such as Google or Amazon. The audio is captured on the user's device and sent to the cloud, where it is transcribed and processed according to the developer's instructions. The result is then sent back to the user's device.

The other ways of deploying Speech to Text are On-Device and On-Premise. Here, Transcription takes place directly on the user's device running the application, or within a company's private server stack or private cloud. The use cases are similar, since at the core there is still the conversion of audio into text, but deploying this way comes with some additional benefits to consider.


When to run Speech to Text On-Device or On-Premise

Running Speech to Text On-Device or On-Premise has 3 main benefits: Cost, Speed, & Privacy.


Cost

Most Speech to Text or Speech Recognition solutions are Cloud-based products. However, running Speech to Text in the Cloud requires sending large amounts of audio over the internet to be processed. For use cases where there is a lot of audio to transcribe, such as a Video Call or Stream, the cost can climb fast enough to make Speech to Text an unviable feature. Running Speech to Text directly on the user's device or On-Premise can bring the cost down by up to 10x, depending on the provider.
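To make the volume effect concrete, here is a minimal sketch of how usage-based pricing scales with audio hours. It assumes, purely for illustration, that the Cloud option is billed per transcribed minute and that the On-Device or On-Premise option is billed as a flat monthly amount. The function names and both prices are hypothetical placeholders, not quotes from any provider.

```python
def cloud_cost(audio_hours: float, price_per_minute: float) -> float:
    """Usage-based cost: every transcribed minute of audio is billed."""
    return audio_hours * 60 * price_per_minute


def break_even_hours(fixed_monthly_cost: float, price_per_minute: float) -> float:
    """Monthly audio hours above which the flat-cost option is cheaper."""
    if price_per_minute <= 0:
        raise ValueError("price_per_minute must be positive")
    return fixed_monthly_cost / (price_per_minute * 60)


if __name__ == "__main__":
    # Hypothetical placeholder prices; substitute real quotes before deciding.
    PRICE_PER_MINUTE = 0.02      # assumed per-minute Cloud price
    FIXED_MONTHLY_COST = 1200.0  # assumed flat On-Device / On-Premise cost

    for hours in (10, 100, 1_000, 10_000):
        cost = cloud_cost(hours, PRICE_PER_MINUTE)
        print(f"{hours:>6} h/month: cloud {cost:>10,.2f} vs flat {FIXED_MONTHLY_COST:,.2f}")

    hours = break_even_hours(FIXED_MONTHLY_COST, PRICE_PER_MINUTE)
    print(f"break-even at roughly {hours:,.0f} audio hours per month")
```

The point of the sketch is the shape of the curve rather than the numbers: the usage-based cost grows linearly with audio volume, so past some break-even volume a deployment whose cost does not scale with minutes becomes the cheaper one.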


Speed

Another key pitfall with many Cloud-based Speech Recognition providers is the inability to deliver real-time Speech to Text. Even at today's network speeds, sending audio to the cloud and waiting for the result adds a noticeable lag in the majority of Speech to Text products, and that lag disrupts the end-user experience. Running Speech to Text On-Device or On-Premise is a great way to speed up transcription, since the audio never has to leave the end user's device or the company's product ecosystem.
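A rough latency budget shows where the lag comes from. The sketch below assumes uncompressed 16 kHz, 16-bit mono audio sent in fixed-size chunks, and it assumes the model takes the same time to run in both places, which may not hold on a slower device. All function names and example figures are hypothetical.

```python
def chunk_bytes(duration_s: float, sample_rate_hz: int = 16_000,
                bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of an uncompressed PCM audio chunk."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


def upload_ms(num_bytes: int, uplink_mbps: float) -> float:
    """Time to push num_bytes through an uplink of the given speed."""
    return num_bytes * 8 / (uplink_mbps * 1_000_000) * 1000


def cloud_latency_ms(duration_s: float, uplink_mbps: float,
                     round_trip_ms: float, inference_ms: float) -> float:
    """Upload time plus network round trip plus model inference."""
    return upload_ms(chunk_bytes(duration_s), uplink_mbps) + round_trip_ms + inference_ms


def local_latency_ms(inference_ms: float) -> float:
    """On-Device: the audio never crosses the network, so only inference remains."""
    return inference_ms


if __name__ == "__main__":
    # Hypothetical figures: 1 s chunks, 1 Mbps uplink, 80 ms round trip, 50 ms inference.
    print(f"cloud: {cloud_latency_ms(1.0, 1.0, 80.0, 50.0):.0f} ms per chunk")
    print(f"local: {local_latency_ms(50.0):.0f} ms per chunk")
```

In this model the network terms are pure overhead that a local deployment avoids. Whether that outweighs slower on-device inference depends on the hardware and the model, so the inference figure is worth measuring on real target devices.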


Privacy

The final, and arguably most important, reason to run Speech to Text On-Device or On-Premise is Privacy. We live in a world where consumers' attention to privacy is at an all-time high. Even the idea of technology listening in order to complete a task like transcription can make people uncomfortable.

Running On-Device or On-Premise lets companies build experiences that leverage Speech to Text while giving users confidence that their Voice Data remains private, either because it never leaves their device or because it stays secure with the company delivering the experience.

Speech to Text Accuracy: On-Device vs On-Premise vs Cloud

Speech to Text technology is powered by large Machine Learning models, which has historically made it difficult to deliver the same accuracy On-Device or On-Premise as in the Cloud. Until recently, running Speech to Text anywhere but in the Cloud meant a significant drop in accuracy, because these environments usually required smaller and less sophisticated Speech Recognition models.

At Speechly, however, the Speech to Text models used by the On-Device and On-Premise solutions are the same as the ones used in our Cloud-based offering. This means you can get 95%+ accuracy with Speech to Text Transcription in the Cloud, On-Device, or On-Premise.

Building On-Device & On-Premise Speech to Text

There are still use cases for Speech to Text technology where a Cloud-based deployment makes sense. These scenarios vary, but they usually share one characteristic: low overall Voice Data volume. In other words, there is only a small amount of audio to transcribe at any given time, such as simple Voice Search inputs on a website.

In high-volume scenarios, such as Transcribing a Video Call or Moderating a Voice Chat in an online game, deploying Speech to Text On-Device or On-Premise can bring you Cost, Speed, and Privacy benefits. Keep these factors in mind when choosing a Speech to Text technology partner.
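The rule of thumb from this post can be written down as a small decision helper. This is only a sketch of the reasoning above, not a Speechly tool: the `Workload` fields, the `recommend_deployment` function, and the volume threshold are all hypothetical and should be tuned to your own pricing and product requirements.

```python
from dataclasses import dataclass

# Hypothetical cut-off between "low" and "high" audio volume.
HIGH_VOLUME_HOURS = 1_000.0


@dataclass
class Workload:
    monthly_audio_hours: float
    needs_real_time: bool = False
    audio_must_stay_private: bool = False


def recommend_deployment(workload: Workload,
                         high_volume_hours: float = HIGH_VOLUME_HOURS):
    """Return (recommendation, reasons) following the Cost, Speed, Privacy factors."""
    reasons = []
    if workload.monthly_audio_hours >= high_volume_hours:
        reasons.append("cost")      # usage-based pricing climbs with volume
    if workload.needs_real_time:
        reasons.append("speed")     # avoids the network round trip
    if workload.audio_must_stay_private:
        reasons.append("privacy")   # audio stays on the device or in-house
    if reasons:
        return "on-device or on-premise", reasons
    return "cloud", reasons


if __name__ == "__main__":
    print(recommend_deployment(Workload(monthly_audio_hours=5)))
    print(recommend_deployment(Workload(monthly_audio_hours=20_000, needs_real_time=True)))
```

A low-volume Voice Search feature falls through to the Cloud recommendation, while a high-volume, real-time workload such as Voice Chat moderation is flagged for On-Device or On-Premise deployment.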

Contact our Product Team if you would like to learn more about running Speech to Text On-Device or On-Premise


About Speechly

Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), highly accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.
