Combating Voice Chat Toxicity in VR Games

Gym Class VR is a basketball game, and voice chat is integral to the player experience. However, toxic behavior was undermining the fun.

4.9 User Rating

United States

  • 95% savings: more cost efficient than cloud providers
  • 11% more accurate: higher true positives and lower false positives
  • Custom model: real-time monitoring on-device

Measuring and Mitigating Toxic Behavior

Gym Class was preparing to launch on Meta Quest. In addition to fun game mechanics, the game has a voice chat feature that makes it a social experience. However, there was also evidence of toxic behavior emerging in voice chat, and the company didn’t know whether the problem was widespread or limited to isolated incidents.

The team was also serious about building a healthy social space as part of the game experience. Left unchecked, toxicity can shape a game’s community culture and leave some players feeling unwelcome and uncomfortable. To address this, Gym Class needed a way to measure toxicity and put controls in place to weed out toxic behavior.

Gym Class had tried popular cloud speech recognition services to establish a baseline measurement of toxic behavior. However, at a market price of $1 per hour of transcribed audio, those services turned out to be cost-prohibitive. So Gym Class began looking for a new solution.


The Proactive Imperative

Gym Class’ goals were pretty straightforward. The company first needed to measure the level of toxicity in the game. That understanding could then be used to decrease the amount of toxicity, improve player experience, and support a successful launch in the Meta Quest app store.

Most game makers today handle voice chat moderation with a strictly complaint-led model, which means they are only aware of the toxicity that players report. If Gym Class followed this process, it would miss the vast majority of incidents. Instead, the company needed a proactive approach: monitoring voice chat sessions for toxicity.


Gym Class already knew it needed a highly accurate automatic speech recognition (ASR) solution, because accurate speech recognition and transcription are the first step in monitoring any natural-language conversation. The company also wanted to identify toxic incidents correctly by taking context into account, so that it wouldn’t miss cleverly disguised toxic behavior. Context-based analysis also mattered for reducing false positives, which occur when a benign statement is flagged as toxic.
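To make context-based analysis concrete, here is a minimal Python sketch. It is not Speechly’s implementation: `ContextualFlagger`, `toy_scorer`, the word lists, the three-utterance window, and the 0.5 threshold are all hypothetical stand-ins, and a real system would replace `toy_scorer` with a trained model. What it illustrates is that the same phrase can be scored differently depending on what was said just before it.

```python
from collections import deque
from typing import Callable, List

# Hypothetical scorer signature: (current utterance, prior utterances) -> score in [0, 1].
Scorer = Callable[[str, List[str]], float]

class ContextualFlagger:
    """Flags utterances using a rolling window of recent conversation context."""

    def __init__(self, scorer: Scorer, context_size: int = 3, threshold: float = 0.5):
        self.scorer = scorer
        self.threshold = threshold
        # A deque with maxlen drops the oldest utterance automatically.
        self.history = deque(maxlen=context_size)

    def process(self, utterance: str) -> bool:
        context = list(self.history)  # only turns spoken before this one
        flagged = self.scorer(utterance, context) >= self.threshold
        self.history.append(utterance)
        return flagged

# Toy scorer for illustration only; a real system would use a trained model.
INSULTS = {"idiot", "loser"}
AMBIGUOUS = {"go home"}  # benign on its own, hostile right after an insult

def toy_scorer(utterance: str, context: List[str]) -> float:
    words = utterance.lower().split()
    if any(word in INSULTS for word in words):
        return 0.9
    if utterance.lower() in AMBIGUOUS:
        insulted_before = any(
            w in INSULTS for prior in context for w in prior.lower().split()
        )
        return 0.7 if insulted_before else 0.2
    return 0.0

friendly = ContextualFlagger(toy_scorer)
print(friendly.process("good game"))  # False
print(friendly.process("go home"))    # False: no hostile context

hostile = ContextualFlagger(toy_scorer)
print(hostile.process("you loser"))   # True
print(hostile.process("go home"))     # True: follows an insult
```

The ambiguous phrase is only flagged when it follows an insult. Looking at context in both directions like this is what lets a monitor catch disguised toxicity without flagging every benign use of the same words.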

Because of several unique aspects of its VR game mechanics and culture, Gym Class needed a custom AI model to achieve high accuracy. It also became clear that the only economically feasible option was to run the monitoring on players’ devices as part of the downloaded app.

If you run transcription through a cloud provider, you pay for processing all of the audio. Any individual utterance may not be expensive on its own, but the costs add up quickly for any game with a significant user base and frequent voice chat use. Cloud transcription typically adds $1 of cost for every player hour.

However, if you run speech recognition locally on the user’s device, you only need to send a message to the game maker’s servers when an incident is detected. This turns out to be an order of magnitude less expensive than using a cloud provider, and it makes proactive monitoring economically feasible.
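The cost difference comes down to what each approach pays for. The sketch below compares a cloud model billed at the $1 per player hour cited above with an on-device model that only incurs server cost when a report is sent. The function names and the monthly figures (100,000 player hours, 2,000 flagged incidents, $0.01 per incident report) are hypothetical, and the sketch leaves out model development, licensing, and on-device compute, so it shows how each cost scales rather than real prices.

```python
CLOUD_RATE_PER_HOUR = 1.00  # USD per player hour, the figure cited above

def cloud_cost(player_hours: float, rate_per_hour: float = CLOUD_RATE_PER_HOUR) -> float:
    """Cloud transcription bills for every hour of voice chat audio."""
    return player_hours * rate_per_hour

def on_device_cost(incidents: int, cost_per_report: float) -> float:
    """On-device ASR only contacts the server when an incident is flagged."""
    return incidents * cost_per_report

# Hypothetical monthly figures, for illustration only.
player_hours = 100_000
incidents = 2_000
cost_per_report = 0.01  # assumed server cost per incident report, USD

print(f"Cloud:     ${cloud_cost(player_hours):,.2f}")
print(f"On-device: ${on_device_cost(incidents, cost_per_report):,.2f}")
```

Cloud cost grows with every hour of audio, while the on-device server cost grows only with the number of reported incidents.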

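| Model | Recall | False positives | Model size | Cost per audio hour (vs. cloud) |
|---|---|---|---|---|
| Google | 69.9% | 0.2% | N/A | Highest cost |
| Azure | 75.5% | 0.3% | N/A | Highest cost |
| Whisper on-prem | 76.9% | 0.3% | 1400 MB | 70% lower |
| Speechly on-prem | 77.9% | 0.2% | 70 MB | 90% lower |
| Speechly on-device | 77.9% | 0.2% | 70 MB | 95% lower |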

The results show that Speechly’s custom ASR model, deployed either on-device or on-prem, delivered the highest recall (the share of truly toxic utterances that were correctly identified). False positives stayed near zero, on par with or below the other models. In addition, Speechly’s costs were 90% to 95% lower than cloud deployments and one-third to one-sixth the cost of an OpenAI Whisper implementation.
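For readers who want the metrics pinned down, the sketch below gives the standard definitions of recall and false positive rate, plus the arithmetic behind the one-third and one-sixth cost comparison. It assumes the table’s false positive column is a rate over benign utterances (the table does not say so explicitly) and treats each "X% lower" figure as a fraction of the cloud cost. The confusion counts in the example are hypothetical.

```python
def recall(tp: int, fn: int) -> float:
    """Share of truly toxic utterances that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn)

def false_positive_rate(fp: int, tn: int) -> float:
    """Share of benign utterances wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn)

# Hypothetical counts: 1,000 toxic and 10,000 benign utterances.
print(recall(tp=779, fn=221))                # 0.779
print(false_positive_rate(fp=20, tn=9_980))  # 0.002

# Cost per audio hour relative to cloud, from the table's "X% lower" figures.
whisper_on_prem = 1 - 0.70     # about 0.30 of cloud cost
speechly_on_prem = 1 - 0.90    # about 0.10
speechly_on_device = 1 - 0.95  # about 0.05

print(round(speechly_on_prem / whisper_on_prem, 3))    # 0.333 (one-third)
print(round(speechly_on_device / whisper_on_prem, 3))  # 0.167 (one-sixth)
```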

Implementing Speechly’s voice chat monitoring solution enabled Gym Class to proactively address toxic behavior and execute a successful Meta Quest store launch. And it was made possible at a reasonable cost.

Today, Gym Class VR has a 4.9-star rating on the Meta Quest store and over 28,000 positive reviews. Data from show it became the highest-rated experience in the entire store in March 2023. You should give it a try.

