Sep 19, 2023
1 min read
Major gaming studios like Riot Games, Roblox and Sony are recording voice chats for moderation, but the tools for content moderation today typically suffer from low accuracy, high cost, and high latency. A new technical approach is needed to fill the voice chat moderation gap.
We interviewed 20+ experts in the Online Gaming & Metaverse space. Here are the key takeaways for Voice Moderation.
Riot Games began recording voice chats in its Valorant title in July 2022 to better monitor the communication channel for toxic behavior and to investigate reported incidents. Roblox also informed users earlier this year that it is recording voice chats and maintains the recordings for seven days unless there is a complaint filed.
Sony said in 2020 that it is recording all PlayStation voice chats on a 5-minute rolling basis. It is up to the user to select a clip of up to 40 seconds that includes the offensive behavior and submit it to Sony’s moderation team for review.
Whether it is these games, Back 4 Blood, or others, the intent is the same. Game makers want to offer voice chat because it is a feature that players enjoy. However, they are concerned about the impact that toxic behavior and harassment have on the game-playing experience and how it can lead players to shorten their play sessions, play less often, and avoid specific game titles altogether.
Game makers are applying a variety of approaches to combat voice chat abuse. However, our conversations with game makers suggest there is not only room for improvement; the existing techniques for voice chat moderation introduce new problems because they are inaccurate, expensive, and slow.
Some people may suggest that you could avoid this problem altogether by sticking to text chat alone or offering no in-game communications. But simple solutions aren’t always practical. Gamers like voice chat, and its absence could undermine a game’s competitiveness while shortchanging the in-game experience. An article in HackerNoon shared statistics from Tencent Cloud that reported:
“Over 90% of Chinese gamers prefer to interact with other players in an experience. 90.6% of consumers use the built-in voice chat function when playing a game, with 38.4% saying that they use the voice chat function often. When a title doesn’t have an in-game voice communication system in place, 73.7% of these players say they turn to a third-party service instead.”
It’s not just Chinese gamers who like voice chat. As I wrote in an earlier post:
An Oxford Academic study from 2007 found that “voice chat leads to stronger bonds and deeper empathy than text chat.” As Subspace put it in 2021, “Voice deepens the immersive world, helps forge social bonds, and strengthens online play.”
You can also see it in user behavior related to voice chat. Tod Bouris, the former director of customer success at Vivox and now a manager at Unity, said at a 2019 conference:
“The metrics show that people who use communications during their gaming game more and more often than those that don’t. And we also find this holds true for any platform, any game type. So, voice [chat] is really a social element that adds stickiness and retention to your games that you can’t get from something else.”
Bouris also said that voice chat users spent twice as much time playing as non-voice users and were five times more likely to still be playing after five weeks. The data suggest the case for including voice chat in games is strong and getting stronger. Given this situation, game makers are turning to voice chat moderation rather than eliminating voice chat from their experiences.
If a game maker has other communications, such as text chat, they typically have some type of moderation solution to monitor for abusive behavior. Many companies believe that they can just add an off-the-shelf transcription solution to a voice chat and then feed the text into their existing moderation solutions. This is where complications begin.
Most general-purpose automatic speech recognition (ASR) solutions will not recognize a significant portion of game-related nomenclature and slang. That often leads to transcript errors, which means the text analysis will suffer from a high frequency of false negatives (i.e., missing something that should be flagged) and false positives (i.e., flagging something that is not a policy violation). The two error types cause different problems: false negatives let real violations go unaddressed, while false positives are costly to review and resolve. Either way, the result does not live up to the goal of the moderation policy.
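To make the two error types concrete, here is a minimal sketch of how a team might count them on a small, human-labeled sample of utterances. The `Utterance` type, the field names, and the sample data are all hypothetical and only illustrate the bookkeeping; they are not taken from any real moderation tool.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    violates_policy: bool  # ground truth from a human reviewer (hypothetical label)
    flagged: bool          # what the text filter decided from the ASR transcript


def error_rates(utterances):
    """Count false positives and false negatives and turn them into rates."""
    false_pos = sum(1 for u in utterances if u.flagged and not u.violates_policy)
    false_neg = sum(1 for u in utterances if not u.flagged and u.violates_policy)
    clean = sum(1 for u in utterances if not u.violates_policy)
    violating = sum(1 for u in utterances if u.violates_policy)
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        # Share of clean utterances that were flagged anyway.
        "false_positive_rate": false_pos / clean if clean else 0.0,
        # Share of real violations that slipped through.
        "false_negative_rate": false_neg / violating if violating else 0.0,
    }


# Made-up sample: two real violations, three clean utterances.
sample = [
    Utterance(violates_policy=True, flagged=True),
    Utterance(violates_policy=True, flagged=False),   # missed violation
    Utterance(violates_policy=False, flagged=True),   # wrongly flagged
    Utterance(violates_policy=False, flagged=False),
    Utterance(violates_policy=False, flagged=False),
]
rates = error_rates(sample)
print(rates)
```

Tracking the two rates separately matters because a transcript error can push either one up: a misheard slang term may hide a real violation or make a harmless phrase look like one.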
Part of the solution is to use a custom-trained ASR model. That will help reduce transcription errors. Very often, game makers also need to have a refined natural language understanding (NLU) model to provide context that is often unclear from the text alone. Voice chat provides more robust data signals that can help differentiate between abusive and collegial verbal exchanges, which can further reduce the occurrence of false positives and negatives.
Another challenge is cost. Transcribing every voice chat in the cloud can run up large computational processing bills very quickly. Try it sometime. Few organizations can afford this, and as a result many scale back their voice chat moderation plans or cancel them altogether. This can be mitigated by running some or all of the ASR transcription locally on the users’ devices.
Finally, there is the issue of speed. Voice chat is a real-time activity, while most moderation is conducted in after-the-fact audits or based on user-submitted complaints. That means the moderation is really adjudication. It takes place after the abusive or harassing behavior is over. Real-time solutions can flag these problematic conversations while they are in progress, potentially mitigating the negative effects on the victims and preventing the behavior from spreading further. On-device speech recognition is one method to significantly speed up speech transcription, which is the first step in real-time monitoring.
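A back-of-the-envelope sketch shows why the bill grows so fast and how moving transcription on-device changes it. Every number below is a made-up placeholder, not a real price or usage figure, and the function name and parameters are hypothetical. The sketch also assumes on-device transcription adds no cloud cost, which ignores any licensing or engineering expense.

```python
def monthly_cloud_asr_cost(audio_hours, price_per_hour, on_device_share=0.0):
    """Estimate the monthly cloud transcription bill.

    audio_hours      -- hours of voice chat produced per month (assumed)
    price_per_hour   -- cloud ASR price per audio hour (assumed)
    on_device_share  -- fraction of audio transcribed on users' devices
    """
    if not 0.0 <= on_device_share <= 1.0:
        raise ValueError("on_device_share must be between 0 and 1")
    cloud_hours = audio_hours * (1.0 - on_device_share)
    return cloud_hours * price_per_hour


# Placeholder inputs: 100,000 audio hours a month at 1.00 per hour.
all_cloud = monthly_cloud_asr_cost(100_000, 1.0)
mostly_on_device = monthly_cloud_asr_cost(100_000, 1.0, on_device_share=0.75)
print(all_cloud, mostly_on_device)
```

The cloud cost scales linearly with the audio volume, so whatever share of transcription moves to the device comes straight off the cloud bill under these assumptions.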
Game makers increasingly need to provide voice chat and need better tools for their moderation program. Speechly focuses on voice chat monitoring and provides the ASR and optional NLU as an API. Game makers can just connect to Speechly’s API and feed their existing moderation solutions directly with higher quality data. Speechly can also quickly train custom ASR models and run the solution on user devices to deliver higher accuracy, lower cost, and lower latency.
We didn’t originally set out to solve this problem. Some game makers were using our API to power the speech recognition for non-player characters and asked if our technology could help. It turns out the Speechly architecture and performance are well suited to address the voice chat moderation gap. As we began assisting game makers, we learned even more about the real requirements behind the problem, and we refined our API solution for moderation. You can learn more here.
Let us know if you have questions about how we help game makers tackle the voice chat moderation gap and also if you have any particular requirements that you would like to see added to our products. We are certain voice chat moderation is a problem worth solving for the benefit of gamers and game makers alike.
Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), highly accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.