
The 5 AI Technologies You Need for Voice Chat Moderation in Games

Hannes Heikinheimo

Apr 18, 2023


The rapid rise of multiplayer online gaming has turned video games into social experiences. Voice chat has become an important communication channel for that social experience, but it is also the top channel for toxic behavior. Luckily, five AI technologies can help overcome this toxicity.


Voice chat has become very popular in games. The rise of multiplayer games has influenced this trend, but the larger impact has been the evolution of online games into social experiences. Nearly half of players say they like games better with voice chat, and over 68% use the feature.

Game makers also appreciate voice chat because it can help build stronger bonds between players and deliver better retention, longer sessions, and more frequent play. For many game makers, this translates into higher average revenue per user (ARPU). 

However, we also know that voice chat is the biggest source of toxic behavior, far exceeding text chat, in-game play, and user-generated content. Those toxic incidents have direct negative impacts on the players, playing time, playing frequency, and sentiment toward the game. While game makers have had tools for monitoring toxic behavior in games and through text chat for some time, voice chat is largely a free-for-all. 

The solution is AI-enabled voice chat moderation. Many game industry professionals are aware of AI-based tools for text chat moderation, and some have heard of similar technology for voice chat. However, most people don’t know that effective voice chat monitoring for gaming requires multiple AI-based capabilities.

Beyond Keywords to Conversational Context

Traditional moderation tools favor simplistic approaches such as keyword spotting. This technique is common for text chat moderation, and some people have tried it for voice chat. It can help you flag or redact a known list of bad words, but you are constantly chasing the newest bad word and trying to catch up with the bad actors. The approach also misses many toxic incidents while flagging some that are actually benign.
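To make the limitation concrete, here is a minimal sketch of keyword spotting applied to a voice chat transcript. The blocklist, the `flag_keywords` function, and the sample sentences are all hypothetical and chosen only for illustration; this is not how any particular moderation product works.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"kill", "loser"}


def flag_keywords(transcript: str) -> list[str]:
    """Return the blocklisted words found in a transcript, in order."""
    # Lowercase the text and split it into alphabetic tokens.
    words = re.findall(r"[a-z']+", transcript.lower())
    return [word for word in words if word in BLOCKLIST]


# Benign gameplay talk still gets flagged (a false positive).
benign_hits = flag_keywords("Nice, I'm going to kill the boss first")

# A trivially obfuscated insult slips through (a missed incident).
evasive_hits = flag_keywords("you are such a l0ser")
```

In this sketch the gameplay sentence is flagged because it contains "kill", while the obfuscated insult produces no hits at all. Extending the blocklist to cover every variant is the catch-up game described above.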

The issue is context. The same word could be viewed as toxic or appropriate depending on what is happening in the conversation or the game. For example, “I'm going to kill you!” could be flagged as a threat of physical violence. However, in a combat-based game it could also describe a key objective. Similarly, someone may say they are going to plant a bomb at the courthouse. Should the game maker notify law enforcement, or is it a known strategy in the video game?

Both of these comments could produce a false positive: the system flags a toxic incident even though the comments are perfectly within the bounds of acceptable behavior for the game. False positives can be detrimental to a game because they can lead to false accusations and unjust penalties for players who are acting entirely in good faith.

Misunderstandings can also arise from sarcasm, cultural differences, and misinterpreted accents. This is why custom AI models are often essential for voice chat moderation. It is also why a single technique generally doesn’t get the job done. 

5 Key Voice Chat Monitoring Technologies

Speechly was asked by several large game makers how to address voice chat toxicity without missing true positives or generating false positives. As we analyzed the problem, we were able to identify five techniques in three categories that can help to finally fill the voice chat moderation gap. 

The first category is accurately identifying what was said. This revolves around the transcript and identifying entity labels. The second category is related to meaning and includes semantic labels and tone-of-voice labels. The third category covers signals that are not words, known as audio event labels.

  1. Transcripts: Transcripts are a written record of the conversation that took place during the voice chat. They allow moderators to review the conversation and identify any inappropriate behavior or rule violations that may have occurred. Transcripts are also used for additional AI-based analysis of what was said. 
  2. Entity Labels: Entity labels refer to identifying and labeling specific people, places, organizations, and other topics mentioned in the conversation. They help moderation systems automatically identify and categorize potentially harmful or inappropriate content that violates the platform's policies. 
  3. Semantic Labels: Semantic labels help the moderation system better understand the context and the meaning of the conversation to identify any potentially harmful or inappropriate content. They can also be used to help avoid false positives that might arise from considering individual words alone. 
  4. Tone-of-Voice Labels: Tone-of-voice labels help the moderation system better understand the way things are said by a user. This can be useful in identifying when someone is becoming agitated or upset, which could potentially lead to rule violations or inappropriate behavior. It can also help identify when a user is simply joking or using sarcasm.
  5. Audio Event Labels: Audio event labels refer to labeling specific sounds or events that occur during the conversation. They provide further contextual information to the moderation system that goes beyond the spoken word alone and help identify issues that would otherwise go unnoticed.
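As a rough illustration of how these five signals might be used in concert, the sketch below bundles them into one record per utterance and combines them with a simple additive score. Everything here is an assumption made for the example: the `UtteranceSignals` type, the label names, the weights, and the threshold are hypothetical and do not describe Speechly's actual models or API.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceSignals:
    """Hypothetical container for the five signals of one utterance."""
    transcript: str
    entity_labels: set[str] = field(default_factory=set)
    semantic_labels: set[str] = field(default_factory=set)
    tone_labels: set[str] = field(default_factory=set)
    audio_event_labels: set[str] = field(default_factory=set)


# Hypothetical label weights: positive raises concern, negative lowers it.
WEIGHTS = {
    "threat": 2,            # semantic label
    "in_game_context": -2,  # semantic label
    "agitated": 1,          # tone-of-voice label
    "joking": -1,           # tone-of-voice label
    "real_world_place": 1,  # entity label
    "screaming": 1,         # audio event label
}
REVIEW_THRESHOLD = 2  # assumed cut-off for sending to a human moderator


def needs_review(signals: UtteranceSignals) -> bool:
    """Combine all label types into one score and compare to the threshold."""
    labels = (signals.entity_labels | signals.semantic_labels
              | signals.tone_labels | signals.audio_event_labels)
    score = sum(WEIGHTS.get(label, 0) for label in labels)
    return score >= REVIEW_THRESHOLD


# Same words, different context: only the second utterance is escalated.
banter = UtteranceSignals(
    transcript="I'm going to kill you!",
    semantic_labels={"threat", "in_game_context"},
    tone_labels={"joking"},
)
hostile = UtteranceSignals(
    transcript="I'm going to kill you!",
    semantic_labels={"threat"},
    tone_labels={"agitated"},
    audio_event_labels={"screaming"},
)
```

The two example utterances share an identical transcript, so a keyword-only system could not tell them apart. In this sketch the semantic, tone-of-voice, and audio event labels are what separate in-game banter from an incident worth reviewing. A production system would more likely learn such a combination than hand-tune weights.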

It is understandable that game makers would first look to what worked for them in text chat when considering how to address toxicity in voice chat. They usually figure out quickly that these techniques fall far short of meeting their moderation and mitigation objectives. The optimal solution involves a portfolio of AI-based features used in concert. 

If you would like to learn more about any of these AI-driven techniques, feel free to contact our product team using our Contact Form.

About Speechly

Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), highly accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.
