The Case for Real Time Voice Chat Moderation Technology in the Metaverse

Mandi Galluch

Mar 07, 2022

Why the future of the metaverse is dependent upon robust voice chat moderation APIs and AI technologies

Moderation needs in the metaverse

As Meta continues forward with their commitment to the growth of the metaverse, they’re also grappling with the reality that harassment in VR could turn mainstream consumers away. Their incoming CTO, Andrew Bosworth, referred to this as an “existential threat” to their plans for the metaverse expansion.

The threat is a very real one. Microsoft recently shuttered elements of their AltspaceVR public social hubs and made plans to increase moderation to ensure that the platform is safe. Voice chat has been used to sexually harass players using Oculus for gaming. The potential for harm in these new spaces is obvious and the need for effective moderation solutions is clear.

Real time voice audio offers new opportunities for harassment - and for AI powered moderation solutions

It’s important to note that this isn’t an issue that can be easily solved. Mike Masnick, founder of Techdirt, wrote about what he calls Masnick’s Impossibility Theorem. He argues that “content moderation at scale is impossible to do well.” (It’s worth calling out that he still feels it’s something that needs to be done.)

What’s interesting about moderation in the metaverse is that multiple modalities are at play. People can talk to each other, and they can interact through simulated touching and gesturing. To be effective, moderation must cover both modalities, and the solutions for each should be flexible enough to work in parallel, sharing context to improve the quality of the moderation effort.

When people talk to each other, they’re listening not just to the words being said but to the way that they’re said. They observe the body language of the speaker. They know the context of the relationship with the speaker. All of these things factor into the way the spoken words are processed and understood by the listener. For moderation purposes, understanding all of these things together is key, and it has to be done accurately and **quickly**. Why? Because a recent survey found that 60% of kids and 83% of adults have experienced harassment in online multiplayer games. That is a huge human impact, and online gaming voice experiences offer a lot of parallels to the metaverse experience, but now with new, more interactive ways to cause harm.

This potential for harm is something that all of the big players in building out elements of the metaverse are aware of. If their platform does not have technology in place to help identify, investigate, and intervene in situations like this, their platform becomes a tool that harms people. That’s not good for people and it’s not good for business.

This space is interesting to the team at Speechly because the challenge posed is one that our technology is uniquely positioned to help address. Ideally the technology would be deployed as a flexible chat moderation API with a custom model to suit each specific community and environment. The ability to run automatic speech recognition and natural language processing simultaneously means that we’re able to help moderation systems respond faster and with more accuracy.

How Artificial Intelligence (AI) can support voice chat moderation

If you’ve ever read a transcript of a conversation, you know that it can leave a lot to be desired. The ability to create these transcriptions in real time, as people are speaking, is at the heart of what is needed for successful voice chat moderation. On top of the transcript come the layers that bring it to life: the context and understanding necessary to determine whether something that was said should be escalated.
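To make the idea concrete, here is a minimal sketch of how transcript segments could be checked one at a time as they arrive, with a little preceding context kept for a reviewer. Everything in it is an assumption for illustration: the `Segment` shape, the `flag_segments` function, and the placeholder term list are hypothetical, not Speechly's API, and a real system would use a trained model rather than a word list.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Segment:
    """One piece of transcribed speech (hypothetical shape)."""
    speaker: str
    start_s: float  # start time in seconds
    text: str


# Placeholder terms for illustration only; a real deployment would
# rely on a model tuned to the specific community.
FLAGGED_TERMS = {"idiot", "loser"}


def flag_segments(segments: Iterable[Segment], context_size: int = 3) -> Iterator[dict]:
    """Yield a flag for each segment containing a flagged term.

    Segments are consumed one at a time, so this works on a live
    stream. Each flag carries the text of up to `context_size`
    preceding segments as context.
    """
    recent = deque(maxlen=context_size)
    for seg in segments:
        # Normalize words: strip basic punctuation and lowercase.
        words = {w.strip(".,!?").lower() for w in seg.text.split()}
        hits = sorted(words & FLAGGED_TERMS)
        if hits:
            yield {
                "speaker": seg.speaker,
                "start_s": seg.start_s,
                "terms": hits,
                "context": [s.text for s in recent],
            }
        recent.append(seg)
```

Because the function is a generator, a flag can be raised as soon as the offending segment is transcribed rather than after the conversation has ended.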

AI-powered models built around signals like sentiment, volume fluctuations, and tone can all help establish the context of what was said. Remember that in the metaverse, unless someone is streaming and recording the experience, spoken harassment leaves no “evidence” behind. There’s no comment to screenshot, no profile to click to better identify the harasser. The experiences often move quickly, and the harasser can move on without any intervention. Unless, that is, there’s an AI layer built in to help identify, intercept, and intervene in real time.
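As a rough illustration of how several signals might feed one decision, the sketch below blends three scores into a single risk value and maps it to an action. The signal names, weights, and thresholds are all assumptions chosen for the example; in practice they would be learned or tuned for each community.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Per-utterance scores in the range 0..1 (hypothetical inputs)."""
    text_toxicity: float       # from a text classifier on the transcript
    negative_sentiment: float  # from sentiment analysis
    volume_spike: float        # how sharply the speaker's volume rose


# Assumed weights for illustration; they sum to 1.0.
WEIGHTS = {
    "text_toxicity": 0.6,
    "negative_sentiment": 0.25,
    "volume_spike": 0.15,
}


def risk_score(s: Signals) -> float:
    """Weighted sum of the individual signals."""
    return (
        WEIGHTS["text_toxicity"] * s.text_toxicity
        + WEIGHTS["negative_sentiment"] * s.negative_sentiment
        + WEIGHTS["volume_spike"] * s.volume_spike
    )


def decide(s: Signals, review_at: float = 0.5, intervene_at: float = 0.8) -> str:
    """Map the risk score to an action using assumed thresholds."""
    score = risk_score(s)
    if score >= intervene_at:
        return "intervene"
    if score >= review_at:
        return "review"
    return "allow"
```

Keeping a middle "review" tier means borderline cases go to a human moderator instead of triggering an automatic intervention.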

The future of voice chat moderation

As companies continue their push into new forms of multimodal online experiences in the metaverse, the need for effective moderation will only grow. The types of harassment will shift and expand along with the capabilities of the metaverse, and the technology to monitor and moderate it will need to expand with them.

The sooner that AI powered models are deployed, the smarter and more effective the technology will become, and the better everyone’s experiences will be.

Cover photo by Julia M Cameron on Pexels
