4 Voice Chat Solutions for Virtual Reality

Matt Durgavich

Jul 06, 2023

Voice chat has become an expected feature in virtual reality (VR) experiences. However, there are important factors to consider when picking the best solution to power your experience. This post will compare the pros and cons of the 4 leading VR voice chat solutions to help you make the best selection possible for your game or social experience.

VR and Voice Chat are a Perfect Match

Earlier this month, Apple introduced the Vision Pro at its 2023 Worldwide Developers Conference, reinvigorating the conversation around virtual (VR), augmented (AR), and mixed (MR) realities. With Meta’s Quest 3, the successor to the world’s most popular headset, due out this fall, it’s an exciting time for developers to bring new virtual world experiences to market. In the last few years, popular multiplayer games such as IRL Studios’ Gym Class, Ramen VR’s Zenith: The Last City, and Big Box VR’s Population: One have reached millions of players and earned critical acclaim. Socially connected experiences like VRChat’s VRChat Plus and Rec Room are incredibly popular, with millions of active users a month. VR as a platform continues to attract consumers: the market is on track to grow 50% in 2023.

Each of the above titles shares a common need: real-time communication. Voice chat meets that need well: it fits seamlessly into the interaction paradigms of VR, preserves immersion, and has a low learning curve. Voice chat is simply a must-have for collaborative VR experiences. Hosted solutions, where vendors operate and maintain the services, strike a good balance between flexibility, capability, and total cost of ownership. Additionally, the Unity engine remains an exceptional choice for VR developers due to its extensive cross-device support and rapid iteration capabilities. We’ll briefly explore the advantages and weaknesses of four popular Unity-compatible hosted solutions.

Voice Chat Features to Consider

Voice chat support requires careful planning. As with most design considerations, the earlier it is addressed in the project life cycle, the easier it is to adapt and adjust. A common mistake is for a developer to put “voice chat integration” on a schedule late in production, only to find the chosen solution is hard to integrate cleanly. So before you begin, carefully consider these areas:

  • Solution Architecture. Is it peer-to-peer (p2p) or client-server? Each comes with constraints such as total number of voice participants, user bandwidth requirements, and moderation compatibility. Generally, p2p solutions have lower costs of entry but struggle to work well in common network setups like NATs with firewalls. Client-server setups are much more robust, but they require significant server bandwidth and computation resources, which raises costs. In return, they open up centralized features such as recording, transcription, or moderation.
  • Spatial Audio. Spatialized audio modifies voices depending on the location and environment in a 3D virtual space. Voice chat solutions vary in terms of their support for 3D audio, ranging from non-existent to unlimited mixing and matching of spatialized and non-spatialized voices. The right voice chat experience is highly dependent on the style of VR application under consideration, so a clear understanding of your project’s desired user experience is paramount.
  • Developer Experience. What do the APIs look like? Are they easy to adapt to your project? What are the paradigms and abstractions in play? VR platforms are varied, so a good solution abstracts that away and gives the developer a clean and consistent experience. The best way to reduce risk early on is to look closely at samples and tutorials and understand how the paradigms map to your project. For example, an MMO experience might have a population center where hundreds of users can talk. If the voice chat solution has a participant cap of 8 participants per group, that implies a creative implementation and larger level of effort than a solution that supports unlimited participants.
  • Total Cost of Ownership. A simple fact of live services is that they incur ongoing maintenance and costs. Voice chat vendors typically price on usage, which means the more popular your game, the more it will cost you. Relatedly, a good vendor will provide robust Service Level Agreements (SLAs) that promise uptimes, maintenance windows, defect resolutions, live support, issue turnaround times, and more. A great approach is to prepare a list of questions or concerns to put to a vendor’s pre-sales or developer support team. This will give you an excellent sense of customer care patterns, response times, and your general level of comfort dealing with a given solution.
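
The architecture and participant-cap questions above lend themselves to quick arithmetic before you talk to a vendor. The sketch below is a rough planning aid, not any vendor's API: the function names are hypothetical, the 32 kbps per-stream bitrate is an assumed figure for compressed voice, and real solutions may mix, mute, or cull streams in ways this ignores.

```python
import math

# Assumed bitrate of one compressed voice stream (illustrative, not a vendor figure).
ASSUMED_STREAM_KBPS = 32.0


def mesh_uplink_kbps(participants: int, stream_kbps: float = ASSUMED_STREAM_KBPS) -> float:
    """Per-client uplink in a full p2p mesh: one copy of the stream per other peer."""
    return max(participants - 1, 0) * stream_kbps


def server_uplink_kbps(participants: int, stream_kbps: float = ASSUMED_STREAM_KBPS) -> float:
    """Per-client uplink with a voice server: a single stream, whatever the group size."""
    return stream_kbps if participants > 1 else 0.0


def groups_needed(users: int, cap_per_group: int) -> int:
    """Minimum number of voice groups needed to hold `users` under a per-group cap."""
    if cap_per_group <= 0:
        raise ValueError("cap_per_group must be positive")
    return math.ceil(users / cap_per_group)


if __name__ == "__main__":
    for size in (2, 8, 32):
        print(size, mesh_uplink_kbps(size), server_uplink_kbps(size))
    # A 300-user population center on a solution capped at 8 per group.
    print(groups_needed(300, 8))
```

Under these assumptions, per-client uplink in a mesh grows linearly with group size while the client-server figure stays flat, which is one way to see why p2p designs tend to cap participants. The group count also hints at how much sharding logic a low participant cap would push into your own code.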

Popular Voice Chat Solution Pros and Cons

With the above considerations in mind, let’s look more closely at 4 popular services on the market today. All solutions are client-server architectures, with support for spatial audio.

Photon Engine

Pros:
  • Simple API that is friendly to Unity development best practices like prefabs and drag-and-drop development
  • A simple but realistic sample
  • All relevant documentation is online and publicly accessible
  • Full integration into the Unity audio subsystem
  • Web-based service health dashboard
Cons:
  • The free tier is very limited with a hard cap
  • Only Magic Leap and HoloLens VR platform support (at the time of this writing)
  • No explicit uptime or maintenance window guarantees

Vivox

Pros:
  • Nearly unlimited number of users and user groups
  • Web-based developer portal with usage data, SDK downloads, and more
  • Generous free usage tier of up to 5000 peak users a month
  • Robust SLA with 99.9% uptime
  • Support through forums and help desk, with paid professional support available
Cons:
  • Developer portal is behind a login, restricting search engine results for documentation and best practices
  • Custom audio capture and playback offers limited integration into Unity’s audio subsystem
  • No explicit VR support
  • Unity sample is limited

Agora

Pros:
  • Large number of supported platforms
  • Step-by-step instructions with cut-and-paste-ready code for common tasks
  • Simple pricing model with per-minute costs, with discounts available
  • Robust knowledge base and active forums
Cons:
  • No ready-to-use sample
  • Unity integration lacks prefabs, editor support, and other ecosystem comforts
  • No explicit VR support

Normcore

Pros:
  • Unity-exclusive focus means top-notch integration and development experience for Unity developers
  • Explicit XR support and Unity audio integration with detailed information in articles like XR Avatars and Voice Chat
  • Documentation is simple, search engine indexed, and accepts community fixes
  • Robust web dashboard for tracking usage and app integrations
  • Support through email or Discord
Cons:
  • No explicit uptime or SLA
  • Free tier is limited to 30 users, 10 rooms, and 1 hour, which is unsuitable for production use
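
Because these vendors price on usage and offer different free allowances, a rough monthly estimate can make the comparison more concrete. The following is a minimal sketch assuming a simple per-minute model with a free monthly allowance. The function name, the rate, and the allowance are all hypothetical placeholders, not any vendor's published pricing, and vendors that bill on peak users would need a different formula.

```python
def monthly_voice_cost(
    daily_users: int,
    minutes_per_user_per_day: float,
    rate_per_1000_minutes: float,
    free_minutes: float = 0.0,
    days: int = 30,
) -> float:
    """Estimate a monthly bill under an assumed per-minute pricing model."""
    total_minutes = daily_users * minutes_per_user_per_day * days
    # Only minutes beyond the free allowance are billed.
    billable_minutes = max(total_minutes - free_minutes, 0.0)
    return billable_minutes / 1000 * rate_per_1000_minutes


if __name__ == "__main__":
    # Placeholder inputs: 2,000 daily users talking 20 minutes each,
    # an assumed rate of 1.00 per 1,000 minutes, 10,000 free minutes.
    print(monthly_voice_cost(2000, 20, 1.00, free_minutes=10_000))
```

Plugging in each vendor's actual rates and free tier, taken from its current pricing page, gives a like-for-like figure to bring to a pre-sales conversation.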

Get Started Now

These four solutions are solid options to consider, and are fast and easy to use in experiments. Virtual experiences are better in every way with voice communication, and it’s never been easier to get started with a third-party solution. With voice technology in place, exciting capabilities like real-time transcription, tonal analysis, recording, or other moderation techniques are achievable and lay a foundation for immersive, exciting virtual worlds.

About the Author

Matt is a veteran technology leader in and out of the gaming industry with contributions to games like Red Dead Redemption and Marvel Puzzle Quest. His most recent stint was at Vivox, a Unity Technologies brand, helping to bring voice chat to mobile and VR platforms. He writes about these topics as well as practical leadership lessons at thelead.beehiiv.com.

About Speechly

Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), highly accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.
