Voice Picking with Modern Technologies

Ottomatias Peura

Feb 05, 2021

5 min read

Learn how voice picking and voice-directed warehousing with real-time Spoken Language Understanding improve efficiency and key metrics in your warehouse.

Voice picking has been employed for decades. But only recently have technologies such as Speechly’s Spoken Language Understanding enabled intuitive and accurate real-time voice-user interfaces that maximize efficiency with minimal customization and development time.

What is voice picking?

Voice-directed warehousing (VDW), voice picking, pick by voice, voice-enabled warehouses, and speech-based picking all refer to the same thing. It’s a paperless, hands-free, and eyes-free computer system that employs voice commands for warehouse processes.

Warehouses have been frontrunners in using voice technology to improve workforce efficiency. Voice picking has been in use since at least the early 1990s, but the technology has only recently matured enough to make other technologies almost obsolete. The market is expected to grow significantly over the coming years due to decreasing costs and improved accuracy.

Voice-directed warehouses typically use a multimodal voice user interface that can both direct the operator and take commands from the operator using voice. However, many warehouses use the voice user interface only for data input from the operator to the system, while information from the system to the operator is shown visually on a screen.

Voice-directed warehousing is well-suited for keeping an operator's hands and eyes free, allowing them to focus more on the task at hand.

It can be used in all kinds of storage environments; freezing and noisy environments are not a problem for Speechly’s voice technology. It works in warehouses with large and small numbers of SKUs alike, and can be adapted to any process.

How does voice picking work?

In a voice-directed warehouse (VDW), operators are equipped with a device, often a mobile phone, tablet, or a voice-dedicated terminal and headset. Typically, the headset is equipped with noise-canceling features for better performance in loud environments. In addition to voice and touch, the device can also support RFID and barcodes for increased efficiency in certain situations.

Modern voice picking employs speech recognition and natural language understanding technologies for improved accuracy and intuitiveness. Speechly takes a unique approach by combining these two processes into a Spoken Language Understanding system that returns accurate results for voice commands in real time.

When an employee starts their shift, orders are imported from the host system — such as an ERP (Enterprise Resource Planning software) or a WMS (Warehouse Management System) — to the device and then processed. After processing and sequencing, the instructions for what item the operator should pick and where to find it are either spoken out loud (with a text-to-speech system) or shown on the screen.
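The import-and-sequence step can be illustrated with a minimal sketch. It assumes each imported order line carries an aisle and a bay number and that sorting by those two fields gives an acceptable walking order; the `PickTask` fields, the sample orders, and the instruction wording are all hypothetical and not taken from any particular ERP, WMS, or Speechly product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PickTask:
    """One order line to pick (hypothetical shape, not a real WMS schema)."""
    order_id: str
    sku: str
    quantity: int
    aisle: int
    bay: int


def sequence_tasks(tasks):
    """Order tasks by aisle, then by bay, so each aisle is visited once."""
    return sorted(tasks, key=lambda t: (t.aisle, t.bay))


def instruction_for(task):
    """Build the text that would be spoken by text-to-speech or shown on screen."""
    return (
        f"Go to aisle {task.aisle}, bay {task.bay}. "
        f"Pick {task.quantity} of {task.sku}."
    )


# Hypothetical order lines as they might arrive from the host system.
imported = [
    PickTask("A-100", "SKU-778", 2, aisle=4, bay=12),
    PickTask("A-101", "SKU-123", 5, aisle=1, bay=7),
    PickTask("A-100", "SKU-456", 1, aisle=1, bay=3),
]

route = sequence_tasks(imported)
instructions = [instruction_for(task) for task in route]
```

A real deployment would likely use a richer routing strategy that accounts for the warehouse layout; the plain sort here only shows where sequencing sits between import and instruction.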

When the operator is in the correct location, they confirm that they are picking the correct items by checking in to the location. After that, they confirm the products by speaking the product code or another identifier printed on the product. The operator also confirms the quantity they are about to pick. If a confirmation is incorrect or inaccurate, the voice application can correct the operator multimodally, for example by voice and on the screen.
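The three confirmations described above (location, product, quantity) could be checked roughly as follows. This is a sketch under stated assumptions: the `ExpectedPick` record, the idea of a short location check code, and the correction messages are invented for illustration, and the spoken values are assumed to have already been recognized and converted to text and numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedPick:
    """What the system expects for one pick (hypothetical fields)."""
    location_check: str  # code the operator reads aloud at the location
    product_code: str    # identifier printed on the product
    quantity: int


def verify_pick(expected, spoken_location, spoken_product, spoken_quantity):
    """Return correction prompts; an empty list means the pick is confirmed."""
    corrections = []
    if spoken_location != expected.location_check:
        corrections.append("Wrong location. Please check in again.")
    if spoken_product != expected.product_code:
        corrections.append("Product code does not match. Please repeat it.")
    if spoken_quantity != expected.quantity:
        corrections.append(f"Expected quantity is {expected.quantity}.")
    return corrections
```

For example, `verify_pick(ExpectedPick("47", "8812", 3), "47", "8812", 2)` returns a single prompt about the quantity, which the application could then speak, display, or both.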

Depending on the implementation, location information, RFID and other technologies can be used to optimize the route and maximize the efficiency of the operator.

Typical voice commands that operators use include product code strings, quantities, and locations. The operator can also slow down or speed up the voice user interface. A well-designed multimodal voice user interface is key to achieving the highest efficiency.
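To make these command types concrete, here is a deliberately simple rule-based interpreter that maps an already-transcribed utterance to an intent and its values. It is not how Speechly's Spoken Language Understanding works; it swaps in plain regular expressions purely for illustration, and the command grammar, intent names, and slot names are hypothetical.

```python
import re

# Hypothetical command grammar; a real deployment would define its own.
PATTERNS = [
    ("adjust_speed", re.compile(r"^(?P<direction>slower|faster)$")),
    ("confirm_quantity",
     re.compile(r"^(?:pick(?:ed)? )?(?P<quantity>\d+)(?: pieces| items)?$")),
    ("confirm_location", re.compile(r"^location (?P<code>[a-z0-9-]+)$")),
    ("confirm_product", re.compile(r"^(?:product|code) (?P<code>[a-z0-9-]+)$")),
]


def interpret(transcript):
    """Map a transcript to (intent, slots); unmatched input is 'unknown'."""
    text = transcript.strip().lower()
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return "unknown", {}
```

With this grammar, `interpret("picked 12 items")` yields the `confirm_quantity` intent with the quantity `"12"`, and `interpret("Faster")` yields `adjust_speed`. Hand-written rules like these break down quickly with natural phrasing, which is the gap that trained language understanding models are meant to close.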

Benefits of voice picking

Benefits of using voice in a warehouse setting include, but are not limited to:

  • Faster and more efficient picking: Picking is the most expensive and labor-intensive warehouse process. It can constitute more than half the cost of a typical distribution center. Voice user interfaces increase the hourly pick rate.
  • More accurate reporting and better data quality: Anomalies, such as broken or missing items, can be reported in real time, resulting in better data quality and cost savings.
  • Safer warehouse environments: Safety is a priority in an efficient logistics facility. Hands-free and distraction-free operation of voice user interfaces reduces injuries and accidents.
  • No need to print and distribute paper picking documents: Because orders are imported directly from the ERP or WMS to the employees' mobile devices, operators are ready to start picking right after they start their shift.
  • Decreased training time for new employees: Unlike traditional barcode and RFID scanners and hard-to-use enterprise software, voice user interfaces are intuitive and require less than a day of training for new employees. This can be a great benefit in warehouses with many seasonal employees.
  • Improved efficiency because operators can do two things at once: Voice-directed warehouses enable operators to spend up to 95% of their work time picking, rather than reporting and searching for documents.
  • Improved customer satisfaction thanks to fewer incorrect shipments: Incorrect shipments are costly and reduce customer satisfaction. With voice picking, mistakes are massively reduced.
  • Effectiveness in cold environments: Traditional user interfaces are hard to use in cold storage and in other environments that require operators to wear gloves.
  • Happier employees: Simplified operations lead to happy, productive employees and decreased employee turnover.

Unlike some older voice systems currently employed in warehouses, voice user interfaces built with Speechly require no per-user training of the speech recognition models. The model is trained once, and it works for existing and new employees alike.

All interactions between the system and the operator can be tracked. This lets management follow progress in real time and provides an audit trail for resolving anomalies.

Voice picking can be easily integrated into any WMS or ERP and can increase productivity by up to 40%. Because of easy implementation and major productivity increases, typical voice projects in warehouses have a relatively short payback period of about 6 to 12 months. Improved data quality also enables warehouse management to track and analyze progress and reallocate resources in almost real time.

The technology doesn’t have to be limited to just picking, though — voice can be used in most other warehouse processes, such as cross-picking, quality control, packing, sortation, replenishment, receiving, and put-away.

How to get started with Speechly in warehouses

Speechly’s Spoken Language Understanding technology offers industry-leading accuracy without the need for special hardware. Our technology works for all accents and can be adapted to all processes. Typically, a proof of concept (POC) that integrates with the current ERP or WMS and supports the most common warehouse processes can be built in less than a month.

Our pricing is competitive and is based on the amount of audio data sent to our API. Typical costs for using our API in a warehouse setting are a few thousand euros per month. Speechly works on all mobile devices and can be used on custom hardware, too.

If you’re interested in learning more about how voice technology can help your logistics workforce be more efficient and improve your business data quality, leave your email address and our industry specialist will contact you with more details.
