Facebook has announced that it will be making its wav2letter@anywhere
online speech recognition framework more readily available as an open source
platform. The framework was developed by Facebook AI Research (FAIR), which
claims that it has created the fastest open source automatic speech recognition
(ASR) platform currently on the market.
“The system has almost three times the throughput of a
well-tuned hybrid ASR baseline while also having lower latency and a better
word error rate,” wrote a group of eight FAIR researchers in a recent paper.
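For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not taken from the FAIR paper or from wav2letter@anywhere, and the function name `word_error_rate` is a hypothetical one chosen for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / reference length.

    Hypothetical helper for explanation only; it is not part of
    wav2letter@anywhere or any Facebook tooling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word and one dropped word against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower score is better, and a perfect transcript scores zero. Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves out.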
The wav2letter@anywhere framework is based on the wav2letter
and wav2letter++ speech recognition systems, and uses time-depth separable
(TDS) convolutional neural network (CNN) technology – rather than recurrent neural
network (RNN) technology – to achieve its performance gains. Separable models have
traditionally been used in computer vision applications, but the FAIR
researchers argue that their approach is superior to standard RNN baselines.
If the jargon is somewhat opaque, the upshot is that
Facebook may have delivered an accurate speech recognition platform with lower
latency that can be deployed on edge devices or through the cloud. If that proves
to be the case, it would make it much easier for smaller developers to
incorporate some form of speech recognition into their various solutions.
Of course, speech recognition has become increasingly common in the past few years, turning up in IoT products that range from smart cars to smart appliances. Grand View Research previously predicted that the joint speech and voice recognition market would be worth $31.82 billion in 2025.
In the meantime, it’s not yet clear what plans Facebook has for the speech recognition platform. The social media giant has spent the past few years developing a slew of new technologies, and recently launched a new payments platform and a solution that alters video content to thwart facial recognition.
Source: VentureBeat
(Originally posted on Mobile ID World)