Google on Thursday announced that it is now using improved acoustic models for its Voice Search feature, making recognition more accurate, faster, more robust to noise interference, and less demanding of computational resources. The new acoustic models are already being used in the Google app for Android and iOS, which powers the Google Now voice-based virtual assistant, and dictation on Android benefits as well.
The Google Speech Team said in a blog post that Google Voice Search, which until now was powered by Deep Neural Networks (DNNs), now uses "better neural network acoustic models" built on Recurrent Neural Networks (RNNs) and trained with Connectionist Temporal Classification (CTC) and sequence discriminative training techniques.
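For readers curious how such a model fits together, below is a minimal sketch in PyTorch of a bidirectional recurrent acoustic model trained with CTC loss. The layer sizes, feature dimension, and phoneme label count are illustrative assumptions, not details of Google's production system.

```python
# A minimal sketch of the idea, not Google's production model: a
# bidirectional LSTM over audio feature frames, trained with CTC loss
# so the network can emit phoneme labels without frame-level alignment.
import torch
import torch.nn as nn

NUM_FEATURES = 80   # e.g. log-mel filterbank channels per frame (assumed)
NUM_LABELS = 43     # phoneme inventory + 1 CTC "blank" (assumed)

class AcousticModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Recurrent layers carry context forwards and backwards in time.
        self.rnn = nn.LSTM(NUM_FEATURES, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 256, NUM_LABELS)

    def forward(self, frames):             # frames: (batch, time, features)
        hidden, _ = self.rnn(frames)
        return self.out(hidden).log_softmax(dim=-1)

model = AcousticModel()
ctc = nn.CTCLoss(blank=0)                  # label 0 reserved for CTC blank

# Dummy batch: 4 utterances of 200 frames; targets are phoneme id sequences.
frames = torch.randn(4, 200, NUM_FEATURES)
targets = torch.randint(1, NUM_LABELS, (4, 30))
log_probs = model(frames).transpose(0, 1)  # CTCLoss wants (time, batch, labels)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200, dtype=torch.long),
           target_lengths=torch.full((4,), 30, dtype=torch.long))
loss.backward()                            # standard gradient-based training
```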
The new RNN-based models use feedback loops that carry temporal context through the network, letting the engine recognise a sound more reliably, and with fewer computational resources, by also analysing the sounds that come before and after it; for example, the /m/ and /j/ sounds surrounding the /u/ in the word 'museum'. This, alongside CTC, allowed for better phoneme recognition without aggressive frame-by-frame prediction. However, since the models consume larger chunks of audio at a time, running them in real time proved to be a challenge. Google said it managed to avoid delays by training the models to output predictions closer to the "ground-truth" timing of the speech.
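To see why a CTC recogniser does not need aggressive per-frame prediction, consider the toy greedy decoder below: the network can stay on the "blank" label for most frames and emit a brief spike only when it hears a phoneme, and the decoder simply collapses repeats and drops blanks. The per-frame labels here are invented for the 'museum' example.

```python
# Illustrative greedy CTC decoding, not Google's decoder: collapse
# repeated labels, then drop blanks ("-").
BLANK = "-"

def ctc_collapse(frame_labels):
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded

# Made-up per-frame argmax labels: mostly blanks, with short spikes
# at each phoneme of 'museum' (/m j u z i ə m/).
frames = list("--mm--j--uu---zz--ii--ə--mm--")
print(ctc_collapse(frames))   # ['m', 'j', 'u', 'z', 'i', 'ə', 'm']
```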
Google also added artificial noise and reverberation to the training data, which makes speech recognition more reliable in the presence of ambient noise. Further details can be found in the company's dedicated blog post.
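As a rough illustration of this kind of data augmentation, the snippet below mixes synthetic noise into a clean waveform at a chosen signal-to-noise ratio and applies a simple simulated reverberation. The function names, the decaying impulse response, and all parameters are assumptions made for the sketch, not Google's actual pipeline.

```python
# A hedged sketch of noise and reverberation augmentation (assumed
# implementation, not Google's): corrupt clean training audio so the
# model learns to cope with ambient noise and echo.
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` at the requested signal-to-noise ratio."""
    noise = noise[:len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def add_reverb(clean, rir):
    """Convolve the waveform with a room impulse response (RIR)."""
    return np.convolve(clean, rir)[:len(clean)]

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)      # 1 second of "speech" at 16 kHz
noise = rng.standard_normal(16000)      # stand-in for recorded noise
rir = np.exp(-np.linspace(0, 8, 800))   # crude exponentially decaying RIR
augmented = add_reverb(add_noise(clean, noise, snr_db=10), rir)
```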