The search giant's abstract for the published paper noted, "We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone."
For the new system, Google used an "embedded speech recognition system" that runs locally on the mobile device, which is said to be more reliable and to have lower latency. The company added, however, that such a system must be accurate and must not consume significant memory or computational resources.
Google built the new system by combining several techniques in the overall system architecture. "Using a combination of SVD-based compression and quantization, along with a compact first-pass decoding strategy and on-the-fly rescoring with a larger LM (language model), we can build a system that is about 20.3 MB in size, without compromising accuracy or latency," explained the paper.
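To give a rough sense of how quantization shrinks a model, here is a minimal Python sketch of uniform 8-bit quantization, in which each floating-point weight is stored as a single byte plus a shared offset and scale. This is an illustration only: the article does not describe Google's actual quantization scheme, and the function names and sample weights below are hypothetical.

```python
def quantize(weights, num_bits=8):
    """Map floats onto a uniform grid of integer codes (illustrative only)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1  # 255 for 8 bits
    # Guard against a constant vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights, not taken from the paper.
weights = [-0.82, -0.10, 0.0, 0.37, 1.25]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest grid point bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

Storing one byte per weight instead of a 32-bit float cuts the storage for those weights to roughly a quarter, at the cost of a small rounding error per weight.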
Further, the company claimed that when the new system was tested on the Nexus 5, it achieved a 13.5 percent word error rate and ran seven times faster than real-time.
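For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The Python sketch below shows that standard calculation; it is not Google's evaluation code, and the function name and sample phrases are assumptions chosen for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words gives a WER of 0.25.
print(word_error_rate("turn on the flashlight", "turn on flashlight"))
```

On this scale, the reported 13.5 percent corresponds to roughly one word error in every seven or eight words of the reference transcript.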
It added that, to develop the acoustic model, Google trained the system on 3 million hand-transcribed, anonymized utterances extracted from voice search traffic (approximately 2,000 hours). To improve accuracy, Google also exposed its systems to noise and reverberation, using samples extracted from YouTube videos and environmental recordings of daily events.
Google Now already offers limited offline voice-command support. The offline commands currently available to users include Play Music, Open Gmail, Turn on Wi-Fi, Turn up the volume, Turn on the flashlight, Turn on airplane mode, Turn on Bluetooth, and Dim the screen.
In other news, Google Senior Fellow Jeff Dean confirmed in a recent interview that Google Translate will soon deliver more accurate results with deep learning. Dean told VentureBeat that Google is working to make "translations more accurate."