Press "Enter" to skip to content

On-device speech recognition may make smart assistants more appealing

Google unveiled the next-generation Google Assistant at I/O 2019, featuring an on-device speech recognition model that bypasses the need to upload voice samples to cloud systems.

Google and Amazon clearly want people to bring smart assistant speakers like the Google Home and Amazon Echo into their homes. The lower-end versions of the two, the Google Home Mini and Amazon Echo Dot, are so frequently discounted or bundled with other products and services that they have become the digital age’s equivalent of the free toy at the bottom of the cereal box.

There are, naturally, holdouts against this type of technology, given the discomfort many people feel about bringing into their homes an internet-connected speaker that constantly listens for the “wake word” that prompts the device to spring into action. Anecdotes abound, such as a Portland couple claiming their Echo arbitrarily recorded a conversation and sent it to someone in their contact list. Despite resistance to smart speakers as a device class, all of this functionality (an always-on microphone listening, waiting to be called upon, recording your command, and sending it to the cloud for processing) already exists in modern smartphones as well.
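To make that comparison concrete, the flow both smart speakers and smartphones implement looks roughly like the loop sketched below. This is purely illustrative: every function here is a hypothetical stand-in, not any vendor’s actual API.

    import time

    def next_audio_frame():
        """Stand-in for reading ~30 ms of PCM audio from an always-on microphone."""
        return b"\x00" * 960

    def detect_wake_word(frame):
        """Stand-in for a small keyword-spotting model that runs locally on the device."""
        return False

    def record_command():
        """Stand-in for capturing the user's spoken request after the wake word fires."""
        return b""

    def send_to_cloud(audio):
        """Stand-in for uploading audio to a cloud speech service; on-device
        recognition is about removing this step."""
        return "transcript"

    # The device idles in this loop, discarding audio until the wake word is heard.
    while True:
        if detect_wake_word(next_audio_frame()):
            command_audio = record_command()
            print(send_to_cloud(command_audio))
        time.sleep(0.03)  # audio frames arrive roughly every 30 ms

Everything before send_to_cloud already happens locally on today’s devices; the privacy question centers on that final upload.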

SEE: Alexa Skills: A guide for business pros (free PDF) (TechRepublic)

Google is pushing this voice recognition from the cloud to the edge with the new Google Assistant unveiled at I/O 2019, which uses a compacted machine-learning model that the company claims shrinks what was once 100 GB of data down to less than half a gigabyte. CNET notes that “the souped-up digital helper requires hefty computing power for a phone, so it will only be available on high-end devices. Google will debut the product on the next premium version of its flagship Pixel phone, expected in the fall.”
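Google has not said exactly how the Assistant’s model was compacted, but post-training quantization is a common technique for shrinking a trained model for on-device use. Here is a minimal sketch using TensorFlow Lite, assuming a trained speech model has already been exported to a hypothetical speech_model/ directory in SavedModel format:

    import tensorflow as tf

    # Load a trained model from a (hypothetical) SavedModel export directory.
    converter = tf.lite.TFLiteConverter.from_saved_model("speech_model/")

    # Post-training quantization: store weights at reduced precision,
    # trading a small amount of accuracy for a much smaller file.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()

    # The resulting .tflite file can be bundled with a mobile app and run
    # entirely on the device.
    with open("speech_model.tflite", "wb") as f:
        f.write(tflite_model)

Quantization alone would not account for the 200x reduction Google describes, and the company likely combines several techniques; the principle, trading precision and redundancy for size, is the same.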

For developers, Google is expanding its edge ML capabilities, with betas of the On-device Translation API, an Object Detection & Tracking API, and AutoML Vision Edge unveiled at I/O 2019. The technology that powers the next-generation Google Assistant is not (yet) available for developers’ own projects, however.

Options for third-party developers

That does not mean third-party developers cannot take advantage of on-device voice recognition, however. Snips, a French software firm, makes its Snips platform freely available for non-commercial use, and the platform requires an order of magnitude less processing power, as it is capable of running on a Raspberry Pi 3. The Snips platform itself does not require an internet connection to operate, though integrations that depend on internet access naturally do.

“The main differentiator of the Snips platform is that it focuses on all the components required to build high quality voice interfaces: Wake word detection, Speech Recognition, and Natural Language Understanding,” Snips CTO Joseph Dureau told TechRepublic. “In contrast, none of these voice processing algorithms are included in the Google ML Kit.” Dureau added: “Our data generation solution makes it possible to generate large volumes of diverse and high-quality training data, for any voice interface use case. It enables developers to train their assistants with very high performance before their actual launch, helping them to overcome the cold start problem.”
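For a sense of what the Natural Language Understanding piece looks like in code, Snips publishes that component as the open-source snips-nlu Python library (wake word detection and speech recognition run as separate services in the full platform). A minimal sketch, assuming the English language resources have been downloaded and a training dataset in the library’s JSON format exists as dataset.json:

    import io
    import json

    # pip install snips-nlu && python -m snips_nlu download en
    from snips_nlu import SnipsNLUEngine
    from snips_nlu.default_configs import CONFIG_EN

    # Load a training dataset in the snips-nlu JSON format (assumed to exist).
    with io.open("dataset.json", encoding="utf-8") as f:
        dataset = json.load(f)

    # Training happens locally; no voice or text data leaves the machine.
    engine = SnipsNLUEngine(config=CONFIG_EN)
    engine.fit(dataset)

    # Parsing also runs entirely on-device, returning an intent and its slots.
    parsing = engine.parse("Turn on the lights in the living room")
    print(json.dumps(parsing, indent=2))

This on-device training-and-parsing loop is what lets developers iterate on an assistant, and tackle the cold start problem Dureau describes, before anything ships.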

Snips boasts a community of over 25,000 developers, and the platform presently supports English, French, Japanese, Spanish, Italian, and Portuguese.

The potential for developers to utilize this technology in their applications could assuage some of the concerns—founded or otherwise—of those reluctant to adopt voice-activated smart assistants.

For more, check out the 5 biggest IoT security failures of 2018, and why data security is now a top concern for IT leaders.

