A Universal Voice Browser

On September 25, Amazon released 80 new devices. Even for a company of its size, that is an impressive number and an even more impressive line-up. However, I think something much more profound happened a day earlier. On Tuesday, September 24, 2019, at 11:04 AM EDT, in a simple press release, Amazon announced the Voice Interoperability Initiative.

Its goal is to ensure that voice services work seamlessly alongside one another on a single device. More specifically, voice-enabled products should be designed to support multiple simultaneous wake words. More than 30 companies are already supporting this effort, including Amazon, Baidu, BMW, Bose, Harman, Logitech, Microsoft, Salesforce, Sonos, Sony, Spotify, Orange, Verizon, Intel, NXP, and Qualcomm.

I think Marc Benioff, Chairman and co-CEO of Salesforce, which has its own Einstein Voice Assistant, understood and put into words the significance of this moment best when he said:

“We’re in the midst of an incredible technological shift, in which voice and AI are completely transforming the customer experience.” — Marc Benioff, Chairman and co-CEO at Salesforce

I have often compared the current proprietary smart speaker and voice assistant market with the market before the Web browser. To me, Google Home, Amazon Echo, and Apple HomePod look a lot like CompuServe, AOL, and MSN did.
Considering the goals of the Voice Interoperability Initiative and its supporters, I think it’s fair to ask: how much closer did we just get to something comparable to a Web browser?


What came after CompuServe, AOL, and MSN?

CompuServe, AOL, and MSN unsuccessfully tried to remain gatekeepers while also integrating the Web. Instead, users switched to independent Internet Service Providers (ISPs), accessed the Internet directly, and used sites like Yahoo.com as their homepages. These so-called portals were customizable with relevant links (bookmarks) and content feeds.
Adding a skill to your Alexa profile is somewhat comparable to that. Saying “Alexa” is like opening Yahoo.com, saying “open Spotify” is like clicking on a bookmark, and saying “what is my news brief” can be compared to reading the pre-configured news items on a portal page.

What is going to change?

What the Voice Interoperability Initiative proposes is that devices like the Amazon Echo will wake up not only when you say “Alexa” but also on many other (branded) wake words. I suppose you will have to actively enable the additional wake words. Those wake words will then let you communicate directly with the selected voice assistant.

Example

Instead of saying a proprietary wake word such as “Alexa” or “Okay Google”, followed by the invocation name (the brand or the name of the skill), then waiting for a greeting, and finally stating your intent, you will be able to say the brand name directly, followed by your intent. E.g., “Spotify play Cruel Summer by Taylor Swift.”
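To make the difference concrete, here is a minimal sketch of how a device might route an utterance to the assistant named by its leading wake word. Everything in it is hypothetical: the Voice Interoperability Initiative does not publish such an API, `ENABLED_WAKE_WORDS` and `route_utterance` are illustrative names, and the sketch assumes speech has already been transcribed to text, whereas real wake word detection runs on audio in device firmware.

```python
from typing import Optional, Tuple

# Hypothetical wake words the user has enabled on this device. The set is
# immutable to reflect the assumption that on-device wake word lists are
# fixed rather than updatable on the fly.
ENABLED_WAKE_WORDS = frozenset({"alexa", "spotify", "okay google"})

# Longest wake word, in words, so multi-word phrases are tried first.
MAX_WAKE_WORD_LEN = max(len(w.split()) for w in ENABLED_WAKE_WORDS)


def route_utterance(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (wake word, intent) if the utterance starts with an enabled
    wake word, or None if the device should stay asleep."""
    words = transcript.split()
    # Normalize case and trailing punctuation for matching only.
    normalized = [w.lower().strip(",.!?") for w in words]
    for length in range(min(MAX_WAKE_WORD_LEN, len(words)), 0, -1):
        candidate = " ".join(normalized[:length])
        if candidate in ENABLED_WAKE_WORDS:
            # Everything after the wake word is handed on as the intent.
            return candidate, " ".join(words[length:])
    return None
```

For example, `route_utterance("Spotify play Cruel Summer by Taylor Swift.")` returns `("spotify", "play Cruel Summer by Taylor Swift.")`, so the intent goes straight to Spotify without a detour through Alexa, while an utterance that starts with a wake word the device has not enabled returns `None`.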

What does this mean?

Wake word recognition is implemented in hardware/firmware, meaning it happens directly on the device rather than on a remote server that receives a recording of your voice over the Internet. The list of recognizable wake words is therefore limited and most likely cannot be updated instantly.
To draw another analogy to the Web, I think of wake words as domain names, and the rush to get popular wake words registered and into the on-device recognizers will probably not be unlike the gold rush for domain names.
The fight between the companies that joined the Voice Interoperability Initiative and those that stayed outside (Apple, Google, and Samsung) will be the browser wars all over again.


When Siri was integrated into the iPhone 4S at its release in October 2011, most of us knew that voice was going to be a long game, like running a marathon. With last week’s announcement of the Voice Interoperability Initiative, we suddenly seem a lot closer to the finish line.
With voice biometrics already working pretty well for speaker identification, speaker authentication seems just around the corner. Actually, there may be no more corners: the rest of the race is more like the New York City 5th Avenue Mile, which takes runners 20 blocks in one straight line down one of Manhattan’s best-known streets.
