The Applied Voice Input Output Society (AVIOS) and TMA Associates organize the annual Mobile Voice Conference, which this year took place at the Sainte Claire Hotel in San Jose, California on April 20 and 21.
The Mobile Voice Conference examines the current state of speech recognition, speech synthesis, and natural language understanding technology, what it can do today, and what to expect as the technologies evolve.
This was my third attendance in as many years, and for the second time I had the opportunity to speak at this limited-attendance event. The goal of the organizers, Bill Scholz (President of AVIOS) and Bill Meisel (President of TMA Associates), is to attract industry leaders and top speakers who want to interact with their peers and hear talks with significant, practical, and insightful content that goes well beyond the usual sales pitches.
While the central theme this year was Virtual Assistants, I had the impression that Voice User Interfaces (VUIs) in the connected car were the predominant topic: they drove many discussions, and the related talks were particularly popular and drew great audience participation.
The obvious problem with general virtual assistants is that they need vast, almost limitless, long-term access to personal information, which currently is only available to companies like Google, Apple, and Microsoft. In the keynote, Rob Chambers, Speech Platform Group Engineering Manager at Microsoft, tried to give an engaging talk around Cortana, Microsoft’s virtual assistant. However, he was probably the only conference participant with a Windows Phone, and the little new information he was able to share was probably not enough to captivate the audience. Let’s hope that making Cortana available in Windows 10 will prolong her life and prevent her from being buried next to Microsoft Office Assistant Clippy.
The keynote on the second day of the conference brought a nice change of pace. TMA President William Meisel moderated a panel discussion on “How natural can and should natural language applications be?” On the panel were Adam Cheyer, co-founder, Viv Labs; Todd Mozer, CEO, Sensory Inc.; and Phil Gray, Executive VP, Interactions.
Adam Cheyer, of course, was a co-founder and VP of Engineering of Siri Inc., and after Apple acquired the company to build its voice-based personal assistant, he served as a Director of Engineering in Apple’s iOS group. I think I have seen Adam on panels like this one at least ten times. Still, he never disappoints, is an everlasting source of anecdotes, and to me he is one of the most inspiring creators of our time.
Adam was joined by Todd F. Mozer, President, CEO, and Chairman of Sensory, the company providing billions of very low-power, self-contained speech recognition modules, which for instance allow phones to constantly listen for trigger words like “OK Google”; and by Phil Gray, Executive Vice President of Business Development at Interactions Corporation. You may not know Interactions Corporation, but last November, Interactions and AT&T announced an agreement to transfer ownership of the AT&T Watson technology and research program to Interactions, including all speech recognition technology from AT&T Labs Advanced Technologies.
Yvonne Gloria, who is leading the design, development, and implementation of speech recognition at Ford Motor Company, gave a presentation titled: “Is natural language the future of speech interaction in the vehicle?”
Yvonne, who also works globally on the voice user interface for SYNC 3, shared interesting details about how the usage of in-car VUIs differs in the US, Europe, and China. However, she had no answer to an obviously provocative audience question: why would car manufacturers still try to build proprietary systems while Apple CarPlay and Android Auto are starting to emerge?
Yvonne’s presentation was directly followed by Susan L. Hura, owner of SpeechUsability, a consulting firm dedicated to collecting relevant, actionable data from users to make speech-enabled interaction intuitive and appealing. Dr. Hura, certainly a luminary in this field, has served as Program Co-Chair of SpeechTEK since 2007. Susan talked about the “Worst Practices in Automotive Speech Interfaces” and shared some remarkable details, showing that VUIs in cars actually perform much better and are more widely accepted than their reputation might suggest.
Lisa Falkson, Senior VUI/UX Designer at CloudCar, talked about “Designing Effective Voice User Interfaces for the Connected Car”. We learned that CloudCar is working with Expect Labs, using their Knowledge Graph and Language Understanding solutions.
I had the chance to give two talks: “Bridging the gap between Speech Recognition and Business Logic“ and “VUI in Wearables and the IoT, technical possibilities“. While some presenters tried to put themselves into the shoes of developers trying to succeed with voice user interfaces, I was able to share first-hand experience and details about just that: trying to make VUIs work.
As in previous years, the conference was incredibly well organized, and it was great to meet again with luminaries and industry leaders. The conference’s unusual format of keeping presentations to 20 minutes is well established now; it keeps everything interesting and lively. As I wrote in the opening, voice in the car was one of the main themes of this conference, and I guess we can and should expect a lot of improvements in this area in the not too distant future.