The fourth annual Mobile Voice Conference took place at the Hyatt Fisherman’s Wharf, San Francisco, on March 3rd-5th, 2014.
Opening the Mobile Voice Conference, Robert Weideman, GM and Executive VP at Nuance, stated in his keynote address that building an intelligent, multichannel virtual assistant on their Nina technology, one that delivers personalized customer service through a human-like conversational interface, requires a collaborative effort between Nuance's consulting team and the client's engineering team; Nuance would typically be doing about 40% of the work. While he thinks the Nina platform will eventually evolve and be made available as an SDK, he doesn't see that happening anytime soon. Knowing Robert from the days we both worked at Cardiff Software (where he served as VP of Marketing), I consider those numbers and facts absolutely trustworthy.
Advances in an alternative approach to building adaptive spoken dialogue systems, namely AIML 2.0 (Artificial Intelligence Markup Language), were explained by Mike McTear, author of the just-released book "Voice Application Development for Android" and Professor at the University of Ulster, Belfast (Spoken Dialog Systems), where the 2013 Loebner Prize / Turing Test competition took place. Mike mentioned to me that tools for efficiently training bots, i.e. automatically feeding information into knowledge bases, are actively being worked on and should give AIML a real boost.
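For readers who haven't seen AIML before, here is a minimal, hypothetical sketch of what an AIML 2.0 category file looks like: a bot's knowledge is expressed as pattern/template pairs, and features such as the `^` zero-or-more-words wildcard and the `<set>` predicate tag are part of the 2.0 revision. The patterns and responses below are illustrative, not from any talk at the conference.

```xml
<aiml version="2.0">
  <!-- Match "HELLO" followed by zero or more words (the 2.0 "^" wildcard) -->
  <category>
    <pattern>HELLO ^</pattern>
    <template>Hi there! How can I help you today?</template>
  </category>

  <!-- Capture the user's name into a predicate with <set> and echo it back -->
  <category>
    <pattern>MY NAME IS *</pattern>
    <template>Nice to meet you, <set name="username"><star/></set>.</template>
  </category>
</aiml>
```

Files like this are what the bot-training tools Mike described would ultimately generate, turning raw knowledge-base entries into pattern/template categories automatically instead of having authors write them by hand.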
Voice User Interfaces
Personal assistants like Siri, as well as virtual assistants like those deployed by Comcast or Artificial Solutions, all use natural-language-based interactions, and they were much discussed throughout the conference.
Brent White, a Mobile UX Architect at Oracle, showed a voice user interface for Oracle's CRM and Sales Cloud service. My takeaway from his presentation: there is no consensus in the UX community about how to build a multichannel (voice, touch, display) user interface, or maybe Oracle is not part of that community.
Jeanine Heck, a Senior Director at Comcast, showed us how Comcast had added voice controls to their new X1 set-top box. Comcast's voice user interface for their TV service captures voice commands either through a microphone built into the remote control or via a mobile app running on an Android phone or iPhone. The speech recognition behind Comcast's service is provided by the Nuance Cloud Service. Comcast did an enormous amount of user testing, and even today, some time after its introduction, every recognition result is still reviewed by checking log files and making adjustments where necessary. Very interestingly, Comcast found that users speak to their TV not robotically, but rather in sentences or phrases, conversational patterns not unlike those seen with Apple's Siri.
The opposite pattern, however, robotic, short, command-word-like utterances, was observed by Andy Pert, Chief Marketing Officer at Artificial Solutions, when he described a deployment of their virtual assistant technology at Kabel Deutschland, a cable provider in Germany not unlike Comcast here in the US.
Voice for Wearables
Rick Osterloh, SVP at Motorola Mobility, kicked off the second day of the conference. I briefly worked for Rick while at Motorola and know him as an extremely hands-on guy, who was flashing his Android phone with the latest MotoBLUR ROMs all by himself. Rick stated that about 70% of all Moto X users use the always-on passive-listening feature every day, waking up their device by saying "OK Google Now". Rick also said that Motorola Mobility would have a new smart watch and a new smart earpiece available later this year.
A very interesting panel discussion, featuring among others Eric Migicovsky (Founder and CEO of Pebble), Jeff Harris (PM for Google Glass), and Steve Holmes (VP Smart Devices at Intel), revealed a real divide with regard to voice user interfaces for wearable devices. Eric dismissed voice for devices like the Pebble watch and pointed out that he didn't even put a touch screen on his device. He attributed much of Pebble's success to its simplicity, paired with the instant tactile feedback of pressing a real, hardware button.
Google Glass' Jeff Harris, on the other hand, attributed much of Glass' success to the speech recognition capability built into the device: simple command recognition happens right on Glass itself, not via a Google cloud service.
Once again, a great conference, confirming some of my beliefs about voice user interfaces while contradicting a few others. And while off-topic, the talks given by William Meisel (President, TMA Associates) and Marsal Gavalda (Director of Research at Expect Labs) were truly thought-provoking and highlights of this year's Mobile Voice Conference.
Admittedly, there were the usual marketing-heavy presentations (Nuance, LinguaSys, etc.), but there was also Dr. Marie Meteer, who runs the Computational Linguistics Program at Brandeis University; Deborah Dahl, presenting a most comprehensive list of technologies, tools, and standards for multimodal application development; and of course Dr. Silke Witt-Ehsani, VP at Fluential, talking about her experience building intelligent assistants.