The CI-Conference is the successor of the Mobile Voice Conference and, like its predecessor, was organized by Bill Meisel and AVIOS (Applied Voice Input Output Society). The two-day conference (1/30-31) ran like clockwork at the Westin in San Jose and featured a keynote, two keynote panels, and 26 sessions.
What makes this conference unique is how it balances academic and industry experts, who come together to share their latest R&D and market activities. Moreover, a conference where about 20% of the participants are also presenting has to be all about knowledge exchange and open discussion. If a presenter gets out of a session alive, he has probably proved his point. I’m only partially joking, but I find it quite refreshing that at this conference opposing ideas can be discussed instead of sugarcoated. No need to worry, though: given the tight schedule, everyone was saved by the bell. And before you get the wrong idea, the atmosphere of the conference was intimate and friendly. It was a great networking event.
Google’s Sunil Vemuri, Product Manager for Google Assistant, opened the conference with a keynote loaded with live demos on several devices, presenting a vision of a unified experience. Sunil described the Google Assistant like this:
- Speaking like a person, without pretending to be one
- Always there, never in the way
- Smart, getting smarter
By the Numbers
Google had speakers in eight sessions, Nuance in four, and Amazon in only two. But it was not only the number of speakers but also their caliber that allowed Google to dominate. Here is the sentence that almost every speaker used in their talk: “As X just said” (replace X with the name of one of the eight Googlers speaking at the conference). The second most frequently heard sentence was probably “Allow users to talk, like talking to a friend.”
Several presenters who had worked on the Alexa, Google Home, Siri, and Cortana platforms stated that Amazon’s Alexa platform is currently the most powerful, the most open, and the easiest to develop for. However, Google left us all with the impression that it has a much deeper understanding of the problem space and truly cares about the user experience: “Design for the user – the technology will follow.”
Amazon’s Alexa approach still seems geared toward teaching users how best to interact with the Echo, especially with regard to the “smart home”. Google, on the other hand, asserts that users already know how to talk.
Nuance’s speakers focused on speech and voice recognition applications, with case studies around home automation and speaker identification/authentication. I don’t know whether their slides have changed over the last two years, but the presentation content apparently hasn’t. Personally, I felt strangely saddened to see that a company that almost owned speech technology a couple of years ago has fallen behind so far, so fast, and was unable to contribute to the challenges considered at this year’s conference.
Conversations have no errors. Nandini introduced some new principles, such as using each turn in the dialog as an opportunity and being very flexible in adjusting to context. However, many of her ideas require access to the audio file, or at least to the speech recognition transcript, neither of which is currently provided by api.ai or the Alexa Skills Kit.
New Celebrities and Reliable Sources
Ilya Gelfenbeyn, Product Manager for api.ai at Google, showed the same demo he has been showing at conferences ever since api.ai was acquired. Deborah Dahl once again gave an impressive presentation on the current state of technologies and tools, and on the standards currently being developed around them.
Personally, I found Margaret Urban’s talk “The Balancing Act: Writing Naturally for an Unnatural Voice” the most influential, and I cannot wait to get access to her slide deck. Margaret worked for many years at Nuance and is now at Google …
Great Panel and Great Moderator
Moderated by Bill Scholz (30 years of experience in cognitive science), this panel discussed the risks and challenges that come with new technology and where the VUI belongs on the “hype curve”. The panelists were:
- Ashwin Ram, Sr. Manager Alexa AI, Amazon
- Dan Bagley, CEO, Cepstral (which provides the only SSML-capable speech synthesizer for macOS)
- Raj Tumuluri, Principal, Openstream
- Jay Wilpon, Senior VP, Natural Language Research, Interactions
Voice vs Chat
It could be the sessions I attended, or the fact that this conference’s predecessor was the Mobile Voice Conference, or maybe it’s just what I want to believe, but the VUI (Voice User Interface) absolutely dominated this conference. Text-based chat and chatbots hardly played a role, and the focus on prosody/intonation processing and generation may support my biased observation.
As the caravan moves on to meet again at the SpeechTEK conference in Washington, DC, in April, I think everyone was very excited about the success of the first Conversational Interaction Conference.