Listen 2014 was a short, one-day conference focusing on “Voice Interfaces for the Internet of Things”. It was organized by Wit.ai Inc, the company that provides a web service that makes it easy for developers to build applications and devices that you can talk to. The conference took place on November 6, 2014 at the unique Bluxome Winery in San Francisco.

Voice User Interfaces (VUI) complement the Internet Of Things (IOT), and not just for economic reasons: attaching a touch-enabled LCD display to connected devices like door locks, thermostats, etc. is not really an option.
Wit.ai’s Jen Dewalt gave the opening address, stating that VUIs need to be intuitive and effective. The recommendation was to start small, though giving the VUI a personality would be absolutely OK.

Siri, Back to the Future

Adam Cheyer, founder of Siri and now VIV, gave the conference keynote, titled “Siri, Back to the Future”.
Over the last three years, I have seen Adam speak three or four times, but never saw him this good or this influential, maybe because he was given enough time. Hearing him talk about how Siri happened was absolutely inspiring. When Apple took over, Siri already had structured knowledge in 15 domains and was always taking context from previous dialog exchanges into consideration. Siri was developed as a “do-engine” and a “knowledge navigator,” allowing people quick and easy access to details related to travel, scheduling, weather and other kinds of information.

“Apple bought Siri Inc. for $100 to $200 million, and Siri continued to be available in the App Store. When the iPhone 4S launched, in early October 2011, it finally had Siri fully integrated into iOS. Steve Jobs died the day after the launch. – Based on the dates mentioned in the Knowledge Navigator video, it takes place on September 16, 2011.”

Adam is currently building VIV, with the goal of creating a personal assistant framework that incorporates an app store for agent knowledge bases, giving third parties the ability to add domain knowledge.

Panel – How personal assistants will shape the future of IOT

Panelists:

Adam Cheyer, founder of Siri and now VIV
Rob Chambers, Principal Group Program Manager, Microsoft Cortana
Vishal Sharma, Former VP, Google Now
Ron Croen, Founder & Former CEO Nuance Communications
Sunil Vemuri, Product Manager, Google

Ron Croen talked about his venture into building an interactive assistant that is based on video recordings of a real person. Vishal Sharma reminded us all how much the components of virtual assistants (browsers, devices, frameworks, etc.) are still evolving, how much incompatibility there is, and that workload distribution is still shifting.
The most interesting and controversial part of the panel discussion revolved around the question of whether there should be a single virtual assistant or multiple domain-specific assistants.

In the end, only Microsoft’s Rob Chambers was in the multiple-agent camp. Everyone else favored a universal agent, but one that would need to be open to third parties.
Adam Cheyer argued for a multi-modal assistant, allowing the user to interact using touch for objects that are already on the screen, and using voice, for instance, to add new objects.
Triggered by the audience, the discussion then turned to whether and how we should build virtual personas that replicate deceased people, drawing on their former knowledge and personality.

It was already assumed that virtual assistants will be used as utilities to do things, but also to show compassion, which underlines the importance of showing and expressing emotions.
The business model for virtual assistants remained unclear, but everyone expressed the sincere hope that it would not be an ad-driven model.

History of Voice Interfaces

Roberto Pieraccini, Director of Advanced Conversational Technology at Jibo, did not reveal any new secrets about the Jibo, but gave a great talk about the history of VUI, and of speech recognition in particular, and how far we have come in this area.

Developer Case Studies

DayRev

Daniel Sposito, Software Engineer, Tailwind Founder
The DayRev mobile app helps you stay up to date with your favorite topics by reading out loud to you and listening for your feedback. You can listen to the latest news headlines and information on your favorite topics as DayRev reads them out loud to you.

HULU/Cortana

Paul Beck, Software Engineer, Hulu
Paul talked about how they integrated Microsoft’s Cortana into Hulu to accept voice commands for content searches. They built their own SemanticParser and LogicTree for actions like “watch”. Paul stressed the importance of finding the right timeout when recording user input.
Hulu has already started experimenting with using Android Wear to capture voice input.
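
To make the timeout point a bit more concrete, here is a minimal sketch of silence-based end-of-speech detection. This is not Hulu's implementation, which was not shown in detail; the function name `find_endpoint`, the per-frame energy input, and all default values are assumptions chosen purely for illustration.

```python
from typing import Iterable, Optional


def find_endpoint(frame_energies: Iterable[float],
                  frame_ms: int = 20,
                  silence_threshold: float = 0.01,
                  timeout_ms: int = 800) -> Optional[int]:
    """Return the time in ms at which recording should stop, or None.

    Hypothetical helper: recording stops once the user has spoken and
    then stayed quiet for at least `timeout_ms`.
    """
    silent_ms = 0
    heard_speech = False
    for index, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Speech frame: remember that the user spoke, reset the silence timer.
            heard_speech = True
            silent_ms = 0
        elif heard_speech:
            # Silence after speech: count it toward the timeout.
            silent_ms += frame_ms
            if silent_ms >= timeout_ms:
                return (index + 1) * frame_ms
    # The input ended before the timeout was reached.
    return None
```

The trade-off sits in `timeout_ms`: too short, and a user who pauses mid-sentence gets cut off; too long, and the app feels sluggish after every command.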

MARA – Running Assistant

Joel Wetzel, CTO, Affirma Founder
M.A.R.A. is a hands-free running app and virtual running assistant. She is controlled entirely by voice. You tell her what kind of workout you want to do and she will track your progress and respond to your voice commands. You can ask her how you’re doing, where you are, about the weather, to play music, and lots of other things. She’ll be tracking your performance and comparing it to historical results. She also watches the weather and will warn you if it’s about to start raining. You never have to take her out of your pocket. Just talk to her through your earbuds and enjoy a hands-free run!

Summing-up

This was an incredible conference, bringing together in one location people who truly deserve to be called the new thought leaders in this space. I think everyone left with the clear understanding that Siri was not just a product but a defining “event” that significantly pushed the idea of a virtual assistant forward, and also that this was only possible because speech recognition and synthesis had progressed so much.

Listen 2014 broke with traditional conferences in this space, like SpeechTEK or the MobileVoice conference, which are heavily sponsored and influenced by companies that came out of the IVR space, like Nuance, LumenVox, etc. (IVR, or interactive voice response, is a technology that allows a computer to interact with humans through voice and DTMF tones entered via a keypad.)
Besides Ron Croen, the founder and first CEO of Nuance, this group was completely absent, and a new group of people and ideas took the stage: those who use and understand speech recognition and synthesis as a mature utility for building the next generation of voice-driven applications.

The product and technology showcases were clear indications that, with Wit.ai or SpeakToIt’s api.ai, the tools and services have arrived to allow for the cost-effective experimentation that will take virtual assistant technology to its next level. And as in-car systems already did, the Internet Of Things will advance the adoption of Voice User Interfaces.

Listen2014 Conference