SpeechTEK 2015 – New York City

This year, the annual SpeechTEK conference focused heavily on Virtual Agents and Advances in Biometrics. The conference took place at last year’s venue, the Marriott Marquis Hotel in New York City, but, more noteworthy, this year marked the conference’s twentieth anniversary. So I thought it only appropriate to put my conference talk, “Bridging the Gap between Speech Recognition and Business Logic,” in a historical context: a 40-year journey that started with the standalone computer and a monochrome text user interface, paused at the globally connected voice-only user interface device, and ended with a virtual assistant featuring an emotion- and voice-enabled avatar.

The main conference lasted three days, ran four parallel tracks, and featured 120 panelists and speakers, including luminaries like William Meisel, Deborah Dahl, Sunil Vermuri, and Roberto Pieraccini.

Jason Mars

Jason Mars, Assistant Professor of Computer Science at the University of Michigan, Ann Arbor, talked about Sirius, an open, end-to-end, standalone speech- and vision-based intelligent personal assistant (IPA) service similar to Apple Siri, Google Now, Microsoft Cortana, or Amazon Echo. Sirius implements the core functionalities of an IPA, including speech recognition, image matching, natural language processing, and a question-and-answer system.

Wayne Scholar, CTO at GetAbby, argued that every company needs to provide a Virtual Agent (VA) interface and that every mobile app should be built with one.

Roberto Pieraccini

Roberto Pieraccini, Director of Advanced Conversational Technologies at Jibo, Inc., talked about Building Skills for a Conversational Robot and introduced a software development kit for building custom skills for the still-unreleased Jibo.

In a customer case study, David Claiborn, Director of Voice Portal Technology at United HealthCare, and Ellie Garrett, Senior Business Analyst at United HealthCare, showed how they introduced voice biometrics to their consumer health insurance organization and discussed the many challenges they had to overcome. Interestingly, they ended up having end users speak their birthday instead of the often-used “My voice is my password” phrase, thereby getting another authentication factor (something you know) for free.

Sunil Vermuri

Google’s Sunil Vermuri covered how to use System Voice Actions, which are specified by the Android system, and Custom Voice Actions, which are defined by third-party apps, in the Android OS.

Sunil shared many statistics, including:

  • In the US, only about 36 apps are installed on a smartphone, of which 27 are rarely used.
  • 55% of teens (13-18) and 41% of adults use voice search more than once per day.
  • Voice search is mostly used while watching TV (59%).

Exposing a third-party app to voice searches may, of course, also allow Google to reach into the app more easily and access data that would otherwise be unavailable.

Andrea Ayres

In another interesting case study, Andrea Ayres, Senior Manager of Telephony Automated Services at Lloyds Banking Group, explained how she deployed natural language technology (a.k.a. Natural Speech IVR) in the bank’s retail banking customer service.

This technology allows users to freely state their intent when talking to an IVR (interactive voice response) system, instead of navigating strict menu hierarchies. For example, after authentication, customers are greeted by name and asked, “How can I help you today?”

Attending a conference for the second time comes with the benefit of arriving with reasonable expectations and achievable goals. However, being a speaker here for the first time certainly added some excitement. Talking with industry leaders and luminaries helped confirm some of my viewpoints and rethink others. While parts of the industry seem quite mature, new solutions like Sirius and companies like VoicePIN keep speech technology, and this conference, fresher and livelier than some might expect.
