SpeechTEK 2014 – New York City

I was able to attend this year’s SpeechTEK 2014 conference in New York City. The conference was organized in four parallel tracks, and its advanced technology track was devoted to topics like virtual agents, voice biometrics, natural language understanding, and speech technologies for smart devices.

Bruce Balentine (@brucebalentine), Chief Scientist at the Enterprise Integration Group, gave the keynote on the second day of the conference, which was probably the most impactful and insightful talk of the whole event. One of his key points was that “We can now stop selling the future”: all the basic technologies are at our disposal, empowering us to succeed in building smart speech systems. However, he also pointed out a major shortcoming of Voice User Interfaces: micro-interactions were never established and have not been learned by users. In comparison, micro-interactions like pinch/zoom were successfully taught to users of touch-enabled user interfaces.

“We can stop selling the future. The technologies are here and at your disposal”
-Bruce Balentine

Bruce advised being conscious of which of the following you are trying to build:

Real, conscious, sentient entity
As the user explores, the interaction becomes deeper. There is more there than the user can ever discover.
Convincing simulation
The user believes because the illusion is compelling, but as he explores, interactions become more shallow, the terrain more familiar, and the magic disappears.
User interface into a useful product
The user's focus is on utility and task, not the interface itself, and as the user explores, user power increases.

He challenged the audience to build smart speech systems that solve current needs, instead of dreaming of the truly reasoning, truly rationalizing system, which he does not expect for at least another five years. Not surprisingly, he favored an approach where the VUI gives clear cues and tells users what they can say, instead of trying to build a fully unconstrained voice user interface. In other words, he still favors directed dialog over natural language, although the latter remains the ultimate goal.

Multimodal User Interface Design

Tom Schalk, VP Voice Technology at Sirius XM Satellite Radio, discussed a multimodal user interface design combining voice and touch.

He illustrated all concepts using a complex in-vehicle entertainment system, with the focus on accomplishing tasks quickly and with as little distraction as possible. Changing the station, for instance, can easily be done with voice alone, but once the user needs to glance at the display, touch is available and even preferred. For example, “Change the station” triggers a list of large icons (representing genres) to appear. The user can then either say “Country” or touch the corresponding icon or tile.
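To make the voice-or-touch idea concrete, here is a minimal sketch of how both modalities could resolve to the same selection. This is my own illustration, not code from the talk; the genre list, the `InputEvent` type, and the `resolve_genre` function are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical genre tiles shown after the user says "Change the station".
GENRES = ["Country", "Rock", "Jazz", "News"]


@dataclass
class InputEvent:
    modality: str    # "voice" or "touch"
    payload: object  # recognized text for voice, tile index for touch


def resolve_genre(event, genres=GENRES):
    """Map a spoken genre name or a touched tile to the same genre value."""
    if event.modality == "voice":
        spoken = str(event.payload).strip().lower()
        for genre in genres:
            if genre.lower() == spoken:
                return genre
        return None
    if event.modality == "touch":
        index = event.payload
        if isinstance(index, int) and 0 <= index < len(genres):
            return genres[index]
        return None
    return None


by_voice = resolve_genre(InputEvent("voice", "Country"))
by_touch = resolve_genre(InputEvent("touch", 0))
```

Both calls end up at the same genre, so the rest of the application does not need to care which modality the user picked.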

“Accuracy and Latency are both critical”

-Tom Schalk

Natural Language Understanding

Dr. Deborah Dahl (@deborahdahl), Principal at Conversational Technologies and, in my book, one of the most complete (theory and practice) and experienced experts in the speech and natural language processing industry, taught a three-hour workshop on natural language understanding. Two days later, she gave a great overview of the free and open source tools available for developing multimodal applications.

During the NLU workshop, Deborah used Wit.ai extensively. Wit.ai is a web service and web user interface that enables developers to add a natural language interface to their app or device very quickly. Even with all the unknowns (scalability, longevity, pricing model, etc.), it is currently the only knowledge mapping framework I know of that allows for fast prototyping. Alternatives include:

Knowledge Representation:

Dialog Management

Free Text to Speech Synthesis

Dr. Richard S. Wallace, Chief Science Officer at Pandorabots, creator of AIML (Artificial Intelligence Markup Language), and winner of the Loebner Prize (an annual Turing Test competition) in 2000, 2001 and 2004, spoke about AIML 2.0 and the upcoming 2.1 specification. The latter contains a triple store (subject, predicate, object) that allows simple deductions like:

Harry is older than Tom : Tom is older than William => Harry is older than William.
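The kind of deduction shown above can be sketched in a few lines of plain Python. This is not AIML syntax and not how Pandorabots implements it; it only illustrates the idea of deriving new triples from a transitive predicate. The predicate name `olderThan` and the function `infer_transitive` are made up for this example.

```python
def infer_transitive(triples, predicate):
    """Return the given triples plus all triples implied by treating
    `predicate` as transitive."""
    known = set(triples)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for (a, p1, b) in list(known):
            if p1 != predicate:
                continue
            for (c, p2, d) in list(known):
                if p2 == predicate and b == c and (a, predicate, d) not in known:
                    known.add((a, predicate, d))
                    changed = True
    return known


# Triples are stored as (subject, predicate, object).
facts = {
    ("Harry", "olderThan", "Tom"),
    ("Tom", "olderThan", "William"),
}
derived = infer_transitive(facts, "olderThan")
print(("Harry", "olderThan", "William") in derived)  # True
```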

This was followed by Mike McTear’s talk about how to add voice to a virtual assistant. Michael McTear is an Emeritus Professor at the University of Ulster and co-author of the book “Voice Application Development for Android”.

Mike showed how AIML’s so-called out-of-band <oob/> tags can be used to integrate external web service calls into AIML-driven, voice-enabled virtual assistants.
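As a rough sketch of the idea (not Mike's code), a client could split a bot reply into the text to be spoken and the out-of-band commands to be executed. The `<weather>` tag, the `lookup_weather` function, and the sample reply below are hypothetical; a real client would call an actual web service at that point.

```python
import re
import xml.etree.ElementTree as ET

OOB_PATTERN = re.compile(r"<oob>.*?</oob>", re.DOTALL)


def split_reply(reply):
    """Separate the text to be spoken from out-of-band commands."""
    commands = []
    for block in OOB_PATTERN.findall(reply):
        root = ET.fromstring(block)
        for child in root:
            commands.append((child.tag, (child.text or "").strip()))
    # Remove the oob blocks and normalize whitespace in what remains.
    spoken = " ".join(OOB_PATTERN.sub(" ", reply).split())
    return spoken, commands


def lookup_weather(city):
    # Placeholder for an external web service call; returns canned text.
    return f"(weather lookup for {city})"


# Hypothetical mapping from oob tag names to handler functions.
HANDLERS = {"weather": lookup_weather}


def run_commands(commands, handlers=HANDLERS):
    """Run every command that has a registered handler; ignore the rest."""
    return [handlers[tag](arg) for tag, arg in commands if tag in handlers]


reply = "Let me check. <oob><weather>New York</weather></oob>"
spoken, commands = split_reply(reply)
results = run_commands(commands)
```

Here `spoken` would go to the text-to-speech engine, while `results` holds whatever the handlers returned.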

Ubiquitous Computing and Wearables

The panel “Voice UI Meets Ubiquitous Computing” was composed of luminaries, including Leor Grebler (@grebler), Co-founder & CEO of Unified Computer Intelligence Corp and creator of the Ubi, and Roberto Pieraccini (@RobertoPieracc), who was the CEO and Director of the International Computer Science Institute at Berkeley and is currently leading the advanced interaction team for a startup that is creating the Jibo.

Allen Firstenberg (@afirstenberg), Project Guru at Objective Consulting, gave a talk titled “Designing for Google Glass and Wearables – where voice fits in”.
In Glasshole-style self-promotion at its best (a Glasshole being a person who constantly talks to their Google Glass, ignoring the outside world), voice was not mentioned until shortly before the end of the session, when someone asked where speech recognition would actually take place: on the wearable device, on the phone, or in an external web service.

Dr. Judith Markowitz, a leading analyst in speaker biometrics, provided an insightful and entertaining overview of speech-capable, autonomous robots that are available today, as well as those currently being developed in labs around the world. She later also moderated the panel on the Evolution of Computers and Society.

William Meisel, President at TMA Associates, Michael Karasick, VP, Innovations, IBM Watson Group, and Juan E. Gilbert, Associate Chair of Research at the University of Florida, discussed how smarter, faster, and more ubiquitous computing could impact human society.

But it was again Bruce Balentine who asked the best question: “How can we use this enormous computing power to better ourselves?”

NYC – August 2014
