Emotional Prosody

Mobile Voice Conference, Mar 2014, San Francisco, CA

Voice-enabled mobile applications give users access to information instantly, naturally, and almost effortlessly. Simple voice commands, however, have failed to gain traction, probably because it is hard to remember the exact wording of a command phrase. More lenient, flexible, conversational-style software agents have been more successful.

When it comes to communicating results back to the user, a text response often seems sufficient. Still, to provide a truly hands-free, eyes-free user experience, a text response needs to be synthesized and played through the phone's speaker. The quality of the speech synthesis is determined by many factors, including sound quality (sampling rate, dynamic range), prosody (the rhythm, stress, and intonation of speech), and, perhaps less obviously, emotional prosody, conveyed through changes in pitch, loudness, timbre, speech rate, and pauses. This talk will share the ideas, concepts, and technology needed to build a prototype implementation that synthesizes text augmented with emotional values.
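To illustrate the general idea, one common way to attach emotional values to text for a synthesizer is SSML's `<prosody>` element, which exposes pitch, rate, and volume controls. The sketch below maps emotion labels to prosody attributes; the emotion names and the specific parameter values are illustrative assumptions, not the talk's actual mapping or implementation.

```python
# Sketch: annotate text with SSML prosody settings derived from an
# emotion label. The emotion-to-parameter table is a made-up example.
from xml.sax.saxutils import escape

# Illustrative (assumed) prosody deltas per emotion label.
EMOTION_PROSODY = {
    "happy":   {"pitch": "+15%", "rate": "110%", "volume": "+2dB"},
    "sad":     {"pitch": "-10%", "rate": "85%",  "volume": "-3dB"},
    "neutral": {"pitch": "+0%",  "rate": "100%", "volume": "+0dB"},
}

def to_ssml(text, emotion="neutral"):
    """Wrap text in an SSML <prosody> element tuned for the emotion."""
    params = EMOTION_PROSODY.get(emotion, EMOTION_PROSODY["neutral"])
    attrs = " ".join(f'{k}="{v}"' for k, v in params.items())
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"

print(to_ssml("Your flight has been delayed.", "sad"))
```

The resulting SSML string could then be handed to any SSML-capable text-to-speech engine; timbre and pause placement, which SSML controls less directly, would need engine-specific handling.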
