When listening to the radio or a podcast while driving to work, I don’t think I imagine what the person I’m listening to looks like. Still, if I later happen to see them for the first time in a picture or video, I often find myself surprised.
A mobile application that responds verbally has many obvious advantages. For instance, users don’t have to decipher tiny fonts on small displays; in fact, they don’t have to look at the display at all. Just as colors and typography contribute considerably to the look and feel of an application, so does voice quality for a voice-enabled mobile application.
There are at least three different approaches to synthesizing speech from text.
First, there might be a Text-To-Speech module built into the OS, or a separately installed Text-To-Speech engine can plug into the OS’s Text-To-Speech module.
Secondly, instead of requiring a separate install, a synthesizer and voices can be packaged and shipped with the application.
Lastly, a web service can be used to synthesize the text. The advantage of this would be a more predictable and consistent voice quality, comparatively independent of the hardware and operating system used on the mobile client.
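To make the three options a little more concrete, here is a minimal sketch of how an application might try them in a preferred order and use the first one that is available. Everything in it is an assumption for illustration: the `Backend` type, the `speak` function, and the three stand-in engines are hypothetical and do not correspond to any real platform or web-service API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Backend:
    """Hypothetical wrapper around one way of synthesizing speech."""
    name: str
    # Returns audio bytes, or None when this backend is unavailable.
    synthesize: Callable[[str], Optional[bytes]]


def speak(text: str, backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend name, audio) from the first that works."""
    for backend in backends:
        audio = backend.synthesize(text)
        if audio is not None:
            return backend.name, audio
    raise RuntimeError("no text-to-speech backend available")


# Stand-in engines so the sketch runs without any real synthesizer.
def os_engine(text: str) -> Optional[bytes]:
    return None  # pretend no engine is installed on this device


def bundled_engine(text: str) -> Optional[bytes]:
    return b"BUNDLED:" + text.encode("utf-8")  # fake audio payload


def web_service(text: str) -> Optional[bytes]:
    return b"WEB:" + text.encode("utf-8")  # fake audio payload


if __name__ == "__main__":
    order = [
        Backend("os", os_engine),
        Backend("bundled", bundled_engine),
        Backend("web", web_service),
    ]
    name, audio = speak("Hello", order)
    print(name, len(audio))
```

The order of the list is the design decision here. An application that values a consistent voice across devices might put the web service first and fall back to the local options only when there is no network connection.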






