Wolf Paulus

Journal


Speech Synthesis and the Quality of Voice

Posted by on Mar 1, 2013 in Android

When listening to the radio or a podcast while driving to work, I don’t think I imagine what the person I’m listening to looks like. Still, if I later happen to see them for the first time, in a picture or video, I often find myself surprised.

A verbally responding mobile application has many obvious advantages. For instance, users don’t have to decipher tiny fonts on small displays; in fact, they don’t have to look at the display at all. Just as colors and typography contribute considerably to the look and feel of an application, so does voice quality for a voice-enabled mobile application.

There are at least three different approaches to synthesizing speech from text.

First, there might be a Text-To-Speech module built into the OS, or separately installed Text-To-Speech engines can plug in to the OS’s Text-To-Speech module.
Secondly, instead of requiring a separate install, a synthesizer and voices can be packaged and shipped with the application.
Lastly, a web service can be used to synthesize text. The advantage of this would be a more predictable and consistent voice quality, comparatively independent of the hardware and operating system used on the mobile client.


Artist on Android w/ Voice Recognition

Posted by on Feb 21, 2013 in Android


E*Trade Mobile – Voice Commands

Posted by on Jan 27, 2013 in Android

E*Trade provides a great mobile app experience on iPhone, iPad, Android, Windows Phone, and BlackBerry. I think it’s almost expected that the feature sets provided by the dedicated native mobile applications are not quite the same. The Windows version and especially the one for BlackBerry fall far behind what E*Trade has to offer on Android and iOS.

For instance, in April 2012, speech recognition was first added to their iPhone (not iPad) mobile application and later to the Android app as well, allowing the user to request stock quotes, options chains, or company information, or to launch the stock order, just by using voice.

“Investors are becoming more accustomed to interacting with voice-enabled technology, and we’re proud to be one of the first to offer this innovative feature to our mobile users,” said Michael Curcio, President, E*TRADE Securities. “By integrating voice technology, E*TRADE provides a mobile experience unlike any other – creating a state-of-the-art and convenient approach to navigation.”

E*Trade uses speech recognition and speech synthesis software provided by Nuance Communications, Inc. The application is feature-packed, comes as an 11 MB download, and is not what you would call a thin client. Only very few of those features, however, are accessible through voice commands.


Pandering to the Lowest Common Denominator of public taste?

Posted by on Jun 3, 2012 in Android

I recently had the chance to attend a CommNexus event in San Diego, titled “PhoneGap vs. Titanium: What Is the Best Tool to Build an HTML5 Mobile App?”

Two panelists and a moderator battled it out, comparing the two frameworks, mentioning benefits, and demoing the mobile applications they had built.

The event title, of course, was a little misleading, since you would be using JavaScript and not HTML5 if Titanium were your framework of choice. Its heavy focus on JavaScript should make Titanium more suitable for applications that are heavy on processing logic.


Android and OCR

Posted by on Dec 18, 2011 in Android

I still remember it well: the first piece of software I wrote when I came to the US was a de-skewing algorithm. Deskewing an image helps a lot if you want to do OCR, OMR, or barcode detection, or just improve the readability of scanned images.
At the time, I was working for a small software company, developing TeleForm, an application that reads data from paper forms and stores that data in previously created databases. The Cardiff TeleForm product was later re-branded Verity-TeleForm for a brief period in 2004 and 2005 when Verity Inc. acquired Cardiff Software. In 2005, when Autonomy acquired Verity, the Cardiff brand was reintroduced as Autonomy Cardiff (http://www.cardiff.com); more recently, Autonomy was acquired by HP.

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten, or printed text into machine-encoded text.

Image deskew is the process of removing skew from images (especially bitmaps created using a scanner). Skew is an artifact that can occur in scanned images because of the camera being misaligned, imperfections in the scanning surface, or simply because the paper was not placed completely flat when scanned.
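To make the idea concrete, here is a minimal sketch of one common way to estimate skew, the projection-profile method. It is not the algorithm from TeleForm, and the class and method names (`SkewEstimator`, `score`, `estimate`) are made up for illustration. It assumes a non-empty, rectangular, already binarized image in which `true` marks an ink pixel, and it approximates small rotations with a vertical shear.

```java
/** Hypothetical sketch: projection-profile skew estimation on a binary image. */
final class SkewEstimator {

    /**
     * Shears every ink pixel by the candidate angle and returns the sum of
     * squared row counts. Flat text lines pile their pixels into few rows,
     * so the score peaks when the shear cancels the skew.
     */
    static double score(boolean[][] img, double degrees) {
        int h = img.length, w = img[0].length;
        double t = Math.tan(Math.toRadians(degrees));
        long[] rows = new long[h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!img[y][x]) continue;
                int sheared = (int) Math.round(y - x * t);
                if (sheared >= 0 && sheared < h) rows[sheared]++;
            }
        }
        double sum = 0;
        for (long r : rows) sum += (double) r * r;
        return sum;
    }

    /** Tries angles in [-maxDeg, maxDeg] and returns the best-scoring one, in degrees. */
    static double estimate(boolean[][] img, double maxDeg, double step) {
        double best = 0, bestScore = -1;
        for (double a = -maxDeg; a <= maxDeg + 1e-9; a += step) {
            double s = score(img, a);
            if (s > bestScore) { bestScore = s; best = a; }
        }
        return best;
    }
}
```

A positive result means the text lines run downward from left to right in image coordinates; rotating the image by the opposite angle would remove the skew. A production implementation would search more finely and rotate rather than shear.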

Nowadays, most data entry or origination happens on the Web, and most forms processing has moved there as well; as a result, OCR hasn’t been in vogue for quite a while. However, the popularity of smartphones, combined with built-in high-quality cameras, has created a new category of mobile applications that benefit greatly from OCR. Take Word-Lens (http://questvisual.com) as an example: an augmented reality translation application that tries to find out what the letters in an image are, looks the words up in a dictionary, and eventually draws them back on the screen in translation.

On Device or In The Cloud?

Before deciding on an OCR library, one needs to decide where the OCR process should take place: on the smartphone or in the cloud. Each approach has its advantages.
On-device OCR can be performed without an Internet connection, and instead of sending a photo, which can potentially be huge (many phones have 8- or 12-megapixel cameras now), the text is recognized by an on-board OCR engine.


Android ICS Source Code for your IDE

Posted by on Nov 14, 2011 in Android

“Your father’s light saber. This is the weapon of a Jedi Knight. Not as clumsy or random as a blaster; an elegant weapon for a more civilized age. For over a thousand generations, the Jedi Knights were the guardians of peace and justice in the Old Republic. Before the dark times… before the Empire.”

While the Android 4.0 SDK comes with a complete set of javadocs, the source code of the SDK is missing in the SDK distribution. This is very unfortunate, since you cannot easily debug into SDK methods (at least not without running into de-compiled code) nor can you see how things actually work.

Eclipse - Source Not Found

However, there is a quick fix to that problem. I downloaded the complete Android source, including Linux, drivers, libs, etc., as explained here: http://source.android.com/source/download.html and ran a small Java program on the source tree. I used to do this with a simple bash script, but over the last couple of Android releases, the Java source locations got a little more diverse and I started missing a couple of files. So instead, this Java program walks the source tree and looks for Java source files. All of those are then copied into a new location, arranged by their package name. Finally, the jar tool gets called to put all the source into a single bundle for easier handling.
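The walk-and-copy step could look roughly like the sketch below. This is not the original program, just an illustration of the idea; the class name `SourceCollector` and its methods are hypothetical, and the package name is found with a simple regular expression, which is a heuristic rather than a real parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: collects .java files into a package-shaped directory tree. */
final class SourceCollector {

    // Heuristic: matches a declaration such as "package android.app;" at the start of a line.
    private static final Pattern PACKAGE_DECL =
            Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    /** Returns the declared package of a source text, or "" if none is found. */
    static String packageOf(String source) {
        Matcher m = PACKAGE_DECL.matcher(source);
        return m.find() ? m.group(1) : "";
    }

    /** Copies every .java file under root to target/<package path>/<file name>; returns the count. */
    static int collect(Path root, Path target) throws IOException {
        List<Path> sources;
        // Build the full list first, so files copied later are never picked up by the walk.
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.getFileName().toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }
        int copied = 0;
        for (Path src : sources) {
            // ISO-8859-1 decodes any byte sequence, which is enough to find the package line.
            String text = new String(Files.readAllBytes(src), StandardCharsets.ISO_8859_1);
            Path dir = target.resolve(packageOf(text).replace('.', '/'));
            Files.createDirectories(dir);
            // A later file with the same package and name overwrites an earlier one.
            Files.copy(src, dir.resolve(src.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SourceCollector <source-root> <target-dir>");
            return;
        }
        System.out.println(collect(Paths.get(args[0]), Paths.get(args[1])) + " files copied");
    }
}
```

Once the files are collected, something like `jar cf android-sources.jar -C <target-dir> .` would bundle them into a single archive (the archive name here is only an example).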
