I believe that some concepts are communicated best through video clips or short films. Enjoy some of the short HD films I have created over the last few months and years.
Alexa, Take Me for a Ride
This video is about Voice User Interfaces
Raspberry Pi 2 – Translator
This short demo video shows the Raspberry Pi running a translator app that uses web services from Google and Microsoft for speech recognition, speech synthesis, and translation.
More details can be found here: wolfpaulus.com/journal/embedded/raspberrypi2-translator/
My Interview with an Avatar
This video is about something new and cool. On which side of the uncanny valley it resides is for you to decide.
Speech Recognition – Coming of Age
Most people read text faster than they can listen to it being read aloud, and many type text faster than they can dictate it.
But even in traditional computing, voice has been used successfully to shortcut complex navigation trees or to formulate unstructured queries.
An increasingly diversified mobile-device landscape requires us to rethink established UX patterns. For this next generation of mobile devices, the use of voice for input and output will be significant: UI widgets, tables, and forms simply don't work on wearables and in-car systems.
Let's have a look at what a "hands-free, eyes-free" user experience could look like, even for a task that is not necessarily a favorable showcase: "Capturing Receipt Information."
Listening for voice commands stays dormant until the user says "OK Intuit." This hot word activates the app's on-device speech recognizer, which is optimized for recognizing about a dozen menu items and commands. However, if the application detects that it was opened through the Google Voice launcher, it automatically activates the on-device recognizer.
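To make the idea concrete, here is a minimal sketch of such a hot-word gate. The class and method names are illustrative assumptions, not the actual app's code: transcripts are ignored until one contains the key phrase, and an app launched through a voice launcher can open the gate directly.

```java
// Hypothetical sketch of a hot-word gate: the command recognizer stays
// dormant until the key phrase is heard. Names are illustrative only.
public class HotwordGate {
    private static final String HOTWORD = "ok intuit";
    private boolean active = false;

    /** Feed each transcribed utterance; returns true once the gate is open. */
    public boolean onTranscript(String utterance) {
        if (!active && utterance.toLowerCase().contains(HOTWORD)) {
            active = true;   // wake up: route audio to the command recognizer
        }
        return active;
    }

    /** Apps opened via a voice launcher can skip the hot word entirely. */
    public void activateDirectly() {
        active = true;
    }

    public static void main(String[] args) {
        HotwordGate gate = new HotwordGate();
        System.out.println(gate.onTranscript("what's the weather")); // false
        System.out.println(gate.onTranscript("OK Intuit, add expense")); // true
    }
}
```

A real implementation would of course match the hot word acoustically on a continuous audio stream rather than on finished transcripts, but the gating logic is the same.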
On-device recognition is performed by an open-source toolkit developed by Carnegie Mellon University. It is highly accurate only for a few dozen words, which makes it well suited for in-application navigation. At application launch, complex dynamic grammars that are unique to each customer are uploaded to a LumenVox recognition server in our private cloud.
Capturing the four required data fields for an expense happens in a two-step process: step one asks for Amount and Payee; step two asks for Payment Method and Category.
While speech recognition has come a long way, it is far from perfect. If the recognition confidence is too low, we intelligently ask again, including the information we already got right. At the end, we present a summary and ask for confirmation.
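The re-prompting behavior can be sketched as simple slot filling with a confidence threshold. The slot names match the four fields above, but the threshold value and all identifiers here are assumptions for illustration, not the shipped logic:

```java
import java.util.*;

// Hypothetical sketch of confidence-driven slot filling: values recognized
// with low confidence are dropped, and only the missing slots are re-asked.
public class ExpenseCapture {
    static final double THRESHOLD = 0.6;   // assumed cutoff, not the real one
    final Map<String, String> slots = new LinkedHashMap<>();

    /** Record a recognized value only if its confidence is high enough. */
    public void accept(String slot, String value, double confidence) {
        if (confidence >= THRESHOLD) {
            slots.put(slot, value);
        }
    }

    /** Slots still empty after a pass are the ones to ask for again. */
    public List<String> missing() {
        List<String> ask = new ArrayList<>();
        for (String s : List.of("Amount", "Payee", "Payment Method", "Category")) {
            if (!slots.containsKey(s)) ask.add(s);
        }
        return ask;
    }
}
```

This is what "including the information we already got right" means in practice: the follow-up prompt only mentions the slots returned by `missing()`.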
Cora, Taught to Amaze
Hi, my name is Cora. I hope you still remember me. Last time, I helped you find out what the balance of your checking account was and helped you answer whether you could afford eating out that night. I also provided you with up-to-the-minute stock quotes, all using natural language.
So here we are, a couple of weeks later, and I am back. You can now tell me to "Open Preferences" and I will show you this application's preferences screen, or tell me to "Shut Down" and I will quit the Cora application for you. And as for stock quotes, you no longer have to know the ticker symbols; I have that covered now. And while we are talking about stock quotes, I would love to show you something really cool.
Please allow me to show you how a dual-screen experience could help you do your taxes next time.
Cora, your imaginary friend
Cora is a mobile natural-language user interface, a.k.a. NUI, accessing vast AIML knowledge bases in the cloud, as well as real-time information web services like Yahoo Finance and Mint.
To the user, however, Cora is a 21-year-old, sometimes highly unorthodox, snarky young lady who grew up in Birmingham, United Kingdom. If you put her on your phone, she will be your imaginary friend, chatting with you whenever you feel like it.
Voice User Interfaces – It all starts with Recognition
The adoption of voice may become the biggest advancement in user-interface design since the transition from text-based to graphical user interfaces.
It all starts with speech recognition: the correct and instantaneous recognition of spoken input. Users' tolerance for an incorrect transcription, or their patience when an almost immediate response fails to arrive, seems to be rather low.
To demonstrate what can reasonably be expected from presently available speech-recognition engines when called from a mobile device, I have developed an Android application that probes the four most prominent speech-to-text services, from Google, AT&T, iSpeech, and Nuance.
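One plausible way to structure such a probe app, sketched below under the assumption that each vendor SDK is wrapped behind one common interface so the same audio clip can be timed against every backend. All names here are illustrative; none of the actual vendor SDK calls is shown:

```java
// Hypothetical sketch: each speech-to-text provider is hidden behind a
// common interface, so one harness can time a recognition round-trip
// against any backend. Real SDKs are asynchronous; this is simplified.
public class SttProbe {
    /** A provider maps raw audio to a transcript. */
    interface SttService {
        String name();
        String transcribe(byte[] audio);
    }

    /** Time one recognition round-trip in milliseconds. */
    static long probe(SttService svc, byte[] audio) {
        long start = System.nanoTime();
        svc.transcribe(audio);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With four such wrappers in place, comparing accuracy and latency becomes a matter of looping over the same recordings and tabulating the results.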
Live Coding, Speech Synthesis on Android
If you put your ear to the ground, you can almost hear it coming. The broad introduction of Voice User Interfaces, allowing interaction with mobile devices through voice and speech, may be the next revolution in user-interface engineering. Google's latest high-profile hirings and acquisitions; Nuance's acquisitions of MacSpeech, iTa, PerSay, SVOX (provider of Android's built-in TTS), Loquendo, Vlingo, and VirtuOz; and Amazon's acquisition of iSpeech… coincidence or land grab? You decide!
The Voice User Interface (a.k.a. VUI) related short films I have created so far are meant to promote and inspire the creation of awesome product experiences. Today, I would like to shift gears just a little and show you a developer-focused deep dive into speech synthesis: computer-generated spoken language based on written input.
Just as colors and typography contribute to the look and feel of an application, so does the voice of a voice-enabled mobile application. Just consider how much Dennis Haysbert's voice (the Allstate guy) contributes to the Allstate brand: "You're in good hands with Allstate."
I have developed a thin API layer that allows exchanging the underlying text-to-speech provider at run time, providing an easy-to-use framework that invites mobile developers to experiment with speech synthesis.
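The general shape of such a layer is a classic Strategy pattern: callers talk to one interface, and the concrete engine behind it can be swapped while the app is running. The sketch below is an assumption about how this could look, not the actual API; `speak` returns a marker string here only so the example is testable, where a real engine would stream audio:

```java
// Hypothetical sketch of a thin TTS abstraction (Strategy pattern):
// the concrete engine can be exchanged at run time. Names are illustrative.
public class TtsDemo {
    interface TtsProvider {
        String name();
        String speak(String text);  // real engines play audio; stubbed here
    }

    static class Speaker {
        private TtsProvider provider;

        Speaker(TtsProvider initial) { provider = initial; }

        /** Swap the underlying engine without touching any call sites. */
        void setProvider(TtsProvider next) { provider = next; }

        String say(String text) { return provider.speak(text); }

        String current() { return provider.name(); }
    }
}
```

The payoff is that experimenting with different voices, or different vendors entirely, never requires changes at the call sites.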
At over 10 minutes, this film is a little longer; I was trying to explain a relatively complex topic in easy-to-digest segments, using input from a video camera, a mobile phone, and a computer. If you manage to watch the whole piece, let me know what you think and whether this worked for you. Sorry, but to make the code readable, this was produced in 720p, so you may have to let it sit and buffer for a little while before you start playing it.
Mobile NFC Card Readers and Terminals
We present two ideas that could eventually allow small and micro merchants to use mobile phones to accept payments from NFC credit cards or stickers. Both approaches hide most of the involved complexity behind an elegant solution, so small merchants never miss a single sale.
Tap & Go, a.k.a. PayPass, is a simple new way of paying. PayPass is a payment method that lets you make purchases without having to swipe your card or provide your signature. A simple tap with a card, key fob, or mobile phone is all it takes to pay at checkout.
So this Saturday morning, I put paying with a mobile phone to the test; the only method of payment available to me was the Google Wallet application on a Samsung Nexus S Android phone running on Sprint's 4G network.
Google Wallet can be linked to a Citi MasterCard or, as I did, used as a prepaid card funded with any of my existing credit cards.
Printing by Proximity
Printing directly from your mobile phone to a Wi-Fi printer, all triggered by NFC. Pretty cool, don't you think so too?