If you’re interested in having me present these talks to your team or user group, please get in touch
Onward to Conversational Applications
Conversational Interaction Conference 2022, April 12-13, San Jose, CA
When companies started publishing their websites in the nineties, they needed people with a new skill set: Webmasters. Today, this dated job title has morphed into a broad field of tech employment, including graphic design, search engine optimization, and content strategy. The separation of concerns has led to a better and faster process of building modern web experiences.
This talk explores how much of this can already be observed in the development process of Conversational and Voice User Interfaces and how a new approach to API design may make the “Webmasters for VUI” obsolete.
Shallow CUIs and VUIs are today’s goofy static web pages that need to be replaced by full-blown conversational applications.
Striving for likability – (always empathic but never biased)
Voice 2020, Online Event, Nov. 2020
Over the next decade, the ambient computing era will eclipse the PC era. Information will be available everywhere, accessible without friction or frustration through voice user interfaces. However, many of the old recipes for creating delight don’t apply to Voice-first or Voice-Only experiences.
Recent research shows that when communicating emotions, your voice matters more than your words. That is, not WHAT you say but HOW you say it, through linguistic and paralinguistic cues, most influences the emotion that is communicated when you talk. Interestingly, those emotions are perceived more accurately in a voice-only interaction than in a multi-modal one.
This talk explores and demonstrates possibilities of a likable and unbiased engagement, by using affective computing technologies and emotions analytics (e.g., expressive speech synthesis, sentiment analysis, or readability statistics).
After briefly rehashing that a young, feminine, overly upbeat voice might not always be the most appropriate choice when synthesizing messages, we will turn to a more controversial, but interesting topic: genderless bots and an approach that is using machine learning models, including “Neural Word Embeddings”, to detect bias, before a bot relays it to its users.
One AWS Lambda to Rule Them All
AWS Community Day 2020, Nov 13, Online Event
Whenever I got a new laptop, or was just (re-)installing macOS from scratch, a Java JDK, IntelliJ IDEA, and Tomcat, the “pure Java” HTTP web server environment, were always among the first things I installed. How times have changed. Now it’s Docker, Python3, PyCharm, and the AWS and SAM CLIs that go on first. I still do Java, quite a bit actually, but Python and AWS Lambda are on the rise. An AWS Lambda function can be simple but still quite powerful, doing many things I used to do with Tomcat.
This talk shows an AWS Lambda function, implemented in Python, performing things like:
- Serving an HTML page
- Consuming HTTP POST requests sent from that HTML page
- Securely storing received information in a DynamoDB table
- Synthesizing text into speech, i.e. returning MP3 (digital audio)
- Calling other AWS Lambda functions
- Calling native libraries or executables that were deployed with the Lambda function .. and more.
We will use the AWS Web UI only very sparingly and rely on a YAML file instead wherever it makes sense, for example to declare the DynamoDB table.
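The first two items in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from the talk: it assumes the function sits behind an API Gateway REST proxy integration (which supplies `httpMethod`, `body`, and `isBase64Encoded` in the event), and the page content and the `message` form field are made up for the example.

```python
import base64
import json
from urllib.parse import parse_qs

# Hypothetical page; the form posts back to the same function URL.
HTML_PAGE = """<!DOCTYPE html>
<html>
  <body>
    <form method="post">
      <input name="message" placeholder="Say something">
      <button type="submit">Send</button>
    </form>
  </body>
</html>"""


def lambda_handler(event, context):
    """Serve the form on GET and echo the posted field back as JSON on POST."""
    method = event.get("httpMethod", "GET")

    if method == "GET":
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "text/html"},
            "body": HTML_PAGE,
        }

    if method == "POST":
        body = event.get("body") or ""
        # API Gateway may hand over the body base64-encoded.
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        fields = parse_qs(body)
        message = fields.get("message", [""])[0]
        # A real function would persist `message` at this point (e.g. in DynamoDB).
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"received": message}),
        }

    return {"statusCode": 405, "body": "Method Not Allowed"}
```

Because the handler is a plain function, it can be called locally with a hand-built event dictionary before anything is deployed.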
Landbot Scoop – Season 1: The Future of Chatbots
Chatbots are a transition to a more open interface .. hosted by Jiaqi Pan, Sept 19, 2020
How to Set Chatbot User Expectations .. hosted by Billy Bateman, July 13, 2020
Always empathic, but never biased
Conversational Interaction Conference 2020, February 10-11, San Jose, CA
This talk explores and demonstrates possibilities of a likable and unbiased engagement, by using affective computing technologies and emotions analytics. It also turns to a more controversial, but interesting topic: genderless bots and an approach that detects bias, before a bot relays it to its users.
Exploring Neural Word Embeddings with Python
Desert Code Camp 2019, Oct 10, Chandler-Gilbert, AZ Pecos Campus
Standing on the shoulders of giants, we don’t have to design, create, and train a neural network, but can instead use one that already exists. We download a public domain, fully trained neural network and modify it slightly (to make it load faster during the session).
I’m teaching a junior college Python course and usually show this at the end of a 16-week intro class. I.e., you don’t need to be a Python expert to get something out of this session. Knowing a little Python will be helpful, but all demos easily translate into other languages, like Java for instance.
After the session you will have a general understanding of “Neural Word Embeddings” and understand what “cosine similarity” means and how to calculate it. I know, that may not sound all that exciting. But imagine you type in the question, “Men is to boys what women is to” and your Python program answers with the word “girls”. Or you type “Man is to king what woman is to” and your Python program answers with the word “queen”. But that is just the beginning. I will also show you an example that applies the same technique to a much more relevant topic: detecting bias in a text.
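To give a feel for the idea, here is a small sketch of cosine similarity and the analogy trick. The three-dimensional vectors are invented for illustration (a real pre-trained model has hundreds of dimensions and a far larger vocabulary), and the function names are my own choice, not the code used in the session.

```python
import math

# Tiny hand-made vectors, purely illustrative.
# Dimensions roughly stand for: royalty, maleness, femaleness.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Answer 'a is to b what c is to ?' by finding the word closest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


print(analogy("man", "king", "woman"))  # queen
```

With these toy vectors, `king - man + woman` lands almost exactly on the vector for "queen", which is why it wins over the only other candidate, "apple".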
Running Docker images in AWS Fargate
Desert Code Camp 2019, Oct 10, Chandler-Gilbert, AZ Pecos Campus
Allow me to tell you a story, a story about a simple web-service that answers only one question: if a given number is prime.
The core problem is first solved with a Java class, which is then wrapped into a WebServlet and tested within a web server environment. The web server, however, is not installed directly; instead, a Docker image is created that contains all the mentioned components.
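The talk solves the core problem in Java. For readers who just want to see the logic, the following is a Python sketch of one straightforward way to answer the question, using trial division. It is an assumption about the approach, not the class shown in the session.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division by odd numbers up to the integer square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True
```

Checking divisors only up to the square root is enough, because any factor larger than that would have to pair with a smaller one that was already tested.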
Eventually, the Docker image is pushed to AWS ECR, a container registry, from which it is deployed and run. Using AWS ECS and Fargate, the simple service is finally made public and available to the world, scalable, all without having to manage servers or clusters.
Well, this is not about storytelling of course, nor will I focus too much on web services or Docker. Still, as a starting point, we want to create a simple web service, implemented in Java and made available via Tomcat. This web server will then be put into a Docker image and stored in the Amazon Elastic Container Registry (ECR), a private but fully managed container registry that makes it easy for developers to store, manage, and deploy Docker container images… But that’s just the beginning: the focus will then be on AWS ECS and Fargate, a compute engine for Amazon ECS that allows you to run containers without having to manage servers or clusters.
All of this is done in code, i.e. not using the AWS Web UI. Of course all the code and shell scripts are demoed in this hands-on session .. and shared on GitHub.
I expect that you’ll leave the session with a good understanding of what AWS ECR, AWS ECS, and Fargate are all about and with the motivation to try it out and run your own Docker container in Fargate, making it available to your users, customers, or the world.
Chatbots Voice Multimodal .. hosted by Chad Oda
Affect in Bot Conversations
Business of Bots, Feb 6 2019, San Francisco, CA
We pride ourselves on creating delightful user experiences, where designers have obsessed over shades of colors, font types, or padding around text. But change is upon us: Voice-first or Voice-Only experiences don’t have a traditional UI. In this new environment of Ambient Computing, where neither form factor nor looks matter, likability may become the ultimate differentiator. Not only what a virtual assistant says, but how it says it, will determine success. This talk explores the creation of a likable response, using affective computing and emotion analytics.
Impl. an AWS Lambda function in Java, building and deploying w/ AWS CodeBuild
Desert Code Camp 2018, Oct 6, Chandler-Gilbert, AZ Pecos Campus
Java sometimes doesn’t seem to be a first-class citizen when it comes to AWS Lambda, but with just a few considerations, it’s easy to implement an efficient AWS Lambda function in Java. To make it useful, we’ll also put the Lambda function behind the AWS API Gateway, so that it can be called from the web (i.e. with an HTTP GET or POST request).
Once we have done that, we’ll push the code into a GitHub repository and move the build and deployment process into AWS CodeBuild. This (still new) AWS service for Continuous Integration and Deployment (aka CI/CD) will pull the code from the Git repo, build the artifact, and put it into an S3 bucket, from which it is deployed into AWS Lambda …
Come on .. that’s cool stuff: Software Development meets Dev Ops.
Creating an Alexa Skill w/ the newest Alexa Skill Kit for Java
Desert Code Camp 2018, Oct 6, Chandler-Gilbert, AZ Pecos Campus
A short two years ago, I talked about “Bots, Amazon Echo, language user interfaces in general”. The slides are still available and the code is of course still on GitHub. However, much has changed, and fortunately for the better!
I still believe that Java is a very suitable language for building Alexa Skills, and even if you are eventually going to host your skill in EC2 or as an AWS Lambda function, during development, Tomcat is your friend. Running and debugging your skill right on your laptop offers tremendous advantages during the development process. So let’s talk about how to develop an Alexa Skill using Java and the latest Alexa Skill Kit for Java. We will be using Java/Tomcat installed on a laptop as our hosting platform, quickly put an SSL cert in place, and develop an Alexa Skill.
Affect in Bot Conversations
Conversational Interaction Conference 2018, February 5-6, San Jose, CA
In the near term, we can assume conversations with bots (voice and/or text) to remain short. With limited or no GUI necessary, the important differentiator of a delightful GUI now plays little to no role. With only limited information being exchanged in short conversations, opportunities for data-driven differentiation may also remain limited (e.g. Google Home and Amazon Echo perform about equally when asked for the weather). Deciding which bot, virtual assistant, or device to use may come down to which is more likable (pleasant, friendly, kind, easy to like).
This talk explores possibilities of a more personalized, contextual and likeable customer engagement, by using affective computing technologies and emotions analytics.
Java for Serverless Compute with AWS Lambda
Desert Code Camp 2017, Oct 14, Phoenix, AZ
Serverless computing is a cloud computing execution model in which the cloud provider dynamically manages the allocation of machine resources. Serverless computing allows running applications and services without thinking much about servers, runtime resources, or scaling issues.
This talk presents a simple serverless computing application, intended to be used as a template project or model that should help you get started more easily.
Here are the cornerstones:
- Java 8 is used as the implementation language of the serverless function(s)
- AWS Lambda is used as the serverless runtime
- Gradle is used as the build automation system to compile, build, and deploy the serverless function(s).
- JUnit 4 is used for unit-testing
- Jackson is used as the JSON processor to serialize and deserialize objects
- Apache Log4J 1.2 is used as the remote logger on the serverless runtime system
You will walk away with solid knowledge about how to write, test, and deploy Java code on Amazon’s AWS serverless runtime platform.
You will find out how to easily expose your function as a micro web service through the API-Gateway – and leave with a template project in hand, a blueprint or starting point, for your very own Java-based serverless cloud project.
The Conversational User Interface Is a Minefield
SpeechTEK 2017, April 24-26, Washington, D.C.
With VR/AR user interfaces, we leave behind the 2D communication of the mouse pad as we move toward saying exactly what we mean or want. Once used to “talking” with Siri or Alexa, users tend to use voice elsewhere. Chatbots seem to be a transitional step to a future where we “talk” to services. This presentation focuses on voice vs. chat user interfaces and introduces attributes of good chatbot user interfaces.
The path to the CUI is heavily mined and booby-trapped
Conversational Interaction Conference 2017, January 30-31, San Jose, CA
Don’t misinterpret the popularity of messaging apps as a glowing endorsement of chatbots. No one ever claimed that IVRs were popular just because people ordered landline phones.
While they can benefit greatly from each other, there is no need to create a dependency between the Conversational User Interface and Machine Learning. I.e., it is not hard to imagine how a Conversational User Interface can be put to good use with a currently existing service infrastructure. However, a bunch of cruddy IVR-style bots is the shortest path to nuking this nascent opportunity. This talk tries to identify those use cases that truly work in a conversational UI by providing customer benefit and delight. It is an all-out effort to not fall into the trap of re-creating the much-hated IVR experience.
Patterns for Natural Language Applications – beyond declaring User Intents
Mobile Voice Conference 2016, April 11-12, San Jose, CA
Simple patterns, like adaptive greeting, randomness, maintaining context, or predictive follow-up, can make an already good Voice User Interface spectacular.
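As an illustration of three of these patterns (adaptive greeting, randomness, and maintaining context), here is a small Python sketch. The phrase table, the `context` dictionary layout, and the function names are hypothetical and only meant to show the shape of the idea.

```python
import random

# Hypothetical phrase table; several variants per slot avoid robotic repetition.
GREETINGS = {
    "morning": ["Good morning", "Morning"],
    "afternoon": ["Good afternoon", "Hello"],
    "evening": ["Good evening", "Hi there"],
}


def part_of_day(hour):
    """Map an hour (0-23) to a coarse part of the day."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"


def greet(hour, context, rng=random):
    """Adaptive greeting based on time of day and on what the session already knows."""
    phrase = rng.choice(GREETINGS[part_of_day(hour)])  # randomness
    name = context.get("name")
    visits = context.get("visits", 0)
    context["visits"] = visits + 1  # maintaining context across turns
    if name and visits > 0:
        return f"{phrase}, {name}. Welcome back."
    if name:
        return f"{phrase}, {name}."
    return f"{phrase}."


session = {"name": "Ada"}
print(greet(9, session))
print(greet(9, session))  # the second call adds "Welcome back."
```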
Bridging the gap between Speech Recognition and Business Logic
SpeechTEK Conference 2015, August 17-19, New York City, NY
Mobile Voice Conference 2015, April 20, 2015, San Jose, CA
Speech Recognition is readily available for integration into mobile and IoT projects and products. It’s affordable, mostly accurate, and incredibly fast when streamed. Context-free recognizers require post-recognition processing, i.e. dictionaries, string-to-sound language encoders, etc. Web services like AIML/Pandorabots, Wit.ai, Api.ai, and Amazon’s Alexa Skill Kit support a declarative approach to defining rules that identify user intent and entities. Let’s try to make sense of all of this.
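To make the declarative idea concrete, here is a minimal Python sketch in which intents and entities are declared as data and a small generic function does the matching. It does not reproduce the format of any of the services named above; the intent names and patterns are invented, and real services use far richer models than regular expressions.

```python
import re

# Hypothetical rule table: each intent is declared as data, not as code.
# Named groups in the pattern become the extracted entities.
INTENT_RULES = [
    ("GetWeather", re.compile(r"\bweather\b(?: in (?P<city>[a-z ]+))?", re.IGNORECASE)),
    ("SetTimer", re.compile(r"\btimer for (?P<minutes>\d+) minutes?\b", re.IGNORECASE)),
]


def resolve(utterance):
    """Return (intent, entities) for the first rule that matches the recognized text."""
    for intent, pattern in INTENT_RULES:
        match = pattern.search(utterance)
        if match:
            entities = {k: v.strip() for k, v in match.groupdict().items() if v}
            return intent, entities
    return None, {}


print(resolve("what is the weather in San Jose"))
print(resolve("set a timer for 5 minutes"))
```

Adding a new intent then means adding one line to the table, which is the main appeal of the declarative approach.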
Mobile Voice Conference, March 3rd-5th, 2014, San Francisco, CA
Voice-enabled mobile applications provide users access to information instantly, naturally, and almost effortlessly. Simple voice commands, however, have failed to gain traction, probably because it’s hard to remember the exact utterance of a command phrase. Instead, more lenient and flexible conversational-style software agents have been more successful.
When it comes to communicating results back to the user, a text response often seems enough. Still, to provide a truly hands-free, eyes-free user experience, a text response needs to be synthesized and played through the phone’s speaker. The quality of the speech synthesis is determined by many factors, including sound quality (sampling rate, dynamic range), prosody (rhythm, stress, and intonation of speech) and maybe less obviously, by Emotional Prosody, conveyed through changes in pitch, loudness, timbre, speech rate, and pauses. This talk will share some ideas, concepts, and the technology needed, to build a prototype implementation that synthesizes text that was augmented with emotional values.
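One common way to hand such prosody hints to a speech synthesizer is SSML, whose `prosody` element has `pitch`, `rate`, and `volume` attributes and whose `break` element inserts pauses. The sketch below is an assumption about how text "augmented with emotional values" could be rendered, not the prototype from the talk. The emotion-to-prosody table is invented, and engines differ in which values they honor.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from an emotion label to SSML prosody settings.
PROSODY = {
    "happy":   {"pitch": "+10%", "rate": "110%", "volume": "loud"},
    "sad":     {"pitch": "-10%", "rate": "85%",  "volume": "soft"},
    "neutral": {"pitch": "+0%",  "rate": "100%", "volume": "medium"},
}


def to_ssml(text, emotion="neutral", pause_ms=0):
    """Wrap text in SSML whose prosody reflects the given emotion label."""
    settings = PROSODY.get(emotion, PROSODY["neutral"])
    attrs = " ".join(f'{name}="{value}"' for name, value in settings.items())
    pause = f'<break time="{pause_ms}ms"/>' if pause_ms > 0 else ""
    # escape() protects &, <, and > so the text cannot break the markup.
    return f"<speak>{pause}<prosody {attrs}>{escape(text)}</prosody></speak>"


print(to_ssml("Your order has shipped!", "happy"))
```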
Chatbots 3.3 Conference, March 23, 2013, Philadelphia, PA
You don’t have to put your ear to the ground to literally hear it coming. The broad introduction of Voice User Interfaces, allowing interaction with mobile devices through voice, may become the biggest advancement in user interface design since the transition from text-based to graphical user interfaces.
Voice User Interface
How to voice-enable your mobile application
Android Speech Recognition and Text-To-Speech – How to voice-enable your mobile application “What does a weasel look like?” – We are taking a closer look at Android’s Speech-To-Text (STT) and Text-To-Speech (TTS) capabilities – and will develop and deploy three small apps, each a little more capable, and finally walk through the steps of building a voice controlled assistant.
Android uses Google’s Speech-To-Text engine in the cloud but has had Text-To-Speech capabilities baked right in since Android 1.6 (Donut), using SVOX Pico with six language packages (US and UK English, German, French, Italian, and Spanish).
While Speech Recognition, Interpretation, and Text-To-Speech Synthesis are addressed by phone equipment and OS makers, the core problem of how to capture knowledge and make it accessible to smart software agents is ignored, and all services like Siri or Google Voice Actions remain closed, i.e. not easily extendable with 3rd party information/knowledge.