Custom wake-up words for an Android app

With most modern Android phones, just saying the phrase “OK Google” will launch the Google Assistant app, which is capable of answering simple questions or functioning as an app launcher. Following up with “open g mail” will launch the Gmail app on your phone, and saying “navigate home” will open the Google Maps app with the destination already set to your home address.

In the following few paragraphs, I demonstrate how a wake-up word (a.k.a. hot word) can be used inside an Android app to wake it up. Imagine a scenario where the Android app is already launched and running in the foreground, just waiting for a user to say the wake-up word or phrase to start the full experience, i.e., start the next activity.

Waiting is open-ended: we don’t know how long we will have to wait before the wake-up word is spoken, so streaming audio to an online speech recognition service the whole time doesn’t sound like a good idea. Fortunately, there is PocketSphinx, a lightweight speech recognition engine specifically tuned for handheld and mobile devices, which works locally on the phone. Let’s get started by creating a simple project with Android Studio.

1. AndroidManifest.xml

Since the app needs to be able to use the microphone (RECORD_AUDIO) and to read and write to storage (READ_EXTERNAL_STORAGE, WRITE_EXTERNAL_STORAGE), the corresponding uses-permission tags have to be inserted into the AndroidManifest.xml file, just before the application tag.

The VIBRATE permission is requested as well, so that we can briefly vibrate the phone to indicate that the wake-up word was successfully recognized.

2. Two Activities

The simple demo app has just two activities:

  1. The ListeningActivity that patiently waits for the wake-up word to be uttered.
  2. The MainActivity, which is started from the ListeningActivity once the wake-up word has been recognized.

The ListeningActivity will clean up after itself, so that the microphone, for instance, can be used by any activity that follows. Should the ListeningActivity be called again, it will once more wait patiently for the wake-up word to be uttered.

3. PocketSphinx Android Library

Instead of compiling and packaging the PocketSphinx code into an Android Archive (AAR) locally on your development machine, simply download the Android library and put it in the application’s libs folder.

The top-level build.gradle file needs to be modified so that the newly added library can be found and should now look like this:

The app-level build.gradle file needs to have the library added as a compile dependency:

4. Language Model

Inside your project’s src/main folder, create a directory path like this assets/sync/models/en-us-ptm and copy the following files from here and here

  • mdef
  • means
  • noisedict
  • sendump
  • transition_matrices
  • variances
  • en-phone.dmp
  • feat.params

For each file, an MD5 hash has to be created and stored in a file with the same name plus an “.md5” extension (or look for the files and hashes in the git repo mentioned at the very end of this post).
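The hash files can also be generated with a few lines of plain Java instead of by hand. The following is only a sketch: the class name Md5Sidecar and its methods are made up for this illustration, and it assumes that an “.md5” file holds nothing but the lowercase hex digest of its companion file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PocketSphinx or the demo app.
final class Md5Sidecar {

    /** Returns the MD5 digest of the given bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Writes "<file>.md5" next to the given file and returns the new file's path. */
    static Path writeSidecar(Path file) throws IOException {
        String hash = md5Hex(Files.readAllBytes(file));
        // Assumption: the hash file contains only the hex digest.
        Path sidecar = file.resolveSibling(file.getFileName() + ".md5");
        Files.write(sidecar, hash.getBytes(StandardCharsets.US_ASCII));
        return sidecar;
    }

    /** Usage: java Md5Sidecar.java path/to/mdef path/to/means ... */
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(writeSidecar(Paths.get(arg)));
        }
    }
}
```

Run it on your development machine, once for every model file, and again whenever one of the files changes.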

5. Wake-Word Phonetic Dictionary

Inside your project’s src/main folder, create a directory path like this assets/sync/models/lm and store a dictionary containing all the words you want to recognize. Again, an MD5 hash has to be created and stored. Remember that the MD5 hash needs to be updated each time you make a change to the dictionary. (E.g. use http://passwordsgenerator.net/md5-hash-generator/)

Here, for instance, is a dictionary for recognizing the words {hey, okay, john, george, paul, ringo, stop}

 

Here is the tutorial for how to create the pronunciations for each word in the dictionary.
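Before shipping a dictionary, it can help to check that every word of the intended wake-up phrase actually has an entry in it. The sketch below assumes the usual layout of one entry per line, the word followed by its phonemes separated by whitespace. The class DictionaryCheck is hypothetical, and the three sample entries are for illustration only, not the full dictionary of the demo app.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of PocketSphinx or the demo app.
final class DictionaryCheck {

    /** Parses dictionary text into a map from lowercase word to phoneme string. */
    static Map<String, String> parse(String dictText) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : dictText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length == 2) {
                entries.put(parts[0].toLowerCase(Locale.ROOT), parts[1]);
            }
        }
        return entries;
    }

    /** Returns the words of the phrase that have no dictionary entry. */
    static List<String> missingWords(String keyphrase, Map<String, String> dict) {
        List<String> missing = new ArrayList<>();
        for (String word : keyphrase.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !dict.containsKey(word)) {
                missing.add(word);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative sample entries only.
        String sample = "hey HH EY\nokay OW K EY\nstop S T AA P\n";
        System.out.println(missingWords("okay ringo", parse(sample)));
    }
}
```

An empty result means the phrase is fully covered; any word that is listed still needs a pronunciation entry.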

Finally, all assets need to be referenced in the assets/sync/assets.lst file, like so:
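Keeping assets.lst in sync by hand gets tedious once files are added or renamed, so the list can be generated instead. This sketch assumes that assets.lst contains one path per line, relative to the assets/sync folder, and that neither the list itself nor the “.md5” files appear in it; check this against the file in the demo repository. The class AssetsList and the file name words.dic used in the comment are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of PocketSphinx or the demo app.
final class AssetsList {

    /** Keeps data files only: drops assets.lst and every .md5 file, then sorts. */
    static List<String> entries(Collection<String> relativePaths) {
        return relativePaths.stream()
                .map(p -> p.replace('\\', '/')) // normalize Windows separators
                .filter(p -> !p.equals("assets.lst") && !p.endsWith(".md5"))
                .sorted()
                .collect(Collectors.toList());
    }

    /** Walks the sync folder and (re)writes its assets.lst file. */
    static void write(Path syncDir) throws IOException {
        List<String> relative;
        try (Stream<Path> walk = Files.walk(syncDir)) {
            relative = walk.filter(Files::isRegularFile)
                    .map(p -> syncDir.relativize(p).toString())
                    .collect(Collectors.toList());
        }
        // Produces lines such as models/en-us-ptm/mdef or models/lm/words.dic
        Files.write(syncDir.resolve("assets.lst"), entries(relative));
    }

    /** Usage: java AssetsList.java app/src/main/assets/sync */
    public static void main(String[] args) throws IOException {
        write(Paths.get(args[0]));
    }
}
```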

6. Recognition Sensitivity

The sensitivity of the key phrase recognition can be adjusted with a threshold value. If you experience too many false alarms, move the threshold closer to 1. A threshold of 1 means no false alarms, but many genuine matches will be missed as well. This app puts a seek bar at the top of the screen, allowing for interactive threshold tuning.
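Useful thresholds tend to span many orders of magnitude, so a linear seek bar is easier to use when its position drives the exponent rather than the value itself. The mapping below is one possible sketch, not necessarily the one used in the demo app: the class ThresholdScale, the 0 to 100 progress range, and the lower bound of 1e-30 are all assumptions made for this illustration.

```java
// Hypothetical helper, not part of PocketSphinx or the demo app.
final class ThresholdScale {

    /** Assumed lower bound: progress 0 maps to a threshold of 1e-30. */
    static final int MAX_EXPONENT = 30;

    /**
     * Maps a seek-bar progress in [0, 100] to a threshold in [1e-30, 1].
     * Higher progress gives a threshold closer to 1, i.e. fewer false
     * alarms but more missed matches.
     */
    static float thresholdFor(int progress) {
        int clamped = Math.max(0, Math.min(100, progress));
        double exponent = -MAX_EXPONENT * (100 - clamped) / 100.0;
        return (float) Math.pow(10.0, exponent);
    }

    public static void main(String[] args) {
        for (int progress = 0; progress <= 100; progress += 25) {
            System.out.println(progress + " -> " + thresholdFor(progress));
        }
    }
}
```

The resulting float would then be passed to the recognizer setup each time the seek bar changes.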

7. App Details

onCreate()

In onCreate, the app permissions are requested and the SeekBar is set up.

onResume()

In onResume, the recognizer is set up. This means that the assets are loaded, the threshold is set, and most importantly, the recognizer is put into “KeyphraseSearch” mode and the recognition is started. Since the ListeningActivity implements PocketSphinx’s RecognitionListener interface, the activity can be registered to receive recognition results.

onPartialResult()

This method gets called when the wake-up word or phrase has been spotted. In response, the device vibrates briefly, and then the MainActivity is started.

onPause()

To allow other activities to use the microphone, the recognition process is shut down in onPause and its resources are released.

Hot-Word demo app on Github

https://github.com/wolfpaulus/hotword

 
