Custom wakeup-words for an Android app

On most modern Android phones, just saying the phrase “OK Google” launches the Google Assistant app, which can answer simple questions or function as an app launcher. Following up with “open g mail” launches the Gmail app on your phone, and saying “navigate home” opens the Google Maps app with the destination already set to your home address.

In the following few paragraphs, I demonstrate how a wake-up word (a.k.a. hot word) can be used inside an Android app to wake it up. Imagine a scenario where the Android app is already launched and running in the foreground, just waiting for a user to say the wake-up word or phrase to start the full experience, i.e., start the next activity.

The waiting time is unpredictable: we don’t know how long it will take until the wake-up word gets spoken, which means that relying on an online speech recognition service for the entire wait doesn’t sound like a good idea. Fortunately, there is PocketSphinx, a lightweight speech recognition engine specifically tuned for handheld and mobile devices, which works locally on the phone. Let’s get started by creating a simple project with Android Studio.

1. AndroidManifest.xml

Since the app needs to be able to use the microphone and to read and write to storage, the following tags have to be inserted into the AndroidManifest.xml file, just before the application tag.

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.VIBRATE" />

The vibrate permission is requested so that we can briefly vibrate the phone to indicate that the wake-up word was successfully recognized.

2. Two Activities

The simple demo app has just two activities:

  1. The ListeningActivity that patiently waits for the wake-up word to be uttered.
  2. The MainActivity, which is started from the ListeningActivity once the wake-up word has been recognized.

The ListeningActivity cleans up after itself, so that the microphone, for instance, can be used by any activity that follows. Should the ListeningActivity be started again, it will once more patiently wait for the wake-up word to be uttered.

3. PocketSphinx Android Library

Instead of compiling and packaging the PocketSphinx code into an Android Archive (AAR) locally on your development machine, simply download the Android library and put it in the application’s libs folder.

The top-level build.gradle file needs to be modified so that the newly added library can be found. It should now look like this:

// Top-level build file where you can add configuration options common to all sub-projects/modules.

buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:2.3.3'
    }
}

allprojects {
    repositories {
        jcenter()
        flatDir {
            dirs 'libs'
        }
    }
}

task clean(type: Delete) {
    delete rootProject.buildDir
}

The app-level build.gradle file needs to have the library added as a compile dependency:

compile(name: 'pocketsphinx-android-5prealpha-release', ext: 'aar')

4. Language Model

Inside your project’s src/main folder, create the directory path assets/sync/models/en-us-ptm and copy the following files into it from here and here:

  • mdef
  • means
  • noisedict
  • sendump
  • transition_matrices
  • variances
  • en-phone.dmp
  • feat.params

For each file, an MD5 hash has to be created and stored in a file with the same name plus an “.md5” extension (or look for the files and hashes in the git repo mentioned at the very end of this post).
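The hash files can also be generated locally instead of by hand. The following is a minimal sketch in plain Java (run on the development machine, not inside the app). The class name Md5Sidecar and its methods are made up for this illustration, and it assumes that the “.md5” file simply holds the 32-character hex digest as text; compare the result with the hash files in the demo repo before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PocketSphinx or the demo app.
class Md5Sidecar {

    /** Returns the MD5 digest of the given bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Writes "<name>.md5" next to the given file and returns the new path. */
    static Path writeSidecar(Path file) throws IOException {
        String hash = md5Hex(Files.readAllBytes(file));
        // Assumption: the hash file contains nothing but the hex digest.
        Path sidecar = file.resolveSibling(file.getFileName() + ".md5");
        Files.write(sidecar, hash.getBytes(StandardCharsets.US_ASCII));
        return sidecar;
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more asset files on the command line.
        for (String arg : args) {
            System.out.println("wrote " + writeSidecar(Paths.get(arg)));
        }
    }
}
```

Running it with the paths of all model files as arguments would create one hash file per asset in a single step.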

5. Wake-Word Phonetic Dictionary

Inside your project’s src/main folder, create the directory path assets/sync/models/lm and store a dictionary there, containing all the words you want to recognize. Again, an MD5 hash has to be created and stored. Remember that the MD5 hash needs to be updated each time you make a change to the dictionary (e.g., use http://passwordsgenerator.net/md5-hash-generator/).

Here, for instance, is a dictionary for recognizing the words {hey, okay, john, george, paul, ringo, stop}:

george  JH AO R JH
hey     HH EY
john    JH AA N
okay    OW K EY
paul    P AO L
ringo   R IY NG G OW
stop    S T AA P

 

Here is the tutorial for how to create the pronunciations for each word in the dictionary.
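A wake-up phrase can only be spotted if each of its words has an entry in the dictionary, so it can be worth checking a phrase against the dictionary before packaging the app. The following is a rough sketch in plain Java; the class WakeWordDictionary and its methods are hypothetical helpers for this illustration and not part of PocketSphinx.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for checking a wake phrase against a dictionary file's content.
class WakeWordDictionary {

    /** Parses "word PH1 PH2 ..." lines; tokens may be separated by spaces or tabs. */
    static Map<String, List<String>> parse(String dictionaryText) {
        Map<String, List<String>> entries = new LinkedHashMap<>();
        for (String line : dictionaryText.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // skip blank lines and words without phones
            }
            entries.put(tokens[0].toLowerCase(),
                    Arrays.asList(tokens).subList(1, tokens.length));
        }
        return entries;
    }

    /** Returns true only if every word of the phrase has a dictionary entry. */
    static boolean covers(Map<String, List<String>> entries, String phrase) {
        for (String word : phrase.trim().toLowerCase().split("\\s+")) {
            if (!entries.containsKey(word)) {
                return false;
            }
        }
        return true;
    }
}
```

With the dictionary above, covers(...) would accept “hey george” but reject a phrase such as “hello george”, because “hello” has no pronunciation entry.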

Finally, all assets need to be referenced in the assets/sync/assets.lst file, like so:

models/lm/words.dic
models/en-us-ptm/feat.params
models/en-us-ptm/mdef
models/en-us-ptm/means
models/en-us-ptm/noisedict
models/en-us-ptm/sendump
models/en-us-ptm/transition_matrices
models/en-us-ptm/variances
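Because every listed asset needs both its file and its hash file, a quick consistency check on the development machine can catch omissions early. The sketch below is plain Java; the class AssetListCheck is a made-up name, and the default path src/main/assets/sync is an assumption that may need adjusting to your module layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical build-time check, not part of PocketSphinx or the demo app.
class AssetListCheck {

    /** Returns one message per listed asset that lacks its file or its ".md5" companion. */
    static List<String> findProblems(Path syncDir, List<String> entries) {
        List<String> problems = new ArrayList<>();
        for (String line : entries) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue; // ignore blank lines in assets.lst
            }
            if (!Files.isRegularFile(syncDir.resolve(entry))) {
                problems.add("missing file: " + entry);
            }
            if (!Files.isRegularFile(syncDir.resolve(entry + ".md5"))) {
                problems.add("missing hash: " + entry + ".md5");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the sync directory as the first argument to override.
        Path syncDir = Paths.get(args.length > 0 ? args[0] : "src/main/assets/sync");
        List<String> entries = Files.readAllLines(syncDir.resolve("assets.lst"));
        findProblems(syncDir, entries).forEach(System.out::println);
    }
}
```

An empty output would mean that every entry in assets.lst has both of its files in place.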

6. Recognition Sensitivity

The sensitivity of the key phrase recognition can be modified with a threshold value. If you experience too many false alarms, move the threshold closer to 1. A threshold of 1 means no false alarms, but many genuine matches might be missed as well. This app puts a seek bar at the top of the screen, allowing for interactive threshold tuning.
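The app’s setup code builds the threshold from the seek-bar value as the string “1.e-” followed by twice that value, i.e., 10 to the power of minus two times the position. A small plain-Java sketch of that mapping follows; the class and method names are invented for this illustration, and the range of seek-bar positions is an assumption.

```java
// Hypothetical helper mirroring the threshold expression used in the app's setup code.
class KeywordThreshold {

    /** Maps a seek-bar position (0, 1, 2, ...) to 10^(-2 * position). */
    static float fromSeekBar(int position) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be >= 0");
        }
        // Position 0 yields 1.0; every further step lowers the threshold by a factor of 100.
        return Float.parseFloat("1e-" + (2 * position));
    }

    public static void main(String[] args) {
        for (int position = 0; position <= 5; position++) {
            System.out.println(position + " -> " + fromSeekBar(position));
        }
    }
}
```

Note that a float cannot represent arbitrarily small values: beyond roughly position 22 the result underflows to 0, so the seek bar’s maximum should stay below that.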

7. App Details

onCreate()

In onCreate, the app permissions are requested and the SeekBar is set up.

onResume()

In onResume, the recognizer is set up. This means that the assets are loaded, the threshold is set, and, most importantly, the recognizer is put into “KeyphraseSearch” mode and recognition is started. Since the ListeningActivity implements PocketSphinx’s RecognitionListener interface, the activity can be registered to receive recognition results.

    private void setup() {
        try {
            final Assets assets = new Assets(ListeningActivity.this);
            final File assetDir = assets.syncAssets();
            mRecognizer = SpeechRecognizerSetup.defaultSetup()
                    .setAcousticModel(new File(assetDir, "models/en-us-ptm"))
                    .setDictionary(new File(assetDir, "models/lm/words.dic"))
                    .setKeywordThreshold(Float.valueOf("1.e-" + 2 * sensibility))
                    .getRecognizer();
            mRecognizer.addKeyphraseSearch(WAKEWORD_SEARCH, getString(R.string.wake_word));
            mRecognizer.addListener(this);
            mRecognizer.startListening(WAKEWORD_SEARCH);
            Log.d(LOG_TAG, "... listening");
        } catch (IOException e) {
            Log.e(LOG_TAG, e.toString());
        }
    }

onPartialResult()

This method gets called when the wake-up word or phrase has been spotted. The device then vibrates briefly before the MainActivity is started.
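The decision whether a partial result really is the wake-up word can be kept in a small, testable helper. The sketch below is plain Java; WakeWordMatcher is a hypothetical class, and it assumes that the callback may receive no hypothesis at all (null) and that the text passed in would come from the hypothesis, e.g. via hypothesis.getHypstr().

```java
import java.util.Locale;

// Hypothetical helper; the listener would call matches(...) from onPartialResult.
class WakeWordMatcher {

    private final String wakeWord;

    WakeWordMatcher(String wakeWord) {
        this.wakeWord = normalize(wakeWord);
    }

    /** Returns true if the (possibly null) hypothesis text equals the wake word. */
    boolean matches(String hypothesisText) {
        return hypothesisText != null && normalize(hypothesisText).equals(wakeWord);
    }

    /** Lowercases, trims, and collapses whitespace so that formatting differences do not matter. */
    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```

Only when matches(...) returns true would the activity vibrate and start the MainActivity; everything else can be ignored.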

onPause()

To allow other activities to use the microphone, the recognition process is shut down in onPause and its resources are released.
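One way to make the release step safe to call more than once is to wrap the recognizer in a small holder. The following plain-Java sketch is only an illustration: the Engine interface is a hypothetical stand-in for the recognizer’s cancel and shutdown calls, and RecognizerHolder is not part of the demo app.

```java
// Hypothetical holder that releases a recognizer exactly once.
class RecognizerHolder {

    /** Stand-in for the recognizer calls needed on release (assumed names). */
    interface Engine {
        void cancel();
        void shutdown();
    }

    private Engine engine;

    /** Called after the recognizer has been set up, e.g. in onResume. */
    void attach(Engine engine) {
        this.engine = engine;
    }

    boolean isAttached() {
        return engine != null;
    }

    /** Stops listening and frees the microphone; repeated calls are harmless. */
    void release() {
        if (engine == null) {
            return; // nothing to release
        }
        engine.cancel();
        engine.shutdown();
        engine = null; // forces a fresh setup the next time the activity resumes
    }
}
```

Calling release() from onPause and creating a new recognizer in onResume would keep the two lifecycle methods symmetric.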

Hot-Word demo app on GitHub

https://github.com/wolfpaulus/hotword

 

5 Replies to “Custom wakeup-words for an Android app”

  1. Hi, nice tutorial.
    I tried to run your project, and only the phrase “hey george” from the strings file is recognized;
    when I say other words from the DICTIONARY file, they are not recognized.
    Please help me out. Thanks.

  2. Hi, the hot word is not detected while text-to-speech is playing. How can I handle “stop” while TTS is speaking?

  3. Hello, one question: Is it possible that e.g. Wake-Up-Word-1 calls function-1 and Wake-Up-Word-2 calls function-2?

    Or is it only possible that different Wake-Up-Words call the same function?

    Thanks

  4. Hi, nice tutorial, but I changed the word.dic file and the md5 file, and it is still not recognizing the new wake word. Why is this happening? Is there anything else that I need to change when adding a new wake word? Thank you

  5. Icaro Mourao says:

    How can I make this, but with app in background or closed?
