Raspberry Pi 2 – Speech Recognition on device

This is a lengthy and fairly dry post, but it provides detailed instructions for building and installing SphinxBase and PocketSphinx and for generating a pronunciation dictionary and a language model, so that speech recognition can run directly on the Raspberry Pi, without network access. Don’t expect it to be as fast as Google’s recognizer, though.

Creating the RASPBIAN boot MicroSD

Starting with the current RASPBIAN (Debian Wheezy) image, the creation of a bootable MicroSD Card is a well understood and well documented process.

Uncompressing the zip (again, there is no better tool than The Unarchiver, if you are on a Mac) reveals the 2015-02-16-raspbian-wheezy.img

With the MicroSD (inside an SD-Card adapter – no less than 8GB) inserted into the Mac, I run the df -h command in Terminal to find out how to address the card. Today, it showed up as /dev/disk4s1 56Mi 14Mi 42Mi 26% 512 0 100% /Volumes/boot, which means I run something like this to put the boot image onto the MicroSD:

sudo diskutil unmount /dev/disk4s1
sudo dd bs=1m if=/Users/wolf/Downloads/2015-02-16-raspbian-wheezy.img of=/dev/rdisk4

… after a few minutes, once the 3.28 GB have been written onto the card, I execute:

sync
sudo diskutil eject /dev/rdisk4
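
Since dd overwrites whatever device it is pointed at, it can help to derive the raw device name from the partition that df reported instead of typing it by hand. The following is only a minimal sketch: the function name raw_disk_for is made up for illustration, and it assumes the macOS naming scheme seen above (diskNsM for a partition, rdiskN for the raw whole disk).

```sh
# Map a partition identifier such as /dev/disk4s1 to the raw whole-disk
# node (/dev/rdisk4) that dd should write to.
# raw_disk_for is a hypothetical helper, not part of diskutil or dd.
raw_disk_for() {
    part=${1#/dev/}     # strip the /dev/ prefix      -> disk4s1
    part=${part#r}      # tolerate an rdisk... input   -> disk4s1
    # Drop a trailing slice suffix (s1, s2, ...) if there is one.
    disk=$(printf '%s\n' "$part" | sed 's/s[0-9][0-9]*$//')
    printf '/dev/r%s\n' "$disk"
}

raw_disk_for /dev/disk4s1    # prints /dev/rdisk4
```

The result could then be passed to dd as of="$(raw_disk_for /dev/disk4s1)", after double-checking that it really is the SD card.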

Customizing the OS

Once booted, sudo raspi-config allows customizing the OS: time zone, keyboard, and other settings are adjusted to closely match the environment.
With the Pi already connected to the internet via Ethernet cable, I usually start with:

updating the raspi-config
expanding the filesystem
internationalization: un-check en-GB, check en-US.UTF-8 UTF-8
internationalization: timezone ..
internationalization: keyboard: change to English US
setting the hostname to translator (there are too many Raspberry Pis on my home network to leave it at the default)
make sure SSH is enabled
force audio out on the 3.5mm headphone jack

Microphone

Given the sparse analog-to-digital support provided by the Raspberry Pi, probably the best and easiest way to connect a decent mic to the device is a USB microphone. I happen to have an older Logitech USB mic, which works perfectly fine with the Pi.

After a reboot and now with the microphone connected, let’s get started ..
ssh pi@translator with the default password ‘raspberry’ gets me in from anywhere on my local network
cat /proc/asound/cards
returns
0 [ALSA ]: bcm2835 - bcm2835 ALSA
  bcm2835 ALSA
1 [AK5370 ]: USB-Audio - AK5370
  AKM AK5370 at usb-bcm2708_usb-1.2, full speed
showing that the microphone is visible, along with the USB port it is attached to.
Next, I edit alsa-base.conf so that snd-usb-audio is loaded as the first sound card:
sudo nano /etc/modprobe.d/alsa-base.conf
Edit
options snd-usb-audio index=-2
to
options snd-usb-audio index=0
and after a sudo reboot, cat /proc/asound/cards
looks like this
0 [AK5370 ]: USB-Audio - AK5370
  AKM AK5370 at usb-bcm2708_usb-1.2, full speed
1 [ALSA ]: bcm2835 - bcm2835 ALSA
  bcm2835 ALSA
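
To check the card order from a script instead of by eye, the listing can be parsed. This is a rough sketch under the assumption that /proc/asound/cards keeps the layout shown above (index, then the card id in square brackets); card_index is a made-up helper name, and the optional second argument exists only so the function can be tried against a saved copy of the listing.

```sh
# Print the index of the sound card whose id (the name in square brackets)
# matches $1. Reads /proc/asound/cards unless another file is given as $2.
# card_index is a hypothetical helper, not an ALSA tool.
card_index() {
    awk -v id="$1" '
        /^ *[0-9]+ \[/ {
            name = $0
            sub(/^ *[0-9]+ \[/, "", name)   # drop the index and the opening bracket
            sub(/ *\].*$/, "", name)        # drop padding, closing bracket, and the rest
            if (name == id) { print $1; exit }
        }' "${2:-/proc/asound/cards}"
}
```

After the reboot, card_index AK5370 should print 0 and card_index ALSA should print 1 on a setup like mine; no output means no card with that id was found.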

Recording – Playback – Test

Before worrying about Speech Recognition and Speech Synthesis, let’s make sure that the basic recording and audio playback works.
Again, I have a USB microphone connected to the Pi, as well as a speaker, using the 3.5mm audio plug.

Installing build tools and required libraries

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install bison
sudo apt-get install libasound2-dev
sudo apt-get install swig
sudo apt-get install python-dev
sudo apt-get install mplayer
sudo reboot

/etc/asound.conf

sudo nano /etc/asound.conf and enter something like this:

pcm.usb
{
    type hw
    card AK5370
}

pcm.internal
{
    type hw
    card ALSA
}

pcm.!default
{
    type asym
    playback.pcm
    {
        type plug
        slave.pcm "internal"
    }
    capture.pcm
    {
        type plug
        slave.pcm "usb"
    }
}

ctl.!default
{
    type hw
    card ALSA
}

Recording

The current recording settings can be looked at with:
amixer -c 0 sget 'Mic',0
and for me that looks something like this:

  Simple mixer control 'Mic',0
  Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
  Capture channels: Mono
  Limits: Capture 0 - 78
  Mono: Capture 68 [87%] [10.00dB] [on]

alsamixer -c 0 can be used to increase the capture levels. After an increase, it looks like this:

  ...
  Mono: Capture 68 [87%] [10.00dB] [on]

Playback

The current playback settings can be looked at with:
amixer -c 1
alsamixer -c 1 can be used to increase the volume. After an increase,
amixer -c 1
looks like this:

  Simple mixer control 'PCM',0
  Capabilities: pvolume pvolume-joined pswitch pswitch-joined penum
  Playback channels: Mono
  Limits: Playback -10239 - 400
  Mono: Playback -685 [90%] [-6.85dB] [on]

Test Recording and Playback

With the mic switched on, start a recording and use Control-C to stop it:
arecord -D plughw:0,0 -f cd ./test.wav
Then play it back:
aplay ./test.wav
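
The -f cd format records 16-bit stereo at 44.1 kHz, while PocketSphinx works with 16 kHz mono by default. A second test clip in that format can be a useful sanity check. The sketch below is an assumption-laden helper, not part of the original setup: record_sample and the DRY_RUN switch are invented here, and plughw:0,0 assumes the USB mic is card 0 as configured above.

```sh
# Record a short 16 kHz, 16-bit, mono clip from the USB microphone.
# record_sample and DRY_RUN are hypothetical names used for illustration.
record_sample() {
    out=${1:-test16k.wav}   # output file
    secs=${2:-5}            # duration in seconds
    # plughw lets ALSA convert if the mic cannot capture at 16 kHz natively.
    set -- arecord -D plughw:0,0 -f S16_LE -r 16000 -c 1 -d "$secs" "$out"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"           # only show the command that would be run
    else
        "$@"
    fi
}
```

Calling record_sample test16k.wav 5 records five seconds, and aplay ./test16k.wav plays the clip back; with DRY_RUN=1 set, the function only prints the arecord command.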

With recording and playback working, let’s get into the really cool stuff, on-device speech recognition.

Speech Recognition Toolkit

CMU Sphinx a.k.a. PocketSphinx
Currently, PocketSphinx 5prealpha (2015-02-15) is the most recent version. However, there are a few prerequisites that need to be installed first ..

Installing build tools and required libraries

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install bison
sudo apt-get install libasound2-dev
sudo apt-get install swig
sudo apt-get install python-dev
sudo apt-get install mplayer

Building Sphinxbase

cd ~/
wget http://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/sphinxbase-5prealpha.tar.gz
tar -zxvf ./sphinxbase-5prealpha.tar.gz
cd ./sphinxbase-5prealpha
./configure --enable-fixed
make clean all
make check
sudo make install

Building PocketSphinx

cd ~/
wget http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/pocketsphinx-5prealpha.tar.gz
tar -zxvf pocketsphinx-5prealpha.tar.gz
cd ./pocketsphinx-5prealpha
./configure
make clean all
make check
sudo make install

Creating a Language Model

Create a text file, containing a list of words/sentences we want to be recognized

For instance ..

Okay Pi
Open Garage
Start Translator
Shutdown
What is the weather in Ramona
What is the time

Upload the text file here: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
and then download the generated Pronunciation Dictionary and Language Model

For the text file mentioned above, this is what the tool generates:

Pronunciation Dictionary

GARAGE	G ER AA ZH
IN	IH N
IS	IH Z
OKAY	OW K EY
OPEN	OW P AH N
PI	P AY
RAMONA	R AH M OW N AH
SHUTDOWN	SH AH T D AW N
START	S T AA R T
THE	DH AH
THE(2)	DH IY
TIME	T AY M
TRANSLATOR	T R AE N S L EY T ER
TRANSLATOR(2)	T R AE N Z L EY T ER
WEATHER	W EH DH ER
WHAT	W AH T
WHAT(2)	HH W AH T
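
If the corpus is edited later, or the dictionary is trimmed by hand, it is easy to end up with a phrase containing a word that has no pronunciation entry. A quick cross-check might look like the sketch below; missing_words is a made-up helper, and it assumes the dictionary format shown above (the word in the first column, upper case).

```sh
# List every word of the corpus ($1) that has no entry in the
# pronunciation dictionary ($2). Prints nothing if all words are covered.
# missing_words is a hypothetical helper, not part of PocketSphinx.
missing_words() {
    corpus=$1
    dict=$2
    tr '[:lower:]' '[:upper:]' < "$corpus" | tr -s ' ' '\n' | grep . | sort -u |
    while read -r w; do
        # The first column of the dictionary holds the word itself.
        awk -v w="$w" '$1 == w { found = 1; exit } END { exit !found }' "$dict" ||
            echo "$w"
    done
}
```

For example, missing_words corpus.txt 3199.dic should print nothing for the files in this post, assuming the phrases were saved as corpus.txt.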

Language Model

Language model created by QuickLM on Thu Mar 26 00:23:34 EDT 2015
Copyright (c) 1996-2010 Carnegie Mellon University and Alexander I. Rudnicky

The model is in standard ARPA format, designed by Doug Paul while he was at MITRE.

The code that was used to produce this language model is available in Open Source.
Please visit http://www.speech.cs.cmu.edu/tools/ for more information

The (fixed) discount mass is 0.5. The backoffs are computed using the ratio method.
This model based on a corpus of 6 sentences and 16 words

\data\
ngram 1=16
ngram 2=20
ngram 3=15

\1-grams:
-0.9853 </s> -0.3010
-0.9853 <s> -0.2536
-1.7634 GARAGE -0.2536
-1.7634 IN -0.2935
-1.4624 IS -0.2858
-1.7634 OKAY -0.2935
-1.7634 OPEN -0.2935
-1.7634 PI -0.2536
-1.7634 RAMONA -0.2536
-1.7634 SHUTDOWN -0.2536
-1.7634 START -0.2935
-1.4624 THE -0.2858
-1.7634 TIME -0.2536
-1.7634 TRANSLATOR -0.2536
-1.7634 WEATHER -0.2935
-1.4624 WHAT -0.2858

\2-grams:
-1.0792 <s> OKAY 0.0000
-1.0792 <s> OPEN 0.0000
-1.0792 <s> SHUTDOWN 0.0000
-1.0792 <s> START 0.0000
-0.7782 <s> WHAT 0.0000
-0.3010 GARAGE </s> -0.3010
-0.3010 IN RAMONA 0.0000
-0.3010 IS THE 0.0000
-0.3010 OKAY PI 0.0000
-0.3010 OPEN GARAGE 0.0000
-0.3010 PI </s> -0.3010
-0.3010 RAMONA </s> -0.3010
-0.3010 SHUTDOWN </s> -0.3010
-0.3010 START TRANSLATOR 0.0000
-0.6021 THE TIME 0.0000
-0.6021 THE WEATHER 0.0000
-0.3010 TIME </s> -0.3010
-0.3010 TRANSLATOR </s> -0.3010
-0.3010 WEATHER IN 0.0000
-0.3010 WHAT IS 0.0000

\3-grams:
-0.3010 <s> OKAY PI
-0.3010 <s> OPEN GARAGE
-0.3010 <s> SHUTDOWN </s>
-0.3010 <s> START TRANSLATOR
-0.3010 <s> WHAT IS
-0.3010 IN RAMONA </s>
-0.6021 IS THE TIME
-0.6021 IS THE WEATHER
-0.3010 OKAY PI </s>
-0.3010 OPEN GARAGE </s>
-0.3010 START TRANSLATOR </s>
-0.3010 THE TIME </s>
-0.3010 THE WEATHER IN
-0.3010 WEATHER IN RAMONA
-0.3010 WHAT IS THE

\end\
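
The \data\ header declares how many entries each n-gram section contains, so a truncated download or a careless manual edit can be caught by counting. The following is a minimal sketch of such a check; check_arpa is an invented name, and it only relies on the ARPA layout visible above (ngram N=count lines, \N-grams: section markers, \end\).

```sh
# Compare the n-gram counts declared in the \data\ header of an ARPA
# file ($1) with the number of entries actually present in each section.
# Exits 0 if they agree, 1 otherwise. check_arpa is a hypothetical helper.
check_arpa() {
    awk '
        /^ngram [0-9]+=/   { split($2, kv, "="); declared[kv[1]] = kv[2]; next }
        /^\\[0-9]+-grams:/ { n = $0; sub(/^\\/, "", n); sub(/-grams:.*/, "", n); next }
        /^\\end\\/         { n = 0; next }
        n && NF            { counted[n]++ }   # a non-empty line inside a section
        END {
            status = 0
            for (k in declared)
                if (declared[k] != counted[k] + 0) {
                    printf "order %s: declared %s, found %d\n", k, declared[k], counted[k]
                    status = 1
                }
            exit status
        }' "$1"
}
```

For the model above, the header declares 16, 20, and 15 entries, which matches the three sections, so check_arpa 3199.lm should exit silently with status 0.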

Looking carefully, the Sphinx knowledge base generator provides links to the just-generated files, which makes it super convenient to pull them down to the Pi. For me, it generated a base set with the name 3199:
wget http://www.speech.cs.cmu.edu/tools/product/1427343814_14328/3199.dic
wget http://www.speech.cs.cmu.edu/tools/product/1427343814_14328/3199.lm

Running Speech-recognition locally on the Raspberry Pi

Finally, everything is in place: SphinxBase and PocketSphinx have been built and installed, and a pronunciation dictionary and a language model have been created and stored locally.
During the build process, the acoustic model files for the English language were deployed here: /usr/local/share/pocketsphinx/model/en-us/en-us

.. time to try out the recognizer:
cd ~/
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

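pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 3199.lm -dict 3199.dic -samprate 16000 -inmic yes

-samprate takes a single value: use 16000, 8000, or 48000, whichever matches the audio input (16000 is the PocketSphinx default).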

Output

READY….
Listening…
…

INFO: ps_lattice.c(1380): Bestpath score: -7682
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:285:334) = -403763
INFO: ps_lattice.c(1441): Joint P(O,S) = -426231 P(S|O) = -22468
INFO: ngram_search.c(874): bestpath 0.01 CPU 0.003 xRT
INFO: ngram_search.c(877): bestpath 0.01 wall 0.002 xRT
OPEN GARAGE
READY….
Listening…
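
To make the Pi actually do something with a recognized phrase, the recognizer output can be piped into a small dispatcher. This is only a sketch: the function name dispatch and the actions in each branch are placeholders, and it assumes the recognizer prints each hypothesis on a line of its own, as in the output above. Lines that start with INFO:, READY, or Listening are ignored.

```sh
# Read recognizer output line by line and react to known phrases.
# dispatch and the echo actions are placeholders; replace them with real
# commands (a GPIO script, a Python program, and so on).
dispatch() {
    while IFS= read -r line; do
        case "$line" in
            "OPEN GARAGE")       echo "action: open garage" ;;
            "START TRANSLATOR")  echo "action: start translator" ;;
            "SHUTDOWN")          echo "action: shutdown" ;;
            "WHAT IS THE TIME")  date '+%H:%M' ;;
            INFO:*|READY*|Listening*|"") ;;   # log and status lines: ignore
            *)                   echo "unhandled: $line" ;;
        esac
    done
}
```

It would be used by appending 2>/dev/null | dispatch to the pocketsphinx_continuous command line above; whether the hypotheses arrive without delay depends on how the recognizer flushes its output, which I have not verified here.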

Live Demo

This video shows the recognizer running in keyword spotting mode, using the dictionary and model mentioned above:
pocketsphinx_continuous -lm 3199.lm -dict 3199.dic -keyphrase "OKAY PI" -kws_threshold 1e-20 -inmic yes
The purpose is to provide some indication of the recognition speed that can be expected, running PocketSphinx on the Raspberry Pi 2.

28 Replies to “Raspberry Pi 2 – Speech Recognition on device”

Tom says:
January 3, 2017 at 6:22 am

HI, great tutorial, wondering if this would work on raspbian Jessie?
Anton says:
January 11, 2017 at 10:54 am

getting Input overrun, read calls are too rare .. and poor recognition
Phil says:
January 29, 2017 at 3:39 pm

Hey
any idea how to get the pocketsphinx-python package running on a raspberry pi 3?
1. roberto says:
  February 10, 2017 at 5:24 pm
  
  yes i started on raspberry pi 3
  1. Fotis says:
    July 5, 2017 at 3:06 am
    
    it works fine on raspberry pi 3.i followed the instructions and all went smoothly.
    https://www.youtube.com/watch?v=5kp5qpwVh_8
    1. toms says:
      March 30, 2018 at 3:16 pm
      
      Hi,
      would be great if you could share your SD card image with me – i couldn’t get it to work and there are so many dependancies that are depreciated, it is just a huge pain to make it work on RaspberryPi 3…
roberto says:
February 10, 2017 at 5:23 pm

excellent guide … but you could just connect a bash command recognizes the word
Andre says:
February 25, 2017 at 7:03 am

it would be nice to have a tutorial how to use this with GPIOs..
shaurya says:
April 29, 2017 at 5:10 am

I have given all the commands as stated but i am getting the following error;
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Apr 28 2017, AT: 09:23:15

Error opening audio device (null) for capture: Connection refused
FATAL: “continuous.c”, line 245: Failed to open audio device
Please tell how to rectify this
1. Jeroen says:
  June 16, 2017 at 10:46 am
  
  Like this post: https://stackoverflow.com/questions/35867490/fatal-error-continuous-c-line-246-failed-to-open-audio-device
  
  pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 5893.lm -dict 5893.dic -samprate 16000/8000/48000 -inmic yes -adcdev plughw:0,0
  
  Just add the -adcdev and plughw:0,0 !
ahmed says:
June 8, 2017 at 5:28 am

please help me! everything worked, but I need to run scripts in python with the recognized phrases.
1. Jeroen says:
  June 16, 2017 at 10:47 am
  
  Try to add the two missing parameters at the end:
  
  pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 5893.lm -dict 5893.dic -samprate 16000/8000/48000 -inmic yes -adcdev plughw:0,0
2. Racha Nikhil says:
  June 16, 2017 at 7:07 pm
  
  Could you tell me did u install in jessie and even if possible please respond to my reply coz i am stuck with the imstallation part
3. kenti says:
  March 18, 2018 at 3:19 pm
  
  Hi, did you manage to run a python app? And how? Thanks.
4. kenti says:
  March 19, 2018 at 4:31 am
  
  Hi, did you find any solution to trigger scripts in python? And if yes, could your share your code? I’ve been spending days to figure this out. Thanks.
nafis says:
July 1, 2017 at 3:02 am

please help!
i am getting this error. what to do?
FATAL: “continuous.c”, line 245: Failed to open audio device
David Boccabella says:
July 5, 2017 at 6:14 pm

Hi.
I am interesting in this for a different purpose. I would like to take a continuous stream (live microphone) and extract a phoneme string from the spoken text. The Phoneme string would then be used to manipulate servo’s controlling an animatronic mouth.
This is an approximation – not a accurate reproduction.

Many thanks
Dave
Fred Roser says:
August 26, 2017 at 2:02 pm

Please Help!
All went well; I successfully ran pocketsphinx_continuous.
I then tried to run a python app that had the import statements:
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

The import failed on the first statement:
Traceback (most recent call last):
File “eb3.py”, line 4, in
from pocketsphinx.pocketsphinx import *
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.py”, line 37, in
from pocketsphinx import *
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 35, in
_pocketsphinx = swig_import_helper()
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 34, in swig_import_helper
return importlib.import_module(‘_pocketsphinx’)
File “/usr/lib/python2.7/importlib/__init__.py”, line 37, in import_module
__import__(name)
ImportError: No module named _pocketsphinx

This was my second attempt to install pocketsphinx. Same error as on the first attempt.
What am I missing?

Thanks for your guide, It is very professionally done.
Fred
1. Fred Roser says:
  August 26, 2017 at 2:34 pm
  
  More info:
  I am attempting to install on RPI with
  RASPBIAN STRETCH OS
2. kenti says:
  March 18, 2018 at 3:19 pm
  
  Hi, did you manage to run a python app? And how? Thanks.
kenti says:
March 19, 2018 at 4:46 am

Hi, thanks to this tutorial, I’ve installed pocketsphinx on my RPI3, and I’ve created my own 5808.dic and 5808.lm files.
But when I run:
pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 5808.lm -dict 5808.dic -samprate 16000/8000/48000 -inmic yes -adcdev plughw:0,0
it says:
“Failed to open dictionary file ‘5808.dic’ for reading: No such file or directory.”
I don’t know why. The files are in the right location and I can open them if I double click.
If I run with a simple keyphrase:
“pocketsphinx_continuous -keyphrase “hey sam” -kws_threshold 1e-20 -adcdev plughw:2 -inmic yes”
it’s working. Why it doesn’t see the dictionary file? Any idea? Thanks.
emmto says:
June 28, 2018 at 2:33 am

Hello,

Very nice tutorial, many thanks! Does it works with Python 3?
And where did you buy your aluminium case for your raspberry? It looks great!
EMMTO says:
June 28, 2018 at 2:38 am

Hello!

Very nice tutorial and project, many thanks!
Where did you get your raspberry’s case? It looks great!
maliha says:
August 27, 2018 at 12:55 pm

hi, im having some trouble while using your code. after using
” sudo nano /etc/modprobe.d/alsa-base.conf
Edit
options snd-usb-audio index=-2
to
options snd-usb-audio index=0
”
its not showing my usb microphone anymore.
what should i do if i want to change it back??
fueng says:
September 11, 2018 at 5:00 pm

How to use the voice recognition in a python file in order to control motors or something please
Tan says:
October 20, 2018 at 2:30 am

Hi, i come from Malaysia. May i possible make it in Malay language, which mean make it in other language than English, and how can i do it?
abubaker says:
March 30, 2019 at 3:03 am

decoder = Decoder(config)

File “C:\Users\Abubaker\Anaconda3\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 272, in __init__
this = _pocketsphinx.new_Decoder(*args)

RuntimeError: new_Decoder returned -1
i m getting this error can u help
karina says:
September 8, 2019 at 1:11 am

hi thanks for tutorial, i got error

Available samping rate 44100 is too far from requested 16000
FATAL: “continuous.c”, line 245: Failed to open audio device
help me solve this
