Recently, I described how to perform speech recognition on a Raspberry Pi, using the on device sphinxbase / pocketsphinx open source speech recognition toolkit. This approach works reasonably well, but with high accuracy, only for a relatively small dictionary of words.

Like the article showed, pocketsphinx works great on a Raspberry Pi to do keyword spotting, for instance to use your voice, to launch an application. General purpose speech recognition however, is still best performed, using one of the prominent web services.

Speech Recognition via Google Service

Google’s speech recognition and related services used to be accessible and easy to integrate. Recently however, they got much more restrictive and (hard to belief, I know) Microsoft is now the place to start, when looking for decent speech related services. Still, let’s start with Google’s Speech Recognition Service, which requires an FLAC (Free Lossless Audio Codec) encoded voice sound file.

Google API Key

Accessing Google’s speech recognition service requires an API key, available through the Google Developers Console.
I followed these instructions for Chromium Developers and while the process is a little involved, even intimidating, it’s manageable.
I created a project and named in TranslatorPi and selected the following APIs for this project:

The important part is to create an API Key for public access. On the left side menu, select API & auth / Credentials. Here you can create the API key, a 40 character long alpha numeric string.

Installing tools and required libraries

Back on the Raspberry Pi, there are only a few more libraries needed, additionally to what was installed in the above mentioned on-device recognition project.

sudo apt-get install flac sudo apt-get install python-requests sudo apt-get install python-pycurl

Testing Google’s Recognition Service from a Raspberry Pi

I have the same audio setup as previously described, now allowing me to capture a FLAC encoded test recording like so:
arecord -D plughw:0,0 -f cd -c 1 -t wav -d 0 -q -r 16000 | flac - -s -f --best --sample-rate 16000 -o test.flac
..which does a high quality, wave type recording and pipes it into the flac encoder, which outputs ./test.flac

The following bash script will send the flac encoded voice sound to Google’s recognition service and display the received JSON response:

#!/bin/bash
# parameter 1 : file name, contains flac encoded voice recording

echo Sending FLAC encoded Sound File to Google:
key=''
url='https://www.google.com/speech-api/v2/recognize?output=json&amp;lang=en-us&amp;key='$key
curl -i -X POST -H "Content-Type: audio/x-flac; rate=16000" --data-binary @$1 $url
echo '..all done'

{
  "result": [
    {
      "alternative": [
        {
          "transcript": "today is Sunday",
          "confidence": 0.98650438
        },
        {
          "transcript": "today it's Sunday"
        },
        {
          "transcript": "today is Sundy"
        },
        {
          "transcript": "today it is Sundy"
        },
        {
          "transcript": "today he is Sundy"
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

More details about accessing the Google Speech API can be found here: https://github.com/gillesdemey/google-speech-v2

Building a Translator

Encoding doesn’t take long and the Google Speech Recognizer is the fastest in the industry, i.e. the transcription is available swiftly and we can send it for translation to yet another web service.

Microsoft Azure Marketplace

Creating an account at the Azure Marketplace is a little easier and the My Data section shows that I have subscribed to the freetranslation service, providing me with 2,000,000 Characters/month. Again, I named my project TranslatorPi. On the ‘Developers‘ page, under ‘Registered Applications‘, take note of the Client ID and Client secret, both are required for the next step.

Strategy

With the speech recognition from Google and text translation from Microsoft, the strategy to build the translator looks like this:

Record voice sound, FLAC encode it, and send it to Google for transcription
Use Google’s Speech Synthesizer and synthesize the recognized utterance.
Use Microsoft’s translation service to translate the transcription into the target language.
Use Google’s Speech Synthesizer again, to synthesize the translation in the target language.

For my taste, that’s a little too much for a shell script and I use the following Python program instead:

# -*- coding: utf-8 -*-
import json
import requests
import urllib
import subprocess
import argparse
import pycurl
import StringIO
import os.path


def speak_text(language, phrase):
    tts_url = "http://translate.google.com/translate_tts?tl=" + language + "&q=" + phrase
    subprocess.call(["mplayer", tts_url], shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

def transcribe():
    key = '[Google API Key]'
    stt_url = 'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=' + key
    filename = 'test.flac'
    print "listening .."
    os.system(
        'arecord -D plughw:0,0 -f cd -c 1 -t wav -d 0 -q -r 16000 -d 3 | flac - -s -f --best --sample-rate 16000 -o ' + filename)
    print "interpreting .."
    # send the file to google speech api
    c = pycurl.Curl()
    c.setopt(pycurl.VERBOSE, 0)
    c.setopt(pycurl.URL, stt_url)
    fout = StringIO.StringIO()
    c.setopt(pycurl.WRITEFUNCTION, fout.write)

    c.setopt(pycurl.POST, 1)
    c.setopt(pycurl.HTTPHEADER, ['Content-Type: audio/x-flac; rate=16000'])

    file_size = os.path.getsize(filename)
    c.setopt(pycurl.POSTFIELDSIZE, file_size)
    fin = open(filename, 'rb')
    c.setopt(pycurl.READFUNCTION, fin.read)
    c.perform()

    response_data = fout.getvalue()

    start_loc = response_data.find("transcript")
    temp_str = response_data[start_loc + 13:]
    end_loc = temp_str.find("\"")
    final_result = temp_str[:end_loc]
    c.close()
    return final_result


class Translator(object):
    oauth_url = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
    translation_url = 'http://api.microsofttranslator.com/V2/Ajax.svc/Translate?'

    def __init__(self):
        oauth_args = {
            'client_id': 'TranslatorPI',
            'client_secret': '[Microsoft Client Secret]',
            'scope': 'http://api.microsofttranslator.com',
            'grant_type': 'client_credentials'
        }
        oauth_junk = json.loads(requests.post(Translator.oauth_url, data=urllib.urlencode(oauth_args)).content)
        self.headers = {'Authorization': 'Bearer ' + oauth_junk['access_token']}

    def translate(self, origin_language, destination_language, text):
        german_umlauts = {
            0xe4: u'ae',
            ord(u'ö'): u'oe',
            ord(u'ü'): u'ue',
            ord(u'ß'): None,
        }

        translation_args = {
            'text': text,
            'to': destination_language,
            'from': origin_language
        }
        translation_result = requests.get(Translator.translation_url + urllib.urlencode(translation_args),
                                          headers=self.headers)
        translation = translation_result.text[2:-1]
        if destination_language == 'DE':
            translation = translation.translate(german_umlauts)
        print "Translation: ", translation
        speak_text(origin_language, 'Translating ' + text)
        speak_text(destination_language, translation)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Raspberry Pi - Translator.')
    parser.add_argument('-o', '--origin_language', help='Origin Language', required=True)
    parser.add_argument('-d', '--destination_language', help='Destination Language', required=True)
    args = parser.parse_args()
    while True:
        Translator().translate(args.origin_language, args.destination_language, transcribe())

Testing the $35 Universal Translator

So here are a few test sentences for our translator app, using English to Spanish or English to German:

How are you today?
What would you recommend on this menu?
Where is the nearest train station?
Thanks for listening.

Live Demo

This video shows the Raspberry Pi running the translator, using web services from Google and Microsoft for speech recognition, speech synthesis, and translation.

7 Replies to “Raspberry Pi – Translator”

SM says: Reply
January 11, 2017 at 3:02 pm

Thanks for this, it helped me write my own python script to connect to the Google Cloud Speech API and then execute a command. It’s a bit like a basic Amazon Echo.

Details are on my blog post: http://randomconsultant.blogspot.co.uk/2017/01/building-amazon-echo-like-device-with.html
joehoeller says: Reply
April 29, 2017 at 11:52 pm

I get an error:

self.headers = {‘Authorization’: ‘Bearer ‘ + oauth_junk[‘access_token’]}
KeyError: ‘access_token’

What goes in access_token?
joehoeller says: Reply
April 30, 2017 at 12:25 am

I hacked away and now I am getting this error:

Translation: ArgumentException: Invalid appId\u000d\u000aParameter name: appId : ID=0646.V2_Json.Translate.4D9A656E listening ..

Please help, thank you kindly.
Sohail Anwar says: Reply
March 12, 2018 at 2:35 am

Good work sir, Well post, I just wanna know can I use Arduino instead of Raspberry Pi?
abhinav says: Reply
February 12, 2019 at 6:31 pm

excellent work , and very essay language
abhinav says: Reply
February 12, 2019 at 6:33 pm

price ?
komal says: Reply
September 16, 2019 at 11:10 pm

I am trying this but failed well, definitely will try again.