Android and OCR

I still remember it well: the first piece of software I wrote when I came to the US was a de-skewing algorithm. Deskewing an image helps a lot if you want to do OCR, OMR, or barcode detection, or just want to improve the readability of scanned images.
At the time, I was working for a small software company, developing TeleForm, an application that reads data from paper forms and stores that data in previously created databases. The Cardiff TeleForm product was later re-branded Verity-TeleForm for a brief period in 2004 and 2005, when Verity Inc. acquired Cardiff Software. In 2005, when Autonomy acquired Verity, the Cardiff brand was reintroduced as Autonomy Cardiff; more recently, Autonomy was acquired by HP.

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten, or printed text into machine-encoded text.

Image Deskew is the process of removing skew from images (especially bitmaps created using a scanner). Skew is an artifact that can occur in scanned images because of the camera being misaligned, imperfections in the scanning or surface, or simply because the paper was not placed completely flat when scanned.
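One common way to estimate the skew angle is the projection-profile method: rotate the dark pixels by each candidate angle and keep the angle whose row histogram is most sharply peaked, because text lines then collapse into as few rows as possible. The sketch below is a minimal pure-Python illustration of that technique, not the algorithm I wrote for TeleForm; the function name `estimate_skew`, the ±10° search range, and the 0.5° step are illustrative assumptions.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate document skew with a projection profile.

    points: (x, y) coordinates of dark (ink) pixels.
    Each candidate angle rotates the points; the angle whose row histogram is
    most sharply peaked (largest sum of squared row counts) wins.
    """
    pts = list(points)
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for k in range(-steps, steps + 1):
        angle = k * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # y-coordinate after rotating by theta, rounded to a pixel row
        rows = Counter(round(x * sin_t + y * cos_t) for x, y in pts)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

For three synthetic text lines y = c + x·tan(3°), the function returns -3.0, since rotating by -3° flattens them. A real deskewer would first extract the ink pixels from a binarized scan and then rotate the whole image by the angle found.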

Today most data entry and origination happens on the Web, where most forms processing has moved as well, so OCR hasn’t been in vogue for quite a while. However, the popularity of smartphones with built-in high-quality cameras has created a new category of mobile applications that benefit greatly from OCR. Take Word-Lens as an example: an augmented-reality translation application that tries to find out what the letters in an image are, looks the words up in a dictionary, and eventually draws them back on the screen, translated.

On Device or In The Cloud?

Before deciding on an OCR library, one needs to decide where the OCR process should take place: on the smartphone or in the cloud. Each approach has its advantages.
On-device OCR can be performed without an Internet connection: instead of sending a photo, which can potentially be huge (many phones now have 8- or 12-megapixel cameras), the text is recognized by an on-board OCR engine.
However, OCR libraries tend to be large, so the mobile application will be of considerable size. Depending on the amount of text that needs to be recognized and the available data transfer speed, a cloud service may provide the result faster. A cloud service can also be updated more easily, but individually optimizing (training) an OCR engine may work better when done locally on the device.
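To put the bandwidth side of this trade-off into rough numbers: upload time is approximately the image size in bits divided by the uplink speed. The helper below is a back-of-the-envelope sketch; the 2 MB and 100 kB image sizes and the 1 Mbit/s uplink are illustrative assumptions, not measurements.

```python
def upload_seconds(size_bytes, uplink_bits_per_second):
    """Rough transfer time: bits to send divided by uplink speed.

    Ignores HTTP/TCP overhead, latency, and server-side OCR time.
    """
    return size_bytes * 8 / uplink_bits_per_second


# Illustrative assumptions, not measurements:
# a ~2 MB full-resolution JPEG vs. a ~100 kB cropped, downscaled JPEG,
# both sent over a 1 Mbit/s mobile uplink.
full_photo = upload_seconds(2_000_000, 1_000_000)  # 16.0 seconds
small_crop = upload_seconds(100_000, 1_000_000)    # 0.8 seconds
```

Under these assumptions, cropping and downscaling before upload matters more for a cloud service than raw server speed.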

Which OCR Library to choose?

After a closer look at the available comparisons, Tesseract stands out. It provides good accuracy, it’s open source and Apache-licensed, and it has broad language support. It was created by HP and is now developed by Google.

Also, since Tesseract is open source and Apache-licensed, we can take the source and port it to the Android platform, or put it on a web server to run our very own cloud service.

A tesseract is a four-dimensional object, much like a cube is a three-dimensional object. A square has two dimensions; you can make a cube, which has three dimensions, from six squares. The tesseract is made in the same way, but in four dimensions.

1. Tesseract

The Tesseract OCR engine was developed at Hewlett Packard Labs and is currently sponsored by Google. It was among the top three OCR engines in terms of character accuracy in 1995.

1.1. Running Tesseract locally on a Mac

As with so many other Unix and Linux tools, Homebrew is the easiest and most flexible way to install the UNIX tools Apple didn’t include with OS X. Once Homebrew is installed, Tesseract can be installed on OS X as easily as:
$ brew install tesseract
Once installed,
$ brew info tesseract
will return something like this:

tesseract 3.00
Depends on: libtiff
/usr/local/Cellar/tesseract/3.00 (316 files, 11M)
Tesseract is an OCR (Optical Character Recognition) engine.
The easiest way to use it is to convert the source to a Grayscale tiff:
`convert source.png -type Grayscale terre_input.tif`
then run tesseract:
`tesseract terre_input.tif output`

Tesseract doesn’t come with a GUI and instead runs from a command-line interface. To OCR a TIFF-encoded image located on your desktop, you would do something like this:
$ tesseract ~/Desktop/cox.tiff ~/Desktop/cox
Using the image below, Tesseract wrote the resulting text with perfect accuracy into ~/Desktop/cox.txt (Tesseract appends .txt to the output base name).

There are at least two projects providing a GUI front end for Tesseract on OS X:

  1. TesseractGUI, a native OS X client
  2. VietOCR, a Java client

TesseractGUI, a native OSX Client for Tesseract

VietOCR, a Java Client for Tesseract

1.2. Running Tesseract as a Cloud-Service on a Linux Server

One of the fastest and easiest ways to deploy Tesseract as a web service uses Tornado, an open-source (Apache-licensed) Python non-blocking web server. Since Tesseract accepts TIFF-encoded images but our cloud service should rather work with the more popular JPEG image format, we also need to deploy the free Python Imaging Library (PIL).

The deployment on Ubuntu 11.10 64-bit server looks something like this:

sudo apt-get install python-tornado
sudo apt-get install python-imaging
sudo apt-get install tesseract-ocr

1.2.1. The HTTP Server-Script for port 8080

#!/usr/bin/env python
import tornado.httpserver
import tornado.ioloop
import tornado.web
import Image
from tesseract import image_to_string
import StringIO
import os.path
import uuid

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # serve a minimal upload form
        self.write('<html><body>'
                   '<form action="/" method="post" enctype="multipart/form-data">'
                   '<input type="file" name="the_file" />'
                   '<input type="submit" value="Submit" />'
                   '</form></body></html>')

    def post(self):
        self.set_header("Content-Type", "text/html")
        self.write("<html><body>")
        # create a unique ID file name
        tempname = str(uuid.uuid4()) + ".jpg"
        # take the first uploaded file, whatever its form-field name
        myimg = Image.open(StringIO.StringIO(self.request.files.items()[0][1][0]['body']))
        myfilename = os.path.join(os.path.dirname(__file__), "static", tempname)

        # save image to file as JPEG
        myimg.save(myfilename)

        # do OCR, print result
        self.write(image_to_string(myimg))
        self.write("</body></html>")

settings = {
    "static_path": os.path.join(os.path.dirname(__file__), "static"),
}

application = tornado.web.Application([
    (r"/", MainHandler),
], **settings)

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8080)
    tornado.ioloop.IOLoop.instance().start()

The Server receives a JPEG image file and stores it locally in the ./static directory, before calling image_to_string, which is defined in the Python script below:

1.2.2. image_to_string function implementation

#!/usr/bin/env python

tesseract_cmd = 'tesseract'

import Image
import StringIO
import subprocess
import sys
import os

__all__ = ['image_to_string']

def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`

    returns the exit status of tesseract, as well as tesseract's stderr output
    '''

    command = [tesseract_cmd, input_filename, output_filename_base]

    if lang is not None:
        command += ['-l', lang]

    if boxes:
        command += ['batch.nochop', 'makebox']

    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    # communicate() reads stderr while waiting, so a full pipe cannot deadlock
    _, stderr_output = proc.communicate()
    return (proc.returncode, stderr_output)

def cleanup(filename):
    ''' tries to remove the given filename. Ignores non-existent files '''
    try:
        os.remove(filename)
    except OSError:
        pass

def get_errors(error_string):
    '''
    returns all lines in the error_string that contain the string "Error",
    or the whole (stripped) error_string if there are none
    '''

    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if line.find('Error') >= 0)
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()

def tempnam():
    ''' returns a temporary file-name '''

    # prevent os.tempnam from printing a warning...
    stderr = sys.stderr
    try:
        sys.stderr = StringIO.StringIO()
        return os.tempnam(None, 'tess_')
    finally:
        sys.stderr = stderr

class TesseractError(Exception):
    def __init__(self, status, message):
        self.status = status
        self.message = message
        self.args = (status, message)

def image_to_string(image, lang=None, boxes=False):
    '''
    Runs tesseract on the specified image. First, the image is written to disk,
    and then the tesseract command is run on the image. Tesseract's result is
    read, and the temporary files are erased.
    '''

    input_file_name = '%s.bmp' % tempnam()
    output_file_name_base = tempnam()
    if not boxes:
        output_file_name = '%s.txt' % output_file_name_base
    else:
        output_file_name = '%s.box' % output_file_name_base
    try:
        image.save(input_file_name)
        status, error_string = run_tesseract(input_file_name,
                                             output_file_name_base,
                                             lang=lang,
                                             boxes=boxes)
        if status:
            errors = get_errors(error_string)
            raise TesseractError(status, errors)
        f = file(output_file_name)
        try:
            return f.read().strip()
        finally:
            f.close()
    finally:
        cleanup(input_file_name)
        cleanup(output_file_name)

if __name__ == '__main__':
    if len(sys.argv) == 2:
        filename = sys.argv[1]
        try:
            image = Image.open(filename)
        except IOError:
            sys.stderr.write('ERROR: Could not open file "%s"\n' % filename)
            sys.exit(1)
        print image_to_string(image)
    elif len(sys.argv) == 4 and sys.argv[1] == '-l':
        lang = sys.argv[2]
        filename = sys.argv[3]
        try:
            image = Image.open(filename)
        except IOError:
            sys.stderr.write('ERROR: Could not open file "%s"\n' % filename)
            sys.exit(1)
        print image_to_string(image, lang=lang)
    else:
        sys.stderr.write('Usage: python tesseract.py [-l language] input_file\n')
        sys.exit(2)
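The wrapper above targets Python 2 (`print` statement, `file()`, and `os.tempnam`, which is insecure and was removed in Python 3). A minimal Python 3 sketch of the same idea could use `tempfile` and `subprocess.run` instead, still calling the command line as `tesseract input output_base [-l lang]` and reading `output_base.txt`. The names `build_command`, `image_to_string_py3`, and `TESSERACT_CMD` are hypothetical; the image is assumed to be a Pillow/PIL image, of which only `save(path)` is used.

```python
import os
import subprocess
import tempfile

TESSERACT_CMD = "tesseract"  # assumed to be on the PATH


def build_command(input_path, output_base, lang=None):
    """tesseract <input> <output_base> [-l <lang>]; tesseract adds ".txt" itself."""
    command = [TESSERACT_CMD, input_path, output_base]
    if lang is not None:
        command += ["-l", lang]
    return command


def image_to_string_py3(image, lang=None):
    """OCR `image` (any object with a PIL-style save(path) method).

    All temporary files live in a TemporaryDirectory, which is removed
    automatically, even if tesseract fails.
    """
    with tempfile.TemporaryDirectory(prefix="tess_") as workdir:
        input_path = os.path.join(workdir, "input.bmp")
        output_base = os.path.join(workdir, "output")
        image.save(input_path)
        # run() collects stdout/stderr while waiting, so pipes cannot deadlock
        result = subprocess.run(build_command(input_path, output_base, lang),
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        if result.returncode != 0:
            raise RuntimeError("tesseract failed: " + result.stderr.strip())
        with open(output_base + ".txt", encoding="utf-8") as f:
            return f.read().strip()
```

The temporary directory replaces both `tempnam()` and `cleanup()`: the files are created with safe names and deleted together when the `with` block exits.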

1.2.3. The Service deploy/start Script

description  "OCR WebService"

start on runlevel [2345]
stop on runlevel [!2345]

pre-start script
# -p: don't fail if the directories already exist (Upstart runs this with sh -e)
mkdir -p /tmp/ocr/static

cp /usr/share/ocr/*.py /tmp/ocr

end script
exec /tmp/ocr/

After the service has been started, it can be accessed through a web browser. I’m currently running Tesseract 3.01 on Ubuntu Linux 11.10 64-bit, so please be gentle: it runs on an Intel Atom CPU 330 @ 1.60GHz with four logical cores (the kind of CPU typically found in netbooks). The HTML-encoded result looks something like this:

<html><body>Contact Us
Customer Serv 760-788-9000
Repair 76O—788~71O0
Cox Telephone 888-222-7743</body></html>
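To use the service from a script instead of a browser, a client sends an HTTP POST with a multipart/form-data body containing the JPEG, just like the HTML upload form does, and then strips the `<html><body>` wrapper from the response. The following is a minimal Python 3 sketch using only the standard library; the function names, the boundary string, and the server URL in the usage comment are hypothetical.

```python
import urllib.request

BOUNDARY = "----ocr-sketch-boundary"


def build_multipart(field_name, filename, data, boundary=BOUNDARY):
    """Encode one JPEG file as a multipart/form-data body.

    Returns (body_bytes, value for the Content-Type header).
    """
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_body_text(html):
    """Return the text between <body> and </body>, or the whole string if they are missing."""
    start = html.find("<body>")
    end = html.rfind("</body>")
    if start == -1 or end == -1 or end < start:
        return html.strip()
    return html[start + len("<body>"):end].strip()


def ocr_via_service(url, jpeg_bytes):
    """POST a JPEG to the OCR service and return just the recognized text."""
    body, content_type = build_multipart("the_file", "the_image.jpg", jpeg_bytes)
    request = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:  # data= makes this a POST
        return extract_body_text(response.read().decode("utf-8"))

# Usage (hypothetical host):
# text = ocr_via_service("http://your-server:8080/", open("cox.jpg", "rb").read())
```

The Android client in section 1.3 does essentially the same body extraction with `indexOf`/`lastIndexOf`; searching for the full `<body>` tags is a little more robust if the recognized text itself contains the word "body".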

1.3. Accessing the Tesseract Cloud-Service from Android

The OCRTaskActivity below uses Android’s built-in AsyncTask as well as the Apache Software Foundation’s HttpComponents library (HttpClient 4.1.2, plus its httpmime module, which provides the multipart classes). OCRTaskActivity expects the image to be passed in as the Intent extra “ByteArray” of type byte[]. The OCR result is returned to the calling Activity as the extra OCR_TEXT, as shown here:

setResult(Activity.RESULT_OK, getIntent().putExtra("OCR_TEXT", result));

import android.app.Activity;
import android.graphics.BitmapFactory;
import android.os.AsyncTask;
import android.os.Bundle;
import android.util.Log;
import android.view.View;
import android.widget.ImageView;
import android.widget.ProgressBar;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.HttpMultipartMode;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.ByteArrayBody;
import org.apache.http.entity.mime.content.StringBody;
import org.apache.http.impl.client.DefaultHttpClient;

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class OCRTaskActivity extends Activity {
    private static final String LOG_TAG = OCRTaskActivity.class.getSimpleName();
    // URL(s) of the OCR web service (the Tornado server listens on port 8080)
    private static final String[] URL_STRINGS = {""};

    private byte[] mBA;
    private ProgressBar mProgressBar;

    @Override
    public void onCreate(final Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // R.layout.ocr, R.id.image and R.id.progressbar are placeholder resource names
        setContentView(R.layout.ocr);
        mBA = getIntent().getExtras().getByteArray("ByteArray");
        ImageView iv = (ImageView) findViewById(R.id.image);
        iv.setImageBitmap(BitmapFactory.decodeByteArray(mBA, 0, mBA.length));
        mProgressBar = (ProgressBar) findViewById(R.id.progressbar);
        mProgressBar.setVisibility(View.VISIBLE);
        OCRTask task = new OCRTask();
        task.execute(URL_STRINGS);
    }

    private class OCRTask extends AsyncTask<String, Void, String> {
        @Override
        protected String doInBackground(final String... urls) {
            String response = "";
            for (String url : urls) {
                try {
                    response = executeMultipartPost(url, mBA);
                    Log.v(LOG_TAG, "Response:" + response);
                } catch (Throwable ex) {
                    Log.e(LOG_TAG, "error: " + ex.getMessage());
                }
            }
            return response;
        }

        @Override
        protected void onPostExecute(final String result) {
            mProgressBar.setVisibility(View.GONE);
            setResult(Activity.RESULT_OK, getIntent().putExtra("OCR_TEXT", result));
            finish(); // return to the calling Activity
        }
    }

    private String executeMultipartPost(final String stringUrl, final byte[] bm) throws Exception {
        HttpClient httpClient = new DefaultHttpClient();
        HttpPost postRequest = new HttpPost(stringUrl);
        ByteArrayBody bab = new ByteArrayBody(bm, "the_image.jpg");
        MultipartEntity reqEntity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
        reqEntity.addPart("uploaded", bab);
        reqEntity.addPart("name", new StringBody("the_file"));
        postRequest.setEntity(reqEntity); // attach the multipart body to the request
        HttpResponse response = httpClient.execute(postRequest);
        BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
        StringBuilder s = new StringBuilder();
        try {
            String sResponse;
            while ((sResponse = reader.readLine()) != null) {
                s.append(sResponse).append('\n');
            }
        } finally {
            reader.close();
        }
        // the service wraps the text in <html><body>...</body></html>; keep only the body
        int i = s.indexOf("body");
        int j = s.lastIndexOf("body");
        return s.substring(i + 5, j - 2);
    }
}

This sample Android app has an Activity that sends a small JPEG image to the Cloud-Service, which is running the Tesseract OCR engine.

1.4. Building a Tesseract native Android Library to be bundled with an Android App

This approach allows an Android application to perform OCR even without a network connection, i.e. the OCR engine is on board. There are currently two code bases to start from:

  1. Tesseract Tools for Android is a set of Android APIs and build files for the Tesseract OCR and Leptonica image processing libraries:
    svn checkout tesseract-android-tools
  2. A fork of Tesseract Tools for Android (tesseract-android-tools) that adds some additional functions:
    git clone git://

… I went with option 2.

1.4.1. Building the native lib

Both projects can be built with the same build steps (see below), and neither works with Android’s NDK r7. However, going back to NDK r6b solved that problem. Here are the build steps; they take a little while, even on a fast machine.

cd <project-directory>/tess-two
export TESSERACT_PATH=${PWD}/external/tesseract-3.01
export LEPTONICA_PATH=${PWD}/external/leptonica-1.68
export LIBJPEG_PATH=${PWD}/external/libjpeg
ndk-build
android update project --path .
ant release

The build steps create the native libraries in the libs/armeabi and libs/armeabi-v7a directories.

The tess-two project can now be included as a library project in an Android project, and with the JNI layer in place, calling into the native OCR library looks something like this:

JNI and Native Libraries

1.4.2. Developing a simple Android App with built-in OCR capabilities

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(DATA_PATH, LANG);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
...

Libraries / TrainedData / App Size

The native libraries are about 3 MBytes in size. Additionally, a language- and font-dependent training resource file is needed.
The eng.traineddata file (available, e.g., with the desktop version of Tesseract) is placed into the Android project’s assets/tessdata folder and deployed with the application, adding another 2 MBytes to the app. However, due to compression, the actual downloadable Android application is “only” about 4.1 MBytes.

During the first start of the application, the eng.traineddata resource file is copied to the phone’s SD card.
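The first-start copy step boils down to: if `<data path>/tessdata/<lang>.traineddata` does not exist yet, copy it from the bundled assets. Note that tess-two’s `TessBaseAPI.init()` expects the *parent* of the `tessdata` directory as its data path. On Android this would be written in Java against the `AssetManager`; the sketch below shows only the logic, in Python, and the function name and paths are hypothetical.

```python
import os
import shutil


def ensure_traineddata(bundled_file, data_path, lang="eng"):
    """Copy <lang>.traineddata into <data_path>/tessdata/ unless it is already there.

    Returns the path of the installed file; data_path is what init() receives.
    """
    tessdata_dir = os.path.join(data_path, "tessdata")
    target = os.path.join(tessdata_dir, lang + ".traineddata")
    if not os.path.exists(target):
        os.makedirs(tessdata_dir, exist_ok=True)
        shutil.copyfile(bundled_file, target)
    return target
```

Checking for the target first keeps later starts fast: the 2 MByte file is only copied once.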

The ocr() method for the sample app may look something like this:

protected void ocr() {

    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inSampleSize = 2; // decode at half resolution to save memory
    Bitmap bitmap = BitmapFactory.decodeFile(IMAGE_PATH, options);

    try {
        ExifInterface exif = new ExifInterface(IMAGE_PATH);
        int exifOrientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);

        Log.v(LOG_TAG, "Orient: " + exifOrientation);

        int rotate = 0;
        switch (exifOrientation) {
            case ExifInterface.ORIENTATION_ROTATE_90:
                rotate = 90;
                break;
            case ExifInterface.ORIENTATION_ROTATE_180:
                rotate = 180;
                break;
            case ExifInterface.ORIENTATION_ROTATE_270:
                rotate = 270;
                break;
        }

        Log.v(LOG_TAG, "Rotation: " + rotate);

        if (rotate != 0) {

            // Getting width & height of the given image.
            int w = bitmap.getWidth();
            int h = bitmap.getHeight();

            // Setting pre rotate
            Matrix mtx = new Matrix();
            mtx.preRotate(rotate);

            // Rotating Bitmap
            bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
            // tesseract req. ARGB_8888
            bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
        }

    } catch (IOException e) {
        Log.e(LOG_TAG, "Rotate or conversion failed: " + e.toString());
    }

    // R.id.image and R.id.field are placeholder view IDs
    ImageView iv = (ImageView) findViewById(R.id.image);
    iv.setImageBitmap(bitmap);

    Log.v(LOG_TAG, "Before baseApi");

    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.init(DATA_PATH, LANG);
    baseApi.setImage(bitmap);
    String recognizedText = baseApi.getUTF8Text();
    baseApi.end(); // release the native resources

    Log.v(LOG_TAG, "OCR Result: " + recognizedText);

    // clean up and show
    if (LANG.equalsIgnoreCase("eng")) {
        recognizedText = recognizedText.replaceAll("[^a-zA-Z0-9]+", " ");
    }
    if (recognizedText.length() != 0) {
        ((TextView) findViewById(R.id.field)).setText(recognizedText.trim());
    }
}
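The EXIF orientation tag stores small integers defined by the EXIF standard: `ORIENTATION_ROTATE_90`, `ORIENTATION_ROTATE_180` and `ORIENTATION_ROTATE_270` are 6, 3 and 8. Because a Java `switch` without `break` silently falls through (every case would end up at 270), a lookup table is an easy way to keep this mapping explicit. The Python sketch below shows that table together with the English clean-up step from `ocr()`; the function names are hypothetical.

```python
import re

# EXIF Orientation tag values -> degrees to pass to Matrix.preRotate()
EXIF_ROTATION_DEGREES = {
    1: 0,    # ORIENTATION_NORMAL
    6: 90,   # ORIENTATION_ROTATE_90
    3: 180,  # ORIENTATION_ROTATE_180
    8: 270,  # ORIENTATION_ROTATE_270
}


def rotation_for(exif_orientation):
    """Rotation in degrees; 0 for unknown or mirrored orientation values."""
    return EXIF_ROTATION_DEGREES.get(exif_orientation, 0)


def clean_english(text):
    """Collapse every run of non-alphanumeric characters into a single space."""
    return re.sub(r"[^a-zA-Z0-9]+", " ", text)
```

Mirrored orientations (2, 4, 5, 7) fall back to 0 here, just as they do in the `switch` in `ocr()`.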

OCR on Android

The popularity of smartphones with built-in high-quality cameras has created a new category of mobile applications that benefit greatly from OCR.

OCR is a very mature technology with a broad range of available libraries to choose from. Fast and accurate Apache- and BSD-licensed solutions are available from the open-source community; I have taken a closer look at Tesseract, which was created by HP and is now developed by Google.

Tesseract can be used to build a desktop application or a cloud service, and it can even be baked into a mobile Android application, performing on-board OCR. All three variations of OCR with the Tesseract library have been demonstrated above.

Focusing on mobile applications, however, it became very clear that even on phones with a 5MP camera, the accuracy of the results still varies greatly, depending on lighting conditions, fonts and font sizes, as well as surrounding artifacts.

Just like with the TeleForm application, even the best OCR engines perform poorly if the input image has not been prepared correctly. To make OCR work on a mobile device, no matter whether the OCR will eventually run on board or in the cloud, much development time needs to be spent training the engine, but even more importantly, selecting and preparing the image areas that will be provided as input to the OCR engine. It’s going to be all about the pre-processing.
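As one example of such pre-processing, many pipelines binarize the photo before OCR. Otsu’s method picks the gray level that maximizes the between-class variance of the image histogram, separating ink from paper without a hand-tuned threshold. The pure-Python sketch below works on a flat list of 8-bit gray values; it is illustrative only and not the pre-processing used in my Capture OCR app.

```python
def otsu_threshold(gray_values):
    """Return the Otsu threshold t for 8-bit gray values (pixels <= t count as ink)."""
    histogram = [0] * 256
    for v in gray_values:
        histogram[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(histogram))

    best_t, best_between = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # proportional to the between-class variance
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(gray_values, threshold):
    """Map each gray value to 0 (ink) or 255 (paper)."""
    return [0 if v <= threshold else 255 for v in gray_values]
```

On a real photo, a global threshold like this is often combined with (or replaced by) adaptive, per-region thresholding, since lighting on a phone snapshot is rarely uniform.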

This shows my Capture OCR sample Android-OCR application (with Tesseract OCR engine built-in), after it performed the OCR on a just taken photo of a book cover.


  1. Nice article. I would like to know how to do the same using Windows. I have been trying out since 2 weeks to compile the same code over Windows XP, Cygwin and eclipse but was not able to crack it.

    Any help would be appreciated.

  2. i’ve an issue:
    the program failed at “Before baseApi” LOG.
    W/dalvikvm(11691): Exception Ljava/lang/UnsatisfiedLinkError; thrown while initializing Lcom/googlecode/tesseract/android/TessBaseAPI;
    this is the error.
    how can i solve it?

  3. Very informative for doing my project, “Text scanner in Android ”
    Thank you

  4. Recently found, it could be considered as a cloud solution for Anroid ocr

  5. When i Build the tess-two project using NDK r6b.. i failed with the below mentioned ERROR.. PLS RESOLVE ME..

    bharath@bharath-desktop:~/workspace/tess-two$ /home/bharath/Downloads/android-ndk-r6b/ndk-build
    Install : => libs/armeabi/
    make: *** No rule to make target `/home/bharath/workspace/tess-two/jni/com_googlecode_leptonica_android/../..//home/bharath/workspace/tess-two/jni/../external/leptonica-1.68/src/adaptmap.c’, needed by `/home/bharath/workspace/tess-two/obj/local/armeabi/objs/lept//home/bharath/workspace/tess-two/jni/../external/leptonica-1.68/src/adaptmap.o’. Stop.

  6. Wolf – Very nice writeup, I do development for Scanthing an OCR app, we experimented a lot with tesseract before deciding to go with a cloud solution. The quality is superior and we have managed to find ways to get the roundtrip down to under 10 seconds on average with the processing being done in the background. The other benefit of cloud is language support i.e. we have full dictionary support for 33 languages, having this on the handset would consume a huge amount of memory. Its also worth noting with OCR that image pre-processing i.e. cleaning up the image before throwing it at the OCR engine is absolutely key to the end result. Our app link is below, check out our reviews, happy to give the author of this blog a free copy for a writeup if you like.

  7. Wolf – Thanks for this article, it’s really helpful for me. I want to ask that if I want to change the tessdata’s location into my package, how to do it? Can you help me :) ?

  8. Hi, do you know any existing way to deskew an image on android?

  9. Hey, great article.

    I was wondering whether it was possible to implement a custom library into the app? What I want my native app to do is recognize numbers ONLY hence that should significantly lower the app size from about 4.1MB to 1MB or less.

  10. Hi,
    Thanks for the article.
    I’m having a problem building the code. When I run ndk-build I get an error: /android-ndk/toolchains/arm-linux-androideabi-4.6/prebuilt/linux-x86/arm-linux-androideabi-strip: Command not found.

    Anyone knows what the problem??


  11. nice but the accuracy of this ocr is not good do you have any solution to improve the accuracy of ocr image captures.???

  12. Wolf, I’ve trained tesseract with 30 fonts, I’m preprocessing my image by applying unsharp mask thrice, and then using leptonica’s convertTo8, otsu adaptive thresholding, and skewing (which for some reason always gives me 0.0 as the skew angle, irrespective of how skewed the image actually is) . .. but my accuracy is still not quite there yet. Your app, on the other hand, has near perfect accuracy. How do I achieve that level of accuracy? Running tesseract on an android application (built-in)

  13. hey how can you build file?
    I am struggling with it.
    Please Help Me.

  14. Sir i already develop the simple camera application…now i want to integrate the simple camera application with the OCR..what is the way to make it successful.

  15. Hi ,
    Thanks for the article.
    I want to OCR android with out internet connection ,can any one help me ..

  16. hey i want to use tesseractfor hindi on android app.. app is not working.. crashes as soon as tessbase api is called.. can u suggest me some way out??

  17. Nice article.
    Please suggest some ways to improve on accuracy. At the moment accuracy is very poor. Are there pre-processing techniques for images prior to providing it to Tesseract for processing? Also ExifInterface is not returing Orientation at all in my case. Any idea on that?

  18. Hey Wolf,

    I have read through your journal several times over and I still can’t get the sample to work. Literally pulling my hair out right now after having ALOT of errors and after spending over a week on this project I am starting to give up. Can you help me out by showing me more of just of the ocr method and the whole activity (ex. xml, etc.) or a more indepth description on 1.4. I have NO idea what’s wrong and wanted to see what you did or imported. AMAZING tutorial though! Keep it up and hope to see more from you in the future!!

  19. hello
    can you email me which languages are supported for this ocr?

  20. hey dear, the tutorial is really helpful for me, but im not gettting the correct image-text in response. Kindly suggest what to do to get the correct text.

  21. it’s cool tutorial, thanks

  22. Great article, dude! Any tips on deskew for invoice picture? Text is too small and gets blurry in the picture.

  23. For other languages which changes is required?

    When I am trying to implement for hindi language I got following error:

    02-11 15:40:47.905: A/libc(32009): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1), thread 32009 (roid.ocr.simple)

    Anyone can help me please.

  24. Is it possible to give me code for OCR using java scripts so I can use it for building my phonegap project please help me
