Open Text Summarizer

The Open Text Summarizer (OTS) is an open source command-line tool for summarizing texts. The program reads a text and generates a summary. By default, the summarizer tries to reduce the text to 20% of its original size. For this short text about Python, for instance, the summary would contain only the highlighted text.


Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.
Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.
Python’s simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance.
Python supports modules and packages, which encourages program modularity and code reuse.
The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
Often, programmers fall in love with Python because of the increased productivity it provides.
Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault.
Instead, when the interpreter discovers an error, it raises an exception.
When the program doesn’t catch the exception, the interpreter prints a stack trace.
A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on.
The debugger is written in Python itself, testifying to Python’s introspective power.
On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective.

Its interface is pretty simple:

Usage:
  ots [OPTION...] [file.txt | stdin]  - Open Text Summarizer
Help Options:
  -?, --help                Show help options
Application Options:
  -r, --ratio=<int>         summarization % [default = 20%]
  -d, --dic=<string>        dictionary to use
  -o, --out=<string>        output file [default = stdout]
  -h, --html                output as html
  -k, --keywords            only output keywords
  -a, --about               only output the summary
  -v, --version             show version information
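
When OTS is driven from a script, it can help to build the argument list in one place. The following is a minimal sketch that uses only the options listed in the help text above. The function name build_ots_command and its parameters are hypothetical, and the check that the ratio lies between 1 and 100 is an assumption based on the option being a percentage.

```python
def build_ots_command(input_path, ratio=20, output_path=None,
                      html=False, keywords=False, ots_binary="ots"):
    """Return an argument list for the ots command line (hypothetical helper)."""
    # Assumption: the ratio is a percentage, so only 1..100 makes sense.
    if not 0 < ratio <= 100:
        raise ValueError("ratio is a percentage between 1 and 100")
    command = [ots_binary, "--ratio={}".format(ratio)]
    if output_path is not None:
        command.append("--out={}".format(output_path))
    if html:
        command.append("--html")
    if keywords:
        command.append("--keywords")
    # The file to summarize goes last, as in the usage line.
    command.append(input_path)
    return command


print(build_ots_command("article.txt", ratio=30, html=True))
# prints ['ots', '--ratio=30', '--html', 'article.txt']
```

The returned list can be handed directly to subprocess.call or subprocess.check_call, which avoids quoting problems with file names that contain spaces.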

Installation on Linux

Some Linux distributions come with OTS pre-installed; on the others, installation is straightforward. For CentOS, the ots package can be found via rpmfind and deployed like so:

 

wget ftp://195.220.108.108/linux/fedora/linux/development/rawhide/x86_64/os/Packages/o/ots-0.5.0-8.fc20.x86_64.rpm
wget ftp://195.220.108.108/linux/fedora/linux/development/rawhide/x86_64/os/Packages/o/ots-libs-0.5.0-8.fc20.x86_64.rpm
sudo yum install ./ots-libs-0.5.0-8.fc20.x86_64.rpm
sudo yum install ./ots-0.5.0-8.fc20.x86_64.rpm

Although OTS version 0.5.0 was installed, running the tool still reports an older version number:

ots -v
 
ots 0.4.2
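
Because the reported version can differ from the package version, a script that depends on OTS may want to compare the two explicitly. Here is a small sketch that pulls the version numbers out of the strings shown above. The helper name parse_ots_version is hypothetical, and it assumes the version always appears as three dot-separated numbers.

```python
import re


def parse_ots_version(text):
    """Extract (major, minor, patch) from text such as 'ots 0.4.2'."""
    # Assumption: the first x.y.z group in the text is the version.
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


reported = parse_ots_version("ots 0.4.2")
packaged = parse_ots_version("ots-0.5.0-8.fc20.x86_64.rpm")
print(reported, packaged, reported == packaged)
# prints (0, 4, 2) (0, 5, 0) False
```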

Installation on OS X (including 10.9 Mavericks)

Installing OTS on OS X is more involved, but can be done in about 10 to 15 minutes. The following steps require that Xcode or the Xcode command-line tools are already installed.

1. Install Homebrew and some libs needed to build OTS:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
GIT_SSL_NO_VERIFY=1 brew update
brew install wget 
brew install libxml2
brew install glib
brew install popt

2. Get the source:

mkdir ~/tmp
cd ~/tmp
wget http://sourceforge.net/projects/libots/files/libots/ots-0.5.0/ots-0.5.0.tar.gz
tar -xzvf ./ots-0.5.0.tar.gz 
cd ./ots-0.5.0

3. Configure will create the Makefile:

./configure

4. Edit the newly created Makefile and remove doc from the end of this line (around line 103):

SUBDIRS = src dic doc

5. Make and deploy:

make
sudo make install

6. Confirm

ots -v
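
The Makefile edit in step 4 can also be scripted, which is handy when the build has to be repeated. The sketch below removes the doc entry from the SUBDIRS line of a Makefile's text and leaves everything else untouched. The function name drop_doc_subdir is hypothetical, and the sketch assumes SUBDIRS is assigned on a single line, as in the line shown in step 4.

```python
import re


def drop_doc_subdir(makefile_text):
    """Remove the 'doc' entry from the SUBDIRS assignment, leaving other lines alone."""
    def strip_doc(match):
        entries = [name for name in match.group(2).split() if name != "doc"]
        return match.group(1) + " ".join(entries)

    # Only spaces and tabs are allowed around '=' so the match stays on one line.
    return re.sub(r"^(SUBDIRS[ \t]*=[ \t]*)(.*)$", strip_doc,
                  makefile_text, flags=re.MULTILINE)


print(drop_doc_subdir("SUBDIRS = src dic doc"))
# prints SUBDIRS = src dic
```

To apply it, read the generated Makefile into a string, pass it through the function, and write the result back before running make.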

Building on CentOS 5.x Linux

As I’m sure you are very much accustomed to, building on CentOS 5.x is a pain in the ass. Here are some steps you may have to consider doing after uncompressing the source archive:

  • yum install glib2-devel
  • yum install libxml2-devel
  • ./configure
  • .. edit Makefile (see above)
  • make install
  • ln -s /usr/local/lib/libots* /usr/lib64

Accessing the OTS command line from a Python program

While the open text summarizer is a useful tool to have, using it from within another program makes it so much more worthwhile.
Here, for instance, is how you can access OTS from Python:

import os
import shutil
import subprocess
import tempfile


def summarize(content, ratio=20, ots_path='/usr/local/bin/ots'):
    # keep only ASCII characters and remove unnecessary white space
    content = content.encode('ascii', 'ignore').decode('ascii')
    content = " ".join(content.split())

    temp_dir = tempfile.mkdtemp()  # temp directory for the input and output files
    in_path = os.path.join(temp_dir, 'input.txt')
    out_path = os.path.join(temp_dir, 'summary.txt')
    try:
        # write text content into the file to be summarized
        with open(in_path, 'w') as f:
            f.write(content)
        # summarize into out_path; an argument list avoids shell quoting issues
        subprocess.check_call([ots_path, '-r', str(ratio), '-o', out_path, in_path])
        # -o sends the summary to the output file, so read it from there, not from stdout
        with open(out_path) as f:
            return f.read()
    finally:
        # cleanup
        shutil.rmtree(temp_dir, ignore_errors=True)
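
Since the usage text lists stdin as an input source and stdout as the default output, the temporary files can probably be avoided altogether. The following is a sketch under that assumption; it has not been checked against every OTS build. The function name summarize_via_stdin and the ots_cmd parameter are hypothetical. ots_cmd is a sequence so that a wrapper or a stand-in command can be substituted for the real binary.

```python
import subprocess


def summarize_via_stdin(text, ratio=20, ots_cmd=("/usr/local/bin/ots",)):
    """Pipe text to ots on stdin and return whatever it prints on stdout."""
    command = list(ots_cmd) + ["-r", str(ratio)]
    completed = subprocess.run(
        command,
        input=text.encode("ascii", "ignore"),  # drop non-ASCII characters
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.decode("ascii", "ignore")
```

A call such as summarize_via_stdin(article_text, ratio=30) would then return the summary as a string, provided ots is installed at the assumed path.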

 
