The Open Text Summarizer (OTS) is an open source command line tool for summarizing texts. The program reads a text and generates a summary. By default, the summarizer tries to reduce the text to 20% of its original size. For this short text about Python, for instance, the summary would contain only the highlighted text.
Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.
Python’s simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance.
Python supports modules and packages, which encourages program modularity and code reuse.
The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. Often, programmers fall in love with Python because of the increased productivity it provides.
Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault.
Instead, when the interpreter discovers an error, it raises an exception.
When the program doesn’t catch the exception, the interpreter prints a stack trace.
A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on.
The debugger is written in Python itself, testifying to Python’s introspective power.
On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective.
Its interface is pretty simple:
Usage:
  ots [OPTION...] [file.txt | stdin] - Open Text Summarizer

Help Options:
  -?, --help            Show help options

Application Options:
  -r, --ratio=<int>     summarization % [default = 20%]
  -d, --dic=<string>    dictionary to use
  -o, --out=<string>    output file [default = stdout]
  -h, --html            output as html
  -k, --keywords        only output keywords
  -a, --about           only output the summary
  -v, --version         show version information
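To make the options above concrete, here is a minimal sketch of a helper that assembles an ots command line from them. The function name `build_ots_command` and its parameters are made up for illustration and are not part of OTS; only the flags `-r`, `-o`, `-h` and `-k` come from the usage text.

```python
def build_ots_command(input_path, ratio=20, output_path=None,
                      html=False, keywords_only=False, ots_path="ots"):
    """Return the argument list for an ots call (hypothetical helper)."""
    args = [ots_path, "-r", str(ratio)]   # -r: summarization ratio in percent
    if output_path is not None:
        args += ["-o", output_path]       # -o: write to a file instead of stdout
    if html:
        args.append("-h")                 # -h: output as html
    if keywords_only:
        args.append("-k")                 # -k: only output keywords
    args.append(input_path)               # the text file to summarize
    return args


print(build_ots_command("article.txt", ratio=30, keywords_only=True))
# ['ots', '-r', '30', '-k', 'article.txt']
```

The list can be handed to `subprocess` as-is, which sidesteps shell quoting problems with unusual file names.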
Installation on Linux
Some Linux distributions ship with OTS preinstalled; otherwise, installation is straightforward. For CentOS, the ots package can be found via rpmfind and installed like so:
wget ftp://195.220.108.108/linux/fedora/linux/development/rawhide/x86_64/os/Packages/o/ots-0.5.0-8.fc20.x86_64.rpm
wget ftp://195.220.108.108/linux/fedora/linux/development/rawhide/x86_64/os/Packages/o/ots-libs-0.5.0-8.fc20.x86_64.rpm
sudo yum install ./ots-libs-0.5.0-8.fc20.x86_64.rpm
sudo yum install ./ots-0.5.0-8.fc20.x86_64.rpm
Although OTS version 0.5.0 was installed, running the tool still reports an older version number:
ots -v
ots 0.4.2
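If a script depends on a particular OTS release, the mismatch between the package name and the reported version is worth checking for. The following is only a sketch: `parse_ots_version` is a hypothetical helper, and it assumes the version always appears as three dot-separated numbers, as in the two strings seen here.

```python
import re


def parse_ots_version(output):
    """Extract (major, minor, patch) from text such as 'ots 0.4.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version number found in %r" % output)
    return tuple(int(part) for part in match.groups())


reported = parse_ots_version("ots 0.4.2")                    # what `ots -v` printed
packaged = parse_ots_version("ots-0.5.0-8.fc20.x86_64.rpm")  # what the rpm is called
print(reported, packaged, reported < packaged)
# (0, 4, 2) (0, 5, 0) True
```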
Installation on OS X (including 10.9 Mavericks)
Installing OTS on OS X is more involved, but can be done in about 10 to 15 minutes. The following steps require that Xcode or Xcode’s command-line tools are already installed.
1. Install Homebrew and some libs needed to build OTS:
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
GIT_SSL_NO_VERIFY=1 brew update
brew install wget
brew install libxml2
brew install glib
brew install popt
2. Get the source:
mkdir ~/tmp
cd ~/tmp
wget http://sourceforge.net/projects/libots/files/libots/ots-0.5.0/ots-0.5.0.tar.gz
tar -xzvf ./ots-0.5.0.tar.gz
cd ./ots-0.5.0
3. Configure will create the Makefile:
./configure
4. Edit the newly created Makefile: remove doc from the end of this line (around line 103):
SUBDIRS = src dic doc
5. Make and deploy:
make
sudo make install
6. Confirm
ots -v
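The Makefile edit in step 4 can also be scripted instead of done by hand, which helps when the build is repeated on several machines. This is a rough sketch, not part of the OTS build: `drop_doc_subdir` is a made-up helper, and it assumes the SUBDIRS assignment sits on a single line as shown above.

```python
import re


def drop_doc_subdir(makefile_text):
    """Remove 'doc' from the SUBDIRS line, leaving all other lines untouched."""
    lines = []
    for line in makefile_text.splitlines(keepends=True):
        if re.match(r"\s*SUBDIRS\s*=", line):
            name, _, value = line.partition("=")
            kept = [d for d in value.split() if d != "doc"]
            newline = "\n" if line.endswith("\n") else ""
            line = name.rstrip() + " = " + " ".join(kept) + newline
        lines.append(line)
    return "".join(lines)


print(drop_doc_subdir("SUBDIRS = src dic doc\n"), end="")
# SUBDIRS = src dic
```

To apply it, read the generated Makefile into a string, pass it through the function, and write the result back before running make.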
Building on CentOS 5.x Linux
As I’m sure you are very much accustomed to, building on CentOS 5.x is a pain in the ass. Here are some steps you may need after uncompressing the source archive:
- yum install glib2-devel
- yum install libxml2-devel
- ./configure
- edit the Makefile (see above)
- make install
- ln -s /usr/local/lib/libots* /usr/lib64
Accessing the OTS command line from a Python program
While the Open Text Summarizer is a useful tool to have, using it from within another program makes it so much more worthwhile.
Here, for instance, is how you can access OTS from Python:
import os
import subprocess
import tempfile

def summarize(content, ratio=20):
    # drop non-ASCII characters and remove unnecessary white space
    content = content.encode('ascii', 'ignore').decode('ascii')
    content = " ".join(content.split())
    # create temp directory and two temp files
    temp_dir = tempfile.mkdtemp()
    temp1 = tempfile.NamedTemporaryFile(mode='w', suffix=".txt", dir=temp_dir, delete=False)
    temp2 = tempfile.NamedTemporaryFile(suffix=".txt", dir=temp_dir, delete=False)
    result = None
    try:
        # write text content into file to be summarized
        temp1.write(content)
        temp1.close()
        temp2.close()
        # summarize into temp2; an argument list avoids shell quoting issues
        subprocess.check_call(['/usr/local/bin/ots', '-r', str(ratio), '-o', temp2.name, temp1.name])
        # read the summary back, since -o sends it to the file rather than stdout
        with open(temp2.name) as summary:
            result = summary.read()
    finally:
        # cleanup
        temp1.close()
        temp2.close()
        os.remove(temp1.name)
        os.remove(temp2.name)
        os.rmdir(temp_dir)
    return result
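The usage text lists stdin as an alternative input and stdout as the default output, which suggests the temporary files could be avoided altogether. Below is a sketch of that idea for Python 3.5 or later. It rests on an untested assumption: that ots really does read from stdin when no file name is given. The names `run_filter` and `summarize_via_stdin` are invented for this example.

```python
import subprocess
import sys


def run_filter(text, argv):
    """Send text to a command's stdin and return what it prints to stdout."""
    completed = subprocess.run(argv, input=text, stdout=subprocess.PIPE,
                               universal_newlines=True, check=True)
    return completed.stdout


def summarize_via_stdin(text, ratio=20, ots_path="ots"):
    # Assumption: ots reads stdin when no input file is named,
    # as the usage line "[file.txt | stdin]" suggests.
    return run_filter(text, [ots_path, "-r", str(ratio)])


# Demonstration with a stand-in command, so the snippet runs without ots installed.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_filter("hello ots", upper))
# HELLO OTS
```

Because `check=True` is set, a non-zero exit status from the command raises `subprocess.CalledProcessError` instead of silently returning an empty summary.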