Thursday, April 2, 2009

IBSuite

To read books on my PRS505 e-book reader, I wrote some tools. I have now collected these tools into IBSuite (Image Book Suite) and put it on SourceForge. You can download the source code from the git repository of the SourceForge project. I hope it is useful to somebody.

http://sourceforge.net/projects/ibsuite/

IBSUITE

ibsuite stands for image book suite. It contains a set of tools to convert e-books in various formats (pdf, chm, html) into a set of images, reformat the images (crop, embolden, divide, etc.), and assemble the resulting images into a new e-book.
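To illustrate the "embolden" step, here is a minimal sketch of a 3x3 dilation on a binary image. It is not taken from the ibsuite source: the function name `dilate` and the list-of-lists image representation (1 for ink, 0 for paper) are assumptions made for this example only.

```python
def dilate(image):
    """Return a copy of a binary image with every ink pixel grown by one pixel.

    `image` is a list of rows; each row is a list of 0 (paper) or 1 (ink).
    This representation is an assumption for the sketch, not ibsuite's own.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            # Mark the ink pixel and its 8 neighbours, clipped to the image bounds.
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    result[ny][nx] = 1
    return result
```

Thin strokes become thicker by one pixel on each side, which is what makes small text easier to read on a low-contrast screen.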


COMPONENTS

  • ibhtml2img: converts html to images with xulrunner
  • ibhtml2pdf: converts html to pdf with xulrunner
  • iblineparser: parses an input image and extracts line information
  • ibpdfinfo: gets meta-information from a pdf file, such as title, author, and table of contents
  • ibpy: python module that drives the whole system. It uses the programs above to convert the input file to images, extract line information from the images, dilate the images, re-assemble the lines into new images, and generate the output e-book.
  • ibtools: a set of utilities and tools. Some are used internally by ibsuite; the others are user commands provided by ibsuite.
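The "extract line information" step can be pictured with a row-projection sketch: a text line is a run of consecutive rows that contain ink. This is only an illustration of the idea, not the algorithm iblineparser actually uses. The name `find_text_lines` and the list-of-lists image (1 for ink, 0 for paper) are assumptions.

```python
def find_text_lines(image):
    """Return (top, bottom) row ranges of text lines, with bottom exclusive.

    `image` is a list of rows of 0 (paper) / 1 (ink) values; this format is
    an assumption for the sketch, not the one iblineparser works on.
    """
    lines = []
    start = None  # first row of the line currently being scanned, if any
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                 # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(image)))  # line touches the bottom edge
    return lines
```

A real scan would need noise tolerance (for example, a minimum ink count per row), which this sketch leaves out.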


USAGE

The most important command of ibsuite is ibreformat. Its basic usage is as follows:

ibreformat [options] <input file>

In most cases, the command looks like the following:

ibreformat -o <output file> --iprof=<iprof> --oprof=<oprof>
--pprof=<pprof> <input file>

Where <input file> is the input file name, <output file> is the output file name, <iprof> is the input profile, <oprof> is the output profile, and <pprof> is the processing profile.
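When driving ibreformat from a script, the command line can be assembled programmatically. The sketch below uses only the options shown above (-o, --iprof, --oprof, --pprof). The helper name `build_ibreformat_command` and the file names `book.pdf` and `book-prs505.pdf` are hypothetical and chosen for illustration.

```python
import shlex


def build_ibreformat_command(input_file, output_file,
                             iprof=None, oprof=None, pprof=None):
    """Build the ibreformat argument list; profiles left as None are omitted."""
    cmd = ["ibreformat", "-o", output_file]
    for flag, value in (("--iprof", iprof), ("--oprof", oprof), ("--pprof", pprof)):
        if value is not None:
            cmd.append(f"{flag}={value}")
    cmd.append(input_file)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; the profiles are the ones listed in this README.
    cmd = build_ibreformat_command("book.pdf", "book-prs505.pdf",
                                   iprof="img", oprof="prs505l", pprof="resize")
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the conversion, provided ibreformat is installed and on the PATH.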

The available input, output, and processing profiles are as follows:


input profiles:
img: for scanned book

output profiles:
prs505p: for Sony PRS505 in portrait mode
prs505l: for Sony PRS505 in landscape mode

processing profiles:

divide2: divide one line into two lines (a kind of simple reflow)
resize: resize and dilate pages
repage: re-page the input book without much other processing such as dilation
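As a rough picture of what "divide one line into two lines" can mean, the sketch below cuts a line image at the blank column closest to its middle, so that no glyph is split. How divide2 actually chooses the cut is not described here, so treat this as an assumption. The function name `split_line` and the list-of-lists image (1 for ink, 0 for paper) are hypothetical.

```python
def split_line(line_image):
    """Split a line image into (left, right) at the blank column nearest the middle.

    `line_image` is a non-empty list of equal-length rows of 0 (paper) / 1 (ink).
    Returns (line_image, None) when there is no interior blank column to cut at.
    """
    width = len(line_image[0])
    middle = width // 2
    # Only interior columns qualify, so neither half can come out empty.
    blank_columns = [x for x in range(1, width)
                     if not any(row[x] for row in line_image)]
    if not blank_columns:
        return line_image, None
    cut = min(blank_columns, key=lambda x: abs(x - middle))
    left = [row[:cut] for row in line_image]
    right = [row[cut:] for row in line_image]
    return left, right
```

The two halves can then be stacked vertically and scaled up, which is why this works as a simple reflow for narrow screens.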

For other command line options, please refer to "ibreformat -h".


Some useful command line option combinations:

For a scanned book on the PRS505:

ibsuite -o <output file> --iprof=img --pprof=resize --oprof=prs505l
<input file>

For a chm file on the PRS505:

ibchm2imb .chm

When it finishes, a .imb file will have been generated. Then run:

ibsuite -o <output file> --pprof=repage --oprof=prs505l .imb


INSTALL

Currently only Linux is supported, but I think it may work on other Unix environments (including Cygwin on Windows) after some work. Only installation from source code is supported for now.

Prerequisites:

  • gcc, g++, bash, make
  • libfontconfig-dev, libnetpbm-dev, libgtk-dev
  • python, python-imaging
  • imagemagick
  • for HTML/CHM support: python-chm, xulrunner
  • for scanned book: unpaper

Build:

./configure [--prefix=]
make [PREFIX=] [NO_XUL=1]

Install:

# become root
make install [PREFIX=]

1 comment:

Ken said...

Hi there. I just purchased a Sony PRS 300 and am looking for a way to put scientific papers (that contain math formulas and diagrams) on this device, and have them display reasonably well.

I downloaded and compiled your code from the git repository, but I'm not sure if and how it can be used to achieve what I am trying to do.

In a nutshell, I imagine taking a document, such as:

https://netfiles.uiuc.edu/goldwas1/shared/publications/NAACL10.pdf

and sliding a "window" (whose width is roughly the width of a single text column) from the top of the 1st column to the bottom, taking non-overlapping snapshots. This would be repeated for the 2nd column, and then for subsequent pages. My output file would consist of these snapshots (in the same order they were produced). The format would have to be some sort of image -- not a pdf because the math equations often get messed up.

Can I do something like this with your software? If not, is something like this even possible?

Thanks,
Ken