Doc⚡split
december 2010 by jonty
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)"
ruby
pdf
document
parsing
ocr
documents
data
processing
split
from delicious