garrettc + gem   3

Docsplit
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata"
pdf  ruby  data  gem  library  text  tool  parsing  imagery 
august 2010 by garrettc
