Doc⚡split
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)"
pdf  text  documents  text-extraction 
august 2010 by martinkenny