bycoffe + pdf   8

anderser's pydocsplit at master - GitHub
Python implementation of DocumentCloud's Docsplit utility
pdf  data  documentcloud  docsplit 
january 2010 by bycoffe
Doc⚡split
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)"
pdf  data  documents 
december 2009 by bycoffe
Python Package Index : pdfminer 20090330
"PDFMiner is a suite of programs that aims to help extracting or analyzing text data from PDF documents. Unlike other PDF-related tools, it allows to obtain the exact location of texts in a page, as well as other layout information such as font size or font name, which could be useful for analyzing the document. It can be also used as a basis for a full-fledged PDF interpreter."
python  pdf  data 
march 2009 by bycoffe
