dentarg + pdf   23

The Business of Bookmarking
Risks

Jack-Booted Thugs

This would not ordinarily figure high on my list, but the FBI confiscated a Pinboard
server in the summer of 2011. Turns out they were interested in someone else using the
same physical enclosure, but that didn’t make the server any less gone.

The lesson here is not so much to fear the FBI, but rather that there’s no such thing as a
‘cloud service’. Bits have to physically exist somewhere, and strange things can happen
to them. Jurisdictional redundancy is just as important as physical redundancy.
archive  business  pdf  pinboard 
february 2012 by dentarg
Prince: Home
Prince is software for converting XML and HTML documents to PDF files.
css  html  pdf  software  xml 
january 2012 by dentarg
Mac software to add searchable text to scanned PDFs
I use a Fujitsu ScanSnap S510M (which has since been replaced in their lineup) and love it. I’ve scanned, shredded, and recycled more than 4,200 pages so far that could have been taking up space in my house, but now aren’t.


As part of my workflow, which isn’t very interesting, I’d like OCR software to recognize the text in scanned documents and embed it under the page images in their PDF files. With the text embedded, I can search the documents with Spotlight and attempt to organize them more easily.
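The embed-the-text step described above can be scripted. Here is a minimal sketch that calls the free Tesseract engine directly from Python. It is not the workflow of any app reviewed here. It assumes the `tesseract` command-line tool is installed and on the PATH, and that it is invoked as `tesseract <image> <output base> pdf`, which writes `<output base>.pdf` with an invisible, searchable text layer alongside the page image. It also assumes the scanner saves each page as an image file rather than a PDF. The function names and the image-suffix list are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical set of image formats a scanner might save; adjust as needed.
IMAGE_SUFFIXES = {".png", ".tif", ".tiff", ".jpg", ".jpeg"}


def pending_scans(folder: Path) -> list[Path]:
    """Return scanned images in `folder` that have no matching .pdf yet."""
    return sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
        and not p.with_suffix(".pdf").exists()
    )


def tesseract_cmd(image: Path) -> list[str]:
    """Build a Tesseract command that writes <image stem>.pdf next to the image.

    Tesseract appends ".pdf" to the output base itself, so the base is passed
    without a suffix. The trailing "pdf" selects Tesseract's PDF output.
    """
    return ["tesseract", str(image), str(image.with_suffix("")), "pdf"]


def ocr_folder(folder: Path) -> None:
    """Run Tesseract on every image in `folder` that lacks a searchable PDF."""
    for image in pending_scans(folder):
        # check=True raises CalledProcessError if Tesseract fails on a page.
        subprocess.run(tesseract_cmd(image), check=True)
```

Usage would be something like `ocr_folder(Path("~/Scans").expanduser())`, with `~/Scans` standing in for wherever the scans land. Skipping images that already have a PDF makes the script safe to re-run after each scanning session. Once the PDFs exist, Spotlight can index their embedded text.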


The ScanSnap came with ABBYY FineReader, which does an acceptable job, but degrades the image quality noticeably when it saves the text-embedded PDF copy. It’s enough of a problem that I’m not comfortable deleting the original, and I’d rather not keep two copies of every file around, so I tried to find an alternative that could output better-quality PDFs with text.


NOTE: I know there are more OCR apps than this. I probably forgot yours. There’s only so much time in the day, so I picked the ones that people recommended most and that seemed like good fits for what I want.


To test these apps, I made them all process a scan of a common document: a New York driver’s license eye-test form. (It was the last thing I scanned. I’m still 20/20, but probably not for much longer.)


ABBYY FineReader

Bundled with many ScanSnaps, based on ABBYY’s own OCR engine.


Moderately degraded image quality.


Few OCR errors.


Easily automated. (That’s what it’s for.)


Prizmo

$49, based on OpenRTK. It’s intended for recognizing text from photos, not scanners, to do cool things like “scan” from your iPhone camera.


Destroyed image quality, reduced to low-resolution monochrome.


Very few OCR errors.


I can’t figure out if it can be automated, but it’s clearly not designed for this type of use, so I can’t blame them if it can’t be.


VelOCRaptor

$29, based on Google’s free OCRopus engine.


Severely degraded image quality.


Many OCR errors.


Can be automated easily.


PDF OCR X

$29, based on the free Tesseract engine.


Perfect image quality.


Few OCR errors.


Can be automated easily, but the results still forcibly open in Preview after conversion, which gets in the way for my intended use.


PDFpen

$59, based on Nuance’s commercial OmniPage engine. This app does a lot; OCR is just one feature.


Perfect image quality.


Very few OCR errors.


Can be automated with AppleScript, although the windows still get shoved in your face while it’s working.


Acrobat

It also came with the ScanSnap, but testing it would require me to… install Acrobat. On my Mac. Where things work.


No.


I hate having to write “conclusion” headers

Only PDF OCR X and PDFpen preserved perfect image quality, so they’re the only options for which I’d feel comfortable deleting the original PDFs and keeping only the embedded-text copies.


PDF OCR X looks… like someone wrapped a bare-bones interface around an open-source OCR library.


PDFpen is nicely designed and built by an extremely well-respected, well-established Mac developer, and it’s available in the App Store. This means that it’s likely to be maintained for a while, an OS update probably won’t kill it, I’ll never need to worry about serial numbers or licensing it between my desktop and laptop, and it will update automatically when I update other App Store apps.


So I’m going to try PDFpen for a while. I’ve been eyeing it for years because it does a lot of very useful things, but I’ve never quite been pushed to get it for a particular need. But I think this is it.
mac  ocr  pdf  software  marco  scanner  from google
june 2011 by dentarg
Free PDF Reader - Sumatra PDF by Krzysztof Kowalczyk
Sumatra: PDF reader for anyone who finds Foxit too crufty and bloated.
pdf  reader  software  windows  from twitter_favs
december 2010 by dentarg
