Convert PDFs to Excel
There are few things more frustrating than trying to get a simple spreadsheet from a government agency, only to have the public information officials insist that the data in question (say, a spreadsheet of property tax rates by municipality that was clearly created in Excel) exists only as a PDF, with the data locked inside.
For the sake of expediency, you sometimes just have to bite the bullet and try to wrestle the data free from the PDF’s clutches so that you can gently guide it into a more useful spreadsheet.
Here are a few resources to do just that:
ProPublica recently unveiled this very helpful and comprehensive guide to the various strategies for unlocking data from PDFs.
I recently was introduced to CometDocs, a free site that had a surprisingly high accuracy rate for converting documents on a labor-intensive project.
If you’re not afraid of installing a simple command line program, you can have some luck with PDF2text. Here’s a nice tutorial from IRE, as well as a guide to how to automate the conversions and not be bothered by the pesky command line.
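To give a rough idea of what automating the conversions can look like, here is a minimal Python sketch. It is not the method from the IRE tutorial or the automation guide. It assumes the command line program in question is the `pdftotext` tool that ships with Xpdf and Poppler, that it is installed and on your PATH, and that the columns in your PDF are separated by at least two spaces once the text is extracted. The function names are made up for illustration.

```python
import csv
import re
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path):
    # Assumption: the Xpdf/Poppler "pdftotext" program is installed.
    # Its -layout option keeps the text in roughly its original columns.
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def split_columns(line):
    # Assumption: columns are separated by two or more spaces,
    # so single spaces inside a value ("Jersey City") are kept.
    return re.split(r"\s{2,}", line.strip())


def text_to_rows(text):
    # Skip blank lines and turn every remaining line into a list of cells.
    return [split_columns(line) for line in text.splitlines() if line.strip()]


def convert_folder(folder):
    # Convert every PDF in the folder to a .txt and a .csv beside it.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")
        subprocess.run(build_command(pdf_path, txt_path), check=True)
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        csv_path = pdf_path.with_suffix(".csv")
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(text_to_rows(text))
```

Calling `convert_folder("pdfs")` on a folder of PDFs (the folder name is just an example) should leave a CSV next to each one that Excel can open. Expect to clean up headers, page footers and rows that wrapped onto two lines by hand; the two-space rule is a guess that works for tidy tables and not much else.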
I’ve also heard good things about the commercial software DeskUnPDF, but I haven’t had an opportunity to use it myself.
So when you don’t have the time or patience to negotiate with an agency to get the data in the format you want, give these solutions a try. Good luck!
-Tom Meagher, The Star-Ledger, Jan. 2011