Diving into data with SPJ
Today, I had a chance to speak at the Society of Professional Journalists’ Region 1 Spring conference, held at Rutgers University in New Brunswick. Debbie Galant of the NJ News Commons and I talked about the projects that came out of our Hack Jersey hackathon. Then I laid out a map for building data skills in the newsroom.
For those who attended (and those who didn’t), here are links to some of the tools we discussed, along with tutorials to help you start learning them.
The slides: bit.ly/R1C13DATA
We started by sharing examples of data-driven news applications on the web:
- CrashDataNJ
- BecauseofUs.net
- Cost of Radiology in New Jersey
- USA Today’s “Ghost Factories” project
- WNYC schools waitlist map
- ProPublica’s Dollars for Docs
So how do you do this kind of work (or get to Carnegie Hall)? Practice. Practice. Practice.
A brief detour on the history of data journalism included a data piece in the first issue of the Manchester Guardian in 1821 and the cover of the program for IRE’s first computer-assisted reporting conference in 1993. And now, to today…
The disciplines of data reporting
1. Collection
- PANDA Project - a newsroom warehouse for data, with search alerts when new data arrives
- FOIA Machine - a new site to help track FOIA requests
- DocumentCloud - a free tool for journalists to store, annotate, publish and embed PDF documents
- For scraping web pages - http://www.reporterslab.org/scraping-roundup/
- Comet Docs - a free service to scrape PDFs
- Tabula - a new project to help make PDF scraping easier
2. Cleaning
- My go-to cleaning tool for relatively small data sets (fewer than, say, 10,000 records) is OpenRefine. I can’t say enough good things about it. This blog post has links to my OpenRefine class and several others to get you started.
- MaryJo Webster’s “Excel Magic” class for cleaning data in Excel.
- For the truly adventurous, check out Dan Nguyen’s great new ebook introducing regular expressions.
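To give a flavor of what regular expressions can do for cleaning, here is a minimal sketch using Python's built-in `re` module. The function names and the sample values are made up for illustration; they are not taken from any of the classes or the ebook linked above, and real data will need patterns tuned to its own quirks.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def standardize_phone(text):
    """Return a 10-digit phone number as NNN-NNN-NNNN, or None if it doesn't fit."""
    digits = re.sub(r"\D", "", text)  # drop everything that is not a digit
    if len(digits) != 10:
        return None  # flag for manual review rather than guessing
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Hypothetical messy values, for illustration only.
print(normalize_whitespace("  New   Brunswick \n"))
print(standardize_phone("(732) 555-0100"))
```

Returning `None` for anything that isn't exactly 10 digits is a deliberate choice here: it is usually safer to set odd records aside for a human to check than to let a pattern silently "fix" them.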
3. Analysis
- Excel is going to be your favorite tool ever. There are a number of good tutorials on basic Excel and more advanced Pivot Tables listed here, as well as a link to a great, free online course introducing databases.
- And don’t forget the formula for percent change: (new - old) / old, then multiply by 100 to express the result as a percentage.
- For only a few more days, you can get 50 percent off all Excel e-books from O’Reilly here: bit.ly/R1C13EXCEL
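The percent change formula works the same way outside of Excel. Here is a small sketch in Python; the function name and the numbers are hypothetical, chosen only to show the arithmetic.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        # Dividing by zero is undefined, so there is no percent change from 0.
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical figures, for illustration only.
last_year = 200
this_year = 250
print(f"{percent_change(last_year, this_year):.1f}%")
```

Going from 200 to 250 is a 25 percent increase: (250 - 200) / 200 = 0.25, times 100. Note that the old value always goes in the denominator, which is why a starting value of zero has no meaningful percent change.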
4. Visualization
- A great list of 30 free dataviz tools from Sharon Machlis of Computerworld.
- A favorite for making embeddable maps and charts, Google’s experimental Fusion Tables.
- Fusion Tables tutorial (on the second half of the post).
- More Fusion Tables tutorials.
- Tableau Public, another free dataviz program.
- DataWrapper, more free charts, powered by your Google Spreadsheet
- Google’s new, free Maps Engine Lite
- Make searchable, sortable tables and embed them with FreeDive
5. Interaction
- The open source code from Hack Jersey on GitHub. (Remember the caveat: this is hackathon code and, in most cases, probably not ready for prime time.)
- The wonderful Source blog from Knight-Mozilla Open News
Support groups
- Investigative Reporters & Editors
- IRE’s NICAR-L email list for computer-assisted reporting and data journalism discussion
- Hacks/Hackers
- Online News Association
- And for more advanced topics, Chrys Wu’s blog posts rounding up all the tools, classes and workshops at IRE’s CAR conference are must-reads.
What are your favorite data techniques, tools and tutorials? Please share them with me. I’d love to check them out.