Data validity checklist
Much like surgeons, journalists who work with data can always use help ensuring they’re getting things right. I compiled this checklist of questions I ask myself whenever I work with a data set. Any additions would be very welcome. Drop me an email or post to me on Twitter.
Print it off in an easy-to-use format.
• Where did the data come from?
• Who created it? Is this the best source for this data? What was the methodology behind its collection?
• What documentation came with it?
• Did you read the documentation?
• Is there a record layout or data dictionary?
• Did you save an original copy of the database in case you need to retrace your steps?
• Are your field headings accurate? Do you have the columns labeled correctly?
• Are you looking at the correct tab? Is there additional data in other tabs in the worksheet?
• How many records are in your table or database? How many should be there? Are there any missing? Too many? Any chance you exceeded the number of rows or columns the software can handle?
• Examine each column of data one at a time. Is it formatted in the appropriate manner? Are there any gaps in the column? Are any data missing?
• When joining tables, are you certain the join worked? How many records are there now? Too many? Not enough?
• When pasting tables with common columns side-by-side in Excel, do they line up? Any chance two records might be switched?
• What calculation are you trying to do? Are you using the appropriate formula or function for this task?
• Do your cell/column references in functions point to the right places? Are any columns transposed? Can you trace the formula’s path? Have you pasted the correct formula all the way down the column to the bottom of the table?
• When sorting, did you sort the entire table together and not omit any columns?
• When pasting, should you use Paste Special > Values to strip out any pesky underlying formulas that might otherwise point to the wrong cells?
• When moving columns, did any $ anchors you pasted end up pointing to the wrong column or row of data and throwing off your calculations?
• Is there an expert, either an agency data guru or an academic, who has vetted your process or calculations?
• Does your calculation make sense?
• Have you spot-checked individual records to ensure the numbers match up? Did you ask a colleague to assist by reading off numbers to be CQ’d?
• Have you asked an uninvolved colleague to look at your process, review the numbers and poke holes in the methodology?
• Can you reproduce your calculations from the beginning or explain step-by-step how you arrived at your conclusion?
• What numbers are you using in the story? Have you circled them and reviewed them in the database? Is the description of the numbers accurate? Does it omit any pertinent details that might mislead an unwary reader?
• Are there any graphics with the story? Have you checked them? Can you verify every number? Are the introduction and column heads accurate? Are there numbers missing from the graphic that are important to the story?
• How did you get here, and does it make sense?
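For anyone who does part of this work in code rather than a spreadsheet, a couple of the checks above (gaps in a column, and whether a join produced too many or too few records) can be roughed out in a few lines of Python. This is only a sketch using the standard library. The function names `column_gaps` and `check_join`, the `fips` key and the sample county records are all made up for illustration, and it assumes each table is already loaded as a list of dictionaries, such as `csv.DictReader` would give you.

```python
from collections import Counter


def column_gaps(rows, columns):
    """Count blank or missing values in each named column."""
    gaps = Counter()
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                gaps[col] += 1
    return {col: gaps[col] for col in columns}


def check_join(left, right, key):
    """Summarize what a left join on `key` would produce, before trusting it."""
    right_counts = Counter(r[key] for r in right)
    # Left records with no partner in the right table.
    unmatched = [l[key] for l in left if right_counts[l[key]] == 0]
    # Keys that appear more than once on the right will multiply left records.
    duplicated = sorted(k for k, n in right_counts.items() if n > 1)
    # A left join yields one row per match, or one row when there is no match.
    joined_rows = sum(max(right_counts[l[key]], 1) for l in left)
    return {
        "left_rows": len(left),
        "joined_rows": joined_rows,
        "unmatched_keys": unmatched,
        "duplicated_right_keys": duplicated,
    }


# Hypothetical sample data, invented for this example.
counties = [
    {"fips": "001", "name": "Adams", "population": "1200"},
    {"fips": "002", "name": "Baker", "population": ""},
    {"fips": "003", "name": "Clark", "population": "800"},
]
incomes = [
    {"fips": "001", "median_income": "41000"},
    {"fips": "001", "median_income": "43000"},  # duplicate key
    {"fips": "003", "median_income": "39000"},
]

gap_report = column_gaps(counties, ["fips", "name", "population"])
join_report = check_join(counties, incomes, "fips")
print(gap_report)
print(join_report)
```

If `joined_rows` is larger than `left_rows`, a duplicated key on the other table is usually the reason. If `unmatched_keys` is not empty, some records never found a partner. Either result is a cue to go back and look before using the numbers.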