Lessons in open source code management
Cross-posted from OpenNews's Source blog. This piece ran a few days after the closure of my employer, Digital First Media's Project Thunderdome.
Nearly every time the data team at Project Thunderdome prepared to launch an interactive graphic, a CAR story, or a news app, I’d say to myself something along the lines of, “As soon as this hits production, we’ll go back and beef up the inline documentation to save ourselves headaches the next time we try something like this.”
Of course, despite our best intentions, we never went back until so much time had passed that we’d completely forgotten just about everything we had tried to do.
I suspect that the Thunderdome data team is not peculiar in this respect. But last month, we were given a pretty good reason to revisit our code one last time. Faced with the demise of our newsroom, we decided to open source much of our code under a permissive MIT license. In doing so, we hope that our now former colleagues at Digital First Media, the nation’s second-largest newspaper company, can fork and reuse our work.
Just as importantly, we hope journalist-developers in other news organizations will build on our code, so they don’t have to reinvent the wheel every time they want to do something simple that we figured out. And maybe, just maybe, sharing this code, as so many other great news organizations have already done, will encourage more newsrooms to do the same.
On April 2, we learned that Digital First Media decided to shut down the Thunderdome newsroom. Among our many duties, the data team–Peggy Bustamante, Vaughn Hagerty, Nelson Hsu, MaryJo Webster, and me–built interactive news applications and pursued computer-assisted reporting, on our own and with colleagues across the company’s 75 local newsrooms. After the announcement, we were left scrambling to figure out how to finish a half-dozen projects and how to preserve the code behind our work when no one would be responsible for picking up the pieces after we left.
Our team had six weeks to wrap things up, which seemed like plenty of time. But we had projects that we desperately wanted to finish, like our election news game, The Waffler or our examination of why national data on soaring heroin deaths is so flawed. That left us with a little less than two weeks to clean up our code base.
When I mentioned to Scott Klein that we wanted to prevent nearly two years of work from disappearing, he offered a simple suggestion: open source it.
Last week, as we walked out the door of Thunderdome for the last time, we did just that.
How We Did It
First, we had to get permission. Our bosses, editor Robyn Tomlin and managing editor Mandy Jenkins, had been incredibly supportive of the data team and our work from the outset, and they didn’t require much convincing. So we had a brief meeting with our CEO, John Paton, and made our pitch that to not publish our code under an open source license would be a waste. To his credit, John, who has spoken and written extensively on how our industry needs to innovate and to invest in digital storytelling, required no more convincing. He gave his approval, and we were off and running.
Our next step was to review all of our code that was stuffed into private Github repositories and prioritize projects for release. We put a checklist in a Google Spreadsheet and each member of the team labelled her or his projects:
- Reusable—we had used this code over and over as a template or that we thought others might find useful.
- Just publish—these we’d like to share but were probably discrete and didn’t warrant a lot of additional polishing work.
- NO—these were either stories that hadn’t yet published or applications that had a reasonable expectation of being used by our papers for some time, like our crime maps in St. Paul and Denver.
- Bury it in a swamp—these were so embarrassing or would require such a monumental amount of work to make them presentable, that we preferred to pretend they never happened.
We went through scores of repos and categorized them, ending up with about 40 or so that we wanted to release. For each of them, we reviewed all of the code for security and to ensure there was nothing proprietary or worrisome in it. We added a more helpful README (using a template borrowed from NPR Viz’s fantastic app-template) and an MIT license.
For the most reusable projects, we wanted to offer guidance on how someone else might bootstrap it or abstract the code to make it a more generic template. And this is the area where we simply ran out of time and fell short.
Why? Because as many of you have probably already thought to yourselves, this whole endeavor was inherently flawed. We made this much, much harder on ourselves than it needed to be, because we did it all backwards. We should have been releasing our code as open source from the outset, in a sane way. But when you’re pushing code on a news deadline and storing it in private repos, robust documentation is not always the priority that it needs to be.
It can be hard to make the case to executives that open sourcing code isn’t the same as giving away the farm. We benefitted in this regard here, because we had a good track record within the company, and we had a body of work to point to. We could honestly say, “Look at this work. There’s nothing here that would damage the company, and in fact, if we don’t publish this code, we’d be preventing many people in the company from ever using it again.” That this was our dying wish probably helped seal the deal.
What We Ended Up With
In our new public Github organization we pushed dozens of projects, built in HTML, CSS, JS, a bit of PHP, and a smidgeon of Python. Among those, there are nine in particular that we think other newsrooms might find most useful, including a balloting tool, an interactive bracket, a profile tool and more. But we still need help to finish the job.
If you want to use one of these, help us to beef up the documentation and truly abstract it. Fork it and send us pull requests. If you want more advice on how to get started, email me.
And in your own work, start writing good documentation from the start of every project, and don’t wait to open source your code until it’s too late.