Intro to Python
in the Newsroom

IRE 2015 - Philadelphia

By Tom Meagher / @ultracasual

Why should you code?

Cover more ground, faster

Helps document reporting

Makes analysis replicable

Automation!

Today, we won't learn everything

The Command Line

Git or GitHub

pip or virtualenvs

Frameworks

News apps

Journalism > "Development"

This will not be nuanced, idiomatic Python.

Some programmers may be saddened by this code.

But you know what? If it works, and it works on deadline, that's what matters for us today.

The goal

To start thinking about
how to break problems down
into the smallest tasks
that can be programmed.

If you know Excel

You can learn to program.

  • "AZ Arizona" is a string.
  • "A2" is a variable.
  • "=left()" is a function.
  • A2 and 2 are function arguments.
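To make the Excel comparison concrete, here is a rough Python counterpart to =LEFT(A2, 2). The variable name cell_a2 is made up for this sketch; it just stands in for the spreadsheet cell.

```python
# "AZ Arizona" is a string, just like the text sitting in an Excel cell.
cell_a2 = "AZ Arizona"  # hypothetical name standing in for cell A2

# Slicing off the first two characters does roughly the job of =LEFT(A2, 2).
state_abbrev = cell_a2[:2]

print(state_abbrev)
```

Here the slice plays the role of the function, and cell_a2 and 2 are still the inputs.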

Why Python?

Easy to learn.

Explicit.

Mature and well-documented.

Strong PythonJournos community of support.


Prep your workspace

To set up your machine at home, you'll need to have pip, virtualenv and virtualenvwrapper installed.

# create and activate a sandbox to work in
mkvirtualenv ire15
# clone the code repo from GitHub
git clone git@github.com:tommeagher/pythonIRE15.git
# install the dependencies: requests, beautifulsoup4, unicodecsv
pip install -r requirements.txt

# launch the interactive interpreter
ipython

Strings

Strings are ordered sequences of characters wrapped in quotes.


var1 = "This class is at IRE in Philadelphia."
var2 = "&You!_123 Four"
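Because a string is an ordered sequence, you can reach into it by position. A small sketch of a few things worth trying in the interpreter (the names on the left are just illustrative):

```python
var1 = "This class is at IRE in Philadelphia."

first_char = var1[0]      # position 0 is the first character
first_word = var1[:4]     # a slice takes the characters at positions 0 through 3
shouted = var1.upper()    # string methods return a new string
has_ire = "IRE" in var1   # test whether one string contains another

print(first_word)
print(shouted)
```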


Follow along at home here.
If you want to cheat, the answers are here.

Integers and Floats

Numbers that you can do math on.

Integers are whole numbers.
Floats are decimals.
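A quick sketch of the difference, using made-up numbers. Mixing an integer with a float in the same calculation gives you a float back.

```python
sessions = 56          # an integer: a whole number (made-up value)
share = 0.5            # a float: a number with a decimal point

total = sessions + 4   # integer plus integer stays an integer
half = total * share   # integer times float becomes a float

print(total)
print(half)
```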

Lists

An ordered collection of objects, wrapped in brackets.

my_list = [1, 2, "Liberty Bell"]
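Since a list is ordered, you can pull items out by position, add to the end and loop over it. A minimal sketch (the appended value is made up for illustration):

```python
my_list = [1, 2, "Liberty Bell"]

first_item = my_list[0]         # counting starts at 0
last_item = my_list[-1]         # negative positions count from the end
my_list.append("Cheesesteak")   # add a new item to the end of the list

# Step through the list one item at a time.
for item in my_list:
    print(item)
```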

Dicts

A collection of named keys and their associated values, wrapped in curly braces.

my_dict = {'Fruit': 'Orange', 'Weight': 10}
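Instead of a position, a dict uses the key to look up a value. A minimal sketch (the 'Color' key is added here only for illustration):

```python
my_dict = {'Fruit': 'Orange', 'Weight': 10}

fruit = my_dict['Fruit']      # look up a value by its key
my_dict['Color'] = 'Orange'   # assigning to a new key adds it to the dict
my_dict['Weight'] = 12        # assigning to an existing key replaces its value

# Print the keys in sorted order, each with its value.
for key in sorted(my_dict):
    print(key)
    print(my_dict[key])
```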

Conditionals

Logic that can trigger other operations,
similar to Excel's IF function.

In Excel:

In Python:

score = 1
# the parentheses make print work in both Python 2 and Python 3
if score > 2:
    print("Win")
else:
    print("Lose")

Intermission

For more practice with the basics,
try this tutorial from PyCAR, or this one.

Part Deux

The Problem

How to go from this...

...to this?

The Reporting Phase

  • Find a website
  • Does it look like a data table?
  • "View Source"
  • Is there a table tag?
  • Does it follow a predictable pattern, like
    body >> div >> table >> tr >> td ?

The Writing Phase

  • Make an HTTP request to the site
  • Collect the text content of the response
  • Parse the text and step through it, tag by tag
  • Find a tag, assign its content to a variable
  • Store those variables in lists or dicts
  • Make any additional requests to other pages and repeat the steps above
  • Loop through the collected list or dict and write it to a file
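The steps above can be sketched in a few dozen lines. This is not the workshop's scrape1.py, which uses requests and BeautifulSoup. So that it runs with no network connection and nothing to install, this sketch assumes Python 3 and swaps in the standard library's html.parser and csv modules, and it parses a made-up HTML string in place of a real response. SAMPLE_HTML, TableParser and every other name here are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page text, standing in for the text content of an HTTP response.
SAMPLE_HTML = """
<html><body><div><table>
<tr><td>Name</td><td>City</td></tr>
<tr><td>Ben Franklin</td><td>Philadelphia</td></tr>
<tr><td>Betsy Ross</td><td>Philadelphia</td></tr>
</table></div></body></html>
"""


class TableParser(HTMLParser):
    """Step through the page tag by tag, collecting td text grouped by tr."""

    def __init__(self):
        super().__init__()
        self.rows = []           # finished rows, each a list of cell strings
        self.current_row = None  # the row being built, or None outside a tr
        self.in_cell = False     # True while we are inside a td

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current_row = []
        elif tag == "td" and self.current_row is not None:
            self.in_cell = True
            self.current_row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self.in_cell:
            self.current_row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.current_row is not None:
            self.rows.append([cell.strip() for cell in self.current_row])
            self.current_row = None


# Parse the text and store the cell contents in a list of lists.
parser = TableParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows

# Loop through the collected rows and write them out as CSV.
# An in-memory buffer stands in for a file opened on disk.
output = io.StringIO()
writer = csv.writer(output)
for row in rows:
    writer.writerow(row)

csv_text = output.getvalue()
print(csv_text)
```

In a real scraper, the sample string would be replaced by the text of a response from the site, and the buffer by a file opened for writing.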

Let's go

Open your text editor of choice and a terminal window.
Write a line or two of code under each comment,
save the text file and then try to run:


python scrape1.py

Now you try

Now, expand your code to scrape a similar, but bigger page.

We probably won't have time to get to these.
But if you want to keep working,
try the extra, extra credit project here.

And you can find the working scripts in the completed directory.

Keep learning

Excellent post on ethics of scraping

Python Journos

More resources for learning Python

NICAR-L

Source

GitHub

StackOverflow

Google

--30--

Clone the source code


Email me or ping me on Twitter