
Scraping the web for Durham’s ‘cold cases’

At the end of each year, we struggle to give a 12-month period of our lives an identity. Time magazine picks a Person of the Year; Barbara Walters whittles everything down into an hour’s worth of interviews. Always, a winner must be named.

Last year, several stories in different states formed a common thread about the relationship between race and crime. The summer’s heat boiled over with weeks of protest in Ferguson, Mo., following the shooting death of 18-year-old African American Michael Brown by a police officer. And then the unrest spread after grand juries failed to indict officers involved in the deaths of Brown and Eric Garner, whose fatal strangulation by a New York City police officer was captured in a viral YouTube video.

Just as we got set to turn the calendar toward a new year, New York was again struck by tragedy, this time in its police department when police officers Rafael Ramos and Wenjian Liu were ambushed and executed in what is being considered an act of retribution for Garner’s death.

These developments, both tragic and captivating, were my primary motivation for digging into the Durham Police Department in an exercise of web scraping, an automated process that copies content from websites, allowing you to analyze or republish it.

Every web page you visit on the Internet is little more than a series of tables and lists. And although some pages are more complicated than others and contain many moving parts, there are always ways to pull data directly out of pages with a few short lines of code and easy-to-use widgets. The best part: as long as your code remains active, your data will continue to update in real time.

Some background on my project: During the fall semester of my senior year, I encountered a late-college crisis of sorts as a humanities major with an extensive journalism background but few quantitative skills. Tasked with a research project to help complete my Public Policy degree, I decided to use the opportunity as an excuse to learn new computer skills, which is how I settled on web scraping.

My journey began at square one — the absolute basics of coding. After a few weeks spent learning the ins and outs of HTML, CSS and Python, I was ready to learn how basic scrapers worked. The next couple weeks were spent learning about scrapers and doing scraping exercises from a textbook before compiling a list of dozens of Durham city and county organizations I could potentially scrape. This ultimately landed me on the Durham Police Department, which publishes an intriguing list of unsolved homicides on its website for all to see.

A map of Durham’s unsolved homicides indicates that the majority of cold cases are located on the city’s eastern half, farther away from Duke University.

After painstaking trial and error (and mostly error), I developed a scraper using a Google Doc that pulled all of Durham County’s unsolved homicide victims, dates and locations into a spreadsheet: 25 years’ worth of cold cases. Using Python and the web app ScraperWiki, I wrote a loop that pulled every sub-URL present on the page into a long list, extracted the links to the victims’ individual pages and inserted them into the sheet. This allowed me to write a second scraper that pulled the full description of each homicide out of the victim’s page. I then plotted my data on an interactive map.
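For readers curious what that first loop might look like, here is a minimal sketch of the link-collecting step using only Python's standard library. It is not the code I ran on ScraperWiki. The sample HTML, the `example.org` address and the `/cold-cases/` path marker are all hypothetical stand-ins, since the real page's markup isn't reproduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href and link text of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def victim_links(html, base_url, marker):
    """Return (absolute_url, link_text) for each link whose href contains marker."""
    parser = LinkCollector()
    parser.feed(html)
    return [(urljoin(base_url, href), text)
            for href, text in parser.links
            if marker in href]


# Hypothetical stand-in for the index page of unsolved cases.
SAMPLE_INDEX = """
<ul>
  <li><a href="/cold-cases/victim-a.html">Victim A</a></li>
  <li><a href="/cold-cases/victim-b.html">Victim B</a></li>
  <li><a href="/contact.html">Contact us</a></li>
</ul>
"""

rows = victim_links(SAMPLE_INDEX, "https://example.org", "/cold-cases/")
for url, name in rows:
    print(name, url)
```

Each row pairs a victim's name with the address of that victim's page, which is the list a second scraper would walk through to pull the full descriptions.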

Some context and reactions to my analysis of Durham’s unsolved homicides:

  • Currently, just 28 homicides from the last 25 years remain unsolved. To put that in perspective, Durham had 30 homicides in 2013 and has since solved all but one of them.
  • Of the 24 unsolved homicides that took place within Durham city limits, 20 occurred on the city’s East side. East Durham is less developed and more poverty-stricken than the West side, home to Duke University, the city’s downtown area and most of its urban gentrification.
  • Of the 28 unsolved homicides, five of the victims were white (17.9 percent), 17 were African American (60.7 percent) and six were Hispanic (21.4 percent). There has also been only one unsolved homicide with a white victim in the past 18 years. For reference, Durham County’s 2013 Census statistics indicate that 42.1 percent of residents were white, 13.5 percent Hispanic and 38.7 percent African American.
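The percentages in the last bullet are simple shares of the 28 cases, and they can be checked in a few lines of Python. This is a small sketch using only the counts and Census figures quoted above. The variable names are my own, and the "gap" is just the difference in percentage points between each group's share of victims and its share of residents.

```python
# Victim counts among the 28 unsolved homicides, as listed above.
victims = {"white": 5, "African American": 17, "Hispanic": 6}

# Durham County's 2013 Census shares (percent of residents), as quoted above.
census = {"white": 42.1, "African American": 38.7, "Hispanic": 13.5}

total = sum(victims.values())

# Each group's share of unsolved-homicide victims, rounded to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in victims.items()}

# Percentage-point gap between share of victims and share of residents.
gaps = {group: round(shares[group] - census[group], 1) for group in victims}

for group in victims:
    print(f"{group}: {shares[group]}% of victims, "
          f"{census[group]}% of residents, gap {gaps[group]:+} points")
```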