Checking in, Late 2013 Edition

So once again, it’s been a while since I last posted. What have I been up to? Well, to start, this came out in the spring:

And then early September, this happened:

[photos]

And at the end of it, I got these:

[image]

So I’m now in DC for the foreseeable future, doing very interesting things with very obscene quantities of data. I have a few invited talks and conference presentations coming up, so hopefully sometime soon I’ll be able to share some of those materials on here as well.

When Can You Trust a Data Scientist?

Pete Warden’s Monkey Cage post, “Why You Should Never Trust a Data Scientist” (original version from his blog), illustrates one of the biggest challenges facing both consumers and practitioners of data science: the issue of accountability. And while I suspect that Warden—a confessed data scientist himself—was being hyperbolic when choosing the title for his post, I worry that some readers may well take it at face value. So for those who are worried that they really can’t trust a data scientist, I’d like to offer a few reassurances and suggestions.

Data science (sometimes referred to as “data mining”, “big data”, “machine learning”, or “analytics”) has long been subject to criticism from more traditional researchers. Some of these critiques are justified, others less so, but in reality data science has the same baby/bathwater issues as any other approach to research. Its tools can provide tremendous value, but we also need to accept their limitations. Those limitations are far too extensive to get into here, and that’s indicative of the real problem Warden identified: as a data scientist, nobody checks your work, mostly because few of your consumers even understand it.

As a political scientist by training, I found this a strange thing to accept when I left the ivory tower (or its Southern equivalent, anyway) last year to do applied research. A client hires someone like me because I know how to do things they don’t, but that also means they can’t really tell whether I’ve done my job correctly. It’s ultimately a leap of faith—the work we do often looks, as one client put it, like “magic.” But that magic can offer big rewards when done properly, because it can provide insights that simply aren’t available any other way.

So for those who could benefit from such insights, here are a few things to look for when deciding whether to trust a data scientist:

  • Transparency: Beware the “black box” approach to analysis that’s all too common. Good practitioners will share their methodology when they can, explain why when they can’t, and never use the words, “it’s proprietary,” when they really mean, “I don’t know.”
  • Accessibility: The best practitioners are those who help their audience understand what they did and what it means, as much as possible given the audience’s technical sophistication. Not only is it a good sign that they understand what they’re doing, it will also help you make the most of what they provide.
  • Rigor: There are always multiple ways to analyze a “big data” problem, so a good practitioner will try different approaches in the course of a project. This is especially important when using methods that can be opaque, since it’s harder to spot problems along the way.
  • Humility: Find someone who will tell you what they don’t know, not just what they do.
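To make the “rigor” point a bit more concrete, here’s a toy sketch of what checking your own work can look like in practice: answer the same question with two different estimators and flag when they disagree. Everything here (the function name, the threshold, the numbers) is my own invention for illustration, not anyone’s prescribed method; the habit of cross-checking, not the specific code, is the point.

```python
# Toy robustness check: summarize a sample two ways and compare.
# A large gap between mean and median suggests skew or outliers --
# a cue to dig deeper before trusting either number alone.
from statistics import mean, median

def robust_summary(values):
    """Summarize a sample with two estimators and flag disagreement."""
    m, md = mean(values), median(values)
    # Relative gap between the two estimators (threshold is arbitrary).
    disagreement = abs(m - md) / max(abs(md), 1e-9)
    return {"mean": m, "median": md, "flag": disagreement > 0.25}

clean = [10, 11, 9, 10, 12, 10]
skewed = [10, 11, 9, 10, 12, 300]   # one wild observation

print(robust_summary(clean))    # estimators agree: no flag
print(robust_summary(skewed))   # estimators diverge: flagged
```

The same instinct scales up: fit the model a second way, hold out some data, perturb an assumption. If the answer survives, you can trust it more; if it doesn’t, you’ve learned something important before your client does.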

These are, of course, fundamental characteristics of good research in any field, and that’s exactly my point. Data science is to data as political science is to politics, in that the approach to research matters as much as the raw material. Identifying meaningful patterns in large datasets is a science, and so my best advice is to find someone who treats it that way.

APSA, new papers, and more…

Another busy month for me. Just a few of the highlights since I last posted:

  • Successfully defended last month and submitted my final dissertation. When I started grad school, my goal was to have my PhD before I turned 30, and since I technically graduate on the 26th of this month I’ll have accomplished that goal with a whole 3 days to spare!
  • Immediately after finishing my dissertation, I packed up the last of my things and moved down to Nashville to start my postdoc at Vanderbilt’s CSDI.
  • Presented two papers at APSA in Seattle, one a coauthored paper with Josh Tucker and Ted Brader on the relationship between cross-pressures and participation (a newer version of what we presented at Midwest, EPSA, and PolNet), and the other a new paper introducing an original dataset which records the discussion of issues on the websites of major-party US Senate candidates between 2002 and 2008.
  • Finished two more papers and sent them off for review.
  • Now getting ready to present my paper on campaign effects in presidential elections this Friday at the CSDI seminar series.

I’ve also made some more revisions to the site, including posting my research statement and adding more papers and other information to my Research page. (I guess it’s pretty obvious by now that I’m on the market this year, no?) On that page you can also find links to revised versions of all my posted papers, my slides from APSA, and other interesting things. While the coauthored APSA paper is posted there, I’ve left off the new paper because it’s still at the preliminary stage. But if after looking at the slides you still want more, just get in touch and I’ll send it along.