Open Source and Journalism (and NICAR 2016) talk

This post is a long collection of tools, resources, and tips that I saw at NICAR 2016 or in conversations around it. There are also some links in here that are not from NICAR 2016 but seemed relevant. This blog post serves as lecture notes for my March 24, 2016 talk at The Ohio State University Open Source Club.


What is NICAR?

It's a yearly conference about computer-assisted reporting.

Computer-assisted reporting is, as Wikipedia puts it:

the use of computers to gather and analyze the data necessary to write news stories.

In practice, this means a ton of things:

If this sounds like it's trending into Math, don't worry, we'll cover that.

More generally, NICAR is a bunch of journalists and coders getting together to talk about issues they've faced and how they've overcome the issues. It's a skillshare. It's showing off code. It's showing off tools. It's a lot of talk about cross-org collaborations.

This talk isn't even a tenth of what was covered at NICAR 2016. If you want everything that was linked to or discussed at NICAR 2016, check out Chrys Wu's megacollection.


This talk is broken into several segments:

  • How to get started
  • Neat tools and projects
  • Ponderable reads
  • Awesome people
  • Cool organizations
  • Events to attend

First, a word from our sponsors.

I'm a news apps developer at the Institute for Nonprofit News. Our dev team does a mix of WordPress and Python coding. INN has covered my cost of attendance at NICAR for two years running. INN is closely associated with IRE, and therefore with DocumentCloud. INN has more than 100 member organizations, who you should check out.

I graduated from OSU in May 2014 with a B.S. in Agricultural Communication and a minor in plant pathology, and a lot of bylines from The Lantern, OSU's student-run newspaper.

These are my own opinions and endorsements, and not those of my employer.

How to get started

A pep talk.

1. Don't be intimidated.

You can do this.

2. Okay, be a little intimidated.

You've never used this before, and the documentation doesn't make a lot of sense.

3. Be very intimidated.

What you're trying to do has literally never been done before.

4. Don't give up.

5. Ask questions.

These resources will be mentioned later, but in short:

  • Ask the maintainers of the tool.
  • Ask other users of the tool.
  • Ask your friends and coworkers and family and competitors who have knowledge that might apply.

If someone doesn't know the answer, they still might be able to point you in the right direction. It's very much like interviewing sources to find facts for a story.

If it's not a tool:

  • Ask people who might know the answer.

An example: I needed advice on a FOIA request, and wasn't sure what sort of advice I needed. So I asked a few journalists under a frieNDA for their take on the topic. What they said didn't apply exactly to what I was asking for, but it pointed me down the right track.

Tools to check out

Includes things that I want to play with as a result of the conference, and a bunch of questions I want to answer.

Embeddable PDFs

Do you ever need to upload a PDF and make it searchable? Use DocumentCloud! Here's a WaPo example. Here's a direct document link. DocumentCloud is a part of Investigative Reporters and Editors and is funded in part by the University of Missouri. Most of their code is on GitHub.

DocumentCloud is responsible for Backbone.js and Underscore.js.

Don't qualify for DocumentCloud? Use Scribd.

Audio tricks

Uploading audio? Gonna have to host your own, if SoundCloud goes out of business.

If you have audio somewhere, you can do inline audio players with SoundCite, from the Northwestern University Knight Lab. Do what? Let's make the dialup noises inline.

Image things

Want to compare two images? Use JuxtaposeJS from the Knight Lab:

Want to embed video? Probably going to have to use YouTube or Vimeo for that, sorry.

Maps

Google My Maps lets you create maps with placed icons. Here's Google's tutorial. The Lantern uses them to create the campus-area crime maps.

Want to do something even cooler? Use Google Fusion Tables. This exposes all your data, as LoHud.com found out. It's a cautionary tale.

Have a server where you can put files? Seattle Times Map Template uses the power of Node to build you a map using Leaflet.

What's Leaflet? It's a library for mobile-friendly maps. It's really customizable. It's really cool. You can use it for things that aren't maps, too!

How about Knight Lab storymap?

Timelines

Seattle Times built a timeline template, but it's not very well documented. If you use it, please contribute some docs.

There's also TimelineJS from the Knight Lab, which was the first actual interactive I made.

Embedding things at all:

Seattle Times' Responsive Frames

Pym.js, when used on both the embedding page and the child page, can make your frames responsive!

Building presentations

There are a lot of tools out there for building presentations, but if you want something open-source that runs in a browser, check out:

Each has a slightly different approach. Reveal.js is the one you see the most often, most likely. There's a Software-as-a-service deck builder and hosting site run by the creator of reveal.js.

I used impress.js for a presentation once. It was interesting.

Static site generators

A common theme among these generators is that all the code goes in a repository, and all the content lives in a Google Sheets spreadsheet set.

What I want to know:

  • Is it possible to deploy NPR Visuals' app-template to places other than S3?

A note about news apps

Most stuff written by news apps teams these days is written in Python. Generally in Python 2.7.

Why Python, and why Python 2.7?

It's installed by default on all recent Macs. Instant, no-hassle setup.

Windows users, I feel your pain.

Python data analysis and import tools

Here, we've gone a few steps up the difficulty ladder.

  • csvkit, which transforms other data formats into CSV. Why comma-separated value files? Because they're really easy to work with.
  • agate (formerly journalism.py), for data analysis. Here's an intro to Agate by a journalism professor at University of Nebraska - Lincoln.
  • Jupyter notebooks for data analysis are slowly becoming standard. Why? Because they allow you to show your work in a way that readers can follow, and you can comment on your work to explain how it works. Other people can download and rerun your analyses. (Assuming, of course, that your datasets are public.)
  • If you have an Associated Press elections API key, elex is a wrapper to make your real-time elections analytics much much nicer. It's a joint project by NPR and the New York Times, and is open source! Associated Press API keys, sadly, are neither open-source nor free. The pricing isn't listed - you have to contact them.
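To make the "CSVs are really easy to work with" point concrete, here's a minimal sketch using only Python's standard library. It assumes Python 3, and the sample rows and the count_by_column helper are made up for illustration; they aren't from csvkit or agate, which do this sort of thing with far more polish.

```python
import csv
import io
from collections import Counter

# Made-up sample rows, purely for illustration.
SAMPLE = """date,offense,location
2016-03-01,theft,High St
2016-03-02,burglary,Lane Ave
2016-03-02,theft,Neil Ave
"""


def count_by_column(csv_text, column):
    """Count how often each value appears in one column of a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = count_by_column(SAMPLE, "offense")
for offense, total in counts.most_common():
    print(offense, total)
```

With a real file, you'd pass an open file object to csv.DictReader instead of wrapping a string in io.StringIO.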

What else can you do with Python?

Want a static site generator without using Python?

How about the Seattle Times' News App Template? It uses Grunt, which uses Node, which is JavaScript.

Other cool tools

  • How does inter-process messaging work? If I have a Twitter bot written in Node.js and a Python Markov chain loaded in memory, how do I get the Node bot to talk to the Python process? How do I get the Twitter library to talk to a Python IRC bot?
  • Tor Messenger, an encrypted OTR client for Twitter DMs, gChat, Facebook Messenger, AIM and others.
  • FOIA Machine, an automated FOIA-filing robot that reminds you of common deadlines.
  • Signal, an app for Android and iPhone that gives you free encrypted calls and texts around the world. (contingent on data plan, of course.)
  • Working with Google Sheets? You can use the function importHTML() to scrape tables, says Thomas Wilburn.
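One possible answer to the inter-process messaging question above: have the Python process listen on a local TCP socket and trade one JSON object per line, which any language with sockets (Node included) can speak. This is a sketch under assumptions, not the one right way to do it. It assumes Python 3, generate_text is a hypothetical stand-in for the Markov chain, and the ask function plays the part the Node bot would play.

```python
import json
import socket
import socketserver
import threading


def generate_text(seed):
    """Hypothetical stand-in for a Markov chain held in memory."""
    return seed + " ... and so on"


class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON object per line in, one JSON object per line out.
        for raw in self.rfile:
            request = json.loads(raw)
            reply = {"text": generate_text(request["seed"])}
            self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 lets the OS pick."""
    server = socketserver.ThreadingTCPServer((host, port), LineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(address, seed):
    """What the other process would do: send one line, read one line."""
    with socket.create_connection(address) as conn:
        conn.sendall((json.dumps({"seed": seed}) + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())["text"]
```

Newline-delimited JSON keeps the protocol readable and language-neutral. Pipes, message queues, or a small HTTP API are other reasonable choices.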

On themes

There's a very common theme here: Here's our code. Use it, because it will save you time. Less time writing code is more time reporting.

Not sure what to do?

Check out the How to train your newsroom presentation slides from Dana Williams & Rachel Schallom. Ask yourself those questions.

Follow through the Your First News App tutorial by Ben Welsh.

Email some journalists. Read their code. Make a GitHub account and use it. Follow developers on Twitter. (Lists of developers below.)

Ponderable reads: tipsheets and tutorials

During the talk, I'm going to skip this section, but you are strongly encouraged to come back here and read it later.

Things to read

First, watch the NICAR 2016 lightning talks.

Notes in general

Building better graphics

From Aaron Williams's talk on building better graphics, a strong recommendation to build mobile-first:

  • Read the thing.
  • Stop building charts/websites/apps that don't work on mobile.
  • Death to tooltips. People without mice can't hover over elements. Phone browsers may support the :hover selector, but phone hardware doesn't.
  • Build designs that work great on mobile and desktop. You can still build those large 1600-px experiences, but they should be the progressive enhancement from the phone experience. Single-column approaches are great.
  • The stuff on first page load is frequently all that a mobile user will see, because of a slow connection or bad reception. He says that these days he's using plain, unenhanced JavaScript for the simple stuff, because it keeps the page load smaller.
  • Mobile-first design isn't "paring things down", it's making sure that you are creating designs that are the most effective. It's the nut graf, essentially.
  • Scrollytelling is better for phones. Tapping is , especially for transit users. They have one hand to hold their phone and one hand
  • Test as much as you can, especially on older hardware. Old Android, old iPhones. If you're training your readers to expect that your site will perform poorly on their device, you will lose readers.
  • Find ways to gif your dataviz projects. Use Licecap or other screencasting tools, or turn your video into gifs.
  • Make it snackable

On testing

A lot of newsroom execs use iPads. Remember to test your content there, if your exec has or is rumored to have an iPad.

The Digital Dark Arts

Use Twitter to discover other reporters' stories by searching for the "Hi, I'm a reporter at [blank] working at a story on [blank]. DM me or email me at [email] or call me at [number]" tweets. The people being @mentioned are the people you don't want to contact. Explore the @mentioned person's tweets, finding the people who the @mentioned person tweets to. You get an idea of who everyone's listening to. This comes from Nicole Hensley and Mike Tigas.

Slides from the Digital Dark Arts talk:

Should you use other platforms for news content? (From the conversations track)

These are a lot of unattributed opinions.

  • If your website is so bad that you're offloading things to other platforms, you need to fix your website. You're losing your ad views.
  • But being on other platforms is about being where your readers are.
  • We need to be concerned about archiving our content and keeping our permalinks working on our own sites.
  • You can't do stuff beyond text-and-photos in the Facebook Instant Articles, and building AMP-compliant interactives is hard.
  • AMP/FbIA are already preferred content, over outside links. Your readers prefer them because they're faster, but you lose control.
  • "If your business model depends on the whim of an organization that doesn't share your values, you're setting yourself up to fail." (Regarding tweaking things for Facebook's algos, or SEO, or ...!)
  • Our performance is bad because of ads and analytics. Selling online ads is too expensive for most orgs, because it requires infrastructure. So we use DoubleClick and Outbrain and Taboola. Part of it is that they make the page load more slowly. Part of it is that they make your readers' experience more unpleasant.
  • Maybe we need more performance budgets in news websites.
  • What if we built our own AMP?
  • What if news orgs pushed back against the platforms? Against AMP, against Instant Articles, against Taboola and Outbrain and, heck, DoubleClick?
  • This pushback would be celebrated like Newsies, but have the plot of Fountainhead or Atlas Shrugged. Fed up with advertising companies' mistreatment of readers, news publishing execs remove ads from their websites to punish the ad companies.

Wireservice!

Wireservice is the new umbrella project for things like csvkit and agate. lookup is a new repo that is a set of reliable lookup tables for finding X from Y. Want examples of how it's useful? Here's how to use it with agate. It gets rid of VLOOKUPs, LEFT JOINs, bespoke scripts. By Christopher Groskopf.

Building a map? Vector tiles.

Only use vector tiles, says Ken Schwencke. Map tiles are prerendered as images, and they're great for saving your reader better data. But rendering and re-rendering tiles takes f o r e v e r. Vector tiles are the same data, but easier to simplify. Here's a blog post by him about it.

What is the goal of Journalism?

Gripping journalism that makes the world a better place is the goal of journalism, right? Thus says Adam Playford.

Make your interactives engaging

From Gregor Aisch, The Future of Interactive News: Clickers or scrollers are just like board books asking people to just keep flipping. Actual interactives are great! Ask your readers questions. Ask them to draw things. Have them press buttons that affect the outcome of the interactive. Here's some examples. But remember that other thing about making things to interact with using only one hand. Remember what areas of the screen are most accessible to thumbs.

But run tests.

If people aren't clicking on your engaging interactive content, don't waste time on it.

Many people prefer scrolling.

Archiving

Build your web interactives with archiving in mind. What happens to them at end-of-life?

  • How will you notify readers that it's dead?
  • How will you direct readers to newer content?
  • How will you reduce functionality? Will you disable search engines? Forms? Database queries?
  • How long will you keep static assets around? (Baking dynamic sites to static assets is a great thing to do!)
  • Are you donating to The Internet Archive?

Make sure your websites support HTTPS. Here's the ethics reason why. Some tech reasons:

  • HTTP/2 is faster than HTTP/1.1, and browsers only support it over HTTPS.
  • If you want to use push notifications, the browser's microphone or camera, to use geolocation data, the accelerometer, fullscreen displays, or anything else - browser manufacturers will require you to use HTTPS.
  • Commenting systems, analytics, ads, social, most font providers, many embeds - all of this stuff already works over HTTPS. Most CDNs support HTTPS, just might charge you for it.

What's the cost of HTTPS? Setup time and server config, and training people to be aware of the difference between https and http. But the big one is fixing your archives. Sending the response header Content-Security-Policy: upgrade-insecure-requests works great for that. Read more about it.
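If your site runs on Python, one way to send that header on every response is a small WSGI wrapper. This is a sketch, assuming a WSGI app and Python 3; add_csp and archive_app are hypothetical names, and in most setups you'd set the header in your web server or CDN config instead.

```python
CSP_HEADER = ("Content-Security-Policy", "upgrade-insecure-requests")


def add_csp(app):
    """Wrap a WSGI app so every response also carries the CSP header."""
    def wrapped(environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Append our header to whatever the wrapped app already set.
            return start_response(status, list(headers) + [CSP_HEADER], exc_info)
        return app(environ, start_with_csp)
    return wrapped


def archive_app(environ, start_response):
    """Hypothetical stand-in for an old article page with an http:// image."""
    body = b'<img src="http://example.com/old-photo.jpg">'
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


app = add_csp(archive_app)
```

The markup in the archive stays untouched. Browsers that understand the directive fetch the image over https instead, so the image's host still has to serve it over HTTPS.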

Make your visualizations 'snackable': A small image suitable for sharing on social media. This comes back to the interactives thing: It needs to load fast, be simple, and be understandable. Hook them. Give them the strongest image that isn't misleading.

Notes for student journos

Appeal all FOIA decisions. And ask public officials for help, or say you're doing a story on FOIA denials. Run a pre-FOIA to find what you need to FOIA.

Cryptographically sign your emails to techy people, Andy Boyle says. They're more likely to respond. It's the one advantage of PGP.

It is possible to work a coding job in journalism without a computer science degree. Many devs have learned all their tooling on the job and through 'hobby coding'. Experiment when you have time.

Don't be afraid of math. "I'm not good at math" is not a reason to not practice math. You weren't good at other things, and you're certainly good at them now. Read some books. Get out the pencil and practice. Do things over and over and then cover them up and try to remember how to do them by memory. Solve a problem with the books, then cover it up and do it again without the aids. Take mental breaks. Break things up into smaller problems. These are the lessons from Ryann Jones.

Stats made easy: Resampling! It works on data that isn't normally distributed. It makes things interesting! So says Jonathan Stray. Here's more info.
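Here's a rough sketch of one kind of resampling, the percentile bootstrap: redraw your sample with replacement many times, recompute the statistic each time, and see how much it moves. It assumes Python 3 and uses only the standard library. The bootstrap_interval name and its defaults are my own choices for illustration, not something from the talk.

```python
import random
import statistics


def bootstrap_interval(values, stat=statistics.median, rounds=2000,
                       level=0.95, seed=None):
    """Estimate a confidence interval for stat(values) by resampling."""
    rng = random.Random(seed)
    n = len(values)
    # Each round: draw n values with replacement and recompute the statistic.
    estimates = sorted(
        stat([rng.choice(values) for _ in range(n)]) for _ in range(rounds)
    )
    tail = (1.0 - level) / 2.0
    low = estimates[int(tail * rounds)]
    high = estimates[min(rounds - 1, int((1.0 - tail) * rounds))]
    return low, high
```

Calling bootstrap_interval(values, seed=1) gives a repeatable interval for the median. Nothing here assumes a bell curve, which is why it works on skewed data.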

Consider automation. You've read the XKCD strip about whether it's a good idea to automate, and how it plays out in practice. But if you're a newsroom, you really should look into building automation tools. Simple things, like alerts about new uploads or updated data. Use IFTTT to bug you when things change. Write parsers to grab data. Don't bother automating things that aren't straightforward. Don't bother with Machine Learning; it takes too long to train things. Do automate something you'll need to do again, that's clear and straightforward, and that answers simple questions.
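An "updated data" alert can be as small as remembering a fingerprint of what you saw last time. The sketch below assumes Python 3 and that you fetch the bytes yourself (with urllib.request, for example); has_changed and the JSON state file are hypothetical names for illustration.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(data):
    """Return a SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def has_changed(name, data, state_path):
    """Report whether data differs from the last version seen under name.

    The first time a name is seen counts as a change.
    """
    state_file = Path(state_path)
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    new = fingerprint(data)
    changed = state.get(name) != new
    if changed:
        # Remember the new fingerprint for the next run.
        state[name] = new
        state_file.write_text(json.dumps(state))
    return changed
```

Run it from cron, and send yourself an email or chat message whenever it returns True.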

If you have problems making time for something, set goals and make accountability and take it a day at a time and realise there's no good time to start and be kind to yourself. "I think it's okay to lose the battle but win the war. These side projects are about making you feel good," says Nicole Zhu, who reads 52 books in 52 weeks.

Side projects let you define the metrics of your own success.

Ask the most-basic questions, even if they sound naïve. The answers are probably not as simple as you think.

Cool people

A list of Twitter users at INN member organizations

NPR's visuals team

Another list of news nerds and apps teams

Cool organizations

A Twitter list of apps teams near INN.

Another list of apps teams

Conferences and Training to go to

There are a ton of journalism-adjacent conferences and web-development-adjacent conferences. I'm interested in data-driven journalism, design, WordPress and Python.

Society for News Design

Like IRE, SND runs a lot of events.

From an old hand, tips on attending SNDSF 2016.

SNDMakes is a series of news-focused hackathons, which frequently have scholarships available.

SRCCON

SRCCON (Source Con) is run by Knight-Mozilla OpenNews. This year, it's July 28 and 29. You know how I said that NICAR was people talking about how they solved problems? SRCCON is people discussing problems, and thinking about how to solve them. It's a smaller conference, and there are often scholarships for students to attend. If you volunteer, your registration price is reduced.

Write the Docs

Write The Docs, May 22-24 in Portland: a documentation conference. Docs are important if you want anyone to understand your code. Three months from now, you're part of "Anyone". I went in 2015, and learned how to think like my users.

They also have a European meeting.

WordCamp

  • WordCamp Columbus and WordCamp US, two WordPress-centric conferences. WordPress catches a lot of flak, but it's also powering a large share of the web for very good reasons.

ProPublica training

OpenNews community calls

  • OpenNews runs a biweekly community call. They're interesting to listen in on: who's working on what, and some short interviews with interesting folks.

SPJ/NECIR training

This is not a complete list.

There are a lot of opportunities out there.

Conclusion

Do you want to be a news dev? Are you already a news dev?

Check out:

Here's my 2015 recap.