
Open Source and Journalism talk at OSUOSC

  1. Preamble
  2. Examples of open-source code in journalism
    1. Charting libraries
    2. Data processing workflows
    3. Scrapers
    4. Bots
    5. Web utilities
    6. Entire websites
    7. Team internal structures
  3. Why make open-source journalism code?
  4. Really cool #NICAR18 sessions that I think college students would be interested in
  5. A list of resources to be aware of
  6. Career advice
  7. A list of currently-open paid internships and apprenticeships, and jobs
  8. Updates

The following post is talk notes and links from my Thursday, March 29, 2018 talk at the Ohio State University Open Source Club meeting. The talk's about NICAR 2018 and open source in news and journalism. This list would not be possible without the support of many, many news nerds, in particular Chrys Wu who maintains the unofficial NICAR link list every year.

I work at the Institute for Nonprofit News, but in this talk I'm speaking as an individual. Opinions are mine, though they may in places overlap with my employer's.

Why do I give this talk every year at OSUOSC? Because I want to pay OSC and OSU back for the resources they gave me, now that I have other resources. I want everyone to feel the sort of amazement captured in this tweet:

my entire life after an awesome conference like #NICAR18 is a) tweeting questions at people who taught sessions and then b) tweeting articles at them like LOOK WHAT I MADE TEACHER!!!!

— Rachel Alexander (@rachelwalexande) March 26, 2018

Hopefully, you'll come out of this with a general idea of the sorts of open-source tools that exist in journalism, and also ideas about how to use them and where to go to learn more.

If you'd like to skip straight to the largest link list of them all, go to Chrys Wu's list. Other people's notes are linked from there, but I'm gonna say go look at Brent Jones' notes and mine.

Preamble

Why do journalists publish the code they use to write stories and build graphics? Let's first look at the things that get open-sourced, and then I'll talk about why these things are published for free under a license that says anyone can use them for any purpose.

What does it mean to "open-source" some code, to publish it with an "open source" license? Take a gander at Wikipedia:

The open-source model is a decentralized software-development model that encourages open collaboration. A main principle of open-source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public. The open-source movement in software began as a response to the limitations of proprietary code.

Wikipedia, Open-source model

Basically, you're putting all your work out there with a binding statement that says that anyone can use, modify, and distribute your code, so long as they say that they got it from you. That's the gist of the very popular MIT License, and most of the other commonly-used licenses are a little more restrictive than that. (Of course, some people publish their code under joke licenses, which can cause problems for the lawyers….)

Examples of open-source code in journalism

What sorts of things get open-sourced? Tools that people have built. Some of the stuff you'll find on GitHub or in a blog post is one-off code that solved a problem. Some of the code is projects that people use daily. Some of it has gained a life of its own beyond its original use.

Not all of it has good documentation. If you encounter a problem, politely ask the maintainers. Or ask in the News Nerdery Slack. When you figure out the solution to a problem, talk to the maintainers of the tool about updating the documentation. If it's on GitHub, you can make a pull request to update the docs, like so.

Charting libraries

Chartbuilder is a chart-making tool for use in your browser. Use it in your browser! I think at one point The Lantern had its own hosted version of Chartbuilder, but that was back when it first moved to WordPress, oh many years ago.

NPR's dailygraphics is a toolkit for making embeddable web charts. They have blog posts about it; I gave a talk about it, and Saint Louis Public Radio has five years of archived graphics built with that tool. Here's an example in action.

A map template using Google Fusion Tables, which was used in the Maps Made Easy session. Here's an example. For a different perspective on maps, try the LA Times' Web Map Maker.

Landline and Stateline, JavaScript libraries that turn GeoJSON into SVGs for use on websites.
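To give a rough sense of what "turning GeoJSON into SVG" involves, here's a minimal Python sketch. This is not Landline's code (Landline is JavaScript and does real map projection); the function name polygon_to_svg_path and the naive linear scaling are my own assumptions for illustration.

```python
def polygon_to_svg_path(geometry, width=100, height=100):
    """Convert a GeoJSON Polygon geometry into an SVG path "d" string.

    Assumes coordinates are [longitude, latitude] pairs and scales them
    linearly into a width x height box. No real map projection is applied.
    """
    rings = geometry["coordinates"]
    points = [pt for ring in rings for pt in ring]
    min_x = min(p[0] for p in points)
    max_x = max(p[0] for p in points)
    min_y = min(p[1] for p in points)
    max_y = max(p[1] for p in points)
    # Avoid dividing by zero for degenerate shapes.
    span_x = (max_x - min_x) or 1
    span_y = (max_y - min_y) or 1

    parts = []
    for ring in rings:
        commands = []
        for i, pt in enumerate(ring):
            x = (pt[0] - min_x) / span_x * width
            # SVG's y axis points down, so flip latitude.
            y = (max_y - pt[1]) / span_y * height
            commands.append(f"{'M' if i == 0 else 'L'}{x:.1f},{y:.1f}")
        parts.append(" ".join(commands) + " Z")
    return " ".join(parts)


# A hypothetical square, just to exercise the function.
square = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
}
path_d = polygon_to_svg_path(square)
```

The resulting string can go in the d attribute of an SVG path element. A real library also handles MultiPolygons, projections, and styling, which this sketch skips.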

Data processing workflows

journalize, a JavaScript library for formatting some things in the AP Style way.
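As an illustration of the kind of formatting rule such a library handles, here's a small Python sketch of one AP Style convention: spell out whole numbers below 10 and use figures, with commas, for larger ones. The function name ap_number is hypothetical and this is not journalize's API, just a sketch of the idea.

```python
AP_WORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def ap_number(value):
    """Spell out whole numbers from 0 to 9; otherwise use figures with commas."""
    if isinstance(value, int) and 0 <= value < 10:
        return AP_WORDS[value]
    return f"{value:,}"


small = ap_number(3)
large = ap_number(12000)
```

Real AP Style has plenty of exceptions (ages, percentages, dates, and so on), which is exactly why it's nice to have a shared library instead of everyone rewriting these rules.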

The Python libraries csvkit, agate, and dedupe.

ai2html turns Adobe Illustrator files into HTML and CSS.

Elex is a wrapper for the Associated Press elections API, version 2. For the fuller explanation of why it's significant, read the Source and Poynter. The short version is that the AP elections API is something that everyone uses, and everyone has to write code to handle, so they decided to team up to save time and money, and to "end the elections arms race":

“End the elections arms race” has become a rallying cry in American data journalism. Many newsrooms spend tremendous resources writing code to simply load and parse election data. It’s time we stopped worrying about the plumbing and started competing on the interesting parts. We decided it was time we put some code against our beliefs – our contribution is a tool we’re calling Elex.

Introducing Elex, a Tool to Make Election Coverage Better for Everyone by Jeremy Bowers and David Eads

That's one of the reasons people and newsrooms make their code open-source.

Scrapers

News Klaxon was built by the Marshall Project, and what it does is watch webpages for changes. And then it lets you know. You can probably wire it up to a horn.
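The core idea of "watch a page and tell me when it changes" can be sketched in a few lines of Python. This is not Klaxon's implementation; the function names are made up for illustration, and fetching the page and sending the alert are left out so the sketch stays self-contained.

```python
import hashlib


def fingerprint(page_text):
    """Return a short, stable fingerprint of a page's text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, page_text):
    """True if the page text no longer matches the stored fingerprint."""
    return fingerprint(page_text) != previous_fingerprint


# Hypothetical snapshots of the same page taken at two different times.
yesterday = "<p>Docket: 3 filings</p>"
today = "<p>Docket: 4 filings</p>"

stored = fingerprint(yesterday)
changed = has_changed(stored, today)
```

In practice you'd usually watch one part of the page and not the whole thing, since ads and timestamps change on every load.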

scotusbot is a New York Times project that posts a Slack message when there's a new filing at the US Supreme Court.

Bots

The LA Times and Politico use a common Django-powered Slackbot for publishing live chats to their website.

Web utilities

The model-view-collection library Backbone.js and its related library Underscore.js are built and maintained by the nigh-ubiquitous document-hosting service DocumentCloud. These were the main MVC libraries powering the web before Angular and React were made.

Pym.js makes it easy to make embedded <iframe> elements actually match the width of the page. This script powers NPR's dailygraphics rig for embedding small interactive graphics in webpages, which you can see in action in a past presentation of mine. And because NPR built that rig, INN built a WordPress plugin to make it easier to use with WordPress. St. Louis Public Radio uses Pym frequently, and archives all their past graphics on GitHub.

oTranscribe helps you transcribe audio.

autoEdit is transcript-based video editing.

CSV to HTML Table is a simple solution to a common problem of turning data into something that people can browse and interact with.
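For a sense of the underlying problem, here's a minimal Python sketch that turns CSV text into a plain HTML table. It is not the CSV to HTML Table project's code (that project adds searching, sorting, and paging in the browser); the function name csv_to_html_table is an assumption for illustration.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Render CSV text as an HTML table, treating the first row as headers."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>", "<thead><tr>"]
    # Escape every cell so stray "<" or "&" characters can't break the page.
    lines += [f"<th>{html.escape(cell)}</th>" for cell in header]
    lines += ["</tr></thead>", "<tbody>"]
    for row in body:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines += ["</tbody>", "</table>"]
    return "\n".join(lines)


# Made-up sample data, just to exercise the function.
sample = "name,count\nSmith & Sons,3\n"
table_html = csv_to_html_table(sample)
```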

The Northwestern University Knight Lab makes Juxtapose, Soundcite, StoryMap, and Timeline, tools for building simple interactives. Here's a simple Timeline instance on The Lantern. They're working on Storyline and Scene, a VR toolkit.

There are a bunch of new transcription services. Ones that people talked about were Temi, Casette, and Otter.

Entire websites

WPGraphQL is a WordPress plugin that implements the GraphQL API query language for WordPress sites. Facebook made GraphQL; WPGraphQL was started by the fine folks at The Denver Post.

SecureDrop, the Tor-based document submission system for whistleblowers and leakers, is maintained by the Freedom of the Press Foundation on GitHub. This project was originally started by Aaron Swartz, and it's used today by at least 38 news organizations.

The Institute for Nonprofit News maintains the WordPress theme Largo for news groups. It's good.

If you've heard of the web framework Django, it was originally developed by the newspaper The Lawrence Journal-World.

Team internal structures

Instead of linking to one single website style guide, I'll link to Styleguides.io, a repository of website style guides. They're useful if you want to see how other people think about how they design their websites.

Want to see how a team is organized? Here are team documents for INN Labs, ProPublica, NPR Visuals, and The Chicago Tribune's former Tribune News Apps team (bonus link).

And this one isn't a tool (and isn't an ad) but we've found Mural useful for sticky-note-style brainstorming exercises with a fully remote team.

SRCCON is a great conference to learn more about newsroom culture and code. They offer session transcripts, which you can find by going to the sessions page, scrolling down to "Sessions from Previous SRCCONs," and then browsing the sessions by year and day.

Why make open-source journalism code?

Journo-developers open-source our code in the hopes that, by doing so, someone else will find our code and find that it is useful to them. It may save them time. It may save us time, down the road, when we're working at a different company, because we won't have to re-implement a template or a data-processing workflow or a shim. It's in the hopes that any of the following can happen:

  1. Someone else will find the code and use it
  2. Someone else will find the code, use it, find a problem, and tell the owner
  3. Someone else will find the code, use it, find a problem, fix it, and contribute that fix back to the larger community
  4. Many someones will continue to find fixes and improve the code until it's a well-built thing that solves a problem or many problems

And the thing is, "someone else" there is a really broad category. Look at the list of things above. Internal newsroom tools have been adopted by other teams within the same newsroom, by other newsrooms, by other organizations, and even people completely unrelated to journalism.

All of this in the service of journalism. What journalism is, and isn't, and whether it is a form of activism, are topics for another blog post that is not this one.

Who funds this sort of exploratory, aspirational coding? Besides the newsrooms that use and write the code, there are grants like the Knight Foundation's technology innovation grants. There are strange things like Bad Idea Factory. (Seriously, go read Biffud's official articles of incorporation.) And there are things that are funded in part by viewers like you.

Really cool #NICAR18 sessions that I think college students would be interested in

Want an intro to JavaScript and the D3.js library, and a simple map to boot? Here's Carla Astudillo's NICAR 2018 session.

If you already have some data analysis experience, check out the First Python Notebook course from the California Civic Data Coalition. If you don't think Python and Jupyter/IPython are your tools of choice, then maybe Observable is more your style; it's JavaScript. There's also R Markdown for the R users in the audience.

Want to learn to build news "applications", aka single-page fancy interactives like Snowfall or The Intercept's Trial and Terror? Follow the coursework at First News App, from IRE and NICAR. NPR maintains a rig for building single-page apps that's built around Google Sheets.

From a strictly technical point of view, the Trial and Terror page by @MoizSyed is a great example of the type of "graphics stack" that is increasing common as a story type in our field.

— Ben Welsh (@palewire) March 30, 2018

It is exactly the sort of thing @emamd, @write_this_way and I and I had in mind when we designed our "First Graphics App" class at #NICAR18.
Want to be like @MoizSyed? Take our free class today online here. firstgraphicsapp.org

— Ben Welsh (@palewire) March 30, 2018

Want to learn machine learning? Here are Peter Aldhous's slides, stories, and code for that.

Build satellite images using freely-available, public-domain Landsat satellite imagery, using Photoshop or a command-line utility.

Resources for project management from a project management consultancy.

If you're looking for web literacy stuff, check out Mozilla's web literacy courses. It's not a NICAR session, but it's still good.

Why might you want to learn this sort of stuff? Scroll down to the jobs section of this post.

A list of resources to be aware of

Now I'm going to talk about things that aren't lessons, that aren't training, that aren't jobs, but are places to go where you can learn of new lessons, training, and jobs. They're also places to meet interesting people with cool ideas.

As much as Twitter is a blue hellsite, Twitter is also the place where journalists go to be social. Twitter-dot-com as an interface is becoming pretty useless, though. I recommend Tweetdeck as a slightly better interface.

Save your clippings using Save My News, which makes sure your stuff is on the Internet Archive's Wayback Machine and on Archive.is.

For articles about how people build projects and code for journalism, read Source. It has articles like How we publish live chats with Slack, which not only has explanations and descriptions, but instructions and code samples. Source is a project of OpenNews, which also runs the fantastic how-we-work conference SRCCON (pronounced "Source Con"). SRCCON has scholarships and volunteer positions available that cut the ticket price and help cover travel and housing. You should apply when they become available. Participation applications open Monday, April 2, and the conference is June 28-29 in Minneapolis. It's a highly participatory conference.

Also, OpenNews in its own right is a great organization. If you have free time on every other Thursday at 12 p.m. Eastern, join the community conference call. It's okay if you don't have anything to say; lurking is encouraged!

If you're looking for a Slack organization to talk all things related to being a nerd in news (for whatever value of 'nerd' you use), join the News Nerdery Slack. (Remember that Slack logs everything and you should not discuss sensitive information there.) There's also Lonely Coders Club. Read the Source article "Lonely Coders, Here's How to Get Things Done". Are you worried about being the only one in your newsroom who does the codes? Read the slides and tipsheet from the relevant NICAR 2018 talk.

For more training, check out JournalismCourses.org, a series of MOOCs organized by the Knight Center for Journalism in the Americas.

Are you a member of the Society of Professional Journalists? Why not? Rip a copy of the code of ethics out of the back of the magazine, buy a cheap frame, and hang it on your wall.

Are you an investigative reporter? Sign up for membership in Investigative Reporters and Editors, and not just because membership is required for their flagship IRE and Computer-Assisted Reporting conferences. Beyond NICAR (pronounced "nigh-car"; technically, CAR is the name of the conference and NICAR is its parent :p), IRE provides tipsheets, curated databases, and a boatload of training events and documentation. NICAR was in March; the 2018 IRE conference will be in Orlando June 14-17 and does have a mentorship program.

I've already linked to Chrys Wu's roundup of links and tips from NICAR 2018, but there's also the Official IRE link list.

Do you do design stuff? Check out the Society for News Design. See also the Asian American Journalists Association, the National Association of Black Journalists, the National Association of Hispanic Journalists, the Native American Journalists Association, and the National Lesbian and Gay Journalists Association. ProPublica will pay your way to those conferences.

Do you like WordPress? Join WordCamp for Publishers in Chicago this summer.

News nerd blogs to follow, which occasionally have interesting open-source code releases:

Career advice

I missed this session at NICAR, but the Early career straight-talk panel is something to read.

Sara Hendren's list of things to be aware of is good. I'm going to quote the bullet points here:

  1. It's okay to factor a relationship into your professional plans.
  2. Making decisions can actually be a way to find yourself—not the other way around.
  3. What if everything you’re doing right now is exactly what you need to be doing?
  4. If you’re really stuck but in fortunate possession of a job and health insurance….
  5. Here is the hardest thing for many people about adulthood: Staying awake.

If you're kinda weirded out by the cultural aspects of "drinking to forget" and "three-martini lunches" and the whole drinking culture, you're not alone. Heavy drinking doesn't have to be a cultural norm, and we can change it. Rachel Alexander, the same person I quoted at the start of this post, gave a lightning talk at NICAR entitled "real talk: alcohol and journalism" and posted online the slides and a rough transcript.

Millie Tran's WHAT AM I GOING TO DO WITH MY LIFE slide deck is good.

A list of currently-open paid internships and apprenticeships, and jobs

This list is not complete! If you'd like to stay up-to-date with news nerdery jobs, I recommend:

(Beware newsrooms that are owned by trust funds and other financially-extractive entities. You might not have a job there in 5 years, if they decide to downsize your newspaper for profit.)

Politico is hiring a front-end dev/designer and a developer.

VT Digger is hiring a full-stack dev.

The Washington Post is hiring a summer design intern.

The New York Times is hiring an interactive news developer, and is now accepting visual op-eds.

City Lab is hiring a data visualization journalist.

City Bureau is hiring a developer for their labs team.

Chalkbeat is hiring a director of product.

The University of Nebraska-Lincoln is hiring a data journalist as an assistant professor of practice.

NPR is hiring a devops engineer and systems administrator, a dev manager, a news apps developer, a graphic designer, and a lot more.

USA Today is hiring a senior data reporter.

Updates

This thing has been heavily edited since it first went online on Thursday, March 29, 2018. Typos have been fixed, words have been expanded, and general fever-brain noises have been reduced or expanded upon as appropriate. The majority of this post was drafted with a giddying head cold. It's probably doubled in length since it was first published, and improved in readability, flow, and content.

Open Source and Journalism talk at OSUOSC - March 29, 2018 - Ben Keith