Open Source and Journalism (and NICAR 2016) talk
- What is NICAR?
- First, a word from our sponsors.
- How to get started
- Tools to check out
- Ponderable reads: tipsheets and tutorials
- Things to read
- Notes in general
- Notes for student journos
- Cool people
- Cool organizations
- Conferences and Training to go to
- Conclusion
This post is a long collection of tools, resources, and tips that I saw at NICAR 2016 or in conversations around it. There are also some links in here that are not from NICAR 2016 but seemed relevant. This blog post serves as lecture notes for my March 24, 2016 talk at The Ohio State University Open Source Club.
What is NICAR?
It's a yearly conference about computer-assisted reporting.
Computer-assisted reporting is, as Wikipedia puts it:
the use of computers to gather and analyze the data necessary to write news stories.
In practice, this means a ton of things:
- scraping politicians' twitter feeds
- performing statistical analysis
- keeping a database
- making predictions based on polls
- tracking aircraft
- meteorology
- Census reporting
If this sounds like it's trending into Math, don't worry, we'll cover that.
More generally, NICAR is a bunch of journalists and coders getting together to talk about issues they've faced and how they've overcome them. It's a skillshare. It's showing off code. It's showing off tools. It's a lot of talk about cross-org collaborations.
This talk isn't even a tenth of what was covered at NICAR 2016. If you want everything that was linked to or discussed at NICAR 2016, check out Chrys Wu's megacollection.
This talk is broken into several segments:
- How to get started
- Neat tools and projects
- Ponderable reads
- Awesome people
- Cool organizations
- Events to attend
First, a word from our sponsors.
I'm a news apps developer at the Institute for Nonprofit News. Our dev team does a mix of WordPress and Python coding. INN has covered my cost of attendance at NICAR for two years running. INN is closely associated with IRE, and therefore with DocumentCloud. INN has more than 100 member organizations, who you should check out.
I graduated from OSU in May 2014 with a B.S. in Agricultural Communication and a minor in plant pathology, and a lot of bylines from The Lantern, OSU's student-run newspaper.
These are my own opinions and endorsements, and not those of my employer.
How to get started
A pep talk.
1. Don't be intimidated.
You can do this.
2. Okay, be a little intimidated.
You've never used this before, and the documentation doesn't make a lot of sense.
3. Be very intimidated.
What you're trying to do has literally never been done before.
4. Don't give up.
@benlkeith Start with a project that really grabs you, some problem you have to solve. Motivation + structure for early practice.
— Jessica Chapel (@jnchapel) March 24, 2016
5. Ask questions.
These resources will be mentioned later, but in short:
- Ask the maintainers of the tool.
- Ask other users of the tool.
- Ask your friends and coworkers and family and competitors who have knowledge that might apply.
If someone doesn't know the answer, they still might be able to point you in the right direction. It's very much like interviewing sources to find facts for a story.
If it's not a tool:
- Ask people who might know the answer.
An example: I needed advice on a FOIA request, and wasn't sure what sort of advice I needed. So I asked a few journalists under a frieNDA for their take on the topic. What they said didn't apply exactly to what I was asking for, but it pointed me down the right track.
Tools to check out
Includes things that I want to play with as a result of the conference, and a bunch of questions I want to answer.
Embeddable PDFs
Do you ever need to upload a PDF and make it searchable? Use DocumentCloud! Here's a WaPo example. Here's a direct document link. DocumentCloud is a part of Investigative Reporters and Editors and is funded in part by the University of Missouri. Most of their code is on GitHub.
DocumentCloud is responsible for Backbone.js and Underscore.js.
Don't qualify for DocumentCloud? Use Scribd.
Audio tricks
Uploading audio? Gonna have to host your own, if SoundCloud goes out of business.
If you have audio somewhere, you can do inline audio players with SoundCite, from the Northwestern University Knight Lab. Do what? Let's make the dialup noises inline.
Image things
Want to compare two images? Use JuxtaposeJS from the Knight Lab:
Want to embed video? Probably going to have to use YouTube or Vimeo for that, sorry.
Maps
Google My Maps lets you create maps with placed icons. Here's Google's tutorial. The Lantern uses them to create the campus-area crime maps.
Want to do something even cooler? Use Google Fusion Tables. Be aware that it exposes all of your underlying data, as LoHud.com found out. It's a cautionary tale.
Have a server where you can put files? Seattle Times Map Template uses the power of Node to build you a map using Leaflet.
What's Leaflet? It's a library for mobile-friendly maps. It's really customizable. It's really cool. You can use it for things that aren't maps, too!
How about Knight Lab storymap?
Timelines
Seattle Times built a timeline template, but it's not very well documented. If you use it, please contribute some docs.
There's also TimelineJS from the Knight Lab, which was the first actual interactive I made.
Embedding things at all:
Seattle Times' Responsive Frames
Pym.js, when used on both the embedding page and the child page, can make your frames responsive!
Building presentations
There are a lot of tools out there for building presentations, but if you want something open-source that runs in a browser, check out:
Each has a slightly different approach. Reveal.js is the one you see the most often, most likely. There's a Software-as-a-service deck builder and hosting site run by the creator of reveal.js.
I used impress.js for a presentation once. It was interesting.
Static site generators
A common theme among these generators is that all the code goes in a repository, and all the content lives in a set of Google Sheets spreadsheets.
- Tarbell, a static site generator from the Chicago Tribune News Apps Team and used by orgs like Al Jazeera and the Oregon Register-Guard and the Chicago Tribune
- NPR Visuals' dailygraphics rig, which they use for everything. Charts, graphs, maps, choropleths, tables, gifs. Everything.
- NPR Visuals' app-template, which we used for Power Players and Making Connections. It's meant for heavier lifting.
What I want to know:
- Is it possible to deploy NPR Visuals' app-template to places other than S3?
A note about news apps
Most stuff written by news apps teams these days is written in Python. Generally in Python 2.7.
Why Python, and why Python 2.7?
It's installed by default on all recent Macs. Instant, no-hassle setup.
Windows users, I feel your pain.
Python data analysis and import tools
Here, we've gone a few steps up the difficulty ladder.
- csvkit, which transforms other data formats into CSV. Why comma-separated value files? Because they're really easy to work with.
- agate (formerly journalism.py), for data analysis. Here's an intro to agate by a journalism professor at the University of Nebraska-Lincoln.
- Jupyter notebooks for data analysis are slowly becoming standard. Why? Because they allow you to show your work in a way that readers can follow, and you can comment on your work to explain how it works. Other people can download and rerun your analyses. (Assuming, of course, that your datasets are public.)
- If you have an Associated Press elections API key, elex is a wrapper to make your real-time elections analytics much much nicer. It's a joint project by NPR and the New York Times, and is open source! Associated Press API keys, sadly, are neither open-source nor free. The pricing isn't listed - you have to contact them.
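Here's a minimal sketch of why CSVs are so easy to work with, using only Python's standard library rather than csvkit or agate. The column names and rows are made up for illustration, and it's written for Python 3, not the 2.7 that ships on Macs.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data. In practice you would read a real file with
# open("some_file.csv", newline="") instead of a string.
SAMPLE = """date,offense,location
2016-03-01,theft,High St
2016-03-02,burglary,Lane Ave
2016-03-02,theft,High St
"""


def count_by_column(csv_text, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


counts = count_by_column(SAMPLE, "offense")
for offense, total in counts.most_common():
    print(offense, total)
```

Every row comes back as a dictionary keyed by the header row, so grouping and counting takes a line or two. Tools like csvkit and agate build on the same idea and add the conveniences.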
What else can you do with Python?
- Turn data into sound using the MIDITime library from Reveal and the Center for Investigative Reporting.
Want a static site generator without using Python?
How about the Seattle Times' News App Template? It uses Grunt, which uses Node, which is JavaScript.
Other cool tools
- How does inter-process messaging work? If I have a Twitter bot written in Node.js and a Python Markov chain loaded in memory, how do I get the Node bot to talk to the Python Markov chain? How do I get the Twitter library to talk to a Python IRC bot?
- Tor Messenger, an encrypted OTR client for Twitter DMs, gChat, Facebook Messenger, AIM and others.
- FOIA Machine, an automated FOIA-filing robot that reminds you of common deadlines.
- Signal, an app for Android and iPhone that gives you free encrypted calls and texts around the world. (contingent on data plan, of course.)
- Working with Google Sheets? You can use the function importHTML() to scrape tables, says Thomas Wilburn.
On themes
There's a very common theme here: Here's our code. Use it, because it will save you time. Less time writing code is more time reporting.
Not sure what to do?
Check out the How to train your newsroom presentation slides from Dana Williams & Rachel Schallom. Ask yourself those questions.
Follow through the Your First News App tutorial by Ben Welsh.
Email some journalists. Read their code. Make a GitHub account and use it. Follow developers on Twitter. (Lists of developers below.)
Ponderable reads: tipsheets and tutorials
During the talk, I'm going to skip this section, but you are strongly encouraged to come back here and read it later.
- The ultimate NICAR 2016 link collection, which has just about everything in it
- If you plan to attend a NICAR in the future: a tipsheet from long-term attendees
- 10 tips for students going to NICAR
- Using t and csvkit to follow #nicar16 from the command line: a tutorial.
- A list of every hidden journalism-related social media group that Melody Kramer could find
- Quinn Norton: PGP tends to fail silently. Use other tools. For someone who's not tech-savvy, use Signal, or if you need SMS support, use SMSSecure (Android-only).
- Want to clean your data?
  - Use Python machine learning, by Cathy Deng
  - Use regular expressions, by Christian McDonald
  - Use OpenRefine, by Sarah Cohen
- Julia Smith's Data Viz for All.
- Don't develop in the same browser all the time. Switch things up.
- MaryJo Webster's Data Journalism training materials
- Brent Jones' notes from NICAR 16
- Need to find the database admin of a gov database? Find the System of Record Notice document, then reverse-engineer the email address. - @Erie
- Data security advice for journalists
- Keep your sensitive chats off the record: Not in Slack, not in Gchat, not in iMessage, or anything else. It better be using OTR. Use Tor Messenger.
- Risks and Rewards of Rolling Your Own Criminal Justice Data
- @dancow's regular expressions slides
- vin.place lets you look up car owners by their VIN, or the car by the owners.
- How to install Jupyter Notebook and Agate on your computer, from Matt Waite, a professor who got his pilot license in order to teach drone journalism.
- Contribute to OpenElections - you don't have to be a coder!
- Contribute to the California Civic Data Coalition - you might need to be a coder! Look for the next California Code Convening and contribute then - because there are raffles for prizes!
Things to read
First, watch the NICAR 2016 lightning talks.
- Three ethical frameworks for scraping by Martin Burch.
- Cory Doctorow's Information Doesn't Want to be Free, which was read for News Nerd Book Club's meetup during the Conversations Track. Here are some reading notes on the book and the book club notes.
- sharktop.us
- How The Washington Post is improving longreads
- Hacks/Hackers Colorado recap notes.
- on appropriation
- #nicarreads 2016 book recommendations.
- Caught in the Crossfire, a story about how ProPublica and other news outlets are serving Tor users.
- ProPublica is now accessible over Tor.
- The Above Chart Tinyletter
- Web animation past and future
- Matt Waite's 1997 data journalism handbook
Notes in general
Building better graphics
From Aaron Williams's talk on building better graphics, a strong recommendation to build mobile-first:
- Read the thing.
- Stop building charts/websites/apps that don't work on mobile.
- Death to tooltips. People without mice can't hover over elements. Phone browsers may support the :hover selector, but phone hardware doesn't.
- Build designs that work great on mobile and desktop. You can still build those large 1600-px experiences, but they should be the progressive enhancement from the phone experience. Single-column approaches are great.
- The stuff on first page load is frequently all that a mobile user will see, because of a slow connection or bad reception. He says that these days he's using plain, unenhanced JavaScript for the simple stuff, because it keeps the page load smaller.
- Mobile-first design isn't "paring things down", it's making sure that you are creating designs that are the most effective. It's the nut graf, essentially.
- Scrollytelling is better for phones. Tapping is ☹, especially for transit users. They have one hand to hold their phone and one hand
- Test as much as you can, especially on older hardware. Old Android, old iPhones. If you're training your readers to expect that your site will perform poorly on their device, you will lose readers.
- Find ways to gif your dataviz projects. Use Licecap or other screencasting tools, or turn your video into gifs.
- Make it snackable
On testing
A lot of newsroom execs use iPads. Remember to test your content there, if your exec has or is rumored to have an iPad.
The Digital Dark Arts
Use Twitter to discover other reporters' stories by searching for the "Hi, I'm a reporter at [blank] working at a story on [blank]. DM me or email me at [email] or call me at [number]" tweets. The people being @mentioned are the people you don't want to contact. Explore the @mentioned person's tweets, finding the people who the @mentioned person tweets to. You get an idea of who everyone's listening to. This comes from Nicole Hensley and Mike Tigas.
Slides from the Digital Dark Arts talk:
Should you use other platforms for news content? (From the conversations track)
These are a lot of unattributed opinions.
- If your website is so bad that you're offloading things to other platforms, you need to fix your website. You're losing your ad views.
- But being on other platforms is about being where your readers are.
- We need to be concerned about archiving our content and keeping our permalinks working on our own sites.
- You can't do stuff beyond text-and-photos in the Facebook Instant Articles, and building AMP-compliant interactives is hard.
- AMP/FbIA are already preferred content, over outside links. Your readers prefer them because they're faster, but you lose control.
- "If your business model depends on the whim of an organization that doesn't share your values, you're setting yourself up to fail." (Regarding tweaking things for Facebook's algos, or SEO, or …!)
- Our performance is bad because of ads and analytics. Selling online ads is too expensive for most orgs, because it requires infrastructure. So we use DoubleClick and Outbrain and Taboola. Part of the problem is that they make the page load more slowly. Part of it is that they make your readers' experience more unpleasant.
- Maybe we need more performance budgets in news websites.
- What if we built our own AMP?
- What if news orgs pushed back against the platforms? Against AMP, against Instant Articles, against Taboola and Outbrain and, heck, DoubleClick?
- This pushback would be celebrated like Newsies, but have the plot of Fountainhead or Atlas Shrugged. Fed up with advertising companies' mistreatment of readers, news publishing execs remove ads from their websites to punish the ad companies.
Wireservice!
Wireservice is the new umbrella project for things like csvkit and agate. lookup is a new repo that is a set of reliable lookup tables for finding X from Y. Want examples of how it's useful? Here's how to use it with agate. It gets rid of VLOOKUPs, LEFT JOINs, bespoke scripts. By Christopher Groskopf.
Building a map? Vector tiles.
Only use vector tiles, says Ken Schwencke. Map tiles are prerendered as images, and they're great for saving your reader better data. But rendering and re-rendering tiles takes f o r e v e r. Vector tiles are the same data, but easier to simplify. Here's a blog post by him about it.
What is the goal of Journalism?
Gripping journalism that makes the world a better place is the goal of journalism, right? Thus says Adam Playford.
Make your interactives engaging
From Gregor Aisch, The Future of Interactive News: Clickers or scrollers are just like board books asking people to just keep flipping. Actual interactives are great! Ask your readers questions. Ask them to draw things. Have them press buttons that affect the outcome of the interactive. Here are some examples. But remember that other thing about making things that can be used with only one hand. Remember what areas of the screen are most accessible to thumbs.
But run tests.
If people aren't clicking on your engaging interactive content, don't waste time on it.
Many people prefer scrolling.
Archiving
Build your web interactives with archiving in mind. What happens to them at end-of-life?
- How will you notify readers that it's dead?
- How will you direct readers to newer content?
- How will you reduce functionality? Will you disable search engines? Forms? Database queries?
- How long will you keep static assets around? (Baking dynamic sites to static assets is a great thing to do!)
- Are you donating to The Internet Archive?
Make sure your websites support HTTPS. Here's the ethics reason why. Some tech reasons:
- HTTP/2 is faster than HTTP, and requires HTTPS.
- If you want to use push notifications, the browser's microphone or camera, to use geolocation data, the accelerometer, fullscreen displays, or anything else - browser manufacturers will require you to use HTTPS.
- Commenting systems, analytics, ads, social, most font providers, many embeds: all of this stuff already works over HTTPS. Most CDNs support HTTPS; they just might charge you for it.
What's the cost of HTTPS? Setup time and server config, and training people to be aware of https versus http. But the big one is fixing your archives. Sending the Content-Security-Policy header with the value upgrade-insecure-requests works great for that. Read more about it.
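Here's a rough sketch of what sending that Content-Security-Policy header could look like, written as a small Python WSGI wrapper. The old_article app and the fake_start_response helper are hypothetical stand-ins for illustration; on a real site you would more likely set the header in your web server or CDN config.

```python
def add_upgrade_header(app):
    """Wrap a WSGI app so every response carries the upgrade-insecure-requests policy."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Append the policy header to whatever the wrapped app already set.
            headers = list(headers) + [
                ("Content-Security-Policy", "upgrade-insecure-requests")
            ]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def old_article(environ, start_response):
    """Hypothetical archived page that still references an http:// image."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b'<img src="http://example.com/photo.jpg">']


app = add_upgrade_header(old_article)

# Call the app directly with a stand-in for the server's start_response.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = app({}, fake_start_response)
print(captured["headers"])
```

The archived HTML itself is untouched. The header asks supporting browsers to request those old http:// assets over https instead, which is why it helps with archives you can't easily rewrite.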
Make your visualizations 'snackable': A small image suitable for sharing on social media. This comes back to the interactives thing: It needs to load fast, be simple, and be understandable. Hook them. Give them the strongest image that isn't misleading.
Notes for student journos
Appeal all FOIA decisions. And ask public officials for help, or say you're doing a story on FOIA denials. Run a pre-FOIA to find what you need to FOIA.
Cryptographically sign your emails to techy people, Andy Boyle says. They're more likely to respond. It's the one advantage of PGP.
It is possible to work a coding job in journalism without a computer science degree. Many devs have learned all their tooling on the job and through 'hobby coding'. Experiment when you have time.
Don't be afraid of math. "I'm not good at math" is not a reason to not practice math. You weren't good at other things, and you're certainly good at them now. Read some books. Get out the pencil and practice. Do things over and over and then cover them up and try to remember how to do them by memory. Solve a problem with the books, then cover it up and do it again without the aids. Take mental breaks. Break things up into smaller problems. These are the lessons from Ryann Jones.
Stats made easy: Resampling! It works on abnormal data. It makes things interesting! So says Jonathan Stray. Here's more info.
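To make resampling a little more concrete, here's a minimal sketch of one common flavor, the bootstrap, using only Python's standard library. It isn't taken from Jonathan Stray's materials. The response-time numbers are invented, and the percentile method shown is just one simple way to turn the resamples into an interval.

```python
import random
import statistics


def bootstrap_interval(data, stat=statistics.median, rounds=2000, level=0.95, seed=None):
    """Estimate an interval for a statistic by resampling the data with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each round draws n values with replacement and recomputes the statistic.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(rounds)
    )
    tail = (1 - level) / 2
    lower_index = int(tail * rounds)
    upper_index = int((1 - tail) * rounds) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical, heavily skewed data: days an agency took to answer records requests.
days = [3, 5, 5, 6, 7, 8, 9, 12, 15, 20, 45, 120]

low, high = bootstrap_interval(days, seed=2016)
print("median:", statistics.median(days))
print("95% bootstrap interval for the median:", low, "to", high)
```

Nothing here assumes a bell curve, which is the appeal: the data gets reshuffled thousands of times and the spread of the results shows how much to trust the number.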
Consider automation. You've read the XKCD strip about whether it's a good idea to automate, and how it plays out in practice. But if you're a newsroom, you really should look into building automation tools. Simple things, like alerts about new uploads or updated data. Use IFTTT to bug you when things change. Write parsers to grab data. Don't bother automating things that aren't straightforward. Don't bother with Machine Learning; it takes too long to train things. Automate something if you'll need to do it again, and if it's clear, straightforward, simple, and answers simple questions.
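As a sketch of the "alert me when the data changes" idea, here's one simple approach in Python: keep a fingerprint of the last version you saw and compare it with the current one. The function names are my own for illustration. The downloading step is left out, so the two byte strings below stand in for two fetches of the same hypothetical file.

```python
import hashlib


def fingerprint(content):
    """Return a short, stable summary of some downloaded bytes."""
    return hashlib.sha256(content).hexdigest()


def check_for_update(content, last_seen):
    """Compare new content to the last fingerprint; return (changed, new_fingerprint)."""
    current = fingerprint(content)
    return current != last_seen, current


# Stand-ins for two downloads of the same hypothetical dataset.
monday = b"county,permits\nFranklin,12\n"
tuesday = b"county,permits\nFranklin,12\nDelaware,4\n"

_, seen = check_for_update(monday, last_seen=None)
same_changed, seen = check_for_update(monday, seen)
new_changed, seen = check_for_update(tuesday, seen)

print("Monday again:", same_changed)
print("Tuesday:", new_changed)
```

Run something like this on a schedule, store the fingerprint between runs, and send yourself an email or chat message when the changed flag is true.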
If you have problems making time for something: set goals, make yourself accountable, take it a day at a time, realise there's no good time to start, and be kind to yourself. "I think it's okay to lose the battle but win the war. These side projects are about making you feel good," says Nicole Zhu, who reads 52 books in 52 weeks.
Side projects let you define the metrics of your own success.
Ask the most basic questions, even if they sound naïve. The answers are probably not as simple as you think.
Cool people
A list of Twitter users at INN member organizations
Another list of news nerds and apps teams
Cool organizations
A Twitter list of apps teams near INN.
Conferences and Training to go to
There are a ton of journalism-adjacent conferences and web-development-adjacent conferences. I'm interested in data-driven journalism, design, WordPress and Python.
Society for News Design
Like IRE, SND runs a lot of events.
From an old hand, tips on attending SNDSF 2016.
SNDMakes is a series of news-focused hackathons, which frequently have scholarships available.
SRCCON
SRCCON is run by Knight-Mozilla OpenNews. This year, it's July 28 and 29. You know how I said that NICAR was people talking about how they solved problems? SRCCON is people discussing problems, and thinking about how to solve them. It's a smaller conference, and there are often scholarships for students to attend. If you volunteer, your registration price is reduced.
Write the Docs
Write The Docs, May 22-24 in Portland: a documentation conference. Docs are important if you want anyone to understand your code. Three months from now, you're part of "Anyone". I went in 2015, and learned how to think like my users.
They also have a European meeting.
WordCamp
- WordCamp Columbus and WordCamp US, two WordPress-centric conferences. WordPress catches a lot of flak, but it's also powering a large share of the web for very good reasons.
ProPublica training
- ProPublica is offering need-based scholarships to the NABJ/NAHJ conference this summer.
OpenNews community calls
- OpenNews runs a biweekly community call. They're interesting to listen in on: who's working on what, and some short interviews with interesting folks.
SPJ/NECIR training
- SPJ is offering a grant for investigative journalism training in Columbus and Portland.
This is not a complete list.
There are a lot of opportunities out there.
Conclusion
Do you want to be a news dev? Are you already a news dev?
Check out:
- the News Nerdery Slack
- the Lonely Coders Slacks
- the NICAR-L listserv
- the very quiet #newsnerdery channel on Freenode IRC
- /r/newsapps
- that list of every hidden journalism-related social media group that Melody Kramer could find
Here's my 2015 recap.