
NICAR 2018 notes: Archiving data journalism

  1. The commandments as presented
  2. Commentary and notes
    1. Don't expect someone else to clean up your mess
    2. Publish static files
    3. Don't depend on rando external links
    4. Version thy templates
    5. Use existing archives as a platform
  3. Follow-up: it's a CMS problem
  4. An aside on scheduling archiving sessions
  5. Shoutouts

One of the last panels at NICAR 2018 was Archiving Data Journalism, with Katherine Boss, Meredith Broussard, Nora Paul, and Ben Welsh. In that session, Ben Welsh proposed a number of commandments for archiving news.

The commandments as presented

John Muyskens transcribed them as follows:

  1. Don't expect someone else to clean up your mess
  2. Publish static files
  3. Don't depend on rando external links
  4. Version thy templates
  5. Use existing archives as a platform

Brent Jones' notes have a similar version:

  1. Thou shalt not make a mess and expect someone else to clean it up for you
  2. Thou shalt publish as static files immediately or eventually
  3. Thou shalt not depend on rando links
  4. Thou shalt version your CSS and base templates
  5. Thou shalt see the big archives as a platform

My own notes on the session were very similar to Brent's, so we were probably copying off the slides:

  1. Thou shalt not make a mess and expect someone else to fix it for you
  2. Thou shalt publish as static files immediately
  3. Thou shalt not depend on rando links
  4. Thou shalt version your CSS and base templates
  5. Thou shalt see the big archives as a platform

Commentary and notes

Some of this commentary is mine; most of it is probably from discussion during the session. My notes don't record which is which.

1. Don't expect someone else to clean up your mess

Thou shalt not make a mess and expect someone else to fix it for you

Don't expect someone else to tell the Internet Archive to save your work. Don't expect that the Internet Archive will save it. Don't expect the Internet Archive to exist in perpetuity. (Do donate to the Internet Archive.)

If you want to make sure that something gets saved in an archive, save it there yourself. Ben Welsh suggested that it might be a good idea to have CMS plugins that automatically notify archives when content is published, republished, and updated.
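
A minimal sketch of what such a publish hook might do, assuming the Wayback Machine's Save Page Now endpoint (`https://web.archive.org/save/` followed by the page URL). The function names, the User-Agent string, and the injectable `opener` parameter are hypothetical, made up for illustration. A real CMS plugin would also need retries, rate limiting, and error handling.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the Wayback Machine's "Save Page Now" endpoint accepts
# a GET request to this prefix followed by the URL to capture.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_request(page_url):
    """Build (but do not send) a request asking the archive to capture page_url."""
    # Leave common URL punctuation alone; escape characters such as spaces.
    safe_url = quote(page_url, safe=":/?&=%")
    return Request(
        SAVE_PREFIX + safe_url,
        headers={"User-Agent": "newsroom-archive-hook"},  # hypothetical name
    )


def notify_archive(page_url, opener=urlopen):
    """Hypothetical hook to call on publish, republish, and update.

    `opener` is injectable so the hook can be exercised without network access.
    """
    with opener(build_save_request(page_url)) as response:
        return response.status
```

The CMS would call `notify_archive(url)` from whatever event fires when a story is published or updated. Splitting out `build_save_request` keeps the URL logic testable on its own.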

2. Publish static files

Thou shalt publish as static files immediately

Or at least eventually. If your interactive/app must be published dynamically, have plans for how to bake it out as a static site after its initial swell in readership has diminished. Know when you'll make that transition — a set date or pageviews-per-day level — and have a clear procedure for doing so.
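
One way to make that transition rule explicit is to write it down as code at launch. This is only a sketch: the function and parameter names are hypothetical, and the thresholds are whatever your team agrees on.

```python
from datetime import date


def should_bake_out(today, bake_date=None, daily_pageviews=None, pageview_floor=None):
    """Return True once either agreed-upon trigger has been reached.

    Hypothetical triggers, set when the project launches:
    - bake_date: a fixed date after which the app goes static
    - pageview_floor: go static once daily pageviews drop below this level
    """
    past_date = bake_date is not None and today >= bake_date
    below_floor = (
        daily_pageviews is not None
        and pageview_floor is not None
        and daily_pageviews < pageview_floor
    )
    return past_date or below_floor
```

For example, `should_bake_out(date(2018, 6, 1), bake_date=date(2018, 5, 1))` returns `True`, because the set date has passed.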

The Internet Archive is not searchable by keyword, only by date. If you have to bake out an entire news app, it's a bajillion files that can link to each other, but the mechanisms for navigating those bajillion files need to be present from the beginning. Plan for how your app will function when it's not dynamic.
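
As a rough illustration of a navigation mechanism that survives baking, here is a sketch that writes a plain index page linking to every baked HTML file. The function name and the flat list layout are my assumptions. A real app would more likely want browsable category pages than one giant list.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index(site_dir):
    """Write index.html linking to every other .html file under site_dir.

    Returns the list of relative page paths that were linked.
    """
    root = Path(site_dir)
    pages = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.html")
        if p != root / "index.html"  # don't link the index to itself
    )
    items = "\n".join(
        f'<li><a href="{escape(quote(page), quote=True)}">{escape(page)}</a></li>'
        for page in pages
    )
    html = f"<!doctype html>\n<title>Index</title>\n<ul>\n{items}\n</ul>\n"
    (root / "index.html").write_text(html, encoding="utf-8")
    return pages
```

Because the links are relative, the index keeps working wherever the baked files end up, including inside an archive's copy.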

Also, use web standards. Don't use proprietary code. Adobe Flash will be sunset in 2020, but it's already effectively dead. What have we lost? Every "soundslides" presentation, many video players, tons of interactives, half the Homestuck animated content, and certain generations of custom fonts.

3. Don't depend on rando external links

Thou shalt not depend on rando links

Host your own assets, packaged alongside what you're publishing. Don't expect that external images, fonts, JavaScript, CSS, or templates will remain where they are.
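
A quick way to audit a page for this is to list every asset it loads from someone else's server. The sketch below uses only Python's standard library. The class and function names and the `ASSET_ATTRS` table are hypothetical, and the check is deliberately crude: it flags any `link` tag with an off-site `href`, including non-asset ones like `rel="canonical"`.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags and the attribute on each that points at a file the page loads.
# Assumption: this short list is enough for a first pass.
ASSET_ATTRS = {"img": "src", "script": "src", "link": "href", "source": "src"}


class ExternalAssetFinder(HTMLParser):
    """Collect asset URLs whose host differs from the page's own host."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                host = urlparse(value).netloc
                # Relative URLs have no host, so they count as local.
                if host and host != self.own_host:
                    self.external.append(value)


def find_external_assets(html, own_host):
    """Return externally hosted asset URLs in document order."""
    finder = ExternalAssetFinder(own_host)
    finder.feed(html)
    return finder.external
```

Anything this returns is a file you would need to copy into the project before it can be archived as a self-contained package.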

During discussion, I raised a question about JavaScript libraries like Pym.js, which had recently received a security update. If you link out to the library maintainer's CDN-hosted version, you can receive instant security updates across all of your projects that use that library. If you keep a version of the library packaged locally with each graphic, you have to track, version-control, and update all graphics that depend upon it. Security requires fast updates; archiving requires duplication of assets. Is the solution to run your own CDN?

4. Version thy templates

Thou shalt version your CSS and base templates.

Don't let changes to organization-wide CSS and templates break your site. Reference style.1.css, style.2.css, where those filenames are the version of the organization's stylesheet that was active at the time you created the graphic.
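
The session only covered referencing versioned filenames, not how to produce them. One possible approach, sketched here with a hypothetical `snapshot_stylesheet` helper, is to copy the live stylesheet to the next free numbered filename whenever it changes, so older graphics keep pointing at the version they were built against.

```python
import re
import shutil
from pathlib import Path


def snapshot_stylesheet(css_path):
    """Copy e.g. style.css to the next free style.N.css beside it.

    Returns the new filename. Existing numbered copies are never overwritten.
    """
    src = Path(css_path)
    # Matches style.1.css, style.2.css, ... for a source named style.css.
    pattern = re.compile(
        rf"^{re.escape(src.stem)}\.(\d+){re.escape(src.suffix)}$"
    )
    versions = [
        int(m.group(1))
        for p in src.parent.iterdir()
        if (m := pattern.match(p.name))
    ]
    next_version = max(versions, default=0) + 1
    target = src.with_name(f"{src.stem}.{next_version}{src.suffix}")
    shutil.copyfile(src, target)
    return target.name
```

A graphic built today would then reference the returned filename rather than `style.css` itself.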

5. Use existing archives as a platform

Thou shalt see the big archives as a platform.

You don't have to hoard your static files. You can let them be deleted if they're on an archive. However, those archives need help curating and organizing. During the session, Meredith Broussard said that we need digital archivists and librarians back in newsrooms.

Your org should budget for an archivist to help run your archives and coordinate with outside archives.

Follow-up: it's a CMS problem

Ben Welsh tweeted a follow-up to his commandments a month later, in response to a tweet from Jon Keegan:

It's a big deal. My challenge to the speakers: What can the people in the audience actually do to make a difference?

IMHO, the people raising awareness, researching and working on this topic need to formulate an action plan for publishers.

It's not enough to say "Maybe one day the publishers will make this a priority" or see some fantasy business prop. The industry is in turmoil. That day ain't coming.

At #NICAR18, I tried to come up with a list of rules for news nerds, i.e. tech people in newsroom who build custom things but don't control their CMS.

@JohnMuyskens transcribed

But the reality is that custom projects are sliver of digital news. We need to attack this problem at the core: The CMS.

The people that design and operate them need simple, clear instructions on how to do better. And the companies that pay them need incentives.

— Ben Welsh (@palewire) April 5, 2018

An aside on scheduling archiving sessions

This session was at the very end of the conference, on a day when many attendees had already left. Yet if its messages are to be taken seriously, they should be thought of from the very beginning of the conference.

"Build with archiving in mind" should be planned into the design and development phases of every hands-on class.

NICAR 2018 was not alone in scheduling the session late; SRCCON 2017's session on archiving was also at the end of the conference. NICAR 2017's archiving talk wasn't at the end of the conference (10 a.m. on the second day), and SRCCON 2016's talk on surfacing archival content was in the afternoon of the first day.


Shoutouts

Thanks to NICAR for facilitating this conversation and to John Muyskens and Brent Jones for taking notes.

Thanks to Jeremy Singer-Vine for the nicar-2017-schedule repository, and to OpenNews for solid SRCCON archives.

NICAR 2018 notes: Archiving data journalism - April 6, 2018 - Ben Keith