Newspaper = Data Warehouse: A Different Sort Of Journalism

Posted on Fri 08 September 2006 in Dispatches

Adrian Holovaty, one of the creators of Django (a framework you will read my raves about again and again), has posted a wonderful meditation on how online newspapers need to change. He builds his argument around one central point:

One of those important shifts is: Newspapers need to stop the story-centric worldview.


Holovaty explains succinctly that journalists gather structured data every day. Instead of focusing on composing this information into a static story, they should be storing it in a machine-readable format that allows it to be reused again and again for a variety of purposes. He also gives concrete examples of why this would be such a powerful and useful change.

For example, say a newspaper has written a story about a local fire. Being able to read that story on a cell phone is fine and dandy. Hooray, technology! But what I really want to be able to do is explore the raw facts of that story, one by one, with layers of attribution, and an infrastructure for comparing the details of the fire – date, time, place, victims, fire station number, distance from fire department, names and years experience of firemen on the scene, time it took for firemen to arrive – with the details of previous fires. And subsequent fires, whenever they happen.

That’s what I mean by structured data: information with attributes that are consistent across a domain. Every fire has those attributes, just as every reported crime has many attributes, just as every college basketball game has many attributes.
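To make the idea of "attributes that are consistent across a domain" concrete, here is a minimal Python sketch of a fire stored as a record rather than as prose. The `FireReport` class, its field names, and the `compare_response` helper are all hypothetical and cover only a subset of the attributes Holovaty lists. The point is simply that once every fire shares the same fields, comparing a new fire against earlier ones takes a few lines of code instead of a reporter rereading old stories.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class FireReport:
    """One fire, recorded with the same attributes every time (hypothetical schema)."""
    occurred_at: datetime            # date and time of the fire
    location: str                    # place
    victims: int
    station_number: int              # responding fire station
    distance_from_station_km: float
    response_minutes: float          # time it took firefighters to arrive


def compare_response(new: FireReport, history: list[FireReport]) -> float:
    """Return how much slower (positive) or faster (negative) the response to
    `new` was, in minutes, than the average of earlier fires at the same station."""
    earlier = [
        f.response_minutes
        for f in history
        if f.station_number == new.station_number and f.occurred_at < new.occurred_at
    ]
    if not earlier:
        raise ValueError("no earlier fires recorded for this station")
    return new.response_minutes - mean(earlier)


# Illustrative, made-up values; not real fire reports.
history = [
    FireReport(datetime(2006, 3, 2, 14, 5), "Location A", 0, 7, 1.2, 6.0),
    FireReport(datetime(2006, 6, 18, 2, 40), "Location B", 1, 7, 2.5, 8.0),
]
latest = FireReport(datetime(2006, 9, 7, 23, 15), "Location C", 0, 7, 3.0, 9.5)

print(compare_response(latest, history))  # 2.5 minutes slower than the station's average
```

The same records could just as easily answer the other questions Holovaty raises (fires by location, by distance from the station, and so on), which is exactly what a finished story cannot do.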


If you read his post, you’ll see that Holovaty is mainly focused on the ability of newspapers to “repurpose” their data in order to rapidly develop new and powerful features for their own services. What excites me about this idea, though, is the potential for selling that kind of information. Let’s say newspapers take his suggestion and begin specializing in what they do best: the rapid collection of structured data. The journalists will still produce stories (they remain necessary, as Holovaty explains), but the organization’s focus becomes filling its servers with as much granular data about each event as possible. What if the newspapers then provided an API that other applications and organizations could use to access that raw data?

I’m not saying that they should give it away for free; quite the opposite. Because it can be reused, that information in its structured form is worth far more than the articles it generates. Charge organizations a subscription fee for direct access to the data, and those subscribers can then use the API to develop powerful products and services that the news-gathering organization does not have the resources to pursue. Rather than focusing the business solely on the final presentation of news (although there will still be plenty of that), share that focus with the aggregation of the source data, and then serve as a supplier of that information to other vendors. The possibilities this would open up for development are staggering, and I suspect that data-subscriber revenue for newspapers would be substantial.

As our demand for online services increases, I suspect this type of model will become absolutely necessary, and thus inevitable. If the newspapers don’t go for it, some other business will rise to fill the same role. Newspapers have an advantage, though: each one is well suited to provide highly specialized data about its own location. In fact, most of those external developers will likely need to subscribe to several companies’ data streams to get the full range of information needed to serve a diverse online user base.

Don’t get me wrong: the role and importance of traditional journalism, which provides analysis and organizes that data into a meaningful story, will always remain. However, if newspapers are to survive this New Media transition, I think it will be essential for them to pursue this as a parallel business model. With today’s emphasis on rapid development, the early adopters are likely to net a large number of start-ups as subscribers, and I for one think the sooner they get started the better. :)