December 2011. In September, BBC News headlined a story
"Supercomputer Predicts Revolution."
[1]
On his Web site, Kalev H. Leetaru,
the University of Illinois researcher behind this claim,
describes his work with equal modesty: he says
that he "forecast the Arab Spring."
[2]
He used software to sweep through English-language news
archives, counting words of negative emotional tone and plotting
the time series of a metric derived from their
frequency of occurrence in
articles about Egypt, Tunisia and Libya. He published some
details of this work in the online journal First Monday.
[3]
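To make the general idea concrete, here is a minimal sketch of a tone metric of this kind. It is not Leetaru's actual method: the word list, the function names, and the choice of a simple monthly average are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical word list for illustration only; the lexicon Leetaru
# actually used is not reproduced here.
NEGATIVE_WORDS = {"crisis", "violence", "protest", "fear", "attack"}


def negative_share(text):
    """Return the fraction of words in text that appear in NEGATIVE_WORDS."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


def monthly_tone(articles):
    """Average the negative share of articles month by month.

    articles: iterable of (month, text) pairs, with month as "YYYY-MM".
    Returns a dict mapping month to mean negative share, in month order.
    """
    by_month = defaultdict(list)
    for month, text in articles:
        by_month[month].append(negative_share(text))
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}
```

Plotting the values returned by `monthly_tone` against the month would give a time series of the general kind described here, though a real study would need a much larger lexicon and some normalization.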
The principle he seeks to demonstrate is that
this type of analysis, which he calls "sentiment mining,"
can predict political conflict. Here's his graph
for Egypt:
[4]
It's arguable that this plot doesn't usefully forecast anything. After all, the final negative spike that the author claims as a predictor of the Egyptian uprising occurs only about two weeks ahead of the event.
He also applies the same technique to show that the
trend in the tone of the news media overall has been toward
the negative. One data set he uses is the entire contents
of the New York Times from 1945 to 2005, whose negativity
score he has plotted in the following graph:
[5]
As it happens, I have some data of my own that has a direct bearing on the interpretation of this graph. I gathered it years ago, in the pre-Internet days when the only way to acquire time series data like this was to read microfilm of archived newspapers on a hand-cranked viewer in a library.
At the time, I was studying a change in the New York Times's
format that took place around 1970. Among other changes,
the number of articles per issue declined,
while the number of pages in the paper remained
approximately the same. I plotted the number of articles
in the Times's front
section, per week (specifically, Monday through Saturday),
at various points in the years during which this transition
seemed to be taking place
(horizontal axis numbers refer to years 1963-73):
When this graph is rescaled
so that it can be aligned
with Leetaru's on the time axis, the combined plot shows that the
decline in story
count, occurring around 1969, coincides with a
sort of phase change in his metric:
Before 1969, his metric varies widely around a positive value with a hint of cyclicity. After 1969 the short-term variance narrows around a trend that seems almost to be controlled by a steady hand, descending to a minimum in 1973 and then rising linearly to a kind of plateau beginning around 1978, interrupted only by a negative spike in 1990-91 and then a prolonged depression starting in 2001.
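One rough way to check a visual impression like this would be to compare the short-term variability of the series on either side of a chosen break year. The sketch below is only an illustration: the function names are hypothetical, and using the standard deviation of year-to-year differences is my own assumption about a reasonable gauge, not anything taken from Leetaru's paper.

```python
from statistics import pstdev


def short_term_spread(values):
    """Standard deviation of successive differences.

    A steady linear trend contributes nothing to this figure, so it
    reflects short-term jumpiness rather than long-term drift.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    return pstdev(diffs)


def compare_around(series, break_year):
    """Split a {year: value} series at break_year and gauge each side.

    Returns (spread_before, spread_after). Each side needs at least
    two data points.
    """
    ordered = sorted(series.items())
    before = [v for y, v in ordered if y < break_year]
    after = [v for y, v in ordered if y >= break_year]
    return short_term_spread(before), short_term_spread(after)
```

A much smaller second value than first would be consistent with the narrowing described above, although it would not by itself say what caused it.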
In 1969-70 the Times moved toward fewer, longer articles dealing with issues in depth. Inevitably, the subject matter of these articles involved problems and conflict, presumably increasing the usage of words of negative tone. Therefore, the peculiar, structured shape of this graph might be the outcome of a change in editorial focus that can be isolated to a short time span between 1969 and 1971.
The exact nature of Leetaru's metric isn't spelled out in his paper or in his cited references. But when the metric is applied to recent news about Egypt, interpretation is easy: the country is headed for a rough patch, as seen through the eyes of the English-speaking press. On the other hand, when it's applied to the entire contents of the New York Times, it doesn't just mean that the Times is trending toward a more negative tone. Instead, it reveals something more subtle and interesting, something that begs for a more fine-grained analysis.
Charles Packer mailbox@cpacker.org