A Journey to Germany … And Thoughts on Crowdsourced Data

Another week is flying by here in Cambridge: Persian classes, meetings with professors and other Harvard affiliates, lunch and dinner seminars, and too many good books that demand to be read yet lie forlorn on nightstands and desks all around. It’s all good, however: from waking up early every day to slip into my apartment building’s fitness room or stroll along Memorial Drive to use one of Harvard’s pool facilities, to walking to the Harvard Academy offices and pilfering some of the Government Department’s free coffee, to losing myself in writing portions of the book manuscript, to imagining cartographical phantasmagoria to accompany the body text of the book, there’s a ton going on – and I love it. Yet with an Indian summer turning Cambridge into a humidity trap this week, I’m glad to be escaping to Germany for a long weekend to attend a conference on Cold War history, where I’ll have the chance to unwind, enjoy some cooler weather, and engage with German, Russian, and other international scholars on some of my thoughts for how to re-write the chapter of the book manuscript that deals with the intersection of several themes: the role of Soviet scholars of Afghanistan in the country in the 1980s; Party advisers in the south (mostly Qandahar Province) during the same period; and, very tentatively, how to integrate the story of this Soviet developmental / social science dispositif towards Afghanistan with the story of how Western agencies took on a greater role in administering humanitarian aid in the same region towards the late 1980s. I write, for example, having just dashed back from Widener’s Microform salon, where I was copying reports from the United Nations’ “Operation Salam” program – a UN-administered initiative to coordinate aid among different UN agencies towards Afghans in Pakistan, but also to begin to implement aid projects in Afghanistan itself from 1989 onwards.
One of the challenges facing the historian looking at all of this material is how to write the history of how different aid apparatuses – first the Soviet, then the UN (acting in conjunction with non-UN Western aid groups) – built a layer cake of developmental styles on southern and eastern Afghanistan in particular in the late 1980s. Here’s hoping that some of the conference participants will have ideas as I struggle to push through the jet lag to deliver my talk …

Off to Deutschland!

In the meantime, the Digital History seminar, which is one of the few courses I’m taking during this first semester, continues to go well. A first week introduced us to such note-taking tools as Diigo, Zotero, and Evernote; I remain skeptical of at least some of these tools – Zotero integrates poorly with Scrivener, my writing tool of choice – but Diigo is quite useful for highlighting texts one has found online, and keeping them in one’s brain in a more organized fashion than the old, trusted search through one’s Chrome browser history. This week, in addition to some more theoretical reading on what DH (as the Digital Humanities are often abbreviated) actually is beyond “just” tools, we’ve been introduced to some new models and new tools to help stimulate our imaginations as we look forward to designing our own projects as the semester progresses. (Hence some of the rushing through Harvard’s map collections and Inter-Library Loan for obscure statistical sets churned out by the USSR or the UN about Afghanistan in the 1980s … hopefully this all leads towards brave new mapping projects.) TweetMap, an instance of a larger MIT-run project called MapD, is one such example. In the words of their Facebook pitch,

Map-D is a new approach to querying and visualizing big data. At Map-D we build massively parallel databases that process spatial and geographic information system data as well relational data in milliseconds, performing up to 70 times faster than CPU-based solutions. Map-D’s unique database architecture creates a vertically-integrated end-to-end solution for data querying, visualization and analysis. It uses the immense computational power of next generation graphics processing units (GPUs) as the backbone of a data processing engine that marries traditional relational database management systems with advanced analytic and visualization features.

What this practically means is that one can conduct quick searches of huge amounts of data (in the case of TweetMap, Twitter posts over the last month or so) and overlay them onto maps. One might think of it as a more sophisticated and, obviously, chronologically abbreviated version of the Google nGram charts that increasingly accompany scholarship these days. For fun, for example, I did a search for all tweets mentioning “In N Out,” the beloved California burger chain that is famous not only for its great food but also for its true regional character: In-N-Out’s corporate management doesn’t franchise stores, and the restaurants don’t use freezers (out of a commitment to delivering fresh food), so all restaurants have to be within reasonable distance of one of two distribution centers, in southern California or Texas. Check out the TweetMap (depicted below): the geographical reach of In-N-Out more or less matches this story. This example is just for fun, but the power of the idea should be clear (again, at least for people working on relatively recent topics, or interested in short-term phenomena – even the team of engineers at MIT appears not to have yet found a way to trawl through the truly epic total database of Tweets). Nor are the ramifications of something like this linked just to scholarship. During the 2009 protests in Iran, for example, many commentators – few of them proficient in Persian or scholars of Iran – wet themselves over how this was the first example of a “Twitter Revolution,” even though less than 1% or so of Iranians used Twitter. Use of tools like TweetMap might help distinguish between how, for example, the emigre communities of Iranians (or, to take the obvious present-day example, Egyptians or Syrians …) feel about political turmoil in their country and how people inside the country itself do.

The In N’Out Empire: one of several reasons why California > the East Coast, now documented through the digital humanities.

Beyond being introduced to such tools, however, we’ve also been invited to reflect on the use of crowdsourced data in historical scholarship this week. Some context for the reader new to these things: in recent years, several libraries and institutions have launched massively participatory online repositories devoted to major events – 9/11 or the Japanese tsunami – that invite readers to share their data (photographs, video, personal testimony …) about these events. The attraction is clear: not only do ordinary Americans (or Japanese) get to participate in a visible way in writing history – a shift away from the top-down approach of which one might accuse museums or history as done in universities – but professional historians themselves also gain access to a humongous new corpus of social history of a kind that they might formerly have been able to glean only from oblique readings of other sources. We know how theologians and philosophers (Voltaire, for example) reacted to huge natural disasters like the Lisbon earthquake, but what about the reactions of the hoi polloi? That was more difficult to recover for past eras, but thanks to these archives – culled from voluntary submissions and subsequently tagged, collated, and curated by readers – we can learn more about contemporary reactions to these events. The uses of such crowdsourced data for events that took place recently are already suggestive. Some linguists, for example, have used “deep reading” techniques to trawl the thousands of messages people wrote on 9/11 to examine changes in American English dialect; the phrase “OMG” was only just permeating teenage English discourse then, some argue. But beyond these more technical, apolitical exercises in scholarship, such databases might also allow Americans – professional historians and citizens alike – to better understand the mentality of the country at a rare moment when everyone seemed united and the partisan divisions so bemoaned today seemed to disappear.
Understanding that moment might be especially useful for contemporary political debates, too. Had scholars made more vocal use of the stories on the 9/11 database to show how benign the attitude of many New Yorkers was towards American Muslims prior to, or even shortly after, 9/11 (as opposed to attitudes that were inflamed in the buildup to the 2003 War in Iraq), it might have been harder for opponents of the so-called “Ground Zero Mosque” – which was neither at Ground Zero nor a mosque – to demagogue the issue as being “too sensitive” for most New Yorkers. On the contrary, a closer reading of the rich social topography that the September 11 Digital Archive could provide might even show the specific ways in which American Muslims living in Manhattan precisely lacked for a safe space, a space of refuge, in Lower Manhattan that day. Politics aside, however, the broader promise is rich indeed: denser, more personal, more interactive social history than the kind our mentors practiced.

However sincerely felt at the time, such sentiments about the ‘sensitivity’ of the construction of Park51 may have reflected more a constructed public memory of 9/11 than how the survivors, and victims’ families, felt at the time. Could crowdsourced digital history help with a reckoning?

What are some of the limits to these kinds of projects? How does their existence change the way I think about my own work? One uncontroversial point that I’d make is that such collections have to be just one wrench in the toolbox of the historian. One of the methodological impulses that I’ve picked up – for better or for worse – from spending time with international historians is the perceived need to cross-examine sources between different archives. Particularly for major crises (the Cuban Missile Crisis, Able Archer 83, etc.) it’s important to cross-examine sources from Washington, Moscow, Beijing, Havana, etc., to understand how decisions are made under pressure and what systematic distortions of information policymakers faced in each capital. Even though my work isn’t quite so traditionally diplomatic, I do the same thing in Developing Powers: Soviet, American, and West German economists were in Kabul at the same time, learning from one another, following one another. Combining these accounts – drawn from painstaking research in Koblenz, Moscow, Washington, and many other places – helps provide enough layers to understand this cross-talk on a more micro level. What’s lacking even in that micro account, however, is what “ordinary people” felt – and it’s there that crowdsourced collections have a huge role to play, I think: not to write the umpteenth book that draws almost exclusively on these new equivalents of the Mass-Observation movement, but to layer these “broad” sources in with the “deep” readings that come from sustained attention to other collections (often drawn from traditional archives).

Crowdsourced resources like this Pashto-language Radio Azadi fan mail from a group of prisoners in Paktia (a province in eastern Afghanistan) dazzle, but they also raise questions about how much DH has to remain subordinated to a traditional graduate training if it aspires to escape its Anglophone roots.

To make the discussion more concrete, I often think that if someone were to write the equivalent of what my project is for the current American occupation of Afghanistan, ideally, they’d want to do it not just using U.S. official records – DoD, State, the Bush and Obama (and 2016 winner’s … ?) White Houses – but also similarly crowdsourced collections like the terrific Voices From Afghanistan project put together by the Library of Congress and Voice of America. Compared to the primarily online collection method of the September 11th Digital Archive or the Japanese tsunami collection, the way that the LOC and Radio Azadi (“Radio Liberty”) Afghanistan received their crowdsourced materials – 15,000 letters and compositions in Dari and Pashto, many of which are lavishly illuminated by hand – blows the mind. Following announcements and requests by Radio Azadi (which was widely heard around the country following the collapse of the Taliban regime) for listeners to engage with the station, listeners responded in droves. As the press announcement for the original exhibit explains,

For the past eight years, from every corner of Afghanistan, letters from Radio Azadi’s listeners have made their way to RFE/RL’s headquarters in Prague. Merchants, clerics, farmers, university students and schoolchildren from cities and rural villages have sent “fan mail” and other correspondence to the station. Letters sent from large cities are placed into regular postal bins while those from remote regions — including the country’s inaccessible tribal areas — are loaded onto horses and pack animals. Through nearly impassable mountainous terrain, they come to one of RFE/RL’s 11 post-office boxes around the country or to a drop box in Peshawar, Pakistan. Radio Azadi journalists collect the letters and drive them to RFE/RL’s Kabul bureau, where they are eventually loaded onto planes bound for RFE/RL’s headquarters in Prague, Czech Republic.

Now that’s what I call crowdsourcing – donkeys may not even be Web 1.0, but still impressive. Scarcely used by scholars (to my knowledge), the collection would seem to be crying out for attention from DH methods. Yet here one runs into the problem of doing the real legwork of scholarship, as well as into the problem of DH still, largely, being an Anglophone game. Because so many of the sources are handwritten in elaborate Persian script, text recognition is bound to be more difficult. Many of the compositions that listeners sent in were in the form of accordion books – a form where reader engagement takes place differently from just flipping the page. Most obviously, they’re all in Dari and Pashto – with perhaps some Uzbek in there, too. That means years of language learning for the non-native speaker, plus paleographical training to read handwritten sources, plus an awareness of regional dialect in the extremes that Pashto displays, plus several years of work in the American archives (not to mention those of the UN, of Pakistan, of India …). Running down the list, it’s easy to see why PhDs in the humanities frequently take close to a decade to complete these days. Exploiting resources like these to the fullest means that universities still have to commit to the full package of humanities training, and more: not always an easy proposition these days. DH might enable awesome projects like the one I’ve just imagined, but without a professional path that is more flexible than the one we currently have, it’s unlikely that many female scholars (because of familiar social pressures) or untenured scholars will pursue such a project: even with “deep reading,” the crowdsourced data plus the language component makes it too massive for one person to handle – and yet to get the tenure book out of the way, you need to have been that one person who did handle it. Them’s the breaks for now.
So perhaps I’m not much consolation at the end of this post: crowdsourced collections hold huge promise, but it’s not exactly clear to me how they really change the kind of research that we do, and what really excites us.


