Session 425 ~ Numbers and Letters: Empirical Method in Literary Studies


Reviewed by Sarah Allison

Moderator: James F. English, Univ. of Pennsylvania

Robin Valenza, Univ. of Wisconsin, Madison, “Enumerating and Visualizing Early English Print”
Ted Underwood, Univ. of Illinois, Urbana, “The Imaginative Use of Numbers”
Mark McGurl, Stanford Univ., “Being and Time Management”

As its title suggests, this panel focused on a key problem in empirical method in literary studies: the translation of letters to numbers and back again. Texts unfold one word after another, but data sets don’t. So the panelists asked: What happens to the initial object of study in that transformation, and how does that translation relate to other kinds of literary interpretation? How do you use the digital analysis of texts to learn something that matters? What kinds of concepts or knowledge make this analysis possible?

The transformation of letters to numbers or, as Robin Valenza put it more accurately, of “entities to vectors,” is central to the empirical method on the table here. She offered a methodological overview of approaches to texts, ranging from searching for a keyword (a simple, intuitive unit that lends itself to a return to an initial text) to a “topic,” or list of words that tend to co-occur, which demands interpretation in its own right before it can illuminate a larger corpus. Valenza has degrees in Computer Science, Linguistics and Engineering, and English; her discussion of interdisciplinary teamwork emphasized the importance of producing research publishable in both disciplines. She cited twin pleas by computer scientists for humanists to recognize that (1) exemplars and outliers are both interesting and (2) statistics are not an argument. This second point turns out to be particularly important when we think about empirical methods, because it means that literary scholars (known on Valenza’s research team as “domain experts”) must recognize their part in making highly subjective judgments about the results produced by computer scientists.
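For readers who want to see what “entities to vectors” can look like in practice, here is a minimal sketch in Python. It is not Valenza's pipeline; the two sample sentences, the function names, and the simple tokenizer are all invented for illustration. It shows a keyword search, which points straight back to a text, next to a word-count vector, which discards word order.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def to_vector(text, vocabulary):
    """Represent a text as a list of word counts, one slot per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def keyword_hits(keyword, docs):
    """Return the names of the documents that contain the keyword."""
    return [name for name, text in docs.items() if keyword in tokenize(text)]

# Invented sample texts, not drawn from any corpus discussed at the panel.
docs = {
    "doc_a": "The whale swam. The sea was wide, and the whale was white.",
    "doc_b": "The ship left the harbor for the open sea.",
}

# The shared vocabulary fixes what each position in a vector means.
vocabulary = sorted({word for text in docs.values() for word in tokenize(text)})
vectors = {name: to_vector(text, vocabulary) for name, text in docs.items()}

print(keyword_hits("whale", docs))
print(dict(zip(vocabulary, vectors["doc_a"])))
```

The keyword search returns a document one can go back and read; the vector is only a row of counts. Two texts with the same words in a different order produce the same vector, which is one concrete sense in which texts unfold one word after another but data sets do not.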

Ted Underwood’s “Imaginative Use of Numbers” looked not at the problem of analyzing results or creative accounting (considering “numbers as evidence”), but rather at the problem of how we should count words, that is, which “strategies of discovery” are possible. The methodological focus of this talk was a paper that used topic modeling to scrutinize the print run of PMLA from 1924 to 2006. (The paper under discussion was co-written with Andrew Goldstone of Rutgers and is available online.) Topic modeling is a way of picking up patterns across large bodies of texts (Underwood has published his own definition and explanation). I tend to associate topic modeling with top-secret, government-sponsored email-reading, but this paper revealed its beautiful suitability for cultural discourse analysis. Underwood and Goldstone focused on changing critical approaches to texts in PMLA, and they found that the transitions between critical schools (which one might imagine in terms of rupture) are “smoother than you’d think.” Looking at the occurrence of topics across a wide stretch of texts revealed that it takes time for new critical paradigms to be registered in the way we write about texts.

Mark McGurl introduced himself as a relative newcomer to the field of digital humanities, as he has just begun working with the Stanford Literary Lab. Franco Moretti’s critique of the Litlab’s early work is that concepts too often come last; however, here they came first. McGurl’s discussion of “Being and Time Management” opened with a mesmerizing overview of 20th-century perspectives on temporality before shifting to a discussion of how temporal perspective might connect with narrative style. He sketched out a binary in American style intimately connected with temporal perspective: minimalism and maximalism. (He compared Ernest Hemingway’s commitment to the present moment evident in the very title of In Our Time [1925] with William Faulkner’s obsession with the Southern past in Absalom, Absalom! [1936].) McGurl began by revisiting Robert Cluett’s stylistic analysis of Hemingway’s atypical nonfiction work, Death in the Afternoon (1932), in which Cluett demonstrated that Death in the Afternoon was not only not-very-Hemingway, but was, in fact, typical of books in general at the time. Approximating Cluett’s metrics, the Litlab confirmed this finding and went on to show that early works by Hemingway are characterized by just the features one might expect (unusually short sentences, few dependent clauses, etc.).
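To give a rough sense of how features such as sentence length and dependent clauses can be counted, here is a minimal sketch in Python. It is not Cluett's set of metrics or the Litlab's code: the naive sentence splitter, the short list of subordinating words used as a stand-in for dependent clauses, and the two sample passages (written for this example, not quoted from either novelist) are all assumptions.

```python
import re

# A small, hypothetical list of words that often introduce dependent clauses.
SUBORDINATORS = {"although", "because", "since", "unless", "which", "while", "whereas"}

def split_sentences(text):
    """Split on runs of ., ! or ? (a naive rule that ignores abbreviations)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def split_words(sentence):
    """Return the lowercase word tokens of one sentence."""
    return re.findall(r"[a-z']+", sentence.lower())

def style_profile(text):
    """Compute mean sentence length and subordinators per sentence for a text."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text contains no sentences")
    lengths = [len(split_words(s)) for s in sentences]
    subordinators = sum(
        1 for s in sentences for w in split_words(s) if w in SUBORDINATORS
    )
    return {
        "sentences": len(sentences),
        "mean_sentence_length": sum(lengths) / len(sentences),
        "subordinators_per_sentence": subordinators / len(sentences),
    }

# Invented passages in a terse and a winding manner.
terse = "He sat down. The water was cold. He drank it."
winding = "He sat down because the road, which had been long, was behind him."

print(style_profile(terse))
print(style_profile(winding))
```

Counts like these only become evidence once they are compared against a baseline, such as other books of the same period, which is the kind of comparison the paragraph above attributes to Cluett.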

Moderator James English opened the Q&A by addressing the problem of “what one might expect of Hemingway’s style,” and suggested that we tend to overestimate the importance of surprise in our results. Surely it means something to “confirm” in a more subtle and clearly defined way what readers have long sensed? And surely it’s not so surprising that “literariness” as such becomes important during, rather than before, the canon wars? This, along with the defense of what is surprising about these results, brought us squarely back to Valenza’s point about the highly subjective interpretation of empirical results. The unsurprising-but-sound study is haunted by the specter of confirmation bias: how do we avoid simply finding what we’ve set out to find?

Sarah Allison, Ph.D., is a member of the Stanford Literary Lab and a co-author of its first pamphlet, “Quantitative Formalism,” a study of style and genre, and its forthcoming pamphlet, “Style and the Sentence.”

