Towards an algorithmic criticism: DH methods in application
All criticism is algorithmic.
Ramsay (2011) writes:
“In the classroom one encounters the professor instructing his or her students to turn to page 254, and then to page 16, and finally to page 400. They are told to consider just the male characters, or just the female ones, or to pay attention to the adjectives, the rhyme scheme, images of water, or the moment in which Nora Helmer confronts her husband. The interpreter will set a novel against the background of the Jacobite Rebellion, or a play amid the historical location of the theater. He or she will view the text through the lens of Marxism, or psychoanalysis, or existentialism, or postmodernism. In every case, what is being read is not the ‘original’ text, but a text transformed and transduced into an alternative vision, in which, as Wittgenstein put it, we ‘see an aspect’ that further enables discussion and debate.” (16)
Writing a thesis chapter, I note every reference to Nietzsche, Freud, post-structuralism, or the hermeneutics of suspicion in a Sheila Heti novel, recording the page numbers in pencil on the book’s inside cover, then refer back to that index while writing. Writing a seminar paper, I search an 800-page Dickens novel for various keywords, looking for references to water, drowning, the Thames.
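The kind of keyword search described above can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of any particular tool; the keyword list and sample text are invented stand-ins.

```python
# A minimal sketch of keyword searching: scan a plain text for
# water-related words and record where each occurs. The keyword
# list and sample passage are hypothetical.

KEYWORDS = ["water", "drown", "thames", "river"]

def find_keywords(text, keywords):
    """Return (line_number, line) pairs for lines containing any keyword."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.strip()))
    return hits

sample = "The river was dark.\nNo stars tonight.\nHe feared he would drown."
for number, line in find_keywords(sample, KEYWORDS):
    print(number, line)
```

A full-text search of this kind simply mechanizes the penciled index on the inside cover: it selects and lists the passages an interpreter will then actually read.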
Computers make algorithmic criticism easier to perform, but in doing so they aid an operation readers and interpreters have been performing since the beginning of humanistic study. In Too Much to Know, Ann M. Blair gives a history of note-taking practices spanning centuries, beginning with the advent of print. Eighteenth-century readers habitually copied favorite passages from their reading into commonplace books, which they might organize by theme. Scholars debated whether copying out relevant passages from books was “mechanical” work they should have their servants, amanuenses, apprentices, wives, or children perform, or an integral part of scholarly work. In the sixteenth century, the bibliographer Conrad Gesner recommended indexing books by cutting passages “directly from the printed book,” keeping two copies so that passages could be retrieved from both recto and verso (96). In the seventeenth century, Thomas Harrison devised a “scrinium literatum,” or literary closet, “designed to store slips of paper on hooks associated with commonplace headings that were inscribed alphabetically on little lead plaques”: a sort of spatialized commonplace book (94). The closet had “3,000 headings and a further 300 slots left blank for additions” and could, at least in theory, allow groups of scholars to share passages and notes easily among themselves (94, 101).
When we interpret texts, we interpret fragments of texts that are themselves selected through interpretive processes. We “commonplace” passages into our notes. We copy or physically cut portions of texts. Then we further transform these texts or paratexts, narrowing down the relevant data or rearranging it. Since the time of illuminated manuscripts, books have often been designed to facilitate this type of algorithmic reading or criticism: books (in the eighteenth century, even novels) include indexes; they employ other paratext, such as chapter divisions, headings for each page giving brief plot summaries, lists of scenes, footnotes, and printed marginal notes. Readers also produce their own paratext through written marginalia, notes, commonplace books, and so on.
What if we view the computational tools available to contemporary scholars as an extension of these practices? Ramsay suggests a similar conclusion, describing the texts produced by algorithmic transformations as “paratexts” of the original. These paratexts provide us with new maps of the original text, and like geographic maps, they emphasize particular aspects of their “territories” over others, bringing particular aspects of those territories into focus, opening up new interpretive possibilities.
Computing allows us to perform these operations unusually systematically and quickly, even on a lengthy text or large corpus. Algorithms may also produce paratexts that are productively at odds with the ones human interpreters might normally produce: the “enormous liberating power of the computer,” Ramsay writes, lies not in “the immediate apprehension of knowledge, but instead what the Russian Formalists called ostranenie, the estrangement and defamiliarization of textuality” (3). These operations can allow us to see new and unexpected things in the texts we transform. But algorithmically generated paratexts are only a starting point for the work of the human interpreter.
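One classic algorithmically generated paratext is the keyword-in-context (KWIC) concordance, which rearranges a text around every occurrence of a chosen word. The sketch below is a simplified, hypothetical illustration of the idea, with an invented sample passage; real concordance tools handle tokenization and display far more carefully.

```python
# A hedged sketch of a keyword-in-context (KWIC) concordance: the text
# is transformed into a new arrangement centered on one word, a simple
# example of an algorithmically produced "paratext."

def kwic(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context."""
    words = text.lower().split()
    lines = []
    for i, word in enumerate(words):
        if word.strip(".,;:!?") == keyword:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}")
    return lines

sample = ("The river ran under the bridge. She watched the river "
          "and thought of the river at home.")
for line in kwic(sample, "river"):
    print(line)
```

The resulting list is a map of the original in Ramsay’s sense: it abandons narrative order entirely, estranging the text so that one word’s recurring contexts come into focus.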
Ramsay describes the role computational tools should play in literary criticism this way: “If text analysis is to participate in literary critical endeavour in some manner beyond fact-checking,” it should “assist the critic in the unfolding of interpretative possibilities. We might say that its purpose should be to generate further ‘evidence,’ though we do well to bracket the association that term holds in the context of less methodologically certain pursuits” (10). Computational tools enable us to identify patterns in texts very efficiently: computers can scan large sets of data, often with high degrees of accuracy, and return evidence that can be used to develop original claims or to support existing ones.
The problem with empiricism and falsifiability in the digital humanities
Ramsay identifies a tendency in text analysis in which “the analogy of science” is “being put forth as the highest aspiration of literary study.” (3) He cites a number of scholars who believe that the propositions put forth by literary critics “‘have the technical status of hypothesis, since they have not been confirmed empirically in terms of the data which they propose to describe - literary texts.’” (4)
Scholars working on post-critical methods that challenge the supposed dominance of critique and the “hermeneutics of suspicion” in literary studies have also picked up on the digital humanities as a field that could lend more objectivity to literary study. For example, in “Surface Reading: an Introduction,” Stephen Best and Sharon Marcus write that digital humanities approaches “resonate” with their work on surface reading, a method aimed at “attend[ing] to the surfaces of texts rather than plumb[ing] their depths” (1-2). According to Best and Marcus, digital humanities work, like surface reading, “seeks to occupy” a “space of minimal critical agency.” Digital humanists can attempt to “correct for” their “critical subjectivity, by using machines to bypass it, in the hopes that doing so will produce more accurate knowledge about texts. […] [C]omputers can help us to find features that texts have in common in ways that our brains alone cannot. Computers are weak interpreters but potent describers, anatomizers, taxonomists.” Because of this, they may “bypass the selectivity and evaluative energy that have been considered the hallmarks of good criticism, in order to attain what has almost become taboo in literary studies: objectivity, validity, truth.” This could help to produce knowledge that is both qualitative and empirical, “expand[ing]” the work that literary critics can do (17).
As Ramsay cautions, however, the aim of computational tools in the digital humanities is not to ascertain the empirical “truth” of a critical claim. This is because “literary arguments … do not stand in the same relationship to facts, claims, and evidence as the more empirical forms of inquiry. There is no experiment that can verify the idea that Woolf’s ‘playful formal style’ reformulates subjectivity or that her ‘elision of corporeal materiality’ exceeds the dominant Western subject” (7).
Indeed, the goals of scientific investigation and those of literary criticism are entirely at odds — we do not aspire to or assume that “there is a singular answer (or a singular set of answers) to the question at hand.” (15) Instead, we seek a number of answers and interpretations that can generate further discussion, “not so that a given matter might be definitely settled, but in order that the matter might become richer, deeper, and ever more complicated.” (16)
The fact that computers cannot piece together scraps of evidence to make critical claims does not mean we should not use them. Indeed, “calling computational tools ‘limited’ because they cannot do this makes it sound as if they might one day evolve this capability, but it is not clear that human intelligence can make this determination objectively or consistently.” Creating computers that interpret texts as humans do may not even be possible, let alone desirable: we do not know the mechanisms by which we construct arguments, so we cannot be sure that ability is reproducible, and even if it were, it is doubtful its products could be considered objective. Nor is objectivity the goal: “We read and interpret, and we urge others to accept our readings and interpretations. Were we to strike upon a reading or interpretation so unambiguous as to remove the hermeneutical questions that arise, we would cease to refer to the activity as reading and interpretation” (10).