Counting Things in Texts

*This is one of those posts that probably deserves a fuller version, something I might consider submitting to ProfHacker, but I’m in the middle of a bunch of other work right now, so it’s going to be shorter than I’d like.*

Two recent posts by non-scholars have used two practices[^1] that are emerging as conventions within the digital humanities: the first is counting unique words to get a sense of vocabulary, and the second is counting the number of times characters in a text appear together in scenes.

Matt Daniels counts words in [“The Largest Vocabulary in Hip Hop”][hh], and, thanks to an astute commenter, uncovers that at least one rapper purposefully minimizes his vocabulary in order to maximize sales. An interesting parallel would be to examine the political discourse of various public figures to see how appeals to the oft-lauded common man might be realized in vocabulary.
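For anyone who wants to try this kind of counting, here is a minimal sketch in Python. It is not Daniels’s code: the function name, the tokenizing rule (lowercase runs of letters and apostrophes), and the optional `limit` cap, which is there so that texts of different lengths can be compared on equal footing, are all assumptions of mine.

```python
import re


def unique_word_count(text, limit=None):
    """Count distinct word tokens in `text`.

    Assumptions of this sketch: a "word" is any lowercase run of letters
    and apostrophes, and `limit` (if given) keeps only the first `limit`
    tokens so that long and short texts are compared on the same sample size.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if limit is not None:
        tokens = tokens[:limit]
    return len(set(tokens))


if __name__ == "__main__":
    sample = "The cat and the hat"
    print(unique_word_count(sample))           # distinct words in the whole sample
    print(unique_word_count(sample, limit=2))  # distinct words in the first two tokens
```

A real study would need to decide how to treat spelling variants, contractions, and proper names, all of which this sketch ignores.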

Ben Blatt counts the co-occurrences of characters in scenes in [“Which Friends on Friends Were the Closest Friends?”][ff]. Like Daniels, Blatt is upfront about his method: “To determine which characters shared scenes, I downloaded transcripts of all 236 episodes … If a character spoke a line in a scene, I marked him or her as present.” The results are interesting for those familiar with the show. As my wife noted, few undergraduate students would be familiar with _Friends_, but one could do the same with a program they do know. Perhaps the most famous example of this kind of counting of characters co-occurring in scenes (not to be confused with _Comedians in Cars Getting Coffee_) is Franco Moretti’s [“Network Theory, Plot Analysis”][fm] (see below for conventional reference), wherein he uncovers that the compelling nature of _Hamlet_ may very well be that Hamlet and Claudius are both central characters, with Horatio a close third. I saw Moretti give a version of this paper at the NEH seminar on network studies in the humanities organized by Tim Tangherlini at UCLA’s IPAM in 2010. (Oh, the debt I owe to Tangherlini!)
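The rule Blatt describes (a character who speaks a line in a scene counts as present) is simple enough to sketch in Python. This is my own illustration, not his code: I am assuming the transcripts have already been split into scenes, each a list of `(speaker, line)` pairs, and the function name and the toy dialogue are made up.

```python
from collections import Counter
from itertools import combinations


def count_shared_scenes(scenes):
    """Count how many scenes each pair of characters shares.

    `scenes` is assumed to be a list of scenes, each scene a list of
    (speaker, line) tuples. A character is "present" in a scene if he or
    she speaks at least one line in it.
    """
    pair_counts = Counter()
    for scene in scenes:
        # Each speaker counts once per scene, however many lines they have.
        present = sorted({speaker for speaker, _ in scene})
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


if __name__ == "__main__":
    # Toy data invented for illustration, not taken from any transcript.
    scenes = [
        [("Ross", "Hi."), ("Rachel", "Hey."), ("Ross", "So...")],
        [("Ross", "Hello."), ("Joey", "How you doin'?")],
        [("Rachel", "Morning."), ("Ross", "Morning.")],
    ]
    for pair, n in count_shared_scenes(scenes).most_common():
        print(pair, n)
```

Sorting the speakers before pairing them means that each pair is always stored in the same order, so Ross with Rachel and Rachel with Ross land in one tally. The resulting counts are also the edge weights one would need for a character network of the kind Moretti draws.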


Moretti, Franco. 2011. “Network Theory, Plot Analysis.” _New Left Review_ 68: 80–102.

[^1]: There is actually a name for this, but I can’t think of it at the moment. *Method*? More coffee is needed….

