## Following My Own Advice ##
For years now I have been encouraging students, both beginning and advanced, to keep a journal of their activities as one way of breaking down the barrier to getting writing done. I have especially encouraged graduate students working on their dissertations to try it. And I have done this while being only an intermittent practitioner myself. (I confess that this is in part a side effect of one of the great advantages of having a spouse who practices the same profession: one is free to do much of the daily review over the dinner table. The prêt-à-écouter audience is great, but it disengages one important dimension of the process: writing.)
And so, John Anderson, if you are reading this post, here is me doing what I said, an account of trying my hand at textual analysis.
## The Onus ##
At the end of last year I was invited to participate in an NEH seminar on “Networks and Networking in the Humanities” which will be hosted by UCLA’s Institute for Pure and Applied Mathematics later this summer. Earlier this year the participants received a list of homework assignments: two books to read, a technical paper or two, and the production of an edge list.
The books have been interesting. (More on each one in separate posts.) The technical paper was at the border of my ken, but I followed chunks of it. The production of the edge list, a list of links in a network, has been the hardest task. Of course, part of it was nomenclature. “Edge list” threw me for a loop, new as I am to networkese, but I grokked it with the help of the assigned reading and a variety of web reading. (Thank you, intarwebs.)
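For anyone as new to the term as I was, here is a minimal sketch of an edge list in Python. The node names and the tab-separated layout are invented for illustration; they are assumptions, not anything taken from the seminar materials or from a particular network package.

```python
import csv
import io

# Hypothetical edges: each pair says "these two nodes are linked".
edges = [
    ("teller_a", "tale_1"),
    ("teller_a", "tale_2"),
    ("teller_b", "tale_2"),
]


def write_edge_list(edges, handle):
    """Write one edge per line as tab-separated source/target columns."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for source, target in edges:
        writer.writerow([source, target])


def degree(edges):
    """Count how many edges touch each node."""
    counts = {}
    for source, target in edges:
        counts[source] = counts.get(source, 0) + 1
        counts[target] = counts.get(target, 0) + 1
    return counts


buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
print(degree(edges))
```

Each line of the output is one link. Everything else a network tool reports (who is most connected, what clusters together) is computed from lines like these.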
But there was another dimension to the edge list assignment that was stymying me: the data. Yes, I have the emergent data from the boat book, but I don’t feel entirely comfortable rushing to produce more data for the sake of the seminar if it means rushing certain dimensions of the research. Nor do I have enough of a grip on the data I already have to feel comfortable pouring it into a new paradigm of analysis and modeling. (Like some mental version of Twister.)
And so I needed a data set with which I could work that would allow me to do the kind of analysis that I hoped network theories and models would make possible. In particular I am interested in applying these paradigms to ethnographic contexts where we need to understand how individuals make their way through the world using the ready-made mentifacts that we sometimes call folklore as “equipment for living.”
What I think that means is that I want to understand how individuals within a given group (a social graph, if you will) draw from a repertoire (network) of forms (stories, legends, anecdotes, jokes, etc.) which themselves variously reflect and refract a network of ideas (ideology) dispersed (variably) throughout the group.
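One way to picture those layers is as two edge lists that share a middle column. What follows is only a sketch under my own assumptions: the people, forms, and ideas are made up, and `compose` is a hypothetical helper, not a function from any of the network packages.

```python
from collections import defaultdict

# Hypothetical two-layer data: who tells which form, and which ideas a
# reader has tagged each form with. None of this comes from real fieldwork.
person_form = [("ann", "joke_1"), ("ann", "legend_1"), ("ben", "legend_1")]
form_idea = [("joke_1", "thrift"), ("legend_1", "thrift"), ("legend_1", "luck")]


def compose(left_edges, right_edges):
    """Link a to c whenever a-b is in left_edges and b-c is in right_edges.

    The weight counts how many intermediate nodes b connect the pair.
    """
    by_middle = defaultdict(list)
    for middle, right in right_edges:
        by_middle[middle].append(right)
    weights = defaultdict(int)
    for left, middle in left_edges:
        for right in by_middle[middle]:
            weights[(left, right)] += 1
    return dict(weights)


person_idea = compose(person_form, form_idea)
for (person, idea), weight in sorted(person_idea.items()):
    print(person, idea, weight)
```

The derived person-to-idea list is a crude proxy at best: it says only that a person tells forms that someone has associated with an idea, which is a much weaker claim than saying the person holds it.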
Or, as folklorist Henry Glassie once put it: “Culture is made up of ideas, society of people.” But ideas don’t just bounce around people’s heads, and they don’t exist out in the world, at least not very often, unencapsulated. Ideas and values are usually embedded in the things we say and do.[^1] We keep these things around, these stories and explanations, because they resonate with our values and beliefs. At the same time, the forms not only give shape to the ideas but also shape them.
This dynamic interaction has been the focus of folklore studies for the past century. For the last forty years, studies of culture and language have taken an ethnographic turn, sometimes called “performance” and sometimes called “ethnomethodology,” which has focused on the important role that individuals play in the intertextual network of forms (and thus the ideological network embedded within them).
I am one of those performance-oriented scholars. Performance studies has produced a wide range of profound micro-level studies of folklore in action. In the last decade or so, there has begun to be an attempt to build back toward the philological framework from which the performance orientation sprang and against which it initially pushed back. It’s time to fold these things together, and I think network theories offer one possibility for doing so.
## The Data ##
If not my own data, then what other corpus? I wanted to work with materials that I knew fairly well. I began to build a database of Louisiana folklore in print, focusing especially on tales and legends, but the amount of time to get a large enough corpus digitized and into the database, even using OCR software, quickly loomed too large. A great project, but one that could easily take up an entire summer, not the limited time I had to get something up and usable in order to begin to complete the seminar assignment — which I was late fulfilling anyway.
I did, however, initiate some conversations that may yet produce a foundation for such a database, contacting authors of several texts for electronic copies of their manuscripts to facilitate data entry. (The metadata is entirely a separate matter for now.)
The answer to my question didn’t come to me until I was in Providence, Rhode Island for the sixth, and final, Project Bamboo planning workshop. I don’t know if somebody said something or suggested something, but I struck upon the idea of using Zora Neale Hurston’s _Mules and Men_ as the basis for the seminar assignment and for my own initial explorations into the various software tools that are available. I was reasonably hopeful that somewhere, someone would have digitized the text, and I was right: the text is not in Project Gutenberg, nor in the Oxford Text Archive, but at the University of Virginia’s American Studies’ [hypertext collection][xroads]. There I found a [hypertext version of _Mules and Men_ put together by Laura Grand-Jean in 2001][lgj].
I am not yet at a point where I could deploy a `bash` script to `wget` or `curl` (or something else) the pages I needed, but since I decided to focus on only the folktales section of the book, the book’s first half, it wasn’t too much of a task to click on each page and then copy the text and paste it into a plain text document in my text editor, TextMate. For reference, I also copied and pasted the HTML in hopes that it might prove useful for getting certain kinds of texts out. That is, I had hopes of figuring out how to tell a piece of software to pull everything out between a given pair of tags. Unfortunately, Grand-Jean had used some non-standard markup to handle the long blockquotes. I thought about doing some fancy find and replace work with regular expressions, but in the end I decided I would rather work with the plain text, which would also encourage (force) me to re-read the text. The latter proved useful as I came across some long texts embedded in dialogue that were worth including in the extracted corpus.
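For the record, the "pull everything out between tags" idea can be sketched with nothing but Python's standard library. This is not what I actually did (I copied and pasted), and the tag name and sample HTML below are assumptions made for illustration; a page with non-standard markup would need a different tag name, or a different approach altogether.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many target tags are currently open
        self.chunks = []    # text pieces for the passage being read
        self.passages = []  # finished passages

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace left over from the HTML layout.
                text = " ".join("".join(self.chunks).split())
                self.passages.append(text)
                self.chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract(html, tag):
    """Return the text of each `tag` element in `html`, in document order."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.passages


# Invented sample; "blockquote" is only an example tag name.
sample = "<p>Intro.</p><blockquote>Once  upon\n a <i>time</i>.</blockquote><p>Outro.</p>"
print(extract(sample, "blockquote"))
```

Inline tags nested inside the target, such as the `<i>` here, are dropped while their text is kept, which is usually what one wants for a plain text corpus.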
(The plain text version of Part One of _Mules and Men_ can be found on both [Scribd] and [GitHub]. Forked critical editions of texts: an interesting idea, no? The text weighs in at 55,798 words in 2,127 lines. Somewhere along the way I’ll put up some stats on word counts for block quoted text, quoted text, narrative text, etc.)
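Counts like these are easy to reproduce, with one caveat: the numbers depend on what counts as a word. The sketch below assumes a word is any run of characters between whitespace, which may not match the tool that produced the figures above, and the sample string is invented.

```python
def count_words_and_lines(text):
    """Return (words, lines), where a word is any whitespace-delimited run."""
    lines = text.splitlines()
    words = sum(len(line.split()) for line in lines)
    return words, len(lines)


# Invented sample: two lines of text separated by a blank line.
sample = "first line of text\n\nthird line here\n"
print(count_words_and_lines(sample))
```

Blank lines count as lines but contribute no words, so a text with generous paragraph spacing will show a higher line count than its prose alone would suggest.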
## And Now for Some Software ##
So I’ve got a digitized text. An ethnographic text.[^2] That will give me people and forms, and I’m familiar enough with the kinds of speech communities involved that I can take a crack at ideas. Now I hope to use software to begin to discern those patterns more clearly. (And to produce that edge list.)
The first thing I try is SEASR’s [Meandre]. Meandre is really something like a software suite, consisting of server and client software, both of which you install and run locally. The server software syncs with the component and workflow repositories at SEASR HQ which are then made available to you through the workbench.
As a quick glance at the UI reveals, it’s not exactly user friendly. Then again, none of this software really is. The good folks running the seminar have provided us with links to useful software: Network Workbench, Wordij, and Pajek (which is, sigh, Windows-only). I am still working my way through these various packages, but I have to say that so far my best results have been using [IBM’s Many Eyes][ibm].
[^1]: The poet William Carlos Williams once advised in “A Sort of a Song” to: “Let the snake wait under / his weed / and the writing / be of words, slow and quick, sharp / to strike, quiet to wait, / sleepless. / — through metaphor to reconcile / the people and the stones. / Compose. (No ideas / but in things) / Invent! / Saxifrage is my flower that splits / the rocks.” His famous urging to himself and other poets to find the ideas that already surrounded them in the world echoes the anthropological project of the twentieth century: to find the intelligence and beauty in the always already peopled world of the everyday. (My apologies to Williams for eliminating his line breaks but my software, `PHP Markdown Extra`, wasn’t handling a poem within a footnote at all well.)
[^2]: To be sure, I’m fully aware of the potential problems of Hurston’s text. For a fuller discussion, see my essay in _African American Review_ ([JSTOR](http://www.jstor.org/stable/1512231)).