More TEI Work

I am still working through the phylogenetic material, both what is on Nouvelles Mythologie Comparée and what Julien d’Huy sent me. Both d’Huy’s work and Tehrani’s require better and more texts than I currently possess: my corpus of Louisiana legends weighs in at close to 30 oral texts and another two dozen or so literary texts. I need more.

And I need better texts. So far, I have been working mostly with just the texts (that is, no metadata of any kind), a process that has revealed its limitations, more and more, over the past year. If I look at the kind of analyses that I find most compelling, Jamshid Tehrani’s study of Little Red Riding Hood, for example, and if I consider the roadblocks I’ve encountered in my own work, I need to be able to mark up texts with a variety of analytical details that folklorists find useful: motifs (and/or plot points not currently motifs), locations, performers, etc.

TEI is the best way forward, but, if I haven’t said it before, it is not an intuitive markup. Where I would err on the side of brevity, <source>, TEI opts for something a bit more cumbersome, <sourceDesc>. I spent a good portion of the day working with both the tutorials and examples, as well as scanning the GitHub materials and some of the other forms of documentation.

I’ve begun to build a basic framework for the next set of materials that I am folding into the project: Gerard Hurley’s 1951 survey of American treasure legendry, “Buried Treasure Tales in America.” The last part of the essay enumerates 102 tales, many of which were published in the Journal of American Folklore or Western Folklore. I am working on a complete bibliography of all this material, and I’m happy to share it with anyone who is interested, so long as everyone remembers it’s a work in progress.

What I spent much of today doing was copying and pasting texts out of the JSTOR PDFs of the JAF articles and wrapping TEI around them. Pasting OCR is never as straightforward as it sounds, so there was a fair amount of cleanup to do. I also normalized eye dialect so that, should I want to run these texts through some scripts as is, I won’t have to deal with differences between “Ah” and “I” or “jes'” and “just”.
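A normalization pass of this kind might look something like the following in Python. This is only a minimal sketch under stated assumptions: the lookup table holds just the two examples mentioned above, the function name `normalize_eye_dialect` is made up for illustration, and a real table would have to be built by hand, text by text. It also assumes straight apostrophes, so curly ones left over from OCR would need to be handled first.

```python
import re

# Hypothetical lookup table: eye-dialect spelling -> normalized spelling.
# Only these two pairs come from the post; everything else is left to the editor.
EYE_DIALECT = {
    "Ah": "I",
    "jes'": "just",
}

# Match whole tokens only, so "Ah" does not fire inside "Ahead".
# Lookarounds are used instead of \b because "jes'" ends in an apostrophe.
_ALTERNATIVES = "|".join(
    re.escape(form) for form in sorted(EYE_DIALECT, key=len, reverse=True)
)
_PATTERN = re.compile(r"(?<!\w)(" + _ALTERNATIVES + r")(?!\w)")


def normalize_eye_dialect(text: str) -> str:
    """Replace each listed eye-dialect spelling with its standard form."""
    return _PATTERN.sub(lambda match: EYE_DIALECT[match.group(1)], text)


print(normalize_eye_dialect("Ah jes' kept diggin'."))
```

Because only listed forms are touched, spellings that are not in the table (like “diggin'” here) pass through unchanged, which keeps the normalization auditable.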

In addition to the texts and the bibliographic information, I also need to capture the page(s) on which the text appeared, but I don’t want that page number to be in the text itself, since it’s not at all important for analysis. My best guess is to include it under <sourceDesc> in the <teiHeader>. I also want to include other information from the source document, notes about collection and especially about tellers that might be useful to future users of these TEI documents, but sorting out where such things go is a bit of a trick. I did, however, finally determine how to embed location metadata in the <teiHeader>.
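For readers who want to see roughly what this could look like, here is a minimal sketch in Python that holds a small header and reads the page range and place back out of it. Everything specific is an assumption: the title, the page numbers, and the place name are placeholders, and putting the pages in a `<biblScope>` under `<sourceDesc>` and the place in a `<settingDesc>` under `<profileDesc>` is one plausible arrangement, not necessarily the one settled on here. The fragment has not been validated against the TEI schema, and `read_header` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Placeholder header: the title, page range, and place name are invented.
HEADER = """\
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>Sample treasure legend (placeholder)</title></titleStmt>
    <publicationStmt><p>Working file, not published.</p></publicationStmt>
    <sourceDesc>
      <bibl>
        <title level="j">Journal of American Folklore</title>
        <biblScope unit="page" from="100" to="101">100-101</biblScope>
      </bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <settingDesc>
      <place><placeName>Placeholder Parish, Louisiana</placeName></place>
    </settingDesc>
  </profileDesc>
</teiHeader>
"""


def read_header(xml_text):
    """Pull the page range and place name out of a TEI header string."""
    root = ET.fromstring(xml_text)
    scope = root.find(".//tei:sourceDesc//tei:biblScope[@unit='page']", TEI_NS)
    place = root.find(".//tei:settingDesc//tei:placeName", TEI_NS)
    return {
        "pages": (scope.get("from"), scope.get("to")) if scope is not None else None,
        "place": place.text if place is not None else None,
    }


print(read_header(HEADER))
```

The point of the arrangement is that the page numbers and the location live entirely in the header, so a script working on the body of the text never sees them.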

I know I need to go back and double-check the adaptations of TEI by linguists and oral historians as I continue to move forward with a TEI for folklore studies. I know, I know.

Why Folklorists Should Care about TEI

*Part 1 in a series of posts about [TEI and folklore studies][TEIfolk].*

We live, we are (probably too often) told, in a connected world. The internet, we are assured, has brought or will bring us all closer together. But such notions as connection and closeness depend upon actual relationships developing, and for that to happen we must actually use those connections to communicate. These are obvious things to folklorists, and yet we have been slow to take advantage of such a robust infrastructure as the internet to communicate in more than the usual ways: the exchange of PDFs or the submission of Word documents to journals. These are fine starts, but as anyone who has nurtured an essay or volume to publication knows, a lot gets left out.

Perhaps the most important thing that gets left out is all the material that we collect and record but do not have room for in the slim space of pages. This material, however, was not only useful in the development of our own thinking; it also has far wider potential uses. Other folklorists could use it to teach or to develop their own research projects, or the people themselves could use it for education or introspection or even simply a sense of acknowledgement that they exist and have something to add to the larger archeological record of humankind.

How to format this record has remained a puzzle for folklorists, who have engaged in robust conversations over the possible categories of human expressivity, over the uses of such expression, and over how to transcode expression from one mode (e.g., spoken performance) to another (e.g., written). While the internet makes it possible to upload audio, video, and image files in addition to texts, it is not always the case that others can readily download such materials, and there remains the question of whether, having downloaded the materials, they are able to view and use them.

Matters having to do with audio, video, and image files we must leave to a longer, more comprehensive sorting out, but there exists today a format for capturing verbal materials in a written form that can encompass not only the words themselves but also the rich complexities of spoken discourse. Moreover, the format is also capable of embedding within a text a wide variety of analytical information (including, yes, type and motif numbers as well as the location, date, and nature of an event), such that folklorists can rest assured that users on the other end are receiving the fullest sense of the original that text can make possible.

[TEI][tei], as the format for the [Text Encoding Initiative][tei] has come to be called, has emerged as the foundation for any number of humanistic endeavors. It lies, for example, at the heart of the [Perseus Digital Library][], which is now the standard library for students of the Greco-Roman classics and currently amounts to 69 million words. Its collections of Arabic, Germanic, Renaissance, and nineteenth-century American materials are equally stunning not only in terms of amount, but also in terms of accessibility and usability: users are, in fact, encouraged to download materials and add their own annotations. The [Oxford Text Archive][] was, like the Perseus Digital Library, also a pioneer in the use of TEI, and its use of the format has meant that literary scholars and linguists are often using the same materials, each for their own research agendas.

The current problem for humanistic research is that the texts available have largely been contributed by the disciplines of linguistics and literary studies, which means that the texts from which conclusions are being drawn are either sentences and utterances of a few to a few dozen words or texts of thousands upon thousands of words. The meaningful middle is missing. Folklorists of course specialize in this “middle” range of texts. From highly structured short texts like proverbs, to interactionally complex legends, to flexibly organized narratives like myths or tales, folklorists have long recorded, transcribed, annotated, analyzed, and shared such materials, reminding the larger scientific and scholarly community of the importance of such texts and the social worlds which they help to create.

It’s time, then, for folklorists to join the emergent social world of interactional scholarship, whereby our materials are widely available and accessible not only for fellow folklorists to appreciate and use but also for other scholars and scientists. In doing so, in establishing ourselves as the proverbial “middle men,” we will continue to maintain the importance of folklore studies to the understanding of what it means to be human.

In the posts that follow in this series, which I am tagging as [TEIfolk][] so that one click will get you all the posts at once, I hope to air out some of the work I have been doing this summer, as I try to advance thinking about *things digital* in my disciplinary home.

*Please feel free to circulate this post, and those that follow, widely. I will gladly accept any, and all, feedback. I am going to make mistakes; I am going to leave obvious things out, revealing my ignorance.*

[Perseus Digital Library]:
[Oxford Text Archive]:

My note to myself this morning is: Spend two hours going through TEI materials and draft 500 words of intro for folklorists. My goal here is to write something I can use in an essay for a folklore journal like JAF, something I can use in a text on textual analysis / text mining for humanists, something I can use in working with the International Society for Contemporary Legend Research, and something I can use in talking with the American Folklife Center and the American Folklore Society to begin to develop a TEI standard for folklore studies.

TEI for Folklore

As Elisa walked me through her TEI-encoded documents, and showed me the XSLT she uses to transform the TEI encoding into network files, I realized that I needed to start working on my own use of TEI. A quick search of *ye olde web* for “TEI folklore” turned up … not much.
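To give a rough sense of what turning TEI into network data can involve, here is a minimal sketch. It is not Elisa’s XSLT: it swaps in Python’s standard-library XML parser, and the sample text, the `ref` values, and the choice to link entities that share a paragraph are all assumptions made for illustration. The sample is also only a fragment (it has no header), so it is not a complete TEI document.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented two-paragraph sample; the names and ref values are placeholders.
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName ref="#teller">The teller</persName> heard it from
       <persName ref="#uncle">his uncle</persName> near
       <placeName ref="#bayou">the bayou</placeName>.</p>
    <p><persName ref="#uncle">The uncle</persName> never went back to
       <placeName ref="#bayou">the bayou</placeName>.</p>
  </body></text>
</TEI>
"""


def cooccurrence_edges(xml_text):
    """Count how often two referenced entities appear in the same paragraph."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for para in root.iterfind(".//tei:p", TEI_NS):
        # Collect each distinct ref once per paragraph.
        refs = {el.get("ref") for el in para.iter() if el.get("ref")}
        for source, target in itertools.combinations(sorted(refs), 2):
            edges[(source, target)] += 1
    return edges


# Emit a simple "source,target,weight" edge list.
for (source, target), weight in sorted(cooccurrence_edges(SAMPLE).items()):
    print(f"{source},{target},{weight}")
```

An edge list like this is about the simplest network file there is, and most network tools can import one as CSV.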

Two things occur to me: First, this represents an opportunity to be involved in getting TEI up and running in folklore studies, and, second, I need to start collecting useful links:

* So far, it looks like [oral history][] is leading the way.
* The [MLA][] recently received a grant from the NEH “to begin development of Humanities Commons Open Repository Exchange, or Humanities CORE. Humanities CORE will connect a library-quality repository for sharing, discovering, retrieving, and archiving digital work with Humanities Commons, a developing platform for collaboration among scholarly societies and other humanities organizations.”
* There are [seminars][] on TEI encoding.

**Please note**: if you know of already extant implementations of TEI in folklore studies, please let me know! I don’t want to re-invent the wheel. Drop me a note, if you can, and I’ll add links here, with credits for contributors. (Or we can do this somewhere else, if you like. G+?)

[oral history]:

TEI By Example

For those of us still struggling with all the complexities (riches!) of TEI, the [Centre for Scholarly Editing and Document Studies][ctb] of the Royal Academy of Dutch Language and Literature, the [Centre for Computing in the Humanities][cch] of King’s College London, and the [Department of Information Studies][dis] of University College London have created TEI by Example.