More TEI Work

I am still working through the phylogenetic material, both on Nouvelles Mythologie Comparée and in the materials that Julien d’Huy sent me. Both d’Huy’s work and Tehrani’s require better and more texts than I currently possess: my corpus of Louisiana legends comes to close to 30 oral texts and another two dozen or so literary texts. I need more.

And I need better texts. So far, I have been working mostly with just the texts (that is, no metadata of any kind), a process that has revealed its limitations, more and more, over the past year. If I look at the kind of analyses that I find most compelling, Jamshid Tehrani’s study of Little Red Riding Hood, for example, and if I consider the roadblocks I’ve encountered in my own work, I need to be able to mark up texts with a variety of analytical details that folklorists find useful: motifs (and/or plot points not currently motifs), locations, performers, etc.

TEI is the best way forward, but, if I haven’t said it before, it is not an intuitive markup. Where I would err on the side of brevity with something like <source>, TEI opts for the more cumbersome <sourceDesc>. I spent a good portion of the day working through the tutorials and examples, as well as scanning the GitHub materials and some of the other forms of documentation.

I’ve begun to build a basic framework for the next set of materials that I am folding into the project: Gerard Hurley’s 1951 survey of American treasure legendry, “Buried Treasure Tales in America”. The last part of the essay enumerates 102 tales, many of which were published in the Journal of American Folklore or Western Folklore. I am working on a complete bibliography of this material and am happy to share it with anyone who is interested, so long as everyone remembers it’s a work in progress.

What I spent much of today doing was copying and pasting texts out of the JSTOR PDFs of the JAF articles and wrapping TEI around them. Pasting OCR is never as straightforward as it sounds, so there was a fair amount of cleanup to do. I also normalized eye dialect so that, should I want to run these texts through some scripts as is, I won’t have to deal with differences between “Ah” and “I” or “jes'” and “just”.
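A minimal sketch of what that normalization step might look like in Python, assuming a hand-built lookup table. Only the two pairs named above come from the work described here; the function name and the table structure are hypothetical, and a real table would grow as new variants turn up in the texts.

```python
import re

# Hypothetical lookup table: eye-dialect spelling -> standard spelling.
# Only these two pairs are taken from the post; extend as needed.
EYE_DIALECT = {
    "Ah": "I",
    "jes'": "just",
}


def normalize_eye_dialect(text, mapping=EYE_DIALECT):
    """Replace whole-token eye-dialect spellings with standard forms."""
    # Try longer variants first so a long form wins over a shorter prefix.
    variants = sorted(mapping, key=len, reverse=True)
    # Use lookarounds rather than \b, because some variants end in an
    # apostrophe, which \b does not treat as part of a word.
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(v) for v in variants) + r")(?!\w)"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    print(normalize_eye_dialect("Ah jes' went down there."))
```

The match is case-sensitive and token-based, so a name like “Ahab” is left alone, but a genuine interjection “Ah” would also be rewritten to “I”. A table like this still needs a human eye on its output.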

In addition to the texts and the bibliographic information, I also need to capture the page(s) on which the text appeared, but I don’t want that page number to be in the text itself, since it’s not at all important for analysis. My best guess is to include it under <sourceDesc> in the <teiHeader>. I also want to include other information from the source document, notes about collection and especially about tellers that might be useful to future users of these TEI documents, but sorting out where such things go is something of a trick. I did, however, finally determine how to embed location metadata in the <teiHeader>.
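One way to picture the header described above is to generate it with Python’s standard library. This is only a sketch under assumptions, not the arrangement I settled on: it puts the page range in a <biblScope> inside <sourceDesc> and the location in a <place> under <settingDesc>, which is one possible reading of the TEI Guidelines and worth checking against them. The function name, the title and the page numbers are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return a TEI tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_header(title, journal, page_from, page_to, place_name):
    """Build a bare-bones teiHeader; the element placement is an assumption."""
    header = ET.Element(tei("teiHeader"))

    # fileDesc holds titleStmt, publicationStmt and sourceDesc.
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Work in progress."

    # Page range lives in the header, so it stays out of the text body.
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    bibl = ET.SubElement(source_desc, tei("bibl"))
    ET.SubElement(bibl, tei("title"), level="j").text = journal
    scope = ET.SubElement(
        bibl,
        tei("biblScope"),
        {"unit": "page", "from": str(page_from), "to": str(page_to)},
    )
    scope.text = f"{page_from}-{page_to}"

    # Location metadata: one possible home is profileDesc/settingDesc.
    profile = ET.SubElement(header, tei("profileDesc"))
    setting_desc = ET.SubElement(profile, tei("settingDesc"))
    place = ET.SubElement(setting_desc, tei("place"))
    ET.SubElement(place, tei("placeName")).text = place_name
    return header


if __name__ == "__main__":
    # All values below are placeholders, not real bibliographic data.
    example = build_header(
        "Example treasure legend", "Journal of American Folklore",
        10, 12, "Louisiana",
    )
    print(ET.tostring(example, encoding="unicode"))
```

Generating the header from a function like this keeps the page and place fields consistent across many small files, which matters once there are a hundred or so tales to wrap.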

I know I need to go back and double-check the adaptations of TEI by linguists and oral historians as I continue to move forward with a TEI for folklore studies. I know, I know.

