Speaking of Legend Corpora

Working with these texts for my paper at this year’s meeting of ISCLR (International Society for Contemporary Legend Research), I remembered that I have an entire mailbox dedicated to emails sent to me by friends and family that struck me as “net lore” (which is the name of the mailbox, by the way). I just checked, and the archive reaches back to 2003. (And I think I have an older archive somewhere on disk.) My goal in the months to come is to find a way to slice the 56MB text file into individual text files that are appropriately named, perhaps by date and subject line. My guess, and it’s only a guess right now, is that making these files available in plain text, with something like the following filename as a primitive form of metadata, is going to be the most efficient form of sharing:

2013-04-18-A-Bridge-To-Hawaii.txt

I think I can figure out how to write a Python script to do that. While I know that a better set of metadata might include who the texts were from and the trace route for them, I am unwilling to imperil the privacy of my correspondents. Plus, I think most folklorists are going to be chiefly interested in the texts. (We’re still playing catch-up to the notion of social graphs. Sigh.)
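A minimal sketch of what such a script might look like follows. It assumes the 56MB archive is in mbox format, which I have not confirmed, and it uses only Python's standard library. The function names (make_filename, body_text, split_mbox), the 0000-00-00 placeholder for unparseable dates, and the counter added to duplicate names are all my own illustrative choices. Only the plain-text body is written out; the sender and routing headers are never copied, though a forwarded body can still quote addresses and would need a separate scrubbing pass.

```python
import mailbox
import re
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime
from pathlib import Path


def make_filename(date_header, subject_header):
    """Build a name like 2013-04-18-A-Bridge-To-Hawaii.txt from two headers."""
    try:
        day = parsedate_to_datetime(date_header).strftime("%Y-%m-%d")
    except (TypeError, ValueError):
        day = "0000-00-00"  # assumed placeholder for missing or bad dates
    # Decode RFC 2047 encoded subjects; fall back to a generic title.
    subject = str(make_header(decode_header(subject_header or "untitled")))
    # Drop leading forward/reply prefixes such as "Fwd:", "FW:" or "RE:".
    subject = re.sub(r"^\s*((fwd?|re)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    words = re.findall(r"[A-Za-z0-9]+", subject)
    slug = "-".join(word.capitalize() for word in words) or "Untitled"
    return f"{day}-{slug}.txt"


def body_text(msg):
    """Return the first text/plain part of a message, or an empty string."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the message
            return payload.decode("utf-8", errors="replace")
    return ""


def split_mbox(mbox_path, out_dir):
    """Write each message body to its own file; return the names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            name = make_filename(msg["Date"], msg["Subject"])
            target = out / name
            counter = 2
            while target.exists():  # same subject on the same day
                target = out / f"{name[:-4]}-{counter}.txt"
                counter += 1
            target.write_text(body_text(msg), encoding="utf-8")
            written.append(target.name)
    finally:
        box.close()
    return written
```

Calling split_mbox("netlore.mbox", "netlore-texts"), with both paths being hypothetical, would fill the netlore-texts directory with one dated file per message. Messages that arrive as HTML only would come out empty under this sketch and would need a separate conversion step.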

Once I’ve got the collection put together, my best guess is that I will make it available through something like GitHub or BitBucket. Neither is really designed to support this kind of thing, but both are oriented towards public repositories and both make forking projects very simple. It would be interesting if researchers interested in this material, folklorists among them, could find some way to have their projects remain connected in some fashion. Both GitHub and BitBucket make it possible to follow the chain of forked projects, and for users to “follow” those projects, comment on them, or even fold those advances back into their own projects. (How cool would that be?)

In case you are wondering about the actual texts involved: they are an admixture of jokes and legendry. Some of the materials are quite topical (and racist):

It seems that once again,

all us white folks have missed

a great opportunity.

While all the black people attended

Obama’s inauguration and parades,

we should have broken into their homes

and gotten all our shit back.

And some of the materials, like the joke referenced in the file name above, have been around for quite some time on the internet and probably in oral circulation before that:

A man was riding his Harley along a California beach when suddenly the sky clouded above his head. In a booming voice, the Lord said, “Because you have tried to be faithful to me in all ways, I will grant you one wish.” The biker pulled over and said, “Build a bridge to Hawaii so I can ride over anytime I want.” The Lord said, “Your request is materialistic. Think of the enormous challenges for that kind of undertaking; the supports required to reach the bottom of the Pacific and the concrete and steel it would take! I can do it, but it is hard for me to justify your desire for worldly things. Take a little more time and think of something that could possibly help mankind The biker thought about it for a long time.

Finally, he said, “Lord, I wish that I, and all men, could understand our wives. I want to know how she feels inside, what she’s thinking, why she cries, what she means when she says nothing’s wrong, and how I can make a woman truly happy.”

The Lord replied, “You want 2 lanes or 4 on that bridge

(Please note that the period and the closing quotation mark are missing in the original.)

Any feedback on how to proceed is quite welcome.

Open Data Commons

Open Data Commons “is the home of a set of legal tools to help you provide and use Open Data.” They have a lovely write-up of why open data matters:

Why bother about openness and licensing for data? After all they don’t matter in themselves: what we really care about are things like the progress of human knowledge or the freedom to understand and share.

However, open data is crucial to progress on these more fundamental items. It’s crucial because open data is so much easier to break-up and recombine, to use and reuse. We therefore want people to have incentives to make their data open and for open data to be easily usable and reusable — i.e. for open data to form a ‘commons’.