Speaking of Legend Corpora

Working with these texts for my paper at this year’s meeting of ISCLR (International Society for Contemporary Legend Research), I remembered that I have an entire inbox dedicated to emails sent to me by friends and family that struck me as “net lore” (which is the name of the mailbox, by the way). I just checked and the archive reaches back to 2003. (And I think I have an older archive somewhere on disk.) My goal in the months to come is to find a way to slice the 56MB text file into individual text files that are appropriately named, perhaps by subject line and date. My guess, and it’s only a guess right now, is that making these files available in plain text, with something like the following filename as a primitive form of metadata, is going to be the most efficient form of sharing:


I think I can figure out how to write a Python script to do that. While I know that a better set of metadata might include who the texts were from and the trace route for them, I am unwilling to imperil the privacy of my correspondents. Plus, I think most folklorists are going to be chiefly interested in the texts. (We’re still playing catch-up to the notion of social graphs. Sigh.)
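Here is a rough first pass at what such a script might look like, assuming the mailbox is stored in the standard mbox format that Python’s mailbox module reads. The function names and the date-plus-subject filename pattern are placeholders of my own for illustration, not a settled naming scheme. Only the plain-text body is written out, so sender and routing headers never reach the output files.

```python
import mailbox
import re
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime
from pathlib import Path


def slugify(text, max_len=60):
    """Reduce a subject line to lowercase words joined by hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "no-subject"


def plain_text_body(message):
    """Return the first text/plain part of a message, or an empty string."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset label in the message
                return payload.decode("utf-8", errors="replace")
    return ""


def split_mbox(mbox_path, out_dir):
    """Write one text file per message; return the list of paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for message in mailbox.mbox(mbox_path):
        subject = str(make_header(decode_header(message.get("Subject", ""))))
        try:
            raw_date = str(message.get("Date", ""))
            date = parsedate_to_datetime(raw_date).strftime("%Y-%m-%d")
        except (TypeError, ValueError):
            date = "undated"
        # Hypothetical naming pattern: date, underscore, slugged subject.
        stem = f"{date}_{slugify(subject)}"
        target = out / f"{stem}.txt"
        counter = 2
        while target.exists():  # avoid overwriting repeated forwards
            target = out / f"{stem}-{counter}.txt"
            counter += 1
        # Only the body is saved; headers (and thus senders) are left out.
        target.write_text(plain_text_body(message), encoding="utf-8")
        written.append(target)
    return written
```

Calling split_mbox("netlore.mbox", "netlore_texts") would be the whole job, with both names standing in for wherever the archive actually lives. Forwarded messages often carry names and addresses inside the body itself, so the output would still need a privacy pass before sharing.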

Once I’ve got the collection put together, my best guess is that I will make it available through something like GitHub or BitBucket. Neither is really designed to support this kind of thing, but they are oriented towards public repositories and they do make forking projects very simple. It would be interesting if researchers interested in this material, folklorists among them, could find some way to have projects remain connected in some fashion. Both GitHub and BitBucket make it possible to follow the chain of forked projects and also for users to “follow” those projects and make comments or even fold those advances back into their own projects. (How cool would that be?)

In case you are wondering about the actual texts involved: they are an admixture of jokes and legendry. Some of the materials are quite topical (and racist):

It seems that once again,

all us white folks have missed

a great opportunity.

While all the black people attended

Obama’s inauguration and parades,

we should have broken into their homes

and gotten all our shit back.

And some of the materials, like the joke referenced in the file name above, have been around for quite some time on the internet and probably in oral circulation before that:

A man was riding his Harley along a California beach when suddenly the sky clouded above his head. In a booming voice, the Lord said, “Because you have tried to be faithful to me in all ways, I will grant you one wish.” The biker pulled over and said, “Build a bridge to Hawaii so I can ride over anytime I want.” The Lord said, “Your request is materialistic. Think of the enormous challenges for that kind of undertaking; the supports required to reach the bottom of the Pacific and the concrete and steel it would take! I can do it, but it is hard for me to justify your desire for worldly things. Take a little more time and think of something that could possibly help mankind The biker thought about it for a long time.

Finally, he said, “Lord, I wish that I, and all men, could understand our wives. I want to know how she feels inside, what she’s thinking, why she cries, what she means when she says nothing’s wrong, and how I can make a woman truly happy.”

The Lord replied, “You want 2 lanes or 4 on that bridge

(Please note that the period and the closing quotation mark are missing in the original.)

Any feedback on how to proceed is quite welcome.

Open Data Commons

Open Data Commons “is the home of a set of legal tools to help you provide and use Open Data.” They have a lovely write-up of why open data matters:

Why bother about openness and licensing for data? After all they don’t matter in themselves: what we really care about are things like the progress of human knowledge or the freedom to understand and share.

However, open data is crucial to progress on these more fundamental items. It’s crucial because open data is so much easier to break-up and recombine, to use and reuse. We therefore want people to have incentives to make their data open and for open data to be easily usable and reusable — i.e. for open data to form a ‘commons’.


Google, Microsoft, and Yahoo have gotten together to adapt a collection of microformats that will make it possible for folks who produce and publish content to the web to make searching that content more meaningful:

Most webmasters are familiar with HTML tags on their pages. Usually, HTML tags tell the browser how to display the information included in the tag. For example, <h1>Avatar</h1> tells the browser to display the text string “Avatar” in a heading 1 format. However, the HTML tag doesn’t give any information about what that text string means — “Avatar” could refer to the hugely successful 3D movie, or it could refer to a type of profile picture—and this can make it more difficult for search engines to intelligently display relevant content to a user.

Schema.org provides a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines: Google, Microsoft, and Yahoo!

You use the schema.org vocabulary, along with the microdata format, to add information to your HTML content. While the long term goal is to support a wider range of formats, the initial focus is on Microdata. This guide will help get you up to speed with microdata and schema.org, so that you can start adding markup to your web pages.
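To make the idea concrete, here is a small sketch of what microdata adds to the “Avatar” heading from the passage above. The itemscope, itemtype, and itemprop attributes and the Movie type with its name and genre properties come from the microdata format and the schema.org vocabulary; the helper function microdata_item is a hypothetical name of my own, and generating the markup from Python is just a convenience, not how schema.org asks you to do it.

```python
from html import escape


def microdata_item(item_type, heading, properties):
    """Build an HTML fragment that labels a heading and some text
    values with schema.org microdata attributes."""
    lines = [f'<div itemscope itemtype="https://schema.org/{escape(item_type)}">']
    # The heading still displays as before, but is now marked as the item's name.
    lines.append(f'  <h1 itemprop="name">{escape(heading)}</h1>')
    for prop, value in properties.items():
        lines.append(f'  <span itemprop="{escape(prop)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(microdata_item("Movie", "Avatar", {"genre": "Science fiction"}))
```

The browser renders the heading exactly as it did before; the extra attributes are there so a search engine can tell that this “Avatar” is a movie rather than a profile picture.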

Using Lightroom

Photography is part of my research, and I also enjoy photographing my family and just generally documenting my world — more on that as my next potential project later. Between those various interests and commitments, I have about 15,000 images, all of which are safely cataloged by Adobe’s Lightroom. (I tried Aperture when it premiered at an unbelievable price point on the Mac App store, but either I have worked with Lightroom too long and couldn’t figure out how to access Aperture’s features or it doesn’t have the functionality on which I now depend that exists in Lightroom.)

I get a lot of questions about using Lightroom from students and colleagues. From now on, I am telling everyone to start here. That link takes you to George Jardine’s website and the half-hour tutorial he recorded on the basics of image management with Lightroom.

If the tutorial convinces you to try Lightroom, then you should also read Rob Sylvan’s “10 Things I Wish I Could Tell Every Lightroom User.”

iPhone Tracker on GitHub

Apple’s latest update to iOS fixes the problem of making the location services cache easily available on your computer, but before you update, you might still enjoy seeing how much information about you is available. How widely available it is is a matter for a separate discussion.

I tried out the app on myself, just before I updated, to see what the results look like:

It’s pretty much what you expect: it shows that I live most of my life within Lafayette, where I live and work, and the city’s environs, where I do research. What I found interesting, since the app offers this data as an animated timeline, are the brief flowerings that occurred thanks to travel I have done over the past year.

Viewed within a historical perspective, and internally, this information raises no great concerns for me. Viewed as a chance to market to me, it gives me some concerns. Viewed as a particularized and dynamic tracking of my movements … I don’t like it at all.

April 1 Is Backup Day

April 1 is international backup day, which seems like an odd day to choose. I think it would be better, if also equally unfortunate for those of us who live in societies that celebrate April Fools, to mark it as open information, or open access, day. Today is the 200th birthday of Robert Bunsen, famous for his eponymous burner, which he chose not to patent; in fact, he pursued those who tried to patent it for themselves.

In celebration of open information day, I offer up this passage from Benjamin Franklin’s Autobiography which details his refusal to patent the Franklin stove:

In order of time, I should have mentioned before, that having, in 1742, invented an open stove for the better warming of rooms, and at the same time saving fuel, as the fresh air admitted was warmed in entering, I made a present of the model to Mr. Robert Grace, one of my early friends, who, having an iron-furnace, found the casting of the plates for these stoves a profitable thing, as they were growing in demand.

To promote that demand, I wrote and published a pamphlet, entitled “An Account of the new-invented Pennsylvania Fireplaces; wherein their Construction and Manner of Operation is particularly explained; their Advantages above every other Method of warming Rooms demonstrated; and all Objections that have been raised against the Use of them answered and obviated,” etc.

This pamphlet had a good effect. Gov’r. Thomas was so pleas’d with the construction of this stove, as described in it, that he offered to give me a patent for the sole vending of them for a term of years; but I declin’d it from a principle which has ever weighed with me on such occasions, viz., That, as we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously.

An ironmonger in London however, assuming a good deal of my pamphlet, and working it up into his own, and making some small changes in the machine, which rather hurt its operation, got a patent for it there, and made, as I was told, a little fortune by it. And this is not the only instance of patents taken out for my inventions by others, tho’ not always with the same success, which I never contested, as having no desire of profiting by patents myself, and hating disputes. The use of these fireplaces in very many houses, both of this and the neighbouring colonies, has been, and is, a great saving of wood to the inhabitants. (From Franklin’s Autobiography.)

And I also note that my colleague Jason Jackson and the team at Open Folklore have exciting news of their own.

Another Graffiti Blog/Database

Latrinalia — the writing on the walls of bathrooms — and graffiti have been studied by folklorists for quite some time. It’s refreshing to see folks not only collecting material but also attempting to publish it in some fashion as they collect it. My friend and colleague Quinn Dombrowski was the first person I know to do so, and now I just ran across Graffiti on Grounds, an “archive of writing scratched and scrawled around the campus of the University of Virginia.” The great thing about GoG is that clicking on an individual item gets you a single page which has Dublin Core metadata and “Graffiti Item Type” metadata. If there were a “This Week in the Humanities” program, I would like to do a show on this.

More Dropbox Goodness

I wish all services, and even a lot of applications, were as good as Dropbox. I turned the participants in my digital humanities seminar onto it, and, if I had done nothing else, I think that alone would have made the class for some of them. None of them hauls around a USB drive anymore. They have made sharing Dropbox files and folders part of how they work: it’s been amazing to watch.

If you haven’t tried it out, do. 2GB of storage is free. I have a slightly larger account, 10GB for $10 a month. I keep my home and office files synced via Dropbox, and I also access PDFs and other files in GoodReader (iPad) via Dropbox.

If you try it and like it, feel free to use my referral code. We both get an extra 250MB for free.

Once you are up and running, head over to AppStorm and read their “Ultimate Dropbox Toolkit and Guide” (link to post).

Reading on an iPad during/as Prime Time

Apps like Read It Later do collect interesting kinds of data from their users. Interesting in the aggregate: it would appear that one of the things that iPad users are doing is spending their evening hours on the couch not watching television but reading. (Or perhaps both.) There are a variety of cool graphs and charts at the link.