April 1 Is Backup Day

April 1 is international backup day, which seems like an odd day to choose. I think it would be better, if equally unfortunate for those of us who live in societies that celebrate April Fools, to mark it as open information, or open access, day. Today is the 200th birthday of Robert Bunsen, famous for his eponymous burner, which he chose not to patent; in fact, he pursued those who tried to patent it for themselves.

In celebration of open information day, I offer up this passage from Benjamin Franklin’s _Autobiography_ which details his refusal to patent the Franklin stove:

In order of time, I should have mentioned before, that having, in 1742, invented an open stove for the better warming of rooms, and at the same time saving fuel, as the fresh air admitted was warmed in entering, I made a present of the model to Mr. Robert Grace, one of my early friends, who, having an iron-furnace, found the casting of the plates for these stoves a profitable thing, as they were growing in demand.

To promote that demand, I wrote and published a pamphlet, entitled “An Account of the new-invented Pennsylvania Fireplaces; wherein their Construction and Manner of Operation is particularly explained; their Advantages above every other Method of warming Rooms demonstrated; and all Objections that have been raised against the Use of them answered and obviated,” etc.

This pamphlet had a good effect. Gov’r. Thomas was so pleas’d with the construction of this stove, as described in it, that he offered to give me a patent for the sole vending of them for a term of years; but I declin’d it from a principle which has ever weighed with me on such occasions, viz., That, as we enjoy great advantages from the inventions of others, we should be glad of an opportunity to serve others by any invention of ours; and this we should do freely and generously.

An ironmonger in London however, assuming a good deal of my pamphlet, and working it up into his own, and making some small changes in the machine, which rather hurt its operation, got a patent for it there, and made, as I was told, a little fortune by it. And this is not the only instance of patents taken out for my inventions by others, tho’ not always with the same success, which I never contested, as having no desire of profiting by patents myself, and hating disputes. The use of these fireplaces in very many houses, both of this and the neighbouring colonies, has been, and is, a great saving of wood to the inhabitants. (From Franklin’s Autobiography.)

And I also note that my colleague Jason Jackson and the team at Open Folklore have exciting news of their own.

Structure and Interpretation of Computer Programs

The influential computer-science text _Structure and Interpretation of Computer Programs_ by Abelson, Sussman, and Sussman is [available on-line](http://mitpress.mit.edu/sicp/), along with a range of teaching aids. Go MIT Press!

I am considering using some parts of the text to at least introduce the idea of computing into my seminar surveying the digital humanities. I know I want to focus on some basic tools, including perhaps some exposure to Python, and, yes, there is always John Zelle’s [_Python Programming: An Introduction to Computer Science_](http://mcsp.wartburg.edu/zelle/python/), which is still available for download in its 2002 incarnation [here](http://citeseerx.ist.psu.edu/viewdoc/download?doi=) (careful, that’s a link to a 1.3MB PDF), but it’s nice to have options and to be able to offer students different explanations for the same concepts. (I know I need it when it comes to some aspects of computer science.)

Here is the [table of contents](http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html).

Machine Learning for Human Memorization

A machine learning researcher, Danny Tarlow, has come up with a way to describe a problem from competitive Scrabble in machine-learning terms. [Here’s a link to the post][post], and here’s his rough description of the problem:

> As some of you know, I used to play Scrabble somewhat seriously. Most Tuesdays in middle school, I would go to the local scrabble club meetings and play 4 games against the best Scrabble players in the area (actually, it was usually 3 games, because the 4th game started past my bedtime). It’s not your family game of Scrabble: to begin to be competitive, you need to know all of the two letter words, most of the threes, and you need to have some familiarity with a few of the other high-priority lists (e.g., vowel dumps; short q, z, j, and x words; at least a few of the bingo stems). See here for a good starting point.

> Anyway, I recently went to the Toronto Scrabble Club meeting and had a great time. I think I’ll start going with more regularity. As a busy machine learning researcher, though, I don’t have the time or the mental capacity to memorize long lists of words anymore: for example, there are 972 legal three letter words and 3902 legal four letter words.

> So I’m looking for an alternative to memorization. Typically during play, there will be a board position that could yield a high-scoring word, but it requires that XXX or XXXX be a word. It would be very helpful if I could spend a minute or so of pen and paper computation time, then arrive at an answer like, “this is a word with 90% probability”. So what I really need is just a binary classifier that maps a word to probability of label “legal”.

> Problem description: In machine learning terms, it’s a somewhat unique problem (from what I can tell). We’re not trying to build a classifier that generalizes well, because the set of 3 (or 4) letter words is fixed: we have all inputs, and they’re all labeled. At first glance, you might think this is an easy problem, because we can just choose a model with high model capacity, overfit the training data, and be done. There’s no need for regularization if we don’t care about overfitting, right? Well, not exactly. By this logic, we should just use a nearest neighbors classifier; but in order for me to run a nearest neighbors algorithm in my head, I’d need to memorize the entire training set!

[post]: http://blog.smellthedata.com/2010/12/machine-learning-for-human-memorization.html
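Tarlow doesn’t give a solution in the excerpt above, but the framing he describes can be sketched. Below is a minimal, hypothetical illustration of the binary-classifier idea: score a word by per-position letter log-odds estimated from a legal word list against random strings, then squash the score to a probability. The ten-word legal list and the random negative-sampling scheme are my illustrative assumptions, not his data or his method.

```python
import math
import random
import string

# Toy stand-in for the legal three-letter word list; the real list has 972
# entries (per the post), so these ten are just for illustration.
LEGAL_WORDS = ["cat", "dog", "qat", "axe", "zax", "bat", "cot", "dot", "eta", "fax"]

def train_weights(legal_words, length=3, seed=0):
    """Estimate per-(position, letter) log-odds of legality, add-one smoothed.

    Negative examples are random strings not on the legal list -- a crude
    stand-in for "all other strings of this length"."""
    rng = random.Random(seed)
    legal = list(legal_words)
    negatives = []
    while len(negatives) < len(legal):
        w = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if w not in legal:
            negatives.append(w)

    def position_counts(words):
        counts = [dict() for _ in range(length)]
        for w in words:
            for i, ch in enumerate(w):
                counts[i][ch] = counts[i].get(ch, 0) + 1
        return counts

    pos_counts = position_counts(legal)
    neg_counts = position_counts(negatives)
    weights = [dict() for _ in range(length)]
    for i in range(length):
        for ch in string.ascii_lowercase:
            p = (pos_counts[i].get(ch, 0) + 1) / (len(legal) + 26)
            q = (neg_counts[i].get(ch, 0) + 1) / (len(negatives) + 26)
            weights[i][ch] = math.log(p / q)
    return weights

def prob_legal(word, weights):
    """Sum the per-position log-odds and squash to a probability."""
    score = sum(weights[i][ch] for i, ch in enumerate(word))
    return 1.0 / (1.0 + math.exp(-score))
```

A human-friendly version of the same idea would round the log-odds to small integers, so the sum really could be done with a minute of pen-and-paper work, which is the point of Tarlow’s exercise.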

Windows Woes

At some point, I really do need to figure out why our Windows 7 machine won’t sleep. (Yes, we have a Windows machine: it’s in our kitchen as a place for our daughter to do occasional bits of homework that require a computer, for all of us to look things up, for cooking with Pandora playing, and, yes, for me to play the occasional game.) When I get around to figuring things out, I should probably start [here](http://www.windowsbbs.com/windows-7/87612-windows-7-wont-sleep.html).

The things you end up teaching yourself

One of the applications to which we were introduced at the NEH Institute on Networks and Networking in the Humanities — which goes by the hashtag #nethums, by the way — was a Carnegie Mellon application called ORA. It and its companion application, AutoMap, are very useful tools for network analysis and visualization.

My difficulty with the applications was simply in getting them to run on my MacBook Pro. The problem was, and is, that ORA, AutoMap, and their installers require an older version of Java than is included with Mac OS X 10.6. With 10.6, Apple dropped the versions of Java 1.4 and 1.5 that it had been carrying and only provided 1.6. Java 1.4 is still available, but navigating Oracle’s site to get it, and getting it onto my MacBook, was a longer road than I wanted to travel.

Now that I am back home, I got the good word that ORA had been updated. Great news! I headed over to the site only to learn that the Windows and Linux versions had been updated to version 2.2.2, but the Mac version was still back at 1.6.9.

Two routes now lay open to me, if I wanted to run one of the newer versions on my Mac:

  1. Pick up a copy of VMWare Fusion or Parallels and run either Windows or Linux in a virtual machine, or
  2. Determine if there was a way to run the Linux application on Mac OS X (which is also a certified *nix now).

I had just spent a fair amount of money on corpus linguistics texts (I’m working on refining a notion of “corpus folkloristics”), and so the idea of spending more money on virtualization software, as well as on a copy of Windows, was less than appealing. (I am already about to buy a copy of Windows 7 for our home desktop, but Microsoft offers no family pack the way Apple does, and so multiple copies of Windows are a little out of my price range for now.)

So, let’s go with the second option: run Linux apps on my Mac.

A page on Simple Help promised me a complete walkthrough of the process, the first step of which is getting Fink on my MacBook. (I had been using MacPorts before upgrading to 10.6, but the upgrade had broken it and so I was okay switching to Fink.)

Oops, no binary installer for 10.6. I was going to have to install it from source. Luckily, the Fink Project has a page up that walks you through installing from source. It does a pretty good job of getting you through everything, and it even tells you to run:


which would suggest to a command-line novice like me (I’m not quite a noob!) that my path is going to be set up for me, which makes it all the more maddening when you enter:

    fink selfupdate

and get a “command not found” response. Uh oh. And so I double-checked my PATH environment variable:

    echo $PATH

and got all the usual suspects:


What’s going on? I closed the terminal and started doing some reading up on editing my PATH when I decided to double-check my work and ran fink selfupdate again. What do you know, it worked! Here’s the trick: I forgot to follow the directions and open a new terminal window after the initial installation.
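The lesson generalizes: the installer appends an init line to your shell profile, but a shell that is already open never re-reads that file, so its PATH stays stale until a new terminal sources the profile at startup. A sketch of the mechanism, using a throwaway profile file rather than Fink’s real init script (the `/sw/bin` path is illustrative, though `/sw` is Fink’s default prefix):

```shell
# Simulate what the installer does: write a PATH line to a profile file.
profile=$(mktemp)
echo 'export PATH="/sw/bin:$PATH"' > "$profile"

# The running shell's PATH is unchanged until the profile is read.
# A new terminal window sources it at startup; here we source it by hand:
. "$profile"

case ":$PATH:" in
  *:/sw/bin:*) echo "PATH now includes /sw/bin" ;;
esac
rm -f "$profile"
```

Opening a fresh terminal window does the `. "$profile"` step for you, which is why the directions say to do exactly that.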

And so I taught myself to follow directions.

Magic Is Now Here

Adobe’s John Nack posted the following video on his blog revealing the new Content-Aware healing/deletion functionality in Photoshop CS5. I don’t do so much with PS that I typically need to upgrade (I only went from CS1 to CS3 for the Intel compatibility), but this new functionality, no, this new *magic* is amazing: