Personality Type(s)

Just to see where, or what, I am now, I took a redacted version of the Myers-Briggs Inventory, the one available over at [HumanMetrics]( I think I scored pretty close to what I scored on the complete inventory when I took it back in the late 90s: [INFP](

* Introverted (11%)
* iNtuitive (38%)
* Feeling (12%)
* Perceiving (11%)

Maybe I was INTJ last time, but I was equally borderline. The only strong tendency here is **intuitive**, which I think was also the case a decade and a half ago.

So, apparently, sensing is out.

Find out for yourself.


[The Historian’s Macroscope](, a self-described “experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart” is well worth a read. For now, its focus is on topic modeling and network studies. Combine that with William Turkel’s and Adam Crymble’s [The Programming Historian]( and you’ve got a reasonably good foundation for getting started.

Middling Data

I’ve been enjoying working through Matthew Jockers’ [Text Analysis with R for Students of Literature]( and following the various discussions about topic modeling and other approaches to “big data” in the humanities on Twitter (and elsewhere; I really do wish there was more of the elsewhere, and more on this in a moment). At the same time, I am trying, some would argue desperately, not only to teach myself the Python language and the basic terms of computer science but also to get a basic grasp of the statistics that lie behind so much of this work.

I do so not only because these realms fascinate me and, I think, hold real possibilities for studying the kinds of texts that I like to study, but also because I would like to be part of the larger conversation, about which dimensions of statistics are useful and which are not, that the digital humanities will eventually have to have as the “digital” falls away. We will at some point get past the initial, and very exciting, phase of experimentation and grabbing at all the shiny toys, and begin to synthesize these experiments into the ongoing development of the continuum of work that stretches from the humanities to the human sciences.

Folklore and anthropology have long been the kissing cousins on either side of the perceived divide between those two orders, and I am fascinated, in watching the adaptation/adoption of corpus linguistic methods, often linked with information science and various forms of artificial intelligence, with the jump from sentences, or huge gatherings of sentences into things like corpora, to novels.

There is, I think, a middle ground. It’s not the “small data” of the old humanities, nor yet the “big data” which is our current fascination, but something more like middling amounts of data. *Medium data*? (That sounds better than *middling*, but it does suggest a statistical process, no?)

*Middling data* for now, I think.

I am using it to describe the 50 some-odd legend texts I have that range in size from around 100 words to over 1000 words. This size of texts is, in itself, a kind of middle ground between short texts like proverbs and longer texts like myths. (Some oral histories I have collected tend to fall on the shorter end of this range, as well as a number of personal anecdotes, which only means that we have a lot of counting to do in folklore studies to begin to establish things like this. Easy peasy work and still terribly interesting — how many words does a given context require either to reinforce the current reality or to conjure up an alternate one?)

50 texts of 500 words doesn’t seem like too much, does it? (I’m going to go for the middle number of 500 here, just for the sake of argument.) Why, that’s only 25,000 words, a long-ish short story from a literary scholar’s point of view. But 50 distinct texts begin to stretch the boundaries of working memory for most human analysts, and certainly as that number grows, one begins to require alternative means of “holding” the texts in some sort of analytical space.
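The counting itself is the easy part. Here is a minimal sketch of that arithmetic in Python, assuming each text is just a string and that a "word" is anything separated by whitespace. The `corpus` is a made-up stand-in (50 dummy texts of exactly 500 words), not my actual legend texts, and `word_counts` and `summarize` are names I invented for the illustration.

```python
from statistics import mean


def word_counts(texts):
    """Return the number of whitespace-separated words in each text."""
    return [len(text.split()) for text in texts]


def summarize(texts):
    """Collect the basic size figures for a small collection of texts."""
    counts = word_counts(texts)
    return {
        "texts": len(counts),
        "shortest": min(counts),
        "longest": max(counts),
        "mean": mean(counts),
        "total": sum(counts),
    }


# Hypothetical stand-in corpus: 50 "texts" of exactly 500 words each.
corpus = [" ".join(["word"] * 500) for _ in range(50)]
summary = summarize(corpus)
print(summary["total"])  # 50 * 500 = 25000
```

With real texts the interesting figures would be `shortest`, `longest`, and `mean`, since those are what would let us say where legends sit between proverbs and myths.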

Of course, as the number grows, one needs to effect some kind of compression somewhere in the process. Where and how is why we need statistical reasoning to better inform how we proceed. (Sorry for the surfeit of adverbs there.) And I do love the kinds of things that topic modeling can do, as well as other forms of statistical analyses. Certainly achieving semi-accurate results with a minimum of failures and making effective use of available computational resources is of interest to computer scientists, but I don’t, at this point, particularly care about such things. Rather, I am interested in those forms of manipulation which let me explore a collection of material(s) — perhaps formally organized enough to begin to be something like a corpus but perhaps not.

This middle ground is the ground I want to work for the foreseeable future. It will let me explore the computational and statistical possibilities from within a territory that I can still attempt to grok using old-fashioned, dare I say “analog”?, methods. It’s this kind of middle ground work that made Moretti’s _Graphs, Maps, and Trees_ so compelling. (And he seems to have a distinct preference for working with middling data, if I read other essays and understand other talks he has given correctly.)

*Middling* data is a terrible name to be sure, but like the “middling” domains of folklore studies and cultural anthropology, domains often viewed from a certain askance perspective by practitioners in domains more central to either side of the divide between the humanities and the human sciences, I think that there are some terribly productive tensions to be more clearly articulated and discussed.

Then again, I would think that, wouldn’t I?

Comprehension in a Year

My goal is to be able to understand the software description below in a year’s time:

> The Altmann-Fitter is an interactive software for the iterative fitting of univariate discrete probability distributions to frequency data. It uses the Nelder-Mead Simplex Algorithm.
> In its present version it contains about 200 distributions and is one of the most voluminous collections of distributions at all. It aims at the analysis of data from all empirical domains, e.g. biology, economy, sociology, meteorology, ecology, linguistics, literary science, communication, technical sciences and production. It is indispensable for practitioners.
> Fitting is automatic, i.e. no initial estimators are necessary, and it improves iteratively. The goodness-of-fit test is performed by means of the chi-square test. A number of options and configurations enables the user to flexibly process data.
> Altmann Fitter runs under all Microsoft Windows® versions since Windows XP® and including Windows 8®. For best performance, the computer should be equipped with at least 512 MB of RAM. Different graphical outputs are available.

> Visit our web-site: (here you find a demo version and an user guide – free download).

Right now, I don’t. I understand, I think, the first sentence, but after that … not so much. This only confirms for me that statistics is foundational to being an effective part of the larger discussion.
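As far as I can piece it together, the description boils down to three steps: take frequency data, let a simplex search (Nelder-Mead) hunt for the parameter values that make a chosen distribution fit best, and then score the fit with a chi-square statistic. What follows is only my sketch of that idea, not how the Altmann-Fitter actually works. The assumptions are all mine: the data in `observed` are invented, the only distribution tried is the Poisson, the quantity minimized is the negative log-likelihood, and the `nelder_mead` function is a bare-bones variant (reflection, expansion, one kind of contraction, shrink) written just for this example.

```python
import math


def nelder_mead(f, start, step=0.5, tol=1e-8, max_iter=500):
    """Minimize f (a function of a list of floats) with a basic simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one point nudged along each axis.
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        width = max(abs(b - w) for b, w in zip(best, worst))
        if width < tol and abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every point except the worst.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst point.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other point halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


def poisson_log_pmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def neg_log_likelihood(params, observed):
    """observed[k] is how many cases had the value k."""
    lam = params[0]
    if lam <= 0:
        return math.inf  # keep the search away from impossible values
    return -sum(n * poisson_log_pmf(k, lam) for k, n in enumerate(observed))


def chi_square(observed, lam):
    total = sum(observed)
    probs = [math.exp(poisson_log_pmf(k, lam)) for k in range(len(observed) - 1)]
    probs.append(1.0 - sum(probs))  # last class pools the whole upper tail
    expected = [total * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented data: 50 texts, counting how often some word occurs in each
# (18 texts with 0 occurrences, 17 with 1, 9 with 2, 4 with 3, 2 with 4).
observed = [18, 17, 9, 4, 2]
fit = nelder_mead(lambda p: neg_log_likelihood(p, observed), [1.0])
lam = fit[0]
stat = chi_square(observed, lam)
dof = len(observed) - 1 - 1  # classes, minus one, minus one fitted parameter
print(f"lambda = {lam:.3f}, chi-square = {stat:.3f}, df = {dof}")
```

For the Poisson the best-fitting value is simply the sample mean (here 55 occurrences over 50 texts, or 1.1), so the search is overkill, but that makes it easy to check that the simplex lands where it should. Turning the chi-square statistic into a verdict still requires a chi-square table or a statistics library, which is exactly the part I have yet to learn.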


Somewhere there was quite the controversy over the unmasking of J. K. Rowling writing under a different pen name. In that controversy, stylistics / stylometrics, and their advanced development in the era of computation, have received some coverage in various facets of the media machine that now seems so diverse and so distributed (thanks in no small part to later stages of the era of computation). And so I guess it shouldn’t be any surprise that the [_New Republic_ reports][] on software that uses the pattern discernment of stylistics to “undo” an author’s style, “anonymizing” them.

I mark the transforming verb because I am struck by how much of the original remains in the anonymized examples provided by the article. A number of lead passages from Fitzgerald, Tolstoy, Dickens, Eliot, and others are provided. The Dickens paragraph is from the beginning of _A Tale of Two Cities_ and you simply can’t undo it. Perhaps the Fitzgerald one might work: I confess that I remember the novel less well. But the Tolstoy is an example of how things happen. And did no one take into consideration that the Tolstoy is actually already transformed by translation?

_Anna Karenina_ as translated:

> Happy families are all alike; every unhappy family is unhappy in its own way. Everything was in confusion in the Oblonskys’ house. The wife had discovered that the husband was carrying on an intrigue with a French girl, who had been a governess in their family, and she had announced to her husband that she could not go on living in the same house with him. This position of affairs had now lasted three days, and not only the husband and wife themselves, but all the members of their family and household, were painfully conscious of it.

Now, as anonymized:

> Happy families are all alike. And, every family that isn’t happy, is unhappy in its own way. The Oblonskys’ house was in turmoil. The wife/mother discovered her husband had been having a passionate relationship with a French girl–who used to be a governess in their family. She announced to her husband that she couldn’t continue living with him. This unpleasant situation existed for three days—and not only were the husband and wife themselves aware of the tension of the situation, but the entire family/household was troubled by the situation.

There are so many dimensions of language that can’t be undone, at least not in this stage of anti-stylistics. I think a human anonymizer would have gone much further. Perhaps computational methods will get there, but they are not there. Yet.
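One crude way to put a number on “how much remains” is to compare the vocabularies of the two versions. The sketch below is my own illustration, not anything the software in the article does. It uses only the opening two sentences of each passage (with straight apostrophes for convenience), and the `tokens` and `overlap` helpers are names I made up.

```python
import re


def tokens(text):
    """Lowercase a passage and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def overlap(first, second):
    """Return the shared word types and the Jaccard similarity of two passages."""
    a, b = set(tokens(first)), set(tokens(second))
    shared = a & b
    return shared, len(shared) / len(a | b)


original = (
    "Happy families are all alike; every unhappy family is unhappy in its "
    "own way. Everything was in confusion in the Oblonskys' house."
)
anonymized = (
    "Happy families are all alike. And, every family that isn't happy, is "
    "unhappy in its own way. The Oblonskys' house was in turmoil."
)

shared, jaccard = overlap(original, anonymized)
print(sorted(shared))
print(f"Jaccard similarity: {jaccard:.2f}")
```

Vocabulary overlap ignores word order, syntax, and rhythm, which are exactly the dimensions I suspect a human anonymizer would attack first, so a high score here only confirms the obvious: most of the words are still Tolstoy's translator's.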

[_New Republic_ reports]:

Using Less Data

I would love to understand the post on _Machined Learnings_ on [Using Less Data][], but I don’t. Damn you, Scott Weingart, for making my limitations so obvious.

[Using Less Data]:

UVa Scholars Lab Position

The Scholars’ Lab at the University of Virginia Library is [looking for a new director][]. I *sooo* would like that job. I especially like that they have a notion that the position itself should allow for research opportunities.

[looking for a new director]:

Diverse Digital Humanities Post Roundup

David Golumbia has a pair of posts tackling, first, [in-group and out-group dynamics][dynamics] that plague the digital humanities community, and, second, the [role of tools and tool use][tools] in definitions of what is, and is not, work in the digital humanities. Both posts have already attracted a fair amount of discussion, but the latter post is great because it highlights the work of my collaborator Jonathan Goodwin in trying to think about the relationship of topic modeling methods and outputs and humanistic forms of inquiry.

Early in the essay, Golumbia references his own experience and background in computational linguistics. I wonder if that couldn’t be the key to the wobbly nomenclature that everyone seems to be struggling to stabilize and/or right? That is, before there was the digital humanities, there was computational humanities. In an effort to be more inclusive, and because many of the practitioners were interested in it, digital media got folded in and, *presto*, take the adjective from one and the noun from the other and you’ve got *digital humanities*. Only the practitioners of the two already very diverse fields of inquiry, along with those interested in the digital arts, just don’t speak the same language.

This is my very impressionistic view of the history. I could be completely off base. And I wouldn’t mind ignoring the definitional squabbles altogether, except that, as Katharine Harris points out in her comments, some of the definitions offered up not only don’t help but also harm.

Then there was the whole [Dark Side of the Humanities][dark] panel at MLA that I missed. That’s what I get for not going and not following Twitter during such things.


Decoding Old Writing

The [BBC writes][]: “The world’s oldest undeciphered writing system, which has so far defied attempts to uncover its 5,000-year-old secrets, could be about to be decoded by Oxford University academics.” The rest of the article reveals something a little less dramatic and a lot more scholastic, but, interestingly, the post is under the Business section of the site. Curious.

[BBC writes]:

Describing the “Digital Humanities”

As I have noted before, digital humanities is the merging of two areas of inquiry/activity that were once fairly distinct, humanities computing (also sometimes known as computational humanities) and digital media production. How and why the two got merged remains a larger history for someone else to unfold, but the folding appears to have occurred because they both take advantage of the immense processing powers of computers as analytical devices, media production tools, and publishing platforms. A recent post on _The Humanist_ by Sheffield University for their biennial Digital Humanities Congress has a nice formulation of all this:

> Digital humanities is understood to mean the use of technology
within arts, heritage and humanities research as both a method of inquiry
and a means of dissemination. As such, proposals related to all disciplines
within the arts, humanities and heritage domains are welcome.

_The Humanist_ on-line seminar/mailing list archive is [here](

McCarty’s Guide to Digital Humanities

Somewhat edited. I imagine it might serve as a building block for a list of my own making. More importantly, it serves as a decent lens onto what one of the major figures in the field considers “central.”

### Articles, books, edited collections

* McCarty, Willard. 2005. Humanities Computing. Basingstoke: Palgrave.
* McGann, Jerome, ed. 2010. Online Humanities Scholarship: The Shape of Things to Come. Houston TX: Connexions.
* Schreibman, Susan, Ray Siemens and John Unsworth, eds. 2004. A Companion to Digital Humanities. Oxford: Blackwell.
* Williams, Raymond. 2003/1974. Television: Technology and Cultural Form. Ed. Ederyn Williams. London: Routledge.

### Journals

* Digital Humanities Quarterly (
* Literary and Linguistic Computing (

### Conversations

* Humanist (
* Blogs & Twitter: google for “blogs digital humanities”; “twitter digital humanities” or substitute the name of your primary discipline. Many fellow students and scholars are eager to help. Talk to them.

### Guides &c

Experiment by googling for whatever subject-area interests you, e.g. “digital history”; for courses, e.g. “digital humanities syllabus”.

### Organizations

Alliance of Digital Humanities Organizations (

### Updates

* Cohen, Dan and Scheinfeldt, Tom. *Hacking the Academy*. (University of
Michigan Press, 2011) [web; print forthcoming]
* Gold, Matthew K. *Debates in the Digital Humanities* (University of
Minnesota Press, 2012)
* Fitzpatrick, Kathleen. *Planned Obsolescence* (NYU Press, 2011)
* Kirschenbaum, Matthew. *Mechanisms: New Media and the Forensic
Imagination* (MIT Press, 2007)
* Nowviskie, Bethany. *alt-academy*. (MediaCommons, 2011) [web]
* Ramsay, Steve. *Reading Machines: Towards an Algorithmic Criticism* (University
of Illinois Press, 2011)

Forthcoming titles to look out for:

* David Berry’s *Understanding Digital Humanities* (Palgrave, 2012)
* Katherine Hayles, *How We Think: Digital Media and Contemporary
Technogenesis* (Chicago UP, 2012)
* Matthew Jockers, *Macroanalysis: Methods for Digital Literary History* (Illinois, forthcoming).

Speaking of his work for the _Index Thomisticus_, Robert Busa noted:

> If I consider the vast amount of human work demanded by processing
texts of this size in this way, I think that such initiatives are
better based on a strongly systemized team, supported by an
institution able to keep alive its efficiency for decades. (117)

Busa, Robert. 1970. Computer processing of over ten million words: Retrospective
Criticism. In _The Computer in Literary and Linguistic Studies_
(Proceedings of the Third International Symposium), 114-117. Ed. Alan Jones and
R. F. Churchhouse. University of Wales Press.

[Evaluating Digital Work for Tenure and Promotion: A Workshop for Evaluators and Candidates at the 2012 MLA Convention ]( Good to see that the title of the session is longer than the URL.