What is it about the digital humanities?

What is it about the digital humanities that attracts so much, so much … angst, anxiety, and/or vituperation? (Some of it well intended, some of it not.) As I’ve noted before, the digital humanities is an *ex post facto* label brazenly applied to a wide variety of activities: computational analyses of texts and other data normally the purview of the humanities, sometimes individually or sometimes on a scale not previously possible; the creation of new kinds of archives of such materials, paving the way either for traditional forms of analysis or for the new kinds of analysis just mentioned; or the creation of new kinds of texts, sometimes called the digital arts (or digital media). And even with this list I am surely leaving something, okay a lot, out.

Given the variety of activity, the situation that results is best likened to the fabled six blind men who encounter an elephant, wherein each can only know the part that they touch. (The fable has always bugged me, because of the lack of communication among the blind interlocutors, but let’s leave it at its original task: to remind its listeners/readers that knowledge is almost always partial.)

The current tempest in the popular teapot is Adam Kirsch’s book review essay, “Technology Is Taking Over English Departments: The False Promise of the Digital Humanities,” appearing in _The New Republic_. ([Link][]. Note the telling addition of “limits” in the essay’s URL: the NR’s editors are laying it on thick.)

Ignoring obvious false starts, like the fact that two mathematicians, Erez Aiden and Jean-Baptiste Michel, are featured early in the essay, or that it’s too easy to give prolegomena and provocations too much weight in such considerations, Kirsch does grasp at least that “the field has no common essence: it is not a species but at best a genus, comprising a wide range of activities that have little relationship with one another.” He also foregrounds some of the essential difficulties the digital humanities face, difficulties that arise because the field brings to the surface problems the humanities themselves have long ignored.

One of those difficulties is, as many already know, the demise of the scholarly monograph, whose history is a lot more complicated than most realize: its roots lie in the change in the way publishing companies were taxed on inventory (which is to say they were) in the eighties and then in the way libraries were funded (which is to say *not*) in the nineties. The internet offered a place to publish, to communicate, and some humanists experimented with the new medium not only as a place to publish conventional materials but also as a place to try out new genres, genres previously impossible in a communicative infrastructure based solely on codices. This desire to make things for ourselves has paralleled a broader interest in “making” that has arisen in an era where devices and machines are increasingly sealed “for protection” and/or roped off by vague claims to IP entitlements. As Kirsch notes, “Like many questions in digital humanities, this one remains open. But the basic emphasis on teamwork and building, as opposed to solitary intellection, is common to all stripes of digital humanists.”

It’s really when it comes to the new kinds of analyses the computational turn in the humanities might make possible that Kirsch reveals a real blindness, assuming that you think as I do that some of the above is insightful in its own fashion. Taking Moretti’s “Style, Inc.” as indicative of the larger field of computation, Kirsch notes: “It is striking that digital tools, no matter how powerful, are themselves incapable of generating significant new ideas about the subject matter of humanistic study. They aggregate data, and they reveal patterns in the data, but to know what kinds of questions to ask about the data and its patterns requires a reader who is already well-versed in literature.”

A reader could substitute any domain expertise for *literature*: history, folkloristics, rhetoric, linguistics, etc. And so the question really becomes: what exactly is Kirsch’s complaint? That the digital humanities still think domain expertise is important? Central? Critical to the application of computational technologies and techniques? He returns to Aiden and Michel for his discussion, which only proves the point: both are mathematicians with little to no domain expertise in the humanities. Of course many of their grander claims are rather thin. (I watched several audience members at the Texas Digital Humanities Conference try to get Aiden to think about his impoverished understanding of human history and language use, but he just doesn’t get it.)

Where does all of this take Kirsch? Well, he muses, quantification is what has gotten the humanities *into* trouble (the corporatization of the American university, wherein corporatization refers to the bureaucratic impulse to quantify things, like education), and so it should be the role of the humanities to resist quantification. *Really?* Is this the best answer? The only answer? Isn’t it also the job (and the switch from *role* to *job* here is purposeful) of the humanities to critique, to lay bare the apparatus by which certain phenomena appear and forces work? Until now, and still in the so-called “traditional” humanities, the humanities have largely responded to the quantification of everything with a simple “not everything can be quantified,” which is rather like the childhood retort “Is not!” In a world where a new field called *social physics* cranks out social network analyses of myths, the opportunity arises instead to respond by being better at it than the physicists.

And that’s what some in the digital humanities aspire to do. In the process, some are also banking on the notion that refinements, if not wholesale revisions, of methodologies may actually arise, revisions that can come only from treating the kinds of materials that have long been the purview of the humanities and also from incorporating humanistic theories and forms of theorizing.

**See also**: [Alan Jacobs’ response](http://text-patterns.thenewatlantis.com/2014/05/my-response-to-adam-kirsch.html). I’m very jealous of his site name, *text patterns*. So good. [Ted Underwood](http://tedunderwood.com/2014/05/03/you-cant-govern-reception/) says “you can’t govern reception.” [Gary Hall](http://www.garyhall.info/journal/2011/1/12/on-the-limits-of-openness-v-there-are-no-digital-humanities.html) argues that the humanities have long been undertaking computation. And, speaking of the corporatization of the university, aka *scientific management*, [Jill Lepore][] has a nice review of Matthew Stewart’s _The Management Myth: Why the Experts Keep Getting It Wrong_.

*Revised* 19:30: because English is a stable language and deserves to be treated with more respect than the first draft gave it. Also, there were some redundancies and excesses that needed trimming.

[Link]: http://www.newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch
[Jill Lepore]: http://www.newyorker.com/arts/critics/atlarge/2009/10/12/091012crat_atlarge_lepore?currentPage=all

Counting Things in Texts

*This is one of those posts that probably deserves a fuller version, something I might consider submitting to ProfHacker, but I’m in the middle of a bunch of other work right now, so it’s going to be shorter than I like.*

Two recent posts by non-scholars have used two practices[^1] that are emerging as conventions within the digital humanities: one is counting unique words to get a sense of vocabulary and the second is counting the number of times characters in a text appear together in scenes.

Matt Daniels counts words in [“The Largest Vocabulary in Hip Hop”][hh], and, thanks to an astute commenter, uncovers that at least one rapper purposefully minimizes his vocabulary in order to maximize sales: an interesting parallel here would be to examine political discourse of various public figures to see how appeals to the oft-lauded common man might be realized by vocabulary.
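For readers who want to try this kind of count themselves, here is a minimal Python sketch of a vocabulary measure. It assumes a crude regular-expression tokenizer and an optional cap, `n`, on how many tokens to sample, so that artists with large and small bodies of work can be compared on equal footing. The function name and the cap are my illustrations, not Daniels’ actual code.

```python
import re
from typing import Optional


def vocabulary_size(text: str, n: Optional[int] = None) -> int:
    """Count distinct word types among the first n tokens of text (all tokens if n is None)."""
    # Crude tokenizer (an assumption): lowercase letter runs, keeping internal apostrophes.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if n is not None:
        tokens = tokens[:n]  # sample the same number of tokens for every artist
    return len(set(tokens))


lyrics = "I got the keys, the keys, the keys"
print(vocabulary_size(lyrics))       # 4: i, got, the, keys
print(vocabulary_size(lyrics, n=2))  # 2: i, got
```

Capping the sample matters because the number of distinct words keeps growing with the length of a text, so comparing raw totals would mostly reward prolific artists.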

Ben Blatt counts the co-occurrences of characters in scenes in [“Which Friends on Friends Were the Closest Friends?”][ff]. Like Daniels, Blatt is upfront about his method: “To determine which characters shared scenes, I downloaded transcripts of all 236 episodes … If a character spoke a line in a scene, I marked him or her as present.” The results are interesting for those familiar with the show, but, as my wife noted, few undergraduate students would be familiar with _Friends_; one could, however, do the same thing with a show they do know. Perhaps the most famous example of this kind of counting of characters co-occurring in scenes (not to be confused with _Comedians in Cars Getting Coffee_) is Franco Moretti’s [“Network Theory, Plot Analysis”][fm] (see below for the conventional reference), wherein he uncovers that the compelling nature of _Hamlet_ may very well be that Hamlet and Claudius are both central characters, with Horatio a close third. I saw Moretti give a version of this paper at the NEH seminar on network studies in the humanities organized by Tim Tangherlini at UCLA’s IPAM in 2010. (Oh, the debt I owe to Tangherlini!)
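Blatt’s rule translates almost directly into code. The sketch below assumes each scene has already been reduced to a list of the characters who speak in it. Parsing the transcripts is left out, and the scenes shown are made up. For every pair of characters, it counts how many scenes they share.

```python
from collections import Counter
from itertools import combinations


def scene_cooccurrences(scenes):
    """Count how many scenes each pair of characters shares.

    `scenes` is a list of scenes; each scene lists the characters who speak in it.
    Following the rule Blatt describes, speaking a line marks a character as present.
    """
    pairs = Counter()
    for speakers in scenes:
        present = sorted(set(speakers))  # dedupe, and sort so each pair has one canonical order
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical scenes, not drawn from the actual transcripts.
scenes = [
    ["Monica", "Rachel", "Monica"],
    ["Joey", "Chandler"],
    ["Chandler", "Joey", "Monica"],
]
counts = scene_cooccurrences(scenes)
print(counts.most_common(1))  # [(('Chandler', 'Joey'), 2)]
```

The resulting pair counts can serve as the weighted edge list for a network tool like Gephi or Cytoscape.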


Moretti, Franco. 2011. Network Theory, Plot Analysis. _New Left Review_ 68: 80–102.

[^1]: There is actually a name for this, but I can’t think of it at the moment. *Method*? More coffee is needed….

[hh]: http://rappers.mdaniels.com.s3-website-us-east-1.amazonaws.com/
[ff]: http://www.slate.com/articles/arts/culturebox/2014/05/friends_chandler_joey_ross_rachel_monica_phoebe_which_friends_were_closest.html
[fm]: http://www.newleftreview.org/?page=article&view=2887

Opening Scholarship

I think Caleb McDaniel has it [right][], when he considers what it might mean for scholars to work “in the open”: publishing their notes as they make them. He raises all the right opportunities and the right dangers, and I like the idea of using version control for a backend. I would like to compare his use of GitHub and Gitit with what Graham, Milligan, and Weingart are using for [The Historian’s Macroscope][].

[right]: http://wcm1.web.rice.edu/open-notebook-history.html
[The Historian’s Macroscope]: http://www.themacroscope.org


The [list of courses][] for the Digital Humanities Summer Institute is amazing. I would love to be able to go to just one of these courses — and even better to go to more. Unfortunately, with salaries frozen for 8 years and absolutely no sense of professional development for faculty at my university, I will mostly have to make do with MOOCs and catching what training I can. Still, one can dream… and, perhaps more importantly, encourage others to *seek out and seize these opportunities!*

[list of courses]: http://www.dhsi.org/courses.php

The First Texas Digital Humanities Conference

I’m just back from the premier offering of the Texas Digital Humanities Conference, and I can’t tell you what a pleasure it was to have such a superb event held so close to home, especially since I won’t be able to make the big Digital Humanities meeting this summer (or next summer, for that matter, since things are unlikely to get better here any time soon). There’s more to write about than what I am posting here, but I wanted to post my notes and links for both my future reference and as part of the conference’s wider historical record: interested readers should also check out the conference’s Twitter stream, [#txdhc][], and Geoffrey Rockwell’s [notes][].

In addition to the notes below, I also want to particularly thank [Elisa Beshero-Bondar][] and Max for walking me through loading networks into Cytoscape.

### Geoffrey Rockwell

Parallel between the art-critical process: *integritas* (apprehending that thing according to its form), *consonantia* (synthesis which is logically and aesthetically permissible), *claritas* (see the thing as it is and no other thing). In text analysis: demarcation, analysis, synthesis.

Tufte suggested the usefulness of spark lines?


CHum: _Computers and the Humanities_ (an important journal until 2005).

An interpretive thing, a *hermeneutica*, is like an architectural folly from the nineteenth century: there to prompt our own thinking. Not simulacra.

Text analysis works on surrogates, not the text itself, not as text as conventionally understood. Text as string.

Stéfan Sinclair is his collaborator.

Predecessor: HyperPo.

Relationship to bricolage? Embroidery. Contribution by framing. Things for others to think through.

Voyant (http://voyant-tools.org) is downloadable and can be run locally.

Beta Voyant tools are all R for analysis and D3 for visualization.

→ Ask GR about Smith PDF.

→ Contact John Smith about code and about being interviewed.

### Andrew Higgins

Philosophy has an ArXiv? http://philpapers.org/

Co-categorization of articles.

Modularity measure.

Philpapers –> Google Scholar (to scrape citation data).

Bowling Green has an index.

### Anne Chao

Chen Duxiu was a co-founder of the Chinese Communist Party. Begins with a social network created in Gephi: threshold was 3 interactions with Chen. [These kinds of faux network visualizations make me realize that having a logic for the layout is terribly important: why are nodes located where they are in the graph? What do the edges represent?]

Later connection with an individual influenced, or trained, by John Dewey.

### Minute Madness

CADOH: Corpus of American Discourses on Health

### Cameron Bruckner

Mike Jones at IU is doing work with topic modeling (of a kind) that takes into account the position of a word in an n-gram.

### Elisa Beshero-Bondar

EBB is interested in mapping poetic structures and ideas in network visualization in the work of Robert Southey. When she coded meta-places and places, she discovered that the meta-places are necessary for the network to hold together. If they drop out, the network falls apart. Not the case for actual places. Makes sense: you need a cosmology in an epic poem. Betweenness measures.

Need to know more about measures of centrality. Cf. Alexander Maida (on computer scientists and computational linguistics). Closeness centrality reveals the places that Southey talks about most often.
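Since I need to know more about centrality measures, here is a minimal sketch of one of them, closeness centrality, for an unweighted, undirected network stored as a dictionary of neighbor sets. It uses one common definition: reachable nodes divided by the sum of shortest-path distances to them. The toy graph is hypothetical, and tools such as Gephi or Cytoscape may normalize differently.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into {node: set_of_neighbors}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def closeness_centrality(adj):
    """Closeness of every node: reachable nodes / sum of shortest-path distances to them.

    For a connected graph this equals (n - 1) / sum(distances).
    """
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search yields shortest paths in an unweighted graph
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical path graph A - B - C: the middle node is closest to everyone.
scores = closeness_centrality(build_adjacency([("A", "B"), ("B", "C")]))
print(scores["B"], round(scores["A"], 3))  # 1.0 0.667
```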

Startled by the difference between the eccentricity graph.

Shortest half-lengths.

→ KML vs ArcGIS mapping. Cytoscape does mapping.

EBB has students mapping Cook’s voyages. See http://pacific.pitt.edu.

### Kathryn Beebe

Medieval historians grapple not with big data but with small, even tiny, data.

Social networks in texts are very popular.

→ Tim Evans.

GR: “There are ways to metastasize your data, build it up quickly: look at what people say about these texts, at reception.”

### Tanya Clement

ARLO displays spectrograms that represent the amount of energy in each frequency band. Some genre detection. Code switching. Genre switching. (This is more information than wave forms, but it strikes me as an evolutionary improvement, not a revolutionary improvement: a comparison would reveal how different performances, different speakers intertwine frequency and dynamics; this must be the “energy” she was talking about.)

Cf. Shannon and Weaver.

→ Cf. Donald MacKay. 1969. _Information, Mechanics, ???_.

### Elijah Meeks

EM: “This is the first conference I’ve seen that specifically focuses on networks in the humanities.”

Working on a book about programming D3.js.

* kindred.stanford.edu
* orbis.stanford.edu

EM feels, like many in DH, like he is an impostor. But maybe the better term is interloper.

Interloper *par excellence*: Jared Diamond.

*Neotopology* refers to …

Mike Bostock (mbostock).

→ Anne Knowles. 2002. _Past Time, Past Place: GIS for History_. No volume yet for network visualization.

→ Willard McCarty. 2002. “Humanities Computing: Essential Problems, Experimental Practice.”

The *network turn* is taking place after the *spatial turn*: _Envisioning Landscape, Making World_; _Placing History_; _Spatial Humanities_.

Networks are really simple: it’s the annotation of a connection. E.g., a person is connected to another person, a person is connected to a document. N-partite networks.

A network is a view into your work as a view of the structure and not the components, a part of the process of operationalizing your understanding of the system.

Structure is important.

EM: “We need standards for interactivity.”

EM: “Any network is good as long as you declare the constraints that affected it.”

All of these can visualize a network dataset:

* Arc diagram
* Adjacency matrix (?).
* Force-directed layout.
* Radial layout.
* Donut charts.
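Of the views listed, the adjacency matrix is perhaps the least familiar. It is a grid with one row and one column per node and a mark wherever two nodes are connected. Here is a minimal sketch, assuming an undirected graph. It renders the grid as plain text in place of a real visualization library, and the three-node graph is hypothetical.

```python
def adjacency_matrix(nodes, edges):
    """Symmetric 0/1 matrix (list of rows) for an undirected graph; row/column order follows nodes."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # undirected: mirror across the diagonal
    return matrix


def render(nodes, matrix):
    """Draw the matrix as text: '#' marks an edge, '.' marks its absence."""
    width = max(len(node) for node in nodes)
    return "\n".join(
        node.rjust(width) + " " + " ".join("#" if cell else "." for cell in row)
        for node, row in zip(nodes, matrix)
    )


nodes = ["A", "B", "C"]
matrix = adjacency_matrix(nodes, [("A", "B"), ("B", "C")])
print(render(nodes, matrix))
# A . # .
# B # . #
# C . # .
```

On larger networks, reordering the rows and columns (for example, grouping nodes by community) is what makes a matrix view legible.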

You need to know what a random walk is, you need to know what centrality is; you need to understand how modularity detection works, that it returns a value and what that number means. → Learn network statistics.
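On what a modularity score means: Newman and Girvan’s modularity Q compares the share of edges that fall inside communities with the share expected if edges were rewired at random, keeping every node’s degree. A Q near 0 means the partition does no better than chance; higher values mean denser-than-expected communities. Below is a minimal sketch for an unweighted, undirected graph. The two-triangle example is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of (L_c / m - (d_c / (2 * m)) ** 2), where m is the
    number of edges, L_c the edges with both ends in c, and d_c the total degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_total = defaultdict(int)
    for a, b in edges:
        degree_total[community[a]] += 1  # each edge adds one to the degree of both ends
        degree_total[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree_total[c] / (2 * m)) ** 2 for c in degree_total)


# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
lumped = {node: "x" for node in range(6)}
print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, lumped), 3))  # 0.0
```

Community-detection methods such as the Louvain algorithm work by searching for a partition with a high Q.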

Invent your own centrality measures. Authorial acts, not authoritative.

→ Understand topology: cool visualization of topoJSON.

See McCarty’s description of a *trading zone* (2002).

A sloppy way of bundling together socio-physics, traffic analysis, etc.

→ Arts, Humanities, and Networks. (Conference organized by Max. Ebook out from MIT Press.)

→ _Book of Trees_.

### Yannick Rochat

*Character-space* is that particular and charged encounter between an individual human personality and a determined space and position with the narrative as a whole, and *character-system* is the arrangement of multiple and differentiated character-spaces — differentiated configuration and manipulation of the human figure — into a unified narrative structure (Woloch 2003: 14).

First graph: occurrences of characters per page with chapter breaks and part breaks indicated.

Second graph: occurrences totaled for each of 12 chapters.

Centrality measures: *degree* rank, *betweenness* rank, *harmonic* rank, *eigenvector* rank.
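Two of these ranks are easy to sketch. Degree simply counts a node’s neighbors. Eigenvector centrality scores a node by the scores of its neighbors, so being connected to well-connected characters counts for more. The sketch below computes it by power iteration on A + I. Adding the identity leaves the eigenvectors unchanged but keeps the iteration from oscillating on bipartite graphs. It assumes a connected, undirected graph stored as a dictionary of neighbor sets, and the star-shaped example is hypothetical, not Rochat’s data.

```python
def degree_rank(adj):
    """Nodes ordered by number of neighbors, highest first (ties broken alphabetically)."""
    return sorted(adj, key=lambda node: (-len(adj[node]), node))


def eigenvector_centrality(adj, iterations=1000, tol=1e-12):
    """Eigenvector centrality by power iteration on A + I, scaled so the top node scores 1.0.

    Assumes a connected, undirected graph given as {node: set_of_neighbors}.
    """
    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Multiply by A + I: each node keeps its own score and adds its neighbors' scores.
        new = {node: x[node] + sum(x[nbr] for nbr in adj[node]) for node in adj}
        top = max(new.values())
        new = {node: value / top for node, value in new.items()}
        if max(abs(new[node] - x[node]) for node in adj) < tol:
            return new
        x = new
    return x


# Hypothetical star: one central character who shares scenes with three others.
star = {"Hub": {"P", "Q", "R"}, "P": {"Hub"}, "Q": {"Hub"}, "R": {"Hub"}}
scores = eigenvector_centrality(star)
print(degree_rank(star)[0], scores["Hub"], round(scores["P"], 3))  # Hub 1.0 0.577
```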

Louvain [?] clustering in Gephi. (Modularity based.)

### Ayse Gursoy

Game criticism as it happens on-line: how discourse happens. (She’s using Google’s slideshow — and maybe EM was too?)

Critics are identifiable personae with many roles: critic, curator, and advocate.

The game _Dear Esther_, an interactive experience, led to debates about *game-ness*: “many discussions of “, “doing the rounds”, and “much has been written about.”

### Neal Audenaert

Collaborated with Nathalie Houston.

Started by calling attention to the difference between pages of prose and pages of poetry, and to three different kinds of features that shape such things: *bibliographic* features (paper, binding), *visual* features, and *linguistic* features.

Their research questions: How to extract visual features? What are the research questions? How to present/interact with this information? How to analyze this information algorithmically?

Work bubbled out of a THATCamp at Rice a few years ago. Then an NEH StartUp grant. And now a HathiTrust grant.

Used Tesseract to extract page layout.

Nathalie’s questions:

* How long are the lines?
* What’s the spacing between the lines?
* How much text on a page?

Text per page.

Nice use of R with a trend line — like what JG set up.
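Nathalie’s three questions map onto simple measurements once an OCR engine has found the lines on a page. The sketch below assumes that step is already done and that each line arrives as a `(left, top, right, bottom)` pixel box. Reading those boxes out of Tesseract’s layout output is left out, and both the function and the sample page are hypothetical.

```python
from statistics import mean


def page_metrics(lines, page_width, page_height):
    """Line length, line spacing, and text coverage for one page of OCR line boxes."""
    lines = sorted(lines, key=lambda box: box[1])  # order lines top to bottom
    lengths = [right - left for left, top, right, bottom in lines]
    # Gap between the bottom of each line and the top of the next one.
    gaps = [below[1] - above[3] for above, below in zip(lines, lines[1:])]
    inked = sum((right - left) * (bottom - top) for left, top, right, bottom in lines)
    return {
        "mean_line_length": mean(lengths) if lengths else 0,
        "mean_line_gap": mean(gaps) if gaps else 0,
        "text_coverage": inked / (page_width * page_height),
    }


# A hypothetical 1000 x 1000 pixel page with three lines of uneven length.
page = [(100, 100, 900, 130), (100, 150, 500, 180), (100, 200, 700, 230)]
print(page_metrics(page, 1000, 1000))
# {'mean_line_length': 600, 'mean_line_gap': 20, 'text_coverage': 0.054}
```

On this logic, one would expect a page of verse to show shorter, more variable lines and lower coverage than a page of prose.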

[#txdhc]: https://twitter.com/search?q=%23txdhc
[notes]: http://www.philosophi.ca/pmwiki.php/Main/1stInauguralTexasDigitalHumanitiesConference
[Elisa Beshero-Bondar]: http://digitalromanticist.wordpress.com/

Model View Culture

The Feminist Technology Collective is now publishing [_Model View Culture_][mvc], which has to be the greatest title ever, and one I wish I had thought of. It joins the interview with David Golumbia over at [Dichtung Digital][dd] in being a thoughtful critique, from a theoretically informed perspective, of aspects of technology that have gone under-examined. (My thanks to [Jonathan Goodwin][jg] for the heads up on the Golumbia interview.)

[mvc]: http://modelviewculture.com/
[dd]: http://www.dichtung-digital.de/journal/nachste-nummer/?postID=2150
[jg]: http://www.jgoodwin.net/

MLA 2014 Session Notes

*Original plaintext notes (in markdown) are [here](https://gist.github.com/johnlaudun/8585068).*

## 98. Vulnerable Texts in Digital Literary Studies

### Jeremy Douglass

* Subject is _Meanwhile_, the fiction that began as a poster, became a website, then a tabbed book, then an iOS app. 
* Branching narratives as networks.
* What is M’s status?
* poster – print – canvas UI
* hypertext – electronic – page UI
* tabbed book – print – page UI
* iOS app – electronic – canvas UI
* See also: Queneau’s branching narrative and “choose your own adventure” books. 

### Rachel Sullivan

* Subject is code comments. 
* References: Galloway’s _Protocol_ (2004), Hayles’ _My Mother Was a Computer_ (2005). See also work by Nick Montfort and Rita Raley (code surface || code depth). Adrian MacKenzie, _Cutting Code_ (2006).
* reader/user –> reader/user/programmer
* Bit rot, code bloat…
* Her examples: /ti-explorer/kernel/arrays.lisp
* Jeremy Douglass gave a paper at the 2011 Critical Code Studies conference: article published in _Vectors_ journal. 

### John David Zuern

* Subject is something he is calling “curatorial reading” which he is triangulating / differentiating from close/distant reading. 
* Stuart Moulthrop … Stephanie Strickland … (+?) … These are all examples of electronic literature being preserved by the ELO, Electronic Literature Organization.
* His pantheon of critics:  Stephen Best, Sharon Marcus, Heather Love.
* Flash fiction whose status is unknown “My Name Is Captain, Captain” (Judd Morrissey and Lori Talley): catalog of airshow maneuvers codes each maneuver with a particular symbol. Symbols used in text.
* Curators place objects along various historical and cultural axes.

### Q & A ###

* On code executability: different browsers interpret code differently –> computational environments change and code no longer executes –> a move to preserve computational environments (see Kirschenbaum’s article).

* * *

## 155. Literary Criticism at the Macroscale

### Andrew Piper ###

* Subject: the Wertherian exotext.
* _The Sorrows of Young Werther_ was translated, imitated, etc.
* “post-mimetic” 
* Genette’s 5-part schema of textuality: intertext, paratext, metatext, hypertext, architext + exotext.
* Not looking for the Wertherian, which texts are most like _Werther_, but the Werthericity of these texts: which texts are most like each other. 
* “Scalar reading” 
* piperlab.mcgill.ca/pdfs/WertherEffect1.pdf
* English_324: words common to Werther – words common to all novels
* Veroni —> community detection —> nodes that have most links among themselves + nodes that are most “between” 
* [I want to make sure I understand the difference between community detection and topic modeling.]

### Hoyt Brown ###

* —> literarynetworks.uchicago.edu: site for Global Literary Networks
* used naive Bayes out of the NLTK
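The notes don’t record what was being classified, so what follows is only a generic illustration. NLTK does ship a `NaiveBayesClassifier`, which works on dictionaries of features. To show what such a classifier actually computes, here is a from-scratch sketch of multinomial naive Bayes with add-one smoothing in plain Python, not NLTK’s implementation. The training texts and labels are made up.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """documents: list of (tokens, label). Returns label counts, word counts per label, vocabulary."""
    label_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    for tokens, label in documents:
        word_counts[label].update(tokens)
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocabulary


def classify(model, tokens):
    """Return the label with the highest log prior plus summed log word likelihoods."""
    label_counts, word_counts, vocabulary = model
    n_documents = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        words = word_counts[label]
        denominator = sum(words.values()) + len(vocabulary)  # add-one (Laplace) smoothing
        score = math.log(count / n_documents)
        for word in tokens:
            if word in vocabulary:  # words never seen in training carry no evidence
                score += math.log((words[word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    (["moon", "river", "sorrow"], "poem"),
    (["moon", "heart"], "poem"),
    (["market", "price", "river"], "news"),
    (["price", "election"], "news"),
])
print(classify(model, ["moon", "sorrow"]))  # poem
print(classify(model, ["price", "river"]))  # news
```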

### Underwood’s Commentary + Q & A ###

* TU: Hobbes’ description of the Leviathan.
* “fuzzy matching of 3, 4, 5, 7-grams”
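The notes don’t say how the fuzzy matching worked. One simple, common way to compare passages at several n-gram sizes is the Jaccard similarity of their sets of word n-grams. This sketch swaps in that technique purely as an illustration, using Hobbes’ famous phrase and an invented variant of it.

```python
def ngrams(tokens, n):
    """Set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(a, b, n):
    """Jaccard similarity of two token lists' n-gram sets: shared / total distinct (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


hobbes = "solitary poor nasty brutish and short".split()
variant = "solitary poor nasty brutish and long".split()
for n in (3, 4, 5):
    print(n, round(ngram_similarity(hobbes, variant, n), 3))
# 3 0.6
# 4 0.5
# 5 0.333
```

Larger n demands longer verbatim runs, so the same one-word change costs more. That is presumably why one would compare several sizes at once.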

* * *

## 233. Seeing with Numbers: Sociological and Macroanalytic Approaches to Literary Exclusion

### Andrew Goldstone

* -> andrewgoldstone.com/mla2014
* Seeking to answer Moretti’s question: “Who counts?” 200 canonized novels = 0.5% of novels published. (Moretti in audience.)
* “Let’s confront sublimity with computational methods.” (More striking when he said it.)
* -> John Guillory, _Cultural Capital_.
* Scholarly reading as a practice.
* Reception vs. consumption
  * BookScan
  * Primary texts are really secondary (from units sold)
* -> [Has anybody looked at sales vs. influence? -> Maybe in the sociology of reading?]
* Methodology
  * Down-weighted book chapters: collections skewed results.
  * Gini coefficient used to measure inequality
* Authors symbolically “rich” get “richer” as measured by operationalizing prestige.
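The Gini coefficient is simple enough to compute by hand. In this minimal sketch, 0 means every author receives the same attention, and values approach 1 as attention concentrates in a few authors. The per-author counts below are invented, not Goldstone’s data, and the choice of what gets counted (citations, say) is an assumption.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses the sorted-values form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n.
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([10, 10, 10, 10]))  # 0.0: attention shared equally
print(gini([0, 0, 0, 40]))     # 0.75: one author gets everything
```

With n values the maximum is (n - 1) / n, which is why four authors top out at 0.75 rather than 1.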

### Richard So

* Modernism
  * 12000 poems
  * 2200 poets
  * 10s of periodicals
* re: poet: “Is there something in his literary genetic code that authorizes his exclusion?”

### Matthew Jockers

* He first began imagining a “macroeconomics of literature” in 2005.
* Reviewed distinction between macro as quant analysis and micro qual assessment. [Noted: the analysis / assessment distinction.]
* For Brown and So: “What are the literary primitives [from genetics] and what do they signal?”

### Q & A

* Relationship of BoW to style. Word choice is one answer. Others?
* Opportunity: 40,000 new fiction titles / year.
* -> Jim English, _Economies of Prestige_.
* Focus of a study by an audience member: _McSweeney’s_.

* * *

## What is Data in Literary Studies

Session speakers are in pairs:

* First pair: different modes of reading
* Second pair: ontology of literature vs ontology of data
* Third pair: literary data as conceptual resource, as a system

### First Pair

* Daniel Rosenberg on “data” (as rhetorical device).
* Versions of data (Foucault vs cybernetics [see Pickering]).
* Bechdel test is algorithmic.

### Second Pair

* What isn’t data?
* Data in the context of topic modeling (machine learning): inventoried all the things that get dropped: function words, honorifics, proper nouns, dialect … and this doesn’t include stemming. By the time everything has gone through this scrubbing process, it is *not* literature. But it is data. … LDA does make particular assumptions about how texts get composed. (If those assumptions are not accurate, what does that mean for the utility of probabilistic modeling? I.e., reverse genesis.)
* -> _Raw Data Is an Oxymoron_.
* -> AAAS has visiting fellowships.
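To make the inventory concrete, here is a minimal sketch of the kind of scrubbing described. It covers three of the listed categories: function words, honorifics, and proper nouns. The word lists are tiny stand-ins for real stop lists. The proper-noun test is a deliberately crude assumption: any capitalized word that does not begin a sentence.

```python
import re

# Tiny illustrative lists; real stop lists run to hundreds of entries.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "at", "was", "he", "she", "him", "her"}
HONORIFICS = {"mr", "mrs", "miss", "sir", "lady"}
SENTENCE_END = {".", "!", "?"}


def scrub(text):
    """Split text into (kept, dropped) tokens the way a topic-modeling pipeline might."""
    kept, dropped = [], []
    at_sentence_start = True
    for token in re.findall(r"[A-Za-z]+|[.!?]", text):
        if token in SENTENCE_END:
            at_sentence_start = True
            continue
        # Crude proper-noun test: capitalized but not the first word of a sentence.
        looks_proper = token[0].isupper() and not at_sentence_start
        at_sentence_start = False
        word = token.lower()
        if word in FUNCTION_WORDS or word in HONORIFICS or looks_proper:
            dropped.append(token)
        else:
            kept.append(word)
    return kept, dropped


kept, dropped = scrub("Mr Darcy was proud. Elizabeth laughed at him.")
print(kept)     # ['proud', 'elizabeth', 'laughed']
print(dropped)  # ['Mr', 'Darcy', 'was', 'at', 'him']
```

*Elizabeth* survives only because she opens a sentence. Even this toy pipeline shows how much of what remains depends on small, arbitrary rules, which bears out the point that what comes out the other end is data, not literature.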

### Third Pair

* [Cutting edge in sociology of reading in this room: Latour and Goffman. E.g., Heather Love.]
* Literature as primary source *and* conceptual source.
* -> Reference to _Objectivity_: see also Cynthia Wall’s account

### Q & A

* Role (goal) of prediction? Literature as a model vs literature as a scenario.
* Johanna Drucker reference that I didn’t follow.
* When book history entered the academy 20-25 years ago, it allowed us to see new things.
* -> How does literature store information? The Homeric epic as technology of information storage and retrieval. [See Lord and Rubin.]
* Moretti: the Annalists turned to the archive to ask different questions about history; it is not yet clear what the archive will mean for literary history.

* * *

## Surface Reading

* “The Way We Read Now” was a special issue of _Representations_.

### Heather Love & Sharon Marcus

* _Reading Methods in Literary Studies_: Recent debates about reading have questioned the core methodological commitments of literary studies. This course will serve as an introduction to these debates through an exploration of topics including the exhaustion of critique, post-hermeneutic criticism, and the relation between interpretation and description. Many of our readings will situate new methods such as surface reading, just reading, distant reading, and reparative reading in the longer history of the discipline, exploring the links between these methods and New Criticism, New Historicism, and Marxist and psychoanalytic criticism. The course will also introduce students to some of the alternatives to…
* _DH Assignments_:
  * Get set up on Co-Ment with _Benito Cereno_
  * Scan assigned pages of _Benito Cereno_
  * OCR: convert TIFF into .txt and clean up the .txt file
  * Comment on doc using Co-Ment
  * Text analysis of BC (data mining and visualization) using Voyant, TAPoR, Google Ngram, WordHoard
  * Read TEI guidelines, text body, Chapters 1-4, and select one additional chapter. Explore TEI tutorials.
  * Comment on your pages of BC using Co-Ment
  * Diagnose BC symptom
* When it came time to collaborate, they needed to develop a controlled vocabulary. They used a Google Doc to develop the CV.

### Ted Underwood

* “Computer scientists are alien objects stored somewhere else on campus, from the point of view of many humanists.” 
* CS is really philosophical: what does it mean to learn.
* Alan Liu’s essay from last summer. [> Published where? DHQ? PMLA?]
* Trying to model literary characters in the same fashion as topic models: the work is made difficult because character is more complex than it looks.
* Reading Todorov 1971.

### Alex Gil

* A theory built on top of the work of Jerome McGann: a theory about everything being surfaces. McGann’s work is on topology.
* Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) that it takes to turn one string of text into another. (Like the game that turns one word into another by changing only one character at a time?)
* E.g., Borges’ library.
* Thanks to computational methods, we are finally paying attention to the ways that texts relate to themselves.
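Levenshtein distance has a compact dynamic-programming solution: fill in a table of distances between every prefix of one string and every prefix of the other, keeping only one row at a time. Here is a minimal sketch; the early modern spelling pair is only an illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("moue", "move"))       # 1
```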

### Questions & Answers

* Beware temporal compression, even elision, that occurs in so-called social networks of fictions. E.g., Moretti’s graphs of Shakespeare.
* Question of using tools to replicate what we already do by hand …  TEI becomes a form of reception history.
* Andy Stauffer has an essay on the importance of libraries keeping old books for the marginalia.
* Teaching a DH class on _Benito Cereno_ got Heather Love to re-read _S/Z_ and gave her a new appreciation of the text.
* TU prefers not to frame critical methods in opposition to each other.

Possible project: _Structuralism for Digital Humanists_.

* * *

## Making Sense of Big Data

### Anupam Basu ###

* He has a postdoc at Washington University; interested in machine learning and informatics. 
* Big data = terabytes. Compared to the genome of a fruit fly, humanities “big” is still small.
* EEBO: Early English Books Online: eebo.chadwyck.com
* Two categories of texts: those in the ESTC (the superset) and those digitized by EEBO-TCP.
* Complexity is a relatively well-defined concept in computer science. [What is it?]
* Standardization of spelling occurs during the expansion of print during the English civil war, circa 1630. E.g., moue > move.
* Another use of Gini: this time to measure the variation of English orthography. 

### Mark Algee-Hewitt

* ECCO Archive: Eighteenth Century Collections Online.
* Now Bakhtin?! Look at “From the Prehistory of Novelistic Discourse”, “Epic and Novel” and “Discourse in the Novel.”
* _Bleak House_ in five chunks — how were the chunks chosen? — for PCA. [I think he’s using R.]
* Testing Bakhtin’s supposition that novelistic discourse will be more heterogeneous than poetry? (Not sure what MB actually claimed, but this is interesting.) Arrived at an “H score” (heteroglossia).
* Followed H score with a t-test that he box plotted.
* K-L divergence?
* Overall: fiction is more self-similar than poetry.
* Was Bakhtin wrong about competing discourses contained within the novel or is it simply the case that the heteroglossia take place at a different order / level of discourse? [How to quantify register?]
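Since I’m not sure how K-L divergence figured in the talk, here is only a generic sketch of the measure as applied to word frequencies. It gives the average number of extra bits it costs to encode text drawn from distribution P using a code optimized for Q. The sketch assumes add-one smoothing over the combined vocabulary so that no word has zero probability, since otherwise the divergence can be infinite. The smoothing choice and the toy token lists are mine. Note that the measure is asymmetric.

```python
import math
from collections import Counter


def kl_divergence(p_tokens, q_tokens, alpha=1.0):
    """D_KL(P || Q) in bits between the word distributions of two token lists.

    Both distributions are add-alpha smoothed over the shared vocabulary so that
    no word gets zero probability.
    """
    vocab = set(p_tokens) | set(q_tokens)
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    p_total = len(p_tokens) + alpha * len(vocab)
    q_total = len(q_tokens) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (p_counts[word] + alpha) / p_total
        q = (q_counts[word] + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


p = ["sea", "sea", "ship"]
q = ["sea", "ship", "ship", "storm"]
print(round(kl_divergence(p, q), 3))  # 0.153
print(round(kl_divergence(q, p), 3))  # 0.147
```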

### Laura Mandell

* Early Modern OCR Project (EMOP): http://emop.tamu.edu.
* Mellon Grant is on-line — it’s the hardest book she ever wrote.
* Goal is to make the entire TCP available: roughly 300,000 texts in all (128,000 EEBO + 182,000 ECCO).
* Using tesseract-ocr (developed by Google).
* 70% (OCR) accuracy is gauged as good enough by historians. But it’s not in our field.
* Go to 18thconnect.org: if you edit a text, you can get the text for free.
* TCP was done by sweat shop labor.

### Questions

* For Mark A-H: his work on Bakhtin highlights something that has been a consistent thread through a number of panels: how do we negotiate the move past texts as bags of words? That is, what are your thoughts on developing a computational identification of speech registers? > MAH has developed something but he wasn’t comfortable describing it.
* One of the things that seems interesting here is that you can develop an algorithmic infrastructure and then use it over many projects.
* _Ah hah!_ Mark is Piper’s collaborator. 

* * *

## 402. Beyond the Digital: Pattern Recognition and Interpretation

* Notes for talks are [on-line](http://ach.org/2013/12/30/methods-and-more-for-beyond-the-digital-at-mla-2014/).

* A series of 7-minute talks.
* Topic modeling as both a navigational tool and as an interpretive practice.
* See: [networkedcorpus.com](http://networkedcorpus.com/)
* Viral Texts draws from the LoC’s _Chronicling America_ archive of newspapers. (Holes in the archive are due to states that have not contributed.)
* Nineteenth century newspapers look a lot like contemporary websites and Facebook. 
* Newsy pieces moved fast, with an average life of 3 months.
* Literary pieces moved slowly, with an average life of 5 years.
* Cordell’s grant to NEH: [here](https://securegrants.neh.gov/publicquery/main.aspx?f=1&gn=HD-51728-13).

Portable DH at MLA 2014

Why, yes, I should be finishing up my paper for MLA, and, yes, I got carried away with the simple task of making an ebook version of [Mark Sample’s incredibly generous compilation of all the digital humanities sessions at this year’s MLA meeting in Chicago][dh] … which starts tomorrow.

What I wanted was something more portable than the paper program, something that would fit on my iPhone or my Kindle, and so I came up with these three documents that you can feel free to load onto any/all of your devices:

First, there is the [EPUB][] version.

Next, there is, for those Kindle users out there, the [MOBI][] version. (I uploaded to my Kindle using the e-mail protocol, `username@kindle.com`, and it worked fine. The file looks good. I also downloaded it to the Kindle software on my iPhone: the pages start kinda low, which makes many of the sessions two pages. Sorry, but I’m still a noob when it comes to the niceties of EPUB formatting.)

I submitted the MOBI version to Amazon first thing this morning, so if you have any difficulty uploading it yourself, check back here later Tuesday or Wednesday morning for the Amazon link. (It looks like, for now, it will have to be priced at 99¢: Amazon doesn’t like free unless you go through a lengthy process of uploading elsewhere for free and then making them price match it. Believe me, I looked into it. My promise is to turn any money I see over to the ACH as a donation.)

**Now available on [Amazon][].**

There is always the trusty [PDF][] version.

Finally, if you are interested in how I did this: I copied and pasted the HTML from Mark’s site into a [Scrivener document][], and then I split the file, using CMD + K, until every session had its own document. I added folders for the relevant days as I went, and then I spent about half an hour playing with Scrivener’s export / compile options until I got something that seemed reasonably useful.

By the way, for those of you using WordPress, making EPUB and MOBI files available is a bit of a trick: the built-in uploader does not want to accept them. In order to get it to work, you need to add something like this to your `functions.php` file in your theme’s directory:

```php
// Let the WordPress media uploader accept e-book (and AVI) files.
function custom_myme_types( $mime_types ) {

    // Adding avi extension
    $mime_types['avi']  = 'video/avi';
    // Adding e-book extensions
    $mime_types['mobi'] = 'application/x-mobipocket-ebook';
    $mime_types['epub'] = 'application/epub+zip';

    // Removing the pdf extension (uncomment to disable PDF uploads)
    // unset( $mime_types['pdf'] );

    return $mime_types;
}

add_filter( 'upload_mimes', 'custom_myme_types', 1, 1 );
```

I simply added this to the very end of the file, and it worked just fine. For more information, see [this thread on StackExchange][se].

I hope all of this proves useful. I’m done now. Back to the paper!

[dh]: http://www.samplereality.com/2013/09/19/digital-humanities-at-mla-2014/
[EPUB]: http://media.johnlaudun.org/wordpress/media/2014/01/DH-MLA-2014.epub
[MOBI]: http://media.johnlaudun.org/wordpress/media/2014/01/DH-MLA-2014.mobi
[Amazon]: http://www.amazon.com/dp/B00HRGWE7W
[PDF]: http://media.johnlaudun.org/wordpress/media/2014/01/DH-MLA-2014.pdf
[Scrivener document]: http://media.johnlaudun.org/wordpress/media/2014/01/DH-MLA-2014.scriv_.zip
[se]: http://wordpress.stackexchange.com/questions/42669/how-to-upload-and-allow-downloads-of-mobi-and-epub-formats

Chronicle Vitae Blog

So, the [Chronicle of Higher Education][che] has this new social-job-thingamabob à la [LinkedIn][] and [Academia.edu][] called Vitae, and, it turns out, Vitae has a blog. The latest [post][] is about the work involved in putting together a tenure and promotion dossier for digital work.

[che]: http://thechronicle.com/
[LinkedIn]: http://linkedin.com/
[Academia.edu]: http://academia.edu/
[post]: https://chroniclevitae.com/news/249-digital-humanists-if-you-want-tenure-do-double-the-work

Creating a Space for the Digital Humanities

[Paige Morgan has a post][] about how to create a space, what she calls a microclimate, for the digital humanities. Morgan is a PhD candidate at the University of Washington, and she at first worked with one other graduate student, now two, to offer a series of workshops to provide an “introduction to digital humanities and multimodal scholarship, and some of the activities associated with digital humanities (DH) — professionalisation through social media, working with code, and project development.”

What I like about her approach is its realistic expectations about what time and energy their audience possessed and how best to manage it: “We avoid assigning readings, because the majority of our students are already carrying a full course load, and teaching. We can’t make this a stealth 5-credit seminar for which they don’t actually get credit. Instead, we send out email teasers, in which we often highlight one paragraph, or even one sentence, from an essay or website, and we teach using that.”

There’s more. Follow the link above.

[Paige Morgan has a post]: http://www.paigemorgan.net/rmmla-panel-on-digital-humanities-microclimates-demystifying-digital-humanities/

Speaking in Code Second Day Links

* A lot of people end up using [Oxygen][] to edit XML. (I refuse to try to replicate all the weird capitalizations at this point in the day.)
* There’s always [DH Answers][].
* Finally, I got to join the Humanist Readable Data Models group for the afternoon session, and we began with Jean Bauer’s DAVILA, which has an [overview][] and a [GitHub repo][].
* The peer review committee linked to [DH Commons][].
* The TEI group already has a [GitHub repo of examples][].

[Oxygen]: http://tei.oucs.ox.ac.uk/Talks/2009-04-galway/talk-D1L3_tools.xml
[DH Answers]: http://digitalhumanities.org/answers/
[overview]: http://projectquincy.org
[GitHub repo]: https://github.com/jabauer/projectquincy2
[DH Commons]: http://dhcommons.org/help-type/peer-review
[GitHub repo of examples]: https://github.com/TEI-examples/tei-examples

The Links Behind the Day

The links/URLs flew fast and furious yesterday, more often in conversation than in the presentations, and it felt so completely natural that I didn’t think about it until this morning when I decided to capture all the tabs I had open in my browser:

* There is, of course, the foundational site for all of this, the [Speaking in Code][] page, which, to my mind, is a model for how these things should be done: everything organized on one page, in a graphically very clear way, and with links to maps and destinations right on the page itself.
* Bill Turkel mentioned a graphical programming language, _Max_, that piqued my curiosity, and reminded me of the [Lego Mindstorms][] interface, and then I remembered that *that* was built on top of [Logo][].
* Our discussions about *tacit knowledge* (and what it implied) and about what to call *mastery*, something Hugh Cayless asked about, reminded me of the [Dreyfus model of skill acquisition][]. (To be honest, while I had heard of the Dreyfus brothers’ work over the years, I didn’t really encounter it until I read Andy Hunt’s _Pragmatic Thinking and Learning_ a few years ago.)
* Someone, I think Jean Bauer (but, boy, I could be wrong), tweeted about [Software Carpentry][]. With a slogan like “Scientific computing doesn’t have to hurt,” you know it’s going to be good.
* Bethany Nowviskie, following up on a bunch of comments about getting signal from noise, linked to a post on _Snow Theory_ entitled [“Can Digital Humanities Visualize Absence?”][]. (Signal from noise is how Micki Kaufman put it, or maybe someone else rephrased her point that way: this thing is collaborative, people, and everyone is building on top of everyone else here in a way that makes attribution really hard.)
* Finally, at some point, someone said something about the arbitrary nature of museum collections (I think it was Mia Ridge, now that I think about it), and it made me remember that weird historical moment when Jean-Paul Sartre, Claude Lévi-Strauss, and André Breton were all in New York and rummaging through junk shops together. For those who don’t know it, it produced Sartre’s terrific essay, “New York, the Colonial City,” in which he concludes about the city’s grid of streets that in it one is *jamais égaré, toujours perdu* (never led astray, but always lost). Lévi-Strauss references it in the opening pages of _The Way of the Masks_. (I have always wanted to write a bit of fiction in which Lévi-Strauss pops up from behind a mound of dust-covered antiques and knick-knacks, wearing a Kwakiutl mask and crying out, “Jean-Paul! Jean-Paul! I am Xwexwe! Throw me pennies for luck!”) I googled a historical reference and found [this][].


[Speaking in Code]: http://codespeak.scholarslab.org/
[Lego Mindstorms]: http://en.wikipedia.org/wiki/Lego_Mindstorms
[Logo]: http://en.wikipedia.org/wiki/Logo_(programming_language)
[Dreyfus model of skill acquisition]: http://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisition
[Software Carpentry]: http://software-carpentry.org
[“Can Digital Humanities Visualize Absence?”]: http://snowtheory.blogspot.com/2013/11/can-digital-humanities-visualize-absence.html
[this]: http://books.google.com/books?id=yuL98MtZ5poC&pg=PA63&lpg=PA63&dq=breton+sartre+levi-strauss&source=bl&ots=dT0IMTH9BY&sig=CYKnkWe_ZGxfvh3K0FYH5PWjzU8&hl=en&sa=X&ei=GA14UpbnDYalsATFxYCwCA&ved=0CD8Q6AEwAw#v=onepage&q&f=false

The Theory Behind It All

The first day of the [Speaking in Code][] symposium, hosted by the [Scholar’s Lab][] at the University of Virginia and sponsored by the NEH’s Office of Digital Humanities, was exhausting and exhilarating at the same time and for the same reasons: there was no letup in either the high level of exchange or its fierce integrity. By integrity I mean that no one said a single thing they didn’t believe wholeheartedly, and so people spoke their minds and their hearts at the same time. And, best of all, they spoke openly and generously: there was very little, if any, posing or preening. It was exhilarating for those reasons, too.

It was also humbling that everyone there knew more than I did about most dimensions of the day’s topics. But, as always, there was generosity in the differential, one of the things I have come to appreciate about the digital humanities groups I have encountered. Having never been to the big annual meeting, I can only daydream that the same holds there on a larger scale.

What I find most interesting is one of the subtexts emerging from the meeting: where theory is getting located in the digital humanities. Looking back, I think our organizer and leader, Bethany Nowviskie, purposefully contrasted discourse-based and practice-based domains as a way of nudging us to think about this. When Nowviskie contrasted the bookishness of the humanities scholar with the dirty hands of the humanities coder, I balked (internally) at the stereotype of the potter with so much clay under his nails that he doesn’t have time to think. I cocked a mighty folklorist’s eyebrow at her and really wanted to rebut the idea of the craftsman whose ideas are found only in his practice.

But I think Nowviskie played the metaphor exactly right, because it soon became clear that what she meant by practice, and what this group means by practice, is something far richer and more interesting than what my table mate Rohit Chopra’s sociological ancestor, Pierre Bourdieu, meant by it.

This comes up, in part, because I released my notes for the day to the group in a [gist](https://gist.github.com/johnlaudun/7309824), and the last speaker of the day, Mia Ridge, picked up on the one personal aside I included in my notes (I was largely too busy keeping up to reflect): “My point about code = theory appears not to have been interesting.” She quite nicely asked:

> @johnlaudun I think ‘code = theory’ is key, but might have missed your explanation in the rush of ideas. Can you expand in your doc?

My earlier point was, I realized later, pretty much *the point* of this symposium. (Alas, I realized it too late, because I forgot to pull the aside out of the notes. I did note how slow I am on the uptake, right?) What I said at the time was that, when I defend wanting to learn to code, and coding itself (if you can call my mashing of keys that), to my colleagues in folklore studies, I point out that much of the code base with which I sometimes work was built by others for other purposes, and so it has their assumptions about language and cognition, and their goals, built into it. Their assumptions and their goals are not mine, and so I want to intervene, to take an active role in shaping the code that I use so that it fits my goals, my assumptions, my theories.

Which is pretty much what everyone else here was saying. (It only took me one night to realize this, so I’m processing faster than normal.) My notes to prove this aren’t quite as good as I would like, but Micki Kaufman reminded me of something Jean Bauer said:

> [it’s] important to consider the data models we construct as arguments in and of themselves, ripe for interrogation

It was also what Bill Turkel assumed when he noted that:

> Code should be a way for programmers, and scholars, to talk to each other

Or when he foregrounded how learning a programming language, like learning a natural language, is simply a way to force ourselves to think in new ways. (I also liked very much his notion that the humanities need to learn that failure is an important part of any discovery process.)

And we ended the day on a similar note, with Mia Ridge challenging us to think about the “proper” fit between data and tools in a talk she entitled “Messy Understandings.”

It was, I realized later, entirely a day about challenging the distinction between tacit knowledge and explicit knowledge, and about getting us all to think more clearly about the theory behind *any* practice (if I may be allowed a momentary nod toward my folklorist ancestor, Zora Neale Hurston).

(I should note here that Stefan Sinclair’s assignment to get us thinking about the theory behind good design was useful here, and Hugh Cayless was relentless, relentless I tell you, in reminding us that TEI was built to let theory get embedded in its documents.)

Much of this returns, for me, to a series of late-night conversations I had with my mentor Henry Glassie, as we pored over a number of ethnographic documents like J. M. Synge’s _The Aran Islands_ or James Agee’s _Let Us Now Praise Famous Men_ and talked about ways to embed our theories, folklore theories, into prose that otherwise looked novelistic to readers. All the theoretical concerns would be displaced, as they are in his masterwork _Passing the Time in Ballymenone_, into notes tucked in the back of the book. I have tried to do much the same in my own book, _The Makers of Things_ (UPM, Spring 2014), and time will tell if I pulled it off. I am very much looking forward to trying the same thing with my next project, in terms of the code I write, use, and share.

And now I know there’s an entire group of people to whom I can turn for help in getting it right, because it is exactly on their minds, too.

P.S. My apologies to everyone whose comments in my notes got anonymized: the conversation moved too fast, and my typing was too slow, to catch anything more than what was said.

[Speaking in Code]: http://codespeak.scholarslab.org/
[Scholar’s Lab]: http://scholarslab.org/
[gist]: https://gist.github.com/johnlaudun/7309824

Personality Type(s)

Just to see where, or what, I am now, I took a shortened version of the Myers-Briggs Inventory, the one available over at [HumanMetrics](http://www.humanmetrics.com/cgi-win/JTypes2.asp). I think I scored pretty close to what I scored on the complete inventory when I took it back in the late 90s: [INFP](http://typelogic.com/infp.html).

* Introverted (11%)
* iNtuitive (38%)
* Feeling (12%)
* Perceiving (11%)

Maybe I was INTJ last time, but I was equally borderline. The only strong tendency here is **intuitive**, which I think was also the case a decade and a half ago.

So, apparently, sensing is out.

Find out for yourself.