LinkedIn Network Visualization

[LinkedIn][] now offers a visualization of your professional network. A brief glimpse of mine reveals that the components are, themselves, heterogeneous:

My LinkedIn Network

That is, the components are fairly mixed, which, I guess, reveals that many of the people with whom I’ve connected on LinkedIn are themselves involved in a number of communities. The only clear standouts here are my colleagues in folklore studies across the nation. Perhaps the upshot is that I have too many local connections and too few national ones? Or that many of the national and international scholars simply don’t participate in LinkedIn, which I think is equally true, especially among my colleagues in network studies, quantitative analysis in the humanities, and computational folklore. Most of them are on Twitter. Interesting split.

[Daniel McLaren has a nice write-up][dm] about downloading your LinkedIn information as a JSON file and then using Protovis to improve the visualization. (His principal edit was to remove himself from the center of the graph.)
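
If you wanted to try something similar, a minimal sketch might look like the following. It assumes, purely hypothetically, that the downloaded JSON contains your own name and a list of name pairs describing who knows whom; the actual structure of LinkedIn’s export may differ, so treat the field names as placeholders.

```python
import json

# A sketch, not LinkedIn's actual export format: assume a file shaped like
# {"me": "John Laudun", "edges": [["John Laudun", "Jane Doe"], ...]}
with open("linkedin.json") as f:
    data = json.load(f)

me = data["me"]

# McLaren's principal edit: drop the ego node so the graph is not
# dominated by a single hub connected to everyone.
edges = [(a, b) for a, b in data["edges"] if me not in (a, b)]

# Write a simple edge list that most network tools can read.
with open("linkedin_edges.csv", "w") as out:
    out.write("source,target\n")
    for a, b in edges:
        out.write(f"{a},{b}\n")
```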

[LinkedIn]: http://www.linkedin.com
[dm]: http://danielmclaren.com/blog/2011/02/08/visualizing-linkedin-connections-using-protovis

Where there’s a will

The old saying goes, “where there’s a will, there’s a way.” As someone interested in making, in the full [Make Magazine][make] sense of that word, I was delighted to read about a project to “open source” a wireless network … in Afghanistan. The project is called *FabFi* and it uses common building materials and off-the-shelf electronics to transmit wireless ethernet signals across distances of up to several miles. The [project’s home page][fabfi] has an Afghanistan TLD — and how many times do you get to click on a link with an `af` in it? — and notes: “With Fabfi, communities can build their own wireless networks to gain high-speed internet connectivity—thus enabling them to access online educational, medical, and other resources.”

I’m lucky enough to live in a community where our public utility, owned and operated by the city, offers amazing fiber-to-the-home connectivity, so my desire to build anything like this is tempered. But, as they also say, you never know when you may need to know how to build your own network infrastructure…

[make]: http://makemag.com/
[fabfi]: http://fabfi.fablab.af/

Sci2 0.5.1 Now Available

My continuing thanks to everyone at Indiana University who works on this project and makes it possible. It keeps getting better, even if my use of it doesn’t. Here’s the link.

Bateson on structures

During transformative moments in my thinking, I find that I turn to the writers and thinkers who first inspired me to examine the human condition more closely. In my case, the usual suspects are Heidegger, Bateson, Bakhtin, and Lévi-Strauss. (And that’s something of the order in which I encountered them.) A recent survey of cyborgs and cybernetics on the web, [50 POSTS ABOUT CYBORGS][50], reminded me of one of my favorite essays by Gregory Bateson, “Style, Grace, and Information in Primitive Art,” which has, to my dismay, remained under-appreciated, or at least under-read. In particular, they pulled a great quote from the essay:

> No organism can afford to be conscious of matters with which it could deal at unconscious levels.

If you want to read the essay for yourself, it’s available via [Google Books][gb] — the link is to a search for the quote which takes you to the essay as it appears in an anthology on the anthropology of art.

[50]: http://50cyborgs.tumblr.com/
[gb]: http://books.google.com/books?id=1ohH1JPQwEMC&pg=PA85&lpg=PA85&dq=No+organism+can+afford+to+be+conscious+of+matters+with+which+it+could+deal+at+unconscious+levels.&source=bl&ots=py_eA8jHkT&sig=RyyGMp3k-PMJBTz7YjX1GGNTpk8&hl=en&ei=ZmuTTLjUCMX7lweD-bWoCg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBIQ6AEwAA#v=onepage&q=No%20organism%20can%20afford%20to%20be%20conscious%20of%20matters%20with%20which%20it%20could%20deal%20at%20unconscious%20levels.&f=false

Federated Is the Future for Open Source

In his remarks to this year’s OSCON, Tim O’Reilly makes the interesting assertion that “federated is the future for open source”. His assertion comes out of his interest in the internet as the next operating system. His example makes the point very clearly (paraphrased):

Imagine yourself out with friends and you decide to get a pizza. What do you do? If you have one of the new smart phones [by which he means iPhone or Android], you can quite literally put the thing to your mouth and speak the word “pizza” into an app, and it will search for places to eat pizza that also happen to be nearby.

The technologies involved are quite astonishing: touch sensors (to activate the app), motion sensors (the device has to know you are moving it up to your head to know when to turn on the microphone), a GPS radio (to know where you are), and a microwave radio (to transmit your request).

But the technology doesn’t end there: in many instances the speech recognition is not being done on your phone but “in the cloud,” as is the cross-indexing of eateries and your location. All of this is assembled into some form of text — HTML or otherwise — and then sent back to your handset, which now offers you a range of options.
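
Purely as an illustration of that division of labor, here is a hedged sketch of what the round trip might look like in code; the endpoint URL, payload fields, and response shape are hypothetical stand-ins, not any vendor’s actual API.

```python
import json
from urllib import request

def find_pizza(audio_bytes: bytes, lat: float, lon: float) -> list[str]:
    """Send recorded audio plus location to a (hypothetical) cloud service
    that does the speech recognition and the cross-indexing of eateries,
    then return the names it sends back."""
    payload = {
        "audio": audio_bytes.hex(),              # recognition happens server-side
        "location": {"lat": lat, "lon": lon},    # from the GPS radio
    }
    req = request.Request(
        "https://example.invalid/v1/voice-search",  # placeholder URL
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        results = json.load(resp)
    # The handset only renders the options; the heavy lifting (speech
    # recognition, ranking, local search) happened in the cloud.
    return [place["name"] for place in results["places"]]
```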

Amazing stuff. But even more amazing is that the reason Google, for example, knows how to understand your spoken request is that they have a pretty good sense of what goes with what. They are, after all, in the search business as well. It’s all this data that makes it possible to give you not just an answer but a semantically rich and appropriate one.

Obviously, the more you can cross-pollinate these various data sets, the more interesting your results will be and the more kinds of innovation become possible. But Google owns its (your) searches and Facebook owns its (your) social graphs. Given that the current trend is in this direction, O’Reilly asks the pressing question: where does the open source community go when a lot of these companies are built on open source — Google runs on Linux, after all, and gives away a lot of the software it develops — but the data itself remains beyond our reach?

DH/Networking Explorations 1

## Following My Own Advice ##

For years now I have been encouraging students, both beginning and advanced, to keep a journal of their activities as one way of breaking down the barrier to getting writing done. I have especially encouraged graduate students working on their dissertations to try it. And I have done this while being only an intermittent practitioner myself. (I confess that this is in part one of the great advantages of having a spouse who practices the same profession: one is free to do much of the daily review over the dinner table. The prêt-à-écouter audience is great, but it disengages one important dimension of the process: writing.)

And so, John Anderson, if you are reading this post, here is me doing what I said, an account of trying my hand at textual analysis.

## The Onus ##

At the end of last year I was invited to participate in an NEH seminar on “Networks and Networking in the Humanities” which will be hosted by UCLA’s Institute for Pure and Applied Mathematics later this summer. Earlier this year the participants received a list of homework assignments: two books to read, a technical paper or two, and the production of an edge list.

The books have been interesting. (More on each one in separate posts.) The technical paper was at the border of my ken, but I followed chunks of it. The production of the edge list, a list of links in a network, has been the hardest task. Of course, part of it was nomenclature. “Edge list” threw me for a loop, new as I am to networkese, but I grokked it with the help of the assigned reading — and a variety of web reading. (Thank you, intarwebs.)
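
For anyone else new to networkese, an edge list is just a table of pairs, one row per link. A minimal sketch, using invented names and the third-party `networkx` library, might look like this:

```python
import networkx as nx

# An edge list is simply pairs of nodes; here, which (invented) teller
# is linked to which (invented) tale.
edge_list = [
    ("teller_a", "tale_1"),
    ("teller_b", "tale_1"),
    ("teller_b", "tale_2"),
]

G = nx.Graph()
G.add_edges_from(edge_list)

print(G.number_of_nodes(), G.number_of_edges())  # 4 3

# The same information can be saved as a simple two-column text file
# for import into other network tools.
nx.write_edgelist(G, "edges.txt", data=False)
```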

But there was another dimension to the edge list assignment that was stymying me: the data. Yes, I have the emergent data from the boat book, but I don’t feel entirely comfortable rushing to produce more data for the sake of the seminar if it means rushing certain dimensions of the research, and I don’t yet have a grip on all the data I already have, at least not in a way that makes me comfortable pouring it into a new paradigm of analysis and modeling. (Like some mental version of Twister.)

And so I needed a data set to work with that would allow me to do the kind of analysis that I hoped network theories and models would make possible. In particular, I am interested in applying these paradigms to ethnographic contexts where we need to understand how individuals make their way through the world using the ready-made mentifacts that we sometimes call folklore as “equipment for living.”

What I think that means is that I want to understand how individuals within a given group (a social graph, if you will) draw from a repertoire (network) of forms (stories, legends, anecdotes, jokes, etc.) which themselves variously reflect and refract a network of ideas (ideology) dispersed (variably) throughout the group.
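
Thought of as data, that is three linked layers. Here is a hedged sketch, again with invented names and `networkx`, of how a person-to-form-to-idea structure might be encoded so the layers can be queried later; it is a toy model of the approach, not a claim about how the seminar expects it done.

```python
import networkx as nx

# Three layers: people (the social graph), forms (stories, jokes, etc.),
# and ideas (the values the forms carry). All names are invented.
G = nx.Graph()

# People who tell or hear a form.
G.add_edge("person:Ella", "form:mule_tale", relation="tells")
G.add_edge("person:Joe", "form:mule_tale", relation="hears")

# Forms that encapsulate ideas.
G.add_edge("form:mule_tale", "idea:wit_beats_strength", relation="expresses")

# A simple query: which ideas is a given person connected to through
# the forms in their repertoire?
person = "person:Ella"
ideas = {
    idea
    for form in G.neighbors(person) if form.startswith("form:")
    for idea in G.neighbors(form) if idea.startswith("idea:")
}
print(ideas)  # {'idea:wit_beats_strength'}
```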

Networks of People, Stories, and Ideas

Or, as folklorist Henry Glassie once put it: “Culture is made up of ideas, society of people.” But ideas don’t just bounce around in people’s heads, and they don’t exist out in the world, at least very often, unencapsulated. Ideas and values are usually embedded in the things we say and do.[^1] We keep these things around, these stories and explanations, because they resonate with our values and beliefs. At the same time, the forms not only give expression to the ideas but also shape them.

This dynamic interaction has been the focus of folklore studies for the past century. For the last forty years, studies of culture and language have taken an ethnographic turn, sometimes called “performance” and sometimes called “ethnomethodology,” which has focused on the important role that individuals play in the intertextual network of forms (and thus the ideological network embedded within them).

I am one of those performance-oriented scholars. Performance studies has produced a wide range of profound micro-level studies of folklore in action. In the last decade or so, an effort has begun to build back toward the philological framework from which the performance orientation sprang and against which it initially pushed back. It’s time to fold these things together, and I think network theories offer one possibility for doing so.

## The Data ##

If not my own data, then what other corpus? I wanted to work with materials that I knew fairly well. I began to build a database of Louisiana folklore in print, focusing especially on tales and legends, but the amount of time to get a large enough corpus digitized and into the database, even using OCR software, quickly loomed too large. A great project, but one that could easily take up an entire summer, not the limited time I had to get something up and usable in order to begin to complete the seminar assignment — which I was late fulfilling anyway.

I did, however, initiate some conversations that may yet produce a foundation for such a database, contacting authors of several texts for electronic copies of their manuscripts to facilitate data entry. (The metadata is entirely a separate matter for now.)

The answer to my question didn’t come to me until I was in Providence, Rhode Island, for the sixth, and final, Project Bamboo planning workshop. I don’t know if somebody said something or suggested something, but I struck upon the idea of using Zora Neale Hurston’s _Mules and Men_ as the basis for the seminar assignment and for my own initial explorations into the various software tools that are available. I was reasonably hopeful that somewhere, someone would have digitized the text, and I was right: the text is not in Project Gutenberg, nor in the Oxford Text Archive, but it is in the University of Virginia’s American Studies [hypertext collection][xroads]. There I found a [hypertext version of _Mules and Men_ put together by Laura Grand-Jean in 2001][lgj].

I am not yet at a point where I could deploy a `bash` script to `wget` or `curl` (or something else) the pages I needed, but since I decided to focus on only the folktales section of the book, the book’s first half, it wasn’t too much of a task to click on each page, copy the text, and paste it into a plain text document in my text editor, TextMate. For reference, I also copied and pasted the HTML in hopes that it might prove useful for getting certain kinds of texts out. That is, I had hopes of figuring out how to tell a piece of software to pull everything out between `<blockquote>` tags. Unfortunately, Grand-Jean had used some non-standard markup to handle the long blockquotes. I thought about doing some fancy find-and-replace work with regular expressions, but in the end I decided I would rather work with the plain text, which would also encourage (force) me to re-read the text. The latter proved useful as I came across some long texts embedded in dialogue that were worth including in the extracted corpus.
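
Had the markup been standard, the extraction would have been a few lines of scripting. A hedged sketch, assuming ordinary `<blockquote>` tags, a locally saved page named `chapter01.html`, and the third-party BeautifulSoup library, might look like this; it would not have worked on Grand-Jean’s actual pages for the reason just given.

```python
from bs4 import BeautifulSoup

# Read the saved HTML for one page of the hypertext edition.
with open("chapter01.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# Pull the text of every blockquote, i.e., the embedded tale texts,
# assuming (contrary to fact here) that standard tags were used.
tales = [bq.get_text(" ", strip=True) for bq in soup.find_all("blockquote")]

with open("tales.txt", "w", encoding="utf-8") as out:
    out.write("\n\n".join(tales))
```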

(The plain text version of Part One of _Mules and Men_ can be found both on [Scribd][] as well as on [GitHub][] — forked critical editions of texts are an interesting idea, no? It weighs in at 55,798 words in 2,127 lines — somewhere along the way I’ll put up some stats on word counts for block-quoted text, quoted text, narrative text, etc.)
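
The overall counts are easy enough to reproduce; a minimal sketch, assuming the plain-text file from the GitHub repository is saved locally under the placeholder name `mules-and-men-part-one.txt`, is:

```python
# Count lines and words in the plain-text edition.
with open("mules-and-men-part-one.txt", encoding="utf-8") as f:
    text = f.read()

lines = text.splitlines()
words = text.split()
print(f"{len(words)} words in {len(lines)} lines")
```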

## And Now for Some Software ##

So I’ve got a digitized text. An ethnographic text.[^2] That will give me people and forms, and I’m reasonably familiar with the kinds of speech communities involved, so I can take a crack at ideas. Now I hope to use software to begin to discern those patterns more clearly. (And to produce that edge list.)

The first thing I try is SEASR’s [Meandre][]. Meandre is really something like a software suite, consisting of server and client software, both of which you install and run locally. The server software syncs with the component and workflow repositories at SEASR HQ, which are then made available to you through the workbench.

Meandre Workbench

As a quick glance at the UI reveals, it’s not exactly user friendly. Then again, none of this software really is. The good folks running the seminar have provided us with links to useful software: Network Workbench, Wordij, and Pajek (which is, sigh, Windows-only). I am still working my way through these various packages, but I have to say that so far my best results have been using [IBM’s Many Eyes][ibm].

[xroads]: http://xroads.virginia.edu/~HYPER/hypertex.html
[lgj]: http://xroads.virginia.edu/~MA01/Grand-Jean/Hurston/Chapters/siteintroduction.html
[Scribd]: http://www.scribd.com/doc/33800238/Zora-Neale-Hurston-s-Mules-and-Men-in-plain-text
[GitHub]: http://github.com/johnlaudun/Mules-and-Men
[Meandre]: http://seasr.org/meandre/download/
[ibm]: http://manyeyes.alphaworks.ibm.com/manyeyes/users/johnlaudun
[^1]: The poet William Carlos Williams once advised in “A Sort of Song” to: “Let the snake wait under / his weed / and the writing / be of words, slow and quick, sharp / to strike, quiet to wait, / sleepless. / — through metaphor to reconcile / the people and the stones. / Compose. (No ideas / but in things) / Invent! / Saxifrage is my flower that splits / the rocks.” His famous urging to himself and other poets to find the ideas that already surrounded them in the world echoes the anthropological project of the twentieth century: to find the intelligence and beauty in the always already peopled world of the everyday. (My apologies to Williams for eliminating his line breaks but my software, `PHP Markdown Extra`, wasn’t handling a poem within a footnote at all well.)
[^2]: To be sure, I’m fully aware of the potential problems of Hurston’s text. For a fuller discussion, see my essay in _African American Review_ ([JSTOR](http://www.jstor.org/stable/1512231)).