Trying Out Indico’s “plotlines”

Running parallel to Jockers’ attempts to “plot” texts via sentiment analysis, Indico Data Solutions has released a Python package plotlines as well as a Jupyter notebook of documentation and sample code.

Neither indico nor plotlines turned up in a port search, so my next step was to try pip. My first attempt revealed that I was still using the Python 2.7 version of pip, so I needed both to install the version for Python 3.4 and to make sure it was the active version:

sudo port install py34-pip
sudo port select --set pip pip34

And, then, to the matter at hand:

sudo pip install -U indicoio


More Notes on Jupyter Notebook

I used MacPorts, as always, to install the new code for Jupyter:

port install py34-jupyter

Note: you may need to prepend sudo to install software on your setup.

But the new command jupyter notebook only returned -bash: jupyter: command not found for me. I tried various alternatives, but got nowhere until I returned to ipython notebook. Presto. And even better, now I have this:

Python or R?


Getting R in there is considerably simpler now. From within the R shell:

             repos = c('', 


See IRkernel for more information.

First Thoughts on Folklore’s Contribution to a Computational Narratology

A complete video of this talk, with me talking while slides go by, is available on YouTube. I am also working on a longer, revised version of this essay to address the concerns I have about the particular usage of folklore theory, Propp, while ignoring folkloric materials and making pretty large claims.

For my presentation at this year’s meeting, I proposed to our session’s organizer, Jill Rudy, that I attempt a more synthetic understanding of recent explorations by physicists and information and computer scientists who are using folklore materials as a way to test the limits of their own theories and models either of things like social networks or of dimensions of textuality that have, until lately, largely not been within the sphere of the traditional study of either literary or folkloric texts (whatever the distinction is). At the risk of having over-promised and now under-delivering, I am not prepared to do that today, for a number of reasons. First, my own attempt to survey and synthesize the strains of scientific inquiry is not yet complete, and, second, there’s no way I can do that in seven minutes. (That noted, if you are interested in such a survey, contact me. I’ll be happy to share when it’s ready.)

I have no idea what I was thinking when I proposed such a task for a diamond session, which may be in keeping with how some in the audience view any talk of computational this and digital that. As not thinking. And, perhaps, they would be right that it is not thinking, at least not in terms of the way we are used to thinking. But I see no reason to have to choose between one or the other. There’s no sense in pressing ahead with either-or when both-and is just as likely and far more useful.

With the five and a half minutes I have left in this presentation, what I would like to do is to focus on a particular moment in the spring of this year, the moment which was, in fact, on my mind when I wrote my overly ambitious abstract, in order to begin to think through the opportunities that lie before us as folklorists, if we are but willing to count.

On February second of this year, Matthew Jockers, an associate professor of English at the University of Nebraska, published “Revealing Sentiment and Plot Arcs with the Syuzhet Package” (Jockers 2015a). The post itself was a follow-up to an earlier exploration of the shapes of stories he had published the previous year (Jockers 2014) which was based in part on a re-publication of a video of a lecture given by Kurt Vonnegut at some point during the era of VHS tapes and/or DVDs.[1]

Vonnegut begins with a rather interesting assertion: “There is no reason why the simple shapes of stories can’t be fed into computers. They are beautiful shapes.” He then turns to a blackboard and draws a vertical line, calling it the “G-I axis” for good fortune and ill fortune and then a horizontal line that he describes as the B-E axis, for beginning and end. Vonnegut doesn’t raise the specter of computers again: the rest of his presentation is on mapping various shapes of stories, shapes that are based on the fortune of the protagonist: is she doing well or is she suffering at the hands of the antagonist?

Working at the chalkboard, Vonnegut offers a number of variations on the possible shapes of such narratives, which graphic designer Maya Eilam later turned into a poster-sized graphic available to readers of Open Culture, the website that had first made Vonnegut’s lecture a cause célèbre.

Jockers’ syuzhet program realizes Vonnegut’s idea by processing the prose of a novel sentence by sentence and scoring each sentence on its positive or negative emotional valence using something known as sentiment analysis. (There are a number of problems with sentiment analysis that a fuller conversation should have, but let’s give Jockers some room to work, or at least play, and see what happens.) As sentences strung together build to create the novels that are Jockers’ focus, the sentiments they contain slowly trace a trajectory up and down along the time-line of the novel’s narration. Since novels are different lengths, Jockers uses some math to normalize for length, allowing the trajectories of two novels, for example James Joyce’s A Portrait of the Artist as a Young Man and Oscar Wilde’s The Picture of Dorian Gray, to be compared.
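To make the scoring step concrete, here is a minimal sketch in Python. It is not Jockers’ code (syuzhet is an R package): the lexicon, its values, the naive sentence splitter, and the binning used to normalize for length are all assumptions made for illustration.

```python
import re

# Toy lexicon invented for this sketch; real lexicons hold thousands of words.
TOY_LEXICON = {"good": 1.0, "happy": 1.0, "love": 1.0,
               "bad": -1.0, "sad": -1.0, "hate": -1.0}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def score_sentence(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon values of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0.0) for w in words)


def sentiment_trajectory(text):
    """One score per sentence, in narrative order."""
    return [score_sentence(s) for s in split_sentences(text)]


def normalize_length(values, bins=10):
    """Average the scores into a fixed number of bins so that
    texts of different lengths can be laid over one another."""
    if not values:
        raise ValueError("need at least one score")
    n = len(values)
    out = []
    for b in range(bins):
        start = b * n // bins
        end = max((b + 1) * n // bins, start + 1)  # never an empty bin
        chunk = values[start:end]
        out.append(sum(chunk) / len(chunk))
    return out


trajectory = sentiment_trajectory("I love this. I hate that. It is good.")
```

Every text, whatever its length, comes out of `normalize_length` as the same number of points, which is what makes one trajectory comparable with another.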

Based on his computational analysis of some 41,383 novels [Some of these are recent. How did he get access?], tested against a close reading of a couple dozen of the novels, Jockers came to the conclusion that there are approximately six archetypal story shapes, at least one of which looked so much like Vonnegut’s “Man in Hole” shape that he named it such in homage.[2]

The two shapes which he has discussed most so far are the “man in hole” and one he has dubbed “man on hill” — not very elaborate terms, but I think it helps to remind everyone that the nature of this work is still very much a sketch and not yet as programmatic as some have taken it to be.

The example text that Jockers uses most often is Joyce’s Portrait, which in its initial visualization in the syuzhet package looks something like a pixelated cloud, but he eventually achieves a smooth, optimum shape through a series of mathematical transformations, all of which he is very upfront and clear about.

So, while I want to point out that this kind of work does allow you to build your theory of textuality into its very operation, I think we need to be very clear about how texts are being treated here: sentences are being weighted for their sentiment and those weights are being added up and averaged over larger and larger stretches of those texts in order to achieve a particular kind of two-dimensional shape.
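The averaging over larger and larger stretches can be sketched as a moving average over per-sentence scores. This is an illustration under my own assumptions (a centered window that shrinks at the edges), not the smoothing syuzhet itself applies.

```python
def rolling_mean(values, window):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical per-sentence scores that flip sign at every sentence.
scores = [1.0, -1.0, 1.0, -1.0, 1.0]
narrow = rolling_mean(scores, 1)  # window of one sentence: unchanged
wide = rolling_mean(scores, 3)    # wider window: the swings flatten out
```

The wider the window, the more the sentence-to-sentence swings cancel and the more the curve reflects only the broad drift of the text.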

There have been a number of responses to Jockers’ work. Some have expressed concerns about the use of sentiment analysis, which, in the end, is simply a collection of words with a value between 1 and -1 assigned to them, and the application, as at least one observer has mused, can be fairly crude.[3] Like, for example, is a positive word, since it is assumed that it is used as a verb. Such an assumption entirely misses its use as a preposition, as in “he smelled like a chicken farm in August.”

Jockers’ response to these concerns, and others, is that it all evens out over the course of a fifty or one hundred thousand, or more, word novel, so those ironic or sarcastic or non-standard uses of words do not really matter.

And, perhaps most importantly for those of us gathered in this room—okay, like I’m not—almost all forms of sentiment analysis would misunderstand, and misvalue, the use of like as a quotative: “And she was like, I told you he would say that.”
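The problem is easy to reproduce with a toy word-list scorer. The lexicon and the value it gives to like are hypothetical, and the first example sentence is mine. The point is only that a lookup by word form cannot tell the verb from the preposition from the quotative.

```python
import re

# Hypothetical one-word lexicon: "like" is treated as mildly positive.
TOY_LEXICON = {"like": 0.5}


def lexicon_score(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon values of the words, ignoring grammar entirely."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(lexicon.get(w, 0.0) for w in words)


examples = {
    "verb": "I like this story.",
    "preposition": "He smelled like a chicken farm in August.",
    "quotative": "And she was like, I told you he would say that.",
}

# All three uses receive the same positive score.
scores = {use: lexicon_score(text) for use, text in examples.items()}
```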

And that, ultimately, is the shortcoming of Jockers’ claim: he keeps talking about these shapes, drawn from novels, as the shape of stories. But folklorists would be the first to point out that novels are only ever produced by an incredibly small number of human beings, and while they are consumed by a larger number, even that number is not as large a percentage as the number of human beings who tell stories. So universalist claims about the shapes of stories based on an, albeit quite large, collection of novels are, I would suggest, fairly premature.

More importantly, where are the folklore collections with which we could begin to build comparisons with Jockers’ 41,383 novels?[4] For fun, I ran some of the legends from a corpus of Louisiana treasure legends through Jockers’ syuzhet package.

A small legend from Barry Ancelet’s Cajun and Creole Folktales, consisting of a little over two dozen sentences and 333 words, produced a graph that showed positive sentiments upfront and negatives in the latter part of the story.

A somewhat longer legend from my own fieldwork, about twice as long at 653 words and almost 50 sentences, had a bit more dynamism in terms of sentiment, but seemed to possess a similar overall trend.

When I tried to smooth the graphs to look at a larger trend using one of the options in the syuzhet package, I confronted the fact that its code base requires texts of at least 200 sentences. The longest text in my collection, coming from work done by Carl Lindahl and Maida Owens for the Swapping Stories project, weighs in at only a little over a thousand words and, or but, less than a hundred sentences.

Still, using the Fourier transforms we are able to see some interesting consistencies emerge: first, let’s take a look at the small legend again, then transform it. Now, the second legend, one about a pirate in a tree that threatens African Americans. And finally, our long legend laid over the previous two.
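For readers who want to see what a Fourier-based smoothing does to a short series of scores, here is a rough low-pass sketch in plain Python. It is not the syuzhet implementation, which differs in its details. The function names and the `keep` parameter are my own, and the transform is the slow textbook version, which is fine for texts of a few dozen sentences.

```python
import cmath


def dft(values):
    """Discrete Fourier transform (direct O(n^2) form)."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def inverse_dft(coeffs):
    """Inverse of dft()."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def low_pass(values, keep=3):
    """Keep the mean and the `keep` lowest frequencies; zero the rest."""
    n = len(values)
    coeffs = dft(values)
    # Frequencies k and n - k are mirror images, so keep both of each pair.
    filtered = [c if min(k, n - k) <= keep else 0
                for k, c in enumerate(coeffs)]
    return [x.real for x in inverse_dft(filtered)]


flat = low_pass([2.0] * 8, keep=1)        # a constant series survives
jitter = low_pass([1.0, -1.0] * 4, keep=1)  # rapid alternation is removed
```

Keeping only the lowest frequencies is what turns a jagged per-sentence plot into a single broad curve. It also means the result depends heavily on how many frequencies are kept.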

This is small stuff, but it’s another dimension to think about. If I were with you now, I would put in a pitch for folklorists gathering to discuss how to make our data more share-able. I’m working with TEI to make that happen. Tim Tangherlini is in the room, and I know he has a smile on his face and some ideas in his head.

If anyone would like a copy of this paper or of the visuals, they are available at this URL, and I’m happy to discuss accessible, share-able data with anyone interested. Talk to you soon.

PDF of slides.

[1] The Vonnegut video is clearly an excerpt from a longer lecture which had been captured at some, so far unknown, date and time. The sequence of events that led up to its most recent popularity seems to be the following: on 2010 October 30, David Comberg uploaded the 4:36 segment of video on YouTube. There is no other information available. This is the video to which all others link. On 2011 April 4, Open Culture featured the video segment in a post titled “The Shape of a Story: Writing Tips from Kurt Vonnegut.” The site featured the segment again on 2014 February 18, this time with an impressive set of visualizations by graphic designer Maya Eilam. Open Culture also quoted from Vonnegut’s autobiography, Palm Sunday, as a way to provide more context: “‘What has been my prettiest contribution to the culture?’ asked Kurt Vonnegut in his autobiography Palm Sunday. His answer? His master’s thesis in anthropology for the University of Chicago, ‘which was rejected because it was so simple and looked like too much fun.’ The elegant simplicity and playfulness of Vonnegut’s idea is exactly its enduring appeal. The idea is so simple, in fact, that Vonnegut sums the whole thing up in one elegant sentence: ‘The fundamental idea is that stories have shapes which can be drawn on graph paper, and that the shape of a given society’s stories is at least as interesting as the shape of its pots or spearheads.’” A link to the site appeared on Reddit later that day.

[2] Some commenters have compared Vonnegut’s idea to Joseph Campbell’s monomyth. Interestingly, Campbell borrowed the term from James Joyce’s Finnegans Wake, and Jockers’ first explorations are with Joyce’s Portrait of the Artist.

[3] Some collections, like that of Hu and Liu, contain as many as 6800 words.

[4] The information on the novels involved in this set is rather thin. Initial descriptions of the contents suggested that only the word frequencies for the novels were available.

The Tree of Life Is Back

For those of you who grew up with a typology for biology as neat as the periodic table was for chemistry, you know the frustration caused by the various permutations the tree of life has undergone, including breaking into various zones (and we won’t mention that the periodic table has had its own reasons to be made more flexible). Be frustrated no more! It looks like there’s an emergent synthesis that might make holding a visualization of the varieties of life in your head possible once more. (You are, however, going to need a slightly bigger head.)

Details are available at PNAS. (Link is to PDF.)

Everything Is Moving

I saw this on Reddit, and I wanted a copy of it for myself. It’s an archived answer from a user no longer on the site that another user dug up. It reminds me of something a physicist said on a recent In Our Time podcast: “Everything wants to be iron.”1 (Ah, the role of desire in our imaginations.)

The answer below came in response to the question: “We all know light travels 186,282 miles per second. But HOW does it travel. What provides its thrust to that speed? And why does it travel instead of just sitting there at its source?”

Everything, by nature of simply existing, is “moving” at the speed of light (which really has nothing to do with light: more on that later). Yes, that does include you. Our understanding of the universe is that the way that we perceive space and time as separate things is, to be frank, wrong. They aren’t separate: the universe is made of “spacetime,” all one word. A year and a lightyear describe different things in our day to day lives, but from a physicist’s point of view, they’re actually the exact same thing (depending on what kind of physics you’re doing).

In our day to day lives, we define motion as a distance traveled over some amount of time. However, if distances and intervals of time are the exact same thing, that suddenly becomes completely meaningless. “I traveled one foot for every foot that I traveled” is an absolutely absurd statement!

The way it works is that everything in the universe travels through spacetime at some speed which I’ll call “c” for the sake of brevity. Remember, motion in spacetime is meaningless, so it makes sense that nothing could be “faster” or “slower” through spacetime than anything else. Everybody and everything travels at one foot per foot, that’s just… how it works.

Obviously, though, things do seem to have different speeds. The reason that happens is that time and space are orthogonal, which is sort of a fancy term for “at right angles to each other.” North and east, for example, are orthogonal: you can travel as far as you want directly to the north, but it’s not going to affect where you are in terms of east/west at all.

Just like how you can travel north without traveling east, you can travel through time without it affecting where you are in space. Conversely, you can travel through space without it affecting where you are in time.

You’re (presumably) sitting in your chair right now, which means you’re not traveling through space at all. Since you have to travel through spacetime at c (speed of light), though, that means all of your motion is through time.

By the way, this is why time dilation happens: something that’s moving very fast relative to you is moving through space, but since they can only travel through spacetime at c, they have to be moving more slowly through time to compensate (from your point of view).
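The trade-off between motion through space and motion through time can be written out. The following is a sketch under the standard special-relativity convention, which the answer itself does not spell out: speed through time is measured as $c\,d\tau/dt$, where $\tau$ is the time ticked off by the moving object’s own clock and $t$ is the observer’s time.

```latex
\left(\frac{dx}{dt}\right)^{2} + \left(c\,\frac{d\tau}{dt}\right)^{2} = c^{2}
\quad\Longrightarrow\quad
\frac{d\tau}{dt} = \sqrt{1 - \frac{v^{2}}{c^{2}}},
\qquad v = \frac{dx}{dt}.
```

At $v = 0$ the moving clock keeps pace with the observer’s ($d\tau/dt = 1$). At $v = 0.6c$ it runs at $\sqrt{1 - 0.36} = 0.8$ of that rate. As $v$ approaches $c$, the rate approaches zero.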

Light, on the other hand, doesn’t travel through time at all. The reason it doesn’t is somewhat complicated, but it has to do with the fact that it has no mass.

Something that isn’t moving that has mass can have energy: that’s what E = mc² means. Light has no mass, but it does have energy. If we plug the mass of light into E = mc², we get 0, which makes no sense because light has energy. Hence, light can never be stationary.
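The answer does not state it, but the step it is gesturing at is usually made with the fuller energy-momentum relation, of which the famous equation is the special case for an object at rest. Here $p$ is momentum, a symbol the answer does not use.

```latex
E^{2} = (pc)^{2} + \left(mc^{2}\right)^{2}
\qquad
\begin{cases}
p = 0 \ (\text{at rest}): & E = mc^{2} \\[4pt]
m = 0 \ (\text{light}): & E = pc
\end{cases}
```

For a massless particle, all of the energy sits in the momentum term. With $m = 0$ and $p = 0$ there would be no energy left at all, which is why light cannot be at rest.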

Not only that, but light can never be stationary from anybody’s perspective. Since, like everything else, it travels at c through spacetime, that means all of its “spacetime speed” must be through space, and none of it is through time.

So, light travels at c. Not at all by coincidence, you’ll often hear c referred to as the “speed of light in a vacuum.” Really, though, it’s the speed that everything travels at, and it happens to be the speed that light travels through space at because it has no mass.

edit: By the way, this also covers the common ELI5 question of why nothing can ever travel faster than light, and why things with mass cannot travel at the speed of light. Since everything moves through spacetime at c, nothing can ever exceed it (and no, traveling backwards in time would not fix that). Also, things with mass can always be “stationary” from someone’s perspective (like their own), so they always have to move through time at least a little bit, meaning they can never travel through space as fast as light does. They’d have to travel through spacetime faster than c to do that, which, again, is not possible.

  1. I believe the episode on “The Sun” is where I heard this. It has to do with how big an atom ordinary solar fusion can build: right up to iron, but no further. All heavier elements are the products of novas and supernovas. Gold, for example, is the product of supernovas that has been splattered, quite literally, across the galaxy.