
Tag: corpus

Assessing the Corpora Available to Me

When I go to the Culture Analytics program at UCLA’s IPAM in a little over a month, I want to arrive with at least one interesting corpus to work with. I have the following options:

  • Louisiana treasure legends:
  • Hook legend:
  • Oil industry interviews: 480 texts

The oil industry interviews come as a collection of mostly Word documents with an RTF file or two mixed in. They are a mixed bag in terms of content, but some distant reading might turn up something interesting. To do that, I need to get them into a form I can work with:

textutil -convert txt ~/Desktop/transcripts/*.docx

Then I ran the same command again with *.rtf at the end. Now I’ve got 480 plain text files. It would be nice, for the sake of using the file names later, to get rid of some parts of them:

Lastname, Firstname 08-09-2006 final.txt
...
Lastname, Firstname and Firstname 01-23-02 final.txt

I created two Automator workflows. The first makes all the letters in the file names lowercase (a personal preference) and replaces spaces with underscores. The second trims all occurrences of final or transcript from the end of the file names. (This could just as easily have been one workflow, but I kept them separate because I expect to reuse them.) Now the file names look like this:

lastname_firstname_06-01-2006.txt

Still somewhat ungainly, but it will do for now.
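
For anyone who would rather script the renaming than build Automator workflows, here is a minimal shell sketch of the same two steps. It is not the workflows described above, only an approximation of them. The function names normalize_name and plan_renames are made up for illustration, and dropping the comma is an assumption read off the example file names rather than something the workflows are stated to do. The second function only prints the planned renames, so nothing on disk changes.

```shell
# Hypothetical helper: turn one transcript file name into the tidier form.
# Assumes names end in ".txt", optionally preceded by "final" or "transcript".
normalize_name() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/([ _]+(final|transcript))+\.txt$/.txt/' \
    | tr -d ',' \
    | tr ' ' '_'
  # Steps, in order: lowercase everything, trim a trailing "final" or
  # "transcript", drop commas (an assumption), turn spaces into underscores.
}

# Hypothetical helper: list "old -> new" for every .txt file in a folder.
# This is a dry run; it prints the plan and renames nothing.
plan_renames() {
  for path in "$1"/*.txt; do
    [ -e "$path" ] || continue   # skip the literal pattern if no file matches
    old=$(basename "$path")
    new=$(normalize_name "$old")
    [ "$old" = "$new" ] || printf '%s -> %s\n' "$old" "$new"
  done
}
```

Calling normalize_name 'Lastname, Firstname 08-09-2006 final.txt' prints lastname_firstname_08-09-2006.txt, and plan_renames ~/Desktop/transcripts would show the full list of changes before committing to any of them.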


Bateson on structures

During transformative moments in my thinking, I find that I turn to the writers and thinkers who first inspired me to examine the human condition more closely. In my case, the usual suspects are Heidegger, Bateson, Bakhtin, and Lévi-Strauss. (That is roughly the order in which I encountered them.) A recent survey of cyborgs and cybernetics on the web, [50 POSTS ABOUT CYBORGS][50], reminded me of one of my favorite essays by Gregory Bateson, “Style, Grace, and Information in Primitive Art,” which, to my dismay, remains critically under-appreciated and under-read. In particular, the survey pulls a great quote from the essay:

> No organism can afford to be conscious of matters with which it could deal at unconscious levels.

If you want to read the essay for yourself, it’s available via [Google Books][gb]. The link runs a search for the quote, which takes you to the essay as it appears in an anthology on the anthropology of art.

[50]: http://50cyborgs.tumblr.com/
[gb]: http://books.google.com/books?id=1ohH1JPQwEMC&pg=PA85&lpg=PA85&dq=No+organism+can+afford+to+be+conscious+of+matters+with+which+it+could+deal+at+unconscious+levels.&source=bl&ots=py_eA8jHkT&sig=RyyGMp3k-PMJBTz7YjX1GGNTpk8&hl=en&ei=ZmuTTLjUCMX7lweD-bWoCg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBIQ6AEwAA#v=onepage&q=No%20organism%20can%20afford%20to%20be%20conscious%20of%20matters%20with%20which%20it%20could%20deal%20at%20unconscious%20levels.&f=false
