I started [a thread on Stack Overflow] while trying to work out how to write a Python script, using the Natural Language Toolkit, that will write the concordance for a term out to a file. Here’s the script as it stands:
#! /usr/bin/env python
import sys
import nltk
# First we have to open and read the file:
thefile = open('all_no_id.txt')
raw = thefile.read()
thefile.close()
# Second we have to process it with nltk functions to do what we want
tokens = nltk.wordpunct_tokenize(raw)
text = nltk.Text(tokens)
# Now we can actually do stuff with it.
# Note: concordance() prints its results and returns None, so concord holds nothing yet.
concord = text.concordance("cultural", 75, sys.maxsize)
# Now to save this to a file (opened here, but nothing is written to it yet)
fileconcord = open('ccord-cultural.txt', 'w')
Eventually I hope to have a script that will ask me for the `source text` and the `term` to be put in context, and that will then generate a `text` file whose name includes the term.
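Here is a rough sketch of what that eventual script might look like, offered as one possible approach rather than a finished solution. It assumes Python 3 and relies on the fact that `concordance()` prints its listing to standard output, so the output is captured with `contextlib.redirect_stdout` and then written to disk. The function names (`capture_printed`, `output_name`, `write_concordance`, `main`) are hypothetical, and the `ccord-` prefix simply follows the file name used in the script above.

```python
import contextlib
import io
import re


def capture_printed(func, *args, **kwargs):
    """Run func and return whatever it printed to stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def output_name(term):
    """Build a file name such as 'ccord-cultural.txt' from the term."""
    # Replace anything that is not a word character or hyphen with "_".
    safe = re.sub(r"[^\w-]+", "_", term.strip().lower())
    return "ccord-{}.txt".format(safe)


def write_concordance(source_path, term, width=75, lines=10**6):
    """Write the concordance for term in source_path to a text file."""
    import nltk  # imported here so the helpers above work without NLTK

    with open(source_path, encoding="utf-8") as source:
        raw = source.read()
    text = nltk.Text(nltk.wordpunct_tokenize(raw))
    # concordance() prints rather than returns, so capture its output.
    listing = capture_printed(text.concordance, term, width, lines)
    out_path = output_name(term)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(listing)
    return out_path


def main():
    """Ask for the source text and the term, then write the file."""
    source_path = input("Source text: ")
    term = input("Term: ")
    print("Wrote", write_concordance(source_path, term))
```

Calling `main()` at the bottom of the file would prompt for the two values; `write_concordance('all_no_id.txt', 'cultural')` would skip the prompts and should produce `ccord-cultural.txt` in the current directory.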
I should note that one of the respondents has already pointed me to a thread on the [NLTK discussion group], which I knew existed but had somehow managed not to find.
If you’re interested in the discussion group, here’s its [home page] in the new Google Groups format. (It’s an ugly URL, to be sure.)
**Update**: [NLTK is now on GitHub]. Some of the [documentation], from what I can tell, is in TeX. The NLTK book, which I own as an O’Reilly codex and epub, is also on GitHub, as is [an NLTK repository], which appears to be empty for now.
If you’re interested in the book: [visit O’Reilly’s site][site], where you can purchase it in a variety of formats, codex or electronic. The great thing about the e-versions is that you can pick and choose from PDF, epub, or mobi, which means I can have the PDF on my iPad and the epub on my phone and the mobi on my Kindle. If you really only want to deal with Amazon, then if you follow [this link][amz], I will get a small commission.