Getting NLTK Up and Running on Mac OS X

*Please note that there is a more direct version of these instructions that walks you through setting up everything around a Python 2.7 installation: read it [here](http://johnlaudun.org/20131229-the-complete-python-for-text-analysis/).*

So somehow, somewhere, you got interested in natural language processing, and the Natural Language Toolkit available for Python strikes you as one reasonable place to start. First of all, congratulations on wanting to go the building-block route as opposed to the already-assembled route. The first few steps are a bit more complex, but I think you will be gratified pretty quickly with how much you can do and how quickly you are doing it. More importantly, you are doing it for yourself.

Here are the steps that we are going to take to get NLTK up and running on Mac OS X:

1. Install Xcode (and then the Xcode development tools).
2. Install [MacPorts][].
3. Use MacPorts to install all the Python libraries you need, including NLTK.
4. There is no step four.

### First, Xcode

Before you do anything else, you will need to open the [App Store][] and download Xcode, Apple’s developer suite for creating OS X applications.

I’m not entirely clear what there is in Xcode or what Xcode installs that is needed for package managers like MacPorts, but MacPorts requires it, so go do it … by the way, I think the new way this happens is that the App Store will install an application that you will find in the Applications folder and that you have to click on to install Xcode. (Again, I don’t know why that is.)

As I note in an updated version of this [list of instructions][list], there is now supposed to be a shortcut from the command line to install what you need, but I have had mixed results, and others have reported the same. Until this gets cleared up, my best recommendation is to install Xcode the usual way and then to proceed as MacPorts directs you to.

### Install MacPorts

Download the [Mac OS X Package `pkg` Installer](http://www.macports.org/install.php) and step through the GUI install.

MacPorts should, as part of the install process, run `sudo port -v selfupdate`, but you can always run it again. You know, just to make yourself feel better.

### Use MacPorts to Install Python and the Libraries You Need for NLTK

Now you’ll need to install a version of Python. In my case, I am building a setup around Python 2.7, and so I entered `sudo port -v install python27`. The `-v` option gives you a verbose description of what’s happening. Be prepared to watch a lot of stuff scroll by. (If you’d rather not see all that and would prefer to have the machine quietly do its thing, you can leave the `-v` off. Good for you for having quiet confidence in your Mac.)

How did I know to type in `python27` and not just `python`? Good question. MacPorts gives you some nice functionality with its `search` feature, which you can use to find MacPorts portfiles. If you type in:

    port search python

A whole lot of stuff is going to fly by, but you can scroll to the middle of the list to see all the versions of Python that are available for you to install. As of this writing, you can install everything from Python 2.4 to Python 3.1.
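If you would rather not scroll, you can filter the search results down to just the Python interpreters. This is only a sketch: it assumes each port is listed on a line that begins with its name (the same layout as the NLTK search results shown further down), and `only_python_ports` is a made-up helper name, not a MacPorts command.

```shell
# Hypothetical helper: keep only lines that start with "python" followed by
# digits and a space, e.g. "python27 @... (lang)". Reads search output on stdin.
only_python_ports() {
    grep -E '^python[0-9]+ '
}

if command -v port >/dev/null 2>&1; then
    port search python | only_python_ports
else
    echo "MacPorts does not appear to be installed"
fi
```

The guard around `port` is there only so the snippet fails politely on a machine without MacPorts.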

By the way, perhaps you are savvy enough to know that your Mac already comes with Python (and Perl and Ruby and PHP and goodness knows what else) installed. Yes, it does, but the consensus seems to be that you should leave those system installations well enough alone. If you screw up Python, you want it to be an instance the system doesn’t need. Don’t worry. I’m terrible at this coding business and I’ve yet to screw up Python. Python is very good at telling you that you’ve screwed up and, thank you very much, it won’t follow your orders down the path to electronic perdition.

We are going to install the most recent version of Python 2, which is Python 2.7. (As of this writing, Python 3 is still considered a draft version of Python; it’s complicated beyond my ability to explain. It’s just as well because, as we will see in a minute, NLTK support only runs up to Python 2.7.)

    sudo port install python27

Once that’s done (and I should remind you that, depending upon your connection speed and the size of any particular portfile, this could take a while), you will want to make it so that your computer turns to your nice custom install of Python and not the one that came with the system. I usually accomplish this by editing my `.bash_profile`, but that did not work for me this time. Luckily, MacPorts has the solution:

    sudo port install python_select

Once you’ve done this, enter `sudo port select --set python python27` and you’re done with your base installation of Python. Now it’s time to process some natural languages, or process languages naturally, or … you know, whatever it is we are going to do with the NLTK.
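If you want to be sure the switch took, you can ask the shell which `python` it now finds. The sketch below assumes MacPorts is installed under its default `/opt/local` prefix; `is_macports_python` is just an illustrative helper name.

```shell
# Hypothetical helper: succeeds if the given path lives under /opt/local,
# the default MacPorts prefix (an assumption about your setup).
is_macports_python() {
    case "$1" in
        /opt/local/*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Ask the shell which python it would run; empty if none is found.
active_python="$(command -v python || true)"

if is_macports_python "$active_python"; then
    echo "Using the MacPorts Python: $active_python"
else
    echo "Not the MacPorts Python: ${active_python:-none found}"
fi
```

If it reports something other than the MacPorts Python, opening a fresh terminal window before checking again is a reasonable first thing to try.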

### Install NLTK, But First Some Dependencies

In all honesty, here is where the magic really happens. I know that sounds weird, especially when we are talking about something that happens at the command line, but, honestly, MacPorts makes the rest of this so easy that you might just wonder, I tell you, why everyone prefers clanging around with GUI installer packages that you have to go find, download, open, and click on.

Anytime you want to install anything using MacPorts, the best place to start is to see if a *portfile* is available. (Else you can’t.) Using the new-found power of the `port` command, this is quite simple:

    port search nltk

Right? We want to see if the NLTK is available as a `portfile` and we want to see what, if any, versions are available. Here is what the search turns up:

    py-nltk @2.0.1rc1 (python, textproc)
        Natural Language Toolkit

    py24-nltk @2.0.1rc1 (python, textproc)
        Natural Language Toolkit

    py25-nltk @2.0.1rc1 (python, textproc)
        Natural Language Toolkit

    py26-nltk @2.0.1rc1 (python, textproc)
        Natural Language Toolkit

    py27-nltk @2.0.1rc1 (python, textproc)
        Natural Language Toolkit

    Found 5 ports.

Great! There’s a version of the NLTK available that matches the version of Python we just installed. But before you install the NLTK, you will want to know what other Python modules it requires. (Okay, sidebar here, this is really something MacPorts does for you, but I can’t help being a bit of a control freak and wanting to take care of this myself.) Again, MacPorts has your back. Enter:

    port deps py27-nltk

And MacPorts reports back:

    Full Name: py27-nltk @2.0.1rc1_0
    Library Dependencies: py27-numpy, py27-yaml, py27-matplotlib

Now all you have to do is run `sudo port install` followed by each of those port names, and then do the same for `py27-nltk` itself. While you’re at it, you might as well add `py27-scipy` to your list. Don’t ask; just do it.
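Since `port install` accepts more than one port name at a time, the whole lot can go in a single command. Here is a rough sketch; the list is simply the dependencies reported above plus `py27-scipy` and `py27-nltk`, and the `ports` variable name is my own.

```shell
# Ports reported by `port deps py27-nltk`, plus py27-scipy and NLTK itself.
ports="py27-numpy py27-yaml py27-matplotlib py27-scipy py27-nltk"

if command -v port >/dev/null 2>&1; then
    # $ports is left unquoted on purpose so each name becomes its own argument.
    sudo port install $ports
else
    echo "MacPorts does not appear to be installed; would have run:"
    echo "sudo port install $ports"
fi
```

Installing them one at a time works just as well; this only saves some typing.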

### Conclusion

That’s it. You’re done. Fire up IDLE, `import nltk`, and you can do an amazing assortment of things.
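If you would rather check from the terminal before opening IDLE, something like the following should tell you whether the import works. It is a sketch, not a guarantee: `can_import_nltk` is a made-up helper name, and it tests whichever `python` comes first on your PATH.

```shell
# Hypothetical helper: try the import with the given interpreter
# (defaults to whatever `python` is first on your PATH).
can_import_nltk() {
    "${1:-python}" -c 'import nltk; print(nltk.__version__)' 2>/dev/null
}

if can_import_nltk; then
    echo "NLTK is importable"
else
    echo "NLTK is not importable from the python on your PATH"
fi
```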

If you like, you can show your love for the makers of the NLTK and buy their book. It’s available both at [O’Reilly][] in a variety of formats and at [Amazon][]. Personally, I find the [NLTK Cookbook][] a little less helpful, but this could be purely a matter of a cognitive style mismatch. It happens. It’s not the author’s fault. It’s me. Really.

### Updates

The first version of this post detailed how to “get up and running” using the HomeBrew package manager, but after running into a number of difficulties, I discovered that something about the HomeBrew setup just doesn’t work. I updated this post to point to the now preferred use of MacPorts, but this post continues to be the most popular. Since I believe it’s turning up in search engines because of the obviousness of the post title, and because I don’t want to send people down the path of pain, I have removed the HomeBrew directions and replaced them with a much more detailed version of the MacPorts installation path. If you would like to try HomeBrew, please drop me a note — my contact info is on the [About][] page — and I’ll be glad to send you directions.

[list]: http://johnlaudun.org/20131229-the-complete-python-for-text-analysis/
[MacPorts]: http://macports.org/
[App Store]: https://itunes.apple.com/us/app/xcode/id497799835
[O’Reilly]: http://shop.oreilly.com/product/9780596516499.do
[Amazon]: http://www.amazon.com/gp/product/0596516495/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=0596516495&linkCode=as2&tag=johnlaudun-20
[NLTK Cookbook]: http://www.amazon.com/gp/product/1849513600/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=1849513600&linkCode=as2&tag=johnlaudun-20
[About]: http://johnlaudun.org/about/

Change Order of PATH Entries in Mac OS X

Homebrew prefers that `/usr/local/bin` get seen first so that `brew`-installed programs will get used before native programs. You can change this in `.bash_profile`. The easiest way to edit `.bash_profile` is to open it in a command line editor. The dot before the file name means that it is normally hidden from view. You can “see” it if you enter `ls -a` at the command line or if you do some freaky things with the Finder. (I figure that when I’m in the Finder, I’m in one mode and shouldn’t play with things like hidden files. When I’m at the command line, then I’m in maker mode and nothing is invisible!) I use `vi` (because `emacs` scares me):

    % vi .bash_profile

Depending upon anything you’ve done before or what’s been done before you, this file may or may not exist or it may or may not have anything in it. If it doesn’t exist, congratulations, you are in the process of creating it. If it does exist, you are now editing it.

My profile looks like this:

    # Bash Profile

    # PROMPT
    PS1="[\W]% "

    # Homebrew
    export PATH=/usr/local/bin:$PATH
    export PATH="$PATH:/usr/local/Cellar"
    export PATH=/usr/local/share/python:$PATH

    # Blogpost
    export PATH="$PATH:/Users/john/local/blogpost"
    export PATH="$PATH:/Users/john/local/asciidoc"

    # Emacs
    alias emacs='/usr/local/Cellar/emacs/23.3b/bin/emacs'

    # ALIASES
    alias Learn='cd Dropbox/personal/programming/learn'

Being a humanist type, I have to name my file, and that’s what you see on the first line. That’s followed by my preference for how I want my prompt to look. Next up are my adjustments to my PATH in order to take advantage of my brewed installations. Everything else should be equally self-explanatory.

You need to log out of your terminal session and then back in in order to enjoy the fruits of your PATH labors.
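Once you are back in, one way to confirm the new order is to print your PATH one directory per line; entries nearer the top get searched first. A minimal sketch using only standard tools (the `show_path` name is mine):

```shell
# Print each entry of a colon-separated path list on its own line,
# in the order the shell searches them.
show_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

show_path "$PATH"
```

If `/usr/local/bin` shows up above `/usr/bin`, your brewed programs will be found first.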

I did see a note from someone about how changing `/etc/paths` might be better, but from some comments on StackOverflow, it seems like that might be a bad idea. A very bad idea.

Adjusting Your PATH

In a previous post on getting NLTK up and running on Mac OS X, I mentioned that once you install a separate version of Python, with which to play and to work, you need to adjust your PATH. Doing so directs your computer to use the newly installed version over the version that comes with Mac OS X. This kind of direction, of indicating to the operating system which programs to use, is known as adjusting the PATH.

When you run a command from a UNIX or UNIX-like shell, the shell looks for the executable file using the directories listed in your PATH variable as a map, checking them in order and using the first match. Your PATH settings are really just one part of your shell profile, nothing more than one part of a larger list. For the record, my `.bash_profile` file looks like this:

    % more .bash_profile
    # Bash Profile

    # PROMPT
    PS1='\W % '
    PS2='$ '

    # MacPorts
    # export PATH=/opt/local/bin:/opt/local/sbin:$PATH

    # Emacs
    alias emacs='/usr/local/Cellar/emacs/23.3b/bin/emacs'

    # ALIASES
    alias Learn='cd Dropbox/personal/programming/learn'
    alias Research='cd Dropbox/research'

    # Set architecture flags
    export ARCHFLAGS="-arch x86_64"

I’ve included the `more` command that I typed to look at my `.bash_profile`, but apart from that this is the entire file. Reading it, you’ll see:

- That I’m goofy enough to name the file. Note that this line, and everything else that isn’t something I want the shell to act upon, is “commented out” with a hash, `#`, at the beginning of the line.
- Next I have the customization for how I prefer my prompt to look: here it’s simply the current working directory.
- Next is my obsolete MacPorts variable information. (I should probably get rid of that.)
- Then there’s a list of PATH variables to enable the system to find things I have installed using Homebrew, including Python, as described in my previous post.
- Then there are a few more variables for apps I use.
- And finally the aliases I use for directories in which I often work and don’t feel like re-navigating the file system.

That’s it. That’s all there is to PATH. If you are using Mac OS X, it’s a good bet that Bash is the default shell and that the file you need to edit is `.bash_profile`. The best way to do that, in all honesty, is to use a simple CLI (command line interface) editor like `vi`; there are also `nano`, `joe`, and `emacs`, to be sure. Because of the dot at the beginning of its name, `.bash_profile` is not normally viewable in the Finder. You can search and find the terminal command that changes that, but, to be honest, there are an awful lot of little hidden files that I just don’t want to have to deal with on a daily basis. When I want to work with hidden files, I can find them through `ls -a` (list all files) and then edit them while in the terminal.
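If you would like to watch the PATH lookup described above actually happen, here is a throwaway experiment. It is only a sketch: the directory and script names (`first`, `second`, `hello`) are invented for the demonstration, and everything is created in a temporary directory that gets deleted at the end.

```shell
# Build two throwaway directories, each containing a script named "hello".
demo_dir="$(mktemp -d "${TMPDIR:-/tmp}/pathdemo.XXXXXX")"
mkdir "$demo_dir/first" "$demo_dir/second"
printf '#!/bin/sh\necho "from first"\n'  > "$demo_dir/first/hello"
printf '#!/bin/sh\necho "from second"\n' > "$demo_dir/second/hello"
chmod +x "$demo_dir/first/hello" "$demo_dir/second/hello"

# Run "hello" twice in subshells, swapping which directory comes first.
# Whichever directory is listed earlier in PATH supplies the command.
first_wins="$(PATH="$demo_dir/first:$demo_dir/second:$PATH"; hello)"
second_wins="$(PATH="$demo_dir/second:$demo_dir/first:$PATH"; hello)"

echo "$first_wins"    # from first
echo "$second_wins"   # from second

# Clean up the temporary files.
rm -rf "$demo_dir"
```

The same thing happens with Python: put the directory holding your new install ahead of the system one, and the shell finds yours first.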