MacPorts Weirdness: NLTK for Python 3 Depends on Python 2

File under: an NLTK dependency depends on Python 2, but there is a workaround to keep everything Python 3.

This is probably one of those things that leads my colleague [Jonathan Goodwin][] to roll his eyes when treading in Pythonic waters: while re-installing MacPorts after upgrading to Mac OS X El Capitan, I was going through [my Python roll call][] (`numpy`, `scipy`, `nltk`, `pandas`, etc.) when I noticed that `py34-nltk` was installing Python 2. Here’s what I saw scroll by:

    --->  Installing python2_select @0.0_1
    --->  Activating python2_select @0.0_1
    --->  Cleaning python2_select
    --->  Fetching archive for python27

That didn’t seem right, so I looked into the list of dependencies (which is a long list but I’ll repeat it here):

Dependencies to be installed: py34-matplotlib freetype libpng pkgconfig
py34-cairo cairo fontconfig glib2 libpixman xorg-libXext autoconf automake
libtool xorg-libX11 xorg-bigreqsproto xorg-inputproto xorg-kbproto xorg-libXau
xorg-xproto xorg-libXdmcp xorg-libxcb python27 db48 python2_select
xorg-libpthread-stubs xorg-xcb-proto libxml2 xorg-util-macros xorg-xcmiscproto
xorg-xextproto xorg-xf86bigfontproto xorg-xtrans xorg-xcb-util xrender
xorg-renderproto py34-cycler py34-six py34-dateutil py34-tz py34-parsing
py34-pyobjc-cocoa py34-pyobjc py34-py2app py34-macholib py34-modulegraph
py34-altgraph py34-tkinter tk Xft2 tcl xorg-libXScrnSaver xorg-scrnsaverproto
py34-tornado py34-backports_abc py34-certifi qhull cmake curl curl-ca-bundle
perl5 perl5.16 gdbm libarchive lzo2 py34-yaml libyaml

Buried in there are:

xorg-libxcb python27 db48 python2_select

I submitted this as a [bug at MacPorts][], and I got the following really interesting reply:

> py34-tkinter which depends on tk which depends on Xft2 which depends on xrender which depends on xorg-libX11 which depends on xorg-libxcb which depends on xorg-xcb-proto which depends on python27 which depends on python2_select. This is not a bug. If you want xorg-xcb-proto to use python34 instead, install it with its +python34 variant:

    sudo port install xorg-xcb-proto +python34

> More generally, if you always want to use a +python34 in any port, if available, put “+python34” into your variants.conf file.
>
> Not all ports that use python offer a +python34 variant. If you find one that doesn’t, you can request one be added by filing a ticket.

Thanks, ryandesign.
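The chain in that reply can be traced mechanically. Here is a minimal sketch, assuming a hand-written dependency table that contains only the links named in the reply (real ports have many more dependencies); the function name `dependency_path` is made up for illustration:

```python
from collections import deque

# Dependency edges copied from the chain in the MacPorts reply.
# Each port maps to the ports it depends on; this table is deliberately
# incomplete and lists only the links the reply mentions.
DEPENDS_ON = {
    "py34-tkinter": ["tk"],
    "tk": ["Xft2"],
    "Xft2": ["xrender"],
    "xrender": ["xorg-libX11"],
    "xorg-libX11": ["xorg-libxcb"],
    "xorg-libxcb": ["xorg-xcb-proto"],
    "xorg-xcb-proto": ["python27"],
    "python27": ["python2_select"],
}


def dependency_path(graph, start, target):
    """Return the shortest chain of ports from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        port = path[-1]
        if port == target:
            return path
        for dep in graph.get(port, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


chain = dependency_path(DEPENDS_ON, "py34-tkinter", "python27")
print(" -> ".join(chain))
```

A breadth-first search is more than a single straight chain needs, but it would keep working if the table grew to include the other ports from the dependency list.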

[Jonathan Goodwin]: http://jgoodwin.net
[my Python roll call]: http://johnlaudun.org/20121230-macports-for-nltk/
[bug at MacPorts]: https://trac.macports.org/ticket/49970

Really, Too Easy

IPython Notebook and the Python Natural Language Toolkit are, I think, spoiling me. Not only does the IPython Notebook make it easy to write code and to make notes about writing it (which helps a noob like me document his many, many mistakes), but when I need to download something for the NLTK, up pops a GUI window to make it easy to select what to install.

NLTK and Stopwords

I spent some time this morning playing with various features of the Python NLTK, trying to think about how much, if at all, I wanted to use it with my freshmen. (More on this in a moment.) I loaded a short story that we have read and was running it through various NLTK functions when I ran into a hiccup:

[code lang=text]
>>> text.collocations()
Building collocations list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/text.py", line 341, in collocations
    ignored_words = stopwords.words('english')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/util.py", line 68, in __getattr__
    self.__load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/util.py", line 56, in __load
    except LookupError: raise e
LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found. Please use the NLTK
  Downloader to obtain the resource: >>> nltk.download().
  Searched in:
    - '/usr/share/nltk'
    - '/Users/john/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
[/code]

Now, the nice thing is that all you have to do is follow the directions: enter `nltk.download()` at the IDLE prompt, and you get:

[code lang=text]
showing info http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml
[/code]

which provides the following window:

[Screenshot: the NLTK Downloader window]

Clicking on the Corpora tab and scrolling down allows you to download the stopword list:

[Screenshot: the Corpora tab, scrolled to the stopwords entry]
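The same download can also be done without the GUI. Here is a minimal sketch, assuming NLTK is installed; the helper name `ensure_stopwords` is made up for illustration, while `nltk.data.find` and `nltk.download` are the library's own calls:

```python
def ensure_stopwords():
    """Download the NLTK stopwords corpus only if it is not already installed."""
    import nltk  # imported here so the function can be defined without NLTK present

    try:
        # nltk.data.find raises LookupError when the resource is missing,
        # which is the same error the traceback above ends with.
        nltk.data.find("corpora/stopwords")
        return True
    except LookupError:
        # Passing a package name skips the GUI and fetches just that package.
        return nltk.download("stopwords")


# Usage, from a script or the interpreter:
# ensure_stopwords()
```

This is handy in a script that other people will run, since it fetches the corpus on first use and does nothing afterwards.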

What I have not yet figured out is how to specify your own stopword list. Part of what I want to teach my students is that choosing which words are important and which are not is a matter of subject matter expertise, and thus something they should not turn over to someone else to do.
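One way to keep that choice in your own hands is to skip the built-in list and filter the words yourself. Here is a minimal sketch in plain Python, assuming a crude regular-expression tokenizer; the stopword set, the sample sentence, and the function name `content_words` are all made up for illustration:

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword list; the point is that you choose it.
MY_STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it"}


def content_words(text, stopwords):
    """Lowercase the text, split it into words, and drop the stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


sample = "The fox ran to the river, and the river was cold."
words = content_words(sample, MY_STOPWORDS)
print(Counter(words).most_common(3))
```

To start from NLTK's list instead of an empty page, the result of `stopwords.words('english')` (the call that appears in the traceback above) can be turned into a set and then trimmed or extended before it is passed in.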

Joy = MacPorts (Python + Numpy + Scipy + Matplotlib + NLTK)

This is the TL;DR version of my previous post.

After installing [MacPorts](http://www.macports.org/install.php) via the package installer, open a terminal session and enter the following:

    % sudo port selfupdate
    % sudo port install python27
    % sudo port install py27-numpy
    % sudo port install py27-scipy
    % sudo port install py27-matplotlib
    % sudo port install py27-nltk
    % sudo port install python_select
    % sudo port select --set python python27

By the way, once I did all this, I was able to run a Python script that relied on `matplotlib`. *Sweet.*
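A quick way to confirm that the selected Python can see everything just installed is to try importing each package. Here is a minimal sketch; the function name `check_modules` is made up for illustration, and any failure to import is treated as "missing":

```python
import importlib


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Not installed, or installed but broken; either way it is unusable.
            status[name] = False
    return status


report = check_modules(["numpy", "scipy", "matplotlib", "nltk"])
for name, found in sorted(report.items()):
    print("{0}: {1}".format(name, "ok" if found else "missing"))
```

If a package shows up as missing even though `port` reports it as installed, the likely culprit is that `python` still points at a different interpreter than the one selected with `port select`.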