Complete Python for Scientific Computing Cheatsheet

Here’s everything you need to do, in the order you need to do it, using MacPorts as your basis. Please note this assumes that everything you need to do is in the most recent version of Python, which as of this date is Python 3.4.

First, install Xcode. (Workaround in the offing.)

Second, install Xcode command-line tools. First, this:

xcode-select --install

And, then, you’ll need to do this:

sudo xcodebuild -license

Third, download and install the MacPorts base package.

Fourth, once the base package is installed, run:

sudo port selfupdate

Now, we need to install Python and the various libraries. The basic setup is:

sudo port install python34
sudo port install py34-numpy
sudo port install py34-scipy
sudo port install py34-matplotlib
sudo port install py34-pandas
sudo port select --set python python34
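Once those installs finish, a quick check can confirm that the interpreter now selected sees the libraries. The following is only a minimal sketch: the helper name `missing_modules` is made up for illustration, and the script simply asks the running Python whether each package can be found.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which Python is actually answering, since that is what
    # `port select` is supposed to have changed.
    print("Interpreter:", sys.executable)
    print("Version:", ".".join(str(part) for part in sys.version_info[:3]))

    wanted = ["numpy", "scipy", "matplotlib", "pandas"]
    missing = missing_modules(wanted)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, the usual suspects are a port that did not finish installing or a `python` that still points at a different interpreter.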

If you would like the option of using *Jupyter* notebooks:

sudo port install py34-ipython
sudo port select --set ipython py34-ipython
sudo port install py34-jupyter

If you’re interested in doing text analytics, then you’ll probably find the following libraries useful. (Please note that the first line below is a workaround to keep NLTK from installing Python 2.)

sudo port install xorg-xcb-proto +python34
sudo port install py34-nltk

If you would like to add R to your arsenal of weapons and to have it work within Jupyter notebook:

sudo port install R
sudo port install py34-zmq

Then, in an R session started with sudo:

install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
                 repos = c('http://irkernel.github.io/',
                           getOption('repos')))

IRkernel::installspec()

MacPorts Weirdness: NLTK for Python 3 Depends on Python 2

File under: an NLTK dependency depends on Python 2, but there is a workaround that keeps everything on Python 3.

This is probably one of those things that leads my colleague [Jonathan Goodwin][] to roll his eyes when treading in Pythonic waters: while re-installing MacPorts, after upgrading to Mac OS X El Capitan, I was going through [my Python roll call][] (`numpy`, `scipy`, `nltk`, `pandas`, etc.) when I noticed that `py34-nltk` was installing Python 2. Here’s what I saw scroll by:

---> Installing python2_select @0.0_1
---> Activating python2_select @0.0_1
---> Cleaning python2_select
---> Fetching archive for python27

That didn’t seem right, so I looked into the list of dependencies (which is a long list but I’ll repeat it here):

Dependencies to be installed: py34-matplotlib freetype libpng pkgconfig
py34-cairo cairo fontconfig glib2 libpixman xorg-libXext autoconf automake
libtool xorg-libX11 xorg-bigreqsproto xorg-inputproto xorg-kbproto xorg-libXau
xorg-xproto xorg-libXdmcp xorg-libxcb python27 db48 python2_select
xorg-libpthread-stubs xorg-xcb-proto libxml2 xorg-util-macros xorg-xcmiscproto
xorg-xextproto xorg-xf86bigfontproto xorg-xtrans xorg-xcb-util xrender
xorg-renderproto py34-cycler py34-six py34-dateutil py34-tz py34-parsing
py34-pyobjc-cocoa py34-pyobjc py34-py2app py34-macholib py34-modulegraph
py34-altgraph py34-tkinter tk Xft2 tcl xorg-libXScrnSaver xorg-scrnsaverproto
py34-tornado py34-backports_abc py34-certifi qhull cmake curl curl-ca-bundle
perl5 perl5.16 gdbm libarchive lzo2 py34-yaml libyaml

Buried in there are:

xorg-libxcb python27 db48 python2_select

I submitted this as a [bug at MacPorts][], and I got the following really interesting reply:

> py34-tkinter which depends on tk which depends on Xft2 which depends on xrender which depends on xorg-libX11 which depends on xorg-libxcb which depends on xorg-xcb-proto which depends on python27 which depends on python2_select. This is not a bug. If you want xorg-xcb-proto to use python34 instead, install it with its +python34 variant:

sudo port install xorg-xcb-proto +python34

> More generally, if you always want to use a +python34 in any port, if available, put “+python34” into your variants.conf file.
>
> Not all ports that use python offer a +python34 variant. If you find one that doesn’t, you can request one be added by filing a ticket.

Thanks, ryandesign.
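
To make the chain in that reply easier to follow, here is a small sketch that walks it programmatically. The mapping below contains only the single dependency named for each port in the reply; real ports have many more dependencies, and the function name `dependency_chain` is just an illustrative choice.

```python
# Direct dependencies exactly as listed in the MacPorts reply quoted above.
# This is a deliberately tiny model: one dependency per port.
DEPENDS_ON = {
    "py34-tkinter": "tk",
    "tk": "Xft2",
    "Xft2": "xrender",
    "xrender": "xorg-libX11",
    "xorg-libX11": "xorg-libxcb",
    "xorg-libxcb": "xorg-xcb-proto",
    "xorg-xcb-proto": "python27",
    "python27": "python2_select",
}


def dependency_chain(port, depends_on=DEPENDS_ON):
    """Follow one dependency after another until a port has none listed."""
    chain = [port]
    while chain[-1] in depends_on:
        next_port = depends_on[chain[-1]]
        if next_port in chain:  # guard against a cycle in the mapping
            break
        chain.append(next_port)
    return chain


if __name__ == "__main__":
    print(" -> ".join(dependency_chain("py34-tkinter")))
```

Read from left to right, the output shows why a Python 3 port ends up pulling in `python27`: the link that matters is `xorg-xcb-proto`, which is the port the `+python34` variant changes.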

[Jonathan Goodwin]: http://jgoodwin.net
[my Python roll call]: http://johnlaudun.org/20121230-macports-for-nltk/
[bug at MacPorts]: https://trac.macports.org/ticket/49970

Trying Out Indico’s “plotlines”

Running parallel to Jockers’ attempts to “plot” texts via sentiment analysis, [Indico Data Solutions][] has released a Python package `plotlines` as well as a [Jupyter notebook][] of documentation and sample code.

Neither *indico* nor *plotlines* turned up in a `port search`, so my next step was to try `pip`. My first attempt revealed that I was still using the Python 2.7 version of `pip`, so I needed both to get the version for Python 3.4 and to make sure it was the active version:

sudo port install py34-pip
sudo port select --set pip pip34

And, then, to the matter at hand:

sudo pip install -U indicoio
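
A different way to sidestep the "which `pip` is this?" problem, and not what I did above, is to run pip as a module of a specific interpreter with `python -m pip`. The sketch below only builds that command; the helper name `pip_command` is hypothetical, and actually running the command may still need sudo for a MacPorts-owned Python.

```python
import sys


def pip_command(package, upgrade=True, python=sys.executable):
    """Build a pip invocation tied to one specific Python interpreter."""
    command = [python, "-m", "pip", "install"]
    if upgrade:
        command.append("-U")
    command.append(package)
    return command


if __name__ == "__main__":
    # By default the command targets whichever Python runs this script.
    print(" ".join(pip_command("indicoio")))
```

Because the interpreter is named explicitly, the package lands in that interpreter's site-packages regardless of which `pip` happens to be first on the path.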

*Success!*

[Indico Data Solutions]: https://indico.io
[Jupyter notebook]: https://indico.io/blog/plotlines/

Moving to Python 3

The time has come. All the libraries I use on a regular basis are available for Python 3.4. I just spent a frighteningly short amount of time with [MacPorts][] installing everything I need and then making sure that Python 3.4 and iPython 3.4 are the defaults. Here’s all that I had to type:

% sudo port selfupdate
% sudo port install python34
% sudo port install py34-numpy
% sudo port install py34-scipy
% sudo port install py34-nltk
% sudo port install py34-pandas
% sudo port install ipython34
% sudo port select --set python python3
% sudo port select --set ipython ipython3

And then, because I’m working on something for [The Programming Historian][] and it’s just easier to do everything in an iPython notebook:

% ipython notebook

Later, all I will need to do is:

% ipython nbconvert --to markdown myFile.ipynb

[MacPorts]: https://www.macports.org
[The Programming Historian]: http://programminghistorian.org

Install R with MacPorts

If you are looking to install R using MacPorts, `port search R` will return 17844 possibilities, only one of which is R itself and three of which are related to R. If you use `port help search` you will see that there is a better way to search:

port search --exact R
R @3.1.0_1 (math, science)
R is GNU S – an interpreted language for statistical computing
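
Why does the plain search return so much? A rough model, and it is only a model, is that the default search behaves like a case-insensitive substring match, so a one-letter query hits nearly everything. The sketch below illustrates that difference using a handful of port names that appear elsewhere in this post; the function names are made up, and the real `port search` has more matching options than this.

```python
def substring_matches(query, names):
    """Case-insensitive substring match: a loose model of a default search."""
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


def exact_matches(query, names):
    """Match only names identical to the query."""
    return [name for name in names if name == query]


# A few port names mentioned in this post, standing in for the full index.
PORT_NAMES = ["R", "xrender", "xorg-renderproto", "curl", "perl5", "tk"]

if __name__ == "__main__":
    print("substring:", substring_matches("R", PORT_NAMES))
    print("exact:", exact_matches("R", PORT_NAMES))
```

Even in this tiny list, every name except `tk` contains an "r", which is the same effect that produces thousands of hits against the full ports tree.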

Then you need to do nothing more than:

sudo port install R

MacPorts will determine the dependencies:

---> Computing dependencies for R
---> Dependencies to be installed: gcc48 icu jpeg pango Xft2
gobject-introspection libtool harfbuzz graphite2 pkgconfig
readline tiff xorg-libXt xorg-libsm xorg-libice

After that, if you are using Matthew Jockers’ excellent _Text Analysis With R for Students of Literature_, you should install [RStudio][]. (I don’t find it as useful as iPython’s notebook, but it is a handy all-in-one GUI.)

[RStudio]: http://www.rstudio.com/products/rstudio/

The Complete Python for Text Analysis

The following set of commands assumes that you begin with a Mac OS X that does not have any of the necessities already installed. You can, thus, skip anything you have already done, e.g., if you have already installed Xcode, skip to Step 2.

Step 1: Install the Xcode development and command line tool environment. You’ll have to get Xcode from the Mac App Store. Supposedly, you can avoid this by simply installing the command line tools (see command below), but I have come across at least one instance where it seemed like I needed to go inside Xcode itself and download and install things from within preferences. (This was the old way of doing it.) Here’s the terminal command to install the Command Line Tools (a bit redundant, isn’t it?):

xcode-select --install

Nota bene: I continue to see warnings when installing Python and its modules when I have not installed the complete Xcode from the App Store. They look like this:

Warning: xcodebuild exists but failed to execute
Warning: Xcode does not appear to be installed; most ports will likely fail to build.

I am installing the complete setup now on another machine; I will update this post if anything is borked.

Step 2: Install MacPorts.

If, like me, you have recently upgraded your operating system and things are borked, then you need to clean out the old installation(s). This means downloading the installer and running it like you did when you were young. It’s still fast and easy. The uninstallation is also fast and easy. Cleaning, however, takes some time. The steps below first document what you have installed before you clean everything out:

port -qv installed > myports.txt  
sudo port -f uninstall installed  
sudo port clean all  

You can use the myports document as your list. (The migration page at MacPorts does have a way to automate the re-installation process using this document. Try it, if you like.)
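
If you would rather turn that document into a plain checklist yourself, something like the following sketch could help. It assumes each line of `myports.txt` looks like `name @version+variants (active) ...`, which is what I would expect from `port -qv installed`; the sample lines and version numbers below are invented for illustration, and `parse_installed` is a hypothetical helper, not part of MacPorts.

```python
def parse_installed(text):
    """Parse lines shaped like 'name @version+variants (active) ...'."""
    ports = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not shaped like a port entry.
        if len(fields) < 2 or not fields[1].startswith("@"):
            continue
        ports.append({
            "name": fields[0],
            "version": fields[1][1:],       # drop the leading "@"
            "active": "(active)" in fields,
        })
    return ports


# Invented sample in the assumed format; real output will differ in detail.
SAMPLE = """\
  python34 @3.4.3_5 (active) platform='darwin 15' archs='x86_64'
  py34-numpy @1.10.1_0+gfortran (active) platform='darwin 15' archs='x86_64'
  python27 @2.7.10_3 platform='darwin 15' archs='x86_64'
"""

if __name__ == "__main__":
    for port in parse_installed(SAMPLE):
        if port["active"]:
            print(port["name"])
```

Printing only the active ports gives a short list of names to work through with `port install`, one at a time.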

At any rate, once you have MacPorts installed, pretty much everything else you need is going to be found and then installed via port search and then port install.

Step 3: Now you can start installing the stuff you want to install, like [Python 2.7][python]:

sudo port selfupdate  
sudo port install python27  
sudo port install python_select  
sudo port select --set python python27  

Step 4: Install the NLTK and everything it needs (numpy, scipy, and matplotlib):

sudo port install py27-numpy  
sudo port install py27-scipy  
sudo port install py27-matplotlib  
sudo port install py27-nltk  

At this point, if you are only interested in NLP (natural language processing), you are done.

Optional: If you are going to pull anything from websites, then you can make your life easier by getting Beautiful Soup, which parses HTML for you:

sudo port install py27-beautifulsoup4

(Check for versions, as it may have incremented up.)

Step 5: If, however, you are also interested in network analysis as well as topic modeling and other forms of “big” data analysis, you can also install three Python modules built to do so — NetworkX, Gensim, and pandas:

sudo port install py27-networkx
sudo port install py27-gensim
sudo port install py27-pandas

Step 6: You have a pretty powerful analytical toolkit now at your disposal. If you would like to make the user interface a bit more “friendly,” let me suggest that you also install iPython, an interactive Python interpreter, and the best thing since someone sliced something in order to serve it: the iPython notebook.

First, iPython:

sudo port install py27-ipython  
sudo port select --set ipython ipython27  

Then, the iPython notebook components:

sudo port install py27-jinja2  
sudo port install py27-sphinx  
sudo port install py27-zmq  
sudo port install py27-pygments  
sudo port install py27-tornado  
sudo port install py27-nose  
sudo port install py27-readline  

I can’t tell you what a joy iPython notebooks are to use: you can copy complete scripts into a code cell and get results by simply hitting SHIFT + ENTER. And everything is captured for you in a space where you can also make notes on what you are doing, or, in my case, trying to do, in markdown. Everything is saved to a modified JSON file with the extension ipynb. Even better, you can transform the file, using the nbconvert utility, into HTML or LaTeX or PDF. It is very, very, nice.
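
Because a notebook is just JSON, you can poke at it with nothing more than the standard library. The sketch below counts cells by type; it assumes the version 4 notebook layout, with a top-level `cells` list whose entries carry a `cell_type`, and older notebook formats are laid out differently. The tiny notebook is built in memory purely for illustration.

```python
import json
from collections import Counter


def count_cells(notebook_json):
    """Count cells by type in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# A minimal stand-in for the contents of an .ipynb file (assumed v4 layout).
MINIMAL_NOTEBOOK = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["1 + 1"]},
    ],
}

if __name__ == "__main__":
    counts = count_cells(json.dumps(MINIMAL_NOTEBOOK))
    for cell_type, total in sorted(counts.items()):
        print(cell_type, total)
```

To try it on a real notebook, read the file's text with `open(...).read()` and pass that string to `count_cells`.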

Options: if you want that LaTeX option for nbconvert to work, you are going to need a functional TeX installation:

sudo port install texlive-latex

Nota bene: In my experience, any TeX installation is big, so if you are in a hurry, either open up another terminal window (or tab), do something in the GUI, or go fix yourself a cup of coffee. It’s going to take a while, and unless staring at the installation log as it scrolls by is your thing, and, hey, it could be, I suggest you let the code take its course and get some other things done.

And, if you need to convert scanned documents into text, the open source OCR application Tesseract is available:

sudo port install tesseract

You’ll need to install your preferred languages, in my case:

sudo port install tesseract-eng

Run `port search tesseract` to see all the languages available.
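
Once Tesseract and a language are installed, calling it from Python is mostly a matter of assembling the command line. The sketch below assumes the usual `tesseract IMAGE OUTPUTBASE -l LANG` form, where Tesseract writes its result to `OUTPUTBASE.txt`; the function names and the file names in the usage line are placeholders.

```python
import subprocess


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the command line: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_to_text_file(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the text file it writes."""
    subprocess.check_call(tesseract_command(image_path, output_base, lang))
    return output_base + ".txt"


if __name__ == "__main__":
    # Only print the command here; running it needs a real scanned image.
    print(" ".join(tesseract_command("page.png", "page")))
```

Keeping the command builder separate from the function that runs it makes it easy to check what will be executed before pointing it at a folder of scans.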

Afterword: There is also, sigh!, a machine learning module for Python called SciKit that does all kinds of things that, at this moment in time, both excite me and make my head hurt.

[python]: http://docs.python.org/2/

Install Xcode CLT from the Terminal

The Command Line Developer Tools package required to run MacPorts can be installed on demand using:

xcode-select --install

The installed tools will be automatically updated using Software Update. Mac OS 10.9 is required for this feature.