Complete Python for Scientific Computing Cheatsheet

Here’s everything you need to do, in the order you need to do it, using MacPorts as your basis. Please note this assumes that everything you need to do is in the most recent version of Python, which as of this date is Python 3.4.

First, install Xcode. (Workaround in the offing.)

Second, install Xcode command-line tools. First, this:

xcode-select --install

And, then, you’ll need to do this:

sudo xcodebuild -license

Third, download and install the MacPorts base package.

Fourth, once the base package is installed, run:

sudo port selfupdate

Now, we need to install Python and the various libraries. The basic setup is:

sudo port install python34
sudo port install py34-numpy
sudo port install py34-scipy
sudo port install py34-matplotlib
sudo port install py34-pandas
sudo port select --set python python34
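
Once those ports are installed, a quick sanity check is to ask the active interpreter which of the libraries it can actually see. What follows is only a minimal sketch and not part of the setup itself: `check_modules` is a hypothetical helper name, and it relies solely on the standard library's `importlib.util.find_spec`, so it runs even when a library is missing.

```python
import importlib.util
import sys


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report which interpreter is answering, then which libraries it can find.
print("Python {}.{} at {}".format(sys.version_info[0], sys.version_info[1], sys.executable))
for name, found in check_modules(["numpy", "scipy", "matplotlib", "pandas"]).items():
    print("{:<12} {}".format(name, "ok" if found else "MISSING"))
```

If a library shows up as missing even though `port install` succeeded, the likely culprit is that a different Python is the selected one.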

If you would like the option of using *Jupyter* notebooks:

sudo port install py34-ipython
sudo port select --set ipython py34-ipython
sudo port install py34-jupyter

If you’re interested in doing text analytics, then you’ll probably find the following libraries useful. (Please note that the first line below is a workaround to keep NLTK from installing Python 2.)

sudo port install xorg-xcb-proto +python34
sudo port install py34-nltk

If you would like to add R to your arsenal of weapons and to have it work within Jupyter notebook:

sudo port install R
sudo port install py34-zmq

Then, in R, using sudo:

install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
                 repos = c('http://irkernel.github.io/',
                           getOption('repos')))

IRkernel::installspec()

MacPorts Weirdness: NLTK for Python 3 Depends on Python 2

File this under “an NLTK dependency depends on Python 2”: there is a workaround that keeps everything on Python 3.

This is probably one of those things that leads my colleague [Jonathan Goodwin][] to roll his eyes when treading in Pythonic waters: while re-installing MacPorts, after upgrading to Mac OS X El Capitan, I was going through [my Python roll call][] (`numpy`, `scipy`, `nltk`, `pandas`, etc.) when I noticed that `py34-nltk` was installing Python 2. Here’s what I saw scroll by:

--->  Installing python2_select @0.0_1
--->  Activating python2_select @0.0_1
--->  Cleaning python2_select
--->  Fetching archive for python27

That didn’t seem right, so I looked into the list of dependencies (which is a long list but I’ll repeat it here):

Dependencies to be installed: py34-matplotlib freetype libpng pkgconfig
py34-cairo cairo fontconfig glib2 libpixman xorg-libXext autoconf automake
libtool xorg-libX11 xorg-bigreqsproto xorg-inputproto xorg-kbproto xorg-libXau
xorg-xproto xorg-libXdmcp xorg-libxcb python27 db48 python2_select
xorg-libpthread-stubs xorg-xcb-proto libxml2 xorg-util-macros xorg-xcmiscproto
xorg-xextproto xorg-xf86bigfontproto xorg-xtrans xorg-xcb-util xrender
xorg-renderproto py34-cycler py34-six py34-dateutil py34-tz py34-parsing
py34-pyobjc-cocoa py34-pyobjc py34-py2app py34-macholib py34-modulegraph
py34-altgraph py34-tkinter tk Xft2 tcl xorg-libXScrnSaver xorg-scrnsaverproto
py34-tornado py34-backports_abc py34-certifi qhull cmake curl curl-ca-bundle
perl5 perl5.16 gdbm libarchive lzo2 py34-yaml libyaml

Buried in there are:

xorg-libxcb python27 db48 python2_select

I submitted this as a [bug at MacPorts][], and I got the following really interesting reply:

> py34-tkinter which depends on tk which depends on Xft2 which depends on xrender which depends on xorg-libX11 which depends on xorg-libxcb which depends on xorg-xcb-proto which depends on python27 which depends on python2_select. This is not a bug. If you want xorg-xcb-proto to use python34 instead, install it with its +python34 variant:

sudo port install xorg-xcb-proto +python34

> More generally, if you always want to use a +python34 in any port, if available, put “+python34” into your variants.conf file.
>
> Not all ports that use python offer a +python34 variant. If you find one that doesn’t, you can request one be added by filing a ticket.

Thanks, ryandesign.
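
To make that chain of dependencies easier to see, here is a small sketch that encodes only the edges named in the reply quoted above and walks them with a depth-first search. The dictionary is a hand-typed illustration, not something MacPorts produces, and `find_chain` is a hypothetical helper name.

```python
# Direct dependencies exactly as listed in the quoted reply;
# the real ports have many more dependencies than these.
DEPENDS_ON = {
    "py34-tkinter": ["tk"],
    "tk": ["Xft2"],
    "Xft2": ["xrender"],
    "xrender": ["xorg-libX11"],
    "xorg-libX11": ["xorg-libxcb"],
    "xorg-libxcb": ["xorg-xcb-proto"],
    "xorg-xcb-proto": ["python27"],
    "python27": ["python2_select"],
}


def find_chain(graph, start, target, seen=None):
    """Depth-first search: return one dependency path from start to target, or None."""
    seen = set() if seen is None else seen
    if start == target:
        return [start]
    seen.add(start)
    for dep in graph.get(start, []):
        if dep not in seen:
            rest = find_chain(graph, dep, target, seen)
            if rest is not None:
                return [start] + rest
    return None


print(" -> ".join(find_chain(DEPENDS_ON, "py34-tkinter", "python27")))
```

The point of the exercise is simply that the Python 2 requirement enters seven hops away from anything with “py34” in its name, which is why it is so easy to miss.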

[Jonathan Goodwin]: http://jgoodwin.net
[my Python roll call]: http://johnlaudun.org/20121230-macports-for-nltk/
[bug at MacPorts]: https://trac.macports.org/ticket/49970

Trying Out Indico’s “plotlines”

Running parallel to Jockers’ attempts to “plot” texts via sentiment analysis, [Indico Data Solutions][] has released a Python package `plotlines` as well as a [Jupyter notebook][] of documentation and sample code.

Neither *indico* nor *plotlines* turned up in a `port search`, so my next step was to try `pip`. My first attempt revealed that I was still using the Python 2.7 version of `pip`, so I needed both to get the version for Python 3.4 and to make sure it was the active version:

sudo port install py34-pip
sudo port select --set pip pip34

And, then, to the matter at hand:

sudo pip install -U indicoio

*Success!*
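
A mismatch like the one I hit is easier to spot if you ask Python itself which interpreter is running and then call pip through it. This is a minimal sketch under my own assumptions: `pip_command` is a hypothetical helper name, and running pip as `python -m pip` is an alternative way of tying pip to a particular interpreter, not what I actually typed above.

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip"] + list(args)


print("Interpreter:", sys.executable)
print("Version: {}.{}".format(*sys.version_info[:2]))
# Hand this list to subprocess.call() if you want to actually run it.
print("Command:", " ".join(pip_command("install", "-U", "indicoio")))
```

Because the command starts with `sys.executable`, the package lands in the site-packages of the Python you launched, whatever `pip` happens to be first on your path.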

[Indico Data Solutions]: https://indico.io
[Jupyter notebook]: https://indico.io/blog/plotlines/

Moving to Python 3

The time has come. All the libraries I use on a regular basis are available for Python 3.4. I just spent a frighteningly short amount of time with [MacPorts][] installing everything I need and then making sure that Python 3.4 and iPython 3.4 are the defaults. Here’s all that I had to type:

% sudo port selfupdate
% sudo port install python34
% sudo port install py34-numpy
% sudo port install py34-scipy
% sudo port install py34-nltk
% sudo port install py34-pandas
% sudo port install ipython34
% sudo port select --set python python3
% sudo port select --set ipython ipython3

And then, because I’m working on something for [The Programming Historian][] and it’s just easier to do everything in an iPython notebook:

% ipython notebook

Later, all I will need to do is:

% ipython nbconvert --to markdown myFile.ipynb

[MacPorts]: https://www.macports.org
[The Programming Historian]: http://programminghistorian.org

Install R with MacPorts

If you are looking to install R using MacPorts, `port search R` will return 17844 possibilities, only one of which is R itself and three of which are related to R. If you use `port help search` you will see that there is a better way to search:

port search --exact R
R @3.1.0_1 (math, science)
R is GNU S – an interpreted language for statistical computing

Then you need to do nothing more than:

sudo port install R

MacPorts will determine the dependencies:

--->  Computing dependencies for R
--->  Dependencies to be installed: gcc48 icu jpeg pango Xft2
gobject-introspection libtool harfbuzz graphite2 pkgconfig
readline tiff xorg-libXt xorg-libsm xorg-libice

After that, if you are using Matthew Jockers’ excellent _Text Analysis With R for Students of Literature_, you should install [RStudio][]. (I don’t find it as useful as iPython’s notebook, but it is a handy all-in-one GUI.)

[RStudio]: http://www.rstudio.com/products/rstudio/

The Complete Python for Text Analysis

The following set of commands assumes that you begin with a Mac OS X installation that does not have any of the necessities already installed. You can, thus, skip anything you have already done, e.g., if you have already installed Xcode, skip to Step 2.

Step 1: Install the Xcode development and command line tool environment. You’ll have to get Xcode from the Mac App Store. Supposedly, you can avoid this by simply installing the command line tools (see command below), but I have come across at least one instance where it seemed like I needed to go inside Xcode itself and download and install things from within preferences. (This was the old way of doing it.) Here’s the terminal command to install the Command Line Tools (a bit redundant, isn’t it?):

xcode-select --install

Nota bene: I continue to see warnings when installing Python and its modules when I have not installed the complete Xcode from the App Store. They look like this:

Warning: xcodebuild exists but failed to execute
Warning: Xcode does not appear to be installed; most ports will likely fail to build.

I am installing the complete setup now on another machine, and I will update this post if anything is borked.

Step 2: Install MacPorts.

If, like me, you have recently upgraded your operating system and things are borked, then you need to clean out the old installation(s). This means downloading the installer and running it like you did when you were young. It’s still fast and easy. The uninstallation is also fast and easy. Cleaning, however, takes some time. The steps below first record what you have installed and then clean everything out:

port -qv installed > myports.txt  
sudo port -f uninstall installed  
sudo port clean all  

You can use the myports document as your list. (The migration page at MacPorts does have a way to automate the re-installation process using this document. Try it, if you like.)
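
If you would rather work through the list by hand, a few lines of Python can pull the port names out of it. This is only a sketch: I am assuming each line of `myports.txt` looks roughly like `name @version+variants (active) ...`, the sample text below is made up for illustration, and `active_ports` is a hypothetical helper name. Check a few lines of your own file before trusting the pattern.

```python
import re

# Illustrative sample only; real lines in myports.txt may carry other fields.
SAMPLE = """\
  python27 @2.7.6_0 (active) platform='darwin 13' archs='x86_64'
  py27-numpy @1.8.1_0+gcc48 (active) platform='darwin 13' archs='x86_64'
  py27-numpy @1.8.0_0+gcc48 platform='darwin 13' archs='x86_64'
"""

# name, then "@version", then an optional "(active)" marker.
LINE = re.compile(r"^\s*(?P<name>\S+)\s+@(?P<version>\S+)(?P<active>\s+\(active\))?")


def active_ports(text):
    """Return the names of ports marked (active), in file order."""
    names = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group("active"):
            names.append(match.group("name"))
    return names


print(active_ports(SAMPLE))
```

To use it on the real thing, replace `SAMPLE` with the contents of the file, e.g. `open("myports.txt").read()`.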

At any rate, once you have MacPorts installed, pretty much everything else you need is going to be found and then installed via port search and then port install.

Step 3: Now you can start installing the stuff you want to install, like [Python 2.7]:

sudo port selfupdate  
sudo port install python27  
sudo port install python_select  
sudo port select --set python python27  

Step 4: Install everything needed for the NLTK: numpy, scipy, and matplotlib:

sudo port install py27-numpy  
sudo port install py27-scipy  
sudo port install py27-matplotlib  
sudo port install py27-nltk  

At this point, if you are only interested in NLP (natural language processing), you are done.

Optional: If you are going to pull anything from websites, then you can make your life easier by getting Beautiful Soup, which parses HTML for you:

sudo port install py27-beautifulsoup4

(Check for versions, as it may have incremented up.)

Step 5: If, however, you are also interested in network analysis as well as topic modeling and other forms of “big” data analysis, you can also install three Python modules built to do so — NetworkX, Gensim, and pandas:

sudo port install py27-networkx
sudo port install py27-gensim
sudo port install py27-pandas

Step 6: You have a pretty powerful analytical toolkit now at your disposal. If you would like to make the user interface a bit more “friendly,” let me suggest that you also install iPython, an interactive Python interpreter, and, the best thing since someone sliced something in order to serve it, the iPython notebook:

First, iPython:

sudo port install py27-ipython  
sudo port select --set ipython ipython27  

Then, the iPython notebook components:

sudo port install py27-jinja2  
sudo port install py27-sphinx  
sudo port install py27-zmq  
sudo port install py27-pygments  
sudo port install py27-tornado  
sudo port install py27-nose  
sudo port install py27-readline  

I can’t tell you what a joy iPython notebooks are to use: you can copy complete scripts into a code cell and get results by simply hitting SHIFT + ENTER. And everything is captured for you in a space where you can also make notes on what you are doing, or, in my case, trying to do, in markdown. Everything is saved to a modified JSON file with the extension ipynb. Even better, you can transform the file, using the nbconvert utility, into HTML or LaTeX or PDF. It is very, very, nice.
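
Because a notebook is just JSON, you can poke at one with nothing but the standard library. Here is a minimal sketch that counts the cells in a notebook by type. The `cells` / `cell_type` / `source` layout is an assumption on my part: it matches version 4 of the notebook format, and older notebook files are organized differently. The sample notebook is a tiny stand-in built in memory, and `count_cells` is a hypothetical helper name.

```python
import json
from collections import Counter

# A tiny stand-in notebook (assumed version 4 layout), serialized to a JSON string.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes\n"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('hello')\n"],
         "outputs": [], "execution_count": None},
    ],
})


def count_cells(notebook_json):
    """Count cells by type in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


print(dict(count_cells(SAMPLE_NOTEBOOK)))
```

For anything more than a quick look, nbconvert remains the right tool; this is just to show there is no magic in the file itself.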

Options: if you want that LaTeX option for nbconvert to work, you are going to need a functional TeX installation:

sudo port install texlive-latex

Nota bene: In my experience, any TeX installation is big, so if you are in a hurry, either open up another terminal window (or tab), do something in the GUI, or go fix yourself a cup of coffee. It’s going to take a while, and unless staring at the installation log as it scrolls by is your thing, and, hey, it could be, I suggest you let the code take its course and get some other things done.

And, if you need to convert scanned documents into text, the open source OCR application Tesseract is available:

sudo port install tesseract

You’ll need to install your preferred languages, in my case:

sudo port install tesseract-eng

Search the ports for tesseract to see all the languages available.

Afterword: There is also, sigh!, a machine learning module for Python called SciKit that does all kinds of things that, at this moment in time, both excite me and make my head hurt.

[Python 2.7]: http://docs.python.org/2/

Setting up iPython Notebook on Mac OS X

*Woo hoo!* I have the iPython notebook up and running and it actually runs in Safari just fine. (I had Chrome open, but Safari was where the window opened.) I don’t know if the MacPorts version of iPython simply doesn’t know all the dependencies, but here is a complete list of what you will need to `port search` for and then `sudo port install`:

* `jinja2`, needed for the notebook
* `sphinx`, needed for nbconvert
* `zmq`, needed for IPython’s parallel computing features, qt console and notebook
* `pygments`, used by nbconvert and the Qt console for syntax highlighting
* `tornado`, needed by the web-based notebook
* `nose`, used by the test suite
* `readline` (probably already installed), needed for the terminal

And then, of course, you’ll also need to install `ipython`, if you haven’t already, and do the `ipython_select` thing mentioned in a previous post. (Click the [python tag](http://johnlaudun.org/tag/python/) and you’ll see it listed.)

With all that done, all you need to do is enter:

ipython notebook --pylab=inline

And a tab should open in your open browser that will slowly begin to collect stuff from your iPython session.

Eh, what’s that MacPorts?

Note to self, run `port upgrade outdated` more often. If you run it every few months, that’s a lot of stuff that needs updating. Also, I got this note:

XeTeX is built without support for Apple Type Services for Unicode Imaging
(ATSUI) or Apple Advanced Typography (AAT). To enable it, build texlive-bin with
the +atsui variant. Note that this will force texlive and all of its
dependencies to be built 32-bit.

Reason #35 to Like MacPorts

So the essay that Jonathan Goodwin and I wrote together using LDA topic modeling to explore the intellectual history of folklore studies is about to head into the _Journal of American Folklore_’s workflow and that means it has to get converted from LaTeX to Word. The way that conversion apparently works is:

LaTeX > HTML > Word

Fortunately, [someone has written a command line tool][], `latex2html`, that does the heavy lifting. And, thank you computing gods wherever, and whoever, ye may be, the tool is already in MacPorts. MacPorts makes it easy to find this out:

% port search latex2html
latex2html @2008 (print)
Convert LaTeX into HTML.

All that means is this:

% sudo port install latex2html

A whole lot of scrolled text later ends with:

--->  Installing latex2html @2008_3
--->  Activating latex2html @2008_3
--->  Cleaning latex2html
--->  Updating database of binaries: 100.0%
--->  Scanning binaries for linking errors: 100.0%
--->  No broken files found.

How do we use this tool?

> At this point, you’re ready to convert your files from LaTeX to HTML, and then possibly to Word. To invoke latex2html, switch to the directory with your .tex file, and latex2html filename. In its default state, latex2html will produce HTML that is broken up into multiple pages, usually one per section / subsection, much like the latex2html home page is. If you want to import your document into Word, you may wish to suppress this tendency. To do so, use the following command:

latex2html -split 0 -info 0 -no_navigation filename

> `-split 0` will make the entire LaTeX file into a single HTML page, while `-info 0` will remove the information bar at the bottom of the page and `-no_navigation` will remove the navigational menus on the top on bottom. This should produce a vanilla HTML file that Microsoft Word can read fairly easily.

> One thing to beware at this point: … Word will link to image files instead of including them in the document, which will mean that things like your equations will drop out if you send someone the .doc file without sending the image files as well. To fix this … go to Edit->Links, selecting all of the links in the dialog box, and clicking “Break Link”. Once that is done, save the file and the images will now be embedded into the document itself, ready for sending off to someone else.

`UPDATE: I just ran this, and it worked fine. And it was very fast.`
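
If you end up converting more than one file, the single-page invocation quoted above is easy to wrap. A minimal sketch, with assumptions flagged: `latex2html_command` is a hypothetical helper name and `essay.tex` is a placeholder filename; the flags themselves are exactly the ones from the tutorial.

```python
import shutil


def latex2html_command(filename):
    """Build the single-page latex2html invocation from the quoted tutorial."""
    return ["latex2html", "-split", "0", "-info", "0", "-no_navigation", filename]


command = latex2html_command("essay.tex")  # "essay.tex" is a placeholder name
print(" ".join(command))
# shutil.which returns None when the tool is not on the PATH.
print("latex2html found:", shutil.which("latex2html") is not None)
```

Passing the list to `subprocess.call()` from the directory that holds the .tex file would run the conversion.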

[someone has written a command line tool]: http://mildopinions.wordpress.com/2008/09/29/latex-to-html-and-word-with-latex2html-a-mini-tutorial-for-os-x-users/

MacPorts Cheat Sheet

Note to self: MacPorts needs a cheat sheet for people like me who can’t remember options and arguments precisely because MacPorts does such a good job of staying out of the way as you work. This should be done in LaTeX and made available on GitHub.

Joy = MacPorts (Python + Numpy + Scipy + Matplotlib + NLTK)

This is the TL;DR version of my previous post.

After installing [MacPorts](http://www.macports.org/install.php) via the package installer, open a terminal session and enter the following:

% sudo port selfupdate
% sudo port install python27
% sudo port install py27-numpy
% sudo port install py27-scipy
% sudo port install py27-matplotlib
% sudo port install py27-nltk
% sudo port install python_select
% sudo port select --set python python27

By the way, once I did all this, I was able to run a Python script that relied on `matplotlib`. *Sweet.*

MacPorts: The Key to Python Happiness

For those who want the TL;DR version, which gives you all the commands you need to copy and paste into a terminal window, it’s all here.

To do some of the work I do, I needed to have a working version of Python that included the numpy, scipy, and matplotlib libraries. I could not, however, get all these pieces to come together using homebrew. After trying a number of approaches from a variety of sources, I turned to StackOverflow for help. I got a response from tiago, who noted that “Homebrew and pip are great for minimalistic, pure python packages. But they stumble spectacularly with scipy or packages that require external non-python packages.” His advice was to turn, again, to MacPorts. (My first step was to un-install homebrew. After that, it was time to crank up the MacPorts assembly.)

Installing MacPorts

First, before you do anything else, you’ll need to make sure that you have Xcode’s command line tools installed. Installation is now as easy as typing the following in a terminal window:

xcode-select --install

You’ll get a GUI dialogue box, agree to the EULA, and then installation will happen. (And I believe software update / the App store will track updates for you.)

Second, download the Mac OS X Package (pkg) Installer and step through the GUI install.

MacPorts should, as part of the install process, run sudo port selfupdate -v but you can always run it again. You know, just to make yourself feel better.

Third, you’ll need to install a version of Python. In my case, I am building a setup around Python 2.7, and so I entered sudo port -v install python27. The -v option gives you a verbose description of what’s happening. Be prepared to watch a lot of stuff scroll by. (If you’d rather not see all that and prefer to have the machine quietly do its thing, you can leave the -v off. Good for you for having quiet confidence in your Mac.)

MacPorts gives you some nice functionality with its search feature, which you can use to find MacPort portfiles. In my case, I wanted to start with numpy and so I entered port search numpy and got the following:

py-imread @0.2.5 (python, graphics)
    Reads images into numpy arrays

py-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py24-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py25-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py25-symeig @1.4 (python, science)
    Symeig - Symmetrical eigenvalue routines for NumPy.

py26-imread @0.2.5 (python, graphics)
    Reads images into numpy arrays

py26-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py26-scikits-audiolab @0.11.0 (python, science, audio)
    Audiolab is a python toolbox to read/write audio files from numpy arrays

py27-imread @0.2.5 (python, graphics)
    Reads images into numpy arrays

py27-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py31-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py32-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

py33-numpy @1.6.2 (python, math)
    The core utilities for the scientific library scipy for Python

Found 13 ports.

That py27-numpy is the one I want, and so I entered sudo port install py27-numpy. More scrolling. Done. Repeat these steps for scipy and matplotlib and nltk.

Finally, a crucial step is to make it so that your setup turns to your nice custom install of Python and not the one that came with the system. I usually accomplish this by editing my .bash_profile, but this did not work for me. Luckily, MacPorts has the solution: sudo port install python_select. Once you’ve done this, enter sudo port select --set python python27 and you’re done.
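
To confirm that the switch took, you can ask the interpreter where it lives. A minimal sketch, assuming MacPorts is installed under its default prefix of `/opt/local`; `is_macports_python` is a hypothetical helper name, and you would need to change the prefix if you installed MacPorts somewhere else.

```python
import sys

MACPORTS_PREFIX = "/opt/local"  # MacPorts' default prefix; assumed unchanged


def is_macports_python(executable, prefix=MACPORTS_PREFIX):
    """Return True if the interpreter path lives under the MacPorts prefix."""
    return executable.startswith(prefix.rstrip("/") + "/")


print(sys.executable)
print("MacPorts Python:", is_macports_python(sys.executable))
```

Run it with plain `python`: if the answer is False, the system Python is still the one being picked up.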

MacPorts requires Xcode

It’s right there in the installation instructions, but somehow I managed to miss it. And that explains why I couldn’t get `git` properly installed and set up. But I do wish that `port` would tell you that at some point. After all, I can’t be the only idiot?

*Phew* A Google search for the *Error 77* code reveals that there are other idiots out there. One of my main goals in life has thus been achieved: I have learned that I am not alone.