The following set of commands assumes that you begin with a Mac OS X installation that does not have any of the necessities already installed. You can, thus, skip anything you have already done: if you have already installed Xcode, for example, skip to Step 2.
Step 1: Install the Xcode development and command line tool environment. You’ll have to get Xcode from the Mac App Store. Supposedly, you can avoid this by simply installing the command line tools (see command below), but I have come across at least one instance where it seemed like I needed to go inside Xcode itself and download and install things from within preferences. (This was the old way of doing it.) Here’s the terminal command to install the Command Line Tools (a bit redundant, isn’t it?):
xcode-select --install
Nota bene: I continue to see warnings when installing Python and its modules when I have not installed the complete Xcode from the App Store. They look like this:
Warning: xcodebuild exists but failed to execute
Warning: Xcode does not appear to be installed; most ports will likely fail to build.
I am installing the complete setup now on another machine, and I will update this post if anything is borked.
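The warnings above depend on whether the full Xcode or only the Command Line Tools are present. Here is a minimal sketch for checking that from Python. It assumes that `xcode-select --print-path` reports a path ending in `CommandLineTools` for a tools-only setup, which is the usual layout but is not guaranteed on every OS X release. The function name `developer_dir` is made up for this illustration.

```python
from __future__ import print_function
import subprocess


def developer_dir():
    """Return the developer directory reported by xcode-select, or None."""
    try:
        out = subprocess.check_output(["xcode-select", "--print-path"])
    except (OSError, subprocess.CalledProcessError):
        # OSError: xcode-select is not on this machine at all.
        # CalledProcessError: it ran but exited with an error status.
        return None
    return out.decode("utf-8").strip()


path = developer_dir()
if path is None:
    print("No developer tools found")
elif path.endswith("CommandLineTools"):
    # Assumption: this suffix marks the standalone Command Line Tools.
    print("Command Line Tools only:", path)
else:
    print("Developer directory:", path)
```

If this reports only the Command Line Tools and the warnings keep appearing, installing the complete Xcode from the App Store is the next thing to try.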
Step 2: Install MacPorts.
If, like me, you have recently upgraded your operating system and things are borked, then you need to clean out the old installation(s). This means downloading the installer and running it like you did when you were young. It’s still fast and easy. The uninstallation is also fast and easy. Cleaning, however, takes some time. The steps below first document what you have installed before you clean everything out:
port -qv installed > myports.txt
sudo port -f uninstall installed
sudo port clean all
You can use the myports document as your list. (The migration page at MacPorts does have a way to automate the re-installation process using this document. Try it, if you like.)
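If you would rather read the list yourself than automate the re-installation, a few lines of Python can pull the port names back out of myports.txt. This is only a sketch. It assumes each line looks like `  name @version (active)`, and the sample text below is invented for illustration rather than copied from a real machine. The function names are hypothetical as well.

```python
from __future__ import print_function
import re

# Matches lines such as "  python27 @2.7.3_1+universal (active)".
PORT_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+@(?P<version>\S+)(?P<active>\s+\(active\))?"
)


def parse_installed(text):
    """Return a list of (name, version, is_active) tuples."""
    ports = []
    for line in text.splitlines():
        match = PORT_LINE.match(line)
        if match:
            ports.append((match.group("name"),
                          match.group("version"),
                          match.group("active") is not None))
    return ports


def active_port_names(text):
    """Return the names of active ports, in file order, without duplicates."""
    names = []
    for name, _version, active in parse_installed(text):
        if active and name not in names:
            names.append(name)
    return names


# Invented sample standing in for the contents of myports.txt.
sample = """\
  python27 @2.7.3_1 (active)
  py27-numpy @1.6.2_0 (active)
  py27-numpy @1.6.1_0
"""
print(active_port_names(sample))  # prints ['python27', 'py27-numpy']
```

To use it on the real file, read myports.txt with `open("myports.txt").read()` and pass the result to `active_port_names`.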
At any rate, once you have MacPorts installed, pretty much everything else you need is going to be found and then installed via port search and then port install.
Step 3: Now you can start installing the stuff you want to install, like [Python 2.7][python]:
sudo port selfupdate
sudo port install python27
sudo port install python_select
sudo port select --set python python27
Step 4: Install everything needed for the NLTK — numpy, scipy, and matplotlib:
sudo port install py27-numpy
sudo port install py27-scipy
sudo port install py27-matplotlib
sudo port install py27-nltk
At this point, if you are only interested in NLP (natural language processing), you are done.
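A quick way to confirm that the installs took is to ask the selected Python to import each module. The sketch below is one way to do that. The helper name `check_modules` is made up here, and reading `__version__` is a common convention rather than something every module is guaranteed to provide.

```python
from __future__ import print_function
import importlib
import sys


def check_modules(names):
    """Map each module name to a version string, or None if it will not import."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # The first line shows which interpreter "port select" actually gave you.
    print("Python", sys.version.split()[0])
    wanted = ["numpy", "scipy", "matplotlib", "nltk"]
    for name, version in sorted(check_modules(wanted).items()):
        print(name, version or "NOT INSTALLED")
```

If the first line does not say 2.7, the `port select` step above is the place to look.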
Optional: If you are going to pull anything from websites, then you can make your life easier by getting Beautiful Soup, which parses HTML for you:
sudo port install py27-beautifulsoup4
(Check the version number, as it may have been incremented.)
Step 5: If, however, you are also interested in network analysis, topic modeling, and other forms of “big” data analysis, you can install three Python modules built for that work (NetworkX, Gensim, and pandas):
sudo port install py27-networkx
sudo port install py27-gensim
sudo port install py27-pandas
Step 6: You have a pretty powerful analytical toolkit now at your disposal. If you would like to make the user interface a bit more “friendly,” let me suggest that you also install iPython, an interactive Python interpreter, and, the best thing since someone sliced something in order to serve it, the iPython notebook:
sudo port install py27-ipython
sudo port select --set ipython ipython27
Then, the iPython notebook components:
sudo port install py27-jinja2
sudo port install py27-sphinx
sudo port install py27-zmq
sudo port install py27-pygments
sudo port install py27-tornado
sudo port install py27-nose
sudo port install py27-readline
I can’t tell you what a joy iPython notebooks are to use: you can copy complete scripts into a code cell and get results by simply hitting SHIFT + ENTER. And everything is captured for you in a space where you can also make notes on what you are doing, or, in my case, trying to do, in markdown. Everything is saved to a modified JSON file with the extension ipynb. Even better, you can transform the file, using the nbconvert utility, into HTML or LaTeX or PDF. It is very, very nice.
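Because a notebook is JSON underneath, you can inspect one with nothing but the standard library. The sketch below counts cells by type. It assumes the two layouts I am aware of: older notebooks nest cells under a "worksheets" list, while newer ones keep a top-level "cells" list. The sample is hand-written for illustration and is not a real notebook file.

```python
from __future__ import print_function
import json


def count_cells(notebook):
    """Count cells by type in a parsed .ipynb document (a dict)."""
    if "worksheets" in notebook:
        # Assumed older layout: cells live inside each worksheet.
        cells = [cell
                 for sheet in notebook["worksheets"]
                 for cell in sheet.get("cells", [])]
    else:
        # Assumed newer layout: a single top-level list of cells.
        cells = notebook.get("cells", [])
    counts = {}
    for cell in cells:
        kind = cell.get("cell_type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hand-written stand-in for the contents of an .ipynb file.
sample = json.loads("""
{"nbformat": 3, "worksheets": [{"cells": [
  {"cell_type": "markdown", "source": ["Notes on what I am trying to do"]},
  {"cell_type": "code", "input": ["print 'hello'"]},
  {"cell_type": "code", "input": ["1 + 1"]}
]}]}
""")
print(count_cells(sample))
```

For a real notebook, replace the sample with `json.load(open("yourfile.ipynb"))`, where the file name is whatever you saved.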
Options: if you want that LaTeX option for nbconvert to work, you are going to need a functional TeX installation:
sudo port install texlive-latex
Nota bene: In my experience, any TeX installation is big, so if you are in a hurry, either open up another terminal window (or tab), do something in the GUI, or go fix yourself a cup of coffee. It’s going to take a while, and unless staring at the installation log as it scrolls by is your thing, and, hey, it could be, I suggest you let the code take its course and get some other things done.
And, if you need to convert scanned documents into text, the open source OCR application Tesseract is available:
sudo port install tesseract
You’ll need to install your preferred languages, in my case:
sudo port install tesseract-eng
Run port search tesseract to see all the languages available.
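Once Tesseract and a language pack are installed, you can drive it from Python by calling the command line program. This is a minimal sketch that assumes the usual invocation, `tesseract IMAGE OUTPUTBASE -l LANG`, which writes the recognized text to OUTPUTBASE.txt. The function names and the file names are placeholders.

```python
from __future__ import print_function
import subprocess


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG"""
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr(image_path, output_base, lang="eng"):
    """Run Tesseract and return the name of the text file it writes."""
    # check_call raises CalledProcessError if Tesseract exits with an error.
    subprocess.check_call(tesseract_command(image_path, output_base, lang))
    return output_base + ".txt"


# "scan.png" and "scan" are placeholder names, not files from this post.
print(" ".join(tesseract_command("scan.png", "scan")))
# prints: tesseract scan.png scan -l eng
```

Calling `ocr("scan.png", "scan")` on a real image would then leave the text in scan.txt.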
Afterword: There is also, sigh!, a machine learning module for Python called scikit-learn that does all kinds of things that, at this moment in time, both excite me and make my head hurt.