I just finished watching an O’Reilly webcast on statistics for NFL office pools. I don’t care much about football, unless it’s the other kind of football, but I was interested to see which pieces of Python the presenter, [Tanya Schlusser], was going to use: [pandas] and [scikit-learn]. Her presentation was pretty dense, but luckily she made the code, including a Jupyter notebook, available on [GitHub]. *Thank you, Tanya!*
A few other things came up, either in the presentation itself or in the group chat that accompanied it:
* [seaborn] is a statistical data visualization library for Python.
* [statsmodels] “provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.”
* You can save trained `scikit-learn` models to disk as [pickles].
* I shouldn’t forget about [OpenRefine].
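Here is a minimal sketch of the pickle idea, assuming a toy `LinearRegression` fit on made-up points; the data and the file name `model.pkl` are illustrative, not from the webcast. The scikit-learn documentation also mentions `joblib` as an alternative for models that hold large NumPy arrays, and warns that a pickle should only be loaded from a trusted source and with a compatible scikit-learn version.

```python
import pickle

from sklearn.linear_model import LinearRegression

# Fit a tiny model on made-up points that lie exactly on y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
model = LinearRegression().fit(X, y)

# Serialize the fitted estimator to a file...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back later, e.g. in a different session.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored model keeps its learned coefficients, so it predicts like the original.
print(restored.predict([[4.0]]))
```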
## Addendum ##
As regular readers of these notes know, installation of `scikit-learn` is as easy as:
```
% sudo port install py34-scikit-learn
```
What I didn’t know is that the installation of `seaborn` in MacPorts includes `statsmodels`:
```
% sudo port install py34-seaborn
--->  Computing dependencies for py34-seaborn
--->  Dependencies to be installed: py34-patsy py34-statsmodels
```
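Because `seaborn` quietly pulls in `statsmodels` and `patsy`, it can be handy to confirm which of these packages the interpreter you actually run can see. This is a small sketch using the standard library's `importlib.util.find_spec`; the package list is my own choice, and note that `scikit-learn` is imported as `sklearn`. It uses `str.format` rather than f-strings so it should also run under the Python 3.4 that the `py34-` ports target.

```python
import importlib.util

# Top-level module names for the packages discussed above (scikit-learn imports as sklearn).
PACKAGES = ["sklearn", "seaborn", "statsmodels", "patsy"]


def installed(names):
    """Map each top-level module name to True if the current interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in installed(PACKAGES).items():
        print("{:12} {}".format(name, "found" if ok else "missing"))
```

Run it with the same Python the ports were built for; a different interpreter on your `PATH` may report everything as missing.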
I didn’t know about `patsy`; here’s what its README on GitHub says:
> Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. Patsy brings the convenience of R “formulas” to Python.
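To make the “R formulas” idea concrete, here is a minimal sketch with a small made-up `pandas` DataFrame (the columns `y`, `x`, and `group` are hypothetical). `patsy.dmatrices` turns the formula into an outcome matrix and a design matrix, adding an intercept column and treatment-coding the categorical `group` against its first level.

```python
import pandas as pd
from patsy import dmatrices

# A small made-up data set with one numeric and one categorical predictor.
data = pd.DataFrame({
    "y": [1.0, 2.0, 2.5, 4.0, 5.5, 6.0],
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "group": ["a", "b", "a", "b", "a", "b"],
})

# R-style formula: y explained by x plus the categorical variable group.
# dmatrices returns the outcome (left side) and the design matrix (right side).
y, X = dmatrices("y ~ x + C(group)", data)

# Column names show the intercept, the dummy for group level "b", and x.
print(X.design_info.column_names)
print(X.shape)
```

The design matrix `X` is an ordinary NumPy-compatible array, so it can be handed to a fitting routine such as those in `statsmodels` or `scikit-learn`.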
[Tanya Schlusser]: http://twitter.com/tanyaschlusser