OpenRefine

[OpenRefine][] is a “tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases.” The link leads to a page with lots of video tutorials. There is also Thomas Padilla’s [Getting Started with OpenRefine][].

[OpenRefine]: http://openrefine.org
[Getting Started with OpenRefine]: http://thomaspadilla.org/dataprep/

Machine Learning Explosion

[Why the explosion in machine learning?][q] As always, there are major and minor reasons. The major reason? Data. Lots and lots of data, not only because we human beings have put so much of it up ourselves but also because businesses and other organizations (Hello, NSA! Call when you’re ready to talk about how I can help!) have collected so much. And that’s the minor reason right there, if one can consider it minor: organizations want to “learn” things from all this data.

[q]: http://www.quora.com/Machine-Learning/Why-are-we-experiencing-such-an-explosive-growth-of-machine-learning-and-its-applications-today-even-though-the-space-exists-for-more-than-3-4-decades

Data Mining vs Big Data

I was working on a post that outlines my own version of “Text Analytics 101,” which I have been using in freshman writing classes for the past three years, and I found myself considering, momentarily, the uses of “text mining” versus “text analytics” and “data mining” versus “big data.” I’m sure there are distinctions to be made between the terms in each pair, but it’s also the case that terms map onto various disciplines/domains and/or historical moments. A quick search in Google’s Ngram Viewer, which is based on Google Books, produced the following graph:

[Graph: Google Ngram Viewer, “data mining” vs “big data”]

A similar search for the first pair, “text mining” versus “text analytics,” produced the following:

[Graph: Google Ngram Viewer, “text mining” vs “text analytics”]

The only thing the two graphs suggest to me is that the second term in each pair may simply have appeared later and so hasn’t yet made it into print. I would like to do a similar ngram search of the web, but I haven’t found an equally simple interface for that kind of quick survey.
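Short of such an interface, the basic idea behind an ngram graph is easy enough to approximate on a collection of your own. What follows is a minimal Python sketch, not Google’s method: it assumes a hypothetical corpus of `(year, text)` pairs you have already gathered, and for each year it reports how often a two-word phrase occurs as a share of all two-word sequences that year. The function names and sample sentences are illustrative only.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def bigram_frequencies(docs, phrases):
    """Per-year relative frequency of each two-word phrase.

    docs: iterable of (year, text) pairs (a hypothetical corpus).
    phrases: two-word phrases such as "data mining".
    Simplification: bigrams are allowed to span sentence boundaries.
    """
    counts = defaultdict(Counter)  # year -> counts of each bigram
    totals = Counter()             # year -> total bigrams seen that year
    for year, text in docs:
        tokens = tokenize(text)
        bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
        counts[year].update(bigrams)
        totals[year] += len(bigrams)

    result = {}
    for year in sorted(totals):
        total = totals[year]
        result[year] = {
            p: (counts[year][p] / total if total else 0.0) for p in phrases
        }
    return result


# Illustrative usage with made-up sentences, not real data.
docs = [
    (2000, "Data mining finds patterns. Data mining is popular."),
    (2012, "Big data is everywhere, and big data needs data mining."),
]
freqs = bigram_frequencies(docs, ["data mining", "big data"])
for year, row in freqs.items():
    print(year, {p: round(f, 3) for p, f in row.items()})
```

Dividing by the year’s total, rather than plotting raw counts, keeps a year with many more documents from looking like a surge in a term’s popularity.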