Labels

“One popular misconception [about machine learning] is that people think they have enough data when they don’t. When people say machine learning, a very large segment of predictions are based on existing data. And in order for that to work, you generally have to have a big labeled set of data,” says Hillary Green-Lerman of Codecademy.

Emphasis on labeled.

Later:

“People often don’t realize how much of machine learning is getting data into a format so that you can feed it into an algorithm. The algorithms are actually usually available pre-baked,” Hillary said. “In a lot of ways, you need to know how to pick the best linear regression for your data, but you don’t really need to know the intricacies of how it’s programmed. You do need to work the data into a format where each row is a data point, the kind of thing you’d want to pick.”

Me, a Data Scientist?

As things continue to deteriorate here in Louisiana, and it becomes increasingly obvious that what our administration wants from faculty, especially humanities faculty, is for us to become teaching bots, I find myself more and more interested in non-academic alternatives. And the fact is that I really enjoy my current work on the small end of the big data revolution, or however it’s termed these days.

Mostly, it seems increasingly to be termed *data science*, but what people mean by that can vary. As I try to understand this emergent field, both from the removed position of a humanist just trying to track how ideas and practices play out in history and as a humanist who maybe wants to play on those fields himself, I find myself looking at various data science programs. UC Berkeley’s School of Information offers a more traditional program, but there is also [Zipfian Academy](http://www.zipfianacademy.com/), which offers 12-week intensive programs and the possibility of some tuition relief. (And that sounds pretty good to a poor Southern humanist, or is a Southern humanist poor by definition?)