“One popular misconception [about machine learning] is that people think they have enough data when they don’t. When people say machine learning, a very large segment of predictions are based on existing data. And in order for that to work, you generally have to have a big labeled set of data,” says Hillary Green-Lerman of Codecademy.
Emphasis on labeled.
“People often don’t realize how much of machine learning is getting data into a format so that you can feed it into an algorithm. The algorithms are actually usually available pre-baked,” Hillary said. “In a lot of ways, you need to know how to pick the best linear regression for your data, but you don’t really need to know the intricacies of how it’s programmed. You do need to work the data into a format where each row is a data point, the kind of thing you’d want to pick.”
Bhavya Geethika Peddibhotla at _KDnuggets_ has compiled a list of the [“Top 20 Python Machine Learning Open Source Projects.”](http://www.kdnuggets.com/2015/06/top-20-python-machine-learning-open-source-projects.html) At the top? `scikit-learn`, as shown in one of the graphics from the article:
![The Top 20 Open Source Machine Learning Projects in Python](http://www.kdnuggets.com/wp-content/uploads/top-python-machine-learning-projects1.jpg)
Machine learning, I am learning, covers a wide range of activities. To help me understand this arena, I have the following notes:
* Jason Brownlee maintains a reader-friendly blog on machine learning: [Machine Learning Mastery].
* Coursera offers several courses on machine learning; I’m taking [Andrew Ng’s introduction] now.
[Machine Learning Mastery]: http://machinelearningmastery.com/
[Andrew Ng’s introduction]: http://www.coursera.org/course/ml