milo + datamining 6
Three New Tools Bring Machine Learning Insights to the Masses
february 2012 by milo
Over the past few years, machine learning has quickly become the "secret sauce" of large-scale websites. Machine learning systems have historically been hand-crafted by the small armies of computer science and mathematics PhDs employed at places like Google. With the growing popularity of machine learning and other statistical techniques, the demand for so-called "data scientists" (software developers and analysts with the skill to apply statistical techniques to large data sets) has exploded since 2010.
As a result, these rarefied skills have become extremely difficult to find and expensive to retain, driving up the cost of machine learning systems and making it difficult for enterprises and smaller web firms to apply the technology. The data scientist talent shortage is also an opportunity, however, and a new breed of software platform is rising to meet this need. Building upon the low-level big data infrastructure now available, these new platforms seek to democratize machine learning and advanced analytics, making their benefits available to enterprises and firms that either can't afford or can't find enough PhDs and data scientists. The first of this coming wave of machine learning-powered platforms are launching at this week's O'Reilly Strata conference. Here are three companies leading the way.
datamining
learning
rww
Hilary Mason Wants To Get You Started With Big Data
january 2012 by milo
In a series of workshops, Mason outlined the tools you need to get started with manipulating Big Data and understanding the basics of machine learning, something she does every day as she sifts through each one of those shortened URLs that we all create furiously. (We wrote about her latest revelation earlier in the month.) When she says "this is a hard problem," you know she is really saying "this is a problem that I haven't yet figured out the best answer to." Her credo for each problem is Obtain, Scrub, Explore, Model, and Interpret. I'll review each of these steps.
The first step is setting up a proper environment. For Mason that is a Linux machine with a variety of tools on it, which you can find on her GitHub page linked above. She is a Python programmer, and the toolset reflects that: she uses Python with the JSONView Chrome extension, NLTK, NumPy, Pycluster, hcluster, and matplotlib. You can use most of these tools on other OSs too.
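As a rough illustration of the Obtain, Scrub, Explore, Model, Interpret sequence, here is a minimal Python sketch. It is not Mason's code: the sample records, the field names (url, clicks), and the function names are all hypothetical, and it uses only the standard library rather than the tools listed above.

```python
import json
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample input; real data would come from an API or a log file.
RAW_LINES = [
    '{"url": "http://example.com/a", "clicks": 12}',
    '{"url": "http://example.org/b", "clicks": 3}',
    'not json at all',
    '{"url": "http://example.com/c", "clicks": 5}',
    '{"url": "http://example.org/d"}',
]


def obtain():
    """Obtain: fetch the raw data (here, just a copy of the sample lines)."""
    return list(RAW_LINES)


def scrub(lines):
    """Scrub: drop lines that are not valid JSON or lack the needed fields."""
    records = []
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # malformed line, skip it
        if isinstance(rec, dict) and "url" in rec and isinstance(rec.get("clicks"), int):
            records.append(rec)
    return records


def explore(records):
    """Explore: count how many records each domain contributes."""
    return Counter(urlparse(r["url"]).netloc for r in records)


def model(records):
    """Model: a deliberately simple one, the mean clicks per domain."""
    totals = Counter()
    for r in records:
        totals[urlparse(r["url"]).netloc] += r["clicks"]
    counts = explore(records)
    return {domain: totals[domain] / counts[domain] for domain in counts}


def interpret(means):
    """Interpret: report the domain with the highest mean clicks."""
    return max(means, key=means.get)


if __name__ == "__main__":
    clean = scrub(obtain())
    print(explore(clean))
    print(interpret(model(clean)))
```

Each stage is a separate function so that one step, for example a stricter scrub or a real clustering model, can be swapped in without touching the others.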
data
dataanalysis
bigdata
rww
howto
tips
workflow
Journalismus
datamining