rybesh + machinelearning 41
About Campaign 2012 in the Media | Project for Excellence in Journalism (PEJ)
17 hours ago by rybesh
To arrive at the results regarding the tone of coverage, PEJ employed computer coding software developed by Crimson Hexagon along with PEJ's traditional media research methods.
The technology for Crimson Hexagon is rooted in an algorithm created by Gary King, a professor at Harvard University's Institute for Quantitative Social Science. (Click here to view the study explaining the algorithm.)
According to Crimson Hexagon, the purpose of computer coding is to "take as data a potentially large set of text documents, of which a small subset is hand coded into an investigator-chosen set of mutually exclusive and exhaustive categories. As output, the methods give approximately unbiased and statistically consistent estimates of the proportion of all documents in each category."
news
textanalysis
sentiment
machinelearning
classification
17 hours ago by rybesh
[1203.6402] Scalable K-Means++
9 weeks ago by rybesh
Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of the k-means++ is its inherent sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike prevailing efforts on parallelizing k-means that have mostly focused on the post-initialization phases of k-means. We prove that our proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and then show that in practice a constant number of passes suffices. Experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
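Note: the "k passes" bottleneck mentioned in the abstract comes from how k-means++ picks its seeds. The sketch below is a minimal pure-Python version of standard k-means++ seeding (sampling each new center with probability proportional to its squared distance from the nearest center chosen so far). It is not the paper's k-means|| algorithm, and the names `sq_dist`, `kmeanspp_init` and the toy points are assumptions made up for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, seed=0):
    """Pick k initial centers by D^2 sampling (standard k-means++ seeding)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            new_center = rng.choice(points)
        else:
            # Far-away points are proportionally more likely to be chosen.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        # One full pass over the data per new center: this is the
        # sequential cost the abstract refers to.
        d2 = [min(old, sq_dist(p, new_center)) for p, old in zip(points, d2)]
    return centers


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeanspp_init(toy_points, k=2))
```

Because a point that is already a center has weight zero, it cannot be drawn again while any other point remains at nonzero distance.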
clustering
machinelearning
9 weeks ago by rybesh
Maximum Margin Temporal Clustering
9 weeks ago by rybesh
Temporal Clustering (TC) refers to the factorization of multiple time series into a set of non-overlapping segments that belong to k temporal clusters. Existing methods based on extensions of generative models such as k-means or Switching Linear Dynamical Systems (SLDS) often lead to intractable inference and lack a mechanism for feature selection, critical when dealing with high dimensional data. To overcome these limitations, this paper proposes Maximum Margin Temporal Clustering (MMTC). MMTC simultaneously determines the start and the end of each segment, while learning a multi-class Support Vector Machine (SVM) to discriminate among temporal clusters. MMTC extends Maximum Margin Clustering in two ways: first, it incorporates the notion of TC, and second, it introduces additional constraints to achieve better balance between clusters. Experiments on clustering human actions and bee dancing motions illustrate the benefits of our approach compared to state-of-the-art methods.
temporality
actions
events
clustering
supervised
machinelearning
9 weeks ago by rybesh
10 MILLION INTERNATIONAL DYADIC EVENTS
10 weeks ago by rybesh
When the Palestinians launch a mortar attack into Israel, the Israeli army does not wait until the end of the calendar year to react. Yet, most modern data collections are aggregated to the month or year. The data available here include almost 10 million individual events, each coded to the exact day they occur or become known. Each event is summarized in the data as "Actor A does something to Actor B", with Actors A and B recording about 450 countries and other (within-country) actors and "does something to" coded in an ontology of about 200 types of actions. The data are coded by computer from millions of Reuters news reports. The software system (produced by VRA) that performs this task has been independently evaluated by King and Lowe (2003). This article found that for the numbers of events it was possible to convince humans (trained Harvard undergraduates) to code by hand, the machine did as well as the humans. For much larger numbers of events for which no expert coder could keep up, the machine dominates.
events
politicalscience
data
machinelearning
textanalysis
10 weeks ago by rybesh
Blei - Introduction to Probabilistic Topic Models
10 weeks ago by rybesh
Probabilistic topic models are a suite of algorithms whose aim is to discover the hidden thematic structure in large archives of documents. In this article, we review the main ideas of this field, survey the current state-of-the-art, and describe some promising future directions. We first describe latent Dirichlet allocation (LDA) [8], which is the simplest kind of topic model. We discuss its connections to probabilistic modeling, and describe two kinds of algorithms for topic discovery. We then survey the growing body of research that extends and applies topic models in interesting ways. These extensions have been developed by relaxing some of the statistical assumptions of LDA, incorporating meta-data into the analysis of the documents, and using similar kinds of models on a diversity of data types such as social networks, images and genetics. Finally, we give our thoughts as to some of the important unexplored directions for topic modeling. These include rigorous methods for checking models built for data exploration, new approaches to visualizing text and other high dimensional data, and moving beyond traditional information engineering applications towards using topic models for more scientific ends.
topicmodels
unsupervised
machinelearning
clustering
10 weeks ago by rybesh
TinySVM: Support Vector Machines
12 weeks ago by rybesh
TinySVM is an implementation of Support Vector Machines (SVMs) [Vapnik 95], [Vapnik 98] for the problem of pattern recognition.
svm
machinelearning
12 weeks ago by rybesh
Support Vector Machines: Software
12 weeks ago by rybesh
Nice ranked list of SVM software.
svm
machinelearning
classification
12 weeks ago by rybesh
Introduction to Conditional Random Fields - Edwin Chen's Blog
12 weeks ago by rybesh
Accessible introduction to CRFs.
crf
machinelearning
12 weeks ago by rybesh
[1003.0783] Supervised Topic Models
12 weeks ago by rybesh
We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive an approximate maximum-likelihood procedure for parameter estimation, which relies on variational methods to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and the political tone of amendments in the U.S. Senate based on the amendment text. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
slda
classification
lda
topicmodels
textanalysis
machinelearning
12 weeks ago by rybesh
Supervised latent Dirichlet allocation for classification
12 weeks ago by rybesh
This is a C++ implementation of supervised latent Dirichlet allocation (sLDA) for classification.
c++
slda
classification
topicmodels
lda
machinelearning
textanalysis
12 weeks ago by rybesh
Latent Dirichlet Allocation in C
12 weeks ago by rybesh
This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, click here to see the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).
This code contains:
an implementation of variational inference for the per-document topic proportions and per-word topic assignments
a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter
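Note: the per-document topic proportions and per-word topic assignments listed above are the latent variables in LDA's generative story. As a rough illustration, the sketch below samples one document from that story in pure Python. It is not the variational EM code from the C package; the function names, the symmetric Dirichlet parameter `alpha`, and the two toy topics are assumptions chosen for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Sample (theta, assignments, words) for one document.

    topics: list of dicts mapping word -> probability (one dict per topic).
    alpha:  symmetric Dirichlet hyperparameter (assumed scalar here).
    """
    # Per-document topic proportions.
    theta = sample_dirichlet([alpha] * len(topics), rng)
    assignments, words = [], []
    for _ in range(n_words):
        # Per-word topic assignment, drawn from the document's proportions.
        z = rng.choices(range(len(topics)), weights=theta, k=1)[0]
        vocab = list(topics[z].keys())
        probs = list(topics[z].values())
        # The word itself, drawn from the assigned topic's distribution.
        w = rng.choices(vocab, weights=probs, k=1)[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


toy_topics = [
    {"gene": 0.5, "dna": 0.3, "cell": 0.2},
    {"ball": 0.5, "team": 0.3, "game": 0.2},
]
theta, z, doc = generate_document(toy_topics, alpha=0.5, n_words=10,
                                  rng=random.Random(0))
print(theta)
print(doc)
```

Inference runs this story in reverse: given only the words, it estimates the proportions, the assignments, and the topics themselves.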
lda
c
linguistics
machinelearning
textanalysis
textmining
12 weeks ago by rybesh
Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.
12 weeks ago by rybesh
During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book.
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.
statistics
machinelearning
datamining
12 weeks ago by rybesh
wcauchois / pysvmlight / overview — Bitbucket
12 weeks ago by rybesh
A Python binding to the popular "SVM-Light" support vector machine library.
svm
machinelearning
python
12 weeks ago by rybesh
Conditional Random Fields
february 2012 by rybesh
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.
machinelearning
nlp
crf
textmining
metadata
february 2012 by rybesh
The Meaning and The Mining of Legal Texts
january 2012 by rybesh
Positive law, inscribed in legal texts, entails an authority not inherent in literary texts, generating legal consequences that can have real effects on a person’s life and liberty. The interpretation of legal texts, necessarily a normative undertaking, resists the mechanical application of rules, though still requiring a measure of predictability, coherence with other relevant legal norms and compliance with constitutional safeguards. The present proliferation of legal texts on the internet (codes, statutes, judgments, treaties, doctrinal treatises) renders the selection of relevant texts and cases next to impossible. We may expect that systems to mine these texts to find arguments that support one’s case, as well as expert systems that support the decision-making process of courts, will end up doing much of the work.
This raises the question of the difference between human interpretation and computational pattern-recognition and the issue of whether this difference makes a difference for the meaning of law. Possibly, data mining will produce patterns that disclose habits of the minds of judges and legislators that would have otherwise gone unnoticed (reinforcing the argument of the ‘legal realists’ at the beginning of the 20th century). Also, after the data analysis it will still be up to the judge to decide how to interpret the results or up to the prosecution which patterns to engage in the construction of evidence (requiring a hermeneutics of computational patterns instead of texts). My focus in this paper regards the fact that the mining process necessarily disambiguates the legal texts in order to transform them into a machine-readable data set, while the algorithms used for the analysis embody a strategy that will co-determine the outcome of the patterns. There seems to be a major due process concern here to the extent that these patterns are invisible to the naked human eye and will not be contestable in a court of law, due to their hidden complexity and computational nature.
This position paper aims to explain what is at stake in the computational turn with regard to legal texts. This prepares for the question I want to put forward to those involved in distant reading and not-reading of texts: could a visualization of computational patterns constitute a new way of un-hiding the complexity involved, opening the results of computational ‘knowledge’ to citizens’ scrutiny?
textmining
machinelearning
visualization
digitalhumanities
law
january 2012 by rybesh
Apache Mahout: Scalable machine learning and data mining
august 2011 by rybesh
Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.
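Note: the frequent itemset use case is the easiest of the four to show in a few lines. The sketch below counts how often each pair of items appears together in a basket and keeps the pairs that reach a minimum support. It is a brute-force illustration in plain Python, not Mahout's API or its distributed implementation; `frequent_pairs`, `min_support` and the shopping baskets are hypothetical names and data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: count} for item pairs seen in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
print(frequent_pairs(baskets, min_support=2))
# {('bread', 'milk'): 3, ('eggs', 'milk'): 2}
```

Counting every pair in every basket grows quadratically with basket size, which is one reason large-scale systems use more careful algorithms than this one.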
apache
hadoop
machinelearning
mapreduce
lda
august 2011 by rybesh
MADlib
july 2011 by rybesh
MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
database
analytics
datamining
statistics
machinelearning
sql
july 2011 by rybesh
Christopher M. Bishop: Pattern Recognition and Machine Learning
june 2011 by rybesh
This leading textbook provides a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of pattern recognition or machine learning concepts is assumed. This is the first machine learning textbook to include a comprehensive coverage of recent developments such as probabilistic graphical models and deterministic inference methods, and to emphasize a modern Bayesian perspective. It is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. This hard cover book has 738 pages in full colour, and there are 431 graded exercises (with solutions available below). Extensive support is provided for course instructors.
machinelearning
books
patterns
statistics
datamining
june 2011 by rybesh
maui-indexer - Maui - Multi-purpose automatic topic indexing - Google Project Hosting
june 2011 by rybesh
Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles.
Maui performs the following tasks:
term assignment with a controlled vocabulary (or thesaurus)
subject indexing
topic indexing with terms from Wikipedia
keyphrase extraction
terminology extraction
automatic tagging
It can also be used for terminology extraction and semi-automatic topic indexing.
indexing
vocabulary
tools
nlp
machinelearning
java
june 2011 by rybesh
Languages - Accentuate.us - Really Easy Computer Input
april 2011 by rybesh
Accentuate.us uses statistics to predict where special characters are needed on a language-by-language basis.
language
input
python
tools
webservices
api
machinelearning
april 2011 by rybesh
TiMBL: Tilburg Memory-Based Learner
april 2011 by rybesh
TiMBL is an open source software package implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms have in common that they store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases.
For the past decade, TiMBL has been mostly used in natural language processing as a machine learning classifier component, but its use extends to virtually any supervised machine learning domain. Due to its particular decision-tree-based implementation, TiMBL is in many cases far more efficient in classification than a standard k-nearest neighbor algorithm would be.
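Note: the idea behind k-nearest neighbor classification with feature weighting over symbolic features can be sketched briefly. The example below weights each feature by its information gain on the training set and measures distance as the summed weight of mismatching features. It is an illustration of the general idea only, not TiMBL's code, API, or its IGTree approximation; the function names and the fruit data are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(X, y, j):
    """Reduction in label entropy from knowing the value of feature j."""
    by_value = {}
    for xi, yi in zip(X, y):
        by_value.setdefault(xi[j], []).append(yi)
    remainder = sum(len(part) / len(y) * entropy(part)
                    for part in by_value.values())
    return entropy(y) - remainder


def classify(X, y, query, k=1):
    """Majority vote among the k stored cases nearest to the query."""
    weights = [info_gain(X, y, j) for j in range(len(query))]

    def distance(case):
        # Weighted overlap: add a feature's weight when the values differ.
        return sum(w for w, a, b in zip(weights, case, query) if a != b)

    ranked = sorted(zip(X, y), key=lambda pair: distance(pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: (color, shape) -> fruit. Shape decides the label here.
X = [("red", "round"), ("green", "round"), ("red", "long"), ("green", "long")]
y = ["apple", "apple", "banana", "banana"]
print(classify(X, y, ("yellow", "long")))  # banana
```

In this toy set color carries no information about the label, so its weight is zero and only shape affects the distance.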
nlp
machinelearning
tools
april 2011 by rybesh
Daisy Zhe Wang: BayesStore
january 2011 by rybesh
BayesStore is a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. BayesStore represents model and evidence data as relational tables; implements inference algorithms efficiently in SQL; adds probabilistic relational operators to the query engine; optimizes queries with both relational and inference operators. The design goals of BayesStore are: (1) to be able to support efficient query processing over different models compared to the off-the-shelf machine learning libraries; (2) to be able to support extensible API for plugging in new models and inference algorithms; and (3) to be able to scale up to very large data sets.
statistics
bayes
database
machinelearning
january 2011 by rybesh
Modular toolkit for Data Processing (MDP)
december 2010 by rybesh
Modular toolkit for Data Processing (MDP) is a Python data processing framework.
From the user's perspective, MDP is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures.
datamining
machinelearning
python
tools
december 2010 by rybesh
MALLET homepage
october 2010 by rybesh
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
datamining
java
machinelearning
nlp
tools
october 2010 by rybesh
Google Prediction API - Google Code
july 2010 by rybesh
The Prediction API enables access to Google's machine learning algorithms to analyze your historic data and predict likely future outcomes.
machinelearning
api
classification
july 2010 by rybesh
Training Examples Q&A - machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization
june 2010 by rybesh
Where data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!
ai
machinelearning
nlp
textanalysis
ir
datamining
search
statistics
infoviz
reference
june 2010 by rybesh
Apache Mahout
november 2009 by rybesh
Mahout's goal is to build scalable machine learning libraries.
machinelearning
opensource
hadoop
apache
recommendation
clustering
classification
datamining
november 2009 by rybesh
Maximum Entropy (GA) Model Optimization Package
august 2009 by rybesh
Maximum entropy (aka logistic regression) models are very popular, especially in natural language processing. The software here is an implementation of maximum likelihood and maximum a posteriori optimization of the parameters of these models. The algorithms used are much more efficient than the iterative scaling techniques used in almost every other maxent package out there.
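Note: the difference between the maximum likelihood and maximum a posteriori objectives can be shown with a small binary logistic regression. The sketch below uses plain batch gradient ascent, which is a stand-in chosen for brevity and not the optimizer this package uses. The MAP variant assumes a zero-mean Gaussian prior, which appears as an L2 penalty; `fit_logreg`, `l2`, `lr` and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, l2=0.0, lr=0.1, epochs=500):
    """Fit binary logistic regression weights by batch gradient ascent.

    l2 = 0 gives the maximum likelihood objective; l2 > 0 adds the log of
    a Gaussian prior, giving a MAP objective.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        # Gradient of the log prior: pulls each weight toward zero.
        grad = [-l2 * wj for wj in w]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            # Gradient of the log-likelihood: (label - prediction) * feature.
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


# First feature is a constant bias term; second is the input value.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0, 0, 1, 1]
print("ML weights: ", fit_logreg(X, y))
print("MAP weights:", fit_logreg(X, y, l2=1.0))
```

On separable data like this toy set, the maximum likelihood weights keep growing with more iterations, while the prior keeps the MAP weights bounded.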
research
tools
nlp
statistics
machinelearning
ocaml
logreg
maxent
august 2009 by rybesh
ParsCit: An open-source CRF Reference String Parsing Package
september 2008 by rybesh
It is architected as a supervised machine learning procedure that uses Conditional Random Fields as its learning mechanism.
nlp
machinelearning
opensource
perl
parsing
recognition
citations
september 2008 by rybesh
Dawid Weiss
november 2007 by rybesh
Text clustering, information retrieval, web mining, text processing, NLP.
people
academia
poland
search
datamining
nlp
machinelearning
november 2007 by rybesh
Map-Reduce for Machine Learning on Multicore
august 2007 by rybesh
In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms.
machinelearning
distributed
grid
research
august 2007 by rybesh
gladwell.com: The Perfect and the Good
november 2006 by rybesh
...one of the most important changes we're going to see in lots of professions over the next few years is the emergence of tools that close the gap between the middle and the top--that allow the decision-maker who is merely competent to avoid his errors a
machinelearning
decisionmaking
future
ideas
tools
expertise
november 2006 by rybesh
Manifold - Wikipedia, the free encyclopedia
november 2006 by rybesh
A manifold is an abstract mathematical space in which every point has a neighborhood which resembles Euclidean space, but in which the global structure may be more complicated.
math
machinelearning
statistics
november 2006 by rybesh
Northrop
october 2006 by rybesh
A genre categorizer that lets users narrow down searches to particular genres like editorials, financial reports or scientific writing or group search results according to genre.
genre
search
organization
nlp
classification
machinelearning
october 2006 by rybesh
Statistical Data Mining Tutorials
september 2006 by rybesh
A set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.
machinelearning
reference
statistics
howto
september 2006 by rybesh
Milind Naphade
july 2006 by rybesh
Research interests in content analysis, information extraction, statistical machine learning and graphical modeling and detection and representation of semantic information.
multimedia
analysis
machinelearning
semweb
people
IBM
SSMS2006
july 2006 by rybesh
Orange
october 2005 by rybesh
Orange is a component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques.
machinelearning
classification
code
datamining
python
opensource
tools
nlp
statistics
october 2005 by rybesh
Data Mining in Python
october 2005 by rybesh
This is a collection of libraries useful for machine learning and data mining.
python
statistics
machinelearning
nlp
code
opensource
datamining
october 2005 by rybesh
Divmod.org :: Reverend
october 2005 by rybesh
Reverend is a general purpose Bayesian classifier. Use the Reverend to quickly add Bayesian smarts to your app.
machinelearning
bayes
classification
python
statistics
opensource
code
october 2005 by rybesh
MusicStrands
july 2005 by rybesh
MusicStrands uses statistical machine learning, collaborative filtering, and link-based analysis to provide independent music recommendations based on the listening behavior of individuals and social networks.
music
playlist
social
networking
tools
machinelearning
statistics
july 2005 by rybesh