cshalizi + classifiers 49
Clarke, Clarke: Prediction in several conventional contexts
20 days ago by cshalizi
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."
(to_teach tags are tentative.)
to:NB
prediction
statistics
classifiers
regression
to_teach:undergrad-ADA
to_teach:data-mining
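For prediction from the empirical distribution function, one classical construction (a textbook result, not necessarily the form the Clarkes present) is the order-statistic prediction interval. If X_1, ..., X_n, X_{n+1} are i.i.d. and continuous, the rank of X_{n+1} among all n+1 values is uniform, so P(X_(j) <= X_{n+1} <= X_(k)) = (k - j)/(n + 1). A minimal sketch; the function name and the symmetric choice of j and k are my own:

```python
import math

def order_stat_prediction_interval(xs, alpha=0.1):
    """Distribution-free prediction interval for one new i.i.d. draw.

    Uses P(X_(j) <= X_new <= X_(k)) = (k - j) / (n + 1) for continuous data,
    trimming m order statistics from each end with 2m <= alpha * (n + 1).
    Returns (lower, upper, guaranteed_coverage).
    """
    n = len(xs)
    s = sorted(xs)
    m = math.floor(alpha * (n + 1) / 2)  # how many to trim from each end
    if m < 1:
        # Too few points for the requested level: only the whole line works.
        return (-math.inf, math.inf, 1.0)
    lower = s[m - 1]  # X_(m), 1-indexed order statistic
    upper = s[n - m]  # X_(n + 1 - m)
    coverage = (n + 1 - 2 * m) / (n + 1)
    return (lower, upper, coverage)
```

With the 19 values 1, ..., 19 and alpha = 0.1, this returns the sample minimum and maximum with coverage 18/20 = 0.9; with only three points it cannot reach 90% and returns the whole line.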
Game-powered machine learning
4 weeks ago by cshalizi
"Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data."
--- This is more than a bit of a stunt, but it points in an interesting direction.
to:NB
to_read
data_mining
collective_cognition
active_learning
tagging
classifiers
re:democratic_cognition
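The abstract does not say how the system decides which songs to send back to the games. A common active-learning criterion is uncertainty sampling: ask humans about the items the current model is least sure of. A minimal sketch of that criterion, offered as an assumption rather than the paper's method (the function name and the 0.5 reference point are mine):

```python
def most_uncertain(pool, predict_proba, k=1):
    """Uncertainty sampling for a binary tagger.

    predict_proba(item) is assumed to return the model's probability that
    the tag applies; items whose probability is closest to 0.5 are the ones
    the current model is least sure about, so they are sent for labeling.
    """
    ranked = sorted(pool, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:k]
```

In a loop, the chosen items would be labeled by players, added to the training set, and the model refit before the next round of selection.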
[1203.4354] Asymptotic Confidence Sets for General Nonparametric Regression and Classification by Regularized Kernel Methods
7 weeks ago by cshalizi
"Regularized kernel methods such as, e.g., support vector machines and least-squares support vector regression constitute an important class of standard learning algorithms in machine learning. Theoretical investigations concerning asymptotic properties have mainly focused on rates of convergence during the last years but there are only very few and limited (asymptotic) results on statistical inference so far. As this is a serious limitation for their use in mathematical statistics, the goal of the article is to fill this gap. Based on asymptotic normality of many of these methods, the article derives a strongly consistent estimator for the unknown covariance matrix of the limiting normal distribution. In this way, we obtain asymptotically correct confidence sets for $\psi(f_{P,\lambda_0})$ where $f_{P,\lambda_0}$ denotes the minimizer of the regularized risk in the reproducing kernel Hilbert space $H$ and $\psi: H \rightarrow \mathbb{R}^m$ is any Hadamard-differentiable functional. Applications include (multivariate) pointwise confidence sets for values of $f_{P,\lambda_0}$ and confidence sets for gradients, integrals, and norms."
to:NB
confidence_sets
kernel_methods
statistics
nonparametrics
regression
classifiers
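Given asymptotic normality plus a consistent covariance estimator, the confidence sets follow the usual Wald-type recipe. A sketch in the abstract's notation, assuming a sqrt(n) rate and a nonsingular limiting covariance, and writing f_{D_n,\lambda_n} for the estimate computed from a sample D_n of size n (my notation, not necessarily the paper's):

```latex
\sqrt{n}\,\bigl(\psi(f_{D_n,\lambda_n}) - \psi(f_{P,\lambda_0})\bigr)
  \rightsquigarrow \mathcal{N}_m(0, \Sigma),
\qquad \hat\Sigma_n \to \Sigma \ \text{a.s.}

C_n = \Bigl\{ \theta \in \mathbb{R}^m :
  n\,\bigl(\psi(f_{D_n,\lambda_n}) - \theta\bigr)^{\top} \hat\Sigma_n^{-1}
  \bigl(\psi(f_{D_n,\lambda_n}) - \theta\bigr) \le \chi^2_{m,\,1-\alpha} \Bigr\},
\qquad
\Pr\bigl(\psi(f_{P,\lambda_0}) \in C_n\bigr) \to 1 - \alpha .
```

For m = 1 this reduces to the familiar interval \psi(f_{D_n,\lambda_n}) \pm z_{1-\alpha/2}\,\hat\sigma_n/\sqrt{n}.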
Time-series clustering via quasi U-statistics - Valk - 2012 - Journal of Time Series Analysis - Wiley Online Library
8 weeks ago by cshalizi
"The problem of time-series discrimination and classification is discussed. We propose a novel clustering algorithm based on a class of quasi U-statistics and subgroup decomposition tests. The decomposition may be applied to any concave time-series distance. The resulting test statistics are proven to be asymptotically normal for either i.i.d. or non-identically distributed groups of time-series under mild conditions. We illustrate its empirical performance on a simulation study and a real data analysis. The simulation setup includes stationary vs. stationary and stationary vs. non-stationary cases. The performance of the proposed method is favourably compared with some of the most common clustering measures available."
to:NB
clustering
time_series
statistics
classifiers
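The decomposition idea can be illustrated with the simplest between-versus-within contrast of pairwise distances: twice the average cross-group distance minus the two average within-group distances, which grows as the groups separate. This is an illustrative sketch only. The paper's quasi U-statistic reweights these terms and derives a null distribution, and the Euclidean distance below is a placeholder I have not checked against the paper's conditions on the distance:

```python
from itertools import combinations

def mean_within(group, dist):
    """Average distance over unordered pairs inside one group (a U-statistic)."""
    pairs = list(combinations(group, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def mean_between(g1, g2, dist):
    """Average distance over all cross-group pairs."""
    return sum(dist(a, b) for a in g1 for b in g2) / (len(g1) * len(g2))

def between_within_contrast(g1, g2, dist):
    """2 * between - within_1 - within_2: large when the groups separate.

    Each group needs at least two series. Illustrative only; not the
    paper's weighted statistic.
    """
    return (2 * mean_between(g1, g2, dist)
            - mean_within(g1, dist) - mean_within(g2, dist))

def euclidean(x, y):
    """Placeholder distance between two equal-length series."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
```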
[1203.0193] Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts
12 weeks ago by cshalizi
Huh: "The Vapnik-Chervonenkis dimension (VC dimension) of the set of half-spaces of R^d with frontiers parallel to the axes is computed exactly. It is shown that this VC dimension is smaller than the intuitive value of d. An additional approximation using the Stirling's formula is given. This result may be used to evaluate the performance of classifiers or regressors based on dyadic partitioning of R^d for instance. Algorithms using axis-parallel cuts to partition R^d are often used to reduce the computational time of such estimators when d is large."
to:NB
learning_theory
vc-dimension
classifiers
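Whether a given finite point set is shattered by axis-parallel cuts can be checked by brute force, since for each coordinate only thresholds between consecutive observed values matter. A sketch, assuming the class contains both orientations {x : x_j <= t} and {x : x_j > t} of every cut (the paper's exact definition may differ; function names are mine):

```python
def cut_labelings(points):
    """All subsets of `points` (as index sets) picked out by some axis-parallel cut."""
    n, d = len(points), len(points[0])
    everything = frozenset(range(n))
    labelings = set()
    for j in range(d):
        coords = [p[j] for p in points]
        # Cutting below every value, or at each observed value, covers all cases.
        for t in [float("-inf")] + sorted(set(coords)):
            below = frozenset(i for i in range(n) if coords[i] <= t)
            labelings.add(below)               # orientation x_j <= t
            labelings.add(everything - below)  # orientation x_j > t
    return labelings

def is_shattered(points):
    """True if every one of the 2^n labelings is realized by some cut."""
    return len(cut_labelings(points)) == 2 ** len(points)
```

For example, two points on a line are shattered but three are not (the in-out-in labeling is unreachable), so under this definition the class has VC dimension 2 in one dimension.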
[1202.1523] Information Forests
february 2012 by cshalizi
"We describe Information Forests, an approach to classification that generalizes Random Forests by replacing the splitting criterion of non-leaf nodes from a discriminative one -- based on the entropy of the label distribution -- to a generative one -- based on maximizing the information divergence between the class-conditional distributions in the resulting partitions. The basic idea consists of deferring classification until a measure of "classification confidence" is sufficiently high, and instead breaking down the data so as to maximize this measure. In an alternative interpretation, Information Forests attempt to partition the data into subsets that are "as informative as possible" for the purpose of the task, which is to classify the data. Classification confidence, or informative content of the subsets, is quantified by the Information Divergence. Our approach relates to active learning, semi-supervised learning, mixed generative/discriminative learning."
After reading: meh.
have_read
decision_trees
information_theory
classifiers
machine_learning
to_teach:data-mining
re:AoS_project
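For contrast, the standard discriminative criterion that the abstract says Information Forests replace scores a split by how much it reduces the entropy of the label distribution. A minimal sketch of that baseline criterion (not of the paper's divergence-based one):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, goes_left):
    """Entropy reduction from splitting `labels` by the boolean mask `goes_left`."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    # Weighted average entropy of the non-empty children.
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children
```

A split that separates the classes perfectly gains the full entropy of the parent; one that leaves each child with the parent's class mix gains nothing.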
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
february 2012 by cshalizi
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature -- instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB
information_theory
statistics
variable_selection
model_selection
to_teach:data-mining
to:blog
machine_learning
classifiers
have_read
graphical_models
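The JMI criterion the abstract singles out scores a candidate feature X_k against the already-selected set S by the sum over X_j in S of I(X_k, X_j; Y). A minimal greedy sketch with plug-in mutual information for discrete features. Picking the first feature by plain I(X_k; Y) and breaking ties by lowest index are my choices, and plug-in estimates are biased on small samples:

```python
import math
from collections import Counter

def joint_entropy(*columns):
    """Plug-in Shannon entropy (bits) of the joint empirical distribution."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())

def mutual_info(feature_columns, y):
    """I(X_A; Y) = H(X_A) + H(Y) - H(X_A, Y) for a list of discrete columns X_A."""
    return (joint_entropy(*feature_columns) + joint_entropy(y)
            - joint_entropy(*feature_columns, y))

def jmi_select(features, y, k):
    """Greedy forward selection with the JMI score sum_{j in S} I(X_k, X_j; Y)."""
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i):
            if not selected:  # empty S: fall back to plain relevance
                return mutual_info([features[i]], y)
            return sum(mutual_info([features[i], features[j]], y) for j in selected)
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with `y = [0,0,0,0,1,1,1,1]`, `x0 = [0,0,0,1,1,1,1,1]`, an exact copy `x1 = x0`, and `x2 = [1,1,1,0,1,1,1,1]` (weak alone, but it corrects x0's single mistake), `jmi_select([x0, x1, x2], y, 2)` returns `[0, 2]`: the redundant copy is passed over.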
Linear Classifiers Are Nearly Optimal When Hidden Variables Have Diverse Effects
january 2012 by cshalizi
"We analyze classification problems in which data is generated by a two-tiered random process. The class is generated first, then a layer of conditionally independent hidden variables, and finally the observed variables. For sources like this, the Bayes-optimal rule for predicting the class given the values of the observed variables is a two-layer neural network. We show that, if the hidden variables have non-negligible effects on many observed variables, a linear classifier approximates the error rate of the Bayes optimal classifier up to lower order terms. We also show that the hinge loss of a linear classifier is not much more than the Bayes error rate, which implies that an accurate linear classifier can be found efficiently."
to:NB
machine_learning
classifiers
re:what_is_the_right_null_model_for_linear_regression
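The hinge-loss remark can be made concrete. For a label y in {-1, +1} and score s = <w, x> + b, the hinge loss max(0, 1 - ys) is a convex upper bound on the 0-1 loss, which is why a small hinge loss certifies a small error rate and can be minimized efficiently. A small sketch computing both for a fixed linear rule (the function name and toy data are illustrative):

```python
def linear_losses(points, labels, w, b):
    """Average hinge loss and 0-1 error of the rule sign(<w, x> + b).

    Labels are +1/-1. Since max(0, 1 - m) >= 1[m <= 0] for every margin m,
    the hinge loss always upper-bounds the 0-1 error, and it is convex in
    (w, b).
    """
    hinge = zero_one = 0.0
    for x, y in zip(points, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        hinge += max(0.0, 1.0 - margin)
        zero_one += 1.0 if margin <= 0 else 0.0  # ties count as errors
    n = len(labels)
    return hinge / n, zero_one / n
```

On the four points (2, 0), (0.5, 0), (-1, 0), (-3, 0) with labels +1, +1, +1, -1 and w = (1, 0), b = 0, the margins are 2, 0.5, -1, 3, giving average hinge loss 0.625 against a 0-1 error of 0.25.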
[1111.5312] Representations and Ensemble Methods for Dynamic Relational Classification
november 2011 by cshalizi
"Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to increase the accuracy of statistical relational learning algorithms. The temporal relational representations serve as a basis for classification, ensembles, and pattern mining in evolving domains. The framework includes (1) selecting the time-varying relational components (links, attributes, nodes), (2) selecting the temporal granularity, (3) predicting the temporal influence of each time-varying relational component, and (4) choosing the weighted relational classifier. Additionally, we propose temporal ensemble methods that exploit the temporal-dimension of relational data. These ensembles outperform traditional and more sophisticated relational ensembles while avoiding the issue of learning the most optimal representation. Finally, the space of temporal-relational models are evaluated using a sample of classifiers. In all cases, the proposed temporal-relational classifiers outperform competing models that ignore the temporal information. The results demonstrate the capability and necessity of the temporal-relational representations for classification, ensembles, and for mining temporal datasets."
in_NB
to_read
relational_learning
network_data_analysis
transaction_networks
neville.jennifer
machine_learning
ensemble_methods
time_series
classifiers
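Steps (3) and (4) of the framework can be illustrated with the simplest weighted relational classifier: a neighbor vote in which each link's weight decays with its age. The exponential kernel, the decay rate `theta`, and the function name are assumptions for illustration, not the paper's fitted temporal influence:

```python
from collections import defaultdict

def decayed_neighbor_vote(node, edges, labels, now, theta=0.5):
    """Weighted-vote relational neighbor classifier with temporal decay.

    edges: iterable of (u, v, t) undirected links observed at time t.
    labels: known class labels for some nodes.
    A link seen at time t gets weight theta ** (now - t), so older links
    count for less (exponential decay is an assumed kernel here).
    Returns the class with the largest total weight, or None.
    """
    scores = defaultdict(float)
    for u, v, t in edges:
        if node not in (u, v):
            continue
        other = v if u == node else u
        if other in labels:
            scores[labels[other]] += theta ** (now - t)
    return max(scores, key=scores.get) if scores else None
```

With two old links to "spam" nodes and one fresh link to a "ham" node, decay (theta = 0.5) lets the recent link win, while theta = 1.0 recovers the static majority vote that ignores time.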
Boosting - The MIT Press
november 2011 by cshalizi
"Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate “rules of thumb.” A remarkably rich theory has evolved around boosting, with connections to a range of topics, including statistics, game theory, convex optimization, and information geometry. Boosting algorithms have also enjoyed practical success in such fields as biology, vision, and speech processing. At various times in its history, boosting has been perceived as mysterious, controversial, even paradoxical.
This book, written by the inventors of the method, brings together, organizes, simplifies, and substantially extends two decades of research on boosting, presenting both theory and applications in a way that is accessible to readers from diverse backgrounds while also providing an authoritative reference for advanced researchers. With its introductory treatment of all material and its inclusion of exercises in every chapter, the book is appropriate for course use as well.
The book begins with a general introduction to machine learning algorithms and their analysis; then explores the core theory of boosting, especially its ability to generalize; examines some of the myriad other theoretical viewpoints that help to explain and understand boosting; provides practical extensions of boosting for more complex learning problems; and finally presents a number of advanced theoretical topics. Numerous applications and practical illustrations are offered throughout."
in_NB
books:noted
coveted
machine_learning
ensemble_methods
re:democratic_cognition
collective_cognition
classifiers
regression
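The "combine many weak rules of thumb" idea, in its original form, is Freund and Schapire's discrete AdaBoost. A minimal sketch with one-dimensional threshold stumps as the weak learners; the stump search, the clamp on the error, and all names are illustrative choices, not the book's code:

```python
import math

def fit_stump(xs, ys, w):
    """Best stump h(x) = s if x > t else -s under weights w; returns (err, t, s)."""
    best = None
    cuts = sorted(set(xs))
    # A threshold below every point gives the two constant rules.
    thresholds = [cuts[0] - 1.0] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    for t in thresholds:
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (s if x > t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost on 1-D inputs with labels in {-1, +1}."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        err, t, s = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Up-weight the points this stump got wrong, down-weight the rest.
        w = [wi * math.exp(-alpha * y * (s if x > t else -s))
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1
```

On separable data such as xs = [1, 2, 3, 4], ys = [-1, -1, 1, 1], a single stump already has zero weighted error, so the clamp on `err` is what keeps alpha finite.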
[1010.5496] Theory of spike timing based neural classifiers
november 2010 by cshalizi
"We study the computational capacity of a model neuron, the Tempotron, which classifies sequences of spikes by linear-threshold operations. We use statistical mechanics and extreme value theory to derive the capacity of the system in random classification tasks. In contrast to its static analog, the Perceptron, the Tempotron's solutions space consists of a large number of small clusters of weight vectors. The capacity of the system per synapse is finite in the large size limit and weakly diverges with the stimulus duration relative to the membrane and synaptic time constants."
neural_networks
classifiers
to:NB
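A sketch of the Tempotron's decision rule, following Gütig and Sompolinsky's original model as I understand it (the abstract does not spell it out). Each input spike adds a weighted double-exponential postsynaptic potential, and the pattern is labeled positive if the summed potential ever reaches threshold. The time constants (15 ms and 3.75 ms), the zero resting potential, and the grid scan are assumptions of this sketch:

```python
import math

TAU, TAU_S = 15.0, 3.75  # membrane and synaptic time constants in ms (assumed)

# Peak time of the double-exponential kernel, used to normalize its max to 1.
T_PEAK = TAU * TAU_S * math.log(TAU / TAU_S) / (TAU - TAU_S)
V0 = 1.0 / (math.exp(-T_PEAK / TAU) - math.exp(-T_PEAK / TAU_S))

def kernel(dt):
    """Postsynaptic potential dt ms after an input spike (zero before it)."""
    if dt < 0:
        return 0.0
    return V0 * (math.exp(-dt / TAU) - math.exp(-dt / TAU_S))

def tempotron_fires(spike_trains, weights, threshold=1.0, t_max=100.0, dt=0.1):
    """Classify a spike pattern: True if the potential ever reaches threshold.

    spike_trains[i] lists the spike times (ms) of afferent i; V_rest is 0.
    The potential is scanned on a time grid, so the maximum is approximate.
    """
    steps = int(t_max / dt) + 1
    for k in range(steps):
        t = k * dt
        v = sum(w * sum(kernel(t - s) for s in train)
                for w, train in zip(weights, spike_trains))
        if v >= threshold:
            return True
    return False
```

Two coincident spikes with weight 0.6 each cross the threshold, while the same two spikes 50 ms apart do not; that sensitivity to timing is what the static Perceptron lacks.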
10-705 Intermediate Statistics, Fall 2009
april 2010 by cshalizi
Larry's version of the typical master's-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory
statistics
estimation
hypothesis_testing
prediction
minimax
bootstrap
model_selection
regression
classifiers
confidence_sets
wasserman.larry
kith_and_kin
A dissection of John Gottman's love lab. - By Laurie Abraham - Slate Magazine
march 2010 by cshalizi
This is confused, or at least confusingly written. Is the objection to not evaluating the classifier out of sample? Or that the success of even a very stupid rule should be high (because most couples don't get divorced within five years)? (That would be a valid point, but it's not "base-rate neglect".) Or what?
marriage
classifiers
to_teach:data-mining
data_analysis
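The second reading of the objection is easy to check numerically. Any claimed accuracy should be compared with the rule that always predicts the majority outcome, and, separately, it should be measured on couples not used to build the rule. A sketch of the baseline comparison with made-up counts, not Gottman's data:

```python
def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

If 17 of 20 hypothetical couples stay married, "nobody divorces" is already 85% accurate, so a classifier's accuracy means little until it is set against that number.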
[1003.0470] Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels
march 2010 by cshalizi
"Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and p(y). We prove that the technique is consistent for high-dimensional linear classifiers and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever." --- Now, this abstract makes absolutely no sense to me (I mean it, none whatsoever), but since Guy is the one saying these senseless things, I assume that they actually make sense somehow.
semi-supervised_learning
learning_theory
classifiers
re:naive-semi-supervised
to:NB
[1003.0783] Supervised Topic Models
march 2010 by cshalizi
What a coincidence, some of the kids in 490 have labeled documents...
latent_dirichlet_allocation
text_mining
classifiers
machine_learning
statistics
to_teach:data-mining
to_teach:undergrad-research
topic_models
blei.david
Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.
october 2009 by cshalizi
Free PDF! (Still, I find my bound physical copy much more convenient.)
books:recommended
machine_learning
data_mining
statistics
learning_theory
estimation
cross-validation
ensemble_methods
classifiers
regression
graphical_models
clustering
dimension_reduction
bootstrap
via:arthegall
have_read
Bootstrapping
september 2009 by cshalizi
Corrections to Blum and Mitchell 1998
machine_learning
classifiers
semi-supervised_learning
via:anoopsarkar
to:NB
to_read
re:naive-semi-supervised
[0909.0184] Robust nearest-neighbor methods for classifying high-dimensional data
september 2009 by cshalizi
Sounds impossible, from the abstract, so clearly I need to read it.
classifiers
machine_learning
nearest_neighbors
statistics
to_read
to:NB
to_teach:data-mining
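For reference, the baseline that robust variants modify is the ordinary k-nearest-neighbor rule. A minimal sketch of that plain rule (not the paper's robust method, whose details the note does not give):

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Plain k-nearest-neighbor majority vote with Euclidean distance."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

In high dimensions, Euclidean distances between points tend to concentrate, which is one reason this plain rule can degrade there.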
Geeking with Greg: Finding task boundaries in search logs
august 2009 by cshalizi
Nice write-up from a year ago on K's paper.
information_retrieval
classifiers
search_engines
klinkner.kristina
jones.rosie
kith_and_kin
[0906.3590] Forest Garrote
june 2009 by cshalizi
We have got to do something about the names of techniques in this area. I don't mind the whimsy, it's just that combinations like this don't work, metaphorically.
ensemble_methods
classifiers
statistics
machine_learning
sparsity
variable_selection
lasso
Consistency of Random Forests and Other Averaging Classifiers (Biau, Devroye and Lugosi)
may 2009 by cshalizi
"In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent."
--- The actual nuts and bolts here are too complex for teaching in 350, but I should look carefully at what version of random forests I teach them.
ensemble_methods
random_forests
classifiers
machine_learning
learning_theory
statistics
breiman.leo
biau.gerard
devroye.luc
lugosi.gabor
to_read
to_teach:data-mining
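An averaging classifier in the sense of the abstract takes a majority vote over many randomized base classifiers. A minimal sketch using 1-nearest-neighbor rules fit to random subsamples as the base classifiers. This is my choice for illustration, not Breiman's forests, and the subsample size, vote count, and names are assumptions:

```python
import random

def nn1(sample, x):
    """Label of the single nearest training point in `sample` (1-D inputs)."""
    return min(sample, key=lambda pair: abs(pair[0] - x))[1]

def averaged_nn(train, x, n_voters=101, subsample=4, seed=0):
    """Majority vote (labels 0/1) of 1-NN rules, each built on a random subsample."""
    rng = random.Random(seed)
    votes = sum(nn1(rng.sample(train, subsample), x) for _ in range(n_voters))
    return 1 if votes > n_voters / 2 else 0
```

How the subsample size should grow with the training-set size is exactly the kind of detail on which the consistency of such rules turns.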
Stationary Features and Cat Detection
may 2009 by cshalizi
Their "stationary features" sound a bit like "necessary statistics"... which is not a bad thing!
pattern_recognition
statistics
classifiers
boosting
cats
stationary_features
machine_learning
geman.donald
fleuret.francois
to:blog
have_read
Introduction to Machine Learning - Ethem Alpaydin [@Labyrinth]
august 2008 by cshalizi
Useful introductory survey/reference; mathematically undemanding. Nice that it has chapters on hidden Markov models and various model-combination methods.
books:recommended
machine_learning
classifiers
regression
clustering
markov_models
ensemble_methods
Online Learning of Complex Prediction Problems Using Simultaneous Projections
july 2008 by cshalizi
"framework for online classification where each online trial consists of multiple prediction tasks that are tied together"
machine_learning
classifiers
prediction
online_learning
to_read
Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes - Ryabko
february 2008 by cshalizi
Using universal coding to estimate stationary distributions and predict and classify "[d]iscrete-time stochastic processes [with values in] either a finite set ... or a real line interval"
universal_prediction
to:NB
information_theory
prediction
classifiers
nonparametrics
ryabko.b._ya.
to_read
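Universal codes for finite alphabets are typically built from adaptive estimators such as the Krichevsky-Trofimov (KT) rule, which assigns symbol a probability (n_a + 1/2)/(n + |A|/2) after n observations. Code lengths -log2 P can then drive classification: assign a new sequence to the class whose model compresses it best. A minimal sketch of the memoryless KT coder and that compression heuristic. Ryabko's estimators for stationary processes are more elaborate, and the function names are mine:

```python
import math
from collections import Counter

def kt_code_length(sequence, alphabet):
    """Code length in bits of `sequence` under the sequential KT estimator.

    Before each symbol, P(a) = (count[a] + 1/2) / (n + |alphabet| / 2),
    where n is the number of symbols seen so far.
    """
    counts = Counter()
    bits = 0.0
    for n, symbol in enumerate(sequence):
        p = (counts[symbol] + 0.5) / (n + len(alphabet) / 2)
        bits -= math.log2(p)
        counts[symbol] += 1
    return bits

def classify_by_compression(sequence, training, alphabet):
    """Assign `sequence` to the class whose training data codes it most cheaply.

    training maps class name -> example sequence; the score is the extra
    bits needed for `sequence` after coding that class's training data.
    """
    def extra_bits(train):
        return kt_code_length(train + sequence, alphabet) - kt_code_length(train, alphabet)
    return min(training, key=lambda c: extra_bits(training[c]))
```

For a binary alphabet, the string "11" gets probability (1/2)(3/4) = 3/8 under KT, so its code length is -log2(3/8), about 1.415 bits.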
Pyjamas in Bananas: Shake that booty
november 2007 by cshalizi
Excuse me while I bang my head into the wall. Ahh, that's better...
to:blog
evolutionary_psychology
bad_data_analysis
bad_science_journalism
statistics
classifiers
related tags
active_learning, bad_data_analysis, bad_science_journalism, biau.gerard, blei.david, blogged, books:noted, books:recommended, boosting, bootstrap, breiman.leo, CART, catoni.olivier, cats, causal_inference, classifiers, clinton.hillary, clustering, collective_cognition, confidence_sets, coveted, cox.amanda, cross-validation, dataset_shift, data_analysis, data_mining, decision_trees, devroye.luc, dimension_reduction, eagle.nathan, ensemble_methods, estimation, evisceration, evolutionary_psychology, expectation-maximization, experimental_psychology, feature_selection, fisher_information, fleuret.francois, fmri, geman.donald, graphical_models, have_read, hilbert_space, hypothesis_testing, induction, information_retrieval, information_theory, in_NB, jones.rosie, kernel_methods, kith_and_kin, klinkner.kristina, lasso, latent_dirichlet_allocation, lazer.david, learning_theory, liberman.mark, linguistics, logistic_regression, lugosi.gabor, machine_learning, markov_models, marriage, methodological_EPIC_FAIL, minimax, model_selection, nearest_neighbors, network_data_analysis, neural_networks, neuroscience, neville.jennifer, nonparametrics, obama.barack, online_learning, outliers, pac-bayesian, pattern_recognition, pentland.alex, perceptron, prediction, primates, probably_approximately_correct, propagation_of_error, random_forests, re:AoS_project, re:democratic_cognition, re:naive-semi-supervised, re:what_is_the_right_null_model_for_linear_regression, re:XV_for_networks, regression, relational_learning, ryabko.b._ya., search_engines, semi-supervised_learning, social_networks, sparsity, stationary_features, statistical_mechanics, statistics, support_vector_machines, tagging, text_mining, time_series, to:blog, to:NB, topic_models, to_read, to_teach:complexity-and-inference, to_teach:data-mining, to_teach:undergrad-ADA, to_teach:undergrad-research, transaction_networks, universal_prediction, us_politics, variable_selection, vc-dimension, via:anoopsarkar, via:arthegall, via:klk, via:shreejoy, vision, wasserman.larry, why_oh_why_cant_we_have_a_better_academic_publishing_system