cshalizi + classifiers   49

Clarke, Clarke: Prediction in several conventional contexts
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."

(to_teach tags are tentative.)
to:NB  prediction  statistics  classifiers  regression  to_teach:undergrad-ADA  to_teach:data-mining 
20 days ago by cshalizi
Game-powered machine learning
"Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data."

--- This is more than a bit of a stunt, but it points in an interesting direction.
to:NB  to_read  data_mining  collective_cognition  active_learning  tagging  classifiers  re:democratic_cognition 
4 weeks ago by cshalizi
[1203.4354] Asymptotic Confidence Sets for General Nonparametric Regression and Classification by Regularized Kernel Methods
"Regularized kernel methods such as, e.g., support vector machines and least-squares support vector regression constitute an important class of standard learning algorithms in machine learning. Theoretical investigations concerning asymptotic properties have manly focused on rates of convergence during the last years but there are only very few and limited (asymptotic) results on statistical inference so far. As this is a serious limitation for their use in mathematical statistics, the goal of the article is to fill this gap. Based on asymptotic normality of many of these methods, the article derives a strongly consistent estimator for the unknown covariance matrix of the limiting normal distribution. In this way, we obtain asymptotically correct confidence sets for $psi(f_{P,lambda_0})$ where $f_{P,lambda_0}$ denotes the minimizer of the regularized risk in the reproducing kernel Hilbert space $H$ and $psi:Hrightarrowmathds{R}^m$ is any Hadamard-differentiable functional. Applications include (multivariate) pointwise confidence sets for values of $f_{P,lambda_0}$ and confidence sets for gradients, integrals, and norms."
to:NB  confidence_sets  kernel_methods  statistics  nonparametrics  regression  classifiers 
7 weeks ago by cshalizi
Time-series clustering via quasi U-statistics - Valk - 2012 - Journal of Time Series Analysis - Wiley Online Library
"The problem of time-series discrimination and classification is discussed. We propose a novel clustering algorithm based on a class of quasi U-statistics and subgroup decomposition tests. The decomposition may be applied to any concave time-series distance. The resulting test statistics are proven to be asymptotically normal for either i.i.d. or non-identically distributed groups of time-series under mild conditions. We illustrate its empirical performance on a simulation study and a real data analysis. The simulation setup includes stationary vs. stationary and stationary vs. non-stationary cases. The performance of the proposed method is favourably compared with some of the most common clustering measures available."
to:NB  clustering  time_series  statistics  classifiers 
8 weeks ago by cshalizi
[1203.0193] Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts
Huh: "The Vapnik-Chervonenkis dimension (VC dimension) of the set of half-spaces of R^d with frontiers parallel to the axes is computed exactly. It is shown that this VC dimension is smaller than the intuitive value of d. An additional approximation using the Stirling's formula is given. This result may be used to evaluate the performance of classifiers or regressors based on dyadic partitioning of R^d for instance. Algorithms using axis-parallel cuts to partition R^d are often used to reduce the computational time of such estimators when d is large."
to:NB  learning_theory  vc-dimension  classifiers 
12 weeks ago by cshalizi
[1202.1523] Information Forests
"We describe Information Forests, an approach to classification that generalizes Random Forests by replacing the splitting criterion of non-leaf nodes from a discriminative one -- based on the entropy of the label distribution -- to a generative one -- based on maximizing the information divergence between the class-conditional distributions in the resulting partitions. The basic idea consists of deferring classification until a measure of "classification confidence" is sufficiently high, and instead breaking down the data so as to maximize this measure. In an alternative interpretation, Information Forests attempt to partition the data into subsets that are "as informative as possible" for the purpose of the task, which is to classify the data. Classification confidence, or informative content of the subsets, is quantified by the Information Divergence. Our approach relates to active learning, semi-supervised learning, mixed generative/discriminative learning."

After reading: meh.
have_read  decision_trees  information_theory  classifiers  machine_learning  to_teach:data-mining  re:AoS_project 
february 2012 by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB  information_theory  statistics  variable_selection  model_selection  to_teach:data-mining  to:blog  machine_learning  classifiers  have_read  graphical_models 
february 2012 by cshalizi
Linear Classifiers Are Nearly Optimal When Hidden Variables Have Diverse Effects
"We analyze classification problems in which data is generated by a two-tiered random process. The class is generated first, then a layer of conditionally independent hidden variables, and finally the observed variables. For sources like this, the Bayes-optimal rule for predicting the class given the values of the observed variables is a two-layer neural network. We show that, if the hidden variables have non-negligible effects on many observed variables, a linear classifier approximates the error rate of the Bayes optimal classifier up to lower order terms. We also show that the hinge loss of a linear classifier is not much more than the Bayes error rate, which implies that an accurate linear classifier can be found efficiently."
to:NB  machine_learning  classifiers  re:what_is_the_right_null_model_for_linear_regression 
january 2012 by cshalizi
[1111.5312] Representations and Ensemble Methods for Dynamic Relational Classification
"Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to increase the accuracy of statistical relational learning algorithms. The temporal relational representations serve as a basis for classification, ensembles, and pattern mining in evolving domains. The framework includes (1) selecting the time-varying relational components (links, attributes, nodes), (2) selecting the temporal granularity, (3) predicting the temporal influence of each time-varying relational component, and (4) choosing the weighted relational classifier. Additionally, we propose temporal ensemble methods that exploit the temporal-dimension of relational data. These ensembles outperform traditional and more sophisticated relational ensembles while avoiding the issue of learning the most optimal representation. Finally, the space of temporal-relational models are evaluated using a sample of classifiers. In all cases, the proposed temporal-relational classifiers outperform competing models that ignore the temporal information. The results demonstrate the capability and necessity of the temporal-relational representations for classification, ensembles, and for mining temporal datasets."
in_NB  to_read  relational_learning  network_data_analysis  transaction_networks  neville.jennifer  machine_learning  ensemble_methods  time_series  classifiers 
november 2011 by cshalizi
Boosting - The MIT Press
"Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many weak and inaccurate “rules of thumb.” A remarkably rich theory has evolved around boosting, with connections to a range of topics, including statistics, game theory, convex optimization, and information geometry. Boosting algorithms have also enjoyed practical success in such fields as biology, vision, and speech processing. At various times in its history, boosting has been perceived as mysterious, controversial, even paradoxical.

This book, written by the inventors of the method, brings together, organizes, simplifies, and substantially extends two decades of research on boosting, presenting both theory and applications in a way that is accessible to readers from diverse backgrounds while also providing an authoritative reference for advanced researchers. With its introductory treatment of all material and its inclusion of exercises in every chapter, the book is appropriate for course use as well.

The book begins with a general introduction to machine learning algorithms and their analysis; then explores the core theory of boosting, especially its ability to generalize; examines some of the myriad other theoretical viewpoints that help to explain and understand boosting; provides practical extensions of boosting for more complex learning problems; and finally presents a number of advanced theoretical topics. Numerous applications and practical illustrations are offered throughout."
in_NB  books:noted  coveted  machine_learning  ensemble_methods  re:democratic_cognition  collective_cognition  classifiers  regression 
november 2011 by cshalizi
[1010.5496] Theory of spike timing based neural classifiers
"We study the computational capacity of a model neuron, the Tempotron, which classifies sequences of spikes by linear-threshold operations. We use statistical mechanics and extreme value theory to derive the capacity of the system in random classification tasks. In contrast to its static analog, the Perceptron, the Tempotron's solutions space consists of a large number of small clusters of weight vectors. The capacity of the system per synapse is finite in the large size limit and weakly diverges with the stimulus duration relative to the membrane and synaptic time constants."
neural_networks  classifiers  to:NB 
november 2010 by cshalizi
10-705 Intermediate Statistics, Fall 2009
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory  statistics  estimation  hypothesis_testing  prediction  minimax  bootstrap  model_selection  regression  classifiers  confidence_sets  wasserman.larry  kith_and_kin 
april 2010 by cshalizi
A dissection of John Gottman's love lab. - By Laurie Abraham - Slate Magazine
This is confused, or at least confusingly written. Is the objection to not evaluating the classifier out of sample? Or that the success of even a very stupid rule should be high (because most couples don't get divorced within five years)? (That would be a valid point, but it's not "base-rate neglect".) Or what?
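The second possible objection in the note above can be made concrete with a small sketch. It computes the accuracy of a "very stupid rule" that always predicts the most common outcome. The 15% five-year divorce rate and the sample of 100 couples are illustrative assumptions, not figures from the article, and `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a rule that always predicts the most common label."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Hypothetical sample: 100 couples, 15 of whom divorce within five years.
# The 15% rate is an assumption made only for illustration.
labels = ["divorced"] * 15 + ["married"] * 85

baseline = majority_baseline_accuracy(labels)
print(f"always-predict-majority accuracy: {baseline:.2f}")
```

Under these assumed numbers the trivial rule is right 85% of the time, so a reported accuracy only means something relative to that baseline, and ideally when measured out of sample.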
marriage  classifiers  to_teach:data-mining  data_analysis 
march 2010 by cshalizi
[1003.0470] Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels
"Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled dataset. We develop a novel technique for estimating such risks using only unlabeled data and p(y). We prove that the technique is consistent for high-dimensional linear classifiers and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever." --- Now, this abstract makes absolutely no sense to me (I mean it, none whatsoever), but since Guy is the one saying these senseless things, I assume that they actually make sense somehow.
semi-supervised_learning  learning_theory  classifiers  re:naive-semi-supervised  to:NB 
march 2010 by cshalizi
[0906.3590] Forest Garrote
We have got to do something about the names of techniques in this area. I don't mind the whimsy, it's just that combinations like this don't work, metaphorically.
ensemble_methods  classifiers  statistics  machine_learning  sparsity  variable_selection  lasso 
june 2009 by cshalizi
Consistency of Random Forests and Other Averaging Classifiers (Biau, Devroye and Lugosi)
"In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent."

--- The actual nuts and bolts here are too complex for teaching in 350, but I should look carefully at what version of random forests I teach them.
ensemble_methods  random_forests  classifiers  machine_learning  learning_theory  statistics  breiman.leo  biau.gerard  devroye.luc  lugosi.gabor  to_read  to_teach:data-mining 
may 2009 by cshalizi
Stationary Features and Cat Detection
Their "stationary features" sound a bit like "necessary statistics"... which is not a bad thing!
pattern_recognition  statistics  classifiers  boosting  cats  stationary_features  machine_learning  geman.donald  fleuret.francois  to:blog  have_read 
may 2009 by cshalizi
Introduction to Machine Learning - Ethem Alpaydin [@Labyrinth]
Useful introductory survey/reference; mathematically undemanding. Nice that it has chapters on hidden Markov models and various model-combination methods.
books:recommended  machine_learning  classifiers  regression  clustering  markov_models  ensemble_methods 
august 2008 by cshalizi
Online Learning of Complex Prediction Problems Using Simultaneous Projections
"framework for online classification where each online trial consists of multiple prediction tasks that are tied together"
machine_learning  classifiers  prediction  online_learning  to_read 
july 2008 by cshalizi
Application of data compression methods to nonparametric estimation of characteristics of discrete-time stochastic processes - Ryabko
Using universal coding to estimate stationary distributions and predict and classify "[d]iscrete-time stochastic processes [with values in] either a finite set ... or a real line interval"
universal_prediction  to:NB  information_theory  prediction  classifiers  nonparametrics  ryabko.b._ya.  to_read 
february 2008 by cshalizi

