Vaguery + classification 51
Topic modeling made just simple enough. | The Stone and the Shell
5 weeks ago by Vaguery
"Computer scientists make LDA seem complicated because they care about proving that their algorithms work. And the proof is indeed brain-squashingly hard. But the practice of topic modeling makes good sense on its own, without proof, and does not require you to spend even a second thinking about “Dirichlet distributions.” When the math is approached in a practical way, I think humanists will find it easy, intuitive, and empowering. This post focuses on LDA as shorthand for a broader family of “probabilistic” techniques. I’m going to ask how they work, what they’re for, and what their limits are."
text-processing
classification
algorithms
lovely
two-cultures-only-one-of-which-can-write
5 weeks ago by Vaguery
[1003.5956] Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
11 weeks ago by Vaguery
"…In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Different from simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show accuracy and effectiveness of our offline evaluation method."
classification
recommendations
algorithms
machine-learning
crowdsourcing
nudge-targets
statistics
11 weeks ago by Vaguery
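A rough sketch of the replay idea mentioned in the abstract above, not the authors' code. It assumes (the excerpt does not say so) that each logged action was chosen uniformly at random and that the policy under test is fixed rather than learning as it goes. The names `replay_evaluate` and `LoggedEvent` are hypothetical.

```python
from typing import Callable, Iterable, Sequence, Tuple

# One logged interaction: (context features, action that was shown, observed reward).
# This layout is an assumption made for illustration.
LoggedEvent = Tuple[Sequence[float], int, float]


def replay_evaluate(
    policy: Callable[[Sequence[float]], int],
    events: Iterable[LoggedEvent],
) -> Tuple[float, int]:
    """Estimate a policy's average reward by replaying logged events.

    Only events where the policy picks the same action as the log are kept;
    all other events are skipped because their reward is unknown.
    Returns (mean reward over kept events, number of kept events).
    """
    matched = 0
    total_reward = 0.0
    for context, logged_action, reward in events:
        if policy(context) == logged_action:
            matched += 1
            total_reward += reward
    if matched == 0:
        return float("nan"), 0
    return total_reward / matched, matched
```

Events on which the policy disagrees with the log are simply discarded, so the estimate rests on a fraction of the data; with K equally likely logged actions, roughly one event in K is expected to be kept.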
[1112.2316] Complexity-entropy causality plane: a useful approach for distinguishing songs
january 2012 by Vaguery
Nowadays we are often faced with huge databases resulting from the rapid growth of data storage technologies. This is particularly true when dealing with music databases. In this context, it is essential to have techniques and tools able to discriminate properties from these massive sets. In this work, we report on a statistical analysis of more than ten thousand songs aiming to obtain a complexity hierarchy. Our approach is based on the estimation of the permutation entropy combined with an intensive complexity measure, building up the complexity-entropy causality plane. The results obtained indicate that this representation space is very promising to discriminate songs as well as to allow a relative quantitative comparison among songs. Additionally, we believe that the here-reported method may be applied in practical situations since it is simple, robust and has a fast numerical implementation.
signal-processing
classification
data-analysis
clustering
representation
music
nudge-targets
january 2012 by Vaguery
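The abstract above names permutation entropy as one axis of the complexity-entropy plane. The following is a minimal sketch of the standard ordinal-pattern (Bandt-Pompe) permutation entropy, assuming a plain list of samples and an embedding delay of 1. The function name and the default `order=3` are illustrative choices, not taken from the paper, and the complexity axis is not computed here.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of the ordinal patterns found in `series`.

    Each window of `order` consecutive samples is reduced to the permutation
    that sorts it; the entropy is taken over the pattern frequencies.
    With normalize=True the result is divided by log(order!), so it lies
    between 0 (one pattern only) and 1 (all patterns equally frequent).
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(series) - order + 1
    if n_windows < 1:
        raise ValueError("series is shorter than the pattern order")

    counts = Counter()
    for i in range(n_windows):
        window = series[i:i + order]
        # Indices of the window sorted by value (ties keep their original order).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1

    entropy = sum(-(c / n_windows) * math.log(c / n_windows) for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy
```

A strictly increasing series contains a single ordinal pattern and therefore scores 0; noisier signals approach 1.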
[1108.4135] Complex-Valued Autoencoders
december 2011 by Vaguery
"Autoencoders are unsupervised machine learning circuits whose learning goal is to minimize a distortion measure between inputs and outputs. Linear autoencoders can be defined over any field and only real-valued linear autoencoder have been studied so far. Here we study complex-valued linear autoencoders where the components of the training vectors and adjustable matrices are defined over the complex field with the $L_2$ norm. We provide simpler and more general proofs that unify the real-valued and complex-valued cases, showing that in both cases the landscape of the error function is invariant under certain groups of transformations. The landscape has no local minima, a family of global minima associated with Principal Component Analysis, and many families of saddle points associated with orthogonal projections onto sub-space spanned by sub-optimal subsets of eigenvectors of the covariance matrix. The theory yields several iterative, convergent, learning algorithms, a clear understanding of the generalization properties of the trained autoencoders, and can equally be applied to the hetero-associative case when external targets are provided. Partial results on deep architecture as well as the differential geometry of autoencoders are also presented. The general framework described here is useful to classify autoencoders and identify general common properties that ought to be investigated for each class, illuminating some of the connections between information theory, unsupervised learning, clustering, Hebbian learning, and auto encoders."
neural-networks
machine-learning
classification
encoding
algorithms
nudge-targets
december 2011 by Vaguery
Classifying Heart Sounds Challenge
november 2011 by Vaguery
"According to the World Health Organisation, cardiovascular diseases (CVDs) are the number one cause of death globally: more people die annually from CVDs than from any other cause. An estimated 17.1 million people died from CVDs in 2004, representing 29% of all global deaths. Of these deaths, an estimated 7.2 million were due to coronary heart disease. Any method which can help to detect signs of heart disease could therefore have a significant impact on world health. This challenge is to produce methods to do exactly that. Specifically, we are interested in creating the first level of screening of cardiac pathologies both in a Hospital environment by a doctor (using a digital stethoscope) and at home by the patient (using a mobile device).
The problem is of particular interest to machine learning researchers as it involves classification of audio sample data, where distinguishing between classes of interest is non-trivial. Data is gathered in real-world situations and frequently contains background noise of every conceivable type. The differences between heart sounds corresponding to different heart symptoms can also be extremely subtle and challenging to separate. Success in classifying this form of data requires extremely robust classifiers. Despite its medical significance, to date this is a relatively unexplored application for machine learning."
machine-learning
competition
nudge-targets
classification
segmentation
data-analysis
supervised-learning
november 2011 by Vaguery
[1110.1462] Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances
october 2011 by Vaguery
"…To cluster sets of histogram data, we propose to use Dynamic Clustering Algorithm, (based on adaptive squared Wasserstein distances) that is a k-means-like algorithm for clustering a set of individuals into K classes that are apriori fixed. The main aim of this research is to provide a tool for clustering histograms, emphasizing the different contributions of the histogram variables, and their components, to the definition of the clusters. We demonstrate that this can be achieved using adaptive distances.
Two kind of adaptive distances are considered: the first takes into account the variability of each component of each descriptor for the whole set of individuals; the second takes into account the variability of each component of each descriptor in each cluster. We furnish interpretative tools of the obtained partition based on an extension of the classical measures (indexes) to the use of adaptive distances in the clustering criterion function. Applications on synthetic and real-world data corroborate the proposed procedure."
classification
statistics
histograms
metrics
clustering
october 2011 by Vaguery
[1105.1033] Adaptively Learning the Crowd Kernel
october 2011 by Vaguery
"We introduce an algorithm that, given n objects, learns a similarity matrix over all n^2 pairs, from crowdsourced data alone. The algorithm samples responses to adaptively chosen triplet-based relative-similarity queries. Each query has the form "is object 'a' more similar to 'b' or to 'c'?" and is chosen to be maximally informative given the preceding responses. The output is an embedding of the objects into Euclidean space (like MDS); we refer to this as the "crowd kernel." SVMs reveal that the crowd kernel captures prominent and subtle features across a number of domains, such as "is striped" among neckties and "vowel vs. consonant" among letters."
classification
ontology-discovery
crowdsourcing
feature-extraction
algorithms
nudge-targets
performance-space-analysis
october 2011 by Vaguery
[1101.4744] Clustering functional data using wavelets
october 2011 by Vaguery
"We present two methods for detecting patterns and clusters in high dimensional time-dependent functional data. Our methods are based on wavelet-based similarity measures, since wavelets are well suited for identifying highly discriminant local time and scale features. The multiresolution aspect of the wavelet transform provides a time-scale decomposition of the signals allowing to visualize and to cluster the functional data into homogeneous groups. For each input function, through its empirical orthogonal wavelet transform the first method uses the distribution of energy across scales generate a handy number of features that can be sufficient to still make the signals well distinguishable. Our new similarity measure combined with an efficient feature selection technique in the wavelet domain is then used within more or less classical clustering algorithms to effectively differentiate among high dimensional populations. The second method uses dissimilarity measures between the whole time-scale representations and are based on wavelet-coherence tools. The clustering is then performed using a k-centroid algorithm starting from these dissimilarities. Practical performance of these methods that jointly designs both the feature selection in the wavelet domain and the classification distance is demonstrated through simulations as well as daily profiles of the French electricity power demand."
classification
time-series
feature-extraction
machine-learning
multiobjective-optimization
ontology-discovery
wavelets
nudge-targets
october 2011 by Vaguery
[1106.4064] Algorithmic Programming Language Identification
october 2011 by Vaguery
"Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a Bayesian classifier."
algorithms
programming
classification
languages
archives
cute
nudge-targets
october 2011 by Vaguery
[1107.0674] "Memory foam" approach to unsupervised learning
august 2011 by Vaguery
"We propose an alternative approach to construct an artificial learning system, which naturally learns in an unsupervised manner. Its mathematical prototype is a dynamical system, which automatically shapes its vector field in response to the input signal. The vector field converges to a gradient of a multi-dimensional probability density distribution of the input process, taken with negative sign. The most probable patterns are represented by the stable fixed points, whose basins of attraction are formed automatically. The performance of this system is illustrated with musical signals."
machine-learning
classification
learning-from-data
algorithms
nudge-targets
august 2011 by Vaguery
[1107.0550] 3D Terrestrial LiDAR data classification of complex natural scenes using a multi-scale dimensionality criterion: applications in geomorphology
august 2011 by Vaguery
"3D point clouds of natural environments relevant to geomorphology problems (rivers, cliffs...) often require to classify the data into elementary relevant classes. A typical example is the separation of riparian vegetation from soil in fluvial environments, the distinction between fresh surfaces and rockfall in cliff environments, or more generally the classification of surfaces according to their morphology (ripples, grain size...). Natural surfaces are very heterogeneous and their distinctive properties are seldom defined at a unique scale. We have thus defined a multi-scale measure of the point cloud dimensionality around each point. The dimensionality characterizes the local 3D organization of the point cloud and varies from being 1D (points set along a line) to really taking all 3D volume, at each scale. We present the technique and illustrate its efficiency in separating riparian vegetation from ground and classifying a mountain stream in vegetation, rock, gravel and water surface. The superiority of the multi-scale analysis in enhancing class separability and spatial resolution of the classification is also demonstrated. Large scenes can be classified on a commodity laptop in a reasonable time. The technique is robust to missing data and especially shadow zones. The classification is fast and accurate and can account for some degree of intra-class morphological variability such as different vegetation types. A probabilistic confidence in the classification result is given at each point allowing the user to remove the points for which the classification is uncertain. The process can be both fully automated but also fully customized by the user including a graphical definition of the classifiers if so desired. Although developed for fully 3D data, the method can be readily applied to 2.5D airborne LiDAR data."
image-analysis
image-segmentation
learning-from-data
classification
nudge-targets
august 2011 by Vaguery
[0807.1271] Semiparametric curve alignment and shift density estimation for biological data
august 2010 by Vaguery
"Assume that we observe a large number of curves, all of them with identical, although unknown, shape, but with a different random shift. The objective is to estimate the individual time shifts and their distribution. Such an objective appears in several biological applications like neuroscience or ECG signal processing, in which the estimation of the distribution of the elapsed time between repetitive pulses with a possibly low signal-noise ratio, and without a knowledge of the pulse shape is of interest. We suggest an M-estimator leading to a three-stage algorithm: we split our data set in blocks, on which the estimation of the shifts is done by minimizing a cost criterion based on a functional of the periodogram; the estimated shifts are then plugged into a standard density estimator. We show that under mild regularity assumptions the density estimate converges weakly to the true shift distribution. The theory is applied both to simulations and to alignment of real ECG signals.…"
data-analysis
statistics
algorithms
heuristics
exploratory-data-analysis
nudge
optimization
classification
time-series
august 2010 by Vaguery
[1007.0628] Image Pixel Fusion for Human Face Recognition
august 2010 by Vaguery
"In this paper we present a technique for fusion of optical and thermal face images based on image pixel fusion approach. Out of several factors, which affect face recognition performance in case of visual images, illumination changes are a significant factor that needs to be addressed. Thermal images are better in handling illumination conditions but not very consistent in capturing texture details of the faces. Other factors like sunglasses, beard, moustache etc also play active role in adding complicacies to the recognition process. Fusion of thermal and visual images is a solution to overcome the drawbacks present in the individual thermal and visual face images.…"
face-recognition
image-processing
machine-learning
classification
nudge-targets
algorithms
august 2010 by Vaguery
[1003.2941] Universal Regularizers For Robust Sparse Coding and Modeling
august 2010 by Vaguery
"Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical in the success of such models. Based on a codelength minimization interpretation of sparse coding, and using tools from universal coding theory, we propose a framework for designing sparsity regularization terms which have theoretical and practical advantages when compared to the more standard l0 or l1 ones. The presentation of the framework and theoretical foundations is complemented with examples that show its practical advantages in image denoising, zooming and classification."
nudge-targets
classification
image-analysis
image-processing
compression
sparse-coding
august 2010 by Vaguery
[1007.0621] Fusion of Daubechies Wavelet Coefficients for Human Face Recognition
august 2010 by Vaguery
"In this paper fusion of visual and thermal images in wavelet transformed domain has been presented. Here, Daubechies wavelet transform, called as D2, coefficients from visual and corresponding coefficients computed in the same manner from thermal images are combined to get fused coefficients. After decomposition up to fifth level (Level 5) fusion of coefficients is done. Inverse Daubechies wavelet transform of those coefficients gives us fused face images. The main advantage of using wavelet transform is that it is well-suited to manage different image resolution and allows the image decomposition in different kinds of coefficients, while preserving the image information.…"
image-processing
image-segmentation
nudge-targets
algorithms
optimization
classification
august 2010 by Vaguery
[1005.5141] Constructing Positive Definite Elastic Kernels with Application to Time Series Classification
august 2010 by Vaguery
"This paper proposes some extensions to the work on kernels dedicated to string alignment (biological sequence alignment) based on the summing up of scores obtained by local alignments with gaps. The extensions we propose allow to construct, from classical time-warp distances, what we called summative time-warp kernels that are positive definite if some simple sufficient conditions are satisfied. Furthermore, from the same formalism, we derive a time-warp inner product that extends the usual euclidean inner product, providing the capability to handle discrete sequences or time series of variable lengths in an Hilbert space. The classification experiment we conducted, using either first near neighbor classifier or Support Vector Machine classifier leads to conclude that the positive definite elastic kernels we propose outperform the distance substituting kernels for the classical elastic distances we tested.…"
time-series
data-analysis
nudge-targets
classification
machine-learning
algorithms
august 2010 by Vaguery
[1007.3254] Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks
august 2010 by Vaguery
"We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with $N$ independent words as vertices or nodes, and edges or links allotted to words occurring within $m$ places of a given vertex; we call $m$ the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance $m$. The literature samples were found to be most effectively represented by their corresponding power laws over degree distribution $P(k)$ and clustering coefficient $C(k)$; we also studied the mean geodesic distance, and found all our texts were small-world networks.…"
nudge-targets
computational-linguistics
linguistics
classification
machine-learning
statistics
natural-language-processing
august 2010 by Vaguery
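The network construction described in the abstract above (words as vertices, edges between words occurring within $m$ places of each other) can be sketched roughly as follows. The tokenization, the handling of repeated words, and the function names are assumptions made for illustration; the paper's own preprocessing is not given in the excerpt.

```python
from collections import Counter, defaultdict


def cooccurrence_network(words, m=2):
    """Undirected graph linking each word to the words within m places of it.

    `words` is an already tokenized list; returns {word: set of neighbours}.
    """
    graph = defaultdict(set)
    for i, word in enumerate(words):
        graph[word]  # make sure isolated words still appear as vertices
        for j in range(i + 1, min(i + m + 1, len(words))):
            other = words[j]
            if other != word:  # no self-loops
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def degree_distribution(graph):
    """Empirical P(k): the fraction of vertices that have degree k."""
    n = len(graph)
    counts = Counter(len(neighbours) for neighbours in graph.values())
    return {k: c / n for k, c in sorted(counts.items())}


tokens = "the cat sat on the mat".split()
network = cooccurrence_network(tokens, m=1)
distribution = degree_distribution(network)
```

With the word distance set to 1 only adjacent words are linked; raising `m` makes the graph denser, which is presumably why the authors treat it as a parameter to optimize.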
[1006.5731] A Taxonomy of Networks
july 2010 by Vaguery
"The study of networks has grown into a substantial interdisciplinary endeavor across the natural, social, and information sciences. Yet there have been very few attempts to investigate the interrelatedness of the different classes of networks studied by different disciplines. Here, we introduced a framework to establish a taxonomy of networks from various origins. The provision of this family tree not only helps understand the kinship of networks, but also facilitates the transfer of empirical analysis, theoretical modeling, and conceptual developments across disciplinary boundaries. The framework is based on probing the mesoscopic properties of networks, an important source of heterogeneity for their structure and function. Using our method, we computed a taxonomy for 752 individual networks and a separate taxonomy for 12 network classes. We also computed three within-class taxonomies for political, fungal, and financial networks, and found them to be insightful in each case."
nudge-targets
classification
models
network-theory
statistics
complexology
ontology
taxonomy
july 2010 by Vaguery
[1005.4803] Hirsch index as a network centrality measure
july 2010 by Vaguery
"…The h index is compared with the Degree centrality (a local measure), the Betweenness and Eigenvector centralities (two non-local measures) in the case of a biological network (Yeast interaction protein-protein network) and a linguistic network (Moby Thesaurus II). In both networks, the Hirsch index has poor correlation with Betweenness centrality but correlates well with Eigenvector centrality, specially for the more important nodes that are relevant for ranking purposes, say in Search Engine Optimization. In the thesaurus network, the h index seems even to outperform the Eigenvector centrality measure as evaluated by simple linguistic criteria."
network-theory
linguistics
search-engines
algorithms
nudge-targets
classification
machine-learning
july 2010 by Vaguery
[1006.4175] Optimization of Weighted Curvature for Image Segmentation
june 2010 by Vaguery
"Minimization of boundary curvature is a classic regularization technique for image segmentation in the presence of noisy image data. Techniques for minimizing curvature have historically been derived from descent methods which could be trapped in a local minimum and therefore required a good initialization. Recently, combinatorial optimization techniques have been applied to the optimization of curvature which provide a solution that achieves nearly a global optimum. However, when applied to image segmentation these methods required a meaningful data term. Unfortunately, for many images, particularly medical images, it is difficult to find a meaningful data term. Therefore, we propose to remove the data term completely and instead weight the curvature locally, while still achieving a global optimum."
image-segmentation
image-analysis
classification
machine-learning
algorithms
nudge-targets
medical-technology
june 2010 by Vaguery
[0902.0600] Decisional States
june 2010 by Vaguery
"…The intrinsic underlying structure of the system is modeled by an epsilon-machine and its causal states. The decisional states are the emerging patterns corresponding to the utility function. In a complex systems perspective, these patterns thus form a partition of the lower-level system states that is defined according to the higher-level user's knowledge. The transitions between these decisional states correspond to events that lead to a change of decision. An algorithm is provided so as to estimate the states and their transitions from data. Application examples are given for hidden model reconstruction, cellular automata filtering, and edge detection in images."
computational-mechanics
information-theory
prediction
statistics
probability-theory
machine-learning
classification
june 2010 by Vaguery
[1006.1015] Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees
june 2010 by Vaguery
"Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960's. In bioinformatics, psychometrics and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and `generalizability' of these summaries. This paper provides an implementation of the geometric distance between trees developed by Billera, Holmes and Vogtmann (2001) [BHV] equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in statistical inference for which this distance can be useful.…Our method gives a new way of evaluating the influence both of certain columns (positions, variables or genes) and of certain rows (whether species, observations or arrays)."
clustering
algorithms
statistics
models
classification
learning-from-data
june 2010 by Vaguery
[1005.5636] Astrocladistics: Multivariate Evolutionary Analysis in Astrophysics
june 2010 by Vaguery
"It is now clear that cladistics can be applied and be useful to the study of galaxy diversification. Many difficulties, conceptual and practical, have been solved. Significant astrophysical results have been obtained and will be extended to larger samples of galaxies and globular clusters. However, many paths remain in the exploration of this new and large field of research."
astronomy
classification
cladistics
inference
nudge-targets
learning-from-data
model-discovery
june 2010 by Vaguery
[1006.3541] Complexity dichotomy on partial grid recognition
june 2010 by Vaguery
"Deciding whether a graph can be embedded in a grid using only unit-length edges is NP-complete, even when restricted to binary trees. However, it is not difficult to devise a number of graph classes for which the problem is polynomial, even trivial. A natural step, outstanding thus far, was to provide a broad classification of graphs that make for polynomial or NP-complete instances. We provide such a classification based on the set of allowed vertex degrees in the input graphs, yielding a full dichotomy on the complexity of the problem. As byproducts, the previous NP-completeness result for binary trees was strengthened to strictly binary trees, and the three-dimensional version of the problem was for the first time proven to be NP-complete. Our results were made possible by introducing the concepts of consistent orientations and robust gadgets, and by showing how the former allows NP-completeness proofs by local replacement even in the absence of the latter."
algorithms
graph-theory
classification
machine-learning
nudge-targets
geometry
recognition-problems
june 2010 by Vaguery
[1006.0051] Image information content characterization and classification by physical complexity
june 2010 by Vaguery
"We present a method for estimating the complexity of an image based on the concept of logical depth. Unlike the application of the concept of algorithmic complexity by itself, the addition of the concept of logical depth results in a characterization of objects by organizational (physical) complexity. We use this measure to classify images by their information content. The method provides a means for evaluating and classifying objects by way of their visual representations."
image-processing
algorithms
information-theory
nudge-targets
classification
june 2010 by Vaguery
[1004.3925] Classification using distance nearest neighbours
june 2010 by Vaguery
"This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away.…"
classification
machine-learning
markov-random-field
algorithms
learning-from-data
june 2010 by Vaguery
[1005.5086] Classification of interstitial lung disease patterns with topological texture features
may 2010 by Vaguery
"… The results indicate that advanced topological texture features can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods."
image-processing
medical-technology
diagnosis
nudge-targets
classification
machine-learning
may 2010 by Vaguery
[1005.4376] Characterizing the community structure of complex networks
may 2010 by Vaguery
"Community structure is one of the key properties of complex networks and plays a crucial role in their topology and function. While an impressive amount of work has been done on the issue of community detection, very little attention has been so far devoted to the investigation of communities in real networks. We present a systematic empirical analysis of the statistical properties of communities in large information, communication, technological, biological, and social networks. We find that the mesoscopic organization of networks of the same category is remarkably similar. This is reflected in several characteristics of community structure, which can be used as ``fingerprints'' of specific network categories.…"
social-networks
network-theory
classification
empirical-economics
physics
sociology
complexology
may 2010 by Vaguery
[1005.0957] ECG Feature Extraction Techniques - A Survey Approach
may 2010 by Vaguery
"ECG Feature Extraction plays a significant role in diagnosing most of the cardiac diseases. One cardiac cycle in an ECG signal consists of the P-QRS-T waves. This feature extraction scheme determines the amplitudes and intervals in the ECG signal for subsequent analysis. The amplitudes and intervals value of P-QRS-T segment determines the functioning of heart of every human. Recently, numerous research and techniques have been developed for analyzing the ECG signal. The proposed schemes were mostly based on Fuzzy Logic Methods, Artificial Neural Networks (ANN), Genetic Algorithm (GA), Support Vector Machines (SVM), and other Signal Analysis techniques. All these techniques and algorithms have their advantages and limitations.…"
nudge-targets
machine-learning
classification
learning-from-data
diagnostics
medicine
may 2010 by Vaguery
mperham's bayes_motel at master - GitHub
april 2010 by Vaguery
"BayesMotel is a multi-variate Bayesian classification engine. There are two steps to Bayesian classification:
1. Training: You provide a set of variables along with the proper classification for that set.
2. Runtime: You provide a set of variables and ask for the proper classification according to the training in Step 1.
Commonly this is used for spam detection. You will provide a corpus of emails or other data along with a "Spam/NotSpam" classification. The library will determine which variables affect the classification and use that to judge future data."
Ruby
rubygem
Bayesian
classification
statistics
learning-from-data
machine-learning
algorithms
april 2010 by Vaguery
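The two steps described in the BayesMotel excerpt (train on labelled variable sets, then classify new ones) can be sketched in a few lines. This is a minimal illustration in Python, not the BayesMotel API: the class `TinyNaiveBayes`, its method names, and the example variables `has_link` and `all_caps` are all hypothetical, and it assumes categorical variables, the naive independence assumption, and add-one smoothing.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical categorical naive Bayes classifier (not the BayesMotel API)."""

    def __init__(self):
        self.class_counts = Counter()             # label -> number of training examples
        self.value_counts = defaultdict(Counter)  # (label, variable) -> counts per value
        self.vocab = defaultdict(set)             # variable -> distinct values seen

    def train(self, variables, label):
        """Step 1: record one set of variables with its known classification."""
        self.class_counts[label] += 1
        for name, value in variables.items():
            self.value_counts[(label, name)][value] += 1
            self.vocab[name].add(value)

    def classify(self, variables):
        """Step 2: return the label with the highest log-probability score."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for name, value in variables.items():
                counts = self.value_counts[(label, name)]
                seen = counts[value]
                # Add-one smoothing, with one extra slot for values never seen.
                slots = len(self.vocab[name]) + 1
                score += math.log((seen + 1) / (sum(counts.values()) + slots))
            scores[label] = score
        return max(scores, key=scores.get)


clf = TinyNaiveBayes()
clf.train({"has_link": True, "all_caps": True}, "Spam")
clf.train({"has_link": True, "all_caps": False}, "Spam")
clf.train({"has_link": False, "all_caps": False}, "NotSpam")
clf.train({"has_link": False, "all_caps": False}, "NotSpam")

print(clf.classify({"has_link": True, "all_caps": True}))
print(clf.classify({"has_link": False, "all_caps": False}))
```

Scores are summed in log space so that multiplying many small probabilities does not underflow; the smoothing keeps a value never seen with a given label from zeroing out that label entirely.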
[1001.5210] Supernova Photometric Classification Challenge
march 2010 by Vaguery
"The goals of this challenge are to (1) learn the relative strengths and weaknesses of the different classification algorithms, (2) use the results to improve classification algorithms, and (3) understand what spectroscopically confirmed sub-sets are needed to properly train these algorithms. The challenge is available at www.hep.anl.gov/SNchallenge, and the due date for classifications is May 1, 2010."
classification
learning-from-data
modeling
challenges
astronomy
statistics
nudge
nudge-targets
march 2010 by Vaguery
Head & Neck Oncology | Full text | Potential for Raman spectroscopy to provide cancer screening using a peripheral blood sample
december 2009 by Vaguery
"The mean spectra were provided as input sequences to the Implicit Context Representation Cartesian Genetic Programming algorithm (IRCGP)[14,15]. IRCGP uses evolutionary computing methodology to learn classifiers that are capable of distinguishing between data classes. Induced classifiers take the form of programmatic expressions applied to particular offsets within the input data sequences. These expressions are composed from a set of simple mathematical functions. Both the choice and connectivity of the functions, and the choice of offsets used within the input sequences, are determined by the algorithm's evolutionary process. The input sequences were divided equally into training and test sets. To prevent over-learning, training of the classifiers was stopped once classification accuracy of the test sequences started to fall."
genetic-programming
clinical
diagnosis
nudge
spectroscopy
applied-mathematics
machine-learning
classification
december 2009 by Vaguery
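The stopping rule in the Raman spectroscopy excerpt (halt training once held-out accuracy starts to fall) can be sketched as below. This is a generic early-stopping loop under stated assumptions, not the IRCGP implementation: `train_with_early_stopping`, `step`, and `evaluate` are hypothetical names, and the accuracy values are simulated rather than taken from the paper. Note that when the test set itself drives the stopping decision, as the excerpt describes, the final test accuracy is no longer an independent estimate; a separate validation set is the more common choice.

```python
def train_with_early_stopping(step, evaluate, max_rounds=100):
    """Run step() repeatedly; stop the first time held-out accuracy falls.

    step      -- hypothetical callable that performs one round of training
    evaluate  -- hypothetical callable returning accuracy on held-out data
    Returns (best_round, best_accuracy).
    """
    best_accuracy = float("-inf")
    best_round = 0
    for round_number in range(1, max_rounds + 1):
        step()
        accuracy = evaluate()
        if accuracy < best_accuracy:
            break  # accuracy started to fall: stop training
        if accuracy > best_accuracy:
            best_accuracy, best_round = accuracy, round_number
    return best_round, best_accuracy


# Simulated held-out accuracy per round (illustrative numbers only).
simulated = iter([0.60, 0.70, 0.78, 0.81, 0.80, 0.75])

best_round, best_accuracy = train_with_early_stopping(
    step=lambda: None,                # stand-in for one generation of evolution
    evaluate=lambda: next(simulated),
)
print(best_round, best_accuracy)
```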
Linear Classifiers and Loss Functions « Justin Domke’s Weblog
february 2009 by Vaguery
"So, in summary: a drop in classification error on test data from .0941 to .078. That's a 17% drop. (Or a 21% drop, depending upon which rate you use as a base.) This from a method that you can implement in basically zero extra work if you already have a linear classifier. Seems worth a try."
classification
machine-learning
statistics
methodologies
heuristics
learning-from-data
february 2009 by Vaguery
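The two percentages in the Domke excerpt come from dividing the same absolute drop by different bases. The short check below assumes the error rates are 0.0941 before and 0.078 after; the starting value is an assumption chosen because it is the reading consistent with both quoted percentages, and `relative_drop` is a hypothetical helper.

```python
def relative_drop(before, after, base):
    """Absolute drop in error rate, expressed as a fraction of `base`."""
    return (before - after) / base


before, after = 0.0941, 0.078  # assumed error rates (see note above)

drop_vs_before = relative_drop(before, after, base=before)  # old rate as base
drop_vs_after = relative_drop(before, after, base=after)    # new rate as base

print(f"{drop_vs_before:.0%} relative to the old rate")
print(f"{drop_vs_after:.0%} relative to the new rate")
```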
Science Musings by Chet Raymo
july 2007 by Vaguery
"When the mind fixates on absolute discontinuities, mischief is often in the offing..."
heuristics
biology
learning
classification
advice
Richard-Dawkins
gray-area
july 2007 by Vaguery