Vaguery + machine-learning   121

[1003.5956] Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
"…In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Different from simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show accuracy and effectiveness of our offline evaluation method."
classification  recommendations  algorithms  machine-learning  crowdsourcing  nudge-targets  statistics 
11 weeks ago by Vaguery
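The abstract above does not spell out the replay procedure, so the following is only a minimal sketch of the general idea, assuming the logged actions were chosen uniformly at random: step through the log, keep an event only when the policy under evaluation picks the same action that was logged, and average the rewards of the kept events. The contexts, click rates, and function names (replay_evaluate, make_log, matching_policy, constant_policy) are hypothetical and exist only for illustration.

```python
import random


def replay_evaluate(policy, logged_events):
    """Estimate a policy's average reward from (context, action, reward) logs.

    Assumption: logged actions were drawn uniformly at random, so the
    events that survive the filter below are an unbiased sample.
    """
    total_reward = 0.0
    matched = 0
    for context, logged_action, reward in logged_events:
        # Keep the event only when the policy agrees with the logged action.
        if policy(context) == logged_action:
            total_reward += reward
            matched += 1
    return total_reward / matched if matched else float("nan")


def make_log(n, seed=0):
    """Hypothetical synthetic log: two contexts, two articles, click reward."""
    rng = random.Random(seed)
    click_rate = {("sports", 0): 0.6, ("sports", 1): 0.1,
                  ("news", 0): 0.2, ("news", 1): 0.5}
    events = []
    for _ in range(n):
        context = rng.choice(["sports", "news"])
        action = rng.randrange(2)  # uniform random logging policy
        reward = 1 if rng.random() < click_rate[(context, action)] else 0
        events.append((context, action, reward))
    return events


def matching_policy(context):
    """Shows article 0 for sports readers and article 1 for news readers."""
    return 0 if context == "sports" else 1


def constant_policy(context):
    """Always shows article 1, whatever the context."""
    return 1


log = make_log(20000)
good = replay_evaluate(matching_policy, log)
poor = replay_evaluate(constant_policy, log)
print(f"matching policy: {good:.3f}, constant policy: {poor:.3f}")
```

Under the made-up click rates the true values are 0.55 and 0.30, so the two estimates should land near those numbers. Roughly half of the log is discarded for each policy, which is the price of not needing a simulator.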
[1201.6583] Empowerment for Continuous Agent-Environment Systems
"This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also from considerations stemming from curiosity-driven learning. Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent sensors. It is an information-theoretic generalization of joint controllability (influence on environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability being usually defined in control theory as the dimensionality of the control/observation spaces.…"
agent-based  emergent-design  robotics  engineering-design  machine-learning  empowerment  nudge 
february 2012 by Vaguery
[1109.2618] Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ~10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.
machine-learning  learning-from-data  biochemistry  computational-science  nudge-targets 
january 2012 by Vaguery
[1101.2135] Bounded confidence model: addressed information maintain diversity of opinions
A community of agents is subject to a stream of messages, which are represented as points on a plane of issues. Messages are sent by media and by agents themselves. Messages from media shape the public opinion. They are unbiased, i.e. positive and negative opinions on a given issue appear with equal frequencies. In our previous work, the only criterion to receive a message by an agent is if the distance between this message and the ones received earlier does not exceed the given value of the tolerance parameter. Here we introduce a possibility to address a message to a given neighbour. We show that this option reduces the unanimity effect, which improves the collective performance.
agent-based  communication  network-theory  machine-learning  diversity 
january 2012 by Vaguery
[1111.1797] Analysis of Thompson Sampling for the multi-armed bandit problem
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to play according to its probability of being the best arm. Thompson Sampling algorithm has experimentally been shown to be close to optimal. In addition, it is efficient to implement and exhibits several desirable properties such as small regret for delayed feedback. However, theoretical understanding of this algorithm was quite limited. In this paper, for the first time, we show that Thompson Sampling algorithm achieves logarithmic expected regret for the multi-armed bandit problem. More precisely, for the two-armed bandit problem, the expected regret in time $T$ is $O(\frac{\ln T}{\Delta} + \frac{1}{\Delta^3})$. And, for the $N$-armed bandit problem, the expected regret in time $T$ is $O([(\sum_{i=2}^N \frac{1}{\Delta_i^2})^2] \ln T)$. Our bounds are optimal but for the dependence on $\Delta_i$ and the constant factors in big-Oh.
probability-theory  machine-learning  exploitation-exploration  nudge-targets  game-theory 
january 2012 by Vaguery
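The "choose an arm according to its probability of being the best arm" idea in the abstract above can be sketched for the common Bernoulli-reward case. This is an illustration under stated assumptions, not the paper's analysis: rewards are 0 or 1, each arm starts from a uniform Beta(1, 1) prior, and the arm success rates passed in are made up. The function name thompson_sampling is hypothetical.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return pulls per arm.

    true_rates: hypothetical success probability of each arm (unknown
    to the algorithm, used only to simulate rewards).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        # Playing the arm with the largest draw selects each arm with
        # its posterior probability of being the best arm.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_rates[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=3000)
print(pulls)
```

Pulls should concentrate on the last arm as the posteriors sharpen, with the weaker arms sampled only occasionally, which is the behavior the logarithmic regret bound describes.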
[1108.4135] Complex-Valued Autoencoders
"Autoencoders are unsupervised machine learning circuits whose learning goal is to minimize a distortion measure between inputs and outputs. Linear autoencoders can be defined over any field and only real-valued linear autoencoder have been studied so far. Here we study complex-valued linear autoencoders where the components of the training vectors and adjustable matrices are defined over the complex field with the $L_2$ norm. We provide simpler and more general proofs that unify the real-valued and complex-valued cases, showing that in both cases the landscape of the error function is invariant under certain groups of transformations. The landscape has no local minima, a family of global minima associated with Principal Component Analysis, and many families of saddle points associated with orthogonal projections onto sub-space spanned by sub-optimal subsets of eigenvectors of the covariance matrix. The theory yields several iterative, convergent, learning algorithms, a clear understanding of the generalization properties of the trained autoencoders, and can equally be applied to the hetero-associative case when external targets are provided. Partial results on deep architecture as well as the differential geometry of autoencoders are also presented. The general framework described here is useful to classify autoencoders and identify general common properties that ought to be investigated for each class, illuminating some of the connections between information theory, unsupervised learning, clustering, Hebbian learning, and auto encoders."
neural-networks  machine-learning  classification  encoding  algorithms  nudge-targets 
december 2011 by Vaguery
[1110.0585] Discriminately Decreasing Discriminability with Learned Image Filters
"In machine learning and computer vision, input images are often filtered to increase data discriminability. In some situations, however, one may wish to purposely decrease discriminability of one classification task (a "distractor" task), while simultaneously preserving information relevant to another (the task-of-interest): For example, it may be important to mask the identity of persons contained in face images before submitting them to a crowdsourcing site (e.g., Mechanical Turk) when labeling them for certain facial attributes. Another example is inter-dataset generalization: when training on a dataset with a particular covariance structure among multiple attributes, it may be useful to suppress one attribute while preserving another so that a trained classifier does not learn spurious correlations between attributes. In this paper we present an algorithm that finds optimal filters to give high discriminability to one task while simultaneously giving low discriminability to a distractor task. We present results showing the effectiveness of the proposed technique on both simulated data and natural face images."
machine-learning  data-preparation  filtering  algorithms  nudge-targets 
december 2011 by Vaguery
Classifying Heart Sounds Challenge
"According to the World Health Organisation, cardiovascular diseases (CVDs) are the number one cause of death globally: more people die annually from CVDs than from any other cause. An estimated 17.1 million people died from CVDs in 2004, representing 29% of all global deaths. Of these deaths, an estimated 7.2 million were due to coronary heart disease. Any method which can help to detect signs of heart disease could therefore have a significant impact on world health. This challenge is to produce methods to do exactly that. Specifically, we are interested in creating the first level of screening of cardiac pathologies both in a Hospital environment by a doctor (using a digital stethoscope) and at home by the patient (using a mobile device).

The problem is of particular interest to machine learning researchers as it involves classification of audio sample data, where distinguishing between classes of interest is non-trivial. Data is gathered in real-world situations and frequently contains background noise of every conceivable type. The differences between heart sounds corresponding to different heart symptoms can also be extremely subtle and challenging to separate. Success in classifying this form of data requires extremely robust classifiers. Despite its medical significance, to date this is a relatively unexplored application for machine learning."
machine-learning  competition  nudge-targets  classification  segmentation  data-analysis  supervised-learning 
november 2011 by Vaguery
[1110.1391] A Comparison of Different Machine Transliteration Models
"Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance."
natural-language-processing  machine-learning  review  nudge-targets 
october 2011 by Vaguery
[1106.5264] Acquiring Correct Knowledge for Natural Language Generation
"Natural language generation (NLG) systems are computer software systems that produce texts in English and other human languages, often from non-linguistic input data. NLG systems, like most AI systems, need substantial amounts of knowledge. However, our experience in two NLG projects suggests that it is difficult to acquire correct knowledge for NLG systems; indeed, every knowledge acquisition (KA) technique we tried had significant problems. In general terms, these problems were due to the complexity, novelty, and poorly understood nature of the tasks our systems attempted, and were worsened by the fact that people write so differently. This meant in particular that corpus-based KA approaches suffered because it was impossible to assemble a sizable corpus of high-quality consistent manually written texts in our domains; and structured expert-oriented KA techniques suffered because experts disagreed and because we could not get enough information about special and unusual cases to build robust systems. We believe that such problems are likely to affect many other NLG systems as well. In the long term, we hope that new KA techniques may emerge to help NLG system builders. In the shorter term, we believe that understanding how individual KA techniques can fail, and using a mixture of different KA techniques with different strengths and weaknesses, can help developers acquire NLG knowledge that is mostly correct."
natural-language-processing  artificial-intelligence  interesting-problems  high-hanging-fruit  machine-learning  nudge-targets 
october 2011 by Vaguery
[1101.4744] Clustering functional data using wavelets
"We present two methods for detecting patterns and clusters in high dimensional time-dependent functional data. Our methods are based on wavelet-based similarity measures, since wavelets are well suited for identifying highly discriminant local time and scale features. The multiresolution aspect of the wavelet transform provides a time-scale decomposition of the signals allowing to visualize and to cluster the functional data into homogeneous groups. For each input function, through its empirical orthogonal wavelet transform the first method uses the distribution of energy across scales to generate a handy number of features that can be sufficient to still make the signals well distinguishable. Our new similarity measure combined with an efficient feature selection technique in the wavelet domain is then used within more or less classical clustering algorithms to effectively differentiate among high dimensional populations. The second method uses dissimilarity measures between the whole time-scale representations and are based on wavelet-coherence tools. The clustering is then performed using a k-centroid algorithm starting from these dissimilarities. Practical performance of these methods that jointly designs both the feature selection in the wavelet domain and the classification distance is demonstrated through simulations as well as daily profiles of the French electricity power demand."
classification  time-series  feature-extraction  machine-learning  multiobjective-optimization  ontology-discovery  wavelets  nudge-targets 
october 2011 by Vaguery
[1109.4920] Beyond pixels and regions: A non local patch means (NLPM) method for content-level restoration, enhancement, and reconstruction of degraded document images
"A patch-based non-local restoration and reconstruction method for preprocessing degraded document images is introduced. The method collects relative data from the whole input image, while the image data are first represented by a content-level descriptor based on patches. This patch-equivalent representation of the input image is then corrected based on similar patches identified using a modified genetic algorithm (GA) resulting in a low computational load. The corrected patch-equivalent is then converted to the output restored image. The fact that the method uses the patches at the content level allows it to incorporate high-level restoration in an objective and self-sufficient way. The method has been applied to several degraded document images, including the DIBCO'09 contest dataset with promising results."
digitization  algorithms  OCR  archives  machine-learning  nudge-targets 
october 2011 by Vaguery
[1101.4003] Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems
"In a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. Determining the way in which decisions are taken (online, batch or simulated) and the consumed resources in decision making (e.g. execution time, memory) will influence, in major degree, the game performance. When classical search algorithms such as A* can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. Then, model free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy to incorporate the ability of heuristic-search in path-finding into a Dyna agent. The proposed Dyna-H algorithm, as A* does, selects branches more likely to produce outcomes than other branches. Besides, it has the advantages of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms obtaining excellent experimental results: Dyna-H significantly overcomes both methods in all experiments. We suggest also, a functional analogy between the proposed sampling from worst trajectories heuristic and the role of dreams (e.g. nightmares) in human behavior."
planning  machine-learning  nudge-targets  easy-pickins 
august 2011 by Vaguery
[1107.1322] Text Classification: A Sequential Reading Approach
"We propose to model the text classification process as a sequential decision process. In this process, an agent learns to classify documents into topics while reading the document sentences sequentially and learns to stop as soon as enough information was read for deciding. The proposed algorithm is based on a modelisation of Text Classification as a Markov Decision Process and learns by using Reinforcement Learning. Experiments on four different classical mono-label corpora show that the proposed approach performs comparably to classical SVM approaches for large training sets, and better for small training sets. In addition, the model automatically adapts its reading process to the quantity of training information provided."
text-classification  natural-language-processing  machine-learning  nudge-targets 
august 2011 by Vaguery
[1107.0674] "Memory foam" approach to unsupervised learning
"We propose an alternative approach to construct an artificial learning system, which naturally learns in an unsupervised manner. Its mathematical prototype is a dynamical system, which automatically shapes its vector field in response to the input signal. The vector field converges to a gradient of a multi-dimensional probability density distribution of the input process, taken with negative sign. The most probable patterns are represented by the stable fixed points, whose basins of attraction are formed automatically. The performance of this system is illustrated with musical signals."
machine-learning  classification  learning-from-data  algorithms  nudge-targets 
august 2011 by Vaguery
[1103.0086] A generic trust framework for large-scale open systems using machine learning
"… As a departure from such traditional trust models, we propose a generic, machine learning approach based trust framework where an agent uses its own previous transactions (with other agents) to build a knowledge base, and utilize this to assess the trustworthiness of a transaction based on associated features, which are capable of distinguishing successful transactions from unsuccessful ones. These features are harnessed using appropriate machine learning algorithms to extract relationships between the potential transaction and previous transactions.…"
machine-learning  social-networks  emergent-design  trust  agent-based  from delicious
april 2011 by Vaguery
[1102.3220] A signal recovery algorithm for sparse matrix based compressed sensing
"Even when the numbers of non-zero entries per column/row in the measurement matrices are limited to $O(1)$, numerical experiments indicate that the algorithm can still typically recover the original signal perfectly with an $O(N)$ computational cost per update as well if the density $\rho$ of non-zero entries of the signal is lower than a certain critical value $\rho_{\rm th}(\alpha)$ as $N,M \to \infty$."
compressed-sensing  algorithms  signal-processing  nudge-targets  machine-learning  statistics  from delicious
april 2011 by Vaguery
[1007.0628] Image Pixel Fusion for Human Face Recognition
"In this paper we present a technique for fusion of optical and thermal face images based on image pixel fusion approach. Out of several factors, which affect face recognition performance in case of visual images, illumination changes are a significant factor that needs to be addressed. Thermal images are better in handling illumination conditions but not very consistent in capturing texture details of the faces. Other factors like sunglasses, beard, moustache etc also play active role in adding complicacies to the recognition process. Fusion of thermal and visual images is a solution to overcome the drawbacks present in the individual thermal and visual face images.…"
face-recognition  image-processing  machine-learning  classification  nudge-targets  algorithms 
august 2010 by Vaguery
[1008.1663] Learning Residual Finite-State Automata Using Observation Tables
"We define a two-step learner for RFSAs based on an observation table by using an algorithm for minimal DFAs to build a table for the reversal of the language in question and showing that we can derive the minimal RFSA from it after some simple modifications. We compare the algorithm to two other table-based ones of which one (by Bollig et al. 2009) infers a RFSA directly, and the other is another two-step learner proposed by the author. We focus on the criterion of query complexity."
finite-state-machine  machine-learning  algorithms  nudge-targets  learning-from-data  inference 
august 2010 by Vaguery
[1008.1414] Statistically validated networks in bipartite complex systems
"Many complex systems present an intrinsic bipartite nature and are often described and modeled in terms of networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], plants and animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. … Here we introduce an unsupervised method to statistically validate each link of the projected network against a null hypothesis taking into account the heterogeneity of the system. We apply our method to three different systems…. In all these systems, both different in size and level of heterogeneity, we find that our method is able to detect network structures which are informative about the system…"
complexology  network-theory  algorithms  machine-learning  nudge-targets  inference  statistics 
august 2010 by Vaguery
[1007.0638] Human Face Recognition using Line Features
"In this work we investigate a novel approach to handle the challenges of face recognition, which includes rotation, scale, occlusion, illumination etc. Here, we have used thermal face images as those are capable to minimize the effect of illumination changes and occlusion due to moustache, beards, adornments etc. The proposed approach registers the training and testing thermal face images in polar coordinate, which is capable to handle complicacies introduced by scaling and rotation. Line features are extracted from thermal polar images and feature vectors are constructed using these lines. Feature vectors thus obtained pass through principal component analysis (PCA) for the dimensionality reduction of feature vectors.…"
nudge-targets  image-processing  face-recognition  machine-learning  algorithms 
august 2010 by Vaguery
Nanex - Market Crop Circle Of The Day
"As we continue to monitor the markets for evidence of Quote Stuffing and Strange Sequences (Crop Circles), we find that there are dozens if not hundreds of examples to choose from on any given day. As such, this page will be updated often with charts demonstrating this activity.

The common theme with the charts shown on this page is they are obviously all generated in code and are algorithmic. Some demonstrate bizarre price or size cycling, some demonstrate large burst of quotes in extremely short time frames and some will demonstrate both. In most cases these sequences are from a single exchange with no other exchange quoting in the same time frame."
machine-learning  trading  financial-engineering  skynet  data-analysis  emergent-design  technical-analysis  behavioral-finance 
august 2010 by Vaguery
[1003.0470] Unsupervised Supervised Learning II: Training Margin Based Classifiers without Labels
"On a more philosophical level, our approach points at novel questions that go beyond supervised and semi-supervised learning. What benefit do labels provide over unsupervised training? Can our framework be extended to semi-supervised learning where a few labels do exist? Can it be extended to non-classification scenarios such as margin based regression or margin based structured prediction? When are the assumptions likely to hold and how can we make our framework even more resistant to deviations from them? These questions and others form new and exciting open research directions."
unsupervised-learning  supervised-learning  learning-from-data  machine-learning  regression  modeling 
august 2010 by Vaguery
[1007.0636] Classification of Log-Polar-Visual Eigenfaces using Multilayer Perceptron
"In this paper we present a simple novel approach to tackle the challenges of scaling and rotation of face images in face recognition. The proposed approach registers the training and testing visual face images by log-polar transformation, which is capable to handle complicacies introduced by scaling and rotation. Log-polar images are projected into eigenspace and finally classified using an improved multi-layer perceptron. In the experiments we have used ORL face database and Object Tracking and Classification Beyond Visible Spectrum (OTCBVS) database for visual face images. Experimental results show that the proposed approach significantly improves the recognition performances from visual to log-polar-visual face images. …"
image-processing  nudge-targets  algorithms  machine-learning  security  image-segmentation 
august 2010 by Vaguery
[1007.0631] Classification of Fused Images using Radial Basis Function Neural Network for Human Face Recognition
"Here an efficient fusion technique for automatic face recognition has been presented. Fusion of visual and thermal images has been done to take the advantages of thermal images as well as visual images. By employing fusion a new image can be obtained, which provides the most detailed, reliable, and discriminating information. In this method fused images are generated using visual and thermal face images in the first step. In the second step, fused images are projected into eigenspace and finally classified using a radial basis function neural network. In the experiments Object Tracking and Classification Beyond Visible Spectrum (OTCBVS) database benchmark for thermal and visual face images have been used. Experimental results show that the proposed approach performs well in recognizing unknown individuals with a maximum success rate of 96%."
image-processing  face-recognition  nudge-targets  algorithms  machine-learning 
august 2010 by Vaguery
[1005.5141] Constructing Positive Definite Elastic Kernels with Application to Time Series Classification
"This paper proposes some extensions to the work on kernels dedicated to string alignment (biological sequence alignment) based on the summing up of scores obtained by local alignments with gaps. The extensions we propose allow to construct, from classical time-warp distances, what we called summative time-warp kernels that are positive definite if some simple sufficient conditions are satisfied. Furthermore, from the same formalism, we derive a time-warp inner product that extends the usual euclidean inner product, providing the capability to handle discrete sequences or time series of variable lengths in an Hilbert space. The classification experiment we conducted, using either first near neighbor classifier or Support Vector Machine classifier leads to conclude that the positive definite elastic kernels we propose outperform the distance substituting kernels for the classical elastic distances we tested.…"
time-series  data-analysis  nudge-targets  classification  machine-learning  algorithms 
august 2010 by Vaguery
[1007.1708] A Study on the Effectiveness of Different Patch Size and Shape for Eyes and Mouth Detection
"Template matching is one of the simplest methods used for eyes and mouth detection. However, it can be modified and extended to become a powerful tool. Since the patch itself plays a significant role in optimizing detection performance, a study on the influence of patch size and shape is carried out. The optimum patch size and shape is determined using the proposed method. Usually, template matching is also combined with other methods in order to improve detection accuracy. Thus, in this paper, the effectiveness of two image processing methods i.e. grayscale and Haar wavelet transform, when used with template matching are analyzed."
nudge-targets  image-processing  image-segmentation  machine-learning  algorithms 
august 2010 by Vaguery
[1007.0626] Fusion of Wavelet Coefficients from Visual and Thermal Face Images for Human Face Recognition - A Comparative Study
"In this paper we present a comparative study on fusion of visual and thermal images using different wavelet transformations. Here, coefficients of discrete wavelet transforms from both visual and thermal images are computed separately and combined. Next, inverse discrete wavelet transformation is taken in order to obtain fused face image. Both Haar and Daubechies (db2) wavelet transforms have been used to compare recognition results. For experiments IRIS Thermal/Visual Face Database was used. Experimental results using Haar and Daubechies wavelets show that the performance of the approach presented here achieves maximum success rate of 100% in many cases."
image-analysis  wavelets  machine-learning  algorithms  nudge-targets 
august 2010 by Vaguery
[1007.3254] Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks
"We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with $N$ independent words as vertices or nodes, and edges or links allotted to words occurring within $m$ places of a given vertex; we call $m$ the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance $m$. The literature samples were found to be most effectively represented by their corresponding power laws over degree distribution $P(k)$ and clustering coefficient $C(k)$; we also studied the mean geodesic distance, and found all our texts were small-world networks.…"
nudge-targets  computational-linguistics  linguistics  classification  machine-learning  statistics  natural-language-processing 
august 2010 by Vaguery
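The network construction described in the abstract above (words as vertices, edges between words occurring within $m$ places of each other) can be sketched as follows. This is a rough illustration only: whitespace tokenization, lowercasing, undirected edges, and ignoring self-links are all assumptions, and the function names build_word_network and degree_distribution are hypothetical.

```python
from collections import Counter, defaultdict


def build_word_network(text, m):
    """Link each word to the distinct words found within m places after it.

    Assumptions: tokens are whitespace-separated and lowercased, edges
    are undirected, and a word is never linked to itself.
    """
    words = text.lower().split()
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + m + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)
    return neighbors


def degree_distribution(neighbors):
    """Return P(k): the fraction of vertices that have degree k."""
    counts = Counter(len(linked) for linked in neighbors.values())
    n = len(neighbors)
    return {k: count / n for k, count in sorted(counts.items())}


network = build_word_network("a b c a d", m=1)
distribution = degree_distribution(network)
print(distribution)  # {1: 0.25, 2: 0.5, 3: 0.25}
```

In the toy sentence, "a" ends up linked to "b", "c" and "d", so it has degree 3, while "d" has degree 1. On real texts the same $P(k)$ would be computed over far more vertices and for several values of $m$.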
[1005.4803] Hirsch index as a network centrality measure
"…The h index is compared with the Degree centrality (a local measure), the Betweenness and Eigenvector centralities (two non-local measures) in the case of a biological network (Yeast interaction protein-protein network) and a linguistic network (Moby Thesaurus II). In both networks, the Hirsch index has poor correlation with Betweenness centrality but correlates well with Eigenvector centrality, specially for the more important nodes that are relevant for ranking purposes, say in Search Engine Optimization. In the thesaurus network, the h index seems even to outperform the Eigenvector centrality measure as evaluated by simple linguistic criteria."
network-theory  linguistics  search-engines  algorithms  nudge-targets  classification  machine-learning 
july 2010 by Vaguery
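Note to self: a small sketch of an h index for a single node, assuming the usual adaptation of Hirsch's definition to graphs (a node has index h if at least h of its neighbours have degree h or more). The dict-of-sets graph representation and the function name are my assumptions, and the paper's exact definition may differ in detail.

```python
def node_h_index(graph, node):
    """Largest h such that `node` has at least h neighbours of degree >= h.

    `graph` maps each vertex to the set of its neighbours.
    """
    neighbour_degrees = sorted(
        (len(graph[n]) for n in graph[node]), reverse=True
    )
    h = 0
    for rank, degree in enumerate(neighbour_degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    print({v: node_h_index(triangle, v) for v in triangle})
```

Because it only looks at neighbours' degrees, the measure stays local and cheap to compute, unlike the Betweenness and Eigenvector centralities it is compared against.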
[0903.5066] Modified-CS: Modifying Compressive Sensing for Problems with Partially Known Support
"We study the problem of reconstructing a sparse signal from a limited number of its linear projections when a part of its support is known, although the known part may contain some errors. The "known" part of the support, denoted T, may be available from prior knowledge. Alternatively, in a problem of recursively reconstructing time sequences of sparse spatial signals, one may use the support estimate from the previous time instant as the "known" part. The idea of our proposed solution (modified-CS) is to solve a convex relaxation of the following problem: find the signal that satisfies the data constraint and is sparsest outside of T.…"
compressed-sensing  algorithms  machine-learning  statistics  signal-processing  nudge-targets  data-analysis 
july 2010 by Vaguery
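Note to self: one way to write down the problem the abstract describes, under my own notation (the symbols x, y and A are assumptions, not taken from the abstract). With noise-free measurements y = Ax and T^c the complement of the known support T, "sparsest outside of T" is a count of nonzeros off T, and the standard convex relaxation would swap that count for an l1 norm:

```latex
% Original (combinatorial) problem: fewest nonzeros outside T
\min_{x} \; \| x_{T^{c}} \|_{0}
\quad \text{subject to} \quad y = A x

% Convex relaxation: replace the l0 count with the l1 norm
\min_{x} \; \| x_{T^{c}} \|_{1}
\quad \text{subject to} \quad y = A x
```

When T is empty this reduces to ordinary basis pursuit, so the known support only ever removes terms from the penalty.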
[1007.3799] Adapting to the Shifting Intent of Search Queries
"Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query `independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. … This paper shows that the signals a search engine receives can be used to both determine that a shift in intent has happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases."
search-engines  search-algorithms  machine-learning  social-dynamics  algorithms  nudge-targets  intelligence-gathering  data-analysis 
july 2010 by Vaguery
[1007.4191] Fast Moment Estimation in Data Streams in Optimal Space
"We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous space-optimal algorithm of [Kane-Nelson-Woodruff, SODA 2010], which had update time Omega(1/eps^2)."
nudge-targets  algorithms  data-analysis  online-learning  machine-learning  computational-complexity  statistics 
july 2010 by Vaguery
[0906.0231] Solving $k$-Nearest Neighbor Problem on Multiple Graphics Processors
"We introduced an effective algorithm for k-nearest neighbor problem which works on multiple GPUs. By an experiment, we have shown that it runs more than 330 times faster than an implementation on a single core of an up-to-date CPU. We have also shown that the algorithm is effective from the viewpoint of parallelism of GPUs. That is because 1) there is no synchronization between GPUs until the very end of the process and 2) the workload is well balanced."
algorithms  numerical-methods  GPU  CUDA  machine-learning  nudge 
july 2010 by Vaguery
[1007.2958] A Machine Learning Approach to Recovery of Scene Geometry from Images
"Recovering the 3D structure of the scene from images yields useful information for tasks such as shape and scene recognition, object detection, or motion planning and object grasping in robotics. In this thesis, we introduce a general machine learning approach called unsupervised CRF learning based on maximizing the conditional likelihood. We apply our approach to computer vision systems that recover the 3-D scene geometry from images. We focus on recovering 3D geometry from single images, stereo pairs and video sequences. Building these systems requires algorithms for doing inference as well as learning the parameters of conditional Markov random fields (MRF). Our system is trained unsupervisedly without using ground-truth labeled data.…"
visualization  image-processing  algorithms  machine-learning  robotics  nudge-targets 
july 2010 by Vaguery
[1006.4968] Validation of credit default probabilities via multiple testing procedures
"We apply multiple testing procedures to the validation of estimated default probabilities in credit rating systems. The goal is to identify rating classes for which the probability of default is estimated inaccurately, while still maintaining a predefined level of committing type I errors as measured by the familywise error rate (FWER) and the false discovery rate (FDR). For FWER, we also consider procedures that take possible discreteness of the data resp. test statistics into account. The performance of these methods is illustrated in a simulation setting and for empirical default data."
finance  prediction  data-mining  models  statistics  machine-learning  nudge-targets 
june 2010 by Vaguery
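Note to self: a sketch of the two error criteria named in the abstract above, using the textbook Bonferroni correction for FWER and the Benjamini-Hochberg step-up rule for FDR. These are stand-ins I picked for illustration; the paper may well use other procedures, and the p-values below are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Control the familywise error rate: reject when p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate with the step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k whose p-value satisfies p_(k) <= alpha * k / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, one per rating class.
    p = [0.001, 0.02, 0.03, 0.5]
    print(bonferroni(p))
    print(benjamini_hochberg(p))
```

In the credit setting each rejection would flag a rating class whose estimated default probability looks miscalibrated; FDR control typically flags more classes than FWER control at the same alpha.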
[0912.4473] Learning to Predict Combinatorial Structures
"The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. …"
machine-learning  prediction  combinatorics  nudge-targets  learning-from-data 
june 2010 by Vaguery
CASS
"In the social sciences, it is useful to understand the relative similarities of concepts that are embedded in a particular text (from a particular group or a particular person). For example, in trying to estimate conservative bias in FoxNews, one might estimate its tendency to associate conservative concepts (conservative, republican) and good concepts (good, positive, etc.), compared to conservative and bad concepts. The output would indicate conservative favoritism. This comparison could be further refined by taking into account important "baseline" information about the valences associated with liberal, namely liberal and good in comparison to liberal and bad.…"
text-mining  natural-language-processing  data-mining  machine-learning  Ruby  library 
june 2010 by Vaguery
[1006.4330] Large gaps imputation in remote sensed imagery of the environment
"Imputation of missing data in large regions of satellite imagery is necessary when the acquired image has been damaged by shadows due to clouds, or information gaps produced by sensor failure.
The general approach for imputation of missing data, that could not be considered missed at random, suggests the use of other available data. Previous work, like local linear histogram matching, take advantage of a co-registered older image obtained by the same sensor, yielding good results in filling homogeneous regions, but poor results if the scenes being combined have radical differences in target radiance due, for example, to the presence of sun glint or snow.…"
nudge-targets  definitely-nudge-targets  imputation  statistics  machine-learning  data-analysis 
june 2010 by Vaguery
[1006.4175] Optimization of Weighted Curvature for Image Segmentation
"Minimization of boundary curvature is a classic regularization technique for image segmentation in the presence of noisy image data. Techniques for minimizing curvature have historically been derived from descent methods which could be trapped in a local minimum and therefore required a good initialization. Recently, combinatorial optimization techniques have been applied to the optimization of curvature which provide a solution that achieves nearly a global optimum. However, when applied to image segmentation these methods required a meaningful data term. Unfortunately, for many images, particularly medical images, it is difficult to find a meaningful data term. Therefore, we propose to remove the data term completely and instead weight the curvature locally, while still achieving a global optimum."
image-segmentation  image-analysis  classification  machine-learning  algorithms  nudge-targets  medical-technology 
june 2010 by Vaguery
[1006.4326] Stationary and Mobile Target Detection using Mobile Wireless Sensor Networks
"In this work, we study the target detection and tracking problem in mobile sensor networks, where the performance metrics of interest are probability of detection and tracking coverage, when the target can be stationary or mobile and its duration is finite. We propose a physical coverage-based mobility model, where the mobile sensor nodes move such that the overlap between the covered areas by different mobile nodes is small. It is shown that for stationary target scenario the proposed mobility model can achieve a desired detection probability with a significantly lower number of mobile nodes especially when the detection requirements are highly stringent. Similarly, when the target is mobile the coverage-based mobility model produces a consistently higher detection probability compared to other models under investigation."
operations-research  mobile-sensor-networks  algorithms  machine-learning  nudge-targets 
june 2010 by Vaguery
The Berkeley Segmentation Dataset and Benchmark
"The goal of this work is to provide an empirical basis for research on image segmentation and boundary detection. To this end, we have collected 12,000 hand-labeled segmentations of 1,000 Corel dataset images from 30 human subjects. Half of the segmentations were obtained from presenting the subject with a color image; the other half from presenting a grayscale image. The public benchmark based on this data consists of all of the grayscale and color segmentations for 300 images. The images are divided into a training set of 200 images, and a test set of 100 images."
dataset  learning-from-data  training-set  machine-learning  image-segmentation  image-processing  nudge 
june 2010 by Vaguery
[1006.3679] Segmentation of Natural Images by Texture and Boundary Compression
"We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes as multi-scale texture features. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on the publicly available Berkeley Segmentation Dataset. It achieves state-of-the-art segmentation results compared to other existing methods."
algorithms  image-segmentation  numerical-methods  machine-learning  image-compression  nudge-targets  dataset 
june 2010 by Vaguery
[0902.0600] Decisional States
"…The intrinsic underlying structure of the system is modeled by an epsilon-machine and its causal states. The decisional states are the emerging patterns corresponding to the utility function. In a complex systems perspective, these patterns thus form a partition of the lower-level system states that is defined according to the higher-level user's knowledge. The transitions between these decisional states correspond to events that lead to a change of decision. An algorithm is provided so as to estimate the states and their transitions from data. Application examples are given for hidden model reconstruction, cellular automata filtering, and edge detection in images."
computational-mechanics  information-theory  prediction  statistics  probability-theory  machine-learning  classification 
june 2010 by Vaguery
[1006.1346] C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework
"Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is performed by solving an L1-regularized linear regression problem, commonly referred to as Lasso or Basis Pursuit. In this work we combine the sparsity-inducing property of the Lasso model at the individual feature level, with the block-sparsity property of the Group Lasso model, where sparse groups of features are jointly encoded, obtaining a sparsity pattern hierarchically structured. This results in the Hierarchical Lasso (HiLasso), which shows important practical modeling advantages.…"
numerical-methods  statistics  learning-from-data  machine-learning  image-processing  image-segmentation  nudge-targets 
june 2010 by Vaguery
[1006.3541] Complexity dichotomy on partial grid recognition
"Deciding whether a graph can be embedded in a grid using only unit-length edges is NP-complete, even when restricted to binary trees. However, it is not difficult to devise a number of graph classes for which the problem is polynomial, even trivial. A natural step, outstanding thus far, was to provide a broad classification of graphs that make for polynomial or NP-complete instances. We provide such a classification based on the set of allowed vertex degrees in the input graphs, yielding a full dichotomy on the complexity of the problem. As byproducts, the previous NP-completeness result for binary trees was strengthened to strictly binary trees, and the three-dimensional version of the problem was for the first time proven to be NP-complete. Our results were made possible by introducing the concepts of consistent orientations and robust gadgets, and by showing how the former allows NP-completeness proofs by local replacement even in the absence of the latter."
algorithms  graph-theory  classification  machine-learning  nudge-targets  geometry  recognition-problems 
june 2010 by Vaguery
[1006.1165] Optimal Source-Based Filtering of Malicious Traffic
"In this paper, we consider the problem of blocking malicious traffic on the Internet, via source-based filtering. In particular, we consider filtering via access control lists (ACLs): these are already available at the routers today but are a scarce resource because they are stored in the expensive ternary content addressable memory (TCAM). Aggregation (by filtering source prefixes instead of individual IP addresses) helps reduce the number of filters, but comes also at the cost of blocking legitimate traffic originating from the filtered prefixes. We show how to optimally choose which source prefixes to filter, for a variety of realistic attack scenarios and operators' policies. In each scenario, we design optimal, yet computationally efficient, algorithms. Using logs from Dshield.org, we evaluate the algorithms and demonstrate that they bring significant benefit in practice."
nudge-targets  security  algorithms  machine-learning  intrusion  system-administration  operations-research 
june 2010 by Vaguery
[1004.3925] Classification using distance nearest neighbours
"This paper proposes a new probabilistic classification algorithm using a Markov random field approach. The joint distribution of class labels is explicitly modelled using the distances between feature vectors. Intuitively, a class label should depend more on class labels which are closer in the feature space, than those which are further away.…"
classification  machine-learning  markov-random-field  algorithms  learning-from-data 
june 2010 by Vaguery
[1001.0663] Self-organized chaos through polyhomeostatic optimization
"The goal of polyhomeostatic control is to achieve a certain target distribution of behaviors, in contrast to homeostatic regulation which aims at stabilizing a steady-state dynamical state. We consider polyhomeostasis for individual and networks of firing-rate neurons, adapting to achieve target distributions of firing rates maximizing information entropy. We show that any finite polyhomeostatic adaption rate destroys all attractors in Hopfield-like network setups, leading to intermittently bursting behavior and self-organized chaos. The importance of polyhomeostasis to adapting behavior in general is discussed."
adaptive-control  homeostasis  machine-learning  simulation  dynamics 
june 2010 by Vaguery
[0908.2503] Sequential Quantile Prediction of Time Series
"Motivated by a broad range of potential applications, we address the quantile prediction problem of real-valued time series. We present a sequential quantile forecasting model based on the combination of a set of elementary nearest neighbor-type predictors called "experts" and show its consistency under a minimum of conditions. Our approach builds on the methodology developed in recent years for prediction of individual sequences and exploits the quantile structure as a minimizer of the so-called pinball loss function. We perform an in-depth analysis of real-world data sets and show that this nonparametric strategy generally outperforms standard quantile prediction methods."
time-series  prediction  models  statistics  nudge-targets  learning-from-data  machine-learning 
june 2010 by Vaguery
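Note to self: the pinball loss mentioned in the abstract above, as a minimal sketch. The loss itself is standard; the helper that searches for the best constant forecast is my own illustration of why a quantile is "a minimizer" of it, not the paper's expert-combination method.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of forecast q for outcome y at level tau."""
    diff = y - q
    # Under-forecasts cost tau per unit, over-forecasts cost (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1) * diff


def best_constant_forecast(ys, tau):
    """Sample value minimising total pinball loss over ys.

    The total loss is convex and piecewise linear in q, so a minimiser
    can always be found among the observed values.
    """
    return min(ys, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(best_constant_forecast(data, 0.5))
    print(best_constant_forecast(data, 0.9))
```

With tau = 0.5 the loss is symmetric and the search lands on the median; pushing tau towards 1 makes under-forecasting expensive and drags the answer towards the upper tail.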
[1005.5086] Classification of interstitial lung disease patterns with topological texture features
"… The results indicate that advanced topological texture features can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods."
image-processing  medical-technology  diagnosis  nudge-targets  classification  machine-learning 
may 2010 by Vaguery
[1005.0794] Active Learning for Hidden Attributes in Networks
"In many networks, vertices have hidden attributes, or types, that are correlated with the network's topology. If the topology is known but these attributes are not, and if learning the attributes is costly, we need a method for choosing which vertex to query in order to learn as much as possible about the attributes of the other vertices. We assume the network is generated by a stochastic block model, but we make no assumptions about its assortativity or disassortativity. We choose which vertex to query using two methods: 1) maximizing the mutual information between its attributes and those of the others (a well-known approach in active learning) and 2) maximizing the average agreement between two independent samples of the conditional Gibbs distribution. Experimental results show that both these methods do much better than simple heuristics. They also consistently identify certain vertices as important by querying them early on."
network-theory  complexology  algorithms  machine-learning  nudge-targets 
may 2010 by Vaguery
Multi-task learning - Wikipedia, the free encyclopedia
"Multi-task learning is an approach to machine learning, that learns a problem together with other related problems at the same time, using a shared representation. This often leads to a better model for the main task, because it allows the learner to use the commonality among the tasks. Therefore, multi-task learning is a kind of inductive transfer."
I-guess  machine-learning  learning-by-doing  learning-by-watching  nudge-targets 
may 2010 by Vaguery
Getting Started Guide - Google Prediction API - Google Code
"The Prediction API allows you to get more from your data and makes its patterns more accessible. Specifically, the Prediction API leverages Google's machine learning infrastructure to give you the tools to better analyze your data and reveal patterns that are often difficult to manually discover. The API also enables you to use those patterns to predict new outcomes, which facilitates the development of all types of software, from textual analysis systems to recommendation systems. Because the Prediction API is a RESTful HTTP service, you can easily access it from Google App Engine, Apps Script, and other Internet-connected desktop applications."
nudge  machine-learning  models  google  prediction  clustering  learning-from-data  AI  API  open-science 
may 2010 by Vaguery
[1002.2283] A Gossip Algorithm for Convex Consensus Optimization over Networks
These gossip methods are essentially identical to the dynamics in HFC, ALPS and Trivial Geography multi-population evolutionary algorithms.
algorithms  machine-learning  exploitation-vs-exploration  diversity 
may 2010 by Vaguery
[1005.2715] On the Subspace of Image Gradient Orientations
"We introduce the notion of Principal Component Analysis (PCA) of image gradient orientations. As image data is typically noisy, but noise is substantially different from Gaussian, traditional PCA of pixel intensities very often fails to estimate reliably the low-dimensional subspace of a given data population. We show that replacing intensities with gradient orientations and the $\ell_2$ norm with a cosine-based distance measure offers, to some extent, a remedy to this problem.…"
image-processing  signal-processing  image-analysis  machine-learning  statistics  PCA  nudge-targets 
may 2010 by Vaguery
[1005.2979] Robust and Adaptive Algorithms for Online Portfolio Selection
"… Our methods use simple ideas from signal processing and statistics, which are sometimes overlooked in the empirical financial literature. The two approaches are evaluated against benchmark allocation techniques using 4 real datasets. Our methods outperform the benchmark allocation techniques in these datasets, in terms of both computational demand and financial performance."
trading  financial-engineering  stocks  machine-learning  statistics  algorithms  portfolio-theory 
may 2010 by Vaguery
[0906.4779] Minimum Probability Flow Learning
"Learning in probabilistic models is often hampered by the general intractability of the normalization factor and its derivatives. Here we propose a new learning technique that obviates the need to compute an intractable normalization factor or sample from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the initial flow of probability away from the data distribution.…"
learning-from-data  statistics  machine-learning  estimation  algorithms  to-understand 
may 2010 by Vaguery
[1005.1364] Cognitive Radio Transmission under QoS Constraints and Interference Limitations
"… Under such QoS constraints and limitations on the interference caused to the primary users, the maximum throughput is identified by finding the effective capacity of the cognitive radio channel. Optimal power allocation strategies are obtained and the optimal channel selection criterion is identified. The intricate interplay between effective capacity, interference and QoS constraints, channel sensing parameters and reliability, fading, and the number of available frequency bands is investigated through numerical results."
cognitive-networks  communication-infrastructure  radio  adaptive-control  machine-learning  quality-of-service  nudge-targets 
may 2010 by Vaguery
Cognitive network - Wikipedia, the free encyclopedia
"In communication networks, cognitive network (CN) is a new type of data network that makes use of cutting edge technology from several research areas (i.e. machine learning, knowledge representation, computer network, network management) to solve some problems current networks are faced with. Cognitive network is different from cognitive radio as it covers all the layers of the OSI model (not only layers 1 and 2 as with cognitive radio).…"
cognitive-networks  machine-learning  engineering-design  communication-infrastructure  nudge-targets 
may 2010 by Vaguery
[1005.1320] The myth of equidistribution for high-dimensional simulation
"…For example, when estimating a contour integral of an analytic function, we might transform the contour to a circle and use equally spaced points on the circle.

However, when simulating Canberra’s future climate and water supply, it would not be a good idea to assume that exceptionally dry years were equally spaced!…"
nudge-targets  quasirandom-numbers  pseudorandom-numbers  modeling  simulation  algorithms  micropragmatism  tools  explanatory-power  complexology  machine-learning 
may 2010 by Vaguery
[1005.1036] Introduction to Graphical Modelling
"The aim of this chapter is twofold. In the first part we will provide a brief overview of the mathematical and statistical foundations of graphical models, along with their fundamental properties, estimation and basic inference procedures. In particular we will develop Markov networks (also known as Markov random fields) and Bayesian networks, which comprise most past and current literature on graphical models. In the second part we will review some applications of graphical models in systems biology."
statistics  machine-learning  introduction  to-read 
may 2010 by Vaguery
[1005.0390] Machine Learning for Galaxy Morphology Classification
"In this work, decision tree learning algorithms and fuzzy inferencing systems are applied for galaxy morphology classification. In particular, the CART, the C4.5, the Random Forest and fuzzy logic algorithms are studied and reliable classifiers are developed to distinguish between spiral galaxies, elliptical galaxies or star/unknown galactic objects. Morphology information for the training and testing datasets is obtained from the Galaxy Zoo project while the corresponding photometric and spectra parameters are downloaded from the SDSS DR7 catalogue."
nudge-targets  learning-from-data  machine-learning  crowdsourcing  galaxy-zoo  public-data  datasets 
may 2010 by Vaguery
[1005.0919] Attribute Weighting with Adaptive NBTree for Reducing False Positives in Intrusion Detection
"… Due to the tremendous growth of network-based services, intrusion detection has emerged as an important technique for network security. Recently data mining algorithms are applied on network-based traffic data and host-based program behaviors to detect intrusions or misuse patterns, but there exist some issues in current intrusion detection algorithms such as unbalanced detection rates, large numbers of false positives, and redundant attributes that will lead to the complexity of detection model and degradation of detection accuracy. The purpose of this study is to identify important input attributes for building an intrusion detection system (IDS) that is computationally efficient and effective.…"
nudge-targets  system-administration  security  algorithms  machine-learning  learning-from-data  learning-by-watching  statistics 
may 2010 by Vaguery
[1005.0945] An Efficient Vein Pattern-based Recognition System
"This paper presents an efficient human recognition system based on vein pattern from the palma dorsa. A new absorption based technique has been proposed to collect good quality images with the help of a low cost camera and light source. The system automatically detects the region of interest from the image and does the necessary preprocessing to extract features. A Euclidean Distance based matching technique has been used for making the decision. It has been tested on a data set of 1750 image samples collected from 341 individuals. The accuracy of the verification system is found to be 99.26% with false rejection rate (FRR) of 0.03%."
nudge-targets  image-processing  biometrics  machine-learning  algorithms  security  pattern-recognition 
may 2010 by Vaguery
[1005.0437] A Unifying View of Multiple Kernel Learning
"Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework analytically using a Rademacher complexity bound on the generalization error and empirically in a set of experiments."
machine-learning  kernel-methods  mathematics  learning-from-data 
may 2010 by Vaguery
[1005.0957] ECG Feature Extraction Techniques - A Survey Approach
"ECG Feature Extraction plays a significant role in diagnosing most of the cardiac diseases. One cardiac cycle in an ECG signal consists of the P-QRS-T waves. This feature extraction scheme determines the amplitudes and intervals in the ECG signal for subsequent analysis. The amplitudes and intervals value of P-QRS-T segment determines the functioning of heart of every human. Recently, numerous research and techniques have been developed for analyzing the ECG signal. The proposed schemes were mostly based on Fuzzy Logic Methods, Artificial Neural Networks (ANN), Genetic Algorithm (GA), Support Vector Machines (SVM), and other Signal Analysis techniques. All these techniques and algorithms have their advantages and limitations.…"
nudge-targets  machine-learning  classification  learning-from-data  diagnostics  medicine 
may 2010 by Vaguery
mperham's bayes_motel at master - GitHub
"BayesMotel is a multi-variate Bayesian classification engine. There are two steps to Bayesian classification:

1. Training: You provide a set of variables along with the proper classification for that set.
2. Runtime: You provide a set of variables and ask for the proper classification according to the training in Step 1.

Commonly this is used for spam detection. You will provide a corpus of emails or other data along with a "Spam/NotSpam" classification. The library will determine which variables affect the classification and use that to judge future data."
Ruby  rubygem  Bayesian  classification  statistics  learning-from-data  machine-learning  algorithms 
april 2010 by Vaguery
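Note to self: a toy version of the two-step train/classify flow described above, written as a plain naive Bayes classifier with add-one smoothing. This is not BayesMotel's API (that library is a Ruby gem); the class name, method names and variables here are all hypothetical and only illustrate the idea.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical illustration; not the BayesMotel interface."""

    def __init__(self):
        self.class_counts = Counter()
        # (label, variable name) -> counts of the values seen for it
        self.value_counts = defaultdict(Counter)
        # variable name -> every value seen under any label
        self.values_seen = defaultdict(set)

    def train(self, variables, label):
        """Step 1: record one labelled set of variables."""
        self.class_counts[label] += 1
        for name, value in variables.items():
            self.value_counts[(label, name)][value] += 1
            self.values_seen[name].add(value)

    def classify(self, variables):
        """Step 2: return the label with the highest log posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for name, value in variables.items():
                count = self.value_counts[(label, name)][value]
                # Add-one smoothing; the extra slot covers unseen values.
                k = len(self.values_seen[name]) + 1
                score += math.log((count + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = TinyNaiveBayes()
    nb.train({"has_link": True, "all_caps": True}, "Spam")
    nb.train({"has_link": False, "all_caps": False}, "NotSpam")
    print(nb.classify({"has_link": True, "all_caps": True}))
```

Working in log space avoids underflow when many variables are multiplied together, and the smoothing keeps a single unseen value from zeroing out a class.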
[1004.3980] Hashing Image Patches for Zooming
"In this paper we present a Bayesian image zooming/super-resolution algorithm based on a patch based representation. We work on a patch based model with overlap and employ a Locally Linear Embedding (LLE) based approach as our data fidelity term in the Bayesian inference. The image prior imposes continuity constraints across the overlapping patches."
image-processing  learning-from-data  machine-learning  statistics 
april 2010 by Vaguery
[1003.4002] Spectral Classification; Old and Contemporary
"Beginning with a historical account of the spectral classification, its refinement through additional criteria is presented. The line strengths and ratios used in two dimensional classifications of each spectral class are described. A parallel classification scheme for metal-poor stars and the standards used for classification are presented. The extension of spectral classification beyond M to L and T and spectroscopic classification criteria relevant to these classes are described. Contemporary methods of classifications based upon different automated approaches are introduced."
machine-learning  learning-from-data  science2.0  Nudge  clustering  statistics  astronomy  digitization 
march 2010 by Vaguery
[0908.2033] Galaxy Zoo: Reproducing Galaxy Morphologies Via Machine Learning
"We present morphological classifications obtained using machine learning for objects in SDSS DR6 that have been classified by Galaxy Zoo into three classes, namely early types, spirals and point sources/artifacts. An artificial neural network is trained on a subset of objects classified by the human eye and we test whether the machine learning algorithm can reproduce the human classifications for the rest of the sample. We find that the success of the neural network in matching the human classifications depends crucially on the set of input parameters chosen for the machine-learning algorithm. The colours and parameters associated with profile-fitting are reasonable in separating the objects into three classes. However, these results are considerably improved when adding adaptive shape parameters as well as concentration and texture. …"
learning-from-data  machine-learning  galaxy-zoo  crowdsourcing  crowdsourcing-as-training-data  science2.0  Nudge  variable-selection 
march 2010 by Vaguery
News — PyMVPA Home
"PyMVPA is a Python module intended to ease pattern classification analyses of large datasets. In the neuroimaging contexts such analysis techniques are also known as decoding or MVPA analysis. PyMVPA provides high-level abstraction of typical processing steps and a number of implementations of some popular algorithms. While it is not limited to the neuroimaging domain, it is eminently suited for such datasets. PyMVPA is truly free software (in every respect) and additionally requires nothing but free-software to run."
data-analysis  Python  machine-learning  open-source  free  visualization  statistics  exploratory-data-analysis 
march 2010 by Vaguery
Head & Neck Oncology | Full text | Potential for Raman spectroscopy to provide cancer screening using a peripheral blood sample
"The mean spectra were provided as input sequences to the Implicit Context Representation Cartesian Genetic Programming algorithm (IRCGP)[14,15]. IRCGP uses evolutionary computing methodology to learn classifiers that are capable of distinguishing between data classes. Induced classifiers take the form of programmatic expressions applied to particular offsets within the input data sequences. These expressions are composed from a set of simple mathematical functions. Both the choice and connectivity of the functions, and the choice of offsets used within the input sequences, are determined by the algorithm's evolutionary process. The input sequences were divided equally into training and test sets. To prevent over-learning, training of the classifiers was stopped once classification accuracy of the test sequences started to fall."
genetic-programming  clinical  diagnosis  nudge  spectroscopy  applied-mathematics  machine-learning  classification 
december 2009 by Vaguery
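Note: the stopping rule in that excerpt (halt training once held-out accuracy starts to fall) can be sketched in a few lines of Python. This is a minimal illustration, not the IRCGP implementation. The function name `early_stopping_index` and the accuracy values are hypothetical.

```python
def early_stopping_index(held_out_accuracy):
    """Return the index of the last step before held-out accuracy first falls."""
    if not held_out_accuracy:
        raise ValueError("need at least one accuracy measurement")
    for step in range(1, len(held_out_accuracy)):
        # Stop at the first step where accuracy is lower than the step before.
        if held_out_accuracy[step] < held_out_accuracy[step - 1]:
            return step - 1
    # Accuracy never fell, so keep the final step.
    return len(held_out_accuracy) - 1


# Hypothetical held-out accuracies, one per training generation.
history = [0.61, 0.70, 0.78, 0.81, 0.79, 0.80]
stop_at = early_stopping_index(history)
print(stop_at, history[stop_at])
```

The excerpt stops on the test sequences themselves. A common variation holds out a separate validation set for the stopping decision, so that the final test accuracy is not used to choose the model.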
http://jmlr.csail.mit.edu/papers/v5/grandvalet04a.html
"Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of K-fold cross-validation."
via:cshalizi  machine-learning  statistics  validation  error  received-wisdom 
november 2009 by Vaguery
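Note: for reference, a minimal Python sketch of the K-fold cross-validation estimator the abstract discusses, together with the naive across-fold variance that people often report. The learner (`fit_mean`), the loss, and the toy data are hypothetical stand-ins. The naive variance is shown only as the commonly used quantity, which per the abstract's theorem cannot be a universally unbiased estimate.

```python
import random
from statistics import mean, variance


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def k_fold_cv(xs, ys, k, fit, loss, seed=0):
    """Return the mean held-out loss of each of the k folds."""
    fold_losses = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on everything except the held-out fold, then score on that fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_losses.append(mean(loss(model(xs[i]), ys[i]) for i in held_out))
    return fold_losses


def fit_mean(train_x, train_y):
    """Hypothetical toy learner: always predict the training mean."""
    m = mean(train_y)
    return lambda x: m


def squared_error(prediction, target):
    return (prediction - target) ** 2


# Hypothetical toy data: pure noise targets.
rng = random.Random(1)
xs = list(range(50))
ys = [rng.gauss(0.0, 1.0) for _ in xs]

fold_losses = k_fold_cv(xs, ys, k=5, fit=fit_mean, loss=squared_error)
cv_estimate = mean(fold_losses)  # the K-fold estimate of generalization error
# Naive variance of the estimate: treats fold losses as independent, which
# they are not, because the training sets overlap.
naive_variance = variance(fold_losses) / len(fold_losses)
print(cv_estimate, naive_variance)
```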
Data Mining Group - PMML 4.0 - General Structure of a PMML Document
"PMML uses XML to represent mining models. The structure of the models is described by an XML Schema. One or more mining models can be contained in a PMML document. A PMML document is an XML document with a root element of type PMML. The general structure of a PMML document is:..."
data-mining  models  learning-from-data  machine-learning  standards  XML  Nudge 
october 2009 by Vaguery
Analyzing the effectiveness and applicability of co-training
"Yet, the co-training algorithm in this paper also makes the same assumptions (as it too has underlying naive Bayes classifiers), but does not suffer from the violations. Thus we hypothesize that the co-training algorithm succeeds in part because it is more robust to the assumptions made by its underlying classifiers. This can be understood by looking at the differences in how EM and co-training use the underlying assumptions."
via:cshalizi  learning  learning-from-watching  algorithms  machine-learning  collaboration  performance-space-analysis 
september 2009 by Vaguery
Katya Vladislavleva - Tilburg University
See in particular Chapter 2, on Data Balancing. This is important stuff for those of us dealing with data-driven models and techniques, especially those not based on analytical closed form first-principles junk.
genetic-programming  modeling  data-analysis  learning-from-data  machine-learning  thesis  techniques  numerical-models 
may 2009 by Vaguery
MATEDA
MatLab package for Estimation of Distribution Algorithms (EDAs).
via:[David_Goldberg]  evolutionary-algorithms  optimization  machine-learning  search  MatLab  library 
may 2009 by Vaguery