rybesh + statistics   94

Cube
Cube is a system for collecting timestamped events and deriving metrics. By collecting events rather than metrics, Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets.
realtime  statistics 
5 weeks ago by rybesh
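A minimal sketch of why keeping raw events matters. This is not Cube's API: the event fields and the `duration_quartiles` helper are made up for illustration. Quartiles can be derived after the fact from stored events, whereas a pre-aggregated mean would have discarded the information needed to compute them.

```python
import statistics

# Hypothetical raw events: a timestamp plus one measured value each.
events = [
    {"time": "2012-05-01T00:00:01Z", "duration_ms": 100},
    {"time": "2012-05-01T00:00:02Z", "duration_ms": 200},
    {"time": "2012-05-01T00:00:03Z", "duration_ms": 300},
    {"time": "2012-05-01T00:00:04Z", "duration_ms": 400},
    {"time": "2012-05-01T00:00:05Z", "duration_ms": 500},
]

def duration_quartiles(events):
    """Return the three quartile cut points of duration_ms, computed post hoc."""
    durations = [e["duration_ms"] for e in events]
    # method="inclusive" treats the data as the whole population of interest.
    return statistics.quantiles(durations, n=4, method="inclusive")

quartiles = duration_quartiles(events)
```

Because the events themselves are kept, the same list could later be filtered by time range or any other field before recomputing the statistic.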
The RDF Data Cube Vocabulary
There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts. The Data Cube vocabulary provides a means to do this using the W3C RDF (Resource Description Framework) standard. The model underpinning the Data Cube vocabulary is compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations. The Data Cube vocabulary is a core foundation which supports extension vocabularies to enable publication of other aspects of statistical data flows.
metadata  standard  data  description  inls520  webinfo  statistics  science 
8 weeks ago by rybesh
Beeminder
Anything you can put a periodic number on works -- weight, pushups, number of cigarettes, or how long it takes you to bike to work. Just answer with your number when Beeminder asks and it will show you your progress and a yellow brick road to follow to stay on track.

If you go off track, you pledge money to stay on the road the next time. If you go off track again, we charge you.
productivity  statistics 
9 weeks ago by rybesh
Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.
During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book.

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.
statistics  machinelearning  datamining 
12 weeks ago by rybesh
Hierarchical modeling and analysis for spatial data - Sudipto Banerjee, Bradley P. Carlin, Alan E. Gelfand - Google Books
Among the many uses of hierarchical modeling, their application to the statistical analysis of spatial and spatio-temporal data from areas such as epidemiology and environmental science has proven particularly fruitful. Yet to date, the few books that address the subject have been either too narrowly focused on specific aspects of spatial analysis, or written at a level often inaccessible to those lacking a strong background in mathematical statistics. Hierarchical Modeling and Analysis for Spatial Data is the first accessible, self-contained treatment of hierarchical methods, modeling, and data analysis for spatial and spatio-temporal data. Starting with overviews of the types of spatial data, the data analysis tools appropriate for each, and a brief review of the Bayesian approach to statistics, the authors discuss hierarchical modeling for univariate spatial response data, including Bayesian kriging and lattice (areal data) modeling. They then consider the problem of spatially misaligned data, methods for handling multivariate spatial responses, spatio-temporal models, and spatial survival models. The final chapter explores a variety of special topics, including spatially varying coefficient models.
bayes  space  temporality  modeling  statistics 
february 2012 by rybesh
Library Juice » Data Mining
Austin et al. point out that the statistical methods that are at the heart of data mining are not able to distinguish real from spurious associations. Data mining employs the automated examination of enormous bodies of data. Its usefulness is thought to be proportional to the size of the data set that it collates; however, as the data set becomes larger and as the number of attributes that serve as potential relata increases, the number of potential relationships increases exponentially. Importantly, the number of spurious associations also increases. With enough data, no significance test will be stringent enough to provide assurance against the kind of results found in Austin et al. What is needed, according to Austin et al., is a “pre-specified plausible hypothesis.” For statistical analysis to be useful, the researcher must begin with a hypothesis, preferably a plausible one, if the research is to be valuable.

What exactly is a pre-specified plausible hypothesis and how can we generate it if data mining can’t do that for us? The question was posed some sixty years ago by the philosopher Nelson Goodman using different terms: Goodman believed that a critical question for epistemology was to distinguish between “projectible and non-projectible hypotheses.” One can more or less replace “pre-specified plausible hypothesis” with Goodman’s term “projectible hypothesis.” According to Goodman, when we seek to understand what hypothesis is (or is not) projectible, we do not come to the problem “empty-headed but with some stock of knowledge” which we use to determine what is (or is not) projectible. Projectible hypotheses will be those which do not conflict with other hypotheses that have been supported in the past. They will commonly use the same terminology of previously supported hypotheses. The terminology appearing in the hypotheses will have become “entrenched” in the language. This goes a long distance toward explaining why we don’t find the link between one’s astrological sign and medical conditions plausible. Twenty-first century Western medicine is not accustomed to linking astrological signs to ailments and so must find any hypothesis that does so implausible.

If Goodman is correct, then data mining is of little use without an historical understanding of the field of science to which the data pertains.

...

Here, we have another argument for allocating library resources to pay for librarians with deep subject expertise. As e-science develops, vendors will make more and more data sets available, regardless of their actual worth to researchers. To effectively choose the data sets that are of value, librarians must have a thorough understanding of the research needs of their patrons. To do this, they must have a deep understanding of the field. Unfortunately, with the excitement swirling around e-science, the mere access to large data sets threatens to become the be-all and end-all in collection management. If we aren’t careful, we may find ourselves with mountains of data from which everything and nothing can be concluded.
datamining  statistics  knowledge  digitalhumanities  libraries  epistemology 
february 2012 by rybesh
Statistics 110: Introduction to Probability
Statistics 110 (Introduction to Probability), taught at Harvard University by Joe Blitzstein in Fall 2011. Lecture videos, homework, review material, practice exams, and a large collection of practice problems with detailed solutions are provided. This course is an introduction to probability as a language and set of tools for understanding statistics, science, risk, and randomness. The ideas and methods are useful in statistics, science, philosophy, engineering, economics, finance, and everyday life. Topics include the following. Basics: sample spaces and events, conditional probability, Bayes’ Theorem. Random variables and their distributions: cumulative distribution functions, moment generating functions, expectation, variance, covariance, correlation, conditional expectation. Univariate distributions: Normal, t, Binomial, Negative Binomial, Poisson, Beta, Gamma. Multivariate distributions: joint, conditional, and marginal distributions, independence, transformations, Multinomial, Multivariate Normal. Limit theorems: law of large numbers, central limit theorem. Markov chains: transition probabilities, stationary distributions, reversibility, convergence.
statistics  education 
january 2012 by rybesh
Detecting Novel Associations in Large Data Sets
Imagine a data set with hundreds of variables, which may contain important, undiscovered relationships. There are tens of thousands of variable pairs—far too many to examine manually. If you do not already know what kinds of relationships to search for, how do you efficiently identify the important ones?
statistics  relationships  datamining 
december 2011 by rybesh
The Effects of Racial Animus on Voting: Evidence Using Google Search Data
Traditional surveys struggle to capture socially unacceptable attitudes such as racial animus. This paper uses Google searches including racially charged language as a proxy for a local area’s racial animus. I use the Google-search proxy, available for roughly 200 media markets in the United States, to reassess the impact of racial attitudes on voting for a black candidate in the United States. I compare an area’s racially charged search volume to its votes for Barack Obama, the 2008 black Democratic presidential candidate, controlling for its votes for John Kerry, the 2004 white Democratic presidential candidate. Other studies using a similar empirical specification and standard state-level survey measures of racial attitudes yield little evidence that racial animus had a major impact in recent U.S. elections. Using the Google-search proxy, I find significant and robust effects in the 2008 presidential election. The estimates imply that racial animus in the United States cost Obama three to five percentage points in the national popular vote in the 2008 election.
statistics  socialscience  methods  search 
november 2011 by rybesh
Bayesian statistics - Scholarpedia
Bayesian statistics is a system for describing epistemological uncertainty using the mathematical language of probability. In the 'Bayesian paradigm,' degrees of belief in states of nature are specified; these are non-negative, and the total belief in all states of nature is fixed to be one. Bayesian statistical methods start with existing 'prior' beliefs, and update these using data to give 'posterior' beliefs, which may be used as the basis for inferential decisions.
bayes  statistics 
september 2011 by rybesh
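The prior-to-posterior update described above can be sketched for a finite set of states of nature. This is only an illustration: the `posterior` function name and the coin example are assumptions of the sketch, not taken from the Scholarpedia article.

```python
def posterior(prior, likelihood):
    """Update beliefs over states of nature after observing data.

    prior:      dict mapping state -> prior belief (non-negative, sums to one)
    likelihood: dict mapping state -> P(observed data | state)
    """
    unnormalized = {state: prior[state] * likelihood[state] for state in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed data has zero probability under every state")
    # Renormalize so the total belief over all states is again one.
    return {state: weight / total for state, weight in unnormalized.items()}

# Hypothetical example: is a coin fair, or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}
likelihood_of_heads = {"fair": 0.5, "biased": 0.8}
beliefs = posterior(prior, likelihood_of_heads)
```

Feeding `beliefs` back in as the prior for the next observation gives sequential updating, one observation at a time.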
pandas: a python data analysis library — pandas v0.4.0dev documentation
pandas is a python package providing convenient data structures for time series, cross-sectional, or any other form of “labeled” data, with tools for building statistical and econometric models.
python  statistics  dataprocessing  analysis 
august 2011 by rybesh
ScalaNLP
ScalaNLP is a collection of libraries for Natural Language Processing, Machine Learning, and Statistics.
scala  nlp  linearalgebra  statistics 
august 2011 by rybesh
MADlib
MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
database  analytics  datamining  statistics  machinelearning  sql 
july 2011 by rybesh
Home page for the book, "Bayesian Data Analysis"
This book is intended to have three roles and to serve three associated audiences: an introductory text on Bayesian inference starting from first principles, a graduate text on effective current approaches to Bayesian modeling and computation in statistics and related fields, and a handbook of Bayesian methods in applied statistics for general users of and researchers in applied statistics.
bayes  statistics  data  analysis 
june 2011 by rybesh
Christopher M. Bishop: Pattern Recognition and Machine Learning
This leading textbook provides a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of pattern recognition or machine learning concepts is assumed. This is the first machine learning textbook to include a comprehensive coverage of recent developments such as probabilistic graphical models and deterministic inference methods, and to emphasize a modern Bayesian perspective. It is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. This hard cover book has 738 pages in full colour, and there are 431 graded exercises (with solutions available below). Extensive support is provided for course instructors.
machinelearning  books  patterns  statistics  datamining 
june 2011 by rybesh
Google Books: American English (155 billion words)
This interface allows you to search the Google Books data in many ways that are much more advanced than what is possible with the simple Google Books interface. You can search by word, phrase, substring, lemma, part of speech, synonyms, and collocates (nearby words). You can copy the data to other applications for further analysis, which you can't do with the regular Google Books interface. And you can quickly and easily compare the data in two different sections of the corpus (for example, adjectives describing women or art or music in the 1960s-2000s vs the 1870s-1910s).
american  books  corpus  data  statistics  language 
may 2011 by rybesh
CRAN - Package SPARQL
Load SPARQL result table from an end-point as a data.frame
sparql  R  tools  statistics  visualization  RDF 
may 2011 by rybesh
Using Graphs Instead of Tables
The extra work required in producing graphs is rewarded by greatly enhanced presentation and communication of empirical results.
charts  graphics  statistics  visualization 
march 2011 by rybesh
RStudio
RStudio™ is a new integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R.
statistics  tools 
march 2011 by rybesh
Deducer - A graphical data analysis system for use with JGR - RForge.net
An intuitive, cross-platform graphical data analysis system. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing.
R  statistics  tools 
february 2011 by rybesh
Daisy Zhe Wang: BayesStore
BayesStore is a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. BayesStore represents model and evidence data as relational tables; implements inference algorithms efficiently in SQL; adds probabilistic relational operators to the query engine; optimizes queries with both relational and inference operators. The design goals of BayesStore are: (1) to be able to support efficient query processing over different models compared to the off-the-shelf machine learning libraries; (2) to be able to support extensible API for plugging in new models and inference algorithms; and (3) to be able to scale up to very large data sets.
statistics  bayes  database  machinelearning 
january 2011 by rybesh
Tahir Hemphill
The Hip-Hop Word Count is a searchable ethnographic database built from the lyrics of over 40,000 Hip-Hop songs from 1979 to present day.
hiphop  digitalhumanities  statistics 
december 2010 by rybesh
ARCADE: Literature, the Humanities, and the World
...digital media and huge databases have enormous potential for supporting, preserving, and making available for study the kinds of underground knowledges and cultural productions outside the sphere of mainstream print that you're concerned about. This is the insurgent potential of the Internet and digital media--they can bypass established methods of fixation and legitimation of cultural products. But in academia these are subjects of interest to humanists--and sociologists and anthropologists. By contrast, when true disciplinary outsiders like Jean-Baptiste Michel and his team enter the arena of cultural history and cultural studies from the side of science and engineering, they must be looking to legitimate themselves by proving that their approach "works" for subjects that they imagine will be widely recognized as significant.
digitalhumanities  nlp  statistics  critique 
december 2010 by rybesh
edwired » Blog Archive » Visualizing Millions of Words
...the lesson that I would then focus on with my students is that what they are looking at in such a graph is nothing more or less than the frequency with which a word is used in books (and only books) published over the centuries. While such frequencies do reflect something, it is not clear from one graph just what that something is. So instead of an answer, a graph like this one is a doorway that leads to a room filled with questions, each of which must be answered by the historian before he or she knows something worth knowing.
digitalhumanities  nlp  statistics 
december 2010 by rybesh
Works Cited: Google Books Ngrams and the number of words for "snow"
There's a certain Words For Snowism in the online Google Books Ngrams tool, the suggestion that the more frequently a word is used, the more important it is in a collective unconscious of which the Google Books data set serves as a convenient index. This importance is not the same thing as significance, in the sense of significant digits or statistical significance; it's not the difference that makes a difference, but rather a psychologized importance--attachment, cathexis. Which is really kind of garbage.
nlp  digitalhumanities  statistics  critique 
december 2010 by rybesh
tm - Text Mining Package
tm (shorthand for Text Mining Infrastructure in R) provides a framework for text mining applications within R.

The tm package offers functionality for managing text documents, abstracts the process of document manipulation, and eases the use of heterogeneous text formats in R. The package has integrated database backend support to minimize memory demands. Advanced metadata management is implemented for collections of text documents to ease the use of large, metadata-enriched document sets.
R  textmining  datamining  nlp  tools  statistics 
october 2010 by rybesh
Piwik - Web analytics - Open source
Piwik is a downloadable, open source (GPL licensed) real time web analytics software program. It provides you with detailed reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages… and so much more.

Piwik aims to be an open source alternative to Google Analytics.
analytics  web  opensource  statistics 
october 2010 by rybesh
Journal of Statistical Software — Show
This user guide describes a Python package, PyMC, that allows users to efficiently code a probabilistic model and draw samples from its posterior distribution using Markov chain Monte Carlo techniques.
statistics  tools  python 
august 2010 by rybesh
Training Examples Q&A - machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization
Where data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!
ai  machinelearning  nlp  textanalysis  ir  datamining  search  statistics  infoviz  reference 
june 2010 by rybesh
Signs of Neanderthals Mating With Humans - NYTimes.com
"...the statistical insights, however informative, do not have the solidity of an archaeological fact."
epistemology  statistics  facts  history  archaeology 
may 2010 by rybesh
ScapeToad - cartogram software by the Choros laboratory
ScapeToad uses the Gastner/Newman [2004] diffusion-based algorithm to adapt map surfaces to user-defined variables without altering their topological relations.
cartography  maps  statistics  tools 
february 2010 by rybesh
UMBEL Ontology Documentation - umbel:withAlignment
umbel:withAlignment is used to reify a umbel:isAligned or a umbel:linksConcept property to a calculated or estimated overlap percentage value between the two classes (sets).
semweb  ontology  vocabulary  statistics 
august 2009 by rybesh
Maximum Entropy (GA) Model Optimization Package
Maximum entropy (aka logistic regression) models are very popular, especially in natural language processing. The software here is an implementation of maximum likelihood and maximum a posteriori optimization of the parameters of these models. The algorithms used are much more efficient than the iterative scaling techniques used in almost every other maxent package out there.
research  tools  nlp  statistics  machinelearning  ocaml  logreg  maxent 
august 2009 by rybesh
“seeing” the Web and a Karl Pearson citation
Over the last couple of years, the social sciences have been increasingly interested in using computer-based tools to analyze the complexity of the social ant farm that is the Web. Issuecrawler was one of the first of such tools and today researchers are indeed using very sophisticated pieces of software to “see” the Web. Sciences-Po, one of these rather strange French institutions that were founded to educate the elite but which now have to increasingly justify their existence by producing research, has recently hired Bruno Latour to head their new médialab, which will most probably head into that very direction. Given Latour’s background (and the fact that Paul Girard, a very competent former colleague at my lab, heads the R&D department), this should be really very interesting. I do hope that there will be occasion to tackle the most compelling methodological question when it comes to the application of computers (or mathematics in general) to analyzing human life, which is beautifully framed in a rather reluctant statement from 1889 by Karl Pearson, a major figure in the history of statistics:

“Personally I ought to say that there is, in my own opinion, considerable danger in applying the methods of exact science to problems in descriptive science, whether they be problems of heredity or of political economy; the grace and logical accuracy of the mathematical processes are apt to so fascinate the descriptive scientist that he seeks for sociological hypotheses which fit his mathematical reasoning and this without first ascertaining whether the basis of his hypotheses is as broad as that human life to which the theory is to be applied.” cit. in. Stigler, Stephen M.: The History of Statistics. Harvard University Press, 1990 p. 304
actor-network_theory  epistemolgy  network_theory  statistics  from google
july 2009 by rybesh
Evidence Based Scheduling - Joel on Software
You gather evidence, mostly from historical timesheet data, that you feed back into your schedules. What you get is not just one ship date: you get a confidence distribution curve, showing the probability that you will ship on any given date.
statistics  business  planning  development  software  management 
july 2009 by rybesh
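A rough Monte Carlo sketch of how historical timesheet data could yield a distribution of ship dates rather than a single date. It assumes "velocity" means estimated hours divided by actual hours for a past task; the function names and the numbers are invented for the example and are not taken from the article or from any product.

```python
import random

def simulate_total_hours(estimates, past_velocities, trials=10000, seed=0):
    """Simulate many possible futures for a list of task estimates.

    Each trial divides every estimate by a randomly chosen historical
    velocity (estimated hours / actual hours) and sums the results.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    totals = [
        sum(est / rng.choice(past_velocities) for est in estimates)
        for _ in range(trials)
    ]
    return sorted(totals)

def probability_done_within(totals, hours):
    """Fraction of simulated futures that finish within the given hours."""
    return sum(t <= hours for t in totals) / len(totals)

# Hypothetical inputs: three remaining tasks and three past velocities.
totals = simulate_total_hours([4, 8, 16], [1.0, 0.5, 0.8])
chance_by_40_hours = probability_done_within(totals, 40)
```

Evaluating `probability_done_within` over a range of hour budgets traces out the confidence curve the description refers to.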
How to Write a Spelling Corrector
A toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.
python  nlp  howto  statistics 
april 2008 by rybesh
UNdata
An easy to use data access system was developed that meets UNSD’s vision of providing an integrated information resource with current, relevant and reliable statistics free of charge to the global community.
statistics  database  opendata  demographics  development  economics  analysis  archives  government 
march 2008 by rybesh
Is the Tipping Point Toast? -- Duncan Watts
The ultimate irony of Watts's research is that, if you really buy it, the most effective way to pitch your idea is ... mass marketing.
marketing  research  social  networking  advertising  brands  communication  culture  statistics 
january 2008 by rybesh
Phil Spector's Introduction to R
All the things you'll ever need to do with R that you'd otherwise spend hours trying to figure out.
R  reference  statistics  howto 
october 2007 by rybesh
wikirage: What's hot now on wikipedia
This site lists the pages in Wikipedia which are receiving the most edits per unique editor over various periods of time.
wiki  collaboration  statistics  editing  research  tools 
september 2007 by rybesh
//re:digg\\ » Blog Archive » *New* Sections & Data Mining
"...the notion of quantifying a community’s potential bias is nothing short of remarkable."
journalism  statistics  nlp  quantitative  methods  bias  datamining  election 
march 2007 by rybesh
クチコミ評判検索 β版
Japanese "blog buzz" index which analyzes blog content to track word of mouth in different domains, from business to fashion to sports.
blog  nlp  statistics  marketing  japan 
february 2007 by rybesh
Manifold - Wikipedia, the free encyclopedia
A manifold is an abstract mathematical space in which every point has a neighborhood which resembles Euclidean space, but in which the global structure may be more complicated.
math  machinelearning  statistics 
november 2006 by rybesh
Wikistats
Discusses stats.wikimedia.org, which no longer seems to be running.
wiki  research  statistics  tools  cs294project 
november 2006 by rybesh
Wikistats/Measuring Article Quality
This article discusses measuring Wikipedia quality in conceptual terms.
wiki  quality  statistics 
november 2006 by rybesh
http://tools.wikimedia.de/~interiot/cgi-bin/Tool1/wannabe_kate
Of the Wikipedia edit counting tools I've tried, this one seems to work the best.
wiki  research  statistics  tools  cs294project 
november 2006 by rybesh
Wikipedia:WikiProject edit counters
Overview of the many various tools available for measuring the number of Wikipedia edits a particular user has made.
wiki  research  statistics  tools  cs294project 
november 2006 by rybesh
WikiCharts — Top 100 — 11/2006
This tool shows the articles from the English Wikipedia that are viewed most.
wiki  research  statistics  tools  cs294project 
november 2006 by rybesh
Wikimedia Toolserver
This server fosters the development and continuing operation of software tools for the analysis and improvement of the free content of the Wikimedia projects.
wiki  research  statistics  tools  cs294project 
november 2006 by rybesh
Category:Research - Meta
More Wikipedia statistics. These statistics are less general and more focused on quantitative sociological research.
wiki  research  statistics  cs294project 
november 2006 by rybesh
Category:Wikipedia statistics
Index of pages listing various statistics that have been collected about Wikipedia use.
wiki  research  statistics  cs294project 
november 2006 by rybesh
Statistical Data Mining Tutorials
A set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.
machinelearning  reference  statistics  howto 
september 2006 by rybesh
Topic Modeling Toolbox
Tools for entity recognition, extraction and linking.
nlp  tools  research  statistics  datamining  analysis  matlab 
july 2006 by rybesh
ahhhhhh visualization
A dot plot visualization that conveys the number of results obtained from Google search queries for words of the form a{n}h{m}.
search  statistics  language  infoviz  technology 
july 2006 by rybesh
Quantitative Research Methods for Information Systems and Management
Quantitative methods for data collection and analysis. Research design. Conceptualization, operationalization, measurement.
courses  fall2006  berkeley  SoI  quantitative  methods  statistics  current 
july 2006 by rybesh
Where are we? Rise of the Videonet
Online video stats and the emergence of genres.
web  video  statistics  genre 
june 2006 by rybesh
Million Dollar Blocks
New York City and Wichita, KS, are among the many cities in the United States in which the state regularly spends more than one million dollars to incarcerate prisoners who live within a single census block.
infoviz  maps  statistics  government  prison  economics 
june 2006 by rybesh
R/SPlus - Python Interface
This allows Python programmers unfamiliar with the syntax of R to easily use its functionality and vice versa.
python  R  statistics 
april 2006 by rybesh
UC DATA
UC DATA is UC Berkeley's principal archive of computerized social science and health statistics information and is a part of the University's Survey Research Center.
berkeley  statistics  quantitative  research  reference  search 
april 2006 by rybesh
MediaPost Publications - Points North: Consumers Crave Web-Based TV - 02/15/2006
While 25 percent of Internet users are interested in watching downloaded TV shows and movies on their PCs, 38 percent expressed interest in watching that same video on their TVs.
tv  web  video  consumer  statistics  timetags 
february 2006 by rybesh
Research Methods Knowledge Base
A comprehensive web-based textbook that addresses all of the topics in a typical introductory undergraduate or graduate course in social research methods.
reference  social  research  methods  statistics 
february 2006 by rybesh
S Routines for Social Network Analysis in the R Environment
This is a fully documented collection of R routines for social network analysis; utilities included range from hierarchical Bayesian modeling of informant accuracy to logistic network regression.
social  networking  analysis  tools  R  statistics 
december 2005 by rybesh
Statistical Techniques for Audio and Video Processing
The topics include audio and video object recognition, speech recognition, restoration of corrupted video and audio data, and object discovery in audio and video streams.
courses  statistics  audio  video  contentanalysis 
december 2005 by rybesh
Statnet
Statnet is a software package for social network analysis based on recent advances in the statistical modeling of random graphs. Runs in R.
statistics  social  networking  analysis  tools 
december 2005 by rybesh
Octave
GNU Octave is a high-level language, primarily intended for numerical computations.
analysis  math  unix  osx  tools  statistics 
december 2005 by rybesh
DMA|Stat Spring 2005
By adopting the language of software, algorithms and databases (in short, the language of computer science), it is possible to characterize works of "new media."
statistics  newmedia  courses 
december 2005 by rybesh
Parsing the State of the Union
To search for your own words or phrases, or to compare the occurrence of two words in Bush’s State of the Union Addresses, please try the State of the Union Parsing Tool.
politics  political  media  analysis  language  infoviz  speech  statistics  search 
december 2005 by rybesh
WWW2006 Workshop - Logging Traces of Web Activity: The Mechanics of Data Collection
This one day workshop will examine the trade-offs and challenges inherent to the different logging approaches and provide workshop attendees the opportunity to discuss both previous data collection experiences and upcoming challenges.
web  conference  2006  workshop  statistics  datamining 
december 2005 by rybesh
Onlife
Onlife is an application for Mac OS X that observes your every interaction and then creates a personal shoebox of all the web pages you visit, emails you read, documents you write and much more.
attention  statistics  search  tools  osx 
december 2005 by rybesh
Designated Emphasis in Communication, Computation and Statistics
The DE in Communication, Computation and Statistics enables specialized, multi-disciplinary training and research opportunities in various emerging areas of information technology.
berkeley  statistics  communication  cs  courses 
november 2005 by rybesh
Quantitative/Statistical Research Methods in Social Sciences -- Sociology (SOCIOL) C271D
Selected topics in quantitative/statistical methods of research in the social sciences and particularly in sociology.
sociology  statistics  quantitative  methods  berkeley  spring2006  courses 
november 2005 by rybesh
Enthought Python
A Python distribution that comes with even more useful capabilities already installed and ready for use.
python  windows  science  math  statistics  datamining  tools  opensource  code 
october 2005 by rybesh
The R Project for Statistical Computing
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
code  datamining  language  math  opensource  statistics  tools 
october 2005 by rybesh
RPy
RPy is a very simple, yet robust, Python interface to the R statistical programming language.
python  statistics  code  math 
october 2005 by rybesh
Orange
Orange is component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques.
machinelearning  classification  code  datamining  python  opensource  tools  nlp  statistics 
october 2005 by rybesh
Data Mining in Python
This is a collection of libraries useful for machine learning and data mining.
python  statistics  machinelearning  nlp  code  opensource  datamining 
october 2005 by rybesh
stats.py
A collection of statistical functions, ranging from descriptive statistics (mean, median, histograms, variance, skew, kurtosis, etc.) to inferential statistics (t-tests, F-tests, chi-square, etc.).
python  statistics  opensource  code 
october 2005 by rybesh
SciPy Scientific Tools for Python
SciPy includes modules for graphics and plotting, optimization, integration, special functions, signal and image processing, genetic algorithms, ODE solvers, and others.
python  opensource  code  science  statistics  tools  math 
october 2005 by rybesh

related tags

actor-network_theory  advertising  ai  american  analysis  analytics  archaeology  archives  attention  audio  bayes  berkeley  bias  blog  books  brands  business  cartography  charts  classification  code  collaboration  communication  conference  consumer  contentanalysis  corpus  courses  critique  cs  cs294project  culture  current  data  database  datamining  dataprocessing  demographics  description  development  digitalhumanities  economics  editing  education  election  epistemolgy  epistemology  facts  fall2006  genre  government  graphics  hiphop  history  howto  image  infoviz  inls520  ir  japan  journalism  knowledge  language  libraries  linearalgebra  logreg  machinelearning  management  maps  marketing  math  matlab  maxent  media  metadata  methods  modeling  msmdx  music  networking  network_theory  newmedia  nlp  ocaml  ontology  opendata  opensource  osx  p2p  patterns  planning  playlist  political  politics  prison  productivity  python  quality  quantitative  questions  r  RDF  realtime  reference  relationships  research  scala  science  search  semweb  social  socialscience  sociology  software  SoI  space  sparql  speech  spring2006  sql  standard  statistics  technology  temporality  textanalysis  textmining  timetags  tools  tv  unix  urn:asin:0062731025  urn:asin:0486240614  urn:asin:158488388X  video  visualization  vocabulary  web  webinfo  wiki  windows  wishlist  workshop 
