Evaluating the Design of the R Language: Objects and Functions For Data Analysis
4 weeks ago by cshalizi
"R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features."
There's something a bit odd about evaluating a language designed for statistical computing on a set of benchmarks which do not include statistical problems...
R
programming
programming_languages
computational_statistics
how_outsiders_see_us
to_teach:statcomp
via:aaron_clauset
4 weeks ago by cshalizi
The benchden Package: Benchmark Densities for Nonparametric Density Estimation
7 weeks ago by cshalizi
"This article describes the benchden package which implements a set of 28 example densities for nonparametric density estimation in R. In addition to the usual functions that evaluate the density, distribution and quantile functions or generate random variates, a function designed to be specifically useful for larger simulation studies has been added. After describing the set of densities and the usage of the package, a small toy example of a simulation study conducted using the benchden package is given."
to:NB
computational_statistics
R
density_estimation
nonparametrics
to_teach:undergrad-ADA
7 weeks ago by cshalizi
Graphical Models with R
12 weeks ago by cshalizi
"Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences. Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years. In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software. This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages. In addition, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R. Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks and modeling high dimensional data."
to:NB
books:noted
R
statistics
graphical_models
lauritzen.steffen
computational_statistics
12 weeks ago by cshalizi
A Multi-Language Computing Environment for Literate Programming and Reproducible Research
february 2012 by cshalizi
"We present a new computing environment for authoring mixed natural and computer language documents. In this environment a single hierarchically-organized plain text source file may contain a variety of elements such as code in arbitrary programming languages, raw data, links to external resources, project management data, working notes, and text for publication. Code fragments may be executed in situ with graphical, numerical and textual output captured or linked in the file. Export to LaTeX, HTML, LaTeX Beamer, DocBook and other formats permits working reports, presentations and manuscripts for publication to be generated from the file. In addition, functioning pure code files can be automatically extracted from the file. This environment is implemented as an extension to the Emacs text editor and provides a rich set of features for authoring both prose and code, as well as sophisticated project management capabilities."
paper_writing
programming
R
latex
to_read
february 2012 by cshalizi
Wiley: Mathematical Statistics with Resampling and R
december 2011 by cshalizi
"Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. This groundbreaking book shows how to apply modern resampling techniques to mathematical statistics. Extensively class-tested to ensure an accessible presentation, Mathematical Statistics with Resampling and R utilizes the powerful and flexible computer language R to underscore the significance and benefits of modern resampling techniques."
--- This might be a good book for a baby stats. class; but even if it's a _great_ book, how on Earth am I supposed to justify asking students to spend $130 for it?
books:noted
statistics
bootstrap
R
december 2011 by cshalizi
R Graph Gallery - Donations Welcome - Romain Francois, Professional R Enthusiast
october 2011 by cshalizi
The R Graph Gallery is an under-utilized resource, and sending a little money Romain's way is not a bad thing.
R
programming
to_teach:statcomp
statistics
visual_display_of_quantitative_information
october 2011 by cshalizi
US Census Spatial and Demographic Data in R: The UScensus2000 Suite of Packages
december 2010 by cshalizi
"The US Decennial Census is arguably the most important data set for social science research in the United States. The UScensus2000 suite of packages allows for convenient handling of the 2000 US Census spatial and demographic data. The goal of this article is to showcase the UScensus2000 suite of packages for R, to describe the data contained within these packages, and to demonstrate the helper functions provided for handling this data. The UScensus2000 suite is comprised of spatial and demographic data for the 50 states and Washington DC at four different geographic levels (block, block group, tract, and census designated place). The UScensus2000 suite also contains a number of functions for selecting and aggregating specific geographies or demographic information such as metropolitan statistical areas, counties, etc. ... This article will provide the necessary background for working with this data set, helper functions, and finish with an applied spatial statistics example."
data_sets
census
R
to_teach:undergrad-ADA
december 2010 by cshalizi
Environmental Modelling & Software : NetLogo meets R: Linking agent-based models with a toolbox for their analysis
october 2010 by cshalizi
"NetLogo is a software platform for agent-based modelling that is increasingly used in ecological and environmental modelling. So far, for comprehensive analyses of agent-based models (ABMs) implemented in NetLogo, results needed to be written to files and evaluated by using external software, for example R. Ideally, however, it would be possible to call any R function from within a NetLogo program. This would allow sophisticated interactive statistical analysis of model structure and dynamics, using R functions and packages for generating certain statistical distributions and experimental design, and for implementing complex descriptive submodels within ABMs. Here we present an R extension of NetLogo. It consists of only nine new NetLogo primitives for sending data between NetLogo and R and for calling R functions (six additional primitives for debugging). We demonstrate the usage of the R extension with three short examples."
R
agent-based_models
october 2010 by cshalizi
Unit Testing in R: The Bare Minimum
august 2010 by cshalizi
I hesitate about the teaching tag, this seems quite clunky --- but perhaps it's not that bad when you try it.
via:arsyed
programming
R
to_teach:data-mining
to_teach:statcomp
august 2010 by cshalizi
depmixS4: An R Package for Hidden Markov Models
august 2010 by cshalizi
"depmixS4 implements a general framework for defining and estimating dependent mixture models in the R programming language. This includes standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models. The models can be fitted on mixed multivariate data with distributions from the glm family, the (logistic) multinomial, or the multivariate normal distribution. Other distributions can be added easily, and an example is provided with the exgaus distribution. Parameters are estimated by the expectation-maximization (EM) algorithm or, when (linear) constraints are imposed on the parameters, by direct numerical optimization with the Rsolnp or Rdonlp2 routines."
statistics
computational_statistics
R
markov_models
mixture_models
to_teach:data-mining
to_teach:complexity-and-inference
to_teach:undergrad-ADA
august 2010 by cshalizi
The SHOGUN Machine Learning Toolbox
july 2010 by cshalizi
C++ library with R interface, supposedly good for Really Big data. Consider for 350?
machine_learning
computational_statistics
programming
to_read
to_teach:data-mining
R
c++
july 2010 by cshalizi
Using R for Cross-Cultural Research (Dow)
november 2009 by cshalizi
Describes working with the standard cross-cultural sample in R. TODO: track down the actual file! TODO: think about devising suitable examples/problems for data mining.
anthropology
R
data_sets
via:nikete
to_teach:data-mining
track_down_references
to_teach:undergrad-ADA
november 2009 by cshalizi
Powell's Books - R in a Nutshell (In a Nutshell) by Joseph Adler
november 2009 by cshalizi
About 1/2 R as a programming language, and 1/2 short explanations of how to do particular analyses in R. I've now used it with decent success as a supplemental textbook.
R
programming
statistics
books:recommended
november 2009 by cshalizi
R style guide
october 2009 by cshalizi
Reasonable though not mandatory.
R
programming
via:jhofman
to_teach:data-mining
to_teach:undergrad-ADA
to_teach:statcomp
october 2009 by cshalizi
Diffusion Maps @ CRAN
october 2009 by cshalizi
Diffusion maps in R. (How handy that the TA for the class wrote the package...)
R
richards.joey
diffusion_maps
manifold_learning
dimension_reduction
to_teach:data-mining
kith_and_kin
spectral_clustering
lee.ann
october 2009 by cshalizi
[0909.1234] High-dimensional Graphical Model Search with gRapHD R Package
september 2009 by cshalizi
"This paper presents the R package gRapHD for efficient selection of high-dimensional undirected graphical models. The package provides tools for selecting trees, forests and decomposable models minimizing information criteria such as AIC or BIC, and for displaying the independence graphs of the models. It has also some useful tools for analysing graphical structures. It supports the use of discrete, continuous, or both types of variables."
statistics
R
graphical_models
september 2009 by cshalizi
[0908.3817] Learning Bayesian Networks with the bnlearn Package
august 2009 by cshalizi
To read; possibly useful for the end of 350 when I want to talk about causality?
to_read
graphical_models
to_teach:data-mining
R
statistics
machine_learning
to_teach:undergrad-ADA
august 2009 by cshalizi
Choosing Your Workflow Applications
august 2009 by cshalizi
We should consider distributing this to the incoming graduate students. (Except we'd need to make it clear that using Word is NOT ACCEPTABLE.)
paper_writing
productivity_software
workflow
advice
healy.kieran
sweave
R
emacs
version_control
latex
to_teach
to_teach:undergrad-research
to_teach:ADA
august 2009 by cshalizi
Sweave
august 2009 by cshalizi
"Sweave is a tool that allows to embed the R code for complete data analyses in latex documents. The purpose is to create dynamic reports, which can be updated automatically if data or analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final latex document. The report can be automatically updated if data or analysis change, which allows for truly reproducible research."
sweave
R
latex
paper_writing
programming
via:jhofman
where_have_you_been_all_my_life
data_analysis
august 2009 by cshalizi
ReadMe: Software for Automated Text Analysis
june 2009 by cshalizi
"The ReadMe software package for R takes as input a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, movie reviews, etc.), a categorization scheme chosen by the user (e.g., ordered positive to negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories), and a small subset of text documents hand classified into the given categories. If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents within each of the given categories among those not hand coded. ReadMe computes quantities of interest to the scientific community based on the distribution within categories but does so by skipping the more error prone intermediate step of classifying individual documents. Other procedures are also included to make processing text easy."
to_teach:data-mining
text_mining
content_analysis
R
software
linguistics
statistics
via:chl
king.gary
june 2009 by cshalizi
R Fundamentals and Programming Techniques (Lumley)
may 2009 by cshalizi
Very reasonable set of slides from Thomas Lumley. I wouldn't plan on actually using them in a course --- they don't quite fit my style --- but I would put them on a list of pointers for students.
statistics
programming
R
to_teach:data-mining
to_teach:complexity-and-inference
via:jhofman
to_teach:undergrad-ADA
may 2009 by cshalizi
CRAN - Package sspir
february 2009 by cshalizi
State-space modeling with linear/Gaussian state evolution and generalized linear models for the observations. Looks reasonable, but lacks a few refinements, such as diffuse initial conditions in the Kalman filter.
state-space_models
time_series
R
filtering
state_estimation
to_teach
february 2009 by cshalizi
The R-Perl Interface
february 2009 by cshalizi
"This package provides a bidirectional interface for calling R from Perl and Perl from R."
programming
R
statistics
perl
february 2009 by cshalizi
The R Language
february 2009 by cshalizi
Online HTML R documentation. Doesn't cover all the packages, but handy for linking to the basics.
R
statistics
february 2009 by cshalizi
The R Inferno
january 2009 by cshalizi
"If you are using R and you think you’re in hell, this is a map for you."
R
programming
to_teach:complexity-and-inference
to_teach:data-mining
via:jhofman
literary_homage
funny:academic
to_teach:undergrad-ADA
burns.patrick
aligheri.dante
to_teach:statcomp
january 2009 by cshalizi
Grammar of Graphics 2 (R)
january 2009 by cshalizi
Nice-looking graphics system for R; draft book and R package.
R
visual_display_of_quantitative_information
via:kjhealy
books:noted
january 2009 by cshalizi
Nonparametric Econometrics: The np Package - Hayfield and Racine, J. Stat. Soft.
october 2008 by cshalizi
Non- and semi-parametric kernel-estimation goodness. Definitely using this next time I teach data-mining.
to_teach:data-mining
kernel_methods
regression
density_estimation
computational_statistics
R
hayfield.tristen
racine.jeffrey
to_teach:undergrad-ADA
october 2008 by cshalizi
Charles Franklin - Lecture on Heteroskedastic Regression
october 2008 by cshalizi
Cute: parameterize the variance and then just do a bigger MLE.
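A rough sketch of what "a bigger MLE" means here, assuming Gaussian noise and a log-linear model for the variance. The log-linear form and the symbols x_i, z_i, beta and gamma are illustrative assumptions, not necessarily the lecture's notation:

```latex
% Assumed model: mean linear in x_i, log-variance linear in z_i
y_i = x_i^\top \beta + \epsilon_i, \qquad
\epsilon_i \sim \mathcal{N}(0, \sigma_i^2), \qquad
\log \sigma_i^2 = z_i^\top \gamma

% Joint log-likelihood, maximized over (\beta, \gamma) together
\ell(\beta, \gamma) = -\frac{n}{2}\log 2\pi
  - \frac{1}{2}\sum_{i=1}^{n} z_i^\top \gamma
  - \frac{1}{2}\sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 \exp\!\left(-z_i^\top \gamma\right)
```

If z_i contains only a constant, this reduces to the usual homoskedastic Gaussian likelihood. Otherwise the variance coefficients gamma are simply extra parameters in the same optimization.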
regression
R
heteroskedasticity
to_teach:data-mining
franklin.charles
statistics
estimation
october 2008 by cshalizi
Quick-R: Home Page
july 2008 by cshalizi
Well-designed introduction to R for people who know the usual commercial statistical packages.
R
teaching
via:fionajay
to_teach:data-mining
to_teach:undergrad-ADA
july 2008 by cshalizi
Relative Distribution Methods in the Social Sciences: Software (R)
november 2007 by cshalizi
R package for implementing relative distribution methods
computational_statistics
relative_distributions
to_teach:data-mining
to_teach:complexity-and-inference
morris.martina
handcock.mark
R
to_teach:undergrad-ADA
november 2007 by cshalizi