cshalizi + r   47

Evaluating the Design of the R Language: Objects and Functions For Data Analysis
"Risadynamiclanguageforstatisticalcomputingthatcombineslazy functional features and object-oriented programming. This rather unlikely lin- guistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features."

There's something a bit odd about evaluating a language designed for statistical computing on a set of benchmarks which do not include statistical problems...
R  programming  programming_languages  computational_statistics  how_outsiders_see_us  to_teach:statcomp  via:aaron_clauset 
4 weeks ago by cshalizi
The benchden Package: Benchmark Densities for Nonparametric Density Estimation
"This article describes the benchden package which implements a set of 28 example densities for nonparametric density estimation in R. In addition to the usual functions that evaluate the density, distribution and quantile functions or generate random variates, a function designed to be specifically useful for larger simulation studies has been added. After describing the set of densities and the usage of the package, a small toy example of a simulation study conducted using the benchden package is given."
to:NB  computational_statistics  R  density_estimation  nonparametrics  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
Graphical Models with R
"Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences.  Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years.  In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software.  This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages.  In addition, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R.  Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks and modeling high dimensional data."
to:NB  books:noted  R  statistics  graphical_models  lauritzen.steffen  computational_statistics 
12 weeks ago by cshalizi
A Multi-Language Computing Environment for Literate Programming and Reproducible Research
"We present a new computing environment for authoring mixed natural and computer language documents. In this environment a single hierarchically-organized plain text source file may contain a variety of elements such as code in arbitrary programming languages, raw data, links to external resources, project management data, working notes, and text for publication. Code fragments may be executed in situ with graphical, numerical and textual output captured or linked in the file. Export to LATEX, HTML, LATEX beamer, DocBook and other formats permits working reports, presentations and manuscripts for publication to be generated from the file. In addition, functioning pure code files can be automatically extracted from the file. This environment is implemented as an extension to the Emacs text editor and provides a rich set of features for authoring both prose and code, as well as sophisticated project management capabilities."
paper_writing  programming  R  latex  to_read 
february 2012 by cshalizi
Wiley: Mathematical Statistics with Resampling and R
"Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. This groundbreaking book shows how to apply modern resampling techniques to mathematical statistics. Extensively class-tested to ensure an accessible presentation, Mathematical Statistics with Resampling and R utilizes the powerful and flexible computer language R to underscore the significance and benefits of modern resampling techniques."

--- This might be a good book for a baby stats. class; but even if it's a _great_ book, how on Earth am I supposed to justify asking students to spend $130 for it?
books:noted  statistics  bootstrap  R 
december 2011 by cshalizi
R Graph Gallery - Donations Welcome - Romain Francois, Professional R Enthusiast
The R Graph Gallery is an under-utilized resource, and sending a little money Romain's way is not a bad thing.
R  programming  to_teach:statcomp  statistics  visual_display_of_quantitative_information 
october 2011 by cshalizi
US Census Spatial and Demographic Data in R: The UScensus2000 Suite of Packages
"The US Decennial Census is arguably the most important data set for social science research in the United States. The UScensus2000 suite of packages allows for convenient handling of the 2000 US Census spatial and demographic data. The goal of this article is to showcase the UScensus2000 suite of packages for R, to describe the data contained within these packages, and to demonstrate the helper functions provided for handling this data. The UScensus2000 suite is comprised of spatial and demographic data for the 50 states and Washington DC at four different geographic levels (block, block group, tract, and census designated place). The UScensus2000 suite also contains a number of functions for selecting and aggregating specific geographies or demographic information such as metropolitan statistical areas, counties, etc. ... This article will provide the necessary background for working with this data set, helper functions, and finish with an applied spatial statistics example."
data_sets  census  R  to_teach:undergrad-ADA 
december 2010 by cshalizi
Environmental Modelling & Software : NetLogo meets R: Linking agent-based models with a toolbox for their analysis
"NetLogo is a software platform for agent-based modelling that is increasingly used in ecological and environmental modelling. So far, for comprehensive analyses of agent-based models (ABMs) implemented in NetLogo, results needed to be written to files and evaluated by using external software, for example R. Ideally, however, it would be possible to call any R function from within a NetLogo program. This would allow sophisticated interactive statistical analysis of model structure and dynamics, using R functions and packages for generating certain statistical distributions and experimental design, and for implementing complex descriptive submodels within ABMs. Here we present an R extension of NetLogo. It consists of only nine new NetLogo primitives for sending data between NetLogo and R and for calling R functions (six additional primitives for debugging). We demonstrate the usage of the R extension with three short examples."
R  agent-based_models 
october 2010 by cshalizi
Unit Testing in R: The Bare Minimum
I hesitate about the teaching tag; this seems quite clunky --- but perhaps it's not that bad when you try it.
via:arsyed  programming  R  to_teach:data-mining  to_teach:statcomp 
august 2010 by cshalizi
depmixS4: An R Package for Hidden Markov Models
"depmixS4 implements a general framework for defining and estimating dependent mixture models in the R programming language. This includes standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models. The models can be fitted on mixed multivariate data with distributions from the glm family, the (logistic) multinomial, or the multivariate normal distribution. Other distributions can be added easily, and an example is provided with the exgaus distribution. Parameters are estimated by the expectation-maximization (EM) algorithm or, when (linear) constraints are imposed on the parameters, by direct numerical optimization with the Rsolnp or Rdonlp2 routines."
statistics  computational_statistics  R  markov_models  mixture_models  to_teach:data-mining  to_teach:complexity-and-inference  to_teach:undergrad-ADA 
august 2010 by cshalizi
The SHOGUN Machine Learning Toolbox
C++ library with R interface, supposedly good for Really Big data. Consider for 350?
machine_learning  computational_statistics  programming  to_read  to_teach:data-mining  R  c++ 
july 2010 by cshalizi
Using R for Cross-Cultural Research (Dow)
Describes working with the standard cross-cultural sample in R. TODO: track down the actual file! TODO: think about devising suitable examples/problems for data mining.
anthropology  R  data_sets  via:nikete  to_teach:data-mining  track_down_references  to_teach:undergrad-ADA 
november 2009 by cshalizi
Powell's Books - R in a Nutshell (In a Nutshell) by Joseph Adler
About 1/2 R as a programming language, and 1/2 short explanations of how to do particular analyses in R. I've now used it with decent success as a supplemental textbook.
R  programming  statistics  books:recommended 
november 2009 by cshalizi
[0909.1234] High-dimensional Graphical Model Search with gRapHD R Package
"This paper presents the R package gRapHD for efficient selection of high-dimensional undirected graphical models. The package provides tools for selecting trees, forests and decomposable models minimizing information criteria such as AIC or BIC, and for displaying the independence graphs of the models. It has also some useful tools for analysing graphical structures. It supports the use of discrete, continuous, or both types of variables."
statistics  R  graphical_models 
september 2009 by cshalizi
Choosing Your Workflow Applications
We should consider distributing this to the incoming graduate students. (Except we'd need to make it clear that using Word is NOT ACCEPTABLE.)
paper_writing  productivity_software  workflow  advice  healy.kieran  sweave  R  emacs  version_control  latex  to_teach  to_teach:undergrad-research  to_teach:ADA 
august 2009 by cshalizi
Sweave
"Sweave is a tool that allows to embed the R code for complete data analyses in latex documents. The purpose is to create dynamic reports, which can be updated automatically if data or analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final latex document. The report can be automatically updated if data or analysis change, which allows for truly reproducible research."
sweave  R  latex  paper_writing  programming  via:jhofman  where_have_you_been_all_my_life  data_analysis 
august 2009 by cshalizi
ReadMe: Software for Automated Text Analysis
"The ReadMe software package for R takes as input a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, movie reviews, etc.), a categorization scheme chosen by the user (e.g., ordered positive to negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories), and a small subset of text documents hand classified into the given categories. If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents within each of the given categories among those not hand coded. ReadMe computes quantities of interest to the scientific community based on the distribution within categories but does so by skipping the more error prone intermediate step of classifing individual documents. Other procedures are also included to make processing text easy."
to_teach:data-mining  text_mining  content_analysis  R  software  linguistics  statistics  via:chl  king.gary 
june 2009 by cshalizi
R Fundamentals and Programming Techniques (Lumley)
Very reasonable set of slides from Thomas Lumley. I wouldn't plan on actually using them in a course --- they don't quite fit my style --- but I would put them on a list of pointers for students.
statistics  programming  R  to_teach:data-mining  to_teach:complexity-and-inference  via:jhofman  to_teach:undergrad-ADA 
may 2009 by cshalizi
CRAN - Package sspir
State-space modeling with linear/Gaussian state evolution and generalized linear models for the observations. Looks reasonable, lacks a few improvements like diffuse initial conditions in the Kalman filter.
state-space_models  time_series  R  filtering  state_estimation  to_teach 
february 2009 by cshalizi
The R-Perl Interface
"This package provides a bidirectional interface for calling R from Perl and Perl from R."
programming  R  statistics  perl 
february 2009 by cshalizi
The R Language
Online HTML R documentation. Doesn't cover all the packages, but handy for linking to the basics.
R  statistics 
february 2009 by cshalizi
Grammar of Graphics 2 (R)
Nice-looking graphics system for R; draft book and R package.
R  visual_display_of_quantitative_information  via:kjhealy  books:noted 
january 2009 by cshalizi
Quick-R: Home Page
Well-designed introduction to R for people who know the usual commercial statistical packages.
R  teaching  via:fionajay  to_teach:data-mining  to_teach:undergrad-ADA 
july 2008 by cshalizi

