R, Julia and genome wide selection
5 weeks ago by rahuldave
(This article was first published on Quantum Forest » rblogs, and kindly contributed to R-bloggers)
— “You are a pussy” emailed my friend.
— “Sensu cat?” I replied.
— “No. Sensu chicken” blurbed my now ex-friend.
What was this about? He read my post on R, Julia and the shiny new thing, which prompted him to assume that I was the proverbial old dog unwilling (or was it unable?) to learn new tricks. (Incidentally, with friends like this who needs enemies? Hi, Gus.)
Having a look at different—statistical—horses (Photo: Luis).
I decided to tackle a small—but hopefully useful—piece of code: fitting/training a Genome Wide Selection model, using the Bayes A approach put forward by Meuwissen, Hayes and Goddard in 2001. In that approach the breeding values of the individuals (response) are expressed as a function of a very large number of random predictors (2000, our molecular markers). The dataset (csv file) is a simulation of 2000 bi-allelic markers (aa = 0, Aa = 1, AA = 2) for 250 individuals, followed by the phenotypes (column 2001) and breeding values (column 2002). These models are frequently adjusted using MCMC.
In 2010 I attended this course in Ames, Iowa where Rohan Fernando passed us the following R code (pretty much a transliteration from C code; notice the trailing semicolons, for example):
nmarkers = 2000; # number of markers
startMarker = 1981; # set to 1 to use all
numiter = 2000; # number of iterations
vara = 1.0/20.0;
# input data
data = matrix(scan("trainData.out0"),ncol=nmarkers+2,byrow=TRUE);
nrecords = dim(data)[1];
beg = Sys.time()
# x has the mean followed by the markers
x = cbind(1,data[,startMarker:nmarkers]);
y = data[,nmarkers+1];
a = data[,nmarkers+2];
# inital values
nmarkers = nmarkers - startMarker + 1;
mean2pq = 0.5; # just an approximation
scalea = 0.5*vara/(nmarkers*mean2pq); # 0.5 = (v-2)/v for v=4
size = dim(x)[2];
b = array(0.0,size);
meanb = b;
b[1] = mean(y);
var = array(0.0,size);
# adjust y
ycorr = y - x%*%b;
# MCMC sampling
for (iter in 1:numiter){
# sample vare
vare = ( t(ycorr)%*%ycorr )/rchisq(1,nrecords + 3);
# sample intercept
ycorr = ycorr + x[,1]*b[1];
rhs = sum(ycorr)/vare;
invLhs = 1.0/(nrecords/vare);
mean = rhs*invLhs;
b[1] = rnorm(1,mean,sqrt(invLhs));
ycorr = ycorr - x[,1]*b[1];
meanb[1] = meanb[1] + b[1];
# sample variance for each locus
for (locus in 2:size){
var[locus] = (scalea*4+b[locus]*b[locus])/rchisq(1,4.0+1)
}
# sample effect for each locus
for (locus in 2:size){
# unadjust y for this locus
ycorr = ycorr + x[,locus]*b[locus];
rhs = t(x[,locus])%*%ycorr/vare;
lhs = t(x[,locus])%*%x[,locus]/vare + 1.0/var[locus];
invLhs = 1.0/lhs;
mean = invLhs*rhs;
b[locus]= rnorm(1,mean,sqrt(invLhs));
#adjust y for the new value of this locus
ycorr = ycorr - x[,locus]*b[locus];
meanb[locus] = meanb[locus] + b[locus];
}
}
Sys.time() - beg
meanb = meanb/numiter;
aHat = x %*% meanb;
Thus, we just need defining a few variables, reading the data (marker genotypes, breeding values and phenotypic data) into a matrix, creating loops, matrix and vector multiplication and generating random numbers (using a Gaussian and Chi squared distributions). Not much if you think about it, but I didn’t have much time to explore Julia’s features as to go for something more complex.
nmarkers = 2000 # Number of markers
startmarker = 1981 # Set to 1 to use all
numiter = 2000 # Number of iterations
data = dlmread("markers.csv", ',')
(nrecords, ncols) = size(data)
tic()
#this is the mean and markers matrix
X = hcat(ones(Float64, nrecords), data[:, startmarker:nmarkers])
y = data[:, nmarkers + 1]
a = data[:, nmarkers + 2]
nmarkers = nmarkers - startmarker + 1
vara = 1.0/nmarkers
mean2pq = 0.5
scalea = 0.5*vara/(nmarkers*mean2pq) # 0.5 = (v-2)/v for v=4
ndesign = size(X, 2)
b = zeros(Float64, ndesign)
meanb = zeros(Float64, ndesign)
b[1] = mean(y)
varian = zeros(Float64, ndesign)
# adjust y
ycorr = y - X * b
# MCMC sampling
for i = 1:numiter
# sample vare
vare = dot(ycorr, ycorr )/randchi2(nrecords + 3)
# sample intercept
ycorr = ycorr + X[:, 1] * b[1];
rhs = sum(ycorr)/vare;
invlhs = 1.0/(nrecords/vare);
mn = rhs*invlhs;
b[1] = randn() * sqrt(invlhs) + mn;
ycorr = ycorr - X[:, 1] * b[1];
meanb[1] = meanb[1] + b[1];
# sample variance for each locus
for locus = 2:ndesign
varian[locus] = (scalea*4 + b[locus]*b[locus])/randchi2(4.0 + 1);
end
# sample effect for each locus
for locus = 2:ndesign
# unadjust y for this locus
ycorr = ycorr + X[:, locus] * b[locus];
rhs = dot(X[:, locus], ycorr)/vare;
lhs = dot(X[:, locus], X[:, locus])/vare + 1.0/varian[locus];
invlhs = 1.0/lhs;
mn = invlhs * rhs;
b[locus] = randn() * sqrt(invlhs) + mn;
#adjust y for the new value of this locus
ycorr = ycorr - X[:, locus] * b[locus];
meanb[locus] = meanb[locus] + b[locus];
end
end
toc()
meanb = meanb/numiter;
aHat = X * meanb;
The code looks remarkably similar and there are four main sources of differences:
The first trivial one is that the original code read a binary dataset and I didn’t know how to do it in Julia, so I’ve read a csv file instead (this is why I start timing after reading the file too).
The second trivial one is to avoid name conflicts between variables and functions; for example, in R the user is allowed to have a variable called var that will not interfere with the variance function. Julia is picky about that, so I needed renaming some variables.
Julia pases variables by reference, while R does so by value when assigning matrices, which tripped me because in the original R code there was something like: b = array(0.0,size); meanb = b;. This works fine in R, but in Julia changes to the b vector also changed meanb.
The definition of scalar vs array created some problems in Julia. For example y' * y (t(y) %*% y in R) is numerically equivalent to dot(y, y). However, the first version returns an array, while the second one a scalar. I got an error message when trying to store the ‘scalar like an array’ in to an array. I find that confusing.
One interesting point in this comparison is using rough code, not really optimized for speed; in fact, the only thing that I can say of the Julia code is that ‘it runs’ and it probably is not very idiomatic. Testing runs with different numbers of markers we get that R needs roughly 2.8x the time used by Julia. The Julia website claims better results in benchmarks, but in real life we work with, well, real problems.
In 1996-7 I switched from SAS to ASReml for genetic analyses because it was 1-2 orders of magnitude faster and opened a world of new models. Today a change from R to Julia would deliver (in this particular case) a much more modest speed up (~3x), which is OK but not worth changing languages (yet). Together with the embryonic graphical capabilities and the still-to-develop ecosystem of packages, means that I’m still using R. Nevertheless, the Julia team has achieved very impressive performance in very little time, so it is worth to keep an eye on their progress.
P.S.1 Readers are welcome to suggesting ways of improving the code.
P.S.2 WordPress does not let me upload the binary version of the simulated data.
P.S.3 Hey WordPress guys; it would be handy if the sourcecode plugin supported Julia!
To leave a comment for the author, please follow the link and comment on his blog: Quantum Forest » rblogs.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...
R_bloggers
Bayesian
Julia
R
rblogs
Simulation
stats
from google
— “You are a pussy” emailed my friend.
— “Sensu cat?” I replied.
— “No. Sensu chicken” blurbed my now ex-friend.
What was this about? He read my post on R, Julia and the shiny new thing, which prompted him to assume that I was the proverbial old dog unwilling (or was it unable?) to learn new tricks. (Incidentally, with friends like this who needs enemies? Hi, Gus.)
Having a look at different—statistical—horses (Photo: Luis).
I decided to tackle a small—but hopefully useful—piece of code: fitting/training a Genome Wide Selection model, using the Bayes A approach put forward by Meuwissen, Hayes and Goddard in 2001. In that approach the breeding values of the individuals (response) are expressed as a function of a very large number of random predictors (2000, our molecular markers). The dataset (csv file) is a simulation of 2000 bi-allelic markers (aa = 0, Aa = 1, AA = 2) for 250 individuals, followed by the phenotypes (column 2001) and breeding values (column 2002). These models are frequently adjusted using MCMC.
In 2010 I attended this course in Ames, Iowa where Rohan Fernando passed us the following R code (pretty much a transliteration from C code; notice the trailing semicolons, for example):
nmarkers = 2000; # number of markers
startMarker = 1981; # set to 1 to use all
numiter = 2000; # number of iterations
vara = 1.0/20.0;
# input data
data = matrix(scan("trainData.out0"),ncol=nmarkers+2,byrow=TRUE);
nrecords = dim(data)[1];
beg = Sys.time()
# x has the mean followed by the markers
x = cbind(1,data[,startMarker:nmarkers]);
y = data[,nmarkers+1];
a = data[,nmarkers+2];
# inital values
nmarkers = nmarkers - startMarker + 1;
mean2pq = 0.5; # just an approximation
scalea = 0.5*vara/(nmarkers*mean2pq); # 0.5 = (v-2)/v for v=4
size = dim(x)[2];
b = array(0.0,size);
meanb = b;
b[1] = mean(y);
var = array(0.0,size);
# adjust y
ycorr = y - x%*%b;
# MCMC sampling
for (iter in 1:numiter){
# sample vare
vare = ( t(ycorr)%*%ycorr )/rchisq(1,nrecords + 3);
# sample intercept
ycorr = ycorr + x[,1]*b[1];
rhs = sum(ycorr)/vare;
invLhs = 1.0/(nrecords/vare);
mean = rhs*invLhs;
b[1] = rnorm(1,mean,sqrt(invLhs));
ycorr = ycorr - x[,1]*b[1];
meanb[1] = meanb[1] + b[1];
# sample variance for each locus
for (locus in 2:size){
var[locus] = (scalea*4+b[locus]*b[locus])/rchisq(1,4.0+1)
}
# sample effect for each locus
for (locus in 2:size){
# unadjust y for this locus
ycorr = ycorr + x[,locus]*b[locus];
rhs = t(x[,locus])%*%ycorr/vare;
lhs = t(x[,locus])%*%x[,locus]/vare + 1.0/var[locus];
invLhs = 1.0/lhs;
mean = invLhs*rhs;
b[locus]= rnorm(1,mean,sqrt(invLhs));
#adjust y for the new value of this locus
ycorr = ycorr - x[,locus]*b[locus];
meanb[locus] = meanb[locus] + b[locus];
}
}
Sys.time() - beg
meanb = meanb/numiter;
aHat = x %*% meanb;
Thus, we just need defining a few variables, reading the data (marker genotypes, breeding values and phenotypic data) into a matrix, creating loops, matrix and vector multiplication and generating random numbers (using a Gaussian and Chi squared distributions). Not much if you think about it, but I didn’t have much time to explore Julia’s features as to go for something more complex.
nmarkers = 2000 # Number of markers
startmarker = 1981 # Set to 1 to use all
numiter = 2000 # Number of iterations
data = dlmread("markers.csv", ',')
(nrecords, ncols) = size(data)
tic()
#this is the mean and markers matrix
X = hcat(ones(Float64, nrecords), data[:, startmarker:nmarkers])
y = data[:, nmarkers + 1]
a = data[:, nmarkers + 2]
nmarkers = nmarkers - startmarker + 1
vara = 1.0/nmarkers
mean2pq = 0.5
scalea = 0.5*vara/(nmarkers*mean2pq) # 0.5 = (v-2)/v for v=4
ndesign = size(X, 2)
b = zeros(Float64, ndesign)
meanb = zeros(Float64, ndesign)
b[1] = mean(y)
varian = zeros(Float64, ndesign)
# adjust y
ycorr = y - X * b
# MCMC sampling
for i = 1:numiter
# sample vare
vare = dot(ycorr, ycorr )/randchi2(nrecords + 3)
# sample intercept
ycorr = ycorr + X[:, 1] * b[1];
rhs = sum(ycorr)/vare;
invlhs = 1.0/(nrecords/vare);
mn = rhs*invlhs;
b[1] = randn() * sqrt(invlhs) + mn;
ycorr = ycorr - X[:, 1] * b[1];
meanb[1] = meanb[1] + b[1];
# sample variance for each locus
for locus = 2:ndesign
varian[locus] = (scalea*4 + b[locus]*b[locus])/randchi2(4.0 + 1);
end
# sample effect for each locus
for locus = 2:ndesign
# unadjust y for this locus
ycorr = ycorr + X[:, locus] * b[locus];
rhs = dot(X[:, locus], ycorr)/vare;
lhs = dot(X[:, locus], X[:, locus])/vare + 1.0/varian[locus];
invlhs = 1.0/lhs;
mn = invlhs * rhs;
b[locus] = randn() * sqrt(invlhs) + mn;
#adjust y for the new value of this locus
ycorr = ycorr - X[:, locus] * b[locus];
meanb[locus] = meanb[locus] + b[locus];
end
end
toc()
meanb = meanb/numiter;
aHat = X * meanb;
The code looks remarkably similar and there are four main sources of differences:
The first trivial one is that the original code read a binary dataset and I didn’t know how to do it in Julia, so I’ve read a csv file instead (this is why I start timing after reading the file too).
The second trivial one is to avoid name conflicts between variables and functions; for example, in R the user is allowed to have a variable called var that will not interfere with the variance function. Julia is picky about that, so I needed renaming some variables.
Julia pases variables by reference, while R does so by value when assigning matrices, which tripped me because in the original R code there was something like: b = array(0.0,size); meanb = b;. This works fine in R, but in Julia changes to the b vector also changed meanb.
The definition of scalar vs array created some problems in Julia. For example y' * y (t(y) %*% y in R) is numerically equivalent to dot(y, y). However, the first version returns an array, while the second one a scalar. I got an error message when trying to store the ‘scalar like an array’ in to an array. I find that confusing.
One interesting point in this comparison is using rough code, not really optimized for speed; in fact, the only thing that I can say of the Julia code is that ‘it runs’ and it probably is not very idiomatic. Testing runs with different numbers of markers we get that R needs roughly 2.8x the time used by Julia. The Julia website claims better results in benchmarks, but in real life we work with, well, real problems.
In 1996-7 I switched from SAS to ASReml for genetic analyses because it was 1-2 orders of magnitude faster and opened a world of new models. Today a change from R to Julia would deliver (in this particular case) a much more modest speed up (~3x), which is OK but not worth changing languages (yet). Together with the embryonic graphical capabilities and the still-to-develop ecosystem of packages, means that I’m still using R. Nevertheless, the Julia team has achieved very impressive performance in very little time, so it is worth to keep an eye on their progress.
P.S.1 Readers are welcome to suggesting ways of improving the code.
P.S.2 WordPress does not let me upload the binary version of the simulated data.
P.S.3 Hey WordPress guys; it would be handy if the sourcecode plugin supported Julia!
To leave a comment for the author, please follow the link and comment on his blog: Quantum Forest » rblogs.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...
5 weeks ago by rahuldave
Getting Started with JAGS, rjags, and Bayesian Modelling
7 weeks ago by rahuldave
(This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers)
This post provides links to various resources on getting started with Bayesianmodelling using JAGS and R.It discusses: (1) what is JAGS;(2) why you might want to perform Bayesian modelling using JAGS;(3) how to install JAGS;(4) where to find further information on JAGS;(5) where to find examples of JAGS scripts in action;(6) where to ask questions; and(7) some interesting psychological applications of Bayesian modelling.
What is JAGS?JAGS stands for Just Another Gibbs Sampler.To quote the program author, Martyn Plummer, "It is a program for analysis ofBayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation..." It uses a dialect of the BUGS language, similar but a little different to OpenBUGS and WinBUGS.
Why JAGS?The question of why you might want to use JAGS can be approached in several different ways:
Why Bayesian rather than Null Hypothesis Significance Testing (NHST) approaches?
To quote John D. Cook quoting Anthony O'Hagan, the benefits of "the bayesianapproach are that it is 1. fundamentally sound, 2. very flexible, 3.produces clear and direct inferences, and 4. makes use of all availableinformation." (see John's blog post forelaboration)John K. Kruschke made a similar argument in an Open Letter extolling thebenefits of the bayesianapproach summarisedas: "(1) Scientific disciplines from astronomy to zoology are moving toBayesian data analysis. We should be leaders of the move, not followers. (2) Modern Bayesian methods provide richer information,with greater flexibility and broader applicability than 20th centurymethods. Bayesian methods are intellectually coherent and intuitive.Bayesian analyses are readily computed with modern software and hardware.(3) Null-hypothesis significance testing (NHST), with its reliance on pvalues, has many problems. There is little reason to persist with NHST nowthat Bayesian methods are accessible to everyone."Why JAGS/BUGS rather than coding in a low-level language?
It's simpler; for models that BUGS can handle, BUGS can shield you fromsome of the thorny details related to numeric integration.There are simple interfaces with R.Why JAGS rather than WinBUGS or OpenBUGS?
I'm using JAGS because it works well on Ubuntu. WinBUGS is broadly Windowsspecific, although I've read that it may work with the emulation software Wine.JAGS interfaces well with R. I'm comfortable writingscripts. Thus, I don't personally see the benefits of using a dedicatedGUI like WinBUGS. I can leverage what I know about R. However, ultimately converting code between different flavours of BUGSis not that difficult. For further discussion of the issue, see this r-helpdiscussion andthis discussion on CrossValidated.More than anything I found that JAGS provided a useful entry point into the world ofBayesian modelling. This in turn appealed to me for several reasons:
Even when I perform analyses using an NHST approach I often intuitively thinkof empirical research questions in terms of probability densities on aparameter of interest that changes as empirical and theoretical evidence isaccumulated. See for example Thompson's (2002) concept ofmeta-analytic thinking.Bayesian analysis provides tools for formalising this orientation. More broadly, I appreciate the explicitness that a Bayesian approachrequires and encourages. E.g., specifying the distribution of the error term,specifying a prior, specifying the distribution of parameters in a mixedeffects model, and so on.There are several modelling challenges that I'm currently working throughwhere a Bayesian approach offers substantial flexibility and applicability. In particular, I'm interested in modelling individual differences in theeffect of practice on strategy use and task performance and then relatingthese individual differences to factors like intelligence, prior experience,and personality.JAGS InstallationJAGS runs on Linux, Mac, and Windows.I run JAGS on Ubuntu through an interface with R called rjags.
The following sets out a basic installation process:
If necessary Download and install R andpotentially a user interface to R like R Studio(see here for tips on getting started withR).Download and install JAGS as peroperating system requriements. Install additional R packages: e.g., in R install.packages("rjags") .In particular, I use the packages rjags to interface with JAGS andcoda to process MCMC output.Information on JAGSThe manual for different versions of JAGS is locatedhere.e.g., the pdf of the manual for 3.1.0.Several particularly relevant sections include:the list of supported distributions and how they are parameterised. This isoften important given that the code looks similar to R but often uses different parameterisation (e.g., precision is used instead ofstandard deviation for a normal distribution).It summarises differences between WinBUGS and JAGS. It sets out available functions and operators.The rjags help pdffor information about how to interface with JAGS from R.Martin Plummer has a blog called JAGS NEWSThe Bayesian Task View on CRAN lists and brieflydescribes the many R packages related to Bayesian statistics. Lunn and colleagues have a 2009 article calledThe BUGS project: Evolution, critique and future directions.It provides a useful historical perspective on the broader BUGS project,although it does not mention much about JAGS specifically.Examples JAGS ScriptsI find it easier to pick up a new language by playing with examples.The following provides links to example JAGS code, often with accompanyingexplanations:
John Myles White A course on statistical models that is under development with JAGS scripts on githubA model of Cannabalt scores using a gammadistributionSimple introductory examples of fitting a normal distribution, linearregression, and logisticregressionA follow-up post demonstrating the use of the coda package with rjagsto perform MCMCdiagnostics.John K. KruschkeJohn Krushke wrote a book called Doing Bayesian Data Analysis: A Tutorial with Rand BUGS. It's an excellent entry point into the world of Bayesianstatistics for the social and behavioural scientist who hasreasonable quantiative training, but is not necessarily ready to absorbthe kinds of books that are used in graduate-level statistics courses.The book has awebsite thatprovides all the examples used in the book all the examples used in thebook. See this blog post for a link to the zip file containing the JAGScode.BUGS ProjectBUGS is well known for the large set of examples that accompany the project.The PDF providing documentation for Volume 1 and 2 of the examples is availablehere.You can see the JAGS code used to run these exampleshere.Patrick J Mineault
An example from Gelman et al examining the effect of training programs onSATscoresMiguel Lobo
A short tutorialSimon JackmanSimon Jackman wrote the book Bayesian Analysis for the Social Sciencesthat has accompanying JAGS code.The book's website has severaluseful resources including example papers using Bayesian methods.An associated coursethat uses the book as a text book has slides and many examples of usingand R and JAGS.Johannes Karreth
A course on applied bayesian modelling with examples of data, and code using the R2jags interface.Myself
I also plan to post a few examples in upcoming blog posts. I typicallywill share the code for these on my github account:jeromyanglim. If you are reading thisthrough syndication you may wish to subscribe to the RSS feed of thesource blog jeromyanglim.blogspot.com.More broadly, examples and tutorials designed for WinBUGS can generally beadapted to be useful for JAGS. So for example, you can explore these WinBUGSexamples:
Michael Lee and Eric-Jan Wagnemakers have a free online book called A Course in Bayesian Graphical Modelingfor Cognitive Science: see PDF andwebsite.The website for the book Markov Chain MonteCarlo has several WinBUGS examples.There is an extensive list of BUGSresources onthe BUGS project website.Asking questionsThere are several places to ask questions about JAGS, R, and Bayesianstatistics.
JAGS, BUGS, and bayesian questionson stats.stackexchange.com (aka CrossValidated). JAGS discussion forumThere's also a BUGS discussionlistIn general, I prefer the Stack Exchange model for asking and answering questionson the internet, although the most important issue is typically where theexperts are located.
Interesting Psychological Applications of Bayesian ModellingIf you want to see some examples of Bayesian modelling applied to psychologicaldata, I found the following articles quite interesting. PDFs are available online.
Shiffrin, Lee, Kim, and Wagenmakers (2008, PDF) present a tutorial on hierarchical bayesian methods in the context of cognitive science.Michael Lee (2011, PDF) in Journal of Mathematical Psychologydiscusses the benefits of hiearchical Bayesian methods to modellingpsychological data and provides several example applications.Lee Averell and Andrew Heathcote (2010,PDF) in Journal ofMathematical Psychology analyse individual differences in the forgetting curveusing a hierarchical Bayesian approach.If you know of any other interesting JAGS resources or have any comments about mychoice of software for Bayesian data analysis, feel free to post a comment.
To leave a comment for the author, please follow the link and comment on his blog: Jeromy Anglim's Blog: Psychology and Statistics.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...
R_bloggers
Bayesian
JAGS
R
from google
This post provides links to various resources on getting started with Bayesianmodelling using JAGS and R.It discusses: (1) what is JAGS;(2) why you might want to perform Bayesian modelling using JAGS;(3) how to install JAGS;(4) where to find further information on JAGS;(5) where to find examples of JAGS scripts in action;(6) where to ask questions; and(7) some interesting psychological applications of Bayesian modelling.
What is JAGS?JAGS stands for Just Another Gibbs Sampler.To quote the program author, Martyn Plummer, "It is a program for analysis ofBayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation..." It uses a dialect of the BUGS language, similar but a little different to OpenBUGS and WinBUGS.
Why JAGS?The question of why you might want to use JAGS can be approached in several different ways:
Why Bayesian rather than Null Hypothesis Significance Testing (NHST) approaches?
To quote John D. Cook quoting Anthony O'Hagan, the benefits of "the bayesianapproach are that it is 1. fundamentally sound, 2. very flexible, 3.produces clear and direct inferences, and 4. makes use of all availableinformation." (see John's blog post forelaboration)John K. Kruschke made a similar argument in an Open Letter extolling thebenefits of the bayesianapproach summarisedas: "(1) Scientific disciplines from astronomy to zoology are moving toBayesian data analysis. We should be leaders of the move, not followers. (2) Modern Bayesian methods provide richer information,with greater flexibility and broader applicability than 20th centurymethods. Bayesian methods are intellectually coherent and intuitive.Bayesian analyses are readily computed with modern software and hardware.(3) Null-hypothesis significance testing (NHST), with its reliance on pvalues, has many problems. There is little reason to persist with NHST nowthat Bayesian methods are accessible to everyone."Why JAGS/BUGS rather than coding in a low-level language?
It's simpler; for models that BUGS can handle, BUGS can shield you fromsome of the thorny details related to numeric integration.There are simple interfaces with R.Why JAGS rather than WinBUGS or OpenBUGS?
I'm using JAGS because it works well on Ubuntu. WinBUGS is broadly Windowsspecific, although I've read that it may work with the emulation software Wine.JAGS interfaces well with R. I'm comfortable writingscripts. Thus, I don't personally see the benefits of using a dedicatedGUI like WinBUGS. I can leverage what I know about R. However, ultimately converting code between different flavours of BUGSis not that difficult. For further discussion of the issue, see this r-helpdiscussion andthis discussion on CrossValidated.More than anything I found that JAGS provided a useful entry point into the world ofBayesian modelling. This in turn appealed to me for several reasons:
Even when I perform analyses using an NHST approach I often intuitively thinkof empirical research questions in terms of probability densities on aparameter of interest that changes as empirical and theoretical evidence isaccumulated. See for example Thompson's (2002) concept ofmeta-analytic thinking.Bayesian analysis provides tools for formalising this orientation. More broadly, I appreciate the explicitness that a Bayesian approachrequires and encourages. E.g., specifying the distribution of the error term,specifying a prior, specifying the distribution of parameters in a mixedeffects model, and so on.There are several modelling challenges that I'm currently working throughwhere a Bayesian approach offers substantial flexibility and applicability. In particular, I'm interested in modelling individual differences in theeffect of practice on strategy use and task performance and then relatingthese individual differences to factors like intelligence, prior experience,and personality.JAGS InstallationJAGS runs on Linux, Mac, and Windows.I run JAGS on Ubuntu through an interface with R called rjags.
The following sets out a basic installation process:
If necessary Download and install R andpotentially a user interface to R like R Studio(see here for tips on getting started withR).Download and install JAGS as peroperating system requriements. Install additional R packages: e.g., in R install.packages("rjags") .In particular, I use the packages rjags to interface with JAGS andcoda to process MCMC output.Information on JAGSThe manual for different versions of JAGS is locatedhere.e.g., the pdf of the manual for 3.1.0.Several particularly relevant sections include:the list of supported distributions and how they are parameterised. This isoften important given that the code looks similar to R but often uses different parameterisation (e.g., precision is used instead ofstandard deviation for a normal distribution).It summarises differences between WinBUGS and JAGS. It sets out available functions and operators.The rjags help pdffor information about how to interface with JAGS from R.Martin Plummer has a blog called JAGS NEWSThe Bayesian Task View on CRAN lists and brieflydescribes the many R packages related to Bayesian statistics. Lunn and colleagues have a 2009 article calledThe BUGS project: Evolution, critique and future directions.It provides a useful historical perspective on the broader BUGS project,although it does not mention much about JAGS specifically.Examples JAGS ScriptsI find it easier to pick up a new language by playing with examples.The following provides links to example JAGS code, often with accompanyingexplanations:
John Myles White A course on statistical models that is under development with JAGS scripts on githubA model of Cannabalt scores using a gammadistributionSimple introductory examples of fitting a normal distribution, linearregression, and logisticregressionA follow-up post demonstrating the use of the coda package with rjagsto perform MCMCdiagnostics.John K. KruschkeJohn Krushke wrote a book called Doing Bayesian Data Analysis: A Tutorial with Rand BUGS. It's an excellent entry point into the world of Bayesianstatistics for the social and behavioural scientist who hasreasonable quantiative training, but is not necessarily ready to absorbthe kinds of books that are used in graduate-level statistics courses.The book has awebsite thatprovides all the examples used in the book all the examples used in thebook. See this blog post for a link to the zip file containing the JAGScode.BUGS ProjectBUGS is well known for the large set of examples that accompany the project.The PDF providing documentation for Volume 1 and 2 of the examples is availablehere.You can see the JAGS code used to run these exampleshere.Patrick J Mineault
An example from Gelman et al examining the effect of training programs onSATscoresMiguel Lobo
A short tutorialSimon JackmanSimon Jackman wrote the book Bayesian Analysis for the Social Sciencesthat has accompanying JAGS code.The book's website has severaluseful resources including example papers using Bayesian methods.An associated coursethat uses the book as a text book has slides and many examples of usingand R and JAGS.Johannes Karreth
A course on applied bayesian modelling with examples of data, and code using the R2jags interface.Myself
I also plan to post a few examples in upcoming blog posts. I typicallywill share the code for these on my github account:jeromyanglim. If you are reading thisthrough syndication you may wish to subscribe to the RSS feed of thesource blog jeromyanglim.blogspot.com.More broadly, examples and tutorials designed for WinBUGS can generally beadapted to be useful for JAGS. So for example, you can explore these WinBUGSexamples:
Michael Lee and Eric-Jan Wagnemakers have a free online book called A Course in Bayesian Graphical Modelingfor Cognitive Science: see PDF andwebsite.The website for the book Markov Chain MonteCarlo has several WinBUGS examples.There is an extensive list of BUGSresources onthe BUGS project website.Asking questionsThere are several places to ask questions about JAGS, R, and Bayesianstatistics.
JAGS, BUGS, and bayesian questionson stats.stackexchange.com (aka CrossValidated). JAGS discussion forumThere's also a BUGS discussionlistIn general, I prefer the Stack Exchange model for asking and answering questionson the internet, although the most important issue is typically where theexperts are located.
Interesting Psychological Applications of Bayesian ModellingIf you want to see some examples of Bayesian modelling applied to psychologicaldata, I found the following articles quite interesting. PDFs are available online.
Shiffrin, Lee, Kim, and Wagenmakers (2008, PDF) present a tutorial on hierarchical bayesian methods in the context of cognitive science.Michael Lee (2011, PDF) in Journal of Mathematical Psychologydiscusses the benefits of hiearchical Bayesian methods to modellingpsychological data and provides several example applications.Lee Averell and Andrew Heathcote (2010,PDF) in Journal ofMathematical Psychology analyse individual differences in the forgetting curveusing a hierarchical Bayesian approach.If you know of any other interesting JAGS resources or have any comments about mychoice of software for Bayesian data analysis, feel free to post a comment.
To leave a comment for the author, please follow the link and comment on his blog: Jeromy Anglim's Blog: Psychology and Statistics.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...
7 weeks ago by rahuldave
Monkeying with Bayes’ theorem
11 weeks ago by rahuldave
In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious.
So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.
In other words, if we change Bayes’ theorem (!) the algorithm works better. He goes on to explain
Now should we dig up Bayes and notify him that he was wrong? No, I don’t think that’s it. …
I imagine most statisticians would respond that this cannot possibly be right. While it appears to work, there must be some underlying reason why and we should find that reason before using an algorithm based on an ad hoc tweak.
While such a reaction is understandable, it’s also a little hypocritical. Statisticians are constantly drawing inference from empirical data without understanding the underlying mechanisms that generate the data. When analyzing someone else’s data, a statistician will say that of course we’d rather understand the underlying mechanism than fit statistical models, that just not always possible. Reality is too complicated and we’ve got to do the best we can.
I agree, but that same reasoning applied at a higher level of abstraction could be used to accept Norvig’s translation algorithm. Here’s this model (derived from spurious math, but we’ll ignore that). Let’s see empirically how well it works.
Statistics
Bayesian
Probability_and_Statistics
from google
So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.
In other words, if we change Bayes’ theorem (!) the algorithm works better. He goes on to explain
Now should we dig up Bayes and notify him that he was wrong? No, I don’t think that’s it. …
I imagine most statisticians would respond that this cannot possibly be right. While it appears to work, there must be some underlying reason why and we should find that reason before using an algorithm based on an ad hoc tweak.
While such a reaction is understandable, it’s also a little hypocritical. Statisticians are constantly drawing inference from empirical data without understanding the underlying mechanisms that generate the data. When analyzing someone else’s data, a statistician will say that of course we’d rather understand the underlying mechanism than fit statistical models, that just not always possible. Reality is too complicated and we’ve got to do the best we can.
I agree, but that same reasoning applied at a higher level of abstraction could be used to accept Norvig’s translation algorithm. Here’s this model (derived from spurious math, but we’ll ignore that). Let’s see empirically how well it works.
11 weeks ago by rahuldave
The universal solvent of statistics
february 2012 by rahuldave
Andrew Gelman just posted an interesting article on the philosophy of Bayesian statistics. Here’s my favorite passage.
This reminds me of a standard question that Don Rubin … asks in virtually any situation: “What would you do if you had all the data?” For me, that “what would you do” question is one of the universal solvents of statistics.
Emphasis added.
I had not heard Don Rubin’s question before, but I think I’ll be asking it often. It reminds me of Alice’s famous dialog with the Cheshire Cat:
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where–” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
Related post: Irrelevant uncertainty
Statistics
Uncategorized
Bayesian
Probability_and_Statistics
from google
This reminds me of a standard question that Don Rubin … asks in virtually any situation: “What would you do if you had all the data?” For me, that “what would you do” question is one of the universal solvents of statistics.
Emphasis added.
I had not heard Don Rubin’s question before, but I think I’ll be asking it often. It reminds me of Alice’s famous dialog with the Cheshire Cat:
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where–” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
Related post: Irrelevant uncertainty
february 2012 by rahuldave
Teaching Bayesian stats backward
april 2011 by rahuldave
Most presentations of Bayesian statistics I’ve seen start with elementary examples of Bayes’ Theorem. And most of these use the canonical example of testing for rare diseases. But the connection between these examples and Bayesian statistics is not obvious at first. Maybe this isn’t the best approach.
What if we begin with the end in mind? Bayesian calculations produce posterior probability distributions on parameters. An effective way to teach Bayesian statistics might be to start there. Suppose we had probability distributions on our parameters. Never mind where they came from. Never mind classical objections that say you can’t do this. What if you could? If you had such distributions, what could you do with them?
For starters, point estimation and interval estimation become trivial. You could, for example, use the distribution mean as a point estimate and the area between two quantiles as an interval estimate. The distributions tell you far more than point estimates or interval estimates could; these estimates are simply summaries of the information contained in the distributions.
It makes logical sense to start with Bayes’ Theorem since that’s the tool used to construct posterior distributions. But I think it makes pedagogical sense to start with the posterior distribution and work backward to how one would come up with such a thing.
Bayesian statistics is so named because Bayes’ Theorem is essential to its calculations. But that’s a little like classical statistics Central Limitist statistics because it relies heavily on the Central Limit Theorem.
The key idea of Bayesian statistics is to represent all uncertainty by probability distributions. That idea can be obscured by an early emphasis on calculations.
Related posts:
Interview with David Spiegelhalter
Occam’s razor and Bayes’ theorem
Four reasons to use Bayesian inference
Statistics
Bayesian
Education
from google
What if we begin with the end in mind? Bayesian calculations produce posterior probability distributions on parameters. An effective way to teach Bayesian statistics might be to start there. Suppose we had probability distributions on our parameters. Never mind where they came from. Never mind classical objections that say you can’t do this. What if you could? If you had such distributions, what could you do with them?
For starters, point estimation and interval estimation become trivial. You could, for example, use the distribution mean as a point estimate and the area between two quantiles as an interval estimate. The distributions tell you far more than point estimates or interval estimates could; these estimates are simply summaries of the information contained in the distributions.
It makes logical sense to start with Bayes’ Theorem since that’s the tool used to construct posterior distributions. But I think it makes pedagogical sense to start with the posterior distribution and work backward to how one would come up with such a thing.
Bayesian statistics is so named because Bayes’ Theorem is essential to its calculations. But that’s a little like classical statistics Central Limitist statistics because it relies heavily on the Central Limit Theorem.
The key idea of Bayesian statistics is to represent all uncertainty by probability distributions. That idea can be obscured by an early emphasis on calculations.
Related posts:
Interview with David Spiegelhalter
Occam’s razor and Bayes’ theorem
Four reasons to use Bayesian inference
april 2011 by rahuldave
Significance testing and Congress
april 2011 by rahuldave
The US Supreme Court’s criticism of significance testing has been in the news lately. Here’s a criticism of significance testing involving the US Congress. Consider the following syllogism.
If a person is an American, he is not a member of Congress.
This person is a member of Congress.
Therefore he is not American.
The initial premise is false, but the reasoning is correct if we assume the initial premise is true.
The premise that Americans are never members of Congress is clearly false. But it’s almost true! The probability of an American being a member of Congress is quite small, about 535/309,000,000. So what happens if we try to salvage the syllogism above by inserting “probably” in the initial premise and conclusion?
If a person is an American, he is probably not a member of Congress.
This person is a member of Congress.
Therefore he is probably not American.
What went wrong? The probability is backward. We want to know the probability that someone is American given he is a member of Congress, not the probability he is a member of Congress given he is American.
Science continually uses flawed reasoning analogous to the example above. We start with a “null hypothesis,” a hypothesis we seek to disprove. If our data are highly unlikely assuming this hypothesis, we reject that hypothesis.
If the null hypothesis is correct, then these data are highly unlikely.
These data have occurred.
Therefore, the null hypothesis is highly unlikely.
Again the probability is backward. We want to know the probability of the hypothesis given the data, not the probability of the data given the hypothesis.
We can’t reject a null hypothesis just because we’ve seen data that are rare under this hypothesis. Maybe our data are even more rare under the alternative. It is rare for an American to be in Congress, but it is even more rare for someone who is not American to be in the US Congress!
I found this illustration in The Earth is Round (p < 0.05) by Jacob Cohen (1994). Cohen in turn credits Pollard and Richardson (1987) in his references.
Related posts:
How insignificant is significance testing?
Five criticisms of significance testing
Most published research results are false
Classical statistics in a nutshell
Statistics
Bayesian
Probability_and_Statistics
Science
from google
If a person is an American, he is not a member of Congress.
This person is a member of Congress.
Therefore he is not American.
The initial premise is false, but the reasoning is correct if we assume the initial premise is true.
The premise that Americans are never members of Congress is clearly false. But it’s almost true! The probability of an American being a member of Congress is quite small, about 535/309,000,000. So what happens if we try to salvage the syllogism above by inserting “probably” in the initial premise and conclusion?
If a person is an American, he is probably not a member of Congress.
This person is a member of Congress.
Therefore he is probably not American.
What went wrong? The probability is backward. We want to know the probability that someone is American given he is a member of Congress, not the probability he is a member of Congress given he is American.
Science continually uses flawed reasoning analogous to the example above. We start with a “null hypothesis,” a hypothesis we seek to disprove. If our data are highly unlikely assuming this hypothesis, we reject that hypothesis.
If the null hypothesis is correct, then these data are highly unlikely.
These data have occurred.
Therefore, the null hypothesis is highly unlikely.
Again the probability is backward. We want to know the probability of the hypothesis given the data, not the probability of the data given the hypothesis.
We can’t reject a null hypothesis just because we’ve seen data that are rare under this hypothesis. Maybe our data are even more rare under the alternative. It is rare for an American to be in Congress, but it is even more rare for someone who is not American to be in the US Congress!
I found this illustration in The Earth is Round (p < 0.05) by Jacob Cohen (1994). Cohen in turn credits Pollard and Richardson (1987) in his references.
Related posts:
How insignificant is significance testing?
Five criticisms of significance testing
Most published research results are false
Classical statistics in a nutshell
april 2011 by rahuldave
Estimating the chances of something that hasn’t happened yet
march 2010 by rahuldave
Suppose you’re proofreading a book. If you’ve read 20 pages and found 7 typos, you might reasonably estimate that the chances of a page having a typo are 7/20. But what if you’ve read 20 pages and found no typos. Are you willing to conclude that the chances of a page having a typo are 0/20, i.e. the book has absolutely no typos?
To take another example, suppose you are testing children for perfect pitch. You’ve tested 100 children so far and haven’t found any with perfect pitch. Do you conclude that children don’t have perfect pitch? You know that some do because you’ve heard of instances before. But your data suggest perfect pitch in children is at least rare. But how rare?
The rule of three gives a quick and dirty way to estimate these kinds of probabilities. It says that if you’ve tested N cases and haven’t found what you’re looking for, a reasonable estimate is that the probability is less than 3/N. So in our proofreading example, if you haven’t found any typos in 20 pages, you could estimate that the probability of a page having a typo is less than 15%. In the perfect pitch example, you could conclude that fewer than 3% of children have perfect pitch.
Note that the rule of three says that your probability estimate goes down in proportion to the number of cases you’ve studied. If you’d read 200 pages without finding a typo, your estimate would drop from 15% to 1.5%. But it doesn’t suddenly drop to zero. I imagine most people would harbor a suspicion that that there may be typos even though they haven’t seen any in the first few pages. But at some point they might say “I’ve read so many pages without finding any errors, there must not be any.” The situation is a little different with the perfect pitch example, however, because you may know before you start that the probability cannot be zero.
If the sight of math makes you squeamish, you might want to stop reading now. Just remember that if you haven’t seen something happen in N observations, a good estimate is that the chances of it happening are less than 3/N.
What makes the rule of three work? Suppose the probability of what you’re looking for is p. If we want a 95% confidence interval, we want to find the largest p so that the probability of no successes out of n trials is 0.05, i.e. we want to solve (1-p)n = 0.05 for p. Taking logs of both sides, n log(1-p) = log(0.05) ≈ -3. Since log(1-p) is approximately -p for small values of p, we have p ≈ 3/n.
The derivation above gives the frequentist perspective. I’ll now give the Bayesian derivation of the same result. Then you can say “p is probably less than 3/N” in clear conscience since Bayesians are allowed to make such statements.
Suppose you start with a uniform prior on p. The posterior distribution on p after having seen 0 successes and N failures has a beta(1, N+1) distribution. If you calculate the posterior probability of p being less than 3/N you get an expression that approaches 1 – exp(-3) as N gets large, and 1 – exp(-3) ≈ 0.95.
Related posts:
What is a confidence interval?
Probability mistake can give a good approximation
Four reasons to use Bayesian inference
Statistics
Bayesian
Probability_and_Statistics
from google
To take another example, suppose you are testing children for perfect pitch. You’ve tested 100 children so far and haven’t found any with perfect pitch. Do you conclude that children don’t have perfect pitch? You know that some do because you’ve heard of instances before. But your data suggest perfect pitch in children is at least rare. But how rare?
The rule of three gives a quick and dirty way to estimate these kinds of probabilities. It says that if you’ve tested N cases and haven’t found what you’re looking for, a reasonable estimate is that the probability is less than 3/N. So in our proofreading example, if you haven’t found any typos in 20 pages, you could estimate that the probability of a page having a typo is less than 15%. In the perfect pitch example, you could conclude that fewer than 3% of children have perfect pitch.
Note that the rule of three says that your probability estimate goes down in proportion to the number of cases you’ve studied. If you’d read 200 pages without finding a typo, your estimate would drop from 15% to 1.5%. But it doesn’t suddenly drop to zero. I imagine most people would harbor a suspicion that that there may be typos even though they haven’t seen any in the first few pages. But at some point they might say “I’ve read so many pages without finding any errors, there must not be any.” The situation is a little different with the perfect pitch example, however, because you may know before you start that the probability cannot be zero.
If the sight of math makes you squeamish, you might want to stop reading now. Just remember that if you haven’t seen something happen in N observations, a good estimate is that the chances of it happening are less than 3/N.
What makes the rule of three work? Suppose the probability of what you’re looking for is p. If we want a 95% confidence interval, we want to find the largest p so that the probability of no successes out of n trials is 0.05, i.e. we want to solve (1-p)n = 0.05 for p. Taking logs of both sides, n log(1-p) = log(0.05) ≈ -3. Since log(1-p) is approximately -p for small values of p, we have p ≈ 3/n.
The derivation above gives the frequentist perspective. I’ll now give the Bayesian derivation of the same result. Then you can say “p is probably less than 3/N” in clear conscience since Bayesians are allowed to make such statements.
Suppose you start with a uniform prior on p. The posterior distribution on p after having seen 0 successes and N failures has a beta(1, N+1) distribution. If you calculate the posterior probability of p being less than 3/N you get an expression that approaches 1 – exp(-3) as N gets large, and 1 – exp(-3) ≈ 0.95.
Related posts:
What is a confidence interval?
Probability mistake can give a good approximation
Four reasons to use Bayesian inference
march 2010 by rahuldave
Copy this bookmark: