
Case Studies
open-source methods and models
The case studies on this page are intended to reflect best practices in Bayesian methodology and Stan programming.
Contributing Case Studies
To contribute a case study, please contact us through the users group. We require
- a documented, reproducible example with narrative documentation (preferably coded using knitr or Jupyter) and
- an open-source license (preferably BSD 3 clause); authors retain all rights, including copyright.
Stan Case Studies, Volume 3 (2016)
A Primer on Bayesian Multilevel Modeling using PyStan
This case study replicates the analysis of home radon levels using hierarchical models of Lin, Gelman, Price, and Kurtz (1999). It illustrates how to generalize linear regressions to hierarchical models with group-level predictors and how to compare predictive inferences and evaluate model fits. Along the way it shows how to get data into Stan using pandas, how to sample using PyStan, and how to visualize the results using Seaborn.
View (HTML)
- Author
- Chris Fonnesbeck
- Keywords
- hierarchical/multilevel modeling, linear regression, model comparison, predictive inference, radon
- Source Repository
- fonnesbeck/stan_workshop_2016 (GitHub)
- Python Package Dependencies
- pystan, numpy, pandas, matplotlib, seaborn
- License
- Apache 2.0 (code), CC-BY 3 (text)
Reparameterization: MLE vs. Bayes
When changing variables, a Jacobian adjustment needs to be provided to account for the rate of change of the transform. Applying the adjustment preserves the probability distributions of quantities of interest, thus making Bayesian inference invariant to reparameterizations. In contrast, the maximum likelihood estimate (posterior mode) is changed when the distribution-preserving Jacobian adjustment is included for a parameter. In this note, we use Stan to code a repeated binary trial model parameterized by chance of success, along with its reparameterization in terms of log odds in order to demonstrate the effect of the Jacobian adjustment on the Bayesian posterior and maximum likelihood estimate. Along the way, we derive the logistic distribution by transforming a uniformly distributed variable.
View (HTML)
- Author
- Bob Carpenter
- Keywords
- MLE, Bayesian posterior, reparameterization, Jacobian, binomial
- Source Repository
- example-models/knitr/mle-params (GitHub)
- R Package Dependencies
- rstan
- License
- BSD (3 clause), CC-BY
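The change of variables described in this note can be sketched in a few lines of Python. This is an illustrative sketch, not the case study's Stan code: the function names, the grid search, and the data (3 successes in 10 trials, with a uniform prior on the chance of success) are all assumptions made for the example. It checks that the Jacobian of the inverse logit equals the standard logistic density, and that including the adjustment moves the mode on the log odds scale.

```python
import math

def inv_logit(alpha):
    """Map log odds to a chance of success in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-alpha))

def log_jacobian(alpha):
    """log |d theta / d alpha| for theta = inv_logit(alpha)."""
    theta = inv_logit(alpha)
    return math.log(theta) + math.log(1.0 - theta)

def logistic_density(alpha):
    """Standard logistic density, written out directly for comparison."""
    return math.exp(-alpha) / (1.0 + math.exp(-alpha)) ** 2

def log_density(alpha, y, n, adjust):
    """Binomial log likelihood as a function of log odds.

    Assumes a uniform prior on theta. With adjust=True the log Jacobian
    is added, giving the log posterior density of alpha up to a constant.
    """
    theta = inv_logit(alpha)
    lp = y * math.log(theta) + (n - y) * math.log(1.0 - theta)
    return lp + log_jacobian(alpha) if adjust else lp

def grid_argmax(f, lo=-10.0, hi=10.0, steps=20001):
    """Crude maximizer over an evenly spaced grid (illustration only)."""
    xs = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(xs, key=f)

# Hypothetical data: y successes in n trials.
y, n = 3, 10

# Without the adjustment, the maximizer maps back to y / n.
mode_plain = inv_logit(grid_argmax(lambda a: log_density(a, y, n, adjust=False)))

# With the adjustment, the maximizer maps back to (y + 1) / (n + 2).
mode_adjusted = inv_logit(grid_argmax(lambda a: log_density(a, y, n, adjust=True)))

print(mode_plain, mode_adjusted)
```

Analytically, the unadjusted maximizer corresponds to y / n = 0.3 and the adjusted one to (y + 1) / (n + 2) = 1/3, because the adjustment multiplies the density by an extra factor of theta * (1 - theta).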
Hierarchical Two-Parameter Logistic Item Response Model
This case study documents a Stan model for the two-parameter logistic model (2PL) with hierarchical priors. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a grade 12 science assessment is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, two-parameter logistic model, hierarchical priors
- Source Repository
- example-models/education/hierarchical_2pl (GitHub)
- R Package Dependencies
- rstan, ggplot2, mirt
- License
- BSD (3 clause), CC-BY
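For readers unfamiliar with the 2PL, the sketch below shows the response probability under one common parameterization, in which the probability of a correct response is the inverse logit of alpha * (theta - beta). The parameterization, the function names, and the simulated values are assumptions for illustration; the case study's Stan model and priors may differ.

```python
import math
import random

def inv_logit(x):
    """Logistic function mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_correct(theta, alpha, beta):
    """Probability of a correct response under the assumed 2PL form.

    theta: person ability, alpha: item discrimination, beta: item difficulty.
    """
    return inv_logit(alpha * (theta - beta))

def simulate_responses(thetas, alphas, betas, seed=0):
    """Simulate a persons-by-items matrix of 0/1 responses."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < prob_correct(t, a, b) else 0
         for a, b in zip(alphas, betas)]
        for t in thetas
    ]

# Hypothetical abilities and item parameters.
thetas = [-1.0, 0.0, 1.5]
alphas = [0.8, 1.2, 2.0, 1.0]
betas = [-0.5, 0.0, 0.5, 1.0]
responses = simulate_responses(thetas, alphas, betas)
```

When ability equals difficulty the probability is one half regardless of discrimination; a larger alpha makes the probability change more sharply as theta moves away from beta.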
Generalized Rating Scale Model with Latent Regression
This case study documents a Stan model for the generalized rating scale model (GRSM) with latent regression. The latent regression portion of the model may be restricted to an intercept only, yielding a standard GRSM. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a survey of public perceptions of science and technology is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, generalized rating scale model
- Source Repository
- example-models/education/grsm_latent_reg (GitHub)
- R Package Dependencies
- rstan, ggplot2, ltm
- License
- BSD (3 clause), CC-BY
Generalized Partial Credit Model with Latent Regression
This case study documents a Stan model for the generalized partial credit model (GPCM) with latent regression. The latent regression portion of the model may be restricted to an intercept only, yielding a standard GPCM. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using the TIMSS 2011 mathematics assessment is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, generalized partial credit model
- Source Repository
- example-models/education/gpcm_latent_reg (GitHub)
- R Package Dependencies
- rstan, ggplot2, TAM
- License
- BSD (3 clause), CC-BY
Rating Scale Model with Latent Regression
This case study documents a Stan model for the rating scale model (RSM) with latent regression. The latent regression portion of the model may be restricted to an intercept only, yielding a standard RSM. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a survey of public perceptions of science and technology is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, rating scale model
- Source Repository
- example-models/education/rsm_latent_reg (GitHub)
- R Package Dependencies
- rstan, ggplot2, ltm
- License
- BSD (3 clause), CC-BY
Partial Credit Model with Latent Regression
This case study documents a Stan model for the partial credit model (PCM) with latent regression. The latent regression portion of the model may be restricted to an intercept only, yielding a standard PCM. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using the TIMSS 2011 mathematics assessment is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, partial credit model
- Source Repository
- example-models/education/pcm_latent_reg (GitHub)
- R Package Dependencies
- rstan, ggplot2, TAM
- License
- BSD (3 clause), CC-BY
Rasch Model with Latent Regression
This case study documents a Stan model for the Rasch model with latent regression. The latent regression portion of the model may be restricted to an intercept only, yielding a standard Rasch model. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a grade 12 science assessment is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, Rasch model
- Source Repository
- example-models/education/rasch_latent_reg (GitHub)
- R Package Dependencies
- rstan, ggplot2, TAM
- License
- BSD (3 clause), CC-BY
Two-Parameter Logistic Item Response Model
This tutorial introduces the R package edstan for estimating two-parameter logistic item response models using Stan without knowing the Stan language. Subsequently, the tutorial explains how the model can be expressed in the Stan language and fit using the rstan package. Specification of prior distributions and assessment of convergence are discussed. Using the Stan language directly has the advantage that it becomes quite easy to extend the model, and this is demonstrated by adding a latent regression and differential item functioning to the model. Posterior predictive model checking is also demonstrated.
View (HTML)
- Authors
- Daniel C. Furr, Seung Yeon Lee, Joon-Ho Lee, and Sophia Rabe-Hesketh
- Keywords
- education, item response theory, two-parameter logistic model
- Source Repository
- example-models/education/tutorial_twopl (GitHub)
- R Package Dependencies
- rstan, reshape2, ggplot2, gridExtra, devtools, edstan
- License
- BSD (3 clause), CC-BY
Two-Parameter Logistic Model with Latent Regression
This case study documents a Stan model for the two-parameter logistic model (2PL) with latent regression. The latent regression portion of the model may be restricted to an intercept only, yielding a standard 2PL. A brief simulation indicates that the Stan model successfully recovers the generating parameters. An example using a grade 12 science assessment is provided.
View (HTML)
- Author
- Daniel C. Furr
- Keywords
- education, item response theory, two-parameter logistic model
- Source Repository
- example-models/education/2pl_latent_reg (GitHub)
- R Package Dependencies
- rstan, ggplot2, TAM
- License
- BSD (3 clause), CC-BY
Pooling with Hierarchical Models for Repeated Binary Trials
This note illustrates the effects on posterior inference of pooling data (aka sharing strength) across items for repeated binary trial data. It provides Stan models and R code to fit and check predictive models for three situations: (a) complete pooling, which assumes each item is the same, (b) no pooling, which assumes the items are unrelated, and (c) partial pooling, where the similarity among the items is estimated. We consider two hierarchical models to estimate the partial pooling, one with a beta prior on chance of success and another with a normal prior on the log odds of success. The note explains with working examples how to (i) fit models in RStan and plot the results in R using ggplot2, (ii) estimate event probabilities, (iii) use posterior predictive densities to evaluate model predictions on held-out data, (iv) rank items by chance of success, (v) perform multiple comparisons in several settings, (vi) replicate new data for posterior p-values, and (vii) perform graphical posterior predictive checks.
View (HTML)
- Author
- Bob Carpenter
- Keywords
- binary trials, pooling, hierarchical models, baseball, epidemiology, prediction, posterior predictive checks
- Source Repository
- example-models/knitr/pool-binary-trials (GitHub)
- R Package Dependencies
- rstan, ggplot2, rmarkdown
- License
- BSD (3 clause), CC-BY
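The three pooling regimes can be contrasted with a small Python sketch. It is a simplified stand-in, not the case study's method: the case study estimates the amount of pooling with hierarchical models in Stan, whereas here the partial-pooling estimate is the posterior mean under a beta prior whose mean is fixed at the pooled rate and whose concentration `kappa` is chosen by hand. The data, the function names, and `kappa` are hypothetical.

```python
def complete_pooling(y, n):
    """One shared chance of success for every item."""
    rate = sum(y) / sum(n)
    return [rate] * len(y)

def no_pooling(y, n):
    """A separate chance of success for each item, ignoring the others."""
    return [yi / ni for yi, ni in zip(y, n)]

def partial_pooling(y, n, kappa):
    """Posterior means under a Beta(a, b) prior with fixed hyperparameters.

    Assumption: the prior mean is the pooled rate and kappa = a + b acts
    as a prior "sample size". A hierarchical model would estimate these
    from the data instead of fixing them.
    """
    pooled = sum(y) / sum(n)
    a, b = kappa * pooled, kappa * (1.0 - pooled)
    return [(yi + a) / (ni + a + b) for yi, ni in zip(y, n)]

# Hypothetical data: successes and trials for three items.
successes = [2, 5, 9]
trials = [10, 10, 10]

complete = complete_pooling(successes, trials)
separate = no_pooling(successes, trials)
partial = partial_pooling(successes, trials, kappa=10.0)

for c, s, p in zip(complete, separate, partial):
    print(f"complete={c:.3f}  none={s:.3f}  partial={p:.3f}")
```

Each partial-pooling estimate falls between the item's own rate and the pooled rate; a larger `kappa` pulls the estimates closer to the pooled rate, and a `kappa` near zero approaches the no-pooling estimates.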
RStanARM version
There is also a version of this case study in which all models are fit using the RStanARM interface. Many of the visualizations are also created using RStanARM’s plotting functions.
View RStanARM version (HTML)
- Authors
- Bob Carpenter, Jonah Gabry, Ben Goodrich
Stan Case Studies, Volume 2 (2015)
Multiple Species-Site Occupancy Model
This case study replicates the analysis and output graphs of Dorazio et al.’s (2006) noisy-measurement occupancy model for multiple species abundance of butterflies. Going beyond the paper, the supercommunity assumptions are tested to show they are invariant to sizing, and posterior predictive checks are provided.
View (HTML)
- Author
- Bob Carpenter
- Keywords
- ecology, occupancy, species abundance, supercommunity, posterior predictive check
- Source Repository
- example-models/knitr/dorazio-royle-occupancy (GitHub)
- License
- BSD (3 clause), CC-BY
- R Package Dependencies
- rstan, ggplot2, rmarkdown
Stan Case Studies, Volume 1 (2014)
Soil Carbon Modeling with RStan
This case study provides ordinary differential equation-based compartment models of soil carbon flux, with experimental data fitted with unknown initial compartment balance and noisy CO2 measurements. Results from Sierra and Müller’s (2014) soilR package are replicated.
View (HTML)
- Author
- Bob Carpenter
- Keywords
- biogeochemistry, compartment ODE, soil carbon respiration, incubation experiment
- Source Repository
- soil-metamodel/stan/soil-knit (GitHub)
- License
- BSD (3 clause), CC-BY
- R Package Dependencies
- rstan, ggplot2, rmarkdown