1 Model

1.1 Overview

The rating scale model (Andrich 1978) is appropriate for item response data that involves Likert scale responses. The version presented includes a latent regression. However, the latent regression part of the model may be restricted to an intercept only, resulting in the standard partial credit model.

\[ \Pr(Y_{ij} = y,~y > 0 | \theta_j, \beta_i, \kappa_s) = \frac{\exp \sum_{s=1}^y (\theta_j - \beta_i - \kappa_s)} {1 + \sum_{k=1}^{m} \exp \sum_{s=1}^k (\theta_j - \beta_i - \kappa_s)} \] \[ \Pr(Y_{ij} = y,~y = 0 | \theta_j, \beta_i, \kappa_s) = \frac{1} {1 + \sum_{k=1}^{m} \exp \sum_{s=1}^k (\theta_j - \beta_i - \kappa_s)} \] \[ \theta_j \sim \mathrm{N}(w_{j}' \lambda, \sigma^2) \]

Variables:

$i = 1 \ldots I$ indexes items.
$j = 1 \ldots J$ indexes persons.
$y_{ij} \in \{ 0 \ldots m \}$ is the response of person $j$ to item $i$
$m$ is simulataneously the maximum score and number of step difficulty parameters per item.
$w_{j}$ is the vector of covariates for person $j$, which will in general include an intercept term.

Parameters:

$\beta_i$ is the item-specific difficulty for item $i$.
$\kappa_s$ is the $s$-th step difficulty, constant across items.
$\theta_j$ is the ability for person $j$.
$\lambda$ is the vector of latent regression parameters.
$\sigma^2$ is the variance for the ability distribution.

Constraints:

The last item-specific difficulty is constrained to be the negative sum of the other item-specific difficulties, $\beta_I = -\sum_{i}^{(I-1)} \beta_i$, resulting in the average item difficulty being zero.
The last step difficulty is likewise constrained to be the negative sum of the other step difficulties, $\kappa_m = -\sum_{s}^{(m-1)} \kappa_s$, resulting in the average step difficulty being zero.

Priors:

$\beta_1 \ldots \beta_{I-1} \sim \mathrm{N}(0, 5)$ is weakly informative, and no prior is needed for the contrained difficulty $\beta_I$.
$\kappa_1 \ldots \kappa_{m-1} \sim \mathrm{N}(0, 5)$ is weakly informative, and no prior is needed for the contrained difficulty $\kappa_m$.
$\sigma \sim \mathrm{Exp}(.1)$ is weakly informative.
A uniform prior is chosen for $\lambda$ because the scale of the person covariates will vary depending on the application.

1.2 Stan program

A few aspects of the Stan program for the partial credit model bear mentioning. First, the program begins with the creation of a user-specified function rsm_probs(), which accepts a value for theta and beta as well as a vector for kappa. With these inputs, it returns a vector of model-predicted probabilities for each possible response. Later, in the model block, rsm_probs() is used to get the likelihood of the observed item responses.

Second, the encoding of item responses are modified such that the lowest response category is one instead of zero. This modification takes place in the transformed data block, in which a new variable r is created for this purpose. The adjustment is necessary for compatibility with the categorical() function.

Lastly, a constraint is placed on both beta and kappa for model identification. For beta, this is accomplished by creating beta_free, which is the vector of unconstrained item parameters, in the parameters block. Then in the transformed parameters block, beta is created to be identical to beta_free except for one additional element that is the constrained item difficulty. As a result of this constraint, the mean of beta will be zero. In a parallel way, kappa includes one constrained item parameter not found in kappa_free.

functions {
  vector rsm_probs(real theta, real beta, vector kappa) {
    vector[rows(kappa) + 1] unsummed;
    vector[rows(kappa) + 1] probs;
    unsummed <- append_row(rep_vector(0, 1), theta - beta - kappa);
    probs <- softmax(cumulative_sum(unsummed));
    return probs;
  }
}
data {
  int<lower=1> I;                // # items
  int<lower=1> J;                // # persons
  int<lower=1> N;                // # responses
  int<lower=1,upper=I> ii[N];    // i for n
  int<lower=1,upper=J> jj[N];    // j for n
  int<lower=0> y[N];             // response for n; y in {0 ... m_i}
  int<lower=1> K;                // # person covariates
  matrix[J,K] W;                 // person covariate matrix
}
transformed data {
  int r[N];                      // modified response; r in {1 ... m_i + 1}
  int m;                         // # steps
  m <- max(y);
  for(n in 1:N)
    r[n] <- y[n] + 1;
}
parameters {
  vector[I-1] beta_free;         // unconstrained item parameters
  vector[m-1] kappa_free;        // unconstrained step parameters
  vector[J] theta;
  real<lower=0> sigma;
  vector[K] lambda;
}
transformed parameters {
  vector[I] beta;                // all item parameters
  vector[m] kappa;               // all step parameters
  beta <- append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
  kappa <- append_row(kappa_free, rep_vector(-1*sum(kappa_free), 1));
}
model {
  theta ~ normal(W*lambda, sigma);
  sigma ~ exponential(.1);
  beta_free ~ normal(0, 5);
  kappa_free ~ normal(0, 5);
  for (n in 1:N)
    r[n] ~ categorical(rsm_probs(theta[jj[n]], beta[ii[n]], kappa));
}

2 Simulation

First, the necessary R packages are loaded.

# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)

The R code that follows simulates a dataset conforming to the model. The Stan model will be evaluated in terms of its ability to recover the generating values of the parameters when fit to this dataset.

# Person covariates and abilities
J <- 500
sigma <- 1
lambda <- c(0.5, 0.5, 0.5)
W <- cbind(1, rnorm(J, 0, 1), rnorm(J, 0, 1))
theta <- W %*% matrix(lambda) + rnorm(J, 0, sigma)

# Item parameters
I <- 20
S <- 5
beta_uncentered <- seq(from = -1, to = 1, length.out = I)
beta <- beta_uncentered - mean(beta_uncentered)
kappa <- seq(from = -1, to = 1, length.out = S - 1)

# Start of Stan data list
sim_data <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J, 
    each = I), K <- ncol(W), W <- W)

# Function to simulate responses
simulate_response <- function(theta, beta, kappa) {
    unsummed <- c(0, theta - beta - kappa)
    numerators <- exp(cumsum(unsummed))
    denominator <- sum(numerators)
    response_probs <- numerators/denominator
    simulated_y <- sample(1:length(response_probs) - 1, size = 1, prob = response_probs)
    return(simulated_y)
}

# Add simulated responses to Stan data list
sim_data$y <- numeric(sim_data$N)
for (n in 1:sim_data$N) {
    sim_data$y[n] <- simulate_response(theta[sim_data$jj[n]], beta[sim_data$ii[n]], 
        kappa)
}

The simulated data consists of 20 items, each with 5 response categories, and 500 persons. The simulated dataset is next fit with Stan.

# Fit model to simulated data
sim_fit <- stan(file = "rsm_latent_reg.stan", data = sim_data, chains = 4, iter = 300)

Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the $\hat{R}$ statistic for the model parameters and log posterior. These are provided in the following figure. All values for $\hat{R}$ should be less than 1.1.

# Plot of convergence statistics
sim_summary <- as.data.frame(summary(sim_fit)[[1]])
sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0, 
    width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))

$Convergence statistics ($\hat{R}$) by parameter for the simulation. All values should be less than 1.1 to infer convergence.$

Convergence statistics ($\hat{R}$) by parameter for the simulation. All values should be less than 1.1 to infer convergence.

The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.

# Make vector of wanted parameter names
wanted_pars <- c(paste0("beta[", 1:length(beta), "]"), paste0("kappa[", 1:length(kappa), 
    "]"), paste0("lambda[", 1:ncol(W), "]"), "sigma")

# Get estimated and generating values for wanted parameters
generating_values = c(beta, kappa, lambda, sigma)
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]

# Assesmble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values
sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values

# Plot the discrepancy
ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) + 
    scale_x_discrete() + labs(y = "Discrepancy", x = NULL) + geom_abline(intercept = 0, 
    slope = 0, color = "white") + geom_linerange() + geom_point(size = 2) + 
    theme(panel.grid = element_blank()) + coord_flip()

Discrepancies between estimated and generating parameters. Points indicate the difference between the posterior means and generating values for a parameter, and horizontal lines indicate 95% posterior intervals for the difference. Most of the discrepancies are about zero, indicating that Stan successfully recovers the true parameters.

3 Example application

The example data are from the Consumer Protection and Perceptions of Science and Technology section of the 1992 Euro-Barometer Survey (Karlheinzand and Melich 1992). Because these data do not include person covariates, the latent regression aspect of the model will include an intercept only.

# Attach the example dataset. The ltm package is required.
data(Science, package = "ltm")

# Convert dataset to an integer matrix with values 0 ... 3
M <- matrix(NA, ncol = ncol(Science), nrow = nrow(Science))
for (i in 1:ncol(M)) M[, i] <- as.integer(Science[, i]) - 1

The dataset contains 7 items and 392 persons with no missing responses. The items pertain to attitudes towards science and technology, and responses are scored on a 4-point Likert scale. For example, the text of the first item reads, “Science and technology are making our lives healthier, easier and more comfortable.” The response options are strongly disagree, disagree, agree, and strongly agree.

Before fitting the model, the response frequencies for each item are considered.

# Frequencies for each item
freqs <- t(apply(M, 2, table))
rownames(freqs) <- names(Science)
freqs

##              0   1   2   3
## Comfort      5  32 266  89
## Environment 29  90 145 128
## Work        33  98 206  55
## Future      14  72 210  96
## Technology  18  91 157 126
## Industry    10  47 173 162
## Benefit     21 100 193  78

The data are now formatted into a list and fit with Stan.

# Assemble data list for Stan
ex_list <- list(I = ncol(M), J = nrow(M), N = length(M), ii = rep(1:ncol(M), 
    each = nrow(M)), jj = rep(1:nrow(M), times = ncol(M)), y = as.vector(M), 
    K = 1, W = matrix(1, nrow = nrow(M), ncol = 1))

# Run Stan model
ex_fit <- stan(file = "rsm_latent_reg.stan", data = ex_list, chains = 4, iter = 300)

As discussed above, convergence of the chains is assessed for every parameter, and also the log posterior density, using $\hat{R}$.

# Plot of convergence statistics
ex_summary <- as.data.frame(summary(ex_fit)[[1]])
ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0, 
    width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))

$Convergence statistics ($\hat{R}$) by parameter for the example. All values should be less than 1.1 to infer convergence.$

Convergence statistics ($\hat{R}$) by parameter for the example. All values should be less than 1.1 to infer convergence.

Next we view a summary of the parameter posteriors.

# View table of parameter posteriors
print(ex_fit, pars = c("beta", "kappa", "lambda", "sigma"))

## Inference for Stan model: rsm_latent_reg.
## 4 chains, each with iter=300; warmup=150; thin=1; 
## post-warmup draws per chain=150, total post-warmup draws=600.
## 
##            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1]   -0.25    0.00 0.06 -0.37 -0.29 -0.25 -0.21 -0.14   600 1.00
## beta[2]    0.07    0.00 0.06 -0.06  0.03  0.07  0.11  0.20   600 1.00
## beta[3]    0.46    0.00 0.06  0.35  0.41  0.46  0.50  0.58   550 1.00
## beta[4]   -0.01    0.00 0.06 -0.11 -0.05  0.00  0.03  0.10   600 1.00
## beta[5]   -0.02    0.00 0.06 -0.15 -0.06 -0.02  0.02  0.09   600 1.00
## beta[6]   -0.52    0.00 0.07 -0.66 -0.56 -0.52 -0.47 -0.38   600 1.00
## beta[7]    0.27    0.00 0.07  0.13  0.22  0.26  0.31  0.40   451 1.00
## kappa[1]  -1.14    0.01 0.08 -1.30 -1.19 -1.14 -1.08 -1.00   174 1.01
## kappa[2]  -0.36    0.00 0.06 -0.47 -0.41 -0.36 -0.32 -0.24   295 1.01
## kappa[3]   1.50    0.00 0.06  1.40  1.46  1.50  1.54  1.61   190 1.00
## lambda[1]  0.73    0.00 0.04  0.65  0.70  0.73  0.76  0.82   186 1.02
## sigma      0.55    0.00 0.04  0.46  0.52  0.54  0.58  0.63   114 1.02
## 
## Samples were drawn using NUTS(diag_e) at Tue Apr 19 16:46:25 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

4 References

Andrich, David. 1978. “A Rating Formulation for Ordered Response Categories.” Psychometrika 43 (4). Springer: 561–73.

Karlheinzand, R, and A Melich. 1992. Euro-Barometer 38.1: Consumer Protection and Perceptions of Science and Technology. Brussels: INRA (Europe).

Rating scale model with latent regression

Daniel C. Furr

April 19, 2016