1 Model

1.1 Overview

The generalized partial credit model (Muraki 1992) is appropriate for item response data that features more than two ordered response categories. The items may have differing numbers of response categories. For dichotomous items (items with exactly two response categories), the generalized partical credit model is equivalent to the two-parameter logistic model. The version presented includes a latent regression. However, the latent regression may be restricted to a model intercept, resulting in the standard generalized partial credit model. Also, we use the item-intercept parameterization of the model and separate the latent regression from the product of the ability and discrimination parameters for better performance.

\[ \mathrm{logit} [ \Pr(Y_{ij} = y,~y > 0 | \theta_j, \alpha_i, \beta_i) ] = \frac{\exp \sum_{s=1}^y \alpha_i (\theta_j + w_{j}' \lambda - \beta_{is})} {1 + \sum_{k=1}^{m_i} \exp \sum_{s=1}^k \alpha_i (\theta_j + w_{j}' \lambda - \beta_{is})} \] \[ \mathrm{logit} [ \Pr(Y_{ij} = y,~y = 0 | \theta_j, \alpha_i, \beta_i) ] = \frac{1} {1 + \sum_{k=1}^{m_i} \exp \sum_{s=1}^k \alpha_i (\theta_j + w_{j}' \lambda - \beta_{is})} \] \[ \theta_j \sim \mathrm{N}(0, 1) \]

Variables:

  • \(i = 1 \ldots I\) indexes items.
  • \(j = 1 \ldots J\) indexes persons.
  • \(y_{ij} \in \{ 0 \ldots m_i \}\) is the response of person \(j\) to item \(i\)
  • \(m_i\) is simulataneously the maximum score and number of step difficulty parameters for item \(i\).
  • \(w_{j}\) is the vector of covariates for person \(j\), which will in general include an intercept term.

Parameters:

  • \(\alpha_i\) is the discrimination for item \(i\).
  • \(\beta_{is}\) is the \(s\)-th step difficulty for item \(i\).
  • \(\theta_j\) is the ability for person \(j\).
  • \(\lambda\) is the vector of latent regression parameters.

Constraints:

  • The last difficulty parameter is constrained to be the negative sum of the other difficulties, resulting in the average difficulty parameter being zero.

Priors:

  • \(\alpha_i \sim \mathrm{log~N}(1, 1)\) is weakly informative, except that positive discrimination parameters are assumed.
  • \(\beta_1 \ldots \beta_{I-1} \sim \mathrm{N}(0, 5)\) is weakly informative, and no prior is needed for the contrained difficulty \(\beta_I\).
  • A uniform prior is chosen for \(\lambda\) because the scale of the person covariates will vary depending on the application.

1.2 Stan program

A few aspects of the Stan program for the generalized partial credit model bear mentioning. First, the prediction for person ability is calculated and temporarily stored in mu in the model block. This is done for efficiency and readability of the code.

Second, the program begins with the creation of a user-specified function gpcm_probs(), which accepts values for theta, mu, and alpha and a vector of parameters beta for one item. With these inputs, it returns a vector of model-predicted probabilities for each possible response. Later, in the model block, gpcm_probs() is used to get the likelihood of the observed item responses.

Third, the encoding of item responses are modified such that the lowest response category is one instead of zero. This modification takes place in the transformed data block, in which a new variable r is created for this purpose. The adjustment is necessary for compatibility with the categorical() function.

Fourth, the variables m and pos are also created in the transformed data block. These are needed to pick out the vector of item parameters for a single item from the vector of all item parameters beta. pos indicates the position of the first parameter for a given item, while m indicates the count of parameters for an item.

Lastly, a constraint is placed on the item difficulties for model identification. This is accomplished by creating beta_free, which is the vector of unconstrained item parameters, in the parameters block. Then in the transformed parameters block, beta is created to be identical to beta_free except for one additional element that is the constrained item difficulty. As a result of this constraint, the mean of beta will be zero.

functions {
  vector gpcm_probs(real theta, real mu, real alpha, vector beta) {
    vector[rows(beta) + 1] unsummed;
    vector[rows(beta) + 1] probs;
    unsummed <- append_row(rep_vector(0.0, 1), alpha*(theta + mu - beta));
    probs <- softmax(cumulative_sum(unsummed));
    return probs;
  }
}
data {
  int<lower=1> I;                // # items
  int<lower=1> J;                // # persons
  int<lower=1> N;                // # responses
  int<lower=1,upper=I> ii[N];    // i for n
  int<lower=1,upper=J> jj[N];    // j for n
  int<lower=0> y[N];             // response for n; y = 0, 1 ... m_i
  int<lower=1> K;                // # person covariates
  matrix[J,K] W;                 // person covariate matrix
}
transformed data {
  int r[N];                      // modified response; r = 1, 2, ... m_i + 1
  int m[I];                      // # parameters per item
  int pos[I];                    // first position in beta vector for item
  m <- rep_array(0, I);
  for(n in 1:N) {
    r[n] <- y[n] + 1;
    if(y[n] > m[ii[n]]) m[ii[n]] <- y[n];
  }
  pos[1] <- 1;
  for(i in 2:(I))
    pos[i] <- m[i-1] + pos[i-1];
}
parameters {
  vector<lower=0>[I] alpha;
  vector[sum(m)-1] beta_free;    // unconstrained item parameters
  vector[J] theta;
  vector[K] lambda;
}
transformed parameters {
  vector[sum(m)] beta;           // constrained item parameters
  beta <- append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
}
model {
  vector[J] mu;
  mu <- W*lambda;
  theta ~ normal(0, 1);
  alpha ~ lognormal(1, 1);
  beta_free ~ normal(0, 5);
  for (n in 1:N)
    r[n] ~ categorical(gpcm_probs(theta[jj[n]], mu[jj[n]], alpha[ii[n]],
                                  segment(beta, pos[ii[n]], m[ii[n]])));
}

2 Simulation

# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)

The R code that follows simulates a dataset conforming to the model. The Stan model will be evaluated in terms of its ability to recover the generating values of the parameters when fit to this dataset.

# Person covariates and abilities
J <- 500
lambda <- c(0.5, 0.5, 0.5)
W <- cbind(1, rnorm(J, 0, 1), rnorm(J, 0, 1))
mu <- W %*% matrix(lambda)
theta <- rnorm(J, 0, 1)

# Item parameters
I <- 20
alpha <- rep(c(0.8, 1, 1.2), length.out = I)
Beta_uncentered <- matrix(NA, nrow = I, ncol = 2)
Beta_uncentered[, 1] <- seq(from = -1.5, to = 1.5, length.out = I)
Beta_uncentered[, 2] <- Beta_uncentered[, 1] + rep(c(0.25, 0.5, 0.75, 1), length.out = I)
Beta_centered <- Beta_uncentered - mean(Beta_uncentered)
beta <- as.vector(t(Beta_centered))

# Start of Stan data list
data_list <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J, 
    each = I), K = ncol(W), W = W)

# Function to simulate responses
simulate_response <- function(theta, mu, alpha, beta) {
    unsummed <- c(0, alpha * (theta + mu - beta))
    numerators <- exp(cumsum(unsummed))
    denominator <- sum(numerators)
    response_probs <- numerators/denominator
    simulated_y <- sample(1:length(response_probs) - 1, size = 1, prob = response_probs)
    return(simulated_y)
}

# Add simulated responses to Stan data list
data_list$y <- numeric(data_list$N)
for (n in 1:data_list$N) {
    data_list$y[n] <- simulate_response(theta[data_list$jj[n]], mu[data_list$jj[n]], 
        alpha[data_list$ii[n]], Beta_centered[data_list$ii[n], ])
}

The simulated data consists of 20 items, each with 3 categories, and 500 persons. The latent regression includes an intercept and 2 person-related covariates, which are standard normal variables. Next, the model is fit to the simulated dataset with Stan.

# Fit model to simulated data
sim_fit <- stan(file = "gpcm_latent_reg.stan", data = data_list, chains = 4, 
    iter = 500)

Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the \(\hat{R}\) statistic for the model parameters and log posterior. These are provided in the following figure. All values for \(\hat{R}\) should be less than 1.1.

# Plot of convergence statistics
sim_summary <- as.data.frame(summary(sim_fit)[[1]])
sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0, 
    width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Convergence statistics (\hat{R}) by parameter for the simulation. All values should be less than 1.1 to infer convergence.

Convergence statistics (\(\hat{R}\)) by parameter for the simulation. All values should be less than 1.1 to infer convergence.

The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.

# Make vector of wanted parameter names
wanted_pars <- c(paste0("alpha[", 1:length(alpha), "]"), paste0("beta[", 1:length(beta), 
    "]"), paste0("lambda[", 1:ncol(W), "]"))

# Get estimated and generating values for wanted parameters
generating_values = c(alpha, beta, lambda)
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]

# Assesmble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values
sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values

# Plot the discrepancy
ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) + 
    scale_x_discrete() + labs(y = "Discrepancy", x = NULL) + geom_abline(intercept = 0, 
    slope = 0, color = "white") + geom_linerange() + geom_point(size = 2) + 
    theme(panel.grid = element_blank()) + coord_flip()
Discrepancies between estimated and generating parameters for the simulation. Points indicate the difference between the posterior means and generating values for a parameter, and horizontal lines indicate 95% posterior intervals for the difference. Most of the discrepancies are about zero, indicating that Stan successfully recovers the true parameters.

Discrepancies between estimated and generating parameters for the simulation. Points indicate the difference between the posterior means and generating values for a parameter, and horizontal lines indicate 95% posterior intervals for the difference. Most of the discrepancies are about zero, indicating that Stan successfully recovers the true parameters.

3 Example application

The example data are from the TIMSS 2011 mathematics assessment (Mullis et al. 2012) of Australian and Taiwanese students. For convenience, a subset of 500 students is used. The subsetted data is then divided into a person covariate matrix and an item response matrix.

# Attach the example dataset. The TAM package is required.
data(data.timssAusTwn.scored, package = "TAM")

# Subset the full data
select <- floor(seq(from = 1, to = nrow(data.timssAusTwn.scored), length.out = 500))
subsetted_df <- data.timssAusTwn.scored[select, ]
str(subsetted_df)
## 'data.frame':    500 obs. of  14 variables:
##  $ M032166 : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ M032721 : int  0 1 1 0 0 1 0 0 1 1 ...
##  $ M032757 : int  2 2 2 0 2 2 2 0 2 2 ...
##  $ M032760A: int  0 2 2 0 2 0 2 0 1 2 ...
##  $ M032760B: int  1 1 1 0 1 0 0 0 1 1 ...
##  $ M032760C: int  0 0 1 0 0 0 0 0 0 1 ...
##  $ M032761 : int  2 2 2 0 1 0 0 0 0 1 ...
##  $ M032692 : int  2 2 2 0 0 0 0 0 0 2 ...
##  $ M032626 : int  0 1 1 0 1 1 1 0 1 1 ...
##  $ M032595 : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ M032673 : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ IDCNTRY : int  36 36 36 36 36 36 36 36 36 36 ...
##  $ ITSEX   : int  1 1 2 2 2 2 1 1 2 1 ...
##  $ IDBOOK  : int  1 1 1 1 1 1 1 1 1 1 ...

The dataset is next divided into an item response matrix and a matrix of student covariates.

# Make a matrix of person predictors
w_mat <- cbind(intercept = rep(1, times = nrow(subsetted_df)), taiwan = as.numeric(subsetted_df$IDCNTRY == 
    158), female = as.numeric(subsetted_df$ITSEX == 2), book14 = as.numeric(subsetted_df$IDBOOK == 
    14))
head(w_mat)
##      intercept taiwan female book14
## [1,]         1      0      0      0
## [2,]         1      0      0      0
## [3,]         1      0      1      0
## [4,]         1      0      1      0
## [5,]         1      0      1      0
## [6,]         1      0      1      0
# Make a matrix of item responses
y_mat <- as.matrix(subsetted_df[, grep("^M", names(subsetted_df))])
head(y_mat)
##    M032166 M032721 M032757 M032760A M032760B M032760C M032761 M032692
## 1        1       0       2        0        1        0       2       2
## 4        1       1       2        2        1        0       2       2
## 8        1       1       2        2        1        1       2       2
## 11       0       0       0        0        0        0       0       0
## 15       1       0       2        2        1        0       1       0
## 18       1       1       2        0        0        0       0       0
##    M032626 M032595 M032673
## 1        0       1       1
## 4        1       1       1
## 8        1       1       1
## 11       0       1       0
## 15       1       1       1
## 18       1       1       1

The person covariate matrix w_mat has columns representing an intercept and three indicator variables for being in Taiwan (versus Australia), being female (versus male), and being assigned test booklet 14 (instead of booklet 1). The item response matrix y_mat contains 11 items. Neither the response matrix or person covariates contain missing data.

Before fitting the model, some descriptive statistics are considered. First, a frequency table for the person covariates is created.

# Customized version of table() that does not omit missing categories.
# Specify expected categories with key.
consec_table <- function(x, key) {
    y <- rep(0, times = length(key))
    names(y) <- as.character(key)
    x_table <- table(x)
    y[names(x_table)] <- x_table
    return(y)
}

# Frequency table for person covariates
t(apply(w_mat, 2, consec_table, key = 0:1))
##             0   1
## intercept   0 500
## taiwan    297 203
## female    247 253
## book14    250 250

Next, a frequency table for item responses is considered.

# Frequency table for item responses
item_freqs <- t(apply(y_mat, 2, consec_table, key = 0:2))
item_freqs
##            0   1   2
## M032166  157 343   0
## M032721  246 254   0
## M032757  130  21 349
## M032760A 233  20 247
## M032760B 306 194   0
## M032760C 343 157   0
## M032761  278  79 143
## M032692  299  13 188
## M032626  187 313   0
## M032595  155 345   0
## M032673  168 332   0

The above table shows that the data are a mixture of dichotomous items and polytomous items having three responses categories. The first and second items are dichotomous, while the third and fourth are polytomous, for example. Consequently, the first and second items will have one step parameter each, while the third and fourth will have two each. If this table indicated there were no 0 or 1 responses for one of the polytomous items, responses for that item would need to be recoded.

Because all item parameters are stored in one vector, beta, some care is required in understanding which elements of beta correspond to which items. The following R code produces a table that maps each element in beta to its associated item and step.

# Make a table mapping elements of beta to items/steps
x <- item_freqs[, -1] > 0
beta_key_trans <- t(matrix(NA, ncol = ncol(x), nrow = nrow(x)))
beta_key_trans[t(x[])] <- 1:sum(x)
beta_key <- t(beta_key_trans)
rownames(beta_key) <- rownames(item_freqs)
colnames(beta_key) <- paste("Step", 1:ncol(beta_key))
beta_key
##          Step 1 Step 2
## M032166       1     NA
## M032721       2     NA
## M032757       3      4
## M032760A      5      6
## M032760B      7     NA
## M032760C      8     NA
## M032761       9     10
## M032692      11     12
## M032626      13     NA
## M032595      14     NA
## M032673      15     NA

The data are now formatted into a list and fit with Stan.

# Assemble data list for Stan
ex_list <- list(I = ncol(y_mat), J = nrow(y_mat), N = length(y_mat), ii = rep(1:ncol(y_mat), 
    each = nrow(y_mat)), jj = rep(1:nrow(y_mat), times = ncol(y_mat)), y = as.vector(y_mat), 
    K = ncol(w_mat), W = w_mat)

# Run Stan model
ex_fit <- stan(file = "gpcm_latent_reg.stan", data = ex_list, chains = 4, iter = 500)

As discussed above, convergence of the chains is assessed for every parameter, and also the log posterior density, using \(\hat{R}\).

# Plot of convergence statistics
ex_summary <- as.data.frame(summary(ex_fit)[[1]])
ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0, 
    width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Convergence statistics (\hat{R}) by parameter for the example. All values should be less than 1.1 to infer convergence.

Convergence statistics (\(\hat{R}\)) by parameter for the example. All values should be less than 1.1 to infer convergence.

Next we view summaries of the parameter posteriors.

# View table of parameter posteriors
print(ex_fit, pars = c("alpha", "beta", "lambda"))
## Inference for Stan model: gpcm_latent_reg.
## 4 chains, each with iter=500; warmup=250; thin=1; 
## post-warmup draws per chain=250, total post-warmup draws=1000.
## 
##            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## alpha[1]   0.76    0.00 0.11  0.56  0.68  0.76  0.84  0.99  1000 1.01
## alpha[2]   0.48    0.00 0.08  0.32  0.42  0.48  0.53  0.64  1000 1.00
## alpha[3]   1.06    0.01 0.14  0.80  0.96  1.05  1.14  1.37   471 1.01
## alpha[4]   1.86    0.01 0.26  1.41  1.69  1.83  1.99  2.50   340 1.01
## alpha[5]   2.01    0.01 0.22  1.61  1.85  1.99  2.15  2.50   440 1.00
## alpha[6]   3.09    0.02 0.44  2.35  2.80  3.04  3.32  4.10   515 1.00
## alpha[7]   2.58    0.02 0.36  1.97  2.32  2.55  2.80  3.34   358 1.00
## alpha[8]   1.24    0.01 0.13  1.02  1.14  1.23  1.32  1.51   377 1.00
## alpha[9]   1.65    0.01 0.19  1.29  1.51  1.65  1.77  2.03  1000 1.00
## alpha[10]  1.61    0.01 0.19  1.28  1.48  1.60  1.73  2.01   525 1.00
## alpha[11]  1.33    0.00 0.16  1.05  1.23  1.32  1.43  1.67  1000 1.00
## beta[1]   -1.07    0.01 0.18 -1.46 -1.19 -1.05 -0.94 -0.75   414 1.01
## beta[2]    0.09    0.01 0.19 -0.30 -0.03  0.10  0.22  0.45  1000 1.00
## beta[3]    1.06    0.01 0.36  0.45  0.80  1.03  1.28  1.85   568 1.01
## beta[4]   -2.59    0.02 0.41 -3.45 -2.86 -2.55 -2.31 -1.90   518 1.01
## beta[5]    0.99    0.01 0.23  0.60  0.83  0.97  1.12  1.49   409 1.01
## beta[6]   -0.82    0.01 0.22 -1.32 -0.96 -0.81 -0.68 -0.42   416 1.01
## beta[7]    0.61    0.00 0.07  0.48  0.56  0.61  0.66  0.77  1000 1.00
## beta[8]    0.88    0.00 0.07  0.75  0.83  0.88  0.92  1.01   398 1.01
## beta[9]    0.51    0.00 0.09  0.35  0.45  0.51  0.57  0.69   433 1.00
## beta[10]   0.83    0.00 0.09  0.64  0.78  0.83  0.89  0.99   514 1.00
## beta[11]   2.52    0.02 0.35  1.90  2.27  2.48  2.74  3.26   394 1.00
## beta[12]  -1.34    0.02 0.33 -2.06 -1.56 -1.31 -1.10 -0.76   453 1.00
## beta[13]  -0.38    0.00 0.08 -0.54 -0.44 -0.38 -0.33 -0.24  1000 1.00
## beta[14]  -0.68    0.00 0.09 -0.86 -0.73 -0.68 -0.62 -0.51  1000 1.00
## beta[15]  -0.61    0.00 0.10 -0.81 -0.67 -0.60 -0.54 -0.44   836 1.00
## lambda[1] -0.35    0.01 0.10 -0.56 -0.42 -0.35 -0.28 -0.15   234 1.01
## lambda[2]  1.48    0.01 0.11  1.26  1.41  1.48  1.56  1.69   139 1.02
## lambda[3]  0.07    0.01 0.11 -0.14 -0.01  0.06  0.14  0.28   214 1.00
## lambda[4] -0.21    0.01 0.09 -0.40 -0.28 -0.22 -0.15 -0.03   267 1.00
## 
## Samples were drawn using NUTS(diag_e) at Thu Apr 21 17:39:07 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

If person covariates are unavailable, or their inclusion unwanted, the model may be fit restricting the matrix of person covariates to an intercept only. In this case, the vector lambda contains only one element, which will represent the mean of the ability distribution. The code below is an example of how to structure the data for Stan for this purpose.

# Fit the example data without latent regression
noreg_list <- ex_list
noreg_list$K <- 1
noreg_list$W <- matrix(1, nrow = nrow(M), ncol = 1)
noreg_fit <- stan(file = "gpcm_latent_reg.stan", data = noreg_list, chains = 4, 
    iter = 500)

4 References

Mullis, Ina V S, Michael O Martin, Pierre Foy, and Alka Arora. 2012. TIMSS 2011 International Results in Mathematics. ERIC.

Muraki, Eiji. 1992. “A Generalized Partial Credit Model: Application of an EM Algorithm.” ETS Research Report Series 1992 (1). Wiley Online Library: i–i30.