1 Model

1.1 Overview

The partial credit model (Masters 1982) is appropriate for item response data that features more than two ordered response categories for some or all items. The items may have differing numbers of response categories. For dichotomous items (items with exactly two response categories), the partical credit model is equivalent to the Rasch model. The version presented includes a latent regression. However, the latent regression part of the model may be restricted to an intercept only, resulting in the standard partial credit model.

\[ \Pr(Y_{ij} = y,~y > 0 | \theta_j, \beta_i) = \frac{\exp \sum_{s=1}^y (\theta_j - \beta_{is})} {1 + \sum_{k=1}^{m_i} \exp \sum_{s=1}^k (\theta_j - \beta_{is})} \] \[ \Pr(Y_{ij} = y,~y = 0 | \theta_j, \beta_i) = \frac{1} {1 + \sum_{k=1}^{m_i} \exp \sum_{s=1}^k (\theta_j - \beta_{is})} \] \[ \theta_j \sim \mathrm{N}(w_{j}' \lambda, \sigma^2) \]

Variables:

$i = 1 \ldots I$ indexes items.
$j = 1 \ldots J$ indexes persons.
$Y_{ij} \in \{ 0 \ldots m_i \}$ is the response of person $j$ to item $i$
$m_i$ is simulataneously the maximum score and number of step difficulty parameters for item $i$.
$w_{j}$ is the vector of covariates for person $j$, which will in general include an intercept term.

Parameters:

$\beta_{is}$ is the $s$-th step difficulty for item $i$.
$\theta_j$ is the ability for person $j$.
$\lambda$ is the vector of latent regression parameters.
$\sigma^2$ is the variance for the ability distribution.

Constraints:

The last difficulty parameter is constrained to be the negative sum of the other difficulties, resulting in the average difficulty parameter being zero.

Priors:

$\beta_1 \ldots \beta_{I-1} \sim \mathrm{N}(0, 5)$ is weakly informative, and no prior is needed for the contrained difficulty $\beta_I$.
$\sigma \sim \mathrm{Exp}(.1)$ is weakly informative.
A uniform prior is chosen for $\lambda$ because the scale of the person covariates will vary depending on the application.

1.2 Stan program

A few aspects of the Stan program for the partial credit model bear mentioning. First, the program begins with the creation of a user-specified function pcm_probs(), which accepts a value for theta and a vector of parameters beta for one item. With these inputs, it returns a vector of model-predicted probabilities for each possible response. Later, in the model block, pcm_probs() is used to get the likelihood of the observed item responses.

Second, the encoding of item responses are modified such that the lowest response category is one instead of zero. This modification takes place in the transformed data block, in which a new variable r is created for this purpose. The adjustment is necessary for compatibility with the categorical() function.

Third, the variables m and pos are also created in the transformed data block. These are needed to pick out the vector of item parameters for a single item from the vector of all item parameters beta. pos indicates the position of the first parameter for a given item, while m indicates the count of parameters for an item.

Lastly, a constraint is placed on the item difficulties for model identification. This is accomplished by creating beta_free, which is the vector of unconstrained item parameters, in the parameters block. Then in the transformed parameters block, beta is created to be identical to beta_free except for one additional element that is the constrained item difficulty. As a result of this constraint, the mean of beta will be zero.

functions {
  vector pcm_probs(real theta, vector beta) {
    vector[rows(beta) + 1] unsummed;
    vector[rows(beta) + 1] probs;
    unsummed <- append_row(rep_vector(0.0, 1), theta - beta);
    probs <- softmax(cumulative_sum(unsummed));
    return probs;
  }
}
data {
  int<lower=1> I;                // # items
  int<lower=1> J;                // # persons
  int<lower=1> N;                // # responses
  int<lower=1,upper=I> ii[N];    // i for n
  int<lower=1,upper=J> jj[N];    // j for n
  int<lower=0> y[N];             // response for n; y = 0, 1 ... m_i
  int<lower=1> K;                // # person covariates
  matrix[J,K] W;                 // person covariate matrix
}
transformed data {
  int r[N];                      // modified response; r = 1, 2, ... m_i + 1
  int m[I];                      // # parameters per item
  int pos[I];                    // first position in beta vector for item
  m <- rep_array(0, I);
  for(n in 1:N) {
    r[n] <- y[n] + 1;
    if(y[n] > m[ii[n]]) m[ii[n]] <- y[n];
  }
  pos[1] <- 1;
  for(i in 2:(I))
    pos[i] <- m[i-1] + pos[i-1];
}
parameters {
  vector[sum(m)-1] beta_free;    // unconstrained item parameters
  vector[J] theta;
  real<lower=0> sigma;
  vector[K] lambda;
}
transformed parameters {
  vector[sum(m)] beta;           // constrained item parameters
  beta <- append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
}
model {
  theta ~ normal(W*lambda, sigma);
  beta_free ~ normal(0, 5);
  sigma ~ exponential(.1);
  for (n in 1:N)
    r[n] ~ categorical(pcm_probs(theta[jj[n]],
                                 segment(beta, pos[ii[n]], m[ii[n]])));
}

2 Simulation

First, the necessary R packages are loaded.

# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)

The R code that follows simulates a dataset conforming to the model. The Stan model will be evaluated in terms of its ability to recover the generating values of the parameters when fit to this dataset.

# Generate person-related parameters and predictors
J <- 500
sigma <- 1
lambda <- c(0.5, 0.5, 0.5)
W <- cbind(1, rnorm(J, 0, 1), rnorm(J, 0, 1))
theta <- W %*% matrix(lambda) + rnorm(J, 0, sigma)

# Generate item parameters
I <- 20
Beta_uncentered <- matrix(NA, nrow = I, ncol = 2)
Beta_uncentered[, 1] <- seq(from = -1.5, to = 1.5, length.out = I)
Beta_uncentered[, 2] <- Beta_uncentered[, 1] + rep(c(0.25, 0.5, 0.75, 1), length.out = I)
Beta_centered <- Beta_uncentered - mean(Beta_uncentered)

# A function to simulate responses from the model
simulate_response <- function(theta, beta) {
    unsummed <- c(0, theta - beta)
    numerators <- exp(cumsum(unsummed))
    denominator <- sum(numerators)
    response_probs <- numerators/denominator
    simulated_y <- sample(1:length(response_probs) - 1, size = 1, prob = response_probs)
    return(simulated_y)
}

# Assemble the data list to pass to Stan
data_list <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J, 
    each = I))
data_list$y <- numeric(data_list$N)
for (n in 1:data_list$N) {
    data_list$y[n] <- simulate_response(theta[data_list$jj[n]], Beta_centered[data_list$ii[n], 
        ])
}
data_list$K <- ncol(W)
data_list$W <- W

The simulated data consists of 20 items, each with 3 response categories, and 500 persons. The simulated dataset is next fit with Stan.

# Fit model to simulated data
sim_fit <- stan(file = "pcm_latent_reg.stan", data = data_list, chains = 4, 
    iter = 200)

Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the $\hat{R}$ statistic for the model parameters and log posterior. These are provided in the following figure. All values for $\hat{R}$ should be less than 1.1.

# Plot of convergence statistics
sim_summary <- as.data.frame(summary(sim_fit)[[1]])
sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0, 
    width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))

$Convergence statistics ($\hat{R}$) by parameter for the simulation. All values should be less than 1.1 to infer convergence.$

Convergence statistics ($\hat{R}$) by parameter for the simulation. All values should be less than 1.1 to infer convergence.

The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.

# Make vector of wanted parameter names
beta <- as.vector(t(Beta_centered))
wanted_pars <- c(paste0("beta[", 1:length(beta), "]"), paste0("lambda[", 1:ncol(W), 
    "]"), "sigma")

# Get estimated and generating values for wanted parameters
generating_values = c(beta, lambda, sigma)
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]

# Assesmble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values
sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values

# Plot the discrepancy
ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) + 
    scale_x_discrete() + labs(y = "Discrepancy", x = NULL) + geom_abline(intercept = 0, 
    slope = 0, color = "white") + geom_linerange() + geom_point(size = 2) + 
    theme(panel.grid = element_blank()) + coord_flip()

Discrepancies between estimated and generating parameters. Points indicate the difference between the posterior means and generating values for a parameter, and horizontal lines indicate 95% posterior intervals for the difference. Most of the discrepancies are about zero, indicating that Stan successfully recovers the true parameters.

3 Example application

The example data are from the TIMSS 2011 mathematics assessment (Mullis et al. 2012) of Australian and Taiwanese students. For convenience, a subset of 500 students is used. The subsetted data is then divided into a person covariate matrix and an item response matrix.

# Attach the example dataset. The TAM package is required.
data(data.timssAusTwn.scored, package = "TAM")

# Subset the full data
select <- floor(seq(from = 1, to = nrow(data.timssAusTwn.scored), length.out = 500))
subsetted_df <- data.timssAusTwn.scored[select, ]
str(subsetted_df)

## 'data.frame':    500 obs. of  14 variables:
##  $ M032166 : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ M032721 : int  0 1 1 0 0 1 0 0 1 1 ...
##  $ M032757 : int  2 2 2 0 2 2 2 0 2 2 ...
##  $ M032760A: int  0 2 2 0 2 0 2 0 1 2 ...
##  $ M032760B: int  1 1 1 0 1 0 0 0 1 1 ...
##  $ M032760C: int  0 0 1 0 0 0 0 0 0 1 ...
##  $ M032761 : int  2 2 2 0 1 0 0 0 0 1 ...
##  $ M032692 : int  2 2 2 0 0 0 0 0 0 2 ...
##  $ M032626 : int  0 1 1 0 1 1 1 0 1 1 ...
##  $ M032595 : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ M032673 : int  1 1 1 0 1 1 1 0 0 1 ...
##  $ IDCNTRY : int  36 36 36 36 36 36 36 36 36 36 ...
##  $ ITSEX   : int  1 1 2 2 2 2 1 1 2 1 ...
##  $ IDBOOK  : int  1 1 1 1 1 1 1 1 1 1 ...

The dataset is next divided into an item response matrix and a matrix of student covariates.

# Make a matrix of person predictors
w_mat <- cbind(intercept = rep(1, times = nrow(subsetted_df)), taiwan = as.numeric(subsetted_df$IDCNTRY == 
    158), female = as.numeric(subsetted_df$ITSEX == 2), book14 = as.numeric(subsetted_df$IDBOOK == 
    14))
head(w_mat)

##      intercept taiwan female book14
## [1,]         1      0      0      0
## [2,]         1      0      0      0
## [3,]         1      0      1      0
## [4,]         1      0      1      0
## [5,]         1      0      1      0
## [6,]         1      0      1      0

# Make a matrix of item responses
y_mat <- as.matrix(subsetted_df[, grep("^M", names(subsetted_df))])
head(y_mat)

##    M032166 M032721 M032757 M032760A M032760B M032760C M032761 M032692
## 1        1       0       2        0        1        0       2       2
## 4        1       1       2        2        1        0       2       2
## 8        1       1       2        2        1        1       2       2
## 11       0       0       0        0        0        0       0       0
## 15       1       0       2        2        1        0       1       0
## 18       1       1       2        0        0        0       0       0
##    M032626 M032595 M032673
## 1        0       1       1
## 4        1       1       1
## 8        1       1       1
## 11       0       1       0
## 15       1       1       1
## 18       1       1       1

The person covariate matrix w_mat has columns representing an intercept and three indicator variables for being in Taiwan (versus Australia), being female (versus male), and being assigned test booklet 14 (instead of booklet 1). The item response matrix y_mat contains 11 items. Neither the response matrix or person covariates contain missing data.

Before fitting the model, some descriptive statistics are considered. First, a frequency table for the person covariates is created.

# Customized version of table() that does not omit missing categories.
# Specify expected categories with key.
consec_table <- function(x, key) {
    y <- rep(0, times = length(key))
    names(y) <- as.character(key)
    x_table <- table(x)
    y[names(x_table)] <- x_table
    return(y)
}

# Frequency table for person covariates
t(apply(w_mat, 2, consec_table, key = 0:1))

##             0   1
## intercept   0 500
## taiwan    297 203
## female    247 253
## book14    250 250

Next, a frequency table for item responses is considered.

# Frequency table for item responses
item_freqs <- t(apply(y_mat, 2, consec_table, key = 0:2))
item_freqs

##            0   1   2
## M032166  157 343   0
## M032721  246 254   0
## M032757  130  21 349
## M032760A 233  20 247
## M032760B 306 194   0
## M032760C 343 157   0
## M032761  278  79 143
## M032692  299  13 188
## M032626  187 313   0
## M032595  155 345   0
## M032673  168 332   0

The table shows that the data are a mixture of dichotomous item and polytomous items with three responses categories. The first and second items are dichotomous, while the third and fourth are polytomous, for example. Consequently, the first and second items will have one step parameter each, while the third and fourth will have two each. If this table indicated there were no 0 or 1 responses for one of the polytomous items, responses for that item would need to be recoded.

Because all item parameters are stored in one vector, beta, some care is required in understanding which elements of beta correspond to which items. The following R code produces a table that maps each element in beta corresponds to its associated item and step.

# Make a table mapping elements of beta to items/steps
x <- item_freqs[, -1] > 0
beta_key_trans <- t(matrix(NA, ncol = ncol(x), nrow = nrow(x)))
beta_key_trans[t(x[])] <- 1:sum(x)
beta_key <- t(beta_key_trans)
rownames(beta_key) <- rownames(item_freqs)
colnames(beta_key) <- paste("Step", 1:ncol(beta_key))
beta_key

##          Step 1 Step 2
## M032166       1     NA
## M032721       2     NA
## M032757       3      4
## M032760A      5      6
## M032760B      7     NA
## M032760C      8     NA
## M032761       9     10
## M032692      11     12
## M032626      13     NA
## M032595      14     NA
## M032673      15     NA

The data are now formatted into a list and fit with Stan.

# Assemble data list for Stan
ex_list <- list(I = ncol(y_mat), J = nrow(y_mat), N = length(y_mat), ii = rep(1:ncol(y_mat), 
    each = nrow(y_mat)), jj = rep(1:nrow(y_mat), times = ncol(y_mat)), y = as.vector(y_mat), 
    K = ncol(w_mat), W = w_mat)

# Run Stan model
ex_fit <- stan(file = "pcm_latent_reg.stan", data = ex_list, chains = 4, iter = 200)

As discussed above, convergence of the chains is assessed for every parameter, and also the log posterior density, using $\hat{R}$.

# Plot of convergence statistics
ex_summary <- as.data.frame(summary(ex_fit)[[1]])
ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0, 
    width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))

$Convergence statistics ($\hat{R}$) by parameter for the example. All values should be less than 1.1 to infer convergence.$

Convergence statistics ($\hat{R}$) by parameter for the example. All values should be less than 1.1 to infer convergence.

Next we view a summary of the parameter posteriors.

# View table of parameter posteriors
print(ex_fit, pars = c("beta", "lambda", "sigma"))

## Inference for Stan model: pcm_latent_reg.
## 4 chains, each with iter=200; warmup=100; thin=1; 
## post-warmup draws per chain=100, total post-warmup draws=400.
## 
##            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1]   -1.04    0.01 0.11 -1.27 -1.11 -1.05 -0.96 -0.83   400 1.00
## beta[2]    0.11    0.01 0.12 -0.10  0.02  0.11  0.18  0.33   400 1.01
## beta[3]    0.68    0.01 0.24  0.16  0.53  0.68  0.84  1.15   400 1.00
## beta[4]   -2.77    0.01 0.23 -3.25 -2.91 -2.76 -2.62 -2.31   400 0.99
## beta[5]    1.90    0.01 0.24  1.48  1.74  1.89  2.05  2.39   400 0.99
## beta[6]   -1.82    0.01 0.24 -2.34 -1.97 -1.81 -1.65 -1.40   400 0.99
## beta[7]    0.87    0.01 0.12  0.63  0.79  0.88  0.96  1.12   375 1.00
## beta[8]    1.37    0.01 0.12  1.13  1.30  1.37  1.46  1.58   400 0.99
## beta[9]    1.13    0.01 0.14  0.83  1.03  1.13  1.22  1.39   390 0.99
## beta[10]   0.72    0.01 0.16  0.43  0.61  0.72  0.83  1.03   326 0.99
## beta[11]   2.98    0.01 0.26  2.52  2.79  2.96  3.14  3.50   400 0.99
## beta[12]  -1.54    0.01 0.27 -2.11 -1.72 -1.54 -1.36 -1.06   400 1.00
## beta[13]  -0.63    0.01 0.10 -0.82 -0.71 -0.63 -0.56 -0.42   396 1.00
## beta[14]  -1.07    0.01 0.10 -1.27 -1.13 -1.07 -1.00 -0.87   329 1.00
## beta[15]  -0.89    0.01 0.11 -1.10 -0.96 -0.89 -0.82 -0.67   400 0.99
## lambda[1] -0.52    0.01 0.14 -0.79 -0.61 -0.52 -0.42 -0.25   400 1.01
## lambda[2]  2.04    0.01 0.16  1.75  1.94  2.04  2.15  2.34   279 1.00
## lambda[3]  0.08    0.01 0.14 -0.20 -0.01  0.09  0.18  0.37   386 1.01
## lambda[4] -0.30    0.01 0.14 -0.59 -0.40 -0.30 -0.20 -0.03   400 1.00
## sigma      1.40    0.01 0.07  1.29  1.36  1.40  1.44  1.54   133 1.01
## 
## Samples were drawn using NUTS(diag_e) at Tue Apr 19 16:38:36 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

If person covariates are unavailable, or their inclusion unwanted, the model may be fit restricting the matrix of person covariates to an intercept only. In this case, the vector lambda contains only one element, which will represent the mean of the ability distribution. The code below is an example of how to structure the data for Stan for this purpose.

# Fit the example data without latent regression
noreg_list <- ex_list
noreg_list$K <- 1
noreg_list$W <- matrix(1, nrow = nrow(M), ncol = 1)
noreg_fit <- stan(file = "pcm_latent_reg.stan", data = noreg_list, chains = 4, 
    iter = 200)

4 References

Masters, Geoff N. 1982. “A Rasch Model for Partial Credit Scoring.” Psychometrika 47 (2). Springer: 149–74.

Mullis, Ina V S, Michael O Martin, Pierre Foy, and Alka Arora. 2012. TIMSS 2011 International Results in Mathematics. ERIC.

Partial credit model with latent regression

Daniel C. Furr

April 29, 2016