The partial credit model (Masters 1982) is appropriate for item response data that features more than two ordered response categories for some or all items. The items may have differing numbers of response categories. For dichotomous items (items with exactly two response categories), the partical credit model is equivalent to the Rasch model. The version presented includes a latent regression. However, the latent regression part of the model may be restricted to an intercept only, resulting in the standard partial credit model.
\[ \Pr(Y_{ij} = y,~y > 0 | \theta_j, \beta_i) = \frac{\exp \sum_{s=1}^y (\theta_j - \beta_{is})} {1 + \sum_{k=1}^{m_i} \exp \sum_{s=1}^k (\theta_j - \beta_{is})} \] \[ \Pr(Y_{ij} = y,~y = 0 | \theta_j, \beta_i) = \frac{1} {1 + \sum_{k=1}^{m_i} \exp \sum_{s=1}^k (\theta_j - \beta_{is})} \] \[ \theta_j \sim \mathrm{N}(w_{j}' \lambda, \sigma^2) \]
Variables:
Parameters:
Constraints:
Priors:
A few aspects of the Stan program for the partial credit model bear mentioning. First, the program begins with the creation of a user-specified function pcm_probs()
, which accepts a value for theta
and a vector of parameters beta
for one item. With these inputs, it returns a vector of model-predicted probabilities for each possible response. Later, in the model block, pcm_probs()
is used to get the likelihood of the observed item responses.
Second, the encoding of item responses are modified such that the lowest response category is one instead of zero. This modification takes place in the transformed data block, in which a new variable r
is created for this purpose. The adjustment is necessary for compatibility with the categorical()
function.
Third, the variables m
and pos
are also created in the transformed data block. These are needed to pick out the vector of item parameters for a single item from the vector of all item parameters beta
. pos
indicates the position of the first parameter for a given item, while m
indicates the count of parameters for an item.
Lastly, a constraint is placed on the item difficulties for model identification. This is accomplished by creating beta_free
, which is the vector of unconstrained item parameters, in the parameters block. Then in the transformed parameters block, beta
is created to be identical to beta_free
except for one additional element that is the constrained item difficulty. As a result of this constraint, the mean of beta
will be zero.
functions {
vector pcm_probs(real theta, vector beta) {
vector[rows(beta) + 1] unsummed;
vector[rows(beta) + 1] probs;
unsummed <- append_row(rep_vector(0.0, 1), theta - beta);
probs <- softmax(cumulative_sum(unsummed));
return probs;
}
}
data {
int<lower=1> I; // # items
int<lower=1> J; // # persons
int<lower=1> N; // # responses
int<lower=1,upper=I> ii[N]; // i for n
int<lower=1,upper=J> jj[N]; // j for n
int<lower=0> y[N]; // response for n; y = 0, 1 ... m_i
int<lower=1> K; // # person covariates
matrix[J,K] W; // person covariate matrix
}
transformed data {
int r[N]; // modified response; r = 1, 2, ... m_i + 1
int m[I]; // # parameters per item
int pos[I]; // first position in beta vector for item
m <- rep_array(0, I);
for(n in 1:N) {
r[n] <- y[n] + 1;
if(y[n] > m[ii[n]]) m[ii[n]] <- y[n];
}
pos[1] <- 1;
for(i in 2:(I))
pos[i] <- m[i-1] + pos[i-1];
}
parameters {
vector[sum(m)-1] beta_free; // unconstrained item parameters
vector[J] theta;
real<lower=0> sigma;
vector[K] lambda;
}
transformed parameters {
vector[sum(m)] beta; // constrained item parameters
beta <- append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
}
model {
theta ~ normal(W*lambda, sigma);
beta_free ~ normal(0, 5);
sigma ~ exponential(.1);
for (n in 1:N)
r[n] ~ categorical(pcm_probs(theta[jj[n]],
segment(beta, pos[ii[n]], m[ii[n]])));
}
First, the necessary R packages are loaded.
# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)
The R code that follows simulates a dataset conforming to the model. The Stan model will be evaluated in terms of its ability to recover the generating values of the parameters when fit to this dataset.
# Generate person-related parameters and predictors
J <- 500
sigma <- 1
lambda <- c(0.5, 0.5, 0.5)
W <- cbind(1, rnorm(J, 0, 1), rnorm(J, 0, 1))
theta <- W %*% matrix(lambda) + rnorm(J, 0, sigma)
# Generate item parameters
I <- 20
Beta_uncentered <- matrix(NA, nrow = I, ncol = 2)
Beta_uncentered[, 1] <- seq(from = -1.5, to = 1.5, length.out = I)
Beta_uncentered[, 2] <- Beta_uncentered[, 1] + rep(c(0.25, 0.5, 0.75, 1), length.out = I)
Beta_centered <- Beta_uncentered - mean(Beta_uncentered)
# A function to simulate responses from the model
simulate_response <- function(theta, beta) {
unsummed <- c(0, theta - beta)
numerators <- exp(cumsum(unsummed))
denominator <- sum(numerators)
response_probs <- numerators/denominator
simulated_y <- sample(1:length(response_probs) - 1, size = 1, prob = response_probs)
return(simulated_y)
}
# Assemble the data list to pass to Stan
data_list <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J,
each = I))
data_list$y <- numeric(data_list$N)
for (n in 1:data_list$N) {
data_list$y[n] <- simulate_response(theta[data_list$jj[n]], Beta_centered[data_list$ii[n],
])
}
data_list$K <- ncol(W)
data_list$W <- W
The simulated data consists of 20 items, each with 3 response categories, and 500 persons. The simulated dataset is next fit with Stan.
# Fit model to simulated data
sim_fit <- stan(file = "pcm_latent_reg.stan", data = data_list, chains = 4,
iter = 200)
Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the \(\hat{R}\) statistic for the model parameters and log posterior. These are provided in the following figure. All values for \(\hat{R}\) should be less than 1.1.
# Plot of convergence statistics
sim_summary <- as.data.frame(summary(sim_fit)[[1]])
sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Convergence statistics (\(\hat{R}\)) by parameter for the simulation. All values should be less than 1.1 to infer convergence.
The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.
# Make vector of wanted parameter names
beta <- as.vector(t(Beta_centered))
wanted_pars <- c(paste0("beta[", 1:length(beta), "]"), paste0("lambda[", 1:ncol(W),
"]"), "sigma")
# Get estimated and generating values for wanted parameters
generating_values = c(beta, lambda, sigma)
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]
# Assesmble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values
sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values
# Plot the discrepancy
ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) +
scale_x_discrete() + labs(y = "Discrepancy", x = NULL) + geom_abline(intercept = 0,
slope = 0, color = "white") + geom_linerange() + geom_point(size = 2) +
theme(panel.grid = element_blank()) + coord_flip()
Discrepancies between estimated and generating parameters. Points indicate the difference between the posterior means and generating values for a parameter, and horizontal lines indicate 95% posterior intervals for the difference. Most of the discrepancies are about zero, indicating that Stan successfully recovers the true parameters.
The example data are from the TIMSS 2011 mathematics assessment (Mullis et al. 2012) of Australian and Taiwanese students. For convenience, a subset of 500 students is used. The subsetted data is then divided into a person covariate matrix and an item response matrix.
# Attach the example dataset. The TAM package is required.
data(data.timssAusTwn.scored, package = "TAM")
# Subset the full data
select <- floor(seq(from = 1, to = nrow(data.timssAusTwn.scored), length.out = 500))
subsetted_df <- data.timssAusTwn.scored[select, ]
str(subsetted_df)
## 'data.frame': 500 obs. of 14 variables:
## $ M032166 : int 1 1 1 0 1 1 1 0 0 1 ...
## $ M032721 : int 0 1 1 0 0 1 0 0 1 1 ...
## $ M032757 : int 2 2 2 0 2 2 2 0 2 2 ...
## $ M032760A: int 0 2 2 0 2 0 2 0 1 2 ...
## $ M032760B: int 1 1 1 0 1 0 0 0 1 1 ...
## $ M032760C: int 0 0 1 0 0 0 0 0 0 1 ...
## $ M032761 : int 2 2 2 0 1 0 0 0 0 1 ...
## $ M032692 : int 2 2 2 0 0 0 0 0 0 2 ...
## $ M032626 : int 0 1 1 0 1 1 1 0 1 1 ...
## $ M032595 : int 1 1 1 1 1 1 1 0 1 1 ...
## $ M032673 : int 1 1 1 0 1 1 1 0 0 1 ...
## $ IDCNTRY : int 36 36 36 36 36 36 36 36 36 36 ...
## $ ITSEX : int 1 1 2 2 2 2 1 1 2 1 ...
## $ IDBOOK : int 1 1 1 1 1 1 1 1 1 1 ...
The dataset is next divided into an item response matrix and a matrix of student covariates.
# Make a matrix of person predictors
w_mat <- cbind(intercept = rep(1, times = nrow(subsetted_df)), taiwan = as.numeric(subsetted_df$IDCNTRY ==
158), female = as.numeric(subsetted_df$ITSEX == 2), book14 = as.numeric(subsetted_df$IDBOOK ==
14))
head(w_mat)
## intercept taiwan female book14
## [1,] 1 0 0 0
## [2,] 1 0 0 0
## [3,] 1 0 1 0
## [4,] 1 0 1 0
## [5,] 1 0 1 0
## [6,] 1 0 1 0
# Make a matrix of item responses
y_mat <- as.matrix(subsetted_df[, grep("^M", names(subsetted_df))])
head(y_mat)
## M032166 M032721 M032757 M032760A M032760B M032760C M032761 M032692
## 1 1 0 2 0 1 0 2 2
## 4 1 1 2 2 1 0 2 2
## 8 1 1 2 2 1 1 2 2
## 11 0 0 0 0 0 0 0 0
## 15 1 0 2 2 1 0 1 0
## 18 1 1 2 0 0 0 0 0
## M032626 M032595 M032673
## 1 0 1 1
## 4 1 1 1
## 8 1 1 1
## 11 0 1 0
## 15 1 1 1
## 18 1 1 1
The person covariate matrix w_mat
has columns representing an intercept and three indicator variables for being in Taiwan (versus Australia), being female (versus male), and being assigned test booklet 14 (instead of booklet 1). The item response matrix y_mat
contains 11 items. Neither the response matrix or person covariates contain missing data.
Before fitting the model, some descriptive statistics are considered. First, a frequency table for the person covariates is created.
# Customized version of table() that does not omit missing categories.
# Specify expected categories with key.
consec_table <- function(x, key) {
y <- rep(0, times = length(key))
names(y) <- as.character(key)
x_table <- table(x)
y[names(x_table)] <- x_table
return(y)
}
# Frequency table for person covariates
t(apply(w_mat, 2, consec_table, key = 0:1))
## 0 1
## intercept 0 500
## taiwan 297 203
## female 247 253
## book14 250 250
Next, a frequency table for item responses is considered.
# Frequency table for item responses
item_freqs <- t(apply(y_mat, 2, consec_table, key = 0:2))
item_freqs
## 0 1 2
## M032166 157 343 0
## M032721 246 254 0
## M032757 130 21 349
## M032760A 233 20 247
## M032760B 306 194 0
## M032760C 343 157 0
## M032761 278 79 143
## M032692 299 13 188
## M032626 187 313 0
## M032595 155 345 0
## M032673 168 332 0
The table shows that the data are a mixture of dichotomous item and polytomous items with three responses categories. The first and second items are dichotomous, while the third and fourth are polytomous, for example. Consequently, the first and second items will have one step parameter each, while the third and fourth will have two each. If this table indicated there were no 0 or 1 responses for one of the polytomous items, responses for that item would need to be recoded.
Because all item parameters are stored in one vector, beta
, some care is required in understanding which elements of beta
correspond to which items. The following R code produces a table that maps each element in beta
corresponds to its associated item and step.
# Make a table mapping elements of beta to items/steps
x <- item_freqs[, -1] > 0
beta_key_trans <- t(matrix(NA, ncol = ncol(x), nrow = nrow(x)))
beta_key_trans[t(x[])] <- 1:sum(x)
beta_key <- t(beta_key_trans)
rownames(beta_key) <- rownames(item_freqs)
colnames(beta_key) <- paste("Step", 1:ncol(beta_key))
beta_key
## Step 1 Step 2
## M032166 1 NA
## M032721 2 NA
## M032757 3 4
## M032760A 5 6
## M032760B 7 NA
## M032760C 8 NA
## M032761 9 10
## M032692 11 12
## M032626 13 NA
## M032595 14 NA
## M032673 15 NA
The data are now formatted into a list and fit with Stan.
# Assemble data list for Stan
ex_list <- list(I = ncol(y_mat), J = nrow(y_mat), N = length(y_mat), ii = rep(1:ncol(y_mat),
each = nrow(y_mat)), jj = rep(1:nrow(y_mat), times = ncol(y_mat)), y = as.vector(y_mat),
K = ncol(w_mat), W = w_mat)
# Run Stan model
ex_fit <- stan(file = "pcm_latent_reg.stan", data = ex_list, chains = 4, iter = 200)
As discussed above, convergence of the chains is assessed for every parameter, and also the log posterior density, using \(\hat{R}\).
# Plot of convergence statistics
ex_summary <- as.data.frame(summary(ex_fit)[[1]])
ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Convergence statistics (\(\hat{R}\)) by parameter for the example. All values should be less than 1.1 to infer convergence.
Next we view a summary of the parameter posteriors.
# View table of parameter posteriors
print(ex_fit, pars = c("beta", "lambda", "sigma"))
## Inference for Stan model: pcm_latent_reg.
## 4 chains, each with iter=200; warmup=100; thin=1;
## post-warmup draws per chain=100, total post-warmup draws=400.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## beta[1] -1.04 0.01 0.11 -1.27 -1.11 -1.05 -0.96 -0.83 400 1.00
## beta[2] 0.11 0.01 0.12 -0.10 0.02 0.11 0.18 0.33 400 1.01
## beta[3] 0.68 0.01 0.24 0.16 0.53 0.68 0.84 1.15 400 1.00
## beta[4] -2.77 0.01 0.23 -3.25 -2.91 -2.76 -2.62 -2.31 400 0.99
## beta[5] 1.90 0.01 0.24 1.48 1.74 1.89 2.05 2.39 400 0.99
## beta[6] -1.82 0.01 0.24 -2.34 -1.97 -1.81 -1.65 -1.40 400 0.99
## beta[7] 0.87 0.01 0.12 0.63 0.79 0.88 0.96 1.12 375 1.00
## beta[8] 1.37 0.01 0.12 1.13 1.30 1.37 1.46 1.58 400 0.99
## beta[9] 1.13 0.01 0.14 0.83 1.03 1.13 1.22 1.39 390 0.99
## beta[10] 0.72 0.01 0.16 0.43 0.61 0.72 0.83 1.03 326 0.99
## beta[11] 2.98 0.01 0.26 2.52 2.79 2.96 3.14 3.50 400 0.99
## beta[12] -1.54 0.01 0.27 -2.11 -1.72 -1.54 -1.36 -1.06 400 1.00
## beta[13] -0.63 0.01 0.10 -0.82 -0.71 -0.63 -0.56 -0.42 396 1.00
## beta[14] -1.07 0.01 0.10 -1.27 -1.13 -1.07 -1.00 -0.87 329 1.00
## beta[15] -0.89 0.01 0.11 -1.10 -0.96 -0.89 -0.82 -0.67 400 0.99
## lambda[1] -0.52 0.01 0.14 -0.79 -0.61 -0.52 -0.42 -0.25 400 1.01
## lambda[2] 2.04 0.01 0.16 1.75 1.94 2.04 2.15 2.34 279 1.00
## lambda[3] 0.08 0.01 0.14 -0.20 -0.01 0.09 0.18 0.37 386 1.01
## lambda[4] -0.30 0.01 0.14 -0.59 -0.40 -0.30 -0.20 -0.03 400 1.00
## sigma 1.40 0.01 0.07 1.29 1.36 1.40 1.44 1.54 133 1.01
##
## Samples were drawn using NUTS(diag_e) at Tue Apr 19 16:38:36 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
If person covariates are unavailable, or their inclusion unwanted, the model may be fit restricting the matrix of person covariates to an intercept only. In this case, the vector lambda
contains only one element, which will represent the mean of the ability distribution. The code below is an example of how to structure the data for Stan for this purpose.
# Fit the example data without latent regression
noreg_list <- ex_list
noreg_list$K <- 1
noreg_list$W <- matrix(1, nrow = nrow(M), ncol = 1)
noreg_fit <- stan(file = "pcm_latent_reg.stan", data = noreg_list, chains = 4,
iter = 200)
Masters, Geoff N. 1982. “A Rasch Model for Partial Credit Scoring.” Psychometrika 47 (2). Springer: 149–74.
Mullis, Ina V S, Michael O Martin, Pierre Foy, and Alka Arora. 2012. TIMSS 2011 International Results in Mathematics. ERIC.