The two-parameter logistic model (2PL) (Swaminathan and Gifford 1985) is an item response theory model that includes parameters for both the difficulty and discrimination of dichotomous items. The version presented includes a latent regression. However, the latent regression part of the model may be restricted to an intercept only, resulting in a regular 2PL.
\[ \mathrm{logit} [ \Pr(y_{ij} = 1 | \alpha_i, \beta_i, \theta_j, \lambda) ] = \alpha_i (\theta_j + w_j' \lambda - \beta_i) \] \[ \theta_j \sim \mathrm{N}(0, 1) \]
Variables:
- \(i = 1 \ldots I\) indexes items,
- \(j = 1 \ldots J\) indexes persons,
- \(y_{ij} = 1\) if person \(j\) responded to item \(i\) correctly and \(y_{ij} = 0\) otherwise,
- \(w_j\) is the vector of covariates for person \(j\); in the examples below its first element equals one, providing a model intercept.
Parameters:
- \(\alpha_i\) is the discrimination for item \(i\),
- \(\beta_i\) is the difficulty for item \(i\),
- \(\theta_j\) is the ability for person \(j\),
- \(\lambda\) is the vector of latent regression coefficients.
Constraints:
- \(\beta_I = -\sum_{i=1}^{I-1} \beta_i\), so that the item difficulties sum to zero, which identifies the location of the latent scale.
Priors:
- \(\alpha_i \sim \mathrm{lognormal}(1, 1)\) for the discriminations,
- \(\beta_1 \ldots \beta_{I-1} \sim \mathrm{N}(0, 5^2)\) for the unconstrained difficulties,
- an implicit uniform prior for \(\lambda\), as discussed below.
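As a concrete illustration of the response function, the following R sketch computes the probability of a correct response for a single person-item pair; all values are arbitrary and chosen only for illustration.
# Illustrative (arbitrary) values for one item and one person
alpha_i <- 1.2                  # item discrimination
beta_i <- 0.3                   # item difficulty
theta_j <- 0.5                  # person ability
w_j <- c(1, 0.8)                # person covariates (intercept first)
lambda <- c(0, 0.5)             # latent regression coefficients
# plogis() is the inverse logit, so this is Pr(y_ij = 1)
plogis(alpha_i * (theta_j + sum(w_j * lambda) - beta_i))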
The Stan program is presented below. The constraint is placed on the last item difficulty by creating beta_free, the vector of unconstrained item difficulties, in the parameters block. Then, in the transformed parameters block, beta is created to be identical to beta_free except for one additional element, which is the constrained item difficulty. No prior is specified for lambda, the regression coefficients, because the scale of the person covariates may vary greatly depending on the application. In the absence of a specified prior, Stan assigns a uniform prior. Readers should consider adding appropriate priors for lambda to the Stan program according to their knowledge of the problem.
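Before turning to the program itself, the effect of the sum-to-zero constraint can be seen in a short R sketch (using arbitrary values) that mirrors the transformed parameters block: whatever values the free difficulties take, the full vector sums to zero.
# Mirror the sum-to-zero construction with arbitrary values
beta_free_demo <- c(-0.5, 0.2, 0.9)
beta_demo <- c(beta_free_demo, -1 * sum(beta_free_demo))
sum(beta_demo)  # zero by construction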
data {
  int<lower=1> I;               // # questions
  int<lower=1> J;               // # persons
  int<lower=1> N;               // # observations
  int<lower=1, upper=I> ii[N];  // question for n
  int<lower=1, upper=J> jj[N];  // person for n
  int<lower=0, upper=1> y[N];   // correctness for n
  int<lower=1> K;               // # person covariates
  matrix[J,K] W;                // person covariate matrix
}
parameters {
  vector<lower=0>[I] alpha;     // item discriminations
  vector[I-1] beta_free;        // unconstrained item difficulties
  vector[J] theta;              // person abilities
  vector[K] lambda;             // latent regression coefficients
}
transformed parameters {
  vector[I] beta;
  // Constrain the last difficulty so that the difficulties sum to zero
  beta <- append_row(beta_free, rep_vector(-1*sum(beta_free), 1));
}
model {
  vector[J] mu;
  mu <- W*lambda;               // expected ability from the latent regression
  alpha ~ lognormal(1, 1);
  beta_free ~ normal(0, 5);
  theta ~ normal(0, 1);
  y ~ bernoulli_logit(alpha[ii].*(theta[jj] + mu[jj] - beta[ii]));
}
The Stan model is fit to a simulated dataset to evaluate its ability to recover the generating parameter values. The simulation is conducted in R, and the analysis proceeds using the rstan interface to Stan.
First, the necessary R packages are loaded.
# Load R packages
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(ggplot2)
The R code that follows simulates a dataset conforming to the model.
# Set parameters for the simulated data
I <- 20
J <- 500
sigma <- 1
lambda <- c(0.5, 0.5, 0.5)
W <- cbind(1, rnorm(J, 0, 1), rnorm(J, 0, 1))
alpha <- rep(c(0.8, 1, 1.2, 1.4), length.out = I)
beta_free <- seq(from = -1.5, to = 1.5, length.out = I - 1)
# Calculate or sample remaining parameters
theta <- rnorm(J, 0, sigma)
mu <- W %*% matrix(lambda)
beta <- c(beta_free, -1 * sum(beta_free))
# Assemble data and simulate response
sim_data <- list(I = I, J = J, N = I * J, ii = rep(1:I, times = J), jj = rep(1:J,
each = I), K = ncol(W), W = W)
eta <- alpha[sim_data$ii] * (theta[sim_data$jj] + mu[sim_data$jj] - beta[sim_data$ii])
sim_data$y <- as.numeric(boot::inv.logit(eta) > runif(sim_data$N))
The simulated data consist of 20 dichotomous items and 500 persons. The latent regression includes an intercept and two person-related covariates, which are standard normal variables. The simulated dataset is next fit with Stan.
# Fit model to simulated data
sim_fit <- stan(file = "2pl_latent_reg.stan", data = sim_data, chains = 4, iter = 500)
Before interpreting the results, it is necessary to check that the chains have converged. Stan provides the \(\hat{R}\) statistic for the model parameters and the log posterior; these are shown in the following figure. All values for \(\hat{R}\) should be less than 1.1.
# Plot of convergence statistics
sim_summary <- as.data.frame(summary(sim_fit)[[1]])
sim_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(sim_summary)))
ggplot(sim_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Convergence statistics (\(\hat{R}\)) by parameter for the simulation. All values should be less than 1.1 to infer convergence.
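The same summary data frame can also be checked programmatically; the line below lists any parameters whose \(\hat{R}\) is at or above 1.1 (ideally it returns an empty character vector).
# List any parameters with Rhat at or above the 1.1 threshold
rownames(sim_summary)[sim_summary$Rhat >= 1.1]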
The Stan model is evaluated in terms of its ability to recover the generating values of the parameters. The R code below prepares a plot in which the points indicate the difference between the posterior means and generating values for the parameters of main interest. This difference is referred to as discrepancy. The lines indicate the 95% posterior intervals for the difference. Ideally, (nearly) all the 95% posterior intervals would include zero.
# Make vector of wanted parameter names
wanted_pars <- c(paste0("alpha[", 1:sim_data$I, "]"), paste0("beta[", 1:sim_data$I,
"]"), paste0("lambda[", 1:sim_data$K, "]"))
# Get estimated and generating values for wanted parameters
generating_values <- c(alpha, beta, lambda)
estimated_values <- sim_summary[wanted_pars, c("mean", "2.5%", "97.5%")]
# Assemble a data frame to pass to ggplot()
sim_df <- data.frame(parameter = factor(wanted_pars, rev(wanted_pars)), row.names = NULL)
sim_df$middle <- estimated_values[, "mean"] - generating_values
sim_df$lower <- estimated_values[, "2.5%"] - generating_values
sim_df$upper <- estimated_values[, "97.5%"] - generating_values
# Plot the discrepancy
ggplot(sim_df) + aes(x = parameter, y = middle, ymin = lower, ymax = upper) +
scale_x_discrete() + labs(y = "Discrepancy", x = NULL) + geom_abline(intercept = 0,
slope = 0, color = "white") + geom_linerange() + geom_point(size = 2) +
theme(panel.grid = element_blank()) + coord_flip()
Discrepancies between estimated and generating parameters for the simulation. Points indicate the difference between the posterior means and generating values for a parameter, and horizontal lines indicate 95% posterior intervals for the difference. Most of the discrepancies are about zero, indicating that Stan successfully recovers the true parameters.
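As a numeric complement to the figure, the proportion of 95% posterior intervals containing zero may be computed directly from the data frame assembled above; values near one indicate good recovery.
# Proportion of 95% posterior intervals that contain zero
mean(sim_df$lower < 0 & sim_df$upper > 0)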
The example data are from the First International Mathematics Study (Husen and others 1967; Postlethwaite 1967). The data include information about student gender and country (Australia or Japan). For convenience, only a subset of the full data are used.
# Attach the example dataset. The TAM package is required.
data(data.fims.Aus.Jpn.scored, package = "TAM")
# Subset the full data
select <- floor(seq(from = 1, to = nrow(data.fims.Aus.Jpn.scored), length.out = 500))
subsetted_df <- data.fims.Aus.Jpn.scored[select, ]
str(subsetted_df)
## 'data.frame': 500 obs. of 16 variables:
## $ SEX : int 1 1 2 2 1 1 2 2 2 1 ...
## $ M1PTI1 : num 1 1 1 0 0 1 1 1 1 1 ...
## $ M1PTI2 : num 0 0 0 0 0 1 1 1 1 1 ...
## $ M1PTI3 : num 1 1 1 0 1 1 0 1 1 0 ...
## $ M1PTI6 : num 1 0 0 1 0 0 0 1 1 0 ...
## $ M1PTI7 : num 0 0 0 0 0 1 0 0 0 0 ...
## $ M1PTI11: num 1 0 0 0 0 0 1 1 1 0 ...
## $ M1PTI12: num 0 0 0 0 0 1 0 0 1 1 ...
## $ M1PTI14: num 0 0 1 0 0 1 0 1 1 1 ...
## $ M1PTI17: num 1 0 0 0 0 1 0 1 1 0 ...
## $ M1PTI18: num 0 0 1 1 0 0 1 1 0 1 ...
## $ M1PTI19: num 0 0 0 0 0 1 0 0 0 0 ...
## $ M1PTI21: num 0 0 0 0 0 0 0 0 0 0 ...
## $ M1PTI22: num 0 0 0 0 0 1 0 0 0 0 ...
## $ M1PTI23: num 1 1 1 1 0 1 1 1 0 0 ...
## $ country: int 1 1 1 1 1 1 1 1 1 1 ...
The dataset is next divided into an item response matrix and a matrix of student covariates.
# Extract the response matrix
response_matrix <- as.matrix(subsetted_df[, grepl("^M1", names(subsetted_df))])
head(response_matrix)
## M1PTI1 M1PTI2 M1PTI3 M1PTI6 M1PTI7 M1PTI11 M1PTI12 M1PTI14 M1PTI17
## 1 1 0 1 1 0 1 0 0 1
## 13 1 0 1 0 0 0 0 0 0
## 26 1 0 1 0 0 0 0 1 0
## 39 0 0 0 1 0 0 0 0 0
## 52 0 0 1 0 0 0 0 0 0
## 64 1 1 1 0 1 0 1 1 1
## M1PTI18 M1PTI19 M1PTI21 M1PTI22 M1PTI23
## 1 0 0 0 0 1
## 13 0 0 0 0 1
## 26 1 0 0 0 1
## 39 1 0 0 0 1
## 52 0 0 0 0 0
## 64 0 1 0 1 1
# Assemble a matrix of person covariates
male <- as.numeric(subsetted_df$SEX == 2)
japan <- as.numeric(subsetted_df$country == 2)
W <- cbind(intercept = 1, male = male, japan = japan, interact = male * japan)
head(W)
## intercept male japan interact
## [1,] 1 0 0 0
## [2,] 1 0 0 0
## [3,] 1 1 0 0
## [4,] 1 1 0 0
## [5,] 1 0 0 0
## [6,] 1 0 0 0
A total of 500 students responded to 14 dichotomously scored items. The data contain no missing values. The two matrices are converted to a list suitable for the Stan model.
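As a quick check of the claim above, the absence of missing values can be verified directly before assembling the data list.
# Confirm that the response matrix contains no missing values
sum(is.na(response_matrix))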
# Assemble data list and fit model
ex_list <- list(I = ncol(response_matrix), J = nrow(response_matrix), N = length(response_matrix),
ii = rep(1:ncol(response_matrix), each = nrow(response_matrix)), jj = rep(1:nrow(response_matrix),
times = ncol(response_matrix)), y = as.vector(response_matrix), K = ncol(W),
W = W)
ex_fit <- stan(file = "2pl_latent_reg.stan", data = ex_list, chains = 4, iter = 500)
As discussed above, convergence of the chains is assessed for every parameter, as well as for the log posterior density, using \(\hat{R}\).
# Plot of convergence statistics
ex_summary <- as.data.frame(summary(ex_fit)[[1]])
ex_summary$Parameter <- as.factor(gsub("\\[.*]", "", rownames(ex_summary)))
ggplot(ex_summary) + aes(x = Parameter, y = Rhat, color = Parameter) + geom_jitter(height = 0,
width = 0.5, show.legend = FALSE) + ylab(expression(hat(italic(R))))
Convergence statistics (\(\hat{R}\)) by parameter for the example. All values should be less than 1.1 to infer convergence.
Next we view summaries of the parameter posteriors.
# View table of parameter posteriors
print(ex_fit, pars = c("alpha", "beta", "lambda"))
## Inference for Stan model: 2pl_latent_reg.
## 4 chains, each with iter=500; warmup=250; thin=1;
## post-warmup draws per chain=250, total post-warmup draws=1000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha[1] 0.69 0.01 0.14 0.44 0.60 0.69 0.78 0.97 399 1.01
## alpha[2] 1.44 0.01 0.22 1.04 1.29 1.42 1.57 1.93 691 1.00
## alpha[3] 1.14 0.01 0.20 0.77 1.00 1.13 1.27 1.56 597 1.00
## alpha[4] 1.20 0.01 0.17 0.89 1.09 1.19 1.31 1.55 749 1.01
## alpha[5] 1.81 0.01 0.27 1.36 1.62 1.79 1.97 2.43 482 1.00
## alpha[6] 1.39 0.01 0.22 0.98 1.23 1.37 1.54 1.84 495 1.00
## alpha[7] 0.45 0.00 0.10 0.26 0.38 0.45 0.52 0.65 574 1.01
## alpha[8] 0.27 0.00 0.09 0.10 0.21 0.26 0.33 0.46 359 1.00
## alpha[9] 1.18 0.01 0.17 0.88 1.07 1.17 1.28 1.52 627 1.00
## alpha[10] 0.63 0.00 0.11 0.43 0.56 0.63 0.71 0.85 690 1.00
## alpha[11] 2.17 0.02 0.35 1.58 1.94 2.13 2.38 2.96 427 1.00
## alpha[12] 0.19 0.00 0.05 0.10 0.15 0.19 0.22 0.31 204 1.02
## alpha[13] 1.28 0.01 0.18 0.95 1.14 1.28 1.40 1.66 498 1.00
## alpha[14] 1.37 0.01 0.20 1.01 1.23 1.35 1.49 1.80 783 1.00
## beta[1] -2.67 0.03 0.45 -3.71 -2.89 -2.58 -2.35 -2.01 269 1.02
## beta[2] -1.61 0.02 0.21 -2.07 -1.73 -1.58 -1.47 -1.27 146 1.02
## beta[3] -2.38 0.02 0.30 -3.05 -2.54 -2.34 -2.17 -1.88 237 1.02
## beta[4] -0.82 0.02 0.19 -1.29 -0.92 -0.80 -0.69 -0.51 139 1.03
## beta[5] 0.99 0.01 0.20 0.54 0.87 1.01 1.13 1.32 185 1.01
## beta[6] -1.77 0.02 0.23 -2.25 -1.90 -1.75 -1.61 -1.39 160 1.02
## beta[7] 1.34 0.02 0.45 0.59 1.05 1.29 1.59 2.41 428 1.00
## beta[8] 1.04 0.13 1.01 0.04 0.55 0.88 1.27 2.73 64 1.04
## beta[9] 0.59 0.01 0.20 0.13 0.49 0.60 0.73 0.95 211 1.01
## beta[10] -1.12 0.02 0.25 -1.70 -1.26 -1.10 -0.95 -0.69 150 1.03
## beta[11] 0.72 0.01 0.19 0.27 0.62 0.73 0.84 1.04 159 1.01
## beta[12] 5.90 0.13 1.75 3.36 4.67 5.61 6.78 10.19 184 1.03
## beta[13] 1.18 0.02 0.23 0.69 1.04 1.18 1.33 1.62 210 1.02
## beta[14] -1.39 0.02 0.20 -1.86 -1.51 -1.37 -1.26 -1.05 155 1.02
## lambda[1] -0.80 0.02 0.20 -1.26 -0.90 -0.78 -0.66 -0.48 100 1.04
## lambda[2] 0.07 0.01 0.13 -0.20 -0.02 0.07 0.16 0.33 241 1.01
## lambda[3] 1.22 0.01 0.17 0.88 1.10 1.22 1.34 1.54 223 1.01
## lambda[4] -0.32 0.02 0.23 -0.78 -0.47 -0.32 -0.16 0.11 226 1.01
##
## Samples were drawn using NUTS(diag_e) at Tue Apr 19 15:43:32 2016.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
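Because lambda is indexed rather than named in the Stan output, it may be convenient to relabel its posterior summaries with the covariate names from W. The short snippet below does this using the summary data frame created earlier.
# Attach covariate names to the latent regression coefficients
lambda_rows <- paste0("lambda[", 1:ex_list$K, "]")
lambda_summary <- ex_summary[lambda_rows, c("mean", "2.5%", "97.5%")]
rownames(lambda_summary) <- colnames(W)
lambda_summary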
A 2PL without the latent regression could be fit by changing the person covariate matrix to include only an intercept term. Shown below is how this may be done for the example data.
# Fit the example data without latent regression
noreg_list <- ex_list
noreg_list$W <- matrix(1, nrow = ex_list$J, ncol = 1)
noreg_list$K <- 1
noreg_fit <- stan(file = "2pl_latent_reg.stan", data = noreg_list, chains = 4,
iter = 500)
Husen, Torsten, and others. 1967. “International Study of Achievement in Mathematics, a Comparison of Twelve Countries, Volume I.” ERIC.
Postlethwaite, Neville. 1967. School Organization and Student Achievement. Stockholm: Almqvist & Wiksell.
Swaminathan, Hariharan, and Janice A Gifford. 1985. “Bayesian Estimation in the Two-Parameter Logistic Model.” Psychometrika 50 (3). Springer: 349–64.