IRT models - publicifsv.sund.ku.dk
Embed Size (px)
Transcript of IRT models - publicifsv.sund.ku.dk

IRT models
Karl Bang Christensen
Dept. of BiostatisticsUniv. of Copenhagen
http://publicifsv.sund.ku.dk/~kach/scaleval_IRT/
Karl Bang Christensen IRT models

IRT
Item response theory (IRT) is a collection of modelingtechniques for the analysis of items, tests, and persons.
An IRT model for dichotomous item specifies that theprobability of response pattern xv = (xvi )i=1,...,I is
P(xv ) =
∫ I∏i=1
(Pi (θ)xvi (1− Pi (θ))(1−xvi )
)φ(θ)dθ
where is Pi (θ) is the probability of a correct response on itemi as a function of the trait θ, and φ is the populationdistribution for θ.
Many (parametric) choices for Pi
Generalized to ordinal responses Xi (partial credit).
Karl Bang Christensen IRT models

Example: educational testing
dichotomous (correct/incorrect) items i = 1, . . . , I
Pi (θ) = Gi (θ)/(1 + Gi (θ))
Gi (θ) = exp(αi (θ − βi )) [2PL]
Maximize L =∏
v P(xv ) to get estimates of α and β
Maximize Pα̂,β̂(xv ) to get estimate of person v ’s ability θ̂v
Gi (θ) = exp(θ − βi ) [1PL]: (number-correct) score R =∑
i xiis sufficient for θ.
Pi (θ) = Φ(αi (θ − βi ))
Do we trust the estimate (measurement of) θ̂v : evaluategoodness-of-fit
Karl Bang Christensen IRT models

Example: educational testing
Karl Bang Christensen IRT models

Example: educational testing
Karl Bang Christensen IRT models

Dichotomous IRT models: Probabilities for item i
1PL
P(Xvi = 1|θv = θ, βi ) =exp(θ − βi )
1 + exp(θ − βi )
person location θ, location of item i .1
2PL
P(Xvi = 1|θv = θ, βi ) =exp(αi (θ − βi ))
1 + exp(αi (θ − βi ))
person location θ location and scale of item i .2
1Rasch, G. (1960). Probabilistic models for some intelligence andattainment tests. Danish National Institute for Educational Research,Copenhagen
2Birnbaum, Some latent trait models and their use in inferring anexaminee’s ability. In Lord & Novick (Eds.), Statistical theories of mental testscores. Reading, MA: Addison-Wesley, 1968 [Three technical reports dated1957-8, Randolph Air Force Base, Tx: USAF School of Aviation Medicine].
Karl Bang Christensen IRT models

Item response functions - Rasch [1PL] model
relates person locations to probability of correct answer
Xvi = 1 indicates higher level of the construct we are measuring (acorrect answer).
Karl Bang Christensen IRT models

Item and person locations
IRT model gives us estimates β̂i of the item locations (the itemdifficulties) and of each persons location θ̂v .
Karl Bang Christensen IRT models

Item information functions ...
... the accuracy with which we can estimate person locations.
Any item provides some information about this, but theamount of information depends on targeting.
Rasch [1PL] model
Ii (θ) = P(Xvi = 1|θv = θ)(1− P(Xvi = 1|θv = θ))
Karl Bang Christensen IRT models

Item information functions
Ii (θ) = P(Xvi = 1|θv = θ)(1− P(Xvi = 1|θv = θ))
Karl Bang Christensen IRT models

More item gives us more information
Example, probabilities for item i = 1, . . . , 5
P(Xvi = 1|θv = θ, βi ) =exp(θ − βi )
1 + exp(θ − βi )
parameters β1, . . . , β5
Karl Bang Christensen IRT models

Test information function
given by
I (θ) =∑i
Ii (θ)
Karl Bang Christensen IRT models

Precision
We use IRT models because we want to estimate the personlocation θ, or to measure θ,
but how much measurement error is there. Variance of theestimate θ̂ is
V (θ̂) =1
I (θ̂)
the standard error of measurement
SEM(θ̂) =
√1
I (θ̂)
is expressed in the same units as the measurement itself.
confidence interval around the estimate.
Karl Bang Christensen IRT models

Standard Error of Measurement
Karl Bang Christensen IRT models

1PL (Rasch) model and 2PL model
Use (θ − βi ) or αi (θ − βi ) in the model
not homogenous: order of 2PL items not the same for all persons.
Karl Bang Christensen IRT models

Item response functions, item and person locations [2PL]
2PL model gives us estimates (α̂i , β̂i ) of the item locationparameters and the item scale parameters and also of each personslocation θ̂.
2PL models have crossing item response functions. Scaleparameters αi can be interpreted as discrimination parameters.
Karl Bang Christensen IRT models

Item information functions [2PL] model
Formula
Ii (θ) = α2i Pi (Xi = 1|θ)(1− Pi (Xi = 1|θ))
Karl Bang Christensen IRT models

Item and test information functions [2PL] model
Karl Bang Christensen IRT models

Evaluating goodness-of-fit of individual items
Not straightforward because θ is latent
1PL (Rasch): sufficiency ⇒ predictions can be directly comparedwith observed data for each score group.
All other IRT models: predictions cannot be directly comparedwith observed data.
estimate θsort examinees by their θ estimates,form subgroupscalculate proportion in each subgroup who answered item icorrectlycompare ’observed’ proportions with those predicted by themodel (’χ2-like statistic’ and/or graphical)
Karl Bang Christensen IRT models

Evaluating goodness-of-fit of individual items (ii)
Rasch model
Ri = 0 Ri = 1 · · ·xi = 0
xi = 1
Birnbaum model
θ̂ ∈ A0 θ̂ ∈ A1 · · ·xi = 0
xi = 1
Restscore Ri =∑
k 6=i xi . (−∞,∞) =⋃Ak (equal size).
Karl Bang Christensen IRT models

Evaluating goodness-of-fit of individual items (iii)
Item by rest score table can by used to evaluate fit in theBirnbaum model using recursive algorithm3.
This can also be generalized to ordinal items
3Orlando, Thissen, Applied Psychological Measurement, 24, 50-64, 2000Karl Bang Christensen IRT models

Simulation
Data set (Xiv ) with observed responses to items i = 1, . . . , I frompersons v = 1, . . . ,N. Missing responses not uncommon. Computeobserved test statistic T (obs)
estimate α and β
get estimates of (θv )v=1,...,N
simulate N values of the latent variable the empiricaldistribution of (θ̂v )v=1,...,N .
simulate data matrix (X(l)iv ) from Pαi ,βi (Xiv = .|θ̂v )
compute test statistic T (l) in l ’th simulated data set
compare T (obs) to empirical distribution (T (l))l=1,2,...
Karl Bang Christensen IRT models

Simulation
Data set (Xiv ) with observed responses to items i = 1, . . . , I frompersons v = 1, . . . ,N. Missing responses not uncommon. Computeobserved test statistic T (obs)
compute total score Rv =∑
i Xvi for each person
collaps persons into score groups
plot item means
compare to simulated plots
Karl Bang Christensen IRT models

PRO: Data example
Example: I am very happy I am sad
I never cough I cough all the time
I have no phlegm (mucus) inmy chest at all
My chest is full of phlegm(mucus)
My chest feels very tightMy chest does not feel tightat all
When I walk up a hill or oneflight of stairs I am notbreathless
When I walk up a hill or oneflight of stairs I am verybreathless
I am not limited doing anyactivities at home
I am very limited doingactivities at home
I am confident leaving myhome despite my lungcondition
I am not at all confidentleaving my home because ofmy lung condition
I sleep soundlyI don't sleep soundly becauseof my lung condition
I have lots of energy I have no energy at all
Name: Today's Date:
.
How is your COPD? Take the COPD Assessment Test (CAT)This questionnaire will help you and your healthcare professional measure the impact COPD (Chronic Obstructive Pulmonary Disease) is havingon your wellbeing and daily life. Your answers and test score, can be used by you and your healthcare professional to help improve themanagement of your COPD and get the greatest benefit from treatment.
SCORE
COPD Assessment Test and CAT logo is a trade mark of theGlaxoSmithKline group of companies.©2009 GlaxoSmithKline group of companies. All rights reserved.
http://www.catestonline.org/english/index_Danish.htm
http://www.catestonline.org/english/indexEN.htm
Karl Bang Christensen IRT models

Marginal distribution
floor and ceiling effects (item trykken ∼ 42% lowest category)
Karl Bang Christensen IRT models

GOF
Karl Bang Christensen IRT models

GOF
Karl Bang Christensen IRT models

GOF
Karl Bang Christensen IRT models

GOF
Karl Bang Christensen IRT models

GOF
Observed item-total correlation and 95% reference interval fromsimulated data sets.
Item obs. 95% REF
hoster 0.51 0.42-0.57slim 0.58 0.50-0.63trykken 0.45 0.35-0.50etage 0.50 0.46-0.59hjemme 0.60 0.55-0.67tryg 0.44 0.38-0.53sover 0.47 0.40-0.54energi 0.61 0.55-0.67
Karl Bang Christensen IRT models

Choice of model
All models can be written in the form
P(Xvi = x |θv ). (1)
and can be interpreted using the probabilities
θ 7→ P(Xvi = x |θv = θ) (x = 0, 1, . . . ,m). (2)
(m = 3: four possible ratings)
Karl Bang Christensen IRT models

Choice of model
IRT models can be classified into ’difference models’ and’divide-by-total models’4
P(Xvi = x) =
P(X ≥ x |θ)− P(X ≥ x + 1|θ)
f (x , θ)/∑mi
k=0 f (k, θ)
This has to do with the underlying process, but it can be hard todistinguish between these models empirically. The interpretation interms of the classification probabilities is common to all models.
4Thissen, Steinberg. Psychometrika, vol. 51, 567-577, 1986.4Maydeu-Olivares et al. Applied Psychological Measurement, vol. 18,
245-256, 1994.Karl Bang Christensen IRT models

Nominal Categories Model
Most general polytomous IRT model is the nominal categoriesmodel5 [not discussed in this course].
5Bock, Psychometrika, vol. 37, 29-51, 1972Karl Bang Christensen IRT models

Graded Response Model6
specifically designed for ordinal manifest variables, first discussedby and mainly used in cases where the assumption of ordinal levelsof response options is plausible. The model is a ’difference model’and is defined by the cumulative probabilities
P(Xvi ≥ x |θv = θ) =exp(αi (θ − βix))
1 + exp(αi (θ − βix))
or equivalently
P(Xvi ≥ x |θv = θ) =1
1 + exp(−αi (θ − βix)).
6Samejima, Psychometrika Monograph Supplement, 34, 100-114, 1969Karl Bang Christensen IRT models

Graded Response Model
This yields
P(Xvi = x |θv = θ) =exp(αi (θ − βix))
1 + exp(αi (θ − βix))−
exp(αi (θ − βi ,x+1))
1 + exp(αi (θ − βi ,x+1)).
or equivalently
P(Xvi = x |θv = θ) =1
1 + exp(−αi (θ − βix))− 1
1 + exp(−αi (θ − βi ,x+1)).
Item parameters η = (ηix)i=1,...,I ;x=1,...,mican’t be identified
visually from plots of the classification probabilities.
Karl Bang Christensen IRT models

Graded Response Model
Comparison of item categories in the graded response model is
{0} vs. {1, . . . ,mi},{0, 1} vs. {2, . . . ,mi},...,
{0, . . . ,mi − 1} vs. {mi}.
The shortest formulation of the model is in terms of the log odds
log
(P(Xvi ≥ x |θv = θ)
1− P(Xvi ≥ x |θv = θ)
)= αi (θ − βix).
Karl Bang Christensen IRT models

Generalized Partial Credit Model (GPCM)7
A ’divide-by-total model’ defined as follows
P(Xvi = x |θv = θ) =exp (
∑xl=0 αi (θ − βik))∑mi
k=0 exp(∑k
l=0 αi (θ − βik))
proposed as a generalization of the partial credit model (in itself ageneralization of the Rasch model).
7Muraki, Applied Psychological Measurement, vol. 16, 159-176, 1992Karl Bang Christensen IRT models

Steps
Parametric IRT analysis has four stages
(i) estimate item parameters (α1, β1), . . . , (αI , βI )
(ii) estimate person locations θ1, . . . , θN
(iii) evaluate item/model fit
(iv) evaluate targeting
Implementation SAS macros8
%lirt mml.sas, %lirt ppar.sas, %lirt icc.sas, ..
Joint estimation of α’s, β′s, and θ’s yield inconsistent estimates9
8Olsbjerg, Christensen, LIRT: SAS macros for longitudinal IRT models,Research report 3, Dept. of Biostatistics, Univ., of Copenhagen, 2014
9Neyman, Scott. Econometrika, 1948, 16:1–32.Karl Bang Christensen IRT models

(i) estimate item parameters (α1, β1), . . . , (αk , βk)
Usual approach: assume θ ∼ N(0, σ2) together with localindependence this yields the likelihood function
Lv (α, β, σ2) =
∫ I∏i=1
exp(xviαi (θ − βi ))
1 + exp(αi (θ − βi ))ϕ(θ)dθ
that can be maximized numerically. We get estimates andasymptotic s.e. If the distributional assumption is OK estimatesare unbiased. Implementation SAS macro %lirt mml.sas
Karl Bang Christensen IRT models

(ii) estimate person locations θ1, . . . , θN
This is done using estimates of (α1, β1), . . . , (αI , βI ). For a personwith responses xv = (xv1, . . . , xvI )
L(θv ) =I∏
i=1
exp(xvi α̂i (θv − β̂i ))
1 + exp(α̂i (θv − β̂i ))
easy to maximize. ’Asymptotic’ s.e., biased estimates. Weightedlikelihood10
L(θ) =k∏
i=1
exp(xi α̂i (θ − β̂i ))
1 + exp(α̂i (θ − β̂i ))g(θ)
has smaller bias. Implementation SAS macro %lirt ppar.sas
10Warm. Psychometrika, 1989, 54:427–450.Karl Bang Christensen IRT models