IRT models - publicifsv.sund.ku.dk


Page 1: IRT models - publicifsv.sund.ku.dk

IRT models

Karl Bang Christensen

Dept. of Biostatistics, Univ. of Copenhagen

http://publicifsv.sund.ku.dk/~kach/scaleval_IRT/

Karl Bang Christensen IRT models

Page 2: IRT models - publicifsv.sund.ku.dk

IRT

Item response theory (IRT) is a collection of modeling techniques for the analysis of items, tests, and persons.

An IRT model for dichotomous items specifies that the probability of the response pattern $x_v = (x_{vi})_{i=1,\dots,I}$ is

$$P(x_v) = \int \prod_{i=1}^{I} P_i(\theta)^{x_{vi}} \, (1 - P_i(\theta))^{1 - x_{vi}} \, \varphi(\theta) \, d\theta$$

where $P_i(\theta)$ is the probability of a correct response on item $i$ as a function of the trait $\theta$, and $\varphi$ is the population distribution of $\theta$.

Many (parametric) choices for $P_i$.

Generalized to ordinal responses $X_i$ (partial credit).
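A minimal sketch of how the pattern probability $P(x_v)$ could be evaluated numerically. It assumes a 2PL form for $P_i$, a standard normal population distribution, and a plain trapezoidal rule; the function names and item parameters are hypothetical, not taken from the slides.

```python
import math

def irf(theta, alpha, beta):
    """Assumed 2PL item response function P_i(theta)."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def pattern_prob(x, alphas, betas, n_grid=2001, lim=8.0):
    """Approximate P(x) with the trapezoidal rule; phi is taken to be N(0, 1)."""
    h = 2 * lim / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        theta = -lim + j * h
        lik = 1.0
        for xi, a, b in zip(x, alphas, betas):
            p = irf(theta, a, b)
            lik *= p if xi == 1 else 1.0 - p   # local independence given theta
        dens = math.exp(-0.5 * theta * theta) / math.sqrt(2 * math.pi)
        w = 0.5 if j in (0, n_grid - 1) else 1.0
        total += w * lik * dens * h
    return total

alphas = [1.0, 1.5, 0.8]    # hypothetical item scale parameters
betas = [-1.0, 0.0, 1.0]    # hypothetical item locations
patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
total_prob = sum(pattern_prob(x, alphas, betas) for x in patterns)
```

Summing over all $2^I$ response patterns should give a value numerically indistinguishable from 1, which is a quick sanity check on the quadrature.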

Page 3: IRT models - publicifsv.sund.ku.dk

Example: educational testing

Dichotomous (correct/incorrect) items $i = 1, \dots, I$.

$P_i(\theta) = G_i(\theta) / (1 + G_i(\theta))$

$G_i(\theta) = \exp(\alpha_i(\theta - \beta_i))$ [2PL]

Maximize $L = \prod_v P(x_v)$ to get estimates of $\alpha$ and $\beta$.

Maximize $P_{\hat\alpha,\hat\beta}(x_v \mid \theta)$ over $\theta$ to get the estimate $\hat\theta_v$ of person $v$'s ability.

$G_i(\theta) = \exp(\theta - \beta_i)$ [1PL]: the (number-correct) score $R = \sum_i x_i$ is sufficient for $\theta$.

$P_i(\theta) = \Phi(\alpha_i(\theta - \beta_i))$ [normal ogive]

Do we trust the estimate (measurement) $\hat\theta_v$? Evaluate goodness-of-fit.
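The sufficiency of the number-correct score under the 1PL model can be illustrated numerically: for two response patterns with the same score, the likelihood ratio does not depend on $\theta$. This is only a sketch with hypothetical item difficulties.

```python
import math

def rasch_pattern_lik(x, theta, betas):
    """P(x | theta) under the 1PL model with local independence."""
    lik = 1.0
    for xi, b in zip(x, betas):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        lik *= p if xi == 1 else 1.0 - p
    return lik

betas = [-1.0, 0.0, 0.5, 1.2]   # hypothetical item difficulties
x_a = (1, 1, 0, 0)              # score R = 2
x_b = (0, 0, 1, 1)              # score R = 2, different items correct

# The ratio is the same at every theta: theta enters only through R.
ratios = [rasch_pattern_lik(x_a, t, betas) / rasch_pattern_lik(x_b, t, betas)
          for t in (-2.0, 0.0, 1.5)]
```

Here every ratio equals $\exp(\sum_i x^{b}_i \beta_i - \sum_i x^{a}_i \beta_i) = \exp(2.7)$, whatever value of $\theta$ is used.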

Page 4: IRT models - publicifsv.sund.ku.dk

Example: educational testing


Page 5: IRT models - publicifsv.sund.ku.dk

Example: educational testing


Page 6: IRT models - publicifsv.sund.ku.dk

Dichotomous IRT models: Probabilities for item i

1PL

$$P(X_{vi} = 1 \mid \theta_v = \theta, \beta_i) = \frac{\exp(\theta - \beta_i)}{1 + \exp(\theta - \beta_i)}$$

Person location $\theta$, location $\beta_i$ of item $i$ [1].

2PL

$$P(X_{vi} = 1 \mid \theta_v = \theta, \alpha_i, \beta_i) = \frac{\exp(\alpha_i(\theta - \beta_i))}{1 + \exp(\alpha_i(\theta - \beta_i))}$$

Person location $\theta$, location $\beta_i$ and scale $\alpha_i$ of item $i$ [2].

[1] Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish National Institute for Educational Research.

[2] Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In Lord & Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley. [Three technical reports dated 1957-8, Randolph Air Force Base, TX: USAF School of Aviation Medicine.]
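The two response probabilities translate directly into code. The sketch below uses hypothetical function names and shows that the 2PL reduces to the 1PL when $\alpha_i = 1$, and that a person located exactly at the item location has probability 0.5 of a correct answer.

```python
import math

def p_1pl(theta, beta):
    """1PL (Rasch) probability of a correct response."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

def p_2pl(theta, alpha, beta):
    """2PL probability of a correct response."""
    z = alpha * (theta - beta)
    return math.exp(z) / (1.0 + math.exp(z))

at_item_location = p_1pl(0.3, 0.3)      # theta equal to beta
same_as_rasch = p_2pl(1.0, 1.0, 0.3)    # alpha = 1 gives the 1PL value
```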

Page 7: IRT models - publicifsv.sund.ku.dk

Item response functions - Rasch [1PL] model

The item response function relates person locations to the probability of a correct answer.

$X_{vi} = 1$ indicates a higher level of the construct we are measuring (a correct answer).

Page 8: IRT models - publicifsv.sund.ku.dk

Item and person locations

The IRT model gives us estimates $\hat\beta_i$ of the item locations (the item difficulties) and $\hat\theta_v$ of each person's location.

Page 9: IRT models - publicifsv.sund.ku.dk

Item information functions ...

... the accuracy with which we can estimate person locations.

Any item provides some information about this, but the amount of information depends on targeting.

Rasch [1PL] model

$$I_i(\theta) = P(X_{vi} = 1 \mid \theta_v = \theta) \, (1 - P(X_{vi} = 1 \mid \theta_v = \theta))$$
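A short sketch of the Rasch item information function, with hypothetical values, illustrating what targeting means: the information peaks at 0.25 when the person is located at the item and drops off when the item is far too easy or too hard.

```python
import math

def rasch_info(theta, beta):
    """Item information I_i(theta) = P(1 - P) under the 1PL model."""
    p = 1.0 / (1.0 + math.exp(-(theta - beta)))
    return p * (1.0 - p)

on_target = rasch_info(0.5, 0.5)     # person located at the item
off_target = rasch_info(3.5, 0.5)    # item far too easy for this person
```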

Page 10: IRT models - publicifsv.sund.ku.dk

Item information functions

$$I_i(\theta) = P(X_{vi} = 1 \mid \theta_v = \theta) \, (1 - P(X_{vi} = 1 \mid \theta_v = \theta))$$

Page 11: IRT models - publicifsv.sund.ku.dk

More items give us more information

Example: probabilities for items $i = 1, \dots, 5$

$$P(X_{vi} = 1 \mid \theta_v = \theta, \beta_i) = \frac{\exp(\theta - \beta_i)}{1 + \exp(\theta - \beta_i)}$$

with parameters $\beta_1, \dots, \beta_5$.

Page 12: IRT models - publicifsv.sund.ku.dk

Test information function

The test information function is given by

$$I(\theta) = \sum_i I_i(\theta)$$

Page 13: IRT models - publicifsv.sund.ku.dk

Precision

We use IRT models because we want to estimate the person location $\theta$, that is, to measure $\theta$, but how much measurement error is there? The variance of the estimate $\hat\theta$ is

$$V(\hat\theta) = \frac{1}{I(\hat\theta)}$$

and the standard error of measurement

$$SEM(\hat\theta) = \sqrt{\frac{1}{I(\hat\theta)}}$$

is expressed in the same units as the measurement itself. It can be used to form a confidence interval around the estimate.

Page 14: IRT models - publicifsv.sund.ku.dk

Standard Error of Measurement


Page 15: IRT models - publicifsv.sund.ku.dk

1PL (Rasch) model and 2PL model

The 1PL model uses $(\theta - \beta_i)$ and the 2PL model uses $\alpha_i(\theta - \beta_i)$ in the model.

The 2PL model is not homogeneous: the ordering of 2PL items is not the same for all persons.
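The lack of a common item ordering in the 2PL model can be seen with two hypothetical items whose response functions cross: one item is the easier of the two for persons below the crossing point and the harder one above it.

```python
import math

def p_2pl(theta, alpha, beta):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

item_a = (0.5, 0.0)   # (alpha, beta): weakly discriminating, hypothetical
item_b = (2.0, 0.5)   # strongly discriminating, hypothetical

# The curves cross where alpha_a*(theta - beta_a) = alpha_b*(theta - beta_b).
theta_cross = (item_b[0] * item_b[1] - item_a[0] * item_a[1]) / (item_b[0] - item_a[0])

low, high = theta_cross - 1.0, theta_cross + 1.0
a_easier_low = p_2pl(low, *item_a) > p_2pl(low, *item_b)
b_easier_high = p_2pl(high, *item_b) > p_2pl(high, *item_a)
```

Under the 1PL model all $\alpha_i$ are equal, the curves never cross, and the ordering of items by difficulty is the same for every person.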

Page 16: IRT models - publicifsv.sund.ku.dk

Item response functions, item and person locations [2PL]

The 2PL model gives us estimates $(\hat\alpha_i, \hat\beta_i)$ of the item scale parameters and the item location parameters, and also of each person's location $\hat\theta_v$.

2PL models have crossing item response functions. The scale parameters $\alpha_i$ can be interpreted as discrimination parameters.

Page 17: IRT models - publicifsv.sund.ku.dk

Item information functions [2PL] model

Formula

$$I_i(\theta) = \alpha_i^2 \, P_i(X_i = 1 \mid \theta) \, (1 - P_i(X_i = 1 \mid \theta))$$
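A small sketch of the 2PL item information, with hypothetical parameter values. The factor $\alpha_i^2$ means that a highly discriminating item is very informative near its location ($\alpha_i^2 / 4$ at $\theta = \beta_i$) but less informative far from it.

```python
import math

def info_2pl(theta, alpha, beta):
    """2PL item information alpha^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))
    return alpha ** 2 * p * (1.0 - p)

peak_steep = info_2pl(0.0, 2.0, 0.0)   # alpha = 2 at theta = beta: 2^2 / 4
peak_flat = info_2pl(0.0, 0.5, 0.0)    # alpha = 0.5 at theta = beta: 0.5^2 / 4
far_steep = info_2pl(3.0, 2.0, 0.0)    # steep item, person far from the item
far_flat = info_2pl(3.0, 0.5, 0.0)     # flat item, same distance
```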

Page 18: IRT models - publicifsv.sund.ku.dk

Item and test information functions [2PL] model


Page 19: IRT models - publicifsv.sund.ku.dk

Evaluating goodness-of-fit of individual items

Not straightforward because $\theta$ is latent.

1PL (Rasch): sufficiency ⇒ predictions can be compared directly with observed data for each score group.

All other IRT models: predictions cannot be compared directly with observed data. Instead:

estimate $\theta$

sort examinees by their $\theta$ estimates, form subgroups

calculate the proportion in each subgroup who answered item $i$ correctly

compare the 'observed' proportions with those predicted by the model ('$\chi^2$-like statistic' and/or graphically)
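The grouping procedure above can be sketched as follows. Everything here is illustrative: the person estimates and responses are simulated stand-ins, the item parameters are hypothetical, and the statistic is one common Pearson-type choice rather than the specific one used in the course.

```python
import math
import random

def p_2pl(theta, alpha, beta):
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def item_fit(theta_hat, x_item, alpha, beta, n_groups=5):
    """Observed vs. expected proportions correct in groups sorted by theta_hat."""
    order = sorted(range(len(theta_hat)), key=lambda v: theta_hat[v])
    size = len(order) // n_groups
    rows, chi2 = [], 0.0
    for g in range(n_groups):
        lo = g * size
        hi = (g + 1) * size if g < n_groups - 1 else len(order)
        idx = order[lo:hi]
        n = len(idx)
        observed = sum(x_item[v] for v in idx) / n
        expected = sum(p_2pl(theta_hat[v], alpha, beta) for v in idx) / n
        chi2 += n * (observed - expected) ** 2 / (expected * (1.0 - expected))
        rows.append((n, observed, expected))
    return rows, chi2

random.seed(1)
theta_hat = [random.gauss(0.0, 1.0) for _ in range(500)]   # stand-in for estimates
x_item = [1 if random.random() < p_2pl(t, 1.2, 0.3) else 0 for t in theta_hat]
rows, chi2 = item_fit(theta_hat, x_item, alpha=1.2, beta=0.3)
```

Each row holds the group size, the observed proportion, and the model-predicted proportion, which can also be plotted against each other.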

Page 20: IRT models - publicifsv.sund.ku.dk

Evaluating goodness-of-fit of individual items (ii)

Rasch model

             R_i = 0    R_i = 1    ...
x_i = 0
x_i = 1

Birnbaum model

             θ̂ ∈ A_0    θ̂ ∈ A_1    ...
x_i = 0
x_i = 1

Rest score $R_i = \sum_{k \neq i} x_k$. $(-\infty, \infty) = \bigcup_k A_k$ (equal size).

Page 21: IRT models - publicifsv.sund.ku.dk

Evaluating goodness-of-fit of individual items (iii)

The item by rest score table can be used to evaluate fit in the Birnbaum model using a recursive algorithm [3].

This can also be generalized to ordinal items.

[3] Orlando, Thissen, Applied Psychological Measurement, 24, 50-64, 2000.
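The recursion usually used for this purpose is the one attributed to Lord and Wingersky: it builds up the distribution of the sum score, given $\theta$, one item at a time. The sketch below assumes that this is the algorithm meant and shows only the core step for dichotomous items; the function name is hypothetical.

```python
def score_dist(probs):
    """Distribution of the sum score for independent items.

    probs[i] is P(X_i = 1 | theta); the result has len(probs) + 1 entries.
    """
    dist = [1.0]                          # with no items the score is 0
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for r, pr in enumerate(dist):
            new[r] += pr * (1.0 - p)      # item answered incorrectly
            new[r + 1] += pr * p          # item answered correctly
        dist = new
    return dist

two_items = score_dist([0.5, 0.5])
three_items = score_dist([0.2, 0.6, 0.9])
```

Applying it to all items except item $i$ gives the model-implied rest score distribution at each value of $\theta$, which can then be integrated over the population distribution to obtain expected counts for the item by rest score table.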

Page 22: IRT models - publicifsv.sund.ku.dk

Simulation

Data set $(X_{vi})$ with observed responses to items $i = 1, \dots, I$ from persons $v = 1, \dots, N$. Missing responses are not uncommon. Compute the observed test statistic $T^{(obs)}$.

estimate $\alpha$ and $\beta$

get estimates $(\hat\theta_v)_{v=1,\dots,N}$

simulate $N$ values of the latent variable from the empirical distribution of $(\hat\theta_v)_{v=1,\dots,N}$

simulate a data matrix $(X^{(l)}_{vi})$ from $P_{\hat\alpha_i, \hat\beta_i}(X_{vi} = \cdot \mid \hat\theta_v)$

compute the test statistic $T^{(l)}$ in the $l$'th simulated data set

compare $T^{(obs)}$ to the empirical distribution of $(T^{(l)})_{l=1,2,\dots}$
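The simulation steps can be sketched in a few lines. Estimation is skipped here: the item parameters and person estimates are hypothetical stand-ins, the "observed" data are themselves simulated, and the item-total correlation is used as one possible choice of test statistic $T$.

```python
import math
import random

def p_2pl(theta, alpha, beta):
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

def simulate(thetas, alphas, betas, rng):
    """One simulated data matrix: rows are persons, columns are items."""
    return [[1 if rng.random() < p_2pl(t, a, b) else 0
             for a, b in zip(alphas, betas)] for t in thetas]

def item_total_corr(data, i):
    """Pearson correlation between item i and the total score."""
    xs = [row[i] for row in data]
    ts = [sum(row) for row in data]
    n = len(data)
    mx, mt = sum(xs) / n, sum(ts) / n
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    sxx = sum((x - mx) ** 2 for x in xs)
    stt = sum((t - mt) ** 2 for t in ts)
    return sxt / math.sqrt(sxx * stt)

rng = random.Random(2024)
alphas, betas = [1.0, 1.3, 0.8, 1.1], [-0.5, 0.0, 0.4, 1.0]  # stand-ins for estimates
theta_hat = [rng.gauss(0.0, 1.0) for _ in range(300)]         # stand-ins for estimates
observed = simulate(theta_hat, alphas, betas, rng)            # plays the role of real data
t_obs = item_total_corr(observed, 0)

t_sim = []
for _ in range(200):
    draws = [rng.choice(theta_hat) for _ in theta_hat]        # empirical distribution
    t_sim.append(item_total_corr(simulate(draws, alphas, betas, rng), 0))
t_sim.sort()
ref_low, ref_high = t_sim[4], t_sim[194]   # approx. 2.5% and 97.5% points of 200 draws
```

An observed value outside the reference interval `(ref_low, ref_high)` would point to misfit for that item.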

Page 23: IRT models - publicifsv.sund.ku.dk

Simulation

Data set $(X_{vi})$ with observed responses to items $i = 1, \dots, I$ from persons $v = 1, \dots, N$. Missing responses are not uncommon. Compute the observed test statistic $T^{(obs)}$.

compute the total score $R_v = \sum_i X_{vi}$ for each person

collapse persons into score groups

plot item means

compare to simulated plots
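A sketch of the score-group computation for one item, using a tiny made-up data set. The resulting means are what would be plotted against the total score and compared with the same plot from simulated data sets.

```python
def item_means_by_score(data, i):
    """Mean of item i within each total-score group."""
    groups = {}
    for row in data:
        groups.setdefault(sum(row), []).append(row[i])
    return {r: sum(v) / len(v) for r, v in sorted(groups.items())}

# Made-up responses of six persons to three dichotomous items.
data = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
means = item_means_by_score(data, 0)
```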

Page 24: IRT models - publicifsv.sund.ku.dk

PRO: Data example

How is your COPD? Take the COPD Assessment Test (CAT)

This questionnaire will help you and your healthcare professional measure the impact COPD (Chronic Obstructive Pulmonary Disease) is having on your wellbeing and daily life. Your answers, and test score, can be used by you and your healthcare professional to help improve the management of your COPD and get the greatest benefit from treatment.

Example: I am very happy [rating scale] I am sad

I never cough [rating scale] I cough all the time

I have no phlegm (mucus) in my chest at all [rating scale] My chest is full of phlegm (mucus)

My chest does not feel tight at all [rating scale] My chest feels very tight

When I walk up a hill or one flight of stairs I am not breathless [rating scale] When I walk up a hill or one flight of stairs I am very breathless

I am not limited doing any activities at home [rating scale] I am very limited doing activities at home

I am confident leaving my home despite my lung condition [rating scale] I am not at all confident leaving my home because of my lung condition

I sleep soundly [rating scale] I don't sleep soundly because of my lung condition

I have lots of energy [rating scale] I have no energy at all

COPD Assessment Test and CAT logo is a trade mark of the GlaxoSmithKline group of companies. ©2009 GlaxoSmithKline group of companies. All rights reserved.

http://www.catestonline.org/english/index_Danish.htm

http://www.catestonline.org/english/indexEN.htm

Page 25: IRT models - publicifsv.sund.ku.dk

Marginal distribution

Floor and ceiling effects (item trykken [chest tightness]: ∼ 42% in the lowest category)

Page 26: IRT models - publicifsv.sund.ku.dk

GOF


Page 27: IRT models - publicifsv.sund.ku.dk

GOF


Page 28: IRT models - publicifsv.sund.ku.dk

GOF


Page 29: IRT models - publicifsv.sund.ku.dk

GOF


Page 30: IRT models - publicifsv.sund.ku.dk

GOF

Observed item-total correlation and 95% reference interval from simulated data sets.

Item                        obs.    95% REF
hoster (cough)              0.51    0.42-0.57
slim (phlegm)               0.58    0.50-0.63
trykken (chest tightness)   0.45    0.35-0.50
etage (stairs)              0.50    0.46-0.59
hjemme (at home)            0.60    0.55-0.67
tryg (confident)            0.44    0.38-0.53
sover (sleep)               0.47    0.40-0.54
energi (energy)             0.61    0.55-0.67

Page 31: IRT models - publicifsv.sund.ku.dk

Choice of model

All models can be written in the form

$$P(X_{vi} = x \mid \theta_v) \qquad (1)$$

and can be interpreted using the probabilities

$$\theta \mapsto P(X_{vi} = x \mid \theta_v = \theta) \quad (x = 0, 1, \dots, m) \qquad (2)$$

($m = 3$: four possible ratings)

Page 32: IRT models - publicifsv.sund.ku.dk

Choice of model

IRT models can be classified into 'difference models' and 'divide-by-total models' [4]:

$$P(X_{vi} = x \mid \theta) = \begin{cases} P(X \ge x \mid \theta) - P(X \ge x + 1 \mid \theta) & \text{(difference models)} \\ f(x, \theta) \Big/ \sum_{k=0}^{m_i} f(k, \theta) & \text{(divide-by-total models)} \end{cases}$$

This has to do with the underlying process, but it can be hard to distinguish between these models empirically. The interpretation in terms of the classification probabilities is common to all models.

[4] Thissen, Steinberg. Psychometrika, vol. 51, 567-577, 1986. Maydeu-Olivares et al. Applied Psychological Measurement, vol. 18, 245-256, 1994.

Page 33: IRT models - publicifsv.sund.ku.dk

Nominal Categories Model

The most general polytomous IRT model is the nominal categories model [5] [not discussed in this course].

[5] Bock, Psychometrika, vol. 37, 29-51, 1972.

Page 34: IRT models - publicifsv.sund.ku.dk

Graded Response Model [6]

Specifically designed for ordinal manifest variables, first discussed by Samejima [6] and mainly used in cases where the assumption of ordinal levels of response options is plausible. The model is a 'difference model' and is defined by the cumulative probabilities

$$P(X_{vi} \ge x \mid \theta_v = \theta) = \frac{\exp(\alpha_i(\theta - \beta_{ix}))}{1 + \exp(\alpha_i(\theta - \beta_{ix}))}$$

or equivalently

$$P(X_{vi} \ge x \mid \theta_v = \theta) = \frac{1}{1 + \exp(-\alpha_i(\theta - \beta_{ix}))}.$$

[6] Samejima, Psychometrika Monograph Supplement, 34, 100-114, 1969.

Page 35: IRT models - publicifsv.sund.ku.dk

Graded Response Model

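This yields

$$P(X_{vi} = x \mid \theta_v = \theta) = \frac{\exp(\alpha_i(\theta - \beta_{ix}))}{1 + \exp(\alpha_i(\theta - \beta_{ix}))} - \frac{\exp(\alpha_i(\theta - \beta_{i,x+1}))}{1 + \exp(\alpha_i(\theta - \beta_{i,x+1}))}$$

or equivalently

$$P(X_{vi} = x \mid \theta_v = \theta) = \frac{1}{1 + \exp(-\alpha_i(\theta - \beta_{ix}))} - \frac{1}{1 + \exp(-\alpha_i(\theta - \beta_{i,x+1}))},$$

with the conventions $P(X_{vi} \ge 0 \mid \theta) = 1$ and $P(X_{vi} \ge m_i + 1 \mid \theta) = 0$ for the lowest and highest categories.

The item parameters $\eta = (\eta_{ix})_{i=1,\dots,I;\, x=1,\dots,m_i}$ cannot be identified visually from plots of the classification probabilities.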
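A minimal sketch of the graded response model category probabilities, computed as differences of cumulative probabilities. The function name and the parameter values are hypothetical; the thresholds are assumed to be in increasing order.

```python
import math

def grm_probs(theta, alpha, thresholds):
    """Category probabilities P(X = x | theta), x = 0..m, for ordered thresholds."""
    cum = [1.0]                                   # P(X >= 0) = 1
    cum += [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in thresholds]
    cum.append(0.0)                               # P(X >= m + 1) = 0
    return [cum[x] - cum[x + 1] for x in range(len(thresholds) + 1)]

# Four response categories (m = 3) with hypothetical thresholds.
probs = grm_probs(theta=0.0, alpha=1.5, thresholds=[-1.0, 0.0, 1.2])
```

With $\theta$ equal to the second threshold, the two lowest categories together have probability 0.5, which is how the thresholds are read off the cumulative curves rather than the category curves.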

Page 36: IRT models - publicifsv.sund.ku.dk

Graded Response Model

The comparison of item categories in the graded response model is

{0} vs. {1, ..., m_i},
{0, 1} vs. {2, ..., m_i},
...,
{0, ..., m_i − 1} vs. {m_i}.

The shortest formulation of the model is in terms of the log odds

$$\log\left(\frac{P(X_{vi} \ge x \mid \theta_v = \theta)}{1 - P(X_{vi} \ge x \mid \theta_v = \theta)}\right) = \alpha_i(\theta - \beta_{ix}).$$

Page 37: IRT models - publicifsv.sund.ku.dk

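Generalized Partial Credit Model (GPCM) [7]

A 'divide-by-total model' defined as follows

$$P(X_{vi} = x \mid \theta_v = \theta) = \frac{\exp\left(\sum_{l=0}^{x} \alpha_i(\theta - \beta_{il})\right)}{\sum_{k=0}^{m_i} \exp\left(\sum_{l=0}^{k} \alpha_i(\theta - \beta_{il})\right)}$$

where the $l = 0$ term is common to the numerator and to every term of the denominator and therefore cancels (so $\beta_{i0}$ is arbitrary). The model was proposed as a generalization of the partial credit model (in itself a generalization of the Rasch model).

[7] Muraki, Applied Psychological Measurement, vol. 16, 159-176, 1992.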
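A sketch of the GPCM category probabilities in divide-by-total form, with the $l = 0$ term set to zero. Names and parameter values are hypothetical; subtracting the largest exponent before exponentiating is only a numerical safeguard.

```python
import math

def gpcm_probs(theta, alpha, betas):
    """P(X = x | theta), x = 0..m, with betas = [beta_1, ..., beta_m]."""
    exponents = [0.0]                      # x = 0: empty sum
    for b in betas:
        exponents.append(exponents[-1] + alpha * (theta - b))
    top = max(exponents)                   # subtract max for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

probs = gpcm_probs(theta=0.4, alpha=1.2, betas=[-0.8, 0.1, 0.9])
# Log odds of adjacent categories x vs. x - 1 equal alpha * (theta - beta_x).
adjacent_log_odds = math.log(probs[2] / probs[1])
```

With a single threshold the model reduces to the 2PL, and with $\alpha_i = 1$ to the partial credit model.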

Page 38: IRT models - publicifsv.sund.ku.dk

Steps

Parametric IRT analysis has four stages

(i) estimate item parameters (α1, β1), . . . , (αI , βI )

(ii) estimate person locations θ1, . . . , θN

(iii) evaluate item/model fit

(iv) evaluate targeting

Implementation: SAS macros [8]

%lirt_mml.sas, %lirt_ppar.sas, %lirt_icc.sas, ...

Joint estimation of the $\alpha$'s, $\beta$'s, and $\theta$'s yields inconsistent estimates [9].

[8] Olsbjerg, Christensen, LIRT: SAS macros for longitudinal IRT models, Research report 3, Dept. of Biostatistics, Univ. of Copenhagen, 2014.

[9] Neyman, Scott. Econometrica, 1948, 16:1–32.

Page 39: IRT models - publicifsv.sund.ku.dk

(i) estimate item parameters (α1, β1), . . . , (αI , βI )

Usual approach: assume $\theta \sim N(0, \sigma^2)$. Together with local independence this yields the likelihood function

$$L_v(\alpha, \beta, \sigma^2) = \int \prod_{i=1}^{I} \frac{\exp(x_{vi}\alpha_i(\theta - \beta_i))}{1 + \exp(\alpha_i(\theta - \beta_i))} \, \varphi(\theta) \, d\theta$$

for person $v$, where $\varphi$ is the $N(0, \sigma^2)$ density. The product of the $L_v$ over persons can be maximized numerically. We get estimates and asymptotic s.e. If the distributional assumption is OK, the estimates are unbiased. Implementation: SAS macro %lirt_mml.sas
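A sketch of the marginal log-likelihood that would be handed to a numerical optimizer. It is not the SAS macro: the integral is approximated with a simple trapezoidal rule, and the data and candidate parameter values are made up.

```python
import math

def marginal_loglik(data, alphas, betas, sigma=1.0, n_grid=401, lim=6.0):
    """Sum over persons of log L_v, with theta ~ N(0, sigma^2) integrated out
    by the trapezoidal rule on [-lim * sigma, lim * sigma]."""
    h = 2 * lim * sigma / (n_grid - 1)
    nodes = [-lim * sigma + j * h for j in range(n_grid)]
    weights = [h * math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
               for t in nodes]
    weights[0] *= 0.5
    weights[-1] *= 0.5
    loglik = 0.0
    for x in data:
        l_v = 0.0
        for t, w in zip(nodes, weights):
            term = w
            for xi, a, b in zip(x, alphas, betas):
                z = a * (t - b)
                term *= math.exp(xi * z) / (1.0 + math.exp(z))
            l_v += term
        loglik += math.log(l_v)
    return loglik

# Made-up data: item 1 is answered correctly far more often than item 2.
data = [(1, 0)] * 6 + [(1, 1)] * 2 + [(0, 0)] * 2
ll_plausible = marginal_loglik(data, alphas=[1.0, 1.0], betas=[-1.0, 1.0])
ll_swapped = marginal_loglik(data, alphas=[1.0, 1.0], betas=[1.0, -1.0])
```

The parameter set that treats item 1 as the easier item has the higher log-likelihood, as it should; a full MML fit would maximize this function over all $\alpha_i$, $\beta_i$ and $\sigma^2$.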

Page 40: IRT models - publicifsv.sund.ku.dk

(ii) estimate person locations θ1, . . . , θN

This is done using the estimates of $(\alpha_1, \beta_1), \dots, (\alpha_I, \beta_I)$. For a person with responses $x_v = (x_{v1}, \dots, x_{vI})$ the likelihood

$$L(\theta_v) = \prod_{i=1}^{I} \frac{\exp(x_{vi}\hat\alpha_i(\theta_v - \hat\beta_i))}{1 + \exp(\hat\alpha_i(\theta_v - \hat\beta_i))}$$

is easy to maximize, but gives only 'asymptotic' s.e. and biased estimates. The weighted likelihood [10]

$$L(\theta) = g(\theta) \prod_{i=1}^{I} \frac{\exp(x_i\hat\alpha_i(\theta - \hat\beta_i))}{1 + \exp(\hat\alpha_i(\theta - \hat\beta_i))}$$

has smaller bias. Implementation: SAS macro %lirt_ppar.sas

[10] Warm. Psychometrika, 1989, 54:427–450.
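A sketch of both person estimators for the 2PL model, not the SAS macro. It assumes the weight $g(\theta) = \sqrt{I(\theta)}$, so the weighted likelihood equation becomes score $+ J / (2I) = 0$ with $J = \sum_i \alpha_i^3 P_i (1 - P_i)(1 - 2P_i)$, and it solves the equation by bisection on a fixed interval. Item parameters and function names are hypothetical.

```python
import math

def _terms(theta, x, alphas, betas):
    """Score, test information I and the correction term J at theta (2PL)."""
    score = info = j = 0.0
    for xi, a, b in zip(x, alphas, betas):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        score += a * (xi - p)
        info += a * a * p * (1.0 - p)
        j += a ** 3 * p * (1.0 - p) * (1.0 - 2.0 * p)
    return score, info, j

def estimate_theta(x, alphas, betas, weighted=False, lo=-10.0, hi=10.0):
    """Solve the (weighted) likelihood equation by bisection on [lo, hi]."""
    def f(theta):
        score, info, j = _terms(theta, x, alphas, betas)
        return score + (j / (2.0 * info) if weighted else 0.0)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alphas = [1.0, 1.0, 1.0, 1.0]          # hypothetical item estimates
betas = [-1.0, -0.5, 0.5, 1.0]
ml = estimate_theta((1, 1, 0, 0), alphas, betas)
wle = estimate_theta((1, 1, 0, 0), alphas, betas, weighted=True)
wle_perfect = estimate_theta((1, 1, 1, 1), alphas, betas, weighted=True)
ml_perfect = estimate_theta((1, 1, 1, 1), alphas, betas)   # runs into the upper bound
```

For a perfect score the ordinary likelihood has no finite maximum, so the ML search simply ends at the upper bound of the interval, whereas the weighted estimate stays finite.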