Karl Bang Christensen Dept. of Biostatistics Univ. of ...


Rasch models

Karl Bang Christensen

Dept. of Biostatistics

Univ. of Copenhagen

http://publicifsv.sund.ku.dk/~kach/scaleval_IRT

Karl Bang Christensen Rasch models


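IRT model for dichotomous items

An IRT model for dichotomous items specifies that the probability of the response pattern $x_v = (x_{vi})_{i=1,\dots,I}$ is

$$P(x_v) = \int \prod_{i=1}^{I} \left( P_i(\theta)^{x_{vi}} \, (1 - P_i(\theta))^{1 - x_{vi}} \right) \varphi(\theta) \, d\theta$$

where $P_i(\theta)$ is the probability of a correct response on item $i$ as a function of the trait $\theta$, and $\varphi$ is the population distribution of $\theta$.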
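The integral can be approximated numerically. The following is a minimal Python sketch, not part of the course material: it assumes the Rasch form of $P_i(\theta)$ introduced on the next slide, a normal population distribution, a simple midpoint rule, and hypothetical item locations `betas`. It checks that the probabilities of all $2^I$ response patterns sum to one.

```python
import math
from itertools import product

def rasch_prob(theta, beta):
    """P(X = 1 | theta) under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def normal_pdf(theta, sd=1.0):
    """Density of a normal distribution with mean zero and standard deviation sd."""
    return math.exp(-0.5 * (theta / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def pattern_prob(x, betas, sd=1.0, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint-rule approximation of the integral defining P(x_v)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        lik = 1.0
        for xi, b in zip(x, betas):
            p = rasch_prob(theta, b)
            lik *= p if xi == 1 else 1.0 - p
        total += lik * normal_pdf(theta, sd) * h
    return total

betas = [-1.0, 0.0, 1.5]  # hypothetical item locations, for illustration only
probs = {x: pattern_prob(x, betas) for x in product([0, 1], repeat=len(betas))}
total_prob = sum(probs.values())  # should be numerically close to 1
```

The integration limits and number of steps are arbitrary choices; production software typically uses Gauss-Hermite quadrature instead.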

Last week

$P_i(\theta)$ 2PL, $\varphi(\theta)$ standard normal distribution (mean zero, SD 1).

This week

$P_i(\theta)$ 1PL; no distributional assumptions are needed.


Rasch model: Probabilities for item i

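Person location $\theta$; location $\beta_i$ and scale $\alpha_i$ of item $i$. The Rasch model has no item-specific scale ($\alpha_i = 1$):

$$P(X_{vi} = 1 \mid \theta_v = \theta) = \frac{\exp(\theta - \beta_i)}{1 + \exp(\theta - \beta_i)}$$

The 'trick'

For two items $i$ and $i'$, the conditional probability

$$P(X_{vi} = x_i, X_{vi'} = x_{i'} \mid R_v = x_i + x_{i'}, \theta),$$

where $R_v = X_{vi} + X_{vi'}$, is independent of $\theta$.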
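As a worked case of the 'trick', take $R_v = 1$. This is a sketch that assumes local independence of the two items given $\theta$ (stated on the next slide) and uses the shorthand $K_i = 1 + \exp(\theta - \beta_i)$, which is introduced here for brevity:

```latex
\begin{align*}
P(X_{vi}=1, X_{vi'}=0 \mid R_v = 1, \theta)
 &= \frac{P(X_{vi}=1, X_{vi'}=0 \mid \theta)}
         {P(X_{vi}=1, X_{vi'}=0 \mid \theta) + P(X_{vi}=0, X_{vi'}=1 \mid \theta)} \\
 &= \frac{\exp(\theta-\beta_i)/(K_i K_{i'})}
         {\exp(\theta-\beta_i)/(K_i K_{i'}) + \exp(\theta-\beta_{i'})/(K_i K_{i'})} \\
 &= \frac{\exp(-\beta_i)}{\exp(-\beta_i) + \exp(-\beta_{i'})}
\end{align*}
```

The factor $\exp(\theta)$ cancels, so only the item parameters remain.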


Model framework, Assumptions

Unidimensional latent variable $\theta_v$ responsible for all correlation between the observed items $X = (X_i)_{i \in I}$ and the covariates $Y = (Y_1, Y_2, \dots)$

(i) $\Theta$ unidimensional

(ii) Monotone relationship between $\Theta$ and $X_i$

(iii) No differential item functioning (DIF): $X_i \perp Y_j \mid \Theta$

(iv) Local independence: $X_i \perp X_j \mid \Theta$

Rasch model (=1PL) further requirements

sufficiency, specific objectivity, exchangeability (invariance)


Invariance

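$$\begin{pmatrix} P(X_1 = 1 \mid \theta) & P(X_1 = 0 \mid \theta) \\ P(X_2 = 1 \mid \theta) & P(X_2 = 0 \mid \theta) \end{pmatrix} = \begin{pmatrix} \dfrac{\exp(\theta - \beta_1)}{1 + \exp(\theta - \beta_1)} & \dfrac{1}{1 + \exp(\theta - \beta_1)} \\[2ex] \dfrac{\exp(\theta - \beta_2)}{1 + \exp(\theta - \beta_2)} & \dfrac{1}{1 + \exp(\theta - \beta_2)} \end{pmatrix}$$

Odds

$$\begin{pmatrix} \dfrac{P(X_1 = 1 \mid \theta)}{P(X_1 = 0 \mid \theta)} \\[2ex] \dfrac{P(X_2 = 1 \mid \theta)}{P(X_2 = 0 \mid \theta)} \end{pmatrix} = \begin{pmatrix} \exp(\theta - \beta_1) \\ \exp(\theta - \beta_2) \end{pmatrix}$$

Comparison: the odds ratio $OR = \exp(\theta - \beta_1) / \exp(\theta - \beta_2) = \exp(\beta_2 - \beta_1)$ is independent of $\theta$.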
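A quick numerical check of this invariance, as a Python sketch with hypothetical item locations `beta1` and `beta2` (not estimates from any data set): the odds ratio between the two items is evaluated at three person locations and compared with $\exp(\beta_2 - \beta_1)$.

```python
import math

def rasch_odds(theta, beta):
    """Odds P(X = 1 | theta) / P(X = 0 | theta) under the Rasch model."""
    p = math.exp(theta - beta) / (1.0 + math.exp(theta - beta))
    return p / (1.0 - p)

beta1, beta2 = 0.5, 1.7  # hypothetical item locations
thetas = (-2.0, 0.0, 3.0)

# Odds ratio item 1 vs item 2, evaluated at each person location.
odds_ratios = [rasch_odds(t, beta1) / rasch_odds(t, beta2) for t in thetas]
expected_or = math.exp(beta2 - beta1)
```

All entries of `odds_ratios` agree with `expected_or` up to rounding error, whatever value of $\theta$ is used.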


Sufficiency

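Local independence gives us

$$
\begin{aligned}
P(X = x \mid \theta) &= \prod_{i=1}^{I} \frac{\exp(x_i(\theta - \beta_i))}{1 + \exp(\theta - \beta_i)}
 = \frac{\exp\left(\sum_{i=1}^{I} x_i(\theta - \beta_i)\right)}{\prod_{i=1}^{I} (1 + \exp(\theta - \beta_i))} \\
&= \frac{\exp\left(t\theta - \sum_{i=1}^{I} x_i \beta_i\right)}{\prod_{i=1}^{I} (1 + \exp(\theta - \beta_i))}
 = \frac{\exp\left(t\theta - \sum_{i=1}^{I} x_i \beta_i\right)}{K(\theta, \beta)}
\end{aligned}
$$

where $t = \sum_{i=1}^{I} x_i$ is the sum score and $K(\theta, \beta) = \prod_{i=1}^{I} (1 + \exp(\theta - \beta_i))$.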


Sufficiency

The sum score $t = \sum_i x_i$ is sufficient for $\theta$.

This means that

$$P(X_1 = h_1, \dots, X_I = h_I \mid X_1 + \dots + X_I = t)$$

does not depend on $\theta$.

We can use this to estimate item parameters without making assumptions about the distribution of the latent variable.
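Sufficiency can be checked by brute force for a small test. The Python sketch below uses four hypothetical item locations (illustrative values only), enumerates all response patterns with the same sum score, and computes the conditional probability of one pattern at two very different person locations.

```python
import math
from itertools import product

def pattern_prob(x, theta, betas):
    """P(X = x | theta) under local independence and the Rasch model."""
    prob = 1.0
    for xi, b in zip(x, betas):
        prob *= math.exp(xi * (theta - b)) / (1.0 + math.exp(theta - b))
    return prob

def conditional_prob(x, theta, betas):
    """P(X = x | sum score = sum(x), theta), by enumerating all patterns."""
    t = sum(x)
    same_score = [y for y in product([0, 1], repeat=len(betas)) if sum(y) == t]
    denom = sum(pattern_prob(y, theta, betas) for y in same_score)
    return pattern_prob(x, theta, betas) / denom

betas = [-0.8, 0.2, 1.1, 2.0]  # hypothetical item locations
x = (1, 0, 1, 0)
cond_low = conditional_prob(x, -1.5, betas)   # person with a low location
cond_high = conditional_prob(x, 2.5, betas)   # person with a high location
```

The two conditional probabilities coincide up to rounding error, although the unconditional pattern probabilities differ strongly between the two person locations.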


Estimation

Marginal likelihood

assumes a normal distribution (mean zero, SD $\sigma$) for $\theta$

$$L_M(\eta) = \prod_{v=1}^{N} \int P(X_v = x_v \mid \theta_v = \theta) \, \varphi_\sigma(\theta) \, d\theta \qquad (1)$$

gives consistent estimates if the normal distribution is correct.

Conditional likelihood

no distributional assumptions

$$L_C(\eta) = \prod_{v=1}^{N} P(X_v = x_v \mid R_v = r_v) \qquad (2)$$

gives conditionally consistent estimates¹

¹ Andersen. Journal of the Royal Statistical Society B, 1970, 32:283-301.
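To make the conditional likelihood (2) concrete, here is a minimal Python sketch for the simplest possible case of two Rasch items. It is an illustration under stated assumptions, not the algorithm used by the SAS macros: only persons with $R_v = 1$ are informative, and the counts `n10` and `n01` of the patterns (1,0) and (0,1) are hypothetical. In this case the maximizer of (2) in $\delta = \beta_2 - \beta_1$ has the closed form $\log(n_{10}/n_{01})$, and no distribution for $\theta$ is involved.

```python
import math

def conditional_loglik(delta, n10, n01):
    """Conditional log-likelihood for two Rasch items, delta = beta_2 - beta_1.

    Given R_v = 1: P(X_1 = 1, X_2 = 0) = exp(-beta_1) / (exp(-beta_1) + exp(-beta_2))
                                       = 1 / (1 + exp(-delta)).
    Patterns with R_v = 0 or R_v = 2 have conditional probability one.
    """
    p10 = 1.0 / (1.0 + math.exp(-delta))
    return n10 * math.log(p10) + n01 * math.log(1.0 - p10)

n10, n01 = 60, 25  # hypothetical counts of patterns (1,0) and (0,1)

# Closed-form maximizer of the conditional likelihood.
delta_closed_form = math.log(n10 / n01)

# Crude grid search as an independent check of the closed form.
grid = [k / 1000.0 for k in range(-5000, 5001)]
delta_grid = max(grid, key=lambda d: conditional_loglik(d, n10, n01))
```

With more than two items the conditional probabilities involve elementary symmetric functions and the maximization has to be done iteratively.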


Estimation in SAS

Marginal likelihood

PROC IRT

macro %rasch_mml.sas²

Graphics: item characteristic curves (ICCs), person-item location maps, item and test information functions. Goodness-of-fit plots.

² Christensen, Olsbjerg. Marginal maximum likelihood estimation in polytomous Rasch models using SAS. Pub. Inst. Stat. Univ. Paris, vol. 57, fasc. 1-2, 69-84, 2013.


Estimation in SAS

Conditional likelihood

macro %rasch_cml.sas³

Graphics: item characteristic curves (ICCs), person-item location maps, item and test information functions. Goodness-of-fit plots.

³ Christensen. Conditional maximum likelihood estimation in polytomous Rasch models using SAS. ISRN Computational Mathematics, vol. 2013, Article ID 617475, 8 pages, 2013. http://doi.org/10.1155/2013/617475


ADL six months after breast cancer surgery

17 items (ADL=Activities of Daily Living)

dichotomized (0=no problems, 1=problems)


ADL six months after breast cancer surgery

    %let items=Q6m_1 Q6m_2 Q6m_3 Q6m_4 Q6m_5 Q6m_6 Q6m_7 Q6m_8 Q6m_9 Q6m_10 Q6m_11 Q6m_12 Q6m_13 Q6m_14 Q6m_15 Q6m_16 Q6m_17;
    * read ADL data;
    data sasuser.ADL;
      filename dat url 'http://biostat.ku.dk/~kach/scaleval_IRT/ADL.txt';
      infile dat firstobs=2;
      input id &items;
    run;


Dichotomous Rasch model for ADL items

Item i taking values 0,1:

$$P(X_{vi} = x \mid \theta_v = \theta) = \frac{\exp(x(\theta - \beta_i))}{1 + \exp(\theta - \beta_i)}$$

    proc irt data=sasuser.ADL plots=(icc iic);
      var Q6m_1-Q6m_17;
      model Q6m_1-Q6m_17 / resfunc=rasch;
    run;

PROC IRT uses marginal likelihood (assumes $\theta \sim N(0, \sigma^2)$)

parameters

$\beta_1, \dots, \beta_{17}, \sigma$


Dichotomous Rasch model for ADL items

Item i taking values 0,1:

$$P(X_{vi} = x \mid \theta_v = \theta) = \frac{\exp(x\alpha(\theta - \beta_i))}{1 + \exp(\alpha(\theta - \beta_i))}$$

    proc irt data=sasuser.ADL plots=(icc tic);
      var Q6m_1-Q6m_17;
      model Q6m_1-Q6m_17 / resfunc=onep;
    run;

PROC IRT uses marginal likelihood (assumes $\theta \sim N(0, 1)$)

parameters

$\beta_1, \dots, \beta_{17}, \alpha$
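The parameterization on this slide and the one on the previous slide describe the same model. A short derivation, assuming only the two response functions and latent distributions stated on the slides; the superscripts "R" (Rasch, $\theta \sim N(0, \sigma^2)$) and "1P" (one-parameter, $\theta^{*} \sim N(0, 1)$) are labels introduced here. Writing $\theta = \sigma\theta^{*}$:

```latex
\begin{align*}
\frac{\exp\!\big(x(\theta - \beta_i^{R})\big)}{1 + \exp(\theta - \beta_i^{R})}
 &= \frac{\exp\!\big(x(\sigma\theta^{*} - \beta_i^{R})\big)}{1 + \exp(\sigma\theta^{*} - \beta_i^{R})} \\
 &= \frac{\exp\!\big(x\,\sigma(\theta^{*} - \beta_i^{R}/\sigma)\big)}{1 + \exp\!\big(\sigma(\theta^{*} - \beta_i^{R}/\sigma)\big)}
\end{align*}
```

so the two sets of parameters are related by $\alpha = \sigma$ and $\beta_i^{1P} = \beta_i^{R} / \sigma$.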


Dichotomous Rasch model for ADL items

Item     Parameter    Estimate   s.e.

Q6m_1    Difficulty   3.36981    0.21681
         Slope        1.00000

Q6m_2    Difficulty   1.54327    0.15923
         Slope        1.00000


Dichotomous Rasch model for ADL items

The interpretation of $\beta_i$ is the usual one: it is the person location $\theta$ at which $P = 1/2$.
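As a small numerical illustration in Python (a sketch, using the difficulty estimate 3.36981 reported for the first ADL item in the PROC IRT output): the response probability is exactly one half when the person location equals the item difficulty, and it is small for a person located at zero.

```python
import math

def rasch_prob(theta, beta):
    """P(X = 1 | theta) under the Rasch model."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

beta_q6m_1 = 3.36981  # difficulty estimate for the first ADL item (PROC IRT output)

p_at_beta = rasch_prob(beta_q6m_1, beta_q6m_1)  # person located at the item difficulty
p_at_zero = rasch_prob(0.0, beta_q6m_1)         # person located at theta = 0
```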


Dichotomous Rasch model for ADL items

Item i taking values 0,1:

$$P(X_{vi} = x \mid \theta_v = \theta, \beta_{i1}) = \frac{\exp(x(\theta - \beta_{i1}))}{1 + \exp(\theta - \beta_{i1})}$$

    proc irt data=sasuser.ADL plots=(icc iic);
      var Q6m_1-Q6m_17;
      model Q6m_1-Q6m_17 / resfunc=rasch;
    run;


Rasch model in SAS, CML

Include macros

    %let url=https://raw.githubusercontent.com/KarlBangChristensen/Rasch/master;
    filename r url "&url/rasch_include_all.sas";
    %include r;

Item information

    data in;
      input item_no item_name $ item_text $ max group;
      datalines;
    1 Q6m_1 x 1 1
    2 Q6m_2 x 1 2
    3 Q6m_3 x 1 3
    4 Q6m_4 x 1 4
    5 Q6m_5 x 1 5
    6 Q6m_6 x 1 6
    7 Q6m_7 x 1 7
    8 Q6m_8 x 1 8
    9 Q6m_9 x 1 9
    10 Q6m_10 x 1 10
    11 Q6m_11 x 1 11
    12 Q6m_12 x 1 12
    13 Q6m_13 x 1 13
    14 Q6m_14 x 1 14
    15 Q6m_15 x 1 15
    16 Q6m_16 x 1 16
    17 Q6m_17 x 1 17
    ;
    run;


Rasch model in SAS, CML

call macro

    %rasch_data(data=sasuser.ADL, item_names=in);

    %rasch_CML(data=sasuser.ADL, item_names=in, out=CML);

Takes a long time for big tables (the contingency table has $2^{17}$ = 131,072 cells).


Rasch model in SAS, MML

call macro

    %rasch_data(data=sasuser.ADL, item_names=in);

    %rasch_MML(data=sasuser.ADL, item_names=in, out=MML);

Takes a long time for big data sets.


Does the Rasch model fit data? Overall test of the model⁴

Divide the sample into $G$ groups. Let $L_g$ be the likelihood in group $g$ and $L$ the likelihood in the full sample. Then

$$CLR = -2 \log\left(\frac{L}{\prod_{g=1}^{G} L_g}\right)$$

is asymptotically $\chi^2$ distributed with $(G - 1) \cdot (I - 1)$ degrees of freedom (for dichotomous items). Note that this uses the fact that we do not have to assume anything about the distribution of the latent variable.

⁴ Andersen. Psychometrika, vol. 38, 123-140, 1973.


Does the Rasch model fit data? Overall test of the model

Two data sets: G = 1 (score 0-3), G = 2 (score 4-17)

    data ADL;
      set sasuser.ADL;
      score=sum(of &items);
      nm=nmiss(of &items);
    run;
    * keep complete responses only;
    data ADL;
      set ADL;
      if nm=0;
    run;
    data ADL1;
      set ADL;
      if score in (0,1,2,3);
    run;
    data ADL2;
      set ADL;
      if score > 3;
    run;


Does the Rasch model fit data? Overall test of the model

Fit Rasch model in each data set

    %rasch_data(data=ADL, item_names=in);
    %rasch_cml(data=ADL, item_names=in, out=cml);

    %rasch_data(data=ADL1, item_names=in);
    %rasch_cml(data=ADL1, item_names=in, out=cml1);

    %rasch_data(data=ADL2, item_names=in);
    %rasch_cml(data=ADL2, item_names=in, out=cml2);


Does the Rasch model fit data? Overall test of the model

Compare the fit of the original Rasch model, $L$, with the combined fit in each data set, $L_1 \cdot L_2$

    proc sql;
      select value into :l1 from cml1_logl;
      select value into :l2 from cml2_logl;
      select value into :l from cml_logl;
    quit;
    data lrt;
      lrt=(&l1+&l2-&l);
      * df = (G-1)*(I-1) = (2-1)*(17-1);
      df=16;
      p=1-cdf('chisquared',lrt,df);
    run;
    proc print data=lrt round noobs;
    run;

By comparing log likelihood values:

$\chi^2 = 48.2$ (df = 16)

The model is rejected.
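The degrees of freedom and the p-value behind this conclusion can be reproduced outside SAS. The Python sketch below is an illustration only: it takes the reported statistic 48.2 as given, computes $(G-1)(I-1)$ for $G = 2$ groups and $I = 17$ items, and uses the closed-form upper tail of the $\chi^2$ distribution that is available when the degrees of freedom are even. The function name `chi2_sf_even_df` is introduced here and is not part of any library.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail P(chi2_df > x) for even df.

    For df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term
    for k in range(1, df // 2):     # k = 1, ..., m - 1
        term *= half / k
        total += term
    return math.exp(-half) * total

n_groups, n_items = 2, 17
df = (n_groups - 1) * (n_items - 1)   # degrees of freedom of the CLR test
p_value = chi2_sf_even_df(48.2, df)   # reported test statistic from the slide
```

The resulting p-value is well below 0.001, in line with the rejection of the model.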


Multiple groups analysis⁵

[Path diagram: in each of the two groups, G = 1 and G = 2, the latent variable Θ points to the items X1, X2, X3, ..., X16, X17.]

⁵ Andersen. Psychometrika, vol. 38, 123-140, 1973.


Does the Rasch model fit data? Individual item fit

Residuals

$$R_{vi} = X_{vi} - E_{vi}$$

where $X_{vi}$ is the response of person $v$ to item $i$ and $E_{vi}$ is the expected response of person $v$ to item $i$.

Two issues to discuss

(i) How to calculate the expected values

(ii) How to summarize residuals


The conditional item characteristic curve

The item characteristic curve is the item response function

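$$\theta \mapsto P(X_{vi} = x \mid \theta_v = \theta).$$

In the Rasch model there is a simple expression for a conditional version of this:

$$r \mapsto P(X_{vi} = 1 \mid R_v = r) = \frac{\exp(-\beta_i) \, \gamma_{r-1}(\beta_1, \dots, \beta_{i-1}, \beta_{i+1}, \dots, \beta_I)}{\gamma_r(\beta)}$$

where $\gamma_r$ denotes the elementary symmetric function of order $r$ in the quantities $\exp(-\beta_j)$.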
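The conditional item characteristic curve is easy to evaluate once the elementary symmetric functions $\gamma_r$ are available. The following Python sketch is one possible way to compute it, not the implementation in the SAS macros: it uses the standard summation recursion for $\gamma_0, \dots, \gamma_I$ in the quantities $\exp(-\beta_j)$ and hypothetical item locations. A useful sanity check is that the conditional probabilities, summed over items, equal the score $r$.

```python
import math

def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_n] for the values in eps (summation recursion)."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from high order to low so each value is used once per step.
        for r in range(len(eps), 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def conditional_icc(i, r, betas):
    """P(X_i = 1 | R = r) in the Rasch model; i is a zero-based item index."""
    if r == 0:
        return 0.0
    eps = [math.exp(-b) for b in betas]
    gamma_all = elementary_symmetric(eps)
    gamma_rest = elementary_symmetric(eps[:i] + eps[i + 1:])  # item i removed
    return eps[i] * gamma_rest[r - 1] / gamma_all[r]

betas = [-0.8, 0.2, 1.1, 2.0]  # hypothetical item locations
# Expected total score given R = r, which must equal r itself.
score_sums = [sum(conditional_icc(i, r, betas) for i in range(len(betas)))
              for r in range(len(betas) + 1)]
```

For very long tests more numerically stable recursions are usually preferred; the simple version is enough to show the structure of the formula.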


(i) How to calculate the expected values

Expected response $E_{vi}$:

Ideal situation: known person locations

$$E_{vi} = P(X_{vi} = 1 \mid \Theta_v = \theta_v) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}$$

Less than ideal situation: person locations are estimated

$$E_{vi} = P(X_{vi} = 1 \mid \Theta_v = \hat\theta_v) = \frac{\exp(\hat\theta_v - \hat\beta_i)}{1 + \exp(\hat\theta_v - \hat\beta_i)}$$

Two problems

Person locations are estimated with error

Formula is incorrect

Commercial Rasch model software ignores these problems.


Standard evaluation of individual item fit

Using

$$Z_{iv} = \frac{X_{iv} - E(X_{iv} \mid \Theta_v = \hat\theta_v)}{\sqrt{V(X_{iv} \mid \Theta_v = \hat\theta_v)}}$$

fit statistics like

$$\mathrm{OUTFIT}_i = \frac{1}{N} \sum_{v=1}^{N} Z_{iv}^2$$

with no established null distribution are used. Early Rasch literature claims a $\chi^2$ distribution.⁶

⁶ Wright, Stone (1979). Mesa Press, Chicago, USA.
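The OUTFIT statistic can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the computation performed by any commercial package: items are dichotomous, so $E(X_{iv} \mid \hat\theta_v) = p$ and $V(X_{iv} \mid \hat\theta_v) = p(1 - p)$ with $p$ the Rasch probability, and the responses, person estimates and item estimate below are hypothetical.

```python
import math

def rasch_prob(theta, beta):
    """P(X = 1 | theta) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def outfit(responses, theta_hats, beta_hat):
    """Mean of squared standardized residuals for one dichotomous item."""
    squared = []
    for x, theta in zip(responses, theta_hats):
        p = rasch_prob(theta, beta_hat)           # expected response
        z = (x - p) / math.sqrt(p * (1.0 - p))    # standardized residual
        squared.append(z * z)
    return sum(squared) / len(squared)

responses = [1, 0, 1, 1, 0]                  # hypothetical responses to one item
theta_hats = [0.4, -1.2, 0.9, 2.0, -0.3]     # hypothetical person location estimates
outfit_value = outfit(responses, theta_hats, beta_hat=0.5)
```

When every person is located exactly at the item location, each squared residual equals one, so `outfit([1, 0], [0.5, 0.5], 0.5)` returns 1.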


Wilson-Hilferty cube-root transformation

It has been suggested that the Wilson-Hilferty (1931) cube-root transformation

$$t_i = \left(\mathrm{OUTFIT}_i^{1/3} - 1\right) \frac{3}{V(\mathrm{OUTFIT}_i)} + \frac{V(\mathrm{OUTFIT}_i)}{3}$$

has an approximate $t$ distribution and that

$$\mathrm{FitResid}_i = \frac{f \, (\log(N \cdot \mathrm{OUTFIT}_i) - \log(f))}{\sqrt{V(N \cdot \mathrm{OUTFIT}_i)}}$$

with $f = (Nk - N - k + 1)/k$, has a symmetrical distribution with mean zero and variance one.


Commercial software

WINSTEPS

OUTFIT, t-transformed OUTFIT

INFIT (weighted version of OUTFIT), t-transformed INFIT

RUMM

item $\chi^2$ fit statistic

item ANOVA fit statistic

item $\mathrm{FitResid}_i$


RUMM

Groups respondents in $G$ 'class intervals' based on the estimated person locations $\hat\theta_v$.

Still relies on $\hat\theta_1, \hat\theta_2, \dots$

Item $\chi^2$ fit statistic

$$\chi^2(X_i) = \sum_g \frac{\left(\sum_{v \in V_g} X_{vi} - \sum_{v \in V_g} E(X_{vi})\right)^2}{\sum_{v \in V_g} V(X_{vi})}$$

where $V_g$ denotes the set of respondents in class interval $g$.

The item ANOVA fit statistic uses an ANOVA based on the grouping

$$Z_{iv} = \mu_{g(v)} + \mathrm{ERROR}$$
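The item $\chi^2$ statistic can be illustrated with a small Python sketch. It is not RUMM's implementation: expected values and variances are computed from the dichotomous Rasch model at the estimated person locations, and the two class intervals below contain artificial data chosen so that the arithmetic can be checked by hand (probabilities 1/2 and 3/4).

```python
import math

def rasch_prob(theta, beta):
    """P(X = 1 | theta) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def item_chisq(class_intervals, beta_hat):
    """Item chi-square; each class interval is a list of (x, theta_hat) pairs."""
    chisq = 0.0
    for group in class_intervals:
        observed = sum(x for x, _ in group)
        probs = [rasch_prob(theta, beta_hat) for _, theta in group]
        expected = sum(probs)
        variance = sum(p * (1.0 - p) for p in probs)
        chisq += (observed - expected) ** 2 / variance
    return chisq

beta_hat = 0.0
high = math.log(3.0)  # person location giving probability 3/4 when beta_hat = 0
interval_1 = [(1, 0.0), (1, 0.0), (0, 0.0), (0, 0.0)]    # observed 2, expected 2
interval_2 = [(1, high), (1, high), (1, high), (1, high)]  # observed 4, expected 3
chisq_item = item_chisq([interval_1, interval_2], beta_hat)
```

The first interval contributes 0 and the second contributes $(4 - 3)^2 / 0.75$, so the statistic equals 4/3 for these artificial data.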


Item fit statistics in SAS

Fit model

    %rasch_data(data=sasuser.ADL, item_names=in);

    %rasch_CML(data=sasuser.ADL, item_names=in, out=CML);

creates output data sets with information.

Estimate person locations

    %rasch_ppar(DATA=sasuser.ADL, ITEM_NAMES=in, DATA_IPAR=CML_ipar, out=pp_cml);

also creates output data sets with information.


Item fit statistics in SAS

Compute item fit statistics

    %rasch_itemfit(DATA=sasuser.ADL, ITEM_NAMES=in, DATA_IPAR=cml_ipar, DATA_POPPAR=pp_cml_outdata, NCLASS=3, OUT=fitcml);

(takes a long time). We specify that we want three class intervals. Output data sets 'fitcml_chisq', ...


Exercise 10: Item fit statistics in SAS

Use the 11 symptom items from the colitis data set.

1. Evaluate item fit using PROC IRT

2. Evaluate item fit using SAS macros

Use

http://publicifsv.sund.ku.dk/~kach/scaleval_IRT/exerc10.sas
