1 Conceptual Issues in Observed-Score Equating Wim J. van der Linden CTB/McGraw-Hill.


Conceptual Issues in Observed-Score Equating

Wim J. van der Linden

CTB/McGraw-Hill


Outline

• Review of Lord (1980)

• Local equating

• Few examples

• Discussion


Review of Lord (1980)

• Notation

– X: old test form with observed score X

– Y: new test form with observed score Y

– θ: common ability measured by X and Y

– x=φ(y): equating transformation


Review of Lord (1980) Cont’d

• Case 1: Infallible measures

– X and Y order any population identically

– Equivalence of ranks establishes equating transformation

$G(y) = p$

$F(\varphi(y)) = p \;\Rightarrow\; \varphi(y) = F^{-1}(G(y))$
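A minimal sketch of the rank-equivalence idea for number-correct scores. The probability lists are made-up illustrations, and because scores are discrete the inverse of F is taken here as the smallest x with F(x) ≥ p, which is an assumption of this sketch rather than something the slides specify.

```python
from bisect import bisect_left
from itertools import accumulate


def cdf(pmf):
    """Cumulative distribution for scores 0..n from a list of probabilities."""
    return list(accumulate(pmf))


def equipercentile(y, pmf_x, pmf_y):
    """Map score y on Y to the smallest x on X with F(x) >= G(y)."""
    F = cdf(pmf_x)
    p = cdf(pmf_y)[y]
    # The small tolerance guards against floating-point noise in the sums.
    return min(bisect_left(F, p - 1e-12), len(F) - 1)


# Hypothetical score distributions on two 4-item forms (illustration only).
pmf_x = [0.1, 0.2, 0.4, 0.2, 0.1]
pmf_y = [0.4, 0.3, 0.2, 0.1, 0.0]
table = [equipercentile(y, pmf_x, pmf_y) for y in range(5)]
print(table)
```

When both forms have the same score distribution, the transformation reduces to the identity, as expected.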


Review of Lord (1980) Cont’d

• Case 1: Infallible measures Cont’d

– Q-Q curve

– Issues related to discreteness, strict monotonicity, and sampling error will be ignored

– Equating is population invariant

– Equating error always equal to zero

$e(x) = \varphi(y) - x$


Review of Lord (1980) Cont’d

• Case 2: Fallible measures

– For each test taker, observed scores are random variables

– Realizations of X and Y do not order populations of test takers identically

– Criterion of equity of equating

$f_{\varphi(Y)\mid\theta} = f_{X\mid\theta}$ for all θ


Review of Lord (1980) Cont’d

• Case 2: Fallible measures Cont’d

– Lord’s theorem: Under realistic conditions, scores X and Y on two tests cannot be equated unless either (1) both scores are perfectly reliable or (2) the two tests are strictly parallel [in which case φ(y)=y]


Review of Lord (1980) Cont’d

• Case 2: Fallible measures Cont’d

– Equating no longer population invariant

$f_X(x) = \int f_{X\mid\theta}(x)\, f(\theta)\, d\theta$

$f_Y(y) = \int f_{Y\mid\theta}(y)\, f(\theta)\, d\theta$


Review of Lord (1980) Cont’d

• Two approximate methods

– IRT true-score equating

$\xi = \sum_{i=1}^{n} P_i(\theta)$

$\eta = \sum_{j=1}^{n} P_j(\theta)$

– Use ξ=ξ(η) to equate Y to X
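A minimal sketch of true-score equating through the common θ. It assumes a two-parameter logistic model and made-up item parameters (the slides name neither), and it solves η(θ)=η for θ by bisection before evaluating ξ(θ).

```python
import math


def p_2pl(theta, a, b):
    """Success probability under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def true_score(theta, items):
    """Sum of item success probabilities at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)


def true_score_equate(eta, items_x, items_y, lo=-8.0, hi=8.0, tol=1e-10):
    """Return xi on X for a true score eta on Y (eta strictly inside 0..n)."""
    # The true score is increasing in theta when all a > 0, so bisection works.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items_y) < eta:
            lo = mid
        else:
            hi = mid
    return true_score(0.5 * (lo + hi), items_x)


# Hypothetical (a, b) parameters; form Y is made harder than form X.
items_x = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
items_y = [(1.0, -0.5), (1.2, 0.5), (0.8, 1.5)]
print(true_score_equate(1.5, items_x, items_y))
```

Because every Y item is harder than its X counterpart in this illustration, a given true score on Y maps to a higher true score on X.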


Review of Lord (1980) Cont’d

• Two approximate methods Cont’d

– IRT observed-score equating, for a sample of test takers a=1,…,N

$f_X(x) = N^{-1} \sum_{a=1}^{N} f(x \mid \theta_a)$

$f_Y(y) = N^{-1} \sum_{a=1}^{N} f(y \mid \theta_a)$


Review of Lord (1980) Cont’d

• Lord’s forgotten question:

What is really needed is a criterion for evaluating such approximate procedures, so as to be able to choose from among them. If you can’t be fair (provide equity) to everyone, what is the next best thing? (p. 207)


Local Equating

• New definition of equating error

$e_2(x;\theta) = F_{\varphi(Y)\mid\theta}(x) - F_{X\mid\theta}(x)$

• Equity=no equating error!

• Setting e2 equal to zero and solving for φ(y) gives

$x = \varphi^{*}(y;\theta) = F_{X\mid\theta}^{-1}(F_{Y\mid\theta}(y)), \quad \theta \in \mathbb{R}$


Local Equating Cont’d

• Because of monotonicity of x=φ(y), the result is the family of error-free (or true) equating transformations

$\varphi^{*}(y;\theta) = F_{X\mid\theta}^{-1}(F_{Y\mid\theta}(y)), \quad \theta \in \mathbb{R}$

• Lord’s theorem is based on implicit assumption of a single transformation


Local Equating Cont’d

• Theorem: For a population of test takers P for which X and Y measure the same θ, equating with the family of transformations φ*(y;θ) has the following properties:

(i) equity for each p ∈ P

(ii) symmetry in X and Y for each p ∈ P

(iii) population invariance within P


Local Equating Cont’d

• Theorem defines population P

– No sampling of test takers required

– Includes future test takers

• Alternative definition of equating error:

$e_3(y;\theta) = \varphi(y) - F_{X\mid\theta}^{-1}(F_{Y\mid\theta}(y))$


Local Equating Cont’d

• Definition of bias, MSE, etc., in equating now straightforward

• Lord’s criterion for finding the “next best thing”


Local Equating Cont’d

• Alternative motivations of local equating

– Thought experiment

– History of standard error of measurement

– Comparison with

  • true-score equating

  • IRT observed-score equating

– Same score but different equated scores?


Local Equating Cont’d

• Alternative motivations Cont’d

– One measurement instrument but different transformations?


Few Examples

• It may seem as if local equating replaces Lord’s set of impossible conditions for equating (perfect reliability; parallel test) by another impossible condition (known ability)

• However, post hoc improvement of reliability or parallelness is impossible but we can always approximate an unknown ability


Few Examples Cont’d

• Possible approximations

– Estimating ability

– Anchor scores as a proxy of ability

– Y=y as a proxy of ability

– Proxies based on collateral information


Discussion

• Criterion of equity involves a different equating transformation for each ability level

• Traditional equating uses “one-size fits all” transformation, which compromises between the transformations for ability levels. As a result, the equating is always (i) biased and (ii) population dependent


Discussion Cont’d

• Lord’s theorem on the impossibility or unnecessity of observed-score equating was too pessimistic because it assumed the use of a single equating transformation for a population of test takers


Equipercentile Method

[Figure: cumulative distribution functions F(x) for Test X and G(y) for Test Y, plotted as cumulative probability (0.0 to 1.0) against test score (0 to 40); a common probability p links score y to x = F^{-1}(G(y)).]


Thought Experiment

[Figure sequence over eight slides; only the labels survive. The panels show Test Y and Test X with scores y and x, first for a test taker p and then for a second test taker q, together with the Transformation Y → X, x=φ(y), and later the Transformations Y → X for p and q. One slide labels the panels Test Y (Population 1) and Test X (Population 2) with a single Transformation Y → X. The last slide sets a single Transformation Y → X next to the Transformations Y → X for p and q.]


Standard Error of Measurement

• Classical test theory involves one SEM for an entire population of test takers

$\sigma_E = \sigma_X \sqrt{1 - \rho_{XX'}}$

• Stronger models condition on ability measured; e.g., IRT

$\sigma_{E\mid\theta} = \left[\sum_i P_i(\theta)\,(1 - P_i(\theta))\right]^{1/2}$
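A small numerical sketch contrasting the two standard errors. The 2PL response function and all numbers are assumptions chosen for illustration; the slides do not name a model or give values.

```python
import math


def classical_sem(sd_x, reliability):
    """One SEM for the whole population: sd_x * sqrt(1 - reliability)."""
    return sd_x * math.sqrt(1.0 - reliability)


def p_2pl(theta, a, b):
    """Success probability under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def conditional_sem(theta, items):
    """SEM of the number-correct score at a given theta."""
    probs = [p_2pl(theta, a, b) for a, b in items]
    return math.sqrt(sum(p * (1.0 - p) for p in probs))


# Hypothetical 4-item form; the conditional SEM changes with theta.
items = [(1.0, 0.0)] * 4
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(conditional_sem(theta, items), 3))
print(classical_sem(10.0, 0.91))
```

The classical value is a single number, while the conditional value is a function of θ, which is the same move local equating makes for the equating transformation.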


True-Score Equating

• True-score equating is a degenerate case of local equating

$E(Y\mid\theta) \rightarrow E(X\mid\theta), \quad \theta \in \mathbb{R}$


Different Equated Scores?

• Why should two test takers, p and q, with the same score of 23 out of 30 items correct on a new test form need different equated scores on the same old form?

– Would this not even be unfair?

– Fallible scores


Different Equated Scores? Cont’d

[Figure: two panels, “Observed-score distribution of p” and “Observed-score distribution of q”.]


Different Transformations?

• Example of measuring tape

• Number-correct scores are counts of responses, not fundamental measures

• Responses have person and item effects

– Test equating requires “some type of control for differential examinee ability” (von Davier, Holland & Thayer, 2004, p. 2)


Different Transformations? Cont’d

• An effective way to disentangle item and person effects is through IRT modeling

• Observed-score equating is an attempt to do the same through a transformation of total scores

– Only possible way is (i) to first condition on the abilities and (ii) then transform the score to adjust for the item effects


Estimating Ability

• Assumption: fitting response model

• Calculate family of true equating transformations (Lord-Wingersky’s recursive procedure)

• Use member of family at point estimate of θ

• Bias study for 40-item subtests of LSAT

• Application in adaptive testing
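The steps above can be sketched as follows. This is only an illustration: it assumes a 2PL response model, made-up item parameters, and a discrete inverse (smallest x with F(x) ≥ p), none of which are specified on the slides. The recursion builds the number-correct distribution at a given θ one item at a time.

```python
import math
from bisect import bisect_left
from itertools import accumulate


def p_2pl(theta, a, b):
    """Success probability under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lord_wingersky(probs):
    """Number-correct distribution given item success probabilities."""
    dist = [1.0]  # with zero items, score 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)   # item answered incorrectly
            new[score + 1] += mass * p       # item answered correctly
        dist = new
    return dist


def local_equate(y, theta, items_x, items_y):
    """Member of the family at theta: smallest x with F_X|theta(x) >= F_Y|theta(y)."""
    f_x = lord_wingersky([p_2pl(theta, a, b) for a, b in items_x])
    f_y = lord_wingersky([p_2pl(theta, a, b) for a, b in items_y])
    F_x = list(accumulate(f_x))
    p = list(accumulate(f_y))[y]
    return min(bisect_left(F_x, p - 1e-12), len(F_x) - 1)


# Hypothetical forms; in practice theta would be a point estimate.
items_x = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
items_y = [(1.0, -0.5), (1.2, 0.5), (0.8, 1.5)]
for theta in (-2.0, 0.0, 2.0):
    print(theta, [local_equate(y, theta, items_x, items_y) for y in range(4)])
```

Evaluating the function over a grid of θ values gives one transformation per ability level, which is the family the slide refers to.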


Bias Study

[Figure: two panels, “Traditional Equating” and “Local Equating at θ”; each plots bias (vertical axis, -6 to 6) against a horizontal axis running from 0 to 40.]


Family of True Transformations for LSAT Subtest

[Figure: x plotted against y, both axes running from 0 to 25, with curve labels θ=-2.0 and θ=2.0.]


Anchor Score as Proxy

• Current methods

– Chain equating

– Poststratification equating

– Linear equating methods: Tucker, Levine, Braun-Holland, linear chain equating

• Use conditional distributions of X and Y given anchor score A=a

$x = F_{X\mid a}^{-1}(F_{Y\mid a}(y)), \quad a \in A$
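A minimal sketch of equating within one anchor-score level, using empirical conditional distributions. The (score, anchor) records are invented for illustration, no smoothing is applied, and the discrete inverse (smallest x with F(x) ≥ p) is an assumption of the sketch.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def conditional_cdf(pairs, a, max_score):
    """Empirical CDF of the score among records with anchor score a."""
    counts = Counter(score for score, anchor in pairs if anchor == a)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records with this anchor score")
    return list(accumulate(counts[s] / total for s in range(max_score + 1)))


def anchor_local_equate(y, a, x_pairs, y_pairs, max_score):
    """Smallest x with F_X|a(x) >= F_Y|a(y)."""
    F_x = conditional_cdf(x_pairs, a, max_score)
    p = conditional_cdf(y_pairs, a, max_score)[y]
    return min(bisect_left(F_x, p - 1e-12), max_score)


# Hypothetical (score, anchor score) records for each form.
x_pairs = [(2, 1), (3, 1), (3, 1), (4, 1), (1, 0), (2, 0)]
y_pairs = [(1, 1), (2, 1), (2, 1), (3, 1), (0, 0), (1, 0)]
print(anchor_local_equate(2, 1, x_pairs, y_pairs, max_score=4))
print(anchor_local_equate(1, 0, x_pairs, y_pairs, max_score=4))
```

With real data the cells for rare anchor scores are sparse, so some form of smoothing or pooling would likely be needed before inverting the distributions.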


Anchor Score as Proxy Cont’d

• Empirical bias study for same LSAT subtests


Bias Study—Anchor-Test Design

[Figure: three panels, “Chain Equating”, “Poststratification Equating”, and “Local Equating”; each has a vertical axis from -8 to 8 and a horizontal axis from 0 to 40.]


Y=y as Proxy of Ability

• Single-group design

– Estimate distributions of X given Y=y directly from bivariate distribution of X and Y

– Model-based estimate of Y given y


Y=y as Proxy of Ability

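• Linear local equating

$x = \varphi(y) = \mu_{X\mid y} + \frac{\sigma_{X\mid y}}{\sigma_{Y\mid y}}\,(y - \mu_{Y\mid y}), \quad y = 0, 1, \ldots, n.$

• Because μY|y=y (classical test theory),

$x = \varphi(y) = \mu_{X\mid y}, \quad y = 0, 1, \ldots, n.$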


Collateral Information

• Any variables correlating substantially with θ

– Earlier tests

– Battery of subtests

– Response times

• Alternative sources give different equatings; just find the “next best thing”