Volkert Siersma Research Unit for General Practice in Copenhagen V.Siersma@gpract.ku.dk P γ...

Post on 22-Dec-2015


Volkert Siersma

Research Unit for General Practice in Copenhagen

http://www.gpract.ku.dk V.Siersma@gpract.ku.dk

Pγ measure for association between categorical variables with partial or tentative ordering of categories

Categorical variables

Nominal variables: categories are unordered relative to each other.

Ordinal variables: categories have an inherent ordering.

Ordinal variables are categorical variables with additional information in the ordering of the categories.

This can be used to devise stronger and more meaningful analyses of associations between ordinal variables.

Categorical inference

Tests for conditional independence in multidimensional contingency tables

Nominal variables: inference based on χ2-measures.

• LR test with saturated alternative
• LR test with 2-factor alternative

Ordinal variables: inference based on χ2-measures, or inference based on rank correlation measures.

• Goodman and Kruskal’s γ measure

An example

Type 2 diabetes patients at diagnosis

Smoking by self-rated health (counts):

Smoking    Excellent  Good  Fair  Poor  Very poor
Never             24    84    95    18          2
Previous          29    77   102    20          3
Current           31    82   121    18          4

…the corresponding empirical probability table

Smoking    Excellent   Good   Fair  Poor  Very poor
Never           3.4%  11.8%  13.4%  2.5%       0.3%
Previous        4.1%  10.9%  14.4%  2.8%       0.4%
Current         4.4%  11.6%  17.0%  2.5%       0.6%

If the two variables are independent, then the joint probability function, i.e. the cell probabilities of the table, is just the product of the marginal probabilities of the categories of each of the variables:

P(X=x and Y=y) = P(X=x) * P(Y=y)

…the marginal probability distributions

Self-rated health: Excellent 11.8%, Good 34.2%, Fair 44.8%, Poor 7.9%, Very poor 1.3%

Smoking: Never 31.4%, Previous 32.5%, Current 36.1%

…observed versus expected (each cell: observed/expected count)

Smoking    Excellent   Good     Fair   Poor  Very poor
Never          24/26  84/76   95/100  18/18        2/3
Previous       29/27  77/79  102/103  20/18        3/3
Current        31/30  82/88  121/115  18/20        4/3

Compare the observed table and the table expected under independence.

Pearson’s statistic, which is based on the sum of squared differences between the observed and the expected table entries, each divided by the expected entry, is approximately chi-squared distributed when the null hypothesis is true.

Here: df=8 and p=0.94

Or we perform an exact test!
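The Pearson test described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `pearson_chi2` and `chi2_sf_even_df` are hypothetical, and the closed-form tail probability used here is only valid for an even number of degrees of freedom (which happens to hold for this 3 x 5 table).

```python
import math

# Observed counts from the smoking by self-rated health table.
observed = [
    [24, 84, 95, 18, 2],    # Never
    [29, 77, 102, 20, 3],   # Previous
    [31, 82, 121, 18, 4],   # Current
]

def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected

def chi2_sf_even_df(x, df):
    """Upper tail of the chi-squared distribution (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

chi2, df, expected = pearson_chi2(observed)
p_value = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.2f}")
```

Run on the table above, this should reproduce df=8 and a p-value close to the 0.94 reported on the slide.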

Goodman and Kruskal’s γ.

Two independent draws (X1,Y1) and (X2,Y2) from the joint (X,Y) distribution.

Concordance: "If X goes up, Y goes up; if X goes down, Y goes down" or "X and Y move in the same direction."

The definition is symmetric.

Discordance: "If X goes up, Y goes down" or "X and Y move in opposite directions."

$$\gamma = \frac{P(\text{concordance}) - P(\text{discordance})}{P(\text{concordance}) + P(\text{discordance})}$$

Goodman and Kruskal’s γ. Difference of the probabilities for concordance and discordance scaled with the probability of not having ties.

Here: γ=0.02 and p=0.60
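As a rough illustration of the definition, the sample version of γ can be computed by counting concordant and discordant pairs of observations in the table. The sketch below uses hypothetical function names and assumes that rows and columns are listed in increasing order of the respective variable; it is not taken from the slides.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in a two-way table of counts.

    Rows and columns are assumed to be listed in increasing order.
    Pairs tied on either variable are ignored.
    """
    n_rows, n_cols = len(table), len(table[0])
    conc = disc = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # second observation has larger X
                for l in range(n_cols):
                    if l > j:                   # ... and larger Y: concordant
                        conc += table[i][j] * table[k][l]
                    elif l < j:                 # ... and smaller Y: discordant
                        disc += table[i][j] * table[k][l]
    return conc, disc

def goodman_kruskal_gamma(table):
    conc, disc = concordant_discordant(table)
    if conc + disc == 0:
        return 0.0  # convention assumed here when every pair is tied
    return (conc - disc) / (conc + disc)

smoking_by_health = [
    [24, 84, 95, 18, 2],    # Never
    [29, 77, 102, 20, 3],   # Previous
    [31, 82, 121, 18, 4],   # Current
]

conc, disc = concordant_discordant(smoking_by_health)
gamma_value = goodman_kruskal_gamma(smoking_by_health)
print(conc, disc, round(gamma_value, 3))
```

For the smoking table this gives a γ of roughly 0.026, a weak positive association of the same small magnitude as the value reported on the slide.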

Partial order

Between nominal and ordinal: only some of the categories are ordered.

Goals in a weight control programme:

• No goal set (extra-ordinal category)
• Keep current weight
• Reduction < 2 kg
• Reduction < 4 kg
• Reduction < 6 kg
• Reduction > 6 kg

Has to be treated as a nominal variable, and the information in the ordering is lost.

There is no indication of the effect of the extra-ordinal category in relation to the others.

Tentative order

Between nominal and ordinal: the ordering of the categories is of interest.

Danish political parties:

Ø SF A B Q CD Z V C DF

Ordering w.r.t. left-right affiliation

Methods for nominal variables do not give information on the nature of the relationship.

Ordinal information

Partially ordinal variables:

• have to be treated as nominal variables in general
• information in the ordering of the categories, and statistical power, is lost.

Tentatively ordinal variables:

• the form of the association has to be deduced by examination of stratified tables or parameters of loglinear models
• which in multivariate analysis can be most confusing.

An ordering of a categorical variable

An ordering X(r) of X is an ordinal random variable with a specific permutation r of the categories of X.

If X has a (partial) order, we regard only valid orderings of X, i.e. orderings based on permutations that do not violate this partial order.

Nominal variable: all orderings are valid

Ordinal variable: only one ordering is valid
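The notion of a valid ordering can be made concrete by enumerating permutations and keeping those that respect the partial order. The sketch below is an illustration only: the function `valid_orderings` and the letter codes (A for "no goal set", B to F for the five ordered goals of the weight control example) are hypothetical.

```python
from itertools import permutations

def valid_orderings(categories, must_precede):
    """Yield every permutation of `categories` that respects the partial order.

    `must_precede` is a collection of (a, b) pairs meaning that category a
    has to come before category b.
    """
    for perm in permutations(categories):
        position = {cat: idx for idx, cat in enumerate(perm)}
        if all(position[a] < position[b] for a, b in must_precede):
            yield perm

# Hypothetical letter codes: A = no goal set, B..F = the five ordered goals.
goals = ["A", "B", "C", "D", "E", "F"]
chain = [("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]

partial = list(valid_orderings(goals, chain))                  # A may go anywhere
nominal = list(valid_orderings(goals, []))                     # no constraints
ordinal = list(valid_orderings(goals, [("A", "B")] + chain))   # fully ordered
print(len(partial), len(nominal), len(ordinal))  # 6 720 1
```

The counts match the two limiting cases stated above: with no constraints all 6! orderings are valid, and with a complete order only one is.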

A Pγ measure of association

The Pγ measure of association between a partially ordered or nominal X and an ordinal Y: the maximum γ between a valid ordering of X and Y.

$$P\gamma_{XY} = \max_{r} \gamma_{X(r)Y}$$

The optimal monotone ordering of X w.r.t. Y: the valid ordering of X for which this maximum is obtained.

$$r_{\mathrm{opt}} = \arg\max_{r} \gamma_{X(r)Y}$$
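For small tables the maximisation over orderings can be done by brute force. The following sketch treats the row variable X as nominal unless a validity check is supplied; the names `gamma`, `p_gamma` and `is_valid` are hypothetical, and exhaustive search is an assumption made for illustration, not necessarily how the author computed Pγ.

```python
from itertools import permutations

def gamma(table):
    """Goodman and Kruskal's gamma for a table with ordered rows and columns."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += n_ij * sum(lower_row[j + 1:])
                disc += n_ij * sum(lower_row[:j])
    if conc + disc == 0:
        return 0.0  # convention assumed here when every pair is tied
    return (conc - disc) / (conc + disc)

def p_gamma(table, is_valid=lambda order: True):
    """Return (P-gamma, optimal row ordering) by exhaustive search.

    `is_valid` decides whether a row permutation respects the partial order;
    by default every permutation is valid, i.e. X is treated as nominal.
    """
    best_value, best_order = None, None
    for order in permutations(range(len(table))):
        if not is_valid(order):
            continue
        value = gamma([table[i] for i in order])
        if best_value is None or value > best_value:
            best_value, best_order = value, order
    return best_value, best_order

# Made-up table: a perfectly monotone relationship with the first two rows swapped.
shuffled = [
    [0, 5, 0],
    [5, 0, 0],
    [0, 0, 5],
]
value, order = p_gamma(shuffled)
print(value, order)  # 1.0 (1, 0, 2)
```

The search recovers the row ordering that makes the made-up table perfectly monotone. The number of permutations grows factorially, so this approach is only practical for a modest number of categories.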

A partial γ measure of association

In multidimensional contingency tables one is often interested in the relationship of two variables, X and Y, conditional on (controlled for, stratified by) a third variable Z.

Within each stratum of Z, a γ measure is calculated between X and Y.

A partial γ measure of monotone association between X and Y is defined as a weighted summary γ measure across subtables spanned by the categories of Z.
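The slides do not state which weights are used. One common choice, assumed here, pools the concordant and discordant pair counts over the strata, which is the same as weighting each stratum's γ by its number of untied pairs. The sketch below uses made-up 2 x 2 strata and hypothetical function names.

```python
def pair_counts(table):
    """Concordant and discordant pair counts for one stratum."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += n_ij * sum(lower_row[j + 1:])
                disc += n_ij * sum(lower_row[:j])
    return conc, disc

def partial_gamma(strata):
    """Pool pair counts over the strata of Z (an assumed weighting scheme)."""
    total_c = total_d = 0
    for table in strata:
        c, d = pair_counts(table)
        total_c += c
        total_d += d
    return (total_c - total_d) / (total_c + total_d)

# Made-up data: X by Y tables within two strata of Z.
strata = [
    [[10, 5], [5, 10]],   # Z = 1
    [[8, 2], [4, 6]],     # Z = 2
]

partial = partial_gamma(strata)

# Equivalent view: weighted average of stratum gammas, weights = untied pairs.
weights = [sum(pair_counts(t)) for t in strata]
gammas = [(c - d) / (c + d) for c, d in map(pair_counts, strata)]
weighted = sum(w * g for w, g in zip(weights, gammas)) / sum(weights)
print(round(partial, 4), round(weighted, 4))
```

Both computations give the same number, which shows why the pooled version can be read as a weighted summary of the stratum-specific γ values.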


A partial Pγ measure of association

The partial Pγ_{XY|Z} between a partially ordered or nominal X and an ordinal Y conditional on a nominal Z is defined as the maximum partial γ between a valid ordering of X and Y.

The partial optimal monotone ordering of X w.r.t. Y, controlled for Z, is the ordering corresponding to the partial Pγ_{XY|Z}.

Inference

Significance of the Pγ measure and its corresponding partial measure is assessed by comparison of the obtained value with a simulated distribution under the null hypothesis where X and Y are independent.

Resampling tests are standard in the analysis of multi-way contingency tables as tests based on the asymptotic distribution are of very low power
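A resampling test of this kind can be sketched as follows. The slides do not describe the simulation scheme, so the sketch assumes a simple permutation approach: the Y values are shuffled among the observations (which keeps both margins fixed), Pγ is recomputed each time, and the p-value is the proportion of simulated values at least as large as the observed one. All names and the example table are hypothetical.

```python
import random
from itertools import permutations

def gamma(table):
    conc = disc = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += n_ij * sum(lower_row[j + 1:])
                disc += n_ij * sum(lower_row[:j])
    return (conc - disc) / (conc + disc) if conc + disc else 0.0

def p_gamma(table):
    """Maximum gamma over all row orderings (X treated as nominal)."""
    return max(gamma([table[i] for i in order])
               for order in permutations(range(len(table))))

def monte_carlo_p_value(table, n_sim=199, seed=1):
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (x, y) pair per observation.
    xs, ys = [], []
    for i in range(n_rows):
        for j in range(n_cols):
            xs.extend([i] * table[i][j])
            ys.extend([j] * table[i][j])
    observed = p_gamma(table)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(ys)  # break the X-Y link, keep both margins
        sim = [[0] * n_cols for _ in range(n_rows)]
        for x, y in zip(xs, ys):
            sim[x][y] += 1
        if p_gamma(sim) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)

# Made-up table with a strong association.
table = [[20, 2, 1], [2, 20, 2], [1, 2, 20]]
observed, p = monte_carlo_p_value(table)
print(round(observed, 2), p)
```

Because Pγ is a maximum over orderings, its null distribution is shifted away from zero, which is why the observed value is compared with simulated Pγ values rather than with the null distribution of an ordinary γ.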


Simulation study

Relationship between X and Y conditional on Z

X and Y ordinal, Z nominal.

Dim(X) = 3 or 5

Dim(Y) = 3 or 5

Dim(Z) = 2 or 10

Uniform marginal distributions

N = 200

partial γ = 0 or 0.15

Categories of Y are permuted to calculate Pγ


Simulation study – results γ=0

The attained level of significance, i.e. the power of the tests when the true γ is 0, has to be 5%.

Our results show that this is not a problem. All MC estimates of the attained level are in the 95% confidence region:

0.05 ± 0.0135
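The width of this region is consistent with a normal-approximation interval for a rejection rate of 5% estimated from 1000 Monte Carlo replications. The number of replications is not stated on the slide, so 1000 is an assumption in the sketch below, and the function name is hypothetical.

```python
import math

def level_confidence_region(level=0.05, n_replications=1000, z=1.96):
    """Normal-approximation region for an estimated rejection rate.

    n_replications=1000 is an assumption; the slides do not state it.
    """
    half_width = z * math.sqrt(level * (1 - level) / n_replications)
    return level - half_width, level + half_width, half_width

low, high, half_width = level_confidence_region()
print(round(low, 4), round(high, 4), round(half_width, 4))
```

Under that assumption the half-width rounds to 0.0135, so estimated levels between roughly 0.0365 and 0.0635 are compatible with a true level of 5%.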


Simulation study – results γ=0.15

Power of the four tests, and recovery of the true ordering. LR1: LR test of (XZ,YZ) against (XYZ). LR2: LR test of (XZ,YZ) against (XY,XZ,YZ).

# rows  # columns  # strata  LR1    LR2    Partial γ  Partial Pγ  Correct ordering  Correlation with true ordering (mean)
3       3          2         0.178  0.229  0.377      0.283       59.6%             0.70
3       3          10        0.084  0.221  0.343      0.280       58.2%             0.68
3       5          2         0.137  0.193  0.433      0.242       8.6%              0.54
3       5          10        0.074  0.202  0.425      0.255       8.9%              0.53
5       3          2         0.158  0.247  0.431      0.359       61.8%             0.72
5       3          10        0.079  0.196  0.408      0.329       62.0%             0.70
5       5          2         0.158  0.231  0.553      0.347       12.0%             0.60
5       5          10        0.097  0.219  0.518      0.312       10.8%             0.57


The test based on the partial γ has considerably higher power than the other tests.

This was to be expected because the data were generated with a monotone relationship.


The test based on Pγ is not as powerful as the one based on γ.

Its power is higher than that of both LR tests considered here.


The influence of the simulation parameters is intuitive.

This becomes clear in more extensive simulations.


Insight is gained into the ordering of the categories.

The identification of the correct ordering depends on the number of categories that are permuted.

The estimated ordering will be close to the correct ordering, but is unlikely to match it exactly.

The relation between γ and Pγ

Dim(X) = 5, Dim(Y) = 5, Dim(Z) = 10, γ = 0.15

[Figure: |γ| plotted against Pγ]

|γ| is closer to Pγ when the estimated values for these coefficients are higher.

The distribution of Pγ

[Figure: distribution of Pγ for γ = 0] Normal?!?

[Figure: distribution of Pγ for γ = 0.15] Normal…

Danish political parties

European Values Studies

Denmark:

survey in 1981, 1990 and 1999

preferred political party (10 parties)

political attitudes measured on a left-right discrete (10 point) VAS scale

10 x 10 x 3 table

Significance of the association is obvious

Ordering of the parties is common knowledge (up to a certain level…)


Danish political parties - Pγ

Parties listed from far left to far right:

Ø   The Red-Green Alliance
SF  The Socialist People’s Party
A   The Social Democratic Party
B   The Social Liberal Party
Q   The Christian People’s Party
CD  The Centre Democrats
Z   The Progress Party
V   The Liberal Party
C   The Conservative Party
DF  The Danish People’s Party

Pγ = 0.629, a very strong association

Common knowledge: Left (in this order), Center, Right

The position of DF on the far right is somewhat surprising:

• New party in 1999
• Reflects the political attitudes of the persons preferring DF to other parties in 1999
• Since then, the party has with some success attempted to move towards the middle of the spectrum

A weight control program

Weight goals against attained weight. A considerable number have no goal set.

For convenience we code the categories of the weight goal variable with letters.

We investigate the placement of the no goal set category with Pγ. The relationship is significant, but confounded.

We investigate the no goal set category with the partial Pγ conditional on many possible confounders.

We construct the product of weight goal and weight at current session. The optimal monotone ordering with respect to attained weight shows effect modification.

For normal or extremely obese people no goal set is associated with low attained weight, but others are better off with guidance in the form of weight goals.

Metabolic syndrome

Combinations (products) of ordinal variables:

sex x age → C

hyperglycaemia x obesity → D

Metabolic Syndrome, original category codes:

               Hyperglycaemia  No   No   Yes  Yes
               Obesity         No   Yes  No   Yes
sex  age  C    D               1    2    3    4
M    Y    1                    45   15   14   5
M    O    2                    12   9    18   12
F    Y    3                    83   2    5    9
F    O    4                    25   0    12   11

Categories of C and D permuted to obtain Pγ:

C \ D  1    4    3    2
3      83   9    5    2
4      25   11   12   0
1      45   5    14   15
2      12   12   18   9

Pγ = 0.53

• omnibus tests
• screening priority
• scale construction

It is difficult to interpret the obtained orderings as something clinically meaningful.
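The reported value can be checked by reordering the rows and columns of the metabolic syndrome table as on the slide (C in the order 3, 4, 1, 2 and D in the order 1, 4, 3, 2) and computing γ. The sketch below also runs a brute-force search over all row and column orderings for comparison; the function names are hypothetical, and exhaustive search is an assumption about how such a maximum could be found, not a description of the author's implementation.

```python
from itertools import permutations

def gamma(table):
    conc = disc = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += n_ij * sum(lower_row[j + 1:])
                disc += n_ij * sum(lower_row[:j])
    return (conc - disc) / (conc + disc) if conc + disc else 0.0

# Rows: C = 1..4 (M/Y, M/O, F/Y, F/O); columns: D = 1..4.
original = [
    [45, 15, 14, 5],
    [12, 9, 18, 12],
    [83, 2, 5, 9],
    [25, 0, 12, 11],
]

def reorder(table, row_order, col_order):
    return [[table[i][j] for j in col_order] for i in row_order]

# Orderings shown on the slide, written as zero-based indices.
slide_table = reorder(original, (2, 3, 0, 1), (0, 3, 2, 1))
gamma_slide = gamma(slide_table)

# Exhaustive search over both sets of categories (24 x 24 orderings).
best = max(
    gamma(reorder(original, rows, cols))
    for rows in permutations(range(4))
    for cols in permutations(range(4))
)
print(round(gamma_slide, 2), round(best, 2))
```

The γ of the reordered table rounds to the 0.53 given on the slide. Reversing both orderings at once leaves γ unchanged, so the optimal pair of orderings is never unique when both variables are permuted freely.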

Canonical correlations

Can also find the optimal monotone ordering of one nominal variable relative to one ordinal variable. The extension to a partial analysis is troublesome.

Can also be used to construct omnibus tests. But these do not take the ordinal nature of the variables into account.


Conclusion

Pγ based inference is more powerful than nominal χ2 based inference.

Pγ based inference makes use of information in a partial ordering of the categories.

Pγ based inference gives information on the ordering of the categories.

Pγ based inference gives insight into effect modification.

Pγ based inference allows for omnibus tests.
