Volkert Siersma
Research Unit for General Practice in Copenhagen
http://www.gpract.ku.dk [email protected]
Pγ measure for association between categorical variables with partial or
tentative ordering of categories
Categorical variables
Nominal variables: categories are unordered relative to each other.
Ordinal variables: categories have an inherent ordering.
Ordinal variables are categorical variables with additional information in the ordering of the categories.
This can be used to devise stronger and more meaningful analyses between ordinal variables.
Categorical inference
Tests for conditional independence in multidimensional contingency tables
Nominal variables: inference based on χ²-measures.
    LR test with saturated alternative
    LR test with 2-factor alternative
Ordinal variables: inference based on χ²-measures, or inference based on rank correlation measures.
    Goodman and Kruskal’s γ measure
An example
Type 2 diabetes patients at diagnosis

            Self-rated health
Smoking     Excellent   Good   Fair   Poor   Very poor
Never              24     84     95     18           2
Previous           29     77    102     20           3
Current            31     82    121     18           4
…the corresponding empirical probability table

            Self-rated health
Smoking     Excellent    Good    Fair   Poor   Very poor
Never            3.4%   11.8%   13.4%   2.5%        0.3%
Previous         4.1%   10.9%   14.4%   2.8%        0.4%
Current          4.4%   11.6%   17.0%   2.5%        0.6%
If the two variables are independent, then the joint probability function, i.e. the cell probabilities of the table, is just the product of the marginal probabilities of the categories of each of the variables:
P(X=x and Y=y) = P(X=x) · P(Y=y)
…the marginal probability distributions

Self-rated health:   Excellent 11.8%   Good 34.2%   Fair 44.8%   Poor 7.9%   Very poor 1.3%
Smoking:             Never 31.4%   Previous 32.5%   Current 36.1%
…observed versus expected (each cell: observed/expected count)

            Self-rated health
Smoking     Excellent   Good    Fair      Poor    Very poor
Never       24/26       84/76   95/100    18/18   2/3
Previous    29/27       77/79   102/103   20/18   3/3
Current     31/30       82/88   121/115   18/20   4/3
Compare the observed table and the table expected under independence.
Pearson’s statistic, which is based on the sum of squared differences between the observed and the expected table entries, is chi-squared distributed when the null hypothesis is true.
Here: df=8 and p=0.94
Or we perform an exact test!
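The comparison of observed and expected counts can be sketched in a few lines of Python. This is only an illustration, not the software used for the talk: the function names are made up here, and the p-value uses a closed-form expression for the chi-squared tail that holds for even degrees of freedom only (the case here), to avoid any dependency outside the standard library.

```python
import math

# Observed counts from the smoking by self-rated health table.
observed = [
    [24, 84, 95, 18, 2],    # Never
    [29, 77, 102, 20, 3],   # Previous
    [31, 82, 121, 18, 4],   # Current
]

def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected counts under independence)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected

def chi2_sf_even_df(x, df):
    """Upper tail probability of the chi-squared distribution (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form is valid for positive even df only")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

stat, df, expected = pearson_chi2(observed)
p_value = chi2_sf_even_df(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")
```

The rounded expected counts reproduce the second number in each cell of the observed versus expected table.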
Goodman and Kruskal’s γ.
Two independent draws (X1,Y1) and (X2,Y2) from the joint (X,Y) distribution.
Concordance: "If X goes up, Y goes up; if X goes down, Y goes down", or "X and Y move in the same direction", i.e. (X1 − X2)(Y1 − Y2) > 0.
The definition is symmetric.
Discordance: "If X goes up, Y goes down", or "X and Y move in opposite directions", i.e. (X1 − X2)(Y1 − Y2) < 0.
Goodman and Kruskal’s γ. Difference of the probabilities for concordance and discordance scaled with the probability of not having ties:
\[ \gamma = \frac{P(\text{concordance}) - P(\text{discordance})}{P(\text{concordance}) + P(\text{discordance})} \]
Here: γ=0.02 and p=0.60
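An estimate of γ replaces the two probabilities by the numbers of concordant and discordant pairs of observations in the table. The sketch below is a plain, unoptimised illustration with made-up function names; it assumes rows and columns are listed in increasing category order and that at least one untied pair exists.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs of observations in an r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    conc = disc = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # second observation has higher X
                for l in range(n_cols):
                    if l > j:                   # ... and higher Y: concordant
                        conc += table[i][j] * table[k][l]
                    elif l < j:                 # ... and lower Y: discordant
                        disc += table[i][j] * table[k][l]
    return conc, disc

def gk_gamma(table):
    """Goodman and Kruskal's gamma; pairs tied on X or Y are ignored."""
    conc, disc = concordant_discordant(table)
    return (conc - disc) / (conc + disc)

smoking_by_health = [
    [24, 84, 95, 18, 2],    # Never
    [29, 77, 102, 20, 3],   # Previous
    [31, 82, 121, 18, 4],   # Current
]
gamma = gk_gamma(smoking_by_health)
print(f"gamma = {gamma:.3f}")
```

For the smoking table this gives a small positive value, in line with the weak association reported above.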
Partial order
[Figure: nominal to ordinal scale]
Only some of the categories are ordered.
Goals in a weight control programme:
No goal set (extra-ordinal category)
Keep current weight
Reduction < 2 kg
Reduction < 4 kg
Reduction < 6 kg
Reduction > 6 kg
Such a variable has to be treated as a nominal variable, and the information in the ordering is lost.
There is no indication of the effect of the extra-ordinal category in relation to the others.
Tentative order
[Figure: nominal to ordinal scale]
The ordering of the categories is of interest.
Danish political parties:
Ø SF A B Q CD Z V C DF
Ordering w.r.t. left-right affiliation
Methods for nominal variables do not give information on the nature of the relationship.
Ordinal information
Partially ordinal variables:
• have to be treated as nominal variables in general
• information in the ordering of the categories, and statistical power, is lost.
Tentatively ordinal variables:
• the form of the association has to be deduced by examination of stratified tables or parameters of loglinear models
• which in multivariate analysis can be most confusing.
An ordering of a categorical variable
An ordering X(r) of X is an ordinal random variable with a specific permutation r of the categories of X.
If X has a (partial) order, we regard only valid orderings of X, i.e. orderings based on permutations that do not violate this partial order.
Nominal variable: all orderings are valid
Ordinal variable: only one ordering is valid
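The set of valid orderings can be enumerated by brute force, which is feasible for the small numbers of categories considered here. The sketch below is illustrative only: the function name, the representation of the partial order as "a must precede b" pairs, and the short category labels for the weight goal example are assumptions made for this example.

```python
from itertools import permutations

def valid_orderings(categories, must_precede=()):
    """Yield every permutation of `categories` that respects the partial order.

    `must_precede` holds pairs (a, b) meaning category a must come before b.
    """
    for perm in permutations(categories):
        position = {cat: idx for idx, cat in enumerate(perm)}
        if all(position[a] < position[b] for a, b in must_precede):
            yield perm

# Weight goal example: five ordered categories plus one extra-ordinal category.
goals = ["none", "keep", "<2", "<4", "<6", ">6"]
ordered_part = ["keep", "<2", "<4", "<6", ">6"]
constraints = list(zip(ordered_part, ordered_part[1:]))

orderings = list(valid_orderings(goals, constraints))
for ordering in orderings:
    print(ordering)
```

With a single extra-ordinal category, the valid orderings differ only in where that category is inserted into the ordered chain.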
A Pγ measure of association
The Pγ measure of association between a partially ordered or nominal X and an ordinal Y: the maximum γ between a valid ordering of X and Y.
\[ P\gamma_{XY} = \max_{r} \gamma_{X(r)Y} \]
The optimal monotone ordering of X w.r.t. Y: the valid ordering of X for which this maximum is obtained.
\[ r_{\mathrm{opt}} = \arg\max_{r} \gamma_{X(r)Y} \]
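A direct way to obtain Pγ and the optimal monotone ordering is to evaluate γ for every valid ordering of the rows. The following is a minimal brute-force sketch, not the implementation behind the results in this talk; the function names and the small 3 x 3 table are invented for illustration.

```python
from itertools import permutations

def gk_gamma(table):
    """Goodman and Kruskal's gamma for a table with ordered rows and columns."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += count * sum(lower_row[j + 1:])
                disc += count * sum(lower_row[:j])
    return (conc - disc) / (conc + disc)

def p_gamma(table, row_labels, must_precede=()):
    """Maximum gamma over valid row orderings, and the ordering attaining it."""
    best_gamma, best_order = None, None
    for perm in permutations(range(len(table))):
        position = {row_labels[r]: idx for idx, r in enumerate(perm)}
        if not all(position[a] < position[b] for a, b in must_precede):
            continue                      # ordering violates the partial order
        g = gk_gamma([table[r] for r in perm])
        if best_gamma is None or g > best_gamma:
            best_gamma, best_order = g, [row_labels[r] for r in perm]
    return best_gamma, best_order

# Made-up table: nominal X with categories a, b, c (rows), ordinal Y (columns).
toy = [
    [10, 5, 1],   # a
    [1, 5, 10],   # b
    [4, 8, 4],    # c
]
pg, order = p_gamma(toy, ["a", "b", "c"])
print(f"P-gamma = {pg:.3f}, optimal ordering = {order}")
```

Because reversing an ordering changes the sign of γ, Pγ for a nominal X is never negative; with a partial order the reversed ordering may be invalid, so negative values are possible.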
A partial γ measure of association
In multidimensional contingency tables one is often interested in the relationship of two variables, X and Y, conditional on (controlled for, stratified by) a third variable Z.
Within each stratum of Z, a γ measure is calculated between X and Y.
A partial γ measure of monotone association between X and Y is defined as a weighted summary γ measure across subtables spanned by the categories of Z.
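The weights of the summary measure are not spelled out on the slide. The sketch below assumes the common choice in which concordant and discordant pairs are pooled across strata, so that each stratum-specific γ is weighted by its number of untied pairs. Function names and the two small strata are invented for illustration.

```python
def concordant_discordant(table):
    """Concordant and discordant pair counts for one stratum."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += count * sum(lower_row[j + 1:])
                disc += count * sum(lower_row[:j])
    return conc, disc

def partial_gamma(strata):
    """Pooled gamma over the X by Y subtables, one per category of Z (assumed weighting)."""
    total_conc = total_disc = 0
    for table in strata:
        conc, disc = concordant_discordant(table)
        total_conc += conc
        total_disc += disc
    return (total_conc - total_disc) / (total_conc + total_disc)

# Two made-up 2 x 2 strata.
strata = [
    [[10, 5], [5, 10]],
    [[8, 2], [6, 4]],
]
pg = partial_gamma(strata)

# The same number written as a weighted average of the stratum gammas.
pairs = [concordant_discordant(t) for t in strata]
weights = [c + d for c, d in pairs]
gammas = [(c - d) / (c + d) for c, d in pairs]
weighted = sum(w * g for w, g in zip(weights, gammas)) / sum(weights)
print(f"partial gamma = {pg:.4f}, weighted average = {weighted:.4f}")
```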
A partial Pγ measure of association
The partial Pγ_{XY|Z} between a partially ordered or nominal X and an ordinal Y conditional on a nominal Z is defined as the maximum partial γ between a valid ordering of X and Y.
The partial optimal monotone ordering of X w.r.t. Y, controlled for Z, is the ordering corresponding to the partial Pγ_{XY|Z}.
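For the partial measure the same ordering of X has to be applied in every stratum of Z before the strata are combined. The sketch below assumes a nominal X (all orderings valid) and a pooled partial γ (concordant and discordant pairs summed over strata); both the names and the two strata are invented for this illustration.

```python
from itertools import permutations

def concordant_discordant(table):
    """Concordant and discordant pair counts for one stratum."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += count * sum(lower_row[j + 1:])
                disc += count * sum(lower_row[:j])
    return conc, disc

def partial_p_gamma(strata):
    """Maximum pooled gamma over row orderings shared by all strata."""
    n_rows = len(strata[0])
    best_gamma, best_perm = None, None
    for perm in permutations(range(n_rows)):
        conc = disc = 0
        for table in strata:
            c, d = concordant_discordant([table[r] for r in perm])
            conc += c
            disc += d
        g = (conc - disc) / (conc + disc)
        if best_gamma is None or g > best_gamma:
            best_gamma, best_perm = g, perm
    return best_gamma, best_perm

# Two made-up strata of Z; rows are categories 0, 1, 2 of a nominal X.
strata = [
    [[10, 5, 1], [1, 5, 10], [4, 8, 4]],
    [[6, 3, 1], [1, 3, 6], [3, 4, 3]],
]
ppg, perm = partial_p_gamma(strata)
print(f"partial P-gamma = {ppg:.3f}, optimal row ordering = {perm}")
```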
Inference
Significance of the Pγ measure and its corresponding partial measure is assessed by comparison of the obtained value with a simulated distribution under the null hypothesis where X and Y are independent.
Resampling tests are standard in the analysis of multi-way contingency tables, as tests based on the asymptotic distribution have very low power.
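One way to simulate the null distribution is a permutation test: the Y values are shuffled among the observations, which keeps both margins fixed and breaks any association, and Pγ is recomputed for each shuffled table. The slide does not state which resampling scheme was used, so the scheme, the function names, the toy data and the number of simulations below are all assumptions of this sketch.

```python
import random
from itertools import permutations

def gk_gamma(table):
    """Goodman and Kruskal's gamma; returns 0.0 if there are no untied pairs."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += count * sum(lower_row[j + 1:])
                disc += count * sum(lower_row[:j])
    return (conc - disc) / (conc + disc) if conc + disc else 0.0

def p_gamma(table):
    """Maximum gamma over all row orderings (nominal X, ordinal Y)."""
    return max(gk_gamma([table[r] for r in perm])
               for perm in permutations(range(len(table))))

def crosstab(xs, ys, n_x, n_y):
    table = [[0] * n_y for _ in range(n_x)]
    for x, y in zip(xs, ys):
        table[x][y] += 1
    return table

def permutation_p_value(xs, ys, n_x, n_y, n_sim=200, seed=0):
    """Monte Carlo p-value for P-gamma under independence of X and Y."""
    rng = random.Random(seed)
    observed = p_gamma(crosstab(xs, ys, n_x, n_y))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)             # breaks the X-Y association
        if p_gamma(crosstab(xs, shuffled, n_x, n_y)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)

# Made-up data: expand a small 3 x 3 table into individual observations.
toy = [[10, 5, 1], [1, 5, 10], [4, 8, 4]]
xs = [i for i, row in enumerate(toy) for j, n in enumerate(row) for _ in range(n)]
ys = [j for i, row in enumerate(toy) for j, n in enumerate(row) for _ in range(n)]

observed, p_value = permutation_p_value(xs, ys, 3, 3)
print(f"P-gamma = {observed:.3f}, Monte Carlo p = {p_value:.3f}")
```

For the partial measure the shuffling would be done within each stratum of Z, so that the association of X and Y with Z is preserved.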
Simulation study
Relationship between X and Y conditional on Z
X and Y ordinal, Z nominal.
Dim(X) = 3 or 5
Dim(Y) = 3 or 5
Dim(Z) = 2 or 10
Uniform marginal distributions
N = 200
partial γ = 0 or 0.15
Categories of Y are permuted to calculate Pγ
Simulation study – results γ=0
The attained level of significance, i.e. the power of the tests when the true γ is 0, has to be 5%.
Our results show that this is not a problem. All MC estimates of the critical value are in the 95% confidence region:
0.05 ± 0.0135
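The half-width of this region is what a normal approximation to the binomial gives for a rejection rate of 5% estimated from 1000 simulated data sets. The number of replications is not stated on the slide; 1000 is an assumption that happens to reproduce the quoted figure.

```python
import math

nominal_level = 0.05
n_replications = 1000   # assumption: not given in the source

# 95% region for an estimated rejection rate when the true level is 5%.
half_width = 1.96 * math.sqrt(nominal_level * (1 - nominal_level) / n_replications)
lower, upper = nominal_level - half_width, nominal_level + half_width
print(f"{nominal_level} +/- {half_width:.4f}  ->  [{lower:.4f}, {upper:.4f}]")
```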
Simulation study – results γ=0.15

# rows  # columns  # strata  LR: (XZ,YZ) against (XYZ)  LR: (XZ,YZ) against (XY,XZ,YZ)  Partial γ  Partial Pγ  Correct ordering  Correlation with true ordering (mean)
3       3          2         0.178                      0.229                           0.377      0.283       59.6%             0.70
3       3          10        0.084                      0.221                           0.343      0.280       58.2%             0.68
3       5          2         0.137                      0.193                           0.433      0.242       8.6%              0.54
3       5          10        0.074                      0.202                           0.425      0.255       8.9%              0.53
5       3          2         0.158                      0.247                           0.431      0.359       61.8%             0.72
5       3          10        0.079                      0.196                           0.408      0.329       62.0%             0.70
5       5          2         0.158                      0.231                           0.553      0.347       12.0%             0.60
5       5          10        0.097                      0.219                           0.518      0.312       10.8%             0.57
The test based on the partial γ has considerably higher power than the other tests.
This was to be expected because the data were generated with a monotone relationship.
The test based on Pγ is not as powerful as the one based on γ.
Its power is higher than that of both LR tests considered here.
The influence of the simulation parameters is intuitive.
This becomes clear in more extensive simulations.
Insight is gained into the ordering of the categories.
The identification of the correct ordering depends on the number of categories that are permuted.
The estimated ordering will be close to the correct ordering, but is unlikely to coincide with it.
The relation between γ and Pγ
Dim(X) = 5
Dim(Y) = 5
Dim(Z) = 10
γ = 0.15
[Figure: Pγ against |γ|]
|γ| is closer to Pγ when the estimated values for these coefficients are higher.
The distribution of Pγ
[Figure: distribution of Pγ for γ = 0] Normal?!?
[Figure: distribution of Pγ for γ = 0.15] Normal…
Danish political parties
European Values Studies
Denmark:
survey in 1981, 1990 and 1999
preferred political party (10 parties)
political attitudes measured on a left-right discrete (10 point) VAS scale
10 x 10 x 3 table
Significance of the association is obvious
Ordering of the parties is common knowledge (up to a certain level…)
Danish political parties - Pγ
Ordering of the parties on the left-right categories, from far left to far right:
Ø    The Red-Green Alliance
SF   The Socialist People’s Party
A    The Social Democratic Party
B    The Social Liberal Party
Q    The Christian People’s Party
CD   The Centre Democrats
Z    The Progress Party
V    The Liberal Party
C    The Conservative Party
DF   The Danish People’s Party
Pγ = 0.629, a very strong association
Common knowledge:
Left (in this order)
Center
Right
The position of DF on the far right is somewhat surprising
New party in 1999
Reflects the political attitudes of the persons preferring DF to other parties in 1999
Since then, the party has with some success attempted to move towards the middle of the spectrum
A weight control program
Weight goals against attained weight.
A considerable number have no goal set.
For convenience we code the categories of the weight goal variable with letters.
We investigate the placement of the "no goal set" category with Pγ.
The relationship is significant, but confounded.
We investigate the "no goal set" category with the partial Pγ conditional on many possible confounders.
We construct the product of weight goal and weight at current session. The optimal monotone ordering with respect to attained weight shows effect modification.
For normal or extremely obese people, "no goal set" is associated with low attained weight, but others are better off with guidance in the form of weight goals.
Metabolic syndrome
Combinations (products) of ordinal variables
sex x age → C
hyperglycaemia x obesity → D

Metabolic syndrome, original category order:
                 Hyperglycaemia   No    No    Yes   Yes
                 Obesity          No    Yes   No    Yes
sex   age   C    D                1     2     3     4
M     Y     1                     45    15    14    5
M     O     2                     12    9     18    12
F     Y     3                     83    2     5     9
F     O     4                     25    0     12    11

Categories of C and D permuted to obtain Pγ:
C \ D    1     4     3     2
3        83    9     5     2
4        25    11    12    0
1        45    5     14    15
2        12    12    18    9
Pγ = 0.53

• omnibus tests
• screening priority
• scale construction
It is difficult to interpret the obtained orderings as something clinically meaningful.
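When both variables are treated as nominal, the rows and the columns are permuted at the same time. The brute-force sketch below is an illustration with made-up function names; the counts are read off the table above, assuming rows are the sex x age categories C = 1..4 and columns the hyperglycaemia x obesity categories D = 1..4. It evaluates γ for the reordered table shown on the slide and searches all 24 x 24 combinations of orderings.

```python
from itertools import permutations

def gk_gamma(table):
    """Goodman and Kruskal's gamma for a table with ordered rows and columns."""
    conc = disc = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:
                conc += count * sum(lower_row[j + 1:])
                disc += count * sum(lower_row[:j])
    return (conc - disc) / (conc + disc)

def reorder(table, row_order, col_order):
    return [[table[r][c] for c in col_order] for r in row_order]

def p_gamma_both(table):
    """Maximum gamma over all row and column orderings (both variables nominal)."""
    best = (None, None, None)
    for row_order in permutations(range(len(table))):
        for col_order in permutations(range(len(table[0]))):
            g = gk_gamma(reorder(table, row_order, col_order))
            if best[0] is None or g > best[0]:
                best = (g, row_order, col_order)
    return best

# Rows C = 1..4 (M/Y, M/O, F/Y, F/O); columns D = 1..4.
counts = [
    [45, 15, 14, 5],
    [12, 9, 18, 12],
    [83, 2, 5, 9],
    [25, 0, 12, 11],
]

# Orderings shown on the slide: C = 3, 4, 1, 2 and D = 1, 4, 3, 2 (0-based below).
slide_gamma = gk_gamma(reorder(counts, (2, 3, 0, 1), (0, 3, 2, 1)))
best_gamma, best_rows, best_cols = p_gamma_both(counts)
print(f"gamma for the slide's orderings = {slide_gamma:.2f}")
print(f"maximum over all orderings      = {best_gamma:.2f}")
```

Reversing both orderings at once leaves γ unchanged, so the maximising pair of orderings is never unique.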
Canonical correlations
Canonical correlation analysis can also find the optimal monotone ordering of one nominal variable relative to one ordinal variable. The extension to a partial analysis is troublesome.
It can also be used to construct omnibus tests. But these do not take the ordinal nature of the variables into account.
Conclusion
Pγ-based inference is more powerful than nominal χ²-based inference.
Pγ-based inference makes use of information in a partial ordering of the categories.
Pγ-based inference gives information on the ordering of the categories.
Pγ-based inference gives insight into effect modification.
Pγ-based inference allows for omnibus tests.