Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis ›...

57
Extended Mosaic and Association Plots for Visualizing (Conditional) Independence Achim Zeileis David Meyer Kurt Hornik

Transcript of Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis ›...

Page 1: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Extended Mosaic and Association Plotsfor Visualizing (Conditional)

Independence

Achim Zeileis David Meyer Kurt Hornik

Page 2: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Overview

❆ The independence problem in 2-way contingency tables

❖ Standard approach: χ2 test

❖ Alternative approach: max test

❆ Visualizing the independence problem

❖ Association plots

❖ Mosaic plots

❆ Extensions

❖ Visualization & significance testing

❖ HCL instead of HSV colors

❖ Multi-way tables and conditional independence

❖ Implementation in grid

❆ The vcd package

Page 3: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

The independence problem

Standard approach:

❆ Analyze the relationship between two categorical variables

based on the associated 2-way contingency table.

❆ Measure the discrepancy between observed frequencies{nij

}and expected frequencies under independence

{n̂ij

}by the

Pearson residuals:

rij =nij − n̂ij√

n̂ij

.

❆ Use the Pearson X2 statistic for testing:

X2 =∑ij

r2ij,

which has an asymptotic χ2 distribution.

Page 4: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

The independence problem

Alternative approach(es):

❆ There are many conceivable functionals λ(·) which lead toreasonable test statistics λ

({rij

}).

❆ In particular:

M = maxij

∣∣∣rij

∣∣∣ .Then, every residual exceeding the critical value cα violatesthe null hypothesis at level α.

❆ Instead of relying on unconditional limiting distributions, per-form a permutation test, either by simulating or computingthe conditional permutation distribution of λ

({rij

}).

Page 5: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

The independence problem

Relationship between hair color and eye color among 328 female

students:

Eye colorHair color Brown Blue Hazel Green Total

Black 36 9 5 2 52Brown 81 34 29 14 158

Red 16 7 7 7 37Blond 4 64 5 8 181Total 137 114 46 31 328

X2 = 112.30 p = 0

M = 6.76 p = 0

Page 6: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

The independence problem

Home and away goals in the Bundesliga in 1995:

Away goalsHome goals 0 1 2 3 4 5 6

0 26 16 13 5 0 1 01 19 58 20 5 4 0 12 27 23 20 5 1 1 13 14 11 10 4 2 0 04 3 5 3 0 0 0 05 4 1 0 1 0 0 06 1 0 0 1 0 0 0

X2 = 46.07 p = 0.121

M = 2.87 p = 0.355

Page 7: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

Association plot: display for the Pearson residuals{rij

}and the

raw residuals{nij − n̂ij

}in an rectangular array.

Mosaic plot: display in which the sizes of the mosaic tiles is

proportional to the observed frequencies{nij

}.

Page 8: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

Blo

ndR

edB

row

nB

lack

Brown Blue Hazel Green

Hai

rEye

Page 9: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

Page 10: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

A

1 2 3

Page 11: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

A

B1 2 3

12

Page 12: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

A

B1 2 3

12

Page 13: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization

Brown Blue Hazel GreenB

lack

Bro

wn

Red

Blo

nd

EyeH

air

Page 14: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Colors are commonly used to enhance these plots. In particular,

Friendly (1994) suggested shadings for mosaic displays.

Page 15: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Colors are commonly used to enhance these plots. In particular,

Friendly (1994) suggested shadings for mosaic displays.

In R these are implemented based on HSV colors.

The HSV color space is one of the most common implementa-

tions of color in many computer packages. Hue, saturation and

value range in [0,1].

Page 16: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

The hue is typically used to code the sign of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0hue

saturation = 1value = 1

Page 17: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

The hue is typically used to code the sign of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0hue

saturation = 1value = 1

rij < 0 rij > 0

Page 18: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Friendly’s extended mosaic displays use the saturation to code

the absolute size of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0saturation

h = 0

h = 2/3

value = 1

Page 19: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Friendly’s extended mosaic displays use the saturation to code

the absolute size of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0saturation

h = 0

h = 2/3

value = 1

rij < 2 2 < rij < 4 rij > 4

Page 20: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Value is currently not used for coding, always set to 1.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

Page 21: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Value is currently not used for coding, always set to 1.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

Page 22: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

Blo

ndR

edB

row

nB

lack

Brown Blue Hazel Green

Hai

r

Eye

−4

−2

0

2

4

6

Pearsonresiduals:

Page 23: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HSV colors

−4

−2

0

2

4

6

Pearsonresiduals:

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 24: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

−2

−1

0

1

2

Pearsonresiduals:

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 25: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

Intuition: colored cells convey the impression that there is sig-

nificant dependence. Currently, this is not true.

Page 26: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

Intuition: colored cells convey the impression that there is sig-

nificant dependence. Currently, this is not true.

Approach 1: use the 90% and 99% critical values for the max

statistic M instead of 2 and 4.

Advantage:

❆ color ⇔ significance

❆ highlights the cells which “cause” the dependence (if any).

Disadvantage:

❆ does not work for the χ2 test (or any other functional λ(·)).

Page 27: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

Approach 2: Use value to code the result of a significance test

for independence.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

Page 28: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

Approach 2: Use value to code the result of a significance test

for independence.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

non−significant significant

Page 29: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 30: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 31: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.355

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 32: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Visualization & testing

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.12133

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 33: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HCL colors

Disadvantages of HSV colors:

❆ device dependent,

❆ not copierproof,

❆ flashy colors good for drawing attention to a plot, but hard

to look at.

Page 34: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HCL colors

Disadvantages of HSV colors:

❆ device dependent,

❆ not copierproof,

❆ flashy colors good for drawing attention to a plot, but hard

to look at.

Alternative: use HCL colors instead (see Ihaka, 2003).

HCL colors are defined by hue (in [0,360]), chroma and lumi-

nance (in [0,100]). HCL space essentially looks like a double

cone.

Page 35: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HCL colors

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel GreenB

lack

Bro

wn

Red

Blo

nd

EyeH

air

Page 36: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HCL colors

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel GreenB

lack

Bro

wn

Red

Blo

nd

EyeH

air

Page 37: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HCL colors

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.12133

0 1 2 3 4 5 60

12

34

56

HomeGoalsA

way

Goa

ls

Page 38: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

HCL colors

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.12133

0 1 2 3 4 5 60

12

34

56

HomeGoalsA

way

Goa

ls

Page 39: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Principal idea of the mosaic plot:

❆ subdivision of tiles according to (conditional) probabilities

→ can also be used for n-way tables

The same idea does not apply to association plots.

Page 40: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

(A ⊥⊥ | B)

A

B

1 2 3

12

Page 41: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

(A ⊥⊥ | B)

A

B

1 2 3

12

Page 42: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

(A ⊥⊥ | B)

A

B

1 2 3

12

1 2 1 2 1 2

Page 43: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

(A ⊥⊥ | B)

A

B

1 2 3

12

1 2

12

31

23

1 2 1 2

Page 44: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

(A ⊥⊥ | B)

A

B

1 2 3

12

1 2 1 2 1 2

Page 45: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

(A ⊥⊥ | B)

A

B

1 2 3

12

1 2 1 2 1 2

Page 46: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Complete independence: A ⊥⊥ B ⊥⊥ C (A ⊥⊥ | B)

A

B

12

1 2 31 2 1 2 1 2

Page 47: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Joint independence: (A, C) ⊥⊥ B (A ⊥⊥ | B)

A

B

12

1 2 31 2 1 2 1 2

Page 48: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Conditional independence: B ⊥⊥ C | A (A ⊥⊥ | B)

A

B

1 2 3

12

1 2 1 2 1 2

Page 49: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Correspondence:

❆ conditioning in the model (→ shading of residuals)

❆ conditioning in the visual display

→ can also be done in Trellis-like layout

This idea does also work for association plots.

Page 50: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

−3

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.0028402

A B C D E F

Mal

eF

emal

e

Admit Reject Admit Admit Reject Admit Reject Admit Admit

Dept

Gen

der

Page 51: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tablesG

ende

r

Admit

Dept = A

Admit Reject

Mal

eF

emal

e

Dept = B

Admit Reject

Mal

eF

emal

e

Dept = C

Admit Reject

Mal

eF

emal

e

Dept = D

Admit Reject

Mal

eF

emal

e

Dept = E

Admit Reject

Mal

eF

emal

e

Dept = F

Admit Reject

Mal

eF

emal

e

Page 52: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tablesG

ende

r

Admit

Dept = AF

emal

eM

ale

Admit Reject

Dept = B

Fem

ale

Mal

e

Admit Reject

Dept = C

Fem

ale

Mal

e

Admit Reject

Dept = D

Fem

ale

Mal

e

Admit Reject

Dept = E

Fem

ale

Mal

e

Admit Reject

Dept = F

Fem

ale

Mal

e

Admit Reject

Page 53: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Conditioning in the plot:

R> assocplot(Admit ~ Gender | Dept, data = UCBAdmissions)

Page 54: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Multi-way tables

Conditioning in the plot:

R> assocplot(Admit ~ Gender | Dept, data = UCBAdmissions)

Conditioning in the model:

R> fm <- loglm(~(Admit + Gender) * Dept, data = UCBAdmissions)

R> assocplot(fm)

Page 55: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Implementation in grid

The graphics engine grid overcomes the old R concept of plots

with a plot region surrounded by a margin. grid is

❆ based on generic drawing regions (viewports),

❆ allows for plotting to relative coordinates,

❆ is also the basis for an implementation of Trellis graphics

called lattice.

(see Murrell, 2002)

Thus, the new implementation of mosaic and association plots

makes them easily reusable, e.g., in Trellis-like layouts.

Page 56: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

Implementation in grid

Furthermore, graphics parameters for the rectangles, e.g.,

❆ fill color,

❆ line type,

❆ line color,

can be specified for each cell individually by the user. Each

graphics parameter can be an object of the same dimensionality

as the original table.

→ new shadings can easily be implemented.

Page 57: Extended Mosaic and Association Plots for Visualizing (Conditional) Independence › ~zeileis › papers › gR-2003.pdf · 2018-01-19 · Association plot: display for the Pearson

The vcd package

New methods will be available in the package vcd for visualizing

categorical data.

Currently only in development version. The released version is

available from the Comprehensive R Archive Network

http://CRAN.R-project.org/

and it already offers some functionality for

❆ fitting & graphing of discrete distributions,

❆ plots for independence and agreement,

❆ visualization of log-linear models.