Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers ›...

61
Visualizing Independence Using Extended Association and Mosaic Plots Achim Zeileis David Meyer Kurt Hornik

Transcript of Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers ›...

Page 1: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualizing Independence UsingExtended Association and Mosaic Plots

Achim Zeileis David Meyer Kurt Hornik

Page 2: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Overview

❆ The independence problem in 2-way contingency tables

❖ Standard approach: χ2 test

❖ Alternative approach: max test

❆ Visualizing the independence problem

❖ Association plots

❖ Mosaic plots

❆ Extensions

❖ Visualization & significance testing

❖ HCL instead of HSV colors

❖ Implementation in grid

❖ Multi-way tables

❆ The vcd package

Page 3: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

The independence problem

Standard approach:

❆ Analyze the relationship between two categorical variables

based on the associated 2-way contingency table.

❆ Measure the discrepancy between observed frequencies{nij

}and expected frequencies under independence

{n̂ij

}by the

Pearson residuals:

rij =nij − n̂ij√

n̂ij

.

❆ Use the Pearson X2 statistic for testing:

X2 =∑ij

r2ij,

which has an asymptotic χ2 distribution.

Page 4: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

The independence problem

Alternative approach(es):

❆ There are many conceivable functionals λ(·) which lead toreasonable test statistics λ

({rij

}).

❆ In particular:

M = maxij

∣∣∣rij

∣∣∣ .Then, every residual exceeding the critical value cα violatesthe null hypothesis at level α.

❆ Instead of relying on unconditional limiting distributions, per-form a permutation test, either by simulating or computingthe conditional permutation distribution of λ

({rij

}).

Page 5: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

The independence problem

Relationship between hair color and eye color among 328 female

students:

Eye colorHair color Brown Blue Hazel Green Total

Black 36 9 5 2 52Brown 81 34 29 14 158

Red 16 7 7 7 37Blond 4 64 5 8 181Total 137 114 46 31 328

X2 = 112.30 p = 0

M = 6.76 p = 0

Page 6: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

The independence problem

Home and away goals in the Bundesliga in 1995:

Away goalsHome goals 0 1 2 3 4 5 6

0 26 16 13 5 0 1 01 19 58 20 5 4 0 12 27 23 20 5 1 1 13 14 11 10 4 2 0 04 3 5 3 0 0 0 05 4 1 0 1 0 0 06 1 0 0 1 0 0 0

X2 = 46.07 p = 0.121

M = 2.87 p = 0.355

Page 7: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

The independence problem

Treatment for rheumatoid arthritis:

TreatmentImprovement Placebo Treated Total

None 29 13 42Some 7 7 14

Marked 7 21 28Total 43 41 84

X2 = 13.06 p = 0.001

M = 1.98 p = 0.001

Page 8: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization

Association plot: display for the Pearson residuals{rij

}and the

raw residuals{nij − n̂ij

}in an rectangular array.

Mosaic plot: display in which the sizes of the mosaic tiles is

proportional to the observed frequencies{nij

}.

Page 9: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization

Blo

ndR

edB

row

nB

lack

Brown Blue Hazel Green

Hai

rEye

Page 10: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization

Brown Blue Hazel GreenB

lack

Bro

wn

Red

Blo

nd

EyeH

air

Page 11: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Colors are commonly used to enhance these plots. In particular,

Friendly (1994) suggested shadings for mosaic displays.

Page 12: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Colors are commonly used to enhance these plots. In particular,

Friendly (1994) suggested shadings for mosaic displays.

In R these are implemented based on HSV colors.

The HSV color space is one of the most common implementa-

tions of color in many computer packages. Hue, saturation and

value range in [0,1].

Page 13: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

The hue is typically used to code the sign of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0hue

saturation = 1value = 1

Page 14: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

The hue is typically used to code the sign of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0hue

saturation = 1value = 1

rij < 0 rij > 0

Page 15: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Friendly’s extended mosaic displays use the saturation to code

the absolute size of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0saturation

h = 0

h = 2/3

value = 1

Page 16: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Friendly’s extended mosaic displays use the saturation to code

the absolute size of the residuals.

0.0 0.2 0.4 0.6 0.8 1.0saturation

h = 0

h = 2/3

value = 1

rij < 2 2 < rij < 4 rij > 4

Page 17: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Value is currently not used for coding, always set to 1.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

Page 18: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Value is currently not used for coding, always set to 1.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

Page 19: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

Blo

ndR

edB

row

nB

lack

Brown Blue Hazel Green

Hai

r

Eye

−4

−2

0

2

4

6

Pearsonresiduals:

Page 20: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HSV colors

−4

−2

0

2

4

6

Pearsonresiduals:

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 21: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

−2

−1

0

1

2

Pearsonresiduals:

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 22: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

Intuition: colored cells convey the impression that there is sig-

nificant dependence.

Page 23: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

Intuition: colored cells convey the impression that there is sig-

nificant dependence.

Currently this is not true. But it can be achieved by using the

90% and 99% critical values for the max statistic M instead of

2 and 4.

Advantage:

❆ color ⇔ significance

❆ highlights the cells which “cause” the dependence (if any).

Disadvantage:

❆ does not work for the χ2 test (or any other functional λ(·)).

Page 24: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 25: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

−2

−1

0

1

2

Pearsonresiduals:

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 26: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

Use value to code the result of a significance test for indepen-

dence.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

Page 27: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

Use value to code the result of a significance test for indepen-

dence.

0.0 0.2 0.4 0.6 0.8 1.0value

h = 0

h = 2/3

saturation = 1

non−significant significant

Page 28: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 29: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Visualization & testing

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.12133

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 30: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Disadvantages of HSV colors:

❆ device dependent,

❆ not copierproof,

❆ flashy colors good for drawing attention to a plot, but hard

to look at.

Page 31: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Disadvantages of HSV colors:

❆ device dependent,

❆ not copierproof,

❆ flashy colors good for drawing attention to a plot, but hard

to look at.

Alternative: use HCL colors instead (see Ihaka, 2003).

HCL colors are defined by hue (in [0,360]), chroma and lumi-

nance (in [0,100]). HCL space essentially looks like a double

cone.

Page 32: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 33: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 34: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 35: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 36: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 37: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 38: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 39: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 40: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 41: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 42: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 43: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 44: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

Page 45: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

chroma

025

5075

100

025

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260

lum

inan

ce

Page 46: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

5075

100

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260chroma

lum

inan

ce

Page 47: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

5075

100

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260chroma

lum

inan

ce

Page 48: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

5075

100

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260chroma

lum

inan

ce

Page 49: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

5075

100

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260chroma

lum

inan

ce

Page 50: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

5075

100

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260chroma

lum

inan

ce

significant

Page 51: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

5075

100

5075

100

100 75 50 25 0 25 50 75 100

hue = 0 hue = 260chroma

lum

inan

ce

significant

non−significant

Page 52: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 53: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

−4

−2

0

2

4

6

Pearsonresiduals:

p−value:< 2.22e−16

Brown Blue Hazel Green

Bla

ckB

row

nR

edB

lond

Eye

Hai

r

Page 54: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.12133

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 55: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

−2

−1

0

1

2

Pearsonresiduals:

p−value:0.12133

0 1 2 3 4 5 6

01

23

45

6

HomeGoals

Aw

ayG

oals

Page 56: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

−1

0

1

Pearsonresiduals:

p−value:0.0014626

Placebo Treated

Non

eS

ome

Mar

ked

Treatment

Impr

oved

Page 57: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

HCL colors

−1

0

1

Pearsonresiduals:

p−value:0.001

Placebo Treated

Non

eS

ome

Mar

ked

Treatment

Impr

oved

Page 58: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Implementation in grid

The graphics engine grid overcomes the old R concept of plots

with a plot region surrounded by a margin. grid is

❆ based on generic drawing regions (viewports),

❆ allows for plotting to relative coordinates,

❆ is also the basis for an implementation of Trellis graphics

called lattice.

(see Murrell, 2002)

Thus, the new implementation of mosaic and association plots

makes them easily reusable, e.g., in Trellis-like layouts.

Page 59: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Implementation in grid

Furthermore, graphics parameters for the rectangles, e.g.,

❆ fill color,

❆ line type,

❆ line color,

can be specified for each cell individually by the user. Each

graphics parameter can be an object of the same dimenionality

as the original table.

→ new shadings can easily be implemented.

Page 60: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

Multi-way tablesG

ende

r

Admit

Dept = AF

emal

eM

ale

Admit Reject

Dept = B

Fem

ale

Mal

e

Admit Reject

Dept = C

Fem

ale

Mal

e

Admit Reject

Dept = D

Fem

ale

Mal

e

Admit Reject

Dept = E

Fem

ale

Mal

e

Admit Reject

Dept = F

Fem

ale

Mal

e

Admit Reject

Page 61: Visualizing Independence Using Extended Association and Mosaic … › ~zeileis › papers › OeSG-2003.pdf · 2018-01-19 · Visualizing Independence Using Extended Association

The vcd package

New methods will be available in the package vcd for visualizing

categorical data.

Currently only in development version. The released version is

available from the Comprehensive R Archive Network

http://CRAN.R-project.org/

and it already offers some functionality for

❆ fitting & graphing of discrete distributions,

❆ plots for independence and agreement,

❆ visualization of log-linear models.