Correlation Hal Whitehead BIOL4062/5062. The correlation coefficient Tests Non-parametric...

Post on 13-Jan-2016



Correlation

Hal Whitehead

BIOL4062/5062

• The correlation coefficient

• Tests

• Non-parametric correlations

• Partial correlation

• Multiple correlation

• Autocorrelation

• Many correlation coefficients

The correlation coefficient

Linked observations: x1, x2, ..., xn and y1, y2, ..., yn

Mean: x̄ = Σ xi / n, ȳ = Σ yi / n

Variance: S²(x) = Σ(xi - x̄)²/(n-1), S²(y) = Σ(yi - ȳ)²/(n-1)

Standard deviation: S(x) = √S²(x), S(y) = √S²(y)

Covariance: S²(x,y) = Σ(xi - x̄) ∙ (yi - ȳ) / (n-1)

Correlation coefficient

(“Pearson” or “product-moment”):

r = {Σ(xi - x̄) ∙ (yi - ȳ) / (n-1)} / {S(x) ∙ S(y)}

r = S²(x,y) / {S(x) ∙ S(y)}


-1 ≤ r ≤ +1

If no linear relationship: r = 0

r²: proportion of variance accounted for by linear regression

[Figure: example scatterplots with r = -0.01, 0.38, -0.31, 0.95, 0.04, 0.64, -0.46, 0.99 and -0.0]
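The definitions above translate directly into code. This is a minimal standard-library sketch that uses the n-1 divisors from the slide. The function names and the small data set are illustrative and do not come from the lecture.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # Sample standard deviation S(v), divisor n-1
    m = mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def covariance(x, y):
    # Sample covariance S²(x,y), divisor n-1
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    # r = S²(x,y) / {S(x) ∙ S(y)}
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two linked samples of equal length n >= 2")
    return covariance(x, y) / (sd(x) * sd(y))

# Hypothetical linked observations, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r² = {r * r:.3f}")
```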

Tests on Correlation Coefficients


• Assume:

– Independence

– Bivariate Normality

• Then:

z = ln[(1+r)/(1-r)]/2 (Fisher's z-transformation) is approximately normally distributed

with variance 1/(n-3)

And, if the true population correlation is 0:

r ∙ √(n-2) / √(1-r²) is distributed as Student's t with n-2 degrees of freedom

We can test:

a) r ≠ 0

b) r > 0 or r < 0

c) r = constant

d) r(x,y) = r(z,w)

Also confidence intervals for r
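The tests and the confidence interval can be sketched with the standard library only, using the Fisher z and t formulas above. The slide does not give the whale sample size, so n = 21 below is an assumption. With that value the 95% interval comes out at about 0.47 to 0.89, consistent with the slide. This is a consistency check and not the paper's reported n. The t statistic also needs a t table (or, for example, `scipy.stats.t.sf`) to give its P-value. The two-sample z test applies to case (d) only when the two correlations come from independent samples.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    # z = ln[(1+r)/(1-r)] / 2, the same as math.atanh(r)
    return 0.5 * math.log((1 + r) / (1 - r))

def t_for_zero(r, n):
    # Test of a true correlation of 0: Student's t with n-2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2

def confidence_interval(r, n, level=0.95):
    # Symmetric interval on the z scale (variance 1/(n-3)), mapped back with tanh
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    half = crit / math.sqrt(n - 3)
    z = fisher_z(r)
    return math.tanh(z - half), math.tanh(z + half)

def z_test_constant(r, n, rho0):
    # Test (c): true correlation equals rho0; returns statistic and two-sided P
    stat = (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)
    return stat, 2 * (1 - NormalDist().cdf(abs(stat)))

def z_test_independent(r1, n1, r2, n2):
    # Test (d) for correlations from two INDEPENDENT samples only
    stat = (fisher_z(r1) - fisher_z(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return stat, 2 * (1 - NormalDist().cdf(abs(stat)))

# Whale example: r = 0.75 is from the slide; n = 21 is an assumed sample size
r, n = 0.75, 21
low, high = confidence_interval(r, n)
t, df = t_for_zero(r, n)
print(f"95% C.I. {low:.2f}-{high:.2f}; t = {t:.2f} with {df} df")
```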

Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)

[Figure: relative melon area against sexual size dimorphism]


r = 0.75

(SE = 0.15)

(95% C.I. 0.47-0.89)

Tests:

r ≠ 0 : P = 0.0001

r > 0 : P = 0.00005

More sexually dimorphic species have relatively larger melons

Why do Large Animals have Large Brains?

(Schoenemann Brain Behav. Evol. 2004)

• Correlations among mammals

– Log brain size with

• Log muscle mass: r = 0.984

• Log fat mass: r = 0.942

• Are these significantly different?

t = 5.50; df = 36; P < 0.01 (Hotelling-Williams test)

• Brain mass is more closely related to muscle mass than to fat mass

[Figure: brain mass (g) against fat and muscle mass (g), both on log scales; separate symbols for Muscle and Fat]
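The slide names the Hotelling-Williams test but does not give its formula. The sketch below assumes the Williams t for two correlations that share a variable, in the form in which it is commonly stated, with n - 3 degrees of freedom (df = 36 on the slide then implies n = 39). The muscle-fat correlation r23 is not on the slide, so the value used here is a placeholder. The output will therefore not reproduce t = 5.50.

```python
import math

def williams_t(r12, r13, r23, n):
    """Williams' t for H0: rho12 = rho13, where variable 1 is shared.

    r12, r13: correlations of the shared variable (e.g. log brain mass)
              with the two others (e.g. log muscle and log fat mass)
    r23:      correlation between those two other variables
    Returns (t, df) with df = n - 3.
    """
    # Determinant of the 3x3 correlation matrix
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    if det <= 0:
        raise ValueError("correlations do not form a valid correlation matrix")
    r_bar = (r12 + r13) / 2
    num = (n - 1) * (1 + r23)
    den = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    return (r12 - r13) * math.sqrt(num / den), n - 3

# Slide values: r12 = 0.984 (muscle), r13 = 0.942 (fat), df = 36 so n = 39.
# r23 (muscle vs fat) is NOT given on the slide; 0.95 is a placeholder.
t, df = williams_t(0.984, 0.942, 0.95, 39)
print(f"t = {t:.2f}, df = {df}")
```

Because the test uses the correlation between the two compared variables, two correlations that share a variable cannot be compared with the independent-samples z test.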

Non-Parametric Correlation


• If one variable is normally distributed:

– can test r = 0 as before.

• If neither is normally distributed:

– Spearman's rS rank correlation coefficient (replace values by ranks)

or:

– Kendall's τ correlation coefficient

• Use Spearman's rS when there is less certainty about the close rankings (rS is based on squared rank differences, so small discrepancies in ranking carry relatively little weight)
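Here is a minimal standard-library sketch of both rank coefficients, for illustration only. Tied values receive average ranks. The Kendall function computes tau-b, which is one common tie-corrected version; the slide does not say which variant was used, so this choice is an assumption. The whale data are not given, so the example uses small hypothetical vectors.

```python
import math

def ranks(v):
    # Average ranks (1-based); tied values share the mean of their positions
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    # Spearman's rS: Pearson r computed on the ranks
    return pearson_r(ranks(x), ranks(y))

def kendall_tau_b(x, y):
    # Count concordant and discordant pairs; tau-b corrects for ties
    conc = disc = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: counted in neither term
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt(
        (conc + disc + ties_x_only) * (conc + disc + ties_y_only))

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman_rs(x, y), kendall_tau_b(x, y))
```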

Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)

r = 0.75

rS = 0.62

τ = 0.47

[Figure: relative melon area against sexual size dimorphism]

Partial Correlation

• Correlation between X and Y controlling for Z

r(X,Y|Z) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √{(1 - r(X,Z)²)∙(1 - r(Y,Z)²)}

• Correlation between X and Y controlling for W and Z

r(X,Y|W,Z) = {r(X,Y|W) - r(X,Z|W)∙r(Y,Z|W)} / √{(1 - r(X,Z|W)²)∙(1 - r(Y,Z|W)²)}

Tests use n-2-c degrees of freedom

(c is the number of control variables)
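This standard-library sketch implements the formulas above; the function names and data are hypothetical. It also checks a useful interpretation of the first-order case: r(X,Y|Z) equals the ordinary correlation between the residuals of X and of Y after each is regressed on Z. The t statistic uses the n-2-c degrees of freedom given on the slide.

```python
import math

def pearson_r(x, y):
    # Sample correlation coefficient (the n-1 divisors cancel)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    # r(X,Y|Z) from the three zero-order correlations
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_r_two_controls(r_xy, r_xw, r_xz, r_yw, r_yz, r_wz):
    # r(X,Y|W,Z): apply the same formula to first-order partials given W
    r_xy_w = partial_r(r_xy, r_xw, r_yw)
    r_xz_w = partial_r(r_xz, r_xw, r_wz)
    r_yz_w = partial_r(r_yz, r_yw, r_wz)
    return partial_r(r_xy_w, r_xz_w, r_yz_w)

def partial_t(r_p, n, c):
    # t statistic for a partial correlation with c control variables
    df = n - 2 - c
    return r_p * math.sqrt(df) / math.sqrt(1 - r_p ** 2), df

def residuals(v, z):
    # Residuals from the least-squares regression of v on z
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    b = (sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v))
         / sum((zi - mz) ** 2 for zi in z))
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

# Hypothetical data in which z drives both x and y
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [2.3, 3.1, 4.8, 5.2, 6.9, 7.1, 8.8, 9.4]
y = [1.1, 2.5, 2.9, 4.4, 4.6, 6.3, 6.8, 8.2]
r_p = partial_r(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))
# Same value as correlating the residuals of x and y after regressing each on z
print(r_p, pearson_r(residuals(x, z), residuals(y, z)))
```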

Why do Large Animals have Large Brains?

(Schoenemann Brain Behav. Evol. 2004)

• Correlations among mammals

– Log brain size with

• Log muscle mass, controlling for log body mass: r = 0.466

• Log fat mass, controlling for log body mass: r = -0.299

• Fatter species have relatively smaller brains and more muscular species relatively larger brains

Semi-partial Correlation Coefficient

• Correlation between X and Y, controlling only Y for Z

r(X,(Y|Z)) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √(1 - r(Y,Z)²)
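This sketch uses hypothetical names and data. It computes the semi-partial coefficient and checks it against its interpretation: the correlation of X with the residuals of Y after Y is regressed on Z. The denominator contains only √(1 - r(Y,Z)²), so the semi-partial equals the partial correlation times √(1 - r(X,Z)²). It is therefore never larger in magnitude than the partial correlation.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(v, z):
    # Residuals from the least-squares regression of v on z
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    b = (sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v))
         / sum((zi - mz) ** 2 for zi in z))
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

def semipartial_r(r_xy, r_xz, r_yz):
    # r(X,(Y|Z)): Z is removed from Y only
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Hypothetical data, for illustration only
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [2.3, 3.1, 4.8, 5.2, 6.9, 7.1, 8.8, 9.4]
y = [1.1, 2.5, 2.9, 4.4, 4.6, 6.3, 6.8, 8.2]
r_sp = semipartial_r(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))
print(r_sp, pearson_r(x, residuals(y, z)))
```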

Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)

Correlation

r = 0.75

Partial Correlation

r (SSD,MA|L) = 0.73

Semi-partial Correlations

r (SSD,(MA|L)) = 0.69

r ((SSD |L),MA) = 0.71

[Figure: scatterplot matrix of melon area (MELAREA, MA), sexual size dimorphism (SSD) and length (LENGTH, L)]

Multiple Correlation

Multiple Correlation Coefficient

• Correlation between one dependent variable and its best estimate from a regression on several independent variables:

r(Y∙X1,X2,X3,...)

• The square of the multiple correlation coefficient is:

– the proportion of variance accounted for by the multiple regression
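This sketch follows the definition and assumes NumPy is available. It fits the least-squares regression (with an intercept) using `numpy.linalg.lstsq`, then correlates Y with its fitted values. The data are simulated and purely illustrative.

```python
import numpy as np

def multiple_r(y, X):
    # Least-squares regression of y on the columns of X plus an intercept,
    # then the ordinary correlation between y and its fitted values
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return np.corrcoef(y, fitted)[0, 1]

# Hypothetical data: three predictors, only the first two matter
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=50)

R = multiple_r(y, X)
print(f"R = {R:.3f}, R² = {R**2:.3f}")
```

With a single predictor, R reduces to the absolute value of the ordinary correlation coefficient.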

Multiple Partial Correlation Coefficient


Autocorrelation


• Purposes:

– Examine time series

– Look at (serial) independence

Data (e.g. feeding rate on consecutive days, or plankton biomass at each station along a transect):

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

Autocorrelation of lag=1 is correlation between:

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7

1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

r = 0.508

Autocorrelation of lag=2 is correlation between:

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9

4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

r = -0.053

…….
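The lag-k values above can be checked in code. This sketch assumes the usual time-series estimator: it sums cross-products of deviations from the mean of the whole series and divides by the total sum of squares. For the 13 values above, that estimator reproduces r = 0.508 at lag 1 and r = -0.053 at lag 2. A plain Pearson r between the two 12-value rows at lag 1 gives a noticeably larger value (about 0.62), because each row then uses its own mean and variance.

```python
def autocorrelation(series, lag):
    # Standard estimator: lagged cross-products about the overall mean,
    # divided by the total sum of squares of the whole series
    n = len(series)
    m = sum(series) / n
    d = [v - m for v in series]
    num = sum(d[t] * d[t + lag] for t in range(n - lag))
    den = sum(v * v for v in d)
    return num / den

data = [1.5, 1.7, 4.3, 5.4, 5.7, 6.2, 3.9, 4.4, 5.2, 4.8, 3.9, 3.7, 3.6]

# Values for a correlogram; stopping at n // 2 is an arbitrary choice here
max_lag = len(data) // 2
correlogram = [autocorrelation(data, k) for k in range(1, max_lag + 1)]
for k, rk in enumerate(correlogram, start=1):
    print(f"lag {k}: r = {rk:.3f}")
```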

Autocorrelation Plot (Correlogram)

[Figure: autocorrelation (-1.0 to 1.0) against lag (0 to 15)]

Many Correlation Coefficients

Many Correlation Coefficients: [Behaviour of Sperm Whale Groups]

        NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L   1.00
SST      0.12    1.00
SHITR   -0.21   -0.33*   1.00
LSPEED   0.10   -0.28+   0.06    1.00
APROP   -0.15   -0.34*   0.07    0.18    1.00
SOCV    -0.05    0.08   -0.16   -0.01   -0.33*   1.00
SHR2    -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS   0.08    0.14   -0.13   -0.12   -0.22    0.29+  -0.18    1.00
LAERR   -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00

Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected

Expected no. with P<0.10 = 3.6; with P<0.05 = 1.8
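The expected counts are simply (number of coefficients) × (significance level), and 9 variables give 36 distinct coefficients. The sketch below recomputes them and applies a Bonferroni correction, which multiplies each P-value by the number of tests and caps the result at 1. To stay within the standard library it uses the Fisher-z normal approximation for P instead of the exact t test, so its P-values are approximate. The coefficients fed in are the five flagged ones from the listwise matrix.

```python
import math
from statistics import NormalDist

def p_value_approx(r, n):
    # Two-sided P for a true correlation of 0, via Fisher's z
    # (normal approximation with variance 1/(n-3))
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def bonferroni(p, m):
    # Multiply by the number of tests, capped at 1
    return min(1.0, p * m)

k = 9                      # variables in the matrix
m = k * (k - 1) // 2       # distinct off-diagonal coefficients
expected_010 = m * 0.10    # false positives expected at P<0.10 if all true r = 0
expected_005 = m * 0.05

n = 40
flagged = [-0.33, -0.28, -0.34, -0.33, 0.29]  # marked + or * in the listwise matrix
for r in flagged:
    p = p_value_approx(r, n)
    print(f"r = {r:+.2f}: P ≈ {p:.3f}, Bonferroni P = {bonferroni(p, m):.2f}")
```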


(Same correlation coefficients as in the listwise matrix above, now with no coefficient marked)

Listwise deletion, n=40; + P<0.10; * P<0.05; Bonferroni corrected

P=1.0 for all coefficients


Pairwise deletion, n=59-118; + P<0.10; * P<0.05; uncorrected

        NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L   1.00
SST      0.11    1.00
SHITR   -0.17+  -0.46*   1.00
LSPEED   0.05   -0.17    0.05    1.00
APROP   -0.05   -0.20+   0.04    0.31*   1.00
SOCV    -0.00   -0.05   -0.06   -0.02   -0.25*   1.00
SHR2    -0.15   -0.13    0.07   -0.14    0.05    0.01    1.00
LFMECS   0.01    0.07   -0.02   -0.14   -0.25*   0.43*  -0.26+   1.00
LAERR   -0.06    0.06    0.09   -0.27*  -0.20+   0.06   -0.06    0.21+   1.00

Many Correlation Coefficients

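• Missing values:

– Listwise deletion (comparability: every coefficient uses the same observations), or

– Pairwise deletion (power: each coefficient uses all available pairs)

• P-values:

– Uncorrected: risk of type 1 errors

– Bonferroni, etc.: risk of type 2 errors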
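This small illustration shows the two ways of handling missing values. The data are hypothetical, and None marks a missing entry. Listwise deletion puts every coefficient on the same 4 complete rows. Pairwise deletion lets the first pair of variables use the 5 rows where both are present.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_r(a, b):
    # Pairwise deletion: drop only rows missing a or b; returns (r, n used)
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys), len(pairs)

def listwise_columns(columns):
    # Listwise deletion: drop every row that is missing any variable
    rows = [row for row in zip(*columns) if None not in row]
    return [list(col) for col in zip(*rows)]

# Hypothetical data; None marks a missing value
a = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0]
b = [2.0, 2.5, 3.1, None, 5.2, 5.9, 7.4]
c = [9.0, None, 7.5, 6.8, 6.1, 4.9, 4.2]

la, lb, lc = listwise_columns([a, b, c])
r_list, n_list = pairwise_r(la, lb)   # same rows for every pair of variables
r_pair, n_pair = pairwise_r(a, b)     # all rows where a and b are both present
print(f"listwise: r = {r_list:.3f} (n = {n_list}); pairwise: r = {r_pair:.3f} (n = {n_pair})")
```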

Beware!

Correlation ≠ Causation

[Figure: diagrams of alternative causal links among variables Y1 to Y6]