Correlation
Hal Whitehead
BIOL4062/5062
• The correlation coefficient
• Tests
• Non-parametric correlations
• Partial correlation
• Multiple correlation
• Autocorrelation
• Many correlation coefficients
The correlation coefficient
Linked observations: x1, x2, ..., xn; y1, y2, ..., yn
Mean: x̄ = Σxi / n; ȳ = Σyi / n
Variance: S²(x) = Σ(xi - x̄)² / (n-1); S²(y) = Σ(yi - ȳ)² / (n-1)
Standard Deviation: S(x) = √S²(x); S(y) = √S²(y)
Covariance: S²(x,y) = Σ(xi - x̄) ∙ (yi - ȳ) / (n-1)
Correlation coefficient (“Pearson” or “product-moment”):
r = {Σ(xi - x̄) ∙ (yi - ȳ) / (n-1)} / {S(x) ∙ S(y)}
r = S²(x,y) / {S(x) ∙ S(y)}
-1 ≤ r ≤ +1
If no linear relationship: r = 0
r²: proportion of variance accounted for by linear regression
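The definitions above translate almost line for line into code. The following is a minimal sketch using only the Python standard library; the function name `pearson_r` and the example data are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation: r = S2(x,y) / (S(x) * S(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be linked observations with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations, all with the (n - 1) divisor
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

The (n-1) divisors cancel between numerator and denominator, so r would be unchanged if all three were replaced by n.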
[Figure: nine example scatterplots, with r = -0.01, 0.38, -0.31, 0.95, 0.04, 0.64, -0.46, 0.99 and -0.0]
Tests on Correlation Coefficients
• Assume:
– Independence
– Bivariate Normality
• Then:
z = Ln [(1+r)/(1-r)]/2 is approximately normally distributed
with variance 1/(n-3)
And, if ρ (true population value of r) = 0 :
r ∙ √(n-2) / √(1-r²) is distributed as Student's t with n-2 degrees of freedom
We can test:
a) r ≠ 0
b) r > 0 or r < 0
c) r = constant
d) r(x,y) = r(z,w)
Also confidence intervals for r
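The tests and confidence intervals listed above can be sketched with the z transformation alone. This is a minimal illustration using the standard library, under the stated assumptions (independence, bivariate normality); the function names and the sample size n = 20 are hypothetical and are not taken from any example in these notes.

```python
import math

def fisher_z(r):
    """z = Ln[(1+r)/(1-r)]/2, approximately normal with variance 1/(n-3)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def two_sided_p(z_score):
    """Two-sided P-value for a standard normal deviate."""
    return math.erfc(abs(z_score) / math.sqrt(2))

def z_test_rho(r, n, rho0=0.0):
    """Test whether the population correlation equals rho0 (cases a and c)."""
    z_score = (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)
    return z_score, two_sided_p(z_score)

def confidence_interval(r, n, z_crit=1.959964):
    """Confidence interval for r (95% by default), back-transformed with tanh."""
    half_width = z_crit / math.sqrt(n - 3)
    centre = fisher_z(r)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)

def t_statistic(r, n):
    """r * sqrt(n-2) / sqrt(1-r^2); Student's t with n-2 df when rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical values, for illustration only
r, n = 0.75, 20
z_score, p = z_test_rho(r, n)
low, high = confidence_interval(r, n)
print(f"z = {z_score:.2f}, two-sided P = {p:.5f}")
print(f"95% C.I. = ({low:.2f}, {high:.2f}); t = {t_statistic(r, n):.2f}, df = {n - 2}")
```

A one-sided test (case b) halves the two-sided P-value when r lies in the predicted direction. Converting the t statistic to a P-value needs the t distribution, which the standard library does not provide, so only the statistic is computed here.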
Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)
[Figure: scatterplot of Relative Melon Area against Sexual Size Dimorphism]
r = 0.75
(SE = 0.15)
(95% C.I. 0.47-0.89)
Tests:
r ≠ 0 : P = 0.0001
r > 0 : P = 0.00005
More sexually dimorphic species have relatively larger melons
Why do Large Animals have Large Brains?
(Schoenemann Brain Behav. Evol. 2004)
• Correlations among mammals
– Log brain size with
• Log muscle mass: r = 0.984
• Log fat mass: r = 0.942
• Are these significantly different?
– Hotelling-Williams test: t = 5.50; df = 36; P < 0.01
• Brain mass is more closely related to muscle than fat
[Figure: Brain mass (g) against Fat/Muscle mass (g), logarithmic axes, with separate series for Muscle and Fat]
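Comparing two correlations that share a variable (here brain size with muscle, and brain size with fat) also requires the correlation between the two non-shared variables. The sketch below uses one common formulation of Williams's modification of Hotelling's test, with n-3 degrees of freedom; it may differ in detail from the calculation behind the values above. The muscle-fat correlation `r23 = 0.95` is a hypothetical placeholder, since that value is not given here, and n = 39 is inferred from df = 36.

```python
import math

def williams_t(r12, r13, r23, n):
    """t statistic for H0: rho12 = rho13, where variable 1 is shared.

    r23 is the correlation between the two non-shared variables.
    Returns (t, degrees of freedom), with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3

# r12 and r13 are the values quoted above; r23 is HYPOTHETICAL
t, df = williams_t(r12=0.984, r13=0.942, r23=0.95, n=39)
print(f"t = {t:.2f}, df = {df}")
```

Because r23 is invented, the printed t is not expected to reproduce t = 5.50; the sketch only shows the shape of the calculation.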
Non-Parametric Correlation
• If one variable is normally distributed:
– can test r=0 as before.
• If neither is normally distributed:
– Spearman's rS rank correlation coefficient (replace values by ranks), or
– Kendall's τ correlation coefficient
• Use Spearman's when there is less certainty about the close rankings
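Both rank coefficients are short to compute by hand. The sketch below is an illustration with the standard library: Spearman's rS is the Pearson coefficient applied to ranks (ties get average ranks), and Kendall's τ is computed in its simplest form (tau-a, which assumes no ties). Function names and data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's rS: replace values by ranks, then compute Pearson's r."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical data: monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_rs(x, y), kendall_tau(x, y))
```

For a monotonic but curved relationship such as this one, both rank coefficients equal 1 while Pearson's r falls short of 1.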
Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)
r = 0.75
rS = 0.62
τ = 0.47
[Figure: scatterplot of Relative Melon Area against Sexual Size Dimorphism]
Partial Correlation
• Correlation between X and Y controlling for Z:
r(X,Y|Z) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √{(1 - r(X,Z)²)∙(1 - r(Y,Z)²)}
• Correlation between X and Y controlling for W and Z:
r(X,Y|W,Z) = {r(X,Y|W) - r(X,Z|W)∙r(Y,Z|W)} / √{(1 - r(X,Z|W)²)∙(1 - r(Y,Z|W)²)}
• Tests use n-2-c degrees of freedom (c is the number of control variables)
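The partial correlation formulas above need only the pairwise coefficients, and the two-control case is the one-control formula applied recursively. A minimal sketch, with illustrative function names and hypothetical input values:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """r(X,Y|Z) from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_r_two_controls(r_xy, r_xz, r_yz, r_xw, r_yw, r_zw):
    """r(X,Y|W,Z): first control every pair for W, then control for Z."""
    r_xy_w = partial_r(r_xy, r_xw, r_yw)
    r_xz_w = partial_r(r_xz, r_xw, r_zw)
    r_yz_w = partial_r(r_yz, r_yw, r_zw)
    return partial_r(r_xy_w, r_xz_w, r_yz_w)

def partial_t(r_partial, n, c):
    """t statistic for a partial correlation, with n-2-c degrees of freedom."""
    df = n - 2 - c
    return r_partial * math.sqrt(df) / math.sqrt(1 - r_partial**2), df

# Hypothetical pairwise correlations, for illustration only
r = partial_r(r_xy=0.6, r_xz=0.5, r_yz=0.5)
print(f"r(X,Y|Z) = {r:.3f}")
print(partial_t(r, n=30, c=1))
```

When Z is uncorrelated with both X and Y, the partial correlation reduces to the ordinary r(X,Y).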
Why do Large Animals have Large Brains?
(Schoenemann Brain Behav. Evol. 2004)
• Correlations among mammals
– Log brain size with
• Log muscle mass, controlling for Log body mass: r = 0.466
• Log fat mass, controlling for Log body mass: r = -0.299
• Fatter species have relatively smaller brains and more muscular species relatively larger brains
Semi-partial Correlation Coefficient
• Correlation between X & Y controlling Y for Z
r(X,(Y|Z)) = {r(X,Y) - r(X,Z)∙r(Y,Z)} / √(1 - r(Y,Z)²)
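The semi-partial coefficient shares its numerator with the partial coefficient but removes Z from Y only, so its denominator has a single term. A small sketch with hypothetical inputs (function names are illustrative):

```python
import math

def semi_partial_r(r_xy, r_xz, r_yz):
    """r(X,(Y|Z)): correlation of X with the part of Y not explained by Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

def partial_r(r_xy, r_xz, r_yz):
    """r(X,Y|Z), shown for comparison."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical pairwise correlations, for illustration only
r_xy, r_xz, r_yz = 0.6, 0.5, 0.5
print(f"semi-partial = {semi_partial_r(r_xy, r_xz, r_yz):.3f}")
print(f"partial      = {partial_r(r_xy, r_xz, r_yz):.3f}")
```

Because the partial coefficient divides by the extra factor √(1 - r(X,Z)²), which is at most 1, the semi-partial coefficient is never larger in magnitude than the partial one.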
Are Whales Battering Rams? (Carrier et al. J. Exp. Biol. 2002)
Correlation
r = 0.75
Partial Correlation
r (SSD,MA|L) = 0.73
Semi-partial Correlations
r (SSD,(MA|L)) = 0.69
r ((SSD|L),MA) = 0.71
[Figure: scatterplot matrix of MELAREA, LENGTH and SSD]
Multiple Correlation
Multiple Correlation Coefficient
• Correlation between one dependent variable and its best estimate from a regression on several independent variables:
r(Y∙X1,X2,X3,...)
• Square of multiple correlation coefficient is:
– proportion of variance accounted for by multiple regression
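With exactly two independent variables, the multiple correlation coefficient can be written in terms of the three pairwise correlations, which avoids fitting the regression explicitly. The sketch below covers that special case only; the function name and input values are illustrative assumptions.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """r(Y.X1,X2) from pairwise correlations (two-predictor case only).

    R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)

# Hypothetical correlations, for illustration only
R = multiple_r_two_predictors(r_y1=0.6, r_y2=0.3, r_12=0.0)
print(f"R = {R:.3f}, R squared = {R * R:.3f}")
```

When the two predictors are uncorrelated, R² is simply the sum of the two squared correlations with Y. For more than two predictors, R is usually obtained from the fitted regression itself.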
Multiple Partial Correlation Coefficient
!
Autocorrelation
• Purposes
– Examine time series
– Look at (serial) independence
Data
(e.g. Feeding rate on consecutive days,
plankton biomass at each station on a transect):
1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6
Autocorrelation of lag=1 is correlation between:
1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7
1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6
r = 0.508
Autocorrelation of lag=2 is correlation between:
1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9
4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6
r = -0.053
…….
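The lag calculations above can be sketched as follows. The quoted values (0.508 at lag 1, -0.053 at lag 2) are reproduced by the usual time-series estimator, which takes deviations from the mean of the whole series and divides by the total sum of squares; treating that as the method used here is an inference from the numbers. A plain Pearson coefficient between the two shortened rows gives a somewhat different value (about 0.62 at lag 1).

```python
def autocorrelation(series, lag):
    """Autocorrelation at a given lag.

    Uses deviations from the overall mean and the total sum of squares:
    r_k = sum((x[t] - mean) * (x[t+k] - mean)) / sum((x[t] - mean)^2)
    """
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and n - 1")
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    total = sum(d * d for d in deviations)
    cross = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return cross / total

data = [1.5, 1.7, 4.3, 5.4, 5.7, 6.2, 3.9, 4.4, 5.2, 4.8, 3.9, 3.7, 3.6]
for lag in (1, 2):
    print(f"lag {lag}: r = {autocorrelation(data, lag):.3f}")
```

Evaluating the function over a range of lags gives the values plotted in a correlogram.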
Autocorrelation Plot (Correlogram)
[Figure: Correlation (-1.0 to 1.0) plotted against Lag (0 to 15)]
Many Correlation Coefficients
Many Correlation Coefficients: [Behaviour of Sperm Whale Groups]
Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected
         NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.12    1.00
SHITR    -0.21   -0.33*   1.00
LSPEED    0.10   -0.28+   0.06    1.00
APROP    -0.15   -0.34*   0.07    0.18    1.00
SOCV     -0.05    0.08   -0.16   -0.01   -0.33*   1.00
SHR2     -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS    0.08    0.14   -0.13   -0.12   -0.22    0.29+  -0.18    1.00
LAERR    -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00
Expected no. with P<0.10 = 3.6; with P<0.05 = 1.8
Listwise deletion, n=40; + P<0.10; * P<0.05; Bonferroni corrected
         NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.12    1.00
SHITR    -0.21   -0.33    1.00
LSPEED    0.10   -0.28    0.06    1.00
APROP    -0.15   -0.34    0.07    0.18    1.00
SOCV     -0.05    0.08   -0.16   -0.01   -0.33    1.00
SHR2     -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS    0.08    0.14   -0.13   -0.12   -0.22    0.29   -0.18    1.00
LAERR    -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00
P=1.0 for all coefficients, so none is marked
Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected
         NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.12    1.00
SHITR    -0.21   -0.33*   1.00
LSPEED    0.10   -0.28+   0.06    1.00
APROP    -0.15   -0.34*   0.07    0.18    1.00
SOCV     -0.05    0.08   -0.16   -0.01   -0.33*   1.00
SHR2     -0.18   -0.12    0.01   -0.20    0.19   -0.03    1.00
LFMECS    0.08    0.14   -0.13   -0.12   -0.22    0.29+  -0.18    1.00
LAERR    -0.10    0.03   -0.21   -0.24   -0.02    0.24   -0.08    0.23    1.00
Pairwise deletion, n=59-118; + P<0.10; * P<0.05; uncorrected
         NGR25L  SST     SHITR   LSPEED  APROP   SOCV    SHR2    LFMECS  LAERR
NGR25L    1.00
SST       0.11    1.00
SHITR    -0.17+  -0.46*   1.00
LSPEED    0.05   -0.17    0.05    1.00
APROP    -0.05   -0.20+   0.04    0.31*   1.00
SOCV     -0.00   -0.05   -0.06   -0.02   -0.25*   1.00
SHR2     -0.15   -0.13    0.07   -0.14    0.05    0.01    1.00
LFMECS    0.01    0.07   -0.02   -0.14   -0.25*   0.43*  -0.26+   1.00
LAERR    -0.06    0.06    0.09   -0.27*  -0.20+   0.06   -0.06    0.21+   1.00
Many Correlation Coefficients
• Missing values:
– Listwise deletion (comparability), or
– Pairwise deletion (power)
• P-values:
– Uncorrected: type 1 errors
– Bonferroni, etc.: type 2 errors
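The trade-off between uncorrected and corrected P-values can be made concrete with a little arithmetic. With 9 variables there are 36 distinct coefficients, so about 3.6 are expected to reach P<0.10 and 1.8 to reach P<0.05 by chance alone. The sketch below reproduces those counts and applies a Bonferroni correction; the function names and the example P-value are illustrative assumptions.

```python
def n_coefficients(n_variables):
    """Number of distinct pairwise correlations among n_variables."""
    return n_variables * (n_variables - 1) // 2

def expected_by_chance(n_variables, alpha):
    """Expected number of coefficients with P < alpha if all true values are 0."""
    return n_coefficients(n_variables) * alpha

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1.0."""
    return min(1.0, p_value * n_tests)

m = n_coefficients(9)
print(f"{m} coefficients")
print(f"expected with P<0.10: {expected_by_chance(9, 0.10):.1f}")
print(f"expected with P<0.05: {expected_by_chance(9, 0.05):.1f}")

# Hypothetical uncorrected P-value, for illustration only
print(f"corrected P = {bonferroni(0.03, m)}")
```

An uncorrected P-value above 1/36 (about 0.028) is pushed to 1.0 by the correction, which shows how quickly power is lost when many coefficients are tested together.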
Beware!
Correlation ≠ Causation
[Figure: diagrams of possible causal links among variables Y1 to Y6]