Simmons Comprehensive Cancer Center


COMPARING GROUPS – PART 1: CONTINUOUS DATA

Min Chen, Ph.D.

Assistant Professor

Quantitative Biomedical Research Center

Department of Clinical Sciences

Bioinformatics Shared Resource

Simmons Comprehensive Cancer Center

Lecture 4

July 9, 2013

Min Chen (QBRC/CCBSR) Comparing groups Continuous Data 1 Lec 4 1 / 38

OUTLINE

1 REVIEW

2 INTRODUCTION

3 COMPARISON OF TWO GROUPS
   Parametric tests

REVIEW: 100(1−α)% CONFIDENCE INTERVAL OF THE MEAN

Lower limit:

$$L = \bar{X} - z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$

Upper limit:

$$U = \bar{X} + z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$

Standard normal distribution: µ = 0, σ = 1
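The large-sample interval can be computed in a few lines. The sketch below is a minimal Python example using only the standard library (`statistics.NormalDist` supplies $z_{\alpha/2}$); the summary values X̄ = 100, s = 15 and n = 36 are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar: float, s: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Large-sample 100(1 - alpha)% CI for the mean: xbar -/+ z_{alpha/2} * s / sqrt(n)."""
    # z_{alpha/2} is the upper alpha/2 critical value of N(0, 1), i.e. its (1 - alpha/2) quantile.
    z = NormalDist(mu=0.0, sigma=1.0).inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics, used only to illustrate the formula.
lower, upper = z_confidence_interval(xbar=100.0, s=15.0, n=36)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Here $s/\sqrt{n} = 2.5$ and $z_{0.025} \approx 1.96$, so the half-width is about 4.9.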

REVIEW OF CONFIDENCE INTERVAL FROM SMALL SAMPLE

As a rule of thumb, if the sample size n < 30, use the formula below.

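100(1−α)% confidence interval:

$$\bar{X} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$

where $t_{\alpha/2,\,n-1}$ is the upper α/2 critical value, i.e., the (1 − α/2) quantile, of the t-distribution with (n − 1) degrees of freedom.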
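A minimal sketch of the small-sample interval, assuming SciPy is installed (`scipy.stats.t.ppf` gives the t quantile). As an illustration it uses the Week 0 urinary sodium values from Example 1 later in this lecture (n = 8).

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats


def t_confidence_interval(data: list[float], alpha: float = 0.05) -> tuple[float, float]:
    """Small-sample 100(1 - alpha)% CI: xbar -/+ t_{alpha/2, n-1} * s / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample SD with divisor n - 1
    # Upper alpha/2 critical value = (1 - alpha/2) quantile of t with n - 1 df.
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


week0 = [7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.07]  # Example 1, Week 0
lower, upper = t_confidence_interval(week0)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With 7 df, $t_{0.025,7} \approx 2.365$ compared with $z_{0.025} \approx 1.960$, so this interval is noticeably wider than the large-sample formula would give.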

REVIEW: INTERPRETATION OF CI

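The CI satisfies

$$\Pr\big(L(X) \le \theta \le U(X)\big) = 1 - \alpha.$$

It is tempting to state “the probability that θ lies between the two numbers L and U is 1 − α”.

• This is wrong: θ is a fixed number, and once the data are observed, L and U are fixed numbers too.
• Before the data are observed, L(X) and U(X) are random variables, not numbers.
• Correct interpretation: over repeated samples, 95% (in general, 100(1 − α)%) of the intervals calculated this way will contain the true population parameter θ.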
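The repeated-sampling interpretation can be checked by simulation. In this minimal sketch (standard library only), the true mean µ = 10, σ = 2, the sample size n = 50, the seed, and the number of replicates are arbitrary illustration choices; each replicate produces a different random interval, while θ = µ stays fixed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def coverage_rate(mu: float = 10.0, sigma: float = 2.0, n: int = 50,
                  alpha: float = 0.05, reps: int = 1000, seed: int = 1) -> float:
    """Fraction of simulated z-based CIs that contain the fixed true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = fmean(sample), stdev(sample)
        half_width = z * s / sqrt(n)
        # (L(X), U(X)) changes from sample to sample; mu does not.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"Empirical coverage: {rate:.3f}")
```

Expect a value near 0.95, typically slightly below it, since the z critical value ignores the extra uncertainty from estimating σ by s.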

RELATIONSHIP BETWEEN TYPE I ERROR (α) AND POWER

PARAMETRIC VS NON-PARAMETRIC

Parametric tests

Assume data follow some known distribution, e.g., normal, t, chi-square, or binomial

Compare means, variances

Non-parametric tests

Don’t assume a form of distribution

Compare other measures of central tendency (e.g., median) or a location shift

Useful for skewed data, small samples, ordinal data

NOTATION

Quantity             Population parameter   Sample value
Mean                 µ                      X̄
Standard deviation   σ                      s
Variance             σ²                     s²
Sample size                                 n
Sample value                                xᵢ

ONE SAMPLE t TEST

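Recall the one-sample t-test statistic:

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

This is the test statistic for comparing the mean of one group against a fixed value $\mu_0$.

The general form of a t-statistic is

$$t = \frac{\text{difference of means}}{\text{standard error}}.$$

Under H0, and assuming normally distributed data, the t-statistic follows a t-distribution (here with n − 1 degrees of freedom).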
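A minimal sketch of the general form, using only the standard library. The data and the null value $\mu_0 = 5.0$ are hypothetical, and the helper names `t_statistic` and `one_sample_t` are illustrative; the sketch stops at the statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(difference: float, standard_error: float) -> float:
    """General form: t = (difference of means) / (standard error)."""
    return difference / standard_error


def one_sample_t(data: list[float], mu0: float) -> tuple[float, int]:
    """One-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # s / sqrt(n)
    return t_statistic(mean(data) - mu0, se), n - 1


# Hypothetical measurements tested against a hypothetical null value mu0 = 5.0.
data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5]
t, df = one_sample_t(data, mu0=5.0)
print(f"t = {t:.3f} on {df} df")
```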

STUDENT’S t-DISTRIBUTION

Here is how to generate a Student’s t random variable:

$$T_\nu = \frac{Z}{\sqrt{V/\nu}},$$

where

Z has a standard normal distribution;

V has a chi-squared distribution with ν degrees of freedom (df), i.e.,

$$V = \sum_{i=1}^{\nu} Z_i^2,$$

where the $Z_i$ are iid standard normal random variables. (Recall $E[Z_i^2] = 1$, so $E[V] = \nu$.)

Z and V are independent.
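The construction above can be simulated directly. A minimal sketch with the standard library; ν = 10, the seed, and the number of draws are arbitrary illustration choices. It compares the sample variance of the draws with ν/(ν − 2), the standard variance of a t-distribution with ν > 2 df (a known result not stated on the slide).

```python
import random
from math import sqrt
from statistics import fmean, variance


def draw_t(nu: int, rng: random.Random) -> float:
    """One draw of T_nu = Z / sqrt(V / nu) with independent Z and V."""
    z = rng.gauss(0.0, 1.0)                               # standard normal
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))  # chi-squared with nu df
    return z / sqrt(v / nu)


rng = random.Random(2013)
nu = 10
draws = [draw_t(nu, rng) for _ in range(20000)]
print(f"sample mean     = {fmean(draws):.3f} (theory 0)")
print(f"sample variance = {variance(draws):.3f} (theory {nu / (nu - 2):.3f})")
```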

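t – A FAMILY OF DISTRIBUTIONS IDENTIFIED BY df

Recall

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{(\bar{X} - \mu_0)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}}.$$

Under H0 the numerator has a standard normal distribution, and for normal data $(n-1)s^2/\sigma^2$ has a chi-squared distribution with n − 1 df, independent of $\bar{X}$; so t has the form $Z/\sqrt{V/\nu}$ with ν = n − 1.

The t-distribution approaches the standard normal distribution as df increases.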

SMALL SAMPLE VS. LARGE SAMPLE

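Recall that for confidence intervals, as a rule of thumb, if the sample size n < 30, we use the t-statistic for the 100(1−α)% confidence interval:

$$\bar{X} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$

while for large samples we use

$$\bar{X} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}.$$

The reason is that when the sample size is large, $t_{\alpha/2,\,n-1} \approx z_{\alpha/2}$.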
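A minimal sketch comparing the two critical values for α = 0.05 as n grows, assuming SciPy is available for the t quantile; the sample sizes in the loop are arbitrary.

```python
from statistics import NormalDist

from scipy import stats


def critical_values(n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return the upper critical values (t_{alpha/2, n-1}, z_{alpha/2})."""
    t_crit = float(stats.t.ppf(1 - alpha / 2, df=n - 1))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t_crit, z_crit


for n in (5, 10, 30, 100, 1000):
    t_crit, z_crit = critical_values(n)
    print(f"n = {n:4d}: t = {t_crit:.3f}, z = {z_crit:.3f}")
```

The gap shrinks steadily with n, so the n < 30 cut-off is a convenient rule of thumb rather than a sharp boundary.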

COMPARING MEANS OF PAIRED SAMPLES

In paired samples, each data point in one sample is matched to a data point in the second sample.

Same subject

• Measured at 2 time points
• Before and after an intervention
• Two eyes (left, right)
• Two organs (heart, liver)

Matched subjects

• Experimental animal and its pair-fed match
• Male, female

COMPARING MEANS OF TWO INDEPENDENT SAMPLES

Two independent samples

Subjects are unrelated in two separate groups;

Sample sizes may differ between the groups (n1, n2);

Variances in the two groups may be

• equal, $\sigma_1^2 = \sigma_2^2$, or
• unequal, $\sigma_1^2 \ne \sigma_2^2$.

EXAMPLE 1

In a hypertension research study, subjects are given dietary counseling to restrict their sodium intake. Data on urinary sodium from 8 subjects at baseline (Week 0) and at Week 1 are shown.

Subject   Week 0   Week 1   Change (Week 1 - Week 0)
1           7.85     9.59     1.74
2          12.03    34.50    22.47
3          21.84     4.55   -17.29
4          13.94    20.78     6.84
5          16.68    11.69    -4.99
6          41.78    32.51    -9.27
7          14.97     5.46    -9.51
8          12.07    12.95     0.88
X̄          17.65    16.50    -1.14
s          10.56    11.63    12.22
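The summary rows of the table can be reproduced as a check. A minimal sketch with the standard library; the change is computed as Week 1 minus Week 0, which matches the tabulated per-subject values.

```python
from statistics import mean, stdev

week0 = [7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.07]
week1 = [9.59, 34.50, 4.55, 20.78, 11.69, 32.51, 5.46, 12.95]
# Change = Week 1 - Week 0 for each subject (the pairing is by subject).
change = [w1 - w0 for w0, w1 in zip(week0, week1)]

for label, column in (("Week 0", week0), ("Week 1", week1), ("Change", change)):
    print(f"{label}: mean = {mean(column):.2f}, s = {stdev(column):.2f}")
```

The mean change is negative (about −1.14), i.e., average urinary sodium was slightly lower at Week 1.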

EXAMPLE 1 (CONTD.)

Q1: Paired samples or two independent samples?

Q2: Is there a change in mean levels of urinary sodium after 1 week?

PAIRED t-TEST

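Example 1 has paired-sample data, since the same subject was measured at two time points.

Compute the mean $\bar{X}_d$ and standard deviation $s_d$ of the n within-pair differences.

$$H_0: \mu_1 - \mu_2 = c \quad \text{vs.} \quad H_a: \mu_1 - \mu_2 \ne c$$

$$t = \frac{\bar{X}_d - c}{s_d/\sqrt{n}},$$

which, under H0, follows a t-distribution with (n − 1) degrees of freedom, where n is the number of pairs.

If $|t| > t^*_{n-1}(1-\alpha/2)$, reject H0. Here $t^*_{n-1}(1-\alpha/2)$ is the (1 − α/2) quantile of $T_{n-1}$.

P-value = $2 \times \Pr(T_{n-1} > |t|)$.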
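A minimal sketch of the paired t-test applied to Example 1, assuming SciPy is installed for the t tail probability (`scipy.stats.t.sf`). The function name `paired_t_test` is illustrative only; SciPy's own `scipy.stats.ttest_rel` is used as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

from scipy import stats


def paired_t_test(x1: list[float], x2: list[float], c: float = 0.0) -> tuple[float, int, float]:
    """Two-sided paired t-test of H0: mu1 - mu2 = c using within-pair differences d = x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)  # number of pairs
    t = (mean(d) - c) / (stdev(d) / sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # sf(x) = Pr(T_{n-1} > x)
    return t, n - 1, float(p)


week0 = [7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.07]
week1 = [9.59, 34.50, 4.55, 20.78, 11.69, 32.51, 5.46, 12.95]
t, df, p = paired_t_test(week1, week0)  # differences are Week 1 - Week 0
print(f"t = {t:.3f}, df = {df}, two-sided P = {p:.3f}")

# Cross-check against SciPy's built-in paired test (same statistic and P-value).
print(stats.ttest_rel(week1, week0))
```

On these data the sketch gives t ≈ −0.26 on 7 df (P ≈ 0.8), so H0 would not be rejected at α = 0.05.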

REJECTION REGIONS

PAIRED T-TEST USING EXCEL – EXAMPLE 1

Values shown in bold red have been modified from original data.

EXAMPLE 2

A study was performed to compare the mean ERG (electroretinogram) amplitude of patients with different genetic types of retinitis pigmentosa (RP), a genetic eye disease that often results in blindness. Data were collected in patients aged 18–29 years with different genetic types.

Genetic type Mean ± SD N

Dominant 0.85 ± 0.18 62

Recessive 0.38 ± 0.21 35

The table shows values of the natural log of ERG amplitude.

EXAMPLE 2 (CONTD.)

Q1: Paired samples or two independent samples?

Q2: Is there a difference in mean log(ERG) amplitude between patients with dominant RP and those with the recessive form?

TWO-SAMPLE t-TEST WITH EQUAL VARIANCES

Example 2 has two independent samples.

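$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \ne \mu_2$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

which, under H0, follows a t-distribution with (n1 + n2 − 2) degrees of freedom, where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

is the pooled variance.

If $|t| > t^*_{n_1+n_2-2}(1-\alpha/2)$, reject H0.

P-value = $2 \times \Pr(T_{n_1+n_2-2} > |t|)$.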
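A minimal sketch of the equal-variance test computed from the summary statistics of Example 2, assuming SciPy is installed. The helper `pooled_t_test` is an illustrative name; `scipy.stats.ttest_ind_from_stats` with `equal_var=True` serves as a cross-check.

```python
from math import sqrt

from scipy import stats


def pooled_t_test(m1: float, s1: float, n1: int,
                  m2: float, s2: float, n2: int) -> tuple[float, int, float]:
    """Two-sided two-sample t-test assuming equal variances, from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance s_p^2
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    t = (m1 - m2) / se
    p = 2 * stats.t.sf(abs(t), df=df)
    return t, df, float(p)


# Example 2: log(ERG) amplitude, dominant (group 1) vs. recessive (group 2) RP.
t, df, p = pooled_t_test(0.85, 0.18, 62, 0.38, 0.21, 35)
print(f"t = {t:.2f}, df = {df}, two-sided P = {p:.2g}")

# Cross-check: SciPy's summary-statistics version with equal_var=True.
print(stats.ttest_ind_from_stats(0.85, 0.18, 62, 0.38, 0.21, 35, equal_var=True))
```

Here $s_p^2 \approx 0.0366$ on 95 df and t ≈ 11.6, far beyond any usual critical value.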

TWO-SAMPLE t-TEST FOR EQUAL VARIANCES USING EXCEL – EXAMPLE 2

COMPARING VARIANCES

In Example 2, the two-sample t-test for independent samples assumed that the variances were equal:

Variance of Group 1 = Variance of Group 2

Note that the sample variances are $s_1^2 = 0.18^2 \approx 0.032$ and $s_2^2 = 0.21^2 \approx 0.044$.

Is the equal-variance assumption true?

COMPARING VARIANCES

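To compare variances, we conduct a hypothesis test to examine whether the ratio of the variances is equal to 1.

$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \quad \text{vs.} \quad H_a: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$

Test statistic:

$$f = \frac{s_1^2}{s_2^2},$$

which, under H0, follows an F-distribution.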

F-DISTRIBUTION

Here is how to generate an F random variable:

$$F_{\nu_1,\nu_2} = \frac{V_1/\nu_1}{V_2/\nu_2},$$

where

$V_1$ and $V_2$ have chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom (df), respectively.

$V_1$ and $V_2$ are independent.

Recall $E[V] = \nu$.
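The construction can be simulated with the standard library. A minimal sketch; $\nu_1 = 5$, $\nu_2 = 20$, the seed, and the number of draws are arbitrary illustration choices. It compares the sample mean with $\nu_2/(\nu_2 - 2)$, the standard mean of an F-distribution when $\nu_2 > 2$ (a known result not stated on the slide), and shows the median falling below the mean.

```python
import random
from statistics import fmean, median


def draw_chi2(nu: int, rng: random.Random) -> float:
    """Chi-squared draw with nu df: sum of nu squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def draw_f(nu1: int, nu2: int, rng: random.Random) -> float:
    """One draw of F = (V1 / nu1) / (V2 / nu2) with independent V1, V2."""
    return (draw_chi2(nu1, rng) / nu1) / (draw_chi2(nu2, rng) / nu2)


rng = random.Random(2013)
nu1, nu2 = 5, 20
draws = [draw_f(nu1, nu2, rng) for _ in range(20000)]
print(f"mean   = {fmean(draws):.3f} (theory nu2/(nu2-2) = {nu2 / (nu2 - 2):.3f})")
print(f"median = {median(draws):.3f}")  # below the mean: the distribution is right-skewed
```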

F-DISTRIBUTION

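The F-distribution is a family of distributions identified by numerator and denominator degrees of freedom (df).

F-distributions are always right-skewed and have both numerator and denominator df.

Recall that under H0 ($\sigma_1^2 = \sigma_2^2$),

$$f = \frac{s_1^2}{s_2^2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$

For normal data from two independent samples, $(n_i - 1)s_i^2/\sigma_i^2$ has a chi-squared distribution with $n_i - 1$ df, so f has the form $(V_1/\nu_1)/(V_2/\nu_2)$ with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.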

REJECTION REGIONS FOR THE F-TEST

F-TEST FOR COMPARING VARIANCES

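$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \quad \text{vs.} \quad H_a: \frac{\sigma_1^2}{\sigma_2^2} \ne 1$$

Test statistic:

$$f = \frac{s_1^2}{s_2^2},$$

which, under H0, follows an F-distribution with (n1 − 1, n2 − 1) degrees of freedom.

If $f > F_{n_1-1,\,n_2-1}(1-\alpha/2)$ or $f < F_{n_1-1,\,n_2-1}(\alpha/2)$, reject H0, where $F_{n_1-1,\,n_2-1}(p)$ denotes the p quantile.

If f ≥ 1, then P-value = $2 \times \Pr(F_{n_1-1,\,n_2-1} > f)$;

If f < 1, then P-value = $2 \times \Pr(F_{n_1-1,\,n_2-1} < f)$.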
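A minimal sketch of the two-sided F-test from summary statistics, assuming SciPy is installed (`scipy.stats.f`). It is applied to the standard deviations of Example 2 and of Example 3 (Example 3 appears later in this lecture). Capping the P-value at 1 is an added safeguard, not part of the rule above: with unequal df, doubling a one-sided tail can exceed 1.

```python
from scipy import stats


def f_test(s1: float, n1: int, s2: float, n2: int) -> tuple[float, int, int, float]:
    """Two-sided F-test of H0: sigma1^2 / sigma2^2 = 1 from sample SDs and sizes."""
    f = s1**2 / s2**2
    dfn, dfd = n1 - 1, n2 - 1
    if f >= 1:
        p = 2 * stats.f.sf(f, dfn, dfd)   # 2 * Pr(F > f)
    else:
        p = 2 * stats.f.cdf(f, dfn, dfd)  # 2 * Pr(F < f)
    # Capping at 1 is an added safeguard, not part of the slide's rule.
    return f, dfn, dfd, min(float(p), 1.0)


# Example 2: dominant (s = 0.18, n = 62) vs. recessive (s = 0.21, n = 35)
f2, dfn2, dfd2, p2 = f_test(0.18, 62, 0.21, 35)
# Example 3: cases (s = 35.6, n = 100) vs. historical controls (s = 17.3, n = 74)
f3, dfn3, dfd3, p3 = f_test(35.6, 100, 17.3, 74)
print(f"Example 2: f = {f2:.3f} on ({dfn2}, {dfd2}) df, P = {p2:.3f}")
print(f"Example 3: f = {f3:.3f} on ({dfn3}, {dfd3}) df, P = {p3:.2g}")
```

The equal-variance assumption looks reasonable for Example 2 (P well above 0.05) but not for Example 3.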

F-TEST FOR EQUALITY OF VARIANCES USING EXCEL – EXAMPLE 2

TWO-SAMPLE t-TEST WITH UNEQUAL VARIANCES

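$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \ne \mu_2$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},$$

which, under H0, approximately follows a t-distribution with d′ degrees of freedom, where

$$d' = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}.$$

Round d′ down to the nearest integer and call it d″.

If $|t| > t^*_{d''}(1-\alpha/2)$, reject H0.

P-value = $2 \times \Pr(T_{d''} > |t|)$.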
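A minimal sketch of the unequal-variance test with the rounded-down df d″, assuming SciPy is installed; the inputs are the Example 3 cholesterol summaries that follow, and `welch_t_test` is an illustrative name.

```python
from math import floor, sqrt

from scipy import stats


def welch_t_test(m1: float, s1: float, n1: int,
                 m2: float, s2: float, n2: int) -> tuple[float, float, int, float]:
    """Two-sided two-sample t-test with unequal variances (Satterthwaite df, rounded down)."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # s1^2/n1 and s2^2/n2
    t = (m1 - m2) / sqrt(v1 + v2)
    d_prime = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    d_dprime = floor(d_prime)        # d'' = d' rounded down
    p = 2 * stats.t.sf(abs(t), df=d_dprime)
    return t, d_prime, d_dprime, float(p)


# Example 3: cholesterol (mg/dL), cases vs. historical controls.
t, d_prime, d_dprime, p = welch_t_test(207.3, 35.6, 100, 193.4, 17.3, 74)
print(f"t = {t:.2f}, d' = {d_prime:.1f}, d'' = {d_dprime}, two-sided P = {p:.4f}")
```

SciPy's `ttest_ind_from_stats(..., equal_var=False)` gives the same t but uses the unrounded d′, so its P-value can differ slightly from the rounded-down rule above.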

EXAMPLE 3

A research study aimed to assess the familial aggregation of cholesterol levels by collecting data on children aged 2 to 14 years. Cholesterol levels (mg/dL) were measured in one group of children (say, “cases”) whose fathers had died from heart disease. Data were also collected in a historical control group of children of the same age.

Group                Mean ± SD (mg/dL)   N
Cases                207.3 ± 35.6        100
Historical control   193.4 ± 17.3        74

EXAMPLE 3 (CONTD.)

Paired samples or two independent samples?

Is there a difference in mean cholesterol levels between the cases and the historical control group?

Which statistical test should we use?

F-TEST FOR EQUALITY OF VARIANCES USING EXCEL – EXAMPLE 3

TWO-SAMPLE t-TEST FOR UNEQUAL VARIANCES USING EXCEL – EXAMPLE 3

ADVANTAGES OF PAIRED SAMPLES

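Suppose we want to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$.

The test statistic is based on $\bar{X}_1 - \bar{X}_2$, whose variance is

$$\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2) - 2\rho_{12}\sqrt{\operatorname{Var}(\bar{X}_1)\cdot\operatorname{Var}(\bar{X}_2)},$$

where $\rho_{12}$ is the correlation between $\bar{X}_1$ and $\bar{X}_2$ (zero for independent samples).

The positive correlation $\rho_{12}$ in paired samples reduces the variance of the difference, yielding a more powerful test than the independent-sample design.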
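The variance formula can be checked by simulation. A minimal sketch with the standard library, under assumed conditions: unit-variance normal measurements, n = 20 pairs, and correlations ρ = 0, 0.5 and 0.8 (all hypothetical); ρ = 0 corresponds to the independent-sample case.

```python
import random
from math import sqrt
from statistics import fmean, variance


def var_mean_difference(var1: float, var2: float, rho: float) -> float:
    """Var(X1bar - X2bar) = Var(X1bar) + Var(X2bar) - 2 rho sqrt(Var(X1bar) Var(X2bar))."""
    return var1 + var2 - 2 * rho * sqrt(var1 * var2)


def simulated_var_mean_difference(rho: float, n: int = 20, reps: int = 5000, seed: int = 7) -> float:
    """Monte Carlo Var(X1bar - X2bar) for n pairs of unit-variance normals with correlation rho."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1, x2 = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            x1.append(z1)
            x2.append(rho * z1 + sqrt(1 - rho**2) * z2)  # corr(x1, x2) = rho, variance 1
        diffs.append(fmean(x1) - fmean(x2))
    return variance(diffs)


n = 20
for rho in (0.0, 0.5, 0.8):
    theory = var_mean_difference(1 / n, 1 / n, rho)  # Var(Xbar) = 1/n for unit-variance data
    print(f"rho = {rho}: theory {theory:.4f}, simulated {simulated_var_mean_difference(rho, n):.4f}")
```

With ρ = 0.8 the variance of the difference is one fifth of its value under independence, which is the source of the paired design's extra power.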
