Section IV

Sampling distributions

Confidence intervals

Hypothesis testing and p values

Population and sample

We wish to make inferences (generalizations) about an entire target population (ie, generalize to “everyone”) even though we only study one sample (have only one study).

Population parameters = summary values for the entire population (ex: μ, σ, ρ, β)

Sample statistics = summary values for a sample (ex: Ȳ, S, r, b)

Samples drawn from a population

[Figure: a sample drawn from the population]

The sample is drawn “at random”. Everyone in the target population is eligible for sampling.

True population distribution of Y (individuals): not Gaussian

[Figure: bar chart of the original distribution of Y for individuals; x-axis: Y (values 1 to 4), y-axis: percent (0% to 30%)]

Mean Y = μ = 2.5, SD = σ = 1.12

Possible samples & statistics from the population (true mean = 2.5)

sample (n=4)    mean (statistic)
1,1,1,1         1.00
…
2,2,4,3         2.75
…
4,4,4,4         4.00

Distribution of the sample means (Ȳs): the sampling distribution

Each observation is a SAMPLE statistic (Ȳ).

[Figure: histogram of the sampling distribution; x-axis: sample mean Ȳ (1.00 to 4.00 in steps of 0.25), y-axis: frequency (0 to 50)]

Mean Ȳ = 2.5, SEM = 0.56, n = 4

SEM = SD/√n (the square root n law)
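
The sampling distribution for n = 4 can be checked by brute force. The sketch below assumes the population consists of the values 1, 2, 3, 4 with equal probability (an assumption, but one that reproduces μ = 2.5 and σ = 1.12) and enumerates every possible sample of size 4 drawn with replacement. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

# Assumed population: Y takes the values 1, 2, 3, 4 with equal probability.
population = [1, 2, 3, 4]
n = 4  # sample size used on the slides

mu = mean(population)       # population mean
sigma = pstdev(population)  # population SD (divides by N, not N - 1)

# All 4**4 = 256 equally likely ordered samples, and the mean of each one.
sample_means = [sum(s) / n for s in product(population, repeat=n)]

mean_of_means = mean(sample_means)  # centre of the sampling distribution
sem = pstdev(sample_means)          # SD of the sample means = standard error
freq = Counter(sample_means)        # how often each possible mean occurs

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"mean of sample means = {mean_of_means}, SEM = {sem:.2f}")
print(f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

Under this assumption the SD of the 256 sample means equals σ/√n, which is the square root n law stated above.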

Central Limit Theorem

For a large enough n, the distribution of any sample statistic (mean, mean difference, OR, RR, hazard, correlation coefficient, regression coefficient, proportion…) from sample to sample is approximately Gaussian (“Normal”), centered at the true population value.

The standard error is proportional to 1/√n.

(Rule of thumb: n > 30 is usually enough. May need non parametric methods for small n.)

Funnel plot: true difference is δ = 5

Each point is one study (meta analysis).

[Figure: funnel plot; x-axis: sample mean difference (-15 to 25), y-axis: sample size n (0 to 400)]

Publication bias: non reproducibility

[Figure: distribution of study effects; x-axis: effect size (0 to 6); true effect = 3, published = 4]

Studies with larger sample effects are more likely to be published, so published effects may be larger than the true average effect.
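
A small simulation can illustrate the mechanism. This is only a sketch under stated assumptions: the true effect of 3 comes from the slide, but the per-study standard error (1.5), the number of studies, and the rule "publish only if the estimate is more than 1.96 SEs above zero" are hypothetical choices made for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

TRUE_EFFECT = 3.0   # true average effect (from the slide)
STUDY_SE = 1.5      # hypothetical standard error of each study
N_STUDIES = 20000   # hypothetical number of simulated studies

# Each study's estimate varies around the true effect.
estimates = [rng.gauss(TRUE_EFFECT, STUDY_SE) for _ in range(N_STUDIES)]

# Hypothetical publication rule: only "significant" positive results appear.
published = [e for e in estimates if e / STUDY_SE > 1.96]

print(f"mean of all studies       = {mean(estimates):.2f}")
print(f"mean of published studies = {mean(published):.2f}")
print(f"fraction published        = {len(published) / N_STUDIES:.2f}")
```

Because the filter removes the smaller estimates, the average of the published studies sits above the true effect even though the full set of studies is unbiased.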

Science - 28 Aug 2015 (Nosek)

The mean effect size of the replication effects (M = 0.197, SD = 0.257) was half the magnitude of the mean effect size of the original effects (M = 0.403, SD = 0.188). … Ninety-seven percent of original studies had significant results (p < .05). Thirty-six percent of replications had significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; …

Resampling estimation (“bootstrap”)

One does not repeatedly sample from the same population (one only carries out the study once). But a “simulation” of repeated sampling from the population can be obtained by repeatedly sampling from the sample with replacement and computing the statistic from each resample, creating an “estimated” sampling distribution. The SD of the statistics across all “resamples” is an estimate of the standard error (SE) for the statistic.
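
The resampling recipe can be sketched in a few lines. The data values, the number of resamples, and the function name `bootstrap_se` are all hypothetical; the sample mean is used as the statistic only because its formula-based SE (SD/√n) is available for comparison.

```python
import random
from statistics import mean, stdev

def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Estimate the SE of `statistic` by resampling `sample` with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    resampled_stats = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        resampled_stats.append(statistic(resample))
    # SD of the statistic across resamples = bootstrap estimate of the SE
    return stdev(resampled_stats)

# Hypothetical original sample (one study, n = 12)
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5, 4.4, 3.0]

boot_se = bootstrap_se(sample, mean)
formula_se = stdev(sample) / len(sample) ** 0.5  # SD / sqrt(n)

print(f"bootstrap SE of the mean = {boot_se:.3f}")
print(f"formula SE (SD/sqrt(n))  = {formula_se:.3f}")
```

The same function can be given any other statistic (for example `statistics.median`), which is where resampling is most useful because no simple SE formula may exist.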

Samples drawn from a population

[Figure: population → original sample → repeated resamples]

Each resample is drawn “at random” with replacement. Everyone in the original sample is eligible for sampling.

Confidence interval (for μ)

We do not know μ from a sample.

For a sample mean Ȳ and standard error SE, a confidence interval for the population mean μ is formed by

(Ȳ - Z SE, Ȳ + Z SE)   (the sample statistic is in the middle)

For a 95% confidence interval, we use Z = 1.96 (Why?) and compute

lower bound = Ȳ - 1.96 SE, upper bound = Ȳ + 1.96 SE
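
One way to see where 1.96 comes from: it is the 97.5th percentile of the standard Gaussian, so 2.5% of the area lies beyond it in each tail. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names and the example numbers (Ȳ = 2.75, SE = 0.56) are illustrative assumptions.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Gaussian percentile that leaves (1 - level)/2 in each tail."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

def z_interval(estimate, se, level=0.95):
    """Confidence interval: estimate -/+ Z * SE."""
    z = z_for_confidence(level)
    return estimate - z * se, estimate + z * se

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.3f}")

# Hypothetical example: sample mean 2.75 with SE 0.56
lower, upper = z_interval(2.75, 0.56)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```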

Confidence Intervals (CI) and sampling distribution of Ȳ

[Figure: Gaussian sampling distribution of Ȳ with 2.5% of the area in each tail, cut at -1.96(σ/√n) and +1.96(σ/√n)]

95% CI: Ȳ ± 1.96 (σ/√n)

95% Confidence intervals

95% of the intervals will contain the true population value.

But which ones?
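
The "95% of the intervals" statement can be checked by simulation. This sketch assumes a Gaussian population with known σ and uses hypothetical values (μ = 2.5, σ = 1.12, n = 30, 2000 simulated studies); in a real study only one of these intervals would ever be seen.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

MU, SIGMA, N = 2.5, 1.12, 30   # hypothetical population and sample size
N_STUDIES = 2000               # hypothetical number of repeated studies
se = SIGMA / N ** 0.5          # standard error of the mean (sigma known)

covered = 0
for _ in range(N_STUDIES):
    ybar = mean(rng.gauss(MU, SIGMA) for _ in range(N))
    lower, upper = ybar - 1.96 * se, ybar + 1.96 * se
    if lower <= MU <= upper:   # does this study's interval contain mu?
        covered += 1

coverage = covered / N_STUDIES
print(f"fraction of intervals containing mu = {coverage:.3f}")
```

The fraction should land close to 0.95, but nothing in a single study tells us whether its own interval is one of the roughly 5% that miss.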

Z vs t (technical note)

Confidence intervals made with Z assume that the population σ is known. Since σ is usually not known and is estimated with the sample SD, the Gaussian table areas need to be adjusted. The adjusted tables are called “t” tables instead of Gaussian tables (t distribution). For n > 30, they are about the same.

[Figure: Z distribution vs t distribution; about the same for n > 30]

t vs Gaussian Z percentiles

%ile         85th    90th    95th    97.5th  99.5th
Confidence   70%     80%     90%     95%     99%
t, df=5      1.156   1.476   2.015   2.571   4.032
t, df=10     1.093   1.372   1.812   2.228   3.169
t, df=20     1.064   1.325   1.725   2.086   2.845
t, df=30     1.055   1.310   1.697   2.042   2.750
Gaussian     1.036   1.282   1.645   1.960   2.576

What did the z distribution say to the t distribution? You may look like me but you're not normal.

Confidence Intervals

Sample Statistic ± Z(tabled) × SE   (using known variance)

Sample Statistic ± t(tabled) × SE   (using estimate of variance)

Example: CI for the difference between two means:

(Ȳ1 - Ȳ2) ± t(tabled) × SEd

Tabled t uses degrees of freedom, df = n1 + n2 - 2

CI for a proportion: “law” of small numbers

n = 10, Proportion = 3/10 = 30%

What do you think are the 95% confidence bounds? Is it likely that the “real” proportion is more than 50%?

Answer: 95% CI: 6.7% to 65.3%
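
The slides do not say how this interval was computed. One method that gives nearly the same answer is the exact binomial (Clopper-Pearson) interval, so the sketch below assumes that method; it comes out at about 6.7% to 65.2%, within rounding of the stated answer. The function names are illustrative, and the bounds are found by simple bisection using only the standard library.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of f on [lo, hi] by bisection; assumes f changes sign once."""
    sign_lo = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, level=0.95):
    """Clopper-Pearson interval for a proportion x/n (assumed method)."""
    alpha = 1 - level
    # Lower bound: the p at which P(X >= x) equals alpha/2
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha/2
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

lower, upper = exact_binomial_ci(3, 10)
print(f"95% CI for 3/10: {lower:.1%} to {upper:.1%}")
```

The interval is wide and includes values above 50%, so with n = 10 an observed 30% does not rule out a "real" proportion above one half.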

Standard error for the difference between two means

Ȳ1 has mean μ1 and SE1 = √(σ1²/n1)

Ȳ2 has mean μ2 and SE2 = √(σ2²/n2)

For the difference between two means (δ = μ1 - μ2):

SEδ = √(σ1²/n1 + σ2²/n2)

SEd = √(SE1² + SE2²)

[Figure: right triangle with legs SE1 and SE2 and hypotenuse SEd]

SEd is computed from SE1 and SE2 using “Pythagoras’ rule”:

SEd² = SE1² + SE2²

Statistics for HbA1c change from baseline to 26 weeks (Pratley et al, Lancet 2010)

Tx            n     Mean    SD     SE
Liraglutide   225   -1.24   0.99   0.066
Sitagliptin   219   -0.90   0.98   0.066

Mean difference (sitagliptin - liraglutide) = d = 0.34%

Std error of mean difference = SEd = √[0.066² + 0.066²] = 0.093%

Using t{df=442} = 1.97 for the 95% confidence interval:

CI: 0.34% ± 1.97 (0.093%) or (0.16%, 0.52%)
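
The arithmetic on this slide can be reproduced directly from the summary table. The sketch below takes the n, mean, and SD values and the tabled t of 1.97 from the slide; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the table (HbA1c change, in %)
n1, mean1, sd1 = 225, -1.24, 0.99   # liraglutide
n2, mean2, sd2 = 219, -0.90, 0.98   # sitagliptin

se1 = sd1 / sqrt(n1)                # SE of each group mean = SD / sqrt(n)
se2 = sd2 / sqrt(n2)

d = mean2 - mean1                   # mean difference
se_d = sqrt(se1**2 + se2**2)        # "Pythagoras' rule" for the SE of d

df = n1 + n2 - 2                    # degrees of freedom for the tabled t
t_tabled = 1.97                     # value quoted on the slide for df = 442

lower = d - t_tabled * se_d
upper = d + t_tabled * se_d

print(f"SE1 = {se1:.3f}, SE2 = {se2:.3f}")
print(f"d = {d:.2f}, SEd = {se_d:.3f}, df = {df}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```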

Null hypothesis & p values

Null hypothesis: Assume that, in the population, the two treatments give the same average improvement in HbA1c. So the average difference is δ = 0.

Under this assumption, how likely is it to observe a sample mean difference of d = 0.34% (or more extreme) in any study? This probability is called the (one sided) p value.

The p value is only defined for a given null hypothesis.

Hypothesis testing for a mean difference, d

d = sample mean HbA1c change difference: d = 0.34%, SEd = 0.093%

95% CI for true mean difference = (0.16%, 0.52%)

But, under the null hypothesis, the true mean difference (δ) should be zero.

How “far” is the observed 0.34% mean difference from zero (in SE units)?

tobs = (mean difference - hypothesized difference) / SEdiff

tobs = (0.34 - 0) / 0.093 = 3.66 SEs

p value: probability of observing t = 3.66 or larger if the null hypothesis is true.

p value ≈ 0.0001 (one sided t with df = 442)

p value ≈ 0.0003 (two sided)

Hypothesis test statistics

Zobs = (Sample Statistic - null value) / Standard error

[Figure: Gaussian curve on the z scale (-3 to 3); the p value is the tail area beyond the observed Z (or t) = 3.66]
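
The test statistic is a distance from the null value measured in SE units, and the p value is a tail area beyond it. A minimal sketch, assuming the Gaussian (Z) curve is an adequate stand-in for the t distribution because df = 442 is large; the function name `z_test` is illustrative.

```python
from statistics import NormalDist

def z_test(statistic, null_value, se):
    """Return (z, one-sided upper-tail p, two-sided p) under a Gaussian curve."""
    z = (statistic - null_value) / se   # distance from the null in SE units
    upper_tail = 1 - NormalDist().cdf(z)
    lower_tail = NormalDist().cdf(z)
    two_sided = 2 * min(upper_tail, lower_tail)
    return z, upper_tail, two_sided

# HbA1c example: d = 0.34, null difference = 0, SEd = 0.093
z_obs, p_one_sided, p_two_sided = z_test(0.34, 0.0, 0.093)

print(f"z = {z_obs:.2f}")
print(f"one-sided p = {p_one_sided:.5f}")
print(f"two-sided p = {p_two_sided:.5f}")
```

With a t distribution the p values would be slightly larger than these Gaussian ones, though the difference is small at this sample size.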

Difference & Non inferiority (equivalence) hypothesis testing

Difference testing:
Null Hyp: A = B (or A - B = 0), Alternative: A ≠ B
Zobs = (observed stat - 0) / SE

Non inferiority (within δ) testing:
Null Hyp: A > B + δ, Alternative: A ≤ B + δ
Zeq = (observed stat - δ) / SE
Must specify δ for non inferiority testing.

Non inferiority testing: HbA1c data

For HbA1c data, assume we declare non inferiority if the true mean difference is δ = 0.40% or less. The observed mean difference is d = 0.34%, which is smaller than 0.40%. However, the null hypothesis is that the true difference is 0.40% or more versus the alternative of 0.40% or less. So

Zeq = (0.34 - 0.40)/0.093 = -0.643, p = 0.260 (one sided)

We cannot reject the “null hyp” that the true δ is larger than 0.40%. Our 95% confidence interval of (0.16%, 0.52%) also does NOT exclude 0.40%, even though it excludes zero.
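
The non inferiority calculation differs from the difference test only in the null value: the margin replaces zero, and the relevant tail is the lower one. The sketch below uses the slide's rounded SEd of 0.093, so Zeq comes out near -0.645 rather than exactly -0.643; the function name and the Gaussian approximation are assumptions.

```python
from statistics import NormalDist

def noninferiority_test(d, se, margin):
    """Z and one-sided p for the null 'true difference >= margin'."""
    z_eq = (d - margin) / se
    p_one_sided = NormalDist().cdf(z_eq)  # lower tail: evidence for d < margin
    return z_eq, p_one_sided

d, se_d = 0.34, 0.093        # observed difference and its SE (from the slides)
margin = 0.40                # non inferiority margin delta (from the slides)

z_eq, p_value = noninferiority_test(d, se_d, margin)

# Equivalent check with the 95% CI: is the whole interval below the margin?
ci_lower, ci_upper = 0.16, 0.52
ci_excludes_margin = ci_upper < margin

print(f"Zeq = {z_eq:.3f}, one-sided p = {p_value:.3f}")
print(f"95% CI entirely below the margin: {ci_excludes_margin}")
```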

Confidence intervals versus hypothesis testing

Studies 1-8: equivalence is demonstrated only from -D to +D (brackets show 95% confidence intervals).

[Figure: eight 95% confidence intervals plotted against the true difference, with -D, 0 and +D marked on the axis. Studies 1 to 7 run from the interval farthest to the right (study 1) to the interval farthest to the left (study 7); study 8 is a wide interval spanning most of the axis.]

Study   Stat Sig   Interval
1       Yes        not equivalent
2       Yes        uncertain
3       Yes        equivalent
4       No         equivalent
5       Yes        equivalent
6       Yes        uncertain
7       Yes        not equivalent
8       No         uncertain

Ref: Statistics Applied to Clinical Trials, Cleophas, Zwinderman, Cleophas, 2000, Kluwer Academic Pub, page 35

Non inferiority

JAMA 2006 - Piaggio et al, p 1152-1160

Paired Mean Comparisons

Serum cholesterol in mmol/L: difference between baseline and end of 4 weeks

Subject   chol(baseline)   chol(4 wks)   difference(di)
1         9.0              6.5           2.5
2         7.1              6.3           0.8
3         6.9              5.9           1.0
4         6.9              4.9           2.0
5         5.9              4.0           1.9
6         5.4              4.9           0.5
mean      6.87             5.42          1.45
SD        1.24             0.97          0.79
SE        0.51             0.40          0.32

Difference (baseline - 4 weeks) = amount lowered: d̄ = 1.45 mmol/L

SD = 0.79 mmol/L

SEd = 0.79/√6 = 0.323 mmol/L, df = 6 - 1 = 5, t0.975 = 2.571

95% CI: 1.45 ± 2.571 (0.323) = 1.45 ± 0.830 or (0.62 mmol/L, 2.28 mmol/L)

tobs = 1.45 / 0.323 = 4.49, p value ≈ 0.006 (two sided t with df = 5)
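
The paired analysis works on the within-subject differences only. The sketch below recomputes the summary values from the six pairs in the table; the tabled t of 2.571 is taken from the slides, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Serum cholesterol (mmol/L) for the six subjects in the table
baseline = [9.0, 7.1, 6.9, 6.9, 5.9, 5.4]
week4 = [6.5, 6.3, 5.9, 4.9, 4.0, 4.9]

# Paired analysis: reduce each subject to one difference
diffs = [b - w for b, w in zip(baseline, week4)]

n = len(diffs)
d_bar = mean(diffs)          # mean amount lowered
sd_d = stdev(diffs)          # sample SD of the differences (n - 1 divisor)
se_d = sd_d / sqrt(n)        # SE of the mean difference
df = n - 1

t_tabled = 2.571             # 97.5th percentile of t with df = 5 (from the slides)
lower = d_bar - t_tabled * se_d
upper = d_bar + t_tabled * se_d

t_obs = d_bar / se_d         # test of "true mean difference = 0"

print(f"mean diff = {d_bar:.2f}, SD = {sd_d:.2f}, SE = {se_d:.3f}, df = {df}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"t_obs = {t_obs:.2f}")
```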

Confidence Intervals and Hypothesis Tests

Confidence intervals are of the form

Sample Statistic +/- (Zpercentile*) (Standard error)

Lower bound = Sample Statistic - (Zpercentile)(Standard error)
Upper bound = Sample Statistic + (Zpercentile)(Standard error)

Hypothesis test statistics (Zobs*) are of the form

Zobs = (Sample Statistic - null value) / Standard error

* t percentile or tobs for continuous data when n is small

Sample statistics and their SEs

Sample Statistic            Symbol           Standard error (SE)
Mean                        Ȳ                S/√n = √[S²/n] = SEM
Mean difference             Ȳ1 - Ȳ2 = d̄      √[S1²/n1 + S2²/n2] = SEd
Proportion                  P                √[P(1-P)/n]
Proportion difference       P1 - P2          √[P1(1-P1)/n1 + P2(1-P2)/n2]
Log Odds ratio*             loge OR          √[1/a + 1/b + 1/c + 1/d]
Log Risk ratio*             loge RR          √[1/a - 1/(a+c) + 1/b - 1/(b+d)]
Slope (rate)                b                Serror / (Sx √(n-1))
Hazard rate (survival)      h                h/√[number dead]

Transform (z) of the Correlation coefficient r*:
z = ½ loge[(1+r)/(1-r)], SE(z) = 1/√(n-3), r = (e^(2z) - 1)/(e^(2z) + 1)

*Form CI bounds on transformed scale, then take anti-transform
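
For the starred rows, the interval is built on the transformed scale and then converted back. The sketch below does this for an odds ratio and a correlation coefficient. The 2x2 counts, the correlation, and the sample size are hypothetical, and the cell layout (a, c = events and non-events in group 1; b, d = events and non-events in group 2) is an assumption chosen to match the risk ratio formula in the table.

```python
from math import atanh, exp, log, sqrt, tanh

Z95 = 1.96  # Gaussian percentile for a 95% interval

def odds_ratio_ci(a, b, c, d, z=Z95):
    """OR and CI from a 2x2 table; bounds formed on the log scale."""
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1/a + 1/b + 1/c + 1/d)      # SE of log(OR)
    lower = exp(log(or_hat) - z * se_log)     # anti-transform = exp
    upper = exp(log(or_hat) + z * se_log)
    return or_hat, lower, upper

def correlation_ci(r, n, z=Z95):
    """CI for a correlation via the z transform, z = 0.5*ln((1+r)/(1-r))."""
    z_r = atanh(r)                            # same as 0.5*ln((1+r)/(1-r))
    se_z = 1 / sqrt(n - 3)
    # tanh(z) = (e^(2z) - 1)/(e^(2z) + 1) is the anti-transform
    return tanh(z_r - z * se_z), tanh(z_r + z * se_z)

# Hypothetical 2x2 counts
or_hat, or_lower, or_upper = odds_ratio_ci(a=20, b=10, c=80, d=90)
print(f"OR = {or_hat:.2f}, 95% CI ({or_lower:.2f}, {or_upper:.2f})")

# Hypothetical correlation r = 0.5 from n = 28 pairs
r_lower, r_upper = correlation_ci(0.5, 28)
print(f"r = 0.50, 95% CI ({r_lower:.2f}, {r_upper:.2f})")
```

Both intervals are asymmetric around the estimate on the original scale, which is the reason for transforming first.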

Handy Guide to Testing

Sample Statistic & Comparison                     Population null hypothesis
Comparing two means                               True population mean difference is zero
Comparing two proportions                         True population difference is zero
Comparing two medians                             True population median difference is zero
Odds ratio (comparing odds)                       True population odds ratio is one
Risk ratio = relative risk (comparing risks)      True population risk ratio is one
Correlation coefficient (compare to zero)         True population correlation coefficient is zero
Slope = rate of change = regression coefficient   True population slope is zero
Comparing two survival curves                     True difference in survival is zero at all times

Nomenclature for Testing

Delta (δ) = True difference or size of effect

Alpha (α) = Type I error = false positive = Probability of rejecting the null hypothesis when it is true. (Usually α is set to 0.05)

Beta (β) = Type II error = false negative = Probability of not rejecting the null hypothesis when delta is not zero (there is a real difference in the population)

Power = 1 - β = Probability of getting a p value less than α (ie declaring statistical significance) when, in fact, there really is a non-zero delta.

We want small alpha levels and high power.
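
The link between α, δ, the standard error, and power can be sketched with the Gaussian curve. This assumes a two-sided Z test with known SE; the function name and the example values (δ = 0.34 with SE = 0.093, borrowed from the HbA1c numbers as a hypothetical true effect) are illustrative only.

```python
from statistics import NormalDist

def power_two_sided(delta, se, alpha=0.05):
    """Probability that a two-sided Z test rejects when the true effect is delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # 1.96 when alpha = 0.05
    shift = delta / se                   # true effect in SE units
    upper_reject = 1 - nd.cdf(z_crit - shift)   # P(Z_obs > z_crit)
    lower_reject = nd.cdf(-z_crit - shift)      # P(Z_obs < -z_crit)
    return upper_reject + lower_reject

# With no true effect, the rejection rate is just alpha (the Type I error).
print(f"power at delta = 0: {power_two_sided(0.0, 1.0):.3f}")

# Hypothetical: true delta = 0.34 with SE = 0.093
power = power_two_sided(0.34, 0.093)
print(f"power = {power:.3f}, beta = {1 - power:.3f}")

# Halving the SE (four times the sample size) raises power for a fixed delta.
print(f"delta = 0.20, SE = 0.093:  {power_two_sided(0.20, 0.093):.3f}")
print(f"delta = 0.20, SE = 0.0465: {power_two_sided(0.20, 0.0465):.3f}")
```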

Statistical Hypothesis Testing

Statistic/type of comparison        Test/analysis procedure
Mean comparison - unpaired          t test (2 groups), ANOVA (3+ groups)
Mean comparison - paired            paired t test, repeated measures ANOVA
Median comparison - unpaired        Wilcoxon rank sum test, Kruskal Wallis test*
Median comparison - paired          Wilcoxon signed rank test on differences*
Proportion comparison - unpaired    chi-square test (or Fisher's test)
Proportion comparison - paired      McNemar’s chi-square test
Odds ratio                          chi-square test, Fisher test
Risk ratio                          chi-square test, Fisher test
Correlation, slope                  regression, t statistic
Survival curves, hazard rates       log rank test*

ANOVA = analysis of variance

* non parametric: Gaussian distribution theory is not used to get the p value

Parametric vs non parametric

Non parametric methods compute p values using ranks of the data. They do not assume that the statistics follow a Gaussian distribution, particularly in the distribution “tails”.

Parametric                      Nonparametric
2 indep means: t test           2 indep medians: Wilcoxon rank sum test (= Mann-Whitney, MW)
3+ indep means: ANOVA F test    3+ indep medians: Kruskal Wallis test
Paired means: paired t test     Paired medians: Wilcoxon signed rank test
Pearson correlation             Spearman correlation
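
The last row of the table gives a compact illustration of what "using ranks" means: the Spearman correlation is the Pearson correlation computed on the ranks of the data. The sketch below uses hypothetical data with one extreme value; the helper names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y always increases with x, but with one extreme value
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

print(f"Pearson  = {pearson(x, y):.3f}")
print(f"Spearman = {spearman(x, y):.3f}")
```

The extreme value pulls the Pearson correlation well below 1, while the rank-based Spearman correlation only sees that the ordering is perfectly preserved.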
