Section IV



Sampling distributions, confidence intervals, hypothesis testing and p values

Population and sample
We wish to make inferences (generalizations) about an entire target population (i.e., generalize to "everyone") even though we study only one sample (we have only one study).
Population parameters = summary values for the entire population (e.g., μ, σ, ρ, β)
Sample statistics = summary values for a sample (e.g., Ȳ, S, r, b)

Samples drawn from a population
[Figure: a sample drawn from the population]
The sample is drawn at random. Everyone in the target population is eligible for sampling.

True population distribution of Y (individuals): not Gaussian
Mean Y = μ = 2.5, SD = σ = 1.12

Possible samples & statistics from the population (true mean = 2.5)
sample (n=4)    mean (statistic)
1,1,1,1         1.00
2,2,4,3         2.75
4,4,4,4         4.00

Distribution of the sample means (Ȳs): the sampling distribution
Each observation is a SAMPLE statistic.

Mean Ȳ = 2.5, SEM = 0.56, n = 4
SEM = SD/√n (the "square root n" law)
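The sampling distribution above can be imitated by simulation. The sketch below is illustrative only: it assumes the population consists of the values 1, 2, 3 and 4 in equal proportions, which reproduces the stated mean of 2.5 and SD of about 1.12 but is not stated on the slide. The function name and the number of simulated samples are likewise arbitrary choices.

```python
import random
import statistics

# ASSUMPTION: the population is the values 1..4 in equal proportions.
# This gives mean 2.5 and SD ~1.12, matching the summary in the text.
POPULATION = [1, 2, 3, 4]


def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples random samples of size n; return the mean of each."""
    rng = random.Random(seed)
    return [sum(rng.choices(POPULATION, k=n)) / n for _ in range(n_samples)]


means = simulate_sample_means(n=4, n_samples=20000)

pop_sd = statistics.pstdev(POPULATION)     # population SD
sem_formula = pop_sd / 4 ** 0.5            # square root n law: SD / sqrt(n)
sem_simulated = statistics.pstdev(means)   # SD of the simulated sample means

print(f"population SD        = {pop_sd:.2f}")
print(f"mean of sample means = {statistics.mean(means):.2f}")
print(f"SEM from formula     = {sem_formula:.2f}")
print(f"SEM from simulation  = {sem_simulated:.2f}")
```

The two SEM values should agree closely, which is the square root n law seen empirically.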

Central Limit Theorem
For a large enough n, the distribution of any sample statistic (mean, mean difference, OR, RR, hazard, correlation coefficient, regression coefficient, proportion) from sample to sample is approximately Gaussian (Normal), centered at the true population value. The standard error is proportional to 1/√n. (Rule of thumb: n > 30 is usually enough. Non-parametric methods may be needed for small n.)
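To make the dependence on sample size concrete, here is a small sketch of how the standard error of a mean shrinks with the square root of n. It reuses the population SD of 1.12 from the earlier example; the sample sizes are arbitrary illustrations.

```python
import math

SIGMA = 1.12  # population SD from the earlier example


def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Quadrupling n halves the standard error.
for n in (4, 16, 64):
    print(f"n = {n:3d}  SE = {standard_error(SIGMA, n):.3f}")
```

Halving the standard error therefore costs four times the sample size.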


Funnel plot: true difference is δ = 5
Each point is one study (meta-analysis).

Publication bias and non-reproducibility
Studies with larger sample effects are more likely to be published, so published effects may be larger than the true average effect. Science, 28 Aug 2015 (Nosek).

The mean effect size of the replication effects (M = 0.197, SD = 0.257) was half the magnitude of the mean effect size of the original effects (M = 0.403, SD = 0.188). Ninety-seven percent of original studies had significant results (p < .05). Thirty-six percent of replications had significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size.

Resampling estimation (bootstrap)
One does not repeatedly sample from the same population (one carries out the study only once). But a simulation of repeated sampling from the population can be obtained by repeatedly sampling from the sample with replacement and computing the statistic from each resample, creating an estimated sampling distribution. The SD of the statistics across all resamples is an estimate of the standard error (SE) for the statistic.
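The bootstrap recipe described above can be sketched in a few lines. The data values below are hypothetical, chosen only to have something to resample, and the function name `bootstrap_se` and the number of resamples are illustrative choices rather than anything prescribed by the slides.

```python
import random
import statistics


def bootstrap_se(sample, statistic, n_resamples=5000, seed=0):
    """Estimate the standard error of `statistic` by resampling `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    resampled_stats = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)   # draw n values WITH replacement
        resampled_stats.append(statistic(resample))
    # The SD of the statistic across resamples estimates its SE.
    return statistics.stdev(resampled_stats)


# HYPOTHETICAL data, for illustration only.
sample = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.0]

se_boot = bootstrap_se(sample, statistics.mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5

print(f"bootstrap SE of the mean = {se_boot:.3f}")
print(f"SD / sqrt(n)             = {se_formula:.3f}")
print(f"bootstrap SE of median   = {bootstrap_se(sample, statistics.median):.3f}")
```

For the mean, the bootstrap estimate should land close to SD/√n. The same function also works for statistics such as the median, where no simple formula is available.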

Samples drawn from a population
[Figure: population, original sample, and resamples drawn from the original sample]
Each resample is drawn at random with replacement. Everyone in the original sample is eligible for sampling.

Confidence interval (for μ)
We do not know μ from a sample. For a sample mean Ȳ and standard error SE, a confidence interval for the population mean μ is formed by
(Ȳ - Z SE, Ȳ + Z SE)
(the sample statistic is in the middle).

For a 95% confidence interval, we use Z = 1.96 (Why?) and compute
(Ȳ - 1.96 SE, Ȳ + 1.96 SE)
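One way to answer the "Why?" is to look up the Gaussian percentile directly: a 95% interval leaves 2.5% in each tail, so Z is the 97.5th percentile of the standard Normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names and the example inputs (a sample mean of 2.75 with SE 0.56, borrowed from the earlier sampling example) are illustrative.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Gaussian multiplier for a two-sided interval with the given confidence."""
    tail = (1 - level) / 2                 # area left in EACH tail
    return NormalDist().inv_cdf(1 - tail)  # e.g. the 97.5th percentile for 95%


def z_interval(mean, se, level=0.95):
    """Confidence interval: mean -/+ Z * SE."""
    z = z_for_confidence(level)
    return mean - z * se, mean + z * se


print(f"Z for 95% = {z_for_confidence(0.95):.3f}")   # about 1.960
lower, upper = z_interval(2.75, 0.56)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```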

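[Figure: a confidence interval with lower bound, mean, and upper bound marked]

Confidence intervals (CI) and sampling distribution of Ȳ
[Figure: sampling distribution of Ȳ with -1.96(σ/√n) and +1.96(σ/√n) marked on either side of the center]

95% CI: Ȳ ± 1.96 (σ/√n)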

95% confidence intervals
95% of the intervals will contain the true population value. But which ones?

Z vs t (technical note)
Confidence intervals made with Z assume that the population σ is known. Since σ is usually not known and is estimated with the sample SD, the Gaussian table areas need to be adjusted. The adjusted tables are called t tables instead of Gaussian tables (t distribution). For n > 30, they are about the same.
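The statement that 95% of intervals contain the true population value can be checked by simulating many studies. The sketch below makes simplifying assumptions that are not on the slides: the population is Gaussian, σ is treated as known (so the Z interval applies), and the values of μ, σ, n and the number of simulated studies are arbitrary.

```python
import random
from statistics import NormalDist


def coverage(mu=2.5, sigma=1.12, n=30, n_studies=4000, level=0.95, seed=1):
    """Fraction of simulated studies whose Z interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sigma / n ** 0.5              # sigma assumed known
    hits = 0
    for _ in range(n_studies):
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if ybar - z * se <= mu <= ybar + z * se:
            hits += 1
    return hits / n_studies


cov = coverage()
print(f"fraction of intervals containing the true mean: {cov:.3f}")
```

The fraction should come out near 0.95, although in any single real study there is no way to tell whether its interval is one of the 95%.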

Z distribution vs t distribution: about the same for n > 30

t vs Gaussian Z percentiles
%ile         85th    90th    95th    97.5th  99.5th
Confidence   70%     80%     90%     95%     99%
t, df=5      1.156   1.476   2.015   2.571   4.032
t, df=10     1.093   1.372   1.812   2.228   3.169
t, df=20     1.064   1.325   1.725   2.086   2.845
t, df=30     1.055   1.310   1.697   2.042   2.750
Gaussian     1.036   1.282   1.645   1.960   2.576

What did the z distribution say to the t distribution? You may look like me but you're not normal.

Confidence intervals
Sample statistic ± Ztabled × SE (using known variance)

Sample statistic ± ttabled × SE (using estimate of variance)

Example: CI for the difference between two means:
(Ȳ1 - Ȳ2) ± ttabled × SEd

Tabled t uses degrees of freedom, df = n1 + n2 - 2.

CI for a proportion: "law of small numbers"
n = 10, proportion = 3/10 = 30%
What do you think are the 95% confidence bounds? Is it likely that the real proportion is more than 50%?

Answer: 95% CI: 6.7% to 65.3%
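The slide does not say how the bounds were computed. They are consistent with the exact binomial (Clopper-Pearson) interval, so the sketch below assumes that method; the function names and the bisection solver are illustrative choices, and only the standard library is used.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def solve_decreasing(f, target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection: find p in [lo, hi] with f(p) = target, f decreasing in p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid      # f still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, level=0.95):
    """ASSUMED method: exact (Clopper-Pearson) CI for a proportion x/n."""
    alpha = 1 - level
    # Lower bound p_L satisfies P(X >= x | p_L) = alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound p_U satisfies P(X <= x | p_U) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


lower, upper = exact_ci(3, 10)
print(f"95% CI for 3/10: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Because the upper bound is well above 50%, a sample of 3 out of 10 cannot rule out a true proportion greater than one half.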

Standard error for the difference between two means
Ȳ1 has mean μ1 and SE = √(σ1²/n1) = SE1
Ȳ2 has mean μ2 and SE = √(σ2²/n2) = SE2

For the difference between two means (δ = μ1 - μ2):

SE = √(σ1²/n1 + σ2²/n2)

SEd = √(SE1² + SE2²)

Statistics for HbA1c change from baseline to 26 weeks (Pratley et al, Lancet 2010)
Tx            n     Mean    SD     SE
Liraglutide   225   -1.24   0.99   0.066
Sitagliptin   219   -0.90   0.98   0.066

Mean difference = d = 0.34%
Std error of mean difference = SEd = √[0.066² + 0.066²] = 0.093%
Using t(df=442) = 1.97 for the 95% confidence interval:
CI: 0.34% ± 1.97 (0.093%) or (0.16%, 0.52%)
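The arithmetic in this example can be reproduced from the table. The sketch below takes the tabled t value of 1.97 from the text rather than computing it (the standard library has no t distribution); the function and variable names are illustrative.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def se_difference(se1, se2):
    """SE of a difference of two independent means: sqrt(SE1^2 + SE2^2)."""
    return math.sqrt(se1**2 + se2**2)


# Values from the table (HbA1c change, in percentage points).
se_lira = se_mean(0.99, 225)
se_sita = se_mean(0.98, 219)

d = -0.90 - (-1.24)                     # mean difference
se_d = se_difference(se_lira, se_sita)

T_CRIT = 1.97                           # tabled t for df = 442, from the text
ci = (d - T_CRIT * se_d, d + T_CRIT * se_d)

print(f"SE1 = {se_lira:.3f}, SE2 = {se_sita:.3f}")
print(f"d = {d:.2f}, SEd = {se_d:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```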

Null hypothesis & p values
Null hypothesis: assume that, in the population, the two treatments give the same average improvement in HbA1c. So the average difference is δ = 0.

Under this assumption, how likely is it to observe a sample mean difference of d = 0.34% (or more extreme) in any study? This probability is called the (one-sided) p value. The p value is only defined for a given null hypothesis.

Hypothesis testing for a mean difference, d
d = sample mean HbA1c change difference: d = 0.34%, SEd = 0.093%

95% CI for true mean difference = (0.16%, 0.52%)
But, under the null hypothesis, the true mean difference (δ) should be zero.

How far is the observed 0.34% mean difference from zero (in SE units)?
tobs = (mean difference - hypothesized difference) / SEdiff
tobs = (0.34 - 0) / 0.093 = 3.82 SEs

p value: probability of observing t = 3.82 or larger if the null hypothesis is true.

p value = 0.00008 (one-sided t with df = 442)
p value = 0.00016 (two-sided)

Hypothesis test statistics
Zobs = (sample statistic - null value) / standard error

Z (or t) = 3.82
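The test statistic and p value can be sketched as follows. Two caveats apply. First, the Gaussian distribution is used here as a stand-in for the t distribution, which is a reasonable approximation for df = 442 but not identical. Second, with the rounded inputs shown (0.34 and 0.093) the ratio comes out near 3.66 rather than the reported 3.82; the reported value presumably rests on unrounded or model-based figures, so this code does not try to reproduce it. The function name `z_test` is illustrative.

```python
from statistics import NormalDist


def z_test(statistic, null_value, se):
    """Return (z_obs, one-sided p, two-sided p) using a Gaussian reference."""
    z_obs = (statistic - null_value) / se
    p_one = 1 - NormalDist().cdf(abs(z_obs))   # area beyond |z_obs| in one tail
    return z_obs, p_one, 2 * p_one


# Rounded inputs from the text: d = 0.34, null value 0, SEd = 0.093.
z_obs, p_one, p_two = z_test(0.34, 0.0, 0.093)

print(f"z_obs = {z_obs:.2f}")
print(f"one-sided p = {p_one:.5f}")
print(f"two-sided p = {p_two:.5f}")
```

Either way, the observed difference lies more than three and a half standard errors from zero, so the p value is far below 0.05.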

Difference & non-inferiority (equivalence) hypothesis testing
Difference testing:
Null hyp: A = B (or A - B = 0), Alternative: A ≠ B
Zobs = (observed stat - 0) / SE

Non-inferiority (within δ) testing:
Null hyp: A > B + δ, Alternative: A ≤ B + δ

[Figure: confidence intervals from individual studies plotted against the true difference, with axis marks at -D, 0 and +D]
2. Yes: uncertain
3. Yes: equivalent
4. No: equivalent
5. Yes: equivalent
6. Yes: uncertain
7. Yes: not equivalent
8. No

Ref: Statistics Applied to Clinical Trials. Cleophas, Zwinderman, Cleophas. Kluwer Academic Publishers, 2000, page 35.

Non-inferiority
JAMA 2006, Piaggio et al, p 1152-1160
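The labels used in the equivalence figure (equivalent, uncertain, not equivalent) suggest a simple rule that compares a study's confidence interval with the margin ±D. The sketch below is one reading of that rule and is an assumption, not a transcription of the cited reference: an interval entirely inside (-D, +D) is called equivalent, an interval entirely outside is called not equivalent, and anything straddling a margin is uncertain. The example intervals are hypothetical.

```python
def classify_interval(lower, upper, margin):
    """ASSUMED rule: compare a CI for the difference with the margin +/-D."""
    if lower > upper or margin <= 0:
        raise ValueError("need lower <= upper and a positive margin")
    if -margin < lower and upper < margin:
        return "equivalent"          # whole interval inside (-D, +D)
    if upper < -margin or lower > margin:
        return "not equivalent"      # whole interval beyond a margin
    return "uncertain"               # interval straddles -D or +D


def significant_difference(lower, upper):
    """A difference test rejects A = B when the CI excludes zero."""
    return lower > 0 or upper < 0


D = 1.0  # hypothetical equivalence margin
examples = [(-0.4, 0.5), (0.2, 0.8), (0.6, 1.4), (1.2, 2.0)]  # hypothetical CIs
for lo, hi in examples:
    print((lo, hi), significant_difference(lo, hi), classify_interval(lo, hi, D))
```

The second hypothetical interval shows that a statistically significant difference and equivalence within the margin can both hold for the same study.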

Paired mean comparisons
Serum cholesterol in mmol/L
Difference between baseline and end of 4 weeks

Subject   chol (baseline)   chol (4 wks)   difference (di)
1         9.0               6.5            2.5
2         7.1               6.3            0.8
3         6.9               5.9            1.0
4         6.9               4.9            2.0
5         5.9               4.0            1.9
6         5.4               4.9            0.5
mean      6.87              5.42           1.45
SD        1.24              0.97           0.79
SE        0.51              0.40           0.32

Difference (baseline - 4 weeks) = amount lowered: d = 1.45 mmol/L
SD = 0.79 mmol/L
SEd = 0.79/√6 = 0.323 mmol/L, df = 6 - 1 = 5, t0.975 = 2.571

95% CI: 1.45 ± 2.571 (0.323) = 1.45 ± 0.830 or (0.62 mmol/L, 2.28 mmol/L)
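The paired summary can be recomputed from the six subjects' values. The sketch below works on the within-subject differences, which is the point of a paired analysis. The tabled t value of 2.571 for df = 5 is taken from the text, and the variable names are illustrative.

```python
import math
import statistics

# Serum cholesterol (mmol/L) for the six subjects in the table.
baseline = [9.0, 7.1, 6.9, 6.9, 5.9, 5.4]
week4 = [6.5, 6.3, 5.9, 4.9, 4.0, 4.9]

# Paired analysis: reduce each subject to one difference first.
diffs = [b - w for b, w in zip(baseline, week4)]

d_bar = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)            # sample SD (n - 1 in the denominator)
se_d = sd_d / math.sqrt(len(diffs))

T_CRIT = 2.571                            # tabled t, 97.5th percentile, df = 5
ci = (d_bar - T_CRIT * se_d, d_bar + T_CRIT * se_d)
t_obs = d_bar / se_d                      # distance from zero in SE units

print(f"mean difference = {d_bar:.2f}, SD = {sd_d:.2f}, SE = {se_d:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t_obs = {t_obs:.2f}")
```

Note that the SE of the differences (about 0.32) is smaller than the SE of either column of raw values, because pairing removes between-subject variation.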

tobs = 1.45 / 0.323 = 4.49, p value < 0.01

Confidence intervals and hypothesis tests
Confidence intervals are of the form

Sample Statistic +/- (Zpercentile*) (Standard error)

Lower bound = Sampl