Biostatistics - Normal Distribution

Biostatistics Lecture 6

Understanding normal distribution in biostatistics briefly at

Transcript of Biostatistics - Normal Distribution

Slide 1

Biostatistics
Lecture 6

Lecture 5 Review - The normal curve
[Figure: histogram of Body Mass Index (kg/m²) against percentage, with a normal curve; μ = 27.2 kg/m², σ = 3.5 kg/m²]
μ: Population mean. Measure of central tendency (middle of the distribution).
σ: Population standard deviation. Measure of spread (variability).

The standard normal distribution
The standard normal distribution has:
  - Mean = 0
  - Standard deviation = 1
Any normally distributed variable can be related to the standard normal distribution by subtracting the mean from each observation and dividing by the standard deviation.
Standard normal deviate: z score = (x − μ) / σ

Why is the standard normal distribution useful?
Existing tables of values for the standard normal distribution make it easy to answer questions about normally distributed data without complicated calculations.

What proportion of Melbourne men are obese (BMI > 30)?
Melbourne men: μ = 27.2 kg/m², σ = 3.5 kg/m²
Standard normal deviate: z score = (30 − 27.2) / 3.5 = 0.8
From standard tables: tail area = 21%
Percentage of Melbourne men who are obese is 21%.
[Figure: histogram of Body Mass Index (kg/m²) against percentage, with the tail above 30 marked]

Reference range
[Figure: histogram of total body surface area (m²) against percent, with the 95% reference range marked from x̄ − (1.96 × s) = 0.73 m² to x̄ + (1.96 × s) = 0.97 m²]

Reference range
  - Tells us about the variability between individual observations in the population
  - Assumes the distribution of the data is approximately normal
  - 95% reference range: 95% of the individual observations in a population lie in this range

Lecture 6: Confidence interval for a mean and comparison of two means
Confidence intervals
  - For large samples
  - For small samples
Difference between confidence interval and reference range
Comparison of two means (unpaired)
  - Large samples & small samples
Comparison of two means (paired)
Kirkwood BL, Sterne JAC. Chapters 6 & 7

Example: Insecticide required for malaria control
Data: Sprayable area (m²) was measured in a sample of n = 100 houses.
  - Sample mean, x̄ = 24.2 m²
  - Sample standard deviation, s = 5.9 m²
Goal: We want to estimate how much insecticide is required to spray 10,000 Thai houses.
Question: What is the population mean sprayable area of houses in Thailand?

Inference
Population: mean sprayable area = ??? m²
Sample (n = 100 houses): sample mean = 24.2 m², sample standard deviation = 5.9 m²
The sample is used to calculate a 95% confidence interval for the population mean.

Sampling distribution of sample means
Suppose we:
  - collected many samples, each including 100 houses
  - calculated the mean sprayable area for each sample
We could graph the distribution of sample means:
[Figure: histogram of mean sprayable area from different samples (m²) against percent; our sample: x̄ = 24.2 m²]
Population values: μ = 24.9, σ = 5.2

Sampling variation of the sample mean (2)
[Figure: histogram of mean sprayable area from different samples (m²) against percent]
The sample means approximately follow a normal distribution.

Sampling distribution of sample means (3)
[Figure: histogram of mean sprayable area from different samples (m²) against percent, with μ and s.e. marked]
On average, the sample means are equal to the population mean μ.
The standard deviation of the sample means is called the standard error (s.e.) of the mean: s.e. = σ / √n

Sampling variation of the sample mean (4)
[Figure: histogram of mean sprayable area from different samples (m²) against percent, with μ − 1.96 s.e. and μ + 1.96 s.e. marked]
95% of the sample means lie within a distance of 1.96 × s.e. of the population mean μ.

Calculating a 95% confidence interval for a mean
So in 95% of the possible samples that we could have collected, the interval
  x̄ − 1.96 s.e. to x̄ + 1.96 s.e.
contains the (unknown) population mean μ.
95% confidence interval for the mean:
  - A range of values that we are 95% confident contains the true population mean
  - A range of plausible values for the population mean

Example: Insecticide required for malaria control
Question: What is the population mean sprayable area?
Data: n = 100, x̄ = 24.2 m², s = 5.9 m²
Estimated standard error: s.e. = s / √n = 5.9 / √100 = 0.59
95% confidence interval:
  x̄ − 1.96 s.e. to x̄ + 1.96 s.e.
  24.2 − 1.96 × 0.59 to 24.2 + 1.96 × 0.59
  23.0 to 25.4
Interpretation: We're 95% confident that the population mean sprayable area lies between 23.0 m² and 25.4 m².

Example: Insecticide required for malaria control in 10,000 houses
Assume one litre of insecticide covers 50 m².
Using the upper confidence limit, insecticide required for 10,000 houses
  = 10,000 × (25.4 / 50)
  = 5080 litres

Calculating a 90% confidence interval for a mean
95% confidence interval: x̄ − 1.96 s.e. to x̄ + 1.96 s.e.
  (1.96 is the two-sided 5% percentage point from the standard normal distribution)
90% confidence interval: x̄ − 1.645 s.e. to x̄ + 1.645 s.e.
  (1.645 is the two-sided 10% percentage point from the standard normal distribution)

Understanding confidence intervals
The population mean (μ) is fixed.
The confidence interval varies between samples.
[Figure: 95% confidence intervals for mean sprayable area (m²) from samples 1 to 20; our sample: x̄ = 24.2 m², 95% CI: 23.0 m² to 25.4 m²]
Population values: μ = 24.9, σ = 5.2

Definition of a confidence interval
If we were to draw several independent, random samples (of equal size) from the same population and calculate 95% confidence intervals for each of them, then on average 19 out of every 20 (95%) such confidence intervals would contain the true population mean, and one of every 20 (5%) would not.

Be careful
CAN'T SAY
  - There is a 95% probability that the population mean is within the confidence interval
CAN SAY
  - There is a 95% probability that the confidence interval contains the population mean
  OR
  - With 95% confidence, the population mean is between the lower and upper limit of the confidence interval

Confidence interval for a small sample
The sample standard deviation (s) may not be a reliable estimate of the population standard deviation (σ) for calculation of the standard error when n < 30 [...] > 30)

95% confidence interval of difference between population means (large samples)
Estimated population mean difference: (x̄₁ − x̄₀)
95% confidence interval:
  From (x̄₁ − x̄₀) − (1.96 × s.e.(x̄₁ − x̄₀))
  to   (x̄₁ − x̄₀) + (1.96 × s.e.(x̄₁ − x̄₀))

Example: Randomised controlled trial of weight loss programmes in the UK
Weight loss after 4 weeks (kg)
Group              n    Sample mean   Sample standard deviation   Sample standard error
Atkins (group 1)   57   4.40          2.45                        0.32
WW (group 0)       58   2.86          2.23                        0.29

Estimated population mean difference: (x̄₁ − x̄₀) = 4.40 − 2.86 = 1.54 kg
s.e.(x̄₁ − x̄₀) = √( (s.e.(x̄₁))² + (s.e.(x̄₀))² ) = √( (0.32)² + (0.29)² ) = 0.43
95% CI:
  (x̄₁ − x̄₀) − 1.96 × s.e.(x̄₁ − x̄₀) to (x̄₁ − x̄₀) + 1.96 × s.e.(x̄₁ − x̄₀)
  1.54 − 1.96 × 0.43 to 1.54 + 1.96 × 0.43
  0.70 to 2.38

Interpretation
We found a difference of 1.54 kg in mean weight loss after 4 weeks between the Atkins & Weight Watchers diet groups.
From the 95% confidence interval, the true difference could be as much as 2.38 kg (much greater weight loss for the Atkins diet) or as little as 0.70 kg (marginally greater weight loss for the Atkins diet compared with Weight Watchers).

Comparison of two means: Extensions
Small samples
  - Use the t distribution instead of the normal distribution
  - The t-distribution uses a different formula for the standard error of the difference between means (e.g. pool sample standard deviations)
Small samples and unequal standard deviations
  - Transform data to make standard deviations similar (and then use the t-distribution)
  - Use non-parametric or bootstrap methods

Comparison of two means: Paired samples
Why worry?
  - Implies a relationship between the measurements
  - Simplify the analysis
Example: What is the difference in homocysteine levels recorded at baseline and 3 weeks after taking daily supplements of 600 µg of folic acid?

Example: Homocysteine levels recorded at baseline and 3 weeks post daily supplements of 600 µg of folic acid
Reduced to a single sample question.
Question: What is the population mean of the differences?
Homocysteine level (µmol/l)
Patient ID   Baseline   3 weeks post intervention   Difference (3 weeks − baseline)
1            11.63      7.35                        −4.28
2            14.13      9.00                        −5.13
3            5.90       5.75                        −0.15
4            12.15      15.05                       2.9
5            18.73      20.95                       2.22
6            10.10      9.05                        −1.05
etc.         ...        ...                         ...
69           11.08      11.20                       0.12

95% confidence interval for a difference in a paired sample
n    Sample mean of differences (µmol/l)   Sample standard deviation (µmol/l)   Sample standard error (µmol/l)
69   −1.35                                 3.65                                 s/√n = 0.44

95% confidence interval for the population mean difference:
  −1.35 − (1.96 × 0.44) to −1.35 + (1.96 × 0.44)
  −2.21 µmol/l to −0.49 µmol/l
We're 95% confident that the population mean difference lies between −2.21 µmol/l and −0.49 µmol/l.

Lecture 6 Objectives
Single mean
  - Calculate and interpret a confidence interval (for large samples)
  - Understand the difference between a confidence interval and reference range
Difference between unpaired & paired data
Comparison of two means (unpaired)
  - Calculate and interpret a confidence interval for the difference between two means (for large samples)
Comparison of two means (paired)
  - Reduce the data to a single sample of differences
  - Calculate and interpret a confidence interval for the difference

Thank You
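
Worked sketch: z score and tail area
The Melbourne BMI example in the lecture reads the tail area from printed tables. As a rough cross-check, the same lookup can be sketched with Python's standard library. The function name upper_tail_area is hypothetical and used only for illustration; the inputs (mean 27.2 kg/m2, standard deviation 3.5 kg/m2, cut-off 30) are the values from the slides.

```python
from statistics import NormalDist


def upper_tail_area(x, mean, sd):
    """Return the z score of x and the proportion of a normal(mean, sd)
    population that lies above x."""
    z = (x - mean) / sd                      # standard normal deviate
    tail = 1.0 - NormalDist().cdf(z)         # area to the right of z
    return z, tail


# Values taken from the lecture's BMI example.
z, tail = upper_tail_area(30, 27.2, 3.5)
print(f"z = {z:.2f}, tail area = {tail:.1%}")  # z = 0.80, tail area = 21.2%
```

The computed tail area of about 21.2% agrees with the 21% quoted from standard tables.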
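
Worked sketch: confidence interval for a single mean
The insecticide example can be reproduced in a few lines. This is a minimal sketch of the large-sample formula from the lecture (sample mean plus or minus a normal multiplier times s/sqrt(n)). The function name mean_confidence_interval is an illustrative assumption, and the multiplier is computed from the standard normal distribution instead of being typed in as 1.96 or 1.645.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Large-sample confidence interval for a population mean."""
    se = sd / sqrt(n)                                  # estimated standard error
    # Two-sided percentage point: about 1.96 for 95%, 1.645 for 90%.
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return mean - multiplier * se, mean + multiplier * se


# Sprayable area data from the lecture: n = 100, mean 24.2 m2, s = 5.9 m2.
low95, high95 = mean_confidence_interval(24.2, 5.9, 100)
low90, high90 = mean_confidence_interval(24.2, 5.9, 100, level=0.90)
print(f"95% CI: {low95:.1f} to {high95:.1f} m2")
print(f"90% CI: {low90:.1f} to {high90:.1f} m2")

# Insecticide for 10,000 houses, using the rounded upper 95% limit and the
# lecture's assumption that one litre covers 50 m2.
litres = 10_000 * round(high95, 1) / 50
print(f"Insecticide required: {litres:.0f} litres")
```

The 90% interval is narrower than the 95% interval because the multiplier is smaller for the same standard error.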
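
Worked sketch: what "19 out of every 20" means
The definition of a confidence interval can be illustrated by simulation. The sketch below repeatedly draws samples of 100 houses and counts how often the 95% interval contains the population mean. It assumes, purely for illustration, that sprayable area is normally distributed with the population values shown on the slides (mean 24.9, standard deviation 5.2); the slides do not state the shape of the population. The function name and the random seed are likewise arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def coverage_of_95_ci(mu, sigma, n, n_samples, seed=0):
    """Proportion of simulated 95% confidence intervals that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_samples):
        # Assumed normal population; one simulated sample of n houses.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        se = stdev(sample) / sqrt(n)   # estimated standard error, s / sqrt(n)
        if x_bar - 1.96 * se <= mu <= x_bar + 1.96 * se:
            hits += 1
    return hits / n_samples


coverage = coverage_of_95_ci(mu=24.9, sigma=5.2, n=100, n_samples=1000)
print(f"Proportion of intervals containing the population mean: {coverage:.3f}")
```

The proportion should come out close to 0.95, with some run-to-run variation if the seed is changed. The population mean stays fixed throughout; only the intervals move.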
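
Worked sketch: difference between two means (unpaired and paired)
The two comparison examples use the same idea with different standard errors. The sketch below uses hypothetical helper names and the summary statistics from the slides. For the paired example it takes the mean difference as -1.35 µmol/l (3 weeks minus baseline), a sign inferred from the quoted interval limits and the listed patient differences.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 5% percentage point of the standard normal distribution


def unpaired_difference_ci(mean1, se1, mean0, se0, z=Z_95):
    """Large-sample CI for the difference between two independent means."""
    diff = mean1 - mean0
    se_diff = sqrt(se1 ** 2 + se0 ** 2)   # standard errors combine in quadrature
    return diff, diff - z * se_diff, diff + z * se_diff


def paired_difference_ci(mean_diff, sd_diff, n, z=Z_95):
    """Large-sample CI for the mean of within-pair differences."""
    se = sd_diff / sqrt(n)                # single-sample standard error
    return mean_diff - z * se, mean_diff + z * se


# Weight loss trial: Atkins (group 1) versus Weight Watchers (group 0).
diff, low, high = unpaired_difference_ci(4.40, 0.32, 2.86, 0.29)
print(f"Difference {diff:.2f} kg, 95% CI {low:.2f} to {high:.2f} kg")

# Homocysteine: 69 patients, mean difference -1.35, standard deviation 3.65.
p_low, p_high = paired_difference_ci(-1.35, 3.65, 69)
print(f"Paired 95% CI {p_low:.2f} to {p_high:.2f} µmol/l")
```

The unpaired limits can differ from the slide values (0.70 and 2.38) by about 0.01, because the slides round the standard error to 0.43 before multiplying.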