Confidence Interval Estimation For statistical inference in decision making:
Slide 1
Confidence Interval Estimation For statistical inference in decision making

Slide 2 Objectives
- Central Limit Theorem
- Confidence Interval Estimation of the Mean (σ known)
- Interpretation of the Confidence Interval
- Confidence Interval Estimation of the Mean (σ unknown)
- Confidence Interval Estimation for the Proportion
- Determining Sample Size

Slide 3 Central Limit Theorem
Regardless of the shape of the population distribution, the distributions of sample means and sample proportions approach normal distributions as the sample size increases. With sufficiently large samples, they are approximately normal.

Slide 4 Central Limit Theorem in action [figure]

Slide 5 How large must a sample be for the Central Limit Theorem to apply?
The required sample size depends on the shape of the population: the more skewed the population, the larger the sample needed. For our use, however, a sample size of 30 or larger will suffice.

Slide 6 Must sample sizes be 30 or larger for populations that are normally distributed?
No. If the population is normally distributed, the sample means are normally distributed for sample sizes as small as n = 1.

Slide 7 Why not just always pick a sample size of 30?

Slide 8 How can I tell the shape of the underlying population?
CHECK FOR NORMALITY using descriptive statistics:
- Construct stem-and-leaf plots for small or moderate-sized data sets, and frequency distributions and histograms for large data sets.
- Compute measures of central tendency (mean and median) and compare them with the theoretical and practical properties of the normal distribution (in a normal distribution, the mean and median are equal).
- Compute the interquartile range. Is it approximately 1.33 times the standard deviation?
- How are the observations in the data set distributed?
  - Do approximately two-thirds of the observations lie within ±1 standard deviation of the mean?
  - Do approximately four-fifths of the observations lie within ±1.28 standard deviations of the mean?
  - Do approximately 19 out of every 20 observations lie within ±2 standard deviations of the mean?
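The Slide 8 checks can be scripted. The sketch below is a minimal illustration in Python (3.8 or later, standard library only); the function name `normality_checks` is hypothetical, and the numbers it reports are the rough descriptive screens listed above, not a formal test of normality.

```python
import statistics


def normality_checks(data):
    """Rough descriptive screens for normality (Slide 8), not a formal test."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)                   # sample standard deviation
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (Python 3.8+)
    n = len(data)

    def share_within(k):
        # Fraction of observations lying within mean +/- k standard deviations
        return sum(abs(x - mean) <= k * sd for x in data) / n

    return {
        "mean": mean,                           # should be close to the median
        "median": median,
        "iqr_over_sd": (q3 - q1) / sd,          # about 1.33 to 1.35 if normal
        "within_1sd": share_within(1),          # about two-thirds
        "within_1.28sd": share_within(1.28),    # about four-fifths
        "within_2sd": share_within(2),          # about 19 out of 20
    }
```

For example, applied to a few thousand values drawn from a normal distribution, the IQR-to-standard-deviation ratio should come out near 1.35 and the three shares near .68, .80 and .95; values far from these suggest the population is not normal.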
Slide 9 Why do I care whether X-bar, the sample mean, is normally distributed?

Slide 10 Because I want to use Z scores to analyze sample means, and Z scores give valid normal-table probabilities only when the values being standardized are normally distributed. That's where the Central Limit Theorem steps in. Recall that the Central Limit Theorem states that sample means are approximately normally distributed, regardless of the shape of the underlying population, if the sample size is sufficiently large.

Slide 11 Recall from Chapter 5: \( Z = \frac{X - \mu}{\sigma} \)
If sample means are normally distributed, the Z score formula applied to sample means would be:
\( Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} \)

Slide 12 Background
To determine \( \mu_{\bar{X}} \), we would need to randomly draw all possible samples of the given size from the population, compute the sample means, and average them. This task is unrealistic. Fortunately, \( \mu_{\bar{X}} \) equals the population mean \( \mu \), which is easier to access.
Likewise, to compute \( \sigma_{\bar{X}} \), we would have to take all possible samples of a given size from a population, compute the sample means, and determine the standard deviation of those sample means. This task is also unrealistic. Fortunately, \( \sigma_{\bar{X}} \) can be computed by dividing the population standard deviation by the square root of the sample size: \( \sigma_{\bar{X}} = \sigma/\sqrt{n} \).

Slide 13 Note: As the sample size increases, the standard deviation of the sample means becomes smaller and smaller, because the population standard deviation is divided by ever larger values of \( \sqrt{n} \).

Slide 14 The ultimate benefit of the Central Limit Theorem is a useful version of the Z formula for sample means.

Slide 15 Z Formula for Sample Means:
\( Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \)

Slide 16 Example: The mean expenditure per customer at a tire store is $85.00, with a standard deviation of $9.00. If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more?
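The Slide 15 formula can be applied to this example in a few lines. The following is a sketch in Python that uses the standard library's `statistics.NormalDist` in place of a printed Z table; the helper name `prob_sample_mean_at_least` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_at_least(x_bar, mu, sigma, n):
    """Return (Z, P(X-bar >= x_bar)) using Z = (X-bar - mu) / (sigma / sqrt(n))."""
    std_error = sigma / sqrt(n)          # sigma_X-bar = sigma / sqrt(n)
    z = (x_bar - mu) / std_error
    return z, 1 - NormalDist().cdf(z)    # upper-tail area beyond z


z, p = prob_sample_mean_at_least(87.00, 85.00, 9.00, 40)
print(f"Z = {z:.2f}, P(X-bar >= $87.00) = {p:.4f}")
```

Because the code does not round Z to 1.41 before finding the area, it reports an upper-tail probability of about .0799 rather than the table value of .0793 worked out by hand below; `1 - NormalDist().cdf(1.41)` reproduces the table figure.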
Slide 17 Because the sample size is greater than 30, the Central Limit Theorem says the sample means are approximately normally distributed.
\( Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\$87.00 - \$85.00}{\$9.00/\sqrt{40}} = \frac{\$2.00}{\$1.42} = 1.41 \)

Slide 18 For Z = 1.41, the Z distribution table gives a probability of .4207. This is the probability of getting a sample mean between the population mean, $85.00, and $87.00. Solving for the tail of the distribution yields .5000 - .4207 = .0793. This is the probability that X-bar ≥ $87.00.

Slide 19 Interpretations
Therefore, 7.93% of the time, a random sample of 40 customers from this population will yield a mean expenditure of $87.00 or more.
OR
From any random sample of 40 customers, 7.93% of them will spend on average $87.00 or more.

Slide 20 Interpretations
Therefore, 7.93% of the time, a random sample of 40 customers from this population will yield a mean expenditure of $87.00 or more.
The second interpretation in Slide 19 is not correct: .0793 describes how often the mean of a 40-customer sample reaches $87.00, not the fraction of customers within a sample who spend that much.

Slide 21 Solve: Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers. What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?

Slide 22 Statistical Inference

Slide 23 Statistical inference facilitates decision making.

Slide 24 Using sample data, we can estimate characteristics of our population, such as its average value μ, by using the corresponding sample mean, X-bar.

Slide 25 Recall that μ, the population mean to be estimated, is a parameter, while X-bar, the sample mean, is a statistic.

Slide 26 Point Estimate
A point estimate is a statistic taken from a sample and used to estimate a population parameter. However, a point estimate is only as good as the sample it represents. If other random samples are taken from the population, the point estimates derived from those samples are likely to vary.
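The variation described in Slide 26, together with the Slide 12 results \( \mu_{\bar{X}} = \mu \) and \( \sigma_{\bar{X}} = \sigma/\sqrt{n} \), can be checked by simulation. The sketch below assumes an exponential population (strongly right-skewed, with \( \sigma = \mu \)) and borrows μ = 85 and n = 40 from the tire-store example purely for illustration; the function name `simulate_sample_means`, the seed and the number of repetitions are arbitrary, hypothetical choices.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(pop_mean, n, reps, seed=0):
    """Draw `reps` samples of size n from an exponential population with mean
    pop_mean (an assumed, right-skewed population) and return the sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1 / pop_mean) for _ in range(n))
        for _ in range(reps)
    ]


mu = sigma = 85.0   # for an exponential population, sigma equals mu
n = 40
means = simulate_sample_means(mu, n, reps=5000)

print("first five point estimates:", [round(m, 2) for m in means[:5]])
print("mean of the sample means:", round(statistics.fmean(means), 2), "vs mu =", mu)
print("sd of the sample means:", round(statistics.stdev(means), 2),
      "vs sigma / sqrt(n) =", round(sigma / sqrt(n), 2))
```

Each entry of `means` is a different point estimate of the same μ. Their average should land close to 85 and their standard deviation close to 85/√40 ≈ 13.44, even though the population itself is far from normal.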
Because of this variation in sample statistics, estimating a population parameter with a confidence interval is often preferable to using a point estimate.

Slide 27 Confidence Interval
A confidence interval is a range of values within which, with some stated level of confidence, the population parameter is estimated to lie. Confidence intervals can be one- or two-tailed.

Slide 28 Confidence Interval to Estimate μ
By rearranging the Z formula for sample means, a confidence interval formula is constructed:
\( \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)
where:
α = the area under the normal curve outside the confidence interval
α/2 = the area in one tail of the distribution outside the confidence interval

Slide 29 The confidence interval formula yields a range (interval) within which we feel, with some confidence, the population mean is located. It is not certain that the population mean lies in the interval unless we use a 100% confidence interval, which is infinitely wide, so wide that it is meaningless.

Slide 30 [Figure: Confidence interval estimates for five different samples of n = 25, taken from a population where μ = 368 and σ = 15]

Slide 31 Common confidence levels used by analysts are 90%, 95%, 98%, and 99%.

Slide 32 95% Confidence Interval
For 95% confidence, α = .05 and α/2 = .025. The value of \( Z_{.025} \) is found by looking in the standard normal table under .5000 - .025 = .4750. This area in the table is associated with a Z value of 1.96.
An alternate method: multiply the confidence level, 95%, by ½, since the distribution is symmetric and the intervals are equal on each side of the population mean. (½)(95%) = .4750, the area on each side of the mean, which has a corresponding Z value of 1.96.

Slide 33 In other words, of all the possible X-bar values along the horizontal axis of the normal distribution curve, 95% of them should lie within ±1.96 standard deviations (Z = ±1.96) of the mean.
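In place of the table lookup in Slide 32, the critical value \( Z_{\alpha/2} \) can be computed from the inverse of the standard normal cumulative distribution function. A minimal Python sketch, using the standard library's `statistics.NormalDist`; the function name `z_critical` is hypothetical:

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence                       # area outside the interval
    return NormalDist().inv_cdf(1 - alpha / 2)   # leaves alpha/2 in the upper tail


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")
```

Running it prints 1.645, 1.960, 2.326 and 2.576 for the four common confidence levels in Slide 31. Asking for the point with area 1 - α/2 = .975 to its left is the same as looking up the area .4750 between the mean and Z.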
Slide 34 Margin of Error
\( Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)

Slide 35 Example: A business analyst for a cellular telephone company takes a random sample of 85 bills for a recent month and from these bills computes a sample mean of 153 minutes. If the company uses the sample mean of 153 minutes as an estimate of the population mean, then the sample mean is being used as a POINT ESTIMATE. Past history and similar studies indicate that the population standard deviation is 46 minutes. The value of Z is determined by the level of confidence desired. A confidence level of 95% has been selected.

Slide 36
\( 153 \pm 1.96 \frac{46}{\sqrt{85}} = 153 \pm 9.78 \), so \( 143.22 \le \mu \le 162.78 \)
The confidence interval is constructed from the point estimate, 153 minutes, and the margin of error of this estimate, ±9.78 minutes. The resulting confidence interval is 143.22 ≤ μ ≤ 162.78. The cellular telephone company's business analyst is 95% confident that the population average number of minutes per monthly bill is between 143.22 and 162.78 minutes.

Slide 37 Interpreting a Confidence Interval
For the previous 95% confidence interval, the following conclusions are valid:
- I am 95% confident that the population average number of minutes per monthly bill, μ, lies between 143.22 and 162.78 minutes.
- If I repeatedly obtained samples of size 85, then 95% of the resulting confidence intervals would contain μ and 5% would not.
QUESTION: Does this confidence interval [143.22 to 162.78] contain μ?
ANSWER: I don't know. All I can say is that this procedure leads to an interval containing μ 95% of the time.
- I am 95% confident that my estimate of μ [namely 153 minutes] is within 9.78 minutes of the actual value of μ. RECALL: 9.78 is the margin of error.

Slide 38 Be Careful!
The following statement is NOT true: "The probability that μ lies between 143.22 and 162.78 is .95." Once you have inserted your sample results into the confidence interval formula, the word PROBABILITY can no longer be used to describe the resulting confidence interval: μ is a fixed value, so this particular interval either contains it or does not.
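The cellular telephone example and the repeated-sampling interpretation in Slide 37 can both be reproduced in a short Python sketch. The helper `z_interval` is hypothetical; the coverage check borrows the population from Slide 30 (μ = 368, σ = 15, n = 25) and additionally assumes that population is normal, with an arbitrary seed and repetition count.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Sigma-known interval for mu: X-bar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)            # margin of error
    return x_bar - margin, x_bar + margin, margin


# Slides 35-36: n = 85 bills, X-bar = 153 minutes, sigma = 46 minutes
low, high, moe = z_interval(153, 46, 85)
print(f"{low:.2f} to {high:.2f} (margin of error {moe:.2f})")

# Repeated-sampling check (Slide 37) using the Slide 30 population,
# assumed here to be normal.
rng = random.Random(42)
mu, sigma, n, reps = 368, 15, 25, 10_000
hits = 0
for _ in range(reps):
    sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
    lo_i, hi_i, _ = z_interval(sample_mean, sigma, n)
    hits += lo_i <= mu <= hi_i              # True counts as 1
print(f"coverage: {hits / reps:.3f}")
```

The first line printed is 143.22 to 162.78 (margin of error 9.78), matching Slide 36. The coverage line reports the fraction of the 10,000 simulated intervals that contain μ; it should be close to .95, which is what "95% confident" refers to, while any single interval either contains μ or does not.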