Two statistical inference methods: Confidence interval Estimator +/- Margin of Error Hypothesis...


Ch7 Scatterplots, Association, and Correlation

Review for Final Exam

Two statistical inference methods:
  Confidence interval: Estimator +/- Margin of Error
  Hypothesis testing: Hypotheses (H0 vs. Ha), Test statistic, P-value, Conclusion

[Flowchart: Question -> parameter (p or $\mu$) -> C.I. or Test -> one-sample (1-S) or two-sample (2-S)]

Inference about population proportion p

Confidence interval: A level C confidence interval for p is given by

$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$

where z* is the z-critical value corresponding to the confidence level C, n is the sample size, and $\hat{p}$ is the sample proportion. The quantity $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of $\hat{p}$.

The level C confidence interval for a population proportion p will have margin of error approximately equal to a specified value m when the sample size is

$n = (z^*/m)^2 \, p^*(1-p^*)$

where p* is a guessed value for the sample proportion. The margin of error will be at most m if p* is taken to be 0.5.
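The two formulas for a proportion (the interval and the required sample size) can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers in the usage lines (95% confidence with z* = 1.96, a sample proportion of 0.40 from 500 observations, a target margin of 0.03) are assumptions made up for the example and do not come from the slides.

```python
import math

def proportion_ci(p_hat, n, z_star):
    """Level C interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

def sample_size(m, z_star, p_star=0.5):
    """Smallest whole n whose margin of error is at most m for the guess p_star."""
    return math.ceil((z_star / m) ** 2 * p_star * (1 - p_star))

# Hypothetical inputs, chosen only to exercise the formulas.
low, high = proportion_ci(p_hat=0.40, n=500, z_star=1.96)
n_needed = sample_size(m=0.03, z_star=1.96)  # conservative guess p* = 0.5

print(f"95% CI: ({low:.4f}, {high:.4f})")
print(f"sample size for margin 0.03: {n_needed}")
```

Rounding the sample size up with `math.ceil` keeps the margin of error at or below m; rounding down would slightly exceed it.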

Inference about population proportion p

Hypothesis testing
Hypotheses: H0: p = p0 vs. Ha: p > p0 / p [...] p0
P-value = [...](z), for Ha: p [...]
P-value = $T_{df}(-t)$, for Ha: [...]

[...] ($\alpha$ = 5%), we do not reject the null hypothesis.
If we concluded that 40% of the online-music-downloaders are in their fifties while in fact this proportion is 35%, then
(A) we made a Type I Error.
(B) we made a Type II Error.
(C) we made a correct decision.

Practice: The safety management of an offshore oil-mining corporation believes that the true average escape time would be at most 340 min. A sample of 28 offshore oil-workers took part in a simulated escape exercise. The sample yielded an average escape time of 347.68 min. and a standard deviation of 26.95 min. Do these data contradict the management's claim?
1. What are the hypotheses in this case?
2. What is the value of the test statistic?
3. What is the P-value of the test?
4. What is your conclusion at $\alpha$ = 5%?
5. What is a 98% confidence interval of the average escape time?

Solution:
The sample: n = 28, $\bar{x}$ = 347.68, s = 26.95.

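1. What are the hypotheses in this case?
H0: $\mu$ = 340 vs. Ha: $\mu$ > 340

2. What is the value of the test statistic?
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{347.68 - 340}{26.95/\sqrt{28}} \approx 1.508$

The test statistic follows a t-distribution with degrees of freedom 28-1=27.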

3. What is the P-value of the test?
According to Ha: $\mu$ > 340, the P-value is the upper-tail probability of the t-distribution with 27 degrees of freedom, which is between 0.05 and 0.10.

4. What is your conclusion?
Since P-value > $\alpha$ ($\alpha$ = 5%), we do not reject the null hypothesis.
If we concluded that the management's claim is correct while in fact average escape time is 340 min., then
(A) we made a Type I Error.
(B) we made a Type II Error.
(C) we made a correct decision.

5. What is a 98% confidence interval of the average escape time?
A level C confidence interval for $\mu$ is given by

$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$

We have t* = 2.473 (corresponding to degrees of freedom 27 and the confidence level 98%); n = 28, s = 26.95, and $\bar{x}$ = 347.68.
So a 98% confidence interval of the average escape time is

$347.68 \pm 2.473 \times \frac{26.95}{\sqrt{28}} \approx 347.68 \pm 12.60 = (335.08, 360.28)$
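The arithmetic of the escape-time example can be checked with a short Python sketch. The sample values and the critical value t* = 2.473 are taken from the problem; the variable names are illustrative assumptions. The P-value itself is not computed here, since the slides read it from a t table.

```python
import math

# Values stated in the escape-time problem.
n, x_bar, s = 28, 347.68, 26.95
mu_0 = 340        # value of mu under H0
t_star = 2.473    # t critical value for df = 27, 98% confidence (from a t table)

standard_error = s / math.sqrt(n)

# One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / standard_error

# 98% confidence interval: x_bar +/- t* * s / sqrt(n)
margin = t_star * standard_error
ci = (x_bar - margin, x_bar + margin)

print(f"t = {t_stat:.3f} with df = {n - 1}")
print(f"98% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```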

Practice: In a test of hypothesis, if we insist on very strong evidence against the null hypothesis we should
(A) choose $\alpha$ to be very small
(B) choose $\alpha$ to be larger than the P-value
(C) choose $\alpha$ to be very large
(D) choose $\alpha$ to be smaller than the P-value

Practice: Based on a random sample of 50 students from among 40,000, a 91 percent confidence interval on the mean height of all 40,000 students was found to be the interval from 66 inches to 69.2 inches. Select the correct statement below:
(A) About 91 percent of all 40,000 students have heights between 66 and 69.2.
(B) About 91 percent of the heights in the sample should be between 66 and 69.2.
(C) The probability that the mean height is between 66 and 69.2 is 91 percent.
(D) About 91 percent of all samples would produce intervals containing $\mu$.

Practice: In a test of hypotheses, data are deemed to be significant at level $\alpha$ = 0.05, but not significant at level $\alpha$ = 0.01. Which of the following is true about the P-value associated with this test?
(A) P-value is greater than 0.05.
(B) P-value is between 0.01 and 0.05.
(C) P-value is less than 0.01.
(D) Nothing can be said.

Sample / Population
Statistics / Parameters
Random sampling design
  Simple random sample (SRS)
  Stratified random sample
  Cluster sample
  Multistage sample
Use random digits to draw simple random samples

Law of large numbers
Probability: Sample space / Events
Rules for probability model:
  for any event A, 0 $\le$ P(A) $\le$ 1.
  for sample space S, P(S) = 1.
  if two events A and B are disjoint, then P(A or B) = P(A) + P(B).
  for any event A, P(A does not occur) = 1 - P(A).
  for two independent events A and B, P(A and B) = P(A) × P(B).
Venn diagram

General Addition Rule: For two events A and B, P(A or B) = P(A) + P(B) - P(A and B).
General Multiplication Rule: For two events A and B, P(A and B) = P(B|A) × P(A).
Conditional probability: P(B|A) = P(A and B) / P(A).
Independence: P(B|A) = P(B).

Random variable: A random variable is a variable whose value is a numerical outcome of a random phenomenon.
Distribution: The probability distribution (distribution) of a random variable tells us what values this random variable can take and how to assign probabilities to those values.

Statistics are random variables.
  Sample proportion
  Sample mean
Central limit theorem
Sampling distributions of statistics

Sampling distribution of the sample proportion $\hat{p}$ for an SRS of size n:
  the mean of $\hat{p}$ equals the population proportion p;
  the standard deviation of $\hat{p}$ equals $\sqrt{p(1-p)/n}$;
  if the sample size is large, then $\hat{p}$ is approximately Normal, that is, $\hat{p} \sim N(p, \sqrt{p(1-p)/n})$ approximately.

Sampling distribution of the sample mean $\bar{x}$ for an SRS of size n:
  the mean of $\bar{x}$ equals the population mean $\mu$;
  the standard deviation of $\bar{x}$ equals $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation;
  if the sample size is large, then $\bar{x}$ is approximately Normal, that is, $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ approximately;
  if the population has a Normal distribution, then $\bar{x}$ is exactly Normal.

Practice: Motor vehicles sold to individuals are classified as either cars or light trucks (including SUVs) and as either domestic or imported. In a recent year, 69% of vehicles sold were light trucks, 78% were domestic, and 55% were domestic light trucks. For a randomly selected vehicle, what is the probability that
1. the vehicle is a car?
2. the vehicle is either domestic or a light truck or both?
3. the vehicle is an imported light truck?
4. the vehicle is domestic, if we know it is a car?

Practice: 56% of all American workers have a workplace retirement plan, 66% have health insurance, and 73% have at least one of the benefits. We select a worker at random.
1. What is the probability that he has both health insurance and a retirement plan?
2. What is the probability that he has neither health insurance nor a retirement plan?
3. What is the probability that he only has a retirement plan?
4. Knowing that he has a retirement plan, what is the probability that he has health insurance?

Solution:
Let A be the event that he has a retirement plan.
Let B be the event that he has health insurance.
Then P(A) = 0.56, P(B) = 0.66, and P(A or B) = 0.73.
[Venn diagram: overlapping events A and B]

1. What is the probability that he has both health insurance and a retirement plan?
P(A and B) = ?
General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
Therefore, P(A and B) = P(A) + P(B) - P(A or B) = 0.56 + 0.66 - 0.73 = 0.49.

2. What is the probability that he has neither health insurance nor a retirement plan?
The probability that he has at least one benefit is 0.73.
Therefore, the probability that he has neither health insurance nor a retirement plan is 1 - 0.73 = 0.27.

3. What is the probability that he only has a retirement plan?
"Only has a retirement plan" means he has a retirement plan but no health insurance (not both).
Therefore, P(he only has a retirement plan) = P(A) - P(A and B) = 0.56 - 0.49 = 0.07.

4. Knowing that he has a retirement plan, what is the probability that he has health insurance?
P(B|A) = P(A and B) / P(A) = 0.49 / 0.56 = 0.875.
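The four answers in the retirement-plan example follow from the general addition rule, the complement rule, and the definition of conditional probability. A minimal Python sketch, with variable names chosen here for illustration, reproduces them from the three probabilities given in the problem:

```python
# Given in the problem: A = has a retirement plan, B = has health insurance.
p_a = 0.56
p_b = 0.66
p_a_or_b = 0.73

# General addition rule rearranged: P(A and B) = P(A) + P(B) - P(A or B)
p_both = p_a + p_b - p_a_or_b

# Complement rule: neither benefit is the complement of "at least one"
p_neither = 1 - p_a_or_b

# Only a retirement plan: in A but not in (A and B)
p_only_a = p_a - p_both

# Conditional probability: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_both / p_a

print(round(p_both, 2), round(p_neither, 2), round(p_only_a, 2), round(p_b_given_a, 3))
```

Since P(B|A) = 0.875 differs from P(B) = 0.66, the two benefits are not independent in this example.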

Practice: Spell-checking software catches "nonword errors" that result in a string of letters that is not a word, as when "the" is typed as "teh". When undergraduates are asked to type a 250-word essay (without spell-checking), the number X of nonword errors has the following distribution:

X             0     1     2     3     >=4
Probability   0.1   0.2   0.3   0.3   ?

For a randomly selected student, what is the probability that
1. he made 4 or more errors?
2. he made at most 1 error?
For four randomly selected students, what is the probability that
3. each of them made no more than 2 errors?
4. at least one of them made an error?

Practice: In a large Statistics lecture, the professor reports that 52% of the students enrolled have never taken a Calculus course, 34% have taken only one semester of Calculus, and the rest have taken two or more semesters of Calculus. The professor randomly assigns students to groups of three to work on a project for the course.
1. What is the probability that the first group member you meet has studied some Calculus?
2. What is the probability that the first group member you meet has studied no more than one semester of Calculus?
3. What is the probability that both of your two group members have studied exactly one semester of Calculus?
4. What is the probability that at least one of your group members has had more than one semester of Calculus?

Solution:
Let A denote the event that a student has never taken a Calculus course.
Let B denote the event that a student has taken only one semester of Calculus.
Let C denote the event that a student has taken two or more semesters of Calculus.
[Diagram: disjoint events A, B, C]

First, we can find the probability that a student has taken two or more semesters of Calculus:
P(C) = 1 - P(A) - P(B) = 1 - 0.52 - 0.34 = 0.14.

1. What is the probability that the first group member you meet has studied some Calculus?
{Some Calculus} = B or C
P(Some Calculus) = P(B or C) = P(B) + P(C) = 0.34 + 0.14 = 0.48.

2. What is the probability that the first group member you meet has studied no more than one semester of Calculus?
C = {a student has taken two or more semesters of Calculus}
$C^c$ = {a student has studied no more than one semester of Calculus}
P(no more than one semester of Calculus) = P($C^c$) = 1 - P(C) = 1 - 0.14 = 0.86.

3. What is the probability that both of your two group members have studied exactly one semester of Calculus?
The two events
A1 = {first member has studied exactly one semester of Calculus}
A2 = {second member has studied exactly one semester of Calculus}
are independent.
Thus, P(both members have studied exactly one semester of Calculus) = P(A1 and A2) = P(A1) × P(A2) = 0.34 × 0.34 = 0.1156.
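The Calculus example uses three rules in a row: probabilities of disjoint categories sum to 1, disjoint events add, and independent events multiply. A minimal Python sketch with illustrative variable names (the treatment of the two group members as independent follows the solution's own assumption):

```python
# Given: 52% no Calculus (A), 34% exactly one semester (B); the rest is C.
p_a = 0.52
p_b = 0.34

# The three categories are disjoint and cover everyone, so they sum to 1.
p_c = 1 - p_a - p_b

# Disjoint events add: "some Calculus" is B or C.
p_some = p_b + p_c

# Complement rule: "no more than one semester" is "not C".
p_at_most_one = 1 - p_c

# Independent events multiply: both group members in category B.
p_both_one_semester = p_b * p_b

print(round(p_c, 2), round(p_some, 2), round(p_at_most_one, 2), round(p_both_one_semester, 4))
```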

4. What is the probability that at least one of your group members has had more than one semester of Calculus?
Let
E = {at least one of your group members has had more than one semester of Calculus}
$E^c$ = {neither of your group members has had more than one semester of Calculus}
E1 = {first member has not had more than one semester of Calculus}
E2 = {second member has not had more than one semester of Calculus}
P($E^c$) = P(E1 and E2) = P(E1) × P(E2) = $(1-0.14)^2$.
P(E) = 1 - P($E^c$) = 1 - $(1-0.14)^2$ = 0.2604.

Practice: A North American roulette wheel has 38 slots, of which 18 are red, 18 are black, and 2 are green. If you bet on red, the probability of winning is 18/38 = .4737. The probability .4737 represents
(A) nothing important, since every spin of the wheel results in one of three outcomes (red, black, or green).
(B) the proportion of times this event will occur in a very long series of individual bets on red.
(C) the fact that you're more likely to win betting on red than you are to lose.
(D) the fact that if you make 100 wagers on red, you'll have 47 or 48 wins.

Practice: A company has developed a new battery, but the average lifetime is unknown. In order to estimate this average, a sample of 100 batteries is tested and the average lifetime of this sample is found to be 250 hours.
Here the population of interest is:
100 batteries, which were tested / average of 250 hours / all newly developed batteries by the company / lifetime of newly developed batteries
Here the sample is:
100 batteries, which were tested / lifetime of newly developed batteries / average of 250 hours / not in the list
What is the parameter of interest in this case?
average lifetime of 100 batteries tested / average of all newly developed batteries by the company / 100 batteries sampled and tested / no parameter is involved in this problem
The 250 hours is the value of:
parameter / statistic / sample / variable

Practice: There are 30 problems in Ch12 in 4 pages and 45 problems in Ch13 in another set of 4 pages. In order to make up a homework set based on chapters 12 and 13, the instructor considers the following different schemes. Identify the sampling scheme employed.
Method 1: Label the 75 problems from 1 through 75, draw 10 numbers at random, and choose the corresponding problems.
  Simple Random Sampling
Method 2: Pick 4 problems from the 30 in chapter 12 and pick 6 problems from the 45 in chapter 13.
  Stratified Random Sampling
Method 3: Pick two pages at random and assign all the problems on those pages.
  Cluster Sampling
Method 4: Pick two pages at random and pick 5 problems at random from each of those two pages.
  Multistage Sampling

Practice: A student group has 8 members:
1. Barrett  2. Chen  3. DeRoos  4. Maceli
5. Pagliarulo  6. Smithson  7. Williams  8. Zachary
Three of them will be selected to participate in a national conference. If we use the following random digits (start from the left) to select a simple random sample of size 3, then who will attend the conference?
2023967 8523610 4317063 5689043 5463038 9406022
A. Barrett, Chen, DeRoos
B. Chen, Chen, DeRoos
C. Chen, DeRoos, Smithson
D. Chen, Pagliarulo, Williams

Data / Data table
Cases
Variables (Categorical / Quantitative)
Display Categorical Variables
  Frequency Table / Relative Frequency Table
  Bar Chart / Relative Frequency Bar Chart / Pie Chart

Graphic techniques for displaying quantitative variables:
  Histograms
  Stem-and-leaf displays
Shape of distributions:
  Unimodal / Bimodal / Multimodal / Uniform
  Symmetric / Skewed to the left / Skewed to the right
  Outlier

Numerical descriptions for the distribution of a quantitative variable:
The center of a distribution
  Mean
  Median
The spread of a distribution
  Standard deviation
  Interquartile Range (IQR)
Five number summary / Outlier (1.5 × IQR rule)
Boxplot

Shifting and rescaling of quantitative variables
Standardization of quantitative variables (z-score): $z = (x - \text{mean}) / \text{standard deviation}$

The Normal model
  Mean and standard deviation
  68-95-99.7 rule
  Two types of problems:
    Find percentage
    Find percentiles

Scatterplot for two quantitative variables
  Direction: positive / negative
  Form: linear / curved / no pattern
  Strength: strong / moderate / weak
Correlation coefficient r

Linear models: $\hat{y} = b_0 + b_1 x$

Least squares regression line: $b_1 = r \, s_y / s_x$, $b_0 = \bar{y} - b_1 \bar{x}$

Predictions and residuals: residual $= y - \hat{y}$

Practice: The mean height of American women in their early twenties is about 64.5 inches and the standard deviation is about 2.5 inches. The mean height of men the same age is about 68.5 inches, with standard deviation about 2.7 inches. If the correlation between the heights of husbands and wives is about r = 0.5, what is the equation of the regression line of the husband's height on the wife's height in young couples? Predict the height of the husband of a woman who is 67 inches tall. What percentage of variation in husband's height is explained by wife's height?

Practice: Michigan State University researchers want to investigate how rainfall affects the yield of crops in East Lansing. The researchers found that the average amount of rainfall over the past 20 years is about 230 inches and the standard deviation is about 10 inches. The average yield of crops in East Lansing is about 280 tons with a standard deviation of 20 tons. The correlation between the amount of rainfall and yield of crops is about 0.4.
1. What is the slope of the regression line of yield of crop on amount of rainfall?
2. What is the intercept of the appropriate regression line?
3. What is the predicted value of the yield of crop when the amount of rainfall is 240 inches? If the actual yield of crop of the year with rainfall 240 inches is 280, what is the residual?
4. What percentage of variation in crop yield is explained by the rainfall?

Solution:
1. What is the slope of the regression line of yield of crop on amount of rainfall?
The slope is given by $b_1 = r \, s_y / s_x$.
Here r = 0.4, $s_y$ = 20, and $s_x$ = 10.
Thus the slope is $b_1 = 0.4 \times 20 / 10 = 0.8$.

2. What is the intercept of the appropriate regression line?
The intercept is given by $b_0 = \bar{y} - b_1 \bar{x}$.
Here $\bar{y}$ = 280, $\bar{x}$ = 230, and $b_1$ = 0.8.
Thus the intercept is $b_0 = 280 - 0.8 \times 230 = 96$.

3. What is the predicted value of the yield of crop when the amount of rainfall is 240 inches? If the actual yield of crop of the year with rainfall 240 inches is 280, what is the residual?
The predicted value is $\hat{y} = 96 + 0.8 \times 240 = 288$.
The residual is $y - \hat{y} = 280 - 288 = -8$.

4. What percentage of variation in crop yield is explained by the rainfall?
The quantity $r^2$ tells us the fraction of the variation in the response variable that is explained by the regression on the explanatory variable. In this case, $r^2 = 0.4^2 = 0.16$, that is, 16%.
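The rainfall example needs only the two means, the two standard deviations, and the correlation to build the least squares line. The sketch below recomputes the slope, intercept, prediction, residual, and $r^2$ from the numbers in the problem; the variable and function names are illustrative assumptions.

```python
# Summary statistics stated in the rainfall problem.
x_bar, s_x = 230, 10   # rainfall: mean and standard deviation
y_bar, s_y = 280, 20   # crop yield: mean and standard deviation
r = 0.4                # correlation between rainfall and yield

# Least squares line of y on x: slope = r * s_y / s_x, intercept = y_bar - slope * x_bar
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

def predict(x):
    """Predicted yield for rainfall x on the fitted line."""
    return intercept + slope * x

predicted = predict(240)
residual = 280 - predicted    # observed minus predicted
r_squared = r ** 2            # fraction of variation in yield explained

print(f"slope={slope:.1f}, intercept={intercept:.1f}")
print(f"predicted={predicted:.1f}, residual={residual:.1f}, r^2={r_squared:.2f}")
```

Calling `predict(x_bar)` returns the mean yield, which reflects the fact that the least squares line always passes through the point of means.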

Practice: In a population of couples, the average height of the wives was 65.2 inches and that of the husbands 68.2 inches. You use the regression line to make predictions of the wife's height from the husband's height. Suppose a husband has height 68.2 inches. What would be the predicted height of the wife?

Solution: The regression line satisfies $\bar{y} = b_0 + b_1 \bar{x}$, that is, it passes through the point of means $(\bar{x}, \bar{y})$.

Since the husband's height (68.2 inches) is the same as the average height of husbands, the predicted height of the wife is the average height of wives, that is, 65.2 inches.

Practice: A regression study on obesity shows that doing more physical exercise reduces weight. In this study, time spent in physical exercise explained 16% of the total sample variation in weight among obese people. What is the correlation between "time spent in physical exercise" and "weight"?

Solution: The quantity $r^2$ tells us the fraction of the variation in the response variable that is explained by the regression on the explanatory variable. In this case, $r^2$ = 0.16, so r = 0.4 or r = -0.4. Because more exercise goes with lower weight, the association is negative, so the correlation is r = -0.4.

Practice: Suppose that in families with 5 children X is the number of boys and Y is the number of girls. What is the correlation between X and Y?

Solution: Since X + Y = 5, or equivalently Y = 5 - X, X and Y are perfectly linearly related with a negative slope. Therefore, the correlation between X and Y is -1.

Practice: Which scatterplot has correlation near zero?
[Scatterplots not reproduced]

Practice: In a photographic process, the developing times of prints are approximately Normal with mean 15.4 seconds and standard deviation 0.4 seconds.
1. What proportion of prints will take at least 14.64 sec to develop?
2. What proportion of prints will take 14.64 sec to 16.00 sec to develop?
3. How many seconds are needed at most for the quickest 10%?

Solution:
1. What proportion of prints will take at least 14.64 sec to develop?
The z-score corresponding to 14.64 is

$z = (14.64 - 15.4) / 0.4 = -1.9$

The probability corresponding to z-score -1.9 is 0.0287.
Therefore, the proportion of prints that will take at least 14.64 sec to develop is 1 - 0.0287 = 0.9713.

2. What proportion of prints will take 14.64 sec to 16.00 sec to develop?
The z-score corresponding to 16 is

$z = (16 - 15.4) / 0.4 = 1.5$

The probability corresponding to z-score 1.5 is 0.9332.
Therefore, the proportion of prints that will take 14.64 sec to 16.00 sec to develop is 0.9332 - 0.0287 = 0.9045.

3. How many seconds are needed at most for the quickest 10%?
The quickest 10% corresponds to the smallest 10% (less time). The z-score corresponding to probability 0.1 is -1.28.
Therefore, the time needed at most for the quickest 10% is

$x = 15.4 + (-1.28)(0.4) = 14.888$ seconds.
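The table lookups in the developing-time example can be cross-checked with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch, not part of the original slides; the variable names are illustrative, and the software values differ from the table-based answers only in later decimal places because the table rounds z to two decimals.

```python
from statistics import NormalDist

# Developing time: approximately Normal with mean 15.4 s and standard deviation 0.4 s.
develop_time = NormalDist(mu=15.4, sigma=0.4)

# "Find percentage" problems: areas under the Normal curve.
p_at_least_14_64 = 1 - develop_time.cdf(14.64)
p_between = develop_time.cdf(16.00) - develop_time.cdf(14.64)

# "Find percentile" problem: time below which the quickest 10% of prints fall.
cutoff_quickest_10 = develop_time.inv_cdf(0.10)

print(f"P(X >= 14.64)       = {p_at_least_14_64:.4f}")
print(f"P(14.64 <= X <= 16) = {p_between:.4f}")
print(f"10th percentile     = {cutoff_quickest_10:.3f} s")
```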

Practice:
[Boxplot not reproduced]
1. Which seems to be the likely value of Q1 (the first quartile)? 22
2. Which seems to be the likely value of the median? 48
3. What percentage of the observations is lying outside the box? 50%
4. What is the approximate value of the range? 110 - 5 = 105

Practice: The following stem-and-leaf display shows the number of patients attended by a house-physician in 15 randomly selected weeks:

Stem | Leaf
-----+---------------
  0  | 8 9
  1  | 3 4 6 6 6 8 8
  2  | 0 1 2 4
  3  | 0 6

Here 0|8 implies 8, 1|3 implies 13, etc. (i.e. the stem represents tens and the leaf represents units).
1. Which observation occurred most? 16
2. In how many weeks did the physician have to attend between 15 and 25 patients? 9
3. What are the median, Q1, and Q3? Median: 18; Q1: 14; Q3: 22
4. What is the IQR? IQR = Q3 - Q1 = 22 - 14 = 8
5. Are there any outliers? 36 is an outlier, since it lies above Q3 + 1.5 × IQR = 22 + 12 = 34.

Practice: What are the mean and standard deviation of the data set {34, 40, 43, 55}?

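Solution:
Mean: $\bar{x} = (34 + 40 + 43 + 55) / 4 = 172 / 4 = 43$

Standard deviation: $s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)} = \sqrt{234 / 3} = \sqrt{78} \approx 8.83$

x                    34    40    43    55    sum
$x - \bar{x}$        -9    -3     0    12
$(x - \bar{x})^2$    81     9     0   144    234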
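The deviation table for the data set {34, 40, 43, 55} can be rebuilt step by step in Python and compared with the standard library's `statistics.stdev`, which uses the sample (n - 1) formula. The sketch assumes the sample formula is the one intended, which matches the sum of squares 234 shown in the table; variable names are illustrative.

```python
import math
from statistics import mean, stdev

data = [34, 40, 43, 55]

# Step-by-step version, mirroring the deviation table.
x_bar = mean(data)
deviations = [x - x_bar for x in data]
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)
s_by_hand = math.sqrt(sum_of_squares / (len(data) - 1))

# Library version: sample standard deviation (divides by n - 1).
s_library = stdev(data)

print("mean:", x_bar)
print("deviations:", deviations)
print("squared deviations:", squared_deviations, "sum:", sum_of_squares)
print(f"s = {s_by_hand:.3f}")
```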

Practice: An airline company keeps track of the delay in its flights. Generally most flights have small delays but there are a few flights with very long delays. A consumer group claims that the "average" delay is 740 minutes while the airline company claims that the average is only 260 minutes. Why is there a difference?

Solution: The consumer group refers to the mean while the company refers to the median. The distribution is skewed to the right, so the mean is larger than the median.

Practice: To decide whether to provide electrical power using overhead lines or underground lines, the state administration has to consider the total length of street (measured in miles) in each subdivision of the state. Below is the histogram of street lengths of 47 subdivisions in a state.

[Histogram not reproduced: street lengths of 47 subdivisions, with the class containing the median marked]

1. What is plotted along the Y-axis (the vertical axis)? Number of subdivisions
2. How many subdivisions have total length of street between 2000 and 4000 miles? 10 + 7 = 17
3. What percent of subdivisions have total length less than 1000 miles? 12/47 = 25.5%
4. Which seems more likely to be true? Mean = Median; Mean < Median; Mean > Median
5. Which class will the median street length be in? The median is the 24th observation.

Practice: In order to plan transportation and parking needs, the administration of a private high school asked students how they get to school. Some rode a school bus, some rode in with parents or friends, and others used "personal" transportation: bikes, skateboards, or just walking. The following table summarizes the responses from boys and girls.

        Boy   Girl
Bus      35     32
Ride     35     47

Solution:
1. How many students took part in the survey? 35 + 35 + 32 + 47 = 149.
2. What percentage of students surveyed are girls? (32 + 47)/149 = 53.0%.
3. What percentage of students take the school bus? (35 + 32)/149 = 45.0%.
4. What percent of the students are girls who ride the bus? 32/149 = 21.5%.
5. What percent of girls ride the bus? 32/(32 + 47) = 40.5%.
6. What percent of bus riders are girls? 32/(32 + 35) = 47.8%.
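The last three questions differ only in the denominator: all students, all girls, or all bus riders. A short Python sketch makes that explicit using the counts from the table; the dictionary layout and the helper name `percent` are illustrative assumptions.

```python
# Counts from the two-way table: rows are transport mode, columns are sex.
counts = {
    "Bus":  {"Boy": 35, "Girl": 32},
    "Ride": {"Boy": 35, "Girl": 47},
}

def percent(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

total = sum(sum(row.values()) for row in counts.values())
girls = sum(row["Girl"] for row in counts.values())
bus_riders = sum(counts["Bus"].values())
girls_on_bus = counts["Bus"]["Girl"]

pct_girls = percent(girls, total)                     # marginal: girls among all students
pct_bus = percent(bus_riders, total)                  # marginal: bus riders among all students
pct_girls_on_bus = percent(girls_on_bus, total)       # joint: girl bus riders among all students
pct_bus_given_girl = percent(girls_on_bus, girls)     # conditional: bus riders among girls
pct_girl_given_bus = percent(girls_on_bus, bus_riders)  # conditional: girls among bus riders

print(total, pct_girls, pct_bus, pct_girls_on_bus, pct_bus_given_girl, pct_girl_given_bus)
```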