Two statistical inference methods: Confidence interval Estimator +/ Margin of Error Hypothesis...

adelajackson 
• Two statistical inference methods:
  Confidence interval
    Estimator ± Margin of Error
  Hypothesis testing
    Hypotheses: H0 vs. Ha
    Test statistic
    P-value
    Conclusion
Review for Final Exam
• Inference about population proportion p
  Confidence interval:
  A level C confidence interval for p is given by
  $$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
  where z* is the z critical value corresponding to the confidence level C, n is the sample size, and $\hat{p}$ is the sample proportion. The square-root term is the standard error of $\hat{p}$.
• Inference about population proportion p
  The level C confidence interval for a population proportion p will have margin of error approximately equal to a specified value m when the sample size is
  $$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$
  where p* is a guessed value for the sample proportion. The margin of error will be at most m if p* is taken to be 0.5.
• Inference about population proportion p
  Hypothesis testing
  Hypotheses: H0: p = p0 vs. Ha: p > p0, p < p0, or p ≠ p0
  Test statistic:
  $$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
• Inference about population proportion p (continued)
  Hypothesis testing:
  P-value:
    P-value = 1 − Φ(z), for Ha: p > p0
    P-value = Φ(z), for Ha: p < p0
    P-value = 2(1 − Φ(|z|)), for Ha: p ≠ p0
  Here z is the value of the test statistic and Φ(z) is the probability from the normal table corresponding to z.
  Conclusion:
    Reject H0 if P-value < α
    Do not reject H0 if P-value > α
• Inference about population mean μ
  Confidence interval:
  A level C confidence interval for μ is given by
  $$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
  where t* is the t critical value corresponding to n − 1 degrees of freedom and the confidence level C, n is the sample size, s is the sample standard deviation, and $\bar{x}$ is the sample mean. The term $s/\sqrt{n}$ is the standard error of $\bar{x}$.
• Inference about population mean μ
  Hypothesis testing:
  Hypotheses: H0: μ = μ0 vs. Ha: μ > μ0, μ < μ0, or μ ≠ μ0
  Test statistic:
  $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
  The test statistic follows a t distribution with n − 1 degrees of freedom.
• Inference about population mean μ
  Hypothesis testing:
  P-value:
    P-value = Tdf(t), for Ha: μ > μ0
    P-value = Tdf(−t), for Ha: μ < μ0
    P-value = 2Tdf(|t|), for Ha: μ ≠ μ0
  Here Tdf(t) means looking up the test statistic t in the t Critical Values Table, that is, the upper-tail probability to the right of t for a t distribution with df = n − 1 degrees of freedom.
  Conclusion:
    Reject H0 if P-value < α
    Do not reject H0 if P-value > α
• Interpretation about hypothesis testing
  The P-value is the probability, assuming the null hypothesis is true, that the test statistic will take a value as extreme as or more extreme than (meaning favoring the alternative hypothesis Ha) the value actually observed.
  Caution: the P-value is NOT the probability that the null hypothesis is wrong.
• Interpretation about hypothesis testing
  Type I error: reject H0 while H0 is true
  Type II error: do not reject H0 while H0 is false
  The significance level α is our tolerance for the probability of making a Type I error.
  The P-value is the smallest significance level α at which our sample would lead us to reject the null hypothesis.
  If the consequences of rejecting the null hypothesis are very serious, we want to be conservative about rejecting H0. Therefore, we should choose a small α.
• In a survey conducted by a firm, 12 of 60 families in two-story houses were found to own their houses. Let p denote the population proportion of families in two-story houses who own their house.
  Find a 95% confidence interval for p.
  The firm came up with a confidence interval (0.1406, 0.2594) for p. What confidence level did the firm use?
  Assume nothing is known about p. The firm requires a 95% confidence interval with margin of error at most 0.034 for p. What is the required sample size?
  Suppose that a previous survey indicates that p is 0.28. The firm requires a 95% confidence interval with margin of error at most 0.034 for p. What is the required sample size?
Review for Final Exam – Practice
• Solution:
  Find a 95% confidence interval (C.I.) for p.
  In general, a level C C.I. for p is given by
  $$\left(\hat{p} - z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
  In this case, $\hat{p}$ = 12/60 = 0.2; n = 60; z* = 1.96 (according to the 95% confidence level).
  Thus a 95% C.I. for p is
  $$\left(0.2 - 1.96\sqrt{\frac{0.2(1-0.2)}{60}},\ 0.2 + 1.96\sqrt{\frac{0.2(1-0.2)}{60}}\right) = (0.0988,\ 0.3012)$$
• Solution:
  The firm came up with a confidence interval (0.1406, 0.2594) for p. What confidence level did the firm use?
  A confidence interval for p can also be written as
  $$\hat{p} \pm \mathrm{ME}$$
  where ME is the margin of error:
  $$\mathrm{ME} = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
  In this case, ME = 0.2594 − 0.2 = 0.0594.
  The standard error is
  $$\mathrm{SE} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.2(1-0.2)}{60}} = 0.0516$$
  Then z* = ME/SE = 0.0594/0.0516 = 1.15, which corresponds to confidence level 75%.
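The two confidence-interval calculations for the survey (12 owners out of 60 families) can be checked with a short script. This is only a sketch using Python's standard library; the function names `proportion_ci` and `implied_level` are made up for illustration, and the normal quantile is computed by software rather than read from a table, so z* is 1.95996 instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level):
    """Level-C interval p_hat ± z* · sqrt(p_hat(1 − p_hat)/n)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # z critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def implied_level(p_hat, n, upper):
    """Confidence level whose interval has the given upper endpoint."""
    se = sqrt(p_hat * (1 - p_hat) / n)      # standard error
    z_star = (upper - p_hat) / se           # z* = ME / SE
    return 2 * NormalDist().cdf(z_star) - 1


low, high = proportion_ci(12 / 60, 60, 0.95)
level = implied_level(12 / 60, 60, 0.2594)
print(f"95% CI: ({low:.4f}, {high:.4f}); implied level: {level:.2f}")
```

The implied level comes out at about 0.75, matching the table lookup for z* = 1.15.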
• Solution:
  Assume nothing is known about p. The firm requires a 95% C.I. with margin of error at most 0.034 for p. What is the required sample size?
  The required sample size for a level C (corresponding to z*) C.I. for p with margin of error approximately equal to m is
  $$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$
  In this case: z* = 1.96, p* = 0.5, m = 0.034.
  Then
  $$n = \left(\frac{1.96}{0.034}\right)^2 0.5(1-0.5) = 830.8 \approx 831.$$
• Solution:
  Suppose that a previous survey indicates that p is 0.28. The firm requires a 95% C.I. with margin of error at most 0.034 for p. What is the required sample size?
  The required sample size for a level C (corresponding to z*) C.I. for p with margin of error approximately equal to m is
  $$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$
  In this case: z* = 1.96, p* = 0.28, m = 0.034.
  Then
  $$n = \left(\frac{1.96}{0.034}\right)^2 0.28(1-0.28) = 669.95 \approx 670.$$
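Both sample-size answers follow from the same formula, so a small helper makes the rounding step explicit. A minimal sketch: the function name `sample_size_for_proportion` is hypothetical, and it rounds up because a fractional sample size would leave the margin of error slightly above m.

```python
from math import ceil


def sample_size_for_proportion(margin, p_guess=0.5, z_star=1.96):
    """Smallest whole n with z* · sqrt(p*(1 − p*)/n) <= margin."""
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


print(sample_size_for_proportion(0.034))                # nothing known about p
print(sample_size_for_proportion(0.034, p_guess=0.28))  # previous survey: 0.28
```

Using p* = 0.5 gives the larger (most conservative) sample size, since p*(1 − p*) is largest at 0.5.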
• To target the right age group of people, a marketing consultant must find which age group purchases from home-shopping channels on TV more frequently. According to the management of TeleSell24/7, a home-shopping store on TV, about 40% of the online music downloaders are in their fifties, but the marketing consultant does not believe that figure. To test this, he selects a random sample of 205 online music downloaders and finds that 71 of them are in their fifties.
  What are the hypotheses in this case?
  What is the value of the test statistic?
  What is the P-value of the test?
  What is your conclusion at α = 5%?
• Solution:
  The sample: $\hat{p}$ = 71/205 = 0.346, n = 205.
  What are the hypotheses in this case?
    H0: p = 0.4 vs. Ha: p ≠ 0.4
  What is the value of the test statistic?
  $$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.346 - 0.40}{\sqrt{\frac{0.40(1-0.40)}{205}}} = -1.58$$
• Solution:
  What is the P-value of the test?
    According to Ha: p ≠ 0.4, P-value = 2(1 − Φ(1.58)) = 0.1141.
  What is your conclusion?
    Since P-value > α (= 5%), we do not reject the null hypothesis.
  If we concluded that 40% of the online music downloaders are in their fifties while in fact this proportion is 35%, then
    we made a Type I Error.
    we made a Type II Error.
    we made a correct decision.
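The one-proportion z test for the TeleSell24/7 sample (71 of 205, H0: p = 0.4) can be reproduced in a few lines. This is a sketch under the normal approximation; `one_proportion_z_test` and its `alternative` argument are illustrative names, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    phi = NormalDist().cdf            # standard normal probability Φ
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - phi(z)
    elif alternative == "less":       # Ha: p < p0
        p_value = phi(z)
    else:                             # Ha: p != p0
        p_value = 2 * (1 - phi(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(71, 205, 0.40)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Because the script keeps 71/205 unrounded, z comes out near −1.57 rather than the −1.58 obtained from the rounded 0.346, and the P-value is about 0.117. The conclusion at α = 5% is unchanged.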
• The safety management of an offshore oil-mining corporation believes that the true average escape time would be at most 340 min. A sample of 28 offshore oil workers took part in a simulated escape exercise. The sample yielded an average escape time of 347.68 min. and a standard deviation of 26.95 min. Do these data contradict the management's claim?
  What are the hypotheses in this case?
  What is the value of the test statistic?
  What is the P-value of the test?
  What is your conclusion at α = 5%?
  What is a 98% confidence interval of the average escape time?
• Solution:
  The sample: $\bar{x}$ = 347.68, s = 26.95, n = 28.
  What are the hypotheses in this case?
    H0: μ = 340 vs. Ha: μ > 340
  What is the value of the test statistic?
  $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{347.68 - 340}{26.95/\sqrt{28}} = 1.508$$
  The test statistic follows a t distribution with 28 − 1 = 27 degrees of freedom.
• Solution:
  What is the P-value of the test?
    According to Ha: μ > 340, the P-value is between 0.05 and 0.10: for 27 degrees of freedom, t = 1.508 falls between the table's critical values 1.314 (upper-tail probability 0.10) and 1.703 (upper-tail probability 0.05).
• Solution:
  What is your conclusion?
    Since P-value > α (= 5%), we do not reject the null hypothesis.
  If we concluded that the management's claim is correct while in fact the average escape time is 340 min., then
    we made a Type I Error.
    we made a Type II Error.
    we made a correct decision.
• Solution:
  What is a 98% confidence interval of the average escape time?
  A level C confidence interval for μ is given by
  $$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
  We have t* = 2.473 (corresponding to 27 degrees of freedom and the confidence level 98%); n = 28, s = 26.95, and $\bar{x}$ = 347.68.
  So a 98% confidence interval of the average escape time is
  $$347.68 \pm 2.473\,\frac{26.95}{\sqrt{28}} = (335.0848,\ 360.2752).$$
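The t statistic and the 98% interval for the escape-time data can be recomputed as follows. A sketch only: Python's standard library has no t distribution, so the critical value t* = 2.473 is passed in from the table, and the function names are hypothetical.

```python
from math import sqrt


def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic; it has n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))


def t_interval(x_bar, s, n, t_star):
    """x_bar ± t* · s/sqrt(n); t_star must be looked up for n - 1 df."""
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin


t = t_statistic(347.68, 340, 26.95, 28)
low, high = t_interval(347.68, 26.95, 28, t_star=2.473)  # t* for df = 27, 98%
print(f"t = {t:.3f}, 98% CI = ({low:.4f}, {high:.4f})")
```

Note that 340 lies inside the interval, which is consistent with not rejecting H0.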
• In a test of hypothesis, if we insist on very strong evidence against the null hypothesis, we should
  choose α to be very small
  choose α to be larger than the P-value
  choose α to be very large
  choose α to be smaller than the P-value
• Based on a random sample of 50 students from among 40,000, a 91 percent confidence interval on the mean height of all 40,000 students was found to be the interval from 66 inches to 69.2 inches. Select the correct statement below:
  About 91 percent of all 40,000 students have heights between 66 and 69.2.
  About 91 percent of the heights in the sample should be between 66 and 69.2.
  The probability that the mean height is between 66 and 69.2 is 91 percent.
  About 91 percent of all samples would produce intervals containing μ.
• In a test of hypotheses, data are deemed to be significant at level α = 0.05, but not significant at level α = 0.01. Which of the following is true about the P-value associated with this test?
  P-value is greater than 0.05.
  P-value is between 0.01 and 0.05.
  P-value is less than 0.01.
  Nothing can be said.
• Sample / Population
• Statistics / Parameters
• Random sampling design
  Simple random sample (SRS)
  Stratified random sample
  Cluster sample
  Multistage sample
• Use random digits to draw simple random samples
• Law of large numbers
• Probability: Sample space / Events
• Rules for probability model:
  1. For any event A, 0 ≤ P(A) ≤ 1.
  2. For sample space S, P(S) = 1.
  3. If two events A and B are disjoint, then P(A or B) = P(A) + P(B).
  4. For any event A, P(A does not occur) = 1 − P(A).
  5. For two independent events A and B, P(A and B) = P(A) × P(B).
• Venn diagram
• General Addition Rule:
  For two events A and B, P(A or B) = P(A) + P(B) − P(A and B).
• General Multiplication Rule:
  For two events A and B, P(A and B) = P(B | A) × P(A).
• Conditional probability:
  $$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$
• Independence: P(B | A) = P(B).
• Random variable:
  A random variable is a variable whose value is a numerical outcome of a random phenomenon.
• Distribution:
  The probability distribution (distribution) of a random variable tells us what values this random variable can take and how to assign probabilities to those values.
• Statistics are random variables.
  Sample proportion
  Sample mean
• Central limit theorem
• Sampling distributions of statistics
• Sampling distribution of the sample proportion $\hat{p}$ for an SRS of size n:
  the mean of $\hat{p}$ equals the population proportion p;
  the standard deviation of $\hat{p}$ equals
  $$\sqrt{\frac{p(1-p)}{n}};$$
  if the sample size is large, then $\hat{p}$ is approximately Normal, that is,
  $$\hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right).$$
• Sampling distribution of the sample mean $\bar{x}$ for an SRS of size n:
  the mean of $\bar{x}$ equals the population mean μ;
  the standard deviation of $\bar{x}$ equals $\sigma/\sqrt{n}$, where σ is the population standard deviation;
  if the sample size is large, then $\bar{x}$ is approximately Normal, that is,
  $$\bar{x} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right);$$
  if the population has a Normal distribution, then this distribution is exact rather than approximate.
• Motor vehicles sold to individuals are classified as either cars or light trucks (including SUVs) and as either domestic or imported. In a recent year, 69% of vehicles sold were light trucks, 78% were domestic, and 55% were domestic light trucks. For a randomly selected vehicle, what is the probability that
  the vehicle is a car?
  the vehicle is either domestic or a light truck or both?
  the vehicle is an imported light truck?
  the vehicle is domestic, given that it is a car?
• 56% of all American workers have a workplace retirement plan, 66% have health insurance, and 73% have at least one of the benefits. We select a worker at random.
  What is the probability that he has both health insurance and a retirement plan?
  What is the probability that he has neither health insurance nor a retirement plan?
  What is the probability that he only has a retirement plan?
  Knowing that he has a retirement plan, what is the probability that he has health insurance?
• Solution:
  Let A be the event that he has a retirement plan.
  Let B be the event that he has health insurance.
  Then P(A) = 0.56, P(B) = 0.66, and P(A or B) = 0.73.
  [Figure: Venn diagram of events A and B]
• Solution:
  What is the probability that he has both health insurance and a retirement plan?
  P(A and B) = ?
  General addition rule: P(A or B) = P(A) + P(B) − P(A and B)
  Therefore, P(A and B) = P(A) + P(B) − P(A or B) = 0.56 + 0.66 − 0.73 = 0.49.
• Solution:
  What is the probability that he has neither health insurance nor a retirement plan?
  The probability that he has at least one benefit is 0.73.
  Therefore, the probability that he has neither health insurance nor a retirement plan is 1 − 0.73 = 0.27.
• Solution:
  What is the probability that he only has a retirement plan?
  "Only has a retirement plan" means he has a retirement plan but no health insurance (not both).
  Therefore, P(he only has a retirement plan) = P(A) − P(A and B) = 0.56 − 0.49 = 0.07.
• Solution:
  Knowing that he has a retirement plan, what is the probability that he has health insurance?
  $$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{0.49}{0.56} = 0.875.$$
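The four answers for the retirement-plan and health-insurance problem chain together, each reusing the previous result. The sketch below walks through that chain; the variable names are chosen here for illustration.

```python
p_a = 0.56        # P(A): has a retirement plan
p_b = 0.66        # P(B): has health insurance
p_a_or_b = 0.73   # P(A or B): has at least one benefit

p_a_and_b = p_a + p_b - p_a_or_b   # general addition rule, rearranged
p_neither = 1 - p_a_or_b           # complement of "at least one"
p_only_a = p_a - p_a_and_b         # retirement plan but no health insurance
p_b_given_a = p_a_and_b / p_a      # conditional probability P(B | A)

print(round(p_a_and_b, 2), round(p_neither, 2),
      round(p_only_a, 2), round(p_b_given_a, 3))
```

The values are rounded only for display, since sums of decimals such as 0.56 + 0.66 − 0.73 are not stored exactly in floating point.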
• Spell-checking software catches "nonword errors" that result in a string of letters that is not a word, as when "the" is typed as "teh." When undergraduates are asked to type a 250-word essay (without spell-checking), the number X of nonword errors has the following distribution:
  X            0    1    2    3    ≥4
  Probability  0.1  0.2  0.3  0.3  ?
• For a randomly selected student, what is the probability that
  he made 4 or more errors?
  he made at most 1 error?
• For four randomly selected students, what is the probability that
  each of them made no more than 2 errors?
  at least one of them made an error?
• In a large Statistics lecture, the professor reports that 52% of the students enrolled have never taken a Calculus course, 34% have taken only one semester of Calculus, and the rest have taken two or more semesters of Calculus. The professor randomly assigns students to groups of three to work on a project for the course.
  What is the probability that the first group member you meet has studied some Calculus?
  What is the probability that the first group member you meet has studied no more than one semester of Calculus?
  What is the probability that both of your two group members have studied exactly one semester of Calculus?
  What is the probability that at least one of your group members has had more than one semester of Calculus?
• Solution:
  Let A denote the event that a student has never taken a Calculus course.
  Let B denote the event that a student has taken only one semester of Calculus.
  Let C denote the event that a student has taken two or more semesters of Calculus.
  [Figure: diagram of the disjoint events A, B, and C]
• Solution:
  First, we can find the probability that a student has taken two or more semesters of Calculus:
    P(C) = 1 − P(A) − P(B) = 1 − 0.52 − 0.34 = 0.14.
  What is the probability that the first group member you meet has studied some Calculus?
    {Some Calculus} = B or C
    P(Some Calculus) = P(B or C) = P(B) + P(C) = 0.34 + 0.14 = 0.48.
• Solution:
  What is the probability that the first group member you meet has studied no more than one semester of Calculus?
    C = {a student has taken two or more semesters of Calculus}
    C^c = {a student has studied no more than one semester of Calculus}
    P(no more than one semester of Calculus) = P(C^c) = 1 − P(C) = 1 − 0.14 = 0.86.
• Solution:
  What is the probability that both of your two group members have studied exactly one semester of Calculus?
  The two events
    A1 = {first member has studied exactly one semester of Calculus}
    A2 = {second member has studied exactly one semester of Calculus}
  are independent.
  Thus, P(both members have studied exactly one semester of Calculus) = P(A1 and A2) = P(A1) × P(A2) = 0.34 × 0.34 = 0.1156.
• Solution:
  What is the probability that at least one of your group members has had more than one semester of Calculus?
    Let E = {at least one of your group members has had more than one semester of Calculus}
    E^c = {neither of your group members has had more than one semester of Calculus}
    E1 = {the first member has not had more than one semester of Calculus}
    E2 = {the second member has not had more than one semester of Calculus}
    P(E^c) = P(E1 and E2) = P(E1) × P(E2) = (1 − 0.14)^2.
    P(E) = 1 − P(E^c) = 1 − (1 − 0.14)^2 = 0.2604.
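The Calculus group answers use three rules in turn: addition for disjoint events, the complement rule, and multiplication for independent events. A minimal sketch of the same arithmetic, with variable names invented for readability and the two group members treated as independent, as in the solution:

```python
p_none = 0.52                     # P(A): never took Calculus
p_one = 0.34                      # P(B): exactly one semester
p_two_plus = 1 - p_none - p_one   # P(C): two or more semesters

p_some = p_one + p_two_plus                           # B and C are disjoint
p_at_most_one = 1 - p_two_plus                        # complement of C
p_both_one = p_one * p_one                            # two independent members
p_at_least_one_two_plus = 1 - (1 - p_two_plus) ** 2   # 1 - P(neither)

print(round(p_some, 2), round(p_at_most_one, 2),
      round(p_both_one, 4), round(p_at_least_one_two_plus, 4))
```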
• A North American roulette wheel has 38 slots, of which 18 are red, 18 are black, and 2 are green. If you bet on red, the probability of winning is 18/38 = 0.4737. The probability 0.4737 represents
  (A) nothing important, since every spin of the wheel results in one of three outcomes (red, black, or green).
  (B) the proportion of times this event will occur in a very long series of individual bets on red.
  (C) the fact that you're more likely to win betting on red than you are to lose.
  (D) the fact that if you make 100 wagers on red, you'll have 47 or 48 wins.
• A company has developed a new battery, but the average lifetime is unknown. In order to estimate this average, a sample of 100 batteries is tested and the average lifetime of this sample is found to be 250 hours.
  Here the population of interest is:
    100 batteries that were tested / average of 250 hours / all newly developed batteries by the company / lifetime of newly developed batteries
  Here the sample is:
    100 batteries that were tested / lifetime of newly developed batteries / average of 250 hours / not in the list
• A company has developed a new battery, but the average lifetime is unknown. In order to estimate this average, a sample of 100 batteries is tested and the average lifetime of this sample is found to be 250 hours.
  What is the parameter of interest in this case?
    average lifetime of 100 batteries tested / average lifetime of all newly developed batteries by the company / 100 batteries sampled and tested / no parameter is involved in this problem
  The 250 hours is the value of:
    parameter / statistic / sample / variable
• There are 30 problems in Ch12 in 4 pages and 45 problems in Ch13 in another set of 4 pages. In order to make up a homework set based on chapters 12 and 13, the instructor considers the following different schemes. Identify the sampling scheme employed.
  Method 1: Label the 75 problems from 1 through 75, draw 10 numbers at random, and choose the corresponding problems.
    Simple Random Sampling
  Method 2: Pick 4 problems from the 30 in chapter 12 and pick 6 problems from the 45 in chapter 13.
    Stratified Random Sampling
  Method 3: Pick two pages at random and assign all the problems on those pages.
    Cluster Sampling
  Method 4: Pick two pages at random and pick 5 problems at random from each of those two pages.
    Multistage Sampling
• A student group has 8 members:
  1. Barrett  2. Chen  3. DeRoos  4. Maceli
  5. Pagliarulo  6. Smithson  7. Williams  8. Zachary
  Three of them will be selected to participate in a national conference. If we use the following random digits (starting from the left) to select a simple random sample of size 3, then who will attend the conference?
  2023967 8523610 4317063 5689043 5463038 9406022
  A. Barrett, Chen, DeRoos
  B. Chen, Chen, DeRoos
  C. Chen, DeRoos, Smithson
  D. Chen, Pagliarulo, Williams
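Reading the digit string one digit at a time, a digit is used only if it is a valid label (1 to 8) that has not already been chosen. The sketch below mechanizes that reading; `srs_from_digits` is a hypothetical helper written for this example.

```python
def srs_from_digits(digits, labels, size):
    """Pick `size` distinct labels by reading one-digit labels left to right."""
    chosen = []
    for ch in digits:
        if not ch.isdigit():
            continue                      # ignore the spaces between groups
        label = int(ch)
        if label in labels and label not in chosen:
            chosen.append(label)          # digits 0 and 9 and repeats are skipped
        if len(chosen) == size:
            break
    return [labels[k] for k in chosen]


members = {1: "Barrett", 2: "Chen", 3: "DeRoos", 4: "Maceli",
           5: "Pagliarulo", 6: "Smithson", 7: "Williams", 8: "Zachary"}
digits = "2023967 8523610 4317063 5689043 5463038 9406022"
print(srs_from_digits(digits, members, 3))  # ['Chen', 'DeRoos', 'Smithson']
```

The digits read are 2 (Chen), 0 (skipped), 2 (repeat, skipped), 3 (DeRoos), 9 (skipped), 6 (Smithson), which is option C.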
• Data / Data table
• Cases
• Variables (Categorical / Quantitative)
• Display Categorical Variables
  Frequency Table / Relative Frequency Table
  Bar Chart / Relative Frequency Bar Chart / Pie Chart
• Graphic techniques for displaying quantitative variables:
  Histograms
  Stem-and-leaf displays
• Shape of distributions:
  Unimodal / Bimodal / Multimodal / Uniform
  Symmetric / Skewed to the left / Skewed to the right
  Outlier
• Numerical descriptions for the distribution of a quantitative variable:
  The center of a distribution
    Mean
    Median
  The spread of a distribution
    Standard deviation
    Interquartile Range (IQR)
  Five-number summary / Outlier (1.5 × IQR rule)
  Boxplot
• Shifting and rescaling of quantitative variables
• Standardization of quantitative variables (z-score)
  $$z = \frac{x - \bar{x}}{s}$$
• The Normal model
  Mean and standard deviation
  68-95-99.7 rule
  Two types of problems:
    Find percentage
    Find percentiles
• Scatterplot for two quantitative variables
  Direction: positive / negative
  Form: linear / curved / no pattern
  Strength: strong / moderate / weak
• Correlation coefficient r
• Linear models
  $$\hat{y} = b_0 + b_1 x$$
• Least squares regression line
  $$b_1 = r\,\frac{s_y}{s_x} \quad\text{and}\quad b_0 = \bar{y} - b_1\bar{x}$$
• Predictions and residuals
• The mean height of American women in their early twenties is about 64.5 inches and the standard deviation is about 2.5 inches. The mean height of men the same age is about 68.5 inches, with standard deviation about 2.7 inches. If the correlation between the heights of husbands and wives is about r = 0.5, what is the equation of the regression line of the husband's height on the wife's height in young couples? Predict the height of the husband of a woman who is 67 inches tall. What percentage of variation in husbands' height is explained by wives' height?
• Michigan State University researchers want to investigate how rainfall affects the yield of crops in East Lansing. The researchers found that the average amount of rainfall over the past 20 years is about 230 inches and the standard deviation is about 10 inches. The average yield of crops in East Lansing is about 280 tonnes with a standard deviation of 20 tonnes. The correlation between the amount of rainfall and the yield of crops is about 0.4.
  1) What is the slope of the regression line of yield of crop on amount of rainfall?
  2) What is the intercept of the appropriate regression line?
  3) What is the predicted value of the yield of crop when the amount of rainfall is 240 inches? If the actual yield of crop in the year with rainfall 240 inches is 280, what is the residual?
  4) What percentage of variation in crop yield is explained by the rainfall?
• Solution:
  1) What is the slope of the regression line of yield of crop on amount of rainfall?
    The slope is given by
    $$b_1 = r\,\frac{s_y}{s_x}$$
    Here r = 0.4, s_x = 10, s_y = 20.
    Thus the slope is
    $$b_1 = 0.4 \times \frac{20}{10} = 0.8$$
  2) What is the intercept of the appropriate regression line?
    The intercept is given by
    $$b_0 = \bar{y} - b_1\bar{x}$$
    Here $\bar{x}$ = 230, $\bar{y}$ = 280, b1 = 0.8.
    Thus the intercept is
    $$b_0 = 280 - 230(0.8) = 96.$$
• Solution:
  3) What is the predicted value of the yield of crop when the amount of rainfall is 240 inches? If the actual yield of crop in the year with rainfall 240 inches is 280, what is the residual?
    The predicted value is
    $$\hat{y} = 96 + 0.8x = 96 + 0.8(240) = 288.$$
    The residual is
    $$y - \hat{y} = 280 - 288 = -8.$$
  4) What percentage of variation in crop yield is explained by the rainfall?
    The quantity r² tells us the fraction of the variation in the response variable that is explained by the explanatory variable. In this case, r² = 0.4² = 0.16, so rainfall explains 16% of the variation in crop yield.
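The four regression answers for the rainfall data depend only on the summary statistics, so they can be recomputed directly. A short sketch, with a made-up helper name `regression_line`:

```python
def regression_line(x_bar, s_x, y_bar, s_y, r):
    """Slope b1 = r·s_y/s_x and intercept b0 = y_bar − b1·x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = regression_line(x_bar=230, s_x=10, y_bar=280, s_y=20, r=0.4)
predicted = b0 + b1 * 240      # predicted yield at 240 inches of rainfall
residual = 280 - predicted     # observed minus predicted
r_squared = 0.4 ** 2           # fraction of variation explained

print(round(b1, 2), round(b0, 2), round(predicted, 2),
      round(residual, 2), round(r_squared, 2))
```

The negative residual means the observed yield was below the value predicted by the line.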
• In a population of couples, the average height of wives was 65.2 inches and that of husbands 68.2 inches. You use the regression line to predict the wife's height from the husband's height. Suppose a husband has height 68.2 inches. What would be the predicted height of the wife?
• Solution:
  The regression line satisfies
  $$\bar{y} = b_0 + b_1\bar{x}$$
  Since the husband's height (68.2 inches) is the same as the average height of husbands, the predicted height of the wife should also be the average height of wives, that is, 65.2 inches.
• A regression study on obesity shows that doing more physical exercise reduces weight. In this study, time spent in physical exercise explained 16% of the total sample variation in weight among obese people. What is the correlation between "time spent in physical exercise" and "weight"?
• Solution:
  The quantity r² tells us the fraction of the variation in the response variable that is explained by the explanatory variable. In this case, r² = 0.16, so r = 0.4 or r = −0.4. Because more exercise goes with lower weight, the association is negative, so the correlation is r = −0.4.
• Suppose that in families with 5 children X is the number of boys and Y is the number of girls. What is the correlation between X and Y?
• Solution:
  Since X + Y = 5, or equivalently Y = 5 − X, X and Y are perfectly linearly related with a negative slope.
  Therefore, the correlation between X and Y is −1.
• In a photographic process, the developing times of prints are approximately Normal with mean 15.4 seconds and standard deviation 0.4 seconds.
  1) What proportion of prints will take at least 14.64 sec to develop?
  2) What proportion of prints will take 14.64 sec to 16.00 sec to develop?
  3) How many seconds are needed at most for the quickest 10%?
• Solution:
  1) What proportion of prints will take at least 14.64 sec to develop?
    The z-score corresponding to 14.64 is
    $$z = \frac{x - \mu}{\sigma} = \frac{14.64 - 15.4}{0.4} = -1.9.$$
    The probability corresponding to z-score −1.9 is 0.0287.
    Therefore, the proportion of prints that will take at least 14.64 sec to develop is 1 − 0.0287 = 0.9713.
• Solution:
  2) What proportion of prints will take 14.64 sec to 16.00 sec to develop?
    The z-score corresponding to 16 is
    $$z = \frac{x - \mu}{\sigma} = \frac{16 - 15.4}{0.4} = 1.5.$$
    The probability corresponding to z-score 1.5 is 0.9332.
    Therefore, the proportion of prints that will take 14.64 sec to 16.00 sec to develop is 0.9332 − 0.0287 = 0.9045.
• Solution:
  3) How many seconds are needed at most for the quickest 10%?
    The quickest 10% corresponds to the smallest 10% (less time).
    The z-score corresponding to probability 0.1 is −1.28.
    Therefore, the number of seconds needed at most for the quickest 10% is
    $$x = \mu + z\sigma = 15.4 + (-1.28)(0.4) = 14.888 \approx 14.89.$$
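The three developing-time questions are the two standard Normal-model problem types: finding a percentage and finding a percentile. A sketch using `statistics.NormalDist` from the Python standard library, which evaluates the Normal probabilities directly instead of going through z-scores and a table:

```python
from statistics import NormalDist

develop_time = NormalDist(mu=15.4, sigma=0.4)   # developing time in seconds

# Find percentage: area to the right of 14.64, and between 14.64 and 16.00.
p_at_least = 1 - develop_time.cdf(14.64)
p_between = develop_time.cdf(16.00) - develop_time.cdf(14.64)

# Find percentile: the time below which the quickest 10% of prints fall.
quickest_10 = develop_time.inv_cdf(0.10)

print(f"{p_at_least:.4f} {p_between:.4f} {quickest_10:.2f}")
```

The percentile uses the unrounded z-score (about −1.2816), so it can differ in the third decimal from a hand calculation with z = −1.28.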
[Figure: boxplot]
• Which seems to be the likely value of Q1 (the first quartile)?
  22
• Which seems to be the likely value of the median?
  48
• What percentage of the observations lies outside the box?
  50%
• What is the approximate value of the range?
  110 − 5 = 105
• The following stem-and-leaf display shows the number of patients attended by a house physician in 15 randomly selected weeks:
  Stem | Leaf
     0 | 8 9
     1 | 3 4 6 6 6 8 8
     2 | 0 1 2 4
     3 | 0 6
  Here 0|8 implies 8, 1|3 implies 13, etc. (i.e., the stem represents tens and the leaf represents units).
  1) Which observation occurred most?
     16
  2) In how many weeks did the physician attend between 15 and 25 patients?
     9
  3) What are the median, Q1, and Q3?
     Median: 18; Q1: 14; Q3: 22
  4) What is the IQR?
     IQR = Q3 − Q1 = 22 − 14 = 8
  5) Are there any outliers?
     36 is an outlier, since it exceeds Q3 + 1.5 × IQR = 22 + 12 = 34.
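The quartiles and the 1.5 × IQR check for the patient counts can be reproduced as follows. This sketch assumes the convention that matches the answers above, namely that Q1 and Q3 are the medians of the lower and upper halves with the overall median left out when n is odd; other software may use a different quartile rule. The helper `quartiles` is written for this example.

```python
from statistics import median

patients = [8, 9, 13, 14, 16, 16, 16, 18, 18, 20, 21, 22, 24, 30, 36]


def quartiles(values):
    """Median, Q1, Q3; the overall median is excluded from each half (odd n)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(data), median(lower), median(upper)


med, q1, q3 = quartiles(patients)
iqr = q3 - q1
# 1.5 × IQR rule: flag values beyond the fences Q1 − 1.5·IQR and Q3 + 1.5·IQR.
outliers = [x for x in patients if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(med, q1, q3, iqr, outliers)
```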
• What are the mean and standard deviation of the data set {34, 40, 43, 55}?
• Solution:
  Mean:
  $$\bar{x} = \frac{34 + 40 + 43 + 55}{4} = 43.$$
  Standard deviation:
  $x$                  34   40   43   55   sum
  $x - \bar{x}$        −9   −3    0   12
  $(x - \bar{x})^2$    81    9    0  144   234
  $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{234}{4 - 1}} = 8.832.$$
• An airline company keeps track of the delays of its flights. Generally most flights have small delays, but there are a few flights with very long delays. A consumer group claims that the "average" delay is 740 minutes while the airline company claims that the average is only 260 minutes. Why do the two figures differ?
• Solution:
  The consumer group refers to the mean while the company refers to the median.
  The distribution is skewed to the right, so the mean is larger than the median.
• To decide whether to provide electrical power using overhead lines or underground lines, the state administration has to consider the total length of streets (measured in miles) in each subdivision of the state. Below is the histogram of street lengths of 47 subdivisions in a state.
  [Figure: histogram of street lengths, with the median marked]
• What is plotted along the Y-axis (the vertical axis)?
  Number of subdivisions
• How many subdivisions have total length of street between 2000 and 4000 miles?
  10 + 7 = 17
• What percent of subdivisions have total length less than 1000 miles?
  12/47 = 25.5%
• Which seems more likely to be true?
  Mean = Median; Mean < Median; Mean > Median
• Which class will the median street length be in?
  The median is the 24th observation
• In order to plan transportation and parking needs, the administration of a private high school asked students how they get to school. Some rode a school bus, some rode in with parents or friends, and others used "personal" transportation: bikes, skateboards, or just walking. The following table summarizes the responses from boys and girls.
          Boy  Girl
  Bus      35    32
  Ride     35    47
  1) How many students took part in the survey?
  2) What percentage of students surveyed are girls?
  3) What percentage of students take the school bus?
  4) What percent of the students are girls who ride the bus?
  5) What percent of girls ride the bus?
  6) What percent of bus riders are girls?
• Solution:
  1) How many students took part in the survey?
     35 + 35 + 32 + 47 = 149.
  2) What percentage of students surveyed are girls?
     (32 + 47)/149 = 53.0%.
  3) What percentage of students take the school bus?
     (35 + 32)/149 = 45.0%.
  4) What percent of the students are girls who ride the bus?
     32/149 = 21.5%.
  5) What percent of girls ride the bus?
     32/(32 + 47) = 40.5%.
  6) What percent of bus riders are girls?
     32/(32 + 35) = 47.8%.
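Questions 4 to 6 use the same count (32 girls on the bus) with three different denominators, which is the distinction between a joint percentage and the two conditional percentages. A small sketch that keeps the table as a dictionary; the layout and the helper `pct` are choices made for this example.

```python
counts = {("Bus", "Boy"): 35, ("Bus", "Girl"): 32,
          ("Ride", "Boy"): 35, ("Ride", "Girl"): 47}

total = sum(counts.values())
girls = sum(n for (mode, sex), n in counts.items() if sex == "Girl")
bus = sum(n for (mode, sex), n in counts.items() if mode == "Bus")
girls_on_bus = counts[("Bus", "Girl")]


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(total)                      # students surveyed
print(pct(girls, total))          # percent of students who are girls
print(pct(bus, total))            # percent of students who take the bus
print(pct(girls_on_bus, total))   # percent of students: girls on the bus
print(pct(girls_on_bus, girls))   # percent of girls who ride the bus
print(pct(girls_on_bus, bus))     # percent of bus riders who are girls
```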