Introduction to the Practice of Statistics -...

Introduction to the Practice of Statistics Fifth Edition

Moore, McCabe

Section 5.2 Homework Answers

5.29 An automatic grinding machine in an auto parts plant prepares axles with a target diameter µ = 40.125 mm. The machine has some variability, so the standard deviation of the diameters is σ = 0.002 mm. A sample of 4 axles is inspected each hour for process control purposes, and records are kept of the sample mean diameter. If the process mean is exactly equal to the target value, what will be the mean and standard deviation of the numbers recorded?

$$\mu_{\bar{x}} = 40.125 \text{ mm}, \qquad \sigma_{\bar{x}} = \frac{0.002}{\sqrt{4}} = 0.001 \text{ mm}$$

5.31 Averages are less variable than individual observations. Suppose that the axle diameters in Exercise 5.29 vary according to a normal distribution. In that case, the mean x̄ of an SRS of axles also has a normal distribution.

a) Make a sketch of the normal curve for a single axle. Add the normal curve for the mean of an SRS of 4 axles on the same sketch.

b) What is the probability that the diameter of a single randomly chosen axle differs from the target value by 0.004 mm or more? Let the random variable X measure the diameter of an axle.

P(X < 40.121 OR X > 40.129) ≈ 5% using the 68-95-99.7 rule, since 0.004 mm is exactly two standard deviations (2σ = 2 × 0.002 mm) from the target.
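The 68-95-99.7 rule gives a rounded figure. As a rough cross-check, the sketch below computes the same two-sided tail from the normal model using Python's `statistics.NormalDist`. The helper name `prob_off_target` and its parameter `n` are illustrative choices, not from the text; `n = 1` is a single axle and `n = 4` is the sample mean of four axles.

```python
from math import sqrt
from statistics import NormalDist

MU = 40.125        # target diameter in mm (Exercise 5.29)
SIGMA = 0.002      # standard deviation of a single axle in mm
TOLERANCE = 0.004  # distance from target asked about in Exercise 5.31

def prob_off_target(n: int) -> float:
    """P(|mean of n axles - MU| >= TOLERANCE), assuming normal diameters."""
    dist = NormalDist(MU, SIGMA / sqrt(n))
    lower_tail = dist.cdf(MU - TOLERANCE)
    upper_tail = 1 - dist.cdf(MU + TOLERANCE)
    return lower_tail + upper_tail

single_axle = prob_off_target(1)    # tail beyond 2 standard deviations
mean_of_four = prob_off_target(4)   # tail beyond 4 standard deviations
print(f"single axle:  {single_axle:.4f}")
print(f"mean of four: {mean_of_four:.7f}")
```

For a single axle this comes out near 0.0455, slightly below the 5% from the rule of thumb, because the rule rounds 95.45% down to 95%.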

c) What is the probability that the mean diameter of an SRS of 4 axles differs from the target value by 0.004mm or more?

$$P(\bar{x} < 40.121 \text{ or } \bar{x} > 40.129) = 2P\left(Z < \frac{40.121 - 40.125}{0.002/\sqrt{4}}\right) = 2P(Z < -4) \approx 0$$

In actuality the probability is about 0.0000633.

5.36 North Carolina State University posts the grade distributions for its courses online. The distribution of grades in Statistics 101 in the fall 2003 semester was

Grade   A     B     C     D     F
X       4     3     2     1     0
P(x)    0.21  0.43  0.30  0.05  0.01

a) Using the common scale A = 4, B = 3, C = 2, D = 1, F = 0, take X to be the grade of a randomly chosen 101 student. Use the definitions of the mean (page 292) and standard deviation (page 300) for discrete random variables to find the mean µ and the standard deviation σ of grades in this course.

$$\mu = 0.21(4) + 0.43(3) + 0.30(2) + 0.05(1) + 0.01(0) = 2.78$$

$$\sigma = \sqrt{0.21(4 - 2.78)^2 + 0.43(3 - 2.78)^2 + 0.30(2 - 2.78)^2 + 0.05(1 - 2.78)^2 + 0.01(0 - 2.78)^2}$$

$$\sigma \approx 0.8669$$
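The mean and standard deviation of a discrete random variable are just probability-weighted sums, so the arithmetic is easy to check by machine. A minimal sketch follows; the function name `mean_and_sd` is a hypothetical helper, and the probabilities are copied from the grade table for Exercise 5.36.

```python
from math import sqrt

# Grade points mapped to their probabilities (Exercise 5.36 table).
grades = {4: 0.21, 3: 0.43, 2: 0.30, 1: 0.05, 0: 0.01}

def mean_and_sd(dist: dict) -> tuple:
    """Return (mean, standard deviation) of a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    variance = sum(p * (x - mu) ** 2 for x, p in dist.items())
    return mu, sqrt(variance)

mu, sigma = mean_and_sd(grades)
print(f"mu = {mu:.2f}, sigma = {sigma:.4f}")
```

The same helper works for any finite payoff table, as long as the probabilities sum to 1.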

b) Statistics 101 is a large course. We can take the grades of an SRS of 50 students to be independent of each other. If x̄ is the average of these 50 grades, what are the mean and standard deviation of x̄?

$$\mu_{\bar{x}} = 2.78, \qquad \sigma_{\bar{x}} = \frac{0.8669}{\sqrt{50}} \approx 0.1226$$

c) What is the probability P(X ≥ 3) that a randomly chosen Statistics 101 student gets a B or better? What is the approximate probability P(x̄ ≥ 3) that the grade point average for 50 randomly chosen Statistics 101 students is B or better?

P(X ≥ 3) = 0.43 + 0.21 = 0.64

$$P(\bar{x} \ge 3) \approx P\left(Z > \frac{3 - 2.78}{0.8669/\sqrt{50}}\right) \approx P(Z > 1.794) \approx 0.03636$$

5.37 Sheila's doctor is concerned that she may suffer from gestational diabetes (high blood glucose levels during pregnancy). There is variation both in the actual glucose level and in the blood test that measures the level. A patient is classified as having gestational diabetes if the glucose level is above 140 milligrams per deciliter (mg/dl) one hour after a sugary drink is ingested. Sheila's measured glucose level one hour after ingesting the sugary drink varies according to the normal distribution with µ = 125 mg/dl and σ = 10 mg/dl.

(a) If a single glucose measurement is made, what is the probability that Sheila is diagnosed as having gestational diabetes?

The key to any of these problems is to be aware of the assumptions concerning your situation.
Fact 1 – We are interested in glucose levels.
Fact 2 – Sheila's glucose levels vary, and the distribution is normal with µ = 125 mg/dl and σ = 10 mg/dl.
Fact 3 – A person with a glucose level above 140 mg/dl is classified as having gestational diabetes.

$$P(X > 140) = P\left(Z > \frac{140 - 125}{10}\right) = P(Z > 1.5) = 0.0668$$

What the result suggests is that if we take one reading at random, there is a 6.68% chance that the reading is higher than 140 mg/dl. I imagine that what the doctor is really interested in is not a single measurement to determine whether Sheila should be classified with gestational diabetes, but rather Sheila's average glucose level. A single measurement can be used to approximate µ, but we also understand that there is too much variability associated with one measurement alone.

(b) If measurements are made instead on 4 separate days and the mean result is compared with the criterion 140 mg/dl, what is the probability that Sheila is diagnosed as having gestational diabetes?

The question here is about the probability of obtaining a particular average of four measurements, so we are talking about the distribution of averages. Fact: Since the original distribution from which we sampled is normal, the sampling distribution of the average of four measurements is exactly normal as well. Know this fact well.

$$P(\bar{x} > 140) = P\left(Z > \frac{140 - 125}{10/\sqrt{4}}\right) = P(Z > 3.00) = 0.0013$$
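Parts (a) and (b) differ only in the standard deviation used, σ for one measurement versus σ/√n for the mean of n measurements. The sketch below makes that explicit; the function name `prob_diagnosed` is a hypothetical helper, and the normal model with µ = 125 and σ = 10 is taken from the exercise.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, CUTOFF = 125.0, 10.0, 140.0  # mg/dl, from Exercise 5.37

def prob_diagnosed(n_measurements: int) -> float:
    """P(mean of n measurements > CUTOFF) under the normal model."""
    sampling_dist = NormalDist(MU, SIGMA / sqrt(n_measurements))
    return 1 - sampling_dist.cdf(CUTOFF)

for n in (1, 4):
    print(f"n = {n}: P(diagnosed) = {prob_diagnosed(n):.4f}")
```

Averaging four readings halves the standard deviation, which moves the cutoff from 1.5 to 3 standard deviations above the mean.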

5.38 A $1 bet in a state lottery's Pick 3 game pays $500 if the three-digit number you choose exactly matches the winning number, which is drawn at random. Here is the distribution of the payoff X:

Payoff  $0     $500
Prob    0.999  0.001

Each day's drawing is independent of other drawings.

(a) What are the mean and standard deviation of the random variable X?

$$\mu = \$0(0.999) + \$500(0.001) = \$0.50$$

$$\sigma = \sqrt{0.999(0 - 0.5)^2 + 0.001(500 - 0.5)^2} \approx \$15.80$$

(b) Joe buys a Pick 3 ticket every day. What does the law of large numbers say about the average payoff Joe receives from his bets?

In the long run, the average payoff for a $1 bet is $0.50.

(c) What does the Central Limit Theorem say about the distribution of Joe's average payoff after 365 bets in a year?

When we consider the distribution of the average of the 365 outcomes, that distribution, while discrete, will have approximately the shape of a normal distribution.

(d) Joe comes out ahead for the year if his average payoff is greater than $1 (the amount spent each day on a ticket). What is the probability that Joe ends the year ahead?

$$P(\bar{x} \ge \$1) \approx P\left(Z > \frac{1 - 0.5}{15.80/\sqrt{365}}\right) = P(Z > 0.60) = 0.2742$$

There is a 27.42% chance that at the end of the year Joe's average winnings exceed $1.

Now let's get back to reality. I assigned this problem so you can practice this concept with a discrete table. But the original population suffers from a severe case of right skewness. Thus, the distribution of the average of 365 payoffs will NOT be normal. We would need a sample size of 10,000 to achieve a graph that has that normal shape. Below is what the distribution of averages of 365 looks like for the first 17 out of 366 possible averages, along with the table of values. The most likely outcome is that the average over the entire year is $0, P(x̄ = $0) = 0.6941. At the other extreme, Joe averages $500! This means Joe would win $500 on every pick. P(x̄ = $500) contains so many zeroes that even using scientific notation the computer cannot generate an answer. Winning at Powerball is more likely.

So the exact answer to the book's question is P(x̄ ≥ $1) = 1 − P(x̄ = $0) = 1 − 0.6941 = 0.3059, because a single win already gives an average of $1.37. Our approximate value of 0.2742 is only a rough match. On the next page is the distribution for averages of 8000.

x̄       P(x̄)
0.00    0.69407
1.37    0.25359
2.74    0.04620
4.11    0.00560
5.48    0.00051
6.85    0.00004
8.22    0.00000
9.59    0.00000
10.96   0.00000
12.33   0.00000
13.70   0.00000
15.07   0.00000
16.44   0.00000
17.81   0.00000
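The table values can be reproduced exactly: with independent daily drawings, the number of wins in 365 bets is binomial with n = 365 and p = 0.001, and k wins correspond to an average payoff of 500k/365. The sketch below rebuilds the first few rows; the names `prob_wins` and `p_ahead` are illustrative, not from the text.

```python
from math import comb

N_BETS, P_WIN, PAYOFF = 365, 0.001, 500  # from Exercise 5.38

def prob_wins(k: int) -> float:
    """Binomial probability of exactly k winning tickets in N_BETS bets."""
    return comb(N_BETS, k) * P_WIN**k * (1 - P_WIN) ** (N_BETS - k)

# First rows of the table: average payoff and its probability.
for k in range(4):
    average = k * PAYOFF / N_BETS
    print(f"average = {average:5.2f}  probability = {prob_wins(k):.5f}")

# Joe's average is at least $1 exactly when he wins at least once,
# because one win already gives 500/365, about $1.37.
p_ahead = 1 - prob_wins(0)
print(f"P(average >= $1) = {p_ahead:.4f}")
```

Adding up every row from 1.37 upward gives the same number as `p_ahead`, which is the exact counterpart of the normal approximation.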

[Figure: Average Yearly Winnings. Horizontal axis: possible averages of 365 (0.00 to 20.00). Vertical axis: probability (0.0 to 0.8).]

[Figure: Distribution of averages of 8000, first 23 values. Horizontal axis: possible averages of 8000 (0.00 to 35.00). Vertical axis: probability (0.00 to 0.16).]

5.40 The number of flaws per square yard in a type of carpet material varies with mean 1.6 flaws per square yard and standard deviation 1.2 flaws per square yard. This population distribution cannot be normal, because a count takes only whole-number values (i.e., this is a discrete population). An inspector studies 200 square yards of the material and records the number of flaws found in each square yard inspected. Use the central limit theorem to find the approximate probability that the mean number of flaws exceeds 2 per square yard.

In this situation we have a population that is not normally distributed. Regardless of the distribution type we can still calculate the mean and standard deviation, but the 68-95-99.7 rule no longer applies. We are told that µ = 1.6 flaws per sq yd and σ = 1.2 flaws per sq yd. Again, you can see that this is not normally distributed: the smallest possible value of a measurement is 0 flaws per sq yd, yet two standard deviations to the left of the mean, 1.6 – 2(1.2), gives a negative number of flaws per sq yd, which is nonsense. Also, normal distributions are continuous and this distribution is discrete; we can only have whole numbers as our outcomes: 0 flaws/sq yd, 1 flaw/sq yd, 2 flaws/sq yd, and so on. We cannot have 1.75 flaws/sq yd in one measurement; only when we average several values can we have fractional flaws/sq yd.

The situation is that we will be looking at 200 sq yd of material. If we looked at blocks of 1 yd by 1 yd, recorded the flaws in each square yard, and then averaged all 200 numbers, what is the likelihood that the average is greater than 2 flaws/sq yd?

Key words - Central Limit Theorem – the theorem is mentioned to remind you that you will be dealing with the sampling distribution and not the population itself. Also, according to the theorem the sampling distribution is only approximately normal, so our calculated probability is also an approximation.

I conducted a simulation in which I sampled 200 square yards of material and recorded the flaws in the 200 pieces of 1 yd by 1 yd. I then averaged the 200 values I recorded to get one value of the sample mean, x̄. I repeated this procedure 765 more times to obtain the sampling distribution of the mean for the 765 values. By doing this many times I am hoping that the distribution I got by experiment is close to the theoretical distribution, or at least that I get a glimpse of what the theoretical distribution probably looks like. You can see from the histogram and the normal quantile plot that the Central Limit Theorem is correct: the distribution is very, very close to a normal distribution. The straight line in the normal quantile plot indicates that it is extremely close to normal, so much so that we can depend on the calculations we are about to make to be very good approximations. I want to calculate the probability that my average of 200 numbers exceeds 2 flaws per sq yd. I can see from my histogram that this is not very likely, since in my 750 simulations this did not occur once.

$$P(\bar{x} > 2) = P\left(Z > \frac{2 - 1.6}{1.2/\sqrt{200}}\right) = P(Z > 4.71) = 0.0000012$$

The result indicates that seeing more than 2 flaws per square yard on average in a sample of 200 will occur about 12 times out of 10,000,000 attempts, or a little more than once out of 1,000,000 attempts.

[Figure: Normal quantile plot for the sampling distribution simulation; 750 sample means, averaging 200 values at a time. Horizontal axis: expected z-score (-4 to 4). Vertical axis: averages of 200 values (1.3 to 1.9).]

[Figure: Simulated distribution based on given information. I used the simulated distribution above to sample from.]
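A simulation of this kind can be sketched in a few lines. The exercise gives only the mean and standard deviation of the population, so the population below is an assumption made purely for illustration: a Binomial(16, 0.1) count, chosen because its mean 16(0.1) = 1.6 and standard deviation √(16 · 0.1 · 0.9) = 1.2 match the given values. It is not necessarily the distribution used for the plots above, and the function names and the seed are hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(540)  # arbitrary seed so the run is repeatable

def flaws_in_one_square_yard() -> int:
    # ASSUMED population: Binomial(16, 0.1), which has mean 1.6 and sd 1.2.
    return sum(random.random() < 0.1 for _ in range(16))

def sample_mean(n: int = 200) -> float:
    """Average flaw count over n inspected square yards."""
    return mean(flaws_in_one_square_yard() for _ in range(n))

# 750 simulated sample means, each averaging 200 values.
sample_means = [sample_mean() for _ in range(750)]

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"sd of sample means:   {stdev(sample_means):.4f}")
print(f"largest sample mean:  {max(sample_means):.3f}")
print(f"means above 2:        {sum(m > 2 for m in sample_means)}")
```

If the central limit theorem applies, the simulated means should center near 1.6 with a spread near 1.2/√200 ≈ 0.085, and a value above 2 should be extremely rare.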

5.41 In response to the increasing weight of airline passengers, the Federal Aviation Administration in 2003 told airlines to assume that passengers average 190 pounds in the summer, including clothing and carry-on baggage. But passengers vary: the FAA gave a mean but not a standard deviation. A reasonable standard deviation is 35 pounds. Weights are not normally distributed, especially when the population includes both men and women, but they are not very non-normal. A commuter plane carries 19 passengers. What is the approximate probability that the total weight of the passengers exceeds 4000 pounds? (Hint: To apply the central limit theorem, restate the problem in terms of the mean weight.)

How do we go about organizing this information? Keep in mind what we are doing. We are sampling from the U.S. population and measuring the weight of a person, clothing and carry-on baggage. So our random variable X is this final sum. What is the distribution like? The problem says that weights are "not normally distributed", yet the last statement suggests that we are not far off: "but they are not very non-normal." So I will assume the distribution is slightly right skewed, since some people carry heavy items in their carry-on baggage, skewing the weight to the right. The plane carries 19 people, and since this problem involves the maximum weight of an airplane, I will calculate for the worst-case scenario, a full plane. We should not exceed 4000 lbs, which means that, for the 19 passengers, the average weight cannot exceed 210.52 lbs. The central limit theorem suggests that the sampling distribution of the mean will be close to a normal distribution for a sample of 19. So I will now proceed with the final calculations using procedures that apply to a normal distribution.

Gather an SRS of 19 (n = 19) and calculate x̄.

$$\bar{x} = \frac{4000}{19} \text{ lbs} \approx 210.52 \text{ lbs}, \qquad \sigma_{\bar{x}} = \frac{35}{\sqrt{19}} \approx 8.03 \text{ lbs}$$

$$P(\bar{x} > 210.52) = P\left(Z > \frac{210.52 - 190}{35/\sqrt{19}}\right) = P(Z > 2.56) = 0.0052$$

The result suggests that, in the long run, a full passenger plane exceeds the maximum weight about 5 times in a thousand flights on average.
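The hint asks for the problem to be restated in terms of the mean, but the same normal approximation can be applied to the total directly: a sum of n independent weights has mean nµ and standard deviation σ√n. The sketch below, with illustrative variable names, shows that both views give the same probability.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 190.0, 35.0   # pounds per passenger, from Exercise 5.41
N, LIMIT = 19, 4000.0     # passengers on a full plane, total weight limit

# View 1: the sample mean, compared with LIMIT / N.
mean_view = NormalDist(MU, SIGMA / sqrt(N))
p_mean = 1 - mean_view.cdf(LIMIT / N)

# View 2: the total weight, compared with LIMIT itself.
total_view = NormalDist(N * MU, SIGMA * sqrt(N))
p_total = 1 - total_view.cdf(LIMIT)

print(f"via mean:  {p_mean:.4f}")
print(f"via total: {p_total:.4f}")
```

Both z-scores are the same because dividing the total and its standard deviation by n leaves the ratio unchanged.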

5.43 The distribution of annual returns on common stocks is roughly symmetric, but extreme observations are more frequent than in a normal distribution. Because the distribution is not strongly non-normal, the mean return over even a moderate number of years is close to normal. Annual real returns on the Standard & Poor's 500-Stock Index over the period 1871 to 2004 have varied with mean 9.2% and standard deviation 20.6%. Andrew plans to retire in 45 years and is considering investing in stocks. What is the probability (assuming that the past pattern of variation continues) that the mean annual return on common stocks over the next 45 years will exceed 15%? What is the probability that the mean return will be less than 5%?

First, let us figure out what we are measuring. How is our sample space defined? The answer lies in the problem statement: we are measuring stock return as a percentage of investment. Thus our measurements will be in the form of percentages (or decimals), positive for gains and negative for losses. We are given information about the distribution of stock returns on a yearly basis: µ = 9.2% and σ = 20.6%, and it is symmetric but not normal. All this means is that the distribution is bell-shaped like a normal distribution, but the outliers are too extreme compared to a normal distribution. Andrew is going to retire in 45 years, and the distribution we have is on a yearly basis. But the question is about averaging over 45 years, which means sampling 45 times from the given distribution. The central limit theorem states that the sampling distribution of the mean will be very close to a normal distribution; therefore, I will proceed by using calculations that apply to a normal distribution.

Gather an SRS of size 45 and calculate x̄. P(x̄ > 15%) = ?

$$\sigma_{\bar{x}} = \frac{20.6\%}{\sqrt{45}} \approx 3.07\%$$

$$P(\bar{x} > 15\%) = P\left(Z > \frac{15\% - 9.2\%}{20.6\%/\sqrt{45}}\right) = P(Z > 1.89) = 0.0294$$

The second question is handled the same way. What is the probability that the mean return will be less than 5%?

$$P(\bar{x} < 5\%) = P\left(Z < \frac{5\% - 9.2\%}{20.6\%/\sqrt{45}}\right) = P(Z < -1.37) = 0.0853$$

What both answers suggest is that Andrew will most likely have a return close to the advertised value (the expected value) of 9.2%.
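Both tail probabilities come from the same sampling distribution, so they can be checked together. The sketch below assumes, as the solution does, that the 45-year mean return is approximately normal; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, YEARS = 9.2, 20.6, 45  # percent per year, from Exercise 5.43

# Approximate sampling distribution of the 45-year mean return.
mean_return = NormalDist(MU, SIGMA / sqrt(YEARS))

p_above_15 = 1 - mean_return.cdf(15.0)
p_below_5 = mean_return.cdf(5.0)
p_between = 1 - p_above_15 - p_below_5

print(f"P(mean > 15%)      = {p_above_15:.4f}")
print(f"P(mean < 5%)       = {p_below_5:.4f}")
print(f"P(5% to 15% range) = {p_between:.4f}")
```

The printed values differ slightly from the table-based answers because the z-scores are not rounded to two decimals here. The last line puts a number on the closing remark: under this model, most of the probability lies between 5% and 15%.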
