All the Probability and Statistics Sheets


G5098 Probability and Statistics 2010–11: Exercises 1

Probability ideas

Issued: Week 1, Friday 14 January. Workshop: Week 2, 17–21 January. Hand in: Friday 21 January, by 4 pm. Solutions posted online: Friday 28 January, evening.

1.1 On the space $\Omega = \{1, 2, 3, \ldots\}$ a probability distribution is defined by $p_\omega = 2^{-\omega}$ for $\omega = 1, 2, 3, \ldots$.

(a) For a positive integer $n$, denote by $A_n$ the event $\{\omega \text{ is a multiple of } n\}$. Show that $P(A_n) = 1/(2^n - 1)$.

(b) Denote by $C$ the event $\{\omega \text{ is a multiple of 2 or of 3}\}$. Find $P(C)$.
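As a numerical sanity check on part (a), the probabilities $2^{-\omega}$ can be summed directly over the multiples of $n$. The sketch below truncates the infinite sum; the function name `prob_multiple` and the cutoff of 60 terms are illustrative assumptions, and the check does not replace the proof the exercise asks for.

```python
def prob_multiple(n, terms=60):
    """Approximate P(A_n) by summing 2**(-k*n) for k = 1, ..., terms."""
    return sum(2.0 ** (-k * n) for k in range(1, terms + 1))

# Compare the truncated sum with the closed form 1/(2**n - 1).
for n in (1, 2, 3, 6):
    print(n, prob_multiple(n), 1 / (2 ** n - 1))
```

The same function, combined with inclusion and exclusion, could be used to check an answer to part (b).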

1.2 Show by induction that, for any events $A_1, A_2, \ldots, A_n$,

\[ P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) \le \sum_{k=1}^{n} P(A_k). \]

This is called Boole’s inequality.

1.3 In the UK National Lottery, where six numbers are chosen at random from a list of 49 numbers, players select six numbers themselves, hoping to match as many of the chosen numbers as possible. Find the probabilities that for a given entry:

(a) exactly four winning numbers are selected;
(b) at least four winning numbers are selected;
(c) exactly two of the winning numbers are multiples of six.
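Counting problems of this kind can be checked with binomial coefficients. The following is a minimal sketch, assuming the usual hypergeometric model; the helper `hypergeom` and its argument names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom(k, special, sample, total):
    """P(exactly k of `sample` numbers drawn without replacement from
    `total` fall among a fixed set of `special` numbers)."""
    return Fraction(comb(special, k) * comb(total - special, sample - k),
                    comb(total, sample))

# (a), (b): the "special" numbers are the six winning numbers.
exactly_four = hypergeom(4, 6, 6, 49)
at_least_four = sum(hypergeom(k, 6, 6, 49) for k in range(4, 7))
# (c): the "special" numbers are the eight multiples of six up to 49.
two_multiples_of_six = hypergeom(2, 8, 6, 49)

print(exactly_four, float(exactly_four))
print(at_least_four, float(at_least_four))
print(two_multiples_of_six, float(two_multiples_of_six))
```

Exact fractions are used so that the results can be compared with a hand calculation before rounding.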

1.4 The roulette wheel in a UK casino has 37 numbers, {0, 1, ..., 36}, all equally likely. Pandora makes bets on the following combinations:

(a) top half = {19, 20, 21, ..., 36};
(b) odd = {1, 3, 5, ..., 35};
(c) bottom row = {34, 35, 36};
(d) the foursome {14, 15, 17, 18}.

Her bet wins if the winning number falls into her selection. Let A, B, C and D indicate that the above four respective bets are winning bets. Find the probabilities of A, B, A ∩ B, A ∪ B, B ∩ C and A ∩ B ∩ C ∩ D.
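Because all 37 outcomes are equally likely, each probability is a count of outcomes divided by 37. A short sketch using Python sets (the variable names and the helper `prob` are illustrative) makes the counting explicit:

```python
from fractions import Fraction

wheel = set(range(37))            # outcomes 0, 1, ..., 36
A = set(range(19, 37))            # top half
B = set(range(1, 36, 2))          # odd
C = {34, 35, 36}                  # bottom row
D = {14, 15, 17, 18}              # the foursome

def prob(event):
    """Probability of an event under the uniform distribution on the wheel."""
    return Fraction(len(event), len(wheel))

for name, event in [("A", A), ("B", B), ("A and B", A & B), ("A or B", A | B),
                    ("B and C", B & C), ("A, B, C and D", A & B & C & D)]:
    print(name, prob(event))
```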

PSEx.tex 17 i 2011


G5098 Probability and Statistics 2010–11: Exercises 2

Conditional probability and independence

Issued: Week 2, Friday 21 January. Workshop: Week 3, 24–28 January. Hand in: Friday 28 January, by 4 pm. Solutions posted online: Friday 4 February, evening.

2.1 Darren plays twice as many good games as bad games; he scores in 70% of his good games and in 40% of his bad games. In what proportion of games does he score? Given that he has scored, what is the probability that he had a good game?
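The two questions correspond to the Law of Total Probability and Bayes' theorem. A minimal sketch with exact fractions follows; the variable names are illustrative, and "twice as many good games as bad" is read as P(good) = 2/3.

```python
from fractions import Fraction

p_good = Fraction(2, 3)            # assumed reading of "twice as many good as bad"
p_bad = 1 - p_good
p_score_given_good = Fraction(7, 10)
p_score_given_bad = Fraction(4, 10)

# Law of Total Probability, then Bayes' theorem.
p_score = p_good * p_score_given_good + p_bad * p_score_given_bad
p_good_given_score = p_good * p_score_given_good / p_score

print(p_score, p_good_given_score)
```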

2.2 An urn initially has one red ball. Persephone uses a device to select $n$ blue balls with probability $e^{-\lambda}\lambda^n/n!$, for $n = 0, 1, 2, \ldots$, and add them to the urn. She then selects one ball at random from the urn. Show that the probability that she selects the red ball is $(1 - e^{-\lambda})/\lambda$.

2.3 In a communications system, a string of 0s and 1s, known as ‘bits’, is transmitted by a sender to a receiver. Noise in the system means that some bits are incorrectly received: the probability that a 0 arrives as a 1 is 0·05 and the probability that a 1 arrives as a 0 is 0·1. All bits are sent independently and 60% of all items sent begin as 1.

(a) Find the proportion of bits accurately received.

(b) Find the probability that a 1 was sent, given that a 1 is received.

(c) Find the probability that a 0 was sent, given that a 0 is received.

(d) To improve reliability, 0 is sent as 000 and 1 is sent as 111; the signal received is interpreted as 0 or 1, according to the majority of 0s and 1s. For example, 010 is interpreted as 0. What proportion of signals sent are interpreted correctly?
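The four parts can be checked numerically. The sketch below is one way to organise the arithmetic, assuming in part (d) that the three copies of a bit are corrupted independently; the names `ok0`, `ok1` and `majority_ok` are illustrative.

```python
p1 = 0.6                  # P(a 1 is sent)
p0 = 1 - p1               # P(a 0 is sent)
ok0 = 1 - 0.05            # P(0 received | 0 sent)
ok1 = 1 - 0.1             # P(1 received | 1 sent)

# (a) proportion of bits accurately received
accurate = p0 * ok0 + p1 * ok1
# (b), (c) Bayes' theorem
sent1_given_recv1 = p1 * ok1 / (p1 * ok1 + p0 * (1 - ok0))
sent0_given_recv0 = p0 * ok0 / (p0 * ok0 + p1 * (1 - ok1))

def majority_ok(p):
    """P(at least two of three independent copies arrive correctly)."""
    return p ** 3 + 3 * p ** 2 * (1 - p)

# (d) proportion of triple-coded signals interpreted correctly
triple = p0 * majority_ok(ok0) + p1 * majority_ok(ok1)

print(accurate, sent1_given_recv1, sent0_given_recv0, triple)
```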

2.4 The system in the diagram below will work if there is some path from left to right. In the boxes, which represent components of the system, the numbers indicate the probability that that component will fail in the next five years. Components behave independently of each other.

[Diagram: components between input and output, with failure probabilities 0·2, 0·05 (A), 0·05 (B), 0·3, 0·3 and 0·3; the layout of the paths has not survived extraction.]

What is the probability that the system fails within the next five years? Given that it has not failed in five years, find the probability that neither of the components marked A and B has failed.


G5098 Probability and Statistics 2010–11: Exercises 3

Discrete random variables I

Issued: Week 3, Friday 28 January. Workshop: Week 4, 31 January–4 February. Hand in: Friday 4 February, by 4 pm. Solutions posted online: Friday 11 February, evening.

The following facts may help. The first two are standard.

\[ \sum_{k=1}^{n} k = \tfrac{1}{2} n(n+1), \qquad \sum_{k=1}^{n} k^2 = \tfrac{1}{6} n(n+1)(2n+1), \qquad \sum_{k=0}^{\infty} k^2 x^k = \frac{x(1+x)}{(1-x)^3} \quad \text{for } |x| < 1. \]

3.1 In each case below show that the probabilities given are non-negative and sum to 1, and that the stated means and variances are correct. In parts (b) and (c), take $0 < p < 1$ and $q = 1 - p$.

(a) The discrete uniform distribution Unif(1, ..., n), where $P(X = x) = 1/n$ for $x = 1, 2, \ldots, n$, with $EX = (n+1)/2$ and $\operatorname{var} X = (n^2 - 1)/12$.

(b) The geometric distribution Geom(p), where $P(X = x) = pq^x$ for $x = 0, 1, 2, \ldots$, with $EX = q/p$ and $\operatorname{var} X = q/p^2$.

(c) The binomial distribution Binom(n, p), where $P(X = x) = \binom{n}{x} p^x q^{n-x}$ for $x = 0, 1, \ldots, n$, with $EX = np$ and $\operatorname{var} X = npq$.
Hint for (c): find $E(X^2)$ by calculating $E\bigl(X(X-1)\bigr)$ and adding $EX$ to it.

3.2 In a simple Lottery, four numbers are selected at random from twenty, and you win the first prize if you match all four numbers, or a second prize if you match 3 (and not 4) numbers. You buy one ticket.

(a) Find the probabilities of winning the respective prizes.

(b) Calculate the mean number of correct guesses that you will make.

3.3 You buy 100 used computer monitors for a lump sum of £1,500. You expect about 60% to be functioning, and you’ll sell those for £40 each. The rest you’ll sell at £5 each as scrap. Model X, the number of monitors that will be functioning, using the binomial distribution. Write down your net profit, in terms of X. Deduce the mean and standard deviation of your net profit.
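Since the profit is a linear function of X, its mean and standard deviation follow from those of the binomial distribution. A sketch of the arithmetic, assuming X ~ Binom(100, 0·6) as the exercise suggests (variable names are illustrative):

```python
from math import sqrt

n, p = 100, 0.6
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))

# Profit = 40 X + 5 (100 - X) - 1500 = 35 X - 1000, a linear function of X.
slope = 40 - 5
intercept = 5 * n - 1500

mean_profit = slope * mean_x + intercept
sd_profit = abs(slope) * sd_x      # an additive constant does not change the spread

print(mean_profit, sd_profit)
```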

3.4 Suppose the random variables X and Y have the following joint distribution:

\[ P(X = 1, Y = 0) = P(X = 0, Y = 0) = P(X = 0, Y = 1) = P(X = -1, Y = 0) = \tfrac{1}{4}. \]

(a) Find the (marginal) distributions of X and of Y .

(b) Deduce that E(XY ) = EX EY , but that X and Y are not independent.


G5098 Probability and Statistics 2010–11: Exercises 4

Discrete random variables II

Issued: Week 4, Friday 4 February. Workshop: Week 5, 7–11 February. Hand in: Friday 11 February, by 4 pm. Solutions posted online: Friday 18 February, evening.

4.1 The paper by Y. Mori and B. R. Ellingwood, ‘Reliability-based service-life assessment of aging concrete structures’ (J. Structural Engineering 119(1993), 1600–1621) suggests that as exceptional structural loads occur at random over time, Poisson distributions are appropriate for modelling the numbers of such loads in given time intervals. Suppose that exceptional loads on a specific building occur at the rate of two per year on average. Use a Poisson distribution, with appropriate mean in relation to the time period, to find the probabilities of the following events.

(a) Exactly five exceptional loads occur over the years 2011–12.

(b) At least two exceptional loads occur in 2013.

(c) The second exceptional load in 2011 or after occurs in 2012.

(d) Just two exceptional loads occur in 2011, given that there are five in total in 2011–12.
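These probabilities can be evaluated directly from the Poisson mass function. The sketch below assumes that counts in disjoint years are independent Poisson variables with mean 2 per year, and reads part (c) as "at most one load in 2011 and at least two by the end of 2012"; the helper names `pois_pmf` and `pois_cdf` are illustrative.

```python
from math import exp, factorial

def pois_pmf(k, mean):
    """P(N = k) for N ~ Pois(mean)."""
    return exp(-mean) * mean ** k / factorial(k)

def pois_cdf(k, mean):
    """P(N <= k) for N ~ Pois(mean)."""
    return sum(pois_pmf(j, mean) for j in range(k + 1))

rate = 2.0                                   # loads per year

a = pois_pmf(5, 2 * rate)                    # five loads in two years
b = 1 - pois_cdf(1, rate)                    # at least two loads in one year
# (c): none in 2011 and at least two in 2012, or one in 2011 and at least one in 2012
c = (pois_pmf(0, rate) * (1 - pois_cdf(1, rate))
     + pois_pmf(1, rate) * (1 - pois_pmf(0, rate)))
# (d): conditional probability from the definition
d = pois_pmf(2, rate) * pois_pmf(3, rate) / pois_pmf(5, 2 * rate)

print(a, b, c, d)
```

The value obtained for (d) can be compared with the binomial form derived in Exercise 4.2.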

4.2 Suppose that X has the Pois(λ) distribution and that, independently, Y is Pois(µ). Then it is shown in lectures that X + Y is Pois(λ + µ). Find the conditional probability P(X = k | X + Y = n) when 0 ≤ k ≤ n and write your answer as a binomial probability. Note that this is a generalisation of Exercise 4.1(d).

4.3 In lectures, it was found that the probability generating function of a Binom(n, p) random variable is $(pz + 1 - p)^n$.

(a) Show that, if X and Y are independent with Binom(m, p) and Binom(n, p) distributions, then X + Y has a Binom(m + n, p) distribution.

(b) Show that if, instead, Y has a Binom(n, r) distribution, X + Y does not have a binomial distribution unless r = p.

4.4 Suppose that X has the geometric distribution Geom(p), so that $P(X = x) = pq^x$ for $x = 0, 1, 2, \ldots$. Show that its probability generating function is $p/(1 - qz)$, provided that $|z| < 1/q$. Hence confirm that the mean and variance of X are $q/p$ and $q/p^2$, respectively.

Use this result to show that the probability generating function of the sum of r independent Geom(p) random variables is $p^r/(1 - qz)^r$. Deduce that if Y has this distribution then

\[ P(Y = y) = \binom{y + r - 1}{y} p^r q^y \qquad (y = 0, 1, 2, \ldots). \]

This is called the negative binomial distribution NB(r, p) and arises as the probability that in a sequence of Bernoulli trials there are exactly y failures before the rth success. Deduce the mean and variance of a NB(r, p) distribution from its probability generating function, and note that these results can be obtained directly from the fact that this negative binomial arises as the sum of r independent geometric random variables.


G5098 Probability and Statistics 2010–11: Exercises 5 page 1

Continuous random variables I

Issued: Week 5, Friday 11 February. Workshop: Week 6, 14–18 February. Hand in: Friday 18 February, by 4 pm. Solutions posted online: Friday 25 February, evening.

5.1 Let X denote the vibratory stress, in pounds per square inch (psi), on a wind turbine blade at a particular wind speed in a wind tunnel. The paper by P. S. Veers, ‘Blade fatigue life assessment with application to VAWTS’ (J. Solar Energy Engineering 104(1982), 106–111) proposes a ‘Rayleigh’ distribution, with density

\[ f(x) = \begin{cases} \dfrac{x}{\theta^2}\, e^{-x^2/(2\theta^2)}, & \text{for } x > 0, \\ 0, & \text{otherwise,} \end{cases} \]

where θ > 0, as a model for the distribution of X.

(a) Verify that f is a legitimate density.

(b) Suppose θ = 100 (a value suggested by a graph in the article). What is the probability that X is at most 200 psi? Less than 200 psi? At least 200 psi?

(c) What is the probability that X is between 100 psi and 200 psi (again assuming θ = 100)?

(d) Find the distribution function of X.

5.2 A one-person business has to submit detailed accounts to the Income Tax authorities only if its annual turnover is at least £67 000 per annum. Let X, in thousands of pounds, be the annual turnover of a randomly chosen such business. A widely used model for the density of such a random variable, for which there are theoretical reasons, is

\[ f(x) := \begin{cases} k x^{-\alpha}, & \text{if } x > 67, \\ 0, & \text{otherwise,} \end{cases} \]

where α and k are parameters.

(a) What values of α can be permitted? Find the value of k in terms of α.

(b) Find the distribution function of X.

(c) For the case α = 4 find P (X ≤ 75) and P (X > 100).

5.3 Suppose that X has density f(x) = 2x for 0 < x < 1.

(a) Calculate P(0·2 < X < 0·6).
(b) Calculate P(0·2 < X < 0·6 | X < 0·4).
(c) Calculate P(0·2 < X < 0·6 | X > 0·4).
(d) Show how P(0·2 < X < 0·6) can be obtained from the answers to parts (b) and (c) using the Law of Total Probability.

5.4 Prove that if X has the Γ(α, λ) distribution, i.e.

\[ f(x) = \begin{cases} \dfrac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, & \text{if } x > 0, \\ 0, & \text{elsewhere,} \end{cases} \]

where α > 0 and λ > 0, then

(a) $EX = \alpha/\lambda$;
(b) $\operatorname{var} X = \alpha/\lambda^2$.

Reminder: $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$.


G5098 Probability and Statistics 2010–11: Exercises 6

Continuous random variables II

Issued: Week 6, Friday 18 February. Workshop: Week 7, 21–25 February. Hand in: Friday 25 February, by 4 pm. Solutions posted online: Friday 4 March, evening.

6.1 The diameter in inches, at chest height, of trees of a certain type is normally distributed with mean 8·8 and standard deviation 2·8, as suggested in D. M. Aedo-Ortiz, E. D. Olsen and L. D. Kellogg, ‘Simulating a harvester-forwarder softwood thinning: a software evaluation’ (Forest Products J. 47(1997), 36–41).

(a) What is the probability that the diameter of a randomly selected tree will be at least 10 in? Will exceed 10 in?

(b) What is the probability that the diameter of a randomly selected tree will exceed 20 in? Comment on this calculation.

(c) What is the probability that the diameter of a randomly selected tree will be between 5 and 10 in?

(d) What value c is such that the interval (8·8 − c, 8·8 + c) includes 98% of all diameter values?

(e) What is the probability, if four trees are independently selected, that at least one has a diameter exceeding 10 in?
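Normal probabilities of this kind are usually read from tables, but they can also be checked with the standard library. The sketch below uses `statistics.NormalDist`; the variable names are illustrative and the printed values should agree with table look-ups to table accuracy.

```python
from statistics import NormalDist

diam = NormalDist(mu=8.8, sigma=2.8)

p_over_10 = 1 - diam.cdf(10)                  # (a); "at least" and "exceeds" agree
p_over_20 = 1 - diam.cdf(20)                  # (b)
p_5_to_10 = diam.cdf(10) - diam.cdf(5)        # (c)
c = 2.8 * NormalDist().inv_cdf(0.99)          # (d): 1% is left in each tail
p_any_of_four = 1 - (1 - p_over_10) ** 4      # (e): complement of "none exceeds"

print(p_over_10, p_over_20, p_5_to_10, c, p_any_of_four)
```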

6.2 Let X have density

\[ f(x) = \tfrac{1}{4}\, e^{-|x|/2}, \qquad \text{for } -\infty < x < \infty. \]

(a) Sketch this density.

(b) Show that EX = 0 and varX = 8.

(c) Suppose that the error, in grammes, of a balance has the above distribution and that 100 items are weighed, independently of each other. Use the Central Limit Theorem to approximate the probability that the absolute difference between the true total weight and the measured total weight is more than 50 g.

6.3 Suppose that X and Y are independent random variables, with N(−2, 2) and N(10, 3) distributions, respectively. State the distributions of the five random variables −X, 2X, 5X + Y, Y − X − 5 and (X + Y)/2.

6.4 (a) Find the moment generating function of X ∼ Unif(−1, 1).

(b) By expanding the mgf as a power series,
(i) show that X has mean zero and variance 1/3;
(ii) find the third and fourth moments of X.


G5098 Probability and Statistics 2010–11: Exercises 7

Continuous random variables III

Issued: Week 7, Friday 25 February. Workshop: Week 8, 28 February–4 March. Hand in: Friday 4 March, by 4 pm. Solutions posted online: Friday 11 March, evening.

7.1 Let X and Y have joint density f(x, y) = c(x + 2y) on the rectangle 0 < x < 3, 1 < y < 2. Evaluate c and calculate the marginal densities of X and Y. Are X and Y independent? Find the densities of X, given that Y = 1·25, and of Y, given that X = 2. Confirm that the two answers you obtain are indeed densities, that is, that they are non-negative and integrate to 1. Evaluate EX, EY and cov(X, Y).

7.2 Let $X_1, \ldots, X_n$ be random variables denoting n independent bids for an item that is for sale. Suppose each $X_i$ is uniformly distributed between £100 and £200. The seller sells to the highest bidder.

(a) Find how much, as a function of n, he can expect to make on the sale.
Hint: let $Y = \max(X_1, \ldots, X_n)$. Find the distribution function of Y by noting that Y ≤ y if and only if $X_i \le y$ for all i.

(b) Work your answer out for small values of n and sketch the result.

7.3 Suppose that $Z_1$, $Z_2$ and $Z_3$ are independent observations from the standard normal distribution. Let $X_1 = Z_1 + Z_2$, $X_2 = Z_1 - 2Z_2 + 3Z_3$ and $X_3 = Z_1 - Z_2$. Find the covariance and correlation between

(a) $X_1$ and $X_2$,
(b) $X_1$ and $X_3$,
(c) $X_2$ and $X_3$.
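Because the $Z_i$ are independent with unit variance, the covariance of two linear combinations is the dot product of their coefficient vectors. A small sketch of that observation follows; the helper names `cov` and `corr` are illustrative.

```python
from math import sqrt

# Coefficients of (Z1, Z2, Z3) in each linear combination.
X1 = (1, 1, 0)
X2 = (1, -2, 3)
X3 = (1, -1, 0)

def cov(a, b):
    """Covariance of two linear combinations of independent N(0, 1) variables."""
    return sum(ai * bi for ai, bi in zip(a, b))

def corr(a, b):
    """Correlation, using var = cov of a combination with itself."""
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))

for name, (a, b) in {"(a)": (X1, X2), "(b)": (X1, X3), "(c)": (X2, X3)}.items():
    print(name, cov(a, b), corr(a, b))
```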

7.4 A rock specimen from a particular area is randomly selected and weighed two different times. Let W denote the actual weight and $X_1$ and $X_2$ the two measured weights. Thus $X_1 = W + E_1$ and $X_2 = W + E_2$, where $E_1$ and $E_2$ are the two measurement errors. Assume that the $E_i$ are independent of each other and of W and that $\operatorname{var} E_1 = \operatorname{var} E_2 = \sigma_E^2$.

(a) Express ρ, the correlation between the two measured weights $X_1$ and $X_2$, in terms of $\sigma_W^2$, the variance of actual weight, and $\sigma_E^2$.

(b) Calculate ρ when $\sigma_W$ = 1 kg and $\sigma_E$ = 50 g.


G5098 Probability and Statistics 2010–11: Exercises 8 page 1

Linear regression and least squares

Issued: Week 8, Friday 4 March. Workshop: Week 9, 7–11 March. Hand in: Friday 11 March, by 4 pm. Solutions posted online: Friday 18 March, evening.

8.1 The following data are from the paper by O. Klemm & I. C. Ziomas, ‘Urban emissions measured with aircraft’, J. Air & Waste Mgmt. Assoc. 48(1998), 16–25, and record the enrichment of plumes of pollutants along flight paths above Athens. The response variable is ∆NO, the explanatory variable is ∆CO, and the units are parts per billion times seconds times 100.

∆CO: 50 60 95 108 135 210 214 315 720
∆NO: 2·3 4·5 4·0 3·7 8·2 5·4 7·2 13·8 32·1

(a) Find the correlation coefficient of the data.

(b) Calculate the least-squares regression line for these data, and find an estimate of the variance of the error term.

(c) Plot the points and the regression line on a graph.

(d) Comment on the data and the graph, and carry out any further analysis that your comments indicate would be appropriate.

8.2 The article by B. Kroll and M. R. Ramey, ‘Effects of bike lanes on driver and bicyclist behavior’ (ASCE Transportation Eng. J. 103(1977), 243–256) reports the results of a regression analysis with x the available travel space in feet (a convenient measure of roadway width, defined as the distance between a cyclist and the centre line) and y the separation distance between a bike and a passing car (determined by photography). The data, for 10 streets with bike lanes, were as follows:

X: 12·8 12·9 12·9 13·6 14·5 14·6 15·1 17·5 19·5 20·8
Y: 5·5 6·2 6·3 7·0 7·8 8·3 7·1 10·0 10·8 11·0

(a) Verify that $\sum x_i$ = 154·2, $\sum y_i$ = 80·0, $\sum x_i^2$ = 2452·18 and $\sum x_i y_i$ = 1282·74.

(b) Derive the equation of the estimated regression line.

(c) Plot the points and the regression line on a graph.

(d) What separation distance would you predict for another street that has 15·0 as its available travel space value?
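The summary sums in part (a) and the least-squares line can be checked in a few lines. This is a sketch of the standard formulae (slope = $S_{xy}/S_{xx}$, intercept = $\bar{y} - \text{slope} \cdot \bar{x}$); the variable names and the helper `predict` are illustrative.

```python
x = [12.8, 12.9, 12.9, 13.6, 14.5, 14.6, 15.1, 17.5, 19.5, 20.8]
y = [5.5, 6.2, 6.3, 7.0, 7.8, 8.3, 7.1, 10.0, 10.8, 11.0]
n = len(x)

# Summary sums quoted in part (a).
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(v * v for v in x)
sum_xy = sum(a * b for a, b in zip(x, y))

# Corrected sums of squares and products.
sxx = sum_xx - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

slope = sxy / sxx
intercept = sum_y / n - slope * sum_x / n

def predict(x0):
    """Fitted separation distance at travel space x0."""
    return intercept + slope * x0

print(sum_x, sum_y, sum_xx, sum_xy)
print(slope, intercept, predict(15.0))
```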

8.3 The following data represent the height (X) in centimetres and weight (Y) in grammes of a type of plant. A sample of ten plants was taken.

X: 4·7 6·2 6·4 6·9 7·6 7·8 8·1 8·7 9·2 10·4
Y: 2·2 4·6 5·0 6·8 9·2 9·2 10·9 13·6 15·9 22·1

(a) Verify that the correlation coefficient of the data is 0·975.

(b) Plot the points. Despite how high the correlation coefficient is, the data clearly lie on a curve.


(c) Weight might be proportional to height$^3$ (e.g. volume) or height$^2$ (e.g. area, for hollow plants). Therefore try fitting a power relationship. Calculate the least-squares regression line for V := ln Y on U := ln X.

(d) Plot the residuals against the values $u_i$.

(e) Express the regression line as an equation giving Y in terms of X, and plot the curve on your original graph.

8.4 A genetic experiment was undertaken to study the competition between two types of female Drosophila melanogaster (fruit fly) in cages with one male genotype acting as a substrate. The independent variable X is the time, in days, spent in cages, and the dependent variable Y is the ratio of the numbers of Type 1 to Type 2 females. The following data were recorded:

X: 17 31 45 59 73
Y: 0·2338 0·5804 1·982 3·388 13·01

(a) Plot the points. The relationship is clearly non-linear.

(b) Transform the data appropriately and find the least-squares regression line for the transformed data.

(c) Plot the residuals and say whether your transformation has made it satisfactory to fit a straight line.

(d) Express the regression line as an equation giving Y in terms of X, and plot the curve on your original graph.


G5098 Probability and Statistics 2010–11: Exercises 9 page 1

Descriptive statistics and Minitab

Issued: Week 9, Friday 11 March. Workshop: Week 10, 14–18 March. Hand in: Friday 18 March, by 4 pm. Solutions posted online: Friday 29 April, evening.

9.1 In the casino game roulette, if a player bets one unit on red, the probability of winning is 18/38 and of losing is 20/38 (in American casinos; European ones are more generous!). Suppose that a player begins with five units and let Y be a player’s maximum capital, before eventually losing their money. The following data are 100 simulations of this value of Y.

25, 9, 5, 5, 5, 9, 6, 5, 15, 45, 55, 6, 5, 6, 24, 21, 16, 5, 8, 7, 7, 5, 5, 35, 13,
9, 5, 18, 6, 10, 19, 16, 21, 8, 13, 5, 9, 10, 10, 6, 23, 8, 5, 10, 15, 7, 5, 5, 24,
9, 11, 34, 12, 11, 17, 11, 16, 5, 15, 5, 12, 6, 5, 5, 7, 6, 17, 20, 7, 8, 8, 6, 10,
11, 6, 7, 5, 12, 11, 18, 6, 21, 6, 5, 24, 7, 16, 21, 23, 15, 11, 8, 6, 8, 14, 11,
6, 9, 6, 10.

(a) Construct an ordered stem-and-leaf plot.

(b) Find the five-number summary and draw a boxplot.

(c) Draw a histogram of the data.

9.2 The following are the ages at death for the 38 American presidents from Washington to Ford.

Washington 67, J. Adams 90, Jefferson 83, Madison 85, Monroe 73, J. Q. Adams 80, Jackson 78, Van Buren 79, W. H. Harrison 68, Tyler 71, Polk 53, Taylor 65, Fillmore 74, Pierce 64, Buchanan 77, Lincoln 56, A. Johnson 66, Grant 63, Hayes 70, Garfield 49, Arthur 56, Cleveland 71, B. Harrison 67, Cleveland 71, McKinley 58, T. Roosevelt 60, Taft 72, Wilson 67, Harding 57, Coolidge 60, Hoover 90, F. D. Roosevelt 63, Truman 88, Eisenhower 78, Kennedy 46, L. Johnson 64, Nixon 81, Ford 93.

(a) Construct a stem-and-leaf plot of the data and describe its shape.

(b) Find the five-number summary, and hence draw a boxplot.

9.3 An article in U.S. Consumer Reports, September 1990, reported the following scores for various brands of two types of peanut butter:

Smooth: 56, 44, 62, 36, 39, 53, 50, 65, 45, 40, 56, 68, 41, 30, 40, 50, 56, 30, 22.

Crunchy: 62, 53, 75, 42, 47, 40, 34, 62, 52, 50, 34, 42, 36, 75, 80, 47, 56, 62.

Construct a comparative stem-and-leaf display by listing stems in the middle of your page and then displaying the smooth leaves out to the right and the crunchy leaves out to the left. Also construct boxplots. Use your displays to make a comparison between smooth and crunchy peanut butter, considering shape, spread and location of the distributions of the scores.


9.4 Repeat as much as you can of Exercise 8.4 using Minitab. Use two transformations and look at the residuals in each case. Give reasons for the transformation that you eventually choose. Show the Minitab commands that you use clearly.


G5098 Probability and Statistics 2010–11: Exercises 10 page 1

Estimation I

Issued: Week 10, Friday 18 March. Workshop: 28 April–5 May. Hand in: Friday 6 May, by 4 pm. Solutions posted online: Friday 13 May, evening.

10.1 The Poisson distribution has been used by traffic engineers as a model for light traffic, based on the rationale that if the rate is approximately constant and the traffic is light (so the individual cars move independently of each other), the distribution of counts of cars in a given time-interval or space area should be nearly Poisson. The table on the right records numbers of right turns in 300 three-minute periods at a specific intersection (D. Gerlough & A. Schuhl, Use of Poisson Distribution in Highway Traffic, Eno Foundation for Highway Traffic Control, 1955). The table shows the frequency $f_i$ of periods in which there were $i$ right turns.

[Table of $i$ against $f_i$: only the column headings survive in this transcript.]

(a) The usual formula for sample mean is $\bar{x} := \sum_{i=1}^{n} x_i / n$. However, for the present data-set, in frequency form, the average number of right turns per 3-minute period is $\bar{x} = \sum_{i=0}^{12} i f_i \big/ \sum_{i=0}^{12} f_i$. Explain why, and calculate $\bar{x}$.

(b) Assume that X, the number of right turns in a 3-minute period, has a Poisson distribution with parameter λ. Find an unbiased estimator of λ and calculate the estimate for the given data. What is the standard error (i.e. standard deviation) of your estimator? Calculate the estimated standard error.
Hint: $EX = \lambda$ and $\operatorname{var} X = \lambda$ for X Poisson, so $E(\bar{X}) = {?}$ and $\operatorname{var}(\bar{X}) = {?}$

10.2 Let $X_1, \ldots, X_n$ be independent, each with distribution $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$, where p is an unknown parameter satisfying 0 ≤ p ≤ 1.

(a) State the distribution of $Y = \sum_{i=1}^{n} X_i$, and give its mean and variance (you may quote from the Probability Distributions sheet).

(b) Deduce that $\bar{X} = Y/n$ is an unbiased estimator of p, with variance $p(1 - p)/n$.

(c) Show that $E\bigl(\bar{X}(1 - \bar{X})\bigr) = (n - 1)p(1 - p)/n$.

(d) Find the value of c so that $c\bar{X}(1 - \bar{X})$ is an unbiased estimator of $\operatorname{var} \bar{X} = p(1 - p)/n$.

10.3 Let $X_1, \ldots, X_n$ be a random sample of size n from the distribution with density

\[ f(x; \theta) = \theta x^{\theta - 1} \qquad (0 < x < 1,\ 0 < \theta < \infty). \]

(a) Sketch the graph of this density for θ = 0·5, 1, 2.

(b) Show that the maximum-likelihood estimator of θ is given by

\[ \hat{\theta} = -\frac{n}{\ln \prod_{i=1}^{n} X_i}. \]

(c) For both the following sets of observations from this distribution, calculate the values of the maximum-likelihood estimate and the method-of-moments estimate for θ.

(i) 0·0256, 0·3051, 0·0278, 0·8971, 0·0739, 0·3191, 0·7379, 0·3671, 0·9763, 0·0102.
(ii) 0·4698, 0·3675, 0·5991, 0·9513, 0·6049, 0·9917, 0·1551, 0·0710, 0·2110, 0·2154.


10.4 Let $X_1, \ldots, X_n$ be a random sample of size n from the geometric distribution with success parameter p, i.e.

\[ f(x) = p(1 - p)^x \qquad (x = 0, 1, 2, \ldots). \]

(a) Use the method of moments to find a point estimate of p.

(b) Explain in words why this estimate makes sense.

(c) Find a point estimate of p, given the following data:

2, 33, 6, 3, 18, 1, 0, 18, 42, 1, 21, 3, 18, 10, 6, 0, 1, 20, 14, 15.


G5098 Probability and Statistics 2010–11: Exercises 11

Estimation II

Issued: Friday 6 May. Workshop: 9–12 May. Hand in: Friday 13 May, by 4 pm. Solutions posted online: Friday 20 May, evening.

11.1 Assume that the yield per acre of a particular variety of soya beans is $N(\mu, \sigma^2)$. For a random sample of n = 5 plots, the yields in bushels per acre were 37·4, 48·8, 46·9, 55·0 and 44·0.

(a) Give a point estimate for µ.

(b) Find a 90% confidence interval for µ.

11.2 The following observations were made on fracture toughness of a base plate of 18% nickel maraging steel [J. A. Kies, H. L. Smith, H. E. Romine, H. Bernstein, ‘Fracture testing of weldments’, ASTM Special Tech. Publ. 381(1965), 328–356]. The observations are in ksi$\sqrt{\text{in}}$ and are given in increasing order.

69·5, 71·9, 72·6, 73·1, 73·3, 73·5, 75·5, 75·7, 75·8, 76·1, 76·2,
76·2, 77·0, 77·9, 78·1, 79·6, 79·7, 79·9, 80·1, 82·2, 83·7, 93·7.

Calculate a 99% confidence interval for the standard deviation of the fracture toughness distribution. Is this interval valid whatever the nature of the distribution? Explain.

11.3 Let Y be the sum of n independent observations from a Pois(θ) distribution. Further let the prior distribution for θ be Γ(α, λ).

(a) Find the posterior distribution of θ, given Y = y.

(b) Find a point estimate of θ given this value y.

11.4 Suppose that $Y_i$ is the result of a Bernoulli trial, with probability θ of success ($Y_i = 1$). If we assign a Unif(0, 1) prior distribution to θ, find the posterior distribution of θ after the observation(s)

(a) 1;
(b) 0, 1, 1, 0, 0.


G5098 Probability and Statistics 2010–11: Exercises 12 page 1

Hypothesis testing I

Issued: Friday 13 May. Workshop: 16–19 May. Hand in: Friday 20 May, by 4 pm. Solutions posted online: Friday 27 May, evening.

12.1 Assume that IQ scores for a certain population are approximately N(µ, 100). To test $H_0 : \mu = 110$ against the one-sided alternative $H_1 : \mu > 110$ we take a random sample of size 16 from this population, and find that the mean of this sample is $\bar{x}$ = 113·5.

(a) Do we accept or reject $H_0$ at the 5% level?

(b) Do we accept or reject H0 at the 10% level?

(c) What is the p-value?
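With the variance known, this is a one-sided z-test. The following sketch computes the standardised statistic and its p-value with `statistics.NormalDist`, assuming the N(µ, 100) model is read as variance 100 (standard deviation 10); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 110, 10, 16, 113.5     # sigma = sqrt(100)

z = (xbar - mu0) / (sigma / sqrt(n))         # standardised test statistic
p_value = 1 - NormalDist().cdf(z)            # upper tail, since H1 is mu > 110

for alpha in (0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(alpha, decision)
print(z, p_value)
```

Comparing the p-value with each significance level gives the same decisions as comparing z with the corresponding critical values.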

12.2 The calibration of a scale is to be checked by weighing a 5 kg test specimen 10 times. Suppose that the results of different weighings are independent of one another and that the weight on each trial is Normally distributed with σ = 0·200 kg. Let µ denote the true average weight reading on the scale.

(a) What hypotheses should be tested?

(b) Suppose the scale is to be re-calibrated if either $\bar{x} \ge$ 5·1629 or $\bar{x} \le$ 4·8371. Express this test procedure in terms of the standardised test-statistic $Z = (\bar{X} - 5)\big/\sqrt{\sigma^2/n}$.

(c) What is the probability that re-calibration is carried out when it is actually unnecessary?

(d) Which type of error would that be?

(e) Using the test of (b), what would you conclude from the sample data below?

4·981, 5·006, 4·857, 5·107, 4·888, 4·793, 4·728, 5·439, 5·214, 5·190

12.3 Assume that the birth weight in grammes of a baby born in the US is $N(3315, 525^2)$ for boys and girls combined. Let X be the weight of a baby girl who is born at home in Ottawa County and assume that $X \sim N(\mu, \sigma^2)$.

(a) Using 11 observations of X, give the test statistic and critical region for testing $H_0 : \mu = 3315$ against the alternative $H_1 : \mu > 3315$ (home-born girls in Ottawa County are heavier) with significance level α = 0·01.

(b) Calculate the value of the test statistic and give your conclusion using the following weights:

3119, 2657, 3459, 3629, 3345, 3629, 3515, 3856, 3629, 3345, 3062.

(c) What is the approximate p-value?

(d) Give the test statistic and critical region for testing $H_0 : \sigma^2 = 525^2$ against the Alternative Hypothesis $H_1 : \sigma^2 < 525^2$ at significance level α = 0·05.

(e) Calculate the test statistic and state your conclusions.

(f) Find the approximate p-value for this second test.


12.4 Copper values (µg Cu/100 ml blood) were determined for cattle grazing in an area known to have well-defined molybdenum anomalies (metal values in excess of normal regional variation) and for cattle grazing in a non-anomalous area [L. Thornton, G. F. Kershaw, M. K. Davies, ‘An investigation into copper deficiency in cattle in the southern Pennines, I’, J. Agricultural Sci. 78(1972), 157–163], resulting in $s_X$ = 21·5 (m = 48) for the anomalous area and $s_Y$ = 19·45 (n = 45) for the non-anomalous area. Test at significance level 0·10 for equality of population variances.


G5098 Probability and Statistics 2010–11: Exercises 13

Hypothesis testing II

Issued: Friday 20 May. Workshop: 23–26 May. Hand in: Friday 27 May, by 4 pm. Solutions posted online: Friday 3 June, evening.

13.1 It was claimed that 75% of dentists recommend a certain design of toothbrush. A consumer group doubted this claim and decided to test $H_0 : p = 0·75$ against $H_1 : p < 0·75$, where p is the proportion of dentists who recommend this design. A survey of 390 dentists found that 273 recommended the design.

(a) Would you reject the Null Hypothesis at the 5% level?

(b) Would you reject the Null Hypothesis at the 1% level?

(c) Find the p-value.

13.2 In the Michigan Daily Lottery, each week-day a three-digit integer is generated, one digit at a time. For i = 0, 1, ..., 9 let $p_i$ denote the probability that the digit generated is i. Use the following 50 digits to test $H_0 : p_0 = p_1 = \cdots = p_9 = 0·1$, using α = 0·05.

1, 6, 9, 9, 3, 8, 5, 0, 6, 7, 4, 7, 5, 9, 4, 6, 5, 6, 4, 4, 4, 8, 0, 9, 3,

2, 1, 5, 4, 5, 7, 3, 2, 1, 4, 6, 7, 1, 3, 4, 4, 8, 8, 6, 1, 6, 1, 2, 8, 8.
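The tallying for a goodness-of-fit test on these digits can be sketched as follows. The code computes Pearson's chi-square statistic against equal expected counts; the critical value is the tabulated upper 5% point of the chi-square distribution with 9 degrees of freedom, and the variable names are illustrative.

```python
from collections import Counter

# The 50 digits listed in Exercise 13.2.
digits = [
    1, 6, 9, 9, 3, 8, 5, 0, 6, 7, 4, 7, 5, 9, 4, 6, 5, 6, 4, 4, 4, 8, 0, 9, 3,
    2, 1, 5, 4, 5, 7, 3, 2, 1, 4, 6, 7, 1, 3, 4, 4, 8, 8, 6, 1, 6, 1, 2, 8, 8,
]

counts = Counter(digits)
expected = len(digits) / 10          # under H0 each digit has probability 0.1

# Pearson's statistic: sum of (observed - expected)^2 / expected over the digits.
chi_sq = sum((counts[d] - expected) ** 2 / expected for d in range(10))

CRITICAL_9DF_5PCT = 16.919           # upper 5% point of chi-square(9), from tables

print("observed counts:", [counts[d] for d in range(10)])
print(f"chi-square = {chi_sq:.2f}")
print("reject H0 at the 5% level:", chi_sq > CRITICAL_9DF_5PCT)
```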

13.3 The article by S. M. Specht, R. J. Tushup and C. N. Deatrick, ‘Psychiatric and alcoholic admissions do not occur disproportionately close to patients’ birthdays’ [Psychological Reports 71 (1992), 944–946], focusses on the existence of any relationship between date of patient admission for treatment of alcoholism and patient’s birthday. Assuming a 365-day year (i.e. excluding leap year), in the absence of any relation, a patient’s admission date is equally likely to be any one of the 365 possible days. The investigators established four different admission categories: (1) within 7 days of birthday, (2) between 8 and 30 days, inclusive, from the birthday, (3) between 31 and 90 days, inclusive, from the birthday, and (4) more than 90 days from the birthday. A sample of 200 patients gave observed frequencies of 11, 24, 69 and 96 for categories 1, 2, 3 and 4 respectively. State and test the relevant hypotheses using a significance level of 0·01.

13.4 The article by J. Levy & J. M. Levy, ‘Human lateralization from head to foot: sex-related factors’ (Science 200 (1978), 1291–1292) reports for a sample of right-handed men and women the numbers of individuals whose feet were the same size, the numbers with a bigger left foot than right (a difference of half a shoe size or more), and the numbers with a bigger right foot than left.

         L > R   L = R   L < R
Men        2      10      28
Women     55      18      14

(a) Do the data indicate that gender has a strong effect on the development of foot asymmetry? State the appropriate Null and Alternative Hypotheses and test at level α = 0·01.

(b) If there is evidence of an effect, state where it primarily lies.
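A minimal sketch of the chi-square computation for a two-way table of counts like the one in 13.4 is given below. The helper name is hypothetical; it takes expected counts as (row total × column total) / grand total and returns the statistic with its degrees of freedom, leaving the comparison with tabulated critical values to the reader.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications are independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: men, women. Columns: L > R, L = R, L < R (counts from Exercise 13.4).
foot_counts = [[2, 10, 28], [55, 18, 14]]
stat, df = chi_square_independence(foot_counts)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

Printing the individual terms (observed − expected)² / expected cell by cell is a simple way to see which cells contribute most to the statistic.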

G5098 Probability and Statistics 2010–11: Exercises 14 page 1

Hypothesis testing III

Issued: Friday 27 May.
Workshop: 30 May–2 June.
Hand in: Friday 3 June, by 4 pm.
Solutions posted online: Friday 10 June, evening.

14.1 The length of life of brand X light bulbs is assumed to be N(µX , 784). The length of life of brand Y light bulbs is assumed to be N(µY , 627). If a random sample of m = 56 brand X bulbs yielded a mean life of x̄ = 937·4 hours and an independent random sample of size n = 57 brand Y bulbs yielded a mean life of ȳ = 988·9 hours, find a 90% confidence interval for µX − µY .
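When both variances are known, the interval for a difference of means is the difference of the sample means plus or minus a normal quantile times the standard error. The sketch below shows this arithmetic for the figures in 14.1; the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def known_variance_ci(mean_x, var_x, m, mean_y, var_y, n, level):
    """Confidence interval for mu_X - mu_Y when both variances are known."""
    se = sqrt(var_x / m + var_y / n)                 # s.e. of the difference of means
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # two-sided normal quantile
    diff = mean_x - mean_y
    return diff - z * se, diff + z * se


lower, upper = known_variance_ci(937.4, 784, 56, 988.9, 627, 57, level=0.90)
print(f"90% CI for mu_X - mu_Y: ({lower:.2f}, {upper:.2f})")
```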

14.2 The article by K. Vermeer, F. A. J. Armstrong and D. R. M. Hatch, ‘Mercury in aquatic birds at Clay Lake, Western Ontario’ (J. Wildlife Mgmt. 37 (1973), 58–61) reported the following data on mercury residues in breast muscles:

Mallards: m = 16, x̄ = 6·13, sx = 2·40,
Blue-winged teals: n = 17, ȳ = 6·46, sy = 1·73.

(a) Assuming that X ∼ N(µX , σ²) and Y ∼ N(µY , σ²), find a 95% confidence interval for µX − µY .

(b) Deduce, without further calculation or use of tables, the result of the test at significance level 5% for the Null Hypothesis µX = µY .

(c) Repeat (a) without the assumption of common variance of X and Y , i.e. define T using a non-pooled estimate of variance:

\[ T := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{S_X^2/m + S_Y^2/n}}, \]

and use Welch’s formula for the degrees of freedom r, as given in lectures.

(d) Test at the 10% level whether the variances of residues in the two populations are equal.
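For the non-pooled approach in part (c), the sketch below computes the standard error and an approximate number of degrees of freedom. It assumes that the lecture formula is the standard Welch–Satterthwaite approximation, which may differ in detail (for instance in rounding) from the version given in lectures; the function name is illustrative.

```python
from math import sqrt


def welch_se_and_df(s_x, m, s_y, n):
    """Non-pooled standard error and Welch-Satterthwaite degrees of freedom.

    Assumes r = (a + b)^2 / (a^2/(m-1) + b^2/(n-1)), with a = s_x^2/m
    and b = s_y^2/n.
    """
    a = s_x**2 / m
    b = s_y**2 / n
    se = sqrt(a + b)
    r = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return se, r


se, r = welch_se_and_df(2.40, 16, 1.73, 17)
print(f"standard error = {se:.4f}, Welch degrees of freedom r = {r:.2f}")
# The interval is then (x_bar - y_bar) +/- t * se, with t read from
# t tables using r degrees of freedom.
```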

14.3 The driver of a diesel-powered car decided to test the quality of three types of diesel fuel, based upon miles per gallon. Test the Null Hypothesis that the three means are equal using the data below, using the significance level α = 0·05 and making the usual assumptions.

Brand A: 38·7, 39·2, 40·1, 38·9
Brand B: 41·9, 42·3, 41·3
Brand C: 40·8, 41·2, 39·5, 38·9, 40·3
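The sums of squares for a one-way analysis of variance with unequal group sizes can be checked with a short sketch like the one below. The function name is hypothetical, and the usual assumptions (independent normal samples with a common variance) are taken for granted.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (mean - grand_mean) ** 2
                     for g, mean in zip(groups, means))
    ss_within = sum((x - mean) ** 2
                    for g, mean in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


brand_a = [38.7, 39.2, 40.1, 38.9]
brand_b = [41.9, 42.3, 41.3]
brand_c = [40.8, 41.2, 39.5, 38.9, 40.3]

f_stat, df1, df2 = one_way_anova([brand_a, brand_b, brand_c])
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
# Compare f_stat with the upper 5% point of F(df1, df2) from tables.
```

The same function applies unchanged to balanced designs, such as five samples of size 7.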

14.4 Different sizes of nails are packaged in one-pound boxes. Let Xi for i = 1, 2, 3, 4, 5 be the weight of a box with nail size 4C, 8C, 12C, 16C, 20C respectively, these being the sizes from smallest to largest. It is desired to test whether the mean weights of nails in the 4C, 8C, 12C, 16C and 20C boxes are equal. Assume that the distribution of Xi is N(µi, σ²).

(a) Using random samples of size 7, give a critical region for a test with α = 0·05.

(b) Construct an ANOVA table, and state your conclusions using the following data.

X1 : 1·03, 1·04, 1·07, 1·03, 1·08, 1·06, 1·07
X2 : 1·03, 1·10, 1·08, 1·05, 1·06, 1·06, 1·05
X3 : 1·03, 1·08, 1·06, 1·02, 1·04, 1·04, 1·07
X4 : 1·10, 1·10, 1·09, 1·09, 1·06, 1·05, 1·08
X5 : 1·04, 1·06, 1·07, 1·06, 1·05, 1·07, 1·05

(c) Construct boxplots on the same diagram for each type of nail, and comment.