
36-217 Probability Theory and Random Processes

Homework Solutions

Sangwon (Justin) Hyun

June 22, 2016

Homework 1 (due Thursday May 19th)

1. (25 pts) Consider the Normal distribution $N(\mu, \sigma^2)$ that we mentioned in class and a sample $X_1, \ldots, X_n$ of size $n$ obtained from this distribution. Which of the following are statistics and why?

(a) $\bar{X}$

(b) $\sqrt{n}(\bar{X} - \mu)/S$

(c) $\prod_{i=1}^{n} X_i$

(d) $\max\{X_1, \ldots, X_n, \sigma^2\}$

(e) $\mathbf{1}_{\{X_3 > 0\}}(X_3)$

Extra credit (5 pts) What parameter do you think $\bar{X}$ is trying to estimate? Justify your answer.

Solution: (a), (c), and (e) are statistics since they are functions depending only on the data. (b) and (d) are not statistics since they also depend on the parameters $\mu$ and $\sigma^2$. Extra Credit: The sample mean is a good estimate of the actual mean of the distribution, $\mu$. This is because, intuitively, we can expect the sample mean of many repeated experiments to represent the underlying process well. We will learn more about this when we learn about the central limit theorem.

2. (25 pts) Show that, for any $A, B_1, \ldots, B_n \subset \Omega$ with $\cup_{i=1}^{n} B_i = \Omega$, we have $A = (A \cap B_1) \cup \cdots \cup (A \cap B_n)$.

Solution: We have $A = A \cap \Omega$ and $\Omega = \cup_{i=1}^{n} B_i$. Thus,
$$A = A \cap \Omega = A \cap \left( \cup_{i=1}^{n} B_i \right).$$
Using the distributive property of intersection, we get
$$A = \cup_{i=1}^{n} (A \cap B_i).$$

3. (25 pts) Consider the following experiments:

• You have three keys and exactly one of them opens the door that you want to open. At each attempt to open the door, you pick one of the three keys completely at random and try to open the door, regardless of the fact that you may already have tried some keys in previous attempts. You count the number of attempts needed to open the door. What is the sample space for this experiment?

• You have an urn with 1 red ball, 1 blue ball and 1 white ball. Two times, you draw a ball from the urn, look at its color, take a note of the color on a piece of paper, and put the ball back in the urn. What is the sample space for this experiment? Write out the set corresponding to the event 'you observe at least 1 white ball'.

Solution:

• The sample space is the set of positive integers $\Omega = \{1, 2, 3, 4, 5, \ldots\}$.

• Let R: red, B: blue, and W: white. The sample space is the set of all possible pairs of observed colors
$$\Omega = \{RR, RB, RW, BR, BB, BW, WR, WB, WW\}.$$
Let $A$ be the event 'you observe at least 1 white ball'. Then $A = \{RW, BW, WR, WB, WW\}$.

4. (25 pts) Think of an example of a random process and describe what the quantity of interest $X_t$ is and what the variable $t$ represents in your example (time, space, ...?). Solution: Graded according to how specific/believable the example is.

5. Extra credit (10 pts) Show that the expression of the sample variance
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \qquad (1)$$
is equivalent to
$$S^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right]. \qquad (2)$$

Solution: We have
$$(n-1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i^2 - 2X_i\bar{X} + \bar{X}^2)$$
$$= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2$$
$$= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n \cdot \frac{1}{n^2}\left( \sum_{i=1}^{n} X_i \right)^2.$$
Thus,
$$S^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right].$$
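The identity can also be sanity-checked numerically in R (the same software used in Homework 4's extra credit); this is a quick sketch with an arbitrary simulated sample:

```r
# Numerical check that formulas (1) and (2) agree on an arbitrary sample
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)
n <- length(x)
lhs <- sum((x - mean(x))^2) / (n - 1)          # definition (1)
rhs <- (sum(x^2) - sum(x)^2 / n) / (n - 1)     # shortcut formula (2)
all.equal(lhs, rhs)      # TRUE (up to floating-point error)
all.equal(lhs, var(x))   # R's var() also uses the n - 1 denominator
```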


Homework 2 (due Thursday, May 19)

1. (12 pts) A secretary types 4 letters to 4 people and addresses the 4 envelopes. If he/she inserts the letters at random, what is the probability that exactly 2 letters will go in the correct envelopes?

Solution: Call the letters A, B, C, D. What is the probability that he/she correctly addresses letters A and B, and only those? It is $1/4!$: there is exactly one arrangement in which A and B are matched to their envelopes and the other two are not, namely the four letters addressed to (A, B, D, C) being sent to persons (A, B, C, D), in that order. This is one out of the $4!$ possible orderings. Now, one has $\binom{4}{2}$ possible choices of 2 letters out of 4 to be addressed correctly. Thus, the probability that exactly 2 letters will go in the correct envelopes is $\binom{4}{2} \cdot 1/4! = 1/4$.

Alternative solution: We use counting. Define $F$ to be the event that exactly two letters are addressed correctly, and take the elements of the sample space $\Omega = \{ABCD, ACBD, ADBC, \ldots\}$ to be all orderings of addressees on the letters sent to persons A, B, C, D, respectively. Then $|\Omega| = 4!$ and $|F| = \binom{4}{2} \cdot 1$. How did we count $|F|$? Every outcome in $F$ has two addressees (say A and B) correctly addressed (there are $\binom{4}{2}$ such scenarios) and, of the two possible cases in this scenario (ABCD and ABDC), only one is contained in $F$, since the other has not 2 but 4 correct addressees! The probability of interest is now
$$P(F) = \frac{|F|}{|\Omega|} = \frac{\binom{4}{2}}{4!} = \frac{1}{4}.$$
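A quick Monte Carlo check of the answer $1/4$, written in R as a sketch: `sample(4)` draws a random assignment of letters to envelopes, and we count how often exactly two match.

```r
# Estimate P(exactly 2 of 4 letters land in the correct envelopes)
set.seed(1)
matches <- replicate(1e5, sum(sample(4) == 1:4))
mean(matches == 2)   # should be close to 1/4 = 0.25
```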

2. (12 pts, 6 pts each) A labor dispute has arisen concerning the distribution of 20 laborers to four different construction jobs. The first job is considered very undesirable and requires 6 laborers. The second, third and fourth jobs utilize 4, 5, and 5 laborers, respectively. An accusation of discrimination was made because four workers of the same ethnic group out of the twenty were all placed in job 1. In considering whether the assignment represented an injustice, an investigation is launched to find the probability of the observed event.

(a) What is the number of points in the sample space?

(b) Find the probability of the observed event, if it is assumed that the laborers are randomly assigned to all jobs.

Solution: The number of ways of assigning the 20 laborers to the four jobs is equal to the number of ways of partitioning the $n = 20$ laborers into four groups of sizes $n_1 = 6$, $n_2 = 4$, $n_3 = n_4 = 5$:
$$N = \binom{n}{n_1\; n_2\; n_3\; n_4} = \frac{20!}{6!\,4!\,5!\,5!}.$$
By a random assignment of laborers to the jobs, we mean that each of the $N$ sample points has probability equal to $1/N$. If $A$ denotes the event of interest and $n_a$ the number of sample points in $A$, the sum of the probabilities of the sample points in $A$ is $P(A) = n_a(1/N) = n_a/N$. The number of sample points in $A$, $n_a$, is the number of ways of assigning laborers to the four jobs with the 4 members of the ethnic group all going to job 1. The remaining 16 laborers need to be assigned to the remaining openings. Because there remain two openings for job 1, this can be done in $\frac{16!}{2!\,4!\,5!\,5!}$ ways. It follows that
$$n_a = \frac{16!}{2!\,4!\,5!\,5!}, \qquad P(A) = \frac{n_a}{N} \approx 0.0031.$$
Thus, if laborers are randomly assigned to jobs, the probability that the 4 members of the ethnic group all go to the undesirable job is very small. There is reason to doubt that the jobs were randomly assigned.
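The exact value is easy to reproduce in R; a sketch of the computation above:

```r
# N = 20!/(6!4!5!5!), na = 16!/(2!4!5!5!), P(A) = na/N
N  <- factorial(20) / (factorial(6) * factorial(4) * factorial(5)^2)
na <- factorial(16) / (factorial(2) * factorial(4) * factorial(5)^2)
na / N   # 0.0030967... ~ 0.0031
```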

3. (12 pts) A fair die is tossed 6 times, and the number on the uppermost face is recorded each time. What is the probability that the numbers recorded in the first four rolls are 1, 2, 3, 4 in any order? It is useful to invoke the counting rules.

Solution: First, notice that all outcomes of the six die rolls are equally likely.

The denominator: The number of possible outcomes of the experiment is $6^6$ (in the first roll there are 6 possible outcomes, in the second roll there are again 6 possible outcomes, and so on).

The numerator: The number of ways in which the numbers 1, 2, 3, 4 can appear in any order in the first four rolls is $4!$. This can be understood as counting the number of ways of putting 1, 2, 3, or 4 into four slots, exactly one each: the first slot has four options, the second slot has only three options (since the first slot took one of the four candidates), and so forth. The last two rolls can take anything, so there are $6^2$ ways for them to come out.

Numerator/denominator gives $4! \cdot 6^2 / 6^6$.


Alternatively, use multiplication of conditional probabilities. The probability of the first die roll resulting in one of 1, 2, 3, 4 is $4/6$. The second roll can now only take one of the three remaining values, which has probability $3/6$, and so forth. The last two rolls do not matter. So the probability is
$$\frac{4}{6} \cdot \frac{3}{6} \cdot \frac{2}{6} \cdot \frac{1}{6}.$$
Lastly, this can be understood as a series of conditional probabilities combined via the multiplication (chain) rule! Denote by $A_i$ the event that the $i$-th roll fits the description of the event that we are interested in. Then:
$$P(A_1, A_2, A_3, A_4) = P(A_1)P(A_2|A_1)P(A_3|A_1, A_2)P(A_4|A_1, A_2, A_3) = \frac{4}{6} \cdot \frac{3}{6} \cdot \frac{2}{6} \cdot \frac{1}{6}.$$

4. (12 pts, 6 pts each)

(a) What is the number of ways of selecting two random applicants out of five?

(b) What is the probability that exactly one of the two best applicants appears in a random selection of two out of five?

Solution:

(a) $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$.

(b) Let $n_a$ denote the number of sample points in $A$. Then $n_a$ equals the number of ways of selecting one of the two best applicants (call this number $m$) times the number of ways of selecting one of the three low-ranking applicants (call this number $n$). Then $m = 2$, $n = 3$, and applying the $mn$ rule,
$$n_a = \binom{2}{1}\binom{3}{1} = \frac{2!}{1!\,1!} \cdot \frac{3!}{1!\,2!} = 6,$$
so the probability is $P(A) = n_a / \binom{5}{2} = 6/10 = 0.6$.

5. (12 pts) A population of voters contains 40% Republicans and 60% Democrats. It is reported that 30% of the Republicans and 70% of the Democrats favor an election issue. A person chosen at random from this population is found to favor the issue in question. What is the conditional probability that this person is a Democrat?

Solution: Let $D$ and $R$ denote the events 'the person is a Democrat' and 'the person is a Republican', respectively. Let $F$ denote the event 'the person favors the issue'. We have $P(D) = 0.6$, $P(R) = 0.4$, $P(F|D) = 0.7$, and $P(F|R) = 0.3$. Then, using Bayes' Rule,
$$P(D|F) = \frac{P(F|D)P(D)}{P(F)} = \frac{P(F|D)P(D)}{P(F|D)P(D) + P(F|R)P(R)} = \frac{0.7 \cdot 0.6}{0.7 \cdot 0.6 + 0.3 \cdot 0.4} = 7/9.$$

6. (4 pts) Use the axioms of probability to show that for any $A \subset \Omega$ we have $P(A^c) = 1 - P(A)$.

Solution: We have $\Omega = A \cup A^c$. Since $A$ and $A^c$ are disjoint, $P(\Omega) = P(A \cup A^c) = P(A) + P(A^c)$. Thus, $P(A^c) = P(\Omega) - P(A)$. Since $P(\Omega) = 1$, we then have $P(A^c) = 1 - P(A)$.

7. (8 pts) Let $B \subset A$. Show that $P(A) = P(B) + P(A \cap B^c)$.

Solution: Remember from Homework 1 that we have $A = (A \cap B) \cup (A \cap B^c)$. Furthermore, these two sets are disjoint: $A \cap B$ is a subset of $B$, $A \cap B^c$ is a subset of $B^c$, and obviously $B$ and $B^c$ are disjoint. Therefore, $P(A) = P(A \cap B) + P(A \cap B^c)$. Since $B \subset A$, we have $A \cap B = B$ and thus $P(A) = P(B) + P(A \cap B^c)$.

8. (10 pts) Show that for $A, B \subset \Omega$, $P(A \cap B) \geq \max\{0, P(A) + P(B) - 1\}$.

Solution: Because the probability of any event is a non-negative number, it is clear that $P(A \cap B) \geq 0$. Furthermore, recall that in general for two events $A$ and $B$ we have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B),$$
from which it follows that
$$P(A \cap B) = P(A) + P(B) - P(A \cup B).$$
Since $P(A \cup B) \leq 1$, we have
$$P(A \cap B) = P(A) + P(B) - P(A \cup B) \geq P(A) + P(B) - 1.$$
Thus, since $P(A \cap B) \geq 0$ and $P(A \cap B) \geq P(A) + P(B) - 1$, we conclude that $P(A \cap B) \geq \max\{0, P(A) + P(B) - 1\}$.


9. (10 pts) Let $A, B \subset \Omega$ be such that $P(A) > 0$ and $P(B) > 0$. Show that

• if $A$ and $B$ are disjoint, then they are not independent;

• if $A$ and $B$ are independent, then they are not disjoint.

Solution:

• Suppose that $A$ and $B$ are independent. Then $P(A \cap B) = P(A)P(B) > 0$ (since $P(A) > 0$ and $P(B) > 0$ by assumption). However, $A$ and $B$ are disjoint by assumption, i.e. $A \cap B = \emptyset$, which implies that $P(A \cap B) = 0$, and we reach a contradiction.

• Suppose that $A$ and $B$ are disjoint, i.e. $A \cap B = \emptyset$. Then $P(A \cap B) = 0$. However, $A$ and $B$ are independent by assumption, i.e. $P(A \cap B) = P(A)P(B) > 0$ (recall that $P(A) > 0$ and $P(B) > 0$), and again we reach a contradiction.

10. (10 pts) Let $A, B \subset \Omega$ be such that $P(A) > 0$ and $P(B) > 0$. Prove or disprove the following claim: in general, it is true that $P(A|B) = P(B|A)$. If you believe that the claim is true, clearly explain your reasoning. If you believe that the claim is false in general, can you find an additional condition under which the claim is true?

Solution: The claim is false in general. A simple counterexample: take $A$ arbitrary with $0 < P(A) < 1$ and $B = \Omega$. We have $P(A|B) = P(A|\Omega) = P(A)$ while $P(B|A) = P(\Omega|A) = 1$. Notice, however, that if $P(A) = P(B)$, then
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{P(A)} = P(B|A).$$
Therefore, under this additional condition, the claim is true.

11. (10 pts) Let $A, B \subset \Omega$ be independent events. Show that the following pairs of events are also independent:

• $A, B^c$

• $A^c, B$

• $A^c, B^c$.

Solution:

• We have $A = (A \cap B) \cup (A \cap B^c)$ and the two events are disjoint. Therefore,
$$P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)[1 - P(B)] = P(A)P(B^c),$$
which implies that $A$ and $B^c$ are independent.

• To show $A^c$ and $B$ are independent, just switch the roles of $A$ and $B$ in the argument above.

• Once again we can write $A^c = (A^c \cap B^c) \cup (A^c \cap B)$ and the two events are disjoint. Hence, as in the first part,
$$P(A^c \cap B^c) = P(A^c) - P(A^c \cap B) = P(A^c) - P(A^c)P(B) = P(A^c)[1 - P(B)] = P(A^c)P(B^c),$$
implying that $A^c$ and $B^c$ are also independent (notice that here we used the previous result on the independence of $A^c$ and $B$).


Homework 3 (due Tuesday, May 24th)

1. (4 pts) Consider three urns. Urn A contains 2 white and 3 red balls, urn B contains 8 white and 4 red balls and urn C contains 1 white and 3 red balls. If one ball is selected from each urn, what is the probability that the ball chosen from urn A was white, given that exactly 2 white balls were selected?

Solution: Let $A$ be a random variable representing the color of the ball from urn A. $A$ takes values W, R for white and red balls, respectively. Let $B$ and $C$ be defined similarly for urns B, C. Let $N_W$ be a random variable that represents the number of white balls chosen. In this case, $N_W$ can be 0, 1, 2, or 3. Using Bayes' Rule,
$$P(A = W \,|\, N_W = 2) = \frac{P(N_W = 2 \,|\, A = W)P(A = W)}{P(N_W = 2 \,|\, A = W)P(A = W) + P(N_W = 2 \,|\, A = R)P(A = R)}.$$
We have
$$P(A = W) = 2/(2+3) = 2/5, \qquad P(A = R) = 3/(2+3) = 3/5,$$
since there are 2 white balls and 3 red balls in urn A. There are two ways to get 2 white balls given that $A = W$: either $B = W$ and $C = R$, or $B = R$ and $C = W$. The probability of this happening is
$$P(N_W = 2 \,|\, A = W) = P(B = W \cap C = R) + P(B = R \cap C = W) = \frac{8}{12} \cdot \frac{3}{4} + \frac{4}{12} \cdot \frac{1}{4} = \frac{7}{12}.$$
Given that $A = R$, the only way to get 2 white balls is if both $B = W$ and $C = W$. The probability of this happening is
$$P(N_W = 2 \,|\, A = R) = P(B = W \cap C = W) = P(B = W) \cdot P(C = W) = \frac{8}{12} \cdot \frac{1}{4} = \frac{1}{6}.$$
Thus,
$$P(A = W \,|\, N_W = 2) = \frac{\frac{7}{12} \cdot \frac{2}{5}}{\frac{7}{12} \cdot \frac{2}{5} + \frac{1}{6} \cdot \frac{3}{5}} = \frac{7}{10}.$$
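As a quick sanity check, the conditional probability $7/10$ can be approximated by simulation in R; the urn compositions below are taken from the problem statement:

```r
# Simulate one draw from each urn and condition on exactly 2 white balls
set.seed(1)
m  <- 1e5
ua <- sample(c("W", "R"), m, replace = TRUE, prob = c(2, 3) / 5)
ub <- sample(c("W", "R"), m, replace = TRUE, prob = c(8, 4) / 12)
uc <- sample(c("W", "R"), m, replace = TRUE, prob = c(1, 3) / 4)
nw <- (ua == "W") + (ub == "W") + (uc == "W")
mean(ua[nw == 2] == "W")   # should be close to 7/10
```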


2. (4 pts) Consider the random variable $X$ with p.m.f.
$$p(x) = \begin{cases} 0.1 & \text{if } x = 0 \\ 0.9 & \text{if } x = 1 \\ 0 & \text{if } x \notin \{0, 1\}. \end{cases}$$
Draw the p.m.f. of $X$. Compute the c.d.f. of $X$ and draw it.

Solution: The c.d.f. of $X$ is
$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ p(0) & \text{if } 0 \leq x < 1 \\ p(0) + p(1) & \text{if } x \geq 1 \end{cases} = \begin{cases} 0 & \text{if } x < 0 \\ 0.1 & \text{if } 0 \leq x < 1 \\ 0.1 + 0.9 = 1 & \text{if } x \geq 1. \end{cases}$$

[Figure: plots of the p.m.f. $p(x)$ and the c.d.f. $F(x)$ of $X$.]

3. (10 pts) Let $X$ be a continuous random variable with p.d.f.
$$f(x) = 3x^2\,\mathbf{1}_{[0,1]}(x).$$
Compute the c.d.f. of $X$ and draw it. What is $P(X \in (0.2, 0.4))$? What is $P(X \in [-3, 0.2))$?

Solution:

The c.d.f. of $X$ is
$$F(x) = \int_{-\infty}^{x} f(y)\,dy = \begin{cases} 0 & \text{if } x < 0 \\ \int_0^x 3y^2\,dy = y^3\big|_0^x = x^3 & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x \geq 1. \end{cases}$$

[Figure: plot of the c.d.f. $F(x)$.]

We have
$$P(X \in (0.2, 0.4)) = F(0.4) - F(0.2) = 0.4^3 - 0.2^3 = 0.056$$
and
$$P(X \in [-3, 0.2)) = F(0.2) - F(-3) = 0.2^3 - 0 = 0.008.$$
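Since $f(x) = 3x^2$ on $[0, 1]$ is the Beta(3, 1) density, both probabilities can be double-checked in R with the built-in Beta c.d.f.; a sketch relying only on that observation:

```r
pbeta(0.4, 3, 1) - pbeta(0.2, 3, 1)   # P(0.2 < X < 0.4) = 0.056
pbeta(0.2, 3, 1)                      # P(-3 <= X < 0.2) = 0.008, since F(-3) = 0
```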

4. (10 pts) Consider again the exercise of page ?? on the defective components. Compute the c.d.f. of $X$ and use it (possibly together with the p.m.f. of $X$) to compute the following probabilities:

• $P(2 \leq X \leq 5)$

• $P(X > 3)$

• $P(0 < X < 4)$.

Solution: The c.d.f. of $X$ is
$$F(x) = \begin{cases} 0 & \text{if } x < 2 \\ p(2) & \text{if } 2 \leq x < 3 \\ p(2) + p(3) & \text{if } 3 \leq x < 4 \\ p(2) + p(3) + p(4) & \text{if } x \geq 4 \end{cases} = \begin{cases} 0 & \text{if } x < 2 \\ 1/6 & \text{if } 2 \leq x < 3 \\ 1/2 & \text{if } 3 \leq x < 4 \\ 1 & \text{if } x \geq 4. \end{cases}$$
Thus, we have
$$P(2 \leq X \leq 5) = F(5) - F(2) + p(2) = 1 - 1/6 + 1/6 = 1,$$
$$P(X > 3) = 1 - P(X \leq 3) = 1 - F(3) = 1 - 1/2 = 1/2,$$
$$P(0 < X < 4) = F(3) - F(0) = 1/2 - 0 = 1/2.$$

5. (10 pts) Consider the following c.d.f. for the random variable $X$:
$$F(x) = P(X \leq x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{1}{8}\sqrt{x} & \text{if } 0 \leq x < 1 \\ -\frac{3}{8} + \frac{1}{2}x & \text{if } 1 \leq x < \frac{3}{2} \\ -\frac{3}{2} + \frac{5}{4}x & \text{if } \frac{3}{2} \leq x < 2 \\ 1 & \text{if } x \geq 2. \end{cases}$$
What is the p.d.f. of $X$?

Solution: Notice that $F$ is piecewise differentiable. Therefore, the p.d.f. of $X$ is
$$f(x) = \frac{d}{dy}F(y)\bigg|_{y=x} = \begin{cases} 0 & \text{if } x \leq 0 \\ \frac{1}{16\sqrt{x}} & \text{if } 0 < x < 1 \\ \frac{1}{2} & \text{if } 1 \leq x < \frac{3}{2} \\ \frac{5}{4} & \text{if } \frac{3}{2} \leq x < 2 \\ 0 & \text{if } x \geq 2. \end{cases}$$

6. (10 pts) Let $\alpha \in (0, 1)$. Derive the $\alpha$-quantile associated to the following c.d.f.:
$$F(x) = \frac{1}{1 + e^{-x}}.$$

Solution: We need to solve $F(x_\alpha) = \alpha$ with respect to $x_\alpha$. We have
$$\frac{1}{1 + e^{-x_\alpha}} = \alpha \;\Longleftrightarrow\; \frac{1}{\alpha} = 1 + e^{-x_\alpha} \;\Longleftrightarrow\; -\log\left(\frac{1}{\alpha} - 1\right) = x_\alpha,$$
so
$$x_\alpha = -\log\frac{1 - \alpha}{\alpha} = \log\frac{\alpha}{1 - \alpha}.$$
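This $F$ is the standard logistic c.d.f., so the result can be checked against R's built-in quantile function; a quick sketch for one arbitrary $\alpha$:

```r
alpha <- 0.3
qlogis(alpha)              # built-in logistic quantile: -0.8472979
log(alpha / (1 - alpha))   # the derived formula gives the same value
```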

7. (10 pts) Consider the continuous random variable $X$ with p.d.f. $f$ and its transformation $\mathbf{1}_A(X)$, where $A \subset \mathbb{R}$. Show that $E(\mathbf{1}_A(X)) = P(X \in A)$.

Solution: Let $g(x) = \mathbf{1}_A(x)$. By definition, the expected value of the random variable $g(X)$ is
$$E(g(X)) = \int_{\mathbb{R}} g(x)f(x)\,dx.$$
In this case, we have
$$E(g(X)) = E(\mathbf{1}_A(X)) = \int_{\mathbb{R}} \mathbf{1}_A(x)f(x)\,dx = \int_A f(x)\,dx = P(X \in A).$$

8. (6 pts) Consider a random variable $X$ and a scalar $a \in \mathbb{R}$. Using the definition of variance, show that $V(X + a) = V(X)$.

Solution: Consider the random variable $Y = X + a$. We have that $E(Y) = E(X + a) = E(X) + a$. Now, by definition, $V(Y) = E[(Y - E(Y))^2] = E[(X + a - (E(X) + a))^2] = E[(X - E(X))^2] = V(X)$.

9. (6 pts) A gambling book recommends the following winning strategy for the game of roulette. It recommends that a gambler bet \$1 on red. If red appears (which has probability 18/38), then the gambler should take her \$1 profit and quit. If the gambler loses this bet (which has probability 20/38 of occurring), she should make additional \$1 bets on red on each of the next two spins of the roulette wheel and then quit. Let $X$ denote the gambler's winnings when she quits.


(a) Find $P(X \geq 0)$.

(b) Are you convinced that this is a winning strategy? Explain your answer.

(c) Find $E[X]$.

Solution:

(a) $P(X \geq 0)$ corresponds to: (i) the gambler wins at first and quits; (ii) the gambler keeps gambling after the first game and wins the last two games. So $P(X \geq 0) = 18/38 + (20/38)(18/38)^2 \approx 0.592$.

(b) No. Winning means winning \$1, but losing means losing \$1 or \$3. The argument is completed by part (c), which shows $E(X) < 0$.

(c) $P(X = 1) = 18/38 + (20/38)(18/38)^2 \approx 0.592$, $P(X = -1) = 2 \cdot (20/38)(18/38)(20/38)$, $P(X = -3) = (20/38)^3$. Thus $E(X) \approx -0.108$.
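A short R simulation of the strategy, a sketch assuming a win probability of 18/38 on every spin, reproduces both numbers:

```r
# One play of the strategy: bet $1 on red, quit on a win, else bet twice more
set.seed(1)
play <- function() {
  if (runif(1) < 18/38) return(1)        # win the first bet and quit with +$1
  spins <- runif(2) < 18/38              # otherwise two more $1 bets on red
  -1 + sum(ifelse(spins, 1, -1))         # net winnings: -1, then +/-1 per spin
}
x <- replicate(1e5, play())
mean(x >= 0)   # close to 0.592
mean(x)        # close to -0.108
```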

10. (10 pts) Let $X$ be a random variable with p.d.f. $f(x) = \mathbf{1}_{[0,1]}(x)$.

• Draw the p.d.f. of $X$.

• Compute the c.d.f. of $X$ and draw it.

• Compute $E(3X)$.

• Compute $V(2X + 100)$.

Solution: The c.d.f. of $X$ is
$$F(x) = \int_{-\infty}^{x} f(y)\,dy = \begin{cases} 0 & \text{if } x < 0 \\ \int_0^x 1\,dy & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x \geq 1 \end{cases} = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x \geq 1. \end{cases}$$


[Figure: plots of the p.d.f. $f(x)$ and the c.d.f. $F(x)$ of $X$.]

We have that $E(X) = \int_{\mathbb{R}} x f(x)\,dx = \int_0^1 x\,dx = \frac{1}{2}x^2\big|_0^1 = \frac{1}{2}$. Therefore, $E(3X) = 3E(X) = \frac{3}{2}$. Furthermore,
$$E(X^2) = \int_{\mathbb{R}} x^2 f(x)\,dx = \int_0^1 x^2\,dx = \frac{1}{3}x^3\Big|_0^1 = \frac{1}{3},$$
hence $V(X) = E(X^2) - [E(X)]^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$. Thus, $V(2X + 100) = V(2X) = 4V(X) = \frac{4}{12} = \frac{1}{3}$.

11. (10 pts) A secretary is given $n > 0$ passwords to access a computer file. In order to open the file, she starts entering the passwords in a random order, and after each attempt she discards the password that she used if it was not the correct one. Let $X$ be the random variable corresponding to the number of passwords that the secretary needs to enter before she can open the file.

• Argue that $P(X = 1) = P(X = 2) = \cdots = P(X = n) = 1/n$.

• Find the expectation and the variance of $X$.

Potentially useful facts:

• $\sum_{x=1}^{n} x = n(n+1)/2$

• $\sum_{x=1}^{n} x^2 = n(n+1)(2n+1)/6$.

Solution: Let $R_i$ denote the event 'the secretary chooses the right password on the $i$-th attempt'. We have $P(X = 1) = P(R_1) = 1/n$, since the secretary is picking one password at random from the $n$ available passwords. The probability that she chooses the right password on the second attempt is
$$P(X = 2) = P(R_1^c \cap R_2) = P(R_1^c)P(R_2|R_1^c) = \frac{n-1}{n} \cdot \frac{1}{n-1} = \frac{1}{n},$$
i.e. it equals the probability that she chooses the wrong password on the first attempt and then chooses the right one on the second attempt. Similarly,
$$P(X = 3) = P(R_1^c \cap R_2^c \cap R_3) = P(R_1^c)P(R_2^c|R_1^c)P(R_3|R_1^c \cap R_2^c) = \frac{n-1}{n} \cdot \frac{n-2}{n-1} \cdot \frac{1}{n-2} = \frac{1}{n},$$
and then it is clear that $P(X = i) = \frac{1}{n}$ for all $i \in \{1, \ldots, n\}$. The expectation of $X$ is therefore

$$E(X) = \sum_{x=1}^{n} x\,P(X = x) = \sum_{x=1}^{n} x \cdot \frac{1}{n} = \frac{1}{n} \sum_{x=1}^{n} x = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}.$$
To compute the variance of $X$ we also need
$$E(X^2) = \sum_{x=1}^{n} x^2\,P(X = x) = \frac{1}{n} \sum_{x=1}^{n} x^2 = \frac{1}{n} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}.$$
Thus,
$$V(X) = E(X^2) - [E(X)]^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{2n^2 + 3n + 1}{6} - \frac{n^2 + 2n + 1}{4}$$
$$= \left(\frac{1}{3} - \frac{1}{4}\right)n^2 + \left(\frac{1}{2} - \frac{1}{2}\right)n + \frac{1}{6} - \frac{1}{4} = \frac{1}{12}n^2 - \frac{1}{12} = \frac{n^2 - 1}{12}.$$
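Both moments are easy to confirm by simulation in R; a sketch for the arbitrary choice $n = 7$, where $(n+1)/2 = 4$ and $(n^2-1)/12 = 4$:

```r
# Position of the correct password in a random ordering of n passwords
set.seed(1)
n <- 7
x <- replicate(1e5, which(sample(n) == 1))   # sample(n) is a random permutation
mean(x)   # close to (n + 1) / 2 = 4
var(x)    # close to (n^2 - 1) / 12 = 4
```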


12. (10 pts) Consider a positive random variable $X$ (i.e. $P(X > 0) = 1$) with expectation $E(X) = 1$ and variance $V(X) = 0.002$, whose distribution is symmetric about its mean. Use Markov's inequality to provide a lower bound, and Tchebysheff's inequality to provide an upper bound, for the probability of the event $X \leq 0.1$. (Hint: think about what allows you to quantify the probability that $X \leq 0.1$ or $X \geq 1.9$.) Are these bounds useful in this case? Comment on the bounds.

Solution:

(a) Lower bound: Markov's inequality yields
$$P(|X| \geq 0.1) \leq \frac{E|X|}{0.1}.$$
Hence,
$$P(|X| < 0.1) \geq 1 - \frac{E|X|}{0.1}.$$
Since $X$ is positive, this is equivalent to
$$P(X < 0.1) \geq 1 - \frac{E(X)}{0.1} = 1 - 10 = -9.$$
The bound is valid and correct! The probability of any event is a non-negative number, so it is true that $P(X < 0.1) \geq -9$. However, this bound is uninformative: it gives us no additional information about the distribution of $X$.

There is no straightforward way to apply Tchebysheff's inequality for a lower bound, since it quantifies the union of the tail probabilities away from the mean of the random variable (and not away from zero).

(b) Upper bound: We use Tchebysheff's inequality with $k = 0.9$ to get
$$P(|X - E(X)| \geq 0.9) = P(|X - 1| \geq 0.9) = P(X \leq 0.1 \text{ or } X \geq 1.9) \leq \frac{V(X)}{0.9^2} = \frac{0.002}{0.81} \approx \frac{1}{405}.$$
This bounds the probability associated with both 'tails' of the distribution, $X \leq 0.1$ or $X \geq 1.9$. Then we notice that, using the 'union bound' from lecture (or even the fact that the two events are disjoint, so that $P(\{X \leq 0.1\} \cup \{X \geq 1.9\}) = P(X \leq 0.1) + P(X \geq 1.9) \geq P(X \leq 0.1)$),
$$P(X \leq 0.1) \leq P(\{X \leq 0.1\} \cup \{X \geq 1.9\}) \leq \frac{0.002}{0.81} \approx 0.0025,$$
which is an informative bound! The lesson: when you know the variance (i.e. if you have more information), you can use Tchebysheff to better quantify a 'tail' probability.


Homework 4 (due Thursday, May 28th)

1. (6 pts) An oil exploration firm is formed with enough capital to finance ten explorations. The probability of a particular exploration being successful is 0.1. Assume the explorations are independent. Find the mean and variance of the number of successful explorations.

Solution: Let $n \in \mathbb{Z}^+$ and $p \in [0, 1]$. For any $i \in \{1, \ldots, n\}$, let $X_i$ be the Bernoulli($p$) random variable describing the success or failure of the $i$-th exploration of the oil exploration firm:
$$X_i = \begin{cases} 1 & \text{if the } i\text{-th exploration is successful} \\ 0 & \text{otherwise.} \end{cases}$$
The number of successful explorations is
$$Y = \sum_{i=1}^{n} X_i.$$
Since the explorations are assumed to be independent, it follows that $Y \sim \text{Binomial}(n, p)$ (recall that the sum of $n$ independent Bernoulli($p$) random variables is a random variable itself and its distribution is Binomial($n, p$)). In this exercise $n = 10$ and $p = 0.1$, so we have $E(Y) = np = 10 \cdot 0.1 = 1$ and $V(Y) = np(1 - p) = 1 \cdot 0.9 = 0.9$.

2. (8 pts) Ten motors are packaged for sale in a certain warehouse. The motors sell for USD 100 each, but a double-your-money-back guarantee is in effect for any defectives the purchaser may receive (i.e. if the purchaser receives one defective motor, the seller has to pay USD 200 back to the purchaser). Find the expected net gain for the seller if the probability of any one motor being defective is 0.08 (assuming that the quality of any one motor is independent of that of the others).

Solution: Let $n$ be the number of motors that are packaged for sale in the warehouse and $c$ be the price of each motor. For any $i \in \{1, \ldots, n\}$, let $D_i$ denote the Bernoulli($p$) random variable describing whether the $i$-th motor is defective or not:
$$D_i = \begin{cases} 1 & \text{if the } i\text{-th motor is defective} \\ 0 & \text{otherwise.} \end{cases}$$
Every time a motor is sold the seller gains $c$ dollars, but if the motor turns out to be defective the seller has to pay $2c$ dollars back to the buyer because of the double-your-money-back guarantee. Assuming that the quality of each motor is independent of the quality of every other motor in stock, the net gain of the seller is the random variable
$$G = cn - 2c\sum_{i=1}^{n} D_i,$$
where $\sum_{i=1}^{n} D_i \sim \text{Binomial}(n, p)$, and the expected net gain is the expectation of $G$:
$$E(G) = E\left(cn - 2c\sum_{i=1}^{n} D_i\right) = cn - 2c\,E\left(\sum_{i=1}^{n} D_i\right) = cn - 2cnp = cn(1 - 2p).$$
Since $c = \text{USD } 100$, $n = 10$ and $p = 0.08$, we have that
$$E(G) = \text{USD } 100 \cdot 10 \cdot (1 - 2 \cdot 0.08) = \text{USD } 1000 \cdot 0.84 = \text{USD } 840.$$

3. (6 pts) Consider again Exercise 11 of Homework 3. Suppose this time that the secretary does not discard a password once she has tried it and discovered that it is wrong. What is the distribution of the random variable $X$ corresponding to the number of attempts needed until she enters the correct password? What are the mean and the variance of $X$?

Solution: We can model the sequence of attempts using a sequence of random variables $X_1, X_2, \ldots \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)$ where $p = 1/n$. The number of attempts needed to enter the correct password for the first time then corresponds to a random variable $Y \sim \text{Geometric}(p)$. Its expectation is $E(Y) = 1/p = n$ and its variance is $V(Y) = (1 - p)/p^2 = n(n - 1)$.

4. (8 pts) A geological study indicates that an exploratory oil well should strike oil with probability 0.2.

• What is the probability that the first strike comes on the third well drilled?

• What is the probability that the third strike comes on the seventh well drilled?

• What assumptions did you make to obtain the answers to parts (a) and (b)?


• Find the mean and variance of the number of wells that must be drilled if the company wants to set up three producing wells.

Solution: Let $X$ be the random variable describing the number of the trial on which the $r$-th oil well is observed to strike oil. $X$ is distributed according to NBinomial($r, p$) and its probability mass function is
$$P(X = x) = \begin{cases} \binom{x-1}{r-1} p^r (1-p)^{x-r} & \text{if } x = r, r+1, \ldots \\ 0 & \text{otherwise.} \end{cases}$$

• The probability that the first strike comes on the third well drilled is
$$P(X = 3) = \binom{3-1}{1-1} p^1 (1-p)^{3-1} = p(1-p)^2,$$
which is equal to the probability of observing the first success on the third trial (notice that this is really the same as doing the computation with a Geometric($p$) distribution, since here $r = 1$). With $p = 0.2$, we have
$$P(X = 3) = 0.2 \cdot (0.8)^2 = 0.128.$$

• Here $r = 3$, so the probability that the third strike comes on the seventh well drilled is
$$P(X = 7) = \binom{7-1}{3-1} p^3 (1-p)^{7-3} = \binom{6}{2} p^3 (1-p)^4 = \binom{6}{2} \cdot 0.2^3 \cdot 0.8^4 \approx 0.049.$$

• In order to apply the Negative Binomial model we need to make some assumptions. In particular, we have to assume that all the trials are identical and independent Bernoulli trials (i.e. every time a well is drilled, whether it strikes oil or not does not depend on what happens at other wells, and the probability that a well strikes oil is the same for all wells).

• Here we want to compute $E(X)$ and $V(X)$ when $r = 3$, namely the expectation and the variance of the number of the trial on which the third well is observed to strike oil. We have
$$E(X) = \frac{r}{p} = \frac{3}{0.2} = 15, \qquad V(X) = \frac{r(1-p)}{p^2} = \frac{3 \cdot 0.8}{0.2^2} = 60.$$
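Both probabilities can be checked with R's dnbinom(), which counts failures before the $r$-th success rather than total trials, so $X = x$ corresponds to $x - r$ failures; a quick sketch:

```r
dnbinom(3 - 1, size = 1, prob = 0.2)   # P(first strike on 3rd well) = 0.128
dnbinom(7 - 3, size = 3, prob = 0.2)   # P(third strike on 7th well) ~ 0.049
```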


5. (6 pts) Cards are dealt at random and without replacement from a standard 52-card deck. What is the probability that the second king is dealt on the fifth card?

Solution: The probability of the event that the second king is dealt on the fifth card is equivalent to the probability of the intersection of the two events $A$ = "exactly one king is dealt in the first four cards" and $B$ = "a king is dealt on the fifth card". We have
$$P(A) = \frac{\binom{4}{1}\binom{48}{3}}{\binom{52}{4}}$$
and
$$P(B|A) = \frac{\binom{3}{1}\binom{45}{0}}{\binom{48}{1}},$$
so
$$P(A \cap B) = P(A)P(B|A) = \frac{\binom{4}{1}\binom{48}{3}}{\binom{52}{4}} \cdot \frac{\binom{3}{1}\binom{45}{0}}{\binom{48}{1}} = \frac{4 \cdot 17296}{270725} \cdot \frac{3 \cdot 1}{48} \approx 0.016.$$

We can also interpret the problem in the following way. Let $K_1 \sim \text{Hypergeometric}(r, N, n)$ be the random variable describing the number of kings that are dealt in the first $n$ cards from a deck of $N$ cards containing exactly $r$ kings, and let $K_2 \sim \text{Hypergeometric}(s, M, m)$ be a random variable describing the number of kings that are dealt in the first $m$ cards from a deck of $M$ cards containing exactly $s$ kings. Assume moreover that $K_1$ and $K_2$ are independent. Then the probability that the second king is dealt on the fifth card can be expressed as
$$P(K_1 = 1, K_2 = 1) = P(K_1 = 1)P(K_2 = 1) = \frac{\binom{r}{1}\binom{N-r}{n-1}}{\binom{N}{n}} \cdot \frac{\binom{s}{1}\binom{M-s}{m-1}}{\binom{M}{m}}$$
for a particular choice of the parameters. For this question, we know that $N = 52$, $n = 4$ and $r = 4$, and we know how the deck changes after exactly one king and three other cards are dealt: indeed, we know that $M = N - n$, $m = 5 - n$ and $s = r - 1$. Moreover, the independence assumption is appropriate since the cards are always dealt randomly, no matter the composition of the deck. It is easy to check that plugging these parameter values into the expression for $P(K_1 = 1, K_2 = 1)$ yields exactly
$$P(K_1 = 1, K_2 = 1) = \frac{\binom{r}{1}\binom{N-r}{n-1}}{\binom{N}{n}} \cdot \frac{\binom{r-1}{1}\binom{N-n-(r-1)}{5-n-1}}{\binom{N-n}{5-n}} = \frac{\binom{4}{1}\binom{48}{3}}{\binom{52}{4}} \cdot \frac{\binom{3}{1}\binom{45}{0}}{\binom{48}{1}} \approx 0.016.$$
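The hypergeometric factorization can be evaluated directly in R with dhyper(x, m, n, k), the probability of drawing x 'special' items from m special and n other items in k draws; a sketch of the check:

```r
p_first4 <- dhyper(1, 4, 48, 4)   # exactly one king among the first 4 cards
p_fifth  <- dhyper(1, 3, 45, 1)   # a king on the fifth card, 3 kings left in 48
p_first4 * p_fifth                # ~ 0.016
```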

6. (8 pts) A parking lot has two entrances. Cars arrive at entrance I according to a Poisson distribution at an average of three per hour and at entrance II according to a Poisson distribution at an average of four per hour. What is the probability that a total of three cars will arrive at the parking lot in a given hour? Assume that the numbers of cars arriving at the two entrances are independent.

Solution: Let $C_1 \sim \text{Poisson}(\lambda)$ and $C_2 \sim \text{Poisson}(\mu)$ denote the number of cars arriving in an hour at entrance I and entrance II of the parking lot, respectively. We know that $\lambda = 3$ and $\mu = 4$, and we assume that $C_1$ and $C_2$ are independent. In a given hour,
$$C = C_1 + C_2$$
is the total number of cars arriving at the parking lot. We are interested in computing $P(C = 3)$. The event $\{C = 3\}$ can be written as the following disjoint union of events:
$$\{C = 3\} = (\{C_1 = 0\} \cap \{C_2 = 3\}) \cup (\{C_1 = 1\} \cap \{C_2 = 2\}) \cup (\{C_1 = 2\} \cap \{C_2 = 1\}) \cup (\{C_1 = 3\} \cap \{C_2 = 0\}),$$
whose probability is (here we use the fact that the union is a disjoint union and we exploit the independence of $C_1$ and $C_2$)
$$P(C = 3) = P(C_1 = 0)P(C_2 = 3) + P(C_1 = 1)P(C_2 = 2) + P(C_1 = 2)P(C_2 = 1) + P(C_1 = 3)P(C_2 = 0)$$
$$= e^{-\lambda}e^{-\mu}\frac{\mu^3}{3!} + e^{-\lambda}\lambda\, e^{-\mu}\frac{\mu^2}{2} + e^{-\lambda}\frac{\lambda^2}{2}\, e^{-\mu}\mu + e^{-\lambda}\frac{\lambda^3}{3!}\, e^{-\mu}$$
$$= e^{-(\lambda+\mu)}\left(\frac{\mu^3}{3!} + \lambda\frac{\mu^2}{2} + \mu\frac{\lambda^2}{2} + \frac{\lambda^3}{3!}\right) = e^{-7}\cdot\frac{343}{6} \approx 0.052.$$
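Since the sum of independent Poisson random variables is Poisson with the summed rate, $C \sim \text{Poisson}(7)$, which gives a one-line check in R alongside the explicit convolution; a sketch:

```r
dpois(3, lambda = 7)                    # 0.05212925
sum(dpois(0:3, 3) * dpois(3 - 0:3, 4))  # same value via the four disjoint cases
```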


7. (6 pts) Derive the c.d.f. of $X \sim \text{Uniform}(a, b)$.

Solution: Recall that
$$f(x) = \frac{1}{b-a}\,\mathbf{1}_{[a,b]}(x).$$
We have
$$F(x) = P(X \leq x) = \begin{cases} 0 & \text{if } x < a \\ \int_a^x \frac{1}{b-a}\,dy & \text{if } a \leq x < b \\ 1 & \text{if } x \geq b \end{cases} = \begin{cases} 0 & \text{if } x < a \\ \frac{x-a}{b-a} & \text{if } a \leq x < b \\ 1 & \text{if } x \geq b. \end{cases}$$

8. (4 pts) Let $X \sim \text{Uniform}(0, 1)$. Is $X \sim \text{Beta}(\alpha, \beta)$ for some $\alpha, \beta > 0$?

Solution: Recall the p.d.f. corresponding to the Beta distribution with parameters $\alpha, \beta > 0$:
$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}\,\mathbf{1}_{[0,1]}(x).$$
By comparing this to the p.d.f. of $X$,
$$f_X(x) = \mathbf{1}_{[0,1]}(x),$$
it is easily seen that $X \sim \text{Uniform}(0, 1) \equiv \text{Beta}(1, 1)$. Notice that in this case $\Gamma(\alpha) = \Gamma(\beta) = \Gamma(1) = 0! = 1$ and $\Gamma(\alpha+\beta) = \Gamma(2) = 1! = 1$.

9. (8 pts) The number of defective circuit boards coming off a soldering machine follows a Poisson distribution. During a specific eight-hour day, one defective circuit board was found. Conditional on the event that a single defective circuit was produced, the time at which it was produced has a Uniform distribution.

• Find the probability that the defective circuit was produced during the first hour of operation during that day.

• Find the probability that it was produced during the last hour of operation during that day.

• Given that no defective circuit boards were produced during the first four hours of operation, find the probability that the defective board was manufactured during the fifth hour.


Solution: Let $T$ be the random variable denoting the time at which the defective circuit board was produced. $T$ is uniformly distributed in the interval $[0, 8]$ (hours), that is $T \sim \text{Uniform}(0, 8)$. Its probability density function is therefore
$$f(t) = \begin{cases} \frac{1}{8} & \text{if } t \in [0, 8] \\ 0 & \text{otherwise,} \end{cases}$$
and its cumulative distribution function is
$$F(t) = P(T \leq t) = \begin{cases} 0 & \text{if } t < 0 \\ \frac{t}{8} & \text{if } t \in [0, 8) \\ 1 & \text{if } t \geq 8. \end{cases}$$

• The probability that the defective circuit board was produced during the first hour of operation during the day is
$$P(T \in (0, 1]) = P(0 < T \leq 1) = F(1) - F(0) = \frac{1}{8} - 0 = \frac{1}{8}.$$

• The probability that the defective circuit board was produced during the last hour of operation during the day is
$$P(T \in (7, 8]) = P(7 < T \leq 8) = F(8) - F(7) = 1 - \frac{7}{8} = \frac{1}{8}.$$

• Given that no defective circuit boards were produced in the first four hours (that is, we know that the event $\{T > 4\}$ occurred), the probability that the defective board was manufactured during the fifth hour, i.e. the interval $(4, 5]$, is
$$P(T \in (4, 5] \,|\, T > 4) = \frac{P(\{T \in (4, 5]\} \cap \{T > 4\})}{P(T > 4)} = \frac{P(T \in (4, 5])}{P(T > 4)} = \frac{F(5) - F(4)}{1 - F(4)} = \frac{\frac{1}{8}}{1 - \frac{4}{8}} = \frac{1/8}{1/2} = \frac{1}{4}.$$

10. (6 pts) Let $Z \sim N(0, 1)$. Show that, for any $z > 0$, $P(|Z| > z) = 2(1 - \Phi(z))$.

Solution: We have
$$P(|Z| > z) = P(\{Z > z\} \cup \{Z < -z\}) = P(Z > z) + P(Z < -z)$$
$$= 1 - P(Z \leq z) + P(Z < -z) = 1 - \Phi(z) + \Phi(-z) = 1 - \Phi(z) + 1 - \Phi(z) = 2(1 - \Phi(z)).$$


11. (6 pts) Let $Z \sim N(0, 1)$. Express in terms of $\Phi$, the c.d.f. of $Z$, the following probabilities:

• $P(Z^2 < 1)$

• $P(3Z/5 + 3 > 2)$.

Compute the exact probability of the event $\Phi^{-1}(\alpha/3) \leq Z \leq \Phi^{-1}(4\alpha/3)$, where $\Phi^{-1}(\alpha)$ is the inverse c.d.f. computed at $\alpha$ (i.e. the $\alpha$-quantile of $Z$).

Solution: We have

• $P(Z^2 < 1) = 1 - P(|Z| > 1)$. Based on the result from the previous exercise, $P(|Z| > 1) = 2(1 - \Phi(1))$. Thus, $P(Z^2 < 1) = 1 - 2(1 - \Phi(1)) = 2\Phi(1) - 1$.

• $P(3Z/5 + 3 > 2) = P(3Z/5 > -1) = P(Z > -5/3) = 1 - \Phi(-5/3) = \Phi(5/3)$.

Regarding the last question, we have
$$P(\Phi^{-1}(\alpha/3) \leq Z \leq \Phi^{-1}(4\alpha/3)) = \Phi(\Phi^{-1}(4\alpha/3)) - \Phi(\Phi^{-1}(\alpha/3)) = 4\alpha/3 - \alpha/3 = \alpha.$$

12. (8 pts) The SAT and ACT college entrance exams are taken by thousands of students each year. The mathematics portions of each of these exams produce scores that are approximately normally distributed. In recent years, SAT mathematics exam scores have averaged 480 with standard deviation 100. The average and standard deviation for ACT mathematics scores are 18 and 6, respectively.

• An engineering school sets 550 as the minimum SAT math score for new students. What percentage of students will score below 550 in a typical year? Express your answer in terms of $\Phi$.

• What score should the engineering school set as a comparable standard on the ACT math test? You can leave your answer in terms of $\Phi^{-1}$.

Solution: Let $S$ denote the random variable describing the SAT mathematics scores and let $A$ denote the random variable describing the ACT mathematics scores. Let also $Z \sim N(0, 1)$. $S$ is distributed approximately according to $N(\mu_s, \sigma_s^2)$ with $\mu_s = 480$ and $\sigma_s = 100$, and $A$ is distributed approximately according to $N(\mu_a, \sigma_a^2)$ with $\mu_a = 18$ and $\sigma_a = 6$.

• Based on the approximations above, the percentage of students with SAT math score below 550 is
$$P(S \leq 550) = P\left(\frac{S - \mu_s}{\sigma_s} \leq \frac{550 - \mu_s}{\sigma_s}\right) \approx P\left(Z \leq \frac{550 - 480}{100}\right) = P(Z \leq 0.7) = \Phi(0.7) \approx 0.758.$$

• The minimum score that the engineering school should require from new students on the ACT math test, in order for the two minimum scores to be comparable standards, is the number $a$ that satisfies
$$P(A \leq a) \approx P(S \leq 550) \approx \Phi(0.7).$$
We have to solve for $a$ in the equation
$$P(A \leq a) = P\left(\frac{A - \mu_a}{\sigma_a} \leq \frac{a - \mu_a}{\sigma_a}\right) \approx P\left(Z \leq \frac{a - \mu_a}{\sigma_a}\right) = \Phi(0.7),$$
namely we need to solve
$$\frac{a - \mu_a}{\sigma_a} = 0.7,$$
whose solution is $a = 0.7\sigma_a + \mu_a = 0.7 \cdot 6 + 18 = 22.2$.
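Both answers can be reproduced numerically in R; a sketch using the stated means and standard deviations:

```r
pnorm(550, mean = 480, sd = 100)       # Phi(0.7) ~ 0.758
qnorm(pnorm(0.7), mean = 18, sd = 6)   # comparable ACT cutoff: 22.2
```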

13. (6 pts) Suppose that a random variable $X$ has a probability density function given by
$$f(x) = c\,x^3 e^{-x/2}\,\mathbf{1}_{[0,\infty)}(x).$$

• Find the value of $c$ that makes $f$ a valid p.d.f. (hint: if you think about this carefully, no calculations are needed!)

• Does $X$ have a Chi-Square distribution? If so, with how many degrees of freedom?

• What are $E(X)$ and $V(X)$?

Solution:


• Given that $f$ is a bona fide p.d.f., $c$ has to be exactly the positive constant that makes the kernel of the p.d.f.,
$$x^3 e^{-x/2}\,\mathbf{1}_{[0,\infty)}(x),$$
integrate to 1. We recognize that this is the kernel of a Gamma p.d.f. with parameters $\alpha = 4$ and $\beta = 2$. It follows that $c$ must be the number
$$c = \frac{1}{\beta^\alpha \Gamma(\alpha)} = \frac{1}{2^4\,\Gamma(4)} = \frac{1}{16 \cdot 3!} = \frac{1}{16 \cdot 6} = \frac{1}{96}.$$

• Recall that $\text{Gamma}(\nu/2, 2) \equiv \chi^2(\nu)$. Therefore, in this case, $\nu = 8$ and the above p.d.f. corresponds to a Chi-Square p.d.f. with $\nu = 8$ degrees of freedom.

• We have $E(X) = \nu = 8$ and $V(X) = 2\nu = 16$.

14. (8 pts) Consider the random variable $X \sim \text{Exponential}(\beta)$. Show that the distribution of $X$ has the memoryless property.

Solution: Let $s, t \geq 0$. We have that
$$P(X > s + t \,|\, X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{1 - F(s+t)}{1 - F(s)} = \frac{1 - \left(1 - e^{-\frac{s+t}{\beta}}\right)}{1 - \left(1 - e^{-\frac{s}{\beta}}\right)} = \frac{e^{-\frac{s+t}{\beta}}}{e^{-\frac{s}{\beta}}} = e^{-\frac{t}{\beta}} = P(X > t).$$
Therefore, $X$ has the memoryless property.

15. (6 pts) An interesting property of the Exponential distribution is that its associated hazard function is constant on $[0, \infty)$. The hazard function associated to a density $f$ is defined as the function
$$h(x) = \frac{f(x)}{1 - F(x)}. \qquad (3)$$
One way to interpret the hazard function is in terms of the failure rate of a certain component. In this sense, $h$ can be thought of as the ratio between the probability density of failure at time $x$, $f(x)$, and the probability that the component is still working at that time, which is $P(X > x) = 1 - F(x)$.

Show that if $X \sim \text{Exponential}(\beta)$ is a random variable describing the time at which a particular component fails to work, then the failure rate of $X$ is constant (i.e. $h$ does not depend on $x$).

Solution: For $x > 0$ we have
$$h(x) = \frac{f(x)}{1 - F(x)} = \frac{\frac{1}{\beta}e^{-\frac{x}{\beta}}}{1 - \left(1 - e^{-\frac{x}{\beta}}\right)} = \frac{1}{\beta}.$$

16. Extra credit (10 pts) Consider the Binomial p.m.f.
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}\,\mathbf{1}_{\{0,1,\ldots,n\}}(x)$$
with parameters $n$ and $p$. Show that
$$\lim_{n \to \infty} p(x) = e^{-\lambda}\frac{\lambda^x}{x!}\,\mathbf{1}_{\{0,1,\ldots\}}(x),$$
where the limit is taken as $n \to \infty$, $p \to 0$, and $np \to \lambda > 0$.
(Hint: For three converging sequences $a_n, b_n, c_n$ whose limits are $a, b, c$ respectively, $\lim_{n \to \infty} a_n b_n c_n = abc$.)

Solution: Take any $x \in \{0, 1, \ldots, n\}$. We have
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!(n-x)!}\left(\frac{np}{n}\right)^x \left(1 - \frac{np}{n}\right)^{n-x}$$
$$= \frac{n(n-1)\cdots(n-x+1)}{n^x}\,\frac{1}{x!}\,(np)^x \left(1 - \frac{np}{n}\right)^n \left(1 - \frac{np}{n}\right)^{-x}.$$
Now,
$$\frac{n(n-1)\cdots(n-x+1)}{n^x} \to 1 \text{ as } n \to \infty,$$
$$(np)^x \to \lambda^x \text{ as } np \to \lambda,$$
$$\left(1 - \frac{np}{n}\right)^n \to \left(1 - \frac{\lambda}{n}\right)^n \text{ as } np \to \lambda, \text{ and } \left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda} \text{ as } n \to \infty,$$
$$\left(1 - \frac{np}{n}\right)^{-x} \to \left(1 - \frac{\lambda}{n}\right)^{-x} \text{ as } np \to \lambda, \text{ and } \left(1 - \frac{\lambda}{n}\right)^{-x} \to 1 \text{ as } n \to \infty.$$
Thus,
$$p(x) \to 1 \cdot \frac{1}{x!} \cdot \lambda^x \cdot e^{-\lambda} \cdot 1 = e^{-\lambda}\frac{\lambda^x}{x!},$$
which is the p.m.f. of a Poisson distribution with parameter $\lambda$.


17. Extra credit (10 pts) Consider the Normal p.d.f.
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
with parameters $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. Show that $\mu - \sigma$ and $\mu + \sigma$ correspond to the inflection points of $f$.

Solution: First, we need to compute the second derivative of $f$. We have
$$f'(x) = -\frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{x-\mu}{\sigma^2}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
and therefore
$$f''(x) = -\frac{1}{\sqrt{2\pi\sigma^2}}\left(\frac{1}{\sigma^2} e^{-\frac{(x-\mu)^2}{2\sigma^2}} - \frac{(x-\mu)^2}{\sigma^4} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right) = -\frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{1}{\sigma^2}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\left(1 - \frac{(x-\mu)^2}{\sigma^2}\right).$$
Notice that
$$-\frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{1}{\sigma^2}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} < 0$$
for any $x \in \mathbb{R}$, therefore the sign of $f''$ depends on
$$1 - \frac{(x-\mu)^2}{\sigma^2} = \frac{\sigma^2 - (x-\mu)^2}{\sigma^2}.$$
Clearly, $\frac{\sigma^2 - (x-\mu)^2}{\sigma^2} \geq 0$ for $|x - \mu| \leq \sigma \iff \mu - \sigma \leq x \leq \mu + \sigma$; $\frac{\sigma^2 - (x-\mu)^2}{\sigma^2} \leq 0$ for $|x - \mu| \geq \sigma \iff x \geq \mu + \sigma \,\vee\, x \leq \mu - \sigma$; and $\frac{\sigma^2 - (x-\mu)^2}{\sigma^2} = 0$ for $x = \mu \pm \sigma$. Therefore $x = \mu \pm \sigma$ are the two inflection points of $f$.


18. Extra credit (10 pts) Consider the random variable $X \sim \text{Beta}(\alpha, \beta)$ with $\alpha, \beta \in \mathbb{Z}^+$. Show that, for $n \in \mathbb{Z}^+$,
$$\frac{E(X^{n+1})}{E(X^n)} = \frac{\alpha + n}{\alpha + \beta + n}.$$
Use this result to show that
$$E(X^n) = \prod_{i=0}^{n-1} \frac{\alpha + i}{\alpha + \beta + i}.$$

Solution: For the first claim, fix an arbitrary $n \in \mathbb{Z}^+$. Then
$$E(X^n) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 x^n x^{\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 x^{\alpha+n-1}(1-x)^{\beta-1}\,dx.$$
Now, $x^{\alpha+n-1}(1-x)^{\beta-1}\,\mathbf{1}_{[0,1]}(x)$ is the kernel of a Beta($\alpha+n, \beta$) p.d.f., hence its integral over $[0, 1]$ must be equal to
$$\int_0^1 x^{\alpha+n-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha+n)\Gamma(\beta)}{\Gamma(\alpha+\beta+n)}.$$
Therefore,
$$E(X^n) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha+n)\Gamma(\beta)}{\Gamma(\alpha+\beta+n)} = \frac{\Gamma(\alpha+\beta)\Gamma(\alpha+n)}{\Gamma(\alpha)\Gamma(\alpha+\beta+n)}.$$
Then,
$$\frac{E(X^{n+1})}{E(X^n)} = \frac{\Gamma(\alpha+\beta)\Gamma(\alpha+n+1)}{\Gamma(\alpha)\Gamma(\alpha+\beta+n+1)}\,\frac{\Gamma(\alpha)\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+\beta)\Gamma(\alpha+n)} = \frac{\Gamma(\alpha+n+1)}{\Gamma(\alpha+\beta+n+1)}\,\frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+n)}$$
$$= \frac{(\alpha+n)\Gamma(\alpha+n)}{(\alpha+\beta+n)\Gamma(\alpha+\beta+n)}\,\frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+n)} = \frac{\alpha+n}{\alpha+\beta+n}.$$


For the second claim, it is enough to notice that (telescoping, with $E(X^0) = 1$)
$$E(X^n) = \prod_{i=0}^{n-1} \frac{E(X^{i+1})}{E(X^i)}.$$
From the previous result, we have that
$$\frac{E(X^{i+1})}{E(X^i)} = \frac{\alpha+i}{\alpha+\beta+i},$$
thus
$$E(X^n) = \prod_{i=0}^{n-1} \frac{E(X^{i+1})}{E(X^i)} = \prod_{i=0}^{n-1} \frac{\alpha+i}{\alpha+\beta+i}.$$

19. Extra credit (10 pts) Download the free version of R and RStudio (Desktop edition) on your personal computer. You can use the instructions here: http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/R-install-instructions.html.

(a) When you're done, type version at the command line and hit enter. Write down the resulting version.string. Next, type print("Hello, world!").

(b) Next, type mysample = rexp(n=1000, rate=2) to generate 1,000 i.i.d. samples from an exponential distribution. These are 1,000 possible wait times (in hours) for a bus that is expected to arrive once every half hour.

(c) Next, type hist(mysample) to produce a 'histogram' (an estimate of the probability density function (pdf); the outline of the bars will roughly take the shape of the pdf) of the exponential distribution.

(d) Next, use abline(v=c(0.25,0.75), col='red') to plot two vertical lines. The area between these lines is a good estimate of $P(0.25 \leq X \leq 0.75)$ for an exponential random variable $X$ with parameter $\lambda = 2$; or, in other words, the probability of the wait time being between 15 and 45 minutes.

(e) Lastly, use dev.copy(png, 'mydirectory/histogram.png'); graphics.off() to save the image as a png file in your home directory. Include this picture with your homework.


Figure 1: Your figure should look something like this!

Homework 5 (due June 2nd)

1. (10 pts) Simple example of multivariate joint distributions and conditional distributions. All car accidents that occurred in the state of Florida in 1998 and that resulted in an injury were recorded, along with the seat belt use of the injured person and whether the injury was fatal or not. The total number of recorded accidents is 577,006. The table below can be used to describe the joint distribution of two binary random variables $X$ and $Y$, where $X$ is 1 if the passenger does not wear the seat belt and 0 otherwise, and $Y$ is 1 if the accident is fatal and 0 otherwise.

            No belt    Belt
Fatal       1,601      162,527
Non-fatal   510        412,368

(a) What is the probability that an accident is fatal and the passenger did not wear the seat belt?

(b) What is the probability that an accident is fatal and the passenger wore the seat belt?

(c) What is the marginal distribution of $X$? (Hint: just in case you're wondering what you need to produce, notice $X$ is a discrete random variable. You should completely produce the probability mass function (or CDF) of $X$!)

(d) What is the conditional distribution of $Y | X = \text{No Belt}$ and $Y | X = \text{Belt}$? (The same hint applies here.)

Solution:

• Read from the table: $P(A) = 1601/577006 \approx 0.0028$.

• $P(B) = 162527/577006 \approx 0.28$.

• $P(X = 1) = (1601 + 510)/577006 \approx 0.0037$, and $P(X = 0) = (162527 + 412368)/577006 \approx 0.9963$.

• $P(Y = \text{fatal} \,|\, X = \text{No belt}) = \frac{P(Y = \text{fatal},\, X = \text{No belt})}{P(X = \text{No belt})} = 1601/(1601 + 510) \approx 0.76$.
$P(Y = \text{Non-fatal} \,|\, X = \text{No belt}) = 510/(1601 + 510) \approx 0.24$.
$P(Y = \text{fatal} \,|\, X = \text{belt}) = \frac{P(Y = \text{fatal},\, X = \text{belt})}{P(X = \text{belt})} = 162527/(162527 + 412368) \approx 0.28$.
$P(Y = \text{Non-fatal} \,|\, X = \text{belt}) = 412368/(162527 + 412368) \approx 0.72$.

2. (10 pts) Consider the following function:
$$f(x_1, x_2) = \begin{cases} 6x_1^2 x_2 & \text{if } 0 \leq x_1 \leq x_2 \,\wedge\, x_1 + x_2 \leq 2 \\ 0 & \text{otherwise.} \end{cases}$$

• Verify that $f$ is a valid bivariate p.d.f.

• Suppose $(X_1, X_2) \sim f$. Are $X_1$ and $X_2$ independent random variables?

• Compute $P(X_1 + X_2 < 1)$.

• Compute the marginal p.d.f. of $X_1$. What are the values of its parameters?

• Compute the marginal p.d.f. of $X_2$.

• Compute the conditional p.d.f. of $X_2$ given $X_1$.

• Compute $P(X_2 < 1.1 \,|\, X_1 = 0.6)$.

Solution:

• It is clear that $f_{X_1,X_2}(x_1, x_2) \geq 0$ for any $x_1, x_2 \in \mathbb{R}$. We just need to check that $f_{X_1,X_2}$ integrates to 1:
$$\int_{\mathbb{R}}\int_{\mathbb{R}} f_{X_1,X_2}(x_1, x_2)\,dx_1 dx_2 = \int_0^1 \int_{x_1}^{2-x_1} 6x_1^2 x_2\,dx_2 dx_1 = 3\int_0^1 x_1^2\, x_2^2\Big|_{x_1}^{2-x_1}\,dx_1$$
$$= 3\int_0^1 x_1^2[(2-x_1)^2 - x_1^2]\,dx_1 = 12\int_0^1 x_1^2(1-x_1)\,dx_1 = 12\,\frac{\Gamma(3)\Gamma(2)}{\Gamma(5)} = 12\cdot\frac{2}{4!} = \frac{24}{24} = 1.$$

• $X_1$ and $X_2$ are not independent, since the support of $f_{X_1,X_2}$ is not a rectangular region.

• The region corresponding to the event $X_1 + X_2 < 1$ is a triangle inside the support. We have
$$P(X_1 + X_2 < 1) = \int_0^{1/2}\int_{x_1}^{1-x_1} 6x_1^2 x_2\,dx_2 dx_1 = 3\int_0^{1/2} x_1^2\,x_2^2\Big|_{x_1}^{1-x_1}\,dx_1$$
$$= 3\int_0^{1/2} x_1^2[(1-x_1)^2 - x_1^2]\,dx_1 = 3\int_0^{1/2} x_1^2(1 - 2x_1)\,dx_1 = x_1^3\Big|_0^{1/2} - \frac{3}{2}x_1^4\Big|_0^{1/2} = \frac{1}{8} - \frac{3}{32} = \frac{1}{32}.$$

• Notice that $\text{supp}(X_1) = [0, 1]$. For $x_1 \in [0, 1]$ we thus have
$$f_{X_1}(x_1) = \int_{x_1}^{2-x_1} 6x_1^2 x_2\,dx_2 = 3x_1^2\,x_2^2\Big|_{x_1}^{2-x_1} = 3x_1^2[(2-x_1)^2 - x_1^2] = 12x_1^2(1 - x_1),$$
which we recognize as the p.d.f. of a Beta(3, 2) distribution.

• Notice that $\text{supp}(X_2) = [0, 2]$. For $0 \leq x_2 < 1$ we have
$$f_{X_2}(x_2) = \int_0^{x_2} 6x_1^2 x_2\,dx_1 = 2x_2\,x_1^3\Big|_0^{x_2} = 2x_2^4.$$
For $1 \leq x_2 \leq 2$ we have
$$f_{X_2}(x_2) = \int_0^{2-x_2} 6x_1^2 x_2\,dx_1 = 2x_2\,x_1^3\Big|_0^{2-x_2} = 2x_2(2-x_2)^3.$$

• The conditional p.d.f. of $X_2$ given $X_1 = x_1$ is only well-defined for $x_1 \in \text{supp}(X_1) = [0, 1]$. For $x_1 \in (0, 1)$,
$$f_{X_2|X_1=x_1}(x_2) = \frac{f_{X_1,X_2}(x_1, x_2)}{f_{X_1}(x_1)} = \frac{6x_1^2 x_2\,\mathbf{1}_{[x_1, 2-x_1]}(x_2)}{12x_1^2(1-x_1)} = \frac{x_2}{2(1-x_1)}\,\mathbf{1}_{[x_1, 2-x_1]}(x_2).$$


• We have
$$P(X_2 < 1.1 \,|\, X_1 = 0.6) = \int_{-\infty}^{1.1} f_{X_2|X_1=0.6}(x_2)\,dx_2 = \int_{-\infty}^{1.1} \frac{x_2}{2(1-0.6)}\,\mathbf{1}_{[0.6, 1.4]}(x_2)\,dx_2$$
$$= \frac{1}{1.6}\,x_2^2\Big|_{0.6}^{1.1} = \frac{1.21 - 0.36}{1.6} = 0.53125.$$
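Two of these numbers can be verified in R; a sketch that computes the conditional probability via integrate() and approximates $P(X_1 + X_2 < 1)$ by self-normalized importance sampling from a uniform proposal on $[0,1]\times[0,2]$:

```r
# P(X2 < 1.1 | X1 = 0.6): integrate the conditional density on [0.6, 1.1]
integrate(function(x2) x2 / (2 * (1 - 0.6)), lower = 0.6, upper = 1.1)$value

# P(X1 + X2 < 1): weight uniform draws by the joint density
set.seed(1)
m  <- 1e6
x1 <- runif(m, 0, 1); x2 <- runif(m, 0, 2)
w  <- 6 * x1^2 * x2 * (x1 <= x2 & x1 + x2 <= 2)
sum(w * (x1 + x2 < 1)) / sum(w)   # close to 1/32 = 0.03125
```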

3. (10 pts) Consider the following bivariate p.m.f.:
$$p_{X_1,X_2}(x_1, x_2) = \begin{cases} 0.38 & \text{if } (x_1, x_2) = (0, 0) \\ 0.17 & \text{if } (x_1, x_2) = (1, 0) \\ 0.14 & \text{if } (x_1, x_2) = (0, 1) \\ 0.02 & \text{if } (x_1, x_2) = (1, 1) \\ 0.24 & \text{if } (x_1, x_2) = (0, 2) \\ 0.05 & \text{if } (x_1, x_2) = (1, 2). \end{cases}$$

• Compute the marginal p.m.f. of $X_1$ and the marginal p.m.f. of $X_2$.

• Compute $p_{X_2|X_1=0}$, the conditional p.m.f. of $X_2$ given $X_1 = 0$.

• Compute $P(X_1 = 0 \,|\, X_2 = 2)$.

Solution:

• Notice first that $\text{supp}(X_1) = \{0, 1\}$. Thus, we have
$$p_{X_1}(x_1) = \begin{cases} p_{X_1,X_2}(0,0) + p_{X_1,X_2}(0,1) + p_{X_1,X_2}(0,2) & \text{if } x_1 = 0 \\ p_{X_1,X_2}(1,0) + p_{X_1,X_2}(1,1) + p_{X_1,X_2}(1,2) & \text{if } x_1 = 1 \\ 0 & \text{if } x_1 \notin \{0, 1\} \end{cases} = \begin{cases} 0.38 + 0.14 + 0.24 = 0.76 & \text{if } x_1 = 0 \\ 0.17 + 0.02 + 0.05 = 0.24 & \text{if } x_1 = 1 \\ 0 & \text{if } x_1 \notin \{0, 1\}. \end{cases}$$
Now notice that $\text{supp}(X_2) = \{0, 1, 2\}$. We have
$$p_{X_2}(x_2) = \begin{cases} p_{X_1,X_2}(0,0) + p_{X_1,X_2}(1,0) & \text{if } x_2 = 0 \\ p_{X_1,X_2}(0,1) + p_{X_1,X_2}(1,1) & \text{if } x_2 = 1 \\ p_{X_1,X_2}(0,2) + p_{X_1,X_2}(1,2) & \text{if } x_2 = 2 \\ 0 & \text{if } x_2 \notin \{0, 1, 2\} \end{cases} = \begin{cases} 0.55 & \text{if } x_2 = 0 \\ 0.16 & \text{if } x_2 = 1 \\ 0.29 & \text{if } x_2 = 2 \\ 0 & \text{if } x_2 \notin \{0, 1, 2\}. \end{cases}$$

• We have
$$p_{X_2|X_1=0}(x_2) = \frac{p_{X_1,X_2}(0, x_2)}{p_{X_1}(0)} = \begin{cases} 0.38/0.76 = 0.5 & \text{if } x_2 = 0 \\ 0.14/0.76 \approx 0.184 & \text{if } x_2 = 1 \\ 0.24/0.76 \approx 0.316 & \text{if } x_2 = 2 \\ 0 & \text{if } x_2 \notin \{0, 1, 2\}. \end{cases}$$

• We have
$$P(X_1 = 0 \,|\, X_2 = 2) = \frac{P(X_1 = 0 \cap X_2 = 2)}{P(X_2 = 2)} = \frac{p_{X_1,X_2}(0, 2)}{p_{X_2}(2)} = \frac{0.24}{0.29} \approx 0.828.$$

4. (10 pts) Let $X_1 \sim \text{Uniform}(0, 1)$ and, for $0 < x_1 \leq 1$,
$$f_{X_2|X_1=x_1}(x_2) = \frac{1}{x_1}\,\mathbf{1}_{(0, x_1]}(x_2).$$

• What named continuous distribution does $f_{X_2|X_1=x_1}$ seem to resemble? What are the values of its parameters?

• Compute the joint p.d.f. of $(X_1, X_2)$.

• Compute the marginal p.d.f. of $X_2$.


Solution:

• It is clear that $f_{X_2|X_1=x_1}$, for $0 < x_1 \leq 1$, is a Uniform p.d.f. over the interval $(0, x_1]$.

• We have
$$f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1)f_{X_2|X_1=x_1}(x_2) = \mathbf{1}_{(0,1]}(x_1)\,\frac{1}{x_1}\,\mathbf{1}_{(0,x_1]}(x_2) = \frac{1}{x_1}\,\mathbf{1}_{\{0 < x_2 \leq x_1 \leq 1\}}(x_1, x_2).$$

• We have, for $x_2 \in (0, 1]$,
$$f_{X_2}(x_2) = \int_{x_2}^1 \frac{1}{x_1}\,dx_1 = \log(x_1)\Big|_{x_2}^1 = -\log(x_2),$$
and otherwise $f_{X_2}(x_2) = 0$ for $x_2 \notin (0, 1]$.

5. (5 pts) Let $X_1 \sim \text{Binomial}(n_1, p)$ and $X_2 \sim \text{Binomial}(n_2, p)$. Assume that $X_1$ and $X_2$ are independent. Consider $W = X_1 + X_2$. One can show that $W \sim \text{Binomial}(n_1 + n_2, p)$. Use this result to show that $p_{X_1|W=w}$ is a Hypergeometric p.m.f. with parameters $r = n_1$, $N = n_1 + n_2$, and $n = w$.

Solution: We have, for $x_1 \in \{0, 1, \ldots, w\}$,
$$p_{X_1|W=w}(x_1) = P(X_1 = x_1 \,|\, W = w) = \frac{P(X_1 = x_1 \cap W = w)}{P(W = w)} = \frac{P(X_1 = x_1 \cap X_2 = w - x_1)}{P(W = w)}$$
$$= \frac{P(X_1 = x_1)P(X_2 = w - x_1)}{P(W = w)} = \frac{\binom{n_1}{x_1}p^{x_1}(1-p)^{n_1-x_1}\,\binom{n_2}{w-x_1}p^{w-x_1}(1-p)^{n_2-w+x_1}}{\binom{n_1+n_2}{w}p^w(1-p)^{n_1+n_2-w}} = \frac{\binom{n_1}{x_1}\binom{n_2}{w-x_1}}{\binom{n_1+n_2}{w}},$$
and clearly $p_{X_1|W=w}(x_1) = 0$ if $x_1 \notin \{0, 1, \ldots, w\}$.

6. (10 pts) Suppose that you have a coin such that, if tossed, a head appears with probability $p \in [0, 1]$ and a tail occurs with probability $1-p$. Person A tosses the coin. The number of tosses until the first head appears for person A is $X_1$. Then person B tosses the coin. The number of tosses until the first head appears for person B is $X_2$. Assume that $X_1$ and $X_2$ are independent. What is the probability of the event $\{X_1 = X_2\}$?

Solution: Clearly, $X_1 \sim \mathrm{Geometric}(p)$ and $X_2 \sim \mathrm{Geometric}(p)$. Because $X_1$ and $X_2$ are independent, we have

$$p_{X_1,X_2}(x_1, x_2) = p_{X_1}(x_1)\, p_{X_2}(x_2) = (1-p)^{x_1-1}p\,\mathbf{1}_{\{1,2,\ldots\}}(x_1)\,(1-p)^{x_2-1}p\,\mathbf{1}_{\{1,2,\ldots\}}(x_2)$$
$$= (1-p)^{x_1+x_2-2}p^2\,\mathbf{1}_{\{1,2,\ldots\}}(x_1)\mathbf{1}_{\{1,2,\ldots\}}(x_2).$$

Then,

$$P(X_1 = X_2) = \sum_{x \in \{1,2,\ldots\}} p_{X_1,X_2}(x, x) = p^2\sum_{x \in \{1,2,\ldots\}} (1-p)^{2x-2} = p^2\sum_{x \in \{1,2,\ldots\}} \left[(1-p)^2\right]^{x-1} = \frac{p^2}{1-(1-p)^2} = \frac{p}{2-p}.$$
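A quick Monte Carlo check of this closed form (a sketch; $p = 0.3$ and the sample size are arbitrary; note rgeom counts failures before the first success, so we add 1 to get the number of tosses):

set.seed(1)
p <- 0.3
x1 <- rgeom(1e6, p) + 1   # number of tosses until first head for person A
x2 <- rgeom(1e6, p) + 1   # same for person B
mean(x1 == x2)            # simulated P(X1 = X2)
p / (2 - p)               # theoretical value, 0.3/1.7 = 0.176...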

7. (10 pts) Let $(X_1, X_2) \sim f_{X_1,X_2}$ with

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{x_1}\mathbf{1}_{\{0 < x_2 \leq x_1 \leq 1\}}(x_1, x_2).$$

Compute $E(X_1 - X_2)$.

Solution: We have $E(X_1 - X_2) = E(X_1) - E(X_2)$. First we compute the marginal p.d.f.'s. They are

$$f_{X_1}(x_1) = \int_{\mathbb{R}} f_{X_1,X_2}(x_1, x_2)\, dx_2 = \int_{\mathbb{R}} \frac{1}{x_1}\mathbf{1}_{\{0 < x_2 \leq x_1 \leq 1\}}(x_1, x_2)\, dx_2 = \frac{1}{x_1}\mathbf{1}_{(0,1]}(x_1)\int_0^{x_1} dx_2 = \mathbf{1}_{(0,1]}(x_1)$$

and

$$f_{X_2}(x_2) = \int_{\mathbb{R}} f_{X_1,X_2}(x_1, x_2)\, dx_1 = \int_{\mathbb{R}} \frac{1}{x_1}\mathbf{1}_{(0,1]}(x_2)\mathbf{1}_{[x_2,1]}(x_1)\, dx_1 = \mathbf{1}_{(0,1]}(x_2)\int_{x_2}^1 \frac{1}{x_1}\, dx_1 = -\log(x_2)\mathbf{1}_{(0,1]}(x_2).$$

Notice that $f_{X_1}$ is the p.d.f. of $\mathrm{Uniform}(0, 1)$, therefore $E(X_1) = 1/2$. On the other hand, integrating by parts,

$$E(X_2) = -\int_0^1 x_2\log(x_2)\, dx_2 = -\left[\frac{x_2^2}{2}\log(x_2) - \frac{x_2^2}{4}\right]_0^1 = \frac{1}{4}.$$

Therefore $E(X_1 - X_2) = E(X_1) - E(X_2) = 1/2 - 1/4 = 1/4$.

8. (10 pts) Consider

$$f_{X_1,X_2}(x_1, x_2) = 6(1-x_2)\mathbf{1}_{\{0 \leq x_1 \leq x_2 \leq 1\}}(x_1, x_2).$$

• Are $X_1$ and $X_2$ independent?

• Are $X_1$ and $X_2$ correlated (positively or negatively) or uncorrelated?

Solution:


• $X_1$ and $X_2$ are not independent. We can see this in two ways: 1) we draw the support of their joint p.d.f. (the set indicated by the indicator function, a triangle in the $(x_1, x_2)$-plane) and we easily see that it does not correspond to a rectangular region; 2) we rewrite the indicator function as

$$\mathbf{1}_{\{0 \leq x_1 \leq x_2 \leq 1\}}(x_1, x_2) = \mathbf{1}_{[0, x_2]}(x_1)\,\mathbf{1}_{[x_1, 1]}(x_2)$$

and then we easily see that there is no way to factorize the joint p.d.f. into two functions $g$ and $h$ such that $g$ only depends on $x_1$ and $h$ only depends on $x_2$.

• To answer the second question, we need to compute $\mathrm{Cov}(X_1, X_2)$. For that we need $E(X_1)$, $E(X_2)$, and $E(X_1X_2)$. We have

$$E(X_1X_2) = \int_{\mathbb{R}}\int_{\mathbb{R}} x_1 x_2 f_{X_1,X_2}(x_1, x_2)\, dx_1 dx_2 = \int_0^1\int_0^{x_2} 6x_1 x_2(1-x_2)\, dx_1 dx_2$$
$$= 6\int_0^1 x_2(1-x_2)\int_0^{x_2} x_1\, dx_1\, dx_2 = 3\int_0^1 x_2(1-x_2)\, x_1^2\Big|_0^{x_2}\, dx_2 = 3\int_0^1 x_2^3(1-x_2)\, dx_2$$
$$= 3\,\frac{\Gamma(4)\Gamma(2)}{\Gamma(6)} = 3\,\frac{3!}{5!} = \frac{18}{120} = \frac{3}{20}.$$

Then,

$$E(X_1) = \int_0^1 x_1\int_{x_1}^1 6(1-x_2)\, dx_2\, dx_1 = 6\int_0^1 x_1\left[x_2\Big|_{x_1}^1 - \frac{x_2^2}{2}\Big|_{x_1}^1\right] dx_1$$
$$= 6\int_0^1 \left(\frac{x_1}{2} - x_1^2 + \frac{x_1^3}{2}\right) dx_1 = 6\left(\frac{1}{4} - \frac{1}{3} + \frac{1}{8}\right) = \frac{1}{4}$$

and

$$E(X_2) = \int_0^1 6x_2(1-x_2)\int_0^{x_2} dx_1\, dx_2 = 6\int_0^1 x_2^2(1-x_2)\, dx_2 = 6\,\frac{\Gamma(3)\Gamma(2)}{\Gamma(5)} = 6\,\frac{2}{4!} = \frac{12}{24} = \frac{1}{2}.$$

Thus, $\mathrm{Cov}(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2) = \frac{3}{20} - \frac{1}{4}\cdot\frac{1}{2} = \frac{3}{20} - \frac{1}{8} = \frac{1}{40}$. Since the covariance is positive, $X_1$ and $X_2$ are positively correlated and therefore not independent.
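One way to double-check the value $1/40$ is rejection sampling from the joint density in R (a hedged sketch; since $f \leq 6$ on the unit square, a uniform proposal is accepted with probability $f/6$, and the sample sizes are arbitrary):

set.seed(1)
u1 <- runif(4e5); u2 <- runif(4e5); u3 <- runif(4e5)
keep <- (u1 <= u2) & (u3 <= 1 - u2)   # accept w.p. f(u1, u2)/6 = (1 - u2) on {u1 <= u2}
x1 <- u1[keep]; x2 <- u2[keep]
cov(x1, x2)                           # should be close to 1/40 = 0.025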

9. (10 pts) Let $X_1$ and $X_2$ be uncorrelated random variables. Consider $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.

• Express $\mathrm{Cov}(Y_1, Y_2)$ in terms of $V(X_1)$ and $V(X_2)$

• Give an expression for $\mathrm{Cor}(Y_1, Y_2)$

• Is it possible that $\mathrm{Cor}(Y_1, Y_2) = 0$? If so, when does this happen?

Solution:

• We have

$$\mathrm{Cov}(Y_1, Y_2) = \mathrm{Cov}(X_1 + X_2, X_1 - X_2) = \mathrm{Cov}(X_1, X_1) - \mathrm{Cov}(X_1, X_2) + \mathrm{Cov}(X_2, X_1) - \mathrm{Cov}(X_2, X_2) = V(X_1) - V(X_2).$$

• By definition

$$\mathrm{Cor}(Y_1, Y_2) = \frac{\mathrm{Cov}(Y_1, Y_2)}{\sqrt{V(Y_1)V(Y_2)}}.$$

We have

$$V(Y_1) = V(X_1 + X_2) = V(X_1) + V(X_2) + 2\mathrm{Cov}(X_1, X_2) = V(X_1) + V(X_2)$$

and

$$V(Y_2) = V(X_1 - X_2) = V(X_1) + V(X_2) - 2\mathrm{Cov}(X_1, X_2) = V(X_1) + V(X_2),$$

thus

$$\mathrm{Cor}(Y_1, Y_2) = \frac{\mathrm{Cov}(Y_1, Y_2)}{\sqrt{V(Y_1)V(Y_2)}} = \frac{V(X_1) - V(X_2)}{V(X_1) + V(X_2)}.$$


• If $V(X_1) > V(X_2)$, then $Y_1$ and $Y_2$ are positively correlated. If $V(X_1) < V(X_2)$, then $Y_1$ and $Y_2$ are negatively correlated. Finally, if $V(X_1) = V(X_2)$, then $Y_1$ and $Y_2$ are uncorrelated.

10. (10 pts) Let

$$f_{X_1}(x_1) = \frac{1}{6}x_1^3 e^{-x_1}\mathbf{1}_{[0,\infty)}(x_1)$$

and

$$f_{X_2}(x_2) = \frac{1}{2}e^{-\frac{x_2}{2}}\mathbf{1}_{[0,\infty)}(x_2).$$

Let $Y = X_1 - X_2$.

• Compute $E(Y)$

• Let $\mathrm{Cov}(X_1, X_2) = \gamma$. Compute $V(Y)$

• Suppose that $X_1$ and $X_2$ are uncorrelated. What is $V(Y)$ in this case?

Solution:

• It is clear from inspection of the two p.d.f.'s that $X_1 \sim \mathrm{Gamma}(4, 1)$ and $X_2 \sim \mathrm{Exponential}(2)$. Thus, $E(Y) = E(X_1 - X_2) = E(X_1) - E(X_2) = 4 - 2 = 2$.

• $V(Y) = V(X_1 - X_2) = V(X_1) + V(X_2) - 2\mathrm{Cov}(X_1, X_2) = 4 + 4 - 2\gamma = 2(4 - \gamma)$.

• If $X_1$ and $X_2$ are uncorrelated, then $\gamma = 0$ and $V(Y) = 8$.

11. (5 pts) Use the definition of variance to show that $V(X \pm Y) = V(X) + V(Y) \pm 2\mathrm{Cov}(X, Y)$.

Solution: We have

$$V(X \pm Y) = E[(X \pm Y)^2] - [E(X \pm Y)]^2$$
$$= E(X^2 + Y^2 \pm 2XY) - [E(X)]^2 - [E(Y)]^2 \mp 2E(X)E(Y)$$
$$= E(X^2) - [E(X)]^2 + E(Y^2) - [E(Y)]^2 \pm 2E(XY) \mp 2E(X)E(Y)$$
$$= V(X) + V(Y) \pm 2[E(XY) - E(X)E(Y)] = V(X) + V(Y) \pm 2\mathrm{Cov}(X, Y).$$

12. Extra credit (10 pts) Let $Z \sim \mathcal{N}(0, 1)$. Show that for any odd $n \geq 1$, $E(Z^n) = 0$.


Solution: Let $f$ denote the p.d.f. associated to $\mathcal{N}(0, 1)$. Notice that $f$ is an even function, i.e. $f(z) = f(-z)$ for any $z \in \mathbb{R}$, and that for odd $n$ we have $z^n = -(-z)^n$. Thus,

$$E(Z^n) = \int_{\mathbb{R}} z^n f(z)\, dz = \int_{-\infty}^0 z^n f(z)\, dz + \int_0^{+\infty} z^n f(z)\, dz = \int_{-\infty}^0 -(-z)^n f(-z)\, dz + \int_0^{+\infty} z^n f(z)\, dz.$$

Make the change of variable $x = -z$ in the first integral. We get

$$E(Z^n) = -\int_0^{+\infty} x^n f(x)\, dx + \int_0^{+\infty} z^n f(z)\, dz = 0.$$

13. Extra Credit: (10 pts) Show that for two random variables $X$ and $Y$, $\mathrm{Cor}(X, Y) \in [-1, 1]$. Hint: consider $X' = X - E(X)$ and $Y' = Y - E(Y)$ and study the quadratic function $h(t, X', Y') = (tX' - Y')^2$ as a function of $t$. Notice that $h(t, X', Y') \geq 0$, therefore $E(h(t, X', Y')) \geq 0$, which implies that the discriminant of $E(h(t, X', Y'))$ as a quadratic function of $t$ must be non-positive! Try to use this fact to prove the claim that $\mathrm{Cor}(X, Y) \in [-1, 1]$.

Solution: We have $E(h(t, X', Y')) \geq 0$. This means

$$E[(tX' - Y')^2] = E(t^2X'^2 + Y'^2 - 2tX'Y') = t^2E(X'^2) - 2tE(X'Y') + E(Y'^2) \geq 0.$$

This can only happen if the discriminant is non-positive, i.e. if

$$4[E(X'Y')]^2 - 4E(X'^2)E(Y'^2) \leq 0,$$

i.e.

$$[E(X'Y')]^2 \leq E(X'^2)E(Y'^2)$$

or, equivalently,

$$\frac{[E(X'Y')]^2}{E(X'^2)E(Y'^2)} \leq 1.$$

Notice that

$$\frac{[E(X'Y')]^2}{E(X'^2)E(Y'^2)} = \frac{[E((X - E(X))(Y - E(Y)))]^2}{E((X - E(X))^2)\,E((Y - E(Y))^2)} = \frac{[\mathrm{Cov}(X, Y)]^2}{V(X)V(Y)} = [\mathrm{Cor}(X, Y)]^2.$$

Thus, it follows that $|\mathrm{Cor}(X, Y)| \leq 1$ or, equivalently, $-1 \leq \mathrm{Cor}(X, Y) \leq 1$.

14. Extra Credit (10 pts): In an earlier homework, you downloaded Rand RStudio and printed Hello, world!, and generated some samplesfrom a given distribution. We will use R again for a fun new experi-ment. The Hypergeometric distribution can be approximated by a Bi-nomial distribution. Specifically, HG(n,N, r) ' Bin(n = n, p = r/N)when N ≥ 20n, as a rule of thumb! (Fun exercise: first think aboutintuitively why this is true, then prove that the former distributiongoes to the latter as N →∞ and p = r/N !)

Conduct a simulation in R to demonstrate that this is true. Startwith the code below and make modifications as you see fit. Amongother things, you may need to increase the number of simulations(n.sample), adjust the parameters (n,N, r) so that we have a validscenario where the Binomial distribution can approximate the Hyper-geometric distribution, and adjust the plotting parameters so that youget a meaningful, easy to read graphic that shows that the two simu-lated distributions are approximately the same.

# Initialize some useful information
n.sample <- 100
n <- 500
N <- 1000
r <- 750
p <- r/N

# Generate random sample from binomial distribution
# Don't confuse n and size here. The folks who wrote this function
# used some unfortunately confusing parameter names
bin.sample <- rbinom(n = n.sample, size = n, prob = r/N)

# Generate random sample from hypergeometric distribution
# Don't get confused here -- the folks who wrote this function
# used a different parameterization of the hypergeometric distribution
hg.sample <- rhyper(nn = n.sample, m = r, n = N-r, k = n)

# Plot the results
par(mfrow = c(2,1))
hist(bin.sample, breaks = 50,
     main = "Simulated Binomial Distribution", col = 2,
     xlab = "Hey there, fix my axis label!!")
hist(hg.sample, breaks = 50,
     main = "Simulated Hypergeometric Distribution", col = 3,
     xlab = "Master Chief")
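The starter code above violates the rule of thumb ($N = 1000 < 20n = 10000$). A worked version is not included in these solutions; one possible set of modifications (the particular values are illustrative, not the unique answer) shrinks $n$ so that $N \geq 20n$ holds and increases the number of simulations:

# A possible modification (illustrative values): now N >= 20n holds,
# and many more simulations make the two histograms comparable.
n.sample <- 10000
n <- 25
N <- 1000
r <- 750

bin.sample <- rbinom(n = n.sample, size = n, prob = r/N)
hg.sample <- rhyper(nn = n.sample, m = r, n = N - r, k = n)

par(mfrow = c(2, 1))
hist(bin.sample, breaks = seq(-0.5, n + 0.5, 1),
     main = "Simulated Binomial Distribution", col = 2,
     xlab = "number of successes")
hist(hg.sample, breaks = seq(-0.5, n + 0.5, 1),
     main = "Simulated Hypergeometric Distribution", col = 3,
     xlab = "number of successes")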


Homework 6 (due June 9th)

1. (5 pts) Let $X \mid Q = q \sim \mathrm{Binomial}(n, q)$ and $Q \sim \mathrm{Beta}(\alpha, \beta)$. Compute $E(X)$.

Solution: We have

$$E(X) = E[E(X \mid Q)] = E(nQ) = nE(Q) = \frac{n\alpha}{\alpha + \beta}.$$

2. (5 pts) Let $X \mid Q = q \sim \mathrm{Binomial}(n, q)$ and $Q \sim \mathrm{Beta}(3, 2)$. Find the marginal p.m.f. of $X$. (Notice that here we are mixing discrete and continuous random variables, but the usual definitions still apply!)

Solution: The support of $X$ is $\mathrm{supp}(X) = \{0, 1, \ldots, n\}$. For $x \in \mathrm{supp}(X)$ and $q \in (0, 1)$ we have

$$p_X(x) = \int_0^1 p_{X,Q}(x, q)\, dq = \int_0^1 p_{X|Q=q}(x)\, p_Q(q)\, dq = \int_0^1 \binom{n}{x}q^x(1-q)^{n-x}\,\frac{\Gamma(3+2)}{\Gamma(3)\Gamma(2)}q^2(1-q)\, dq$$
$$= 12\binom{n}{x}\int_0^1 q^{2+x}(1-q)^{n-x+1}\, dq = 12\binom{n}{x}\frac{\Gamma(3+x)\Gamma(n-x+2)}{\Gamma(3+x+n-x+2)} = 12\binom{n}{x}\frac{\Gamma(3+x)\Gamma(n-x+2)}{\Gamma(5+n)}.$$

For any $x \notin \mathrm{supp}(X)$, of course $p_X(x) = 0$.
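As a sanity check (a sketch with an arbitrary choice of $n$), the derived p.m.f. sums to 1 and matches a direct simulation of the hierarchy:

n <- 10; x <- 0:n
pmf <- 12 * choose(n, x) * gamma(3 + x) * gamma(n - x + 2) / gamma(5 + n)
sum(pmf)                                  # equals 1
set.seed(1)
q <- rbeta(1e5, 3, 2)                     # Q ~ Beta(3, 2)
xx <- rbinom(1e5, size = n, prob = q)     # X | Q ~ Binomial(n, Q)
max(abs(pmf - as.numeric(table(factor(xx, levels = x))) / 1e5))  # small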

3. (5 pts) Let $X \sim \mathrm{Exponential}(4)$. Consider $Y = 3X + 1$. Find the p.d.f. of $Y$ and $E(Y)$.

Solution: We have $g(x) = 3x + 1$, thus the support of $Y$ is $[1, \infty)$, $g^{-1}(x) = \frac{x-1}{3}$, and

$$\frac{d}{dx}g^{-1}(x) = \frac{1}{3}.$$


It follows that, for $y \in [1, \infty)$,

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dx}g^{-1}(x)\Big|_{x=y}\right| = \frac{1}{4}e^{-\frac{y-1}{12}}\cdot\frac{1}{3} = \frac{1}{12}e^{-\frac{y-1}{12}},$$

otherwise $f_Y(y) = 0$. Now, substituting $x = (y-1)/12$,

$$E(Y) = \int_1^{\infty} y\,\frac{1}{12}e^{-\frac{y-1}{12}}\, dy = \int_0^{\infty} (12x + 1)e^{-x}\, dx = 12\int_0^{\infty} xe^{-x}\, dx + \int_0^{\infty} e^{-x}\, dx = 12 + 1 = 13.$$

4. (10 pts) We mentioned in class that if $X \sim \mathrm{Gamma}(\alpha, \beta)$ then, for $c > 0$, $Y = cX \sim \mathrm{Gamma}(\alpha, c\beta)$. Prove it.

Solution: We have $g(x) = cx$, $g^{-1}(x) = x/c$, and

$$\frac{d}{dx}g^{-1}(x) = \frac{1}{c}.$$

The image of the support of $X$ through $g$ is still $[0, \infty)$. Then, for $y \in [0, \infty)$,

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dx}g^{-1}(x)\Big|_{x=y}\right| = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\left(\frac{y}{c}\right)^{\alpha-1}e^{-\frac{y}{c\beta}}\,\frac{1}{c} = \frac{1}{(c\beta)^{\alpha}\Gamma(\alpha)}y^{\alpha-1}e^{-\frac{y}{c\beta}},$$

otherwise $f_Y(y) = 0$. This is clearly the p.d.f. of a $\mathrm{Gamma}(\alpha, c\beta)$ distribution.

5. (10 pts) Let $F$ be a given continuous and strictly increasing c.d.f. Let $U \sim \mathrm{Uniform}(0, 1)$. Find a function $g$ such that $g(U) \sim F$. You can assume that $g$ is strictly increasing.


Solution: Let $Y = g(U)$. We have

$$F_Y(y) = P(Y \leq y) = P(g(U) \leq y) = P(U \leq g^{-1}(y)) = \begin{cases} 0 & \text{if } g^{-1}(y) < 0\\ g^{-1}(y) & \text{if } g^{-1}(y) \in [0, 1)\\ 1 & \text{if } g^{-1}(y) \geq 1. \end{cases}$$

Thus, it is enough to set $g^{-1} = F$, i.e. $g = F^{-1}$, and then $Y = g(U) \sim F$.

6. (10 pts) Let $X \sim f_X$ with

$$f_X(x) = \frac{b}{x^2}\mathbf{1}_{[b,\infty)}(x)$$

and let $U \sim \mathrm{Uniform}(0, 1)$. Find a function $g$ such that $g(U)$ has the same distribution as $X$.

Solution: Using the result of exercise 5, we need to set $g = F_X^{-1}$. We have

$$F_X(x) = \begin{cases} 0 & \text{if } x < b\\ \int_b^x f_X(y)\, dy & \text{if } x \geq b \end{cases} = \begin{cases} 0 & \text{if } x < b\\ \int_b^x \frac{b}{y^2}\, dy & \text{if } x \geq b \end{cases} = \begin{cases} 0 & \text{if } x < b\\ -\frac{b}{y}\Big|_b^x & \text{if } x \geq b \end{cases} = \begin{cases} 0 & \text{if } x < b\\ 1 - \frac{b}{x} & \text{if } x \geq b. \end{cases}$$

It follows that

$$g(x) = F_X^{-1}(x) = \frac{b}{1-x}.$$
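This is exactly the inverse-transform recipe, and it is easy to try in R (a hedged sketch; $b = 2$ and the evaluation points are arbitrary choices):

set.seed(1)
b <- 2
x <- b / (1 - runif(1e5))                  # x = g(u) = b/(1 - u)
xs <- c(3, 5, 10, 50)
rbind(empirical = ecdf(x)(xs), theoretical = 1 - b / xs)  # should agree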

7. (10 pts) Let $\theta > 0$ and $X \sim f_X$ with

$$f_X(x) = \frac{2x}{\theta}e^{-\frac{x^2}{\theta}}\mathbf{1}_{[0,\infty)}(x).$$

Consider $Y = X^2$. What is $f_Y$, the p.d.f. of $Y$? What are $E(Y)$ and $V(Y)$?

Solution: Notice that the function $g(x) = x^2$ is invertible over the support of $X$, $[0, \infty)$. Furthermore, $g$ is such that $\mathrm{supp}(Y) = \mathrm{supp}(X)$. We have $g^{-1}(x) = \sqrt{x}$ and

$$\frac{d}{dx}g^{-1}(x) = \frac{1}{2\sqrt{x}}.$$

Therefore

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dx}g^{-1}(x)\Big|_{x=y}\right| = \frac{2\sqrt{y}}{\theta}e^{-\frac{y}{\theta}}\mathbf{1}_{[0,\infty)}(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{1}{\theta}e^{-\frac{y}{\theta}}\mathbf{1}_{[0,\infty)}(y).$$

It follows that $Y \sim \mathrm{Exponential}(\theta)$. Thus, $E(Y) = \theta$ and $V(Y) = \theta^2$.

8. Extra credit (10 pts) Let $X \sim \mathrm{Beta}(\alpha, \beta)$. Consider $Y = 1 - X$. What is the probability distribution of $Y$? Now, let $\alpha = \beta = 1$. Claim that $X$ and $Y$ have the same distribution. Are $X$ and $Y$ also independent?

Solution: Here $g(x) = 1 - x$, so that $g^{-1}(x) = g(x)$ and $\frac{d}{dx}g^{-1}(x) = -1$. Notice that, for this $g$, $\mathrm{supp}(Y) = \mathrm{supp}(X) = [0, 1]$. Then, for $y \in [0, 1]$ we have

$$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dx}g^{-1}(x)\Big|_{x=y}\right| = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}(1-y)^{\alpha-1}y^{\beta-1}\,|-1| = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}(1-y)^{\alpha-1}y^{\beta-1},$$

otherwise $f_Y(y) = 0$. It is clear then that $Y \sim \mathrm{Beta}(\beta, \alpha)$. If $\alpha = \beta = 1$, then $X$ and $Y$ share the exact same distribution, which is $\mathrm{Uniform}(0, 1)$. However, since $Y$ is a deterministic function of $X$ (in particular, $Y = 1 - X$), it is obviously false that $X$ and $Y$ are independent random variables.

9. (5 pts) Let $X \sim F_X$. What is $E(X \mid X)$?

Solution: For $x \in \mathrm{supp}(X)$, we have $E(X \mid X = x) = E(x \mid X = x) = E(x) = x$, i.e. $E(X \mid X) = X$.


10. (10 pts) Let $X, Y, Z$ be three random variables such that $E(X \mid Z) = 3Z$, $E(Y \mid Z) = 1$, and $E(XY \mid Z) = 4Z$. Assume furthermore that $E(Z) = 0$. Are $X$ and $Y$ uncorrelated?

Solution: We have

$$\mathrm{Cov}(X, Y) = E[\mathrm{Cov}(X, Y \mid Z)] + \mathrm{Cov}[E(X \mid Z), E(Y \mid Z)]$$
$$= E[E(XY \mid Z) - E(X \mid Z)E(Y \mid Z)] + \mathrm{Cov}(3Z, 1) = E(4Z - 3Z\cdot 1) + 3\,\mathrm{Cov}(Z, 1) = E(Z) + 0 = 0.$$

It follows that $X$ and $Y$ are uncorrelated.

11. Extra credit (5 pts) Let $Y$ be a continuous random variable with distribution $\mathrm{Uniform}(a, b)$, for some $a < b$. Find the c.d.f. of $Y$.

Solution: Since $Y \sim \mathrm{Uniform}(a, b)$, the c.d.f. of $Y$ is

$$F_Y(y) = \begin{cases} 0 & \text{if } y < a\\ \frac{y-a}{b-a} & \text{if } a \leq y < b\\ 1 & \text{if } y \geq b. \end{cases}$$

12. Extra credit (10 pts) Suppose we want to simulate from a continuous random variable $X$ having a strictly increasing c.d.f. $F_X$. However, the computer at our disposal can only simulate from a $\mathrm{Uniform}(0, 1)$ distribution.

(a) Show that if $U \sim \mathrm{Uniform}(0, 1)$, its c.d.f. is

$$F_U(u) = \begin{cases} 0 & \text{if } u < 0\\ u & \text{if } u \in [0, 1]\\ 1 & \text{if } u > 1. \end{cases}$$

(b) Consider the random variable $Y = F_X^{-1}(U)$, where $U \sim \mathrm{Uniform}(0, 1)$. Show that $Y$ has the same distribution as $X$ by proving that

$$P(Y \leq x) = F_X(x), \quad \text{for all } x \in \mathbb{R}.$$

Hint: use part (a) and the monotonicity of $F_X$.

(c) Using the same computer, how would you go about generating an $\mathrm{Exp}(\lambda)$ random variable (you can use the following fact, without proving it: if $X \sim \mathrm{Exp}(\lambda)$, then $F_X(x) = 1 - e^{-x/\lambda}$)?


Solutions:

(a) Since $U \sim \mathrm{Uniform}(0, 1)$,

$$f_U(u) = \begin{cases} 1 & \text{if } 0 \leq u \leq 1\\ 0 & \text{otherwise.} \end{cases}$$

Therefore, integrating $f_U(u)$ we have

$$F_U(u) = \begin{cases} 0 & \text{if } u < 0\\ u & \text{if } 0 \leq u \leq 1\\ 1 & \text{if } u > 1. \end{cases}$$

(b) $P(Y \leq x) = P(F_X^{-1}(U) \leq x) = P(U \leq F_X(x)) = F_X(x)$, using the fact that $F_X$ is strictly increasing (monotonicity) and $U$ has a $\mathrm{Uniform}(0, 1)$ distribution.

(c) The exponential c.d.f. has the form $F(x) = 1 - e^{-x/\lambda}$ for $x \geq 0$. Thus, to generate values of $X$, we should generate values $u \in (0, 1)$ of a uniformly distributed random variable $U$, and set $X$ to the value for which $1 - e^{-x/\lambda} = u$, i.e. $x = -\lambda\ln(1-u)$.
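In R this recipe is one line (a sketch; $\lambda = 2$ and the sample size are illustrative):

set.seed(1)
lambda <- 2
x <- -lambda * log(1 - runif(1e5))   # inverse-c.d.f. transform
c(mean(x), lambda)                   # sample mean should be close to lambda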

13. (10 points, 2 pts each) We will study estimation for a normally distributed sample.

(a) Obtain the maximum likelihood estimator for the mean in a Normal model (assume $\sigma^2 > 0$ is known):

$$X_1, \cdots, X_n \sim \mathcal{N}(\mu, \sigma^2)$$

(Hint: write down the likelihood function $L(\mu, \sigma^2 \mid y) = \prod_i f(y_i \mid \mu, \sigma^2)$.)

(b) Obtain the maximum likelihood estimator for the noise parameter in the Normal model. (Hint: same hint.)

(c) Is the maximum likelihood estimate of $\mu$ unbiased?

(d) Is the maximum likelihood estimate of $\sigma^2$ unbiased?

(e) What is an unbiased estimate of $\sigma^2$? (Two hints: (1) consider multiplying by a constant, (2) $n$ is a constant, i.e. there is nothing random about the sample size.)

Solution:


(a) In finding the estimators, the first thing we'll do is write the probability density function as a function of $\theta_1 = \mu$ and $\theta_2 = \sigma^2$:

$$f(x_i; \theta_1, \theta_2) = \frac{1}{\sqrt{\theta_2}\sqrt{2\pi}}\exp\left[-\frac{(x_i - \theta_1)^2}{2\theta_2}\right]$$

for $-\infty < \theta_1 < \infty$ and $0 < \theta_2 < \infty$. Now, that makes the likelihood function:

$$L(\theta_1, \theta_2) = \prod_{i=1}^n f(x_i; \theta_1, \theta_2) = \theta_2^{-n/2}(2\pi)^{-n/2}\exp\left[-\frac{1}{2\theta_2}\sum_{i=1}^n (x_i - \theta_1)^2\right]$$

and therefore the log of the likelihood function:

$$\log L(\theta_1, \theta_2) = -\frac{n}{2}\log\theta_2 - \frac{n}{2}\log(2\pi) - \frac{\sum (x_i - \theta_1)^2}{2\theta_2}$$

(b) Taking the partial derivative of the log likelihood with respect to $\theta_1$, and setting to 0,

$$\sum x_i - n\theta_1 = 0.$$

Thus $\hat{\theta}_1 = \hat{\mu} = \frac{\sum x_i}{n} = \bar{x}$. Taking the partial derivative of the log likelihood with respect to $\theta_2$, and setting to 0,

$$-n\theta_2 + \sum (x_i - \theta_1)^2 = 0.$$

Thus $\hat{\theta}_2 = \hat{\sigma}^2 = \frac{\sum (x_i - \bar{x})^2}{n}$.

(c) $E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n}\sum_{i=1}^n \mu = \frac{1}{n}(n\mu) = \mu.$ Unbiased.


(d) Reorganizing the expression, $\hat{\sigma}^2 = \left(\frac{1}{n}\sum_{i=1}^n X_i^2\right) - \bar{X}^2$. Therefore,

$$E(\hat{\sigma}^2) = E\left[\frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2\right] = \left[\frac{1}{n}\sum_{i=1}^n E(X_i^2)\right] - E(\bar{X}^2)$$
$$= \frac{1}{n}\sum_{i=1}^n (\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \frac{1}{n}(n\sigma^2 + n\mu^2) - \frac{\sigma^2}{n} - \mu^2 = \sigma^2 - \frac{\sigma^2}{n} = \frac{(n-1)\sigma^2}{n}.$$

Biased.

(e) Notice that we are only a factor of $\frac{n}{n-1}$ off here. Thus

$$\hat{\sigma}^2_{\text{unbiased}} = \frac{\sum (x_i - \bar{x})^2}{n-1}.$$
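The bias factor $(n-1)/n$ is easy to see numerically (a hedged sketch; the parameter values and seed are arbitrary, and note that R's var() already uses the $n-1$ denominator):

set.seed(1)
n <- 5; sigma2 <- 4
mle <- replicate(1e5, {x <- rnorm(n, 0, sqrt(sigma2)); mean((x - mean(x))^2)})
mean(mle)                # close to (n-1)/n * sigma2 = 3.2
mean(mle) * n / (n - 1)  # rescaling recovers sigma2 = 4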

14. (10 points, 2.5 pts each) We will now study a Chi-square random variable and the sample variance $S^2$.

(a) Prove that the random variable $U = (n-1)S^2/\sigma^2$, for $S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$, is distributed as a Chi-square random variable (i.e. $U \sim \chi^2(n-1)$).

(b) What is the expectation $E(U)$?

(c) What is the expectation $E(S^2)$?

(d) What is the bias of $S^2$? Is it unbiased?

Solution:

(a) Recall that if the $X_i$ are normally distributed random variables with mean $\mu$ and variance $\sigma^2$, then:

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

(b) Recall that the expected value of a chi-square random variable is its degrees of freedom. That is, if $X \sim \chi^2(r)$, then $E(X) = r$. Therefore,

$$E\left(\frac{(n-1)S^2}{\sigma^2}\right) = n - 1.$$

(c) $E(S^2) = E\left[\frac{\sigma^2}{n-1}\cdot\frac{(n-1)S^2}{\sigma^2}\right] = \frac{\sigma^2}{n-1}E\left[\frac{(n-1)S^2}{\sigma^2}\right] = \frac{\sigma^2}{n-1}\cdot(n-1) = \sigma^2.$

(d) The bias is 0. Unbiased.

Note: There is no extra credit for the following problem, because it is a duplicate of 13. The problem is reproduced here for reference:

15. Extra Credit (10 pts) Let's say it is reasonable that an $n$-sized random sample of body weights in the age range of 17-25:

$$X_1, \cdots, X_n$$

is i.i.d. distributed as a Normal distribution with mean $\mu$ and variance $\sigma^2$. A public health official would like to know the 'spread' (variance) of the body weights of this population. We already learned in class that a maximum likelihood estimator of the two parameters, $(\hat{\mu}_{\mathrm{MLE}}, \hat{\sigma}^2_{\mathrm{MLE}})$, is

$$\left(\sum_{i=1}^n \frac{X_i}{n},\ \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{n}\right)$$

(a) Is the above estimator for $\sigma^2$ biased? If so, what is the bias? (7pts)

(b) Also, can you guess what an unbiased estimator is? (3pts)


Homework 7 (due June 14th)

(For this section of the class, I highly recommend trying some problems in the WMS book, for the relevant sections as written in the lecture notes.)

1. (20 pts, 5 points each) Maximum likelihood estimation

(a) What is the MLE of the variance of $X_i$ in a setting where we have $n$ i.i.d. observations $X_1, \cdots, X_n \sim \mathrm{Bernoulli}(p)$?

(b) What is the MLE of the variance of $X$ in a setting where we have one binomial observation?

(c) What is the MLE of $\theta$ in a setting where we have samples $X_1, \cdots, X_n \sim \mathrm{exp}(\theta)$, where the exponential distribution is defined (differently from class) as

$$f_{X_i}(x) = \theta e^{-\theta x}.$$

Is it easy to see whether this estimator is unbiased? Explain.

(d) The MLE has the remarkable property that it converges in distribution to a normal random variable (under some mild conditions that we will not worry about) in the following way:

$$\sqrt{n}(\hat{\theta}_{\mathrm{MLE}} - \theta) \xrightarrow{d} N\left(0, \frac{1}{I(\theta)}\right)$$

for the Fisher information number:

$$I(\theta) = V\left[\frac{d}{d\theta}\log f(X; \theta)\right] = -E\left[\frac{d^2}{d\theta^2}\log f(X; \theta)\right]$$

Continuing with the setting of (c) and using this result about the asymptotic normality of the maximum likelihood estimator, what is a good approximate distribution of $\hat{\theta}_{\mathrm{MLE}}$, when $n$ is large?

Solution:

• Book Example 9.17. It is not easy, because $E(1/\bar{X})$ is not something that is easily calculable; we do not have a good immediate method of calculating $E(g(Y))$ for an arbitrary function $g$.

• The answer is $\bar{X}$. This is because if you take the derivative of the log p.m.f., which for a single observation is

$$\log p_X(x \mid p) = x\log p + (1-x)\log(1-p), \qquad \frac{d}{dp}\log p_X(x \mid p) = \frac{x}{p} - \frac{1-x}{1-p},$$

and set the last expression (summed over the sample) equal to zero, you get that $\hat{p}_{\mathrm{MLE}} = \bar{X}$.

• Find the MLE that maximizes the log likelihood

$$\log L(\theta \mid X) = \log f_{X_1,\cdots,X_n}(x_1, \cdots, x_n \mid \theta) = -\theta\sum_i x_i + n\log\theta,$$

then calculate

$$I(\theta) = E\left[-\frac{d^2}{d\theta^2}\log L(\theta \mid X)\right] = \frac{n}{\theta^2}.$$

• It follows that, approximately for large $n$,

$$\hat{\theta}_{\mathrm{MLE}} \overset{\cdot}{\sim} N\left(\theta, \frac{\theta^2}{n}\right).$$

2. (30 pts) Let $X_1, \ldots, X_{10} \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$. We don't know the value of $\sigma^2$, but we want to decide whether, based on the data $X_1, \ldots, X_{10}$, $\sigma^2 \leq \sigma_0^2 = 3$ or not. We observe from the data that $S^2 = s^2 = 10$.

• Use the sampling distribution of $S^2$ (read lecture 13) to compute the probability of the event $\{S^2 \geq s^2\}$ under the assumption that the true unknown variance is $\sigma^2 = \sigma_0^2$ (you will need to use a statistical software and use the c.d.f. of a $\chi^2$ with a certain number of degrees of freedom to compute this probability).

• Based on the probability that you computed, explain why the data $X_1, \ldots, X_{10}$ suggest that it is likely not true that $\sigma^2 \leq \sigma_0^2$.

Solution:

• We know that

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$

Here, $\sigma^2$ is unknown. However, let's assume that $\sigma^2$ is as large as allowed under the conjecture, i.e. $\sigma^2 = \sigma_0^2$, and plug this into the above result about the sampling distribution of $S^2$. (We can safely plug in $\sigma_0^2$ because $\sup_{0 < \sigma^2 \leq \sigma_0^2} P_{\sigma^2}(S^2 > s^2) = P_{\sigma_0^2}(S^2 > s^2)$, where the notation $P_{\sigma_0^2}(A)$ should be interpreted as 'the probability of the event $A$ when we assume that the true unknown variance is $\sigma_0^2$'.) It then follows that

$$P(S^2 \geq s^2) = P\left(\frac{(n-1)S^2}{\sigma_0^2} \geq \frac{(n-1)s^2}{\sigma_0^2}\right) = P\left(\chi^2(9) \geq 30\right) \approx 0.0004$$

(with a little abuse of notation).

• We thus discovered that, if the true unknown variance were not larger than $\sigma_0^2 = 3$ as we conjectured, then the probability of observing a value of the sample variance $s^2$ as extreme as the one we observed from the data ($s^2 = 10$) would be extremely small. Based on the probability that we computed, it looks therefore very unlikely that the true unknown variance $\sigma^2$ is not larger than $\sigma_0^2 = 3$.
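The tail probability above is one line of R:

1 - pchisq(30, df = 9)   # P(chi^2(9) >= 30), approximately 0.0004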

3. (20 pts) The times a cashier spends processing individual customers' orders are independent random variables with mean 2.5 minutes and standard deviation 2 minutes. What is the approximate probability that it will take more than 4 hours to process the orders of 100 people? You can leave your answer in terms of $\Phi$.

Solution: Let $Z \sim \mathcal{N}(0, 1)$ and let $X_1, \ldots, X_n$ denote the times spent by the cashier to process the 100 individual customers' orders. Without any other assumption on the distribution of the $X$'s other than independence, finite mean and finite variance, we can only rely on the Central Limit Theorem. In this case, $n = 100$ is likely large enough to give a good approximation to the probability of the event of interest, which is

$$\left\{\sum_{i=1}^n X_i > 240\right\}$$


(notice that there are exactly 240 minutes in 4 hours). Thus, we have

$$P\left(\sum_{i=1}^n X_i > 240\right) = P\left(\bar{X}_n > \frac{240}{n}\right) = P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} > \frac{\sqrt{n}\left(\frac{240}{n} - \mu\right)}{\sigma}\right)$$
$$\approx P\left(Z > \frac{\sqrt{n}\left(\frac{240}{n} - \mu\right)}{\sigma}\right) = 1 - \Phi\left(\frac{\sqrt{n}\left(\frac{240}{n} - \mu\right)}{\sigma}\right) = 1 - \Phi\left(\frac{10\left(\frac{240}{100} - 2.5\right)}{2}\right) = 1 - \Phi\left(-\frac{1}{2}\right) = \Phi\left(\frac{1}{2}\right).$$

4. (15 pts) Let $X_1, \ldots, X_n$ be iid random variables with c.d.f. $F$, finite expectation $\mu$ and finite variance $\sigma^2$. Consider the random variable $Y = a\bar{X}_n + b$ for $a, b \in \mathbb{R}$ and $a > 0$. Assume that $n$ is 'large'. Give an approximate expression for the $\alpha$-quantile of $Y$. Your answer will be in terms of $\Phi^{-1}$.

Solution: Let $Z \sim \mathcal{N}(0, 1)$. Without further information about $F$, we can only use the Central Limit Theorem. By definition, the $\alpha$-quantile of $Y$ is the value $q \in \mathbb{R}$ such that $P(Y \leq q) = \alpha$. We can use the Central Limit Theorem to approximate this number when $n$ is large. We have

$$\alpha = P(Y \leq q) = P(a\bar{X}_n + b \leq q) = P\left(\bar{X}_n \leq \frac{q-b}{a}\right) = P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \leq \frac{\sqrt{n}\left(\frac{q-b}{a} - \mu\right)}{\sigma}\right)$$
$$\approx P\left(Z \leq \frac{\sqrt{n}\left(\frac{q-b}{a} - \mu\right)}{\sigma}\right) = \Phi\left(\frac{\sqrt{n}\left(\frac{q-b}{a} - \mu\right)}{\sigma}\right).$$

Thus,

$$\frac{\sqrt{n}\left(\frac{q-b}{a} - \mu\right)}{\sigma} \approx \Phi^{-1}(\alpha), \qquad \text{i.e.} \qquad q \approx b + a\left(\mu + \frac{\sigma\Phi^{-1}(\alpha)}{\sqrt{n}}\right).$$

5. (15 pts) Let $X_1, \ldots, X_n \overset{iid}{\sim} \mathrm{Bernoulli}(p)$ be a sequence of binary random variables where

$$X_i = \begin{cases} 1 & \text{if the Pittsburgh Pirates won the } i\text{-th game against the Cincinnati Reds}\\ 0 & \text{if the Pittsburgh Pirates lost the } i\text{-th game against the Cincinnati Reds} \end{cases}$$

and $p \in (0, 1)$. Suppose that we are interested in the odds that the Pittsburgh Pirates win a game against the Cincinnati Reds. To do that, we need to consider the function

$$g(p) = \frac{p}{1-p}$$

representing the odds of the Pirates winning against the Reds. It is natural to estimate $p$ by means of $\bar{X}_n$. Provide a distribution that can be used to approximate probability statements about $g(\bar{X}_n)$.

Solution: We have $E(X_i) = p \in (0, 1)$ and $V(X_i) = p(1-p) \in (0, 1)$ for any $i \in \{1, \ldots, n\}$, so both the expectation and the variance of the $X$'s exist and are finite. Consider the function $g(x) = \frac{x}{1-x}$ defined for $x \in (0, 1)$. We have

$$g'(x) = \frac{1}{(1-x)^2}.$$

Notice that $g'(x) > 0$ for any $x$ in $(0, 1)$. Thus, $g'(E(X_i)) \neq 0$. The Delta Method allows us to write

$$\frac{\sqrt{n}\left[g(\bar{X}_n) - g(E(X_i))\right]}{|g'(E(X_i))|\sqrt{V(X_i)}} = \frac{\sqrt{n}\left(\frac{\bar{X}_n}{1-\bar{X}_n} - \frac{p}{1-p}\right)}{\frac{1}{(1-p)^2}\sqrt{p(1-p)}} \xrightarrow{d} \mathcal{N}(0, 1).$$

Hence, we can use the distribution

$$\mathcal{N}\left(\frac{p}{1-p},\ \frac{p}{n(1-p)^3}\right)$$

to approximate probability statements about $g(\bar{X}_n) = \frac{\bar{X}_n}{1-\bar{X}_n}$.


6. Extra credit (10 pts) Let $U_1, U_2, \ldots, U_n, \ldots$ be a sequence of random variables such that $U_n \xrightarrow{d} \mathrm{Uniform}(0, 1)$. Consider the function $g(x) = -\log x$ and the new sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ with $X_i = -\log U_i$. Prove that $X_n \xrightarrow{d} \mathrm{Exponential}(1)$. Hint: use the method of the c.d.f. from lecture.

Solution: We first want to compute the c.d.f. of the $n$-th transformed variable $X_n$ in the sequence. We have

$$F_{X_n}(x) = P(X_n \leq x) = P(-\log U_n \leq x) = P\left(U_n \geq e^{-x}\right).$$

Because $U_n \xrightarrow{d} \mathrm{Uniform}(0, 1)$, it follows that

$$P\left(U_n \geq e^{-x}\right) \to \begin{cases} 0 & \text{if } x < 0\\ 1 - e^{-x} & \text{if } x \geq 0 \end{cases} = F(x).$$

We recognize $F$ as the c.d.f. of an $\mathrm{Exponential}(1)$ distribution. Thus, we conclude that $X_n \xrightarrow{d} \mathrm{Exponential}(1)$.


Homework 8 (due June 16th)

1. (10 pts) Prove the weak law of large numbers under the additional assumption that the variance $\sigma^2$ of the iid sequence $X_1, X_2, \ldots, X_n, \ldots$ exists and is finite. Hint: use Tchebysheff's inequality. (Go back to earlier lecture notes and homework for how Tchebysheff was used.)

Solution: Take any $\varepsilon > 0$. By Tchebysheff's inequality we have that

$$P(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{V(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}.$$

Clearly,

$$0 \leq \lim_{n\to\infty} P(|\bar{X}_n - \mu| > \varepsilon) \leq \lim_{n\to\infty} \frac{\sigma^2}{n\varepsilon^2} = 0,$$

hence the result.

2. (10 pts) Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ with $X_n \sim \mathcal{N}(\mu, 3/(1+\log n))$. Show that $X_n \xrightarrow{p} \mu$. (Hint: use Tchebysheff again.)

Solution: Take any arbitrary $\varepsilon > 0$. Tchebysheff's inequality implies

$$P(|X_n - \mu| > \varepsilon) \leq \frac{V(X_n)}{\varepsilon^2} = \frac{3}{(1+\log n)\varepsilon^2}.$$

Clearly,

$$0 \leq \lim_{n\to\infty} P(|X_n - \mu| > \varepsilon) \leq \lim_{n\to\infty} \frac{3}{(1+\log n)\varepsilon^2} = 0,$$

and thus $X_n \xrightarrow{p} \mu$.

3. (15 pts, 5 pts each)

(a) What is the m.g.f. of $X_i \sim \mathrm{Bernoulli}(p)$?

(b) What is the m.g.f. of $X = \sum_i X_i$?

(c) What is the distribution of $X$? (We knew this, but now we know a convenient way of proving it!)

Solutions:


(a) The moment generating function of $X_i$ is

$$m_{X_i}(t) = E(e^{tX_i}) = \sum_{x=0,1} e^{tx}p^x(1-p)^{1-x} = 1 - p + e^t p.$$

(b) The m.g.f. of the sum of the independent $X_i$'s is the product of the m.g.f.'s:

$$m_X(t) = \prod_{i=1}^n m_{X_i}(t) = (1 - p + e^t p)^n,$$

which is the m.g.f. of a $\mathrm{Binomial}(n, p)$ distribution.

(c) We can also approach this in the more general case, where for independent $X_i$ which follow $\mathrm{Binomial}(n_i, p)$ distributions, the sum $X = \sum_i X_i$ follows a $\mathrm{Binomial}(\sum_i n_i, p)$ distribution. By the convolution formula, $Z = X_1 + X_2$ has probability distribution:

$$f_Z(z) = \sum_{x=0}^{z} \binom{n_1}{x}p^x(1-p)^{n_1-x}\binom{n_2}{z-x}p^{z-x}(1-p)^{n_2-(z-x)}$$
$$= p^z(1-p)^{n_1+n_2-z}\sum_{x=0}^{z}\binom{n_1}{x}\binom{n_2}{z-x} = \binom{n_1+n_2}{z}p^z(1-p)^{n_1+n_2-z},$$

which is the p.m.f. of a Binomial distribution with parameters $n_1 + n_2$ and $p$. The general case for $k > 1$ variables follows by induction.

4. (10 pts) Sum of Poissons:

(a) Prove that the sum of independent Poisson distributed $X, Y$ with parameters $\lambda_1, \lambda_2$ respectively is a Poisson random variable. (What is the parameter?) (Hint: the following formula, sometimes called the discrete convolution formula, for $Z = X_1 + X_2$,

$$P(X_1 + X_2 = z) = f_Z(z) = \sum_{x=0}^{z} f_{X_1}(x)f_{X_2}(z-x),$$

may come in handy.)

(b) Prove the above with moment generating functions.


Solutions:

(a) By the convolution formula, $Z = X + Y$ has probability distribution:

$$f_Z(z) = \sum_{x=0}^{z} \frac{1}{x!}e^{-\lambda_1}\lambda_1^x\cdot\frac{1}{(z-x)!}e^{-\lambda_2}\lambda_2^{z-x} = e^{-(\lambda_1+\lambda_2)}\sum_{x=0}^{z}\frac{\lambda_1^x}{x!}\frac{\lambda_2^{z-x}}{(z-x)!}.$$

Now, notice that you can use the so-called binomial formula with $N = z$, $a = \lambda_1$, $b = \lambda_2$:

$$(a+b)^N = \sum_{x=0}^{N}\binom{N}{x}a^x b^{N-x}$$

to get that the expression above is equal to

$$e^{-(\lambda_1+\lambda_2)}\frac{(\lambda_1+\lambda_2)^z}{z!}.$$

You can recognize a new Poisson p.m.f. with parameter $\lambda_1 + \lambda_2$; this completes the proof. (You can also see this generalizes easily to the sum of many Poissons.)

(b) $X$ and $Y$ have the moment generating functions

$$m_X(t) = e^{\lambda_1(e^t-1)}, \qquad m_Y(t) = e^{\lambda_2(e^t-1)}$$

and the m.g.f. of $X + Y$ is

$$m_{X+Y}(t) = m_X(t)\,m_Y(t) = e^{(\lambda_1+\lambda_2)(e^t-1)},$$

which is easily seen to be the m.g.f. of a Poisson random variable with parameter $\lambda_1 + \lambda_2$. (This is much easier than (a), wouldn't you agree?)

5. (10 pts) Assume that $X_1, \cdots, X_n$ have finite moment generating functions, i.e. $m_{X_i}(t) < \infty$ for all $t$, for all $i = 1, \cdots, n$. Prove the central limit theorem again, using moment generating functions. (Hint: come to office hours for a hint.)


Solutions:

Let $T_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$, for the sum $S_n = \sum_i X_i$. Also denote the mean and variance of $X_i$ as $E(X_i) = \mu$ and $V(X_i) = \sigma^2$. Then, we want to prove that

$$P(T_n \leq x) \xrightarrow{n\to\infty} P(Z \leq x).$$

Denote the 'standardized' version of $X_i$ as $Y_i = \frac{X_i - \mu}{\sigma}$; the $Y_i$ are still i.i.d. but with mean $E(Y_i) = 0$ and $V(Y_i) = 1$, and

$$T_n = \frac{\sum_{i=1}^n Y_i}{\sqrt{n}}.$$

Now, the proof amounts to showing that the moment generating function of $T_n$ goes to that of a standard normal random variable $Z \sim \mathcal{N}(0, 1)$, whose m.g.f. is $\exp(\frac{t^2}{2})$:

$$m_{T_n}(t) = E(e^{tT_n}) = \left(E\!\left(e^{\frac{t}{\sqrt{n}}Y_i}\right)\right)^n = \left(1 + \frac{t}{\sqrt{n}}E(Y) + \frac{1}{2}\frac{t^2}{n}E(Y^2) + \frac{1}{6}\frac{t^3}{n^{3/2}}E(Y^3) + \cdots\right)^n$$
$$= \left(1 + 0 + \frac{1}{2}\frac{t^2}{n} + \frac{1}{6}\frac{t^3}{n^{3/2}}E(Y^3) + \cdots\right)^n \simeq \left(1 + \frac{t^2}{2}\frac{1}{n}\right)^n \to e^{\frac{t^2}{2}}.$$

6. (10 pts) Using the central limit theorem for Poisson random variables, compute the value of

$$\lim_{n\to\infty} e^{-n}\sum_{i=0}^{n}\frac{n^i}{i!}$$

Solutions: Denote by $X_1, \cdots, X_n$ i.i.d. Poisson random variables with parameter 1. Then $S_n = \sum_i X_i$ is $\mathrm{Poisson}(n)$, and the expectation of $S_n$ is $E(S_n) = n$. The central limit theorem then gives $P(S_n \leq n) \to \Phi(0) = \frac{1}{2}$. Lastly, notice that the quantity of interest is exactly $P(S_n \leq n)$!
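A numerical illustration in R (ppois(n, n) computes exactly $e^{-n}\sum_{i=0}^n n^i/i!$):

n <- c(10, 100, 1000, 10000)
ppois(n, lambda = n)   # approaches 1/2 as n grows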

7. (15 pts) Roll a fair die $n$ times; call the outcomes $X_i$. Then, estimate the probability that $S_n = \sum_i X_i$ exceeds its expectation by at least $n$, for $n = 100$ and $n = 1000$.

• (5 pts) Use Tchebysheff's inequality.

• (5 pts) Obtain the m.g.f. of $S_n$.

• (5 pts) Use large deviation bounds. (Hint: you will need to use a computer program of your choice to get a component in the large deviation bound.)

Solutions:

• Tchebysheff's upper bound is $\frac{35}{12n}$. This can be found from

$$P\left(S_n - E(S_n) \geq k\sqrt{V(S_n)}\right) \leq \frac{1}{k^2}$$

for the appropriate $k$ that matches our problem. Indeed, setting $k\sqrt{V(S_n)} = k\sqrt{\frac{35}{12}n} = n$ gives $k$ such that $\frac{1}{k^2} = \frac{35}{12n}$.

• The m.g.f. of a single roll is

$$m_{X_i}(t) = \frac{1}{6}\sum_{i=1}^{6} e^{it} = \frac{e^t(e^{6t}-1)}{6(e^t-1)},$$

and so $m_{S_n}(t) = [m_{X_i}(t)]^n$.

• Observe that $E(X_i) = 3.5$ and $E(S_n) = 3.5n$. We want to find an upper bound on $P(S_n - 3.5n \geq n) = P(S_n \geq 4.5n)$. Using the m.g.f. of $X_i$ from part (b), we need to compute

$$I(4.5) = \sup_{t\geq 0}\left[4.5t - \log(m_{X_i}(t))\right],$$

which can be solved using a numerical solver of your choice. The (unique) maximizer can be verified to be $t \simeq 0.37105$, so that $I(4.5) \simeq 0.178$. The resulting large deviation bound is:

$$P(S_n \geq 4.5n) \leq e^{-0.178n}.$$

For, say, $n = 100$, this is about $1.9\cdot 10^{-8}$, while Tchebysheff's bound is $0.03$; in short, the large deviation bound is shown to be a much tighter upper bound on the quantity of interest.
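The numerical step can be done, for instance, with R's optimize() (a hedged sketch; the search interval is an arbitrary but safe choice):

mgf <- function(t) mean(exp(t * (1:6)))        # m.g.f. of one fair-die roll
opt <- optimize(function(t) 4.5 * t - log(mgf(t)),
                interval = c(0, 5), maximum = TRUE)
opt$maximum                # maximizer, about 0.371
opt$objective              # I(4.5), about 0.178
exp(-opt$objective * 100)  # large deviation bound for n = 100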


8. (10 pts) In each game a gambler wins the dollars he bets with probability $p$ and loses with probability $1-p$. If he has less than \$3, he will bet all that he has. Otherwise, since his goal is to have \$5, he will only bet the difference between \$5 and what he has. He continues to bet until he has either \$0 or \$5. Let $X_n$ denote the amount he has right after the $n$-th bet.

(a) Draw the transition graph.

(b) Classify the states. (We haven't covered this material yet! Full points given.)

Solutions:

(a) The possible states are 0, 1, 2, 3, 4, 5. The probability transition matrix $P$ is

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0\\ 1-p & 0 & p & 0 & 0 & 0\\ 1-p & 0 & 0 & 0 & p & 0\\ 0 & 1-p & 0 & 0 & 0 & p\\ 0 & 0 & 0 & 1-p & 0 & p\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

and the transition graph can be read off directly from it (an arrow from state $i$ to state $j$ whenever $P_{ij} > 0$).

(b) The states 0 and 5 are recurrent (absorbing), and all other states are transient.

11. (10 pts) The number of failures $X_t$, which occur in a computer network over the time interval $[0, t)$, can be described by a homogeneous Poisson process $\{X_t, t \geq 0\}$. On average, there is a failure after every 4 hours, i.e. the intensity of the process is equal to $\lambda = 0.25$.

(a) What is the probability of at most 1 failure in $[0, 8)$, at least 2 failures in $[8, 16)$, and at most 1 failure in $[16, 24)$ (time unit: hour)?

(b) What is the probability that the third failure occurs after 8 hours?

Solutions:

(a) By the independence and stationarity of the increments,

$$P(X_8 - X_0 \leq 1,\ X_{16} - X_8 \geq 2,\ X_{24} - X_{16} \leq 1) = P(X_8 \leq 1)\,P(X_8 \geq 2)\,P(X_8 \leq 1).$$

Now, notice

$$P(X_8 \leq 1) = P(X_8 = 0) + P(X_8 = 1) = e^{-0.25\cdot 8} + 0.25\cdot 8\cdot e^{-0.25\cdot 8} = 3e^{-2} \approx 0.406$$

and

$$P(X_8 \geq 2) = 1 - P(X_8 \leq 1) \approx 0.594.$$

So the answer is $0.406\cdot 0.594\cdot 0.406 \approx 0.098$.

(b) The third failure occurs after 8 hours exactly when at most 2 failures occur in $[0, 8)$, so the probability is $P(X_8 \leq 2) = e^{-2}\left(1 + 2 + \frac{2^2}{2}\right) = 5e^{-2} \approx 0.677$.

12. Extra credit (10 pts) Prove the large deviation bound. (Hint: start with the Markov inequality.)

Solutions: Denote $S_n = X_1 + \cdots + X_n$. Then, for any $t > 0$, by Markov's inequality:

$$P(S_n \geq an) = P\left(e^{tS_n - tan} \geq 1\right) \leq E\left(e^{tS_n - tan}\right) = e^{-tan}\,m_{X_i}(t)^n = \exp\left(-n\left(at - \log m_{X_i}(t)\right)\right).$$

Since this holds for any $t > 0$, we can simply minimize this bound over $t > 0$; this is accomplished by the $\sup(\cdot)$ operation in the exponent, which gives the desired expression.


Homework 9 (due June 23rd)

1. (15 pts) Consider a Bernoulli process $X$ and its inter-arrival times $\{T_i\}_{i=1}^{\infty}$. Show that $W_r = \sum_{i=1}^r T_i$ with $r \in \{1, 2, \ldots\}$ is a Negative Binomial random variable with parameters $r$ and $p$, where $p$ is the probability of success of each of the random variables $\{X_i\}_{i=1}^{\infty}$.

Solutions: The inter-arrival times of a Bernoulli process are iid $\mathrm{Geometric}(p)$ random variables. The m.g.f. of a Geometric distribution with parameter $p$ is

$$m(t) = \frac{pe^t}{1-(1-p)e^t}$$

for $t < -\log(1-p)$. Consider the random variable $W_r = \sum_{i=1}^r T_i$. By independence we have

$$m_{W_r}(t) = m(t)^r = \left[\frac{pe^t}{1-(1-p)e^t}\right]^r$$

for $t < -\log(1-p)$, which is the m.g.f. of the Negative Binomial distribution with parameters $r$ and $p$. Thus, $W_r \sim \mathrm{NBinomial}(r, p)$.

2. (15 pts) A rat is wandering inside a maze, which has rooms labeled $F$, 2, 3, 4, 5 and $S$. If a room has $k$ doors, then the probability that the rat chooses any particular door is $1/k$. The rat always moves to a different room at each time step. If the rat reaches room $F$, which contains food, or room $S$, which gives it a harmless electrical shock, the rat remains there and the experiment terminates.

(a) Compute by hand the probability that the rat gets to room $S$ in 4 steps given that it starts in room 3.

(b) Compute the $n$-step-ahead transition probability matrix for $n = 2$, $n = 4$, $n = 10$, $n = 50$. Hint: you definitely want to use a computer for this.

(c) Identify the classes of transient and recurrent states (i.e. classify the states as such).

(d) Does this chain admit a stationary distribution? Explain.

(e) Now suppose that the rules of the experiment change: the experiment terminates only when the rat reaches room $S$ for the second time (in particular, the rat can visit room $F$ multiple times). Derive the transition probability matrix for this process.

Solutions

(a) (3 points) Firstly, for the states 2, 3, 4, 5, F, S we obtain the matrix of transition probabilities as

$$P = \begin{pmatrix} 0 & 0 & 1/2 & 0 & 1/2 & 0\\ 0 & 0 & 1/3 & 1/3 & 1/3 & 0\\ 1/3 & 1/3 & 0 & 0 & 0 & 1/3\\ 0 & 1/2 & 0 & 0 & 0 & 1/2\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Now, there are 7 possible paths, indicated below along with their probabilities.

Path            Probability
3-5-3-4-S       1/54
3-5-3-5-S       1/36
3-5-S-S-S       1/6
3-4-2-4-S       1/54
3-4-3-4-S       1/81
3-4-3-5-S       1/54
3-4-S-S-S       1/9

The probabilities are obtained using the transition probability matrix. For example, the path 3-5-S-S-S has probability $(1/3)(1/2)(1)(1) = 1/6$ and the path 3-4-3-4-S has probability $(1/3)(1/3)(1/3)(1/3) = 1/81$. So the sum is $\frac{121}{324}$.

(b) (3 points) Using R: please check Fig. 3.

(c) (3 points) The states F and S are recurrent. The other states are transient.

(d) (3 points) This chain does not admit a unique stationary distribution since it has two recurrent classes (the long-run behavior depends on the initial state).

(e) (3 points) Consider the following state space

$$(F, 0), (2, 0), (3, 0), (4, 0), (5, 0), (S, 1), (F, 1), (2, 1), (3, 1), (4, 1), (5, 1), (S, 2)$$


where each state consists of a pair, the first element denoting the room and the second the number of visits to room S. Then, the sequence of rooms visited by the rat under the new rule forms a first-order Markov chain having this enlarged state space. The transition probability matrix is given in Fig. 4. Notice that the state (S, 2) is the only recurrent state (i.e. it is an absorbing state).

3. (15 pts) Suppose a student in 36-217 can either be up-to-date ($U$) with the material covered in class or behind ($B$). The probability of a student being up-to-date or behind on a particular week depends on whether he/she has been behind or up-to-date in the previous two weeks. In particular:

• If behind both this week and last week, the student will be behind next week as well with probability 0.8.

• If up-to-date both this week and last week, the student will be up-to-date next week as well with probability 0.9.

• If behind last week and up-to-date this week, the student will be behind with probability 0.5.

• If up-to-date last week and behind this week, the student will be behind with probability 0.7.

(a) Is this a first-order Markov chain? Why?

(b) Explain how you can enlarge the state space and obtain a first-order Markov chain.

(c) More generally, if you have a $k$-th order Markov chain on a state space of cardinality $m$, explain how you can always derive a first-order Markov chain on a larger state space, and find the cardinality of this enlarged state space.

Solutions:

(a) (2 points) This is not a first-order Markov chain since the probability of being in a given state depends on the states of the two previous steps.

(b) (3 points) We can consider the state space $\{U, B\}^2 = \{(U,U), (U,B), (B,U), (B,B)\}$, where a pair records (state last week, state this week): these are the length-two histories that matter for determining the state of the next step. We consider the new random variable $Y_t = (X_{t-1}, X_t)$. In order to define the probabilities of the new first-order Markov chain we set

$$P(Y_t = (j, i) \mid Y_{t-1} = (l, k)) = \begin{cases} P(X_t = i \mid X_{t-1} = k, X_{t-2} = l) & \text{if } j = k\\ 0 & \text{if } j \neq k. \end{cases}$$

Thus, in this case we obtain the probability transition matrix for the states $(U,U), (U,B), (B,U), (B,B)$, in this order:

$$\begin{pmatrix} 0.9 & 0.1 & 0 & 0\\ 0 & 0 & 0.3 & 0.7\\ 0.5 & 0.5 & 0 & 0\\ 0 & 0 & 0.2 & 0.8 \end{pmatrix}$$

(c) (4 points) In general, if we have a state space $\{E_1, \ldots, E_m\}$ we can consider the new state space $\{E_1, \ldots, E_m\}^k$ with cardinality $m^k$, which contains the length-$k$ histories that matter for determining the state of the next step. We consider the new random variable $Y_t = (X_t, X_{t-1}, \ldots, X_{t-(k-1)})$. In order to define the probabilities of the new first-order Markov chain we set

$$P(Y_t = (i_0, i_1, \ldots, i_{k-1}) \mid Y_{t-1} = (j_1, j_2, \ldots, j_k)) = \begin{cases} P(X_t = i_0 \mid X_{t-1} = j_1, \ldots, X_{t-k} = j_k) & \text{if } i_1 = j_1, \ldots, i_{k-1} = j_{k-1}\\ 0 & \text{otherwise.} \end{cases}$$

4. (10 pts) For a Markov chain with state space $\{a, b, c, d, e, f, g\}$ and transition probability matrix

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 1\\ .3 & .1 & .1 & .1 & 0 & .2 & .2\\ 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & .3 & .1 & 0 & .2 & 0 & .4\\ .2 & .4 & 0 & .3 & 0 & 0 & .1\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

classify the states (recurrent and transient). Does the chain admit a limiting distribution?

Solutions: The chain contains 3 recurrent classes: $\{a\}$, $\{b, g\}$ and $\{d\}$. The other states are transient. This chain does not admit a limiting distribution because there is more than one recurrent class (i.e. the limiting behavior depends on the initial state).


5. (15 pts) Let $X_0, X_1, \cdots$ be a Markov chain with state space $\{1, 2, 3\}$, initial distribution $p_{X_0} = (1/5, ?, 2/5)$ and transition probability matrix

$$P = \begin{pmatrix} 1/5 & 4/5 & ?\\ 2/5 & 1/2 & ?\\ 0 & 1/10 & ? \end{pmatrix}$$

Fill in the entries for $P$ and $p_{X_0}$, and answer the following:

(a) Compute $P(X_1 = 1 \mid X_0 = 2)$.

(b) The row vector $p_{X_0}$ describes the distribution of $X_0$. What is the row vector describing the distribution of $X_1$?

(c) What is $P(X_1 = 3)$?

(d) What is the row vector describing the distribution of $X_2$?

(e) What is $P(X_2 = 1)$?

Solutions:

The probabilities along the rows of the transition matrix must sum to 1, as must the initial distribution. Hence we have

$$p_{X_0} = (1/5, 2/5, 2/5)$$

and

$$P = \begin{pmatrix} 1/5 & 4/5 & 0\\ 2/5 & 1/2 & 1/10\\ 0 & 1/10 & 9/10 \end{pmatrix}$$

(a) (3 points) $P(X_1 = 1 \mid X_0 = 2) = P_{21} = 2/5 = 0.4$.

(b) (3 points) The row vector describing $X_1$ is obtained by matrix multiplication:

$$p_{X_1} = p_{X_0}P = (.2, .4, .4)\begin{pmatrix} .2 & .8 & 0\\ .4 & .5 & .1\\ 0 & .1 & .9 \end{pmatrix} = (.2, .4, .4)$$

Note that this matrix multiplication implements the law of total probability for each value that $X_1$ could take, namely:

$$P(X_1 = k) = \sum_{i=1}^3 P(X_1 = k \mid X_0 = i)P(X_0 = i)$$

(c) (3 points) $P(X_1 = 3) = p_{X_1}[3] = .4$, using the distribution calculated in the previous part.

(d) (3 points) To get the row vector describing $X_2$ we need the 2-step transition matrix $P_2$, which is just the square of $P$:

$$P_2 = P\cdot P = \begin{pmatrix} .2 & .8 & 0\\ .4 & .5 & .1\\ 0 & .1 & .9 \end{pmatrix}\begin{pmatrix} .2 & .8 & 0\\ .4 & .5 & .1\\ 0 & .1 & .9 \end{pmatrix} = \begin{pmatrix} .36 & .56 & .08\\ .28 & .58 & .14\\ .04 & .14 & .82 \end{pmatrix}$$

Now we compute the probability distribution of $X_2$ using the law of total probability again:

$$P(X_2 = k) = \sum_{i=1}^3 P(X_2 = k \mid X_0 = i)P(X_0 = i),$$

or in matrix form:

$$p_{X_2} = p_{X_0}P_2 = (.2, .4, .4)\begin{pmatrix} .36 & .56 & .08\\ .28 & .58 & .14\\ .04 & .14 & .82 \end{pmatrix} = (.2, .4, .4)$$

(e) (3 points) $P(X_2 = 1) = p_{X_2}[1] = .2$, using the distribution calculated in the previous part.
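Parts (b)-(e) amount to matrix products, e.g. in R:

P <- matrix(c(.2, .8, 0,
              .4, .5, .1,
              0, .1, .9), nrow = 3, byrow = TRUE)
p0 <- c(.2, .4, .4)
p0 %*% P          # distribution of X1: (.2, .4, .4)
p0 %*% P %*% P    # distribution of X2: (.2, .4, .4); p0 is in fact stationary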

6. (15 pts) Let $X_n$ be the maximum reading obtained in the first $n$ throws of a fair die and $Y_n$ be the minimum reading obtained in the first $n$ throws of a fair die.

(a) Explain why $X_1, X_2, X_3, \cdots$ forms a Markov chain.

(b) Find the transition probability matrix for the chain and draw the transition graph.

(c) Is $\{X_n\}$ irreducible or reducible? Explain.

(d) Suppose that the fair die is 3- rather than 6-faced. Let $Z_n$ denote the maximum reading obtained in the first $n$ throws. Does $\{Z_n\}$ have a limiting (equilibrium) distribution? If yes, find the limiting distribution. If not, explain why not (do not give a mathematical reason, but rather explain what makes this Markov chain special).

(e) Is $Y_1, Y_2, Y_3, \cdots$ a Markov chain? Explain.


Solutions:

(a) (3 points) First, $X_1, X_2, X_3, \ldots$ is a sequence of random variables that take values in the same set $\{1, 2, 3, 4, 5, 6\}$. Second, $X_1, X_2, X_3, \ldots$ has the Markov Property, that is, for $n = 1, 2, 3, \ldots$

$$P(X_{n+1} = s_{n+1} \mid X_n = s_n, \ldots, X_1 = s_1) = \begin{cases} \frac{1}{6} & s_n < s_{n+1} \leq 6\\ \frac{s_n}{6} & s_{n+1} = s_n\\ 0 & s_{n+1} < s_n. \end{cases}$$

That is, given the current state $s_n$ and past states $s_{n-1}, \ldots, s_1$, the distribution of the next state $X_{n+1}$ depends only on $s_n$, the value of the current state. So $X_1, X_2, X_3, \ldots$ forms a Markov chain with 6 states $1, 2, 3, 4, 5, 6$.

(b) (3 points) Its transition probability matrix is:

$$P = \begin{pmatrix} 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6\\ 0 & 2/6 & 1/6 & 1/6 & 1/6 & 1/6\\ 0 & 0 & 3/6 & 1/6 & 1/6 & 1/6\\ 0 & 0 & 0 & 4/6 & 1/6 & 1/6\\ 0 & 0 & 0 & 0 & 5/6 & 1/6\\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

and the transition graph can be drawn directly from it (arrows from each state $i$ to all states $j \geq i$).

(c) (3 points) $\{X_n\}$ is reducible because one state, state 6, is an absorbing state, i.e., you cannot go to any other state once you get to state 6.

(d) (3 points) $\{Z_n\}$ is a Markov chain with states $\{1, 2, 3\}$ and its transition probability matrix is

$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3\\ 0 & 2/3 & 1/3\\ 0 & 0 & 1 \end{pmatrix}$$

Since one of the states, state 3, is absorbing, $\{Z_n\}$ is reducible. The limiting distribution is the trivial probability distribution that assigns probability one to state 3.

(e) (3 points) $Y_1, Y_2, Y_3, \ldots$ is also a Markov chain.


First, $Y_1, Y_2, Y_3, \ldots$ is a sequence of random variables that take values in the same set $\{1, 2, 3, 4, 5, 6\}$. Second, $Y_1, Y_2, Y_3, \ldots$ has the Markov Property, that is, for $n = 1, 2, 3, \ldots$

$$P(Y_{n+1} = s_{n+1} \mid Y_n = s_n, \ldots, Y_1 = s_1) = \begin{cases} \frac{1}{6} & 1 \leq s_{n+1} < s_n\\ \frac{7-s_n}{6} & s_{n+1} = s_n\\ 0 & s_{n+1} > s_n. \end{cases}$$

That is, given the current state $s_n$ and past states $s_{n-1}, \ldots, s_1$, the distribution of the next state $Y_{n+1}$ depends only on $s_n$, the value of the current state. So $Y_1, Y_2, Y_3, \ldots$ forms a Markov chain with 6 states $1, 2, 3, 4, 5, 6$.
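For part (d), a quick numerical sketch in R shows the powers of the 3-state matrix converging to the trivial limiting distribution:

P <- matrix(c(1/3, 1/3, 1/3,
              0, 2/3, 1/3,
              0, 0, 1), nrow = 3, byrow = TRUE)
Pn <- diag(3)
for (i in 1:50) Pn <- Pn %*% P   # compute P^50
round(Pn, 3)                     # every row is approximately (0, 0, 1)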

7. A few more to be added..

Practice problems (each extra credit of 5 pts; some of these have been solved/sketched in class!)

1. Consider a Poisson process $X = \{X_t\}_{t\geq 0}$. Fix a time $t_0 \geq 0$. Show that $X$ satisfies the fresh-start property, i.e. show that the process $Y = \{Y_t\}_{t\geq 0}$ with $Y_t = X_{t+t_0} - X_{t_0}$ is also a Poisson process and it is independent from $\{X_t\}_{0\leq t < t_0}$. Hint: show that $Y$ satisfies the properties that describe a Poisson process according to the first definition of Poisson process that we discussed in class.

Solution: We need to check that the process $Y$ satisfies the properties that define a Poisson process. We have:

• $Y_0 = X_{t_0+0} - X_{t_0} = 0$

• for any two time intervals $(s, t]$ and $(u, v]$ such that $(s, t] \cap (u, v] = \emptyset$, we have $Y_t - Y_s = X_{t_0+t} - X_{t_0+s}$ and $Y_v - Y_u = X_{t_0+v} - X_{t_0+u}$; since $X$ is a Poisson process and $(t_0+s, t_0+t] \cap (t_0+u, t_0+v] = \emptyset$ if $(s, t] \cap (u, v] = \emptyset$, it follows that $Y_t - Y_s$ and $Y_v - Y_u$ are independent

• for any $s, t \geq 0$ we have $Y_{t+s} - Y_s = X_{t+s+t_0} - X_{s+t_0} \sim \mathrm{Poisson}(\lambda(t+s+t_0 - (s+t_0))) \equiv \mathrm{Poisson}(\lambda t)$.

Notice that, because $X$ is a Poisson process, for any $t \geq 0$, $Y_t = X_{t+t_0} - X_{t_0}$ is independent of any increment $X_v - X_u$ with $0 \leq u < v < t_0$. Thus, $Y$ is independent from $\{X_t\}_{0\leq t < t_0}$. We thus conclude that Poisson processes satisfy the fresh-start property.

2. Consider a Poisson process $X = \{X_t\}_{t\geq 0}$ and the time of the first arrival $T_1$. Show that, conditional on the event $\{X_t = 1\}$ (i.e. there was only one arrival until time $t > 0$), $T_1$ is uniformly distributed on $[0, t]$. Use the definition of conditional probability, express the event $\{T_1 \leq s\} \cap \{X_t = 1\}$ in terms of Poisson process count increments, and show that the function $s \mapsto P(T_1 \leq s \mid X_t = 1)$ corresponds to a $\mathrm{Uniform}(0, t)$ c.d.f.

Solution: We want to compute $P(T_1 \leq s \mid X_t = 1)$. We have

$$P(T_1 \leq s \mid X_t = 1) = \frac{P(\{T_1 \leq s\} \cap \{X_t = 1\})}{P(X_t = 1)} = \begin{cases} 0 & \text{if } s < 0\\ \frac{P(\{X_s - X_0 = 1\} \cap \{X_t - X_s = 0\})}{P(X_t = 1)} & \text{if } 0 \leq s < t\\ 1 & \text{if } s \geq t \end{cases}$$

since, when $0 \leq s < t$, the intersection of the events 'the first arrival was before time $s$' and 'there was one arrival before time $t$' is 'there was one arrival in the time interval $[0, s]$ and there were no other arrivals in the time interval $(s, t]$'.

For $0 \leq s < t$, the independence and the stationarity of the increments of a Poisson process imply that

$$\frac{P(\{X_s - X_0 = 1\} \cap \{X_t - X_s = 0\})}{P(X_t = 1)} = \frac{P(X_s - X_0 = 1)\,P(X_t - X_s = 0)}{P(X_t = 1)} = \frac{e^{-\lambda s}\lambda s\, e^{-\lambda(t-s)}}{e^{-\lambda t}\lambda t} = \frac{s}{t}.$$

It follows that

$$P(T_1 \leq s \mid X_t = 1) = \begin{cases} 0 & \text{if } s < 0\\ \frac{s}{t} & \text{if } 0 \leq s < t\\ 1 & \text{if } s \geq t, \end{cases}$$

which is the c.d.f. of a $\mathrm{Uniform}(0, t)$ distribution.


3. Show that if $X_i \sim \mathrm{Gamma}(\alpha_i, \beta)$ for $i = 1, \ldots, n$ and $X_1, \ldots, X_n$ are independent, then $X = \sum_{i=1}^n X_i \sim \mathrm{Gamma}(\sum_{i=1}^n \alpha_i, \beta)$. Hint: the m.g.f. of $Y \sim \mathrm{Gamma}(\alpha, \beta)$ is

$$m_Y(t) = (1 - \beta t)^{-\alpha},$$

which is well-defined and finite for $t < 1/\beta$.

Solution: For $t < 1/\beta$, we have

$$m_X(t) = \prod_{i=1}^n m_{X_i}(t) = \prod_{i=1}^n (1-\beta t)^{-\alpha_i} = (1-\beta t)^{-\sum_{i=1}^n \alpha_i}.$$

We easily see that the m.g.f. of $X$ thus corresponds to the m.g.f. of a $\mathrm{Gamma}(\sum_{i=1}^n \alpha_i, \beta)$ distribution.

4. We showed in class that if $X \sim \mathcal{N}(0, 1)$, then $X^2 \sim \chi^2(1)$. We mentioned that if $X_1, \ldots, X_n \overset{iid}{\sim} \mathcal{N}(0, 1)$, then $Y = \sum_{i=1}^n X_i^2 \sim \chi^2(n)$. Use the result that you obtained from the previous exercise to prove that $Y \sim \chi^2(n)$.

Solution: Since $X_i^2 \sim \chi^2(1)$, we have

$$m_{X_i^2}(t) = (1-2t)^{-\frac{1}{2}}$$

for any $i \in \{1, \ldots, n\}$. By applying the result from the previous exercise we obtain

$$m_Y(t) = (1-2t)^{-\sum_{i=1}^n \frac{1}{2}} = (1-2t)^{-\frac{n}{2}}$$

for $t < 1/2$, which corresponds to the m.g.f. of a $\mathrm{Gamma}(n/2, 2) \equiv \chi^2(n)$ distribution.

5. Let $X \sim \mathrm{Binomial}(n_1, p)$ and $Y \sim \mathrm{Binomial}(n_2, p)$ with $X$ and $Y$ independent random variables. What is the distribution of $Z = X + Y$? (Hint: you have proven the Bernoulli version of this problem in a previous homework.)

Solution: We have

$$m_Z(t) = m_X(t)\,m_Y(t) = [pe^t + 1 - p]^{n_1}[pe^t + 1 - p]^{n_2} = [pe^t + 1 - p]^{n_1+n_2},$$

which clearly corresponds to the m.g.f. of a $\mathrm{Binomial}(n_1 + n_2, p)$ distribution.


[Figure 2 in the original is an image excerpted from the lecture notes, "A First Look at Long-Term Behavior": define $P^{(\infty)}_{ij} = \lim_{n\to\infty} P^{(n)}_{ij}$, if the limit exists; if the limit exists for all states $i$ and $j$, collect these into the matrix $P^{(\infty)}$. Exercise 4 [Exists?] asks for a transition matrix $P$ such that $P^{(\infty)}$ does not exist. Exercise 5 [The Rat Maze] shows the maze: rooms Food (F), 2, 4 in the top row and 3, 5, Shock (S) in the bottom row. If the rat is in one of the other four rooms, it chooses one of the doors at random with equal probability; the rat stays put once it reaches either the food (F) or the shock (S). Write out the transition matrix $P$ and the limiting transition matrix $P^{(\infty)}$ for the rat maze example. If the rat starts in room 2, what is the probability it reaches the food before the shock?]

Figure 2: The maze.


Figure 3: The solution for Problem 2 Part b


The transition probability matrix on the enlarged state space $(F,0), (2,0), (3,0), (4,0), (5,0), (S,1), (F,1), (2,1), (3,1), (4,1), (5,1), (S,2)$, listed here by its nonzero entries (column placement inferred from the maze adjacencies):

(F,0) -> (2,0), (3,0)          each w.p. 1/2
(2,0) -> (F,0), (4,0)          each w.p. 1/2
(3,0) -> (F,0), (4,0), (5,0)   each w.p. 1/3
(4,0) -> (2,0), (3,0), (S,1)   each w.p. 1/3
(5,0) -> (3,0), (S,1)          each w.p. 1/2
(S,1) -> (4,1), (5,1)          each w.p. 1/2
(F,1) -> (2,1), (3,1)          each w.p. 1/2
(2,1) -> (F,1), (4,1)          each w.p. 1/2
(3,1) -> (F,1), (4,1), (5,1)   each w.p. 1/3
(4,1) -> (2,1), (3,1), (S,2)   each w.p. 1/3
(5,1) -> (3,1), (S,2)          each w.p. 1/2
(S,2) -> (S,2)                 w.p. 1

Figure 4: Maze Solution.
