
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
6.436J/15.085J Fall 2008
Lecture 6: 9/24/2008

DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS

Contents

1. A few useful discrete random variables

2. Joint, marginal, and conditional PMFs

3. Independence of random variables

4. Expected values

1 A FEW USEFUL RANDOM VARIABLES

Recall that a random variable X : Ω → R is called discrete if its range (i.e., the set of values that it can take) is a countable set. The PMF of X is a function pX : R → [0, 1], defined by pX(x) = P(X = x), and completely determines the probability law of X.

The following are some important PMFs.

(a) Discrete uniform with parameters a and b, where a and b are integers with a < b. Here,

pX(k) = 1/(b − a + 1), k = a, a + 1, . . . , b,

and pX(k) = 0 otherwise.¹

(b) Bernoulli with parameter p, where 0 ≤ p ≤ 1. Here, pX(1) = p, pX(0) = 1 − p.

(c) Binomial with parameters n and p, where n ∈ N and p ∈ [0, 1]. Here,

pX(k) = (n choose k) p^k (1 − p)^(n−k), k = 0, 1, . . . , n.

¹ In the remaining examples, the qualification “pX(k) = 0 otherwise” will be omitted for brevity.


A binomial random variable with parameters n and p represents the number of heads observed in n independent tosses of a coin if the probability of heads at each toss is p.

(d) Geometric with parameter p, where 0 < p ≤ 1. Here,

pX(k) = (1 − p)^(k−1) p, k = 1, 2, . . . .

A geometric random variable with parameter p represents the number of independent tosses of a coin until heads are observed for the first time, if the probability of heads at each toss is p.

(e) Poisson with parameter λ, where λ > 0. Here,

pX(k) = e^(−λ) λ^k / k!, k = 0, 1, . . . .

As will be seen shortly, a Poisson random variable can be thought of as a limiting case of a binomial random variable. Note that this is a legitimate PMF (i.e., it sums to one), because of the series expansion of the exponential function, e^λ = Σ_{k=0}^∞ λ^k/k!.

(f) Power law with parameter α, where α > 0. Here,

pX(k) = 1/k^α − 1/(k + 1)^α, k = 1, 2, . . . .

An equivalent but more intuitive way of specifying this PMF is in terms of the formula

P(X ≥ k) = 1/k^α, k = 1, 2, . . . .

Note that when α is small, the “tail” P(X ≥ k) of the distribution decays slowly (slower than an exponential) as k increases, and in some sense such a distribution has “heavy” tails.

Notation: Let us use the abbreviations dU(a, b), Ber(p), Bin(n, p), Geo(p), Pois(λ), and Pow(α) to refer to the PMFs defined above. We will use notation such as X =^d dU(a, b) or X ∼ dU(a, b) as a shorthand for the statement that X is a discrete random variable whose PMF is uniform on {a, a + 1, . . . , b}, and similarly for the other PMFs we defined. We will also use the notation X =^d Y to indicate that two random variables have the same PMF.
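For readers who like to experiment, here is a minimal Python sketch of the PMFs above; the function names and signatures are mine and not part of the notes.

```python
from math import comb, exp, factorial

def du_pmf(k, a, b):       # discrete uniform on {a, a+1, ..., b}
    return 1.0 / (b - a + 1) if a <= k <= b else 0.0

def ber_pmf(k, p):         # Bernoulli(p): P(X = 1) = p
    return {1: p, 0: 1.0 - p}.get(k, 0.0)

def bin_pmf(k, n, p):      # Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def geo_pmf(k, p):         # Geometric(p), support {1, 2, ...}
    return (1 - p)**(k - 1) * p if k >= 1 else 0.0

def pois_pmf(k, lam):      # Poisson(lambda)
    return exp(-lam) * lam**k / factorial(k) if k >= 0 else 0.0

def pow_pmf(k, alpha):     # Power law: P(X >= k) = 1 / k**alpha
    return 1.0 / k**alpha - 1.0 / (k + 1)**alpha if k >= 1 else 0.0
```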


1.1 Poisson distribution as a limit of the binomial

To get a feel for the Poisson random variable, think of a binomial random variable with very small p and very large n. For example, consider the number of typos in a book with a total of n words, when the probability p that any one word is misspelled is very small (associate a word with a coin toss that results in a head when the word is misspelled), or the number of cars involved in accidents in a city on a given day (associate a car with a coin toss that results in a head when the car has an accident). Such random variables can be well modeled with a Poisson PMF.

More precisely, the Poisson PMF with parameter λ is a good approximation for a binomial PMF with parameters n and p, i.e.,

e^(−λ) λ^k / k! ≈ (n! / (k! (n − k)!)) p^k (1 − p)^(n−k), k = 0, 1, . . . , n,

provided λ = np, n is large, and p is small. In this case, using the Poisson PMF may result in simpler models and calculations. For example, let n = 100 and p = 0.01. Then the probability of k = 5 successes in n = 100 trials is calculated using the binomial PMF as

(100! / (95! 5!)) · 0.01^5 (1 − 0.01)^95 = 0.00290.

Using the Poisson PMF with λ = np = 100 · 0.01 = 1, this probability is approximated by

e^(−1) · 1^5 / 5! = 0.00306.
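The two numbers above are easy to reproduce; a quick sketch (the variable names are mine):

```python
from math import comb, exp, factorial

n, p, k = 100, 0.01, 5
binom_prob = comb(n, k) * p**k * (1 - p)**(n - k)       # exact binomial probability
pois_prob = exp(-n * p) * (n * p)**k / factorial(k)     # Poisson approximation, lambda = n*p
print(f"{binom_prob:.5f}  {pois_prob:.5f}")             # 0.00290  0.00307
```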

Proposition 1. (Binomial convergence to Poisson) Let us fix some λ > 0, and suppose that Xn =^d Bin(n, λ/n), for every n. Let X =^d Pois(λ). Then, as n → ∞, the PMF of Xn converges to the PMF of X, in the sense that lim_{n→∞} P(Xn = k) = P(X = k), for any k ≥ 0.

Proof: We have

P(Xn = k) = (n(n − 1) · · · (n − k + 1) / n^k) · (λ^k / k!) · (1 − λ/n)^(n−k).

Fix k and let n → ∞. We have, for j = 1, . . . , k,

(n − k + j)/n → 1,   (1 − λ/n)^(−k) → 1,   (1 − λ/n)^n → e^(−λ).


Thus, for any fixed k, we obtain

lim_{n→∞} P(Xn = k) = e^(−λ) λ^k / k! = P(X = k),

as claimed.
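A quick numerical illustration of Proposition 1 (a sketch; the parameter values are arbitrary):

```python
from math import comb, exp, factorial

lam, k = 2.0, 3
for n in (10, 100, 1_000, 10_000):
    p = lam / n
    print(n, comb(n, k) * p**k * (1 - p)**(n - k))    # P(Xn = k) for Xn ~ Bin(n, lam/n)
print("limit:", exp(-lam) * lam**k / factorial(k))     # P(X = k) for X ~ Pois(lam)
```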

2 JOINT, MARGINAL, AND CONDITIONAL PMFS

In most applications, one typically deals with several random variables at once. In this section, we introduce a few concepts that are useful in such a context.

2.1 Marginal PMFs

Consider two discrete random variables X and Y associated with the same experiment. The probability law of each one of them is described by the corresponding PMF, pX or pY, called a marginal PMF. However, the marginal PMFs do not provide any information on possible relations between these two random variables. For example, suppose that the PMF of X is symmetric around the origin. If we have either Y = X or Y = −X, the PMF of Y remains the same, and fails to capture the specifics of the dependence between X and Y.

2.2 Joint PMFs

The statistical properties of two random variables X and Y are captured by a function pX,Y : R² → [0, 1], defined by

pX,Y (x, y) = P(X = x, Y = y),

called the joint PMF of X and Y. Here and in the sequel, we will use the abbreviated notation P(X = x, Y = y) instead of the more precise notations P({X = x} ∩ {Y = y}) or P(X = x and Y = y). More generally, the joint PMF of finitely many discrete random variables X1, . . . , Xn on the same probability space is defined by

pX1,...,Xn (x1, . . . , xn) = P(X1 = x1, . . . , Xn = xn).

Sometimes, we define a vector random variable X , by letting X = (X1, . . . , Xn), in which case the joint PMF will be denoted simply as pX (x), where now the argument x is an n-dimensional vector.


The joint PMF of X and Y determines the probability of any event that can be specified in terms of the random variables X and Y. For example, if A is the set of all pairs (x, y) that have a certain property, then

P((X, Y) ∈ A) = Σ_{(x,y)∈A} pX,Y(x, y).

In fact, we can calculate the marginal PMFs of X and Y by using the formulas

pX(x) = Σ_y pX,Y(x, y),   pY(y) = Σ_x pX,Y(x, y).

The formula for pX (x) can be verified using the calculation

pX(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y pX,Y(x, y),

where the second equality follows by noting that the event {X = x} is the union of the countably many disjoint events {X = x, Y = y}, as y ranges over all the different values of Y . The formula for pY (y) is verified similarly.
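As an illustration, a joint PMF with finite support can be stored as a dictionary keyed by pairs (x, y), and the marginal PMFs obtained by summing out the other variable; the numbers below are a made-up example, not from the notes.

```python
from collections import defaultdict

p_xy = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}   # hypothetical joint PMF

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), prob in p_xy.items():
    p_x[x] += prob   # p_X(x) = sum over y of p_{X,Y}(x, y)
    p_y[y] += prob   # p_Y(y) = sum over x of p_{X,Y}(x, y)

print(dict(p_x), dict(p_y))   # roughly {0: 0.3, 1: 0.7} and {0: 0.4, 1: 0.6}
```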

2.3 Conditional PMFs

Let X and Y be two discrete random variables, defined on the same probability space, with joint PMF pX,Y . The conditional PMF of X given Y is a function pX|Y , defined by

pX|Y (x | y) = P(X = x | Y = y), if P(Y = y) > 0;

if P(Y = y) = 0, the value of pX|Y(x | y) is left undefined. Using the definition of conditional probabilities, we obtain

pX|Y(x | y) = pX,Y(x, y) / pY(y),

whenever pY(y) > 0. More generally, if we have random variables X1, . . . , Xn and Y1, . . . , Ym,

defined on the same probability space, we define a conditional PMF by letting

pX1,...,Xn|Y1,...,Ym(x1, . . . , xn | y1, . . . , ym)
= P(X1 = x1, . . . , Xn = xn | Y1 = y1, . . . , Ym = ym)
= pX1,...,Xn,Y1,...,Ym(x1, . . . , xn, y1, . . . , ym) / pY1,...,Ym(y1, . . . , ym),


whenever pY1,...,Ym (y1, . . . , ym) > 0. Again, if we define X = (X1, . . . , Xn) and Y = (Y1, . . . , Ym), the shorthand notation pX|Y (x | y) can be used.

Note that if pY(y) > 0, then Σ_x pX|Y(x | y) = 1, where the sum is over all x in the range of the random variable X. Thus, the conditional PMF is essentially the same as an ordinary PMF, but with redefined probabilities that take into account the conditioning event Y = y. Visually, if we fix y, the conditional PMF pX|Y(x | y), viewed as a function of x, is a “slice” of the joint PMF pX,Y, renormalized so that its entries sum to one.
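Continuing the dictionary representation from the earlier sketch, the conditional PMF is just the y-slice of the joint PMF divided by pY(y); the helper name below is mine.

```python
def conditional_pmf(p_xy, y):
    """Return p_{X|Y}(. | y) as a dict, given a joint PMF stored as {(x, y): prob}."""
    p_y = sum(prob for (_, yy), prob in p_xy.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning on an event of probability zero")
    return {x: prob / p_y for (x, yy), prob in p_xy.items() if yy == y}

# Example with the joint PMF from the earlier sketch:
# conditional_pmf({(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}, 1) -> {0: 1/3, 1: 2/3}
```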

3 INDEPENDENCE OF RANDOM VARIABLES

We now define the important notion of independence of random variables. We start with a general definition that applies to all types of random variables, including discrete and continuous ones. We then specialize to the case of discrete random variables.

3.1 Independence of general random variables

Intuitively, two random variables are independent if any partial information on the realized value of one random variable does not change the distribution of the other. This notion is formalized in the following definition.

Definition 1. (Independence of random variables)

(a) Let X1, . . . , Xn be random variables defined on the same probability space. We say that these random variables are independent if

P(X1 ∈ B1, . . . , Xn ∈ Bn) = P(X1 ∈ B1) · · · P(Xn ∈ Bn),

for any Borel subsets B1, . . . , Bn of the real line.

(b) Let {Xs | s ∈ S} be a collection of random variables indexed by the elements of a (possibly infinite) index set S. We say that these random variables are independent if for every finite subset {s1, . . . , sn} of S, the random variables Xs1, . . . , Xsn are independent.

Recalling the definition of independent events, we see that independence of the random variables Xs, s ∈ S, is the same as requiring independence of the events {Xs ∈ Bs}, s ∈ S, for every choice of Borel subsets Bs, s ∈ S, of the real line.


Verifying the independence of random variables using the above definition (which involves arbitrary Borel sets) is rather difficult. It turns out that one only needs to examine Borel sets of the form (−∞, x].

Proposition 2. Suppose that for every n, every x1, . . . , xn, and every finite subset {s1, . . . , sn} of S, the events {Xsi ≤ xi}, i = 1, . . . , n, are independent. Then, the random variables Xs, s ∈ S, are independent.

We omit the proof of the above proposition. However, we remark that, ultimately, the proof relies on the fact that the sets of the form (−∞, x] generate the Borel σ-field.

Let us define the joint CDF of the random variables X1, . . . , Xn by

FX1,...,Xn (x1, . . . , xn) = P(X1 ≤ x1, . . . , Xn ≤ xn).

In view of Proposition 2, independence of X1, . . . , Xn is equivalent to the condition

FX1,...,Xn(x1, . . . , xn) = FX1(x1) · · · FXn(xn), ∀ x1, . . . , xn.

Exercise 1. Consider a collection {As | s ∈ S} of events, where S is a (possibly infinite) index set. Prove that the events As are independent if and only if the corresponding indicator functions IAs, s ∈ S, are independent random variables.

3.2 Independence of discrete random variables

For a finite number of discrete random variables, independence is equivalent to having a joint PMF which factors into a product of marginal PMFs.

Theorem 1. Let X and Y be discrete random variables defined on the same probability space. The following are equivalent.

(a) The random variables X and Y are independent.

(b) For any x, y ∈ R, the events {X = x} and {Y = y} are independent.

(c) For any x, y ∈ R, we have pX,Y (x, y) = pX (x)pY (y).

(d) For any x, y ∈ R such that pY (y) > 0, we have pX|Y (x | y) = pX (x).

Proof: The fact that (a) implies (b) is immediate from the definition of independence. The equivalence of (b), (c), and (d) is also an immediate consequence of our definitions. We complete the proof by verifying that (c) implies (a).


Suppose that condition (c) holds, and let A and B be two Borel subsets of the real line. We then have

P(X ∈ A, Y ∈ B) = Σ_{x∈A, y∈B} P(X = x, Y = y)
= Σ_{x∈A, y∈B} pX,Y(x, y)
= Σ_{x∈A, y∈B} pX(x) pY(y)
= (Σ_{x∈A} pX(x)) (Σ_{y∈B} pY(y))
= P(X ∈ A) P(Y ∈ B).

Since this is true for any Borel sets A and B, we conclude that X and Y are independent.
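For discrete random variables with finite support, condition (c) of Theorem 1 can be checked mechanically: compute the marginals and test the factorization at every point. A sketch (the helper name and tolerance are mine):

```python
from itertools import product

def is_independent(p_xy, tol=1e-12):
    xs = {x for x, _ in p_xy}
    ys = {y for _, y in p_xy}
    p_x = {x: sum(p_xy.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(p_xy.get((x, y), 0.0) for x in xs) for y in ys}
    # Condition (c): p_{X,Y}(x, y) = p_X(x) p_Y(y) for every pair (x, y).
    return all(abs(p_xy.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
               for x, y in product(xs, ys))

print(is_independent({(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}))      # False
print(is_independent({(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}))  # True
```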

We note that Theorem 1 generalizes to the case of multiple, but finitely many, random variables. The generalization of conditions (a)-(c) should be obvious. As for condition (d), it can be generalized to a few different forms, one of which is the following: given any subset S0 of the random variables under consideration, the conditional joint PMF of the random variables Xs, s ∈ S0, given the values of the remaining random variables, is the same as the unconditional joint PMF of the random variables Xs, s ∈ S0, as long as we are conditioning on an event with positive probability.

We finally note that functions g(X) and h(Y) of two independent random variables X and Y must themselves be independent. This should be expected on intuitive grounds: if X is independent of Y, then the information provided by the value of g(X) should not affect the distribution of Y, and consequently should not affect the distribution of h(Y). Observe that when X and Y are discrete, then g(X) and h(Y) are random variables (the required measurability conditions are satisfied) even if the functions g and h are not measurable (why?).

Theorem 2. Let X and Y be independent discrete random variables. Let g and h be some functions from R into itself. Then, the random variables g(X) and h(Y ) are independent.

The proof is left as an exercise.


3.3 Examples

Example. Let X1, . . . , Xn be independent Bernoulli random variables with the same parameter p. Then, the random variable X defined by X = X1 + · · · + Xn is binomial with parameters n and p. To see this, consider n independent tosses of a coin in which every toss has probability p of resulting in a one, and let Xi be the result of the ith coin toss. Then, X is the number of ones observed in n independent tosses, and is therefore a binomial random variable.

Example. Let X and Y be independent binomial random variables with parameters (n, p) and (m, p), respectively. Then, the random variable Z, defined by Z = X + Y, is binomial with parameters (n + m, p). To see this, consider n + m independent tosses of a coin in which every toss has probability p of resulting in a one. Let X be the number of ones in the first n tosses, and let Y be the number of ones in the last m tosses. Then, Z is the number of ones in n + m independent tosses, which is binomial with parameters (n + m, p).

Example. Consider n independent tosses of a coin in which every toss has probability p of resulting in a one. Let X be the number of ones obtained, and let Y = n − X, which is the number of zeros. The random variables X and Y are not independent. For example, P(X = 0) = (1 − p)^n and P(Y = 0) = p^n, but P(X = 0, Y = 0) = 0 ≠ P(X = 0)P(Y = 0). Intuitively, knowing that there was a small number of ones gives us information that the number of zeros must be large.

However, in sharp contrast to the intuition from the preceding example, we obtain independence when the number of coin tosses is itself random, with a Poisson distribution. More precisely, let N be a Poisson random variable with parameter λ. We assume that the conditional PMF pX|N (· | n) is binomial with parameters n and p (representing the number of ones observed in n coin tosses), and define Y = N − X , which represents the number of zeros obtained. We have the following surprising result. An intuitive justification will have to wait until we consider the Poisson process, later in this course. The proof is left as an exercise.

Theorem 3. (Splitting of a Poisson random variable) The random variables X and Y are independent. Moreover, X =^d Pois(λp) and Y =^d Pois(λ(1 − p)).
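A Monte Carlo sketch consistent with Theorem 3: simulate N ∼ Pois(λ), toss N coins with heads probability p, and compare the empirical means of X and Y with λp and λ(1 − p). The sampler, names, and parameter values are mine (a fuller check would tabulate the empirical joint PMF of X and Y).

```python
import random
from math import exp

def poisson_sample(lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold, k, prod = exp(-lam), 0, 1.0
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k - 1

lam, p, trials = 3.0, 0.4, 200_000
xs, ys = [], []
for _ in range(trials):
    n = poisson_sample(lam)                          # N ~ Pois(lam)
    x = sum(random.random() < p for _ in range(n))   # X | N = n ~ Bin(n, p)
    xs.append(x)
    ys.append(n - x)                                 # Y = N - X

print(sum(xs) / trials, "should be close to", lam * p)        # E[X] = lam * p
print(sum(ys) / trials, "should be close to", lam * (1 - p))  # E[Y] = lam * (1 - p)
```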


4 EXPECTED VALUES

4.1 Preliminaries: infinite sums

Consider a sequence {an} of nonnegative real numbers and the infinite sum Σ_{i=1}^∞ ai, defined as the limit, lim_{n→∞} Σ_{i=1}^n ai, of the partial sums. The infinite sum can be finite or infinite; in either case, it is well defined, as long as we allow the limit to be an extended real number. Furthermore, it can be verified that the value of the infinite sum is the same even if we reorder the elements of the sequence {an} and carry out the summation according to this different order. Because the order of the summation does not matter, we can use the notation Σ_{n∈N} an for the infinite sum. More generally, if C is a countable set and g : C → [0, ∞) is a nonnegative function, we can use the notation Σ_{x∈C} g(x), which is unambiguous even without specifying a particular order in which the values g(x) are to be summed.

When we consider a sequence of nonpositive real numbers, the discussion remains the same, and infinite sums can be unambiguously defined. However, when we consider sequences that involve both positive and negative numbers, the situation is more complicated. In particular, the order in which the elements of the sequence are added can make a difference.

Example. Let an = (−1)^n/n. It can be verified that the limit lim_{n→∞} Σ_{i=1}^n ai exists and is finite, but that the elements of the sequence {an} can be reordered to form a new sequence {bn} for which the limit of Σ_{i=1}^n bi does not exist.
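A small numerical illustration of this example (the truncation point is arbitrary):

```python
from math import log

# Partial sums of sum_{n>=1} (-1)^n / n approach -log 2, but the series is not
# absolutely convergent: the positive terms alone already diverge, so a suitable
# reordering can change, or destroy, the limit.
print(sum((-1)**n / n for n in range(1, 200_001)))    # close to -log(2)
print(-log(2))                                        # -0.693147...
print(sum(1.0 / n for n in range(2, 200_001, 2)))     # positive terms only: grows without bound
```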

In order to deal with the general case, we proceed as follows. Let S be a countable set, and consider a collection of real numbers as, s ∈ S. Let S+ (respectively, S−) be the set of indices s for which as ≥ 0 (respectively, as < 0). Let Σ+ = Σ_{s∈S+} as and Σ− = Σ_{s∈S−} |as|. We distinguish four cases.

(a) If both Σ+ and Σ− are finite (or, equivalently, if Σ_{s∈S} |as| < ∞), we say that the sum Σ_{s∈S} as is absolutely convergent, and it is equal to Σ+ − Σ−. In this case, for every possible arrangement of the elements of S in a sequence {sn}, we have

lim_{n→∞} Σ_{i=1}^n a_{si} = Σ+ − Σ−.

(b) If Σ+ = ∞ and Σ− < ∞, the sum Σ_{s∈S} as is not absolutely convergent; we define it to be equal to ∞. In this case, for every possible arrangement of the elements of S in a sequence {sn}, we have

lim_{n→∞} Σ_{i=1}^n a_{si} = ∞.


(c) If Σ+ < ∞ and Σ− = ∞, the sum Σ_{s∈S} as is not absolutely convergent; we define it to be equal to −∞. In this case, for every possible arrangement of the elements of S in a sequence {sn}, we have

lim_{n→∞} Σ_{i=1}^n a_{si} = −∞.

(d) If Σ+ = ∞ and Σ− = ∞, the sum Σ_{s∈S} as is left undefined. In fact, in this case, different arrangements of the elements of S in a sequence {sn} will result in different, or even nonexistent, values of the limit

lim_{n→∞} Σ_{i=1}^n a_{si}.

To summarize, we consider a countable sum to be well defined in cases (a)-(c), and call it absolutely convergent only in case (a).

We close by recording a related useful fact. If we have a doubly indexed family of real numbers aij, i, j ∈ N, and if either (i) the numbers are nonnegative, or (ii) the sum Σ_{(i,j)} aij is absolutely convergent, then

Σ_{i=1}^∞ Σ_{j=1}^∞ aij = Σ_{j=1}^∞ Σ_{i=1}^∞ aij = Σ_{(i,j)∈N²} aij.   (1)

More important, we stress that the first equality need not hold in the absence of conditions (i) or (ii) above.

4.2 Definition of the expectation

The PMF of a random variable X provides us with several numbers, the probabilities of all the possible values of X. It is often desirable to summarize this information in a single representative number. This is accomplished by the expectation of X, which is a weighted (in proportion to probabilities) average of the possible values of X.

As motivation, suppose you spin a wheel of fortune many times. At each spin, one of the numbers m1, m2, . . . , mn comes up with corresponding probabilities p1, p2, . . . , pn, and this is your monetary reward from that spin. What is the amount of money that you “expect” to get “per spin”? The terms “expect” and “per spin” are a little ambiguous, but here is a reasonable interpretation.


Suppose that you spin the wheel k times, and that ki is the number of times that the outcome is mi. Then, the total amount received is m1k1 + m2k2 + · · · + mnkn. The amount received per spin is

M = (m1k1 + m2k2 + · · · + mnkn) / k.

If the number of spins k is very large, and if we are willing to interpret probabilities as relative frequencies, it is reasonable to anticipate that mi comes up a fraction of times that is roughly equal to pi:

ki/k ≈ pi, i = 1, . . . , n.

Thus, the amount of money per spin that you “expect” to receive is

M = (m1k1 + m2k2 + · · · + mnkn)/k ≈ m1p1 + m2p2 + · · · + mnpn.

Motivated by this example, we introduce the following definition.

Definition 2. (Expectation) We define the expected value (also called the expectation or the mean) of a discrete random variable X , with PMF pX , as

E[X] = Σ_x x pX(x),

whenever the sum is well defined, and where the sum is taken over the countable set of values in the range of X.
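Definition 2 translates directly into code when the PMF has finite support (the helper name and the example PMF below are mine):

```python
def expectation(pmf):
    """E[X] for a PMF given as a dict {value: probability}."""
    return sum(x * prob for x, prob in pmf.items())

print(expectation({1: 0.2, 2: 0.5, 10: 0.3}))   # 0.2 + 1.0 + 3.0 = 4.2
```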

4.3 Properties of the expectation

We start by pointing out an alternative formula for the expectation, and leave its proof as an exercise. In particular, if X can only take nonnegative integer values, then

E[X] = Σ_{n≥0} P(X > n).   (2)

Example. Using this formula, it is easy to give an example of a random variable for which the expected value is infinite. Consider X =^d Pow(α), where α ≤ 1. Then, it can be verified, using the fact Σ_{n=1}^∞ 1/n = ∞, that E[X] = Σ_{n≥0} P(X > n) = Σ_{n≥1} 1/n^α = ∞. On the other hand, if α > 1, then E[X] < ∞.
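A quick sanity check of formula (2) on a geometric random variable, whose mean is 1/p (the truncation point N is mine; the tails beyond it are negligible here):

```python
p, N = 0.3, 200                        # truncate the infinite sums at N
pmf = {k: (1 - p)**(k - 1) * p for k in range(1, N + 1)}
mean_direct = sum(k * q for k, q in pmf.items())                              # Definition 2
mean_tails = sum(sum(q for k, q in pmf.items() if k > n) for n in range(N))   # formula (2)
print(mean_direct, mean_tails, 1 / p)  # all approximately 3.3333
```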


Here is another useful fact, whose proof is again left as an exercise.

Proposition 3. Given a discrete random variable X and a function g : R → R, we have

E[g(X)] = Σ_{x : pX(x)>0} g(x) pX(x).   (3)

More generally, this formula remains valid given a vector X = (X1, . . . , Xn) of random variables with joint PMF pX = pX1,...,Xn, and a function g : R^n → R.

For example, suppose that X is a discrete random variable and consider the function g : R → R defined by g(x) = x^2. Let Y = g(X). In order to calculate the expectation E[Y] according to Definition 2, we would need to first find the PMF of Y, and then use the formula E[Y] = Σ_y y pY(y). However, according to Proposition 3, we can work directly with the PMF of X, and write E[Y] = E[X^2] = Σ_x x^2 pX(x).

The quantity E[X^2] is called the second moment of X. More generally, if r ∈ N, the quantity E[X^r] is called the rth moment of X. Furthermore, E[(X − E[X])^r] is called the rth central moment of X. The second central moment, E[(X − E[X])^2], is called the variance of X, and is denoted by var(X). The square root of the variance is called the standard deviation of X, and is often denoted by σX, or just σ. Note that for every even r, the rth moment and the rth central moment are always nonnegative; in particular, the standard deviation is always well defined.
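A sketch of how Proposition 3 is used in practice: compute E[g(X)] by summing g(x) pX(x) over the support of X, with no need to derive the PMF of g(X) first (the helper name and example PMF are mine).

```python
def expect(pmf, g=lambda x: x):
    """E[g(X)] computed directly from the PMF of X, as in Proposition 3."""
    return sum(g(x) * prob for x, prob in pmf.items())

pmf = {-1: 0.25, 0: 0.5, 2: 0.25}
mean = expect(pmf)                             # E[X]    = 0.25
second_moment = expect(pmf, lambda x: x * x)   # E[X^2]  = 1.25
variance = second_moment - mean**2             # var(X)  = 1.1875
```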

We continue with a few more important properties of expectations. In the sequel, notations such as X ≥ 0 or X = c mean that X(ω) ≥ 0 or X(ω) = c, respectively, for all ω ∈ Ω. Similarly, a statement such as “X ≥ 0, almost surely” or “X ≥ 0, a.s.,” means that P(X ≥ 0) = 1.


Proposition 4. Let X and Y be discrete random variables defined on the same probability space.

(a) If X ≥ 0, a.s., then E[X] ≥ 0.

(b) If X = c, a.s., for some constant c ∈ R, then E[X] = c.

(c) For any a, b ∈ R, we have E[aX + bY ] = aE[X] + bE[Y ] (as long as the sum aE[X] + bE[Y ] is well-defined).

(d) If E[X] is finite, then var(X) = E[X^2] − (E[X])^2.

(e) For every a ∈ R, we have var(aX) = a^2 var(X).

(f) If X and Y are independent, then E[XY ] = E[X]E[Y ], and var(X + Y ) = var(X)+var(Y ), provided all of the expectations involved are well defined.

(g) More generally, if X1, . . . , Xn are independent, then

E[Π_{i=1}^n Xi] = Π_{i=1}^n E[Xi],

and

var(Σ_{i=1}^n Xi) = Σ_{i=1}^n var(Xi),

as long as all expectations involved are well defined.

Remark: We emphasize that property (c) does not require independence.

Proof: We only give the proof for the case where all expectations involved are well defined and finite, and leave it to the reader to verify that the results extend to the case where all expectations involved are well defined but possibly infinite.

Parts (a) and (b) are immediate consequences of the definitions. For part (c), using the second part of Proposition 3 and then Eq. (1), we obtain

E[aX + bY] = Σ_{x,y} (ax + by) pX,Y(x, y)
= Σ_x ax Σ_y pX,Y(x, y) + Σ_y by Σ_x pX,Y(x, y)
= a Σ_x x pX(x) + b Σ_y y pY(y)
= aE[X] + bE[Y].


For part (d), we have

var(X) = E[X^2 − 2X E[X] + (E[X])^2]
= E[X^2] − 2E[X E[X]] + (E[X])^2
= E[X^2] − 2(E[X])^2 + (E[X])^2
= E[X^2] − (E[X])^2,

where the second equality made use of property (c). Part (e) follows easily from (d) and (c).

For part (f), we apply Proposition 3 and then use independence to obtain

E[XY] = Σ_{x,y} x y pX,Y(x, y)
= Σ_{x,y} x y pX(x) pY(y)
= (Σ_x x pX(x)) (Σ_y y pY(y))
= E[X] E[Y].

Furthermore, using property (d), we have

var(X + Y) = E[(X + Y)^2] − (E[X + Y])^2
= E[X^2] + E[Y^2] + 2E[XY] − (E[X])^2 − (E[Y])^2 − 2E[X]E[Y].

Using the equality E[XY] = E[X]E[Y], the above expression becomes var(X) + var(Y). The proof of part (g) is similar and is omitted.

Remark: The equalities in part (f) need not hold in the absence of independence. For example, consider a random variable X that takes each of the values 1 and −1 with probability 1/2. Then, E[X] = 0, but E[X^2] = 1. If we let Y = X, we see that E[XY] = E[X^2] = 1 ≠ 0 = E[X]E[Y]. Furthermore, var(X + Y) = var(2X) = 4var(X), while var(X) + var(Y) = 2.
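The counterexample in the remark is easy to confirm numerically (a sketch; the sample size is arbitrary):

```python
import random

# X is uniform on {-1, +1} and Y = X. Then E[XY] = E[X^2] = 1 while E[X]E[Y] = 0,
# and var(X + Y) = var(2X) = 4 var(X) = 4, while var(X) + var(Y) = 2.
samples = [random.choice([-1, 1]) for _ in range(100_000)]
mean_x = sum(samples) / len(samples)                                              # ~ 0
mean_xy = sum(x * x for x in samples) / len(samples)                              # exactly 1
var_sum = sum((2 * x) ** 2 for x in samples) / len(samples) - (2 * mean_x) ** 2   # ~ 4
print(mean_x, mean_xy, var_sum)
```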

Exercise 2. Show that var(X) = 0 if and only if there exists a constant c such that P(X = c) = 1.


MIT OpenCourseWare: http://ocw.mit.edu

6.436J / 15.085J Fundamentals of Probability, Fall 2008. For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.