
• EE 178 Lecture Notes 0

Course Introduction

Course Goals

Course Topics

Lecture Notes

EE 178: Course Introduction Page 0 1

EE 178

EE 178 provides an introduction to probabilistic system analysis:

Basic probability theory

Some statistics

Examples from EE systems

Fulfills the EE undergraduate probability and statistics requirement (alternatives include Stat 116 and Math 151)

Very important background for EEs

Signal processing

Communication

Communication networks

Reliability of systems

Noise in circuits


• Also important in many other fields, including

Computer science (Artificial intelligence)

Economics

Finance (stock market)

Physics (statistical physics, quantum physics)

Medicine (drug trials)

Knowledge of probability is almost as necessary as calculus and linear algebra


Probability Theory

Probability provides mathematical models for random phenomena and experiments, such as: gambling, stock market, packet transmission in networks, electron emission, noise, statistical mechanics, etc.

Its origin lies in observations associated with games of chance:

If an unbiased coin is tossed n times and heads come up n_H times, the relative frequency of heads, n_H/n, is likely to be very close to 1/2

If a card is drawn from a perfectly shuffled deck and then replaced, the deck is reshuffled, and the process is repeated n times, the relative frequency of spades is around 1/4

The purpose of probability theory is to describe and predict such relative frequencies (averages) in terms of probabilities of events

But how is probability defined?


Classical definition of probability: This was the prevailing definition for many centuries. Define the probability of an event A as

P(A) = N_A / N,

where N is the number of possible outcomes of the random experiment and N_A is the number of outcomes favorable to the event A

For example, for a 6-sided die there are 6 outcomes and 3 of them are even, thus P(even) = 3/6

Problems with this classical definition:

Here the assumption is that all outcomes are equally likely (probable). Thus, the concept of probability is used to define probability itself! It cannot be used as the basis for a mathematical theory

In many random experiments, the outcomes are not equally likely

The definition doesn't work when the number of possible outcomes is infinite


Relative frequency definition of probability: Here the probability of an event A is defined as

P(A) = lim_{n→∞} n_A / n,

where n_A is the number of times A occurs in n performances of the experiment

Problems with the frequency definition:

How do we assert that such a limit exists?

We often cannot perform the experiment multiple times (or even once, e.g., defining the probability of a nuclear plant failure)
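The limiting behavior of n_A/n is easy to observe numerically. A minimal sketch in Python (the sample sizes and the seed are arbitrary choices for illustration):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def rel_freq_heads(n):
    """Toss a fair coin n times and return the relative frequency n_H / n."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# The relative frequency drifts toward 1/2 as n grows
for n in (100, 10_000, 1_000_000):
    print(n, rel_freq_heads(n))
```

No finite simulation proves that the limit exists; that is exactly the gap the axiomatic approach below closes.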


Axiomatic Definition of Probability

The axiomatic definition of probability was introduced by Kolmogorov in 1933. It provides rules for assigning probabilities to events in a mathematically consistent way and for deducing probabilities of events from probabilities of other events

Elements of the axiomatic definition:

Sample space Ω: the set of all possible outcomes of the random experiment

Set of events, which are subsets of Ω

A probability law (measure or function) that assigns probabilities to events such that:

1. P(A) ≥ 0

2. P(Ω) = 1, and

3. If A and B are disjoint events, i.e., A ∩ B = ∅, then

P(A ∪ B) = P(A) + P(B)

These rules are consistent with the relative frequency interpretation


Unlike the relative frequency definition, where the limit is assumed to exist, the axiomatic approach itself is used to prove when and under what conditions such a limit exists (Laws of Large Numbers) and how fast it is approached (Central Limit Theorem)

The theory provides mathematically precise ways for dealing with experiments with an infinite number of possible outcomes, defining random variables, etc.

The theory does not deal with what the values of the probabilities are or how they are obtained. Any assignment of probabilities that satisfies the axioms is legitimate

How do we find the actual probabilities? They are obtained from:

Physics, e.g., Boltzmann, Bose-Einstein, Fermi-Dirac statistics

Engineering models, e.g., queuing models for communication networks, models for noisy communication channels

Social science theories, e.g., intelligence, behavior

Empirically. Collect data and fit a model (which may have no physical, engineering, or social meaning ...)


Statistics and Statistical Signal Processing

Statistics applies probability theory to real-world problems. It deals with:

Methods for data collection and construction of experiments described by probabilistic models, and

Methods for making inferences from data, e.g., estimating probabilities, deciding if a drug is effective from drug trials

Statistical signal processing:

Develops probabilistic models for EE signals, e.g., speech, audio, images, video, and systems, e.g., communication, storage, compression, error correction, pattern recognition

Uses tools from probability and statistics to design signal processing algorithms to extract, predict, estimate, detect, compress, ... such signals

The field has provided the foundation for the Information Age. Its impact on our lives has been huge: cell phones, DSL, digital cameras, DVD, MP3 players, digital TV, medical imaging, etc.


Course Goals

Provide an introduction to probability theory (precise but not overly mathematical)

Provide, through examples, some applications of probability to EE systems

Provide some exposure to statistics and statistical signal processing

Provide background for more advanced courses in statistical signal processing and communication (e.g., EE 278, 279)

Have a lot of fun solving cool probability problems

WARNING: To really understand probability you have to work homework problems yourself. Just reading solutions or having others give you solutions will not do


Course Outline

1. Sample Space and Probability (B&T: Chapter 1)

Set theory

Sample space: discrete, continuous

Events

Probability law

Conditional probability; chain rule

Law of total probability

Bayes rule

Independence

Counting


2. Random Variables (B&T: 2.1-2.4, 3.1-3.3, 3.6 pp. 179-186)

Basic concepts

Probability mass function (pmf)

Famous discrete random variables

Probability density function (pdf)

Famous continuous random variables

Mean and variance

Cumulative distribution function (cdf)

Functions of random variables

3. Multiple Random Variables (B&T: 2.5-2.7, 3.4-3.5, 3.6 pp. 186-190, 4.2)

Joint, marginal, and conditional pmf: Bayes rule for pmfs, independence

Joint, marginal, and conditional pdf: Bayes rule for pdfs, independence

Functions of multiple random variables


4. Expectation (B&T: 2.4, 3.1 pp. 144-148, 4.3, 4.5, 4.6)

Definition and properties

Correlation and covariance

Sum of random variables

Conditional expectation

Iterated expectation

Mean square error estimation

5. Transforms (B&T: 4.1)

6. Limit Theorems (B&T: 7.1, 7.2, 4.1, 7.4)

Sample mean

Markov and Chebyshev inequalities

Weak law of large numbers

The Central Limit Theorem

Confidence intervals


7. Random Processes (B&T: 5.1, 6.1-6.4)

Basic concepts

Bernoulli process

Markov chains


Lecture Notes

Based on the B&T text, Prof. Gray's EE 178 notes, and other sources

Help organize the material and reduce note taking in lectures

You will need to take some notes, e.g., clarifications, missing steps in derivations, solutions to additional examples

A slide title indicates a topic that often continues over several consecutive slides

Lecture notes + your notes + review sessions should be sufficient (B&T is only recommended)


• EE 178 Lecture Notes 1

Basic Probability

Set Theory

Elements of Probability

Conditional probability

Sequential Calculation of Probability

Total Probability and Bayes Rule

Independence

Counting

EE 178: Basic Probability Page 1 1

Set Theory Basics

A set is a collection of objects, which are its elements

ω ∈ A means that ω is an element of the set A

A set with no elements is called the empty set, denoted by ∅

Types of sets:

Finite: A = {1, 2, . . . , n}

Countably infinite: A = {1, 2, . . .}, e.g., the set of positive integers

Uncountable: a set that takes a continuous set of values, e.g., the [0, 1] interval, the real line, etc.

A set can be described by all ω having a certain property, e.g., A = [0, 1] can be written as A = {ω : 0 ≤ ω ≤ 1}

B ⊆ A means that every element of B is an element of A

A universal set Ω contains all objects of particular interest in a particular context, e.g., the sample space for a random experiment


• Set Operations

Assume a universal set Ω

Three basic operations:

Complementation: the complement of a set A with respect to Ω is A^c = {ω ∈ Ω : ω ∉ A}, so Ω^c = ∅

Intersection: A ∩ B = {ω : ω ∈ A and ω ∈ B}

Union: A ∪ B = {ω : ω ∈ A or ω ∈ B}

Notation:

∪_{i=1}^n A_i = A_1 ∪ A_2 ∪ . . . ∪ A_n

∩_{i=1}^n A_i = A_1 ∩ A_2 ∩ . . . ∩ A_n

A collection of sets A_1, A_2, . . . , A_n are disjoint or mutually exclusive if A_i ∩ A_j = ∅ for all i ≠ j, i.e., no two of them have a common element

A collection of sets A_1, A_2, . . . , A_n partition Ω if they are disjoint and ∪_{i=1}^n A_i = Ω


Venn Diagrams

[Figure: Venn diagrams showing (a) the universal set Ω, (b) a set A, (c) a set B, (d) the complement B^c, (e) the intersection A ∩ B, and (f) the union A ∪ B]


• Algebra of Sets

Basic relations:

1. S ∪ ∅ = S

2. (A^c)^c = A

3. A ∪ A^c = Ω

4. Commutative law: A ∪ B = B ∪ A

5. Associative law: A ∪ (B ∪ C) = (A ∪ B) ∪ C

6. Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

7. DeMorgan's law: (A ∪ B)^c = A^c ∩ B^c

DeMorgan's law can be generalized to n events:

(∪_{i=1}^n A_i)^c = ∩_{i=1}^n A_i^c

These can all be proven using the definition of set operations or visualized using Venn diagrams


Elements of Probability

Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage

Basic elements of probability:

Sample space Ω: the set of all possible elementary or finest-grain outcomes of the random experiment (also called sample points)

The sample points are all disjoint

The sample points are collectively exhaustive, i.e., together they make up the entire sample space

Events: subsets of the sample space

Probability law: an assignment of probabilities to events in a mathematically consistent way


Discrete Sample Spaces

A sample space Ω is called discrete if it contains a countable number of sample points

Examples:

Flip a coin once: Ω = {H, T}

Flip a coin three times: Ω = {HHH, HHT, HTH, . . .} = {H, T}^3

Flip a coin n times: Ω = {H, T}^n (the set of sequences of H and T of length n)

Roll a die once: Ω = {1, 2, 3, 4, 5, 6}

Roll a die twice: Ω = {(1, 1), (1, 2), (2, 1), . . . , (6, 6)} = {1, 2, 3, 4, 5, 6}^2

Flip a coin until the first heads appears: Ω = {H, TH, TTH, TTTH, . . .}

Number of packets arriving in time interval (0, T] at a node in a communication network: Ω = {0, 1, 2, 3, . . .}

Note that the first five examples have finite Ω, whereas the last two have countably infinite Ω. Both types are called discrete


Sequential models: For sequential experiments, the sample space can be described in terms of a tree, where each outcome corresponds to a terminal node (or a leaf)

Example: Three flips of a coin

[Figure: a binary tree; the first branching is H/T for the first flip, the second for the second flip, the third for the third flip. The eight leaves are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT]


Example: Roll a fair four-sided die twice

The sample space can be represented by a tree as above, or graphically:

[Figure: a 4x4 grid of points, with the 1st roll (1-4) on the horizontal axis and the 2nd roll (1-4) on the vertical axis; each of the 16 points is a sample point]


Continuous Sample Spaces

A continuous sample space consists of a continuum of points and thus contains an uncountable number of points

Examples:

Random number between 0 and 1: Ω = [0, 1]

Suppose we pick two numbers at random between 0 and 1; then the sample space consists of all points in the unit square, i.e., Ω = [0, 1]^2

[Figure: the unit square in the (x, y) plane, with x and y each ranging from 0 to 1.0]


Packet arrival time: t ∈ (0, ∞), thus Ω = (0, ∞)

Arrival times for n packets: t_i ∈ (0, ∞), for i = 1, 2, . . . , n, thus Ω = (0, ∞)^n

A sample space is said to be mixed if it is neither discrete nor continuous, e.g., Ω = [0, 1] ∪ {3}


Events

Events are subsets of the sample space. An event occurs if the outcome of the experiment belongs to the event

Examples:

Any outcome (sample point) is an event (also called an elementary event), e.g., {HTH} in the three-coin-flip experiment, or {0.35} in the experiment of picking a random number between 0 and 1

Flip a coin 3 times and get exactly one H. This is a more complicated event, consisting of the three sample points {TTH, THT, HTT}

Flip a coin 3 times and get an odd number of Hs. The event is {TTH, THT, HTT, HHH}

Pick a random number between 0 and 1 and get a number between 0.0 and 0.5. The event is [0, 0.5]

An event might have no points in it, i.e., be the empty set ∅


• Axioms of Probability

A probability law (measure or function) is an assignment of probabilities to events (subsets of the sample space Ω) such that the following three axioms are satisfied:

1. P(A) ≥ 0, for all A (nonnegativity)

2. P(Ω) = 1 (normalization)

3. If A and B are disjoint (A ∩ B = ∅), then

P(A ∪ B) = P(A) + P(B) (additivity)

More generally,

3'. If the sample space has an infinite number of points and A_1, A_2, . . . are disjoint events, then

P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i)


Mimics relative frequency, i.e., perform the experiment n times (e.g., roll a die n times). If the number of occurrences of A is n_A, define the relative frequency of an event A as f_A = n_A/n

Probabilities are nonnegative (like relative frequencies)

Probability something happens is 1 (again like relative frequencies)

Probabilities of disjoint events add (again like relative frequencies)

Analogy: Except for normalization, probability is a measure much like

mass

length

area

volume

They all satisfy axioms 1 and 3

This analogy provides some intuition but is not sufficient to fully understand probability theory; other aspects, such as conditioning and independence, are unique to probability


Probability for Discrete Sample Spaces

Recall that a sample space Ω is said to be discrete if it is countable

The probability measure P can be simply defined by first assigning probabilities to outcomes, i.e., elementary events {ω}, such that:

P({ω}) ≥ 0, for all ω ∈ Ω, and

∑_{ω∈Ω} P({ω}) = 1

The probability of any other event A (by the additivity axiom) is simply

P(A) = ∑_{ω∈A} P({ω})


Examples:

For the coin flipping experiment, assign

P({H}) = p and P({T}) = 1 - p, for 0 ≤ p ≤ 1

Note: p is the bias of the coin, and a coin is fair if p = 1/2

For the die rolling experiment, assign

P({i}) = 1/6, for i = 1, 2, . . . , 6

The probability of the event "the outcome is even," A = {2, 4, 6}, is

P(A) = P({2}) + P({4}) + P({6}) = 1/2


If Ω is countably infinite, we can again assign probabilities to elementary events

Example: Assume Ω = {1, 2, . . .} and assign probability 2^-k to the event {k}

The probability of the event "the outcome is even" is

P(outcome is even) = P({2, 4, 6, 8, . . .})

= P({2}) + P({4}) + P({6}) + . . .

= ∑_{k=1}^∞ P({2k})

= ∑_{k=1}^∞ 2^-2k = 1/3
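The geometric series above can be checked numerically; a short sketch (the truncation point 60 is an arbitrary choice, large enough for machine precision):

```python
# Sum of 2^(-2k) for k = 1, 2, ...: a geometric series with first term 1/4
# and ratio 1/4, so it equals (1/4) / (1 - 1/4) = 1/3
partial = sum(2 ** (-2 * k) for k in range(1, 60))
print(partial)
```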


Probability for Continuous Sample Space

Recall that if a sample space Ω is continuous, it is uncountably infinite

For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes

To see why, consider assigning a uniform probability measure to Ω = (0, 1]

In this case the probability of each single-outcome event is zero

How do we find the probability of an event such as A = [1/2, 3/4]?

For this example we can define a uniform probability measure over [0, 1] by assigning to an event A the probability

P(A) = length of A,

e.g., P([0, 1/3] ∪ [2/3, 1]) = 2/3

Check that this is a legitimate assignment


Another example: Romeo and Juliet have a date. Each arrives late with a random delay of up to 1 hour. Each will wait only 1/4 of an hour before leaving. What is the probability that Romeo and Juliet will meet?

Solution: The pair of delays is equivalent to that achievable by picking two random numbers between 0 and 1. Define the probability of an event as its area

The event of interest, |x - y| ≤ 1/4, is represented by the cross-hatched region

[Figure: the unit square in the (x, y) plane; the band around the diagonal where |x - y| ≤ 1/4 is cross-hatched]

The probability of the event is:

area of cross-hatched region = 1 - 2 × (1/2)(0.75)^2 = 0.4375
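A Monte Carlo estimate agrees with the area calculation; a sketch (the trial count and seed are arbitrary illustrative choices):

```python
import random

random.seed(2)  # reproducible sketch

def meet_probability(trials):
    """Estimate P(|x - y| <= 1/4) for x, y drawn uniformly from [0, 1]."""
    meets = sum(abs(random.random() - random.random()) <= 0.25
                for _ in range(trials))
    return meets / trials

print(meet_probability(200_000))  # close to the exact answer 0.4375
```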


Basic Properties of Probability

There are several useful properties that can be derived from the axioms of probability:

1. P(A^c) = 1 - P(A); P(∅) = 0; P(A) ≤ 1

2. If A ⊆ B, then P(A) ≤ P(B)

3. P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

4. P(A ∪ B) ≤ P(A) + P(B), or in general

P(∪_{i=1}^n A_i) ≤ ∑_{i=1}^n P(A_i)

This is called the Union of Events Bound

These properties can be proved using the axioms of probability and visualized using Venn diagrams


• Conditional Probability

Conditional probability allows us to compute probabilities of events based on partial knowledge of the outcome of a random experiment

Examples:

We are told that the sum of the outcomes from rolling a die twice is 9. What is the probability that the outcome of the first die was a 6?

A spot shows up on a radar screen. What is the probability that there is an aircraft?

You receive a 0 at the output of a digital communication system. What is the probability that a 0 was sent?

As we shall see, conditional probability provides us with two methods for computing probabilities of events: the sequential method and the divide-and-conquer method

It is also the basis of inference in statistics: make an observation and reason about the cause


In general, given that an event B has occurred, we wish to find the probability of another event A, P(A|B)

If all elementary outcomes are equally likely, then

P(A|B) = (# of outcomes in both A and B) / (# of outcomes in B)

In general, if B is an event such that P(B) ≠ 0, the conditional probability of any event A given B is defined as

P(A|B) = P(A ∩ B) / P(B), or P(A,B) / P(B)

The function P(·|B) for fixed B specifies a probability law, i.e., it satisfies the axioms of probability


Example

Roll a fair four-sided die twice. So, the sample space is {1, 2, 3, 4}^2. All sample points have probability 1/16

Let B be the event that the minimum of the two die rolls is 2 and A_m, for m = 1, 2, 3, 4, be the event that the maximum of the two die rolls is m. Find P(A_m|B)

Solution:

[Figure: the 4x4 grid of sample points with the event B = {(2, 2), (2, 3), (3, 2), (2, 4), (4, 2)} highlighted, and a bar plot of P(A_m|B) versus m = 1, 2, 3, 4]

B consists of 5 equally likely sample points, so

P(A_1|B) = 0, P(A_2|B) = 1/5, P(A_3|B) = 2/5, P(A_4|B) = 2/5


Conditional Probability Models

Before: probability law → conditional probabilities

The reverse is often more natural: conditional probabilities → probability law

We use the chain rule (also called the multiplication rule):

By the definition of conditional probability, P(A ∩ B) = P(A|B)P(B). Suppose that A_1, A_2, . . . , A_n are events; then


P(A_1 ∩ A_2 ∩ A_3 ∩ . . . ∩ A_n)

= P(A_1 ∩ A_2 ∩ . . . ∩ A_{n-1}) × P(A_n | A_1 ∩ A_2 ∩ . . . ∩ A_{n-1})

= P(A_1 ∩ A_2 ∩ . . . ∩ A_{n-2}) × P(A_{n-1} | A_1 ∩ A_2 ∩ . . . ∩ A_{n-2}) × P(A_n | A_1 ∩ A_2 ∩ . . . ∩ A_{n-1})

...

= P(A_1) × P(A_2|A_1) × P(A_3|A_1 ∩ A_2) × . . . × P(A_n | A_1 ∩ A_2 ∩ . . . ∩ A_{n-1})

= ∏_{i=1}^n P(A_i | A_1, A_2, . . . , A_{i-1}),

where A_0 = Ω (so the i = 1 factor is just P(A_1))


Sequential Calculation of Probabilities

Procedure:

1. Construct a tree description of the sample space for a sequential experiment

2. Assign the conditional probabilities on the corresponding branches of the tree

3. By the chain rule, the probability of an outcome can be obtained by multiplying the conditional probabilities along the path from the root to the leaf node corresponding to the outcome

Example (Radar Detection): Let A be the event that an aircraft is flying above and B be the event that the radar detects it. Assume P(A) = 0.05, P(B|A) = 0.99, and P(B|A^c) = 0.1

What is the probability of

Missed detection? I.e., P(A ∩ B^c)

False alarm? I.e., P(A^c ∩ B)

The sample space is: Ω = {(A,B), (A^c,B), (A,B^c), (A^c,B^c)}


Solution: Represent the sample space by a tree with conditional probabilities on its edges

[Figure: a tree with first branching P(A) = 0.05 vs. P(A^c) = 0.95; from A, P(B|A) = 0.99 leads to leaf (A,B) and P(B^c|A) = 0.01 to leaf (A,B^c) (missed detection); from A^c, P(B|A^c) = 0.10 leads to leaf (A^c,B) (false alarm) and P(B^c|A^c) = 0.90 to leaf (A^c,B^c)]

Multiplying along the branches:

P(missed detection) = P(A ∩ B^c) = 0.05 × 0.01 = 0.0005

P(false alarm) = P(A^c ∩ B) = 0.95 × 0.10 = 0.095


Example: Three cards are drawn at random (without replacement) from a deck of cards. What is the probability of not drawing a heart?

Solution: Let A_i, i = 1, 2, 3, represent the event of no heart in the ith draw. We can represent the sample space as:

Ω = {(A_1, A_2, A_3), (A_1^c, A_2, A_3), . . . , (A_1^c, A_2^c, A_3^c)}

To find the probability law, we represent the sample space by a tree, write conditional probabilities on branches, and use the chain rule


[Figure: a tree with first branching P(A_1) = 39/52 ("not heart") vs. P(A_1^c) = 13/52 ("heart"); from A_1, P(A_2|A_1) = 38/51 leads to (not heart, not heart) and 13/51 to (not heart, heart); continuing, P(A_3|A_1, A_2) = 37/50 leads to (not heart, not heart, not heart) and 13/50 to (not heart, not heart, heart)]

By the chain rule,

P(no heart in three draws) = P(A_1)P(A_2|A_1)P(A_3|A_1, A_2) = (39/52)(38/51)(37/50) ≈ 0.413
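Multiplying the conditional probabilities along the no-heart path can be done with exact rational arithmetic; a sketch:

```python
from fractions import Fraction

# Chain rule: P(A1) * P(A2|A1) * P(A3|A1, A2), no heart on each of three draws
p_no_heart = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)
print(p_no_heart, float(p_no_heart))  # roughly 0.41
```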


Total Probability: the Divide-and-Conquer Method

Let A_1, A_2, . . . , A_n be events that partition Ω, i.e., that are disjoint (A_i ∩ A_j = ∅ for i ≠ j) and satisfy ∪_{i=1}^n A_i = Ω. Then for any event B,

P(B) = ∑_{i=1}^n P(A_i ∩ B) = ∑_{i=1}^n P(A_i)P(B|A_i)

This is called the Law of Total Probability. It also holds for n = ∞. It allows us to compute the probability of a complicated event from knowledge of probabilities of simpler events

Example: Chess tournament, 3 types of opponents for a certain player:

P(Type 1) = 0.5, P(Win | Type 1) = 0.3

P(Type 2) = 0.25, P(Win | Type 2) = 0.4

P(Type 3) = 0.25, P(Win | Type 3) = 0.5

What is the probability of the player winning?


Solution: Let B be the event of winning and A_i be the event of playing a Type i opponent, i = 1, 2, 3:

P(B) = ∑_{i=1}^3 P(A_i)P(B|A_i)

= 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5

= 0.375
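The divide-and-conquer sum is a one-liner; a sketch of the chess computation:

```python
# Law of total probability: P(Win) = sum_i P(Type i) * P(Win | Type i)
priors = [0.5, 0.25, 0.25]
win_given_type = [0.3, 0.4, 0.5]
p_win = sum(p * w for p, w in zip(priors, win_given_type))
print(p_win)  # 0.375 up to floating-point rounding
```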


Bayes Rule

Let A_1, A_2, . . . , A_n be nonzero-probability events (the causes) that partition Ω, and let B be a nonzero-probability event (the effect)

We often know the a priori probabilities P(A_i), i = 1, 2, . . . , n, and the conditional probabilities P(B|A_i), and wish to find the a posteriori probabilities P(A_j|B) for j = 1, 2, . . . , n

From the definition of conditional probability, we know that

P(A_j|B) = P(B, A_j) / P(B) = (P(B|A_j) / P(B)) P(A_j)

By the law of total probability,

P(B) = ∑_{i=1}^n P(A_i)P(B|A_i)


Substituting, we obtain Bayes rule:

P(A_j|B) = P(B|A_j)P(A_j) / ∑_{i=1}^n P(A_i)P(B|A_i), for j = 1, 2, . . . , n

Bayes rule also applies when the number of events n = ∞

Radar Example: Recall that A is the event that the aircraft is flying above and B is the event that the aircraft is detected by the radar. What is the probability that an aircraft is actually there given that the radar indicates a detection?

Recall P(A) = 0.05, P(B|A) = 0.99, P(B|A^c) = 0.1. Using Bayes rule:

P(there is an aircraft | radar detects it) = P(A|B)

= P(B|A)P(A) / (P(B|A)P(A) + P(B|A^c)P(A^c))

= (0.99 × 0.05) / (0.99 × 0.05 + 0.1 × 0.95)

≈ 0.3426
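The same plug-in computation as a sketch:

```python
# Bayes rule for the radar example
p_a = 0.05           # prior: aircraft present
p_b_given_a = 0.99   # detection probability
p_b_given_ac = 0.10  # false-alarm probability
# Denominator via the law of total probability
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_ac
posterior = p_b_given_a * p_a / p_b
print(posterior)  # about 0.3426
```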


Binary Communication Channel

Consider a noisy binary communication channel, where 0 or 1 is sent and 0 or 1 is received. Assume that 0 is sent with probability 0.2 (and 1 is sent with probability 0.8)

The channel is noisy. If a 0 is sent, a 0 is received with probability 0.9, and if a 1 is sent, a 1 is received with probability 0.975

We can represent this channel model by a probability transition diagram

[Transition diagram: input 0 (P({0}) = 0.2) goes to output 0 with probability P({0}|{0}) = 0.9 and to output 1 with probability 0.1; input 1 (P({1}) = 0.8) goes to output 1 with probability P({1}|{1}) = 0.975 and to output 0 with probability 0.025]

Given that 0 is received, find the probability that 0 was sent


This is a random experiment with sample space Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}, where the first entry is the bit sent and the second is the bit received

Define the two events

A = {0 is sent} = {(0, 1), (0, 0)}, and

B = {0 is received} = {(0, 0), (1, 0)}

The probability measure is defined via the P(A), P(B|A), and P(B^c|A^c) provided on the probability transition diagram of the channel

To find P(A|B), the a posteriori probability that a 0 was sent, we use Bayes rule:

P(A|B) = P(B|A)P(A) / (P(A)P(B|A) + P(A^c)P(B|A^c))

= (0.9 × 0.2) / (0.2 × 0.9 + 0.8 × 0.025) = 0.18/0.2 = 0.9,

which is much higher than the a priori probability of A (= 0.2)
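A sketch of the same posterior computation for the binary channel:

```python
p_send0 = 0.2          # P(0 sent)
p_r0_given_s0 = 0.9    # P(0 received | 0 sent)
p_r1_given_s1 = 0.975  # P(1 received | 1 sent)
# Total probability of receiving a 0
p_r0 = p_send0 * p_r0_given_s0 + (1 - p_send0) * (1 - p_r1_given_s1)
posterior = p_send0 * p_r0_given_s0 / p_r0
print(posterior)  # 0.18 / 0.2 = 0.9
```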


Independence

It often happens that the knowledge that a certain event B has occurred has no effect on the probability that another event A has occurred, i.e.,

P(A|B) = P(A)

In this case we say that the two events are statistically independent

Equivalently, two events are said to be statistically independent if

P(A,B) = P(A)P(B)

So, in this case, P(A|B) = P(A) and P(B|A) = P(B)

Example: Assuming that the binary channel of the previous example is used to send two bits independently, what is the probability that both bits are in error?


Solution:

Define the two events

E_1 = {First bit is in error}

E_2 = {Second bit is in error}

Since the bits are sent independently, the probability that both are in error is

P(E_1, E_2) = P(E_1)P(E_2)

Also, by symmetry, P(E_1) = P(E_2)

To find P(E_1), we express E_1 in terms of the events A_1 and B_1 (the events A and B for the first bit) as

E_1 = (A_1 ∩ B_1^c) ∪ (A_1^c ∩ B_1)


Thus,

P(E_1) = P(A_1, B_1^c) + P(A_1^c, B_1)

= P(A_1)P(B_1^c|A_1) + P(A_1^c)P(B_1|A_1^c)

= 0.2 × 0.1 + 0.8 × 0.025 = 0.04

The probability that the two bits are in error is

P(E_1, E_2) = (0.04)^2 = 16 × 10^-4

In general, A_1, A_2, . . . , A_n are mutually independent if for each subset of the events A_{i1}, A_{i2}, . . . , A_{ik},

P(A_{i1}, A_{i2}, . . . , A_{ik}) = ∏_{j=1}^k P(A_{ij})

Note: P(A_1, A_2, . . . , A_n) = ∏_{j=1}^n P(A_j) alone is not sufficient for independence


• Example: Roll two fair dice independently. Define the events

A = {First die = 1, 2, or 3}

B = {First die = 2, 3, or 6}

C = {Sum of outcomes = 9}

Are A, B , and C independent?


Solution:

Since the dice are fair and the experiments are performed independently, the probability of any pair of outcomes is 1/36, and

P(A) = 1/2, P(B) = 1/2, P(C) = 1/9

Since A ∩ B ∩ C = {(3, 6)}, P(A,B,C) = 1/36 = P(A)P(B)P(C)

But are A, B, and C independent? Let's find

P(A,B) = 2/6 = 1/3 ≠ 1/4 = P(A)P(B),

and thus A, B, and C are not independent!


Also, pairwise independence does not necessarily imply mutual independence

Example: Flip a fair coin twice independently. Define the events:

A: First toss is Head

B : Second toss is Head

C : First and second toss have different outcomes

A and B are independent, A and C are independent, and B and C are independent

Are A, B, C mutually independent?

Clearly not, since if you know A and B occurred, you know that C could not have occurred, i.e., P(A,B,C) = 0 ≠ P(A)P(B)P(C) = 1/8


Counting

Discrete uniform law:

Finite sample space where all sample points are equally probable:

P(A) = (number of sample points in A) / (total number of sample points)

Variation: all outcomes in A are equally likely, each with probability p. Then

P(A) = p × (number of elements of A)

In both cases, we compute probabilities by counting


Basic Counting Principle

Procedure (multiplication rule):

r steps

n_i choices at step i

Number of choices is n_1 × n_2 × . . . × n_r

Example:

Number of license plates with 3 letters followed by 4 digits:

With no repetition (no replacement), the number is:

Example: Consider a set of objects {s_1, s_2, . . . , s_n}. How many subsets of these objects are there (including the set itself and the empty set)?

Each object may or may not be in the subset (2 options)

The number of subsets is:


Counting can be very tricky

Classic example: Throw three 6-sided dice. Is the probability that the sum of the outcomes is 11 equal to the probability that the sum of the outcomes is 12?

De Méré argued that they are, since the number of different ways to obtain 11 and 12 is the same:

Sum = 11: {6, 4, 1}, {6, 3, 2}, {5, 5, 1}, {5, 4, 2}, {5, 3, 3}, {4, 4, 3}

Sum = 12: {6, 5, 1}, {6, 4, 2}, {6, 3, 3}, {5, 5, 2}, {5, 4, 3}, {4, 4, 4}

This turned out to be false. Why?
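The flaw is that the unordered triples above are not equally likely; the ordered triples are. Enumerating them settles the question; a sketch:

```python
from itertools import product

# Count ordered triples (the equally likely outcomes) for each target sum
counts = {11: 0, 12: 0}
for roll in product(range(1, 7), repeat=3):
    s = sum(roll)
    if s in counts:
        counts[s] += 1

print(counts)  # {11: 27, 12: 25}: P(sum = 11) = 27/216 > 25/216 = P(sum = 12)
```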


Basic Types of Counting

Assume we have n distinct objects a_1, a_2, . . . , a_n, e.g., digits, letters, etc.

Some basic counting questions:

How many different ordered sequences of k objects can be formed out of the n objects with replacement?

How many different ordered sequences of k ≤ n objects can be formed out of the n objects without replacement? (called k-permutations)

How many different unordered sequences (subsets) of k ≤ n objects can be formed out of the n objects without replacement? (called k-combinations)

Given r nonnegative integers n_1, n_2, . . . , n_r that sum to n (the number of objects), how many ways can the n objects be partitioned into r subsets (unordered sequences) with the ith subset having exactly n_i objects? (called partitions, a generalization of combinations)


Ordered Sequences With and Without Replacement

The number of ordered k-sequences from n objects with replacement is n × n × . . . × n, k times, i.e., n^k

Example: If n = 2, e.g., binary digits, the number of ordered k-sequences is 2^k

The number of different ordered sequences of k objects that can be formed from n objects without replacement, i.e., the k-permutations, is

n × (n - 1) × (n - 2) × . . . × (n - k + 1)

If k = n, the number is

n × (n - 1) × (n - 2) × . . . × 2 × 1 = n! (n factorial)

Thus the number of k-permutations is n!/(n - k)!

Example: Consider the alphabet set {A, B, C, D}, so n = 4

The number of k = 2-permutations of the n = 4 objects is 12

They are: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, and DC
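The 12 2-permutations can be generated directly; a sketch using the standard library:

```python
from itertools import permutations

# All ordered 2-sequences (k-permutations) drawn from 4 objects without replacement
perms = [''.join(p) for p in permutations('ABCD', 2)]
print(len(perms), perms)  # 12 of them, matching 4!/(4-2)! = 12
```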


Unordered Sequences Without Replacement

Denote by C(n, k) ("n choose k") the number of unordered k-sequences that can be formed out of n objects without replacement, i.e., the k-combinations

Two different ways of constructing the k-permutations:

1. Choose k objects (C(n, k) ways), then order them (k! possible orders). This gives C(n, k) × k!

2. Choose the k objects one at a time: n × (n - 1) × . . . × (n - k + 1) = n!/(n - k)! choices

Hence C(n, k) × k! = n!/(n - k)!, or

C(n, k) = n! / (k!(n - k)!)

Example: The number of k = 2-combinations of {A, B, C, D} is 6

What is the number of binary sequences of length n with exactly k ≤ n ones?

Question: What is ∑_{k=0}^n C(n, k)?


Finding Probability Using Counting

Example: Die Rolling

Roll a fair 6-sided die 6 times independently. Find the probability that the outcomes from all six rolls are different

Solution:

# outcomes yielding this event =

# of points in sample space =

Probability =
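The counts above are left blank for in-class solution; as a check, a sketch of one consistent fill-in (all six outcomes distinct means the sequence is a permutation of {1, . . . , 6}):

```python
from math import factorial

favorable = factorial(6)  # all six outcomes distinct: a permutation of {1,...,6}
total = 6 ** 6            # equally likely sequences of six rolls
print(favorable, total, favorable / total)  # 720, 46656, about 0.0154
```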


Example: The Birthday Problem

k people are selected at random. What is the probability that all k birthdays will be different (neglecting leap years)?

Solution:

Total # of ways of assigning birthdays to k people:

# of ways of assigning birthdays to k people with no two having the same birthday:

Probability:

EE 178: Basic Probability Page 1 49

Binomial Probabilities

Consider n independent coin tosses, where P(H) = p for 0 < p < 1

Outcome is a sequence of H s and T s of length n

P(sequence) = p#heads(1 p)#tails

The probability of k heads in n tosses is thus

sequences with k heads

P(sequence)

= #(khead sequences) pk(1 p)nk

=

(

n

k

)

pk(1 p)nk

Check that it sums to 1

EE 178: Basic Probability Page 1 50

• Example: Toss a coin with bias p independently 10 times.

Define the events B = {3 out of 10 tosses are heads} andA = {first two tosses are heads}. Find P(A|B)

Solution: The conditional probability is

P(A|B) =P(A,B)

P(B)

All points in B have the same probability p3(1 p)7, so we can find theconditional probability by counting:

# points in B beginning with two heads =

# points in B =

Probability =

EE 178: Basic Probability Page 1 51

Partitions

Let n1, n2, n3, . . . , nr be such that

r

i=1

ni = n

How many ways can the n objects be partitioned into r subsets (unorderedsequences) with the ith subset having exactly ni objects?

If r = 2 with n1, n n1 the answer is the n1-combinations( (nn1)

)

EE 178: Basic Probability Page 1 52

• (

n

n1 n2 . . . nr

)

=

(

n

n1

)

(

n n1n2

)

(

n (n1 + n2)

n3

)

. . .

(

nr1

i=1 ninr

)

=n!

n1!(n n1)!

(n n1)!

n2!(n (n1 + n2))!

=n!

n1!n2! nr!

EE 178: Basic Probability Page 1 53

Example: Balls and bins

We have n balls and r bins. We throw each ball independently and at randominto a bin. What is the probability that bin i = 1, 2, . . . , r will have exactly niballs, where

ri=1 ni = n?

Solution:

The probability of each outcome (sequence of ni balls in bin i) is:

# of ways of partitioning the n balls into r bins such that bin i has exactlyni balls is:

Probability:

EE 178: Basic Probability Page 1 54

• Example: Cards

Consider a perfectly shuffled 52-card deck dealt to 4 players. FindP(each player gets an ace)

Solution:

Size of the sample space is:

# of ways of distributing the four aces:

# of ways of dealing the remaining 48 cards:

Probability =

EE 178: Basic Probability Page 1 55

EE 178 Lecture Notes 1

Basic Probability

Set Theory

Elements of Probability

Conditional probability

Sequential Calculation of Probability

Total Probability and Bayes Rule

Independence

Counting

EE 178: Basic Probability Page 1 1

Set Theory Basics

A set is a collection of objects, which are its elements

ω ∈ A means that ω is an element of the set A

A set with no elements is called the empty set, denoted by ∅

Types of sets:

Finite: A = {1, 2, . . . , n}

Countably infinite: A = {1, 2, . . .}, e.g., the set of positive integers

Uncountable: A set that takes a continuous set of values, e.g., the [0, 1] interval, the real line, etc.

A set can be described by all ω having a certain property, e.g., A = [0, 1] can be written as A = {ω : 0 ≤ ω ≤ 1}

A set B ⊂ A means that every element of B is an element of A

A universal set Ω contains all objects of particular interest in a particular context, e.g., the sample space for a random experiment

EE 178: Basic Probability Page 1 2

Set Operations

Assume a universal set Ω

Three basic operations:

Complementation: The complement of a set A with respect to Ω is Ac = {ω ∈ Ω : ω ∉ A}, so Ωc = ∅

Intersection: A ∩ B = {ω : ω ∈ A and ω ∈ B}

Union: A ∪ B = {ω : ω ∈ A or ω ∈ B}

Notation:

∪_{i=1}^{n} Ai = A1 ∪ A2 ∪ · · · ∪ An

∩_{i=1}^{n} Ai = A1 ∩ A2 ∩ · · · ∩ An

A collection of sets A1, A2, . . . , An are disjoint or mutually exclusive if Ai ∩ Aj = ∅ for all i ≠ j, i.e., no two of them have a common element

A collection of sets A1, A2, . . . , An partition Ω if they are disjoint and ∪_{i=1}^{n} Ai = Ω

EE 178: Basic Probability Page 1 3

Venn Diagrams

[Venn diagrams: (a) Ω, (b) A, (c) B, (d) Bc, (e) A ∩ B, (f) A ∪ B]

EE 178: Basic Probability Page 1 4

Algebra of Sets

Basic relations:

1. S ∪ ∅ = S

2. (Ac)c = A

3. A ∩ Ac = ∅

4. Commutative law: A ∪ B = B ∪ A

5. Associative law: A ∪ (B ∪ C) = (A ∪ B) ∪ C

6. Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

7. DeMorgan's law: (A ∪ B)c = Ac ∩ Bc

DeMorgan's law can be generalized to n events:

(∪_{i=1}^{n} Ai)c = ∩_{i=1}^{n} Aic

These can all be proven using the definition of set operations or visualized using Venn diagrams
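DeMorgan's law (and its n-event generalization) can also be sanity-checked numerically; a small sketch using Python's built-in set type, with an arbitrary example universe U:

```python
# Sanity-check DeMorgan's law (A ∪ B)^c = A^c ∩ B^c on a small universe.
U = set(range(10))          # example universal set
A = {1, 2, 3}
B = {3, 4, 5}

lhs = U - (A | B)           # complement of the union
rhs = (U - A) & (U - B)     # intersection of the complements
assert lhs == rhs

# The generalized n-event version holds the same way:
sets = [{1, 2}, {2, 3}, {5, 6, 7}]
union = set().union(*sets)
assert U - union == set.intersection(*(U - S for S in sets))
print(lhs)
```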

EE 178: Basic Probability Page 1 5

Elements of Probability

Probability theory provides mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage

Basic elements of probability:

Sample space: The set Ω of all possible elementary or finest-grain outcomes of the random experiment (also called sample points)

The sample points are all disjoint

The sample points are collectively exhaustive, i.e., together they make up the entire sample space

Events: Subsets of the sample space

Probability law: An assignment of probabilities to events in a mathematically consistent way

EE 178: Basic Probability Page 1 6

Discrete Sample Spaces

Sample space is called discrete if it contains a countable number of sample points

Examples:

Flip a coin once: Ω = {H, T}

Flip a coin three times: Ω = {HHH, HHT, HTH, . . .} = {H, T}^3

Flip a coin n times: Ω = {H, T}^n (set of sequences of H and T of length n)

Roll a die once: Ω = {1, 2, 3, 4, 5, 6}

Roll a die twice: Ω = {(1, 1), (1, 2), (2, 1), . . . , (6, 6)} = {1, 2, 3, 4, 5, 6}^2

Flip a coin until the first heads appears: Ω = {H, TH, TTH, TTTH, . . .}

Number of packets arriving in time interval (0, T] at a node in a communication network: Ω = {0, 1, 2, 3, . . .}

Note that the first five examples have finite Ω, whereas the last two have countably infinite Ω. Both types are called discrete

EE 178: Basic Probability Page 1 7

Sequential models: For sequential experiments, the sample space can be described in terms of a tree, where each outcome corresponds to a terminal node (or a leaf)

Example: Three flips of a coin

[Tree diagram: the first flip branches to H or T, each branch splits again on the second and third flips; the eight leaves are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT]

EE 178: Basic Probability Page 1 8
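The eight leaves of the tree are exactly the elements of {H, T}^3; a one-line enumeration sketch using itertools:

```python
from itertools import product

# Enumerate the sample space for three coin flips: each leaf of the tree
# is one length-3 string over {H, T}.
omega = [''.join(seq) for seq in product('HT', repeat=3)]
print(omega)        # HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
print(len(omega))   # 8 outcomes
```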

Example: Roll a fair four-sided die twice.

Sample space can be represented by a tree as above, or graphically:

[Grid of 16 sample points, with the 1st roll (1, 2, 3, 4) on one axis and the 2nd roll (1, 2, 3, 4) on the other]

EE 178: Basic Probability Page 1 9

Continuous Sample Spaces

A continuous sample space consists of a continuum of points and thus contains an uncountable number of points

Examples:

Random number between 0 and 1: Ω = [0, 1]

Suppose we pick two numbers at random between 0 and 1, then the sample space consists of all points in the unit square, i.e., Ω = [0, 1]^2

[Figure: the unit square in the (x, y) plane, with x and y each ranging from 0 to 1.0]

EE 178: Basic Probability Page 1 10

Packet arrival time: t ∈ (0, ∞), thus Ω = (0, ∞)

Arrival times for n packets: ti ∈ (0, ∞), for i = 1, 2, . . . , n, thus Ω = (0, ∞)^n

A sample space is said to be mixed if it is neither discrete nor continuous, e.g., Ω = [0, 1] ∪ {3}

EE 178: Basic Probability Page 1 11

Events

Events are subsets of the sample space. An event occurs if the outcome of the experiment belongs to the event

Examples:

Any outcome (sample point) is an event (also called an elementary event), e.g., {HTH} in the three coin flips experiment, or {0.35} in the experiment of picking a random number between 0 and 1

Flip a coin 3 times and get exactly one H. This is a more complicated event, consisting of three sample points {TTH, THT, HTT}

Flip a coin 3 times and get an odd number of Hs. The event is {TTH, THT, HTT, HHH}

Pick a random number between 0 and 1 and get a number between 0.0 and 0.5. The event is [0, 0.5]

An event might have no points in it, i.e., be the empty set ∅

EE 178: Basic Probability Page 1 12

Axioms of Probability

Probability law (measure or function) is an assignment of probabilities to events (subsets of sample space Ω) such that the following three axioms are satisfied:

1. P(A) ≥ 0, for all A (nonnegativity)

2. P(Ω) = 1 (normalization)

3. If A and B are disjoint (A ∩ B = ∅), then

P(A ∪ B) = P(A) + P(B) (additivity)

More generally,

3. If the sample space has an infinite number of points and A1, A2, . . . are disjoint events, then

P(∪_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai)

EE 178: Basic Probability Page 1 13

Mimics relative frequency, i.e., perform the experiment n times (e.g., roll a die n times). If the number of occurrences of A is nA, define the relative frequency of an event A as fA = nA/n

Probabilities are nonnegative (like relative frequencies)

Probability something happens is 1 (again like relative frequencies)

Probabilities of disjoint events add (again like relative frequencies)

Analogy: Except for normalization, probability is a measure much like

mass

length

area

volume

They all satisfy axioms 1 and 3

This analogy provides some intuition but is not sufficient to fully understand probability theory; other aspects, such as conditioning and independence, are unique to probability

EE 178: Basic Probability Page 1 14

Probability for Discrete Sample Spaces

Recall that a sample space Ω is said to be discrete if it is countable

The probability measure P can be simply defined by first assigning probabilities to outcomes, i.e., elementary events {ω}, such that:

P({ω}) ≥ 0, for all ω ∈ Ω, and

∑_{ω∈Ω} P({ω}) = 1

The probability of any other event A (by the additivity axiom) is simply

P(A) = ∑_{ω∈A} P({ω})

EE 178: Basic Probability Page 1 15

Examples:

For the coin flipping experiment, assign

P({H}) = p and P({T}) = 1 − p, for 0 ≤ p ≤ 1

Note: p is the bias of the coin, and a coin is fair if p = 1/2

For the die rolling experiment, assign

P({i}) = 1/6, for i = 1, 2, . . . , 6

The probability of the event the outcome is even, A = {2, 4, 6}, is

P(A) = P({2}) + P({4}) + P({6}) = 1/2

EE 178: Basic Probability Page 1 16

If Ω is countably infinite, we can again assign probabilities to elementary events

Example: Assume Ω = {1, 2, . . .} and assign probability 2^−k to the event {k}

The probability of the event the outcome is even is

P(outcome is even) = P({2, 4, 6, 8, . . .})

= P({2}) + P({4}) + P({6}) + · · ·

= ∑_{k=1}^{∞} P({2k})

= ∑_{k=1}^{∞} 2^−2k = 1/3

EE 178: Basic Probability Page 1 17

Probability for Continuous Sample Space

Recall that if a sample space Ω is continuous, Ω is uncountably infinite

For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes

To see why, consider assigning a uniform probability measure to Ω = (0, 1]

In this case the probability of each single outcome event is zero

How do we find the probability of an event such as A = [1/2, 3/4]?

For this example we can define the uniform probability measure over [0, 1] by assigning to an event A the probability

P(A) = length of A,

e.g., P([0, 1/3] ∪ [2/3, 1]) = 2/3

Check that this is a legitimate assignment

EE 178: Basic Probability Page 1 18

Another example: Romeo and Juliet have a date. Each arrives late with a random delay of up to 1 hour. Each will wait only 1/4 of an hour before leaving. What is the probability that Romeo and Juliet will meet?

Solution: The pair of delays is equivalent to that achievable by picking two random numbers between 0 and 1. Define the probability of an event as its area

The event of interest is represented by the cross-hatched region

[Figure: unit square of delay pairs (x, y), with the band |x − y| ≤ 1/4 cross-hatched]

Probability of the event is:

area of cross-hatched region = 1 − 2 · (1/2)(0.75)^2 = 0.4375

EE 178: Basic Probability Page 1 19

Basic Properties of Probability

There are several useful properties that can be derived from the axioms ofprobability:

1. P(Ac) = 1 − P(A); hence P(∅) = 0 and P(A) ≤ 1

2. If A ⊂ B, then P(A) ≤ P(B)

3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

4. P(A ∪ B) ≤ P(A) + P(B), or in general

P(∪_{i=1}^{n} Ai) ≤ ∑_{i=1}^{n} P(Ai)

This is called the Union of Events Bound

These properties can be proved using the axioms of probability and visualized using Venn diagrams

EE 178: Basic Probability Page 1 20

Conditional Probability

Conditional probability allows us to compute probabilities of events based on partial knowledge of the outcome of a random experiment

Examples:

We are told that the sum of the outcomes from rolling a die twice is 9. What is the probability that the outcome of the first die was a 6?

A spot shows up on a radar screen. What is the probability that there is an aircraft?

You receive a 0 at the output of a digital communication system. What is the probability that a 0 was sent?

As we shall see, conditional probability provides us with two methods for computing probabilities of events: the sequential method and the divide-and-conquer method

It is also the basis of inference in statistics: make an observation and reason about the cause

EE 178: Basic Probability Page 1 21

In general, given that an event B has occurred, we wish to find the probability of another event A, P(A|B)

If all elementary outcomes are equally likely, then

P(A|B) = (# of outcomes in both A and B) / (# of outcomes in B)

In general, if B is an event such that P(B) ≠ 0, the conditional probability of any event A given B is defined as

P(A|B) = P(A ∩ B) / P(B), also written as P(A,B) / P(B)

The function P(·|B) for fixed B specifies a probability law, i.e., it satisfies the axioms of probability

EE 178: Basic Probability Page 1 22

Example

Roll a fair four-sided die twice, so the sample space is {1, 2, 3, 4}^2. All sample points have probability 1/16

Let B be the event that the minimum of the two die rolls is 2 and Am, for m = 1, 2, 3, 4, be the event that the maximum of the two die rolls is m. Find P(Am|B)

Solution:

[Figure: 4×4 grid of sample points with the event B = {min of the two rolls is 2} marked, and a bar plot of P(Am|B) versus m = 1, 2, 3, 4]
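Since all 16 sample points are equally likely, P(Am|B) can be found by brute-force counting; a sketch:

```python
from itertools import product

# All 16 equally likely outcomes of two rolls of a fair 4-sided die.
outcomes = list(product(range(1, 5), repeat=2))
B = [w for w in outcomes if min(w) == 2]        # min of the two rolls is 2

# P(Am | B) = #(points of B with max = m) / #B, for m = 1, 2, 3, 4.
for m in range(1, 5):
    Am_and_B = [w for w in B if max(w) == m]
    print(m, len(Am_and_B), '/', len(B))
```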

EE 178: Basic Probability Page 1 23

Conditional Probability Models

Before: Probability law → conditional probabilities

The reverse is often more natural: conditional probabilities → probability law

We use the chain rule (also called the multiplication rule):

By the definition of conditional probability, P(A ∩ B) = P(A|B)P(B). Suppose that A1, A2, . . . , An are events, then

EE 178: Basic Probability Page 1 24

P(A1 ∩ A2 ∩ A3 ∩ · · · ∩ An)

= P(A1 ∩ A2 ∩ · · · ∩ An−1) P(An | A1 ∩ A2 ∩ · · · ∩ An−1)

= P(A1 ∩ A2 ∩ · · · ∩ An−2) P(An−1 | A1 ∩ A2 ∩ · · · ∩ An−2) P(An | A1 ∩ A2 ∩ · · · ∩ An−1)

...

= P(A1) P(A2|A1) P(A3|A1 ∩ A2) · · · P(An | A1 ∩ A2 ∩ · · · ∩ An−1)

= ∏_{i=1}^{n} P(Ai | A1, A2, . . . , Ai−1),

where A0 = Ω

EE 178: Basic Probability Page 1 25

Sequential Calculation of Probabilities

Procedure:

1. Construct a tree description of the sample space for a sequential experiment

2. Assign the conditional probabilities on the corresponding branches of the tree

3. By the chain rule, the probability of an outcome can be obtained by multiplying the conditional probabilities along the path from the root to the leaf node corresponding to the outcome

Example (Radar Detection): Let A be the event that an aircraft is flying above and B be the event that the radar detects it. Assume P(A) = 0.05, P(B|A) = 0.99, and P(B|Ac) = 0.1

What is the probability of

Missed detection? i.e., P(A ∩ Bc)

False alarm? i.e., P(Ac ∩ B)

The sample space is: Ω = {(A,B), (Ac, B), (A,Bc), (Ac, Bc)}

EE 178: Basic Probability Page 1 26

Solution: Represent the sample space by a tree with conditional probabilities on its edges

[Tree diagram: the root branches to A with P(A) = 0.05 and Ac with P(Ac) = 0.95; the A branch splits into (A,B) with P(B|A) = 0.99 and (A,Bc) with P(Bc|A) = 0.01 (missed detection); the Ac branch splits into (Ac,B) with P(B|Ac) = 0.10 (false alarm) and (Ac,Bc) with P(Bc|Ac) = 0.90]
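The two probabilities follow from the chain rule along the corresponding branches; a sketch with the numbers from the example:

```python
# Chain rule along the tree branches (numbers from the radar example).
P_A = 0.05            # aircraft present
P_B_given_A = 0.99    # detection given present
P_B_given_Ac = 0.10   # detection given absent

missed_detection = P_A * (1 - P_B_given_A)      # P(A ∩ B^c) = 0.0005
false_alarm = (1 - P_A) * P_B_given_Ac          # P(A^c ∩ B) = 0.095
print(missed_detection, false_alarm)
```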

EE 178: Basic Probability Page 1 27

Example: Three cards are drawn at random (without replacement) from a deck of cards. What is the probability of not drawing a heart?

Solution: Let Ai, i = 1, 2, 3, represent the event of no heart in the ith draw. We can represent the sample space as:

Ω = {(A1, A2, A3), (Ac1, A2, A3), . . . , (Ac1, Ac2, Ac3)}

To find the probability law, we represent the sample space by a tree, write conditional probabilities on branches, and use the chain rule

EE 178: Basic Probability Page 1 28

[Tree diagram: P(A1) = 39/52 (not heart) vs. P(Ac1) = 13/52 (heart); along the no-heart branch, P(A2|A1) = 38/51 vs. 13/51; then P(A3|A1, A2) = 37/50 gives (not heart, not heart, not heart), vs. 13/50 for (not heart, not heart, heart)]

EE 178: Basic Probability Page 1 29

Total Probability: Divide and Conquer Method

Let A1, A2, . . . , An be events that partition Ω, i.e., that are disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ∪_{i=1}^{n} Ai = Ω. Then for any event B

P(B) = ∑_{i=1}^{n} P(Ai ∩ B) = ∑_{i=1}^{n} P(Ai)P(B|Ai)

This is called the Law of Total Probability. It also holds for n = ∞. It allows us to compute the probability of a complicated event from knowledge of probabilities of simpler events

Example: Chess tournament, 3 types of opponents for a certain player.

P(Type 1) = 0.5, P(Win | Type 1) = 0.3

P(Type 2) = 0.25, P(Win | Type 2) = 0.4

P(Type 3) = 0.25, P(Win | Type 3) = 0.5

What is the probability of the player winning?

EE 178: Basic Probability Page 1 30

Solution: Let B be the event of winning and Ai be the event of playing Type i, i = 1, 2, 3:

P(B) = ∑_{i=1}^{3} P(Ai)P(B|Ai)

= 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5

= 0.375
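The total-probability sum can be evaluated directly; a sketch (variable names are illustrative):

```python
# Law of total probability: P(Win) = sum_i P(Type i) * P(Win | Type i).
priors = [0.5, 0.25, 0.25]
win_given_type = [0.3, 0.4, 0.5]
p_win = sum(p * w for p, w in zip(priors, win_given_type))
print(p_win)   # P(Win) = 0.375
```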

EE 178: Basic Probability Page 1 31

Bayes Rule

Let A1, A2, . . . , An be nonzero probability events (the causes) that partition Ω, and let B be a nonzero probability event (the effect)

We often know the a priori probabilities P(Ai), i = 1, 2, . . . , n, and the conditional probabilities P(B|Ai), and wish to find the a posteriori probabilities P(Aj|B) for j = 1, 2, . . . , n

From the definition of conditional probability, we know that

P(Aj|B) = P(B, Aj) / P(B) = P(B|Aj) P(Aj) / P(B)

By the law of total probability

P(B) = ∑_{i=1}^{n} P(Ai)P(B|Ai)

EE 178: Basic Probability Page 1 32

Substituting, we obtain Bayes rule

P(Aj|B) = P(B|Aj) P(Aj) / ∑_{i=1}^{n} P(Ai)P(B|Ai), for j = 1, 2, . . . , n

Bayes rule also applies when the number of events n = ∞

Radar Example: Recall that A is the event that the aircraft is flying above and B is the event that the aircraft is detected by the radar. What is the probability that an aircraft is actually there given that the radar indicates a detection?

Recall P(A) = 0.05, P(B|A) = 0.99, P(B|Ac) = 0.1. Using Bayes rule:

P(there is an aircraft | radar detects it) = P(A|B)

= P(B|A)P(A) / (P(B|A)P(A) + P(B|Ac)P(Ac))

= 0.05 × 0.99 / (0.05 × 0.99 + 0.95 × 0.1)

= 0.3426
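The Bayes computation for the radar example, as a numerical sketch:

```python
# Bayes rule for the radar example.
P_A, P_B_given_A, P_B_given_Ac = 0.05, 0.99, 0.10
P_B = P_A * P_B_given_A + (1 - P_A) * P_B_given_Ac   # total probability: 0.1445
P_A_given_B = P_B_given_A * P_A / P_B
print(round(P_A_given_B, 4))   # 0.3426
```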

EE 178: Basic Probability Page 1 33

Binary Communication Channel

Consider a noisy binary communication channel, where 0 or 1 is sent and 0 or 1 is received. Assume that 0 is sent with probability 0.2 (and 1 is sent with probability 0.8)

The channel is noisy. If a 0 is sent, a 0 is received with probability 0.9, and if a 1 is sent, a 1 is received with probability 0.975

We can represent this channel model by a probability transition diagram

[Transition diagram: input 0 (P({0}) = 0.2) goes to output 0 with P({0}|{0}) = 0.9 and to output 1 with 0.1; input 1 (P({1}) = 0.8) goes to output 1 with P({1}|{1}) = 0.975 and to output 0 with 0.025]

Given that 0 is received, find the probability that 0 was sent

EE 178: Basic Probability Page 1 34

This is a random experiment with sample space Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}, where the first entry is the bit sent and the second is the bit received

Define the two events

A = {0 is sent} = {(0, 1), (0, 0)}, and

B = {0 is received} = {(0, 0), (1, 0)}

The probability measure is defined via the P(A), P(B|A), and P(Bc|Ac) provided on the probability transition diagram of the channel

To find P(A|B), the a posteriori probability that a 0 was sent, we use Bayes rule

P(A|B) = P(B|A)P(A) / (P(A)P(B|A) + P(Ac)P(B|Ac)),

to obtain

P(A|B) = 0.9 × 0.2 / (0.9 × 0.2 + 0.025 × 0.8) = 0.18/0.20 = 0.9,

which is much higher than the a priori probability of A (= 0.2)
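The same Bayes computation for the channel, as a sketch (variable names are illustrative):

```python
# A posteriori probability that 0 was sent given that 0 was received.
p0_sent = 0.2
p0_received_given_0 = 0.9      # P(B|A)
p0_received_given_1 = 0.025    # P(B|A^c): a sent 1 flips to a received 0
posterior = (p0_received_given_0 * p0_sent /
             (p0_received_given_0 * p0_sent +
              p0_received_given_1 * (1 - p0_sent)))
print(posterior)   # 0.18 / 0.20 = 0.9
```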

EE 178: Basic Probability Page 1 35

Independence

It often happens that the knowledge that a certain event B has occurred has no effect on the probability that another event A has occurred, i.e.,

P(A|B) = P(A)

In this case we say that the two events are statistically independent

Equivalently, two events are said to be statistically independent if

P(A,B) = P(A)P(B)

So, in this case, P(A|B) = P(A) and P(B|A) = P(B)

Example: Assuming that the binary channel of the previous example is used to send two bits independently, what is the probability that both bits are in error?

EE 178: Basic Probability Page 1 36

Solution:

Define the two events

E1 = {First bit is in error}

E2 = {Second bit is in error}

Since the bits are sent independently, the probability that both are in error is

P(E1, E2) = P(E1)P(E2)

Also by symmetry, P(E1) = P(E2)

To find P(E1), we express E1 in terms of the events A1 and B1 (the events A and B for the first bit) as

E1 = (A1 ∩ Bc1) ∪ (Ac1 ∩ B1),

EE 178: Basic Probability Page 1 37

Thus,

P(E1) = P(A1, Bc1) + P(Ac1, B1)

= P(A1)P(Bc1|A1) + P(Ac1)P(B1|Ac1)

= 0.2 × 0.1 + 0.8 × 0.025 = 0.04

The probability that the two bits are in error is

P(E1, E2) = (0.04)^2 = 16 × 10^−4
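The single-bit error probability and the two-bit product can be checked numerically; a sketch:

```python
# P(bit in error) = P(A1)P(B1^c|A1) + P(A1^c)P(B1|A1^c), then square
# the result, since the two bits are sent independently.
p_error = 0.2 * 0.1 + 0.8 * 0.025   # 0.04
p_both = p_error ** 2               # 16e-4
print(p_error, p_both)
```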

In general, A1, A2, . . . , An are mutually independent if for each subset of the events Ai1, Ai2, . . . , Aik

P(Ai1, Ai2, . . . , Aik) = ∏_{j=1}^{k} P(Aij)

Note: P(A1, A2, . . . , An) = ∏_{j=1}^{n} P(Aj) alone is not sufficient for independence

EE 178: Basic Probability Page 1 38

Example: Roll two fair dice independently. Define the events

A = {First die = 1, 2, or 3}

B = {First die = 2, 3, or 6}

C = {Sum of outcomes = 9}

Are A, B , and C independent?

EE 178: Basic Probability Page 1 39

Solution:

Since the dice are fair and the experiments are performed independently, the probability of any pair of outcomes is 1/36, and

P(A) = 1/2, P(B) = 1/2, P(C) = 1/9

Since A ∩ B ∩ C = {(3, 6)}, P(A,B,C) = 1/36 = P(A)P(B)P(C)

But are A, B, and C independent? Let's find

P(A,B) = 2/6 = 1/3 ≠ 1/4 = P(A)P(B),

and thus A, B, and C are not independent!
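The counterexample can be verified by enumerating all 36 outcomes; a sketch using exact fractions:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def prob(event):
    # Exact probability of an event given as a predicate on outcomes.
    return Fraction(sum(event(w) for w in outcomes), len(outcomes))

A = lambda w: w[0] in (1, 2, 3)
B = lambda w: w[0] in (2, 3, 6)
C = lambda w: sum(w) == 9

pA, pB, pC = prob(A), prob(B), prob(C)
pABC = prob(lambda w: A(w) and B(w) and C(w))
pAB = prob(lambda w: A(w) and B(w))
print(pABC == pA * pB * pC)   # the triple product identity holds (1/36)
print(pAB == pA * pB)         # but pairwise fails: 1/3 vs 1/4
```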

EE 178: Basic Probability Page 1 40

Also, independence of subsets does not necessarily imply independence

Example: Flip a fair coin twice independently. Define the events:

A: First toss is Head

B: Second toss is Head

C: First and second toss have different outcomes

A and B are independent, A and C are independent, and B and C are independent

Are A, B, C mutually independent?

Clearly not, since if you know A and B, you know that C could not have occurred, i.e., P(A,B,C) = 0 ≠ P(A)P(B)P(C) = 1/8

EE 178: Basic Probability Page 1 41

Counting

Discrete uniform law:

Finite sample space where all sample points are equally probable:

P(A) = (number of sample points in A) / (total number of sample points)

Variation: all outcomes in A are equally likely, each with probability p. Then,

P(A) = p × (number of elements of A)

In both cases, we compute probabilities by counting

EE 178: Basic Probability Page 1 42

Basic Counting Principle

Procedure (multiplication rule):

r steps

ni choices at step i

Number of choices is n1 × n2 × · · · × nr

Example:

Number of license plates with 3 letters followed by 4 digits:

With no repetition (replacement), the number is:

Example: Consider a set of objects {s1, s2, . . . , sn}. How many subsets of these objects are there (including the set itself and the empty set)?

Each object may or may not be in the subset (2 options)

The number of subsets is:

EE 178: Basic Probability Page 1 43

Counting can be very tricky

Classic example: Throw three 6-sided dice. Is the probability that the sum of the outcomes is 11 equal to the probability that the sum of the outcomes is 12?

De Méré argued that they are, since the number of different ways to obtain 11 and 12 are the same:

Sum = 11: {6, 4, 1}, {6, 3, 2}, {5, 5, 1}, {5, 4, 2}, {5, 3, 3}, {4, 4, 3}

Sum = 12: {6, 5, 1}, {6, 4, 2}, {6, 3, 3}, {5, 5, 2}, {5, 4, 3}, {4, 4, 4}

This turned out to be false. Why?
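The resolution is that the 216 ordered triples, not the unordered sets, are the equally likely outcomes, and the unordered sets do not all contain the same number of ordered triples. A counting sketch:

```python
from itertools import product

# Count ORDERED triples (the actual equally likely outcomes),
# not unordered sets of face values.
rolls = list(product(range(1, 7), repeat=3))
n11 = sum(sum(r) == 11 for r in rolls)
n12 = sum(sum(r) == 12 for r in rolls)
print(n11, n12)   # 27 vs 25: sum 11 is strictly more likely than sum 12
```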

EE 178: Basic Probability Page 1 44

Basic Types of Counting

Assume we have n distinct objects a1, a2, . . . , an, e.g., digits, letters, etc.

Some basic counting questions:

How many different ordered sequences of k objects can be formed out of the n objects with replacement?

How many different ordered sequences of k ≤ n objects can be formed out of the n objects without replacement? (called k-permutations)

How many different unordered sequences (subsets) of k ≤ n objects can be formed out of the n objects without replacement? (called k-combinations)

Given r nonnegative integers n1, n2, . . . , nr that sum to n (the number of objects), how many ways can the n objects be partitioned into r subsets (unordered sequences) with the ith subset having exactly ni objects? (called partitions, a generalization of combinations)

EE 178: Basic Probability Page 1 45

Ordered Sequences With and Without Replacement

The number of ordered k-sequences from n objects with replacement is n × n × · · · × n (k times), i.e., n^k

Example: If n = 2, e.g., binary digits, the number of ordered k-sequences is 2^k

The number of different ordered sequences of k objects that can be formed from n objects without replacement, i.e., the k-permutations, is

n × (n − 1) × (n − 2) × · · · × (n − k + 1)

If k = n, the number is

n × (n − 1) × (n − 2) × · · · × 2 × 1 = n! (n factorial)

Thus the number of k-permutations is: n!/(n − k)!

Example: Consider the alphabet set {A, B, C, D}, so n = 4

The number of k = 2-permutations of n = 4 objects is 4!/2! = 12

They are: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, and DC

EE 178: Basic Probability Page 1 46

Unordered Sequences Without Replacement

Denote by (n k) (read "n choose k") the number of unordered k-sequences that can be formed out of n objects without replacement, i.e., the k-combinations

Two different ways of constructing the k-permutations:

1. Choose k objects ((n k) ways), then order them (k! possible orders). This gives (n k) × k!

2. Choose the k objects one at a time: n × (n − 1) × · · · × (n − k + 1) = n!/(n − k)! choices

Hence (n k) × k! = n!/(n − k)!, or (n k) = n!/(k!(n − k)!)

Example: The number of k = 2-combinations of {A, B, C, D} is 6

What is the number of binary sequences of length n with exactly k ≤ n ones?

Question: What is ∑_{k=0}^{n} (n k)?

EE 178: Basic Probability Page 1 47

Finding Probability Using Counting

Example: Die Rolling

Roll a fair 6-sided die 6 times independently. Find the probability that outcomes from all six rolls are different

Solution:

# outcomes yielding this event =

# of points in sample space =

Probability =
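There are 6 × 5 × 4 × · · · × 1 = 6! outcomes with all rolls different, out of 6^6 sample points; a quick numerical sketch:

```python
from math import factorial

# P(all six rolls different) = 6! / 6^6.
p = factorial(6) / 6 ** 6
print(round(p, 4))   # about 0.0154
```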

EE 178: Basic Probability Page 1 48

Example: The Birthday Problem

k people are selected at random. What is the probability that all k birthdays will be different? (neglect leap years)

Solution:

Total # ways of assigning birthdays to k people:

# of ways of assigning birthdays to k people with no two having the same birthday:

Probability:
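The resulting product 365 × 364 × · · · × (365 − k + 1) / 365^k is easy to evaluate; a sketch (the function name is illustrative):

```python
# P(all k birthdays distinct) = prod_{i=0}^{k-1} (365 - i) / 365.
def p_all_distinct(k, days=365):
    p = 1.0
    for i in range(k):
        p *= (days - i) / days
    return p

for k in (10, 23, 50):
    print(k, round(p_all_distinct(k), 4))
# At k = 23 the probability of all-distinct birthdays already drops below 1/2.
```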

EE 178: Basic Probability Page 1 49

Binomial Probabilities

Consider n independent coin tosses, where P(H) = p for 0 < p < 1

Outcome is a sequence of H's and T's of length n

P(sequence) = p^(#heads) (1 − p)^(#tails)

The probability of k heads in n tosses is thus

P(k heads) = ∑ over sequences with k heads of P(sequence)

= #(k-head sequences) × p^k (1 − p)^(n−k)

= (n k) p^k (1 − p)^(n−k)

Check that it sums to 1

EE 178: Basic Probability Page 1 50

Example: Toss a coin with bias p independently 10 times.

Define the events B = {3 out of 10 tosses are heads} and A = {first two tosses are heads}. Find P(A|B)

Solution: The conditional probability is

P(A|B) = P(A,B) / P(B)

All points in B have the same probability p^3 (1 − p)^7, so we can find the conditional probability by counting:

# points in B beginning with two heads =

# points in B =

Probability =
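The two counts can be computed with binomial coefficients; a sketch using math.comb:

```python
from math import comb

# Points in B: length-10 sequences with exactly 3 heads, all equally likely.
n_B = comb(10, 3)
# Points in B beginning with two heads: the 1 remaining head can go in any
# of the last 8 positions.
n_AB = comb(8, 1)
print(n_AB, '/', n_B)   # 8/120 = 1/15
```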

EE 178: Basic Probability Page 1 51

Partitions

Let n1, n2, n3, . . . , nr be such that

∑_{i=1}^{r} ni = n

How many ways can the n objects be partitioned into r subsets (unordered sequences) with the ith subset having exactly ni objects?

If r = 2 with subset sizes n1 and n − n1, the answer is the number of n1-combinations, (n n1)

EE 178: Basic Probability Page 1 52

(n n1 n2 . . . nr) = (n n1) × (n − n1 n2) × (n − (n1 + n2) n3) × · · · × (n − ∑_{i=1}^{r−1} ni nr)

= n!/(n1!(n − n1)!) × (n − n1)!/(n2!(n − (n1 + n2))!) × · · ·

= n!/(n1! n2! · · · nr!)

EE 178: Basic Probability Page 1 53

Example: Balls and bins

We have n balls and r bins. We throw each ball independently and at random into a bin. What is the probability that bin i = 1, 2, . . . , r will have exactly ni balls, where ∑_{i=1}^{r} ni = n?

Solution:

The probability of each outcome (sequence with ni balls in bin i) is:

# of ways of partitioning the n balls into r bins such that bin i has exactly ni balls is:

Probability:
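Putting the pieces together, the answer is the multinomial coefficient n!/(n1! · · · nr!) times (1/r)^n; a sketch (the helper name and example numbers are illustrative):

```python
from math import factorial

# P(bin i gets exactly n_i balls) = multinomial(n; n_1..n_r) / r^n,
# since each of the r^n equally likely assignments has probability r^(-n).
def p_bin_counts(counts, r):
    n = sum(counts)
    ways = factorial(n)
    for ni in counts:
        ways //= factorial(ni)      # multinomial coefficient
    return ways / r ** n

# Example: 4 balls, 3 bins, target counts (2, 1, 1).
print(p_bin_counts([2, 1, 1], 3))   # 12/81, about 0.148
```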

EE 178: Basic Probability Page 1 54

Example: Cards

Consider a perfectly shuffled 52-card deck dealt to 4 players. Find P(each player gets an ace)

Solution:

Size of the sample space is:

# of ways of distributing the four aces:

# of ways of dealing the remaining 48 cards:

Probability =
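The three counts combine into a single number; a sketch using exact arithmetic (variable names are illustrative):

```python
from math import factorial
from fractions import Fraction

# Size of the sample space: deal 52 cards into four ordered hands of 13.
total = factorial(52) // factorial(13) ** 4
# Favorable: 4! ways to give each player one ace, times the number of ways
# to split the remaining 48 cards into four hands of 12.
favorable = factorial(4) * (factorial(48) // factorial(12) ** 4)
p = Fraction(favorable, total)
print(float(p))   # about 0.1055
```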

EE 178: Basic Probability Page 1 55

Lecture Notes 2

Random Variables

Definition

Discrete Random Variables: Probability mass function (pmf)

Continuous Random Variables: Probability density function (pdf)

Mean and Variance

Cumulative Distribution Function (cdf)

Functions of Random Variables

Corresponding pages from B&T textbook: 72-83, 86, 88, 90, 140-144, 146-150, 152-157, 179-186.

EE 178: Random Variables Page 2 1

Random Variable

A random variable is a real-valued variable that takes on values randomly

Sounds nice, but not terribly precise or useful

Mathematically, a random variable (r.v.) X is a real-valued function X(ω) over the sample space Ω of a random experiment, i.e., X : Ω → R

[Figure: mapping of ω ∈ Ω to X(ω) on the real line]

Randomness comes from the fact that the outcomes ω are random (X(ω) is a deterministic function)

Notations:

Always use upper case letters for random variables (X, Y, . . .)

Always use lower case letters for values of random variables: X = x means that the random variable X takes on the value x

EE 178: Random Variables Page 2 2

Examples:

1. Flip a coin n times. Here Ω = {H, T}^n. Define the random variable X ∈ {0, 1, 2, . . . , n} to be the number of heads

2. Roll a 4-sided die twice.

(a) Define the random variable X as the maximum of the two rolls

[Figure: 4×4 grid of outcomes (1st roll vs. 2nd roll) mapped to the points 1, 2, 3, 4 on the real line]

EE 178: Random Variables Page 2 3

(b) Define the random variable Y to be the sum of the outcomes of the two rolls

(c) Define the random variable Z to be 0 if the sum of the two rolls is odd and 1 if it is even

3. Flip a coin until the first heads shows up. Define the random variable X ∈ {1, 2, . . .} to be the number of flips until the first heads

4. Let Ω = R. Define the two random variables

(a) X = ω

(b) Y = +1 for ω ≥ 0, and −1 otherwise

5. n packets arrive at a node in a communication network. Here Ω is the set of arrival time sequences (t1, t2, . . . , tn) ∈ (0, ∞)^n

(a) Define the random variable N to be the number of packets arriving in the interval (0, 1]

(b) Define the random variable T to be the first interarrival time

EE 178: Random Variables Page 2 4

Why do we need random variables?

A random variable can represent the gain or loss in a random experiment, e.g., the stock market

A random variable can represent a measurement over a random experiment, e.g., noise voltage on a resistor

In most applications we care more about these costs/measurements than the underlying probability space

Very often we work directly with random variables without knowing (or caring to know) the underlying probability space

EE 178: Random Variables Page 2 5

Specifying a Random Variable

Specifying a random variable means being able to determine the probability that X ∈ A for any event A ⊂ R, e.g., any interval

To do so, we consider the inverse image of the set A under X(ω), {ω : X(ω) ∈ A}

[Figure: a set A ⊂ R and its inverse image {ω : X(ω) ∈ A} ⊂ Ω]

So, X ∈ A iff ω ∈ {ω : X(ω) ∈ A}, thus P({X ∈ A}) = P({ω : X(ω) ∈ A}), or in short

P{X ∈ A} = P{ω : X(ω) ∈ A}

EE 178: Random Variables Page 2 6

Example: Roll a fair 4-sided die twice independently. Define the r.v. X to be the maximum of the two rolls. What is P{0.5 < X ≤ 2}?

[Figure: 4×4 grid of outcomes (1st roll vs. 2nd roll) mapped to the points 1, 2, 3, 4 on the real line x]

EE 178: Random Variables Page 2 7

We classify r.v.s as:

Discrete: X can assume only one of a countable number of values. Such an r.v. can be specified by a Probability Mass Function (pmf). Examples 1, 2, 3, 4(b), and 5(a) are of discrete r.v.s

Continuous: X can assume one of a continuum of values and the probability of each value is 0. Such an r.v. can be specified by a Probability Density Function (pdf). Examples 4(a) and 5(b) are of continuous r.v.s

Mixed: X is neither discrete nor continuous. Such an r.v. (as well as discrete and continuous r.v.s) can be specified by a Cumulative Distribution Function (cdf)

EE 178: Random Variables Page 2 8

• Discrete Random Variables

A random variable is said to be discrete if P{X ∈ X} = 1 for some countable set X ⊂ R, i.e., X = {x1, x2, . . .}

Examples 1, 2, 3, 4(b), and 5(a) are discrete random variables

Here X(ω) partitions Ω into the sets {ω : X(ω) = xi} for i = 1, 2, . . .. Therefore, to specify X, it suffices to know P{X = xi} for all i

[Figure: the real line with the possible values x1, x2, x3, . . . , xn marked]

A discrete random variable is thus completely specified by its probability mass function (pmf)

pX(x) = P{X = x} for all x ∈ X

Clearly pX(x) ≥ 0 and Σ_{x ∈ X} pX(x) = 1

Example: Roll a fair 4-sided die twice independently, and define the r.v. X to be the maximum of the two rolls

[Figure: the same 4 × 4 grid of equally likely outcomes, with each outcome mapped to x = max(1st roll, 2nd roll)]

Counting outcomes gives the pmf

pX(1) = 1/16, pX(2) = 3/16, pX(3) = 5/16, pX(4) = 7/16

• Note that pX(x) can be viewed as a probability measure over a discrete sample space (even though the original sample space may be continuous, as in examples 4(b) and 5(a))

The probability of any event A ⊂ R is given by

P{X ∈ A} = Σ_{x ∈ A ∩ X} pX(x)

For the previous example, P{1 < X ≤ 2.5 or X ≥ 3.5} = pX(2) + pX(4) = 3/16 + 7/16 = 5/8

Notation: We use X ∼ pX(x), or simply X ∼ p(x), to mean that the discrete random variable X has pmf pX(x) or p(x)
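These pmf values are easy to verify by brute-force enumeration, since all 16 outcomes are equally likely. A short sketch (the helper `max_die_pmf` is ours, not part of the notes):

```python
from fractions import Fraction
from itertools import product

def max_die_pmf(sides=4):
    """pmf of X = max of two independent fair rolls of a `sides`-sided die."""
    outcomes = list(product(range(1, sides + 1), repeat=2))  # 16 equally likely pairs
    pmf = {}
    for r1, r2 in outcomes:
        x = max(r1, r2)
        pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))
    return pmf

pmf = max_die_pmf()
print(pmf)                 # pX: 1 -> 1/16, 2 -> 3/16, 3 -> 5/16, 4 -> 7/16
print(pmf[1] + pmf[2])     # P{0.5 < X <= 2} = 1/4
```

Using exact fractions avoids any floating-point rounding when checking that the pmf sums to 1.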


Famous Probability Mass Functions

• Bernoulli: X ∼ Bern(p) for 0 ≤ p ≤ 1 has pmf

pX(1) = p, and pX(0) = 1 − p

Geometric: X ∼ Geom(p) for 0 ≤ p ≤ 1 has pmf

pX(k) = p(1 − p)^(k−1) for k = 1, 2, . . .

[Figure: pmf pX(k) with height p at k = 1, decaying geometrically over k = 2, 3, 4, . . .]

This r.v. represents, for example, the number of coin flips until the first heads shows up (assuming independent coin flips)

• Binomial: X ∼ B(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has pmf

pX(k) = (n choose k) p^k (1 − p)^(n−k) for k = 0, 1, 2, . . . , n

[Figure: pmf pX(k) over k = 0, 1, . . . , n, rising to a peak at k* and then decaying]

The maximum of pX(k) is attained at

k* = { (n + 1)p and (n + 1)p − 1, if (n + 1)p is an integer; ⌊(n + 1)p⌋, otherwise }

The binomial r.v. represents, for example, the number of heads in n independent coin flips (see page 48 of Lecture Notes 1)

Poisson: X ∼ Poisson(λ) for λ > 0 (called the rate) has pmf

pX(k) = (λ^k / k!) e^(−λ) for k = 0, 1, 2, . . .

[Figure: pmf pX(k) over k = 0, 1, 2, . . ., rising to a peak near k = λ and then decaying]

The maximum of pX(k) is attained at

k* = { λ and λ − 1, if λ is an integer; ⌊λ⌋, otherwise }

The Poisson r.v. often represents the number of random events, e.g., arrivals of packets, photons, customers, etc., in some time interval, e.g., (0, 1]

• Poisson is the limit of Binomial with p = λ/n, as n → ∞

To show this, let Xn ∼ B(n, λ/n) for λ > 0. For any fixed nonnegative integer k,

pXn(k) = (n choose k) (λ/n)^k (1 − λ/n)^(n−k)

= [n(n−1) · · · (n−k+1) / k!] × (λ^k / n^k) × ((n − λ)/n)^(n−k)

= [n(n−1) · · · (n−k+1) / (n − λ)^k] × (λ^k / k!) × ((n − λ)/n)^n

= [n(n−1) · · · (n−k+1) / ((n−λ)(n−λ) · · · (n−λ))] × (λ^k / k!) × (1 − λ/n)^n

→ (λ^k / k!) e^(−λ) as n → ∞,

since the first factor → 1 and (1 − λ/n)^n → e^(−λ)
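The limit can also be checked numerically: holding λ fixed and letting n grow with p = λ/n, the Binomial pmf at any fixed k approaches the Poisson pmf. A small sketch using only standard-library functions (the values of λ, k, and n are arbitrary choices):

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    # B(n, p) pmf: (n choose k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # Poisson(lam) pmf: lam^k / k! * e^(-lam)
    return lam**k / factorial(k) * exp(-lam)

lam, k = 3.0, 2
for n in (10, 100, 10_000):
    print(n, binom_pmf(n, lam / n, k))   # approaches the Poisson value below
print("Poisson:", poisson_pmf(lam, k))
```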


Continuous Random Variables

• Suppose a r.v. X can take on a continuum of values, each with probability 0

Examples:

Pick a number between 0 and 1

Measure the voltage across a heated resistor

Measure the phase of a random sinusoid . . .

How do we describe probabilities of interesting events?

Idea: For a discrete r.v., we sum a pmf over points in a set to find its probability. For a continuous r.v., we integrate a probability density over a set to find its probability, analogous to mass density in physics (integrate mass density to get the mass)


• Probability Density Function

A continuous r.v. X can be specified by a probability density function fX(x) (pdf) such that, for any event A,

P{X ∈ A} = ∫_A fX(x) dx

For example, for A = (a, b], the probability can be computed as

P{X ∈ (a, b]} = ∫_a^b fX(x) dx

Properties of fX(x):

1. fX(x) ≥ 0

2. ∫_{−∞}^{∞} fX(x) dx = 1

Important note: fX(x) should not be interpreted as the probability that X = x. In fact, it is not a probability measure, e.g., it can be > 1


Can relate fX(x) to a probability using the mean value theorem for integrals: Fix x and some Δx > 0. Then, provided fX is continuous over (x, x + Δx],

P{X ∈ (x, x + Δx]} = ∫_x^{x+Δx} fX(ξ) dξ

= fX(c) Δx for some x ≤ c ≤ x + Δx

Now, if Δx is sufficiently small, then

P{X ∈ (x, x + Δx]} ≈ fX(x) Δx

Notation: X ∼ fX(x) means that X has pdf fX(x)


• Famous Probability Density Functions

Uniform: X ∼ U[a, b] for b > a has the pdf

fX(x) = { 1/(b − a) for a ≤ x ≤ b; 0 otherwise }

[Figure: the pdf fX(x), constant at height 1/(b − a) over the interval [a, b]]

The uniform r.v. is commonly used to model quantization noise and finite precision computation error (roundoff error)


Exponential: X ∼ Exp(λ) for λ > 0 has the pdf

fX(x) = { λe^(−λx) for x ≥ 0; 0 otherwise }

[Figure: the pdf fX(x), starting at height λ at x = 0 and decaying exponentially]

The exponential r.v. is commonly used to model the interarrival time in a queue, i.e., the time between two consecutive packet or customer arrivals, the service time in a queue, the lifetime of a particle, etc.


• Example: Let X ∼ Exp(0.1) be the customer service time at a bank (in minutes). The person ahead of you has been served for 5 minutes. What is the probability that you will wait another 5 minutes or more before getting served?

We want to find P{X > 10 | X > 5}

Solution: By the definition of conditional probability,

P{X > 10 | X > 5} = P{X > 10, X > 5} / P{X > 5}

= P{X > 10} / P{X > 5}

= ∫_{10}^{∞} 0.1e^(−0.1x) dx / ∫_{5}^{∞} 0.1e^(−0.1x) dx

= e^(−1) / e^(−0.5) = e^(−0.5),

but P{X > 5} = e^(−0.5), i.e., the conditional probability of waiting more than 5 additional minutes given that you have already waited more than 5 minutes is the same as the unconditional probability of waiting more than 5 minutes!
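This memorylessness is easy to confirm both in closed form, using P{X > t} = e^(−λt), and by simulation. A sketch (the sample size and seed are arbitrary choices of ours):

```python
from math import exp
import random

lam = 0.1
# Closed form: for X ~ Exp(lam), P{X > t} = exp(-lam * t)
p_cond = exp(-lam * 10) / exp(-lam * 5)   # P{X > 10 | X > 5}
p_uncond = exp(-lam * 5)                  # P{X > 5}
print(p_cond, p_uncond)                   # equal: the memoryless property

# Monte Carlo sanity check of the same identity
random.seed(0)
samples = [random.expovariate(lam) for _ in range(200_000)]
waited_5 = [s for s in samples if s > 5]
est_cond = sum(s > 10 for s in waited_5) / len(waited_5)
print(est_cond)   # close to exp(-0.5) = 0.6065...
```

Note that `random.expovariate` is parameterized by the rate λ, matching the Exp(λ) convention in these notes.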


This is because the exponential r.v. is memoryless, which in general means that for any 0 ≤ x1 < x,

P{X > x | X > x1} = P{X > x − x1}

To show this, consider

P{X > x | X > x1} = P{X > x, X > x1} / P{X > x1}

= P{X > x} / P{X > x1}

= ∫_x^{∞} λe^(−λξ) dξ / ∫_{x1}^{∞} λe^(−λξ) dξ

= e^(−λx) / e^(−λx1) = e^(−λ(x − x1))

= P{X > x − x1}


• Gaussian: X ∼ N(μ, σ²) has pdf

fX(x) = (1 / √(2πσ²)) e^(−(x − μ)² / (2σ²)) for −∞ < x < ∞,

where μ is the mean and σ² is the variance

[Figure: the bell-shaped N(μ, σ²) pdf, centered at x = μ]

Gaussian r.v.s are frequently encountered in nature, e.g., thermal and shot noise in electronic devices are Gaussian, and they are very frequently used in modelling various social, biological, and other phenomena


Mean and Variance

• A discrete (continuous) r.v. is completely specified by its pmf (pdf)

It is often desirable to summarize the r.v. or predict its outcome in terms of one or a few numbers. What do we expect the value of the r.v. to be? What range of values around the mean do we expect the r.v. to take? Such information can be provided by the mean and standard deviation of the r.v.

First we consider discrete r.v.s

Let X ∼ pX(x). The expected value (or mean) of X is defined as

E(X) = Σ_{x ∈ X} x pX(x)

Interpretations: If we view probabilities as relative frequencies, the mean would be the weighted sum of the relative frequencies. If we view probabilities as point masses, the mean would be the center of mass of the set of mass points


• Example: If the weather is good, which happens with probability 0.6, Alice walks the 2 miles to class at a speed of 5 miles/hr; otherwise she rides a bus at a speed of 30 miles/hr. What is the expected time to get to class?

Solution: Define the discrete r.v. T to take the value (2/5) hr with probability 0.6 and (2/30) hr with probability 0.4. The expected value of T is

E(T) = (2/5) × 0.6 + (2/30) × 0.4 = 4/15 hr

The second moment (or mean square or average power) of X is defined as

E(X²) = Σ_{x ∈ X} x² pX(x)

For the previous example, the second moment is

E(T²) = (2/5)² × 0.6 + (2/30)² × 0.4 = 22/225 hr²

The variance of X is defined as

Var(X) = E[(X − E(X))²] = Σ_{x ∈ X} (x − E(X))² pX(x)


Note that Var(X) ≥ 0. The variance has the interpretation of the moment of inertia about the center of mass for a set of mass points

The standard deviation of X is defined as σ_X = √Var(X)

Variance in terms of mean and second moment: Expanding the square and using the linearity of the sum, we obtain

Var(X) = E[(X − E(X))²] = Σ_x (x − E(X))² pX(x)

= Σ_x (x² − 2x E(X) + [E(X)]²) pX(x)

= Σ_x x² pX(x) − 2E(X) Σ_x x pX(x) + [E(X)]² Σ_x pX(x)

= E(X²) − 2[E(X)]² + [E(X)]²

= E(X²) − [E(X)]²

Note that since for any r.v. Var(X) ≥ 0, we have E(X²) ≥ (E(X))²

So, for our example, Var(T) = 22/225 − (4/15)² = 6/225 ≈ 0.0267
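The identity Var(X) = E(X²) − (E(X))² can be sanity-checked on the commute-time example by computing the variance both ways:

```python
# Commute-time r.v. from the example:
# T = 2/5 hr w.p. 0.6 (walk), T = 2/30 hr w.p. 0.4 (bus)
pmf = {2 / 5: 0.6, 2 / 30: 0.4}

mean = sum(t * p for t, p in pmf.items())                # E(T) = 4/15
second_moment = sum(t**2 * p for t, p in pmf.items())    # E(T^2) = 22/225

var_by_definition = sum((t - mean) ** 2 * p for t, p in pmf.items())
var_by_identity = second_moment - mean**2

print(mean, second_moment)
print(var_by_definition, var_by_identity)   # both about 0.0267
```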


• Mean and Variance of Famous Discrete R.V.s

Bernoulli r.v. X ∼ Bern(p): The mean is

E(X) = p × 1 + (1 − p) × 0 = p

The second moment is

E(X²) = p × 1² + (1 − p) × 0² = p

Thus the variance is

Var(X) = E(X²) − (E(X))² = p − p² = p(1 − p)

Binomial r.v. X ∼ B(n, p): It is not easy to find the mean and variance using the definition. Later we use a much simpler method to show that

E(X) = np

Var(X) = np(1 − p)

Just n times the mean (variance) of a Bernoulli!


Geometric r.v. X ∼ Geom(p): The mean is

E(X) = Σ_{k=1}^{∞} k p(1 − p)^(k−1)

= (p / (1 − p)) Σ_{k=1}^{∞} k (1 − p)^k

= (p / (1 − p)) Σ_{k=0}^{∞} k (1 − p)^k

= (p / (1 − p)) × ((1 − p) / p²) = 1/p

Note: We found the series sum using the following trick:

Recall that for |a| < 1, the sum of the geometric series is

Σ_{k=0}^{∞} a^k = 1 / (1 − a)


• Suppose we want to evaluate

Σ_{k=0}^{∞} k a^k

Multiply both sides of the geometric series sum by a to obtain

Σ_{k=0}^{∞} a^(k+1) = a / (1 − a)

Now differentiate both sides with respect to a. The LHS gives

(d/da) Σ_{k=0}^{∞} a^(k+1) = Σ_{k=0}^{∞} (k + 1) a^k = (1/a) Σ_{k=0}^{∞} k a^k

The RHS gives

(d/da) (a / (1 − a)) = 1 / (1 − a)²

Thus we have

Σ_{k=0}^{∞} k a^k = a / (1 − a)²


The second moment can be similarly evaluated to obtain

E(X²) = (2 − p) / p²

The variance is thus

Var(X) = E(X²) − (E(X))² = (1 − p) / p²


• Poisson r.v. X ∼ Poisson(λ): The mean is given by

E(X) = Σ_{k=0}^{∞} k (λ^k / k!) e^(−λ)

= e^(−λ) Σ_{k=1}^{∞} k (λ^k / k!)

= e^(−λ) Σ_{k=1}^{∞} λ^k / (k − 1)!

= λ e^(−λ) Σ_{k=1}^{∞} λ^(k−1) / (k − 1)!

= λ e^(−λ) Σ_{k=0}^{∞} λ^k / k!

= λ e^(−λ) e^λ = λ

One can show that the variance is also equal to λ
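Both claims, mean λ and variance λ, can be checked numerically by truncating the Poisson sums (the cutoff of 150 terms is our choice; for the λ used here the dropped tail is negligible, and it keeps k! within floating-point range):

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    # Poisson(lam) pmf: lam^k / k! * e^(-lam)
    return lam**k / factorial(k) * exp(-lam)

lam = 4.0
ks = range(150)   # truncate the infinite sums
mean = sum(k * poisson_pmf(lam, k) for k in ks)
second_moment = sum(k**2 * poisson_pmf(lam, k) for k in ks)
variance = second_moment - mean**2
print(mean, variance)   # both close to lam = 4.0
```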


Mean and Variance for Continuous R.V.s

• Now consider a continuous r.v. X ∼ fX(x). The expected value (or mean) of X is defined as

E(X) = ∫_{−∞}^{∞} x fX(x) dx

This has the interpretation of the center of mass for a mass density

The second moment and variance are similarly defined as

E(X²) = ∫_{−∞}^{∞} x² fX(x) dx

Var(X) = E[(X − E(X))²] = E(X²) − (E(X))²


• Uniform r.v. X ∼ U[a, b]: The mean, second moment, and variance are given by

E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_a^b x (1/(b − a)) dx = (a + b)/2

E(X²) = ∫_{−∞}^{∞} x² fX(x) dx

= ∫_a^b x² (1/(b − a)) dx

= (1/(b − a)) [x³/3]_a^b = (b³ − a³) / (3(b − a))

Var(X) = E(X²) − (E(X))²

= (b³ − a³) / (3(b − a)) − (a + b)²/4 = (b − a)²/12

Thus, for X ∼ U[0, 1], E(X) = 1/2 and Var(X) = 1/12
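These uniform moments can be verified by crude numerical integration; the midpoint rule below and the interval [2, 5] are our illustrative choices, not from the notes:

```python
def integrate(f, a, b, n=200_000):
    # Crude midpoint-rule numerical integration of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 5.0
pdf = lambda x: 1 / (b - a)   # U[a, b] density
mean = integrate(lambda x: x * pdf(x), a, b)
second_moment = integrate(lambda x: x**2 * pdf(x), a, b)
variance = second_moment - mean**2
print(mean, variance)   # (a+b)/2 = 3.5 and (b-a)^2/12 = 0.75
```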


Exponential r.v. X ∼ Exp(λ): The mean and variance are given by

E(X) = ∫_{−∞}^{∞} x fX(x) dx

= ∫_0^∞ x λe^(−λx) dx

= (−x e^(−λx))|_0^∞ + ∫_0^∞ e^(−λx) dx (integration by parts)

= 0 − (1/λ) e^(−λx)|_0^∞ = 1/λ

E(X²) = ∫_0^∞ x² λe^(−λx) dx = 2/λ²

Var(X) = 2/λ² − 1/λ² = 1/λ²

For a Gaussian r.v. X ∼ N(μ, σ²), the mean is μ and the variance is σ² (we will show this later using transforms)


• Mean and Variance for Famous R.V.s

Random Variable    Mean         Variance
Bern(p)            p            p(1 − p)
Geom(p)            1/p          (1 − p)/p²
B(n, p)            np           np(1 − p)
Poisson(λ)         λ            λ
U[a, b]            (a + b)/2    (b − a)²/12
Exp(λ)             1/λ          1/λ²
N(μ, σ²)           μ            σ²


Cumulative Distribution Function (cdf)

• For discrete r.v.s we use pmfs; for continuous r.v.s we use pdfs

Many real-world r.v.s are mixed, that is, have both discrete and continuous components

Example: A packet arrives at a router in a communication network. If the input buffer is empty (which happens with probability p), the packet is serviced immediately. Otherwise the packet must wait for a random amount of time as characterized by a pdf

Define the r.v. X to be the packet service time. X is neither discrete nor continuous

There is a third probability function that characterizes all random variable types: discrete, continuous, and mixed. The cumulative distribution function (cdf) FX(x) of a random variable is defined by

FX(x) = P{X ≤ x} for x ∈ (−∞, ∞)


• Properties of the cdf:

Like the pmf (but unlike the pdf), the cdf is the probability of something. Hence, 0 ≤ FX(x) ≤ 1

The normalization axiom implies that

FX(∞) = 1, and FX(−∞) = 0

FX(x) is monotonically nondecreasing, i.e., if a > b then FX(a) ≥ FX(b)

The probability of any event can be easily computed from a cdf, e.g., for an interval (a, b],

P{X ∈ (a, b]} = P{a < X ≤ b}

= P{X ≤ b} − P{X ≤ a} (additivity)

= FX(b) − FX(a)

The probability of any outcome a is P{X = a} = FX(a) − FX(a⁻)


If a r.v. is discrete, its cdf consists of a set of steps

[Figure: the pmf pX(x) with point masses at x = 1, 2, 3, 4 and the corresponding staircase cdf FX(x) rising to 1]

If X is a continuous r.v. with pdf fX(x), then

FX(x) = P{X ≤ x} = ∫_{−∞}^{x} fX(ξ) dξ

So, the cdf of a continuous r.v. X is continuous

[Figure: a continuous cdf FX(x) increasing smoothly from 0 to 1]


• In fact, the precise way to define a continuous random variable is through the continuity of its cdf. Further, if FX(x) is differentiable (almost everywhere), then

fX(x) = dFX(x)/dx = lim_{Δx→0} (FX(x + Δx) − FX(x)) / Δx

= lim_{Δx→0} P{x < X ≤ x + Δx} / Δx

The cdf of a mixed r.v. has the general form

[Figure: a cdf FX(x) increasing from 0 to 1 with both jumps and continuous segments]


Examples

• cdf of a uniform r.v.:

FX(x) = { 0 if x < a; ∫_a^x (1/(b − a)) dξ = (x − a)/(b − a) if a ≤ x ≤ b; 1 if x > b }

[Figure: the U[a, b] pdf of height 1/(b − a) on [a, b], and the cdf rising linearly from 0 at x = a to 1 at x = b]


• cdf of an exponential r.v.:

FX(x) = 0 for x < 0, and FX(x) = 1 − e^(−λx) for x ≥ 0

[Figure: the Exp(λ) pdf decaying from λ at x = 0, and the cdf rising from 0 toward 1]
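As a final check, the exponential cdf can be coded directly and used to compute interval probabilities via FX(b) − FX(a); the rate λ = 0.5 below is an arbitrary illustrative choice:

```python
from math import exp

def exp_cdf(lam, x):
    # cdf of Exp(lam): 0 for x < 0, 1 - e^(-lam*x) for x >= 0
    return 0.0 if x < 0 else 1.0 - exp(-lam * x)

lam = 0.5
# P{a < X <= b} = F(b) - F(a)
p_interval = exp_cdf(lam, 4) - exp_cdf(lam, 2)
print(exp_cdf(lam, -1), exp_cdf(lam, 0), p_interval)
```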
