EE 178 Lecture Notes 0
Course Introduction
About EE178
About Probability
Course Goals
Course Topics
Lecture Notes
EE 178: Course Introduction Page 0 1
EE 178
EE 178 provides an introduction to probabilistic system analysis:
Basic probability theory
Some statistics
Examples from EE systems
Fulfills the EE undergraduate probability and statistics requirement (alternatives include Stat 116 and Math 151)
Very important background for EEs
Signal processing
Communication
Communication networks
Reliability of systems
Noise in circuits
Also important in many other fields, including
Computer science (Artificial intelligence)
Economics
Finance (stock market)
Physics (statistical physics, quantum physics)
Medicine (drug trials)
Knowledge of probability is almost as necessary as calculus and linear algebra
Probability Theory
Probability provides mathematical models for random phenomena and experiments, such as: gambling, the stock market, packet transmission in networks, electron emission, noise, statistical mechanics, etc.
Its origin lies in observations associated with games of chance:
If an unbiased coin is tossed n times and heads come up nH times, the relative frequency of heads, nH/n, is likely to be very close to 1/2
If a card is drawn from a perfectly shuffled deck and then replaced, the deck is reshuffled, and the process is repeated n times, the relative frequency of spades is around 1/4
The purpose of probability theory is to describe and predict such relative frequencies (averages) in terms of probabilities of events
But how is probability defined?
Classical definition of probability: This was the prevailing definition for many centuries. Define the probability of an event A as

P(A) = NA/N,

where N is the number of possible outcomes of the random experiment and NA is the number of outcomes favorable to the event A
For example, for a 6-sided die there are 6 outcomes and 3 of them are even, thus P(even) = 3/6
Problems with this classical definition:
Here the assumption is that all outcomes are equally likely (probable). Thus, the concept of probability is used to define probability itself! It cannot be used as the basis for a mathematical theory
In many random experiments, the outcomes are not equally likely
The definition doesn't work when the number of possible outcomes is infinite
Relative frequency definition of probability: Here the probability of an event A is defined as

P(A) = lim_{n→∞} nA/n,

where nA is the number of times A occurs in n performances of the experiment
Problems with the frequency definition:
How do we assert that such a limit exists?
We often cannot perform the experiment multiple times (or even once, e.g., defining the probability of a nuclear plant failure)
Axiomatic Definition of Probability
The axiomatic definition of probability was introduced by Kolmogorov in 1933. It provides rules for assigning probabilities to events in a mathematically consistent way and for deducing probabilities of events from probabilities of other events
Elements of the axiomatic definition:
Set Ω of all possible outcomes of the random experiment (sample space)
Set of events, which are subsets of Ω
A probability law (measure or function) that assigns probabilities to events such that:
1. P(A) ≥ 0
2. P(Ω) = 1, and
3. If A and B are disjoint events, i.e., A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B)
These rules are consistent with relative frequency interpretation
Unlike the relative frequency definition, where the limit is assumed to exist, the axiomatic approach itself is used to prove when and under what conditions such a limit exists (Laws of Large Numbers) and how fast it is approached (Central Limit Theorem)
The theory provides mathematically precise ways for dealing with experiments with an infinite number of possible outcomes, defining random variables, etc.
The theory does not deal with what the values of the probabilities are or how they are obtained. Any assignment of probabilities that satisfies the axioms is legitimate
How do we find the actual probabilities? They are obtained from:
Physics, e.g., Boltzmann, Bose-Einstein, Fermi-Dirac statistics
Engineering models, e.g., queuing models for communication networks, models for noisy communication channels
Social science theories, e.g., intelligence, behavior
Empirically. Collect data and fit a model (may have no physical, engineering,or social meaning ...)
Statistics and Statistical Signal Processing
Statistics applies probability theory to real-world problems. It deals with:
Methods for data collection and construction of experiments described by probabilistic models, and
Methods for making inferences from data, e.g., estimating probabilities, deciding if a drug is effective from drug trials
Statistical signal processing:
Develops probabilistic models for EE signals, e.g., speech, audio, images, video, and systems, e.g., communication, storage, compression, error correction, pattern recognition
Uses tools from probability and statistics to design signal processing algorithms to extract, predict, estimate, detect, compress, . . . such signals
The field has provided the foundation for the Information Age. Its impact on our lives has been huge: cell phones, DSL, digital cameras, DVD, MP3 players, digital TV, medical imaging, etc.
Course Goals
Provide an introduction to probability theory (precise but not overly mathematical)
Provide, through examples, some applications of probability to EE systems
Provide some exposure to statistics and statistical signal processing
Provide background for more advanced courses in statistical signal processing and communication (e.g., EE 278, 279)
Have a lot of fun solving cool probability problems
WARNING: To really understand probability you have to work homework problems yourself. Just reading solutions or having others give you solutions will not do
Course Outline
1. Sample Space and Probability (B&T: Chapter 1)
Set theory
Sample space: discrete, continuous
Events
Probability law
Conditional probability; chain rule
Law of total probability
Bayes rule
Independence
Counting
2. Random Variables (B&T: 2.1-2.4, 3.1-3.3, 3.6 pp. 179-186)
Basic concepts
Probability mass function (pmf)
Famous discrete random variables
Probability density function (pdf)
Famous continuous random variables
Mean and variance
Cumulative distribution function (cdf)
Functions of random variables
3. Multiple Random Variables (B&T: 2.5-2.7, 3.4-3.5, 3.6 pp. 186-190, 4.2)
Joint, marginal, and conditional pmf: Bayes rule for pmfs, independence
Joint, marginal, and conditional pdf: Bayes rule for pdfs, independence
Functions of multiple random variables
4. Expectation (B&T: 2.4, 3.1 pp. 144-148, 4.3, 4.5, 4.6)
Definition and properties
Correlation and covariance
Sum of random variables
Conditional expectation
Iterated expectation
Mean square error estimation
5. Transforms (B&T: 4.1)
6. Limit Theorems (B&T: 7.1, 7.2, 4.1, 7.4)
Sample mean
Markov and Chebyshev inequalities
Weak law of large numbers
The Central Limit Theorem
Confidence intervals
7. Random Processes (B&T: 5.1, 6.1-6.4)
Basic concepts
Bernoulli process
Markov chains
Lecture Notes
Based on the B&T text, Prof. Gray's EE 178 notes, and other sources
Help organize the material and reduce note-taking in lectures
You will need to take some notes, e.g., clarifications, missing steps in derivations, solutions to additional examples
Slide title indicates a topic that often continues over several consecutive slides
Lecture notes + your notes + review sessions should be sufficient (B&T is only recommended)
EE 178 Lecture Notes 1
Basic Probability
Set Theory
Elements of Probability
Conditional probability
Sequential Calculation of Probability
Total Probability and Bayes Rule
Independence
Counting
EE 178: Basic Probability Page 1 1
Set Theory Basics
A set is a collection of objects, which are its elements
ω ∈ A means that ω is an element of the set A
A set with no elements is called the empty set, denoted by ∅
Types of sets:
Finite: A = {1, 2, . . . , n}
Countably infinite: A = {1, 2, . . .}, e.g., the set of integers
Uncountable: A set that takes a continuous set of values, e.g., the [0, 1] interval, the real line, etc.
A set can be described by all ω having a certain property, e.g., A = [0, 1] can be written as A = {ω : 0 ≤ ω ≤ 1}
A set B ⊂ A means that every element of B is an element of A
A universal set Ω contains all objects of particular interest in a particular context, e.g., the sample space for a random experiment
Set Operations
Assume a universal set Ω
Three basic operations:
Complementation: The complement of a set A with respect to Ω is Ac = {ω ∈ Ω : ω ∉ A}, so Ωc = ∅
Intersection: A ∩ B = {ω : ω ∈ A and ω ∈ B}
Union: A ∪ B = {ω : ω ∈ A or ω ∈ B}
Notation:
∪_{i=1}^n Ai = A1 ∪ A2 ∪ . . . ∪ An
∩_{i=1}^n Ai = A1 ∩ A2 ∩ . . . ∩ An
A collection of sets A1, A2, . . . , An are disjoint or mutually exclusive if Ai ∩ Aj = ∅ for all i ≠ j, i.e., no two of them have a common element
A collection of sets A1, A2, . . . , An partition Ω if they are disjoint and ∪_{i=1}^n Ai = Ω
Venn Diagrams
[Venn diagrams: (a) Ω, (b) A, (c) B, (d) Bc, (e) A ∩ B, (f) A ∪ B]
Algebra of Sets
Basic relations:
1. S ∪ ∅ = S
2. (Ac)c = A
3. A ∪ Ac = Ω
4. Commutative law: A ∪ B = B ∪ A
5. Associative law: A ∪ (B ∪ C) = (A ∪ B) ∪ C
6. Distributive law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
7. DeMorgan's law: (A ∪ B)c = Ac ∩ Bc
DeMorgan's law can be generalized to n events:
(∪_{i=1}^n Ai)c = ∩_{i=1}^n (Ai)c
These can all be proven using the definitions of the set operations or visualized using Venn Diagrams
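The relations above can also be checked mechanically on small examples. Here is a quick sketch using Python's built-in set type (the universal set and the particular sets A, B, C are arbitrary choices for illustration):

```python
# Verify De Morgan's law and the distributive law on a small universe
Omega = set(range(10))  # stand-in universal set
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8}

def comp(S):
    """Complement of S with respect to Omega."""
    return Omega - S

# De Morgan: (A ∪ B)^c = A^c ∩ B^c
assert comp(A | B) == comp(A) & comp(B)

# Distributive law: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
assert A | (B & C) == (A | B) & (A | C)

# Generalized De Morgan over a collection of sets
sets = [A, B, C]
assert comp(set.union(*sets)) == set.intersection(*(comp(S) for S in sets))
```

Such checks on particular sets do not prove the identities, but they are a useful sanity test when translating set algebra into code.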
Elements of Probability
Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage
Basic elements of probability:
Sample space Ω: The set of all possible elementary or finest-grain outcomes of the random experiment (also called sample points)
The sample points are all disjoint
The sample points are collectively exhaustive, i.e., together they make up the entire sample space
Events: Subsets of the sample space
Probability law: An assignment of probabilities to events in a mathematically consistent way
Discrete Sample Spaces
The sample space Ω is called discrete if it contains a countable number of sample points
Examples:
Flip a coin once: Ω = {H, T}
Flip a coin three times: Ω = {HHH, HHT, HTH, . . .} = {H, T}^3
Flip a coin n times: Ω = {H, T}^n (set of sequences of H and T of length n)
Roll a die once: Ω = {1, 2, 3, 4, 5, 6}
Roll a die twice: Ω = {(1, 1), (1, 2), (2, 1), . . . , (6, 6)} = {1, 2, 3, 4, 5, 6}^2
Flip a coin until the first heads appears: Ω = {H, TH, TTH, TTTH, . . .}
Number of packets arriving in time interval (0, T] at a node in a communication network: Ω = {0, 1, 2, 3, . . .}
Note that the first five examples have finite Ω, whereas the last two have countably infinite Ω. Both types are called discrete
Sequential models: For sequential experiments, the sample space can be described in terms of a tree, where each outcome corresponds to a terminal node (or a leaf)
Example: Three flips of a coin
[Tree diagram: the first flip branches to H or T, each branch splits again on the second flip and again on the third, giving eight leaves: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT]
Example: Roll a fair four-sided die twice.
Sample space can be represented by a tree as above, or graphically
[Figure: 4 × 4 grid of points, 1st roll (1 to 4) on the horizontal axis and 2nd roll (1 to 4) on the vertical axis; each point is one of the 16 outcomes]
Continuous Sample Spaces
A continuous sample space consists of a continuum of points and thus contains an uncountable number of points
Examples:
Random number between 0 and 1: Ω = [0, 1]
Suppose we pick two numbers at random between 0 and 1, then the sample space consists of all points in the unit square, i.e., Ω = [0, 1]^2
[Figure: the unit square, with x and y axes running from 0 to 1.0]
Packet arrival time: t ∈ (0, ∞), thus Ω = (0, ∞)
Arrival times for n packets: ti ∈ (0, ∞), for i = 1, 2, . . . , n, thus Ω = (0, ∞)^n
A sample space is said to be mixed if it is neither discrete nor continuous, e.g., Ω = [0, 1] ∪ {3}
Events
Events are subsets of the sample space. An event occurs if the outcome of the experiment belongs to the event
Examples:
Any outcome (sample point) is an event (also called an elementary event), e.g., {HTH} in the three-coin-flips experiment or {0.35} in the experiment of picking a random number between 0 and 1
Flip a coin 3 times and get exactly one H. This is a more complicated event, consisting of three sample points {TTH, THT, HTT}
Flip a coin 3 times and get an odd number of Hs. The event is {TTH, THT, HTT, HHH}
Pick a random number between 0 and 1 and get a number between 0.0 and 0.5. The event is [0, 0.5]
An event might have no points in it, i.e., be the empty set
Axioms of Probability
Probability law (measure or function) is an assignment of probabilities to events (subsets of the sample space Ω) such that the following three axioms are satisfied:
1. P(A) ≥ 0, for all A (nonnegativity)
2. P(Ω) = 1 (normalization)
3. If A and B are disjoint (A ∩ B = ∅), then
P(A ∪ B) = P(A) + P(B) (additivity)
More generally,
3. If the sample space has an infinite number of points and A1, A2, . . . are disjoint events, then
P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai)
Mimics relative frequency, i.e., perform the experiment n times (e.g., roll a die n times). If the number of occurrences of A is nA, define the relative frequency of an event A as fA = nA/n
Probabilities are nonnegative (like relative frequencies)
Probability something happens is 1 (again like relative frequencies)
Probabilities of disjoint events add (again like relative frequencies)
Analogy: Except for normalization, probability is a measure much like
mass
length
area
volume
They all satisfy axioms 1 and 3
This analogy provides some intuition but is not sufficient to fully understand probability theory; other aspects, such as conditioning and independence, are unique to probability
Probability for Discrete Sample Spaces
Recall that a sample space Ω is said to be discrete if it is countable
The probability measure P can be simply defined by first assigning probabilities to outcomes, i.e., elementary events {ω}, such that:
P({ω}) ≥ 0, for all ω ∈ Ω, and
∑_{ω∈Ω} P({ω}) = 1
The probability of any other event A (by the additivity axiom) is simply
P(A) = ∑_{ω∈A} P({ω})
Examples:
For the coin flipping experiment, assign
P({H}) = p and P({T}) = 1 − p, for 0 ≤ p ≤ 1
Note: p is the bias of the coin, and a coin is fair if p = 1/2
For the die rolling experiment, assign
P({i}) = 1/6, for i = 1, 2, . . . , 6
The probability of the event "the outcome is even," A = {2, 4, 6}, is
P(A) = P({2}) + P({4}) + P({6}) = 1/2
If Ω is countably infinite, we can again assign probabilities to elementary events
Example: Assume Ω = {1, 2, . . .} and assign probability 2^−k to the event {k}
The probability of the event "the outcome is even" is
P(outcome is even) = P({2, 4, 6, 8, . . .})
= P({2}) + P({4}) + P({6}) + . . .
= ∑_{k=1}^∞ P({2k})
= ∑_{k=1}^∞ 2^−2k = 1/3
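The closed-form value 1/3 can be sanity-checked numerically by truncating the geometric sums (a quick sketch, not part of the original notes):

```python
# The assignment P({k}) = 2^-k is a valid probability law, and
# P(even) = sum over k of 2^-(2k) = (1/4)/(1 - 1/4) = 1/3
total = sum(2.0 ** -k for k in range(1, 60))         # over all outcomes
p_even = sum(2.0 ** -(2 * k) for k in range(1, 60))  # over even outcomes

assert abs(total - 1.0) < 1e-12     # probabilities sum to 1 (normalization)
assert abs(p_even - 1 / 3) < 1e-12  # matches the closed form
```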
Probability for Continuous Sample Space
Recall that if a sample space is continuous, Ω is uncountably infinite
For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes
To see why, consider assigning a uniform probability measure to Ω = (0, 1]
In this case the probability of each single outcome event is zero
How do we find the probability of an event such as A = [1/2, 3/4]?
For this example we can define a uniform probability measure over [0, 1] by assigning to an event A the probability
P(A) = length of A,
e.g., P([0, 1/3] ∪ [2/3, 1]) = 2/3
Check that this is a legitimate assignment
Another example: Romeo and Juliet have a date. Each arrives late with a random delay of up to 1 hour. Each will wait only 1/4 of an hour before leaving. What is the probability that Romeo and Juliet will meet?
Solution: The pair of delays is equivalent to that achievable by picking two random numbers between 0 and 1. Define the probability of an event as its area
The event of interest is represented by the cross-hatched region
[Figure: unit square with the band |x − y| ≤ 1/4 cross-hatched]
Probability of the event is:
area of cross-hatched region = 1 − 2 × (1/2)(0.75)^2 = 0.4375
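The area computation can be cross-checked with a short Monte Carlo simulation (a sketch; the exact answer is 0.4375):

```python
import random

random.seed(1)
n = 200_000
# Romeo and Juliet meet iff their delays differ by at most 1/4 hour
meet = sum(abs(random.random() - random.random()) <= 0.25 for _ in range(n))
estimate = meet / n

exact = 1 - 2 * 0.5 * 0.75 ** 2  # subtract the two corner triangles
assert abs(estimate - exact) < 0.01
```

With 200,000 samples, the standard deviation of the estimate is about 0.001, so the simulated value agrees with the geometric answer to two decimal places.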
Basic Properties of Probability
There are several useful properties that can be derived from the axioms of probability:
1. P(Ac) = 1 − P(A); P(∅) = 0; P(A) ≤ 1
2. If A ⊂ B, then P(A) ≤ P(B)
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
4. P(A ∪ B) ≤ P(A) + P(B), or in general
P(∪_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai)
This is called the Union of Events Bound
These properties can be proved using the axioms of probability and visualized using Venn diagrams
Conditional Probability
Conditional probability allows us to compute probabilities of events based on partial knowledge of the outcome of a random experiment
Examples:
We are told that the sum of the outcomes from rolling a die twice is 9. What is the probability that the outcome of the first die was a 6?
A spot shows up on a radar screen. What is the probability that there is an aircraft?
You receive a 0 at the output of a digital communication system. What is the probability that a 0 was sent?
As we shall see, conditional probability provides us with two methods for computing probabilities of events: the sequential method and the divide-and-conquer method
It is also the basis of inference in statistics: make an observation and reason about the cause
In general, given that an event B has occurred, we wish to find the probability of another event A, P(A|B)
If all elementary outcomes are equally likely, then
P(A|B) = (# of outcomes in both A and B) / (# of outcomes in B)
In general, if B is an event such that P(B) ≠ 0, the conditional probability of any event A given B is defined as
P(A|B) = P(A ∩ B)/P(B), or P(A,B)/P(B)
The function P(·|B) for fixed B specifies a probability law, i.e., it satisfies the axioms of probability
Example
Roll a fair four-sided die twice. So, the sample space is {1, 2, 3, 4}^2. All sample points have probability 1/16
Let B be the event that the minimum of the two die rolls is 2 and Am, for m = 1, 2, 3, 4, be the event that the maximum of the two die rolls is m. Find P(Am|B)
Solution:
[Figure: the 4 × 4 grid of outcomes with the points of B = {(2, 2), (2, 3), (2, 4), (3, 2), (4, 2)} marked, next to a bar plot of P(Am|B) versus m = 1, 2, 3, 4]
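Since all 16 sample points are equally likely, P(Am|B) can be computed by direct enumeration (a quick sketch):

```python
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 5) for j in range(1, 5)]  # 16 points
B = [w for w in outcomes if min(w) == 2]                       # min of rolls is 2

# P(Am|B) = |Am ∩ B| / |B| for equally likely outcomes
p = {m: Fraction(sum(max(w) == m for w in B), len(B)) for m in range(1, 5)}
# P(A1|B) = 0, P(A2|B) = 1/5, P(A3|B) = 2/5, P(A4|B) = 2/5
```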
Conditional Probability Models
Before: Probability law → conditional probabilities
The reverse is often more natural: conditional probabilities → probability law
We use the chain rule (also called the multiplication rule):
By the definition of conditional probability, P(A ∩ B) = P(A|B)P(B). Suppose that A1, A2, . . . , An are events, then
P(A1 ∩ A2 ∩ A3 ∩ · · · ∩ An)
= P(A1 ∩ A2 ∩ · · · ∩ An−1) × P(An|A1 ∩ A2 ∩ · · · ∩ An−1)
= P(A1 ∩ A2 ∩ · · · ∩ An−2) × P(An−1|A1 ∩ A2 ∩ · · · ∩ An−2) × P(An|A1 ∩ A2 ∩ · · · ∩ An−1)
...
= P(A1) × P(A2|A1) × P(A3|A1 ∩ A2) × · · · × P(An|A1 ∩ A2 ∩ · · · ∩ An−1)
= ∏_{i=1}^n P(Ai|A1, A2, . . . , Ai−1),
where A0 = Ω
Sequential Calculation of Probabilities
Procedure:
1. Construct a tree description of the sample space for a sequential experiment
2. Assign the conditional probabilities on the corresponding branches of the tree
3. By the chain rule, the probability of an outcome can be obtained by multiplying the conditional probabilities along the path from the root to the leaf node corresponding to the outcome
Example (Radar Detection): Let A be the event that an aircraft is flying above and B be the event that the radar detects it. Assume P(A) = 0.05, P(B|A) = 0.99, and P(B|Ac) = 0.1
What is the probability of
Missed detection? i.e., P(A ∩ Bc)
False alarm? i.e., P(B ∩ Ac)
The sample space is: Ω = {(A,B), (Ac, B), (A,Bc), (Ac, Bc)}
Solution: Represent the sample space by a tree with conditional probabilities on its edges
[Tree: the root splits into A with P(A) = 0.05 and Ac with P(Ac) = 0.95; from A, P(B|A) = 0.99 leads to (A,B) and P(Bc|A) = 0.01 leads to (A,Bc) (missed detection); from Ac, P(B|Ac) = 0.10 leads to (Ac, B) (false alarm) and P(Bc|Ac) = 0.90 leads to (Ac, Bc)]
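Multiplying the conditional probabilities along each path of the tree gives the two error probabilities (a quick sketch of the sequential method):

```python
p_a = 0.05        # P(A): aircraft present
p_b_a = 0.99      # P(B|A): detection given present
p_b_ac = 0.10     # P(B|Ac): detection given absent

p_missed = p_a * (1 - p_b_a)        # P(A ∩ Bc) = 0.05 × 0.01
p_false_alarm = (1 - p_a) * p_b_ac  # P(Ac ∩ B) = 0.95 × 0.10

assert abs(p_missed - 0.0005) < 1e-12
assert abs(p_false_alarm - 0.095) < 1e-12
```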
Example: Three cards are drawn at random (without replacement) from a deck of cards. What is the probability of not drawing a heart?
Solution: Let Ai, i = 1, 2, 3, represent the event of no heart in the ith draw. We can represent the sample space as:
Ω = {(A1, A2, A3), (Ac1, A2, A3), . . . , (Ac1, Ac2, Ac3)}
To find the probability law, we represent the sample space by a tree, write conditional probabilities on branches, and use the chain rule
[Tree: P(A1) = 39/52 (not heart) vs. P(Ac1) = 13/52 (heart); from A1, P(A2|A1) = 38/51 vs. 13/51; from (A1, A2), P(A3|A1, A2) = 37/50 vs. 13/50. By the chain rule, P(not heart, not heart, not heart) = (39/52)(38/51)(37/50)]
Total Probability: Divide and Conquer Method
Let A1, A2, . . . , An be events that partition Ω, i.e., that are disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ∪_{i=1}^n Ai = Ω. Then for any event B
P(B) = ∑_{i=1}^n P(Ai ∩ B) = ∑_{i=1}^n P(Ai)P(B|Ai)
This is called the Law of Total Probability. It also holds for n = ∞. It allows us to compute the probability of a complicated event from knowledge of probabilities of simpler events
Example: Chess tournament, 3 types of opponents for a certain player.
P(Type 1) = 0.5, P(Win|Type 1) = 0.3
P(Type 2) = 0.25, P(Win|Type 2) = 0.4
P(Type 3) = 0.25, P(Win|Type 3) = 0.5
What is the probability of the player winning?
Solution: Let B be the event of winning and Ai be the event of playing Type i, i = 1, 2, 3:
P(B) = ∑_{i=1}^3 P(Ai)P(B|Ai)
= 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5
= 0.375
Bayes Rule
Let A1, A2, . . . , An be nonzero probability events (the causes) that partition Ω, and let B be a nonzero probability event (the effect)
We often know the a priori probabilities P(Ai), i = 1, 2, . . . , n, and the conditional probabilities P(B|Ai), and wish to find the a posteriori probabilities P(Aj|B) for j = 1, 2, . . . , n
From the definition of conditional probability, we know that
P(Aj|B) = P(B, Aj)/P(B) = [P(B|Aj)/P(B)] × P(Aj)
By the law of total probability
P(B) = ∑_{i=1}^n P(Ai)P(B|Ai)
Substituting we obtain Bayes rule
P(Aj|B) = [P(B|Aj) / ∑_{i=1}^n P(Ai)P(B|Ai)] × P(Aj), for j = 1, 2, . . . , n
Bayes rule also applies when the number of events n = ∞
Radar Example: Recall that A is the event that the aircraft is flying above and B is the event that the aircraft is detected by the radar. What is the probability that an aircraft is actually there given that the radar indicates a detection?
Recall P(A) = 0.05, P(B|A) = 0.99, P(B|Ac) = 0.1. Using Bayes rule:
P(there is an aircraft|radar detects it) = P(A|B)
= P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac)P(Ac)]
= (0.05 × 0.99) / (0.05 × 0.99 + 0.95 × 0.1)
= 0.3426
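The same Bayes-rule computation as a quick sketch in code, with the denominator from the law of total probability:

```python
p_a, p_b_a, p_b_ac = 0.05, 0.99, 0.10

p_b = p_a * p_b_a + (1 - p_a) * p_b_ac  # P(B) by total probability
posterior = p_b_a * p_a / p_b            # Bayes rule: P(A|B)

assert round(posterior, 4) == 0.3426
```

Note how weak the evidence of a single detection is here: even with a 99% detection rate, the 10% false-alarm rate and the small prior P(A) = 0.05 keep the posterior near 1/3.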
Binary Communication Channel
Consider a noisy binary communication channel, where 0 or 1 is sent and 0 or 1 is received. Assume that 0 is sent with probability 0.2 (and 1 is sent with probability 0.8)
The channel is noisy. If a 0 is sent, a 0 is received with probability 0.9, and if a 1 is sent, a 1 is received with probability 0.975
We can represent this channel model by a probability transition diagram
[Transition diagram: input 0 with P({0}) = 0.2 goes to output 0 with P({0}|{0}) = 0.9 and to output 1 with probability 0.1; input 1 with P({1}) = 0.8 goes to output 1 with P({1}|{1}) = 0.975 and to output 0 with probability 0.025]
Given that 0 is received, find the probability that 0 was sent
This is a random experiment with sample space Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}, where the first entry is the bit sent and the second is the bit received
Define the two events
A = {0 is sent} = {(0, 1), (0, 0)}, and
B = {0 is received} = {(0, 0), (1, 0)}
The probability measure is defined via the P(A), P(B|A), and P(Bc|Ac) provided on the probability transition diagram of the channel
To find P(A|B), the a posteriori probability that a 0 was sent, we use Bayes rule
P(A|B) = P(B|A)P(A) / [P(A)P(B|A) + P(Ac)P(B|Ac)],
to obtain
P(A|B) = (0.9 × 0.2) / (0.2 × 0.9 + 0.8 × 0.025) = 0.18/0.2 = 0.9,
which is much higher than the a priori probability of A (= 0.2)
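A sketch of the channel computation in code; observing a 0 raises the probability that a 0 was sent from the prior 0.2 to 0.9:

```python
p0 = 0.2      # P(0 sent)
p_00 = 0.9    # P(0 received | 0 sent)
p_11 = 0.975  # P(1 received | 1 sent)

# P(0 received): correct reception of a 0 plus a flipped 1
p_rx0 = p0 * p_00 + (1 - p0) * (1 - p_11)
posterior = p0 * p_00 / p_rx0  # P(0 sent | 0 received)

assert abs(posterior - 0.9) < 1e-9
```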
Independence
It often happens that the knowledge that a certain event B has occurred has no effect on the probability that another event A has occurred, i.e.,
P(A|B) = P(A)
In this case we say that the two events are statistically independent
Equivalently, two events are said to be statistically independent if
P(A,B) = P(A)P(B)
So, in this case, P(A|B) = P(A) and P(B|A) = P(B)
Example: Assuming that the binary channel of the previous example is used to send two bits independently, what is the probability that both bits are in error?
Solution:
Define the two events
E1 = {First bit is in error}
E2 = {Second bit is in error}
Since the bits are sent independently, the probability that both are in error is
P(E1, E2) = P(E1)P(E2)
Also by symmetry, P(E1) = P(E2)
To find P(E1), we express E1 in terms of the events A and B as
E1 = (A1 ∩ Bc1) ∪ (Ac1 ∩ B1),
Thus,
P(E1) = P(A1, Bc1) + P(Ac1, B1)
= P(A1)P(Bc1|A1) + P(Ac1)P(B1|Ac1)
= 0.2 × 0.1 + 0.8 × 0.025 = 0.04
The probability that the two bits are in error is
P(E1, E2) = (0.04)^2 = 16 × 10^−4
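The bit-error calculation can be sketched the same way; independence of the two channel uses lets us square the single-bit error probability:

```python
p0, p_00, p_11 = 0.2, 0.9, 0.975

# A bit is in error if a 0 flips to 1 or a 1 flips to 0 (total probability)
p_err = p0 * (1 - p_00) + (1 - p0) * (1 - p_11)
p_both = p_err ** 2  # independent uses of the channel

assert abs(p_err - 0.04) < 1e-9
assert abs(p_both - 16e-4) < 1e-9
```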
In general, A1, A2, . . . , An are mutually independent if for each subset of the events Ai1, Ai2, . . . , Aik
P(Ai1, Ai2, . . . , Aik) = ∏_{j=1}^k P(Aij)
Note: P(A1, A2, . . . , An) = ∏_{i=1}^n P(Ai) alone is not sufficient for independence
Example: Roll two fair dice independently. Define the events
A = {First die = 1, 2, or 3}
B = {First die = 2, 3, or 6}
C = {Sum of outcomes = 9}
Are A, B , and C independent?
Solution:
Since the dice are fair and the experiments are performed independently, the probability of any pair of outcomes is 1/36, and
P(A) = 1/2, P(B) = 1/2, P(C) = 1/9
Since A ∩ B ∩ C = {(3, 6)}, P(A,B,C) = 1/36 = P(A)P(B)P(C)
But are A, B, and C independent? Let's find
P(A,B) = 2/6 = 1/3 ≠ 1/4 = P(A)P(B),
and thus A, B, and C are not independent!
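The probabilities in this example can be verified by enumerating the 36 equally likely pairs (a quick sketch):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

in_A = lambda w: w[0] in (1, 2, 3)
in_B = lambda w: w[0] in (2, 3, 6)
in_C = lambda w: w[0] + w[1] == 9

pABC = prob(lambda w: in_A(w) and in_B(w) and in_C(w))
pAB = prob(lambda w: in_A(w) and in_B(w))

assert pABC == prob(in_A) * prob(in_B) * prob(in_C)  # triple product holds
assert pAB != prob(in_A) * prob(in_B)                # but A and B fail pairwise
```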
Also, independence of subsets does not necessarily imply independence
Example: Flip a fair coin twice independently. Define the events:
A: First toss is Head
B : Second toss is Head
C : First and second toss have different outcomes
A and B are independent, A and C are independent, and B and C are independent
Are A,B,C mutually independent?
Clearly not, since if you know A and B, you know that C could not have occurred, i.e., P(A,B,C) = 0 ≠ P(A)P(B)P(C) = 1/8
Counting
Discrete uniform law:
Finite sample space where all sample points are equally probable:
P(A) = (number of sample points in A) / (total number of sample points)
Variation: all outcomes in A are equally likely, each with probability p. Then,
P(A) = p × (number of elements of A)
In both cases, we compute probabilities by counting
Basic Counting Principle
Procedure (multiplication rule):
r steps
ni choices at step i
Number of choices is n1 × n2 × · · · × nr
Example:
Number of license plates with 3 letters followed by 4 digits:
With no repetition (replacement), the number is:
Example: Consider a set of objects {s1, s2, . . . , sn}. How many subsets of these objects are there (including the set itself and the empty set)?
Each object may or may not be in the subset (2 options)
The number of subsets is:
De Méré's Paradox
Counting can be very tricky
Classic example: Throw three 6-sided dice. Is the probability that the sum of the outcomes is 11 equal to the probability that the sum of the outcomes is 12?
De Méré argued that they are, since the number of different ways to obtain 11 and 12 are the same:
Sum=11: {6, 4, 1}, {6, 3, 2}, {5, 5, 1}, {5, 4, 2}, {5, 3, 3}, {4, 4, 3}
Sum=12: {6, 5, 1}, {6, 4, 2}, {6, 3, 3}, {5, 5, 2}, {5, 4, 3}, {4, 4, 4}
This turned out to be false. Why?
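The flaw is that unordered combinations are not equally likely: {6, 4, 1} occurs in 3! = 6 orders while {4, 4, 4} occurs in only one. Enumerating ordered triples, which are equally likely, resolves the paradox (a quick sketch):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=3))  # 216 equally likely ordered triples
n11 = sum(1 for r in rolls if sum(r) == 11)
n12 = sum(1 for r in rolls if sum(r) == 12)

assert (n11, n12) == (27, 25)  # P(sum = 11) = 27/216 > P(sum = 12) = 25/216
```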
Basic Types of Counting
Assume we have n distinct objects a1, a2, . . . , an, e.g., digits, letters, etc.
Some basic counting questions:
How many different ordered sequences of k objects can be formed out of the n objects with replacement?
How many different ordered sequences of k ≤ n objects can be formed out of the n objects without replacement? (called k-permutations)
How many different unordered sequences (subsets) of k ≤ n objects can be formed out of the n objects without replacement? (called k-combinations)
Given r nonnegative integers n1, n2, . . . , nr that sum to n (the number of objects), how many ways can the n objects be partitioned into r subsets (unordered sequences) with the ith subset having exactly ni objects? (called r-partitions, a generalization of combinations)
Ordered Sequences With and Without Replacement
The number of ordered k-sequences from n objects with replacement is n × n × · · · × n (k times), i.e., n^k
Example: If n = 2, e.g., binary digits, the number of ordered k-sequences is 2^k
The number of different ordered sequences of k objects that can be formed from n objects without replacement, i.e., the k-permutations, is
n × (n − 1) × (n − 2) × · · · × (n − k + 1)
If k = n, the number is
n × (n − 1) × (n − 2) × · · · × 2 × 1 = n! (n-factorial)
Thus the number of k-permutations is n!/(n − k)!
Example: Consider the alphabet set {A, B, C, D}, so n = 4
The number of 2-permutations of the n = 4 objects is 12
They are: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, and DC
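Both counts can be checked with the standard library (a quick sketch using the {A, B, C, D} example above):

```python
from itertools import permutations, product

objs = "ABCD"  # n = 4 objects
k = 2

with_repl = list(product(objs, repeat=k))  # ordered, with replacement: n^k
k_perms = list(permutations(objs, k))      # ordered, without replacement: n!/(n-k)!

assert len(with_repl) == 4 ** 2 == 16
assert len(k_perms) == 12
assert ("A", "B") in k_perms and ("B", "A") in k_perms  # order matters
```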
Unordered Sequences Without Replacement
Denote by (n choose k) the number of unordered k-sequences that can be formed out of n objects without replacement, i.e., the k-combinations
Two different ways of constructing the k-permutations:
1. Choose k objects ((n choose k) ways), then order them (k! possible orders). This gives (n choose k) × k!
2. Choose the k objects one at a time:
n × (n − 1) × · · · × (n − k + 1) = n!/(n − k)! choices
Hence (n choose k) × k! = n!/(n − k)!, or (n choose k) = n!/(k!(n − k)!)
Example: The number of 2-combinations of {A, B, C, D} is 6
They are AB, AC, AD, BC, BD, CD
What is the number of binary sequences of length n with exactly k ≤ n ones?
Question: What is ∑_{k=0}^n (n choose k)?
Finding Probability Using Counting
Example: Die Rolling
Roll a fair 6-sided die 6 times independently. Find the probability that the outcomes of all six rolls are different
Solution:
# outcomes yielding this event =
# of points in sample space =
Probability =
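The blanks above work out as 6! favorable ordered outcomes out of 6^6 total; treat this sketch as a check on your own answer:

```python
from math import factorial

n_favorable = factorial(6)  # 6-permutations of the 6 faces: 720
n_total = 6 ** 6            # all ordered outcomes of 6 rolls: 46656
p = n_favorable / n_total   # about 0.0154

assert n_total == 46656
assert abs(p - 720 / 46656) < 1e-15
```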
Example: The Birthday Problem
k people are selected at random. What is the probability that all k birthdays will be different (neglecting leap years)?
Solution:
Total # ways of assigning birthdays to k people:
# of ways of assigning birthdays to k people with no two having the same birthday:
Probability:
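A sketch of the computation: 365^k total assignments versus 365 × 364 × · · · × (365 − k + 1) assignments with all birthdays distinct; their ratio telescopes into the product below:

```python
def p_all_different(k):
    """Probability that k random birthdays are all distinct (365-day year)."""
    p = 1.0
    for i in range(k):
        p *= (365 - i) / 365  # person i must avoid the i birthdays already taken
    return p

# The probability first drops below 1/2 at k = 23
assert p_all_different(22) > 0.5 > p_all_different(23)
```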
Binomial Probabilities
Consider n independent coin tosses, where P(H) = p for 0 < p < 1
Outcome is a sequence of Hs and Ts of length n
P(sequence) = p^(# heads) × (1 − p)^(# tails)
The probability of k heads in n tosses is thus
P(k heads) = ∑_{sequences with k heads} P(sequence)
= #(k-head sequences) × p^k (1 − p)^(n−k)
= (n choose k) p^k (1 − p)^(n−k)
Check that it sums to 1
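The check can be done numerically (a quick sketch; analytically it follows from the binomial theorem applied to (p + (1 − p))^n = 1):

```python
from math import comb

def binom_pmf(n, p):
    """Binomial probabilities P(k heads), k = 0, ..., n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

for n, p in [(10, 0.3), (25, 0.5), (7, 0.9)]:
    assert abs(sum(binom_pmf(n, p)) - 1.0) < 1e-12
```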
Example: Toss a coin with bias p independently 10 times.
Define the events B = {3 out of 10 tosses are heads} and A = {first two tosses are heads}. Find P(A|B)
Solution: The conditional probability is
P(A|B) = P(A,B)/P(B)
All points in B have the same probability p^3(1 − p)^7, so we can find the conditional probability by counting:
# points in B beginning with two heads =
# points in B =
Probability =
EE 178: Basic Probability Page 1 51
Partitions
Let n1, n2, n3, . . . , nr be such that
r
i=1
ni = n
How many ways can the n objects be partitioned into r subsets (unorderedsequences) with the ith subset having exactly ni objects?
If r = 2 with n1, n n1 the answer is the n1combinations( (nn1)
)
Answer in general:
EE 178: Basic Probability Page 1 52
(
n
n1 n2 . . . nr
)
=
(
n
n1
)
(
n n1n2
)
(
n (n1 + n2)
n3
)
. . .
(
nr1
i=1 ninr
)
=n!
n1!(n n1)!
(n n1)!
n2!(n (n1 + n2))!
=n!
n1!n2! nr!
EE 178: Basic Probability Page 1 53
Example: Balls and bins
We have n balls and r bins. We throw each ball independently and at randominto a bin. What is the probability that bin i = 1, 2, . . . , r will have exactly niballs, where
ri=1 ni = n?
Solution:
The probability of each outcome (sequence of ni balls in bin i) is:
# of ways of partitioning the n balls into r bins such that bin i has exactlyni balls is:
Probability:
EE 178: Basic Probability Page 1 54
Example: Cards
Consider a perfectly shuffled 52card deck dealt to 4 players. FindP(each player gets an ace)
Solution:
Size of the sample space is:
# of ways of distributing the four aces:
# of ways of dealing the remaining 48 cards:
Probability =
EE 178: Basic Probability Page 1 55
EE 178 Lecture Notes 1
Basic Probability
Set Theory
Elements of Probability
Conditional probability
Sequential Calculation of Probability
Total Probability and Bayes Rule
Independence
Counting
EE 178: Basic Probability Page 1 1
Set Theory Basics
A set is a collection of objects, which are its elements
ω ∈ A means that ω is an element of the set A

A set with no elements is called the empty set, denoted by ∅
Types of sets:
Finite: A = {1, 2, . . . , n}
Countably infinite: A = {1, 2, . . .}, e.g., the set of integers
Uncountable: A set that takes a continuous set of values, e.g., the [0, 1]interval, the real line, etc.
A set can be described as all elements having a certain property, e.g., A = [0, 1] can be written as A = {ω : 0 ≤ ω ≤ 1}

B ⊂ A means that every element of B is an element of A

A universal set Ω contains all objects of particular interest in a particular context, e.g., the sample space for a random experiment
EE 178: Basic Probability Page 1 2
Set Operations
Assume a universal set Ω

Three basic operations:

Complementation: the complement of a set A with respect to Ω is A^c = {ω ∈ Ω : ω ∉ A}, so Ω^c = ∅

Intersection: A ∩ B = {ω : ω ∈ A and ω ∈ B}

Union: A ∪ B = {ω : ω ∈ A or ω ∈ B}

Notation:

∪_{i=1}^{n} Ai = A1 ∪ A2 ∪ · · · ∪ An

∩_{i=1}^{n} Ai = A1 ∩ A2 ∩ · · · ∩ An

A collection of sets A1, A2, . . . , An are disjoint or mutually exclusive if Ai ∩ Aj = ∅ for all i ≠ j, i.e., no two of them have a common element

A collection of sets A1, A2, . . . , An partition Ω if they are disjoint and ∪_{i=1}^{n} Ai = Ω
EE 178: Basic Probability Page 1 3
Venn Diagrams
[Venn diagrams: (a) Ω, (b) A, (c) B, (d) B^c, (e) A ∩ B, (f) A ∪ B]
EE 178: Basic Probability Page 1 4
Algebra of Sets
Basic relations:
1. S ∪ ∅ = S

2. (A^c)^c = A

3. A ∩ A^c = ∅

4. Commutative law: A ∪ B = B ∪ A

5. Associative law: A ∪ (B ∪ C) = (A ∪ B) ∪ C

6. Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

7. DeMorgan's law: (A ∪ B)^c = A^c ∩ B^c

DeMorgan's law can be generalized to n events:

(∪_{i=1}^{n} Ai)^c = ∩_{i=1}^{n} Ai^c
These can all be proven using the definition of set operations or visualized usingVenn Diagrams
EE 178: Basic Probability Page 1 5
Elements of Probability
Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage
Basic elements of probability:
Sample space Ω: The set of all possible elementary or finest-grain outcomes of the random experiment (also called sample points). The sample points are all disjoint and collectively exhaustive, i.e., together they make up the entire sample space
Events: Subsets of the sample space
Probability law: An assignment of probabilities to events in a mathematically consistent way
EE 178: Basic Probability Page 1 6
Discrete Sample Spaces
A sample space is called discrete if it contains a countable number of sample points
Examples:
Flip a coin once: Ω = {H, T}

Flip a coin three times: Ω = {HHH, HHT, HTH, . . .} = {H, T}^3

Flip a coin n times: Ω = {H, T}^n (set of sequences of H and T of length n)

Roll a die once: Ω = {1, 2, 3, 4, 5, 6}

Roll a die twice: Ω = {(1, 1), (1, 2), (2, 1), . . . , (6, 6)} = {1, 2, 3, 4, 5, 6}^2

Flip a coin until the first heads appears: Ω = {H, TH, TTH, TTTH, . . .}

Number of packets arriving in time interval (0, T] at a node in a communication network: Ω = {0, 1, 2, 3, . . .}
Note that the first five examples have finite Ω, whereas the last two have countably infinite Ω. Both types are called discrete
EE 178: Basic Probability Page 1 7
Sequential models: For sequential experiments, the sample space can be described in terms of a tree, where each outcome corresponds to a terminal node (or a leaf)
Example: Three flips of a coin

[Tree diagram: the first flip branches to H or T, each followed by H or T for the second and third flips; the eight leaves are HHH, HHT, HTH, HTT, THH, THT, TTH, TTT]
EE 178: Basic Probability Page 1 8
Example: Roll a fair foursided die twice.
Sample space can be represented by a tree as above, or graphically
[Figure: 4 × 4 grid of sample points (1st roll, 2nd roll), with the 1st roll {1, 2, 3, 4} on the horizontal axis and the 2nd roll {1, 2, 3, 4} on the vertical axis]
EE 178: Basic Probability Page 1 9
Continuous Sample Spaces
A continuous sample space consists of a continuum of points and thus contains an uncountable number of points
Examples:
Random number between 0 and 1: Ω = [0, 1]

Suppose we pick two numbers at random between 0 and 1, then the sample space consists of all points in the unit square, i.e., Ω = [0, 1]^2

[Figure: the unit square in the (x, y) plane]
EE 178: Basic Probability Page 1 10
Packet arrival time: t ∈ (0, ∞), thus Ω = (0, ∞)

Arrival times for n packets: ti ∈ (0, ∞), for i = 1, 2, . . . , n, thus Ω = (0, ∞)^n

A sample space is said to be mixed if it is neither discrete nor continuous, e.g., Ω = [0, 1] ∪ {3}
EE 178: Basic Probability Page 1 11
Events
Events are subsets of the sample space. An event occurs if the outcome of the experiment belongs to the event
Examples:
Any outcome (sample point) is an event (also called an elementary event), e.g., {HTH} in the three-coin-flips experiment, or {0.35} in the experiment of picking a random number between 0 and 1
Flip coin 3 times and get exactly one H. This is a more complicated event,consisting of three sample points {TTH, THT, HTT}
Flip coin 3 times and get an odd number of Hs. The event is{TTH, THT, HTT, HHH}
Pick a random number between 0 and 1 and get a number between 0.0 and0.5. The event is [0, 0.5]
An event might have no points in it, i.e., be the empty set
EE 178: Basic Probability Page 1 12
Axioms of Probability
A probability law (measure or function) is an assignment of probabilities to events (subsets of sample space Ω) such that the following three axioms are satisfied:

1. P(A) ≥ 0, for all A (nonnegativity)

2. P(Ω) = 1 (normalization)

3. If A and B are disjoint (A ∩ B = ∅), then

P(A ∪ B) = P(A) + P(B) (additivity)

More generally,

3′. If the sample space has an infinite number of points and A1, A2, . . . are disjoint events, then

P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)
EE 178: Basic Probability Page 1 13
Mimics relative frequency, i.e., perform the experiment n times (e.g., roll a die n times). If the number of occurrences of A is nA, define the relative frequency of an event A as fA = nA/n
Probabilities are nonnegative (like relative frequencies)
Probability something happens is 1 (again like relative frequencies)
Probabilities of disjoint events add (again like relative frequencies)
Analogy: Except for normalization, probability is a measure much like
mass
length
area
volume
They all satisfy axioms 1 and 3
This analogy provides some intuition but is not sufficient to fully understand probability theory; other aspects, such as conditioning and independence, are unique to probability
EE 178: Basic Probability Page 1 14
Probability for Discrete Sample Spaces
Recall that sample space is said to be discrete if it is countable
The probability measure P can be simply defined by first assigning probabilities to outcomes, i.e., elementary events {ω}, such that:

P({ω}) ≥ 0, for all ω ∈ Ω, and

Σ_{ω∈Ω} P({ω}) = 1

The probability of any other event A (by the additivity axiom) is simply

P(A) = Σ_{ω∈A} P({ω})
EE 178: Basic Probability Page 1 15
Examples:
For the coin flipping experiment, assign

P({H}) = p and P({T}) = 1 − p, for 0 ≤ p ≤ 1

Note: p is the bias of the coin, and a coin is fair if p = 1/2

For the die rolling experiment, assign

P({i}) = 1/6, for i = 1, 2, . . . , 6

The probability of the event "the outcome is even," A = {2, 4, 6}, is

P(A) = P({2}) + P({4}) + P({6}) = 1/2
EE 178: Basic Probability Page 1 16
If Ω is countably infinite, we can again assign probabilities to elementary events

Example: Assume Ω = {1, 2, . . .} and assign probability 2^{−k} to the event {k}

The probability of the event "the outcome is even" is

P(outcome is even) = P({2, 4, 6, 8, . . .})

= P({2}) + P({4}) + P({6}) + · · ·

= Σ_{k=1}^{∞} P({2k})

= Σ_{k=1}^{∞} 2^{−2k} = 1/3
EE 178: Basic Probability Page 1 17
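The geometric sums above are easy to check numerically. Here is a short Python sketch (not part of the notes) that truncates the two infinite series:

```python
# Truncate the infinite sums at a large N; the geometric tails vanish quickly.
N = 200

# Normalization: the assigned probabilities P({k}) = 2**-k should sum to 1
total = sum(2.0**-k for k in range(1, N + 1))

# P(outcome is even) = sum of P({2k}) = 2**-(2k) over k = 1, 2, ...
p_even = sum(2.0**-(2 * k) for k in range(1, N + 1))

print(total)   # ≈ 1.0
print(p_even)  # ≈ 1/3
```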
Probability for Continuous Sample Space
Recall that if a sample space Ω is continuous, it is uncountably infinite

For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes

To see why, consider assigning a uniform probability measure to Ω = (0, 1]

In this case the probability of each single outcome event is zero

How do we find the probability of an event such as A = [1/2, 3/4]?

For this example we can define the uniform probability measure over [0, 1] by assigning to an event A the probability

P(A) = length of A,

e.g., P([0, 1/3] ∪ [2/3, 1]) = 2/3
Check that this is a legitimate assignment
EE 178: Basic Probability Page 1 18
Another example: Romeo and Juliet have a date. Each arrives late with a random delay of up to 1 hour. Each will wait only 1/4 of an hour before leaving. What is the probability that Romeo and Juliet will meet?
Solution: The pair of delays is equivalent to that achievable by picking two random numbers between 0 and 1. Define the probability of an event as its area

The event of interest is represented by the cross-hatched region
[Figure: unit square of delays (x, y); the event |x − y| ≤ 1/4 is the cross-hatched band around the diagonal]

Probability of the event is:

area of cross-hatched region = 1 − 2 · (1/2)(0.75)^2 = 0.4375
EE 178: Basic Probability Page 1 19
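A Monte Carlo simulation is a natural cross-check here. The sketch below (my own, with an arbitrary seed) estimates the meeting probability by sampling the two delays uniformly:

```python
import random

random.seed(178)

# Monte Carlo estimate of P(|x - y| <= 1/4) for x, y uniform on [0, 1]
trials = 200_000
meet = sum(abs(random.random() - random.random()) <= 0.25 for _ in range(trials))
estimate = meet / trials

print(estimate)  # ≈ 0.4375
```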
Basic Properties of Probability
There are several useful properties that can be derived from the axioms of probability:
1. P(A^c) = 1 − P(A); hence P(∅) = 0 and P(A) ≤ 1

2. If A ⊂ B, then P(A) ≤ P(B)

3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

4. P(A ∪ B) ≤ P(A) + P(B), or in general

P(∪_{i=1}^{n} Ai) ≤ Σ_{i=1}^{n} P(Ai)

This is called the Union of Events Bound
These properties can be proved using the axioms of probability and visualizedusing Venn diagrams
EE 178: Basic Probability Page 1 20
Conditional Probability
Conditional probability allows us to compute probabilities of events based on partial knowledge of the outcome of a random experiment
Examples:
We are told that the sum of the outcomes from rolling a die twice is 9. What is the probability that the outcome of the first die was a 6?
A spot shows up on a radar screen. What is the probability that there is anaircraft?
You receive a 0 at the output of a digital communication system. What is the probability that a 0 was sent?
As we shall see, conditional probability provides us with two methods for computing probabilities of events: the sequential method and the divide-and-conquer method
It is also the basis of inference in statistics: make an observation and reasonabout the cause
EE 178: Basic Probability Page 1 21
In general, given that an event B has occurred, we wish to find the probability of another event A, P(A|B)

If all elementary outcomes are equally likely, then

P(A|B) = (# of outcomes in both A and B) / (# of outcomes in B)

In general, if B is an event such that P(B) ≠ 0, the conditional probability of any event A given B is defined as

P(A|B) = P(A ∩ B)/P(B), or P(A, B)/P(B)

The function P(·|B) for fixed B specifies a probability law, i.e., it satisfies the axioms of probability
EE 178: Basic Probability Page 1 22
Example
Roll a fair 4-sided die twice. So, the sample space is Ω = {1, 2, 3, 4}^2, and all sample points have probability 1/16
Let B be the event that the minimum of the two die rolls is 2 and Am, for m = 1, 2, 3, 4, be the event that the maximum of the two die rolls is m. Find P(Am|B)
Solution:
[Figure: the 4 × 4 grid of outcomes with the event B (minimum = 2) highlighted, alongside a plot of P(Am|B) versus m = 1, 2, 3, 4]
EE 178: Basic Probability Page 1 23
Conditional Probability Models
Before: probability law → conditional probabilities

Reverse is often more natural: conditional probabilities → probability law

We use the chain rule (also called multiplication rule):

By the definition of conditional probability, P(A ∩ B) = P(A|B)P(B). Suppose that A1, A2, . . . , An are events, then
EE 178: Basic Probability Page 1 24
P(A1 ∩ A2 ∩ A3 ∩ · · · ∩ An)

= P(A1 ∩ A2 ∩ · · · ∩ A_{n−1}) · P(An | A1 ∩ A2 ∩ · · · ∩ A_{n−1})

= P(A1 ∩ A2 ∩ · · · ∩ A_{n−2}) · P(A_{n−1} | A1 ∩ · · · ∩ A_{n−2}) · P(An | A1 ∩ · · · ∩ A_{n−1})

...

= P(A1) · P(A2|A1) · P(A3|A1 ∩ A2) · · · P(An | A1 ∩ A2 ∩ · · · ∩ A_{n−1})

= ∏_{i=1}^{n} P(Ai | A1, A2, . . . , A_{i−1}),

where A0 = Ω
EE 178: Basic Probability Page 1 25
Sequential Calculation of Probabilities
Procedure:
1. Construct a tree description of the sample space for a sequential experiment
2. Assign the conditional probabilities on the corresponding branches of the tree
3. By the chain rule, the probability of an outcome can be obtained by multiplying the conditional probabilities along the path from the root to the leaf node corresponding to the outcome
Example (Radar Detection): Let A be the event that an aircraft is flying above and B be the event that the radar detects it. Assume P(A) = 0.05, P(B|A) = 0.99, and P(B|A^c) = 0.1

What is the probability of

Missed detection? i.e., P(A ∩ B^c)

False alarm? i.e., P(B ∩ A^c)

The sample space is: Ω = {(A, B), (A^c, B), (A, B^c), (A^c, B^c)}
EE 178: Basic Probability Page 1 26
Solution: Represent the sample space by a tree with conditional probabilities onits edges
[Tree:

A with P(A) = 0.05, then B with P(B|A) = 0.99 → (A, B), or B^c with P(B^c|A) = 0.01 → (A, B^c) (missed detection)

A^c with P(A^c) = 0.95, then B with P(B|A^c) = 0.10 → (A^c, B) (false alarm), or B^c with P(B^c|A^c) = 0.90 → (A^c, B^c)]
EE 178: Basic Probability Page 1 27
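The two path probabilities can be multiplied out directly. A minimal Python sketch of the chain-rule computation (variable names are my own):

```python
# Prior and conditional probabilities from the radar example
p_A = 0.05             # P(aircraft present)
p_B_given_A = 0.99     # P(detection | present)
p_B_given_Ac = 0.10    # P(detection | absent)

# Chain rule along each path of the tree
p_missed_detection = p_A * (1 - p_B_given_A)   # P(A ∩ B^c)
p_false_alarm = (1 - p_A) * p_B_given_Ac       # P(A^c ∩ B)

print(p_missed_detection)  # ≈ 0.0005
print(p_false_alarm)       # ≈ 0.095
```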
Example: Three cards are drawn at random (without replacement) from a deckof cards. What is the probability of not drawing a heart?
Solution: Let Ai, i = 1, 2, 3, represent the event of no heart in the ith draw.We can represent the sample space as:
Ω = {(A1, A2, A3), (A1^c, A2, A3), . . . , (A1^c, A2^c, A3^c)}
To find the probability law, we represent the sample space by a tree, writeconditional probabilities on branches, and use the chain rule
EE 178: Basic Probability Page 1 28
[Tree: first draw not a heart with P(A1) = 39/52, or a heart with P(A1^c) = 13/52; given A1, second draw not a heart with P(A2|A1) = 38/51, or a heart with probability 13/51; given A1 and A2, third draw not a heart with P(A3|A1, A2) = 37/50, or a heart with probability 13/50]

By the chain rule, P(no heart in three draws) = (39/52)(38/51)(37/50)
EE 178: Basic Probability Page 1 29
Total Probability: the Divide-and-Conquer Method
Let A1, A2, . . . , An be events that partition Ω, i.e., they are disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ∪_{i=1}^{n} Ai = Ω. Then for any event B

P(B) = Σ_{i=1}^{n} P(Ai ∩ B) = Σ_{i=1}^{n} P(Ai)P(B|Ai)

This is called the Law of Total Probability. It also holds for n = ∞. It allows us to compute the probability of a complicated event from knowledge of probabilities of simpler events
Example: Chess tournament, 3 types of opponents for a certain player.
P(Type 1) = 0.5, P(Win | Type 1) = 0.3

P(Type 2) = 0.25, P(Win | Type 2) = 0.4

P(Type 3) = 0.25, P(Win | Type 3) = 0.5
What is probability of player winning?
EE 178: Basic Probability Page 1 30
Solution: Let B be the event of winning and Ai be the event of playing Type i, i = 1, 2, 3:

P(B) = Σ_{i=1}^{3} P(Ai)P(B|Ai)

= 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5

= 0.375
EE 178: Basic Probability Page 1 31
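The divide-and-conquer sum is a one-liner in code. A small sketch (names are mine, not from the notes):

```python
# (P(Type i), P(Win | Type i)) pairs from the chess example
opponents = [(0.5, 0.3), (0.25, 0.4), (0.25, 0.5)]

# Law of total probability: P(B) = sum_i P(A_i) * P(B | A_i)
p_win = sum(p_type * p_win_given for p_type, p_win_given in opponents)

print(p_win)  # ≈ 0.375
```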
Bayes Rule
Let A1, A2, . . . , An be nonzero probability events (the "causes") that partition Ω, and let B be a nonzero probability event (the "effect")

We often know the a priori probabilities P(Ai), i = 1, 2, . . . , n, and the conditional probabilities P(B|Ai), and wish to find the a posteriori probabilities P(Aj|B) for j = 1, 2, . . . , n

From the definition of conditional probability, we know that

P(Aj|B) = P(B, Aj)/P(B) = [P(B|Aj)/P(B)] P(Aj)

By the law of total probability

P(B) = Σ_{i=1}^{n} P(Ai)P(B|Ai)
EE 178: Basic Probability Page 1 32
Substituting, we obtain Bayes rule

P(Aj|B) = P(B|Aj)P(Aj) / Σ_{i=1}^{n} P(Ai)P(B|Ai), for j = 1, 2, . . . , n

Bayes rule also applies when the number of events n = ∞

Radar Example: Recall that A is the event that the aircraft is flying above and B is the event that the aircraft is detected by the radar. What is the probability that an aircraft is actually there given that the radar indicates a detection?

Recall P(A) = 0.05, P(B|A) = 0.99, P(B|A^c) = 0.1. Using Bayes rule:

P(there is an aircraft | radar detects it) = P(A|B)

= P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)]

= (0.05 × 0.99) / (0.05 × 0.99 + 0.95 × 0.1)

= 0.3426
EE 178: Basic Probability Page 1 33
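The posterior can be checked in a few lines of Python; the sketch below simply restates Bayes rule with the slide's numbers:

```python
p_A = 0.05           # a priori probability of an aircraft
p_B_given_A = 0.99   # detection probability
p_B_given_Ac = 0.10  # false-alarm rate

# Bayes rule, with the law of total probability in the denominator
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B

print(round(p_A_given_B, 4))  # ≈ 0.3426
```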
Binary Communication Channel
Consider a noisy binary communication channel, where 0 or 1 is sent and 0 or 1 is received. Assume that 0 is sent with probability 0.2 (and 1 is sent with probability 0.8)

The channel is noisy: if a 0 is sent, a 0 is received with probability 0.9, and if a 1 is sent, a 1 is received with probability 0.975
We can represent this channel model by a probability transition diagram
[Transition diagram, input on the left and output on the right:

0 → 0 with P({0 received} | {0 sent}) = 0.9, and 0 → 1 with probability 0.1

1 → 1 with P({1 received} | {1 sent}) = 0.975, and 1 → 0 with probability 0.025

Input probabilities: P({0 sent}) = 0.2, P({1 sent}) = 0.8]
Given that 0 is received, find the probability that 0 was sent
EE 178: Basic Probability Page 1 34
This is a random experiment with sample space Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}, where the first entry is the bit sent and the second is the bit received
Define the two events
A = {0 is sent} = {(0, 1), (0, 0)}, and
B = {0 is received} = {(0, 0), (1, 0)}
The probability measure is defined via P(A), P(B|A), and P(B^c|A^c), provided on the probability transition diagram of the channel

To find P(A|B), the a posteriori probability that a 0 was sent, we use Bayes rule

P(A|B) = P(B|A)P(A) / [P(A)P(B|A) + P(A^c)P(B|A^c)],

to obtain

P(A|B) = (0.9 × 0.2) / (0.2 × 0.9 + 0.8 × 0.025) = 0.18/0.2 = 0.9,

which is much higher than the a priori probability of A (= 0.2)
EE 178: Basic Probability Page 1 35
Independence
It often happens that the knowledge that a certain event B has occurred has no effect on the probability that another event A has occurred, i.e.,

P(A|B) = P(A)

In this case we say that the two events are statistically independent

Equivalently, two events are said to be statistically independent if

P(A, B) = P(A)P(B)

So, in this case, P(A|B) = P(A) and P(B|A) = P(B)
Example: Assuming that the binary channel of the previous example is used to send two bits independently, what is the probability that both bits are in error?
EE 178: Basic Probability Page 1 36
Solution:
Define the two events
E1 = {First bit is in error}
E2 = {Second bit is in error}
Since the bits are sent independently, the probability that both are in error is
P(E1, E2) = P(E1)P(E2)
Also, by symmetry, P(E1) = P(E2)

To find P(E1), we express E1 in terms of the events A1 and B1 (the first bit sent is 0, and the first bit received is 0, respectively) as

E1 = (A1 ∩ B1^c) ∪ (A1^c ∩ B1)
EE 178: Basic Probability Page 1 37
Thus,

P(E1) = P(A1, B1^c) + P(A1^c, B1)

= P(A1)P(B1^c|A1) + P(A1^c)P(B1|A1^c)

= 0.2 × 0.1 + 0.8 × 0.025 = 0.04

The probability that the two bits are in error is

P(E1, E2) = (0.04)^2 = 16 × 10^{−4}

In general, A1, A2, . . . , An are mutually independent if for each subset of the events Ai1, Ai2, . . . , Aik

P(Ai1, Ai2, . . . , Aik) = ∏_{j=1}^{k} P(Aij)

Note: P(A1, A2, . . . , An) = ∏_{i=1}^{n} P(Ai) alone is not sufficient for independence
EE 178: Basic Probability Page 1 38
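A short sketch (variable names are mine) reproduces both numbers from the two-bit example:

```python
p0, p1 = 0.2, 0.8              # P(0 sent), P(1 sent)
p_flip0, p_flip1 = 0.1, 0.025  # crossover probabilities of the channel

# P(one bit in error), by total probability over what was sent
p_err = p0 * p_flip0 + p1 * p_flip1

# Independent transmissions: the two error events are independent
p_both = p_err ** 2

print(p_err)   # ≈ 0.04
print(p_both)  # ≈ 0.0016 = 16e-4
```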
Example: Roll two fair dice independently. Define the events
A = {First die = 1, 2, or 3}
B = {First die = 2, 3, or 6}
C = {Sum of outcomes = 9}
Are A, B , and C independent?
EE 178: Basic Probability Page 1 39
Solution:
Since the dice are fair and the rolls are independent, the probability of any pair of outcomes is 1/36, and

P(A) = 1/2, P(B) = 1/2, P(C) = 1/9

Since A ∩ B ∩ C = {(3, 6)}, P(A, B, C) = 1/36 = P(A)P(B)P(C)

But are A, B, and C independent? Let's find

P(A, B) = 2/6 = 1/3 ≠ 1/4 = P(A)P(B),

and thus A, B, and C are not independent!
EE 178: Basic Probability Page 1 40
Also, pairwise independence (independence of every pair of events) does not necessarily imply mutual independence
Example: Flip a fair coin twice independently. Define the events:
A: First toss is Head
B : Second toss is Head
C : First and second toss have different outcomes
A and B are independent, A and C are independent, and B and C are independent
Are A,B,C mutually independent?
Clearly not, since if you know A and B, you know that C could not have occurred, i.e., P(A, B, C) = 0 ≠ P(A)P(B)P(C) = 1/8
EE 178: Basic Probability Page 1 41
Counting
Discrete uniform law:
Finite sample space Ω where all sample points are equally probable:

P(A) = (number of sample points in A) / (total number of sample points)

Variation: all outcomes in A are equally likely, each with probability p. Then,

P(A) = p × (number of elements of A)
In both cases, we compute probabilities by counting
EE 178: Basic Probability Page 1 42
Basic Counting Principle
Procedure (multiplication rule):
r steps
ni choices at step i
Number of choices is n1 × n2 × · · · × nr
Example:
Number of license plates with 3 letters followed by 4 digits:
With no repetition (replacement), the number is:
Example: Consider a set of objects {s1, s2, . . . , sn}. How many subsets of these objects are there (including the set itself and the empty set)?
Each object may or may not be in the subset (2 options)
The number of subsets is:
EE 178: Basic Probability Page 1 43
De Méré's Paradox
Counting can be very tricky
Classic example: Throw three 6-sided dice. Is the probability that the sum of the outcomes is 11 equal to the probability that the sum of the outcomes is 12?
De Méré argued that they are, since the number of different ways to obtain 11 and 12 is the same:
Sum=11: {6, 4, 1}, {6, 3, 2}, {5, 5, 1}, {5, 4, 2}, {5, 3, 3}, {4, 4, 3}
Sum=12: {6, 5, 1}, {6, 4, 2}, {6, 3, 3}, {5, 5, 2}, {5, 4, 3}, {4, 4, 4}
This turned out to be false. Why?
EE 178: Basic Probability Page 1 44
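The resolution is that unordered triples are not equally likely, while ordered outcomes are. Enumerating the 216 equally likely ordered outcomes in Python (a sketch, not part of the notes) settles the question:

```python
from itertools import product

# All ordered outcomes of three dice; each has probability 1/216
rolls = list(product(range(1, 7), repeat=3))

ways_11 = sum(sum(r) == 11 for r in rolls)
ways_12 = sum(sum(r) == 12 for r in rolls)

# 27 ordered ways to get 11 but only 25 to get 12:
# P(sum = 11) = 27/216 > P(sum = 12) = 25/216
print(ways_11, ways_12)
```

For example, the unordered triple {4, 4, 4} corresponds to a single ordered outcome, while {5, 4, 3} corresponds to six, so counting unordered triples misweights the events.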
Basic Types of Counting
Assume we have n distinct objects a1, a2, . . . , an, e.g., digits, letters, etc.
Some basic counting questions:
How many different ordered sequences of k objects can be formed out of the n objects with replacement?

How many different ordered sequences of k ≤ n objects can be formed out of the n objects without replacement? (called k-permutations)

How many different unordered sequences (subsets) of k ≤ n objects can be formed out of the n objects without replacement? (called k-combinations)

Given r nonnegative integers n1, n2, . . . , nr that sum to n (the number of objects), how many ways can the n objects be partitioned into r subsets (unordered sequences) with the ith subset having exactly ni objects? (called r-partitions, a generalization of combinations)
EE 178: Basic Probability Page 1 45
Ordered Sequences With and Without Replacement
The number of ordered k-sequences from n objects with replacement is n × n × · · · × n (k times), i.e., n^k

Example: If n = 2, e.g., binary digits, the number of ordered k-sequences is 2^k

The number of different ordered sequences of k objects that can be formed from n objects without replacement, i.e., the k-permutations, is

n × (n − 1) × (n − 2) × · · · × (n − k + 1)

If k = n, the number is

n × (n − 1) × (n − 2) × · · · × 2 × 1 = n! (n-factorial)

Thus the number of k-permutations is: n!/(n − k)!
Example: Consider the alphabet set {A,B,C,D}, so n = 4
The number of 2-permutations (k = 2) of the n = 4 objects is 12

They are: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, and DC
EE 178: Basic Probability Page 1 46
Unordered Sequences Without Replacement
Denote by (n choose k) the number of unordered k-sequences that can be formed out of n objects without replacement, i.e., the k-combinations

Two different ways of constructing the k-permutations:

1. Choose k objects ((n choose k) ways), then order them (k! possible orders). This gives (n choose k) · k!

2. Choose the k objects one at a time:

n × (n − 1) × · · · × (n − k + 1) = n!/(n − k)! choices

Hence (n choose k) · k! = n!/(n − k)!, or (n choose k) = n!/(k!(n − k)!)

Example: The number of 2-combinations of {A, B, C, D} is 6
They are AB,AC,AD,BC,BD,CD
What is the number of binary sequences of length n with exactly k ≤ n ones?

Question: What is Σ_{k=0}^{n} (n choose k)?
EE 178: Basic Probability Page 1 47
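Python's standard library can verify these identities for a small case, e.g., n = 4 and k = 2 (a sketch, not part of the notes):

```python
from itertools import combinations, permutations
from math import comb, factorial

n, k = 4, 2
objects = "ABCD"

# k-combinations vs k-permutations of {A, B, C, D}
assert len(list(combinations(objects, k))) == comb(n, k) == 6
assert len(list(permutations(objects, k))) == factorial(n) // factorial(n - k) == 12

# (n choose k) * k! = n!/(n - k)!, and the row sum answers the question above
assert comb(n, k) * factorial(k) == factorial(n) // factorial(n - k)
assert sum(comb(n, j) for j in range(n + 1)) == 2 ** n

print("all identities hold")
```

The row sum equals 2^n because choosing a subset of any size is the same as making a binary in-or-out choice for each of the n objects.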
Finding Probability Using Counting
Example: Die Rolling
Roll a fair 6-sided die 6 times independently. Find the probability that the outcomes from all six rolls are different
Solution:
# outcomes yielding this event =
# of points in sample space =
Probability =
EE 178: Basic Probability Page 1 48
Example: The Birthday Problem
k people are selected at random. What is the probability that all k birthdays will be different? (Neglect leap years)
Solution:
Total # ways of assigning birthdays to k people:
# of ways of assigning birthdays to k people with no two having the samebirthday:
Probability:
EE 178: Basic Probability Page 1 49
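Once the two counts are filled in, the probability is a short product. The hedged sketch below (the function name is mine) evaluates it, e.g., for the famous k = 23 case:

```python
# P(all k birthdays distinct) = [365 * 364 * ... * (365 - k + 1)] / 365**k,
# computed factor by factor to avoid huge intermediate integers
def p_all_distinct(k: int) -> float:
    p = 1.0
    for i in range(k):
        p *= (365 - i) / 365
    return p

# With 23 people, a shared birthday is already more likely than not
print(round(p_all_distinct(23), 3))  # ≈ 0.493
```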
Binomial Probabilities
Consider n independent coin tosses, where P(H) = p for 0 < p < 1
Outcome is a sequence of H's and T's of length n

P(sequence) = p^{#heads} (1 − p)^{#tails}
The probability of k heads in n tosses is thus

P(k heads) = Σ_{sequences with k heads} P(sequence)

= #(k-head sequences) × p^k (1 − p)^{n−k}

= (n choose k) p^k (1 − p)^{n−k}
pk(1 p)nk
Check that it sums to 1
EE 178: Basic Probability Page 1 50
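The normalization check suggested above can be done numerically; here is a sketch for one arbitrary choice of n and p:

```python
from math import comb

# Binomial pmf: P(k heads in n tosses) = (n choose k) p^k (1-p)^(n-k)
def binom_pmf(n: int, p: float):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

pmf = binom_pmf(10, 0.3)
print(sum(pmf))  # ≈ 1.0, as the binomial theorem guarantees
```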
Example: Toss a coin with bias p independently 10 times.
Define the events B = {3 out of 10 tosses are heads} and A = {first two tosses are heads}. Find P(A|B)

Solution: The conditional probability is

P(A|B) = P(A, B)/P(B)

All points in B have the same probability p^3 (1 − p)^7, so we can find the conditional probability by counting:
# points in B beginning with two heads =
# points in B =
Probability =
EE 178: Basic Probability Page 1 51
Partitions
Let n1, n2, n3, . . . , nr be such that

Σ_{i=1}^{r} ni = n

How many ways can the n objects be partitioned into r subsets (unordered sequences) with the ith subset having exactly ni objects?

If r = 2 with subset sizes n1 and n − n1, the answer is the number of n1-combinations, (n choose n1)

Answer in general:
EE 178: Basic Probability Page 1 52
(n choose n1, n2, . . . , nr)

= (n choose n1) (n − n1 choose n2) (n − (n1 + n2) choose n3) · · · (n − Σ_{i=1}^{r−1} ni choose nr)

= [n!/(n1!(n − n1)!)] × [(n − n1)!/(n2!(n − (n1 + n2))!)] × · · ·

= n!/(n1! n2! · · · nr!)
EE 178: Basic Probability Page 1 53
Example: Balls and bins
We have n balls and r bins. We throw each ball independently and at random into a bin. What is the probability that bin i, for i = 1, 2, . . . , r, will have exactly ni balls, where Σ_{i=1}^{r} ni = n?
Solution:
The probability of each outcome (sequence of ni balls in bin i) is:
# of ways of partitioning the n balls into r bins such that bin i has exactlyni balls is:
Probability:
EE 178: Basic Probability Page 1 54
Example: Cards
Consider a perfectly shuffled 52-card deck dealt to 4 players. Find P(each player gets an ace)
Solution:
Size of the sample space is:
# of ways of distributing the four aces:
# of ways of dealing the remaining 48 cards:
Probability =
EE 178: Basic Probability Page 1 55
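After the three counts are filled in, the answer reduces to a ratio of multinomial coefficients. A sketch with exact integer arithmetic (my own formulation of the counts):

```python
from math import factorial

# Size of the sample space: deal 52 cards into 4 hands of 13,
# i.e., the partition count 52! / (13!)^4
total_deals = factorial(52) // factorial(13) ** 4

# Favorable deals: distribute the 4 aces one per player (4! ways),
# then partition the remaining 48 cards into hands of 12: 48! / (12!)^4
favorable = factorial(4) * factorial(48) // factorial(12) ** 4

p = favorable / total_deals
print(round(p, 4))  # ≈ 0.1055
```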
EE 178 Lecture Notes 2
Random Variables
Definition
Discrete Random Variables: Probability mass function (pmf)
Continuous Random Variables: Probability density function (pdf)
Mean and Variance
Cumulative Distribution Function (cdf)
Functions of Random Variables
Corresponding pages from B&T textbook: 72-83, 86, 88, 90, 140-144, 146-150, 152-157, 179-186.
EE 178: Random Variables Page 2 1
Random Variable
A random variable is a real-valued variable that takes on values randomly

Sounds nice, but not terribly precise or useful

Mathematically, a random variable (r.v.) X is a real-valued function X(ω) over the sample space Ω of a random experiment, i.e., X : Ω → R

ω → X(ω)

Randomness comes from the fact that outcomes ω are random (X(ω) is a deterministic function)

Notations:

Always use upper case letters for random variables (X, Y, . . .)

Always use lower case letters for values of random variables: X = x means that the random variable X takes on the value x
EE 178: Random Variables Page 2 2
Examples:
1. Flip a coin n times. Here Ω = {H, T}^n. Define the random variable X ∈ {0, 1, 2, . . . , n} to be the number of heads

2. Roll a 4-sided die twice.

(a) Define the random variable X as the maximum of the two rolls

[Figure: the 16 sample points (1st roll, 2nd roll) mapped by X onto the points 1, 2, 3, 4 of the real line]
EE 178: Random Variables Page 2 3
(b) Define the random variable Y to be the sum of the outcomes of the two rolls

(c) Define the random variable Z to be 0 if the sum of the two rolls is odd and 1 if it is even

3. Flip a coin until the first heads shows up. Define the random variable X ∈ {1, 2, . . .} to be the number of flips until the first heads

4. Let Ω = R. Define the two random variables

(a) X(ω) = ω

(b) Y(ω) = +1 for ω ≥ 0, and −1 otherwise

5. n packets arrive at a node in a communication network. Here Ω is the set of arrival time sequences (t1, t2, . . . , tn) ∈ (0, ∞)^n

(a) Define the random variable N to be the number of packets arriving in the interval (0, 1]

(b) Define the random variable T to be the first interarrival time
EE 178: Random Variables Page 2 4
Why do we need random variables?
A random variable can represent the gain or loss in a random experiment, e.g., the stock market

A random variable can represent a measurement over a random experiment, e.g., noise voltage on a resistor

In most applications we care more about these costs/measurements than the underlying probability space

Very often we work directly with random variables without knowing (or caring to know) the underlying probability space
EE 178: Random Variables Page 2 5
Specifying a Random Variable
Specifying a random variable means being able to determine the probability that X ∈ A for any event A ⊂ R, e.g., any interval

To do so, we consider the inverse image of the set A under X(ω), {ω : X(ω) ∈ A}

[Figure: the set A on the real line and its inverse image {ω : X(ω) ∈ A} in Ω]

So, X ∈ A iff ω ∈ {ω : X(ω) ∈ A}, thus P({X ∈ A}) = P({ω : X(ω) ∈ A}), or in short

P{X ∈ A} = P{ω : X(ω) ∈ A}
EE 178: Random Variables Page 2 6
Example: Roll a fair 4-sided die twice independently. Define the r.v. X to be the maximum of the two rolls. What is P{0.5 < X ≤ 2}?

[Figure: the 16 sample points (1st roll, 2nd roll) mapped by X onto the points x = 1, 2, 3, 4 of the real line]
EE 178: Random Variables Page 2 7
We classify r.v.s as:
Discrete: X can assume only one of a countable number of values. Such an r.v. can be specified by a probability mass function (pmf). Examples 1, 2, 3, 4(b), and 5(a) are discrete r.v.s

Continuous: X can assume one of a continuum of values, and the probability of each value is 0. Such an r.v. can be specified by a probability density function (pdf). Examples 4(a) and 5(b) are continuous r.v.s

Mixed: X is neither discrete nor continuous. Such an r.v. (as well as discrete and continuous r.v.s) can be specified by a cumulative distribution function (cdf)
EE 178: Random Variables Page 2 8
Discrete Random Variables
A random variable is said to be discrete if for some countable set X ⊂ R, i.e., X = {x1, x2, . . .}, P{X ∈ X} = 1

Examples 1, 2, 3, 4(b), and 5(a) are discrete random variables

Here X(ω) partitions Ω into the sets {ω : X(ω) = xi} for i = 1, 2, . . .. Therefore, to specify X, it suffices to know P{X = xi} for all i

[Figure: the real line with the points x1, x2, x3, . . . , xn, and the corresponding partition of Ω]

A discrete random variable is thus completely specified by its probability mass function (pmf)

pX(x) = P{X = x} for all x ∈ X
EE 178: Random Variables Page 2 9
Clearly pX(x) ≥ 0 and Σ_{x∈X} pX(x) = 1

Example: Roll a fair 4-sided die twice independently. Define the r.v. X to be the maximum of the two rolls
[Figure: the 16 sample points (1st roll, 2nd roll) mapped by X onto the points x = 1, 2, 3, 4 of the real line]

pX(x):
EE 178: Random Variables Page 2 10
Note that pX(x) can be viewed as a probability measure over a discrete sample space (even though the original sample space may be continuous, as in examples 4(b) and 5(a))

The probability of any event A ⊂ R is given by

P{X ∈ A} = Σ_{x∈A∩X} pX(x)

For the previous example, P{1 < X ≤ 2.5 or X ≥ 3.5} =

Notation: We use X ∼ pX(x) or simply X ∼ p(x) to mean that the discrete random variable X has pmf pX(x) or p(x)
EE 178: Random Variables Page 2 11
Famous Probability Mass Functions
Bernoulli: X ∼ Bern(p) for 0 ≤ p ≤ 1 has pmf

pX(1) = p, and pX(0) = 1 − p

Geometric: X ∼ Geom(p) for 0 < p ≤ 1 has pmf

pX(k) = p(1 − p)^{k−1} for k = 1, 2, . . .

[Figure: pX(k) versus k = 1, 2, 3, 4, . . . , starting at p and decaying geometrically]

This r.v. represents, for example, the number of coin flips until the first heads shows up (assuming independent coin flips)
EE 178: Random Variables Page 2 12
Binomial: X ∼ B(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has pmf

pX(k) = (n choose k) p^k (1 − p)^(n−k) for k = 0, 1, 2, ..., n

[Figure: stem plot of pX(k) for k = 0, 1, ..., n]
The maximum of pX(k) is attained at

k* = { (n+1)p and (n+1)p − 1, if (n+1)p is an integer; ⌊(n+1)p⌋, otherwise }

The binomial r.v. represents, for example, the number of heads in n independent coin flips (see page 48 of Lecture Notes 1)
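The mode formula can be verified by direct search over the pmf; an illustrative Python sketch (the function names are ours, not from the notes):

```python
from math import comb, floor

def binom_pmf(n, p, k):
    # pX(k) = C(n, k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mode(n, p):
    # argmax of the pmf by brute force
    return max(range(n + 1), key=lambda k: binom_pmf(n, p, k))

# (n+1)p not an integer: the maximum is at floor((n+1)p)
assert mode(10, 0.3) == floor(11 * 0.3)   # 3

# (n+1)p an integer: both (n+1)p and (n+1)p - 1 attain the maximum
n, p = 9, 0.5                             # (n+1)p = 5
assert abs(binom_pmf(n, p, 5) - binom_pmf(n, p, 4)) < 1e-12
assert mode(n, p) in (4, 5)
```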
Poisson: X ∼ Poisson(λ) for λ > 0 (called the rate) has pmf

pX(k) = (λ^k / k!) e^(−λ) for k = 0, 1, 2, ...

[Figure: stem plot of pX(k) for k = 0, 1, 2, ...]

The maximum of pX(k) is attained at

k* = { λ and λ − 1, if λ is an integer; ⌊λ⌋, otherwise }

The Poisson r.v. often represents the number of random events, e.g., arrivals of packets, photons, customers, etc., in some time interval, e.g., (0, 1]
Poisson is the limit of Binomial when p = λ/n, as n → ∞

To show this, let Xn ∼ B(n, λ/n) for λ > 0. For any fixed nonnegative integer k,

pXn(k) = (n choose k) (λ/n)^k (1 − λ/n)^(n−k)

       = [n(n−1)···(n−k+1)/k!] · (λ^k/n^k) · ((n − λ)/n)^(n−k)

       = [n(n−1)···(n−k+1)/(n − λ)^k] · (λ^k/k!) · ((n − λ)/n)^n

       = [n(n−1)···(n−k+1)/((n − λ)(n − λ)···(n − λ))] · (λ^k/k!) · (1 − λ/n)^n

       → (λ^k/k!) e^(−λ) as n → ∞
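This limit is easy to see numerically; a quick Python sketch (the choices λ = 3 and k = 4 are arbitrary):

```python
from math import comb, exp, factorial

lam, k = 3.0, 4

def binom_pmf_at_k(n):
    # pmf of B(n, lam/n) evaluated at k
    p = lam / n
    return comb(n, k) * p**k * (1 - p)**(n - k)

poisson_pmf = lam**k / factorial(k) * exp(-lam)   # λ^k e^{-λ} / k!

for n in (10, 100, 10_000):
    print(n, binom_pmf_at_k(n))   # approaches the Poisson value below

print(poisson_pmf)
assert abs(binom_pmf_at_k(10_000) - poisson_pmf) < 1e-3
```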
Continuous Random Variables
Suppose a r.v. X can take on a continuum of values each with probability 0
Examples:
Pick a number between 0 and 1
Measure the voltage across a heated resistor
Measure the phase of a random sinusoid . . .
How do we describe probabilities of interesting events?
Idea: For a discrete r.v., we sum a pmf over points in a set to find its probability. For a continuous r.v., we integrate a probability density over a set to find its probability, analogous to mass density in physics (integrate mass density to get the mass)
Probability Density Function
A continuous r.v. X can be specified by a probability density function (pdf) fX(x) such that, for any event A,

P{X ∈ A} = ∫_A fX(x) dx

For example, for A = (a, b], the probability can be computed as

P{X ∈ (a, b]} = ∫_a^b fX(x) dx

Properties of fX(x):

1. fX(x) ≥ 0

2. ∫_{−∞}^{∞} fX(x) dx = 1

Important note: fX(x) should not be interpreted as the probability that X = x; in fact, it is not a probability measure, e.g., it can be > 1
Can relate fX(x) to a probability using the mean value theorem for integrals: Fix x and some Δx > 0. Then, provided fX is continuous over (x, x + Δx],

P{X ∈ (x, x + Δx]} = ∫_x^{x+Δx} fX(ξ) dξ

                   = fX(c) Δx for some x ≤ c ≤ x + Δx

Now, if Δx is sufficiently small, then

P{X ∈ (x, x + Δx]} ≈ fX(x) Δx

Notation: X ∼ fX(x) means that X has pdf fX(x)
Famous Probability Density Functions
Uniform: X ∼ U[a, b] for b > a has the pdf

f(x) = { 1/(b − a) for a ≤ x ≤ b; 0 otherwise }

[Figure: fX(x) is flat at height 1/(b − a) over the interval [a, b]]

The uniform r.v. is commonly used to model quantization noise and finite precision computation error (roundoff error)
Exponential: X ∼ Exp(λ) for λ > 0 has the pdf

f(x) = { λe^(−λx) for x ≥ 0; 0 otherwise }

[Figure: fX(x) starts at height λ and decays exponentially in x]

The exponential r.v. is commonly used to model interarrival time in a queue, i.e., the time between two consecutive packet or customer arrivals, service time in a queue, the lifetime of a particle, etc.
Example: Let X ∼ Exp(0.1) be the customer service time at a bank (in minutes). The person ahead of you has been served for 5 minutes. What is the probability that you will wait another 5 minutes or more before getting served?

We want to find P{X > 10 | X > 5}

Solution: By the definition of conditional probability,

P{X > 10 | X > 5} = P{X > 10, X > 5} / P{X > 5}

                  = P{X > 10} / P{X > 5}

                  = ∫_{10}^{∞} 0.1e^(−0.1x) dx / ∫_{5}^{∞} 0.1e^(−0.1x) dx

                  = e^(−1) / e^(−0.5) = e^(−0.5),

but P{X > 5} = e^(−0.5), i.e., the conditional probability of waiting more than 5 additional minutes given that you have already waited more than 5 minutes is the same as the unconditional probability of waiting more than 5 minutes!
This is because the exponential r.v. is memoryless, which in general means that for any 0 ≤ x1 < x,

P{X > x | X > x1} = P{X > x − x1}

To show this, consider

P{X > x | X > x1} = P{X > x, X > x1} / P{X > x1}

                  = P{X > x} / P{X > x1}

                  = ∫_x^{∞} λe^(−λξ) dξ / ∫_{x1}^{∞} λe^(−λξ) dξ

                  = e^(−λx) / e^(−λx1) = e^(−λ(x−x1))

                  = P{X > x − x1}
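Memorylessness is also easy to confirm by simulation; a Python sketch using the bank example's rate λ = 0.1 (the fixed seed and 0.01 tolerance are our assumptions about Monte Carlo error, not part of the notes):

```python
import random

random.seed(1)
lam = 0.1
samples = [random.expovariate(lam) for _ in range(200_000)]

tail5 = sum(1 for x in samples if x > 5)
tail10 = sum(1 for x in samples if x > 10)

cond = tail10 / tail5              # estimate of P{X > 10 | X > 5}
uncond = tail5 / len(samples)      # estimate of P{X > 5}

print(cond, uncond)                # both near e^{-0.5} = 0.6065...
assert abs(cond - uncond) < 0.01
```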
Gaussian: X ∼ N(μ, σ²) has pdf

fX(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) for −∞ < x < ∞,

where μ is the mean and σ² is the variance

[Figure: the N(μ, σ²) bell curve, centered at μ]

Gaussian r.v.s are frequently encountered in nature, e.g., thermal and shot noise in electronic devices are Gaussian, and very frequently used in modelling various social, biological, and other phenomena
Mean and Variance
A discrete (continuous) r.v. is completely specified by its pmf (pdf)

It is often desirable to summarize the r.v. or predict its outcome in terms of one or a few numbers. What do we expect the value of the r.v. to be? What range of values around the mean do we expect the r.v. to take? Such information can be provided by the mean and standard deviation of the r.v.

First we consider discrete r.v.s

Let X ∼ pX(x). The expected value (or mean) of X is defined as

E(X) = ∑_{x∈X} x pX(x)

Interpretations: If we view probabilities as relative frequencies, the mean would be the weighted sum of the relative frequencies. If we view probabilities as point masses, the mean would be the center of mass of the set of mass points
Example: If the weather is good, which happens with probability 0.6, Alice walks the 2 miles to class at a speed of 5 miles/hr; otherwise she rides a bus at a speed of 30 miles/hr. What is the expected time to get to class?

Solution: Define the discrete r.v. T to take the value (2/5) hr with probability 0.6 and (2/30) hr with probability 0.4. The expected value of T is

E(T) = 2/5 × 0.6 + 2/30 × 0.4 = 4/15 hr
The second moment (or mean square or average power) of X is defined as

E(X²) = ∑_{x∈X} x² pX(x)

For the previous example, the second moment is

E(T²) = (2/5)² × 0.6 + (2/30)² × 0.4 = 22/225 hr²
The variance of X is defined as

Var(X) = E[(X − E(X))²] = ∑_{x∈X} (x − E(X))² pX(x)
EE 178: Random Variables Page 2 25
Note that Var(X) ≥ 0. The variance has the interpretation of the moment of inertia about the center of mass for a set of mass points

The standard deviation of X is defined as σX = √Var(X)
Variance in terms of mean and second moment: Expanding the square and using the linearity of the sum, we obtain

Var(X) = E[(X − E(X))²]

       = ∑_x (x − E(X))² pX(x)

       = ∑_x (x² − 2xE(X) + [E(X)]²) pX(x)

       = ∑_x x² pX(x) − 2E(X) ∑_x x pX(x) + [E(X)]² ∑_x pX(x)

       = E(X²) − 2[E(X)]² + [E(X)]²

       = E(X²) − [E(X)]²

Note that since Var(X) ≥ 0 for any r.v., we have E(X²) ≥ (E(X))²

So, for our example, Var(T) = 22/225 − (4/15)² = 6/225 ≈ 0.0267
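The running example's moments can be checked with exact arithmetic; a small Python sketch (not part of the original notes):

```python
from fractions import Fraction

# pmf of Alice's travel time T (in hours): 2/5 hr w.p. 0.6, 2/30 hr w.p. 0.4
pmf = {Fraction(2, 5): Fraction(3, 5), Fraction(2, 30): Fraction(2, 5)}

mean = sum(t * p for t, p in pmf.items())
mom2 = sum(t**2 * p for t, p in pmf.items())
var = mom2 - mean**2

print(mean, mom2, var)   # 4/15, 22/225, 2/75 (= 6/225)
assert mean == Fraction(4, 15)
assert mom2 == Fraction(22, 225)
assert var == Fraction(6, 225)
```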
Mean and Variance of Famous Discrete RVs
Bernoulli r.v. X ∼ Bern(p): The mean is

E(X) = p · 1 + (1 − p) · 0 = p

The second moment is

E(X²) = p · 1² + (1 − p) · 0² = p

Thus the variance is

Var(X) = E(X²) − (E(X))² = p − p² = p(1 − p)

Binomial r.v. X ∼ B(n, p): The mean and variance are not easy to find using the definition. Later we use a much simpler method to show that

E(X) = np

Var(X) = np(1 − p)

Just n times the mean (variance) of a Bernoulli!
Geometric r.v. X ∼ Geom(p): The mean is

E(X) = ∑_{k=1}^{∞} k p(1 − p)^(k−1)

     = (p/(1 − p)) ∑_{k=1}^{∞} k(1 − p)^k

     = (p/(1 − p)) ∑_{k=0}^{∞} k(1 − p)^k

     = (p/(1 − p)) · (1 − p)/p² = 1/p

Note: We found the series sum using the following trick:

Recall that for |a| < 1, the sum of the geometric series is

∑_{k=0}^{∞} a^k = 1/(1 − a)
Suppose we want to evaluate

∑_{k=0}^{∞} k a^k

Multiply both sides of the geometric series sum by a to obtain

∑_{k=0}^{∞} a^(k+1) = a/(1 − a)

Now differentiate both sides with respect to a. The LHS gives

d/da ∑_{k=0}^{∞} a^(k+1) = ∑_{k=0}^{∞} (k + 1)a^k = (1/a) ∑_{k=0}^{∞} k a^k

The RHS gives

d/da [a/(1 − a)] = 1/(1 − a)²

Thus we have

∑_{k=0}^{∞} k a^k = a/(1 − a)²
The second moment can be similarly evaluated to obtain

E(X²) = (2 − p)/p²

The variance is thus

Var(X) = E(X²) − (E(X))² = (1 − p)/p²
Poisson r.v. X ∼ Poisson(λ): The mean is given by

E(X) = ∑_{k=0}^{∞} k (λ^k/k!) e^(−λ)

     = e^(−λ) ∑_{k=1}^{∞} k λ^k/k!

     = e^(−λ) ∑_{k=1}^{∞} λ^k/(k − 1)!

     = λ e^(−λ) ∑_{k=1}^{∞} λ^(k−1)/(k − 1)!

     = λ e^(−λ) ∑_{k=0}^{∞} λ^k/k!

     = λ

Can show that the variance is also equal to λ
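Both moments can be checked numerically with a truncated sum, computing the pmf recursively to avoid overflow; a Python sketch (λ = 4.2 is an arbitrary choice):

```python
from math import exp

lam = 4.2
pk = exp(-lam)          # pX(0) = e^{-lam}
mean = mom2 = 0.0
for k in range(1, 100):
    pk *= lam / k       # pX(k) = pX(k-1) * lam/k
    mean += k * pk
    mom2 += k**2 * pk

var = mom2 - mean**2
print(mean, var)        # both should equal lam = 4.2
assert abs(mean - lam) < 1e-9
assert abs(var - lam) < 1e-9
```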
Mean and Variance for Continuous RVs
Now consider a continuous r.v. X ∼ fX(x). The expected value (or mean) of X is defined as

E(X) = ∫_{−∞}^{∞} x fX(x) dx

This has the interpretation of the center of mass for a mass density

The second moment and variance are similarly defined as:

E(X²) = ∫_{−∞}^{∞} x² fX(x) dx

Var(X) = E[(X − E(X))²] = E(X²) − (E(X))²
Uniform r.v. X ∼ U[a, b]: The mean, second moment, and variance are given by

E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_a^b x · 1/(b − a) dx = (a + b)/2

E(X²) = ∫_{−∞}^{∞} x² fX(x) dx

      = ∫_a^b x² · 1/(b − a) dx

      = (1/(b − a)) ∫_a^b x² dx

      = (1/(b − a)) [x³/3]_a^b = (b³ − a³)/(3(b − a))

Var(X) = E(X²) − (E(X))²

       = (b³ − a³)/(3(b − a)) − (a + b)²/4 = (b − a)²/12

Thus, for X ∼ U[0, 1], E(X) = 1/2 and Var(X) = 1/12
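The closed forms can be confirmed numerically by a midpoint-rule approximation of the integrals; a Python sketch (a = 2, b = 5 are arbitrary):

```python
a, b = 2.0, 5.0
n = 100_000
dx = (b - a) / n

mean = mom2 = 0.0
for i in range(n):
    x = a + (i + 0.5) * dx          # midpoint of the i-th subinterval
    mean += x * dx / (b - a)        # approximates the integral of x f(x)
    mom2 += x * x * dx / (b - a)    # approximates the integral of x^2 f(x)

var = mom2 - mean**2
print(mean, var)
assert abs(mean - (a + b) / 2) < 1e-6       # (a+b)/2 = 3.5
assert abs(var - (b - a)**2 / 12) < 1e-6    # (b-a)^2/12 = 0.75
```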
Exponential r.v. X ∼ Exp(λ): The mean and variance are given by

E(X) = ∫_{−∞}^{∞} x fX(x) dx

     = ∫_0^{∞} x λe^(−λx) dx

     = (−x e^(−λx))|_0^{∞} + ∫_0^{∞} e^(−λx) dx   (integration by parts)

     = 0 − (1/λ) e^(−λx)|_0^{∞} = 1/λ

E(X²) = ∫_0^{∞} x² λe^(−λx) dx = 2/λ²

Var(X) = 2/λ² − 1/λ² = 1/λ²

For a Gaussian r.v. X ∼ N(μ, σ²), the mean is μ and the variance is σ² (will show this later using transforms)
Mean and Variance for Famous r.v.s
Random Variable    Mean         Variance
Bern(p)            p            p(1 − p)
Geom(p)            1/p          (1 − p)/p²
B(n, p)            np           np(1 − p)
Poisson(λ)         λ            λ
U[a, b]            (a + b)/2    (b − a)²/12
Exp(λ)             1/λ          1/λ²
N(μ, σ²)           μ            σ²
Cumulative Distribution Function (cdf)
For discrete r.v.s we use pmfs, for continuous r.v.s we use pdfs
Many real-world r.v.s are mixed, that is, have both discrete and continuous components

Example: A packet arrives at a router in a communication network. If the input buffer is empty (which happens with probability p), the packet is serviced immediately. Otherwise the packet must wait for a random amount of time as characterized by a pdf

Define the r.v. X to be the packet service time. X is neither discrete nor continuous

There is a third probability function that characterizes all types of random variables: discrete, continuous, and mixed. The cumulative distribution function (cdf) FX(x) of a random variable X is defined by

FX(x) = P{X ≤ x} for x ∈ (−∞, ∞)
Properties of the cdf:
Like the pmf (but unlike the pdf), the cdf is the probability of something. Hence, 0 ≤ FX(x) ≤ 1

The normalization axiom implies that

FX(∞) = 1, and FX(−∞) = 0

FX(x) is monotonically nondecreasing, i.e., if a > b then FX(a) ≥ FX(b)

The probability of any event can be easily computed from a cdf, e.g., for an interval (a, b],

P{X ∈ (a, b]} = P{a < X ≤ b}

              = P{X ≤ b} − P{X ≤ a}   (additivity)

              = FX(b) − FX(a)

The probability of any outcome a is: P{X = a} = FX(a) − FX(a⁻)
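For instance, for the die-maximum pmf from the earlier example, interval probabilities follow directly from the cdf; a small Python sketch:

```python
from fractions import Fraction

# pmf of X = max of two independent fair 4-sided die rolls
pmf = {x: Fraction(2 * x - 1, 16) for x in (1, 2, 3, 4)}

def F(x):
    """cdf F_X(x) = P{X <= x}: sum the pmf over all values <= x."""
    return sum(p for v, p in pmf.items() if v <= x)

# P{1 < X <= 3} = F(3) - F(1)
print(F(3) - F(1))                      # 1/2
assert F(3) - F(1) == pmf[2] + pmf[3]
assert F(4) == 1 and F(0.5) == 0
```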
If a r.v. is discrete, its cdf consists of a set of steps

[Figure: pmf pX(x) as stems at x = 1, 2, 3, 4, and the corresponding staircase cdf FX(x) rising to 1]

If X is a continuous r.v. with pdf fX(x), then

FX(x) = P{X ≤ x} = ∫_{−∞}^{x} fX(ξ) dξ

So, the cdf of a continuous r.v. X is continuous

[Figure: a continuous cdf FX(x) increasing smoothly from 0 to 1]
In fact, the precise way to define a continuous random variable is through the continuity of its cdf. Further, if FX(x) is differentiable (almost everywhere), then

fX(x) = dFX(x)/dx = lim_{Δx→0} [FX(x + Δx) − FX(x)]/Δx

      = lim_{Δx→0} P{x < X ≤ x + Δx}/Δx

The cdf of a mixed r.v. has the general form

[Figure: a cdf rising from 0 to 1 with both jumps and continuous segments]
Examples
cdf of a uniform r.v.:

FX(x) = { 0 if x < a; ∫_a^x 1/(b − a) dξ = (x − a)/(b − a) if a ≤ x ≤ b; 1 if x > b }

[Figure: the flat pdf fX(x) of height 1/(b − a) on [a, b], and the cdf FX(x) rising linearly from 0 at a to 1 at b]
cdf of an exponential r.v.:

FX(x) = 0 for x < 0, and FX(x) = 1 − e^(−λx) for x ≥ 0

[Figure: the exponential pdf fX(x) = λe^(−λx), and the cdf FX(x) rising from 0 toward 1]
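The closed-form cdf also yields a sampler via inverse-transform sampling, X = −ln(1 − U)/λ for U ∼ U[0, 1); a Python sketch (the fixed seed and 0.01 tolerance are our assumptions about Monte Carlo error):

```python
import math
import random

random.seed(0)
lam = 2.0

def sample_exp():
    # Invert F(x) = 1 - e^{-lam x}:  x = -ln(1 - u)/lam
    u = random.random()
    return -math.log(1 - u) / lam

samples = [sample_exp() for _ in range(100_000)]

# Empirical cdf at x = 1 vs F(1) = 1 - e^{-lam}
emp = sum(1 for x in samples if x <= 1) / len(samples)
print(emp, 1 - math.exp(-lam))
assert abs(emp - (1 - math.exp(-lam))) < 0.01
```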