Stochastic Processes Notes

ECM3724 Stochastic Processes

1 Overview of Probability

We call $(X, \Omega, P)$ a probability space. Here $\Omega$ is the sample space, $X : \Omega \to \mathbb{R}$ is a random variable (RV) and $P$ is a probability (measure), which is a function on subsets of $\Omega$. Elements $\omega \in \Omega$ are called outcomes. Subsets of $\Omega$ are called events.

Given $A \subseteq \mathbb{R}$, $\{X \in A\} = \{\omega \in \Omega : X(\omega) \in A\}$. Given $x \in \mathbb{R}$, $\{X = x\} = \{\omega \in \Omega : X(\omega) = x\}$.

Example
Suppose we toss a coin twice, then $\Omega = \{HH, HT, TH, TT\}$, $|\Omega| = 4$, and $X$ is the number of heads. Then $\Omega_X$ is the set of values that $X$ takes, that is $\Omega_X = \{0, 1, 2\}$. Now $\{X = 1\} = \{\omega \in \Omega : X(\omega) = 1\} = \{HT, TH\}$, so $P(X = 1) = P(HT) + P(TH)$.

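$\Omega$ (or $\Omega_X$) could be discrete, for example $\Omega_X = \{x_1, x_2, \ldots\}$. We require $\sum_{x \in \Omega_X} P(X = x) = 1$. $\Omega$ (or $\Omega_X$) could be continuous. If $\Omega_X = [0, 1]$, $P(X \in A) = \int_A f_X(x)\,dx$. Here $f_X(x)$ is a probability density function (pdf), with $f_X(x) \ge 0$ and $\int_{\Omega_X} f_X(x)\,dx = 1$.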

Expectation
If $\Omega_X = \{x_1, x_2, \ldots\}$ and $g : \Omega_X \to \mathbb{R}$ then $E(g(X)) = \sum_i g(x_i) P(X = x_i)$. If $g(X) = X$, $E(X) := \mu_X$, the mean of $X$. If $g(X) = (X - \mu_X)^2$ then $E(g(X)) := \mathrm{Var}(X) = \sigma_X^2 \ge 0$, the variance of $X$. Also $\mathrm{Var}(X) = E(X^2) - \mu_X^2$. In the continuous case, $E(g(X)) = \int_{\Omega_X} g(x) f_X(x)\,dx$.

Common Distributions
If $X \sim \mathrm{Unif}[0, 1]$ then we say that $X$ is distributed as Uniform$[0, 1]$. If $A = [a, b] \subseteq \Omega_X$ then $P(X \in A) = \int_a^b 1\,dx = b - a$. If $A = \mathbb{Q} \cap [0, 1]$, $P(X \in A) = 0$. There exist subsets $A$ of $[0, 1]$ for which $P(X \in A)$ is undefined.

If $X \sim \mathrm{Ber}(p)$ then we say that $X$ is distributed as a Bernoulli trial with success probability $p$. Here $\Omega_X = \{0, 1\}$, $P(X = 0) = 1 - p$, $P(X = 1) = p$, $p \in (0, 1)$. We can extend this over multiple independent, identically distributed (IID) trials (recall that events $A$ and $B$ are independent if $P(A|B) = P(A)$, $P(B|A) = P(B)$, or equivalently $P(A \cap B) = P(A)P(B)$). If $X$ is the number of successes in $n$ such trials then $X \sim \mathrm{Bin}(n, p)$, that is $X$ is distributed binomially with $n$ trials and success probability $p$. In this case $\Omega_X = \{0, 1, 2, \ldots, n\}$. For $0 \le r \le n$, $P(X = r) = \binom{n}{r} p^r (1 - p)^{n-r}$. The Binomial distribution has $E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$.

Let $\Omega_X = \{0, 1, 2, \ldots\}$. We say that $X \sim \mathrm{Poisson}(\lambda)$ if $P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}$ for $r \in \mathbb{N} \cup \{0\}$. Recall that $\exp(x) = \sum_{r \ge 0} \frac{x^r}{r!}$, which means that $\sum_r P(X = r) = 1$. For the Poisson distribution the mean and variance are both $\lambda$. Other discrete distributions include the geometric and hypergeometric distributions.

We say $X \sim \mathrm{Exp}(\lambda)$ if the pdf of $X$ is given by $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. Now
\[ P(X > x) = \int_x^\infty \lambda e^{-\lambda u}\,du = \left[-e^{-\lambda u}\right]_x^\infty = e^{-\lambda x}. \]
The exponential distribution is memoryless, that is the remaining time to wait does not depend on the time already waited. More specifically, $P(X > t + s \mid X > s) = P(X > t) = e^{-\lambda t}$.

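Gaussian/Normal distribution
We say $X \sim N(\mu, \sigma^2)$ if the pdf is
\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \]
for $x \in \mathbb{R}$. Then $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.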

The Central Limit Theorem (CLT)

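Suppose $(X_i)_{i=1}^n$ are IID RVs with $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$ and let $S_n = \sum_{i=1}^n X_i$. Then $Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1)$, that is the distribution of $Z_n$ converges to $N(0, 1)$. If $A = [a, b]$ then, as $n \to \infty$,
\[ P(Z_n \in A) \to \int_a^b \frac{\exp(-u^2/2)}{\sqrt{2\pi}}\,du. \]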

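This can be applied, for example, if we take $S_n$ to be the number of heads in $n$ fair coin tosses. Then $E(S_n) = n/2$ and $\mathrm{Var}(S_n) = \sum_{i=1}^n \mathrm{Var}(X_i) = n/4$. Hence, for large $n$,
\[ P\left(\frac{S_n - \frac{n}{2}}{\frac{1}{2}\sqrt{n}} \in [a, b]\right) \approx \int_a^b \frac{\exp(-u^2/2)}{\sqrt{2\pi}}\,du. \]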
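The coin tossing approximation can be checked numerically. The sketch below is illustrative only: the function names `exact_prob` and `normal_prob` are hypothetical, and the interval $[-1, 1]$ is an arbitrary choice. It compares the exact binomial probability with the normal integral for increasing $n$.

```python
import math

def exact_prob(n, a, b):
    """Exact P(a <= (S_n - n/2) / (sqrt(n)/2) <= b) for n fair coin tosses."""
    sd = math.sqrt(n) / 2
    total = 0.0
    for k in range(n + 1):
        z = (k - n / 2) / sd
        if a <= z <= b:
            # Exact binomial probability P(S_n = k) = C(n, k) / 2^n
            total += math.comb(n, k) / 2**n
    return total

def normal_prob(a, b):
    """P(a <= Z <= b) for Z ~ N(0, 1), using the error function."""
    cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return cdf(b) - cdf(a)

# The exact value approaches the normal value as n grows.
for n in (10, 100, 1000):
    print(n, round(exact_prob(n, -1.0, 1.0), 4), round(normal_prob(-1.0, 1.0), 4))
```

Because $S_n$ only takes integer values, the exact probability fluctuates slightly around the limit for finite $n$; the gap shrinks roughly like $1/\sqrt{n}$.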

Moment Generating Functions (MGF)
For a RV $X$, the MGF of $X$ is the function $M_X(t) = E(e^{tX})$, a function of $t \in \mathbb{R}$. In the discrete case, $E(e^{tX}) = \sum_i e^{t x_i} P(X = x_i)$, where $X$ takes values in $\Omega_X = \{x_1, x_2, \ldots\}$. In the continuous case, $E(e^{tX}) = \int_{\Omega_X} e^{tx} f_X(x)\,dx$.

Properties

• $M_X(0) = 1$.

• $\left.\frac{d^r M_X}{dt^r}\right|_{t=0} = E(X^r)$.

• If $Z = X + Y$ and $X, Y$ are independent then $M_Z(t) = M_X(t) M_Y(t)$.

• If $X, Y$ are RVs and $M_X(t) = M_Y(t)$ then $X$ and $Y$ have the same probability distribution, provided $M_X(t)$ exists (is finite) and is continuous in a neighbourhood of $t = 0$.

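Exercise
Compute $M_X(t)$ for the Bernoulli distribution. Compute $M_Y(t)$ for $Y = X_1 + \ldots + X_n$, where the $X_i$ are IID Bernoulli RVs. What is the distribution of $Y$? What happens as $n \to \infty$ (and $p \to 0$ with $\lambda = np$ fixed)?

We have $M_X(t) = (1 - p) + p e^t = p(e^t - 1) + 1$. Hence $M_Y(t) = (p(e^t - 1) + 1)^n$ by the properties of the MGF. If $Y \sim \mathrm{Bin}(n, p)$ then $P(Y = r) = \binom{n}{r} p^r (1 - p)^{n-r}$ so
\[ E(e^{tY}) = \sum_{r=0}^n \binom{n}{r} p^r (1 - p)^{n-r} e^{tr} = (p(e^t - 1) + 1)^n \]
by the Binomial theorem. Hence, by uniqueness of MGFs, the sum of IID Bernoulli trials has a Binomial distribution.

Now fix $\lambda > 0$ and let $\lambda = np$ with $n \to \infty$ (so $p \to 0$). (Note that in the different regime where $n \to \infty$ with $p$ fixed, for example $p$ close to $\frac{1}{2}$, we can instead apply the CLT.) Using the fact that $\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x$, we get
\[ M_Y(t) = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to \exp\left(\lambda(e^t - 1)\right). \]
If $Z \sim \mathrm{Poisson}(\lambda)$ then $P(Z = r) = \frac{\lambda^r e^{-\lambda}}{r!}$ for $r \ge 0$. Then
\[ M_Z(t) = \sum_{r=0}^\infty \frac{e^{tr} \lambda^r e^{-\lambda}}{r!} = \sum_{r=0}^\infty \frac{(\lambda e^t)^r e^{-\lambda}}{r!} = \exp(\lambda(e^t - 1)) \]
since $e^x = \sum_{r=0}^\infty \frac{x^r}{r!}$. This agrees with the limiting case of the Binomial distribution.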
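The limit can also be seen at the level of probabilities rather than MGFs. The following sketch (the helper names and the choice $\lambda = 2$ are assumptions made for illustration) measures the largest gap between the $\mathrm{Bin}(n, \lambda/n)$ and $\mathrm{Poisson}(\lambda)$ probabilities over $r = 0, \ldots, 10$.

```python
import math

def binom_pmf(n, p, r):
    """P(Y = r) for Y ~ Bin(n, p)."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(lam, r):
    """P(Z = r) for Z ~ Poisson(lam)."""
    return lam**r * math.exp(-lam) / math.factorial(r)

def max_gap(n, lam=2.0, r_max=10):
    """Largest pointwise gap between Bin(n, lam/n) and Poisson(lam) for r <= r_max."""
    return max(abs(binom_pmf(n, lam / n, r) - poisson_pmf(lam, r))
               for r in range(r_max + 1))

# The gap shrinks as n grows with lam = n * p held fixed.
for n in (10, 100, 10000):
    print(n, max_gap(n))
```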

Probability generating functions (PGF)
These are useful in cases when $X$ takes integer values.
Definition
Suppose $X$ takes values in $\mathbb{N} \cup \{0\}$. The PGF for $X$ is the function $G_X(\theta)$ with $G_X(\theta) = E(\theta^X)$, that is $G_X(\theta) = \sum_{n=0}^\infty \theta^n P(X = n)$. If $\theta = e^t$, then we recover $M_X(t)$.

Properties

• $G_X(1) = 1$.

• $\frac{dG_X}{d\theta} = \sum_{n=1}^\infty n \theta^{n-1} P(X = n)$, so $\left.\frac{dG_X}{d\theta}\right|_{\theta=1} = E(X)$.

• $G''_X(1) = E(X(X - 1))$.

• $G_{X+Y}(\theta) = G_X(\theta) G_Y(\theta)$ if $X, Y$ are independent.

• $\mathrm{Var}(X) = G''_X(1) + G'_X(1) - [G'_X(1)]^2$.

• Given a series for $G_X(\theta)$, the coefficient of $\theta^n$ is precisely $P(X = n)$.

• Moreover, $G_X(0) = P(X = 0)$.

The variance property can be compared to $M''_X(0) - [M'_X(0)]^2 = \mathrm{Var}(X)$.

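Example
Let $X = X_1 + \ldots + X_n$, where the $X_i \sim \mathrm{Bernoulli}(p)$. Now $G_{X_i}(\theta) = 1 - p + p\theta$. If the $X_i$ are independent then $G_X(\theta) = G_{X_1}(\theta) G_{X_2}(\theta) \ldots G_{X_n}(\theta) = (1 - p + p\theta)^n$. So $X \sim \mathrm{Bin}(n, p)$.

Example
Consider $G_X(\theta) = \frac{1}{2 - \theta}$. What distribution does $X$ have? Now
\[ G_X(\theta) = E(\theta^X) = \sum_{n=0}^\infty \theta^n P(X = n) = \frac{1}{2 - \theta} = \frac{1}{2} \cdot \frac{1}{1 - \theta/2} = \frac{1}{2} \sum_{n=0}^\infty \left(\frac{\theta}{2}\right)^n \]
which means that $P(X = n) = \frac{1}{2^{n+1}}$, that is $X$ has a geometric distribution.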

Conditional Expectation and Joint RVs
Consider $X$ and $Y$ discrete RVs taking values in $\Omega_X = \{x_1, x_2, \ldots\}$ and $\Omega_Y = \{y_1, y_2, \ldots\}$. The joint probability function is given by $f_{X,Y}(x_i, y_j) = P(X = x_i, Y = y_j)$. The marginal probability (distribution) functions are $f_X(x_i) = P(X = x_i) = \sum_j f_{X,Y}(x_i, y_j)$ and $f_Y(y_j) = P(Y = y_j) = \sum_i f_{X,Y}(x_i, y_j)$. The conditional probability of $X = x_i$ given $Y = y_j$ is
\[ f_{X|Y}(x_i | y_j) = P(X = x_i | Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{f_{X,Y}(x_i, y_j)}{f_Y(y_j)}. \]
If $X, Y$ are independent then $f_{X,Y}(x_i, y_j) = f_X(x_i) f_Y(y_j)$ for all $i, j$. Given $g : \Omega_X \times \Omega_Y \to \mathbb{R}$, $E(g(X, Y)) = \sum_{i,j} g(x_i, y_j) f_{X,Y}(x_i, y_j)$. If $X, Y$ are independent and $g(X, Y) = h_1(X) h_2(Y)$ then $E(g(X, Y)) = E(h_1(X)) E(h_2(Y))$.

The conditional expectation of $X$ given $Y$ is the quantity $E(X|Y)$. This is a function of $Y$, the "average over $X$ given a value of $Y$". If $Y = y_j$, then
\[ E(X | Y = y_j) = \sum_i x_i P(X = x_i | Y = y_j) = \sum_i x_i \frac{f_{X,Y}(x_i, y_j)}{f_Y(y_j)}, \]
a function of $y_j$. $E(X|Y)$ is a RV which is governed by the probability distribution of $Y$, hence we can also take its expectation.

Tower rule
$E(E(X|Y)) = E(X)$.

We have a useful check: if $X$ and $Y$ are independent then $E(X|Y) = E(X)$. In general
\[ E(E(X|Y)) = \sum_j \left(\sum_i x_i \frac{f_{X,Y}(x_i, y_j)}{f_Y(y_j)}\right) f_Y(y_j) = \sum_i x_i \sum_j f_{X,Y}(x_i, y_j) = \sum_i x_i f_X(x_i) = E(X). \]

Compound processes
Suppose $(X_i)_{i=1}^\infty$ are IID RVs with PGF $G_X(\theta)$ (since the $X_i$ are IID, each $X_i$ has the same distribution as a common RV $X$). Suppose $N$ is a RV with PGF $G_N(\theta)$, independent of the $X_i$. Let $Z = X_1 + X_2 + \ldots + X_N$. $Z$ is a compound process, a random sum of random variables.

Proposition
For the compound process $Z$, the PGF is $G_Z(\theta) = G_N(G_X(\theta)) = G_N \circ G_X(\theta)$.


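Proof
By definition
\[ \begin{aligned}
G_Z(\theta) = E(\theta^Z) &= \sum_{n=0}^\infty \theta^n P(Z = n) \\
&= E(\theta^{X_1 + X_2 + \ldots + X_N}) = E(E(\theta^{X_1 + X_2 + \ldots + X_N} | N)) && \text{[Tower rule]} \\
&= \sum_{n=0}^\infty E(\theta^{X_1 + X_2 + \ldots + X_n} | N = n) P(N = n) \\
&= \sum_{n=0}^\infty E(\theta^{X_1}) E(\theta^{X_2}) \ldots E(\theta^{X_n}) P(N = n) && \text{[Independence]} \\
&= \sum_{n=0}^\infty (G_X(\theta))^n P(N = n) = G_N(G_X(\theta)) && \text{[Definition of PGF]}.
\end{aligned} \]

The coefficient of $\theta^n$ in $G_Z(\theta)$ gives $P(Z = n)$.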

Example
Suppose we roll a die and then flip a number of coins equal to the number on the die. If $Z$ is the number of heads, what is $P(Z = k)$?
By the previous proposition we have $G_Z(\theta) = G_N(G_X(\theta))$ where $N \sim \mathrm{Unif}\{1, \ldots, 6\}$ (the values on the die) and $X \sim \mathrm{Bernoulli}(1/2)$ (the flip of the coin). Now
\[ G_X(\theta) = E(\theta^X) = \sum_{n=0}^1 \theta^n P(X = n) = \frac{1}{2}(1 + \theta) \]
and
\[ G_N(\theta) = E(\theta^N) = \sum_{n=1}^6 \theta^n P(N = n) = \frac{1}{6} \sum_{n=1}^6 \theta^n. \]
Hence
\[ G_Z(\theta) = G_N\left(\frac{1}{2}(1 + \theta)\right) = \frac{1}{6} \sum_{n=1}^6 \frac{1}{2^n} (1 + \theta)^n. \]
It follows that $P(Z = k)$ is given by the $\theta^k$ coefficient in the sum. By the binomial theorem, we have $P(Z = k) = \frac{1}{6} \sum_{n=1}^6 \binom{n}{k} \left(\frac{1}{2}\right)^n$, recalling that $\binom{n}{k} = 0$ for $k > n$.
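The formula for $P(Z = k)$ is easy to evaluate exactly. This is a minimal sketch using exact rational arithmetic; the name `p_heads` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from math import comb

def p_heads(k):
    """P(Z = k) = (1/6) * sum_{n=1}^{6} C(n, k) / 2^n (comb gives 0 when k > n)."""
    return Fraction(1, 6) * sum(Fraction(comb(n, k), 2**n) for n in range(1, 7))

dist = {k: p_heads(k) for k in range(7)}
for k, p in dist.items():
    print(k, p)

# Sanity checks: the probabilities sum to 1, and E(Z) = E(N) E(X) = 3.5 * 0.5.
print(sum(dist.values()), sum(k * p for k, p in dist.items()))
```

The mean check uses the fact that, for a compound process, $G'_Z(1) = G'_N(1) G'_X(1)$.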

2 Branching Processes

Let $S_n$ be the number of individuals in a population at time $n$. Suppose $S_0 = 1$ (one individual at time 0). Individuals evolve at each timestep according to a common RV $X$ (the number of individuals that replace a given individual at the next timestep), and evolve independently of others. We assume $X$ has PGF $G_X(\theta)$. Let $X_i$, $i \ge 1$ be IID copies of $X$. We want to work out the long term behaviour of $S_n$, $E(S_n)$ and $P(S_n = 0)$.

We use generating function analysis. For $S_n$, denote the PGF by $G_n(\theta)$. Since $S_1 = X$, $G_1(\theta) = G_X(\theta)$. For $S_2$, $G_2(\theta) = E(\theta^{S_2}) = E(E(\theta^{S_2} | X)) = G_X \circ G_X(\theta)$ by the previous proposition. Similarly, $G_3(\theta) = E(E(\theta^{S_3} | S_2)) = G_X \circ G_X \circ G_X(\theta)$.

Proposition
$G_n(\theta) = G_X \circ G_X \circ \ldots \circ G_X(\theta)$ ($n$-fold composition). Moreover $G_n(\theta) = G_X(G_{n-1}(\theta)) = G_{n-1}(G_X(\theta))$.
Proof
This follows easily by induction.

Remark: the coefficient of $\theta^k$ in $G_n(\theta)$ gives $P(S_n = k)$.


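Expected behaviour of $S_n$
We want to study $E(S_n) = \left.\frac{dG_n}{d\theta}\right|_{\theta=1} = G'_n(1)$. Let $\mu = E(X) = G'_X(1)$. We work out $E(S_n)$ iteratively. Now
\[ \begin{aligned}
G_n(\theta) = G_X(G_{n-1}(\theta)) &\Rightarrow G'_n(\theta) = G'_X(G_{n-1}(\theta)) G'_{n-1}(\theta) && \text{[Chain rule]} \\
&\Rightarrow G'_n(1) = G'_X(G_{n-1}(1)) G'_{n-1}(1) \\
&\Rightarrow G'_n(1) = G'_X(1) G'_{n-1}(1) && \text{[Since } G_{n-1}(1) = 1 \text{, as every PGF equals 1 at } \theta = 1] \\
&\Rightarrow E(S_n) = \mu E(S_{n-1}).
\end{aligned} \]
Since $E(S_1) = \mu$, we can apply this iteratively to get $\mu_n := E(S_n) = \mu^n$.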

Probability of extinction
Recall, given $G_n(\theta)$, $P(S_n = 0) = G_n(0)$. Let $e_n = G_n(0)$ be the probability of extinction by time $n$. Let $e = \lim_{n \to \infty} e_n$ be the probability of ultimate extinction. Now $e_1 = G_X(0)$, $e_2 = G_X(G_X(0)) = G_X(e_1)$. By iteration, $e_{n+1} = G_X(e_n)$, that is $e_n = G_X \circ G_X \circ \ldots \circ G_X(0)$ ($n$-fold composition).

Finding $e$
We can begin to find $e$ by plotting $G_X(\theta)$ for $\theta \in [0, 1]$. Note that $G_X(0) \in [0, 1]$, $G_X(1) = 1$ and, since $G_X(\theta) = \sum_{n=0}^\infty \theta^n P(X = n)$ has non-negative coefficients, $G_X(\theta)$ is increasing. There are two cases, as seen in the following figure.

Figure 1: The two behaviours of the sequence (en) as n→∞.

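$(e_n)$ is an increasing sequence. We have $e = \lim_{n \to \infty} e_{n+1} = \lim_{n \to \infty} G_X(e_n) = G_X(\lim_{n \to \infty} e_n)$ since $G_X(\theta)$ is continuous for $\theta \in [0, 1]$. Hence $e = G_X(e)$, that is $e$ is a fixed point of $G_X(\theta)$. Remark: If $\mu_X := G'_X(1) \le 1$ then $e = 1$ (excluding the trivial case $P(X = 1) = 1$). If $\mu_X > 1$ then $e \ne 1$, in fact $e$ is the smallest root in $[0, 1]$ of $G_X(\theta) = \theta$. Useful check: $G_X(1) = 1$, so $\theta = 1$ is a root too.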

Example
Suppose $P(X = 0) = 0.3$, $P(X = 1) = 0.5$, $P(X = 2) = 0.2$. Work out $E(S_n)$, $P(S_2 = 2)$ and $e$.
Note $G_X(\theta) = 0.3 + 0.5\theta + 0.2\theta^2$. $\mu_X = 0.5 \times 1 + 0.2 \times 2 = 0.9$, hence $E(S_n) = 0.9^n$ since $\mu_n = \mu_X^n$. Since $\mu_X \le 1$, $e = 1$. $G_2(\theta) = G_X \circ G_X(\theta) = 0.3 + 0.5(0.3 + 0.5\theta + 0.2\theta^2) + 0.2(0.3 + 0.5\theta + 0.2\theta^2)^2$. Hence, since $P(S_2 = 2)$ is the $\theta^2$ coefficient of $G_2(\theta)$, $P(S_2 = 2) = 0.5 \times 0.2 + 0.12 \times 0.2 + 0.2 \times (0.5)^2 = 0.174$.
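The coefficient extraction in this example can be automated by composing polynomials. The sketch below is one possible approach, assuming the PGF is a polynomial stored as a list of coefficients (lowest power first); `poly_mul` and `poly_compose` are hypothetical helper names.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_compose(outer, inner):
    """Coefficients of outer(inner(theta)), built up by Horner's rule."""
    result = [Fraction(0)]
    for c in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += c
    return result

# G_X(theta) = 0.3 + 0.5 theta + 0.2 theta^2 from the example above.
G_X = [Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)]
G_2 = poly_compose(G_X, G_X)

print(float(G_2[2]))   # theta^2 coefficient, P(S_2 = 2)
print(float(G_2[0]))   # constant term, e_2 = P(S_2 = 0)
```

Composing again with `G_X` gives the distribution of $S_3$, and so on, although the degree doubles at each step.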

Example
Suppose $G_X(\theta) = 0.2 + 0.4\theta + 0.3\theta^2 + 0.1\theta^3$. Work out $e$ and $E(S_n)$.
$E(S_n) = (G'_X(1))^n = (0.4 + 0.6 + 0.3)^n = (1.3)^n$. Since $\mu_X = 1.3 > 1$, $e < 1$. We need to solve $G_X(\theta) = \theta$. Rearranging this equation (and multiplying by 10) we get $2 - 6\theta + 3\theta^2 + \theta^3 = 0 \Leftrightarrow (\theta - 1)(\theta^2 + 4\theta - 2) = 0$, so the three roots are $\theta = 1$ and $\theta = -2 \pm \sqrt{6}$. We need the root in $[0, 1)$ for this to make sense as a probability, so $e = -2 + \sqrt{6} \approx 0.4495$.
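The value of $e$ can also be found numerically from the recursion $e_{n+1} = G_X(e_n)$ with $e_0 = 0$. The sketch below illustrates this for the cubic PGF in the example; the function name, tolerance and iteration cap are arbitrary choices made for illustration.

```python
import math

def G(theta):
    """PGF from the example: 0.2 + 0.4 theta + 0.3 theta^2 + 0.1 theta^3."""
    return 0.2 + 0.4 * theta + 0.3 * theta**2 + 0.1 * theta**3

def extinction_probability(pgf, tol=1e-12, max_iter=10_000):
    """Iterate e_{n+1} = pgf(e_n) from e_0 = 0 until successive values agree."""
    e = 0.0
    for _ in range(max_iter):
        nxt = pgf(e)
        if abs(nxt - e) < tol:
            return nxt
        e = nxt
    return e

e = extinction_probability(G)
print(e, -2 + math.sqrt(6))
```

Starting from 0 matters: the increasing sequence $(e_n)$ converges to the smallest fixed point in $[0, 1]$, whereas starting at 1 would stay at the root $\theta = 1$.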

Envelope problem
Take two envelopes, one of which contains twice the amount of the other. Pick an envelope and suppose it contains amount $x$. Then the other envelope contains either $x/2$ or $2x$. The (naive) expected value for switching is $\frac{1}{2}\left(\frac{x}{2}\right) + \frac{1}{2}(2x) = \frac{5x}{4} > x$, suggesting that you should always switch. This is a misconception, since the same argument can be applied again after switching, meaning that you would be better off with the envelope you started with, which gives a paradox.

Consider $S_0$ random, say $S_0 = Y$ with PGF $G_Y(\theta)$ specified. How do the results (as before) change? In this case, $S_1 = X_1 + X_2 + \ldots + X_Y$ and $S_{n+1} = X_1 + X_2 + \ldots + X_{S_n}$.

Proposition
The PGF $\tilde{G}_n(\theta)$ for $S_n$ is given by $\tilde{G}_n(\theta) = G_Y \circ G_n(\theta)$, where $G_n(\theta)$ is the $S_0 = 1$ case PGF.
Proof
We make the observation that $\tilde{G}_1(\theta) = E(E(\theta^{S_1} | Y)) = G_Y \circ G_1(\theta) = G_Y \circ G_X(\theta)$. We can apply this argument repeatedly to get the result.

Consequences
$\tilde{\mu}_n := E(S_n) = \mu_Y \mu_n$, where $\mu_n$ is the $S_0 = 1$ case mean. $\tilde{e}_n := G_Y(e_n)$, where $e_n$ is the $S_0 = 1$ case. Hence $\tilde{e} = G_Y(e)$, where $e$ is the ultimate extinction probability assuming $S_0 = 1$.

Example
Suppose $S_0 = 6$ and $G_X(\theta) = 0.3 + 0.5\theta + 0.2\theta^2$. Work out $E(S_n)$ and $\tilde{e}$.
We have $E(S_n) = \mu_Y \times \mu_n$. Since $\mu_Y = 6$ and $\mu_n = 0.9^n$, $E(S_n) = 6(0.9)^n$. Similarly, $\tilde{e} = G_Y(e)$. Since $e = 1$, $\tilde{e} = G_Y(1) = 1$.
What if $G_Y(\theta) = (0.4\theta + 0.6)^3$?
This is a Binomial$(3, 0.4)$ distribution for $S_0$. So $E(Y) = 1.2$, which means that $E(S_n) = 1.2(0.9)^n$. $\tilde{e} = G_Y(e) = (0.4e + 0.6)^3 = 1$ since $e = 1$. Remark: If $e = 1$, then we always get $\tilde{e} = 1$ if $G_Y(\theta)$ is a well defined PGF.

3 Poisson Processes

Definition
Events occur as a Poisson process if the intervals of time between events are IID exponentially distributed RVs. Recall that a RV $T$ has an exponential distribution if its pdf is given by $f_T(t) = \lambda e^{-\lambda t}$ for $t \ge 0$ and $\lambda > 0$ (and 0 otherwise). For example,
\[ P(T > t) = \int_t^\infty \lambda e^{-\lambda u}\,du = e^{-\lambda t}, \qquad E(T) = \int_0^\infty \lambda t e^{-\lambda t}\,dt = \frac{1}{\lambda}. \]
The mean time between successive events is $\frac{1}{\lambda}$. The mean number of events per unit time is $\lambda$.

Let $T_1$ be the time to the first event, $T_2$ be the time between the first and second events, ..., $T_k$ be the time between the $(k-1)$-th and the $k$-th events. Let $S_n = T_1 + T_2 + \ldots + T_n$, this is the time to the $n$-th event. Assume $\{T_k\}_{k=1}^n$ is a sequence of IID RVs each with exponential distribution of rate $\lambda$. Recall the memoryless property of the exponential distribution: $P(T > t + s \mid T > s) = P(T > t)$.

Questions: What is the distribution of $S_n$? Given a specified time $t$, how many events occur within this time?

Remark: Suppose we have $n$ lightbulbs in sequence. We may be interested to find $P(\min\{T_1, \ldots, T_n\} \ge t)$ or $P(\max\{T_1, \ldots, T_n\} \le 6)$.

Theorem 2.1
The time $S_n$ to the $n$-th event follows a gamma distribution with pdf $g_n(t) = \frac{\lambda(\lambda t)^{n-1} \exp(-\lambda t)}{(n-1)!}$ for $t \ge 0$, $n \ge 1$. Note that $n = 1$ gives the exponential distribution.
Proof
We will compute the moment generating function (MGF) for $S_n$ via the MGFs for the $T_k$, and show that this coincides with the MGF for a RV $Y$ with pdf $g_n(t)$. By uniqueness of MGFs, the distributions must then coincide. Recall that $M_T(t) = E(e^{tT})$. $T$ has an exponential distribution, hence
\[ M_T(t) = \int_0^\infty e^{tu} (\lambda e^{-\lambda u})\,du = \int_0^\infty \lambda e^{u(t - \lambda)}\,du = \left[\frac{\lambda}{t - \lambda} \exp(u(t - \lambda))\right]_0^\infty = \frac{\lambda}{\lambda - t}, \]
assuming $t < \lambda$. It follows that
\[ M_{S_n}(t) = E(e^{tS_n}) = E(e^{tT_1} e^{tT_2} \ldots e^{tT_n}) = E(e^{tT_1}) E(e^{tT_2}) \ldots E(e^{tT_n}) \]
since $T_1, \ldots, T_n$ are independent. Hence $M_{S_n}(t) = \frac{\lambda^n}{(\lambda - t)^n}$. If $Y$ is a RV with the gamma distribution pdf $g_n(x)$, then the MGF for $Y$ is precisely $\int_0^\infty e^{tx} g_n(x)\,dx$. By direct calculation, using $\int_0^\infty x^n e^{-x}\,dx = n!$ and substituting $y = x(\lambda - t)$, we have $M_Y(t) = \frac{\lambda^n}{(\lambda - t)^n}$. We can conclude that $M_{S_n}(t) = M_Y(t)$. Since these are continuous near $t = 0$, $S_n$ has the same (gamma) distribution as $Y$.

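Suppose we now wish to fix some period of time $t$ and consider the number $N_t$ of events in this time.

Theorem 2.2
The number $N_t$ of events in time $t$ follows a Poisson distribution with parameter $\lambda t$, that is $N_t \sim \mathrm{Po}(\lambda t)$. That is $P(N_t = r) = \frac{(\lambda t)^r \exp(-\lambda t)}{r!}$ for $r \ge 0$.
Proof
Consider the following. For $r \ge 1$, $P(\text{At least } r \text{ events in time period } t) = P(\text{Time to the } r\text{-th event is at most } t) = \int_0^t g_r(x)\,dx$. Hence
\[ P(\text{Exactly } r \text{ events in time } t) = P(\text{At least } r \text{ events in time } t) - P(\text{At least } r + 1 \text{ events in time } t) = \int_0^t g_r(x)\,dx - \int_0^t g_{r+1}(x)\,dx = \frac{(\lambda t)^r \exp(-\lambda t)}{r!}, \]
where the last step follows from integrating $g_{r+1}$ by parts: $\int_0^t g_{r+1}(x)\,dx = -\frac{(\lambda t)^r \exp(-\lambda t)}{r!} + \int_0^t g_r(x)\,dx$. For $r = 0$, $P(N_t = 0) = P(T_1 > t) = \exp(-\lambda t)$.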
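The identity at the heart of this proof can be checked numerically. The sketch below integrates the gamma pdfs with a simple composite trapezoid rule (an arbitrary choice of method; the helper names and the values $\lambda = 3$, $t = 2$, $r = 5$ are assumptions for illustration) and compares the difference with the Poisson probability.

```python
import math

def gamma_pdf(n, lam, t):
    """g_n(t) = lam (lam t)^(n-1) exp(-lam t) / (n-1)!, the pdf of S_n."""
    return lam * (lam * t)**(n - 1) * math.exp(-lam * t) / math.factorial(n - 1)

def integrate(f, a, b, steps=20000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def prob_exactly(r, lam, t):
    """P(exactly r events in time t), r >= 1, as a difference of two integrals."""
    at_least_r = integrate(lambda x: gamma_pdf(r, lam, x), 0.0, t)
    at_least_r_plus_1 = integrate(lambda x: gamma_pdf(r + 1, lam, x), 0.0, t)
    return at_least_r - at_least_r_plus_1

def poisson_pmf(r, mean):
    return mean**r * math.exp(-mean) / math.factorial(r)

lam, t, r = 3.0, 2.0, 5
print(prob_exactly(r, lam, t), poisson_pmf(r, lam * t))
```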

We can draw several consequences from this.

Combining Poisson processes
A Poisson stream is a sequence of arrivals (events) where the inter-arrival times are independent and follow an exponential distribution.

Suppose males (M) arrive to a shop as a Poisson stream of rate $\lambda_m$. Suppose females (F) also arrive as a Poisson stream with rate $\lambda_f$. Assume both streams are independent. We want to analyse the combined process for the total arrivals.

Method 1
Let $G(t) := P(\text{Time to the next arrival is less than } t)$. This is the distribution function of the arrival time of the next customer, irrespective of being male or female. Now
\[ G(t) = 1 - P(\text{No arrivals before time } t) = 1 - P(\text{No males arrive in time } t) P(\text{No females arrive in time } t) = 1 - \exp(-\lambda_m t) \exp(-\lambda_f t) = 1 - \exp(-(\lambda_m + \lambda_f)t). \]
So the time to the next arrival is exponentially distributed with rate $\lambda_m + \lambda_f$.

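Method 2
Let $N$ be the number of arrivals in time $t$, $N^{(m)}$ be the number of male arrivals and $N^{(f)}$ be the number of female arrivals. Then $N^{(m)} \sim \mathrm{Po}(\lambda_m t)$, $N^{(f)} \sim \mathrm{Po}(\lambda_f t)$. Then
\[ \begin{aligned}
P(N = k) &= P(N^{(m)} + N^{(f)} = k) = \sum_{r=0}^k P(N^{(m)} = r \text{ and } N^{(f)} = k - r) = \sum_{r=0}^k P(N^{(m)} = r) P(N^{(f)} = k - r) \\
&= \sum_{r=0}^k \frac{(\lambda_m t)^r \exp(-\lambda_m t)}{r!} \cdot \frac{(\lambda_f t)^{k-r} \exp(-\lambda_f t)}{(k - r)!} \\
&= \frac{\exp(-\lambda_m t) \exp(-\lambda_f t)}{k!} \sum_{r=0}^k \binom{k}{r} (\lambda_m t)^r (\lambda_f t)^{k-r} \\
&= \frac{((\lambda_m + \lambda_f)t)^k \exp(-(\lambda_m + \lambda_f)t)}{k!}.
\end{aligned} \]
Hence $N$ has a Poisson distribution with parameter $(\lambda_m + \lambda_f)t$.

In general, given independent RVs $X$ and $Y$ and $Z = X + Y$, the distribution of $Z$ is the convolution of the distributions of $X$ and $Y$.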


Splitting processes
Suppose customers arrive as a Poisson stream with combined rate $\lambda$. For each customer that arrives, there is a probability $p$ that the customer is male, and probability $1 - p$ that the customer is female. Arrivals are independent. We want to find the arrival process for females alone. We expect $\lambda_f = \lambda(1 - p)$. Now,
\[ \begin{aligned}
P(\text{No females arrive in time } t) &= P(\text{No arrivals in time } t) + \sum_{n=1}^\infty P(n \text{ arrivals in time } t, \text{ each arrival is male}) \\
&= \exp(-\lambda t) + \sum_{n=1}^\infty \frac{(\lambda t)^n \exp(-\lambda t)}{n!} p^n = \exp(-\lambda t) \left(1 + \sum_{n=1}^\infty \frac{(\lambda t p)^n}{n!}\right) \\
&= \exp(-\lambda t) \exp(\lambda t p) = \exp(-\lambda(1 - p)t).
\end{aligned} \]
Hence the inter-arrival times for females have an exponential distribution with parameter $\lambda(1 - p)$.

Remark: we can extend combining and splitting to an arbitrary number of Poisson stream types.

Example
Suppose events occur as a Poisson stream. Suppose we know that there are $N$ events in time $T$ ($N$, $T$ fixed). Now fix $t < T$. What is the probability distribution that governs the number of events in time $t$?
We can split the time interval $T$ into two parts, one up to time $t$ and one after time $t$. Now
\[ \begin{aligned}
P(r \text{ events in time } t \mid N \text{ events in time } T) &= \frac{P(r \text{ events in time } t \cap N \text{ events in time } T)}{P(N \text{ events in time } T)} \\
&= \frac{P(r \text{ events in time } t \cap N - r \text{ events in the remaining time } T - t)}{P(N \text{ events in time } T)}
\end{aligned} \]
by the memoryless property of the exponential distribution. The events in the numerator are independent so we can take the product of the probabilities. Hence
\[ P(r \text{ events in time } t \mid N \text{ events in time } T) = \left(\frac{(\lambda t)^r \exp(-\lambda t)}{r!}\right) \left(\frac{(\lambda(T - t))^{N-r} \exp(-\lambda(T - t))}{(N - r)!}\right) \left(\frac{N!}{(\lambda T)^N \exp(-\lambda T)}\right) = \binom{N}{r} p_t^r (1 - p_t)^{N-r} \]
where $p_t = \frac{t}{T}$. So it has a Binomial$(N, \frac{t}{T})$ distribution.

Remark: Also consider the time to the $r$-th event given $N$ events in time $T$. The corresponding distribution is a beta distribution and the mean time taken is $\frac{rT}{N+1}$.

Example
Suppose students arrive to Harrison as a Poisson stream of rate 3 per unit time.
(i) Find $P(5 \text{ students enter within time } 2)$.
(ii) Find $P(\text{Time taken for 4-th student to arrive is at least } 2)$.
(iii) Find $P(3 \text{ students enter in time } 1 \mid 5 \text{ students enter by time } 2)$.
(iv) If one student entered in time 1, show that the time the student entered is uniformly distributed on $[0, 1]$.

(i) We need to find $P(N_2 = 5)$ for $\lambda = 3$. $P(N_2 = 5) = \frac{6^5 \exp(-6)}{5!} = 0.161$ to 3 sf.

(ii) Following the proof of Theorem 2.2:
\[ P(\text{4-th event at time} \ge 2) = P(\le 3 \text{ events in time } 2) = P(N_2 \le 3) = e^{-6}\left(1 + 6 + \frac{6^2}{2} + \frac{6^3}{3!}\right) = 0.151 \text{ to 3 sf.} \]

(iii) For $N = 5$, $T = 2$, $r = 3$, $t = 1$, $P(3 \text{ students in time } 1 \mid 5 \text{ students in time } 2) = \binom{5}{3} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^2 = \frac{5}{16}$.

(iv) The single event in time $T$ is governed by a Binomial$(1, \frac{t}{T})$ distribution. Since $T = 1$, the probability that the student entered by time $t$ is $t$, which is the distribution function of a Uniform$[0, 1]$ RV.
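The numerical answers to parts (i) to (iii) can be reproduced with a few lines of code. This is a minimal sketch; the variable names are illustrative only.

```python
import math

def poisson_pmf(r, mean):
    """P(N = r) for N ~ Poisson(mean)."""
    return mean**r * math.exp(-mean) / math.factorial(r)

lam = 3.0  # arrival rate per unit time

# (i) exactly 5 arrivals in time 2, so N_2 ~ Poisson(6)
p_i = poisson_pmf(5, lam * 2)

# (ii) 4th arrival is at time >= 2, i.e. at most 3 arrivals by time 2
p_ii = sum(poisson_pmf(r, lam * 2) for r in range(4))

# (iii) 3 of the 5 arrivals fall in the first half: Binomial(5, 1/2)
p_iii = math.comb(5, 3) * (1 / 2)**3 * (1 / 2)**2

print(round(p_i, 3), round(p_ii, 3), p_iii)
```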

Poisson Rate Equations (Steady state)
Previously we have had the number of arrivals in time $t$, $N_t \sim \mathrm{Poisson}(\lambda t)$. If only arrivals are specified, the system state will just tend to $\infty$. We’ll consider a corresponding departure process (also with a Poisson distribution, Poisson$(\mu t)$). We’ll examine the long run average, that is the probability distribution governing the state of the system in future time (after transient effects). We hope that the long run probabilities that govern the number of individuals in the system do not depend on time.

Consider the number of arrivals (events) in a short time period $\delta t$. Define $o(x)$ as any function of $x$ such that $\lim_{x \to 0} \frac{o(x)}{x} = 0$. $x^{3/2}$ is a suitable choice for $o(x)$, but $\sqrt{x}$ isn’t. In time period $\delta t$, the number $N$ of events is governed by a Poisson$(\lambda \delta t)$ distribution. We want to look at
\[ \begin{aligned}
P(N = 0) &= \exp(-\lambda \delta t) = 1 - \lambda \delta t + \frac{\lambda^2 \delta t^2}{2!} - \ldots = 1 - \lambda \delta t + o(\delta t) \\
P(N = 1) &= \lambda \delta t \exp(-\lambda \delta t) = \lambda \delta t + o(\delta t) \\
P(N \ge 2) &= \frac{(\lambda \delta t)^2 \exp(-\lambda \delta t)}{2} + \frac{(\lambda \delta t)^3 \exp(-\lambda \delta t)}{3!} + \ldots = o(\delta t).
\end{aligned} \]

We’ll consider probabilities of order $o(\delta t)$ as insignificant. We’ll now consider a system which contains a population of individuals. Denote the system state by $n$, that is the number of individuals in the system. Given state $n$, we assume arrivals in time $t$ are governed by a Poisson$(\lambda_n t)$ distribution, and departures are governed by a Poisson distribution with rate $\mu_n$ (per unit time). Observe that if the state of the system changes then so do the probability distributions that govern arrivals and departures, that is the rates $\lambda_n$ and $\mu_n$ will change. If the system is in state $n = 0$ (empty) then $\mu_0 = 0$. Also, assume an upper limit on capacity so that if we are in state $N$ (for some fixed number $N$) then $\lambda_N = 0$.

State diagram

Figure 2: The state diagram for the system, showing the directions of the rate constants $\lambda_i$ and $\mu_j$.

Define $P_n(t)$ to be the probability of being in state $n$ at time $t$. We’ll compare $P_n(t)$ to its neighbours. In fact the evolution of $P_n(t)$ with time will just depend on the neighbours. Transitions to states with a difference in $n$ of at least 2 will have small probabilities in time scale $\delta t$. Note we assume all processes are independent. Now suppose $P_n(t)$ is given for all $0 \le n \le N$. Consider $P_0(t + \delta t)$, that is the probability of being in state 0 at time $t + \delta t$. Now $P_0(t + \delta t) = P(\text{State 0 at time } t \text{ and no arrivals in time } \delta t) + P(\text{State 1 at time } t \text{ and one departure from state 1 in time } \delta t) + P(\text{State at least 2 at time } t \text{ and sufficient departures to 0})$. Hence
\[ P_0(t + \delta t) = P_0(t)(1 - \lambda_0 \delta t) + P_1(t) \mu_1 \delta t + o(\delta t). \tag{1} \]
Similarly
\[ P_N(t + \delta t) = P_N(t)(1 - \mu_N \delta t) + P_{N-1}(t) \lambda_{N-1} \delta t + o(\delta t). \tag{2} \]
Now we consider $0 < n < N$. In this case we also need to include the possibility of no arrivals or departures about our given state. Hence
\[ P_n(t + \delta t) = P_{n-1}(t) \lambda_{n-1} \delta t + P_{n+1}(t) \mu_{n+1} \delta t + P_n(t)(1 - \lambda_n \delta t)(1 - \mu_n \delta t) + o(\delta t) \tag{3} \]
\[ = P_{n-1}(t) \lambda_{n-1} \delta t + P_{n+1}(t) \mu_{n+1} \delta t + P_n(t)(1 - \lambda_n \delta t - \mu_n \delta t) + o(\delta t). \tag{4} \]

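We can rearrange Equations 1, 2 and 4 to get the LHSs in the form $\frac{P_n(t + \delta t) - P_n(t)}{\delta t}$ and take $\delta t \to 0$. These LHSs become $\frac{dP_n}{dt}$ for $0 \le n \le N$. Hence we obtain the Poisson rate equations:
\[ \text{Equation 1} \Rightarrow \frac{dP_0}{dt} = \mu_1 P_1(t) - \lambda_0 P_0(t) \tag{5} \]
\[ \text{Equation 2} \Rightarrow \frac{dP_N}{dt} = \lambda_{N-1} P_{N-1}(t) - \mu_N P_N(t) \tag{6} \]
\[ \text{Equation 4} \Rightarrow \frac{dP_n}{dt} = \lambda_{n-1} P_{n-1}(t) + \mu_{n+1} P_{n+1}(t) - (\lambda_n + \mu_n) P_n(t) \text{ for } 0 < n < N. \tag{7} \]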

We now have $N + 1$ coupled ODEs. We will be interested in the steady state, and this means that the long run behaviour is time-independent. Sufficient conditions for (that is conditions that imply) steady state are at least one of the following: (i) An upper limit on capacity; (ii) After some state $n = n_0$, the departure rate $\mu_n$ is greater than the arrival rate $\lambda_n$ for all $n \ge n_0$.

We’ll assume that the system tends to a steady state. Moreover we assume that the steady state is independent of the initial state and transient effects can be ignored (quickly) as time $t$ increases. As we approach the steady state, $P_n(t) \to P_n$ as $t \to \infty$ (for some constant $P_n$). Hence $\frac{dP_n}{dt} \to 0$ and so we set $\frac{dP_n}{dt} = 0$ in Equations 5 to 7 to get:
\[ \text{Equation 5} \Rightarrow 0 = \mu_1 P_1 - \lambda_0 P_0 \tag{8} \]
\[ \text{Equation 6} \Rightarrow 0 = \lambda_{N-1} P_{N-1} - \mu_N P_N \tag{9} \]
\[ \text{Equation 7} \Rightarrow 0 = \lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1} - (\lambda_n + \mu_n) P_n \text{ for } 0 < n < N. \tag{10} \]

These equations are homogeneous and $P_n = 0$ is a solution for all $n$. However we have linear dependence and we can remove this by imposing $\sum_{n=0}^N P_n = 1$. This will lead to a unique solution. We take a change of variable, letting $\Pi_n = \lambda_{n-1} P_{n-1} - \mu_n P_n$. Then from Equation 8 we find $\Pi_1 = 0$, from Equation 10 we find $\Pi_n = \Pi_{n+1}$, that is $\Pi_n = 0$ for all $n \le N - 1$. Finally from Equation 9 we get $\Pi_N = 0$. Since $\Pi_n = 0$, we have $P_n = \frac{\lambda_{n-1} P_{n-1}}{\mu_n}$ for $n = 1, \ldots, N$. By iteration we find
\[ P_n = \frac{\lambda_{n-1} \lambda_{n-2} \ldots \lambda_0}{\mu_n \mu_{n-1} \ldots \mu_1} P_0. \tag{11} \]
Since $\sum_{n=0}^N P_n = 1$, we have
\[ P_0 \left(1 + \sum_{n=1}^N \frac{\lambda_{n-1} \ldots \lambda_0}{\mu_n \ldots \mu_1}\right) = 1 \]
which can be solved for $P_0$ and hence $P_n$ from Equation 11. This is the steady state relation.

Steady state diagram analysis
We want to work out the steady state probabilities. We don’t need to remember Equations 1 to 11. A probability flow from state $(n)$ to state $(n + 1)$ is just $\lambda_n P_n$. A probability flow from state $(n + 1)$ to state $(n)$ is just $\mu_{n+1} P_{n+1}$. In fact, at a steady state, the probability flows balance, that is $\mu_{n+1} P_{n+1} = \lambda_n P_n$. This analysis is equivalent to solving $\Pi_n = 0$ (or $\Pi_n = \Pi_{n+1}$). Hence $P_{n+1} = \frac{\lambda_n P_n}{\mu_{n+1}}$.

Method

1. Draw the steady state diagram and equate probability flows to get steady state equations in $P_n$.

2. Solve these equations for $P_n$ in terms of $P_0$.

3. Solve for $P_0$ using $\sum_{n=0}^N P_n = 1$ (note that $N = \infty$ is allowed).

4. Find $P_n$ via the equations in Step 2.

5. Compute the expected system state $\sum_{n=0}^N n P_n$.


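Recall that if $N = \infty$ we require $\lim_{n \to \infty} \frac{\lambda_n}{\mu_n} < 1$. If $\lim_{n \to \infty} \frac{\lambda_n}{\mu_n} > 1$ then there is no steady state (that is the state tends to infinity with probability 1). If $\lim_{n \to \infty} \frac{\lambda_n}{\mu_n} = 1$ then finding a steady state is possible but not always guaranteed.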

Example
Suppose we have one engineer to repair a set of three photocopiers. Individual machines break down at a rate of once per hour. Repair time is 30 minutes per machine on average. The state of the system is the number of machines broken. Times are exponentially distributed.
The individual breakdown rate is 1 per hour so $\lambda = 1$. The repair rate is 2 per hour so $\mu = 2$.

Figure 3: The steady state diagram for this example.

From the steady state diagram, we can derive the steady state equations: $3P_0 = 2P_1$, $2P_1 = 2P_2$, $P_2 = 2P_3$, which in turn mean $P_1 = \frac{3P_0}{2}$, $P_2 = \frac{3P_0}{2}$, $P_3 = \frac{3P_0}{4}$. Hence by applying $\sum_{n=0}^3 P_n = 1$ we get $P_0 = \frac{4}{19}$, $P_1 = \frac{6}{19}$, $P_2 = \frac{6}{19}$ and $P_3 = \frac{3}{19}$. From this we can find the expected state of the system is $\frac{27}{19}$.

Example
Suppose there are 3 machines with a breakdown rate of $\lambda = 1$ per hour. Suppose there are 2 engineers and each repairs a single machine with a mean service time of 30 minutes. Let $X$ be the number of machines broken. Find $P_0$ and $E(X)$.
Note that the rate of service (per engineer) is $\mu = 2$ per hour.

Figure 4: The steady state diagram for the case of 3 photocopiers and 2 engineers.

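To find the required information we equate probability flows. Doing this, we find the equations $3\lambda P_0 = \mu P_1$, $2\lambda P_1 = 2\mu P_2$ and $\lambda P_2 = 2\mu P_3$ (with all three machines working the total breakdown rate is $3\lambda$). Solving each of these equations in terms of $P_0$ and using $P_0 + P_1 + P_2 + P_3 = 1$ we find $P_0 = \frac{16}{55}$. Using this we can calculate $E(X) = \frac{57}{55}$.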
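Both repair examples can be checked with a small routine that applies the flow balance $\mu_{n+1} P_{n+1} = \lambda_n P_n$ and then normalises. This is a sketch under the assumption that the rates are supplied as two lists (`arrival[n]` for $\lambda_n$ and `departure[n]` for $\mu_{n+1}$, with $n = 0, \ldots, N - 1$); the function names are hypothetical.

```python
from fractions import Fraction

def steady_state(arrival, departure):
    """Steady state probabilities P_0..P_N of a finite birth-death system.

    arrival[n] is lambda_n and departure[n] is mu_{n+1}, for n = 0..N-1.
    """
    weights = [Fraction(1)]          # P_n / P_0, starting with n = 0
    for lam, mu in zip(arrival, departure):
        weights.append(weights[-1] * Fraction(lam, mu))
    total = sum(weights)             # equals 1 / P_0
    return [w / total for w in weights]

def mean_state(probs):
    return sum(n * p for n, p in enumerate(probs))

# Three machines, lambda = 1 each: breakdown rates 3, 2, 1 in states 0, 1, 2.
one_engineer = steady_state([3, 2, 1], [2, 2, 2])    # repair rate mu = 2
two_engineers = steady_state([3, 2, 1], [2, 4, 4])   # up to two repairs at once

print(one_engineer, mean_state(one_engineer))
print(two_engineers[0], mean_state(two_engineers))
```

Exact fractions are used so the output can be compared directly with the hand calculation.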

4 Queueing Theory

As usual, we ignore transient effects and assume steady state. The analysis is via steady state diagrams. Wehave the notation that a G1/G2/n queue is a queue whose arrival process is governed by a process G1, a service(departure) process given by G2, and n is the number of servers. Denote G1/G2/n/∞ to be the queue as abovewith infinite capacity (we usually omit ∞).


We’ll consider $G_1 = M(\lambda)$ and $G_2 = M(\mu)$, where the arrival rate is $\lambda$ and the individual service rate is $\mu$ and $G_1$, $G_2$ are Poisson streams (note the 'M' denotes Markov). The mean time between successive arrivals is $\frac{1}{\lambda}$ (from the exponential distribution). We will focus on M/M/n queues with $n = 1, 2$ specifically. We’ll also consider finite or infinite capacity for $n = 1$. We’ll analyse the probability distribution of the system size, expected system size and waiting time in the system.

Suppose we have an M/M/1 = M/M/1/$\infty$ queue. This is a single server queue with infinite capacity. There is an arrival rate $\lambda$ of individuals and service rate $\mu$. The state is the number of individuals in the system, that is the sum of the number of people in the queue and the number of people being served. Let $\rho = \frac{\lambda}{\mu}$ be the traffic intensity parameter. We get the following steady state equations: $\lambda P_0 = \mu P_1$, ..., $\lambda P_n = \mu P_{n+1}$, ... By induction we can see that $P_1 = \rho P_0$, $P_2 = \rho^2 P_0$, ..., $P_n = \rho^n P_0$. We then find $P_0$ from $\sum_{n=0}^\infty P_n = 1$. Hence $P_0 \left(\sum_{n=0}^\infty \rho^n\right) = 1$ which means that $P_0 = 1 - \rho$ (provided $\rho < 1$), and hence $P_n = \rho^n(1 - \rho)$. If $\rho \ge 1$ then there is no steady state solution (for $\rho > 1$ the system size tends to $\infty$).

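We define $L_s$ to be the mean number of individuals in the system and $L_q$ to be the mean number of individuals in the queue. Now
\[ L_s = \sum_{n=0}^\infty n P_n = \sum_{n=0}^\infty n \rho^n (1 - \rho) = \rho(1 - \rho) \sum_{n=0}^\infty n \rho^{n-1} = \frac{\rho}{1 - \rho} \]
since $\sum_{n=0}^\infty n x^{n-1} = \frac{1}{(1 - x)^2}$ provided $|x| < 1$. We can obtain $L_q$ in two ways. Firstly $L_q = \sum_{n=1}^\infty (n - 1) P_n = \frac{\rho^2}{1 - \rho}$. Alternatively, $L_s$ is the mean number of people in the queue plus the mean number of people being served, that is $L_q + 0 \cdot P(\text{System empty}) + 1 \cdot P(\text{System busy})$. Hence
\[ L_s = L_q + 0 \cdot P_0 + 1 \cdot (1 - P_0) \Rightarrow \frac{\rho}{1 - \rho} = L_q + \rho \Rightarrow L_q = \frac{\rho^2}{1 - \rho}. \]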

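Now let $W_s$ be the average waiting time in the system and $W_q$ be the average waiting time in the queue. We have the result that $W_s = \sum_{n=0}^\infty \left(\frac{n + 1}{\mu}\right) P_n$, that is if there are $n$ customers in the system then there is a waiting time $\frac{n}{\mu}$ in the queue and a time $\frac{1}{\mu}$ being served. Similarly to before we have $W_s = W_q + \frac{1}{\mu}$, hence
\[ W_s = \frac{1}{\mu} \sum_{n=0}^\infty n P_n + \frac{1}{\mu} = \frac{1}{\mu}\left(\frac{\rho}{1 - \rho} + 1\right) = \frac{1}{\mu(1 - \rho)} = \frac{\rho}{\lambda(1 - \rho)} \Rightarrow W_s = \frac{L_s}{\lambda}. \]
We can see similarly that
\[ W_q = \frac{\rho}{\mu(1 - \rho)} = \frac{\rho^2}{\lambda(1 - \rho)} = \frac{L_q}{\lambda}. \]
We can summarise this as $W_s = W_q + \frac{1}{\mu}$, $L_s = L_q + \rho$ and $L_s = \lambda W_s$, $L_q = \lambda W_q$ (Little’s formula).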
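The M/M/1 relations can be collected into a small helper and checked against a direct (truncated) summation of $\sum n P_n$. The sketch below is illustrative only: the function names and the rates $\lambda = 2$, $\mu = 3$ are assumptions, not values from the notes.

```python
def mm1_measures(lam, mu):
    """Steady state measures of an M/M/1 queue (requires lam < mu)."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError("no steady state unless lam < mu")
    return {
        "Ls": rho / (1 - rho),          # mean number in system
        "Lq": rho**2 / (1 - rho),       # mean number in queue
        "Ws": 1 / (mu * (1 - rho)),     # mean time in system
        "Wq": rho / (mu * (1 - rho)),   # mean time in queue
    }

def mm1_mean_by_sum(lam, mu, terms=2000):
    """Truncated version of sum_n n * rho^n * (1 - rho)."""
    rho = lam / mu
    return sum(n * rho**n * (1 - rho) for n in range(terms))

lam, mu = 2.0, 3.0
m = mm1_measures(lam, mu)
print(m)
print(mm1_mean_by_sum(lam, mu))            # should agree with m["Ls"]
print(lam * m["Ws"], lam * m["Wq"])        # Little's formula: equals Ls and Lq
```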

Suppose we have an M/M/2 queue with 2 servers and infinite capacity. The individual arrival rate is $\lambda$ and the service rate per server is $\mu$.

Figure 5: The set up for the M/M/2 queue, note that a single line is formed and the first customer joins any empty server.


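We can derive the steady state equations $\lambda P_0 = \mu P_1$, $\lambda P_1 = 2\mu P_2$, ..., $\lambda P_n = 2\mu P_{n+1}$ for $n \ge 1$. We can solve these in terms of $P_0$ to get $P_1 = 2\rho P_0$, ..., $P_n = 2\rho^n P_0$ for $n \ge 1$, where $\rho = \frac{\lambda}{2\mu}$. We solve $\sum_{n=0}^\infty P_n = 1$ to find $P_0$, specifically
\[ P_0 \left(1 + 2\sum_{n=1}^\infty \rho^n\right) = 1 \Rightarrow P_0 \left(1 + \frac{2\rho}{1 - \rho}\right) = 1 \Rightarrow P_0 = \frac{1 - \rho}{1 + \rho} \text{ and } P_n = 2\left(\frac{1 - \rho}{1 + \rho}\right) \rho^n \text{ for } n \ge 1. \]
This steady state solution is valid provided $\rho < 1$. As before we have
\[ L_s = \sum_{n=0}^\infty n P_n = \frac{2\rho}{1 - \rho^2}, \qquad L_q = \sum_{n=2}^\infty (n - 2) P_n = \frac{2\rho^3}{1 - \rho^2}. \]
We can also use the relation $L_s = L_q$ plus the expected number of people being served. To get $W_s$ and $W_q$ we will use Little’s formula: $W_s = \frac{L_s}{\lambda}$ and $W_q = \frac{L_q}{\lambda}$, that is the expected time spent in the system and the queue respectively.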
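The M/M/2 closed forms can be compared with truncated sums over the steady state probabilities. In this sketch the function name, the truncation at 500 terms and the rates $\lambda = \mu = 1$ are arbitrary choices for illustration.

```python
def mm2_probabilities(lam, mu, terms=500):
    """First `terms` steady state probabilities of an M/M/2 queue."""
    rho = lam / (2 * mu)
    if rho >= 1:
        raise ValueError("no steady state unless lam < 2 * mu")
    p0 = (1 - rho) / (1 + rho)
    return [p0] + [2 * p0 * rho**n for n in range(1, terms)]

lam, mu = 1.0, 1.0
rho = lam / (2 * mu)
probs = mm2_probabilities(lam, mu)

Ls_sum = sum(n * p for n, p in enumerate(probs))
Lq_sum = sum((n - 2) * p for n, p in enumerate(probs) if n >= 2)
Ls_closed = 2 * rho / (1 - rho**2)
Lq_closed = 2 * rho**3 / (1 - rho**2)

print(Ls_sum, Ls_closed)
print(Lq_sum, Lq_closed)
print(Ls_sum - Lq_sum)   # expected number being served, here lam / mu
```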

M/M/N systems
Suppose we have a system with $N$ servers, infinite capacity, an arrival rate $\lambda$ and a service rate $\mu$ (per server).

Figure 6: The state diagram for the M/M/N/∞ system.

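From the state diagram we can derive the steady state equations $\lambda P_0 = \mu P_1$, $\lambda P_1 = 2\mu P_2$, ..., $\lambda P_{N-1} = N\mu P_N$, $\lambda P_n = N\mu P_{n+1}$ for all $n \ge N$. From these equations we can derive
\[ P_n = \begin{cases} \dfrac{\rho^n}{n!} P_0 & \text{if } n < N, \\[2ex] \dfrac{\rho^n}{N!\, N^{n-N}} P_0 & \text{if } n \ge N, \end{cases} \]
provided we set $\rho = \frac{\lambda}{\mu}$. As before we can set $\sum_{n=0}^\infty P_n = 1$ to solve for $P_0$ and hence for $P_n$. In this case we can apply Little’s theorem so $L_s = \sum_{n=0}^\infty n P_n$ and $W_s = \frac{L_s}{\lambda}$, $L_q = \sum_{n=N}^\infty (n - N) P_n$ and $W_q = \frac{L_q}{\lambda}$.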

Example
Suppose we have an M/M/1 queue with finite capacity $N$. The possible system states are $0, 1, 2, \ldots, N$. The arrival rate is $\lambda$ and the service rate is $\mu$ and we define the state of the system to be the number of people in the shop. We can use our previous steady state equations to find the steady state probabilities $P_n = \rho^n P_0$ for $n \le N$ and $\rho = \frac{\lambda}{\mu}$. As usual we solve for $P_0$ by setting $\sum_{n=0}^N P_n = 1$, that is $P_0(1 + \rho + \rho^2 + \ldots + \rho^N) = 1$. This is a geometric series so $P_0 = \frac{1 - \rho}{1 - \rho^{N+1}}$ provided $\rho \ne 1$. In the case of $\rho = 1$ we have $P_0(1 + 1^1 + \ldots + 1^N) = 1$ so $P_0 = \frac{1}{N + 1}$. Using the $\rho \ne 1$ case we have $P_n = \frac{\rho^n(1 - \rho)}{1 - \rho^{N+1}}$ for $n \le N$. As $N \to \infty$ (with $\rho < 1$), the results are consistent with the infinite capacity case. Now $L_s = \sum_{n=0}^N n P_n = P_0 \sum_{n=0}^N n \rho^n$. Let $X$ be the state of the system. Then
\[ G_X(\theta) = E(\theta^X) = \sum_{n=0}^N \theta^n P(X = n) = \sum_{n=0}^N P_0 (\theta\rho)^n = \frac{P_0(1 - (\theta\rho)^{N+1})}{1 - \theta\rho}. \]
We then compute $G'_X(1)$ to get a formula for $E(X)$ in terms of $N$ and $\rho$, specifically
\[ L_s = \frac{\rho(1 - (N + 1)\rho^N + N\rho^{N+1})}{(1 - \rho)(1 - \rho^{N+1})}. \]
In this case we are unable to apply Little’s theorem with the arrival rate $\lambda$ directly (due to customers potentially being turned away). We need to replace $\lambda$ by a modified effective arrival rate $\lambda_{\mathrm{eff}} = (1 - P_N)\lambda$.
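The finite capacity formulas can be checked numerically. The sketch below (hypothetical function names; $\lambda = 2$, $\mu = 3$, $N = 4$ chosen arbitrarily) computes the probabilities by normalising the weights $\rho^n$, which also covers the case $\rho = 1$, and compares the direct sum for $L_s$ with the closed form.

```python
def mm1n_probabilities(lam, mu, N):
    """Steady state probabilities P_0..P_N for M/M/1 with capacity N."""
    rho = lam / mu
    weights = [rho**n for n in range(N + 1)]   # P_n / P_0
    total = sum(weights)                       # equals 1 / P_0
    return [w / total for w in weights]

def mean_system_size_closed(rho, N):
    """Closed form for L_s, valid for rho != 1."""
    return (rho * (1 - (N + 1) * rho**N + N * rho**(N + 1))
            / ((1 - rho) * (1 - rho**(N + 1))))

lam, mu, N = 2.0, 3.0, 4
probs = mm1n_probabilities(lam, mu, N)
Ls = sum(n * p for n, p in enumerate(probs))

lam_eff = lam * (1 - probs[N])   # arrivals in state N are turned away
Ws = Ls / lam_eff                # Little's theorem with the effective rate

print(Ls, mean_system_size_closed(lam / mu, N))
print(lam_eff, Ws)
```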


Little’s Formulae
Let $\lambda_{\text{eff}}$ denote the effective arrival rate, that is the rate at which customers arrive and actually join the queue (the arrival rate for customers who eventually get served).

Little’s Theorem
$L_s = \lambda_{\text{eff}} W_s$ and $L_q = \lambda_{\text{eff}} W_q$, where $L_s$ ($L_q$) is the average number of customers in the system (queue) and $W_s$ ($W_q$) is the average waiting time in the system (queue).
Idea of proof
An arriving customer sees $L_s$ in the system (on average). The customer spends time $W_s$ in the system before departing; in this time a further $\lambda_{\text{eff}} W_s$ have arrived. Since we’re in steady state, the number seen on arrival should balance the number seen on departure, hence $L_s = \lambda_{\text{eff}} W_s$.

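In general $\lambda_{\text{eff}} \ne \lambda$. For M/M/1 and M/M/2 queues with infinite capacity, $\lambda_{\text{eff}} = \lambda$. For an M/M/1 queue with finite capacity N, $\lambda_{\text{eff}} = \lambda(1 - P_N)$. In general,

$$\lambda_{\text{eff}} = \sum_{n=0}^{\infty} (\text{Probability of being in state } n) \times (\text{Probability customer stays, given state } n) \times (\text{Arrival rate to state } n).$$

For example, in an M/M/1 queue, $\lambda_{\text{eff}} = \sum_{n=0}^{\infty} P_n \times 1 \times \lambda = \lambda$. In a model with finite capacity N, $\lambda_{\text{eff}} = \lambda \sum_{n=0}^{N-1} P_n = \lambda(1 - P_N)$.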

Remark: We have $L_s = \sum_{n=0}^{\infty} nP_n$ and $L_q = \sum_{n=r}^{\infty} (n-r)P_n$ (for r servers). Also, $L_s = L_q$ + expected number being served $= L_q + 0P_0 + 1P_1 + ... + rP_r + r\sum_{n=r+1}^{\infty} P_n$.

Method

• Compute Pn, P0 via steady state diagrams.

• Compute Ls, Lq directly.

• Compute λeff , then Ws, Wq via Little’s Theorem.

Queue efficiency
Is an M/M/2 queue faster than two parallel M/M/1 systems?
We take µ = 1 and let λ be the arrival rate with λ < 2.

Figure 7: The two queue systems to be compared. In the case of the two queues, new customers join either queue with equal probability.

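We can quote the results for $L_s$ and $W_s$ for the M/M/1 and M/M/2 queues: $L_s = \frac{\rho}{1-\rho}$ for M/M/1 and $L_s = \frac{2\rho}{1-\rho^2}$ for M/M/2. For M/M/1, $\rho = \frac{\lambda/2}{1}$ and for M/M/2, $\rho = \frac{\lambda}{2\mu} = \frac{\lambda}{2}$. For M/M/2, $L_s = \frac{2(\lambda/2)}{1-(\lambda/2)^2} = \frac{4\lambda}{4-\lambda^2}$. For M/M/1 per server, $L_s = \frac{\lambda/2}{1-\lambda/2} = \frac{\lambda}{2-\lambda}$, hence the system total is $2L_s = \frac{2\lambda}{2-\lambda}$. For M/M/2, $W_s = \frac{L_s}{\lambda} = \frac{4}{4-\lambda^2}$ (which is valid given our assumption that λ < 2). For each M/M/1 queue we have

$$W_s = \frac{\text{Individual server } L_s}{\text{Arrival rate to the server}} = \frac{1}{\lambda/2} L_s = \frac{2}{2-\lambda}.$$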


We can now compare the two systems:

$$\text{Waiting time in parallel M/M/1} - \text{Waiting time in M/M/2} = \frac{2}{2-\lambda} - \frac{4}{4-\lambda^2} = \frac{2\lambda}{4-\lambda^2} > 0.$$

Hence the expected waiting time in two M/M/1 queues is longer than that of one M/M/2 queue. We can also compare the expected number of people in each of the two systems:

$$\text{Expected number in parallel M/M/1} - \text{Expected number in M/M/2} = \frac{2\lambda}{2-\lambda} - \frac{4\lambda}{4-\lambda^2} = \frac{2\lambda^2}{4-\lambda^2} > 0.$$

Hence we can expect to see more people in the two M/M/1 queues than in the single M/M/2 queue.
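The comparison can be reproduced with exact arithmetic. The sketch below simply evaluates the quoted formulas for µ = 1; the function name `compare_systems` is hypothetical and the sample arrival rates are arbitrary illustrative values.

```python
from fractions import Fraction


def compare_systems(lam):
    """Return (W_s, L_s) for two parallel M/M/1 queues and for one M/M/2 queue (mu = 1)."""
    lam = Fraction(lam)
    if not 0 < lam < 2:
        raise ValueError("need 0 < lam < 2")
    two_mm1 = (2 / (2 - lam), 2 * lam / (2 - lam))
    one_mm2 = (4 / (4 - lam**2), 4 * lam / (4 - lam**2))
    return two_mm1, one_mm2


if __name__ == "__main__":
    for lam in (Fraction(1, 2), Fraction(1), Fraction(3, 2)):
        (w1, l1), (w2, l2) = compare_systems(lam)
        # Both differences are positive for every 0 < lam < 2.
        print(lam, w1 - w2, l1 - l2)
```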

Limitations in the model

1. Finite capacity is usual in realistic models. However for N large, the infinite capacity model is a good approximation.

2. Customers tend to opt for the queue of minimum length.

3. Arrival and service processes are not always Poisson.

5 Markov Chains

Each day is considered either cloudy (C) or sunny (S). If C occurs on any given day then on the following day C occurs with probability $\frac{1}{2}$. If S occurs on any given day then it is cloudy with probability $\frac{1}{3}$ on the next day.

Let $S_n$ denote the event of being sunny on day n. Let $C_n$ denote the event of being cloudy on day n. Let $P^{(0)} = (P(S_0), P(C_0))$ denote the initial probability state at time 0. For example, if $P^{(0)} = (0, 1)$ then we are certain that it is cloudy on day 0. We let $P^{(n)} = (P(S_n), P(C_n))$. We want to know $P^{(n)}$ given some $P^{(0)}$. We can use our information to draw a probability transition diagram.

Figure 8: The probability transition diagram for this weather example.

This gives the day-to-day probability transition rules: $P(S_1) = \frac{2}{3}P(S_0) + \frac{1}{2}P(C_0)$ and $P(C_1) = \frac{1}{3}P(S_0) + \frac{1}{2}P(C_0)$. Hence

$$(P(S_1), P(C_1)) = (P(S_0), P(C_0)) \begin{pmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} = (P(S_0), P(C_0))\,T.$$

The components in row 1 of T are the transitions from S and the entries in row 2 are the transitions from C. Note that each row sum is always 1. Notice also that this rule does not depend on the day n. Hence $P^{(n+1)} = P^{(n)}T$. In general T could depend on time n, and in this case we write $T := T^{(n)}$. However, here T does not depend on time.

By iteration, we see that $P^{(1)} = P^{(0)}T$, ..., $P^{(n)} = P^{(0)}T^n$. We can observe that as $n \to \infty$, $P^{(n)}$ settles to a limit vector, which we call P. That is $\lim_{n\to\infty} P^{(n)} = P$, and moreover $\lim_{n\to\infty} P^{(n+1)} = P$. But we have

$$\lim_{n\to\infty} P^{(n+1)} = \lim_{n\to\infty} (P^{(n)}T) = \left(\lim_{n\to\infty} P^{(n)}\right)T = PT \Rightarrow P = PT.$$

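Let $P = (P_1, P_2)$. We solve for P using linear algebra, specifically

$$(P_1, P_2) = (P_1, P_2) \begin{pmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \Rightarrow (P_1, P_2) = \left(\frac{2}{3}P_1 + \frac{1}{2}P_2,\ \frac{1}{3}P_1 + \frac{1}{2}P_2\right).$$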

This matrix equation gives $3P_2 = 2P_1$ for both components. However $P_1 + P_2 = 1$ (since P is a probability vector). Solving these simultaneous equations we get $P_1 = \frac{3}{5}$ and $P_2 = \frac{2}{5}$. So $P = \left(\frac{3}{5}, \frac{2}{5}\right)$ describes the long term state behaviour. P is also independent of $P^{(0)}$ in this example. The speed of convergence of $P^{(n)}$ to P depends on the eigenvalues of T. Observe that P is a row eigenvector of T with eigenvalue 1. The largest eigenvalue of modulus less than 1 determines this rate of convergence. To find this eigenvalue we solve $\det(T - \lambda I) = 0$ for λ, which gives the equation $\left(\frac{2}{3} - \lambda\right)\left(\frac{1}{2} - \lambda\right) - \frac{1}{6} = 0$. We know that λ = 1 is a solution to this equation and $\lambda = \frac{1}{6}$ is the other. Hence $|P^{(n)} - P| \le c\left(\frac{1}{6}\right)^n$ for some c > 0.
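The convergence of the weather chain can be checked by direct iteration. This is a minimal sketch using exact fractions; the helper names `step` and `distributions` are hypothetical, and the starting vector (certain to be cloudy on day 0) is just one illustrative choice.

```python
from fractions import Fraction as F

T = [[F(2, 3), F(1, 3)],   # row 1: transitions from S
     [F(1, 2), F(1, 2)]]   # row 2: transitions from C


def step(p, T):
    """One step of the chain: the row vector p multiplied by T."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]


def distributions(p0, T, n_steps):
    """Return the list [P(0), P(1), ..., P(n_steps)]."""
    out = [list(p0)]
    for _ in range(n_steps):
        out.append(step(out[-1], T))
    return out


limit = [F(3, 5), F(2, 5)]
history = distributions([F(0), F(1)], T, 6)        # cloudy on day 0
errors = [abs(p[0] - limit[0]) for p in history]   # |P(S_n) - 3/5|
ratios = [errors[n + 1] / errors[n] for n in range(6)]

if __name__ == "__main__":
    print(history[-1], ratios)
```

For this two-state chain each ratio equals the second eigenvalue 1/6, so the error shrinks by that factor at every step.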

When computing $P^{(n)}$ for small n it is possible to calculate these directly from the probability transition diagram, especially in the case of a large number of states in the system.

Some conventions use column vectors, and in this case we require the column sums of T to be equal to 1. The problem is equivalent by taking transposes.

General Theory
We consider a system with m states 1, 2, ..., m (or labelled as (1), (2), ..., (m), or labelled $E_1, ..., E_m$) and time $n \in \mathbb{Z}$, but usually n ≥ 0. The state at time n will be denoted by a RV $X_n$ taking values in {1, ..., m}. We will study $P(X_{n+1} = j | X_n = i)$ for 1 ≤ i, j ≤ m and study (for example) $P(X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1})$ and so on. We also assume that $T_{ij} = P(X_{n+1} = j | X_n = i)$ is given for all 1 ≤ i, j ≤ m and all n. To work out the distribution of the state at time n + 1, we just need to know the state at time n and the transition probabilities $T_{ij}$. T is an m × m matrix with elements $T_{ij} = P(X_{n+1} = j | X_n = i)$. The system is memoryless in the sense that we do not need the entire history $X_0, X_1, ..., X_{n-1}, X_n$ to work out $X_{n+1}$, we just need $X_n$. The Markov (chain) property states that

$$P(X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i_n).$$

Moreover

$$P(X_{n+1} = j) = \sum_{i=1}^{m} P(X_{n+1} = j | X_n = i)P(X_n = i) \Rightarrow P^{(n+1)} = P^{(n)}T.$$

Here $P^{(n)} = (P(X_n = 1), P(X_n = 2), ..., P(X_n = m))$.

Remarks

1. m is usually finite, but we can take m ∈ N (a countable number of states), for example random walks which will be discussed later.

2. It suffices to take time n ≥ 0.

3. T could depend on time n. We write $T := T^{(n)}$. In this case $P^{(n)} = P^{(n-1)}T^{(n)}$, so $P^{(n)} = P^{(0)}T^{(1)}T^{(2)}...T^{(n)}$. An example of this situation would be a model with two states where the probability of changing state is $\frac{1}{n+1}$ for the n-th move.

4. T is an m × m matrix and each row sum $\sum_j T_{ij} = 1$. Note that in the probability transition diagram we omit arrows where $T_{ij} = 0$.
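The time-dependent case in Remark 3 can be illustrated with the two-state model mentioned there. The sketch below is one possible reading of that model: the names `transition` and `evolve` are hypothetical, and it assumes both states switch with the same probability 1/(n+1) on the n-th move.

```python
from fractions import Fraction as F


def transition(n):
    """Assumed T(n): switch state with probability 1/(n+1) on the n-th move."""
    c = F(1, n + 1)
    return [[1 - c, c], [c, 1 - c]]


def evolve(p0, n_moves):
    """Compute P(n) = P(0) T(1) T(2) ... T(n) by repeated vector-matrix products."""
    p = list(p0)
    for n in range(1, n_moves + 1):
        T = transition(n)
        p = [p[0] * T[0][j] + p[1] * T[1][j] for j in range(2)]
    return p


if __name__ == "__main__":
    print(evolve([F(1), F(0)], 1), evolve([F(1), F(0)], 5))
```

Under this assumption the first move switches with probability 1/2, so the distribution becomes (1/2, 1/2) immediately and the symmetric matrices that follow leave it unchanged.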

Aims
We want to study the long term, steady state behaviour of the system. We also want to classify the states in the system in terms of their recurrence properties. That is, we want to know the frequency of visits to each state as time evolves.

Consider two urns, Urn 1 and Urn 2. These contain between them 3 balls labelled 1, 2, 3. A ball is selected at random with an equal chance for any of them to be chosen. We then take that labelled ball and transfer it from one urn to the other. The state of the system at time n, denoted by $X_n$, is the number of balls in Urn 1. We want to find $P^{(n)}$ and P (the behaviour as n → ∞).

Figure 9: The probability transition diagram for the case of two urns and three balls.

From the probability transition diagram, we can see that

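$$T = \begin{pmatrix} 0 & 1 & 0 & 0 \\ \frac{1}{3} & 0 & \frac{2}{3} & 0 \\ 0 & \frac{2}{3} & 0 & \frac{1}{3} \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

We observe that as n increases, $P^{(n)}$ oscillates and does not converge to a limit. So $P = \lim_{n\to\infty} P^{(n)}$ does not exist as previously defined. However, solving $PT = P$ subject to $\sum_{i=0}^{3} P_i = 1$ gives a solution $P = \left(\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}\right)$.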

Before solving this, let’s fully understand the two state Markov chain.

Figure 10: The probability transition diagram for the two state Markov chain, where 0 ≤ a, b ≤ 1.

From the probability transition diagram, we can see that

$$T = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}.$$


If P is the steady state probability then P = P T and P1 + P2 = 1. From the matrix equation we get

P1 = (1− a)P1 + bP2, P2 = aP1 + (1− b)P2 ⇒ aP1 = bP2.

Combining this with the probability equation gives $P = \left(\frac{b}{a+b}, \frac{a}{a+b}\right)$ for 0 < a, b < 1. Solving $\det(T - \lambda I) = 0$ gives the eigenvalues λ = 1 and λ = 1 − a − b ∈ (−1, 1) for 0 < a, b < 1. In this case, if |1 − a − b| < 1 then for any $P^{(0)}$, $P^{(0)}T^n \to P$ with rate bounded by |1 − a − b|. There are also two special cases to consider. If a = b = 1 then the state vectors form an alternating sequence (1, 0) → (0, 1) → (1, 0) → .... Moreover $\left(\frac{1}{2}, \frac{1}{2}\right) \to \left(\frac{1}{2}, \frac{1}{2}\right)$ under T. However if we redefine P to be the average of all of the state vectors then $\frac{1}{n}\sum_{k=0}^{n-1} P^{(k)} \to P$ as n → ∞. The second special case is b = 0, a ≠ 0. In this case P = (0, 1) and $P^{(n)} = ((1-a)^n, 1-(1-a)^n)$, given $P^{(0)} = (1, 0)$.
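The convergence rate for the two state chain can be made explicit with a short calculation. The following worked derivation is a sketch; the symbols d and x are introduced here for illustration and are not part of the original notes.

```latex
% Let d = P^{(0)} - P. Both vectors sum to 1, so d = (x, -x) for some real x.
\begin{align*}
dT &= (x, -x)\begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}
    = \bigl(x(1-a) - xb,\; xa - x(1-b)\bigr)
    = (1-a-b)\,(x, -x), \\
P^{(n)} - P &= P^{(0)}T^n - PT^n = d\,T^n = (1-a-b)^n \bigl(P^{(0)} - P\bigr).
\end{align*}
```

Taking b = 0 and $P^{(0)} = (1, 0)$ gives d = (1, −1), which recovers $P^{(n)} = ((1-a)^n, 1-(1-a)^n)$.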

Going back to the urn problem with three balls, $P^{(n)}$ oscillates with period 2, however

$$\frac{1}{n}\sum_{k=0}^{n-1} P^{(k)} \to P = \left(\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}\right) \quad \text{as } n \to \infty.$$
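The oscillation and the convergence of the running average can both be seen numerically. This is a minimal floating-point sketch; the helper names `step` and `running_average` are hypothetical, and the start state (Urn 1 empty) and the number of steps are arbitrary choices.

```python
T = [[0, 1, 0, 0],
     [1 / 3, 0, 2 / 3, 0],
     [0, 2 / 3, 0, 1 / 3],
     [0, 0, 1, 0]]


def step(p, T):
    """One step of the chain: the row vector p multiplied by T."""
    m = len(T)
    return [sum(p[i] * T[i][j] for i in range(m)) for j in range(m)]


def running_average(p0, T, n):
    """Return (P(n-1), average of P(0), ..., P(n-1))."""
    p = list(p0)
    total = [0.0] * len(p0)
    last = p
    for _ in range(n):
        total = [t + x for t, x in zip(total, p)]
        last = p
        p = step(p, T)
    return last, [t / n for t in total]


if __name__ == "__main__":
    last, avg = running_average([1.0, 0.0, 0.0, 0.0], T, 1000)
    print(last)   # mass only on states 1 and 3 at an odd time
    print(avg)    # close to (1/8, 3/8, 3/8, 1/8)
```

The individual vectors keep alternating between the even and odd states, while the average settles down at a rate of order 1/n.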

What if we now want to consider M balls in two urns? Again each ball is equally likely to be chosen and $X_n$ denotes the number of balls in Urn 1.

Figure 11: The probability transition diagram for the urn problem with M balls and two urns.

As before we want to determine P. We observe that $T_{ij} = P(X_{n+1} = j | X_n = i)$ is an (M + 1) × (M + 1) matrix with zeros on the diagonal. We examine the vector $P = (P_0, P_1, ..., P_M)$ such that $P = PT$ subject to $P_0 + P_1 + ... + P_M = 1$. This gives $P_i = \binom{M}{i}\left(\frac{1}{2}\right)^M$ for 0 ≤ i ≤ M (this gives the Binomial distribution). For M = 3, we saw $P = \left(\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}\right)$. However $P^{(n)} = P^{(0)}T^n \not\to P$ as n → ∞ (it oscillates). Nevertheless, for the long-run average, $\frac{1}{n}\sum_{k=0}^{n-1} P^{(k)} \to P$ as n → ∞.
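The claim that the Binomial vector solves P = PT can be verified exactly for any particular M. In the sketch below the function names are hypothetical, and the matrix is built from the description of the model (a ball in Urn 1 is chosen with probability i/M when Urn 1 holds i balls), since Figure 11 itself is not reproduced here.

```python
from fractions import Fraction as F
from math import comb


def urn_matrix(M):
    """Transition matrix for M balls: state i is the number of balls in Urn 1."""
    T = [[F(0)] * (M + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        if i > 0:
            T[i][i - 1] = F(i, M)        # the chosen ball was in Urn 1
        if i < M:
            T[i][i + 1] = F(M - i, M)    # the chosen ball was in Urn 2
    return T


def binomial_vector(M):
    """P_i = C(M, i) / 2^M for i = 0, ..., M."""
    return [F(comb(M, i), 2**M) for i in range(M + 1)]


def is_stationary(p, T):
    """Check p T == p component by component, in exact arithmetic."""
    m = len(T)
    return all(sum(p[i] * T[i][j] for i in range(m)) == p[j] for j in range(m))


if __name__ == "__main__":
    for M in (3, 5, 10):
        print(M, is_stationary(binomial_vector(M), urn_matrix(M)))
```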

Classification of states (subchains)
Definition
A state j is accessible from i if $(T^n)_{ij} > 0$ for some n > 0. Two states i, j communicate (written i ↔ j) if each is accessible from the other. This induces an equivalence relation (↔) on the states, that is

1. If i ↔ j then j ↔ i.

2. i ↔ i.

3. If i ↔ j and j ↔ k then i ↔ k (transitivity).

Consequently, communication splits the Markov chain into subchains. These are disjoint equivalence classes.

A problem in the study of Markov chains is finding these irreducible subchains (consisting of communicating states). A Markov chain is irreducible if all of its states communicate with each other, for example the urn problem is irreducible.

A state i is called absorbing if $T_{ii} = 1$ and $T_{ij} = 0$ for j ≠ i. A state i is periodic with period k > 1 if $(T^n)_{ii} > 0$ when k | n and $(T^n)_{ii} = 0$ otherwise. State i is aperiodic if no such k exists. The urn problem is a Markov chain with period 2 since all states have period 2.
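Communicating classes and periods can be found mechanically from the pattern of non-zero entries of T. The sketch below is illustrative only: the function names are hypothetical, the period is computed with the common gcd-of-return-times convention (which agrees with the definition above for the examples in these notes), and `max_steps` is an arbitrary cutoff on the return times examined.

```python
from math import gcd


def reachable(T):
    """reach[i][j] is True if j is accessible from i in one or more steps."""
    m = len(T)
    reach = [[T[i][j] > 0 for j in range(m)] for i in range(m)]
    for k in range(m):                       # transitive closure
        for i in range(m):
            for j in range(m):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach


def communicating_classes(T):
    """Group the states into classes of mutually accessible states."""
    m = len(T)
    reach = reachable(T)
    classes, seen = [], set()
    for i in range(m):
        if i in seen:
            continue
        cls = [j for j in range(m) if j == i or (reach[i][j] and reach[j][i])]
        seen.update(cls)
        classes.append(cls)
    return classes


def period(T, i, max_steps=None):
    """gcd of the return times n <= max_steps with (T^n)_ii > 0 (0 if none)."""
    m = len(T)
    if max_steps is None:
        max_steps = 2 * m * m                # illustrative cutoff, an assumption
    current = [j == i for j in range(m)]     # states reachable in exactly n steps
    g = 0
    for n in range(1, max_steps + 1):
        current = [any(current[k] and T[k][j] > 0 for k in range(m)) for j in range(m)]
        if current[i]:
            g = gcd(g, n)
    return g


if __name__ == "__main__":
    urn = [[0, 1, 0, 0], [1 / 3, 0, 2 / 3, 0], [0, 2 / 3, 0, 1 / 3], [0, 0, 1, 0]]
    print(communicating_classes(urn), period(urn, 0))
```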

Recurrence of states


Let $f_i^{(n)}$ be the probability of the first return to state i occurring at time n (starting from i at time 0). Let $f_i = \sum_{n\ge1} f_i^{(n)}$; this is the probability of eventual return to state i (starting from i initially). Notice that $f_i^{(n)} \ne (T^{(n)})_{ii}$, where $(T^{(n)})_{ij} = P(X_n = j | X_0 = i)$. The latter $(T^{(n)})_{ii}$ includes intermediate returns before time n.

Classification of recurrence
If $f_i = 1$ then state i is called recurrent, hence the return to state i is certain (this is equivalently expressed by $\sum_{n\ge1}(T^{(n)})_{ii} = \infty$). If $f_i < 1$ then state i is said to be transient and return is not certain (again characterised by $\sum_{n\ge1}(T^{(n)})_{ii} < \infty$). The number of returns is then governed by a geometric distribution with parameter $f_i$ in the case of a transient state. In a recurrent state, any number of returns occurs with probability 1. Let $\mu_i = \sum_{n=1}^{\infty} nf_i^{(n)}$, the expected recurrence time. For a recurrent state i, we say that state i is positively recurrent if $\mu_i$ is finite, but we say state i is null recurrent if $\mu_i$ is infinite. We say a state is ergodic if it is aperiodic and positively recurrent. Similarly, a subchain is called ergodic if all of its communicating states are ergodic.

Remark: The urn problem has an irreducible Markov chain but is not ergodic. The previous example considering sunny and cloudy weather can be shown to be ergodic.

For ergodic chains, there exists some n such that $(T^{(n)})_{ij} > 0$ for all i, j. For ergodic chains, we see that $P^{(n)} \to P$ with $P = PT$. This need not hold for irreducible (for example periodic) chains. Instead $\frac{1}{n}\sum_{k=0}^{n-1} P^{(k)} \to P$ as n → ∞.

General approach
Given a Markov matrix T, first draw the corresponding probability transition diagram.

• Decide which states communicate, and hence identify subchains.

• Decide which states are absorbing, periodic or aperiodic.

• Decide which states are recurrent or transient. Calculate $f_i = \sum_{n\ge1} f_i^{(n)}$ (the sum of the probabilities of first return at time n for all positive n). If $f_i = 1$ then state i is recurrent and if $f_i < 1$ then state i is transient.

• If state i is recurrent, we compute $\mu_i = \sum_{n\ge1} nf_i^{(n)}$. If $\mu_i < \infty$ then state i is positively recurrent. If $\mu_i = \infty$ then state i is null recurrent (this requires the use of series convergence tests).

Remark: If T is a constant transition matrix (in time n) on finitely many states then every recurrent state has $\mu_i < \infty$ ($f_i^{(n)} \to 0$ exponentially fast).
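The first-return probabilities $f_i^{(n)}$ can be computed by following the chain while discarding any path that has already returned to i. This is a minimal sketch; the function name `first_return_probs` is hypothetical, and the sums for $f_i$ and $\mu_i$ are truncated at an arbitrary `n_max`, so they are approximations.

```python
def first_return_probs(T, i, n_max):
    """Return [f_i^(1), ..., f_i^(n_max)] for the chain with matrix T."""
    m = len(T)
    # v[j] = P(X_n = j and no return to i at times 1..n-1 | X_0 = i)
    v = [T[i][j] for j in range(m)]
    f = []
    for _ in range(n_max):
        f.append(v[i])          # paths that return to i for the first time now
        v[i] = 0.0              # discard them before taking the next step
        v = [sum(v[k] * T[k][j] for k in range(m)) for j in range(m)]
    return f


weather = [[2 / 3, 1 / 3], [1 / 2, 1 / 2]]   # states: 0 = sunny, 1 = cloudy
f = first_return_probs(weather, 0, 200)
f_total = sum(f)                                        # approximates f_i
mu = sum(n * fn for n, fn in enumerate(f, start=1))     # approximates mu_i

if __name__ == "__main__":
    print(f[:3], f_total, mu)
```

For the sunny state of the weather chain the truncated sums are close to $f_i = 1$ and $\mu_i = 5/3$, which is consistent with the state being positively recurrent.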

6 Gambler’s Ruin Problems

We pose a classical problem. At each play of a game the gambler wins 1 with probability p and loses 1 with probability q = 1 − p. The gambler aims to reach a fortune of N before falling to 0. The gambler starts with i for 0 ≤ i ≤ N. What is the probability that the gambler wins? Let $X_n$ be the gambler’s fortune at time n, then $X_n \in \{0, 1, ..., N\}$. By considering Markov chains, we observe that states 0 and N are absorbing states.

Figure 12: The probability transition diagram for the gambler’s ruin problem.

Let $E_i$ be the event of winning given that we start in state i and $\theta_i$ be the probability of winning given that we start in state i. We can show that the states 1, ..., N − 1 are transient, so with probability 1 we will eventually reach either state 0 or state N. We aim to get a recurrence relation (difference equation) between $\theta_i$, $\theta_{i-1}$ and $\theta_{i+1}$ which we will then solve for $\theta_i$ in terms of i. Conditioning on the outcome of the first play, we can write

$$\theta_i = P(\text{Winning} | \text{Win at time 1})P(\text{Win at time 1}) + P(\text{Winning} | \text{Lose at time 1})P(\text{Lose at time 1}) = p\theta_{i+1} + q\theta_{i-1}. \quad (12)$$

We know that $\theta_0 = 0$ and $\theta_N = 1$. We take a trial solution $\theta_i = A\lambda^i$ for some unknown λ and A. Then

$$A\lambda^i = pA\lambda^{i+1} + qA\lambda^{i-1} \Rightarrow A\lambda^{i-1}(p\lambda^2 - \lambda + q) = 0 \Rightarrow (\lambda - 1)(p\lambda - q) = 0 \Rightarrow \lambda = 1 \text{ or } \lambda = \frac{q}{p}.$$

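Note that we should always have λ = 1 as a solution. Combining these two values of λ, we get the general solution $\theta_i = A + B\left(\frac{q}{p}\right)^i$. From the boundary conditions we have

$$\theta_0 = 0 \Rightarrow A + B = 0, \quad \theta_N = 1 \Rightarrow A + B\left(\frac{q}{p}\right)^N = 1 \Rightarrow \theta_i = \frac{1 - \left(\frac{q}{p}\right)^i}{1 - \left(\frac{q}{p}\right)^N}. \quad (13)$$

This is valid only for q ≠ p. If $q = p = \frac{1}{2}$ then λ = 1 is a repeated root, so the general solution is $\theta_i = A + Bi$, and

$$\theta_i = \frac{i}{N} \quad (14)$$

is the specific solution.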
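The two closed forms for $\theta_i$ can be checked against the recurrence in Equation 12. The following is a small sketch with exact fractions; the function name `win_probability` is hypothetical and the values of N and p in the usage lines are arbitrary.

```python
from fractions import Fraction


def win_probability(i, N, p):
    """theta_i: probability of reaching N before 0, starting from fortune i."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(i, N)              # Equation 14
    r = q / p
    return (1 - r**i) / (1 - r**N)         # Equation 13


if __name__ == "__main__":
    N, p = 6, Fraction(2, 5)
    thetas = [win_probability(i, N, p) for i in range(N + 1)]
    print(thetas)
    # Each interior value should satisfy theta_i = p*theta_{i+1} + q*theta_{i-1}.
    print(all(thetas[i] == p * thetas[i + 1] + (1 - p) * thetas[i - 1]
              for i in range(1, N)))
```

Passing p as a `Fraction` (or a string such as "2/5") keeps the comparison p == q exact.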

Remark
Let N → ∞, and keep state 0 as an absorbing state. From Equations 13 and 14 we can see that if p > q then we escape to ∞ with positive probability. If p ≤ q then we reach state 0 with probability 1 (that is $\theta_i \to 0$ as N → ∞). There are analogous results if the lower bound is allowed to tend to −∞ with N fixed, that is we will reach state N with probability 1 when p ≥ q. If both boundaries are extended to ±∞, we get a random walk on Z.

If the states are a, a + s, a + 2s, ..., a + Ns = b for some a, b, s ∈ R, we need to translate the problem to {0, 1, 2, ..., N} in order to quote the results from Equations 13 and 14. For example if $X_n \in \{-4, -2, 0, 2, 4\}$ then $Y_n = \frac{X_n + 4}{2} \in \{0, 1, 2, 3, 4\}$.

Extended models and applications
The general set up is that we have N states. Suppose r < N of the states are absorbing and define $A = \{a_1, a_2, ..., a_r\}$ where $a_i$ is an absorbing state. Then A = W ∪ L where W is the set of winning states and L is the set of losing states. Given some state i ∉ A, we let $E_i$ be the event of reaching W before L. Let $\theta_i = P(E_i)$. We have two results.

Lemma

$$\theta_i = \sum_j \theta_j T_{ij} \quad (15)$$

where $(T_{ij})$ is the transition matrix.
Proof
By the law of total probability

$$\theta_i = \sum_j P(E_i | i \to j)P(i \to j) \Rightarrow \theta_i = \sum_j \theta_j T_{ij}.$$

Note that this result is not the same as the steady state vector equation $P = PT \Rightarrow P_j = \sum_i P_i T_{ij}$. The steady state vector equation has non-zero values for states in A.

In order to solve this problem, we also need to impose boundary conditions. If state i ∈ A then $\theta_i = 0$ if i ∈ L and $\theta_i = 1$ if i ∈ W. In the classical gambler’s ruin problem, A = {0, N}, L = {0} and W = {N}. Let $D_i$ be the expected time to reach some state in A given that we start in some state i ∉ A. From this we can define a set of boundary conditions $D_i = 0$ if state i ∈ A.


Lemma
If $T_{ij}$ is the transition matrix then

$$D_i = \left(\sum_j D_j T_{ij}\right) + 1. \quad (16)$$

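Proof
Let E(n, i) be the event of first reaching A at time n, starting in state i. Then

$$P(E(n, i)) = \sum_j P(E(n, i) | i \to j)P(i \to j) = \sum_j P(E(n-1, j))T_{ij}. \quad (17)$$

By definition, $D_i = \sum_{n\ge0} nP(E(n, i))$. Then $D_i = \sum_{n\ge1}\sum_j nP(E(n-1, j))T_{ij}$. We interchange the sums and relabel n by n + 1, then

$$D_i = \sum_j \sum_{n\ge0} (n+1)P(E(n, j))T_{ij} \Rightarrow D_i = \sum_j \left(\sum_{n\ge0} nP(E(n, j))\right)T_{ij} + \sum_j \left(\sum_{n\ge0} P(E(n, j))\right)T_{ij} = \left(\sum_j D_j T_{ij}\right) + 1.$$

The last step uses $\sum_{n\ge0} P(E(n, j)) = 1$ (absorption is certain) and $\sum_j T_{ij} = 1$.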

Example
We consider the classical gambler’s ruin problem. Immediately, Equation 16 gives $D_i = pD_{i+1} + qD_{i-1} + 1$. This is a difference equation with homogeneous solution $D_i = A + B\left(\frac{q}{p}\right)^i$, and we take a trial particular solution $D_i = C + Ei$. We substitute this into the above to find C and E. Then we use $D_0 = D_N = 0$ to find A and B. If $p = q = \frac{1}{2}$ we can check that $D_i = i(N - i)$.

It is usually best to solve Equation 15 (for $\theta_i$) or Equation 16 (for $D_i$) by direct algebra of simultaneous equations.
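Solving the simultaneous equations directly is easy to automate. The sketch below sets up the equations $D_i = pD_{i+1} + qD_{i-1} + 1$ with $D_0 = D_N = 0$ for the classical gambler’s ruin and solves them by elimination in exact arithmetic. The helper names `solve_linear` and `expected_durations` are hypothetical, and the solver assumes the system has a unique solution.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Gauss-Jordan elimination on exact Fractions; assumes a unique solution."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]


def expected_durations(N, p):
    """Return [D_0, ..., D_N] for the gambler's ruin with win probability p."""
    p = Fraction(p)
    q = 1 - p
    size = N - 1                               # unknowns D_1, ..., D_{N-1}
    A = [[Fraction(0)] * size for _ in range(size)]
    b = [Fraction(1)] * size
    for row, i in enumerate(range(1, N)):
        A[row][row] = Fraction(1)
        if i + 1 < N:
            A[row][row + 1] = -p               # D_{i+1}; D_N = 0 drops out
        if i - 1 > 0:
            A[row][row - 1] = -q               # D_{i-1}; D_0 = 0 drops out
    return [Fraction(0)] + solve_linear(A, b) + [Fraction(0)]


if __name__ == "__main__":
    print(expected_durations(6, Fraction(1, 2)))   # compare with i * (N - i)
```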

Example
Suppose we toss a fair coin. How long do we have to wait in order to see 3 heads in a row?
Let $E_n$ be the event of seeing HHH at time n. Then the $E_n$ are not independent over n, since if the (n − 1)-th toss comes up T then neither $E_n$ nor $E_{n+1}$ can occur.

Figure 13: The probability transition diagram for the coin toss example. Note that the unlabelled arrows have probability $\frac{1}{2}$.

In this example we have states S, T, H, HH, HHH and $D_{HHH} = 0$ since it is an absorbing state. From Equation 16 we get

$$D_S = \frac{1}{2}D_H + \frac{1}{2}D_T + 1, \quad D_H = \frac{1}{2}D_{HH} + \frac{1}{2}D_T + 1, \quad D_{HH} = \frac{1}{2}D_{HHH} + \frac{1}{2}D_T + 1, \quad D_T = \frac{1}{2}D_T + \frac{1}{2}D_H + 1.$$

Using the boundary condition $D_{HHH} = 0$ and back-substitution gives $D_S = 14$. We can extend this to requiring N heads in a row. In this case we can show that $D_S = 2^{N+1} - 2$.
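The back-substitution generalises to N heads in a row. The sketch below is one way to organise it: the function name `expected_tosses` is hypothetical, and it relies on the observation that the equations for $D_S$ and $D_T$ have the same right-hand side, so both can be treated as a single "no current run of heads" value $D_0$.

```python
from fractions import Fraction


def expected_tosses(N):
    """Expected number of fair-coin tosses until N heads in a row appear."""
    half = Fraction(1, 2)
    # Write D_k = a + b * D_0, where k is the current run of heads and D_0 is
    # the value with no current run (the states S and T in the text).
    a, b = Fraction(0), Fraction(0)            # D_N = 0 (absorbing state)
    for _ in range(N - 1):                     # k = N-1, ..., 1
        # D_k = 1 + (1/2) D_{k+1} + (1/2) D_0
        a, b = 1 + half * a, half + half * b
    # D_0 = 1 + (1/2) D_1 + (1/2) D_0 with D_1 = a + b * D_0, solved for D_0.
    return (1 + half * a) / (half - half * b)


if __name__ == "__main__":
    for N in range(1, 6):
        print(N, expected_tosses(N), 2 ** (N + 1) - 2)
```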


Example
Suppose coins are flipped in sequence. Player 1 wins when HH occurs and Player 2 wins when TH occurs. The sequence of coin tosses continues until one of these events occurs. What is the probability that Player 1 wins?

Figure 14: The probability transition diagram for this game. Note that each arrow has probability $\frac{1}{2}$ and the red values are the probabilities of reaching each winning state.

We can see from the diagram that the probability of Player 1 winning is $\frac{1}{4}$, since the only way for Player 1 to win is if the first two coins are H. We can see this another way. Let $\theta_i$ be the probability of reaching HH from state i. We want to know $\theta_S$. The states of the system are S, H, T, HH, TH and the boundary conditions are $\theta_{HH} = 1$, $\theta_{TH} = 0$. We then solve $\theta_i = \sum_j T_{ij}\theta_j$, so $\theta_S = \frac{1}{2}\theta_H + \frac{1}{2}\theta_T$, $\theta_T = \frac{1}{2}\theta_T + \frac{1}{2}\theta_{TH}$, $\theta_H = \frac{1}{2}\theta_{HH} + \frac{1}{2}\theta_T$. Solving these equations simultaneously gives $\theta_S = \frac{1}{4}$. We can also calculate $D_i$, the mean time to finish given state i, in a similar way, noting the boundary conditions $D_{HH} = D_{TH} = 0$.
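The equations for $\theta_i$ can also be solved by repeated substitution rather than by hand. This is a sketch under stated assumptions: the dictionary `MOVES` encodes the transitions described in the example, the function name `hitting_probabilities` is hypothetical, and the number of sweeps `n_iter` is an arbitrary choice.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Transitions read off the description of the game; HH and TH are absorbing.
MOVES = {
    "S": {"H": HALF, "T": HALF},
    "H": {"HH": HALF, "T": HALF},
    "T": {"T": HALF, "TH": HALF},
    "HH": {"HH": Fraction(1)},
    "TH": {"TH": Fraction(1)},
}


def hitting_probabilities(moves, target, n_iter=50):
    """Iterate theta_i <- sum_j T_ij theta_j, starting from the indicator of target."""
    theta = {s: Fraction(int(s == target)) for s in moves}
    for _ in range(n_iter):
        theta = {s: sum(pr * theta[j] for j, pr in moves[s].items()) for s in moves}
    return theta


if __name__ == "__main__":
    print(hitting_probabilities(MOVES, "HH"))
```

After k sweeps each value is the probability of reaching the target within k tosses. For the HH target in this game the values stop changing after two sweeps, while for the TH target they only approach their limits.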

This idea generalises. Given any sequence of coin states, we can create a sequence that beats the original sequence more often than not by removing the last state and adding a suitably chosen state to the beginning of the sequence, for example HTHHTTH loses to HHTHHTT more often than not. The relative probabilities can be proved by a similar method to the one above.
