Elements of Probability Theory (pavl/Lec2_prob.pdf)
  • Elements of Probability Theory

    • A collection of subsets of a set Ω is called a σ–algebra if it contains Ω and is closed under the operations of taking complements and countable unions of its elements.

    • A sub-σ–algebra is a collection of subsets of a σ–algebra which satisfies the axioms of a σ–algebra.

    • A measurable space is a pair (Ω,F) where Ω is a set and F is a σ–algebra of subsets of Ω.

    • Let (Ω,F) and (E,G) be two measurable spaces. A function X : Ω → E such that the event

    {ω ∈ Ω : X(ω) ∈ A} =: {X ∈ A}

    belongs to F for arbitrary A ∈ G is called a measurable function or random variable.


    • Let (Ω,F) be a measurable space. A function µ : F → [0, 1] is called a probability measure if µ(∅) = 0, µ(Ω) = 1 and µ(∪∞k=1 Ak) = ∑∞k=1 µ(Ak) for all sequences {Ak}∞k=1 ⊂ F of pairwise disjoint sets.

    • The triplet (Ω,F, µ) is called a probability space.

    • Let X be a random variable (measurable function) from (Ω,F, µ) to (E,G). If E is a metric space then we may define the expectation with respect to the measure µ by

    E[X] = ∫Ω X(ω) dµ(ω).

    • More generally, let f : E → R be G–measurable. Then,

    E[f(X)] = ∫Ω f(X(ω)) dµ(ω).
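In practice this expectation can be approximated by averaging f over samples drawn from the law of X. A minimal Monte Carlo sketch, where the uniform law on [0, 1] and f(x) = x² are illustrative choices (not from the notes):

```python
import random

# Monte Carlo sketch of E[f(X)]: averaging f over samples from the law of X
# approximates the integral of f(X(w)) dmu(w) over Omega.
# Here X is uniform on [0, 1] and f(x) = x**2, so E[f(X)] = 1/3.
random.seed(0)

def f(x):
    return x ** 2

n = 200_000
estimate = sum(f(random.random()) for _ in range(n)) / n

assert abs(estimate - 1.0 / 3.0) < 0.01
```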


    • Let U be a topological space. We will use the notation B(U) to denote the Borel σ–algebra of U: the smallest σ–algebra containing all open sets of U. Every random variable from a probability space (Ω,F, µ) to a measurable space (E,B(E)) induces a probability measure on E:

    µX(B) = µ(X−1(B)) = µ({ω ∈ Ω : X(ω) ∈ B}), B ∈ B(E).

    The measure µX is called the distribution (or sometimes the law) of X.

    Example 1 Let I denote a subset of the positive integers. A vector ρ0 = {ρ0,i, i ∈ I} is a distribution on I if it has nonnegative entries and its total mass equals 1: ∑i∈I ρ0,i = 1.
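Example 1 can be checked mechanically: a distribution on a finite I is obtained by normalizing nonnegative weights. A sketch in which the geometric weights are an illustrative choice:

```python
# Sketch of Example 1: a distribution rho0 on a subset I of the positive
# integers has nonnegative entries and total mass 1.
# The geometric weights below are an illustrative choice.
I = [1, 2, 3, 4]
weights = [2.0 ** (-i) for i in I]                 # 1/2, 1/4, 1/8, 1/16
total = sum(weights)
rho0 = {i: w / total for i, w in zip(I, weights)}  # normalize to total mass 1

assert all(p >= 0 for p in rho0.values())
assert abs(sum(rho0.values()) - 1.0) < 1e-12
```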


    • We can use the distribution of a random variable to compute expectations and probabilities:

    E[f(X)] = ∫E f(x) dµX(x),

    P[X ∈ G] = ∫G dµX(x), G ∈ B(E).

    • When E = Rd and we can write dµX(x) = ρ(x) dx, we refer to ρ(x) as the probability density function (pdf), or density with respect to Lebesgue measure, of X.

    • When E = Rd then by Lp(Ω;Rd), or sometimes Lp(Ω; µ) or even simply Lp(µ), we mean the Banach space of measurable functions on Ω with norm

    ‖X‖Lp = (E|X|p)1/p.


    Example 2 i) Consider the random variable X : Ω → R with pdf

    γσ,m(x) := (2πσ)−1/2 exp(−(x−m)2/(2σ)).

    Such an X is termed a Gaussian or normal random variable. The mean is

    EX = ∫R x γσ,m(x) dx = m

    and the variance is

    E(X −m)2 = ∫R (x−m)2 γσ,m(x) dx = σ.

    Since the mean and variance completely specify a Gaussian random variable on R, the Gaussian is commonly denoted by N (m,σ). The standard normal random variable is N (0, 1).
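Example 2(i) can be sanity-checked by sampling. Note the notes' convention: N (m,σ) has variance σ, so the standard deviation is √σ. The values of m, σ and the sample size below are illustrative:

```python
import math
import random

# Sketch checking Example 2(i) by sampling: in the notes' convention
# N(m, sigma) has mean m and *variance* sigma, so the standard deviation
# passed to random.gauss is sqrt(sigma).
random.seed(1)

m, sigma = 2.0, 0.25          # illustrative values
n = 200_000
xs = [random.gauss(m, math.sqrt(sigma)) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

assert abs(mean - m) < 0.01
assert abs(var - sigma) < 0.01
```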


    ii) Let m ∈ Rd and Σ ∈ Rd×d be symmetric and positive definite. The random variable X : Ω → Rd with pdf

    γΣ,m(x) := ((2π)d det Σ)−1/2 exp(−(1/2)〈Σ−1(x−m), (x−m)〉)

    is termed a multivariate Gaussian or normal random variable. The mean is

    E(X) = m (1)

    and the covariance matrix is

    E((X −m)⊗ (X −m)) = Σ. (2)

    Since the mean and covariance matrix completely specify a Gaussian random variable on Rd, the Gaussian is commonly denoted by N (m,Σ).
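Sampling from N (m,Σ) reduces to a linear transformation of standard normals: if Σ = LLᵀ (Cholesky factorization) then X = m + Lz has law N (m,Σ). A two-dimensional sketch, with an illustrative m and Σ:

```python
import math
import random

# Sketch of sampling N(m, Sigma) in R^2: with Sigma = L L^T (Cholesky),
# X = m + L z has mean m and covariance Sigma when z is standard normal.
# The particular m and Sigma below are illustrative.
random.seed(2)

m = [1.0, -1.0]
a, b, c = 2.0, 0.6, 1.0        # Sigma = [[a, b], [b, c]], positive definite
L = [[math.sqrt(a), 0.0],
     [b / math.sqrt(a), math.sqrt(c - b * b / a)]]

n = 100_000
xs = []
for _ in range(n):
    z = (random.gauss(0, 1), random.gauss(0, 1))
    xs.append((m[0] + L[0][0] * z[0],
               m[1] + L[1][0] * z[0] + L[1][1] * z[1]))

mean0 = sum(x[0] for x in xs) / n
cov01 = sum((x[0] - m[0]) * (x[1] - m[1]) for x in xs) / n

assert abs(mean0 - m[0]) < 0.03    # empirical mean close to m
assert abs(cov01 - b) < 0.03       # empirical covariance close to Sigma_01
```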


    Example 3 An exponential random variable T : Ω → R+ with rate λ > 0 satisfies

    P(T > t) = e−λt, ∀t > 0.

    We write T ∼ exp(λ). The related pdf is

    fT (t) = { λe−λt, t > 0; 0, t < 0. (3)

    Notice that

    ET = ∫∞−∞ t fT (t) dt = (1/λ) ∫∞0 (λt) e−λt d(λt) = 1/λ.


    If the times τn = tn+1 − tn are i.i.d. random variables with τn ∼ exp(λ) then, for t0 = 0,

    tn = ∑n−1k=0 τk,

    and it is possible to show that

    P(0 ≤ tk ≤ t < tk+1) = e−λt (λt)k / k!. (4)
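Equation (4) says the number of arrival times tk falling in [0, t] is Poisson distributed with parameter λt. A simulation sketch, with illustrative λ and t:

```python
import math
import random

# Sketch of equation (4): with i.i.d. exp(lambda) waiting times, the number
# of arrival times t_k in [0, t] is Poisson(lambda * t).
# lam and t below are illustrative values.
random.seed(3)

lam, t = 2.0, 1.5
n = 100_000

def count_arrivals():
    """Count how many arrival times land in [0, t] for one realization."""
    s, k = 0.0, 0
    while True:
        s += random.expovariate(lam)   # next waiting time tau ~ exp(lam)
        if s > t:
            return k
        k += 1

counts = [count_arrivals() for _ in range(n)]
for k in range(4):
    empirical = sum(1 for c in counts if c == k) / n
    poisson = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
    assert abs(empirical - poisson) < 0.01
```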


    • Assume that E|X| < ∞ and let G be a sub–σ–algebra of F. The conditional expectation of X with respect to G is defined to be the function E[X|G] : Ω → E which is G–measurable and satisfies

    ∫G E[X|G] dµ = ∫G X dµ, ∀G ∈ G.

    • We can define E[f(X)|G] and the conditional probability P[X ∈ F |G] = E[IF (X)|G], where IF is the indicator function of F, in a similar manner.
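When G is generated by a finite partition of Ω, the defining property pins down E[X|G] as the cell-wise µ-weighted average of X. A discrete sketch, in which Ω, µ, X and the partition are all illustrative:

```python
# Sketch of E[X | G] when G is generated by a finite partition of Omega:
# on each cell, E[X | G] equals the mu-weighted average of X over that cell.
# Omega, mu, X and the partition below are all illustrative.
Omega = [0, 1, 2, 3, 4, 5]
mu = {w: 1.0 / 6.0 for w in Omega}        # uniform probability measure
X = {w: float(w) for w in Omega}
partition = [{0, 1}, {2, 3, 4}, {5}]      # cells generating G

def cond_exp(w):
    """Value of E[X | G] at the sample point w."""
    cell = next(c for c in partition if w in c)
    mass = sum(mu[v] for v in cell)
    return sum(X[v] * mu[v] for v in cell) / mass

# Defining property: integrals of E[X|G] and of X agree on every G in G.
for cell in partition:
    lhs = sum(cond_exp(w) * mu[w] for w in cell)
    rhs = sum(X[w] * mu[w] for w in cell)
    assert abs(lhs - rhs) < 1e-12
```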


  • Definition of a Stochastic Process

    • Let T be an ordered set. A stochastic process is a collection of random variables X = {Xt; t ∈ T} where, for each fixed t ∈ T , Xt is a random variable from (Ω,F) to (E,G).

    • The measurable space (Ω,F) is called the sample space. The space (E,G) is called the state space.

    • In this course we will take the set T to be [0, +∞).

    • The state space E will usually be Rd equipped with the σ–algebra of Borel sets.

    • A stochastic process X may be viewed as a function of both t ∈ T and ω ∈ Ω. We will sometimes write X(t), X(t, ω) or Xt(ω) instead of Xt. For a fixed sample point ω ∈ Ω, the function Xt(ω) : T → E is called a sample path (realization, trajectory) of the process X.


    • The finite dimensional distributions (fdd) of a stochastic process are the Ek–valued random variables (X(t1), X(t2), . . . , X(tk)) for arbitrary positive integer k and arbitrary times ti ∈ T, i ∈ {1, . . . , k}.

    • We will say that two processes Xt and Yt are equivalent if they have the same finite dimensional distributions.

    • From experiments or numerical simulations we can only obtain information about the fdd of a process.

  • Stationary Processes

    • A process is called (strictly) stationary if all fdd are invariant under time translations: for any integer k and times ti ∈ T, the distribution of (X(t1), X(t2), . . . , X(tk)) is equal to that of (X(s + t1), X(s + t2), . . . , X(s + tk)) for any s such that s + ti ∈ T for all i ∈ {1, . . . , k}.

    • Let Xt be a stationary stochastic process with finite second moment (i.e. Xt ∈ L2). Stationarity implies that EXt = µ, E((Xt − µ)(Xs − µ)) = C(t− s). The converse is not true.

    • A stochastic process Xt ∈ L2 is called second-order stationary (or stationary in the wide sense) if the first moment EXt is a constant and the second moment depends only on the difference t− s:

    EXt = µ, E((Xt − µ)(Xs − µ)) = C(t− s).


    • The function C(t) is called the correlation (or covariance) function of Xt.

    • Let Xt ∈ L2 be a mean zero second order stationary process on R which is mean square continuous, i.e.

    limt→s E|Xt −Xs|2 = 0.

    • Then the correlation function admits the representation

    C(t) = ∫∞−∞ eitx f(x) dx, t ∈ R.

    • The function f(x) is called the spectral density of the process Xt.

    • In many cases, the experimentally measured quantity is the spectral density (or power spectrum) of the stochastic process.


    • Given the correlation function of Xt, and assuming that C(t) ∈ L1(R), we can calculate the spectral density through its Fourier transform:

    f(x) = (1/2π) ∫∞−∞ e−itx C(t) dt.

    • The correlation function of a second order stationary process enables us to associate a time scale to Xt, the correlation time τcor:

    τcor = (1/C(0)) ∫∞0 C(τ) dτ = ∫∞0 E(Xτ X0)/E(X0²) dτ.

    • The slower the decay of the correlation function, the larger the correlation time is. We have to assume sufficiently fast decay of correlations so that the correlation time is finite.


    Example 4 Consider a second-order stationary process with correlation function

    C(t) = C(0)e−γ|t|.

    The spectral density of this process is

    f(x) = (1/2π) C(0) ∫∞−∞ e−itx e−γ|t| dt = C(0) (1/π) γ/(γ2 + x2).

    The correlation time is

    τcor = ∫∞0 e−γt dt = γ−1.
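Both closed-form results of Example 4 can be verified numerically; γ, C(0), x and the quadrature grid below are illustrative choices:

```python
import math

# Numerical check of Example 4: the spectral density of
# C(t) = C(0) exp(-gamma |t|) is f(x) = C(0) gamma / (pi (gamma^2 + x^2)),
# and the correlation time is 1/gamma.  Values below are illustrative.
gamma, C0 = 1.5, 2.0
x = 0.7

# f(x) = (1/2pi) Integral of exp(-i t x) C(t) dt; by symmetry the integrand
# reduces to 2 cos(t x) C(t) on [0, inf).  Midpoint rule on [0, T]:
dt, T = 1e-3, 30.0
n = int(T / dt)
integral = sum(2.0 * math.cos((k + 0.5) * dt * x) *
               C0 * math.exp(-gamma * (k + 0.5) * dt) * dt
               for k in range(n))
f_numeric = integral / (2.0 * math.pi)
f_exact = C0 * gamma / (math.pi * (gamma ** 2 + x ** 2))
assert abs(f_numeric - f_exact) < 1e-4

# tau_cor = (1/C(0)) Integral over [0, inf) of C(t) dt = 1/gamma
tau = sum(math.exp(-gamma * (k + 0.5) * dt) * dt for k in range(n))
assert abs(tau - 1.0 / gamma) < 1e-4
```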

  • Gaussian Processes

    • The most important class of stochastic processes is that of Gaussian processes:

    Definition 5 A Gaussian process is one for which E = Rd and all the finite dimensional distributions are Gaussian.

    • A Gaussian process x(t) is characterized by its mean

    m(t) := Ex(t)

    and the covariance function

    C(t, s) = E ((

    x(t)−m(t))⊗ (x(s)−m(s)) ) .

    • Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.
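Since the fdd of a mean-zero Gaussian process on a time grid are N (0, K) with Kij = C(ti, tj), a sample path can be drawn via a Cholesky factorization of K. A sketch with the illustrative stationary covariance C(t, s) = e−|t−s|:

```python
import math
import random

# Sketch of sampling a mean-zero Gaussian process on a time grid: the vector
# (x(t_1), ..., x(t_k)) is N(0, K) with K_ij = C(t_i, t_j), drawn here via a
# Cholesky factorization K = L L^T.  The covariance C(t, s) = exp(-|t - s|)
# is an illustrative stationary choice.
random.seed(4)

ts = [0.1 * i for i in range(20)]
k = len(ts)
K = [[math.exp(-abs(ti - tj)) for tj in ts] for ti in ts]

# Plain Cholesky factorization (K is symmetric positive definite)
L = [[0.0] * k for _ in range(k)]
for i in range(k):
    for j in range(i + 1):
        s = sum(L[i][p] * L[j][p] for p in range(j))
        L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]

z = [random.gauss(0, 1) for _ in range(k)]
path = [sum(L[i][p] * z[p] for p in range(i + 1)) for i in range(k)]  # one sample path

assert len(path) == k
# The factorization reproduces the covariance: (L L^T)_ij = K_ij.
assert abs(sum(L[5][p] * L[7][p] for p in range(k)) - K[5][7]) < 1e-9
```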