### Transcript of Elements of Probability Theory (pavl/Lec2_prob.pdf)

• ELEMENTS OF PROBABILITY THEORY

• Elements of Probability Theory

• A collection of subsets of a set Ω is called a σ–algebra if it contains Ω and is closed under the operations of taking complements and countable unions of its elements.

• A sub-σ–algebra is a sub-collection of a σ–algebra which itself satisfies the axioms of a σ–algebra.

• A measurable space is a pair (Ω,F) where Ω is a set and F is a σ–algebra of subsets of Ω.

• Let (Ω,F) and (E,G) be two measurable spaces. A function X : Ω → E such that the event

{ω ∈ Ω : X(ω) ∈ A} =: {X ∈ A}

belongs to F for every A ∈ G is called a measurable function or random variable.


• Let (Ω,F) be a measurable space. A function µ : F → [0, 1] is called a probability measure if µ(∅) = 0, µ(Ω) = 1 and

µ(∪_{k=1}^∞ A_k) = ∑_{k=1}^∞ µ(A_k)

for every sequence {A_k}_{k=1}^∞ ⊂ F of pairwise disjoint sets.

• The triplet (Ω,F, µ) is called a probability space.

• Let X be a random variable (measurable function) from (Ω,F, µ) to (E,G). If E is a Banach space then we may define the expectation of X with respect to the measure µ by

E[X] = ∫_Ω X(ω) dµ(ω).

• More generally, let f : E → R be G–measurable. Then

E[f(X)] = ∫_Ω f(X(ω)) dµ(ω).
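When we can sample from µ, the expectation E[f(X)] above can be approximated by an empirical average over samples. A minimal numerical sketch (NumPy assumed; the uniform X and the function f are purely illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in random variable X: uniform on [0, 1] (an illustrative choice).
samples = rng.uniform(0.0, 1.0, size=200_000)

def f(x):
    # A measurable function f : E -> R.
    return x ** 2

# Monte Carlo approximation of E[f(X)] = ∫ f(X(ω)) dµ(ω).
estimate = f(samples).mean()
exact = 1.0 / 3.0  # ∫_0^1 x² dx for this particular choice

print(abs(estimate - exact) < 1e-2)  # True
```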


• Let U be a topological space. We will use the notation B(U) to denote the Borel σ–algebra of U: the smallest σ–algebra containing all open sets of U. Every random variable from a probability space (Ω,F, µ) to a measurable space (E,B(E)) induces a probability measure on E:

µ_X(B) = µ(X^{−1}(B)) = µ({ω ∈ Ω : X(ω) ∈ B}), B ∈ B(E).

The measure µ_X is called the distribution (or sometimes the law) of X.

Example 1 Let I denote a subset of the positive integers. A vector ρ_0 = {ρ_{0,i}, i ∈ I} is a distribution on I if it has nonnegative entries and its total mass equals 1:

∑_{i∈I} ρ_{0,i} = 1.
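The two conditions of Example 1 can be checked directly; a tiny sketch (the index set and weights are arbitrary illustrative choices):

```python
# A candidate distribution ρ0 on the index set I = {1, 2, 3}
# (the weights are an arbitrary illustrative choice).
rho0 = {1: 0.5, 2: 0.25, 3: 0.25}

nonnegative = all(p >= 0 for p in rho0.values())  # nonnegative entries
unit_mass = abs(sum(rho0.values()) - 1.0) < 1e-12  # total mass equals 1

print(nonnegative and unit_mass)  # True
```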


• We can use the distribution of a random variable to compute expectations and probabilities:

E[f(X)] = ∫_E f(x) dµ_X(x)

and

P[X ∈ G] = ∫_G dµ_X(x), G ∈ B(E).

• When E = R^d and we can write dµ_X(x) = ρ(x) dx, then we refer to ρ(x) as the probability density function (pdf), or density with respect to Lebesgue measure, of X.

• When E = R^d then by L^p(Ω; R^d), or sometimes L^p(Ω; µ) or even simply L^p(µ), we mean the Banach space of measurable functions on Ω with norm

‖X‖_{L^p} = (E|X|^p)^{1/p}.


Example 2 i) Consider the random variable X : Ω → R with pdf

γ_{σ,m}(x) := (2πσ)^{−1/2} exp(−(x−m)²/(2σ)).

Such an X is termed a Gaussian or normal random variable. The mean is

EX = ∫_R x γ_{σ,m}(x) dx = m

and the variance is

E(X −m)² = ∫_R (x−m)² γ_{σ,m}(x) dx = σ.

Since the mean and variance completely specify a Gaussian random variable on R, the Gaussian is commonly denoted by N(m, σ). The standard normal random variable is N(0, 1).
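A quick numerical sanity check of the N(m, σ) convention above, in which σ denotes the variance. Note that NumPy's sampler takes the standard deviation, hence the square root (the particular m, σ and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

m, sigma = 2.0, 4.0  # mean and *variance*, in the N(m, σ) convention above

# NumPy parametrizes the Gaussian by its standard deviation √σ.
x = rng.normal(loc=m, scale=np.sqrt(sigma), size=500_000)

print(abs(x.mean() - m) < 0.05)    # True: sample mean ≈ m
print(abs(x.var() - sigma) < 0.1)  # True: sample variance ≈ σ
```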


ii) Let m ∈ R^d and Σ ∈ R^{d×d} be symmetric and positive definite. The random variable X : Ω → R^d with pdf

γ_{Σ,m}(x) := ((2π)^d det Σ)^{−1/2} exp(−½ ⟨Σ^{−1}(x−m), (x−m)⟩)

is termed a multivariate Gaussian or normal random variable. The mean is

E(X) = m (1)

and the covariance matrix is

E((X −m) ⊗ (X −m)) = Σ. (2)

Since the mean and covariance matrix completely specify a Gaussian random variable on Rd, the Gaussian is commonly denoted by N (m,Σ).
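The identities (1) and (2) can be verified empirically by sampling; a sketch with an illustrative mean vector and symmetric positive definite covariance (all numbers are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative mean and symmetric positive definite covariance.
m = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = rng.multivariate_normal(mean=m, cov=Sigma, size=400_000)

emp_mean = x.mean(axis=0)          # estimates E(X) = m, eq. (1)
emp_cov = np.cov(x, rowvar=False)  # estimates E((X−m)⊗(X−m)) = Σ, eq. (2)

print(np.allclose(emp_mean, m, atol=0.02))     # True
print(np.allclose(emp_cov, Sigma, atol=0.02))  # True
```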


Example 3 An exponential random variable T : Ω → R_+ with rate λ > 0 satisfies

P(T > t) = e^{−λt}, ∀t > 0.

We write T ∼ exp(λ). The corresponding pdf is

f_T(t) = λe^{−λt} for t > 0, f_T(t) = 0 for t < 0. (3)

Notice that

ET = ∫_{−∞}^∞ t f_T(t) dt = (1/λ) ∫_0^∞ (λt) e^{−λt} d(λt) = 1/λ.

If the interarrival times τ_n = t_{n+1} − t_n are i.i.d. random variables with τ_0 ∼ exp(λ) then, for t_0 = 0,

t_n = ∑_{k=0}^{n−1} τ_k

• Elements of Probability Theory

and it is possible to show that

P(0 ≤ t_k ≤ t < t_{k+1}) = e^{−λt} (λt)^k / k!. (4)
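Formula (4) says that the number of arrivals in [0, t] is Poisson distributed with parameter λt; a simulation sketch (the rate, horizon, k, and sample size are arbitrary choices):

```python
import math

import numpy as np

rng = np.random.default_rng(3)

lam, t, n_paths = 2.0, 1.5, 200_000

# i.i.d. exponential interarrival times τ_k ~ exp(λ); 30 per path is far
# more than can fit in [0, t] here, so no arrivals are missed.
taus = rng.exponential(scale=1.0 / lam, size=(n_paths, 30))
arrival_times = np.cumsum(taus, axis=1)  # t_n = τ_0 + ... + τ_{n−1}
counts = (arrival_times <= t).sum(axis=1)

# Compare the empirical frequency of exactly k arrivals in [0, t] with
# P(t_k ≤ t < t_{k+1}) = e^{−λt}(λt)^k / k!  (equation (4)).
k = 3
empirical = np.mean(counts == k)
theoretical = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

print(abs(empirical - theoretical) < 0.01)  # True
```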


• Assume that E|X| < ∞ and let G be a sub-σ–algebra of F. The conditional expectation of X with respect to G is defined to be the function E[X|G] : Ω → E which is G–measurable and satisfies

∫_G E[X|G] dµ = ∫_G X dµ ∀G ∈ G.

• We can define E[f(X)|G] and the conditional probability P[X ∈ F|G] = E[I_F(X)|G], where I_F is the indicator function of F, in a similar manner.
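When G is generated by a finite partition of Ω, E[X|G] is simply the cell-wise average of X, and the defining integral identity can be checked directly; a sketch on a finite Ω with uniform µ (the two-cell parity partition is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# Finite Ω = {0, ..., N−1} with uniform µ; G generated by the
# two-cell partition of Ω into even and odd points.
N = 10
X = rng.normal(size=N)  # a random variable X : Ω -> R
cells = [np.arange(0, N, 2), np.arange(1, N, 2)]

# On a partition σ-algebra, E[X|G] is constant on each cell and
# equals the average of X over that cell.
cond_exp = np.empty(N)
for cell in cells:
    cond_exp[cell] = X[cell].mean()

# Defining property: ∫_G E[X|G] dµ = ∫_G X dµ for each generating cell
# (with uniform µ, the integrals are sums up to a common factor 1/N).
ok = all(np.isclose(cond_exp[cell].sum(), X[cell].sum()) for cell in cells)
print(ok)  # True
```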

• ELEMENTS OF THE THEORY OF STOCHASTIC PROCESSES

• Definition of a Stochastic Process

• Let T be an ordered set. A stochastic process is a collection of random variables X = {Xt; t ∈ T} where, for each fixed t ∈ T , Xt is a random variable from (Ω,F) to (E,G).

• The measurable space (Ω,F) is called the sample space. The space (E,G) is called the state space.

• In this course we will take the set T to be [0, +∞).

• The state space E will usually be R^d equipped with the σ–algebra of Borel sets.

• A stochastic process X may be viewed as a function of both t ∈ T and ω ∈ Ω. We will sometimes write X(t), X(t, ω) or X_t(ω) instead of X_t. For a fixed sample point ω ∈ Ω, the function t ↦ X_t(ω) : T → E is called a sample path (realization, trajectory) of the process X.


• The finite dimensional distributions (fdd) of a stochastic process are the E^k–valued random variables (X(t_1), X(t_2), . . . , X(t_k)) for arbitrary positive integer k and arbitrary times t_i ∈ T, i ∈ {1, . . . , k}.

• We will say that two processes X_t and Y_t are equivalent if they have the same finite dimensional distributions.

• From experiments or numerical simulations we can only obtain information about the (fdd) of a process.

• Stationary Processes

• A process is called (strictly) stationary if all fdd are invariant under time translations: for any integer k and times t_i ∈ T, the distribution of (X(t_1), X(t_2), . . . , X(t_k)) is equal to that of (X(s + t_1), X(s + t_2), . . . , X(s + t_k)) for any s such that s + t_i ∈ T for all i ∈ {1, . . . , k}.

• Let Xt be a stationary stochastic process with finite second moment (i.e. Xt ∈ L2). Stationarity implies that EXt = µ, E((Xt − µ)(Xs − µ)) = C(t− s). The converse is not true.

• A stochastic process Xt ∈ L2 is called second-order stationary (or stationary in the wide sense) if the first moment EXt is a constant and the second moment depends only on the difference t− s:

EXt = µ, E((Xt − µ)(Xs − µ)) = C(t− s).


• The function C(t) is called the correlation (or covariance) function of Xt.

• Let X_t ∈ L² be a mean zero second order stationary process on R which is mean square continuous, i.e.

lim_{t→s} E|X_t − X_s|² = 0.

• Then the correlation function admits the representation

C(t) = ∫_{−∞}^∞ e^{itx} f(x) dx, t ∈ R.

• The function f(x) is called the spectral density of the process X_t.

• In many cases, the experimentally measured quantity is the spectral density (or power spectrum) of the stochastic process.


• Given the correlation function of X_t, and assuming that C(t) ∈ L¹(R), we can calculate the spectral density through its Fourier transform:

f(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} C(t) dt.

• The correlation function of a second order stationary process enables us to associate a time scale to X_t, the correlation time τ_cor:

τ_cor = (1/C(0)) ∫_0^∞ C(τ) dτ = ∫_0^∞ E(X_τ X_0)/E(X_0²) dτ.

• The slower the decay of the correlation function, the larger the correlation time is. We have to assume sufficiently fast decay of correlations so that the correlation time is finite.


Example 4 Consider a second-order stationary process with correlation function

C(t) = C(0)e−γ|t|.

The spectral density of this process is

f(x) = (1/2π) C(0) ∫_{−∞}^∞ e^{−itx} e^{−γ|t|} dt = C(0) (1/π) · γ/(γ² + x²).

The correlation time is

τ_cor = ∫_0^∞ e^{−γt} dt = γ^{−1}.

• Gaussian Processes

• The most important class of stochastic processes is that of Gaussian processes:

Definition 5 A Gaussian process is one for which E = Rd and all the finite dimensional distributions are Gaussian.

• A Gaussian process x(t) is characterized by its mean

m(t) := Ex(t)

and the covariance function

C(t, s) = E((x(t)−m(t)) ⊗ (x(s)−m(s))).

• Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.