Convergence of Random Variables - Stanford University
Convergence of Random Variables
John Duchi
Stats 300b – Winter Quarter 2021
Convergence of Random Variables 1–1
Outline
▸ convergence definitions
▸ continuous mapping and Slutsky’s theorems
▸ big-O notation
▸ major convergence theorems
Reading: van der Vaart Chapter 2
Basics of convergence
Definition. Let Xn be a sequence of random vectors. Then Xn converges in probability to X,

    Xn →p X,

if for all ε > 0,

    P(‖Xn − X‖ ≥ ε) → 0 as n → ∞.
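The definition can be checked by simulation. A minimal Monte Carlo sketch (an illustration, not from the slides; the helper name `prob_far` is hypothetical): for Xn the sample mean of n iid Uniform(0, 1) draws, Xn →p 1/2 by the law of large numbers, so the probability of landing at least ε away from 1/2 should shrink as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 0.05

def prob_far(n, reps=2000):
    """Estimate P(|mean of n uniforms - 1/2| >= EPS) by simulation."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return float(np.mean(np.abs(means - 0.5) >= EPS))

# The estimated probability decreases toward 0 as n increases.
for n in (10, 100, 1000):
    print(n, prob_far(n))
```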
Convergence in distribution / weak convergence
Definition. For random variables Xn ∈ R and X ∈ R, Xn converges in distribution to X,

    Xn →d X  or  Xn ⇝ X,

if for all x such that x ↦ P(X ≤ x) is continuous,

    P(Xn ≤ x) → P(X ≤ x) as n → ∞.
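A tiny deterministic example of why the definition restricts to continuity points of x ↦ P(X ≤ x) (an illustration; the function names are hypothetical): Xn = 1/n converges in distribution to X = 0, but the CDFs fail to converge at the jump point x = 0.

```python
def cdf_xn(x, n):
    """CDF of the point mass at 1/n (i.e., Xn = 1/n with probability 1)."""
    return 1.0 if x >= 1.0 / n else 0.0

def cdf_x(x):
    """CDF of the point mass at 0 (i.e., X = 0 with probability 1)."""
    return 1.0 if x >= 0.0 else 0.0

# At a continuity point x = 0.01 of cdf_x, cdf_xn(x, n) -> cdf_x(x) = 1.
print([cdf_xn(0.01, n) for n in (10, 100, 1000)])
# At the jump x = 0, cdf_xn(0, n) = 0 for every n while cdf_x(0) = 1,
# so requiring convergence there would wrongly rule out Xn ->d X.
print(cdf_xn(0.0, 10**6), cdf_x(0.0))
```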
Convergence in distribution / weak convergence
Definition. For metric space-valued random variables Xn, X, Xn converges in distribution to X if for all bounded continuous f,

    E[f(Xn)] → E[f(X)] as n → ∞.
Convergence in pth mean
Definition. Vector-valued random variables Xn converge in Lp (or converge in pth mean) to X,

    Xn →Lp X,

if E[‖Xn − X‖^p] → 0 as n → ∞.
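An illustrative L2 sketch (not from the slides; the helper name `mse` is hypothetical): the sample mean of n iid Uniform(0, 1) draws converges to 1/2 in L2, with E[(X̄n − 1/2)²] = Var(X1)/n = (1/12)/n, so the simulated second moment should track that rate.

```python
import numpy as np

rng = np.random.default_rng(1)

def mse(n, reps=4000):
    """Monte Carlo estimate of E[(mean of n uniforms - 1/2)^2]."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return float(np.mean((means - 0.5) ** 2))

# Estimated mean-squared error next to the theoretical value (1/12)/n.
for n in (10, 100, 1000):
    print(n, mse(n), (1 / 12) / n)
```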
Almost sure convergence
Definition. Random variables converge almost surely, Xn →a.s. X, if

    P(lim_n Xn ≠ X) = 0, or equivalently P(lim sup_n ‖Xn − X‖ ≥ ε) = 0

for all ε > 0.
Standard implications

    Xn →a.s. X ⇒ Xn →p X ⇒ Xn →d X,  and  Xn →Lp X ⇒ Xn →p X
An example (the central limit theorem)
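A Monte Carlo sketch of the central limit theorem (an illustration, not from the slides; all names are hypothetical): for iid Uniform(0, 1) draws, √n(X̄n − 1/2)/√(1/12) converges in distribution to N(0, 1), so its empirical CDF should approach the standard normal CDF Φ.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def standardized_mean(n, reps=20000):
    """Draw reps copies of sqrt(n)(mean of n uniforms - 1/2)/sqrt(1/12)."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (means - 0.5) / np.sqrt(1.0 / 12.0)

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

z = standardized_mean(200)
# Empirical CDF at a few points next to the N(0, 1) values.
for x in (-1.0, 0.0, 1.0):
    print(x, float(np.mean(z <= x)), phi(x))
```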
Portmanteau Lemma
Theorem. Let Xn, X be random vectors. The following are all equivalent.
(1) Xn →d X
(2) E[f(Xn)] → E[f(X)] for all bounded continuous f
(3) E[f(Xn)] → E[f(X)] for all 1-Lipschitz f with f(x) ∈ [0, 1]
(4) lim inf_n E[f(Xn)] ≥ E[f(X)] for all continuous nonnegative f
(5) lim inf_n P(Xn ∈ O) ≥ P(X ∈ O) for all open sets O
(6) lim sup_n P(Xn ∈ C) ≤ P(X ∈ C) for all closed sets C
(7) lim_n P(Xn ∈ B) = P(X ∈ B) for all sets B such that P(X ∈ bd B) = 0
A few remarks on the Portmanteau Lemma
▸ A collection F is a convergence determining class if E[f(Xn)] → E[f(X)] for all f ∈ F if and only if Xn →d X
▸ Characteristic functions of the form f(x) = exp(i tᵀx), for i = √−1 and t ∈ R^d, are convergence determining
▸ Boundedness of f in the Portmanteau lemma is important
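The boundedness remark can be made concrete by a classical counterexample (an illustration, not from the slides; the helper name is hypothetical): let Xn = n with probability 1/n and Xn = 0 otherwise. Then Xn →d 0, yet E[Xn] = 1 for every n, so E[f(Xn)] → E[f(X)] fails for the unbounded choice f(x) = x.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_xn(n, reps=200000):
    """Draw reps independent copies of Xn = n w.p. 1/n, else 0."""
    return np.where(rng.uniform(size=reps) < 1.0 / n, float(n), 0.0)

for n in (10, 100, 1000):
    x = sample_xn(n)
    # P(Xn != 0) -> 0, so Xn ->d 0; but the mean stays near 1.
    print(n, float(np.mean(x != 0)), float(np.mean(x)))
```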
Proof sketches of the Portmanteau lemma
Continuous mapping theorems
Theorem (Continuous mapping)
Let g be continuous on a set B such that P(X ∈ B) = 1. Then
(1) Xn →p X implies g(Xn) →p g(X)
(2) Xn →d X implies g(Xn) →d g(X)
(3) Xn →a.s. X implies g(Xn) →a.s. g(X)
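A Monte Carlo sketch of part (1) (an illustration, not from the slides; names are hypothetical): X̄n →p 1/2 for iid Uniform(0, 1), so for the continuous g(x) = eˣ, the continuous mapping theorem gives g(X̄n) →p g(1/2) = e^(1/2).

```python
import numpy as np

rng = np.random.default_rng(4)

def g_of_mean(n, reps=2000):
    """Draw reps copies of g(mean of n uniforms) with g(x) = exp(x)."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return np.exp(means)

for n in (10, 100, 2000):
    # Fraction of draws with |g(mean) - g(1/2)| >= 0.05; shrinks with n.
    far = float(np.mean(np.abs(g_of_mean(n) - np.exp(0.5)) >= 0.05))
    print(n, far)
```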
Slutsky’s theorems
Theorem (Slutsky)
(1) Xn →d c if and only if Xn →p c
(2) Xn →d X and d(Xn, Yn) →p 0 implies Yn →d X
(3) Xn →d X and Yn →p c implies (Xn, Yn) →d (X, c)
Proof sketches
Consequences of Slutsky’s theorems
Corollary
If Xn →d X and Yn →d c, then
(1) Xn + Yn →d X + c
(2) XnYn →d cX
(3) if C ∈ R^(d×d) with det(C) ≠ 0 and Yn →d C, then Yn⁻¹Xn →d C⁻¹X
Example: t-type statistics
Example
Let Xi ~iid P with Cov(Xi) = Σ ≻ 0, and set

    μn := (1/n) ∑_{i=1}^n Xi,    Sn := (1/n) ∑_{i=1}^n (Xi − μn)(Xi − μn)ᵀ.

Then

    Tn := (1/√n) Sn^(−1/2) ∑_{i=1}^n (Xi − μn)  satisfies  Tn →d N(0, I).
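A one-dimensional Monte Carlo sketch of this example (an illustration, not from the slides; names are hypothetical): for skewed Xi ~ Exponential(1), the statistic Tn = √n(X̄n − 1)/Sn is asymptotically N(0, 1), since Sn →p 1 and Slutsky's theorem lets the estimated scale replace the true one.

```python
import numpy as np

rng = np.random.default_rng(5)

def t_stat(n, reps=20000):
    """Draw reps copies of sqrt(n)(mean - 1)/sd for n Exponential(1) draws."""
    x = rng.exponential(1.0, size=(reps, n))
    xbar = x.mean(axis=1)
    s = x.std(axis=1, ddof=1)
    return np.sqrt(n) * (xbar - 1.0) / s

t = t_stat(500)
# P(|Tn| <= 1.96) should be close to the N(0, 1) value 0.95.
print(float(np.mean(np.abs(t) <= 1.96)))
```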
Big-O Notation
Let Xn be random vectors and Rn ∈ R be random variables. We write

    Xn = oP(Rn) if Xn = YnRn, where Yn →p 0,

and

    Xn = OP(Rn) if Xn = YnRn, where Yn = OP(1),

and Yn = OP(1) means that Yn is uniformly tight, that is,

    lim_{M→∞} sup_n P(‖Yn‖ ≥ M) = 0.
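A sketch of an OP(1/√n) statement (an illustration, not from the slides; names are hypothetical): for iid Uniform(0, 1), Yn = √n(X̄n − 1/2) is uniformly tight (it is asymptotically N(0, 1/12)), so X̄n − 1/2 = OP(1/√n); the tail probability P(‖Yn‖ ≥ M) stays small uniformly in n once M is a few limiting standard deviations.

```python
import numpy as np

rng = np.random.default_rng(6)

def y_n(n, reps=10000):
    """Draw reps copies of Yn = sqrt(n)(mean of n uniforms - 1/2)."""
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (means - 0.5)

M = 3.0 * np.sqrt(1.0 / 12.0)  # about three limiting standard deviations
for n in (10, 100, 1000):
    # P(|Yn| >= M) stays small uniformly in n, the tightness condition.
    print(n, float(np.mean(np.abs(y_n(n)) >= M)))
```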
Some consequences of big-O notation
Lemma. We have oP(1) + oP(1) = oP(1), OP(1) + OP(1) = OP(1), and oP(Rn) = OP(Rn).

Lemma. Let R : R^d → R^k satisfy R(0) = 0 and let Xn →p 0. Then
(1) if R(h) = o(‖h‖^p) as h → 0, then R(Xn) = oP(‖Xn‖^p)
(2) if R(h) = O(‖h‖^p) as h → 0, then R(Xn) = OP(‖Xn‖^p)
Compactness and distributional convergence
Definition. A collection of random vectors {Xα}α∈A is uniformly tight if for all ε > 0, there exists M < ∞ such that

    sup_{α∈A} P(‖Xα‖ ≥ M) ≤ ε.

Remark: a single random vector is tight.
Remark: if Xn →d X, then {Xn}n∈N is uniformly tight.
Prohorov’s Theorem
Theorem (Prohorov)
A collection {Xα}α∈A is uniformly tight if and only if it is sequentially compact for convergence in distribution; that is, for every sequence {Xn} ⊂ {Xα}α∈A, there is a subsequence n(k) such that Xn(k) →d X as k → ∞ for some random vector X.
A final useful convergence theorem
Theorem (Scheffé)
For a measure μ, let fn → f μ-almost everywhere and assume lim sup_n ∫|fn|^p dμ ≤ ∫|f|^p dμ < ∞. Then

    ‖fn − f‖_{Lp(μ)}^p = ∫|fn − f|^p dμ → 0.
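The integral condition cannot be dropped, as a classical counterexample shows (an illustration, not from the slides; names are hypothetical): fn = n on [0, 1/n] and 0 elsewhere converges to f = 0 almost everywhere on [0, 1], yet ∫|fn − f| dμ = ∫|fn| dμ = 1 for every n, so L1 convergence fails; here lim sup_n ∫|fn| dμ = 1 > 0 = ∫|f| dμ.

```python
import numpy as np

def integral_abs_fn(n, grid=10**6):
    """Riemann-sum approximation of the integral of |fn| over [0, 1]."""
    x = (np.arange(grid) + 0.5) / grid  # midpoints of a uniform grid
    fn = np.where(x <= 1.0 / n, float(n), 0.0)
    return float(fn.mean())  # mean over midpoints = sum * (1/grid)

# The integral of |fn - 0| stays at 1 for every n, so no L1 convergence.
for n in (10, 100, 1000):
    print(n, integral_abs_fn(n))
```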