Transcript of 18.175: Lecture 13: More large deviations (math.mit.edu/~sheffield/175/Lecture13.pdf)

18.175: Lecture 13

More large deviations

Scott Sheffield

MIT


Outline

Legendre transform

Large deviations


Legendre transform

- Define the Legendre transform (or Legendre dual) of a function Λ : R^d → R by
  Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
  (A small numerical sketch of this transform appears after this list.)
- Let's describe the Legendre dual geometrically when d = 1: −Λ*(x) is the height at which the tangent line to Λ of slope x crosses the vertical axis. We can “roll” this tangent line around the convex hull of the graph of Λ to obtain all the Λ* values.
- Is the Legendre dual always convex?
- What is the Legendre dual of x²? Of the function equal to 0 at 0 and ∞ everywhere else?
- How are the derivatives of Λ and Λ* related?
- What is the Legendre dual of the Legendre dual of a convex function?
- What's the higher-dimensional analog of rolling the tangent line?
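As a concrete complement to these questions, here is a small numerical sketch (my own illustration, not from the lecture) that tabulates Λ on a grid, computes the discrete Legendre dual, and checks two of the claims above for Λ(λ) = λ²/2: the dual is x²/2, and the dual of the dual recovers Λ. The grid ranges and step sizes are arbitrary choices.

```python
import numpy as np

def legendre_dual(grid, f_vals, y):
    """Approximate Legendre dual f*(y) = sup_s [s*y - f(s)] by maximizing
    over the finite grid `grid` on which f is tabulated (d = 1)."""
    return np.max(np.outer(y, grid) - f_vals[None, :], axis=1)

lam = np.linspace(-10.0, 10.0, 4001)   # lambda grid (arbitrary choice)
Lam = 0.5 * lam**2                     # Lambda(lambda) = lambda^2 / 2

x = np.linspace(-3.0, 3.0, 601)
Lam_star = legendre_dual(lam, Lam, x)

# For Lambda(lambda) = lambda^2/2 the dual is Lambda*(x) = x^2/2.
print("max error vs x^2/2:", np.max(np.abs(Lam_star - 0.5 * x**2)))

# The dual of the dual recovers the convex Lambda (checked away from the grid edge).
Lam_dd = legendre_dual(x, Lam_star, lam)
mask = np.abs(lam) <= 2.5
print("max double-dual error:", np.max(np.abs(Lam_dd[mask] - Lam[mask])))
```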


Recall: moment generating functions

- Let X be a random variable.
- The moment generating function of X is defined by M(t) = M_X(t) := E[e^{tX}].
- When X is discrete, we can write M(t) = Σ_x e^{tx} p_X(x), so M(t) is a weighted average of countably many exponential functions.
- When X is continuous, we can write M(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx, so M(t) is a weighted average of a continuum of exponential functions.
- We always have M(0) = 1.
- If b > 0 and t > 0, then E[e^{tX}] ≥ E[e^{t min{X, b}}] ≥ P{X ≥ b} e^{tb}. (A quick numerical check of this bound follows below.)
- If X takes both positive and negative values with positive probability, then M(t) grows at least exponentially fast in |t| as |t| → ∞.
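A quick numerical sanity check of the bound P{X ≥ b} ≤ e^{−tb} M(t) (my illustration, not part of the slides), using a standard normal X, for which the exact MGF is M(t) = e^{t²/2}; the sample size and the choice t = b are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)     # samples of a standard normal (sample size arbitrary)

def M(t):
    """Monte Carlo estimate of the moment generating function E[e^{tX}]."""
    return np.mean(np.exp(t * X))

print("M(0) =", M(0.0))                           # always exactly 1
print("M(1) =", M(1.0), "exact:", np.exp(0.5))    # exact MGF of N(0,1) is e^{t^2/2}

# Bound from the slide: P{X >= b} <= e^{-tb} M(t) for b, t > 0.
b, t = 2.0, 2.0                                   # t = b is the optimal choice here
tail = np.mean(X >= b)
bound = np.exp(-t * b) * M(t)
print(f"P(X >= {b}) ~ {tail:.4f} <= bound {bound:.4f}")
```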


Recall: moment generating functions for i.i.d. sums

- We showed that if Z = X + Y with X and Y independent, then M_Z(t) = M_X(t) M_Y(t).
- If X_1, ..., X_n are i.i.d. copies of X and Z = X_1 + ... + X_n, then what is M_Z?
- Answer: M_X^n, i.e. M_Z(t) = M_X(t)^n. (A small simulation checking this follows below.)
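A small simulation (mine, not from the slides) checking M_Z(t) = M_X(t)^n for X Exponential(1), whose MGF is M_X(t) = 1/(1 − t) for t < 1; the values of n, t, and the number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 5, 0.3, 500_000                     # arbitrary illustration parameters

X = rng.exponential(scale=1.0, size=(trials, n))   # i.i.d. Exp(1) variables
Z = X.sum(axis=1)                                  # Z = X_1 + ... + X_n

M_X = 1.0 / (1.0 - t)                  # exact MGF of Exp(1) for t < 1
M_Z_mc = np.mean(np.exp(t * Z))        # Monte Carlo estimate of M_Z(t)

print("M_X(t)^n          =", M_X**n)
print("Monte Carlo M_Z(t) =", M_Z_mc)
```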


Large deviations

- Consider i.i.d. random variables X_i. Can we show that P(S_n ≥ na) → 0 exponentially fast when a > E[X_i]? (A quick simulation illustrating this follows below.)
- This is a kind of quantitative form of the weak law of large numbers: the empirical average A_n is very unlikely to be more than ε away from its expected value (where “very unlikely” means with probability less than some exponentially decaying function of n).
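A quick simulation of the phenomenon (my illustration, not from the lecture) for fair coin flips, so E[X_i] = 1/2, with a = 0.6: the estimated probabilities P(S_n ≥ na) shrink roughly geometrically as n grows. The level a and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
a, trials = 0.6, 200_000                    # level above the mean 1/2, and sample size

for n in (25, 50, 100, 200):
    S = rng.binomial(n, 0.5, size=trials)   # S_n for a fair coin, simulated directly
    p_hat = np.mean(S >= a * n)             # estimate of P(S_n >= n a)
    print(f"n = {n:3d}   P(S_n >= {a} n) ~ {p_hat:.5f}")
```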


General large deviation principle

- More general framework: a large deviation principle describes the limiting behavior as n → ∞ of a family {µ_n} of measures on a measure space (X, B) in terms of a rate function I.
- The rate function is a lower-semicontinuous map I : X → [0, ∞]. (The sets {x : I(x) ≤ a} are closed; the rate function is called “good” if these sets are compact.)
- DEFINITION: the {µ_n} satisfy an LDP with rate function I and speed n if for all Γ ∈ B,
  − inf_{x ∈ Γ°} I(x) ≤ lim inf_{n→∞} (1/n) log µ_n(Γ) ≤ lim sup_{n→∞} (1/n) log µ_n(Γ) ≤ − inf_{x ∈ Γ̄} I(x),
  where Γ° is the interior of Γ and Γ̄ its closure.
- INTUITION: “near x” the probability density of µ_n tends to zero like e^{−I(x)n} as n → ∞.
- Simple case: I is continuous and Γ is the closure of its interior.
- Question: how would I change if we replaced the measures µ_n by the weighted measures e^{(λn, ·)} µ_n?
- Replace I(x) by I(x) − (λ, x)? What is inf_x [I(x) − (λ, x)]? (A heuristic sketch of the answer follows this list.)
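Here is a heuristic Laplace-method sketch of the answer to the last question (my own gloss, not a statement from the lecture); it assumes I = Λ* with Λ convex and enough integrability for the exponential moments below to behave.

```latex
% Near x, d\mu_n \approx e^{-nI(x)}\,dx, so the weighted measure e^{(\lambda n,\cdot)}\mu_n
% has (unnormalized) density roughly e^{-n[I(x) - (\lambda,x)]}.  Its total mass is,
% to exponential order,
\[
  \int e^{n(\lambda,x)}\, d\mu_n(x) \;\approx\;
  \exp\!\Big( n \sup_x \big[ (\lambda,x) - I(x) \big] \Big)
  \;=\; \exp\!\big( n\,\Lambda(\lambda) \big),
\]
% using Legendre duality \sup_x [(\lambda,x) - \Lambda^*(x)] = \Lambda(\lambda).
% So the normalized tilted measures should satisfy an LDP with rate function
\[
  I_\lambda(x) \;=\; I(x) - (\lambda,x) + \Lambda(\lambda) \;\ge\; 0,
  \qquad\text{and}\qquad
  \inf_x \big[ I(x) - (\lambda,x) \big] \;=\; -\Lambda(\lambda).
\]
```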


Cramér's theorem

- Let µ_n be the law of the empirical mean A_n = (1/n) Σ_{j=1}^n X_j for i.i.d. vectors X_1, X_2, ..., X_n in R^d with the same law as X.
- Define the log moment generating function of X by
  Λ(λ) = Λ_X(λ) = log M_X(λ) = log E e^{(λ, X)},
  where (·, ·) is the inner product on R^d.
- Define the Legendre transform of Λ by
  Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
- CRAMÉR'S THEOREM: the µ_n satisfy an LDP with convex rate function Λ*. (A numerical check of the rate function for Bernoulli variables follows below.)
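A concrete check of the rate function (my illustration, not from the slides): for X Bernoulli(p), Λ(λ) = log(1 − p + p e^λ), and Λ* computed by numerically maximizing λa − Λ(λ) should match the closed form Λ*(a) = a log(a/p) + (1 − a) log((1 − a)/(1 − p)) on (0, 1). The value p = 0.3 and the grids are arbitrary.

```python
import numpy as np

p = 0.3                                   # Bernoulli parameter (arbitrary choice)
lam = np.linspace(-20.0, 20.0, 8001)      # lambda grid for the numerical sup
Lam = np.log(1.0 - p + p * np.exp(lam))   # log MGF of Bernoulli(p)

def rate_numeric(a):
    """Numerical Legendre transform Lambda*(a) = sup_lambda [lambda*a - Lambda(lambda)]."""
    return np.max(a * lam - Lam)

def rate_exact(a):
    """Closed form: relative entropy of Bernoulli(a) with respect to Bernoulli(p)."""
    return a * np.log(a / p) + (1.0 - a) * np.log((1.0 - a) / (1.0 - p))

for a in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"a = {a}: numeric {rate_numeric(a):.5f}   exact {rate_exact(a):.5f}")
```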


Thinking about Cramér's theorem

- Let µ_n be the law of the empirical mean A_n = (1/n) Σ_{j=1}^n X_j.
- CRAMÉR'S THEOREM: the µ_n satisfy an LDP with convex rate function
  I(x) = Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)},
  where Λ(λ) = log M(λ) = log E e^{(λ, X_1)}.
- This means that for all Γ ∈ B we have the asymptotic lower bound on the probabilities µ_n(Γ)
  − inf_{x ∈ Γ°} I(x) ≤ lim inf_{n→∞} (1/n) log µ_n(Γ),
  so (up to subexponential error) µ_n(Γ) ≥ e^{−n inf_{x ∈ Γ°} I(x)},
- and the asymptotic upper bound on the probabilities µ_n(Γ)
  lim sup_{n→∞} (1/n) log µ_n(Γ) ≤ − inf_{x ∈ Γ̄} I(x),
  which says (up to subexponential error) µ_n(Γ) ≤ e^{−n inf_{x ∈ Γ̄} I(x)}. (A numerical comparison for a fair coin follows below.)
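To see both bounds at work for a fair coin (my own illustration, not from the slides), take Γ = [a, ∞) with a = 0.7. Since Λ* is continuous and nondecreasing on [1/2, 1], the infimum of I over both the interior and the closure of Γ is Λ*(a) = a log(2a) + (1 − a) log(2(1 − a)), so Cramér predicts −(1/n) log µ_n(Γ) → Λ*(a). The exact binomial tail converges to this value, though slowly, because of polynomial prefactors.

```python
from math import comb, log

a = 0.7                                               # threshold above the mean 1/2 (arbitrary)
rate = a * log(2 * a) + (1 - a) * log(2 * (1 - a))    # Lambda*(a) for a fair coin

for n in (20, 80, 320, 1280):                         # chosen so that n*a is an integer
    k0 = round(n * a)
    p = sum(comb(n, k) for k in range(k0, n + 1)) / 2**n   # exact P(A_n >= a)
    print(f"n = {n:5d}   -(1/n) log mu_n([a,inf)) = {-log(p)/n:.4f}   Lambda*(a) = {rate:.4f}")
```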


Proving the Cramér upper bound

- Recall that I(x) = Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
- For simplicity, assume that Λ(λ) is finite for all λ (which implies that X has moments of all orders, that Λ and Λ* are strictly convex, and that the derivatives of Λ and Λ* are inverses of each other). It is also enough to consider the case that X has mean zero, which implies that Λ(0) = 0 is the minimum of Λ and Λ*(0) = 0 is the minimum of Λ*.
- We aim to show (up to subexponential error) that µ_n(Γ) ≤ e^{−n inf_{x ∈ Γ} I(x)}.
- If Γ were the singleton set {x}, we could find the λ corresponding to x, so that Λ*(x) = (λ, x) − Λ(λ). Note then that
  E e^{(nλ, A_n)} = E e^{(λ, S_n)} = M_X(λ)^n = e^{nΛ(λ)},
  and also E e^{(nλ, A_n)} ≥ e^{n(λ, x)} µ_n{x}. Taking logs and dividing by n gives Λ(λ) ≥ (1/n) log µ_n{x} + (λ, x), so that (1/n) log µ_n(Γ) ≤ −Λ*(x), as desired.
- General Γ: cut it into finitely many pieces and bound each piece? (For the half-line case in d = 1, a Chernoff-bound sketch follows this list.)
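For the half-line Γ = [a, ∞) with a ≥ E[X] in dimension one, the singleton idea above becomes the standard Chernoff computation; a sketch (mine, not a transcript of the lecture):

```latex
% For any \lambda \ge 0, Markov's inequality applied to e^{\lambda S_n} gives
\[
  \mu_n\big([a,\infty)\big) \;=\; P(S_n \ge na)
  \;\le\; e^{-n\lambda a}\, E\,e^{\lambda S_n}
  \;=\; e^{-n[\lambda a - \Lambda(\lambda)]}.
\]
% For a \ge E[X] and \lambda \le 0, Jensen gives
% \lambda a - \Lambda(\lambda) \le \lambda\,E[X] - \Lambda(\lambda) \le 0,
% so restricting the sup to \lambda \ge 0 loses nothing.  Optimizing over \lambda \ge 0:
\[
  \mu_n\big([a,\infty)\big)
  \;\le\; \exp\!\Big( -n \sup_{\lambda \ge 0} \big[ \lambda a - \Lambda(\lambda) \big] \Big)
  \;=\; e^{-n\,\Lambda^*(a)}
  \;=\; e^{-n \inf_{x \ge a} \Lambda^*(x)},
\]
% the last step because \Lambda^* is nondecreasing on [E[X],\infty).
```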


Proving the Cramér lower bound

- Recall that I(x) = Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
- We aim to show that asymptotically µ_n(Γ) ≥ e^{−n inf_{x ∈ Γ°} I(x)}.
- It's enough to show that for each given x ∈ Γ°, we have asymptotically µ_n(Γ) ≥ e^{−n I(x)} (and then take the best such x).
- The idea is to weight the law of X by e^{(λ, x)} for some λ and normalize, to get a new measure whose expectation is the given point x. Under this new measure, A_n is “typically” in Γ for large n, so the probability is of order 1. (A tiny numerical illustration of this tilting follows this list.)
- But by how much did we have to modify the measure to make this typical? Not by more than a factor of e^{−n I(x)}.
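A tiny numerical illustration of the tilting step (mine, not from the slides) for a fair coin and Γ = [a, ∞) with a = 0.6: weighting {0, 1} by e^{λx} and normalizing produces a Bernoulli(q) law with q = e^λ/(1 + e^λ); choosing λ = log(a/(1 − a)) makes q = a, so the target is typical, and the exponential cost per unit of n, namely λa − Λ(λ), is exactly Λ*(a).

```python
import numpy as np
from math import exp, log

a = 0.6                                  # target mean inside Gamma = [a, inf) (arbitrary)
lam = log(a / (1 - a))                   # tilt chosen so the tilted mean equals a
Lam = log((1 + exp(lam)) / 2)            # Lambda(lam) = log E[e^{lam X}] for a fair coin

# Weighting {0, 1} by e^{lam x} and normalizing turns Bernoulli(1/2) into Bernoulli(q):
q = exp(lam) / (1 + exp(lam))
print("tilted success probability q =", q)          # equals a

# Under the tilted law, A_n is typically near a, so {A_n >= a} has probability of
# order 1 (about 1/2 here, since a sits exactly on the boundary of Gamma).
rng = np.random.default_rng(3)
A_n = rng.binomial(1000, q, size=20_000) / 1000.0
print("tilted P(A_n >= a) ~", np.mean(A_n >= a))

# The exponential cost of the reweighting per unit of n is lam*a - Lambda(lam),
# which equals Lambda*(a); this is where the e^{-n I(x)} factor comes from.
print("lam*a - Lambda(lam) =", lam * a - Lam)
print("Lambda*(a)          =", a * log(2 * a) + (1 - a) * log(2 * (1 - a)))
```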
