Download - Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Transcript
Page 1: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Thermodynamics, Statistical Mechanics and LargeDeviations

Stefano OllaCourse Notes

November 2011

Page 2: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2

Page 3: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Chapter 1

Thermodynamics: a crash course

1.1 Thermodynamic equilibrium states

From a mechanical point of view, the equilibrium state of an elastic wire is char-acterized by its length L, that can be changed applying a tension (force) τ on theextremes. The resulting length is an increasing function of τ : L = F (τ). We alsoobserve that this function F depends also on the temperature θ of the wire, typicallyincreasing with θ. The first object of thermodynamics is to introduce this parameterθ, whose definition (or measurement) is much more delicate than L or τ .

The definition of temperature goes through defining when two systems are atthe same temperature, by what is called the 0th principle of thermodynamics : 1:

If a system A remains in equilibrium when isolated and placed inthermal contact first with system B and then with system C, the equi-librium of B and C will not be disturbed when they are placed in contactwith each other.

Here “remains in equilibrium” means that the relation between L and τ doesnot change. We also use here the concept of isolated system and thermal contact,that require respectively the notion of adiabatic wall and conductive wall.

The system A and B are separated by an adiabatic wall if they can havedifferent equilibrium relation between L and τ . They are separated by a conductivewall if the must have the same equilibrium relation.

1The numbering of the principles in thermodynamics follows a inverse chronological order: thesecond principle was postulated by Carnot in 1824, the first principle was clearly formulated byHelmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero principle announcedhere was realized by Fowler in 1931. See the detailed discussion in the first chapter of Zemansky

3

Page 4: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

4 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

One could see all these as circular definition, in fact all this is equivalent aspostulating the existence of adiabatic and diathermic (thermally conductive) wallsthat are defined as devices that have the above properties. From all this we obtainthe existence of the parameter θ that we call temperature (see in Zemanski a verydetailed discussion of this point).

So we can define the equilibrium relation L = L(τ, θ). Since it is strictlyincreasing in both variable we can also write τ = τ(L, θ), as well as θ = θ(L, τ), i.e.any two of these three variables can be chosen independently in order to characterizea thermodynamic equilibrium state.

1.2 Differential changes of equilibrium states

Suppose we have our wire in the thermodynamic equilibrium defined by the valueτ, θ. This can be obtained by applying a tension τ to one extreme and fixing theother to a wall, and applying a thermal bath at temperature θ, for example a verylarge (infinite) system at this temperature on the other side of the conductive wall.If we perform infinitesimal changes of these parameters, they imply an infinitesimalvariation dL of the lenght:

dL =

(∂L∂θ

dθ +

(∂L∂τ

dτ (1.2.1)

These partial derivatives are connected with physical important quantities that canbe measured experimentally:

• the linear dilation coefficient:

α =1

L

(∂L∂θ

(1.2.2)

Experimentally it is observed that α(τ, θ) depends little by τ , but changes verystrongly with θ.

• the isothermal Young modulus

Y =LA

(∂τ

∂L

(1.2.3)

where A is the section of the wire. Experimentally Y depends little on τ andstrongly on θ.

We also call Cθ = 1L

(∂L∂τ

the isothermal compressibility.

Page 5: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.2. DIFFERENTIAL CHANGES OF EQUILIBRIUM STATES 5

It is an elementary exercise to prove that(∂τ

∂L

(∂L∂θ

= −(∂τ

∂θ

)L

(1.2.4)

and consequently (∂τ

∂θ

)L

= − α

Cθ(1.2.5)

An infinitesimal variation of the tension can be written in function of dθ and dL:

dτ =

(∂τ

∂θ

)Ldθ +

(∂τ

∂L

dL = − α

Cθdθ +

1

CθLdL (1.2.6)

At constant volume we have

dτ = − α

Cθdθ (1.2.7)

One of the main issues in discussing foundations of thermodynamics is thephysical meaning of these differential changes of equilibrium states. In principle,as we actually change the tension of the cable, the system will go into a sequenceof non-equilibrium states before to relax to the new equilibrium. But, quotingZemanski, thermodynamics does not attempt to deal with any problem involving therate at which the process takes place. And, always quoting Zemanski:

Every infinitesimal in thermodynamics must satisfy the requirementthat it represents a change in a quantity which is small with respect tothe quantity itself and large in comparison with the effect produced bythe behavior of few molecules.

1.2.1 Work

In an arbitrary quasi static infinitesimal transformation, the differential form τdL iscalled work (or differential work). It is clear that this is not an exact differential form,but in thermodynamics books it is used the notation d\W . This is elementary, lookingat the path of a transformation in the (τ,L) coordinates frame. In performing aclosed path, that we call cycle, maybe through a sequence of isobar and isocoretransformations, the path integral

∮d\W 6= 0 (equal to the area inside the path),

and represent the work done on the system by the external force (tension)2.

2Mathematically the difference with the notion of work in mechanics is that here the force τ isalso a function of the temperature

Page 6: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

6 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

1.2.2 Internal Energy and Heat exchange

The point of the first principle of thermodynamics is to preserve the mechan-ical notion of energy as a conserved quantity also during thermodynamic transfor-mations, which means to assume the existence of a state function U(L, θ) thatrepresent the internal energy of the system in the corresponding thermodynamicequilibrium state. During a quasi-static thermodynamic infinitesimal transforma-tion, this energy is modified by the work d\W and, since dU has to be an exactdifferential, by some other (not exact) differential form d\Q called heat exchange:

dU = τdL+ d\Q (1.2.8)

It is important to notice that work and heat are determined by specifying the processof change, and they are not functions of the state of the system. Mathematically thismeans they are not exact differentials. As we have already said, in mechanics anychange of the energy of a system is caused by the work done by external forces. If wewant to reduce the first principle to a purely mechanical interpretation (that will bethe scope of statistical mechanics), this will be the following. The system has many(a very large number) degrees of freedom and many external forces acting on them.Some, few, of these forces are controlled, ordered, macroscopic and slow, and thework done by these we still call it work, in our case τ is this ordered and controlledslow force, and τdL the work associated. The other forces are many, uncontrolled(or disordered, in the sense that we do not have information on them), microscopicand fast. The amount of this uncontrolled or disordered work or exchange of energywe call heat.

One of the main problem of the statistical mechanic interpretation of thermo-dynamics is to separate the slow macroscopic degree of freedom that generate workfrom the fast microscopic ones that generate heat. The slow degrees of freedom aregenerally associated to conserved quantities of the isolated system (with no externalforces acting on it or thermal contact with other systems).

So the first principle (1.2.8) defines this separation of scales (in space and intime), whose mechanical explanation impose the use of probability to describe theuncontrolled forces.

1.2.3 Thermodynamic transformations and cycles

We can represent a finite thermodynamic transformation by integration along pathof the differential forms defined above. Each choice of a path defines a differentthermodynamic process or quasi static transformation. Depending on the type oftransformation it may be interesting to make a different choice of the coordinatesto represent it graphically.

Page 7: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.2. DIFFERENTIAL CHANGES OF EQUILIBRIUM STATES 7

Often is used the τ − L diagrams.

The first diagram on the left describe a quasi-static transformation for lenghtLi to Lf . If this is happening for example as a free expansion means that the tensionτ is decreasing, but it could be increasing if instead τ is pulling with respect themechanical equilibrium. The second diagram represent a compression from Lf toLi, and the third a so called cycle, returning to the original state. The shaded arearepresent the work done during the transformation (taken with the negative sign inthe second diagram). In the third the work is given by the integral along the cycle

∆W =

∮τdL (1.2.9)

that by the first principle will be equal to −∆Q, where ∆Q is the total heat producedby the process during the cycle and transmitted to the exterior (or absorbed by theexterior, depending from the sign).

There are some important thermodynamic quasi static transformation we wantto consider:

• Isothermal transformations : While a force perform work on the system, thisis in contact with a thermostat, a huge system in equilibrium at a giventemperature θ, so big that the exchange of heat with our elastic does notperturb the equilibrium state of the thermostat. Ideally a thermostat is aninfinite system. During a isothermal transformation only the length L changesas effect of the change of the tension dτ , and the infinitesimal exchanges ofheat and work are related by

d\W = τdL = τ

(∂L∂τ

dτ = −d\Q+ dU (1.2.10)

The isothermal transformations defines isothermal lines parametrized by thetemperature (each temperature defines an isothermal line in the τ − L plane.

Page 8: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

8 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

• Adiabatic transformations : The system is thermically isolated from the exte-rior. This means that the only force acting on it is given by the tension τ .Equivalently are transformations such that d\Q = 0, and

d\W = τdL = dU (1.2.11)

Adiabatic transformations defines adiabatic lines, but their construction isdone by solving the ordinary differential equation

dL= −∂LU

∂τU(1.2.12)

• Isocore Transformations : Thermodynamic transformation at fixed length L.Consequently d\W = 0, no work if perfomed to or by the system, and

d\Q = dU (1.2.13)

• Isobar transformations : Thermodynamic transformation at fixed tension L,dτ = 0

Carnot Cycles

A Carnot cycle is a cycle composed by a sequence of isothermal and adiabatic quasistatic transformations. In particular is a special machine that generates work fromthe heat difference of two thermostats. SO different Carnot cycles can be composedin a sequence etc.

Let us consider the following cycle. The states A,B are at the same temperatureθ2 and C and D at the temperature θ1. let us assume that θ2 > θ1. We assumethat A and C are in the same adiabatic curve, so are B and D. We perform an (hot)isothermal transformation from A to B, then and adiabatic from B to D, then an(cold) isothermal from D to C, then another adiabatic from C to A.

During the isothermal extension of the wire from A to B, it absorb a quantity ofheat (energy)Q2 from the thermostat at temperature θ2, correspondingly it exchange−Q1 with the thermostat θ1 during the isothermal transformation DC. Since duringadiabatic transformations there is no exchange of heat, during the all cycle the totalheat that the system exchange with the exterior is Q2 − Q1. By the first principlethis is equal to the work done by the system τ :

W =

∮τ dL = Q2 −Q1

Page 9: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.2. DIFFERENTIAL CHANGES OF EQUILIBRIUM STATES 9

So, unless Q1 = 0, not all heat absorbed from the hot thermostat is changed inwork. We define the efficiency of the Carnot cycle:

η =W

Q2

= 1− Q1

Q2

The cycle is reversible, i.e. we can do all the operations in the reverse order. In thiscase the wire absorbs the work W , the quantity Q1 is adsorbed by he system at thecold temperature θ1 and Q2 is given to the hot thermostat θ2. So in the reversedcycle is able to move heat from the cold thermostat to the hot, by absorbing a workW .

1.2.4 Second Principle and Entropy

The second principle of thermodynamics has equivalent statements in terms of aCarnot machine.

We enunciate first the Lord Kelvin statement of the second law:

if W > 0, then Q2 > 0 and Q1 > 0. (1.2.14)

Equivalently η < 1. More prosaically we say that all thermodynamic cyclesthat transforms all heat extracted from the hot reservoir in work are impossible.

The Clausius statement of the second law is

if W = 0, then Q2 = Q1 > 0. (1.2.15)

i.e. it does not exists a cyclic thermodynamic transformation whose only result is atransfer of heat from the cold reservoir to the hot reservoir.

It can be proven that this two statements are equivalent.

The Kelvin postulate (1.2.14) has a simple and intuitive statement, but a verydeep consequence: it imply the existence of an absolute scale of temperature.

Proposition 1.2.1 There exists a universal function f such that for any Carnotcycle

Q2

Q1

= f(θ1, θ2) (1.2.16)

Proof of proposition 1.2.1.

Page 10: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

10 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

Consider another Carnot machine operating between the same temperaturesθ1, θ2, and let be Q′1, Q

′2 the corresponding heat exchanges. We want to prove that

Q2

Q1

=Q′2Q′1

(1.2.17)

Assume first that the ratio Q2

Q′2is a rational number N ′

N, so that N ′Q′2 − NQ2 = 0.

Let us assume that it is null. Now we can consider a cycle composed by N ′ cyclesof the second machine and by N cycles reversed of the first machine. The total heatexchanged with the thermostat at the (hot) temperature θ2 by this given by

Q2,tot = N ′Q′2 −NQ2 = 0 (1.2.18)

So the total amount of work done by the composed cycle is

Wtot = −Q1,tot = −(N ′Q′1 −NQ1) (1.2.19)

By the Kelvin postulate we must have W ≤ 0, that implies

N ′Q′1 ≥ NQ1 (1.2.20)

that impliesQ2

Q1

≥ Q′2Q′1

(1.2.21)

To obtain the opposite inequality, we have just to exchange the role of the twomachines. The equality (1.2.17) implies that efficiency does not depends on thespecific cycle or machine, but only by the temperatures θ1 and θ2.

Assume now that Q2

Q′2= α is not rational3. We can assume without any re-

striction, that W > 0 and W ′ > 0. In fact if this is not the case, just reverse thecorresponding cycle. By a lemma on the best rational approximation (cf. Sierpinski,Number Theory), there exist two increasing sequences on integers Nkk, N ′k suchthat

0 < α− Nk

N ′k<

1

N ′kN′k−1

(1.2.22)

This implies N ′kQ′2 −NkQ2 > 0. Now we let run the second cycle N ′k times and the

first cycle Nk times in the reverse direction. Consequently, by Lord Kelvin statement(1.2.14), we have Q1, Q

′1, Q2, Q

′2 > 0.

The total work is given by Wk = N ′kW′ − NkW . Let us assume first that

Wk > 0 for any k. Then by the first principle we have

0 < Wk = (N ′kQ′2 −NkQ2)− (N ′kQ

′1 −NkQ1)

3This argument was suggested by Tomasz Komorowski

Page 11: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.2. DIFFERENTIAL CHANGES OF EQUILIBRIUM STATES 11

By (1.2.22), N ′kQ′2 −NkQ2 → 0 as k →∞. By the Lord Kelvin statement (1.2.14),

N ′kQ′1 −NkQ1 > 0, and we obtain

N ′kQ′1 −NkQ1 −→

k→∞0

that implies Q1

Q′1= α = Q2

Q′2.

Assume now that Wk ≤ 0. Then

lim supk→∞

N ′kQ′1 −NkQ1 ≤ 0

It follows that Q1

Q′1≥ α.

By inverting the role of the machines, i.e. running the first cycle Nk timesand then the second cycle N ′k times in the reverse direction we obtain the oppositeinequality.

Proposition 1.2.2 For every θ0, θ1, θ2, we have

f(θ1, θ2) =f(θ0, θ2)

f(θ0, θ1)(1.2.23)

Proof of (1.2.23) : Let A1 and A2 two Carnot cycles working respectivelybetween temperature θ1 and θ0 and θ2 and θ0. Assume that they are chosen insuch a way that the amount of heat that they exchange with the thermostat attemperature θ0 are equal, and we denote it by Q0. Then A1 also exchange Q1 ofheat at temperature θ1, and

Q1

Q0

= f(θ0, θ1)

Similarly for the cycle A2:Q2

Q0

= f(θ0, θ2)

and we deduce thatQ1

Q2

=f(θ0, θ1)

f(θ0, θ2)(1.2.24)

But combining the two cycles in sequance, we obtain a cycle that exchange Q1

with the thermostat at temperature θ1, and Q2 to the thermostat at temperatureθ2 (the total heat exchanged with the thermostat θ0 is null). Consequently for thiscomposite cycle we have

Q2

Q1

= f(θ1, θ2) (1.2.25)

Page 12: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

12 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

Combining (1.2.23) and (1.2.25) we obtain (1.2.23).

It follovs that there exists a universal function g, defined up to a multiplicativeconstant, such that

Q2

Q1

=g(θ1)

g(θ2)

This defines an absolute temperature T = g(θ). The multiplicative constant is theused to define the different scales (C0, F0 etc.).

Thermodynamic entropy

Notice that in a simple Carnot cycle we hace Q1

T1= Q2

T2with Tj = g(θj). In terms of

the integration of the differential form d\QT

, this means∮d\QT

= 0 (1.2.26)

This is also true for a integration on any composite Carnot cycle (made by a sequenceof isothermal and adiabatic transformations). Since any cycle can be approximatedby composite Carnot cycles (exercice), (1.2.26) is actually valid for any cycle, i.e. any

closed curve on the state space. Consequently d\QT

is an exact form, i.e. the differentialof a function S of the state of the system. This function is called ThermodynamicEntropy. 4

If we choose L, U as parameter determining the state of the system, we have

dS = − τTdL+

1

TdU (1.2.27)

i.e.∂S

∂L= − τ

T,

∂S

∂U=

1

T(1.2.28)

It is also suggestive to use as parameters for the thermodynamic state of thewire S and L, and see the internal energy as function of these because we have

dU = τdL+ TdS (1.2.29)

so that we can interpret the absolute temperature T as a kind of thermal force whoseeffect is in changing the entropy together with the energy.

4I never believed the legend about Von Neumann and Shannon: entropy is a well defined physicalquantity, like energy, temperature, etc. A quantity that physicist can measure experimentally (atleast its variation, like for energy). Physicist knew perfectly what entropy is.

Page 13: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.3. INTENSIVE AND EXTENSIVE QUANTITIES 13

Irreversible transformations

We have worked until here with reversible thermodynamic transformations, that canbe decribed as quasistatic transformation and by continuous lines of the state space.Actually these reversibility is only an idealization. All transformations have to movefrom an equilibrium state to another, passing through some non equilibrium statethat the thermodynamics does not attempt to describe. Still thermodynamics, inparticular the second principle, can say something about the change of the entropy Swhen the system goes through non-equilibrium states in passing from an equilibriumto another. These transformations are not time reversible.

A reversible transformation, as the ones we have considered until now, is anordered succession of equilibrium states. It is quasi static, in the sense that does nottake into consideration the time taken by the system to relax to these equilibriumstates after each change. We can represent it as a line in the state space.

A real irreversible transformation is a temporal succession of equilibrium andnon-equilibrium states. Still taking into account only the equilibrium points, onecan represent these irreversible transformations as lines in the (equilibrium) statesspace. But the identification of τdL as the mechanical work and of TdS as the heattransfer is valid only for (reversible) quasi static processes.

1.3 Intensive and extensive quantities

Imagine our cable in equilibrium be divided in two equal parts (in such a way thatthe preserve the same boundary conditions that guarantees the original equilibrium).Those quantities that remains the same are called intensive (tension τ , temperatureT ), while the that are halved are called extensive (trivially the lenght L, internalenergy U , entropy S, ...). We also call the intensive quantities control parame-ters. We will see that when we consider extended systems, dynamically the controlparameters and the extensive quantities plays very different role.

1.4 Axiomatic Approach

we can proceed differently and make a more mathematical set-up of the thermody-namics with an axiomatic approach where the extensive quantities U,L are takenas basic thermodynamic coordinates to identify an equilibrium state and entropy isassumed as a state function satisfying certain properties.

axiom-1) There exist an open cone set Γ ⊂ R+ × R, and (U,L) ∈ Γ.

Page 14: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

14 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

axiom-2) There exists a C1–function S : Γ→ R such that

(i) S is concave,

(ii) ∂S∂U

> 0,

(iii) S is positively homogeneous of degree 1:

S(λU, λL) = λS(U,L), λ > 0 (1.4.1)

By 2ii, one can choose eventually S and L as thermodynamic coordinates, i.e.there exists a function U(S,L) such that ∂U

∂S> 0 (exercise).

We call

T =∂U

∂Stemperature

τ =∂U

∂Ltension

(1.4.2)

Exercise: Prove that U(S,L) is homogeneous of degree 1 (extensive), andT,L are homogeneous of degree 0 (intensive).

One can use also the intensive quantities τ, T as thermodynamic coordinates,and it is useful to define the Gibbs potential (also called free enthalpy):

G(T, τ) = infS,LU(S,L)− τL − TS

= infU,LU − τL − TS(U,L)

(1.4.3)

Exercice:

S(U,L) = infτ,T

1

TU − τ

TL − G(T, τ)

T

(1.4.4)

The differential forms d\Q = TdS is called heating, and τdL work, that imply

d\Q = −τdL+ dU (1.4.5)

Thermodynamic transformations that are quasistatic and reversible, and thecorresponding cycles are then defined as in the previous sections, and the corre-sponding work and heat exchange as integrals of hese diferential forms on the cor-responding lines defining the transformations.

More controversial is the definition of the non-reversible transformations. Theseare real thermodynamic transformations that take into consideration the fact thatthe system, in order to go from one equilibrium state to another, has to pas throughnon-equilibrium states. Without some theory, or modeling, of these non-equilibriumstates, all definitions of these non-reversible transformations remain vague.

Page 15: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.4. AXIOMATIC APPROACH 15

1.4.1 Extended thermodynamics: extended systems

A possible definition of a non-equilibrium state is to consider the system, in our casethe wire, as spatially extended, and with different parts of the system in differentequilibrium states. For example our wire could be constituted by two differentwires, that have the same constitutive materials (i.e. they are make by the samematerial and they have the same mass), but they are prepared in two differentequilibrium state, parametrized by the extensive quantities: (U1,L1), (U2,L2). Theinternal energy of the total system composed by the two wires glued together, willbe U1 + U2, while its length will be L1 + L2. Even though the wire is not inequilibrium, we can say that also the other extensive quantities are given by thesum of the corresponding values of each constitutive part in equilibrium, i.e. in theexample the entropy will be given by S(U1,L1) + S(U2,L2). Notice that concavityand homogeneity properties of S imply

S(U1,L1) + S(U2,L2) ≤ 2S

(U1 + U2

2,L1 + L2

2

)= S(U1 + U2,L1 + L2) (1.4.6)

This means that the composed wire, if in equilibrium with corresponding energy andlength values (U1 +U2,L1 +L2), has higher entropy than the sum of the entropy ofthe two subsystems at different equilibrium values. The equality is valid is U1 = U2

and L1 = L2.

Consequently if we have a time evolution (dynamics, etc.), that conserves thetotal energy (adiabatic transformation), and the total lenght (isocore transforma-tion, and that brings the total system in a global equilibrium, then the final resultof this evolution increase the thermodynamic entropy S.

In this framework, the second principle of thermodynamics intended as a strictincrease of the thermodynamic entropy if the system undergoes a non-reversibletransformation, is strictly related to the property of this transformation to bringthe system towards a global equilibrium.

More generally we can assign a continuous coordinate x ∈ [0, 1] to each ma-terial component of the wire (x is not the displacement or spacial position of thiscomponent). This component (that should be tought as containing respect a largenumber of atoms) is in equilibrium with an energy U(x) and stretch r(x). Thesefunctions should be thought as densities, we call them also profiles. The actualspacial displacement (position) of the component x is given by

L(x) =

∫ x

0

r(x′)dx′ (1.4.7)

The entropy of the component x is given by S(U(x), r(x)). This class of non-equilibrium states we can call local equilibrium states for obvious reasons.

Page 16: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

16 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

We can associate a total lenght, energy and entropy to these profiles (i.e. tothe corresponding non-equilibrium state):

Ltot =

∫ 1

0

r(x)dx, Utot =

∫ 1

0

U(x)dx, Stot =

∫ 1

0

S(U(x), r(x))dx. (1.4.8)

By concavity of S:Stot ≤ S(Utot,Ltot) (1.4.9)

Usual thermodynamics does not worry about time scales where the thermo-dynamic processes happens. But in the extended thermodynamics we can considertime evolutions of these profiles (typically evolving following some partial differen-tial equations). The actual time scale in which these evolution occuors with respectto the microscopic dynamivcs of the atoms, will be the subject of the hydrodynamiclimits that we will study in the later chapters.

So if denote by r(x, t) and U(x, t) the correponding time derivatives, we havefor the time evolution of the entropy:

∂tS(U(x, t), r(x, t)) =1

T

(U − τ r

)(1.4.10)

Example: adiabatic evolution, Euler equations

In this evolution, whose deduction from the microscopic dynamics we willstudy in detail in chapter xx, is given by

∂tr = ∂xπ

∂tπ = ∂xτ

∂tU = ∂x(τπ)− π∂xτ = τ∂xπ

(1.4.11)

This means that the material element x, whose position at time t is L(x, t), hasvelocity π(x, t) = ∂tL(x, t). The tension τ(U(x, t),L(x, t)) is the force acting onthe material element x (more precisely the gradient ∂xτ , since the resulting force isgiven by the difference of the tension on the right and on the left of the material

element). The total energy of the element x is given by E(x, t) = U(x, t) + π(x,t)2

2,

the sum of its internal energy and its kinetic energy. The dynamic is adiabatic, sothe total energy is changed only by the work τπ, more precisely by its gradient:

∂tE = ∂x(τπ) (1.4.12)

In particular ∂tS(U(x, t), r(x, t)) = 0, i.e. if the solution is C1, even the entropyis conserved also locally (in the sense that the entropy per component remainsunchanged).

Page 17: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

1.4. AXIOMATIC APPROACH 17

If one does not consider the effect of the boundary conditions (for example tak-ing periodic b.c.) this system has three conserved quantities (

∫rdx,

∫πdx,

∫Edx).

The system (3.4.3) is a non-linear hyperbolic system of equation. It is expectedthat any nontrivial solution will develop shocks. After appeareance of shock, theequations should be considered in a weak sense and a criterion of choice of the wealsoluton is that it should have a positive production of entropy. A mathematical the-orem that guarantee uniqueness of this entropy solution is still lacking. Eventuallyshocks will create dissipations ans the entropy solution, as t → ∞ will converge toflat profiles for the conserved quantities, i.e. to the system in global equilibrium.

Isothermal evolution: diffusion equation

Consider our wire immersed in a viscous liquid at temperature T uniform, thatacts as a thermostat on each element x of the wire. We can consider the evolutionof the local equilibrium distribution (U(x, t), r(x, t)), where the two parameter aredepending to each other onder the constraint that temperature T is constant in x.Velocities of the wire are damped to 0 and it turns out that the evolution is givenby the nonlinear diffusion equation:

∂tr(x, t) = ∂2xτ(x, t) (1.4.13)

where τ(x, t) = τ(r(x, t), T ), the tension as function of the lenght and temperature.

Since

∂tU(x, t) =∂U

∂L∂tr (1.4.14)

Page 18: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

18 CHAPTER 1. THERMODYNAMICS: A CRASH COURSE

Page 19: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Chapter 2

Large Deviations

2.1 Introduction

As Dembo and Zeitouni point out in the introduction to their monograph on thesubject [1], there is no real theory of large deviations, but a variety of tools thatallow analysis of small probability.

To give an idea of what we mean with large deviations, let us consider a se-quence of independent identical distributed real valued random variablesX1, X2, . . . , Xn

such that E(X2j ) = 1, and E(Xj) = 0. Let Sn = 1

n

∑iXi the empirical sum. The

weak law of large numbers says that for any δ > 0,

P(|Sn| ≥ δ) −→n→∞

0 (2.1.1)

The central limit theorem is a refinement that says

P(√nSn ∈ [a, b]) −→

n→∞

1√2π

∫ b

a

e−x2/2dx . (2.1.2)

In the case Xj ∼ N (0, 1), we have Sn ∼ N(0, 1/n), and we can compute explicitly

P(|Sn| ≥ δ) = 1− 1√2π

∫ δ√n

−δ√n

e−x2/2dx .

therefore (exercise)1

nlog P(|Sn| ≥ δ) −→

n→∞−δ

2

2(2.1.3)

Equation (2.1.3) is an example of a large deviation statement.

19

Page 20: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

20 CHAPTER 2. LARGE DEVIATIONS

2.2 Cramer’s Theorem in R

Let Xn a sequence of i.i.d. random variables on R with common probabilitydistribution α(dx). We define the moment generating function

M(λ) = E[eλX1

](2.2.1)

and let us assume that there exists λ∗ > 0 such that M(λ) <∞ if |λ| < λ∗. Noticethat, since |x| ≤ λ−1(eλx + e−λx) for any λ > 0, this condition implies that X1 isintegrable and we denote m = E(X1) ∈ R. It is easy to see that m = M ′(0). Weare interested in the logarithmic moment generating function

Z(λ) = log E[eλX1

](2.2.2)

By Jensen’s inequality, we have Z(λ) ≥ λm > −∞. Let DZ = λ : Z(λ) < +∞.Under our hypothesis, 0 ∈ DoZ (the interior of DZ).

Lemma 2.2.1 1. Z(·) is convex.

2. Z(·) is continuously differentiable in DoZ and

Z ′(λ) =E(X1e

λX1)

M(λ)λ ∈ DoZ .

Proof:

1. For any α ∈ [0, 1], it follows by Holder inequality

E(e(αλ1+(1−α)λ2)X1) ≤M(λ1)αM(λ2)1−α

and consequently

Z(αλ1 + (1− α)λ2) ≤ αZ(λ1) + (1− α)Z(λ2)

2. The function fε(x) = (e(λ+ε)x−eλx)/ε converges pointwise to xeλx, and |fε(x)| ≤eλx(eδ|x|− 1)/δ ≤ eλx(eδx + e−δx)/δ = h(x), for every |ε| ≤ δ. For any λ ∈ DoZ ,there exists a δ > 0 small enough such that E(h(X1)) ≤M(λ+δ)+M(λ−δ) <+∞. Then the result follows by the dominated convergence theorem.

Using the same argument one can prove that Z(·) ∈ C∞(DoZ). Computing thesecond derivative we obtain

Z ′′(λ) =E(X2

1eλX1)

M(λ)−(

E(X1eλX1)

M(λ)

)2

≥ 0

Page 21: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.2. CRAMER’S THEOREM IN R 21

Observe that Z ′′(0) = Var(X1). To avoid the trivial deterministic case, we assumethat Var(X1) > 0. It follows that Z ′′(λ) > 0 for any λ ∈ DoZ , i.e. Z(·) is strictlyconvex.

We define the rate function as the Fenchel-Legendre transform of ZI(x) = sup

λ∈Rλx−Z(λ) (2.2.3)

It is immediate to see that I is convex (as supremum of linear functions), hencecontinuous, and that I(x) ≥ 0. Furthermore we have that I(m) = 0. In fact byJensen’s inequality M(λ) ≥ eλm for any λ ∈ R, so that

λm−Z(λ) ≤ 0

and it is equal to 0 for λ = 0. We conclude that I(m) = 0.

Consequently m is a minimum of the convex positive function I(x). It followsthat I(x) is nondecreasing for x ≥ m and nonincreasing for x ≤ m.

Observe that if x > m and λ < 0

λx−Z(λ) ≤ λm−Z(λ)

that impliesI(x) = sup

λ≥0λx−Z(λ) x > m (2.2.4)

Similarly one obtains

I(x) = supλ≤0λx−Z(λ) x < m (2.2.5)

Here are other important properties of I(·):

Lemma 2.2.2 I(x)→ +∞ as |x| → ∞, and its level sets are compact.

Proof: If x > m ∨ 0, for any positive λ ∈ DZ ,

I(x)

x≥ λ− Z(λ)

x

and limx→+∞Z(λ)/x = 0, so we have limx→+∞ I(x)/x ≥ λ. Consequently its levelsets x : I(x) ≤ a are bounded, and closed by continuity of I.

We want to prove the following theorem:

Theorem 2.2.3 (Cramer) For any set A ⊂ R,

− infx∈Ao

I(x) ≤ lim infn→∞

1

nlog P(Sn ∈ A) ≤ lim sup

n→∞

1

nlog P(Sn ∈ A) ≤ − inf

x∈AI(x)

were Ao is the interior of A and A is the closure of A.

Page 22: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

22 CHAPTER 2. LARGE DEVIATIONS

2.2.1 Properties of Legendre transforms

We denote DI = x ∈ R : I(x) <∞.

Lemma 2.2.4

The function I is convex in DI , strictly convex in D0I and I ∈ C∞(DoI). Furthermore

for any x ∈ DoI there exists a unique λ ∈ DoZ such that

x = Z ′(λ)

and

λ = I ′(x)

Furthermore I(x) = λx−Z(λ).

We will say that x and λ are in duality if the conditions of the above lemma aresatisfied.

Proof: The function Fx(λ) = λx − Z(λ) has a maximum for λ = λ. This isbecause it is concave and ∂λFx(λ) = 0. It follows that I(x) = λx − Z(λ) and thatZ(λ) = supx λx− I(x). By the same argument Gλ(x) = λx− I(x) is maximizedby x.

2.2.2 Proof of Cramer’s theorem

Upper bound

Let us start with A a closed interval of the form Jx = [x,+∞) and let x > m. Thenthe exponential Chebycheff’s inequality gives for any λ > 0

P(Sn ≥ x) ≤ e−nλxE[ePni=1 λXi ] = e−nλxM(λ)n

Since λ > 0 is arbitrary, we can optimize the bound and obtain for x > m

1

nlog P(Sn ≥ x) ≤ − sup

λ>0λx−Z(λ) = −I(x) (2.2.6)

where we use (2.2.4) in the last equality. Similarly for x < m we obtain

1

nlog P(Sn ≤ x) ≤ − sup

λ<0λx−Z(λ) = I(x) (2.2.7)

Page 23: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.2. CRAMER’S THEOREM IN R 23

Consider now an arbitrary closed set C ⊂ R. If m ∈ C, then infx∈C I(x) = 0and the upper bound is trivial.

If m 6∈ C, let (x1, x2) be the largest open interval around m such that C ∩(x1, x2) = ∅, i.e.

C ⊆ (−∞, x1] ∪ [x2,+∞)

(if x1 = −∞ then C ⊆ [x2,+∞) and if x2 = +∞ then C ⊆ (−∞, x1]). Observethat x1 < m < x2. Consequently

P(Sn ∈ C) ≤ P(Sn ≥ x2) + P(Sn ≤ x1) ≤ 2 maxP(Sn ≥ x2),P(Sn ≤ x1)

and using (2.2.6) and (2.2.7)

1

nlog P(Sn ∈ C) ≤ −minI(x2), I(x1)+

1

nlog 2 (2.2.8)

and from the monotonicity of I(x) on (−∞, x1] and [x2,+∞)

infx∈C

I(x) ≥ minI(x2), I(x1)

which concludes the upper bound.

Lower bound

Given an open set G, it is enough to prove that for any x ∈ G

lim infn→∞

1

nlog P(Sn ∈ G) ≥ −I(x) .

To this end, it is enough to prove that for any x and any δ > 0,

lim infn→∞

1

nlog P(Sn ∈ (x− δ, x+ δ)) ≥ −I(x) .

Clearly it is enough to consider x such that I(x) <∞. Assume α has finite supportand that α((−∞, 0)) > 0, α((0,∞)) > 0. Then Z is finite everywhere (DZ = DoZ =R) and there exists a unique λ0 ∈ DoZ such that

I(x) = λ0x−Z(λ0) and x = Z ′(λ0)

Assuming x ≥ m, we have that λ0 ≥ 0.

Let us define the probability law on R

αλ0(dy) =eλ0y

M(λ0)α(dy)

Page 24: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

24 CHAPTER 2. LARGE DEVIATIONS

Notice that ∫y αλ0(dy) = Z ′(λ0) = x

Noting An,δ = (x1, . . . , xn) : (x1 + · · · + xn)/n ∈ (x − δ, x + δ) ⊂ Rn, then forδ1 < δ

P(Sn ∈ (x− δ, x+ δ)) ≥∫An,δ1

α(dx1) . . . α(dxn)

= M(λ0)n∫An,δ1

e−λ0(x1+···+xn)αλ0(dx1) . . . αλ0(dxn)

≥M(λ0)ne−nλ0(x+δ1)

∫An,δ1

αλ0(dx1) . . . αλ0(dxn)

By the law of large numbers, for any δ1 > 0∫An,δ1

αλ0(dx1) . . . αλ0(dxn) −→n→∞

1

so that

lim infn→∞

1

nlog P(Sn ∈ (x− δ, x+ δ)) ≥ − [λ0(x+ δ1)−Z(λ0)] = −I(x)− λ0δ1

Since δ1 < δ is arbitrary, we can let δ1 → 0 and it gives the result. If x < m, wehave λ0 < 0, and in the steps of the above we will have x− δ1 instead of x+ δ1.

Assume now α is of unbounded support with α((−∞, x)) > 0, α((x,∞)) > 0.Let A0 > 0 be such that α([−A0, x)) > 0, α((x,A0]) > 0. For any A ≥ A0 let βbe the law of X1 conditioned on |X1| ≤ A, and βn the law of Sn conditioned on|Xi| ≤ A, i = 1, . . . , n. Then, for all n ≥ 1 and every δ > 0,

αn((x− δ, x+ δ)) = βn((x− δ, x+ δ)) α([−A,A])n

The preceding result applies for βn. Note

ZA(λ) = log

∫ A

−Aeλyα(dy),

and observe that the logarithmic generating function of β is given by

ZA(λ)− logα([−A,A]) ≥ ZA(λ)

It follows that

lim infn→∞

1

nlogαn((x− δ, x+ δ)) ≥ inf

λ∈R

ZA(λ)− λx

(2.2.9)

Page 25: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.2. CRAMER’S THEOREM IN R 25

Let us define

I∗(x) = lim supA→∞

[supλ∈R

λx−ZA(λ)

]then we have

lim infn→∞

1

nlogαn((x− δ, x+ δ)) ≥ −I∗(x) (2.2.10)

Observe that ZA(·) is nondecreasing in A, ZA(0) ≤ Z(0) = 0, and thus −I∗(x) ≤ 0.Moreover the assumption α([−A0, x)) > 0, α((x,A0]) > 0 implies

λx−ZA(λ) ≥ − inf logα[−A0, x) , logα(x,A0]

Therefore we have −I∗(x) > −∞. The level sets λ; ZA(λ) − λx ≤ −I∗(x) arenon-empty, compact sets that are nested with respect to A. Then it exists λ0 intheir intersection and −I(x) ≤ Z(λ0)− λ0x = limA→∞ZA(λ0)− λ0x ≤ −I∗(x). By(2.2.10) we get

lim infn→∞

1

nlogαn((x− δ, x+ δ)) ≥ −I(x)

The proof for an arbitrary probability law α is completed by observing thatif either α((−∞, x)) or α((x,∞)) = 0 then Z(·) is a monotone function withinfλ∈RZ(λ)− λx = logα(x). Then we have

αn((x− δ, x+ δ)) ≥ αn(x) = α(x)n

and

lim infn→∞

1

nlog P(Sn ∈ (x− δ, x+ δ)) ≥ −I(x)

Remark 2.2.5 Notice that the proof contains the non-asymptotic bound (2.2.8), i.e.

∀n ≥ 1, P(Sn ∈ C) ≤ 2e−n infx∈C I(x) (2.2.11)

also called Chernoff’s bound.

Remark 2.2.6 The lower bound was obtained by using the change of variable inconjunction with the law of large numbers for the new probabilities. One can getbetter bound by using the central limit theorem, and obtain the following corollary

Page 26: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

26 CHAPTER 2. LARGE DEVIATIONS

Corollary 2.2.7 For any x > m,

limn→∞

1

nlog P(Sn ≥ x) = −I(x) if x > m

limn→∞

1

nlog P(Sn ≤ x) = −I(x) if x < m

(2.2.12)

Proof : By the central limit theorem∫x1+···+xn/n∈[x,x+δ1)

αλ0(dx1) . . . αλ0(dxn) −→n→∞

1

2

So in the proof of the lower bound one can substitute (x− δ, x+ δ) with [x, x+ δ).Since P(Sn ≥ x) ≥ P(Sn ∈ [x, x+ δ)) one obtains

lim infn→∞

1

nlog P(Sn ≥ x) ≥ −I(x)

The upper bound follows from the one in theorem 2.2.3.

Examples in R

1. Let α be the gaussian distribution

1√2πσ2

e−(x−m)2/2σ2

dx

then I(x) = (x − m)2/2σ2. In this case one can compute it directly, sinceSn − nm has law N (0, σ2/n).

2. α = 12(δ0 + δ1) (Bernoulli). Then M(λ) = 1

2(1 + eλ) and

I(x) = x log x+ (1− x) log(1− x) + log 2 if x ∈ [0, 1]

and I(x) = +∞ otherwise.

3. For the exponential law α(dx) = βe−βx1x≥0 dx, we have M(λ) = β/(β − λ)for −∞ < λ < β, otherwise M(λ) = +∞. Then

I(x) = βx− 1− log(βx) if x > 0

and I(x) = +∞ if x ≤ 0.

Page 27: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.3. CRAMER’S THEOREM IN RD 27

4. If ξ in a random variable with law N (0, 1/β), then ξ2 has law χ2(1), i.e. agamma law Γ(1/2, β/2), which has density

β1/2

√2Γ(1/2)

x−1/2e−βx

Its moment generating function is M(λ) = (β/(β − 2λ))1/2 if λ < β/2, other-wise equal to +∞. The rate function results

I(x) =1

2βx− log(βx)− 1 if x > 0

and +∞ if x < 0.

2.3 Cramer’s Theorem in Rd

Let Xn be a sequence of i.i.d. random variables in Rd, and denote α(dx) thecommon law. We define as before, for u ∈ Rd, the moment generating function andits logarithm

M(u) =

∫Rdeu·x α(dx), Z(u) = logM(u) (2.3.1)

and we denote DZ = u ∈ Rd : Z(u) < +∞. We assume that 0 ∈ DoZ . Then M(u)is smooth in this open set and ∇M(0) = m = E(X1).

The rate function is the Legendre-Fenchel transform of Z:

I(x) = supu∈Rdu · x−Z(u) (2.3.2)

As in the one dimensional case, it follows immediately from the definition that Iis non negative, convex, lower semicontinuous and I(m) = 0. Denoting DI = x :I(x) < +∞ we have similar properties as in the one dimensional case:

Lemma 2.3.1 I(x) ∈ C∞(DIo), and m ∈ (DIo). There exists a diffeomorphismbetween DIo and Doλ defined by

u∗ = (∇Z)(u), u = (∇I)(u∗) (2.3.3)

and

(∇2Z)(u) =[∇2I)(u∗)

]−1(2.3.4)

Page 28: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

28 CHAPTER 2. LARGE DEVIATIONS

Theorem 2.3.2 For any Borel set A ⊂ Rd,

− infx∈Ao

I(x) ≤ lim infn→∞

1

nlog P(Sn ∈ A) ≤ lim sup

n→∞

1

nlog P(Sn ∈ A) ≤ − inf

x∈AI(x)

were Ao is the interior of A and A is the closure of A.

Proof :

The lower bound is proven in the same way as in d = 1. Consider u∗ such thatI(u∗) < +∞. To simplify we assume there exists a unique u ∈ DoI such that

I(u∗) = u∗ · u−Z(u) u = (∇I)(u∗)

Then we consider the new probability law on Rd, absolutely continuous with respectto α, defined by

αu(dx) = eu·x−Z(u) α(dx)

Observe that ∫xαu(dx) = u∗

Noting An,δ = (x1, . . . ,xn) : |(x1 + · · ·+xn)/n−u∗| ≤ δ ⊂ Rn, then for any δ1 < δ

P(|Sn − u∗| < δ) ≥∫An,δ1

α(dx1) . . . α(dxn)

= M(u)n∫An,δ1

e−u·(x1+···+xn)αu(dx1) . . . αu(dxn)

= M(u)ne−nu·u∗∫An,δ1

e−u·[(x1+···+xn)−nu∗]αu(dx1) . . . αu(dxn)

≥ e−nI(u∗)e−n|u|δ1

∫An,δ1

αu(dx1) . . . αu(dxn)

The law of large numbers now says that

limn→∞

∫An,δ1

αu(dx1) . . . αu(dxn) = 1

and we obtain

lim infn→∞

1

nlog P(|Sn − u∗| < δ) ≥ −I(u∗)− |u|δ1

and letting δ1 → 0 we conclude that for any δ > 0 we have the lower bound

lim infn→∞

1

nlog P(|Sn − u∗| < δ) ≥ −I(u∗)

Page 29: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.3. CRAMER’S THEOREM IN RD 29

The upper bound requires a little more work. Convexity plays a role here.

Let C any Borel set in Rd. Then the exponential Chebicheff inequality impliesfor any u ∈ Rd

P(Sn ∈ C

)≤ exp

[−n inf

x∈Cu · x

]E(enu·Sn

)= exp

[−n inf

x∈Cu · x

]M(u)n

and optimizing in u ∈ Rd we obtain

1

nlog P

(Sn ∈ C

)≤ − sup

u∈Rdinfx∈C

[u · x−Z(u)] (2.3.5)

So to conclude we need to exchange “supu∈Rd” with “infx∈C”. This is immediate ifC is a convex set by the following lemma (c.f. [3], chapter 6):

Lemma 2.3.3 Let g(u,x) be convex and lower semicontinuous in x, concave anduppersemicontinuous in u, then if C is compact and convex

infx∈C

supu∈Rd

g(u,x) = supu∈Rd

infx∈C

g(u,x) (2.3.6)

Consider now any compact setK ⊂ Rd,there exists l > 0 such that infx∈K I(x) =l. By the ower semicontinuity of I(·), for a fixed ε > 0 and any x′ ∈ K, there existsa closed ball C(x′) such that

I(x) ≥ l − ε ∀x ∈ C(x′)

Since K is compact, there exists a finite subcover C(x′1), . . . , C(x′N) extracted fromthese closed ball. Then

P(Sn ∈ K

)≤

N∑j=1

P(Sn ∈ C(x′j)

)≤ N max

1≤j≤NP(Sn ∈ C(x′j)

)≤ N max

1≤j≤Nexp

(−n inf

C(x′j)I

)≤ Ne−n(l−ε)

and

lim supn→∞

1

nlog P

(Sn ∈ K

)≤ −(l − ε)

Since ε is arbitrary, this proves the upper bound for compact sets.

To extend this bound from compact to closed sets, we need to prove the expo-nential tightness of the distribution of Sn, i.e.

limρ→∞

limn→∞

1

nlog P

(Sn 6∈ Hρ

)= −∞ (2.3.7)

Page 30: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

30 CHAPTER 2. LARGE DEVIATIONS

where Hρ = [−ρ, ρ]d is the centered hypercube of length 2ρ. To prove this observe

that, denoting S(j)n is the average of X

(j)1 , . . . , X

(j)n , by applying the results obtained

in the one-dimensional case, we have

P(Sn 6∈ Hρ

)≤

d∑j=1

P(S(j)n 6∈ (−ρ, ρ)

)≤ d max

j=1,...,dexp

(−nminIj(ρ), Ij(−ρ)

)where Ij is the rate function for the j−marginal distribution of the law α. Then(2.3.7) follows by applying lemma 2.2.2.

2.4 Generalities on Large Deviations

Let X a complete separable metric space and Pn a family of probability distributionson X. In the previous sections X = Rd and Pn the distribution of Sn. We says thatPn satisfies a large deviation principle with good rate function I(·) if there existsa function I : X → [0,∞] such that:

1. I(·) is lower semicontinuous.

2. For each ` <∞ the set x : I(x) ≤ ` is compact in X.

3. For each closed set C ⊂ X

lim supn→∞

1

nlogPn(C) ≤ − inf

x∈CI(x).

4. For each open set G ⊂ X

lim infn→∞

1

nlogPn(G) ≥ − inf

x∈GI(x).

Here the adjective good refers to properties 1 and 2. The next lemma does notrequire the rate function I to be good.

Theorem 2.4.1 Varadhan’s Lemma. Let Pn satisfy the large deviation principlewith rate function I. Then for any bounded continuous function F (x) on X

limn→∞

1

nlog

∫enF (x)dPn(x) = sup

x∈XF (x)− I(x).

Page 31: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.4. GENERALITIES ON LARGE DEVIATIONS 31

Proof.

Upper bound. For any given δ > 0, since F is bounded and continuous, we canfind a finite number of closed sets covering X such that the oscillation of F (·) oneach of these closed sets is less or equal δ. Then∫

enF (x)dPn(x) ≤m∑j=1

∫Cj

enF (x)dPn(x) ≤m∑j=1

enFj+δPn(Cj)

where Fj = infCj F (x). It follows

lim supn→∞

1

nlog

∫enF (x)dPn(x) ≤ sup

1≤j≤m[Fj + δ − inf

CjI(x)]

≤ sup1≤j≤m

supCj

[F (x)− I(x)] + δ

= supx∈X

[F (x)− I(x)] + δ

Since δ is arbitrary, we can let it go to 0.

Lower bound. By definition of a supremum for any δ > 0 we can find y ∈ Xsuch that F (y)− I(y) ≥ supx[F (x)− I(x)]− δ/2. Since F is continuous we can findan open neighborhood U of y such that F (x) ≥ F (y)− δ/2 for any x ∈ U . Then weobtain

lim infn→∞

1

nlog

∫enF (x)dPn(x) ≥ lim inf

n→∞

1

nlog

∫U

enF (x)dPn(x)

≥ F (y)− δ

2− inf

x∈UI(x) ≥ F (y)− I(y)− δ

2≥ sup

x[F (x)− I(x)]− δ

and we conclude from the arbitrariness of δ.

Theorem 2.4.2 Contraction Principle. Let Pn satisfy the large deviation prin-ciple with rate function I, and π : X → Y a continuous mapping from X to anothercomplete separable metric space Y . Then Pn = Pnπ

−1 satisfies a large deviationprinciple with rate function

I(y) = infx:π(x)=y

I(x),

I(y) = +∞ if x : π(x) = y = ∅

Proof. Since π is continuous, given any closed set C ⊂ Y , the subset C = π−1(C) isclosed in X. Then

lim supn→∞

1

nlog Pn(C) = lim sup

n→∞

1

nlogPn(C) ≤ − inf

x∈CI(x) = − inf

y∈Cinf

x:π(x)=yI(x).

and similarly for the lower bound.

Page 32: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

32 CHAPTER 2. LARGE DEVIATIONS

2.5 Large deviations for densities

We deal first with the one-dimensional case. If the distribution of Sn on R hasa density that we denote by fn(x), from Cramers theorem we have the intuitionthat fn(x) ∼ e−nI(x) for large n. We will prove this under some condition on theprobability α(dx). It is interesting to notice that we will not use Cramer’s theoremin the proof, but the following local central limit theorem.

Theorem 2.5.1 Local central limit theorem. Let φ(k) the characteristic func-tion of a centered probability measure α(dx) with finite variance σ2, and assume that|φ(k)| < 1 if k 6= 0 and that there exists an integer r ≥ 1 such that |φ|r is integrable.Let gn(x) the probability density of (X1 + · · · + Xn)/

√n, where Xj are i.i.d. with

common law α. Then

limn→∞

gn(x) =1√

2πσ2e−x

2/2σ2

.

Proof. The characteristic function of α is defined by

φ(k) =

∫eixkα(dx) (2.5.1)

The characteristic function of the distribution of X1 + · · · + Xr is φr(k) that isintegrable. It follows that the probability density gn(x) exists for any n ≥ r (cf. [?],theorem XV.3.3). Then

gn(x) =1

∫ +∞

−∞e−ixk

(k√n

)]ndk

and therefore∣∣∣∣gn(x)− 1√2πσ2

e−x2/2σ2

∣∣∣∣ ≤ 1

∫ +∞

−∞

∣∣∣∣φ( k√n

)n− e−k2σ2/2

∣∣∣∣ dkGiven a > 0, we split the integral in three parts.

1. Uniformly in k ∈ [−a, a],

φ

(k√n

)n=

(1− k2σ2

2n+ o

(1

n

))n−→n→∞

e−k2σ2/2

so that ∫ +a

−a

∣∣∣∣φ( k√n

)n− e−k2σ2/2

∣∣∣∣ dk → 0

Page 33: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.5. LARGE DEVIATIONS FOR DENSITIES 33

2. Observe that it is possible to choose δ > 0 such that

|φ(k)| ≤ e−k2σ2/4 if |k| ≤ δ.

Then for the interval |k| ∈ (a, δ√n), we can estimate as∫ δ

√n

a

∣∣∣∣φ( k√n

)n− e−k2σ2/2

∣∣∣∣ dk ≤ ∫ δ√n

a

2e−k2σ2/4dk ≤

∫ +∞

a

2e−k2σ2/4dk

that converge to 0 as a→∞.

3. It remains to estimate the contribution from the interval (δ√n,+∞). Since

we assumed that |φ(k)| < 1 for k 6= 0, and since |φ|k is integrable, we haveφ(k)→ 0 as k →∞. Consequently we must have sup|k|≥δ |φ(k)| = η < 1, andwe can estimate∫ +∞

δ√n

∣∣∣∣φ( k√n

)n− e−k2σ2/2

∣∣∣∣ dk ≤ ηn−r∫ +∞

−∞

∣∣∣∣φ( k√n

)∣∣∣∣r dk +

∫ +∞

δ√n

e−k2σ2/2dk

= ηn−r√n

∫ +∞

−∞|φ (k)|r dk +

∫ +∞

δ√n

e−k2σ2/2dk

that converges to 0 as n→∞.

Distributions such that their characteristic function |φ(k)| < 1 for k 6= 0 arecalled non-lattice ( [2], chapter 2). It does not imply they have density.

We assume now that the measure α(dx) satisfies all the assumptions madein section 2.2, and furthermore its characteristic function satisfies conditions of thelocal central limit theorem 2.5.1. Then, for n ≥ r, the distribution of Sn on R hasa density that we denote by fn(x).

Theorem 2.5.2 For any y ∈ DoI we have

limn→∞

1

nlog fn(y) = −I(y) . (2.5.2)

Proof.

Let τyα the translation of the measure α by y. Assume that m =∫xα(dx) = 0,

otherwise just recenter it and consider τmα.

Let y ∈ DIo. Then by lemma 2.2.4 there exists a unique λ ∈ DZo such thaty = Z ′(λ), λ = I ′(y), and I(y) = λy −Z(λ). Define

α(y, dx) =1

M(λ)e(x+y)λτyα(dx)

Page 34: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

34 CHAPTER 2. LARGE DEVIATIONS

Observe that this is a probability distribution with 0 average. In fact∫α(y, dx) =

1

M(λ)

∫ezλα(dz) = 1

and ∫xα(y, dx) = −y +

1

M(λ)

∫zezλα(dz) = −y + Z ′(λ) = 0

So we treat here y as a parameter. Let Xy1 , . . . , X

yn i.i.d. random variables with law

given by α(y, dx).

For n ≥ r it exists the density for the distribution of (Xy1 + · · · + Xy

n)/n thatwe denote by fn(x, y), and it is equal to

fn(x, y) =en(x+y)λ

M(λ)nfn(x+ y) = en(I(y)+λx)fn(x+ y)

To prove this formula, compute, for a given bounded measurable function G(·):

E (G((Xy1 + · · ·+Xy

n)/n)) =

∫RnG(sn)en(I(y)+λsn)τyα(dx1) . . . τyα(dxn)

=

∫RG(s)en(I(y)+λs)fn(s+ y)ds

(2.5.3)

It follows thatfn(y) = e−nI(y)fn(0, y)

To conclude we only need to prove that (log fn(0, y))/n→ 0 as n→∞.

Let fn(x, y) the density of (Xy1 +· · ·+Xy

n)/√n. Then fn(x, y) =

√nfn(√nx, y).

By the local central limit theorem 2.5.1, the result follows immediately.

For y ∈ R define ν(n)y (dx1, . . . , dxn) the conditional distribution of (X1, . . . , Xn)

on the hyperplane x1 + · · ·+xn = ny. This is defined as the probability measure onRn−1 satisfying the relation

E(G(Sn)H(X1, . . . , Xn)

)=

∫Rdyfn(y)G(y)

∫H(x1, . . . , xn)ν(n)

y (dx1, . . . , dxn)

Lemma 2.5.3 Let F be a bounded continuous function on R and y ∈ DoI , λ = I ′(y).For every θ ∈ R, the limit

limn→∞

1

nlog

∫eθ(F (x1)+....+F (xn))ν(n)

y (dx1, . . . , dxn) = G(y, θ) (2.5.4)

exists and G is differentiable at θ = 0 with

∂G(y, θ)

∂θ

∣∣∣θ=0

=

∫F (x)αλ(dx). (2.5.5)

Page 35: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.5. LARGE DEVIATIONS FOR DENSITIES 35

Proof Denote by Hn(y, θ) the function∫eθ(F (x1)+....+F (xn))ν(n)

y (dx1, . . . , dxn) =Hn(y, θ)

fn(y)(2.5.6)

which, by (2.5.3), can be formally written as

Hn(y, θ) =

∫x1+···+xn=ny

eθ(F (x1)+....+F (xn))α(dx1) . . . α(dxn).

Let us denote

a(θ) =

∫eθF (x)α(dx), M(λ, θ) =

1

a(θ)

∫eλx+θF (x)α(dx)

Then we can compute the Cramer rate function for the law a(θ)−1eθF (x)α(dx), andthis is given by

Iθ(y) = I(y, θ) = supλ

λy − logM(λ, θ)

Observe that DIθ = DI because F is bounded. If (Y1, . . . , Yn) are i.i.d. distributedby a(θ)−1eθF (x)α(dx), then the density of the distribution of (Y1 + · · · + Yn)/n isgiven by a(θ)−nHn(y, θ). Then by applying 2.5.2 to this law we obtain

limn→∞

1

nlogHn(y, θ) = −I(y, θ) + log a(θ).

Consequently we have, applying again 2.5.2

limn→∞

1

nlog

∫eθ(F (x1)+....+F (xn))ν(n)

y (dx1, . . . , dxn) = limn→∞

1

nlogHn(y, θ)− lim

n→∞

1

nlog fn(y)

= log a(θ)− I(y, θ) + I(y) ≡ G(y, θ).

Differentiating G(y, θ) we have

∂G(y, θ)

∂θ=a′(θ)

a(θ)− ∂I(y, θ)

∂θ

In order to compute this last expression let us set λ∗(y, θ) = ∂yI(y, θ). Existence ofλ∗(y, θ) is provided by the assumption y ∈ DoI and the equality between the sets DIand DIθ . We have

I(y, θ) = λ∗y − logM(λ∗, θ).

Then, since ∂λ logM(λ∗, θ) = y, we get

∂θI(y, θ) = y∂θλ∗ −M−1 (∂θM + ∂λM∂θλ

∗) = −∂θ logM(λ∗, θ)

= ∂θ log a(θ)−M−1∂θ

∫eλx+θF (x)α(dx) =

a′(θ)

a(θ)−∫F (x)eλ

∗x+θF (x)−logM(λ∗,θ)α(dx)

Page 36: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

36 CHAPTER 2. LARGE DEVIATIONS

So we have

∂θG(y, θ) =

∫F (x)eλ

∗x+θF (x)−logM(λ∗,θ)α(dx)

and sending θ → 0 we obtain

∂θG(y, 0) =

∫F (x)eλ

∗(y,0)x−logM(λ∗(y,0),0)α(dx) =

∫F (x)αλ(dx)

Theorem 2.5.4 For any y ∈ DoI , and any ε > 0

limn→∞

ν(n)y

(∣∣∣∣∣ 1nn∑j=1

F (Xj)−∫F (x)αλ(dx)

∣∣∣∣∣ ≥ ε

)= 0 (2.5.7)

Proof. Without loosing any generality, let us assume that∫F (x)αλ(dx) = 0.

Consequently G(θ, y) = O(θ2). Then for any θ > 0

ν(n)y

(∣∣∣∣∣ 1nn∑j=1

F (Xj)

∣∣∣∣∣ ≥ ε

)≤ e−nθε

∫eθ|

Pnj=1 F (xj)|ν(n)

y (dx1, . . . , dxn)

≤ e−nθε∫eθ

Pnj=1 F (xj)ν(n)

y (dx1, . . . , dxn)

+e−nθε∫e−θ

Pnj=1 F (xj)ν(n)

y (dx1, . . . , dxn)

and by (2.5.4)

limn→∞

1

nlog ν(n)

y

(∣∣∣∣∣ 1nn∑j=1

F (xj)

∣∣∣∣∣ ≥ ε

)≤ −θε+ maxG(θ, y), G(−θ, y)

Optimizing the above bound in θ one obtains

limn→∞

1

nlog ν(n)

y

(∣∣∣∣∣ 1nn∑j=1

F (xj)

∣∣∣∣∣ ≥ ε

)≤ −Cε2

for some positive constant C.

Observe that ν(n)y is a symmetric measure, so we have∫

F (x1)ν(n)y (dx1, . . . , dxn) =

∫1

n

n∑j=1

F (xj)ν(n)y (dx1, . . . , dxn) −→

n→∞

∫F (x)αλ(dx)

Page 37: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.5. LARGE DEVIATIONS FOR DENSITIES 37

Theorem 2.5.5 Let F (x1, . . . , xk) a bounded continuous function on Rk and y ∈DoI , then

limn→∞

∫F (x1, . . . , xk)ν

(n)y (dx1, . . . , dxn) =

∫F (x1, . . . , xk)αλ(dx1) . . . αλ(dxk)

Proof. It is enough to consider functions of the form F (x1, . . . , xk) = F1(x1) . . . F (xk).For simplicity let us prove the case k = 2, the generalization to any k is straight-forward. Without loosing generality, let us assume that

∫Fj(x)αλ(dx) = 0. By the

exchange symmetry of ν(n)y we have∫

F1(x1)F2(x2)ν(n)y (dx1, . . . , dxn) =

∫1

n(n− 1)

∑i 6=j

F1(xi)F2(xj)ν(n)y (dx1, . . . , dxn)

=

∫n2

n(n− 1)

(1

n

∑i

F1(xi)

)(1

n

∑j

F2(xj)

)ν(n)y (dx1, . . . , dxn) +O

(1

n

)and this last expression converges to 0 an n→∞ by (2.5.7) .

The generalization to more dimensions of the above results is quite straight-forward and can be left as exercise. Let us state here what the result is in thiscontext.

Let α(dx) a probability measure on Rd that satisfies conditions used in section2.3. Let us assume that its characteristic function is such that |φ(k)| < 1 for k 6= 0,and such that |φ(k)|r is integrable on Rd for some integer r ≥ 1. Then, for n ≥ rthe n-convolution of α has a density and we denote by fn(x) the density of thedistribution of (X1 + · · ·+ Xn)/n, where Xj are i.i.d. with common distributionα(dx).

Theorem 2.5.6 For any y ∈ DoI we have

limn→∞

1

nlog fn(y) = −I(y) . (2.5.8)

Example Let V : R → R+ a positive function such that V (y) → +∞ for|y| → +∞, and such that

Z(λ, β) =

∫e−βV (y)+λydy <∞ ∀ λ ∈ R, β > 0.

Page 38: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

38 CHAPTER 2. LARGE DEVIATIONS

Then we can define the probability density (on R2)

fλ,β(r, p) =e−β(V (r)+p2/2)+λr√

2πβ−1Z(λ, β)(2.5.9)

Let Yj = (rj, pj) be a sequence of i.i.d. random variables with common lawgiven by f0,β0(r, p)dr dp, β0 > 0 fixed.

Then the vector valued random variables Xj = (rj, (V (rj) +p2j/2)) clearly has

a law α(dx) which is degenerate in R2 but α ∗ α has a density w.r.t. the Lebesguemeasure. Its logarithmic moment generating function is given by

Z(λ, η) = log

∫eλr+η(V (r)+p2/2)f0,β(r, p)drdp = log

Z(λ, β0 − η)

Z(0, β0)

√β0

β0 − η

for η < β0 and +∞ otherwise. The corresponding Legendre transform, for L ∈ Rand U > 0, is given by

I(L, U) = supη<β0,λ

λL+ ηU − logZ(λ, η)

= supβ>0,λ

λL − βU − log

(√2πβ−1Z(λ, β)

)+ β0U + log

(√2πβ−1

0 Z(0, β0)

)The function defined by

S(U,L) = infλ,β>0

−λL+ βU − log

(Z(λ, β)

√2πβ−1

)(2.5.10)

is called thermodynamic entropy. So we have obtained

I(L, U) = −S(U,L) + β0U + logZ(0, β0) +1

2log

β0

Observe that S does not depend on β0.

The density of the distribution of 1n

∑nj=1 Xj is given by

fn(L, U)

=

∫R2n

e−β0Pj Ej

(2πβ−10 )n/2Z(0, β0)n

δ

(1

n

n∑j=1

Ej − U ;1

n

n∑j=1

rj − L

)∏j

drjdpj

=e−nβ0E

(2πβ−10 )n/2Z(0, β0)n

∫R2n

δ

(1

n

n∑j=1

Ej − U ;1

n

n∑j=1

rj − L

)∏j

drjdpj

=e−nβ0E

(2πβ−10 )n/2Z(0, β0)n

Γn(L, U).

Page 39: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

2.5. LARGE DEVIATIONS FOR DENSITIES 39

where Γn(r, E) is the volume of the corresponding microcanonical 2n−2-dimensionalsurface on R2n. More precisely, using the co-area formula [4],

Γn(L, U) =

∫Σ(nU,nL)

1∑nj (p2 + V ′(ri)2)

dσ(r1, p1, . . . , rn, pn) (2.5.11)

where dσ is the 2n−2-dimensional surface (Hausdorff) measure of the microcanonicalmanifold

Σn(nU, nL) =

n∑i=1

Ei = nU,

n∑i=1

ri = nL

(2.5.12)

Applying (2.5.8) we obtain

limn→∞

1

nlog Γn(L, U) = S(U,L) . (2.5.13)

for any (L, U) ∈ DoS. A sufficient condition to have D0S = R× (0,∞) is V (r) ≥ cr2

for a positive constant c.

Page 40: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

40 CHAPTER 2. LARGE DEVIATIONS

Page 41: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Chapter 3

Statistical mechanics andthermodynamics of onedimensional chain of oscillators

3.1 The model: grand canonical formalism

We study a system of n anharmonic oscillators. The particles are denoted byj = 1, . . . n. We denote with qj, j = 1, . . . , n their positions, and with pj thecorresponding momentum (which is equal to its velocity since we assume that allparticles have mass 1). We consider first the system attached to a wall, and weset q0 = 0, p0 = 0. Between each pair of consecutive particles (i, i + 1) there is ananharmonic spring described by its potential energy V (qi+1 − qi). We assume V isa positive smooth function such that V (r)→ +∞ as |r| → ∞ and such that

Z(λ, β) :=

∫e−βV (r)+λrdr < +∞ (3.1.1)

for all β > 0 and all λ ∈ R. Let a be the equilibrium interparticle spacing, where Vattains its minimum that we assume is 0: V (a) = 0. It is convenient to work withinterparticle distance as coordinates, rather than absolute particle position, so wedefine rj = qj − qj−1 − a, j = 1, . . . , n. Without loosing any generality, we willchoose a = 0 for the sequence.

The configuration of the system is given by pj, rj, j = 1, . . . , n ∈ R2n, andenergy function (Hamiltonian) defined on each configuration is given by

H =n∑j=1

Ej

41

Page 42: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

42 CHAPTER 3. THERMODYNAMICS

where

Ej =1

2p2j + V (rj), j = 1, . . . , n

is the energy of each oscillator. This choice is a bit arbitrary, because we associatethe potential energy of the bond V (rj) to the particle j. Different choices can bemade, but this one is notationally convenient.

At the other end of the chain we apply a constant force τ ∈ R on the particlen (tension). The position of the particle n is given by qn =

∑nj=1 rj. We consider

the Hamiltonian dynamics:

rj(t) = pj(t)− pj−1(t), j = 1, . . . , n,

pj(t) = V ′(rj+1(t))− V ′(rj(t)), j = 1, . . . , n− 1,

pn(t) = τ − V ′(rn(t)),

(3.1.2)

It is easy to see that, for any β > 0, the grand canonical measure µgcτ,β defined by

dµn,gcτ,β =n∏j=1

e−β(Ej−τrj)√2πβ−1Z(βτ, β)

drjdpj (3.1.3)

is stationary for this dynamics. The distribution µn,gcτ,β is called grand canonicalGibbs measure at temperature T = β−1 and tension (or pressure) τ . Notice thatr1, . . . , rn, p1, . . . , pn are independently distributed under this probability measure.

We have already defined the thermodynamic entropy in the previous chapteras

S(U,L) = infλ,β>0

−λL+ βU − log

(Z(λ, β)

√2πβ−1

)(3.1.4)

Observe that Γn(L, U) is clearly sub–multiplicative

Γn+m(L, U) ≥ Γn(L, u)Γm(L, U)

that implies the existence of the limit

limn→∞

1

nlog Γn(L, U) = S(U,L) . (3.1.5)

and applying (2.5.8) we identify as the thermodynamic entropy defined by (3.1.4).

This is the fundamental relation that connects the microscopic system to itsthermodynamic macroscopic description.

The limit in (3.1.5) is intended for all avalues of the internal energy U > 0. Itis easy to see that S(U,L) > 0, and we extend its definition to the value U = 0 bysetting S(0,L) = 0.

Page 43: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

3.2. MICROCANONICAL MEASURE 43

It is easy to prove from (2.5.10) and (3.1.5) that S(U,L) is concave and ho-mogeneous of degree 1.

We can now define the other thermodynamic quantities from the entropy def-inition (3.1.4). From equation (3.1.4) we have

λ(`, u) = −∂S(`, u)

∂`, β(`, u) =

∂S(`, u)

∂u(3.1.6)

and we will always define the tension as τ(`, u) = λ(`, u)/β(`, u).

`(λ, β) =∂ logZ(λ, β)

∂λ=

∫reλr−βV (r)

Z(λ, β)dr =

∫rj dµ

gcτ,β

u(λ, β) = −∂ log

(Z(λ, β)

√2π/β

)∂β

=

∫V (r)

eλr−βV (r)

Z(λ, β)dr +

1

2β=

∫Ej dµgcτ,β

(3.1.7)

Computing the total differential of S(r, u) we have

dS = −βτd`+ βdu =d\QT

(3.1.8)

where d\Q is the (non-exact) differential

d\Q = −τd`+ du (3.1.9)

and represents the energy gained (or lost) by the system under the infinitesimalchange d`, du. In fact τd` is the infinitesimal work done on the system by the forceτ to perform the infinitesimal displacement d`, while du is the infinitesimal changeof internal energy, so that we can identify d\Q as the energy exchanced from thesystem to the exterior during the thermodynamic infinitesimal change d`, du.

3.2 Microcanonical measure

Instead of applying a force (tension) to one side of the chain, one can fix the particlen to another wall at distance nr (qn =

∑nj=1 rj = nr and pn = pn = 0). The

corresponding constrained dynamics is

rj(t) = pj(t)− pj−1(t), j = 1, . . . , n− 1,

pj(t) = V ′(rj+1(t))− V ′(rj(t)), j = 1, . . . , n− 1,

rn(t) = nr −n−1∑j=1

rj(t) .

(3.2.1)

Page 44: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

44 CHAPTER 3. THERMODYNAMICS

The dynamics now is conserving the total energy H =∑

j Ej = nu and the totallength

∑nj=1 rj = nr. The microcanonical measures µn,mcr,u are now stationary for

this dynamics. These are defined in the following way:

Consider the vector valued i.i.d. random variables

Xj = (rj, Ej), j = 1, . . . , n,

distributed by dµn,gcτ0,β0. Fix x = (r, u), and define µn,mcx the conditional distribution of

(r1, p1, . . . , rn, pn) on the manifold∑n

j=1 Xj = nx. This is defined, for any bounded

continuous function G : R× R+ → R and H : R2n → R, by∫G(Sn)H(r1, p1, . . . , rn, pn) dµn,gcτ0,β0

(r1, p1, . . . , rn, pn)

=

∫R×R+

dx G(x)fn(x)

∫H(r1, p1, . . . , rn, pn) dµn,mcx

where Sn = 1n

∑ni=1 Xi. It is easy to see that µn,mcx does not depend on τ0, β0. We

call µn,mcx the microcanonical measure.

The multidimensional application of theorem 2.5.4 gives the following equiva-lence between microcanonical and grandcanonical measure:

Theorem 3.2.1 Given x = (r, u), let

β = β(r, u), τ = λ(r, u)β−1.

Then for any bounded continuous function F : R2k → R we have

limn→∞

∫F (r1, p1, . . . , rk, pk) dµ

n,mcx (r1, p1, . . . , rn, pn)

=

∫F (r1, p1, . . . , rk, pk) dµ

gcτ,β(. . . , r1, p1, . . . , rn, pn, . . .)

It will be useful later the equivalence of ensembles in the following form:

Theorem 3.2.2 Under the same conditions of Theorem 3.2.1, assume that∫F (r1, p1, . . . , rk, pk) dµ

k,gcτ,β (r1, p1, . . . , rk, pk) = 0.

Then

limn→∞

∫ ∣∣∣∣∣ 1

n− k

n−k∑i=1

F (ri, pi, . . . , ri+k, pk+i)

∣∣∣∣∣ dµn,mcx = 0

The proof of these two theorems follows the argument used for Theorems 2.5.4and 2.5.5.

Page 45: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

3.3. CANONICAL MEASURE 45

3.3 Canonical measure

Applying a Langevin’s thermostat at temperature T = β−1 to the particle n (or toany other particle), we obtain a dynamics that has the canonical measure µn,cr,β asstationary measure:

rj(t) = pj(t)− pj−1(t), j = 1, . . . , n− 1,

dpj(t) = (V ′(rj+1(t))− V ′(rj(t))) dt

+ δj,n−1

(−pj(t)dt+

√βdw(t)

), j = 1, . . . , n− 1,

rn(t) = nr −n−1∑j=1

rj(t) .

(3.3.1)

This is defined as follows:

If we condition the grand canonical measure µn,gc0,0,β on the total length of thechain equal to L = nr =

∑j rj = qn− q0 , we obtain the canonical measure that we

denote by µn,cr,β. We can formally write

dµn,cr,β =∏j

e−βp2j/2√

2πβ−1dpj ⊗

e−βPj V (rj)

Zn,c(r, β)δ

(∑j

rj = nr

)∏j

drj

where Zn,c(r, β) is the normalization constant (canonical partition function).

Similar statements as theorems 3.2.1 and 3.2.2 holds, µn,cr,β converging to thegrand-canonical measure µn,gcτ,β , with τ given by the thermodynamic relations (3.1.6).

Other boundary conditions can be made, like applying a tension τ and aLangevin thermostat at temperature β−1 to the n particle, obtaining a system withµgcτ,β as stationary measure.

3.4 Local equilibrium, local Gibbs measures

The Gibbs distributions defined in the above sections are also called equilibriumdistributions for the dynamics. Studying the non-equilibrium behaviour we need theconcept of local equilibrium distributions. These are probability distributions thathave some asymptotic properties when the system became large (n→∞), vaguelyspeaking locally they look like Gibbs measure. We need a precise mathematicaldefinition, that will be useful later for proving macroscopic behaviour of the system.

Page 46: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

46 CHAPTER 3. THERMODYNAMICS

Definition 3.4.1 Given two functions β(y) > 0, τ(y), y ∈ [0, 1], we say that thesequence of probability measures µn on R2n has the local equilibrium property (withrespect to the profiles β(·), τ(·)) if for any k > 0 and y ∈ (0, 1),

limn→∞

µn∣∣([ny],[ny]+k)

= µk,gcτ(y),β(y) (3.4.1)

Sometimes we will need some weaker definition of local equilibrium (for ex-ample relaxing the pointwise convergence in y). It is important here to understandthat local equilibrium is a property of a sequence of probability measures.

The most simple example of local equilibrium sequence is given by the localGibbs measures:

n∏j=1

e−β(j/n)(Ej−τ(j/n)rj)√2πβ(j/n)−1Z(β(j/n)τ(j/n), β(j/n))

drjdpj = gnτ(·),β(·)

n∏j=1

drjdpj (3.4.2)

Of course are local equilibrium sequence also small order perturbation of this se-quence like

ePj Fj(rj−h,pj−h,...,rj+h,pj+h)/ngnτ(·),β(·)

n∏j=1

drjdpj (3.4.3)

where Fj are local functions.

To a local equilibrium sequence we can associate a thermodynamic entropy,defined as

S(r(·), u(·)) =

∫ 1

0

S(r(y), u(y)) dy (3.4.4)

where r(y), u(y) are computed from τ(y), β(y) using (3.1.7).

Page 47: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Chapter 4

Entropy

4.1 Generalities

Let Ω be a polish space and P(Ω) the topological space of probability measureson Ω equipped with the weak topology. For µ, ν ∈ P(Ω), the relative entropyH(ν|µ) ∈ [0,∞] of µ with respect to ν is defined by

H(ν|µ) = supf

∫fdν − log

(∫efdµ

)(4.1.1)

where the supremum is taken over all bounded continuous functions f : Ω → R.The positivity of H(ν|µ) follows from the choice f = 0 in the variational formula.Observe also that the supremum can be restricted to the set of positive boundedcontinuous functions.

As a trivial but useful consequence we have the following entropy inequality

Proposition 4.1.1 (Entropy inequality) Let f : Ω → R be a bounded measur-able function and η > 0 a positive number. Then∫

fdν ≤ η−1

log

(∫eηfdµ

)+H(ν|µ)

(4.1.2)

Proof If f is continuous it is a trivial consequence of the definition. Since theclass of f ’s for which (4.1.2) holds is closed under point-wise convergence, (4.1.2)continues to be true for every bounded measurable function.

Proposition 4.1.2 Relative entropy is convex and lower semicontinuous.

47

Page 48: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

48 CHAPTER 4. ENTROPY

Proof It is a simple consequence of the variational formula.

Proposition 4.1.3 The entropy H(ν|µ) is equal to +∞ if ν is not absolutely con-tinuous with respect to µ. Otherwise it is given by

H(ν|µ) =

∫φ log φdµ, φ(x) =

(dν

)(x) (4.1.3)

Proof Observe first that for any density function φ the integral of φ log φ makessense in R∪+∞ since x ∈ [0,∞)→ x log x is bounded above so that the negativepart of φ log φ is integrable with respect to µ.

If ν is not absolutely continuous with respect to µ then there exists a measur-able set A such that µ(A) = 0, ν(A) > 0. Choosing the function f = α1A, α > 0 in(4.1.2) we get H(ν|µ) ≥ αν(A). Since α is arbitrary we have H(ν|µ) = +∞.

We first show that if dν = φdµ is absolutely continuous with respect to µ andif dνθ = θdµ+ (1− θ)dν = φθdµ for θ ∈ [0, 1] then

limθ→0

∫φθ log φθdµ =

∫φ log φdµ (4.1.4)

Since x ∈ [0,∞)→ x log x is convex, we have

I(θ) =

∫φθ log φθdµ ≤ (1− θ)

∫φ log φdµ

On the other hand since x ∈ (0,∞)→ log x is concave and non-decreasing, log φθ ≥suplog θ, (1− θ) log φ. Therefore

I(θ) = θ

∫log φθdµ+ (1− θ)

∫φ log φθdµ ≥ θ log θ + (1− θ)2

∫φ log φdµ

It proves (4.1.4).

Since φθ ≥ θ > 0, Jensen’s inequality shows

exp

[∫ψdνθ − I(θ)

]≤∫eψ

φθdνθ =

∫exp(ψ)dµ

Hence we have ∫ψdνθ ≤ I(θ) + log

(∫eψdµ

)

Page 49: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

4.1. GENERALITIES 49

It implies H(νθ|µ) ≤ I(θ). Recall that relative entropy is lower semi-continuous. Inview (4.1.4) we have

H(ν|µ) ≤∫φ log φdµ

To obtain the reversed inequality we observe that if φ is uniformly positive anduniformly bounded, it is trivial, by the variational formula defining H(ν|µ), that∫

φ log φdµ ≤ H(ν|µ)

If φ is uniformly strictly positive but not bounded we define φn = φ ∧ n and byFatou’s lemma we have∫

φ log φdµ =

∫log φdν

≤ lim infn→∞

∫log φndν

≤ H(ν|µ) + lim infn→∞

log

(∫(f ∧ n)dµ

)= H(ν|µ)

Finally to treat the general case we assume φ is uniformly bounded and use φθdefined above. We proved that

∫φθ log φθdµ ≤ H(νθ|µ). Since θ ∈ [0, 1]→ H(νθ|µ)

is bounded, lower semi-continuous and convex, it is continuous. Using also (4.1.4)the result follows.

We now consider the case Ω = (R2)Z equipped with the product topology. Theclosed set of translation invariant probability measures on Ω is noted T . Let Λ bea subset of Z and µ, ν ∈ P(Ω). The relative entropy HΛ(ν|µ) of ν with respect to νin Λ is defined by

HΛ(ν|µ) = H(µΛ|νΛ) (4.1.5)

where µΛ, νΛ are the marginals of µ and ν with respect to Λ.

Proposition 4.1.4 (Superadditivity) Assume µ ∈ T is product. We have thefollowing superadditivity property of the entropy:

HΛ∪Λ′(ν|µ) ≥ HΛ(ν|µ) +HΛ′(ν|µ) (4.1.6)

if Λ ∩ Λ′ = ∅.

Proof Let fΛ (resp. fΛ′) be an arbitrary continuous bounded function depend-ing only on ω through sites in Λ (resp. Λ′). We have∫

fΛdνΛ − log

(∫efΛdµΛ

)=

∫fΛdνΛ∪Λ′ − log

(∫efΛdµΛ∪Λ′

)

Page 50: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

50 CHAPTER 4. ENTROPY

and similarly with Λ replaced by Λ′. Summing the two equalities obtained and usingindependence of fΛ and fΛ′ under µΛ∪Λ′ , we get

HΛ∪Λ′(ν|µ) ≥∫

(fΛ + fΛ′)dνΛ∪Λ′ − log

(∫e[fΛ+fΛ′ ]dµΛ∪Λ′

)=

∫fΛdνΛ − log

(∫efΛdµΛ

)+

∫f ′Λdν

′Λ − log

(∫ef′Λdµ′Λ

)Taking now the supremum over fΛ and fΛ′ we obtain the desired inequality.

So if ν, µ are translation invariant and µ is product, denoting Λn = −n, . . . , n,we have that HΛn(ν|µ) is a superadditive function of n, and consequently it existsthe limit

H(ν|µ) = limn→∞

1

2n+ 1HΛn(ν|µ) = sup

n

HΛn(ν|µ)

2n+ 1(4.1.7)

Moreover it is easy to show that ν ∈ T → H(ν|µ) inherits properties of relativeentropy. Hence it is convex and lower semicontinuous.

For any bounded continuous function φ with support in −n0, . . . , n0 definethe limit

F (φ) = limn→∞

1

2n+ 1Fn(φ), Fn(φ) = log

∫e

Pni=−n τiφ dµ (4.1.8)

where τi is the shift operator on functions on (R2)Z. Existence of the limit is provedin the following proposition.

Proposition 4.1.5 Let ν a translation invariant probability on Ω, then

H(ν|µ) = supφ

∫φ dν − F (φ)

(4.1.9)

where the supremum is taken over all bounded continuous functions φ.

Proof We claim it is sufficient to prove that

lim infn→∞

(2n+ 1)−1Fn(φ) ≥ supν∈T

∫φdν − H(ν|µ)

(4.1.10)

and

lim supn→∞

(2n+ 1)−1Fn(φ) ≤ supν∈T

∫φdν − H(ν|µ)

(4.1.11)

Assume we proved (4.1.10) and (4.1.11). Let M(Ω) the vector space of finitesigned measures on Ω equipped with the weak topology. We extend H toM(Ω) byH(ν|µ) = +∞ if µ /∈ T . Since T is a closed convex set the function H extended in

Page 51: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

4.1. GENERALITIES 51

this way remains convex and lower semi-continuous. Let (M(Ω))′ be the topologicaldual of M(Ω) and consider the application Φ ∈ (M(Ω))′ → fΦ ∈ Cb(Ω) defined byfΦ(x) = Φ(δx). Remark that fΦ is continuous and that

∫fΦdν = Φ(ν). Moreover it

is injective since the finite support signed measures are dense (for the weak topology)intoM(Ω) (see [?], appendix III, theorem 4) and surjective since for any f ∈ Cb(Ω)the linear form Φ : ν ∈ M(Ω) →

∫Ωfdν ∈ R is such that fΦ = f . Hence Φ → fΦ

is an isomorphism between (M(Ω))′ and Cb(Ω). Then Fenchel-Moreau’s theoremimplies

H(ν|µ) = supφ

∫φ dν − F (φ)

(4.1.12)

for any ν ∈M(ω) and in particular for ν ∈ T .

We start by proving (4.1.10). Let ν ∈ T such that H(ν|µ) < ∞. By thevariational formula defining HΛn(ν|µ) applied with

∑ni=−n τiφ and the translation

invariance of ν we have

Fn(φ)

2n+ 1≥∫

1

2n+ 1

(n∑

i=−n

τiφ

)dν −

HΛn(ν|µ)

|Λn|

≥∫φ dν − H(ν|µ)

Taking the liminf in n and the supremum over ν ∈ T we get (4.1.10).

Observe that for any n ≥ 1, we have

log

(∫e

Pni=−n τiφdµ

)= sup

α

∫ ( n∑i=−n

τiφ

)dα−HΛn+n0

(α|µ)

(4.1.13)

where the supremum is carried over all probability measures α on (R2)Λn+n0 . Onesense of the inequality is trivial by the variational formula defining HΛn+n0

(α|µ) andthe other sense is obtained by taking the optimal α which maximizes the supremumin (4.1.13). This α is such that HΛn+n0

(α|µ) < +∞.

For such α we define α ∈ T in two steps. First α∗ is obtained by takingindependent copies of α on all translated disjoint cubes (Ck)k∈Z of Λn+n0 :

α∗ = ⊗k∈Zτ2k(n+n0)+1α, Ck = τ2k(n+n0)+1Λn+n0

The family of probability measures (αp)p≥1 defined by

αp =1

2p+ 1

p∑`=−p

τ`α∗

Page 52: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

52 CHAPTER 4. ENTROPY

is tight and we denote by α a fixed limit point of the family. To simplify notationswe denote the subsequence along which the limit is achieved by the same letter p.It is trivial that α is translation invariant.

Let Bj = ∪jk=−jCk. Since entropy is lower semicontinuous we have

H(α|µ) ≤ limj→∞

lim infp→∞

|Bj|−1HBj(αp|µ) (4.1.14)

By convexity of entropy and translation invariance of µ we have

|Bj|−1HBj(αp|µ) ≤

1

Λn+n0

1

(2p+ 1)(2j + 1)

p∑`=−p

HBj(τ`α∗|µ)

=1

Λn+n0

1

(2p+ 1)(2j + 1)

p∑`=−p

Hτ−`Bj(α∗|µ)

Observe that τ−`Bj can be written as the disjoint union I ∪ ∪j`+2j−2k=j`

Ck ∪ Fwhere j` ∈ Z, I ⊂ Cj`−1 and F ⊂ Cj`+2j−1.

Since the relative entropy of a product of measures β ⊗ γ is the sum of therelative entropies of β and γ, we have

Hτ−`Bj(α∗|µ) = HI(α

∗|µ) +

j`+2j−2∑k=j`

HCk(α∗|µ) +HF (α∗|µ)

= HI(α∗|µ) + (2j − 1)HΛn+n0

(α∗|µ) +HF (α∗|µ)

Since HI(α∗|µ) ≤ HΛn+n0

(α|µ) and HF (α∗|µ) ≤ HΛn+n0(α|µ), taking first the

limit in p and then the limit in j we get

H(α|µ) ≤HΛn+n0

(α|µ)

|Λn+n0

(4.1.15)

Moreover it is clear that

limn→∞

∣∣∣∣∣ 1

2n+ 1

n∑i=−n

∫(τiφ) dα−

∫φdα

∣∣∣∣∣ = 0

and we get (4.1.11).

4.2 Entropy Production

Let (ωt)t≥0 be a Markov process on ΩΛ = (R2)Λ, Λ ⊂ Z, with generator LΛ andinvariant probability measure µ. We denote by AΛ (resp. SΛ) the antisymmetric

Page 53: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

4.2. ENTROPY PRODUCTION 53

(resp. symmetric) part of LΛ in L2(ΩΛ, µ). We assume the space of smooth boundedfunctions on ΩΛ is a common core of AΛ and SΛ and that LΛ = AΛ + SΛ andL∗Λ = −AΛ+SΛ are generators of Feller semigroups (P t

Λ)t≥0 and (P ∗tΛ )t≥0. We supposethat LΛ (resp. SΛ) can be extended to strong Feller semigroups on L2(ΩΛ, µ). Thesemigroup generated by LΛ (resp. L∗Λ on L2(ΩΛ, µ)) is also denoted P t

Λ (resp. P ∗tΛ ).

The first proposition below shows that entropy w.r.t. invariant probabilitymeasures can only decrease

Proposition 4.2.1 For any probability measure ν on ΩΛ we have

H(νP tΛ|µ) ≤ H(ν|µ) (4.2.1)

Proof By definition we have

H(νP tΛ|µ) = sup

φ

∫P t

Λφdν − log

(∫eφdµ

)and by Jensen’s inequality and stationarity of µ we have∫

ePtΛφdµ =

∫dµ(ω) exp (Eω(φ(ωt))) ≤

∫Eω(eφ(ωt))dµ(ω) =

∫eφdµ

and then the result follows since P tΛφ is a bounded continuous function.

In particular this shows that if we start from a probability measure ν withfinite entropy then at any time the probability measure νP t

Λ has finite entropy andhence is absolutely continuous with respect to µ. Let gΛ(t) be the density of νP t

Λ

w.r.t. µ.

We are now interested in the entropy production rate. The Dirichlet form ofν w.r.t. µ is defined by

DΛ(ν|µ) = sup

−∫SΛψ

ψdν : ψ ∈ Dom(SΛ), inf ψ > 0

(4.2.2)

Proposition 4.2.2 Let ν be a probability measure on ΩΛ with finite entropy H(ν|µ) =HΛ(ν|µ) <∞. For every t, h ≥ 0, we have that

H(νP t+hΛ |µ)−H(νP t

Λ|µ) ≤ 2

∫ t+h

t

dsDΛ(νP sΛ|µ) (4.2.3)

Proof Let α and β two probability measures on ΩΛ. Jensen’s inequality showsthat

H(αP tΛ|βP t

Λ) ≤ H(α|β) (4.2.4)

Page 54: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

54 CHAPTER 4. ENTROPY

It is sufficient to show the proposition for t = 0. Let ψ be a smooth boundedpositive function with mean one with respect to µ. Fix a positive time τ . We definethe probability measure β by dβ = ψdµ. Then we have βP τ

Λ is absolutely continuouswith respect to µ with density P ∗τΛ ψ.

It is an elementary fact that

H(α|β) = H(α|µ)−∫

logψ dα (4.2.5)

It follows that

H(αP tΛ|βP t

Λ) = H(αP τΛµ)−

∫log(P ∗τλ ψ d(αP τ

Λ) (4.2.6)

Hence, with (4.2.4), we get

H(α|µ)−H(αP τΛ|µ) ≥

∫logψ dα−

∫P τ

Λ log (P ∗τΛ ψ) dα

≥∫

logψ dα−∫

log (P τΛP∗τΛ ψ) dα

≥ −∫

log

(P τ

ΛP∗τΛ ψ

ψ

)dα

≥∫ (

ψ − P τΛP∗τΛ ψ

ψ

)dα

where we used Jensen’s inequality and log(1 + η) ≤ η.

Since (P tΛ)t≥0 and (P ∗tΛ )t≥0 are Feller semigroups we have

ψ − P τΛP∗τΛ ψ = ψ − P τ

Λψ − ψ − P τΛ(P ∗τΛ ψ − ψ) = −2τSΛψ + τε(ψ, τ)

where ‖ε(ψ, τ)‖∞ → 0 as τ → 0.

Let be t > 0 and divide the time interval [0, t] in m small intervals of lengthτ = t/m and let m goes to infinity. We have

H(α|µ)−H(αP tΛ|µ) =

m−1∑i=0

[H(αP

im t

Λ |µ)−H(αPi+1m t

Λ |µ)]

≥ lim supm→∞

m−1∑i=0

[H(αP

im t

Λ |µ)−H(αPi+1m t

Λ |µ)]

≥ lim supm→∞

[−

2t

m

m−1∑i=0

∫ (P

itm

Λ

(SΛψ

ψ

))dα + tε(ψ, t/m))

]

= −2

∫ t

0

ds

∫P s

Λ

(SΛψ

ψ

)dα

Page 55: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

4.2. ENTROPY PRODUCTION 55

By taking the supremum over all ψ considered we conclude the proof.

Proposition 4.2.3 (Donsker-Varadhan) Let Λ be a finite set. Assume that DΛ(ν|µ)is finite and that ν is abolutely continuous with respect to µ with a density denotedby g. Then

√g belongs to the domain of (−SΛ)1/2 and

DΛ(ν|µ) =∑x∈Λ

∫dµ((−SΛ)1/2√g)2 (4.2.7)

Proof See theorem 5 in [?].

Page 56: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

56 CHAPTER 4. ENTROPY

Page 57: Thermodynamics, Statistical Mechanics and Large …ssdnm.mimuw.edu.pl/pliki/wyklady/stefano-olla-umcs.pdf · Helmholtz and Thomson (Lord Kelvin) in 1848, while the need of the zero

Bibliography

[1] Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications,volume 38 of Applications of Mathematics (New York). Springer-Verlag, NewYork, second edition, 1998.

[2] R. Durrett. Probability: Theory and Examples. Duxbury Press, third edition,2005.

[3] Ivar Ekeland and Roger Temam. Convex analysis and variational problems.North-Holland Publishing Co., Amsterdam, 1976. Translated from the French,Studies in Mathematics and its Applications, Vol. 1.

[4] L.-C. Evans and R. F. Gariepy. Measure Theory and Fine Properties of Func-tions, chapter 3. Studies in Advanced Mathematics. CRC Press, 1992.

57