Chapter I. First Variations © 2015, Philip D Loewen

In abstract terms, the Calculus of Variations is a subject concerned with max/min problems for a real-valued function of several variables. Given a vector space V and a function Φ: V → R, we explore the theory and practice of minimizing Φ[x] over x ∈ V. Additional interest and power comes from allowing

• dim(V ) = +∞,

• constrained minimization, where the choice variable x must lie in some preassigned subset S of V.

We’ll investigate and generalize familiar facts and new issues, including . . .

- necessary conditions: if x minimizes Φ over V , then Φ′[x] = 0 and Φ′′[x] ≥ 0;

- existence/regularity: what spaces V are appropriate?

- sufficient conditions: if x obeys Φ′[x] = 0 and Φ′′[x] > 0 then x gives a local minimum.

- applications, calculations, etc.

A. The Basic Problem

Choice Variables. A closed real interval [a, b] of finite length is given. The set X = C1[a, b] denotes the collection of all functions x: [a, b] → R whose derivative ẋ is continuous on [a, b]. These will be the “choice variables” in a minimization problem: we call them “arcs” for now. Typically x = x(t).

Detail: Continuity of ẋ on [a, b] requires ẋ(a) and ẋ(b) to be defined. We use one-sided definitions at a and b to arrange this. For example,

ẋ(a) = lim_{r→0⁺} [x(a + r) − x(a)]/r.

Examples: (i) The function x(t) = t|t| lies in C1[−1, 1], so it's an arc (check this). However, x ∉ C2[−1, 1]: there's trouble at 0.

(ii) The function x(t) = √t does not lie in C1[0, 1]. (It's too steep at the left end.)

Objective Functional. A function L ∈ C1([a, b] × R × R) (“the Lagrangian”) is given. Typically L = L(t, x, v): t for “time”, x for “position”, v for “velocity”. The Lagrangian helps assign a number to every arc x in X, namely

Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt.

A Minimization Problem. Given real numbers A, B, the basic problem in the Calculus of Variations is this:

minimize Λ[x] = ∫_a^b L(t, x(t), ẋ(t)) dt over x ∈ X, subject to x(a) = A, x(b) = B.

Shorthand:

min_{x ∈ C1[a,b]} { ∫_a^b L(t, x(t), ẋ(t)) dt : x(a) = A, x(b) = B }.   (P)

Example: Geodesics in the Plane. An arc defined on [a, b] has total length

s = ∫ ds = ∫_a^b √(1 + ẋ(t)²) dt.

So finding the shortest arc joining (a, A) to (b, B) is an instance of the basic problem in which L(t, x, v) = √(1 + v²).
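
For a quick numerical sanity check, here is a minimal Python sketch (our own, with arbitrarily chosen endpoints (0, 0) and (1, 1)) that evaluates the length functional for the straight line and for one perturbed admissible arc; the straight line comes out shorter, as the theory developed below will confirm.

    import numpy as np
    from scipy.integrate import quad

    a, b, A, B = 0.0, 1.0, 0.0, 1.0          # assumed endpoints (illustrative only)

    def length(xdot):
        """s = integral_a^b sqrt(1 + xdot(t)^2) dt for an arc with derivative xdot."""
        return quad(lambda t: np.sqrt(1.0 + xdot(t)**2), a, b)[0]

    line_dot = lambda t: (B - A)/(b - a)                           # straight line from (a,A) to (b,B)
    bump_dot = lambda t: line_dot(t) + 0.3*np.pi*np.cos(np.pi*t)   # derivative of line + 0.3*sin(pi*t)

    print(length(line_dot))   # sqrt(2) ~ 1.4142
    print(length(bump_dot))   # strictly larger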

Example: Soap Film in Zero-Gravity. Wire rings of radii A > 0 and B > 0 are perpendicular to an axis through both their centres; the centres are 1 unit apart. A soap film stretches between them, forming a surface of revolution about the common axis. Surface tension acts to minimize the area of that surface, which we can calculate: an infinitesimal ring at position t with horizontal slice dt has slant length ds = √(dt² + dx²) = √(1 + ẋ(t)²) dt and perimeter 2πx(t), hence area

dS = 2πx(t)√(1 + ẋ(t)²) dt.

Total area is the “sum” of these contributions, i.e.,

S = ∫ dS = ∫_0^1 2πx(t)√(1 + ẋ(t)²) dt.

The problem of minimizing S subject to x(0) = A, x(1) = B fits the pattern above, with [a, b] = [0, 1] and

L(t, x, v) = 2πx√(1 + v²).

Example: Brachistochrone. The birth announcement of our subject came just over 310 years ago:

“I, Johann Bernoulli, greet the most clever mathematicians in the world. Nothing is more attractive to intelligent people than an honest, challenging problem whose possible solution will bestow fame and remain as a lasting monument. Following the example set by Pascal, Fermat, etc., I hope to earn the gratitude of the entire scientific community by placing before the finest mathematicians of our time a problem which will test their methods and the strength of their intellect. If someone communicates to me the solution of the proposed problem, I shall then publicly declare him worthy of praise.” (Groningen, 1 January 1697)

Here is a statement of Bernoulli's problem in modern terms: Given two points α and β in a vertical plane, find the curve joining α to β down which a bead, sliding from rest without friction, will fall in least time. The Greek words for “least” and “time” give the unknown curve its impressive title: the brachistochrone.

To set up, install a Cartesian coordinate system with its origin at the point α and the y-axis pointing downward. Then B ≥ 0, for the bead to “fall”.

Now speed is the rate of change of distance relative to time: v = ds/dt. Along a curve in the (x, y)-plane, ds = √((dx)² + (dy)²) = √(1 + (y′(x))²) dx, so the infinitesimal time taken to travel along the segment of curve corresponding to a horizontal distance dx is

dt = ds/v = √(1 + (y′(x))²) dx / v.

Here v is the speed of the bead, given from conservation of energy as

PE + KE = const., i.e., −mgy + (1/2)mv² = (1/2)mv0².

(Here v0 is the bead's initial velocity: v0 ≥ 0 seems reasonable.) This gives v = √(v0² + 2gy), leading to

dt = [√(1 + (y′(x))²) / √(v0² + 2gy(x))] dx.

The total travel time is then

T = ∫ dt = ∫_{x=0}^{b} √(1 + (y′(x))²) / √(v0² + 2gy(x)) dx.

The problem of minimum fall time fits our pattern, with integration interval [0, b], endpoint values A = 0 and B as specified above, and integrand

L(x, y, w) = √(1 + w²) / √(v0² + 2gy).

Or, after changing to standard letter-names (spoiling the meaning completely!),

L(t, x, v) = √(1 + v²) / √(v0² + 2gx).

Comparing this integrand to the one for minimum distance makes a straight line seem unlikely to provide the minimum. More details later.

Analogy/Crystal Ball. Calculus deals with minimization at every level. For unconstrained local minima, the only possible minimizers are critical points:

• When solving min_{x∈R} f(x), solve f′(x) = 0 . . . an algebraic equation for the unknown number x.

• When solving min_{x∈Rⁿ} F(x), solve ∇F(x) = 0 . . . a system of n algebraic equations for the unknown vector x.

• When solving min_{x∈C1[a,b]} Λ(x), solve DΛ[x] = 0 . . . a differential equation for the unknown arc x.

Before deriving that differential equation, it will be instructive to try some ad-hoc methods directly on the given problem.

A′. Ad-hoc Methods

Let a pair of endpoints (a, A), (b, B) with a < b and a smooth function L = L(t, x, v) be given. We call any piecewise-smooth function defined on [a, b] an “arc”, and distinguish two special families of arcs:

• an arc x: [a, b] → R is “admissible” if x(a) = A and x(b) = B;

• an arc h: [a, b] → R is “a variation” if h(a) = 0 and h(b) = 0.

Our ultimate goal is to minimize

Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt

among all admissible arcs x. In this “basic problem”, a simple admissible arc is always available, namely the straight line between the given endpoints:

x0(t) = A + [(t − a)/(b − a)][B − A], a ≤ t ≤ b.

The number Λ[x0] provides a reference value for the cost we hope to minimize: clearly the minimum value must be less than or equal to Λ[x0]. The arc x0 also provides a reference input for our problem, and we can search for preferable inputs by making wise adjustments to x0. For any specific variation h, consider the one-parameter family of arcs

xλ(t) = x0(t) + λh(t), a ≤ t ≤ b; λ ∈ R.

Each of these arcs is admissible; the sign and magnitude of the parameter λ determine the strength by which a perturbation of “shape” h is added to the reference shape x0. To assess how much improvement from the reference value we can achieve, we define the function

φ(λ) := Λ[xλ] = ∫_a^b L(t, xλ(t), ẋλ(t)) dt.

Here φ(0) = Λ[x0] is our reference value, and if φ′(0) < 0 we know that φ(λ) < φ(0) for small λ > 0: that is, a small perturbation of shape h will improve our reference arc. (If φ′(0) > 0, then φ(λ) < φ(0) for small λ < 0, so an improvement is still possible; it is obtained by adding a small positive multiple of −h to x0.) To get the most possible improvement out of this idea, we could somehow solve the single-variable problem of choosing the scalar λ∗ that minimizes the function φ. The resulting perturbed arc xλ∗ is certain to be preferable to x0 whenever φ′(0) ≠ 0. In some practical problems, good choices of the reference arc x0 and the variation h might make xλ∗ a usable improvement. Alternatively, one could build an iterative-improvement scheme by declaring xλ∗ as the new reference arc (change its name to x0), choosing a new variation, and repeating the process above to generate further improvements. (Note: This approach is both conceptually attractive and technically feasible, but it is neither efficient nor effective. Training computers to find approximate solutions for the basic problem is an ongoing area of research, and the best known methods are quite different from the one outlined above.)

Example (Soap Film). When L(t, x, v) = x√(1 + v²), (a, A) = (0, 1), and (b, B) = (1, 2), the unique admissible linear arc is

x0(t) = 1 + t, 0 ≤ t ≤ 1.

Calculation gives

Λ[x0] = ∫_0^1 x0(t)√(1 + ẋ0(t)²) dt = ∫_0^1 (1 + t)√2 dt = 3√2/2 ≈ 2.1213.

Selecting the variation h(t) = sin(πt) leads to

xλ(t) = 1 + t + λ sin(πt), ẋλ(t) = 1 + πλ cos(πt),

φ(λ) = ∫_0^1 (1 + t + λ sin(πt)) √(1 + (1 + πλ cos(πt))²) dt.

The computer-algebra system “Maple” suggests that the choice λ ≈ −0.1886 minimizes φ(λ), with

φ(−0.1886 . . .) ≈ 2.0796.

The arc x_{−0.1886} has a Λ-value that is 1.96% lower than Λ[x0].

Selecting the variation h(t) = t(1 − t) instead leads to

xλ(t) = 1 + t + λt(1 − t), ẋλ(t) = 1 + λ(1 − 2t),

φ(λ) = ∫_0^1 (1 + t + λt(1 − t)) √(1 + (1 + λ(1 − 2t))²) dt.

Numerical minimization using “Maple” suggests that λ ≈ −0.7252 minimizes φ(λ), with

φ(−0.725 . . .) ≈ 2.0792.

The arc x_{−0.725} produces a Λ-value that is 1.99% lower than Λ[x0].

Notice that the definitions of φ, xλ, and the minimizing λ-value all depend implicitly on the chosen variation h. Some variations produce better results than others.
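
For readers who prefer a scripted check, the following minimal Python sketch (using scipy in place of Maple; the helper names are ours) reproduces the one-parameter search for both variations above.

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import minimize_scalar

    def phi(lam, h, hdot):
        """phi(lam) = Lambda[x0 + lam*h] for L = x*sqrt(1 + v^2),
        reference arc x0(t) = 1 + t on [0, 1]."""
        integrand = lambda t: (1 + t + lam*h(t)) * np.sqrt(1 + (1 + lam*hdot(t))**2)
        return quad(integrand, 0.0, 1.0)[0]

    variations = {
        "sin(pi t)": (lambda t: np.sin(np.pi*t), lambda t: np.pi*np.cos(np.pi*t)),
        "t(1 - t)":  (lambda t: t*(1 - t),       lambda t: 1 - 2*t),
    }

    for name, (h, hdot) in variations.items():
        res = minimize_scalar(phi, args=(h, hdot), bounds=(-2, 2), method="bounded")
        print(name, res.x, res.fun)   # roughly (-0.1886, 2.0796) and (-0.7252, 2.0792)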

Theoretical methods we will develop below reveal that the minimizing arc in this problem is

x(t) = cosh(α + t cosh α)/cosh α, where α ≈ 0.323074.

The minimum value is Λ[x] ≈ 2.0788, only 2.00% lower than Λ[x0].

Here are two plots to illustrate these results. The first shows the four admissible arcs generated above: the straight line in blue, the trigonometric perturbation in red, the quadratic perturbation in green, and the true minimizer in black. The nonlinear arcs are so close together that they look identical when printed. To display their differences, the second plot shows the differences x − xopt for the trigonometric perturbation x in red, the quadratic perturbation x in green, and the true minimizer x = xopt in black. (So the black line coincides with the t-axis.)

[Two plots over 0 ≤ t ≤ 1: the four admissible arcs (values between 1 and 2.5), and the differences x − xopt (values of order 10⁻³).]

////

Descent Directions. Every time we choose a reference arc x0 and a variation h, we can define the single-variable function

φ(λ) := Λ[x0 + λh].

For small λ, the linear approximation

φ(λ) = φ(0) + φ′(0)λ + o(λ)

reveals the scalar φ′(0) as the rate of change of the objective value Λ with respect to a variation in direction h, locally near the reference arc x0. It seems natural to say that h provides a “descent direction for Λ at x0” when φ′(0) < 0. Let's calculate

φ′(0), holding tight to a single fixed variation h throughout.

φ′(0) = lim_{λ→0} [φ(λ) − φ(0)]/λ

 = lim_{λ→0} ∫_a^b [L(t, x0(t) + λh(t), ẋ0(t) + λḣ(t)) − L(t, x0(t), ẋ0(t))]/λ dt

 = ∫_a^b lim_{λ→0} [L(t, x0(t) + λh(t), ẋ0(t) + λḣ(t)) − L(t, x0(t), ẋ0(t))]/λ dt

 = ∫_a^b [(d/dλ) L(t, x0(t) + λh(t), ẋ0(t) + λḣ(t))]_{λ=0} dt

 = ∫_a^b [Lx(t, x0(t), ẋ0(t)) h(t) + Lv(t, x0(t), ẋ0(t)) ḣ(t)] dt

 = ∫_a^b (Lx(t, x0(t), ẋ0(t)) − (d/dt) Lv(t, x0(t), ẋ0(t))) h(t) dt   (integration by parts, using h(a) = 0 = h(b)).

In summary, we have

φ′(0) = ∫_a^b R(t)h(t) dt, where R(t) = Lx(t, x0(t), ẋ0(t)) − (d/dt) Lv(t, x0(t), ẋ0(t)).   (∗∗)

A key observation: φ and φ′(0) depend on our choice of h, but R does not.

Formula (∗∗) is valid for any smooth admissible x0 and variation h, and is full of useful information.

Viewpoint 1. For an admissible arc x0, we can use (∗∗) to guide a search for descent directions. It suffices to choose a variation h whose pointwise product with R is large and negative. The choice h = −R is particularly tempting, but the requirement that h(a) = 0 = h(b) sometimes requires a modification of this selection.

Example. Suppose (a, A) = (0, 0), (b, B) = (π/2, π/2), and L(t, x, v) = v² − x². Consider the linear reference arc x0(t) = t. Its objective value is

Λ[x0] = ∫_0^{π/2} (ẋ0(t)² − x0(t)²) dt = ∫_0^{π/2} (1 − t²) dt = π/2 − (1/3)(π/2)³ ≈ 0.2789.

To improve on this, calculate

Lx(t, x, v) = −2x, Lv(t, x, v) = 2v,
so Lx(t, x0(t), ẋ0(t)) = −2t, Lv(t, x0(t), ẋ0(t)) = 2.

We get R(t) = [−2t] − (d/dt)[2] = −2t. Here −R(t) = 2t does not vanish at both endpoints, but it is positive everywhere, so we try a variation that is positive everywhere:

h(t) = sin(2t).

Calculation gives

φ(λ) = ∫_0^{π/2} ([1 + 2λ cos(2t)]² − [t + λ sin(2t)]²) dt
     = ∫_0^{π/2} ([1 + 4λ cos(2t) + 4λ² cos²(2t)] − [t² + 2λt sin(2t) + λ² sin²(2t)]) dt
     = (3π/4)λ² − (π/2)λ + π/2 − π³/24.

This is minimized when λ = 1/3, and the minimum value provides a 94% discount from the reference value Λ[x0]:

Λ[x_{1/3}] = φ(1/3) = 5π/12 − π³/24 ≈ 0.01707.
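
As a quick symbolic cross-check (a minimal sympy sketch of ours, not part of the notes), one can reproduce the closed form of φ(λ) and its minimizer:

    import sympy as sp

    t, lam = sp.symbols('t lambda', real=True)

    # phi(lambda) = Lambda[x0 + lambda*h] for L = v^2 - x^2, x0(t) = t, h(t) = sin(2t)
    x = t + lam*sp.sin(2*t)
    v = sp.diff(x, t)
    phi = sp.integrate(v**2 - x**2, (t, 0, sp.pi/2))

    print(sp.simplify(phi))                   # equals 3*pi*lam**2/4 - pi*lam/2 + pi/2 - pi**3/24
    lam_star = sp.solve(sp.diff(phi, lam), lam)[0]
    print(lam_star, sp.simplify(phi.subs(lam, lam_star)))   # 1/3 and 5*pi/12 - pi**3/24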

Viewpoint 2. If our minimization problem has a solution, we don't need to know it in detail to assign it the name “x0”. If the solution happens to be smooth, then the derivation above applies and conclusion (∗∗) is available. But now the minimality property of x0 makes it impossible to improve upon: we must have φ′(0) = 0 for every possible variation h, i.e.,

0 = ∫_a^b R(t)h(t) dt for every h: [a, b] → R obeying h(a) = 0 = h(b). (†)

This forces R(t) = 0 for all t. To see why any other outcome is impossible, imagine that some θ ∈ (a, b) makes R(θ) < 0. Our smoothness hypotheses guarantee that R is continuous on [a, b]. Recall

R(t) = Lx(t, x0(t), ẋ0(t)) − (d/dt)[Lv(t, x0(t), ẋ0(t))], t ∈ [a, b].

So if R(θ) < 0, there must be some open interval (α, β) containing θ such that [α, β] ⊆ [a, b] and R(t) < 0 for all t in (α, β). The variation

h(t) = 1 − cos(2π[(t − α)/(β − α)]) for α ≤ t ≤ β, and h(t) = 0 otherwise,

is continuously differentiable, everywhere nonnegative, and has h(t) > 0 if and only if t ∈ (α, β). Using this variation in (∗∗) would give

φ′(0) = ∫_α^β R(t)h(t) dt < 0.

This contradicts (†). We have shown that if x0 is a smooth minimizer, then R cannot take on any negative values. Positive R-values are impossible for similar reasons. Our conclusion, R(t) = 0 for all t ∈ [a, b], is usually written as

Lx(t, x0(t), ẋ0(t)) = (d/dt)[Lv(t, x0(t), ẋ0(t))], t ∈ [a, b]. (DEL)

This is the renowned Euler-Lagrange Equation (“EL”) in differentiated form (“D”, hence “DEL”). If a smooth arc x0 gives the minimum in the basic problem, it must obey (DEL). Solutions of (DEL) are called extremal arcs.

Example. When L(t, x, v) = v² − x², we have Lx(t, x, v) = −2x and Lv(t, x, v) = 2v, so equation (DEL) for an unknown arc x(·) says

−2x(t) = (d/dt)[2ẋ(t)], i.e., ẍ(t) + x(t) = 0.

A complete list of smooth solutions for this equation is

x(t) = c1 cos(t) + c2 sin(t), c1, c2 ∈ R.

In a previous example, the prescribed endpoints were (a, A) = (0, 0) and (b, B) = (π/2, π/2). Only one solution joins these points: substitution gives

0 = x(0) = c1, π/2 = c2 sin(π/2) = c2,

so x(t) = (π/2) sin(t) is the only smooth contender for optimality in the corresponding problem. Its objective value is

Λ[(π/2) sin] = (π/2)² ∫_0^{π/2} (cos²t − sin²t) dt = (π²/4) ∫_0^{π/2} cos(2t) dt = (π²/8) sin(2t) |_{t=0}^{π/2} = 0.

////
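
Here is a minimal sympy sketch (our own check; the variable names are not from the notes) that solves (DEL) for this Lagrangian with the stated boundary conditions and confirms the objective value 0:

    import sympy as sp

    t = sp.symbols('t')
    x = sp.Function('x')

    # (DEL) for L = v^2 - x^2 reads d/dt(2*x') = -2*x, i.e. x'' + x = 0.
    sol = sp.dsolve(sp.Eq(x(t).diff(t, 2) + x(t), 0), x(t),
                    ics={x(0): 0, x(sp.pi/2): sp.pi/2})
    xbar = sol.rhs
    print(xbar)                                           # pi*sin(t)/2

    # objective value along the extremal
    Lam = sp.integrate(xbar.diff(t)**2 - xbar**2, (t, 0, sp.pi/2))
    print(sp.simplify(Lam))                               # 0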

Discussion. It seems natural to call R the residual in equation (DEL), and to record the following interpretation. A given arc x0 is extremal if and only if it makes R = 0. If x0 makes R(θ) < 0 at some instant θ, then perturbing x0 with a smooth upward bump centred at θ will give a preferable arc (i.e., an arc with lower Λ-value). A smooth downward bump is advantageous near any point where R(θ) > 0.

B. Abstract Minimization in Vector Spaces

Throughout this section, V is a real vector space, and Φ:V → R is given.

Derivatives. Suppose Φ: V → R is given. The directional derivative of Φ at base point x in direction h is this number (or “undefined”):

Φ′[x; h] := lim_{λ→0⁺} [Φ[x + λh] − Φ[x]]/λ.

Note: Φ′[x; 0] = 0, and for all r > 0,

Φ′[x; rh] = lim_{λ→0⁺} [Φ[x + λrh] − Φ[x]]/λ × (r/r) = r lim_{λ→0⁺} [Φ[x + (λr)h] − Φ[x]]/(λr) = r Φ′[x; h].

When x and Φ are such that Φ′[x; h] is defined for every h ∈ V, we say Φ is directionally differentiable at x, and define the derivative of Φ at x as the operator DΦ[x]: V → R for which

DΦ[x](h) = Φ′[x; h] ∀h ∈ V.

Descent. If Φ′[x; h] < 0, then h ≠ 0, and h provides a first-order descent direction for Φ at x. That is, for 0 < λ ≪ 1,

Φ′[x; h] ≈ [Φ[x + λh] − Φ[x]]/λ  =⇒  Φ[x + λh] ≈ Φ[x] + λΦ′[x; h] < Φ[x]. (∗)

Directional Local Minima. A point x in V provides a Directional Local Minimum (DLM) for Φ over V exactly when, for every h ∈ V, there exists ε = ε(h) > 0 so small that

∀λ ∈ (0, ε), Φ[x] ≤ Φ[x + λh].

Intuitively, x is a DLM for Φ if it provides an ordinary local minimum in the one-variable sense along every line through x in the space V.

Proposition. If x gives a DLM for Φ over V, then

∀h ∈ V, Φ′[x; h] ≥ 0 (or Φ′[x; h] is undefined). (∗∗)

In particular, if Φ is directionally differentiable at x and DΦ[x] is linear, then DΦ[x] = 0 (“the zero operator”).

Proof. If (∗∗) is false, then Φ′[x; h] < 0 for some h ∈ V, and the DLM definition is contradicted by (∗). So (∗∗) must hold. Now if DΦ[x] is linear, then for arbitrary h ∈ V two applications of (∗∗) give

DΦ[x](h) = Φ′[x; h] ≥ 0,
−DΦ[x](h) = DΦ[x](−h) = Φ′[x; −h] ≥ 0.

Thus 0 ≤ DΦ[x](h) ≤ 0, giving DΦ[x](h) = 0. Since this holds for arbitrary h ∈ V, DΦ[x] must be the zero operator. ////

Application. When V = Rⁿ and Φ: Rⁿ → R has a gradient at x ∈ Rⁿ, we calculate

Φ′[x; h] = lim_{λ→0⁺} [Φ[x + λh] − Φ[x]]/λ = lim_{λ→0⁺} [φ(λ) − φ(0)]/λ for φ(λ) := Φ[x + λh]
         = φ′(0) = ∇Φ(x)h.

Here DΦ[x] is the operator “left matrix multiplication by the 1 × n matrix ∇Φ(x)”. This is linear, so the proposition above gives a familiar result: if x minimizes Φ over Rⁿ, then ∇Φ(x) must be the zero vector.
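
A minimal numerical illustration (our own, with an arbitrary choice of Φ on R³): the one-sided difference quotient defining Φ′[x; h] converges to ∇Φ(x)h.

    import numpy as np

    def Phi(x):
        """An arbitrary smooth test function on R^3 (illustrative only)."""
        return np.sum(x**2) + np.sin(x[0])

    def grad_Phi(x):
        g = 2*x.copy()
        g[0] += np.cos(x[0])
        return g

    x = np.array([0.5, -1.0, 2.0])
    h = np.array([1.0, 2.0, -0.5])

    for lam in [1e-1, 1e-3, 1e-5]:
        dq = (Phi(x + lam*h) - Phi(x)) / lam       # difference quotient
        print(lam, dq, grad_Phi(x) @ h)            # dq approaches the gradient applied to h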

Affine Constraints. In many practical minimization problems, the domain of the objective function is confined to a proper subset of a real vector space X. The simplest such case arises when the subset S is affine, i.e., when

S = x0 + V = {x0 + h : h ∈ V}

for some element x0 ∈ X and subspace V of X.

Lemma. For S as above, one has S = z + V for every z ∈ S.

Proof. Since z ∈ S, there exists h0 ∈ V such that z = x0 + h0. Then

y ∈ S ⇔ y = x0 + h for some h ∈ V
      ⇔ y = (z − h0) + h for some h ∈ V
      ⇔ y = z + (h − h0) for some h ∈ V
      ⇔ y = z + H for some H ∈ V (H = h − h0). ////

Definition. Given a vector space X containing an element x0 and a subspace V, let S = x0 + V and suppose Φ: S → R. A point x ∈ S gives a directional local minimum (DLM) for Φ relative to V if and only if

∀h ∈ V, ∃ε = ε(h) > 0 : ∀λ ∈ [0, ε), Φ[x] ≤ Φ[x + λh].

Geometrically, x provides a DLM relative to V if x gives a local minimum in every one-dimensional problem obtained by restricting Φ to a line passing through x and parallel to V.

To harness our previous results, we translate (shift) our reference point from the origin of X to the point x, defining Ψ: V → R via

Ψ[h] := Φ[x + h], h ∈ V.

Note that

Ψ′[0; h] = lim_{λ→0⁺} [Ψ[λh] − Ψ[0]]/λ = lim_{λ→0⁺} [Φ[x + λh] − Φ[x]]/λ = Φ′[x; h].

If x ∈ S gives a DLM for Φ relative to V, then 0 gives a DLM for Ψ over V, and our previous result applies to Ψ, giving

∀h ∈ V, Ψ′[0; h] ≥ 0 (or Ψ′[0; h] is undefined).

Summary. If x ∈ S = x0 + V solves the constrained problem

min_{x∈X} {Φ[x] : x ∈ x0 + V},

then

∀h ∈ V, Φ′[x; h] ≥ 0 (or Φ′[x; h] is undefined).

If, in addition, Φ is differentiable on X and DΦ[x]: X → R is linear, then DΦ[x](h) = 0 for all h ∈ V.

Application. Consider X = Rⁿ. Given a nonzero normal vector N ∈ Rⁿ and a point x0 ∈ Rⁿ, consider the hyperplane

S = {x ∈ Rⁿ : N • (x − x0) = 0}.

Observe that x ∈ S iff x − x0 ∈ V, where

V = {h ∈ Rⁿ : N • h = 0}.

This V is a subspace of Rⁿ: it's the set of all vectors perpendicular to N. Now suppose some x̄ in Rⁿ minimizes Φ over S. We have DΦ[x̄](h) = 0 for all h ∈ V, i.e., ∇Φ(x̄)h = 0 for all h ∈ V, i.e., ∇Φ(x̄) ⊥ V, i.e., ∇Φ(x̄) = −λN for some λ ∈ R. In summary, if x̄ minimizes Φ: Rⁿ → R subject to the constraint N • (x − x0) = 0, then

0 = ∇[Φ(x) + λN • (x − x0)] |_{x=x̄}.

That's the Lagrange Multiplier Rule for the case of a single (linear) constraint!
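
A minimal sympy sketch (our own example: Φ(x) = |x|², with an assumed N and x0) confirming the multiplier rule for one linear constraint:

    import sympy as sp

    x1, x2, x3, lam = sp.symbols('x1 x2 x3 lambda', real=True)
    x  = sp.Matrix([x1, x2, x3])
    N  = sp.Matrix([1, 2, 2])                # assumed normal vector (illustrative)
    x0 = sp.Matrix([1, 0, 0])                # assumed base point

    Phi  = x.dot(x)                          # objective: squared Euclidean norm
    Lagr = Phi + lam * N.dot(x - x0)         # Phi(x) + lambda * N.(x - x0)

    eqs = [sp.diff(Lagr, v) for v in (x1, x2, x3)] + [N.dot(x - x0)]
    sol = sp.solve(eqs, [x1, x2, x3, lam], dict=True)[0]
    print(sol)    # minimizer: the projection of the origin onto the hyperplane
    # gradient condition grad(Phi) + lambda*N = 0 holds at the solution:
    print([sp.diff(Phi, v).subs(sol) + sol[lam]*Ni for v, Ni in zip((x1, x2, x3), N)])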

C. Derivatives of Integral Functionals

Our “basic problem” involves minimization of Λ: C1[a, b] → R defined by Λ[x] = ∫_a^b L(t, x(t), ẋ(t)) dt for some given Lagrangian L ∈ C1([a, b] × R × R). So we pick arbitrary x, h ∈ C1[a, b] and calculate

Λ′[x; h] = lim_{λ→0⁺} (1/λ)[Λ[x + λh] − Λ[x]]   (1)

 = lim_{λ→0⁺} (1/λ) ∫_a^b [L(t, x(t) + λh(t), ẋ(t) + λḣ(t)) − L(t, x(t), ẋ(t))] dt   (2)

 = ∫_a^b lim_{λ→0⁺} (1/λ) [L(t, x(t) + λh(t), ẋ(t) + λḣ(t)) − L(t, x(t), ẋ(t))] dt   (3)

 = ∫_a^b [(∂/∂λ) L(t, x(t) + λh(t), ẋ(t) + λḣ(t))]_{λ=0} dt   (4)

 = ∫_a^b [Lx(t, x(t), ẋ(t)) h(t) + Lv(t, x(t), ẋ(t)) ḣ(t)] dt.   (5)

Here line (1) is the definition of the directional derivative, and line (2) comes from the definition of Λ. Passing from (2) to (3) requires that we interchange the limit and the integral. In our case this is justified because the limit is approached uniformly in t, a consequence of L ∈ C1. See, e.g., Walter Rudin, Real and Complex Analysis, page 223. Existence of the derivative in (4) and its evaluation in (5) also follow from our assumption that L ∈ C1; the definition of this derivative allows it to be evaluated as shown inside the integral in (3).

Using the notation L(t) = L(t, x(t), ẋ(t)) and likewise defining Lx(t) and Lv(t), we summarize: if L ∈ C1 and x ∈ C1[a, b], then

Λ′[x; h] = ∫_a^b [Lx(t)h(t) + Lv(t)ḣ(t)] dt ∀h ∈ C1[a, b]. (∗)

Integration by parts gives an equivalent expression. Let q(t) = ∫_a^t Lx(r) dr. Then q(a) = 0 and

dq(t) = q̇(t) dt = Lx(t) dt ∀t ∈ [a, b].

(The latter follows from the Fundamental Theorem of Calculus, since Lx is continuous on [a, b].) Hence the first term on the right in (∗) is

∫_a^b Lx(t)h(t) dt = ∫_a^b h(t) dq = q(t)h(t) |_{t=a}^{b} − ∫_a^b q(t) dh(t) = q(b)h(b) − ∫_a^b q(t)ḣ(t) dt.

We conclude that for all h ∈ C1[a, b],

Λ′[x; h] = [∫_a^b Lx(t) dt] h(b) + ∫_a^b (Lv(t) − ∫_a^t Lx(r) dr) ḣ(t) dt. (∗∗)

Note that this expression is well-defined for all h ∈ C1[a, b], and linear in h.
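
The formulas above are easy to sanity-check numerically. Here is a minimal Python sketch (our own choices of L, x, and h, purely for illustration) comparing the difference quotient defining Λ′[x; h] with the right-hand side of (∗):

    import numpy as np
    from scipy.integrate import quad

    a, b = 0.0, 1.0
    L  = lambda t, x, v: v**2 - x**2             # assumed Lagrangian (illustrative)
    Lx = lambda t, x, v: -2*x
    Lv = lambda t, x, v: 2*v

    x, xd = (lambda t: t**2), (lambda t: 2*t)              # a sample arc and its derivative
    h, hd = (lambda t: np.sin(t)), (lambda t: np.cos(t))   # a sample direction

    Lam = lambda f, fd: quad(lambda t: L(t, f(t), fd(t)), a, b)[0]

    # right-hand side of (*): integral of Lx*h + Lv*hdot along the arc x
    rhs = quad(lambda t: Lx(t, x(t), xd(t))*h(t) + Lv(t, x(t), xd(t))*hd(t), a, b)[0]

    lam = 1e-6                                   # one-sided difference quotient
    lhs = (Lam(lambda t: x(t) + lam*h(t), lambda t: xd(t) + lam*hd(t)) - Lam(x, xd)) / lam
    print(lhs, rhs)                              # nearly equal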

D. Fundamental Lemma; Euler-Lagrange Equation

Given real numbers a < b, consider these subspaces of C1[a, b]:

VII = {h ∈ C1[a, b] : h(a) = 0, h(b) = 0},
VI0 = {h ∈ C1[a, b] : h(a) = 0},
V0I = {h ∈ C1[a, b] : h(b) = 0},
VMN = {h ∈ C1[a, b] : Mh(a) = 0, Nh(b) = 0},
V00 = C1[a, b].

Recall the basic problem of the COV:

min_{x∈C1[a,b]} { Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt : x(a) = A, x(b) = B }.

With X = C1[a, b], the arcs competing for minimality in (P) are those in the set

S = {x ∈ X : x(a) = A, x(b) = B}.

An arc is admissible for (P) if it lies in S. This set is affine: pick any x0 in S (like the straight-line curve x0(t) = A + (t − a)(B − A)/(b − a)) to see

S = x0 + VII.

[Proof: One has x ∈ S if and only if x − x0 ∈ VII. This is easy.] Our general discussion above shows that if some arc x in S gives a directional local minimum for Λ relative to VII, then every h ∈ VII must satisfy

0 = Λ′[x; h] = ∫_a^b N(t)ḣ(t) dt, where N(t) = Lv(t) − ∫_a^t Lx(r) dr.

This situation explains our interest in the Fundamental Lemma below.

Lemma (duBois-Reymond). If N: [a, b] → R is integrable, TFAE:

(a) ∫_a^b N(t)ḣ(t) dt = 0 for all h ∈ VII.

(b) The function N is essentially constant.

Proof. (b⇒a): If N(t) = c for all t in [a, b] (allowing finitely many exceptions), then each h ∈ VII obeys

∫_a^b cḣ(t) dt = ch(t) |_{t=a}^{b} = 0. (∗)

(a⇒b): Consider any piecewise continuous function N. Use the average value of N, namely the constant

N̄ = (1/(b − a)) ∫_a^b N(r) dr,

to define

h(t) = ∫_a^t (N̄ − N(r)) dr.

Obviously h ∈ C1[a, b] with h(a) = 0, but also (by definition of N̄)

h(b) = ∫_a^b (N̄ − N(r)) dr = (b − a)N̄ − ∫_a^b N(r) dr = 0.

Therefore h ∈ VII; in particular, using c = N̄ in line (∗) gives

∫_a^b N̄ ḣ(t) dt = 0.

It follows that

∫_a^b N(t)ḣ(t) dt = ∫_a^b (N(t) − N̄)ḣ(t) dt = ∫_a^b (N(t) − N̄)(N̄ − N(t)) dt = −∫_a^b (N(t) − N̄)² dt.

If (a) holds, this integral equals 0, and therefore N(t) = N̄ for all t in [a, b] (with at most finitely many exceptions). This proves (b). ////

Theorem (Euler-Lagrange Equation: Integral Form). If x is a directional local minimizer in the basic problem (P), then there is a constant c such that

Lv(t) = c + ∫_a^t Lx(r) dr ∀t ∈ [a, b]. (IEL)

Proof. As discussed above, directional local minimality implies that every h in VII obeys

0 = Λ′[x; h] = ∫_a^b N(t)ḣ(t) dt, where N(t) = Lv(t) − ∫_a^t Lx(r) dr.

Thanks to the Fundamental Lemma, it follows that N is essentially constant. But since both L and x are C1, the function N is continuous on [a, b], so (IEL) holds at all points without exception. ////

Definition. An arc that satisfies (IEL) on some interval [a, b] is called an extremal for L on that interval.

Remark. Note that (IEL) is the same for the function −L as it is for L, so it must hold also for a directional local maximizer in the Basic Problem. An extremal arc in the COV is analogous to a “critical point” in ordinary calculus: the set of extremal arcs includes every arc that provides a (directional) local minimum or maximum, and possibly some arcs that provide neither.

Remark. For any admissible arc x that fails to satisfy (IEL), the continuous function N defined above will be nonconstant, and the proof of the Fundamental Lemma shows that

h(t) = ∫_a^t (N̄ − N(r)) dr, where N̄ = (1/(b − a)) ∫_a^b N(r) dr,

makes Λ′[x; h] < 0. That is, this h is a descent direction for Λ[·] relative to the nominal arc x. ////

Regularity Bonus. The integrand L(t, x, v) = (1/2)v², for which Lv(t, x, v) = v, gives Lv(t, x(t), ẋ(t)) = ẋ(t). So for typical arcs x ∈ C1[a, b], we can be sure that the mapping t ↦ Lv(t, x(t), ẋ(t)) is continuous, but perhaps no better. However, minimality exerts a little bias in favour of smoothness: in (IEL), the integrand on the RHS is continuous, so the right-hand side is a differentiable function of t. This means that t ↦ Lv(t) is guaranteed to be differentiable, with

(d/dt) Lv(t) = Lx(t) ∀t ∈ [a, b]. (DEL)

In fact, since the right side here is continuous, we have the following.

Corollary. If x gives a directional local minimum in problem (P), then Lv is continuously differentiable, and (DEL) holds.

We discuss consequences of this in the next section.

Example. Find candidates for minimality in (P) with L(t, x, v) = v² − x² and (a, A) = (0, 0), in cases

(i) (b1, B1) = (π/2, 1),

(ii) (b2, B2) = (3π/2, 1).

Note Lv(t, x, v) = 2v and Lx(t, x, v) = −2x.

If x minimizes Λ among all C1 curves from (0, 0) to (b1, B1), then (DEL) says that the derivative

(d/dt)(2ẋ(t)) exists and equals −2x(t) for all t.

That is, ẍ(t) = −x(t) for all t. This shows x ∈ C2, and gives the general solution

x(t) = c1 cos(t) + c2 sin(t), c1, c2 ∈ R.

(i) Here the boundary conditions give c1 = 0, c2 = 1. Unique candidate: x1(t) = sin(t). Later we'll show that x1 gives a true [global] minimum:

Λ1[x1] = min { Λ1[x] = ∫_0^{π/2} (ẋ(t)² − x(t)²) dt : x(0) = 0, x(π/2) = 1 }.

(ii) Here the BC's identify the unique candidate x2(t) = −sin(t). Later we'll show that this does not give even a directional local minimum; moreover,

inf { Λ2[x] = ∫_0^{3π/2} (ẋ(t)² − x(t)²) dt : x(0) = 0, x(3π/2) = 1 } = −∞.

////
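
The claim in (ii) is easy to illustrate numerically. In the following minimal Python sketch (our own construction), we perturb x2 by the admissible variation h(t) = sin(2t/3), which vanishes at both endpoints, and watch Λ2 decrease without bound as the perturbation grows:

    import numpy as np
    from scipy.integrate import quad

    b = 3*np.pi/2

    def Lam2(x, xdot):
        """Lambda_2[x] = integral over [0, 3*pi/2] of (xdot^2 - x^2)."""
        return quad(lambda t: xdot(t)**2 - x(t)**2, 0.0, b)[0]

    x2, x2dot = (lambda t: -np.sin(t)), (lambda t: -np.cos(t))                # the extremal in case (ii)
    h,  hdot  = (lambda t: np.sin(2*t/3)), (lambda t: (2/3)*np.cos(2*t/3))    # h(0) = h(3*pi/2) = 0

    for lam in [0, 1, 10, 100]:
        val = Lam2(lambda t: x2(t) + lam*h(t), lambda t: x2dot(t) + lam*hdot(t))
        print(lam, val)     # values drop roughly like -(5*pi/12)*lam**2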

Special Case 1: L = L(t, v) is independent of x.

Here (IEL) reduces to a first-order ODE for x, involving an unknown constant:

Lv(t) = const.

Consider these subcases, where L = L(v) is also independent of t:

L = v², L = √(1 + v²), L = ([v² − 1]+)².

In these three cases, every admissible extremal x̄ is a global minimizer. To see this, let c = Lv(ẋ̄(t)) (a constant, by (IEL)) and define

f(v) = L(v) − cv.

Then f′(v) = Lv(v) − c is nondecreasing, with f′(ẋ̄(t)) = 0, so that point gives a global minimum. In other words,

f(v) ≥ f(ẋ̄(t)) ∀v ∈ R, ∀t ∈ [a, b]. (∗)

Now every arc x obeying the BC's has ∫_a^b cẋ(t) dt = c[x(b) − x(a)] = c[B − A], so

∫_a^b f(ẋ(t)) dt ≥ ∫_a^b f(ẋ̄(t)) dt,
∫_a^b L(ẋ(t)) dt − c[B − A] ≥ ∫_a^b L(ẋ̄(t)) dt − c[B − A],
Λ[x] ≥ Λ[x̄].

(For L = √(1 + v²), this proves that the arc of shortest length from (a, A) to (b, B) is the straight line. The technical definition of the term “arc” here leaves room for some improvement in this well-known conclusion.)

An Optimistic Calculation: Suppose x solves (IEL) and is in fact C2. (See “Regularity Bonus” above, and Section D below.) Then the Chain Rule and (DEL) together give

(d/dt) L(t, x(t), ẋ(t)) = Lt(t) + Lx(t)ẋ(t) + Lv(t)ẍ(t)
                        = Lt(t) + (d/dt)[Lv(t)ẋ(t)]   by (DEL).

We rearrange this to get

(d/dt)[L(t) − Lv(t)ẋ(t)] = Lt(t) ∀t ∈ [a, b]. (WE2)

Special Case 2: L = L(x, v) is independent of t (“autonomous”). For every extremal x of class C2, (WE2) implies that

L(t) − Lv(t)ẋ(t) = C ∀t ∈ [a, b],

for some constant C. In other words, the following function of 2 variables is constant along every C2 arc solving (IEL):

L(x, v) − Lv(x, v) · v.

In Physics, a famous Lagrangian is L(x, v) = (1/2)mv² − (1/2)kx² (that's KE minus PE for a simple mass-spring system). Here Lv = mv, and the function above works out to

((1/2)mv² − (1/2)kx²) − (mv)v = −((1/2)mv² + (1/2)kx²),

the negative of the total energy. For other Lagrangians, condition (WE2) expresses conservation of energy along real motions. ////
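
A small sympy check (ours, not part of the notes) that the quantity predicted by (WE2) really is constant along solutions of (DEL) for the mass-spring Lagrangian:

    import sympy as sp

    t = sp.symbols('t')
    m, k = sp.symbols('m k', positive=True)
    c1, c2 = sp.symbols('c1 c2', real=True)
    X, V = sp.symbols('X V')                 # placeholders for (x, v)

    L = m*V**2/2 - k*X**2/2                  # mass-spring Lagrangian
    E = L - sp.diff(L, V)*V                  # the quantity L - Lv*v from (WE2)

    # general solution of (DEL):  m*x'' = -k*x
    w = sp.sqrt(k/m)
    x = c1*sp.cos(w*t) + c2*sp.sin(w*t)
    v = sp.diff(x, t)

    along = E.subs({X: x, V: v})             # evaluate the quantity along the solution
    print(sp.simplify(sp.diff(along, t)))    # 0: constant in t
    print(sp.simplify(along))                # -k*(c1**2 + c2**2)/2, i.e. minus the total energy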

Caution. (WE2) and (IEL) are not quite equivalent, even for very smooth arcs. The Lagrangian L(x, v) = (1/2)mv² − (1/2)kx² illustrates this. As shown above, (WE2) holds along an arc x if and only if

(1/2)mẋ(t)² + (1/2)kx(t)² = const,

and this is true for any constant function x. However, (IEL) holds along x iff

mẍ(t) + kx(t) = 0,

and the only constant solution of this equation is x(t) = 0. The upshot: every smooth solution of (IEL) must obey (WE2), but (WE2) may have some spurious solutions as well. Home practice: Show that if x ∈ C2 obeys (WE2) and satisfies ẋ(t) ≠ 0 for all t, then x obeys (IEL) also. (Thus, many solutions of (WE2) also obey (IEL), and we can predict which they are.) ////

D. Smoothness of Extremals

Here we explore the “regularity bonus” mentioned above in detail. The situation is especially good when L is quadratic in v.

Proposition. Suppose

L(t, x, v) = (1/2)A(t, x)v² + B(t, x)v + C(t, x)

for C1 functions A, B, C. Then for any x obeying (IEL), with A(t, x(t)) ≠ 0 for all t ∈ [a, b], we have x ∈ C2[a, b].

Proof. Here Lv(t, x, v) = A(t, x)v + B(t, x). Along the arc x, this identity gives

ẋ(t) = [Lv(t) − B(t, x(t))] / A(t, x(t)), t ∈ [a, b].

The function of t on the RHS is C1, so ẋ ∈ C1[a, b], giving x ∈ C2[a, b]. ////

The previous result covers a surprising wealth of examples, but can still be generalized. Recall the quadratic approximation:

f(v) ≈ f(v̄) + ∇f(v̄)(v − v̄) + (1/2)(v − v̄)ᵀ D²f(v̄)(v − v̄), v ≈ v̄.

Use this on the function f(v) = L(t, x, v) near v̄ = ẋ(t):

L(t, x, v) ≈ L(t, x, v̄) + Lv(t, x, v̄)(v − v̄) + (1/2)Lvv(t, x, v̄)(v − v̄)².

Here the coefficient of (1/2)v² is A(t, x) = Lvv(t, x, v̄). So we might expect the proposition above to hold for general L, under the assumption that Lvv(t) ≠ 0 for all t ∈ [a, b]. This turns out to be correct, but the reasons are not simple.

Classic M100 problem: Assuming the relation

v³ − v − t = 0

defines v as a function of t near the point (t, v) = (0, 0), find dv/dt there.

Solution: Differentiation gives

3v² (dv/dt) − (dv/dt) − 1 = 0  =⇒  dv/dt = 1/(3v² − 1).

At the point (t, v) = (0, 0), substitution gives

dv/dt |_{(t,v)=(0,0)} = −1.
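
The same computation in sympy (a minimal sketch of ours, using the implicit-differentiation helper idiff):

    import sympy as sp

    t, v = sp.symbols('t v')

    dvdt = sp.idiff(v**3 - v - t, v, t)   # implicit derivative dv/dt from v^3 - v - t = 0
    print(sp.simplify(dvdt))              # 1/(3*v**2 - 1)
    print(dvdt.subs({t: 0, v: 0}))        # -1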

The curve t = v³ − v is easy to draw in the (t, v)-plane. As the following sketch shows, there are three points on the curve satisfying t = 0, and the calculation above finds the slope at just one of them:

Figure 1: The curve v³ − v − t = 0 in (t, v)-space. (The shaded rectangle, with t-range (α, β) and v-range (p, q), will be described later.)

Classic M200 reformulation: Assuming the relation F(t, v) = 0 defines v as a function of t near the point (t0, v0), find dv/dt at this point.

Solution:

0 = (d/dt) F(t, v(t)) = Ft(t, v(t)) + Fv(t, v(t)) (dv/dt)  =⇒  dv/dt = −Ft(t, v(t)) / Fv(t, v(t)).

Classic M321 theorem (Rudin, Principles, Thm. 9.28):

Implicit Function Theorem. An open set U ⊆ R² is given, along with (t0, v0) ∈ U and a function F: U → R (F = F(t, v)) such that both Ft and Fv exist and are continuous at each point of U. Suppose F(t0, v0) = 0. If Fv(t0, v0) ≠ 0, then there are open intervals (α, β) containing t0 and (p, q) containing v0 such that

(i) For each t ∈ (α, β), the equation F(t, v) = 0 holds for a unique point v ∈ (p, q).

(ii) If we write ψ(t) for the unique v in (i), so that F(t, ψ(t)) = 0 for all t ∈ (α, β), then ψ ∈ C1(α, β), with

ψ̇(t) = −Ft(t, ψ(t)) / Fv(t, ψ(t)) ∀t ∈ (α, β).

Illustrations. The equation z = F(t, v) defines a surface (the graph of F) in R³ lying above the open set U. The (t, v)-plane in R³ is defined by the equation z = 0.

This plane slices the graph of F in the same curve we see in the M100 example above.

Figure 2: The plane z = 0 slicing the surface z = F(t, v).

The Theorem presents conditions under which some open rectangle (α, β) × (p, q) centred at (t0, v0) contains a piece of the curve that coincides with the graph of a C1 function on (α, β). A rectangle consistent with this conclusion is shown in Figure 1.

We now prove the famous regularity theorem of Weierstrass/Hilbert.

Theorem (Weierstrass/Hilbert). Suppose L = L(t, x, v) is C2 on some open set containing the point (t0, x0, v0) ∈ [a, b] × R × R. Assume

Lvv(t0, x0, v0) ≠ 0.

Then any arc x obeying (IEL) on some interval I containing t0, and satisfying (t0, x(t0), ẋ(t0)) = (t0, x0, v0), must be C2 on some relatively open subinterval of I containing t0. In particular, if L ∈ C2 everywhere and x ∈ C1[a, b] is extremal for L, then x ∈ C2[a, b].

Proof. Since x is an extremal, there is a constant c so that the function

F(t, v) := Lv(t, x(t), v) − ∫_a^t Lx(r) dr − c

obeys F(t, ẋ(t)) = 0 for all t ∈ I. In particular, F(t0, v0) = 0. Now the function F is jointly C1 near (t0, v0), and

Fv(t0, v0) = Lvv(t0, x0, v0) ≠ 0

by assumption. Hence the implicit function theorem cited above gives an open interval (α, β) around t0 and another open set U around v0 such that the conditions

F(t, ψ(t)) = 0, ψ(t) ∈ U

implicitly define a unique ψ ∈ C1(α, β). Since ẋ is continuous at t0, with ẋ(t0) = v0 ∈ U, we may shrink (α, β) if necessary to guarantee that ẋ(t) ∈ U for all t ∈ I ∩ (α, β). We already know F(t, ẋ(t)) = 0 in I ∩ (α, β), so uniqueness gives ψ(t) = ẋ(t) for all t in this interval. But since ψ ∈ C1, this gives ẋ ∈ C1, i.e., x ∈ C2(I ∩ (α, β)). ////

E. Natural Boundary Conditions

Our abstract theory is easily extended to problems where one or both of the endpoint values are unconstrained. A missing boundary condition in the problem formulation allows a larger class of admissible variations, and this gives an extra conclusion in the first-order analysis. The extra conclusion is called a “Natural Boundary Condition” because it arises naturally through minimization, rather than artificially in the formulation of the problem.

Consider, for example, this problem where the right endpoint is unconstrained:

min { Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt : x(a) = A, x(b) ∈ R }.

For any arc x with x(a) = A, the arc x + h is admissible for every variation h ∈ VI0 = {h ∈ C1[a, b] : h(a) = 0}. If x happens to give a DLM relative to VI0, then every h ∈ VI0 obeys

0 = DΛ[x](h) = [∫_a^b Lx(t) dt] h(b) + ∫_a^b (Lv(t) − ∫_a^t Lx(r) dr) ḣ(t) dt. (∗)

Now VII ⊆ VI0, and knowing (∗) for h ∈ VII is enough to establish (IEL), as shown above: thus there exists some constant c such that

Lv(t) = c + ∫_a^t Lx(r) dr ∀t ∈ [a, b].

This equation is independent of h: it describes a property of x that can be used anywhere, including in (∗) above. Hence, for all h ∈ VI0,

0 = DΛ[x](h) = [Lv(b) − c] h(b) + ∫_a^b c ḣ(t) dt
             = [Lv(b) − c] h(b) + c [h(b) − h(a)]
             = Lv(b) h(b)   (using h(a) = 0).

Since h(b) is arbitrary, we get the natural boundary condition

Lv(b) = 0.

A similar argument, with a similar outcome, applies to problems where x(a) is free to vary in R, but x(b) = B is required. When both endpoints are free, both natural boundary conditions are in force. The table below summarizes these results:

BC's in Prob Stmt          Admissible Variations    Natural BC's
x(a) = A, x(b) = B         VII                      (none)
x(a) = A                   VI0                      Lv(b) = 0
x(b) = B                   V0I                      Lv(a) = 0
(none)                     V00 = C1[a, b]           Lv(a) = 0, Lv(b) = 0
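
To see a natural boundary condition in action, here is a minimal sympy sketch built on a model problem of our own choosing (L = v²/2 + x on [0, 1], x(0) = 0, x(1) free; none of this is from the notes). Here (DEL) gives ẍ = 1, and imposing Lv(1) = ẋ(1) = 0 singles out the minimizer.

    import sympy as sp

    t, eps = sp.symbols('t epsilon')
    x = sp.Function('x')

    # model problem: minimize integral_0^1 ( x'(t)**2/2 + x(t) ) dt,  x(0) = 0, x(1) free
    # (DEL): d/dt(x') = 1.  Natural BC: Lv(1) = x'(1) = 0.
    sol = sp.dsolve(sp.Eq(x(t).diff(t, 2), 1), x(t),
                    ics={x(0): 0, x(t).diff(t).subs(t, 1): 0})
    xbar = sol.rhs
    print(xbar)                                          # t**2/2 - t

    cost = lambda f: sp.integrate(sp.diff(f, t)**2/2 + f, (t, 0, 1))
    print(cost(xbar))                                    # -1/6

    # any admissible comparison arc costs at least as much, e.g. xbar + eps*t:
    print(sp.expand(cost(xbar + eps*t) - cost(xbar)))    # epsilon**2/2 >= 0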

Example. Consider the Brachistochrone Problem with a free right endpoint:

min { Λ[x] = ∫_0^b √(1 + ẋ(t)²) / √(v0² + 2gx(t)) dt : x(0) = 0 }.

Here Lv = v[(1 + v²)(v0² + 2gx)]^{−1/2}, and the natural boundary condition at t = b says a minimizing curve must obey

0 = Lv(b) = v[(1 + v²)(v0² + 2gx)]^{−1/2} |_{(x,v)=(x(b), ẋ(b))}.

It follows that ẋ(b) = 0: the minimizing curve must be horizontal at its right end.

[Troutman, page 156: “In 1696, Jakob Bernoulli publicly challenged his younger brother Johann to find the solutions to several problems in optimization including [this one] (thereby initiating a long, bitter, and pointless rivalry between two representatives of the best minds of their era).”] ////

HW02. Modify the derivation above to treat free-endpoint problems where the cost to be minimized includes endpoint terms, so it looks like this:

k(x(a)) + ℓ(x(b)) + ∫_a^b L(t, x(t), ẋ(t)) dt.

Future Considerations. Later, we'll study problems in which one or both endpoints are allowed to vary along given curves in the (t, x)-plane. The analysis above handles only rather special curves . . . namely, the vertical lines t = a and t = b.
