Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf ·...

38
Convexity Instructor: Taylor Berg-Kirkpatrick Slides: Sanjoy Dasgupta Course website: http://cseweb.ucsd.edu/classes/wi19/cse151-b/

Transcript of Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf ·...

Page 1: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Convexity

Instructor: Taylor Berg-KirkpatrickSlides: Sanjoy Dasgupta

Course website:http://cseweb.ucsd.edu/classes/wi19/cse151-b/

Page 2: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Convexity

ba

A function f : Rd → R is convex if for all a, b ∈ Rd and0 < θ < 1,

f (θa + (1− θ)b) ≤ θf (a) + (1− θ)f (b).

It is strictly convex if strict inequality holds for all a 6= b.

f is concave ⇔ −f is convex

Page 3: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Convexity

ba

A function f : Rd → R is convex if for all a, b ∈ Rd and0 < θ < 1,

f (θa + (1− θ)b) ≤ θf (a) + (1− θ)f (b).

It is strictly convex if strict inequality holds for all a 6= b.

f is concave ⇔ −f is convex

Page 4: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Convexity

ba

A function f : Rd → R is convex if for all a, b ∈ Rd and0 < θ < 1,

f (θa + (1− θ)b) ≤ θf (a) + (1− θ)f (b).

It is strictly convex if strict inequality holds for all a 6= b.

f is concave ⇔ −f is convex

Page 5: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Checking convexity

A function on one variable, f : R→ R, is convex if its secondderivative is ≥ 0 everywhere.

Example: f (z) = z2

Page 6: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Checking convexity

A function on one variable, f : R→ R, is convex if its secondderivative is ≥ 0 everywhere.

Example: f (z) = z2

Page 7: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

First and second derivatives of multivariate functions

For a function f : Rd → R,

• the first derivative is a vector with d entries:

∇f (z) =

∂f∂z1...∂f∂zd

• the second derivative is a d × d matrix, the Hessian H(z):

Hjk =∂2f

∂zj∂zk

Page 8: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Find the second derivative matrix of f (z) = ‖z‖2.

f (z) =d∑

i=1

z2i

∇f (z) =

2z1...

2zd

∇2f (z) = 2I

Page 9: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Find the second derivative matrix of f (z) = ‖z‖2.

f (z) =d∑

i=1

z2i

∇f (z) =

2z1...

2zd

∇2f (z) = 2I

Page 10: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Find the second derivative matrix of f (z) = ‖z‖2.

f (z) =d∑

i=1

z2i

∇f (z) =

2z1...

2zd

∇2f (z) = 2I

Page 11: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Find the second derivative matrix of f (z) = ‖z‖2.

f (z) =d∑

i=1

z2i

∇f (z) =

2z1...

2zd

∇2f (z) = 2I

Page 12: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Second-derivative test for convexity

A function f : Rd → R is convex if its matrix of second derivativesis positive semidefinite everywhere.

Recall: every square matrix M encodes a quadratic function:

x 7→ xTMx =d∑

i ,j=1

Mijxixj

(M is a d × d matrix and x is a vector in Rd)

Sometimes xTMx is always ≥ 0, no matter what x you plug in.

Page 13: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Second-derivative test for convexity

A function f : Rd → R is convex if its matrix of second derivativesis positive semidefinite everywhere.

Recall: every square matrix M encodes a quadratic function:

x 7→ xTMx =d∑

i ,j=1

Mijxixj

(M is a d × d matrix and x is a vector in Rd)

Sometimes xTMx is always ≥ 0, no matter what x you plug in.

Page 14: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A hierarchy of square matrices

Square

Positivesemidefinite

Positive definite

Symmetric

M 2 Rd⇥d

M = MT

zT Mz � 0 for all z 2 Rd

zT Mz > 0 for all z 6= 0

[

[

[

Page 15: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

PSD or not?

•(

1 11 1

)

(x1 x2

)(1 11 1

)(x1x2

)=(x1 x2

)(x1 + x2x1 + x2

)

= x21 + 2x1x2 + x22

= (x1 + x2)2

•(

1 22 1

)

(x1 x2

)(1 22 1

)(x1x2

)= x21 + 4x1x2 + x22

= (x1 + x2)2 + 2x1x2

Page 16: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

PSD or not?

•(

1 11 1

)

(x1 x2

)(1 11 1

)(x1x2

)=(x1 x2

)(x1 + x2x1 + x2

)

= x21 + 2x1x2 + x22

= (x1 + x2)2

•(

1 22 1

)

(x1 x2

)(1 22 1

)(x1x2

)= x21 + 4x1x2 + x22

= (x1 + x2)2 + 2x1x2

Page 17: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

PSD or not?

•(

1 11 1

)

(x1 x2

)(1 11 1

)(x1x2

)=(x1 x2

)(x1 + x2x1 + x2

)

= x21 + 2x1x2 + x22

= (x1 + x2)2

•(

1 22 1

)

(x1 x2

)(1 22 1

)(x1x2

)= x21 + 4x1x2 + x22

= (x1 + x2)2 + 2x1x2

Page 18: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

PSD or not?

•(

1 11 1

)

(x1 x2

)(1 11 1

)(x1x2

)=(x1 x2

)(x1 + x2x1 + x2

)

= x21 + 2x1x2 + x22

= (x1 + x2)2

•(

1 22 1

)

(x1 x2

)(1 22 1

)(x1x2

)= x21 + 4x1x2 + x22

= (x1 + x2)2 + 2x1x2

Page 19: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

When is a diagonal matrix PSD?

A =

a1a2

. . .

an

xTAx = a1x21 + a2x

22 + · · ·+ anx

2n

Page 20: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

When is a diagonal matrix PSD?

A =

a1a2

. . .

an

xTAx = a1x21 + a2x

22 + · · ·+ anx

2n

Page 21: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

If M,N are of the same size and PSD, must M + N be PSD?

xT (M + N)x = xTMx + xTNx

Page 22: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

A symmetric matrix M is positive semidefinite (psd) if:

zTMz ≥ 0 for all vectors z

If M,N are of the same size and PSD, must M + N be PSD?

xT (M + N)x = xTMx + xTNx

Page 23: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Checking if a matrix is PSD

A matrix M is PSD if and only if it can be written as M = UUT

for some matrix U.

Quick check: say U ∈ Rr×d and M = UUT .

1 M is square.

2 M is symmetric.

3 Pick any z ∈ Rr . Then

zTMz = zTUUT z = (zTU)(UT z)

= (UT z)T (UT z)

= ‖UT z‖2 ≥ 0.

Another useful fact: any covariance matrix is PSD.

Page 24: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Checking if a matrix is PSD

A matrix M is PSD if and only if it can be written as M = UUT

for some matrix U.

Quick check: say U ∈ Rr×d and M = UUT .

1 M is square.

2 M is symmetric.

3 Pick any z ∈ Rr . Then

zTMz = zTUUT z = (zTU)(UT z)

= (UT z)T (UT z)

= ‖UT z‖2 ≥ 0.

Another useful fact: any covariance matrix is PSD.

Page 25: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Checking if a matrix is PSD

A matrix M is PSD if and only if it can be written as M = UUT

for some matrix U.

Quick check: say U ∈ Rr×d and M = UUT .

1 M is square.

2 M is symmetric.

3 Pick any z ∈ Rr . Then

zTMz = zTUUT z = (zTU)(UT z)

= (UT z)T (UT z)

= ‖UT z‖2 ≥ 0.

Another useful fact: any covariance matrix is PSD.

Page 26: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Another useful fact: any covariance matrix is PSD.

Σ =1

N(X −M)T (X −M)

=

(1√N

(X −M)

)T ( 1√N

(X −M)

)

where M ∈ RN×d and

M =

µ...µ

Page 27: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Second-derivative test for convexity

A function (of several variables) is convex if its second-derivativematrix is positive semidefinite everywhere.

More formally:Suppose that for f : Rd → R, the second partial derivatives existeverywhere and are continuous functions of z . Then:

1 H(z) is a symmetric matrix

2 f is convex ⇔ H(z) is positive semidefinite for all z ∈ Rd

Page 28: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Is f (x) = ‖x‖2 convex?

• Recall:

∇2f (x) = 2I

• This is a diagonal matrix with all positive entries along thediagonal.

Page 29: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Is f (x) = ‖x‖2 convex?

• Recall:

∇2f (x) = 2I

• This is a diagonal matrix with all positive entries along thediagonal.

Page 30: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Example

Is f (x) = ‖x‖2 convex?

• Recall:

∇2f (x) = 2I

• This is a diagonal matrix with all positive entries along thediagonal.

Page 31: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Fix any vector u ∈ Rd . Is this function f : Rd → R convex?

f (z) = (u · z)2

∇f (z) = 2(u · z)u

∇2f (z) = 2

u21 u1u2 · · · u1udu1u2 u22 · · · u2ud

......

......

u1ud u2ud · · · u2d

= 2uuT

Page 32: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Fix any vector u ∈ Rd . Is this function f : Rd → R convex?

f (z) = (u · z)2

∇f (z) = 2(u · z)u

∇2f (z) = 2

u21 u1u2 · · · u1udu1u2 u22 · · · u2ud

......

......

u1ud u2ud · · · u2d

= 2uuT

Page 33: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Least-squares regression

Recall loss function: for data points (x (i), y (i)) ∈ Rd × R,

L(w) =n∑

i=1

(y (i) − (w · x (i)))2

Page 34: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Logistic regression

Recall loss function: for data (x (i), y (i)) ∈ Rd × {−1,+1},

L(w) =n∑

i=1

ln(1 + e−y (i)(w ·x(i)))

We earlier found the first derivative:

∂L

∂wj= −

n∑

i=1

y (i)x(i)j

1

1 + ey(i)(w ·x(i))

.

Page 35: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Logistic regression

Recall loss function: for data (x (i), y (i)) ∈ Rd × {−1,+1},

L(w) =n∑

i=1

ln(1 + e−y (i)(w ·x(i)))

We earlier found the first derivative:

∂L

∂wj= −

n∑

i=1

y (i)x(i)j

1

1 + ey(i)(w ·x(i))

.

Page 36: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Logistic regression, cont’d

Second derivative: the (j , k) entry of the Hessian H(w) is

∂L

∂wk∂wj=

n∑

i=1

(x(i)j x

(i)k

1

1 + ew ·x(i)1

1 + e−w ·x(i)

)

This is uj · uk , where vectors u1, . . . , ud ∈ Rn are defined as follows:

uj has ith coordinate x(i)j

√1

(1 + ew ·x(i))(1 + e−w ·x(i))

Therefore H(w) = UUT , where U is the matrix with rows uj .Convex!

Page 37: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Logistic regression, cont’d

Second derivative: the (j , k) entry of the Hessian H(w) is

∂L

∂wk∂wj=

n∑

i=1

(x(i)j x

(i)k

1

1 + ew ·x(i)1

1 + e−w ·x(i)

)

This is uj · uk , where vectors u1, . . . , ud ∈ Rn are defined as follows:

uj has ith coordinate x(i)j

√1

(1 + ew ·x(i))(1 + e−w ·x(i))

Therefore H(w) = UUT , where U is the matrix with rows uj .Convex!

Page 38: Convexity - cseweb.ucsd.educseweb.ucsd.edu/classes/fa19/cse151-a/convexity-annotated.pdf · Second-derivative test for convexity A function f : Rd!R is convex if its matrix of second

Logistic regression, cont’d

Second derivative: the (j , k) entry of the Hessian H(w) is

∂L

∂wk∂wj=

n∑

i=1

(x(i)j x

(i)k

1

1 + ew ·x(i)1

1 + e−w ·x(i)

)

This is uj · uk , where vectors u1, . . . , ud ∈ Rn are defined as follows:

uj has ith coordinate x(i)j

√1

(1 + ew ·x(i))(1 + e−w ·x(i))

Therefore H(w) = UUT , where U is the matrix with rows uj .Convex!