Screening Tests for the LASSO

Hanxiao Liu, hanxiaol@cs.cmu.edu

November 3, 2015

Outline

LASSO
  Primal & dual form
  Primal-dual correspondence

Safe Test for LASSO
  Static case (example: sphere test)
  Dynamic case

Better safe test based on duality gap
  Geometric illustration
  Convergence

Empirical Results

LASSO

$X \in \mathbb{R}^{n \times p}$ where $p \gg n$

$$\min_{\beta \in \mathbb{R}^p} \ \frac{1}{2}\|X\beta - y\|_2^2 + \lambda\|\beta\|_1 \tag{1}$$

Commonly used for high-dimensional feature selection

Today's topic: screening tests

Rules that discard irrelevant features before a LASSO solver starts, without affecting the final optimal solution.

The “chicken-and-egg problem”?

LASSO Dual

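Primal:

$$\min_{\beta \in \mathbb{R}^p,\, z \in \mathbb{R}^n} \ \frac{1}{2}\|y - z\|_2^2 + \lambda\|\beta\|_1 \quad \text{s.t. } z = X\beta \tag{2}$$

Dual:

$$\max_{u \in \mathbb{R}^n} \ \underbrace{\min_{\beta, z} \ \underbrace{\frac{1}{2}\|y - z\|_2^2 + \lambda\|\beta\|_1 + u^\top (z - X\beta)}_{\text{Lagrangian } L(\beta, z, u)}}_{\text{dual objective } g(u)} \tag{3}$$

$$\Longrightarrow \ \max_{u} \left[ \min_{z} \left( \frac{1}{2}\|y - z\|_2^2 + u^\top z \right) - \lambda \max_{\beta} \left( \frac{u^\top X}{\lambda}\beta - \|\beta\|_1 \right) \right] \tag{4}$$

$$\Longrightarrow \ \max_{u} \ \frac{1}{2}\|y\|_2^2 - \frac{1}{2}\|u - y\|_2^2 - \lambda\, I_{\left\{ \left\| X^\top u / \lambda \right\|_\infty \le 1 \right\}} \tag{5}$$

$$\overset{\theta = u/\lambda}{\Longrightarrow} \ \max_{\theta \in \mathbb{R}^n} \ \frac{1}{2}\|y\|_2^2 - \frac{\lambda^2}{2}\left\| \theta - \frac{y}{\lambda} \right\|_2^2 \quad \text{s.t. } \|X^\top \theta\|_\infty \le 1 \tag{6}$$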
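The step from (4) to (5) rests on two standard facts, sketched here for completeness. The shorthand $v := X^\top u / \lambda$ and the symbol $z^\star$ are introduced only for this note and are not part of the slides.

$$\min_{z} \left( \tfrac{1}{2}\|y - z\|_2^2 + u^\top z \right): \qquad z^\star = y - u, \qquad \text{value} = \tfrac{1}{2}\|u\|_2^2 + u^\top (y - u) = \tfrac{1}{2}\|y\|_2^2 - \tfrac{1}{2}\|u - y\|_2^2$$

$$\max_{\beta} \left( v^\top \beta - \|\beta\|_1 \right) = \begin{cases} 0 & \text{if } \|v\|_\infty \le 1 \\ +\infty & \text{otherwise} \end{cases}$$

The second line holds because $v^\top \beta - \|\beta\|_1 \le \sum_j (|v_j| - 1)\,|\beta_j|$: when every $|v_j| \le 1$ the maximum is attained at $\beta = 0$, and otherwise the objective grows without bound along a coordinate with $|v_j| > 1$. This is what turns the inner maximization into the constraint $\|X^\top u / \lambda\|_\infty \le 1$.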

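Karush-Kuhn-Tucker Condition

Primal-dual correspondence

$$\hat\beta, \hat z \in \arg\min_{\beta, z} L(\beta, z, \hat\theta) \tag{7}$$

$$0_p \in \partial_\beta L(\hat\beta, \hat z, \hat\theta) \tag{8}$$

$$= \partial_\beta \left[ \frac{1}{2}\|y - \hat z\|_2^2 + \lambda\|\hat\beta\|_1 + \lambda\hat\theta^\top (\hat z - X\hat\beta) \right] \tag{9}$$

$$= \lambda\, \partial_\beta \|\hat\beta\|_1 - \lambda X^\top \hat\theta \tag{10}$$

$$x_j^\top \hat\theta \in \partial_{\beta_j} |\hat\beta_j| = \begin{cases} \{\mathrm{sign}(\hat\beta_j)\} & \hat\beta_j \ne 0 \\ [-1, 1] & \hat\beta_j = 0 \end{cases} \qquad \forall j \in [p] \tag{11}$$

Key observation: $|x_j^\top \hat\theta| < 1 \implies \hat\beta_j = 0$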
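To make the key observation concrete, here is a minimal NumPy sketch on a toy problem whose solution is known in closed form. It assumes an identity design matrix, so that the LASSO solution is the soft-thresholding of y; that choice, and the names `soft_threshold`, `beta_hat`, `theta_hat` and `corr`, are illustrative assumptions rather than anything from the slides. The dual point is recovered from the relation λθ = y − Xβ at the optimum, which follows from stationarity of the Lagrangian in z.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Toy problem (assumption): identity design, so the LASSO solution is
# soft_threshold(X^T y, lam) in closed form.
X = np.eye(4)
y = np.array([3.0, -0.5, 1.2, -2.0])
lam = 1.0

beta_hat = soft_threshold(X.T @ y, lam)
theta_hat = (y - X @ beta_hat) / lam   # dual optimum from lam * theta = y - X beta
corr = X.T @ theta_hat                 # x_j^T theta_hat for every feature j

for j in range(X.shape[1]):
    print(f"j={j}  beta_hat={beta_hat[j]:+.2f}  x_j^T theta_hat={corr[j]:+.2f}")
```

On this data only the second feature has a correlation strictly inside (−1, 1), and its coefficient is zero; the three active features sit exactly on the boundary, as (11) requires.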

Safe Test

$|x_j^\top \hat\theta| < 1 \implies \hat\beta_j = 0$

Challenge: the dual solution $\hat\theta$ is unknown

Workaround: relaxation

Let $C$ be a set containing $\hat\theta$ and define $\mu_C(x_j) := \sup_{\theta \in C} |x_j^\top \theta|$. By construction, $|x_j^\top \hat\theta| \le \mu_C(x_j)$.

Safe Test

$$\mu_C(x_j) < 1 \implies \hat\beta_j = 0 \tag{12}$$

The test is useful when

1. $\mu_C(x_j)$ can be evaluated efficiently.
2. $C$ is small, which leads to a small $\mu_C(x_j)$.

Goal: Find $C$ satisfying both conditions above.

Sphere Tests

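Parametrize $C$ as a closed $\ell_2$-ball $B(c, r) = \{\theta : \|\theta - c\|_2 \le r\}$.

$$\mu_C(x_j) = \mu_{B(c,r)}(x_j) = \sup_{\theta \in B(c,r)} |x_j^\top \theta| = |c^\top x_j| + r\|x_j\|_2 \tag{13}$$

Q: How to find $B(c, r)$ that contains $\hat\theta$ without knowing $\hat\theta$?

A: Any dual-feasible $\theta'$ defines a ball containing $\hat\theta$:

$$\Big\| \hat\theta - \underbrace{\frac{y}{\lambda}}_{c} \Big\|_2 \le \underbrace{\Big\| \theta' - \frac{y}{\lambda} \Big\|_2}_{r} \tag{14}$$

Recall:

$$\max_{\theta \in \mathbb{R}^n} \ \frac{1}{2}\|y\|_2^2 - \frac{\lambda^2}{2}\left\| \theta - \frac{y}{\lambda} \right\|_2^2 \quad \text{s.t. } \underbrace{\|X^\top \theta\|_\infty \le 1}_{\theta \in \Delta_X}$$

A trivial feasible point: $\theta' = y / \lambda_{\max}$ where $\lambda_{\max} := \|X^\top y\|_\infty$.¹

$$\implies c = \frac{y}{\lambda}, \qquad r = \|y\|_2 \left| \frac{1}{\lambda} - \frac{1}{\lambda_{\max}} \right| \tag{15}$$

¹ In fact, $\theta'$ obtained in this manner is the dual solution when $\lambda = \lambda_{\max}$, corresponding to the all-zero primal solution.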
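The static sphere test of (13) and (15) fits in a few lines of NumPy. The sketch below is one possible rendering, not the author's code: the function name `static_sphere_test` and the toy identity design are assumptions chosen so that the result can be checked by hand.

```python
import numpy as np

def static_sphere_test(X, y, lam):
    """Return a boolean mask; True marks features the sphere test discards.

    Uses the ball of equation (15): center y/lam, radius
    ||y||_2 * |1/lam - 1/lam_max|, together with the bound (13).
    """
    lam_max = np.max(np.abs(X.T @ y))        # ||X^T y||_inf
    center = y / lam
    radius = np.linalg.norm(y) * abs(1.0 / lam - 1.0 / lam_max)
    col_norms = np.linalg.norm(X, axis=0)    # ||x_j||_2 for each column
    mu = np.abs(X.T @ center) + radius * col_norms
    return mu < 1.0

# Toy data (assumption): identity design, so everything can be verified by hand.
X = np.eye(4)
y = np.array([3.0, -0.5, 1.2, -2.0])
discard = static_sphere_test(X, y, lam=2.5)
print(discard)
```

For this data the exact solution at λ = 2.5 is (0.5, 0, 0, 0). The test discards the second and third features but keeps the fourth, even though its coefficient is also zero: a safe test can be conservative, but it never removes an active feature.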

Dynamic Safe Tests

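Apply safe tests iteratively as the algorithm proceeds

Recall: every $\theta' \in \Delta_X$ defines an $\ell_2$-ball containing $\hat\theta$

Let $\theta_k \in \Delta_X$ be a dual-feasible point at iteration $k$; $\{\theta_k\}_{k \in \mathbb{N}}$ defines a sequence of balls $\left\{ B\left( \frac{y}{\lambda}, \left\| \theta_k - \frac{y}{\lambda} \right\|_2 \right) \right\}_{k \in \mathbb{N}}$

Each ball defines a safe test

How is $\theta_k$ defined via $\beta_k$?

$$\theta_k := \Pi_{\Delta_X \cap \mathrm{span}(\rho_k)}\left( \frac{y}{\lambda} \right) \quad \text{where } \rho_k := y - X\beta_k$$

Intuition

1. Dual optimality: $\hat\theta = \Pi_{\Delta_X}\left( \frac{y}{\lambda} \right)$
2. Primal-dual correspondence: $\hat\theta \in \mathrm{span}(\hat\rho)$ where $\hat\rho := y - X\hat\beta$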
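The projection onto Δ_X ∩ span(ρ_k) is a one-dimensional problem, which the following sketch works out. Points on the line are αρ_k; the unconstrained least-squares coefficient is α = yᵀρ_k / (λ‖ρ_k‖²), and feasibility requires |α| ≤ 1/‖Xᵀρ_k‖∞, so the projection is a clip. The function name `dual_point_from_residual` and the toy data are assumptions made for illustration.

```python
import numpy as np

def dual_point_from_residual(X, y, beta, lam):
    """Project y/lam onto the feasible part of the line spanned by rho = y - X beta.

    Feasibility ||X^T (alpha rho)||_inf <= 1 means |alpha| <= 1 / ||X^T rho||_inf,
    so the projection is a clipped least-squares coefficient.
    Assumes X^T rho is not the zero vector.
    """
    rho = y - X @ beta
    alpha_free = (y @ rho) / (lam * (rho @ rho))   # unconstrained minimizer
    alpha_max = 1.0 / np.max(np.abs(X.T @ rho))    # feasibility bound
    alpha = np.clip(alpha_free, -alpha_max, alpha_max)
    return alpha * rho

# Toy data (assumption): identity design.
X = np.eye(4)
y = np.array([3.0, -0.5, 1.2, -2.0])
lam = 2.5

theta_0 = dual_point_from_residual(X, y, np.zeros(4), lam)                      # beta = 0
theta_1 = dual_point_from_residual(X, y, np.array([0.4, 0.0, 0.0, 0.0]), lam)   # better beta
radius_0 = np.linalg.norm(theta_0 - y / lam)
radius_1 = np.linalg.norm(theta_1 - y / lam)
print(radius_0, radius_1)
```

With β = 0 the residual is y itself and the projection returns y/λ_max, the static choice; with an iterate closer to the optimum the ball around y/λ shrinks, which is what lets later tests discard more features.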

Mind the Duality Gap

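Can we better bound $\hat\theta$ by also leveraging primal information?

$\forall (\beta, \theta) \in \mathbb{R}^p \times \Delta_X$, we claim

$$\check R(\beta) \le \left\| \hat\theta - \frac{y}{\lambda} \right\| \le \hat R(\theta) \tag{16}$$

where

$\hat R(\theta) = \left\| \theta - \frac{y}{\lambda} \right\|$ (dual optimality)

$\check R(\beta) = \frac{1}{\lambda} \sqrt{\left( \|y\|^2 - \|y - X\beta\|^2 - 2\lambda\|\beta\|_1 \right)_+}$ (duality gap)

Weak duality:

$$\frac{1}{2}\|y\|^2 - \frac{\lambda^2}{2}\left\| \hat\theta - \frac{y}{\lambda} \right\|^2 \le \frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1 \tag{17}$$

Therefore, $\hat\theta$ lies in an annulus, i.e. $A\left( \frac{y}{\lambda}, \hat R(\theta), \check R(\beta) \right)$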
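The two radii in (16) are cheap to compute. The sketch below evaluates them on a toy problem with a known optimum and checks that the true distance ‖θ̂ − y/λ‖ falls between them. The helper names `r_hat` and `r_check` and the identity design are assumptions for illustration.

```python
import numpy as np

def r_hat(theta, y, lam):
    """Outer radius: distance from a dual-feasible theta to y/lam."""
    return np.linalg.norm(theta - y / lam)

def r_check(X, y, beta, lam):
    """Inner radius from weak duality; the (.)_+ keeps the square root real."""
    inner = y @ y - np.sum((y - X @ beta) ** 2) - 2.0 * lam * np.sum(np.abs(beta))
    return np.sqrt(max(inner, 0.0)) / lam

# Toy data (assumption): identity design, so the optimum is known in closed form.
X = np.eye(4)
y = np.array([3.0, -0.5, 1.2, -2.0])
lam = 2.5
beta_hat = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)   # exact primal solution
theta_hat = (y - X @ beta_hat) / lam                       # exact dual solution

beta = np.array([0.4, 0.0, 0.0, 0.0])    # some primal iterate
theta = y / np.max(np.abs(X.T @ y))      # dual-feasible point y / lam_max

inner_radius = r_check(X, y, beta, lam)
true_distance = np.linalg.norm(theta_hat - y / lam)
outer_radius = r_hat(theta, y, lam)
print(inner_radius, true_distance, outer_radius)
```

The inner radius tightens as β approaches the optimum and the outer radius tightens as θ does, so the annulus closes in on θ̂ from both sides.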

Geometric Illustration

Recall $\check R(\beta) \le \left\| \hat\theta - \frac{y}{\lambda} \right\| \le \hat R(\theta)$

Source: [Fercoq et al., 2015]

Fine-grained Analysis

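Two geometrical observations

1. $[\hat\theta, \theta] \subseteq A\left( \frac{y}{\lambda}, \hat R(\theta), \check R(\beta) \right)$

Proof. Convexity of the polyhedron $\Delta_X$ $\implies$ convexity of $\Delta_X \cap B\left( \frac{y}{\lambda}, \hat R(\theta) \right)$ $\implies$ convexity of $\Delta_X \cap A\left( \frac{y}{\lambda}, \hat R(\theta), \check R(\beta) \right)$, which contains $\hat\theta$ and $\theta$. (The two intersections coincide, because every point of $\Delta_X$ is at distance at least $\left\| \hat\theta - \frac{y}{\lambda} \right\| \ge \check R(\beta)$ from $\frac{y}{\lambda}$.)

2. $\mathrm{vecAngle}\left( \theta - \hat\theta, \ \frac{y}{\lambda} - \hat\theta \right) \ge 90^\circ$

Proof. $\forall \theta' \in [\hat\theta, \theta]$ we have $\left\| \frac{y}{\lambda} - \hat\theta \right\| \le \left\| \frac{y}{\lambda} - \theta' \right\|$, as $\hat\theta, \theta' \in \Delta_X$ and $\hat\theta = \Pi_{\Delta_X}\left( \frac{y}{\lambda} \right)$. However, if $\mathrm{vecAngle}\left( \theta - \hat\theta, \ \frac{y}{\lambda} - \hat\theta \right) < 90^\circ$, a contradiction arises by setting $\theta' := \Pi_{[\hat\theta, \theta]}\left( \frac{y}{\lambda} \right)$.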
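A short worked step, added here as a sketch, shows what the second observation buys. Write $a := \theta - \hat\theta$ and $b := \frac{y}{\lambda} - \hat\theta$ (shorthand introduced only for this note). An angle of at least $90^\circ$ means $a^\top b \le 0$, and since $\theta - \frac{y}{\lambda} = a - b$, expanding the square gives

$$\left\| \theta - \frac{y}{\lambda} \right\|^2 = \|a\|^2 - 2\,a^\top b + \|b\|^2 \ \ge\ \|\theta - \hat\theta\|^2 + \left\| \hat\theta - \frac{y}{\lambda} \right\|^2 .$$

The left-hand side is the squared outer radius $\hat R(\theta)^2$, and the last term is at least the squared inner radius $\check R(\beta)^2$, so

$$\|\theta - \hat\theta\|^2 \ \le\ \hat R(\theta)^2 - \check R(\beta)^2 .$$

In other words, $\hat\theta$ lies in a ball centered at the current dual point $\theta$ whose radius is controlled by the two annulus radii.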

Geometric Illustration

Recall $\mathrm{vecAngle}\left( \theta - \hat\theta, \ \frac{y}{\lambda} - \hat\theta \right) \ge 90^\circ$

Source: [Fercoq et al., 2015]

Safe Tests Refined

Two convex relaxation schemes

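Sphere:

$$C_{\text{relaxed}} = B\Big( \theta, \ \underbrace{\sqrt{\hat R(\theta)^2 - \check R(\beta)^2}}_{r(\beta, \theta)} \Big) \tag{18}$$

Dome:

$$C_{\text{relaxed}} = \mathrm{conv}(\text{dark blue region}) \tag{19}$$

Source: [Fercoq et al., 2015]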
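As an illustration of the refined sphere (18), the sketch below combines it with the sphere bound |cᵀx_j| + r‖x_j‖₂ on a toy problem. The function name `gap_safe_sphere_test`, the identity design, and the way the dual point is obtained (rescaling the residual until it is feasible) are assumptions; any dual-feasible θ could be used instead.

```python
import numpy as np

def gap_safe_sphere_test(X, y, beta, theta, lam):
    """Sphere test with center theta and radius sqrt(R_hat^2 - R_check^2).

    theta must be dual feasible, i.e. ||X^T theta||_inf <= 1.
    Returns (discard_mask, radius).
    """
    r_hat_sq = np.sum((theta - y / lam) ** 2)
    inner = y @ y - np.sum((y - X @ beta) ** 2) - 2.0 * lam * np.sum(np.abs(beta))
    r_check_sq = max(inner, 0.0) / lam ** 2
    radius = np.sqrt(max(r_hat_sq - r_check_sq, 0.0))   # guard against round-off
    mu = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return mu < 1.0, radius

# Toy data (assumption): identity design; exact solution at lam = 2.5 is (0.5, 0, 0, 0).
X = np.eye(4)
y = np.array([3.0, -0.5, 1.2, -2.0])
lam = 2.5
beta = np.array([0.4, 0.0, 0.0, 0.0])               # primal iterate near the optimum
rho = y - X @ beta
theta = rho / max(lam, np.max(np.abs(X.T @ rho)))   # rescale residual to feasibility

discard, radius = gap_safe_sphere_test(X, y, beta, theta, lam)
print(discard, radius)
```

On this data the radius is roughly 0.054, against roughly 0.26 for the static ball centered at y/λ with θ′ = y/λ_max, and all three inactive features are discarded.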

Convergence

Proposition

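Let $G(\beta, \theta)$ be the LASSO duality gap. $\forall (\beta, \theta) \in \mathbb{R}^p \times \Delta_X$ we have $r(\beta, \theta)^2 \le \frac{2}{\lambda^2} G(\beta, \theta)$.

$$\underbrace{\frac{1}{2}\|y\|^2 - \frac{\lambda^2}{2}\left\| \theta - \frac{y}{\lambda} \right\|^2}_{\text{dual obj}} + G(\beta, \theta) = \underbrace{\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1}_{\text{primal obj}} \tag{20}$$

$$\frac{2}{\lambda^2} G(\beta, \theta) = \underbrace{\left\| \theta - \frac{y}{\lambda} \right\|^2}_{= \hat R(\theta)^2} - \underbrace{\frac{1}{\lambda^2}\left( \|y\|^2 - \|y - X\beta\|^2 - 2\lambda\|\beta\|_1 \right)}_{\le \check R(\beta)^2} \tag{21}$$

$$\ge r(\beta, \theta)^2 \tag{22}$$

$\implies \lim_{k \to \infty} r(\beta_k, \theta_k) = 0$ whenever the duality gap $G(\beta_k, \theta_k)$ vanishes as the solver converges. Convergence of domes is also implied.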
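The bound r² ≤ 2G/λ² can be checked numerically along the iterates of a solver. The sketch below uses plain proximal gradient descent (ISTA) on synthetic Gaussian data; the solver, the data, the rescaled-residual dual point, and the names `soft_threshold` and `gap_and_radius` are all assumptions made for illustration, not the setup used in the slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def gap_and_radius(X, y, beta, lam):
    """Return (duality gap G, squared radius r^2) for beta and a rescaled residual."""
    rho = y - X @ beta
    theta = rho / max(lam, np.max(np.abs(X.T @ rho)))   # dual-feasible point
    primal = 0.5 * (rho @ rho) + lam * np.sum(np.abs(beta))
    r_hat_sq = np.sum((theta - y / lam) ** 2)
    dual = 0.5 * (y @ y) - 0.5 * lam ** 2 * r_hat_sq
    inner = y @ y - rho @ rho - 2.0 * lam * np.sum(np.abs(beta))
    r_check_sq = max(inner, 0.0) / lam ** 2
    return primal - dual, max(r_hat_sq - r_check_sq, 0.0)

# Synthetic data (assumption): random Gaussian design with p > n.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 0.5 * np.max(np.abs(X.T @ y))
step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient

beta = np.zeros(p)
history = []
for k in range(200):
    grad_step = beta + step * (X.T @ (y - X @ beta))
    beta = soft_threshold(grad_step, step * lam)        # one ISTA iteration
    history.append(gap_and_radius(X, y, beta, lam))

for k in (0, 9, 199):
    gap, r_sq = history[k]
    print(f"iter {k + 1:3d}: r^2 = {r_sq:.3e}, 2G/lam^2 = {2 * gap / lam ** 2:.3e}")
```

The inequality holds at every iteration by (21) and (22); as the solver drives the gap down, the radius of the safe sphere is forced down with it.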

Experiment Results (a)

Proportion of active variables vs. (1) number of iterations and (2) λ

Source: [Fercoq et al., 2015]

Experiment Results (b)

Leukemia: p/n = 7129/72 ≈ 99.0

20NewsGroup: p/n = 10094/961 ≈ 10.5

RCV1: p/n = 47236/20242 ≈ 2.3

Source: [Fercoq et al., 2015]

Conclusion

Summary

LASSO - primal & dual

Safe rules - discard irrelevant variables prior to optimization

Refined safe rules based on duality gap

Additional Note

Safe rules can be applied to other models as well, e.g. support vector machines [Ogawa et al., 2013]

References

El Ghaoui, L., Viallon, V., and Rabbani, T. (2010). Safe feature elimination in sparse supervised learning. Technical Report UCB/EECS-2010-126, EECS Department, University of California, Berkeley.

Fercoq, O., Gramfort, A., and Salmon, J. (2015). Mind the duality gap: safer rules for the lasso. arXiv preprint arXiv:1505.03410.

Ogawa, K., Suzuki, Y., and Takeuchi, I. (2013). Safe screening of non-support vectors in pathwise SVM computation. In Proceedings of the 30th International Conference on Machine Learning, pages 1382–1390.

Xiang, Z. J., Wang, Y., and Ramadge, P. J. (2014). Screening tests for lasso problems. arXiv preprint arXiv:1405.4897.
