Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors...

18
Screening Tests for the LASSO Hanxiao Liu [email protected] November 3, 2015 1 / 18

Transcript of Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors...

Page 1: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Screening Tests for the LASSO

Hanxiao [email protected]

November 3, 2015

1 / 18

Page 2: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Outline

LASSO

Primal & Dual formPrimal-dual correspondence

Safe Test for LASSO

Static case (example: sphere test)Dynamic case

Better safe test based on duality gap

Geometric illustrationConvergence

Empirical Results

2 / 18

Page 3: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

LASSO

X ∈ Rn×p where p� n

minβ∈Rp

1

2‖Xβ − y‖22 + λ‖β‖1 (1)

Commonly used forhigh-dimensional feature selection

Today’s topic—screening tests

Rules to early discard irrelevant features prior to starting aLASSO solver without affecting the final opt solution.

The “chicken-and-egg problem”?

3 / 18

Page 4: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

LASSO Dual

Primal: minβ∈Rp,z∈Rn

1

2‖y − z‖22 + λ‖β‖1 s.t. z = Xβ (2)

Dual: maxu∈Rn

minβ,z

1

2‖y − z‖22 + λ‖β‖1 + u> (z −Xβ)︸ ︷︷ ︸

Lagrangian L(β,z,u)︸ ︷︷ ︸Dual Objective g(u)

(3)

=⇒ maxu

[minz

(1

2‖y − z‖22 + u>z

)− λmax

β

(u>X

λβ − ‖β‖1

)](4)

=⇒ maxu

1

2‖y‖22 −

1

2‖u− y‖22 − λI{∥∥∥X>u

λ

∥∥∥∞≤1} (5)

θ=uλ====⇒ max

θ∈Rn1

2‖y‖22 −

λ2

2

∥∥∥θ − y

λ

∥∥∥2

2s.t.

∥∥X>θ∥∥∞ ≤ 1 (6)

4 / 18

Page 5: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Karush-Khun-Tucker Condition

Primal-dual correspondence

β, z ∈ argminβ,z L(β, z, θ

)(7)

0n ∈ ∂βL(β, z, θ

)(8)

= ∂β

[1

2‖y − z‖22 + λ‖β‖1 + λθ>

(z −Xβ

)](9)

= λ∂β‖β‖1 − λX>θ (10)

x>j θ ∈ ∂βj |βj | =

{sign

(βj)

βj 6= 0

[−1, 1] βj = 0∀j ∈ [p] (11)

Key observation: |x>j θ| < 1 =⇒ βj = 0

5 / 18

Page 6: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Safe Test

|x>j θ| < 1 =⇒ βj = 0

Challenge: dual solution θ is unknown

Workaround: relaxation

Let C be a set containing θ and define µC (xj) := supθ∈C |x>j θ|.Obviously |x>j θ| ≤ µC (xj).

Safe Test

µC (xj) < 1 =⇒ βj = 0 (12)

The test is useful when

1 µC (xj) can be evaluated efficiently.

2 C is small—thus leading to small µC (xj).

Goal: Find C satisfying both conditions above.

6 / 18

Page 7: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Sphere Tests

Parametrize C as a closed `2-ball B (c, r) = {θ : ‖θ − c‖2 ≤ r}.

µC (xj) = µB(c,r)(xj) = supθ∈B(c,r)

|x>j θ| = |c>xj |+ r‖xj‖2 (13)

Q: How to find B(c, r) that contains θ without knowing θ?A: Any dual-feasible θ′ defines a ball over θ∥∥θ − y

λ︸︷︷︸c

∥∥2≤∥∥θ′ − y

λ

∥∥2︸ ︷︷ ︸

r

(14)

Recall: maxθ∈Rn12‖y‖

22 − λ2

2

∥∥θ − yλ

∥∥2

2s.t.

∥∥X>θ∥∥∞ ≤ 1︸ ︷︷ ︸

θ∈∆X

A trivial feasible point: θ′ = yλmax

where λmax := ‖X>y‖∞ 1.

=⇒ c =y

λ, r = ‖y‖

∣∣∣∣ 1λ − 1

λmax

∣∣∣∣ (15)

1In fact, θ′ obtained in this manner is the dual solution when λ = λmax,corresponding to all-zero primal solution.

7 / 18

Page 8: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Dynamic Safe Tests

To iteratively apply safe tests as the algorithm proceeds

Recall ∀θ′ ∈ ∆X defines an `2-ball containing θ

Let θk ∈ ∆X be a dual-feasible point at iteration k, {θk}k∈Ndefines a sequence of balls

{B( yλ ,∥∥θk − y

λ

∥∥2

)}k∈N

Each ball defines a safe test

How θk is defined via βk?

θk := Π∆X∩span(ρk)

( yλ

)where ρk := y −Xβk

Intuition1 Dual opt: θ = Π∆X

(yλ

)2 Primal-dual correspondence: θ ∈ span (ρ) where ρ := y −Xβ

8 / 18

Page 9: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Mind the Duality Gap

Can we better bound θ by also leveraging primal info?

∀ (β, θ) ∈ Rp ×∆X , we claim

R (β) ≤∥∥θ − y

λ

∥∥ ≤ R (θ) (16)

where

R (θ) = ‖θ − yλ‖ (dual optimality)

R (β) = 1λ

√(‖y‖2 − ‖y −Xβ‖2 − 2λ‖β‖2)+ (duality gap)

weak duality

1

2‖y‖2 − λ2

2

∥∥θ − y

λ

∥∥2 ≤ 1

2‖y −Xβ‖2 + λ‖β‖1 (17)

Therefore, θ lies in an annulus, i.e. A(yλ , R (θ) , R (β)

)9 / 18

Page 10: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Geometric Illustration

Recall R (β) ≤∥∥θ − y

λ

∥∥ ≤ R (θ)

1[Fercoq et al., 2015]

10 / 18

Page 11: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Fine-grained Analysis

Two geometrical observations

1

[θ, θ]⊆ A

(yλ , R (θ) , R (θ)

)Proof.

Convexity of polyhedron ∆X =⇒ convexity of ∆X ∩B( yλ , R (θ)

)=⇒ convexity of ∆X ∩A

(yλ , R (θ) , R (β)

)containing θ, θ

2 vecAngle(θ − θ, yλ − θ

)≥ 90◦

Proof.

∀θ′ ∈[θ, θ]

we have ‖ yλ − θ‖ ≤ ‖yλ − θ

′‖ as θ, θ′ ∈ ∆X and

θ = Π∆X

( yλ

). However, suppose vecAngle

(θ − θ, yλ − θ

)< 90◦,

contradiction occurs by setting θ′ := Π[θ,θ]( yλ

).

11 / 18

Page 12: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Geometric Illustration

Recall vecAngle(θ − θ, yλ − θ

)≥ 90◦

1[Fercoq et al., 2015]

12 / 18

Page 13: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Safe Tests Refined

Two convex relaxation schemes

Sphere Crelaxed = B(θ,

√R (θ)2 − R (β)2︸ ︷︷ ︸

r(θ,β)

)(18)

Dome Crelaxed = conv (darkBlueRegion) (19)

1[Fercoq et al., 2015]13 / 18

Page 14: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Convergence

Proposition

Let G (β, θ) be the LASSO duality gap, ∀ (β, θ) ∈ Rp ×∆X wehave r (β, θ)2 ≤ 2

λ2G (β, θ)

1

2‖y‖2 − λ2

2

∥∥θ − y

λ

∥∥2︸ ︷︷ ︸dual obj

+G(β, θ) =1

2‖y −Xβ‖2 + λ‖β‖1︸ ︷︷ ︸

primal obj

(20)

2

λ2G(β, θ) =

∥∥θ − y

λ

∥∥2︸ ︷︷ ︸=R(θ)2

− 1

λ2

(‖y‖2 − ‖y −Xβ‖2 − 2λ‖β‖1

)︸ ︷︷ ︸

≤R(β)2

(21)

≥r (β, θ)2 (22)

=⇒ limk→∞ r (βk, θk) = 0. Convergence of domes is also implied.

14 / 18

Page 15: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Experiment Results (a)

Proportion of active variables v.s. (1) num of iterations (2) λ

1[Fercoq et al., 2015]

15 / 18

Page 16: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Experiment Results (b)

Leukemiapn = 7129

72 ≈ 99.0

20NewsGrouppn = 10094

961 ≈ 10.5

RCV1pn = 47236

20242 ≈ 2.3

1[Fercoq et al., 2015]

16 / 18

Page 17: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Conclusion

Summary

LASSO - primal & dual

Safe rules - discard irrelevant variables prior to optimization

Refined safe rules based on duality gap

Additional Note

Safe rules can be applied to other models as well, e.g. supportvector machines [Ogawa et al., 2013]

17 / 18

Page 18: Screening Tests for the LASSOhanxiaol/slides/screening.pdf · Safe screening of non-support vectors in pathwise svm computation. In Proceedings of the 30th International Conference

Reference I

El Ghaoui, L., Viallon, V., and Rabbani, T. (2010).Safe feature elimination in sparse supervised learning technical report no.Technical report, UCB/EECS-2010-126, EECS Department, University ofCalifornia, Berkeley.

Fercoq, O., Gramfort, A., and Salmon, J. (2015).Mind the duality gap: safer rules for the lasso.arXiv preprint arXiv:1505.03410.

Ogawa, K., Suzuki, Y., and Takeuchi, I. (2013).Safe screening of non-support vectors in pathwise svm computation.In Proceedings of the 30th International Conference on Machine Learning, pages1382–1390.

Xiang, Z. J., Wang, Y., and Ramadge, P. J. (2014).Screening tests for lasso problems.arXiv preprint arXiv:1405.4897.

18 / 18