Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians
Constantinos Daskalakis, MIT · Gautam Kamath, MIT

Transcript of the slide deck (38 pages).

Page 1:

Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

Constantinos Daskalakis, MIT

Gautam Kamath, MIT

Page 2:

What’s a Gaussian Mixture Model (GMM)?

• Interpretation 1: PDF is a convex combination of Gaussian PDFs
  • p(x) = Σᵢ wᵢ · 𝒩(x; μᵢ, σᵢ²)

• Interpretation 2: Several unlabeled Gaussian populations, mixed together

• Focus on mixtures of 2 Gaussians (2-GMM) in one dimension
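As a concrete reading of the two interpretations, here is a minimal Python sketch (function names are my own, not from the talk): the PDF is a weighted sum of Gaussian PDFs, and sampling first picks a component by its weight, then draws from that Gaussian.

```python
import math
import random

def gmm_pdf(x, w, mu1, s1, mu2, s2):
    """Interpretation 1: 2-GMM density w * N(mu1, s1^2) + (1 - w) * N(mu2, s2^2)."""
    def phi(x, mu, s):
        return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return w * phi(x, mu1, s1) + (1 - w) * phi(x, mu2, s2)

def sample_gmm(m, w, mu1, s1, mu2, s2, rng=random):
    """Interpretation 2: pick an unlabeled population by weight, then sample it."""
    return [rng.gauss(mu1, s1) if rng.random() < w else rng.gauss(mu2, s2)
            for _ in range(m)]
```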

Page 3:

PAC (Proper) Learning Model

• Output (with high probability) a 2-GMM X̂ which is close to X (in statistical distance*)

• Algorithm design goals:
  • Minimize sample size
  • Minimize time

[Diagram: unknown X ∈ ℱ → samples x₁, x₂, …, x_m ~ X → algorithm 𝒜 → hypothesis X̂ ∈ ℱ]

*statistical distance = total variation distance = ½ × 𝐿1 distance

Page 4:

Learning Goals

[Diagram: three learning goals — Improper, Proper, Parameter estimation — positioned relative to the target distribution]

Page 5:

Learning Goals

[Diagram: three learning goals — Improper, Proper, Parameter estimation — positioned relative to the target distribution]

Page 6:

Prior Work

Learning Model       | Sample Complexity | Time Complexity
Improper Learning    |                   |
Proper Learning      |                   |
Parameter Estimation | poly(1/ε)         | poly(1/ε) [KMV10]

Page 7:

Prior Work

Learning Model       | Sample Complexity | Time Complexity
Improper Learning    | O(1/ε²)           | poly(1/ε) [CDSS14]
Proper Learning      |                   |
Parameter Estimation | poly(1/ε)         | poly(1/ε) [KMV10]

Page 8:

Prior Work

Learning Model       | Sample Complexity                | Time Complexity
Improper Learning    | O(1/ε²)                          | poly(1/ε) [CDSS14]
Proper Learning      | O(1/ε²) [AJOS14], O(1/ε²) [DK14] | O(1/ε⁷) [AJOS14], O(1/ε⁵) [DK14]
Parameter Estimation | poly(1/ε)                        | poly(1/ε) [KMV10]

Page 9:

Prior Work

Learning Model       | Sample Complexity                | Time Complexity
Improper Learning    | O(1/ε²)                          | poly(1/ε) [CDSS14]
Proper Learning      | O(1/ε²) [AJOS14], O(1/ε²) [DK14] | O(1/ε⁷) [AJOS14], O(1/ε⁵) [DK14]
Parameter Estimation | poly(1/ε)                        | poly(1/ε) [KMV10]

• Sample lower bounds:
  • Improper and Proper Learning: Ω(1/ε²) [folklore]
  • Parameter Estimation: Ω(1/ε⁶) [HP14]
    • Matching upper bound, but not immediately extendable to proper learning

Page 10:

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Page 11:

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Page 12:

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Page 13:

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Page 14:

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Page 15:

Who Do We Want In Our Pool?

• Hypothesis: (ŵ, μ̂₁, σ̂₁, μ̂₂, σ̂₂)

• Need at least one “good” hypothesis
  • Parameters are close to true parameters
  • Implies desired statistical distance bound

• Want:
  • |ŵ − w| ≤ ε
  • |μ̂ᵢ − μᵢ| ≤ ε·σᵢ
  • |σ̂ᵢ − σᵢ| ≤ ε·σᵢ

Page 16:

Warm Up: Learning one Gaussian

• Easy!

• μ̂ = sample mean

• σ̂² = sample variance
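The warm-up in code, as a sketch (the slide does not say whether the biased or unbiased variance is used; the unbiased version below is an assumption):

```python
import math

def fit_gaussian(xs):
    """Estimate (mu, sigma) of a single Gaussian by the sample mean
    and the (unbiased) sample variance."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / (m - 1)
    return mu, math.sqrt(var)
```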

Page 17:

The Real Deal: Mixtures of Two Gaussians

• Harder!

• Sample moments mix up samples from each component

• Plan:
  • Tall, skinny Gaussian* stands out – learn it first
  • Remove it from the mixture
  • Learn one Gaussian (easy?)

*Component with maximum wᵢ/σᵢ

Page 18:

The First Component

• Claim: Using O(1/ε²) samples, can generate O(1/ε³) candidate triples (ŵ, μ̂₁, σ̂₁), at least one of which is close to the taller component

• If we knew which candidate was right, could we remove this component?

Page 19:

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Page 20:

Dvoretzky–Kiefer–Wolfowitz (DKW) inequality

• Using a sample of size O(1/ε²) from a distribution X, can output X̂ such that d_K(X, X̂) ≤ ε

• Kolmogorov distance – CDFs of the distributions are close in L∞ distance
  • Weaker than statistical distance

• Works for any probability distribution!
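The DKW guarantee is about the empirical CDF. A minimal sketch (the grid-based maximum below only approximates the supremum defining the Kolmogorov distance):

```python
import bisect

def empirical_cdf(samples):
    """Empirical CDF F̂(t) = fraction of samples ≤ t. By DKW, a sample of
    size O(1/ε²) gives sup_t |F̂(t) − F(t)| ≤ ε with high probability."""
    xs = sorted(samples)
    m = len(xs)
    return lambda t: bisect.bisect_right(xs, t) / m

def kolmogorov_distance(F, G, grid):
    """Approximate d_K(F, G) = sup_t |F(t) − G(t)| over a finite grid."""
    return max(abs(F(t) - G(t)) for t in grid)
```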

Page 21:

Subtracting out the known component

PDF to CDF

Page 22:

Subtracting out the known component

DKW inequality

Page 23:

Subtracting out the known component

Subtract component

Page 24:

Subtracting out the known component

Monotonize
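The four steps above (PDF to CDF, DKW, subtract the known component, monotonize) can be sketched on a grid of points. The rescaling by 1/(1 − w) and the running-max monotonization are my reading of the intended construction, not details from the slides:

```python
import math

def norm_cdf(t, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def subtract_component(F_hat, w, mu, sigma, grid):
    """Given the mixture's (empirical) CDF F_hat and a candidate first
    component (w, mu, sigma), form G(t) = (F_hat(t) - w * Phi_{mu,sigma}(t)) / (1 - w),
    then monotonize it with a running maximum clamped to [0, 1]."""
    vals = [(F_hat(t) - w * norm_cdf(t, mu, sigma)) / (1.0 - w) for t in grid]
    out, running = [], 0.0
    for v in vals:
        running = min(1.0, max(running, v))  # enforce a valid, nondecreasing CDF
        out.append(running)
    return out
```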

Page 25:

Warm Up (?): Learning one (almost) Gaussian

Page 26:

Warm Up (?): Learning one (almost) Gaussian

Page 27:

Warm Up (?): Learning one (almost) Gaussian

Page 28:

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Page 29:

Robust Statistics

• Broad field of study in statistics

• Median

• Interquartile range

• Recover original parameters (approximately), even for distributions at statistical distance ε

• The second component’s estimate is entirely determined by the candidate for the other component

• Still O(1/ε³) candidates!
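A minimal sketch of robust estimation via the median and interquartile range, as named above; the IQR-to-sigma constant 1.349 ≈ 2·Φ⁻¹(0.75) is a standard conversion for Gaussians, not a detail from the slides:

```python
def robust_gaussian_fit(xs):
    """Robustly estimate (mu, sigma) of a near-Gaussian sample:
    median for mu, interquartile range (scaled by 1.349) for sigma.
    Unlike the mean/variance, these tolerate an eps fraction of corruption."""
    ys = sorted(xs)
    m = len(ys)
    def quantile(q):
        # linear interpolation between order statistics
        i = q * (m - 1)
        lo, hi = int(i), min(int(i) + 1, m - 1)
        return ys[lo] + (i - lo) * (ys[hi] - ys[lo])
    mu = quantile(0.5)
    sigma = (quantile(0.75) - quantile(0.25)) / 1.349
    return mu, sigma
```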

Page 30:

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Page 31:

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Page 32:

Hypothesis Selection

• 𝑁 candidate distributions

• At least one is ε-close to X (in statistical distance)

• Goal: Return a candidate which is O(ε)-close to X

[Diagram: candidates around the target X, with a ball of radius cε]

Page 33:

Hypothesis Selection

• Classical approaches [Yat85], [DL01]
  • Scheffé estimator, computation of the “Scheffé set” (potentially hard)
  • O(N²) time

• Acharya et al. [AJOS14]
  • Based on Scheffé estimator
  • O(N log N) time

• Our result [DK14]
  • General estimator, minimal access to hypotheses
  • O(N log N) time
  • Milder dependence on error probability

Page 34:

Hypothesis Selection

• Input:
  • Sample access to X and hypotheses ℋ = {H₁, …, H_N}
  • PDF comparator for each pair Hᵢ, Hⱼ
  • Accuracy parameter ε, confidence parameter δ

• Output:
  • Ĥ ∈ ℋ
  • If there is an H* ∈ ℋ such that d_stat(H*, X) ≤ ε, then d_stat(Ĥ, X) ≤ O(ε) with probability ≥ 1 − δ

• Sample complexity: O(log N · log(1/δ) / ε²)

• Time complexity: O((log(1/δ)/ε²) · (N log N + log²(1/δ)))

• Expected time complexity: O(N log(N/δ) / ε²)

Page 35:

Hypothesis Selection

• Naive: Round-robin tournament among candidate hypotheses; compare every pair; output the hypothesis with the most wins

• Us: Set up a single-elimination tournament
  • Issue: error doubles at every level of the tree; log N levels → Ω(2^{log N} · ε) error

• Better analysis via a double-window argument:
  • Great hypotheses: those within ε of the target
  • Good hypotheses: those within 8ε of the target
  • Bad hypotheses: the rest
  • Show: if the density of good hypotheses is small, error propagation won’t happen

• If the density is large, sub-sample √N hypotheses and run the naive tournament
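The naive round-robin baseline can be sketched as follows; the pairwise test `beats` is an assumed stand-in for the Scheffé-style comparison, which in the actual algorithm is decided using samples from X:

```python
def naive_tournament(hypotheses, beats):
    """Round-robin hypothesis selection: run the pairwise test
    beats(a, b) -> True iff a wins, on every pair, and return the
    hypothesis with the most wins. Uses O(N^2) comparisons."""
    n = len(hypotheses)
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if beats(hypotheses[i], hypotheses[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    return hypotheses[max(range(n), key=wins.__getitem__)]
```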

Page 36:

Putting It All Together

• N = O(1/ε³) candidates

• Use the hypothesis selection algorithm to pick one

• Sample complexity: O(log(1/δ) / ε²)

• Time complexity: O(log³(1/δ) / ε⁵)

Page 37:

Open Questions

• Faster algorithms for 2-GMMs

• Time complexity of k-GMMs

• High dimensions

Page 38:

Bibliography

• [AJOS14] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Near-optimal-sample estimators for spherical Gaussian mixtures.

• [CDSS14] Siu On Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Efficient Density Estimation via Piecewise Polynomial Approximation.

• [DK14] Constantinos Daskalakis and Gautam Kamath. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians.

• [DL01] Luc Devroye and Gábor Lugosi. Combinatorial Methods in Density Estimation.

• [HP14] Moritz Hardt and Eric Price. Sharp bounds for learning a mixture of two Gaussians.

• [KMV10] Adam Kalai, Ankur Moitra, and Gregory Valiant. Efficiently Learning Mixtures of Two Gaussians.

• [Yat85] Yannis Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov’s entropy.