
Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians

Constantinos Daskalakis, MIT

Gautam Kamath, MIT

What’s a Gaussian Mixture Model (GMM)?

• Interpretation 1: PDF is a convex combination of Gaussian PDFs
  • $p(x) = \sum_i w_i \, \mathcal{N}(\mu_i, \sigma_i^2; x)$

• Interpretation 2: Several unlabeled Gaussian populations, mixed together

• Focus on mixtures of 2 Gaussians (2-GMM) in one dimension
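To make Interpretation 1 concrete, here is a minimal sketch (Python with numpy/scipy, which the slides do not use; the function names are mine) of evaluating the PDF of, and sampling from, a univariate 2-GMM:

```python
import numpy as np
from scipy.stats import norm

def gmm_pdf(x, w, mus, sigmas):
    """PDF of the mixture: a convex combination of Gaussian PDFs."""
    return sum(wi * norm.pdf(x, loc=mu, scale=sigma)
               for wi, mu, sigma in zip(w, mus, sigmas))

def gmm_sample(m, w, mus, sigmas, seed=0):
    """Draw m samples: pick a component by its weight, then sample that Gaussian."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(w), size=m, p=w)
    return rng.normal(np.asarray(mus)[comps], np.asarray(sigmas)[comps])

# Example: the 2-GMM 0.3*N(-2, 1^2) + 0.7*N(3, 0.5^2)
w, mus, sigmas = [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5]
xs = gmm_sample(10_000, w, mus, sigmas)
print(gmm_pdf(0.0, w, mus, sigmas))
```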

PAC (Proper) Learning Model

• Output (with high probability) a 2-GMM $\hat{X}$ which is close to $X$ (in statistical distance*)

• Algorithm design goals:• Minimize sample size• Minimize time

[Diagram: unknown distribution $X \in \mathcal{F}$; samples $x_1, x_2, \ldots, x_m \sim X$ fed to algorithm $\mathcal{A}$, which outputs $\hat{X} \in \mathcal{F}$]

*statistical distance = total variation distance = ½ × 𝐿1 distance
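As a quick numerical illustration of the footnote (a sketch with my own helper names, using numpy/scipy): the statistical distance between two distributions is half the $L_1$ distance between their densities.

```python
import numpy as np
from scipy.stats import norm

def tv_distance(p, q, lo=-20.0, hi=20.0, n=200_001):
    """Statistical distance = total variation = (1/2) * L1 distance between
    the densities, approximated by numerical integration on a fine grid."""
    xs = np.linspace(lo, hi, n)
    return 0.5 * np.trapz(np.abs(p(xs) - q(xs)), xs)

# Two unit-variance Gaussians whose means differ by 1
d = tv_distance(lambda x: norm.pdf(x, 0, 1), lambda x: norm.pdf(x, 1, 1))
print(round(d, 3))  # ≈ 0.383 (= 2*Phi(1/2) - 1)
```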

Learning Goals

[Figure: the target distribution and three learning goals – improper learning, proper learning, parameter estimation]


Prior Work

Learning Model        | Sample Complexity                              | Time Complexity
Improper Learning     | O(1/ε^2)                                       | poly(1/ε)   [CDSS14]
Proper Learning       | O(1/ε^2) [AJOS14];  O(1/ε^2) [DK14, this work] | O(1/ε^7) [AJOS14];  O(1/ε^5) [DK14, this work]
Parameter Estimation  | poly(1/ε)                                      | poly(1/ε)   [KMV10]

• Sample lower bounds:
  • Improper and proper learning: $\Omega(1/\varepsilon^2)$ [folklore]
  • Parameter estimation: $\Omega(1/\varepsilon^6)$ [HP14]
    • Matching upper bound, but not immediately extendable to proper learning

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set


Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Who Do We Want In Our Pool?

• Hypothesis: $(\hat{w}, \hat{\mu}_1, \hat{\sigma}_1, \hat{\mu}_2, \hat{\sigma}_2)$

• Need at least one “good” hypothesis
  • Parameters are close to the true parameters
  • This implies the desired statistical distance bound

• Want:
  • $|\hat{w} - w| \le \varepsilon$
  • $|\hat{\mu}_i - \mu_i| \le \varepsilon \sigma_i$
  • $|\hat{\sigma}_i - \sigma_i| \le \varepsilon \sigma_i$

Warm Up: Learning one Gaussian

• Easy!

• $\hat{\mu}$ = sample mean

• $\hat{\sigma}^2$ = sample variance
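A minimal sketch of this warm-up (Python with numpy; the helper name is mine, not from the slides):

```python
import numpy as np

def fit_gaussian(xs):
    """Estimate (mu, sigma) of a single Gaussian from samples:
    sample mean and sample standard deviation."""
    mu_hat = np.mean(xs)
    sigma_hat = np.std(xs, ddof=1)  # unbiased sample variance under the square root
    return mu_hat, sigma_hat

rng = np.random.default_rng(0)
xs = rng.normal(loc=5.0, scale=2.0, size=10_000)
print(fit_gaussian(xs))  # ≈ (5.0, 2.0)
```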

The Real Deal: Mixtures of Two Gaussians

• Harder!

• Sample moments mix up samples from each component

• Plan:
  • The tall, skinny Gaussian* stands out – learn it first
  • Remove it from the mixture
  • Learn the remaining single Gaussian (easy?)

*The component with maximum $w_i / \sigma_i$

The First Component

• Claim: Using a sample of size $O(1/\varepsilon^2)$, we can generate $O(1/\varepsilon^3)$ candidates $(\hat{w}, \hat{\mu}_1, \hat{\sigma}_1)$, at least one of which is close to the taller component (an illustrative sketch of such a candidate pool follows below)

• If we knew which candidate was right, could we remove this component?
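The slides do not spell out the construction, but one way to picture a candidate pool of this size is a grid over the taller component's parameters. The sketch below is an illustrative gridding under my own assumptions (granularity, anchoring at crude sample statistics), not the construction from [DK14]:

```python
import numpy as np

def candidate_pool(xs, eps):
    """Illustrative O(1/eps^3)-sized pool of (w, mu1, sigma1) guesses for the
    taller component: an eps-grid over the weight, combined with location and
    scale grids anchored at crude sample-based estimates.
    NOTE: a simplified illustration, not the actual construction in [DK14]."""
    loc = np.median(xs)                                  # crude center
    scale = np.subtract(*np.percentile(xs, [75, 25]))    # crude spread (IQR)
    weights = np.arange(eps, 1.0 + eps, eps)             # ~1/eps weight guesses
    mus = loc + scale * np.arange(-1.0, 1.0, eps)        # ~1/eps location guesses
    sigmas = scale * (1.0 + np.arange(-0.5, 0.5, eps))   # ~1/eps scale guesses
    return [(w, m, s) for w in weights for m in mus for s in sigmas if s > 0]

rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 7000), rng.normal(4, 0.5, 3000)])
print(len(candidate_pool(xs, 0.1)))  # on the order of (1/eps)^3 candidates
```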

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Dvoretzky–Kiefer–Wolfowitz (DKW) inequality

• Using a sample of size $O(1/\varepsilon^2)$ from a distribution $X$, we can output $\hat{X}$ such that $d_K(\hat{X}, X) \le \varepsilon$

• Kolmogorov distance – the CDFs of the distributions are close in $L_\infty$ distance
  • Weaker than statistical distance

• Works for any probability distribution!
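A small sketch of the DKW guarantee in code (Python/numpy; the confidence handling is just the textbook DKW bound, not anything specific to the slides): with $m = O(1/\varepsilon^2)$ samples, the empirical CDF is within $\varepsilon$ of the true CDF in $L_\infty$, with high probability.

```python
import numpy as np

def empirical_cdf(xs):
    """Return a step-function CDF built from the samples."""
    xs = np.sort(xs)
    def F_hat(t):
        return np.searchsorted(xs, t, side="right") / len(xs)
    return F_hat

def dkw_epsilon(m, delta=0.05):
    """DKW inequality: with probability >= 1 - delta, sup_t |F_hat(t) - F(t)| <= eps."""
    return np.sqrt(np.log(2.0 / delta) / (2.0 * m))

rng = np.random.default_rng(0)
m = 40_000
F_hat = empirical_cdf(rng.normal(0, 1, m))
print(F_hat(0.0), "±", round(dkw_epsilon(m), 4))  # ≈ 0.5 ± 0.0068
```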

Subtracting out the known component

[Figure sequence; step captions:]
1. PDF to CDF
2. DKW inequality
3. Subtract component
4. Monotonize
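A sketch of these four steps on a grid of points (Python/numpy; the function name and the grid-based representation are my choices, not the slides'): form the empirical CDF, subtract the known component's weighted CDF, rescale by the remaining weight, and monotonize.

```python
import numpy as np
from scipy.stats import norm

def residual_cdf(xs, w1, mu1, sigma1, grid):
    """Given samples from the mixture and a candidate (w1, mu1, sigma1) for the
    known component, return an estimated CDF of the *other* component on `grid`.
    Steps: empirical CDF (DKW) -> subtract w1 * CDF of the known component ->
    rescale by (1 - w1) -> monotonize and clip to [0, 1]."""
    xs = np.sort(xs)
    F_mix = np.searchsorted(xs, grid, side="right") / len(xs)   # empirical CDF
    F_rest = (F_mix - w1 * norm.cdf(grid, mu1, sigma1)) / (1.0 - w1)
    F_rest = np.maximum.accumulate(F_rest)                      # monotonize
    return np.clip(F_rest, 0.0, 1.0)

rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0, 1, 7000), rng.normal(4, 0.5, 3000)])
grid = np.linspace(-5, 8, 1001)
F2 = residual_cdf(xs, w1=0.7, mu1=0.0, sigma1=1.0, grid=grid)
print(F2[500])  # CDF estimate of the second component at the grid midpoint
```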

Warm Up (?): Learning one (almost) Gaussian

• After subtracting the known component, what remains is only approximately Gaussian (close in Kolmogorov distance), so the plain sample mean and variance are no longer reliable

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Robust Statistics

• Broad field of study in statistics

• Median (a robust location estimate)

• Interquartile range (a robust scale estimate)

• Recover the original parameters (approximately), even for distributions at distance $\varepsilon$

• Given a candidate for the first component, the second component's parameters are entirely determined

• Still $O(1/\varepsilon^3)$ candidates!
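A minimal sketch of robust Gaussian parameter estimation via the median and interquartile range (Python/numpy; the constant 1.349 ≈ 2Φ⁻¹(3/4) is the standard Gaussian IQR-to-σ conversion, not something stated on the slides):

```python
import numpy as np

def robust_gaussian_fit(xs):
    """Robustly estimate (mu, sigma) of an (approximately) Gaussian sample.
    The median estimates the mean; IQR / (2 * Phi^{-1}(0.75)) ≈ IQR / 1.349
    estimates sigma.  Both are stable under small contamination, unlike the
    sample mean and variance."""
    mu_hat = np.median(xs)
    q75, q25 = np.percentile(xs, [75, 25])
    sigma_hat = (q75 - q25) / 1.349
    return mu_hat, sigma_hat

rng = np.random.default_rng(0)
xs = rng.normal(3.0, 2.0, 10_000)
xs[:200] = 1e6            # 2% gross outliers barely move the robust estimates
print(robust_gaussian_fit(xs))  # ≈ (3.0, 2.0)
```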

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

Hypothesis Selection

• $N$ candidate distributions

• At least one is $\varepsilon$-close to $X$ (in statistical distance)

• Goal: return a candidate which is $O(\varepsilon)$-close to $X$ (i.e., within $c\varepsilon$ for some constant $c$)

Hypothesis Selection

• Classical approaches [Yat85], [DL01]
  • Scheffé estimator; requires computing the “Scheffé set” (potentially hard)
  • $O(N^2)$ time

• Acharya et al. [AJOS14]
  • Based on the Scheffé estimator
  • $O(N \log N)$ time

• Our result [DK14]
  • General estimator, minimal access to the hypotheses
  • $O(N \log N)$ time
  • Milder dependence on the error probability

Hypothesis Selection

• Input:
  • Sample access to $X$ and hypotheses $\mathcal{H} = \{H_1, \ldots, H_N\}$
  • A PDF comparator for each pair $H_i, H_j$
  • Accuracy parameter $\varepsilon$, confidence parameter $\delta$

• Output:
  • $\hat{H} \in \mathcal{H}$
  • If there is an $H^* \in \mathcal{H}$ such that $d_{\mathrm{stat}}(H^*, X) \le \varepsilon$, then $d_{\mathrm{stat}}(\hat{H}, X) \le O(\varepsilon)$ with probability $\ge 1 - \delta$

• Sample complexity: $O\!\left(\frac{\log(1/\delta)}{\varepsilon^2} \log N\right)$

• Time complexity: $O\!\left(\frac{\log(1/\delta)}{\varepsilon^2} \left(N \log N + \log^2 \frac{1}{\delta}\right)\right)$

• Expected time complexity: $O\!\left(\frac{N \log(N/\delta)}{\varepsilon^2}\right)$

Hypothesis Selection

• Naive: a round-robin tournament among the candidate hypotheses – compare every pair, output the hypothesis with the most wins (see the sketch after this list)

• Us: set up a single-elimination tournament
  • Issue: the error can double at every level of the tree; $\log N$ levels $\Rightarrow \Omega(2^{\log N} \varepsilon)$ error

• Better analysis via a double-window argument:
  • Great hypotheses: those within $\varepsilon$ of the target
  • Good hypotheses: those within $8\varepsilon$ of the target
  • Bad hypotheses: the rest

• Show: if the density of good hypotheses is small, the error propagation cannot happen

• If the density is large, a small random sub-sample of the hypotheses contains a good one; run the naive tournament on the sub-sample
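To make the pairwise comparison concrete, here is a sketch of a Scheffé-style test between two hypotheses and the naive round-robin tournament built on it (Python/numpy; this follows the classical Scheffé estimator mentioned above, not the exact comparator of [DK14], and all function names are mine):

```python
import numpy as np
from scipy.stats import norm

def scheffe_wins(xs, pdf_i, pdf_j, grid):
    """Scheffé test between hypotheses H_i and H_j using samples xs from X.
    Scheffé set A = {x : pdf_i(x) > pdf_j(x)}.  H_i wins if its mass on A is
    closer than H_j's mass on A to the empirical frequency of A.
    (1-D sketch: the masses are computed by numerical integration on `grid`.)"""
    on_A = pdf_i(grid) > pdf_j(grid)
    p_i = np.trapz(np.where(on_A, pdf_i(grid), 0.0), grid)   # H_i's mass on A
    p_j = np.trapz(np.where(on_A, pdf_j(grid), 0.0), grid)   # H_j's mass on A
    emp = np.mean(pdf_i(xs) > pdf_j(xs))                     # empirical mass on A
    return abs(p_i - emp) <= abs(p_j - emp)

def naive_tournament(xs, pdfs, grid):
    """Round-robin: compare every pair, return the hypothesis with the most wins.
    O(N^2) comparisons -- the quadratic baseline the talk improves on."""
    wins = np.zeros(len(pdfs), dtype=int)
    for i in range(len(pdfs)):
        for j in range(i + 1, len(pdfs)):
            if scheffe_wins(xs, pdfs[i], pdfs[j], grid):
                wins[i] += 1
            else:
                wins[j] += 1
    return int(np.argmax(wins))

# Toy usage: true X = N(0, 1); candidates differ only in their mean.
rng = np.random.default_rng(0)
xs = rng.normal(0, 1, 5000)
grid = np.linspace(-10, 10, 4001)
pdfs = [lambda x, m=m: norm.pdf(x, m, 1) for m in [-1.0, -0.1, 0.0, 0.4, 2.0]]
print(naive_tournament(xs, pdfs, grid))  # index 2 (mean 0.0), with high probability
```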

Putting It All Together

• $N = O(1/\varepsilon^3)$ candidates

• Use the hypothesis selection algorithm to pick one

• Sample complexity: $O(\log(1/\delta)/\varepsilon^2)$

• Time complexity: $O(\log^3(1/\delta)/\varepsilon^5)$

Open Questions

• Faster algorithms for 2-GMMs

• Time complexity of k-GMMs

• High dimensions

Bibliography

• [AJOS14] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Near-optimal-sample estimators for spherical Gaussian mixtures.

• [CDSS14] Siu On Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Efficient Density Estimation via Piecewise Polynomial Approximation.

• [DL01] Luc Devroye and Gabor Lugosi. Combinatorial Methods in Density Estimation.

• [HP14] Moritz Hardt and Eric Price. Sharp bounds for learning a mixture of two Gaussians.

• [KMV10] Adam Kalai, Ankur Moitra, Gregory Valiant. Efficiently Learning Mixtures of Two Gaussians.

• [Yat85] Yannis Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov’s entropy.