ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters...

22
Regularization of barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS CMO Workshop - 30 April to 5 May 2017 Optimal Transport meets Probability, Statistics and Machine Learning Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 1 / 20

Transcript of ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters...

Page 1: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Regularization of barycenters in the Wasserstein space

Elsa Cazelles, joint work with Jérémie Bigot & Nicolas PapadakisUniversité de Bordeaux & CNRS

CMO Workshop - 30 April to 5 May 2017Optimal Transport meets Probability, Statistics and Machine Learning

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 1 / 20

Page 2: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

For Ω ⊂ Rd , P2(Ω) is the set of probability measures.

DefinitionLet Pνn = 1

n∑n

i=1 δνi where δνi is the dirac distribution at νi ∈ P2(Ω). Wedefine the regularized empirical barycenter of the discrete measure Pνn as

µγPνn = argmin

µ∈P2(Ω)

1n

n∑i=1

W 22 (µ, νi ) + γE (µ)

where γ > 0 is a regularisation parameter and the penalty E is a proper,differentiable, lower semicontinuous and strictly convex function.

Case γ = 0 : Wasserstein barycenter of [Agueh and Carlier].

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 2 / 20

Page 3: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

As an example, take E the negative entropy defined as

E (µ) = ∫

Ω f (x) log(f (x))dx , if µ admits a pdf f+∞ otherwise.

Advantage: It is possible to enforce the regularized barycenter to beabsolutely continuous with respect to the Lebesgue measure on Ω.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 3 / 20

Page 4: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Consistency of the regularized barycenter µγPn

1 Convergence to a population Wasserstein barycenter

2 Stability of the minimizer

3 Application to real and simulated data

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 4 / 20

Page 5: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Consistency of the regularized barycenter µγPn

We define the population Wasserstein barycenter defined as

µ0P ∈ argmin

µ∈P2(Ω)

∫W 2

2 (µ, ν)dP(ν),

and its regularized version

µγP = argminµ∈P2(Ω)

∫W 2

2 (µ, ν)dP(ν) + γE (µ).

where P is a probability measure on P2(Ω) and ν1, . . . , νn iid of law P.We recall that

µγPνn = argmin

µ∈P2(Ω)

1n

n∑i=1

W 22 (µ, νi ) + γE (µ)

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 5 / 20

Page 6: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Consistency of the regularized barycenter µγPn

The Bregman divergence DE associated to E is defined for two measuresµ, ζ as

DE (µ, ζ) := E (µ)− E (ζ)−∫

Ω∇E (ζ)(dµ− dζ)

where ∇E denotes the gradient of E .Thus the symmetric Bregman divergence dE is given by

dE (µ, ζ) := DE (µ, ζ) + DE (ζ, µ).

TheoremFor Ω compact in Rd and ∇E (µ0

P) bounded,

limγ→0

DE (µγP, µ0P) = 0,

which corresponds to showing that the squared bias term d2E (µγP, µ0

P) (asclassically referred to in nonparametric statistics) converges to zero whenγ → 0.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 6 / 20

Page 7: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Consistency of the regularized barycenter µγPn

TheoremIf Ω is a compact of Rd , then one has that

E(d2E (µγ

Pn, µγP)) ≤

C I(1,H)‖H‖L2(P)γ2n

where C is a positive constant,

H = hµ : ν ∈ P2(Ω) 7→W 22 (µ, ν) ∈ R;µ ∈ P2(Ω)

is a class of functions defined on P2(Ω) with envelope H, and

I(1,H) = supQ

∫ 1

0(1 + logN(ε‖H‖L2(Q),H, ‖ · ‖L2(Q))︸ ︷︷ ︸

Metric entropy

)12 dε

The metric entropy is of order 1εd .

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 7 / 20

Page 8: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Consistency of the regularized barycenter µγPn

Remarks on the metric entropy.

Metric entropy of H↓

Metric entropy of P2(Ω)↓

Metric entropy of Ω↓

Compacity of Ω

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 8 / 20

Page 9: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Complementary result in the 1D case

Theorem (1-D)When ν1, . . . ,νn are iid random measures with support included in acompact interval Ω,

E(d2

E

(µγPn, µγP

))≤ Cγ2n .

where C > 0 does not depend on n and γ.

Remark: Metric entropy of the space of quantiles.

It follows that if γ = γn is such that limn→∞ γ2nn = +∞ then

limn→∞

E(d2E

(µγPν

n, µ0

P))

= 0.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 9 / 20

Page 10: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Complementary result with additional regularization

When d > 1, the class of functions H is too large, so we have to addregularity.For Ω smooth and uniformly convex, and (νi )i=1,...,n of law P. We specifythe penalty function E :

E (µ) = ∫

Rd f (x) log(f (x))dx + ‖f ‖Hk (Ω), if f = dµdλ and f > α

+∞ otherwise.

where ‖ · ‖Hk (Ω) designates the Sobolev norm associated to the L2(Ω)space and α > 0 is arbitrarily small.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 10 / 20

Page 11: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Convergence to a population Wasserstein barycenter Complementary result with additional regularization

Sobolev embedding theorem: Hk(Ω) is included in the Hölder spaceCm,β for β = k −m − d/2.Regularity on optimal maps is obtained from regularity on probabilitymeasures (e.g. [De Philippis and Figalli])

Hence we can bound the metric entropy [Van der Vaart] by

K(1ε

)afor any a ≥ d/(m + 1).

Hence, as soon as a/2 < 1, for which k > d − 1 is necessary, we get a rateof convergence.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 11 / 20

Page 12: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Stability of the minimizer

1 Convergence to a population Wasserstein barycenter

2 Stability of the minimizer

3 Application to real and simulated data

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 12 / 20

Page 13: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Stability of the minimizer

Stability of the minimizer for the symmetric Bregman distance dE .

TheoremLet ν1, . . . , νn and η1, . . . , ηn be two sequences of probability measures inP2(Ω). Let µγPνn and µγPηn be the regularized empirical barycentersassociated to the discrete measures Pνn and Pηn, then

dE(µγPνn , µ

γPηn

)≤ 2γn inf

σ∈Sn

n∑i=1

W2(νi , ησ(i)),

where Sn denotes the permutation group of the set of indices 1, . . . , n.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 13 / 20

Page 14: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Stability of the minimizer

Application: Let ν1, . . . , νn be n absolutely continuous probabilitymeasures and X = (X i ,j)1≤i≤n; 1≤j≤pi a dataset of random variables suchthat X i ,j ∼ νi . Then

E(d2

E

(µγPνn ,µ

γX

))≤ 4γ2n

n∑i=1

E(W 2

2 (νi ,νpi )),

where µγX is the random measure satisfying

µγX = argmin

µ∈P2(Ω)

1n

n∑i=1

W 22

µ, 1pi

pi∑j=1

δX i,j

+ γE (µ).

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 14 / 20

Page 15: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Application to real and simulated data Choice of the parameter γ

1 Convergence to a population Wasserstein barycenter

2 Stability of the minimizer

3 Application to real and simulated data

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 15 / 20

Page 16: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Application to real and simulated data 1-D case

We consider 1 ≤ i ≤ n = 100 random Gaussian distributions N (µi ,σ2i )

whereµi random uniform variable on [−2, 2]σ2

i random uniform variable on [0, 1].Then we generate (X ij)1≤i≤n;1≤j≤pi , 5 ≤ pi ≤ 10, random variables suchthat

X ij ∼ N (µi ,σ2i ) for each 1 ≤ i ≤ j .

Finally, let

ν i = 1pi

pi∑j=1

δX ij for each i

.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 16 / 20

Page 17: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Application to real and simulated data 1-D case

-4 -3 -2 -1 0 1 2 3 40

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

true barycenterkernel methodregularized barycenter

Dirichlet regularization-4 -3 -2 -1 0 1 2 3 4

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

true barycenterkernel methodregularized barycenter

Entropy regularization-4 -3 -2 -1 0 1 2 3 4

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

true barycenterkernel methodregularized barycenter

Dirichlet + Entropy regu-larization

Dashed and black curve = density of the population Wassersteinbarycenter.Blue and dotted curve = the smoothed Wasserstein barycenterobtained by a preliminary kernel smoothing step of the discretemeasures ν i that is followed by quantile averaging.Warn color = regularized Wasserstein barycenters for several gammaand regularization.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 17 / 20

Page 18: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Application to real and simulated data 2-D case

0 40 80

0

20

40

60

80

100

120

01/7/2014

0 40 80

0

20

40

60

80

100

120

01/13/2014

0 40 80

0

20

40

60

80

100

120

01/19/2014

0 40 80

0

20

40

60

80

100

120

01/23/2014

0 40 80

0

20

40

60

80

100

120

01/26/2014

0 40 80

0

20

40

60

80

100

120

01/29/2014

Figure: All crimes registered in the city of Chicago (i.e. an image 137× 88) for 6days of January 2014.

https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data from the Chicago Police Department’s

CLEAR.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 18 / 20

Page 19: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Application to real and simulated data 2-D case

0 10 20 30 40 50 60 70 80 90

0

50

100

150

Regularized Wassersteinbarycenter (Dirichlet)

0 10 20 30 40 50 60 70 80 90

0

50

100

150

Kernel density estimatorgkde2

0 10 20 30 40 50 60 70 80 90

0

50

100

150

Kernel density estimatorkde2d

Figure: Location of Crimes in the city of Chicago during the month of January2014

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 19 / 20

Page 20: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Application to real and simulated data 2-D case

[1] Martial Agueh and Guillaume Carlier. Barycenters in the Wassersteinspace. SIAM Journal on Mathematical Analysis, 43(2):904-924, 2011.[2] Martin Burger, Marzena Franek and Carola-Bibiane Schönlieb.Regularized regression and density estimation based on optimal transport.Applied Mathematics Research eXpress, 2012(2):209-253, 2012.[3] Thibault Le Gouic and Jean-Michel Loubes. Existence and consistencyof Wasserstein barycenters. arXiv preprint arXiv:1506.04153, 2015.[4] Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-ThresholdingAlgorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences,2(1): 183-202, 2009.[5] Guido De Philippis and Alessio Figalli. The Monge–Ampère equationand its link to optimal transportation. Bulletin of the AmericanMathematical Society, 51(4), 527-580, 2014.[6] Frank Bauer and Axel Munk. Optimal regularization for ill-posedproblems in metric spaces. Journal of Inverse and Ill-posed Problems jiip,15(2), 137-148, 2007

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 20 / 20

Page 21: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Lepskii balancing functional

Automatic selection of the parameter γ through an adaptation of Lepskiibalancing principal [Bauer and Munk].

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

2

4

6

8

10

12

14

Dirichlet10 20 30 40 50 60 70 80 90 100

0

5

10

15

20

25

Entropy10 20 30 40 50 60 70 80 90 100

0

2

4

6

8

10

12

14

Dirichlet + Entropy

Figure: Balancing functional (times 10−6) in solid line anddE (µP, µ

1/λPn

)/minλ dE (µP, µ1/λPn

) (dotted line) are plotted as functions of λ for 3different regularizations.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop

Page 22: ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters in the Wasserstein space ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis

Lepskii balancing functional

1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5×10

-3

Figure: Smooth balancing functional associated to regularized barycenters fordifferent value of λ.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop