ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters...

Regularization of barycenters in the Wasserstein space

Elsa Cazelles, joint work with Jérémie Bigot & Nicolas PapadakisUniversité de Bordeaux & CNRS

CMO Workshop - 30 April to 5 May 2017Optimal Transport meets Probability, Statistics and Machine Learning

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 1 / 20

For Ω ⊂ Rd , P2(Ω) is the set of probability measures.

DefinitionLet Pνn = 1

n∑n

i=1 δνi where δνi is the dirac distribution at νi ∈ P2(Ω). Wedefine the regularized empirical barycenter of the discrete measure Pνn as

µγPνn = argmin

µ∈P2(Ω)

1n

n∑i=1

W 22 (µ, νi ) + γE (µ)

where γ > 0 is a regularisation parameter and the penalty E is a proper,differentiable, lower semicontinuous and strictly convex function.

Case γ = 0 : Wasserstein barycenter of [Agueh and Carlier].


As an example, take E the negative entropy defined as

E (µ) = ∫

Ω f (x) log(f (x))dx , if µ admits a pdf f+∞ otherwise.

Advantage: It is possible to enforce the regularized barycenter to beabsolutely continuous with respect to the Lebesgue measure on Ω.


Convergence to a population Wasserstein barycenter Consistency of the regularized barycenter µγPn

1 Convergence to a population Wasserstein barycenter

2 Stability of the minimizer

3 Application to real and simulated data



We define the population Wasserstein barycenter defined as

µ0P ∈ argmin

µ∈P2(Ω)

∫W 2

2 (µ, ν)dP(ν),

and its regularized version

µγP = argminµ∈P2(Ω)

∫W 2

2 (µ, ν)dP(ν) + γE (µ).

where P is a probability measure on P2(Ω) and ν1, . . . , νn iid of law P.We recall that

µγPνn = argmin

µ∈P2(Ω)

1n

n∑i=1

W 22 (µ, νi ) + γE (µ)



The Bregman divergence DE associated to E is defined for two measuresµ, ζ as

DE (µ, ζ) := E (µ)− E (ζ)−∫

Ω∇E (ζ)(dµ− dζ)

where ∇E denotes the gradient of E .Thus the symmetric Bregman divergence dE is given by

dE (µ, ζ) := DE (µ, ζ) + DE (ζ, µ).

TheoremFor Ω compact in Rd and ∇E (µ0

P) bounded,

limγ→0

DE (µγP, µ0P) = 0,

which corresponds to showing that the squared bias term d2E (µγP, µ0

P) (asclassically referred to in nonparametric statistics) converges to zero whenγ → 0.



TheoremIf Ω is a compact of Rd , then one has that

E(d2E (µγ

Pn, µγP)) ≤

C I(1,H)‖H‖L2(P)γ2n

where C is a positive constant,

H = hµ : ν ∈ P2(Ω) 7→W 22 (µ, ν) ∈ R;µ ∈ P2(Ω)

is a class of functions defined on P2(Ω) with envelope H, and

I(1,H) = supQ

∫ 1

0(1 + logN(ε‖H‖L2(Q),H, ‖ · ‖L2(Q))︸︷︷︸

Metric entropy

)12 dε

The metric entropy is of order 1εd .



Remarks on the metric entropy.

Metric entropy of H↓

Metric entropy of P2(Ω)↓

Metric entropy of Ω↓

Compacity of Ω


Convergence to a population Wasserstein barycenter Complementary result in the 1D case

Theorem (1-D)When ν1, . . . ,νn are iid random measures with support included in acompact interval Ω,

E(d2

E

(µγPn, µγP

))≤ Cγ2n .

where C > 0 does not depend on n and γ.

Remark: Metric entropy of the space of quantiles.

It follows that if γ = γn is such that limn→∞ γ2nn = +∞ then

limn→∞

E(d2E

(µγPν

n, µ0

P))

= 0.


Convergence to a population Wasserstein barycenter Complementary result with additional regularization

When d > 1, the class of functions H is too large, so we have to addregularity.For Ω smooth and uniformly convex, and (νi )i=1,...,n of law P. We specifythe penalty function E :

E (µ) = ∫

Rd f (x) log(f (x))dx + ‖f ‖Hk (Ω), if f = dµdλ and f > α

+∞ otherwise.

where ‖ · ‖Hk (Ω) designates the Sobolev norm associated to the L2(Ω)space and α > 0 is arbitrarily small.


Convergence to a population Wasserstein barycenter Complementary result with additional regularization

Sobolev embedding theorem: Hk(Ω) is included in the Hölder spaceCm,β for β = k −m − d/2.Regularity on optimal maps is obtained from regularity on probabilitymeasures (e.g. [De Philippis and Figalli])

Hence we can bound the metric entropy [Van der Vaart] by

K(1ε

)afor any a ≥ d/(m + 1).

Hence, as soon as a/2 < 1, for which k > d − 1 is necessary, we get a rateof convergence.


Stability of the minimizer






Stability of the minimizer for the symmetric Bregman distance dE .

TheoremLet ν1, . . . , νn and η1, . . . , ηn be two sequences of probability measures inP2(Ω). Let µγPνn and µγPηn be the regularized empirical barycentersassociated to the discrete measures Pνn and Pηn, then

dE(µγPνn , µ

γPηn

)≤ 2γn inf

σ∈Sn

n∑i=1

W2(νi , ησ(i)),

where Sn denotes the permutation group of the set of indices 1, . . . , n.



Application: Let ν1, . . . , νn be n absolutely continuous probabilitymeasures and X = (X i ,j)1≤i≤n; 1≤j≤pi a dataset of random variables suchthat X i ,j ∼ νi . Then

E(d2

E

(µγPνn ,µ

γX

))≤ 4γ2n

n∑i=1

E(W 2

2 (νi ,νpi )),

where µγX is the random measure satisfying

µγX = argmin

µ∈P2(Ω)

1n

n∑i=1

W 22

µ, 1pi

pi∑j=1

δX i,j

+ γE (µ).


Application to real and simulated data Choice of the parameter γ





Application to real and simulated data 1-D case

We consider 1 ≤ i ≤ n = 100 random Gaussian distributions N (µi ,σ2i )

whereµi random uniform variable on [−2, 2]σ2

i random uniform variable on [0, 1].Then we generate (X ij)1≤i≤n;1≤j≤pi , 5 ≤ pi ≤ 10, random variables suchthat

X ij ∼ N (µi ,σ2i ) for each 1 ≤ i ≤ j .

Finally, let

ν i = 1pi

pi∑j=1

δX ij for each i

.



-4 -3 -2 -1 0 1 2 3 40

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

true barycenterkernel methodregularized barycenter

Dirichlet regularization-4 -3 -2 -1 0 1 2 3 4

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09


Entropy regularization-4 -3 -2 -1 0 1 2 3 4

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09


Dirichlet + Entropy regu-larization

Dashed and black curve = density of the population Wassersteinbarycenter.Blue and dotted curve = the smoothed Wasserstein barycenterobtained by a preliminary kernel smoothing step of the discretemeasures ν i that is followed by quantile averaging.Warn color = regularized Wasserstein barycenters for several gammaand regularization.



0 40 80

0

20

40

60

80

100

120

01/7/2014

0 40 80

0

20

40

60

80

100

120

01/13/2014

0 40 80

0

20

40

60

80

100

120

01/19/2014

0 40 80

0

20

40

60

80

100

120

01/23/2014

0 40 80

0

20

40

60

80

100

120

01/26/2014

0 40 80

0

20

40

60

80

100

120

01/29/2014

Figure: All crimes registered in the city of Chicago (i.e. an image 137× 88) for 6days of January 2014.

https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data from the Chicago Police Department’s

CLEAR.



0 10 20 30 40 50 60 70 80 90

0

50

100

150

Regularized Wassersteinbarycenter (Dirichlet)

0 10 20 30 40 50 60 70 80 90

0

50

100

150

Kernel density estimatorgkde2

0 10 20 30 40 50 60 70 80 90

0

50

100

150

Kernel density estimatorkde2d

Figure: Location of Crimes in the city of Chicago during the month of January2014



[1] Martial Agueh and Guillaume Carlier. Barycenters in the Wassersteinspace. SIAM Journal on Mathematical Analysis, 43(2):904-924, 2011.[2] Martin Burger, Marzena Franek and Carola-Bibiane Schönlieb.Regularized regression and density estimation based on optimal transport.Applied Mathematics Research eXpress, 2012(2):209-253, 2012.[3] Thibault Le Gouic and Jean-Michel Loubes. Existence and consistencyof Wasserstein barycenters. arXiv preprint arXiv:1506.04153, 2015.[4] Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-ThresholdingAlgorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences,2(1): 183-202, 2009.[5] Guido De Philippis and Alessio Figalli. The Monge–Ampère equationand its link to optimal transportation. Bulletin of the AmericanMathematical Society, 51(4), 527-580, 2014.[6] Frank Bauer and Axel Munk. Optimal regularization for ill-posedproblems in metric spaces. Journal of Inverse and Ill-posed Problems jiip,15(2), 137-148, 2007


Lepskii balancing functional

Automatic selection of the parameter γ through an adaptation of Lepskiibalancing principal [Bauer and Munk].

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0

2

4

6

8

10

12

14

Dirichlet10 20 30 40 50 60 70 80 90 100

0

5

10

15

20

25

Entropy10 20 30 40 50 60 70 80 90 100

0

2

4

6

8

10

12

14

Dirichlet + Entropy

Figure: Balancing functional (times 10−6) in solid line anddE (µP, µ

1/λPn

)/minλ dE (µP, µ1/λPn

) (dotted line) are plotted as functions of λ for 3different regularizations.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop

Lepskii balancing functional

1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5×10

-3

Figure: Smooth balancing functional associated to regularized barycenters fordifferent value of λ.

Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop

ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters...

Documents

Transcript of ElsaCazelles,jointworkwithJérémieBigot&NicolasPapadakis ...€¦ · Regularization of barycenters...