Regularization of barycenters in the Wasserstein space
Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis
Université de Bordeaux & CNRS
CMO Workshop - 30 April to 5 May 2017
Optimal Transport meets Probability, Statistics and Machine Learning
Elsa Cazelles Regularization of barycenters in the Wasserstein space CMO Workshop 1 / 20
For $\Omega \subset \mathbb{R}^d$, $\mathcal{P}_2(\Omega)$ is the set of probability measures on $\Omega$ with finite second-order moment.
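Definition. Let $\mathbb{P}^\nu_n = \frac{1}{n}\sum_{i=1}^n \delta_{\nu_i}$, where $\delta_{\nu_i}$ is the Dirac distribution at $\nu_i \in \mathcal{P}_2(\Omega)$. We define the regularized empirical barycenter of the discrete measure $\mathbb{P}^\nu_n$ as
$$\mu^\gamma_{\mathbb{P}^\nu_n} = \operatorname*{argmin}_{\mu \in \mathcal{P}_2(\Omega)} \frac{1}{n}\sum_{i=1}^n W_2^2(\mu, \nu_i) + \gamma E(\mu),$$
where $\gamma > 0$ is a regularization parameter and the penalty $E$ is a proper, differentiable, lower semicontinuous and strictly convex function.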
Case γ = 0 : Wasserstein barycenter of [Agueh and Carlier].
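In dimension one, the case γ = 0 has a well-known closed form: the quantile function of the Wasserstein barycenter is the average of the quantile functions of the ν_i. The Python sketch below illustrates this quantile averaging for empirical measures. The function names and the grid of levels are illustrative assumptions, not part of the talk, and the penalty γE(µ) is deliberately left out.

```python
import math

def empirical_quantile(sample, t):
    """Left-continuous empirical quantile F^{-1}(t) for a level t in (0, 1]."""
    xs = sorted(sample)
    idx = max(0, math.ceil(t * len(xs)) - 1)
    return xs[idx]

def barycenter_quantiles_1d(samples, levels):
    """Average the empirical quantile functions of several 1-D samples.

    In 1-D this average is the quantile function of the unregularized
    (gamma = 0) Wasserstein barycenter of the empirical measures.
    """
    n = len(samples)
    return [sum(empirical_quantile(s, t) for s in samples) / n for t in levels]

# Two empirical measures with two atoms each, evaluated at two levels.
samples = [[0.0, 1.0], [2.0, 3.0]]
levels = [0.25, 0.75]
bary = barycenter_quantiles_1d(samples, levels)
```

Here the barycenter has atoms at the midpoints of the matched quantiles. Adding the penalty would require a numerical optimization scheme that this sketch does not attempt.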
As an example, take $E$ to be the negative entropy, defined as
$$E(\mu) = \begin{cases} \int_\Omega f(x)\log(f(x))\,dx, & \text{if } \mu \text{ admits a pdf } f,\\ +\infty & \text{otherwise.} \end{cases}$$
Advantage: the regularized barycenter can be forced to be absolutely continuous with respect to the Lebesgue measure on Ω.
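As a rough illustration of the negative entropy penalty, the sketch below approximates the integral of f log f by a Riemann sum on a uniform grid. The discretization, the function name and the convention 0 log 0 = 0 are assumptions made for this example. The value +∞ for measures without a density is not represented.

```python
import math

def negative_entropy(density, dx):
    """Riemann-sum approximation of the integral of f log f.

    `density` holds values of f on a uniform grid with spacing `dx`;
    grid points where f = 0 contribute nothing (0 * log 0 = 0).
    """
    return sum(f * math.log(f) for f in density if f > 0.0) * dx

# Uniform density on [0, 2]: f = 1/2, so the integral equals log(1/2).
grid_size = 1000
dx = 2.0 / grid_size
uniform = [0.5] * grid_size
value = negative_entropy(uniform, dx)
```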
Convergence to a population Wasserstein barycenter: consistency of the regularized barycenter $\mu^\gamma_{\mathbb{P}_n}$
1 Convergence to a population Wasserstein barycenter
2 Stability of the minimizer
3 Application to real and simulated data
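The population Wasserstein barycenter is defined as
$$\mu^0_{\mathbb{P}} \in \operatorname*{argmin}_{\mu \in \mathcal{P}_2(\Omega)} \int W_2^2(\mu, \nu)\,d\mathbb{P}(\nu),$$
and its regularized version as
$$\mu^\gamma_{\mathbb{P}} = \operatorname*{argmin}_{\mu \in \mathcal{P}_2(\Omega)} \int W_2^2(\mu, \nu)\,d\mathbb{P}(\nu) + \gamma E(\mu),$$
where $\mathbb{P}$ is a probability measure on $\mathcal{P}_2(\Omega)$ and $\nu_1, \dots, \nu_n$ are iid of law $\mathbb{P}$. We recall that
$$\mu^\gamma_{\mathbb{P}^\nu_n} = \operatorname*{argmin}_{\mu \in \mathcal{P}_2(\Omega)} \frac{1}{n}\sum_{i=1}^n W_2^2(\mu, \nu_i) + \gamma E(\mu).$$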
The Bregman divergence $D_E$ associated to $E$ is defined for two measures $\mu, \zeta$ as
$$D_E(\mu, \zeta) := E(\mu) - E(\zeta) - \int_\Omega \nabla E(\zeta)\,(d\mu - d\zeta),$$
where $\nabla E$ denotes the gradient of $E$.
Thus the symmetric Bregman divergence $d_E$ is given by
$$d_E(\mu, \zeta) := D_E(\mu, \zeta) + D_E(\zeta, \mu).$$
Theorem. For $\Omega$ compact in $\mathbb{R}^d$ and $\nabla E(\mu^0_{\mathbb{P}})$ bounded,
$$\lim_{\gamma \to 0} D_E(\mu^\gamma_{\mathbb{P}}, \mu^0_{\mathbb{P}}) = 0,$$
which corresponds to showing that the squared bias term $d_E^2(\mu^\gamma_{\mathbb{P}}, \mu^0_{\mathbb{P}})$ (as classically referred to in nonparametric statistics) converges to zero when $\gamma \to 0$.
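To make the Bregman divergence concrete, here is a short worked computation for the negative entropy penalty. It assumes that $\mu$ and $\zeta$ have densities $f$ and $g$ on $\Omega$, that $g > 0$, and that all integrals are finite; these assumptions are added for the example and are not stated on the slide. Under them, $D_E$ reduces to the Kullback-Leibler divergence.

```latex
\begin{align*}
E(\zeta) &= \int_\Omega g \log g \, dx, \qquad \nabla E(\zeta) = \log g + 1,\\
D_E(\mu, \zeta) &= \int_\Omega f \log f \, dx - \int_\Omega g \log g \, dx
                   - \int_\Omega (\log g + 1)(f - g)\, dx\\
                &= \int_\Omega f \log\frac{f}{g}\, dx - \int_\Omega (f - g)\, dx
                 = \int_\Omega f \log\frac{f}{g}\, dx
                 = \mathrm{KL}(\mu \,\|\, \zeta),
\end{align*}
```

since $f$ and $g$ both integrate to one. In this case the symmetric divergence is $d_E(\mu, \zeta) = \mathrm{KL}(\mu \,\|\, \zeta) + \mathrm{KL}(\zeta \,\|\, \mu)$.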
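Theorem. If $\Omega$ is a compact subset of $\mathbb{R}^d$, then
$$\mathbb{E}\bigl(d_E^2(\mu^\gamma_{\mathbb{P}_n}, \mu^\gamma_{\mathbb{P}})\bigr) \le \frac{C\, I(1, \mathcal{H})\, \|H\|_{L^2(\mathbb{P})}}{\gamma^2 n},$$
where $C$ is a positive constant,
$$\mathcal{H} = \{ h_\mu : \nu \in \mathcal{P}_2(\Omega) \mapsto W_2^2(\mu, \nu) \in \mathbb{R};\ \mu \in \mathcal{P}_2(\Omega) \}$$
is a class of functions defined on $\mathcal{P}_2(\Omega)$ with envelope $H$, and
$$I(1, \mathcal{H}) = \sup_Q \int_0^1 \Bigl(1 + \underbrace{\log N\bigl(\varepsilon \|H\|_{L^2(Q)}, \mathcal{H}, \|\cdot\|_{L^2(Q)}\bigr)}_{\text{metric entropy}}\Bigr)^{1/2} d\varepsilon.$$
The metric entropy is of order $1/\varepsilon^d$.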
Remarks on the metric entropy.
Metric entropy of $\mathcal{H}$
↓
Metric entropy of $\mathcal{P}_2(\Omega)$
↓
Metric entropy of $\Omega$
↓
Compactness of $\Omega$
Convergence to a population Wasserstein barycenter: complementary result in the 1D case
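Theorem (1-D). When $\boldsymbol{\nu}_1, \dots, \boldsymbol{\nu}_n$ are iid random measures with support included in a compact interval $\Omega$,
$$\mathbb{E}\bigl(d_E^2(\mu^\gamma_{\mathbb{P}_n}, \mu^\gamma_{\mathbb{P}})\bigr) \le \frac{C}{\gamma^2 n},$$
where $C > 0$ does not depend on $n$ and $\gamma$.
Remark: Metric entropy of the space of quantiles.
It follows that if $\gamma = \gamma_n$ is such that $\lim_{n \to \infty} \gamma_n^2 n = +\infty$ then
$$\lim_{n \to \infty} \mathbb{E}\bigl(d_E^2(\mu^\gamma_{\mathbb{P}^\nu_n}, \mu^0_{\mathbb{P}})\bigr) = 0.$$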
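The effect of the condition on $\gamma_n$ can be checked numerically on the bound $C/(\gamma^2 n)$. The sketch below uses a hypothetical schedule $\gamma_n = n^{-a}$ with $a = 1/4$, for which $\gamma_n^2 n = n^{1/2}$ tends to infinity, and a placeholder constant $C = 1$. Neither choice comes from the talk.

```python
def variance_bound(n, gamma, C=1.0):
    """Right-hand side C / (gamma^2 * n) of the 1-D bound (C is a placeholder)."""
    return C / (gamma ** 2 * n)

def gamma_schedule(n, a=0.25):
    """Hypothetical choice gamma_n = n^(-a), so that gamma_n^2 * n = n^(1 - 2a)."""
    return n ** (-a)

sizes = [10, 100, 1000, 10000]
bounds = [variance_bound(n, gamma_schedule(n)) for n in sizes]
```

With this schedule the bound behaves like $n^{-1/2}$. Any exponent $0 < a < 1/2$ keeps $\gamma_n^2 n \to \infty$ while letting $\gamma_n \to 0$.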
Convergence to a population Wasserstein barycenter: complementary result with additional regularization
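When $d > 1$, the class of functions $\mathcal{H}$ is too large, so we have to add regularity.
For $\Omega$ smooth and uniformly convex, and $(\nu_i)_{i=1,\dots,n}$ of law $\mathbb{P}$, we specify the penalty function $E$:
$$E(\mu) = \begin{cases} \int_{\mathbb{R}^d} f(x)\log(f(x))\,dx + \|f\|_{H^k(\Omega)}, & \text{if } f = \frac{d\mu}{d\lambda} \text{ and } f > \alpha,\\ +\infty & \text{otherwise,} \end{cases}$$
where $\|\cdot\|_{H^k(\Omega)}$ designates the Sobolev norm associated to the $L^2(\Omega)$ space and $\alpha > 0$ is arbitrarily small.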
Sobolev embedding theorem: $H^k(\Omega)$ is included in the Hölder space $C^{m,\beta}$ for $\beta = k - m - d/2$.
Regularity on optimal maps is obtained from regularity on probability measures (e.g. [De Philippis and Figalli]).
Hence we can bound the metric entropy [Van der Vaart] by
$$K\left(\frac{1}{\varepsilon}\right)^{a} \quad \text{for any } a \ge d/(m+1).$$
Hence, as soon as $a/2 < 1$, for which $k > d - 1$ is necessary, we get a rate of convergence.
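The role of the condition $a/2 < 1$ can be seen directly on the entropy integral. The computation below is one plausible reading of the argument, not the authors' proof. It assumes that the metric entropy is bounded by $K\varepsilon^{-a}$ and that the Hölder exponent must satisfy $\beta > 0$.

```latex
\begin{align*}
\int_0^1 \bigl(1 + K\varepsilon^{-a}\bigr)^{1/2}\, d\varepsilon
  &\le \int_0^1 \bigl(1 + \sqrt{K}\,\varepsilon^{-a/2}\bigr)\, d\varepsilon
   = 1 + \frac{\sqrt{K}}{1 - a/2} < \infty
   \quad \text{if } a/2 < 1,\\
a \ge \frac{d}{m+1} \text{ and } a < 2
  &\ \Longrightarrow\ m > \frac{d}{2} - 1,
  \qquad
  \beta = k - m - \frac{d}{2} > 0 \ \Longrightarrow\ m < k - \frac{d}{2},\\
\frac{d}{2} - 1 < m < k - \frac{d}{2}
  &\ \Longrightarrow\ k > d - 1.
\end{align*}
```

The same integral with exponent $a = d$ is finite only when $d < 2$, which is consistent with the class $\mathcal{H}$ being too large when $d > 1$.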
Stability of the minimizer
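Stability of the minimizer for the symmetric Bregman distance $d_E$.
Theorem. Let $\nu_1, \dots, \nu_n$ and $\eta_1, \dots, \eta_n$ be two sequences of probability measures in $\mathcal{P}_2(\Omega)$. Let $\mu^\gamma_{\mathbb{P}^\nu_n}$ and $\mu^\gamma_{\mathbb{P}^\eta_n}$ be the regularized empirical barycenters associated to the discrete measures $\mathbb{P}^\nu_n$ and $\mathbb{P}^\eta_n$, then
$$d_E\bigl(\mu^\gamma_{\mathbb{P}^\nu_n}, \mu^\gamma_{\mathbb{P}^\eta_n}\bigr) \le \frac{2}{\gamma n} \inf_{\sigma \in S_n} \sum_{i=1}^n W_2(\nu_i, \eta_{\sigma(i)}),$$
where $S_n$ denotes the permutation group of the set of indices $\{1, \dots, n\}$.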
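The right-hand side of the stability bound can be evaluated explicitly on small examples. The sketch below does so for 1-D empirical measures with the same number of atoms, where $W_2$ is obtained by sorting. It searches all permutations by brute force, which is only practical for small n. The function names and the toy data are assumptions made for this illustration.

```python
import itertools
import math

def w2_1d(x, y):
    """W_2 between two 1-D empirical measures with the same number of atoms."""
    assert len(x) == len(y)
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))

def stability_bound(nus, etas, gamma):
    """Evaluate (2 / (gamma * n)) * min over sigma of sum_i W_2(nu_i, eta_sigma(i))."""
    n = len(nus)
    cost = [[w2_1d(nu, eta) for eta in etas] for nu in nus]
    best = min(
        sum(cost[i][sigma[i]] for i in range(n))
        for sigma in itertools.permutations(range(n))
    )
    return 2.0 / (gamma * n) * best

# Two Dirac masses on each side; the best matching swaps the indices.
nus = [[0.0], [10.0]]
etas = [[11.0], [1.0]]
bound = stability_bound(nus, etas, gamma=1.0)
```

For larger n, the minimum over permutations is a linear assignment problem, which a dedicated solver such as `scipy.optimize.linear_sum_assignment` handles without enumerating permutations.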
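Application: Let $\nu_1, \dots, \nu_n$ be $n$ absolutely continuous probability measures and $X = (X_{i,j})_{1 \le i \le n;\ 1 \le j \le p_i}$ a dataset of random variables such that $X_{i,j} \sim \nu_i$. Then
$$\mathbb{E}\bigl(d_E^2(\mu^\gamma_{\mathbb{P}^\nu_n}, \boldsymbol{\mu}^\gamma_X)\bigr) \le \frac{4}{\gamma^2 n} \sum_{i=1}^n \mathbb{E}\bigl(W_2^2(\nu_i, \boldsymbol{\nu}_{p_i})\bigr),$$
where $\boldsymbol{\nu}_{p_i} = \frac{1}{p_i}\sum_{j=1}^{p_i} \delta_{X_{i,j}}$ is the empirical measure of the $i$-th sample and $\boldsymbol{\mu}^\gamma_X$ is the random measure satisfying
$$\boldsymbol{\mu}^\gamma_X = \operatorname*{argmin}_{\mu \in \mathcal{P}_2(\Omega)} \frac{1}{n}\sum_{i=1}^n W_2^2\Bigl(\mu, \frac{1}{p_i}\sum_{j=1}^{p_i} \delta_{X_{i,j}}\Bigr) + \gamma E(\mu).$$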
Application to real and simulated data: choice of the parameter γ
Application to real and simulated data: 1-D case
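We consider $n = 100$ random Gaussian distributions $\mathcal{N}(\mu_i, \sigma_i^2)$, $1 \le i \le n$, where
$\mu_i$ is a uniform random variable on $[-2, 2]$,
$\sigma_i^2$ is a uniform random variable on $[0, 1]$.
Then we generate random variables $(X_{ij})_{1 \le i \le n;\ 1 \le j \le p_i}$, with $5 \le p_i \le 10$, such that
$$X_{ij} \sim \mathcal{N}(\mu_i, \sigma_i^2) \quad \text{for each } 1 \le i \le n \text{ and } 1 \le j \le p_i.$$
Finally, let
$$\boldsymbol{\nu}_i = \frac{1}{p_i}\sum_{j=1}^{p_i} \delta_{X_{ij}} \quad \text{for each } i.$$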
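The simulation design described above can be sketched in a few lines of Python. The slide does not say how the sample sizes $p_i$ are chosen, so drawing them uniformly among the integers 5 to 10 is an assumption. The function name and the fixed seed are also illustrative.

```python
import math
import random

def simulate_dataset(n=100, p_min=5, p_max=10, seed=0):
    """Draw n random Gaussians and a small sample from each one.

    Each returned list holds the atoms X_i1, ..., X_ip_i of one
    empirical measure nu_i = (1 / p_i) * sum_j delta_{X_ij}.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        mean = rng.uniform(-2.0, 2.0)      # mu_i uniform on [-2, 2]
        variance = rng.uniform(0.0, 1.0)   # sigma_i^2 uniform on [0, 1]
        p_i = rng.randint(p_min, p_max)    # assumed: uniform integer in [5, 10]
        sample = [rng.gauss(mean, math.sqrt(variance)) for _ in range(p_i)]
        data.append(sample)
    return data

data = simulate_dataset()
```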
[Figure: three panels titled "Dirichlet regularization", "Entropy regularization" and "Dirichlet + Entropy regularization". Each panel plots densities on [-4, 4] (vertical axis from 0 to 0.09) with the legend: true barycenter, kernel method, regularized barycenter.]
Dashed black curve: density of the population Wasserstein barycenter.
Dotted blue curve: the smoothed Wasserstein barycenter obtained by a preliminary kernel smoothing step of the discrete measures $\boldsymbol{\nu}_i$, followed by quantile averaging.
Warm colors: regularized Wasserstein barycenters for several values of γ and several regularizations.
Application to real and simulated data: 2-D case
[Figure: six panels, one per day (01/7/2014, 01/13/2014, 01/19/2014, 01/23/2014, 01/26/2014, 01/29/2014), each with horizontal axis from 0 to 80 and vertical axis from 0 to 120.]
Figure: All crimes registered in the city of Chicago (i.e. an image 137 × 88) for 6 days of January 2014.
Source: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data from the Chicago Police Department's CLEAR.
[Figure: three panels titled "Regularized Wasserstein barycenter (Dirichlet)", "Kernel density estimator gkde2" and "Kernel density estimator kde2d", each with horizontal axis from 0 to 90 and vertical axis from 0 to 150.]
Figure: Location of crimes in the city of Chicago during the month of January 2014.
[1] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904-924, 2011.
[2] Martin Burger, Marzena Franek and Carola-Bibiane Schönlieb. Regularized regression and density estimation based on optimal transport. Applied Mathematics Research eXpress, 2012(2):209-253, 2012.
[3] Thibault Le Gouic and Jean-Michel Loubes. Existence and consistency of Wasserstein barycenters. arXiv preprint arXiv:1506.04153, 2015.
[4] Amir Beck and Marc Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
[5] Guido De Philippis and Alessio Figalli. The Monge–Ampère equation and its link to optimal transportation. Bulletin of the American Mathematical Society, 51(4):527-580, 2014.
[6] Frank Bauer and Axel Munk. Optimal regularization for ill-posed problems in metric spaces. Journal of Inverse and Ill-posed Problems, 15(2):137-148, 2007.
Lepskii balancing functional
Automatic selection of the parameter γ through an adaptation of the Lepskii balancing principle [Bauer and Munk].
[Figure: three panels titled "Dirichlet" (horizontal axis from 0.1 to 1), "Entropy" (horizontal axis from 10 to 100) and "Dirichlet + Entropy" (horizontal axis from 10 to 100).]
Figure: Balancing functional (times $10^{-6}$, solid line) and $d_E(\mu_{\mathbb{P}}, \mu^{1/\lambda}_{\mathbb{P}_n}) / \min_\lambda d_E(\mu_{\mathbb{P}}, \mu^{1/\lambda}_{\mathbb{P}_n})$ (dotted line), plotted as functions of $\lambda$ for 3 different regularizations.
[Figure: single plot with horizontal axis on a logarithmic scale from 1e-10 to 0.01 and vertical axis from 0 to $4.5 \times 10^{-3}$.]
Figure: Smooth balancing functional associated to regularized barycenters for different values of $\lambda$.