

Non-informative reparametrisations for location-scale mixtures

Kaniav Kamary (1), Kate Lee (2), Christian P. Robert (1, 3)

(1) CEREMADE, Université Paris-Dauphine, Paris; (2) Auckland University of Technology, New Zealand; (3) Dept. of Statistics, University of Warwick, and CREST, Paris


Traditional definition of a mixture density:

$$f(x \mid \theta, \mathbf{p}) = \sum_{i=1}^{k} p_i\, f(x \mid \theta_i), \qquad \sum_{i=1}^{k} p_i = 1, \tag{1}$$

which gives a separate meaning to each component. For the location-scale Gaussian mixture:

$$f(x \mid \theta, \mathbf{p}) = \sum_{i=1}^{k} p_i\, \mathcal{N}(x \mid \mu_i, \sigma_i).$$

Mengersen and Robert (1996) [2] established that an improper prior on $(\mu_1, \sigma_1)$ leads to a proper posterior when

$$\mu_i = \mu_{i-1} + \sigma_{i-1}\,\delta_i \quad\text{and}\quad \sigma_i = \tau_i\,\sigma_{i-1}, \quad \tau_i < 1.$$

Diebolt and Robert (1994) [3] discussed an alternative approach, which ensures proper posteriors under improper priors by banning almost empty components from the likelihood function.
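The Mengersen and Robert (1996) parametrisation can be unrolled into component parameters in a few lines of Python. This is a minimal sketch. The function name `mr_components` is hypothetical. The sketch also assumes that each $\tau_i$ lies in $(0, 1)$, so that every $\sigma_i$ stays positive and the scales decrease.

```python
def mr_components(mu1, sigma1, deltas, taus):
    """Unroll mu_i = mu_{i-1} + sigma_{i-1} * delta_i and sigma_i = tau_i * sigma_{i-1}."""
    if len(deltas) != len(taus):
        raise ValueError("deltas and taus must both have length k - 1")
    mus, sigmas = [mu1], [sigma1]
    for delta, tau in zip(deltas, taus):
        if not 0.0 < tau < 1.0:
            raise ValueError("each tau_i must lie in (0, 1)")
        mus.append(mus[-1] + sigmas[-1] * delta)   # shift by the previous scale sigma_{i-1}
        sigmas.append(tau * sigmas[-1])            # each scale shrinks: sigma_i < sigma_{i-1}
    return mus, sigmas

mus, sigmas = mr_components(0.0, 2.0, deltas=[1.5, -0.5], taus=[0.5, 0.8])
print(mus, sigmas)  # [0.0, 3.0, 2.5] [2.0, 1.0, 0.8]
```

Under these assumptions, each $\delta_i$ is measured in units of the previous component's scale.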

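Setting the global mean and variance, $\mathbb{E}_{\theta,\mathbf{p}}(X) = \mu$ and $\operatorname{var}_{\theta,\mathbf{p}}(X) = \sigma^2$, imposes natural constraints on the component parameters:

$$\mu = \sum_{i=1}^{k} p_i \mu_i; \qquad \sigma^2 = \sum_{i=1}^{k} p_i \sigma_i^2 + \sum_{i=1}^{k} p_i \mu_i^2 - \mu^2; \qquad \mathbb{E}_{\theta,\mathbf{p}}(X^2) = \sum_{i=1}^{k} p_i \sigma_i^2 + \sum_{i=1}^{k} p_i \mu_i^2,$$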

which implies that (µ1, . . . , µk , σ1, . . . , σk) belongs to a specific ellipse.
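As a quick numerical check of these identities, the sketch below computes the global mean and variance of a Gaussian mixture from its component parameters. The helper `mixture_moments` is a hypothetical name, and $\sigma_i$ is assumed to be a standard deviation. The example reuses the weights and parameters of the two-component simulation reported later in the poster.

```python
import numpy as np

def mixture_moments(p, mu, sigma):
    """Global mean and variance of sum_i p_i N(mu_i, sigma_i^2), sigma_i a standard deviation."""
    p, mu, sigma = (np.asarray(a, dtype=float) for a in (p, mu, sigma))
    mean = np.sum(p * mu)                                          # mu = sum_i p_i mu_i
    second_moment = np.sum(p * sigma ** 2) + np.sum(p * mu ** 2)   # E(X^2)
    return mean, second_moment - mean ** 2                         # sigma^2 = E(X^2) - mu^2

mean, var = mixture_moments([0.65, 0.35], [-8.0, -0.5], [2.0, 1.0])
```

Here the mean is $-5.375$ and the variance is $44.6375 - 5.375^2 = 15.746875$, up to floating-point rounding.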

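New reparametrisation: the location-scale mixture is reparametrised in terms of the global mean and variance of the mixture distribution,

$$f(x \mid \theta, \mathbf{p}) = \sum_{i=1}^{k} p_i\, f\!\left(x \,\middle|\, \mu + \sigma\gamma_i/\sqrt{p_i},\ \sigma\eta_i/\sqrt{p_i}\right), \tag{2}$$

which leads to a parameter space in which $(p_1, \ldots, p_k, \gamma_1, \ldots, \gamma_k, \eta_1, \ldots, \eta_k)$ is constrained by

$$p_i, \eta_i \ge 0 \ (1 \le i \le k), \qquad \sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} \sqrt{p_i}\,\gamma_i = 0, \qquad \sum_{i=1}^{k} \{\eta_i^2 + \gamma_i^2\} = 1.$$

The last two constraints follow from substituting $\mu_i = \mu + \sigma\gamma_i/\sqrt{p_i}$ and $\sigma_i = \sigma\eta_i/\sqrt{p_i}$ into the mean and variance identities above. They imply that, for all $i$, $0 \le p_i \le 1$, $-1 \le \gamma_i \le 1$ and $0 \le \eta_i \le 1$. The constraints also mean that $(\gamma_1, \ldots, \gamma_k, \eta_1, \ldots, \eta_k)$ belongs to the hypersphere of $\mathbb{R}^{2k}$ centered at the origin with radius $r = 1$, intersected with a hyperplane of this space passing through the origin. The intersection is a unit "circle" centered at the origin, that is, a great hypersphere of radius 1 and dimension $2k - 2$.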
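The map from the new parameters back to the component parameters follows directly from (2). The sketch below applies it after checking the constraints. It can also be used to confirm that the resulting components have global mean $\mu$ and variance $\sigma^2$. The helper name is hypothetical, NumPy is assumed, and all $p_i$ must be strictly positive because of the division by $\sqrt{p_i}$.

```python
import numpy as np

def components_from_reparam(mu, sigma, p, gamma, eta, tol=1e-8):
    """Component means and standard deviations from Eq. (2):
    mu_i = mu + sigma * gamma_i / sqrt(p_i),  sigma_i = sigma * eta_i / sqrt(p_i)."""
    p, gamma, eta = (np.asarray(a, dtype=float) for a in (p, gamma, eta))
    sqrt_p = np.sqrt(p)
    if np.any(p <= 0) or np.any(eta < 0) or abs(p.sum() - 1.0) > tol:
        raise ValueError("weights must be positive and sum to 1; eta must be non-negative")
    if abs(sqrt_p @ gamma) > tol or abs(gamma @ gamma + eta @ eta - 1.0) > tol:
        raise ValueError("gamma and eta violate the mean or variance constraint")
    return mu + sigma * gamma / sqrt_p, sigma * eta / sqrt_p

p = np.array([0.5, 0.5])
gamma = np.array([0.6, -0.6])            # sqrt(p) . gamma = 0
eta = np.sqrt(np.array([0.14, 0.14]))    # 0.72 + 0.28 = 1
mus, sigmas = components_from_reparam(1.0, 2.0, p, gamma, eta)
```

With $\mu = 1$ and $\sigma = 2$, the recovered components satisfy $\sum_i p_i\mu_i = 1$ and $\sum_i p_i(\sigma_i^2 + \mu_i^2) - 1 = 4$, as the identities require.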

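Spherical coordinate representation of the γ's: suppose that $\sum_{i=1}^{k} \gamma_i^2 = \varphi^2$. The vector $\gamma$ then belongs both to the hypersphere of radius $\varphi$ and to the hyperplane orthogonal to $(\sqrt{p_i};\ i = 1, \ldots, k)$. An orthogonal basis of this hyperplane is given by the vectors $\Lambda_s$, $s = 1, \ldots, k-1$. The first vector is

$$\Lambda_{1,j} = \begin{cases} -\sqrt{p_2}, & j = 1,\\ \sqrt{p_1}, & j = 2,\\ 0, & j > 2,\end{cases}$$

and the $s$-th vector is given by

$$\Lambda_{s,j} = \begin{cases} -\sqrt{p_j\, p_{s+1}} \Big/ \big(\sum_{l=1}^{s} p_l\big)^{1/2}, & s > 1,\ j \le s,\\ \big(\sum_{l=1}^{s} p_l\big)^{1/2}, & s > 1,\ j = s+1,\\ 0, & s > 1,\ j > s+1.\end{cases}$$

The $s$-th orthonormal basis vector is $F_s = \Lambda_s / |\Lambda_s|$.

Figure: Image from Robert Osserman.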
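Take $\Lambda_{s,j} = -\sqrt{p_j p_{s+1}}/(\sum_{l=1}^{s} p_l)^{1/2}$ for $j \le s$, $\Lambda_{s,s+1} = (\sum_{l=1}^{s} p_l)^{1/2}$, and zero otherwise. The case $s = 1$ reduces to $(-\sqrt{p_2}, \sqrt{p_1}, 0, \ldots, 0)$. A short calculation then confirms that these vectors are orthogonal to $(\sqrt{p_1}, \ldots, \sqrt{p_k})$ and gives their norms. This calculation is added here as a check; it is not taken from the poster.

```latex
\begin{aligned}
\sum_{j=1}^{k} \Lambda_{s,j}\sqrt{p_j}
  &= -\frac{\sqrt{p_{s+1}}}{\big(\sum_{l=1}^{s} p_l\big)^{1/2}} \sum_{j=1}^{s} p_j
     + \sqrt{p_{s+1}}\Big(\sum_{l=1}^{s} p_l\Big)^{1/2} = 0,\\
|\Lambda_s|^2
  &= \frac{p_{s+1}\sum_{j=1}^{s} p_j}{\sum_{l=1}^{s} p_l} + \sum_{l=1}^{s} p_l
   = \sum_{l=1}^{s+1} p_l .
\end{aligned}
```

Now take $r < s$. The vector $\Lambda_r$ is supported on $j \le r+1 \le s$, and on that range $\Lambda_{s,j}$ is proportional to $\sqrt{p_j}$. The inner product $\langle \Lambda_r, \Lambda_s \rangle$ is therefore a multiple of the first line applied to $\Lambda_r$, and it vanishes. So $F_1, \ldots, F_{k-1}$ are $k-1$ orthonormal vectors in a $(k-1)$-dimensional hyperplane, i.e. an orthonormal basis of it.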

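$(\gamma_1, \ldots, \gamma_k)$ can be written as

$$(\gamma_1, \ldots, \gamma_k) = \varphi\cos(\varpi_1)F_1 + \varphi\sin(\varpi_1)\cos(\varpi_2)F_2 + \cdots + \varphi\sin(\varpi_1)\cdots\sin(\varpi_{k-2})F_{k-1},$$

with the angles $\varpi_1, \ldots, \varpi_{k-3}$ in $[0, \pi]$ and $\varpi_{k-2}$ in $[0, 2\pi]$.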
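A NumPy sketch of this construction is given below. `orthonormal_basis` builds $F_1, \ldots, F_{k-1}$ from the weights, and `gamma_from_angles` applies the spherical representation. Both names are hypothetical. The code assumes $k \ge 3$, so that at least one angle $\varpi$ exists.

```python
import numpy as np

def orthonormal_basis(p):
    """Rows F_1, ..., F_{k-1}: the vectors Lambda_s / |Lambda_s| built from the weights p."""
    p = np.asarray(p, dtype=float)
    k = p.size
    F = np.zeros((k - 1, k))
    for s in range(1, k):                                  # s = 1, ..., k-1 as in the text
        S = p[:s].sum()                                    # sum_{l=1}^{s} p_l
        lam = np.zeros(k)
        lam[:s] = -np.sqrt(p[:s] * p[s]) / np.sqrt(S)      # entries j <= s
        lam[s] = np.sqrt(S)                                # entry j = s + 1
        F[s - 1] = lam / np.linalg.norm(lam)
    return F

def gamma_from_angles(p, phi, varpi):
    """gamma = phi * (cos w1 F_1 + sin w1 cos w2 F_2 + ... + sin w1 ... sin w_{k-2} F_{k-1})."""
    p = np.asarray(p, dtype=float)
    varpi = np.asarray(varpi, dtype=float)
    k = p.size
    if k < 3 or varpi.size != k - 2:
        raise ValueError("need k >= 3 weights and exactly k - 2 angles")
    coef = np.ones(k - 1)
    for s in range(k - 2):
        coef[s] *= np.cos(varpi[s])        # close the s-th coordinate with a cosine
        coef[s + 1:] *= np.sin(varpi[s])   # later coordinates carry the sine factor
    return phi * (coef @ orthonormal_basis(p))

p = np.array([0.2, 0.3, 0.5])
F = orthonormal_basis(p)
gamma = gamma_from_angles(p, 0.7, [1.0])
```

Because the coefficients are standard hyperspherical coordinates, their squares sum to one. Under these assumptions, $\sum_i \sqrt{p_i}\gamma_i = 0$ and $\sum_i \gamma_i^2 = \varphi^2$ then hold by construction.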

Foundational consequences: the restriction makes the space of $(\mathbf{p}, \gamma, \eta)$ compact, which helps in selecting improper and non-informative priors over mixtures.

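Prior modeling:

▸ Global mean and variance: the posterior distribution associated with the prior $\pi(\mu, \sigma) = 1/\sigma$ is proper when (a) proper distributions are used on the other parameters and (b) there are at least two observations in the sample.
▸ Component weights: $(p_1, \ldots, p_k) \sim \mathrm{Dir}(\alpha_0, \ldots, \alpha_0)$.
▸ Angles $\varpi$: $\varpi_1, \ldots, \varpi_{k-3} \sim \mathcal{U}[0, \pi]$ and $\varpi_{k-2} \sim \mathcal{U}[0, 2\pi]$.
▸ Radius $\varphi$ and $\eta_1, \ldots, \eta_k$: if $k$ is small, $(\varphi^2, \eta_1^2, \ldots, \eta_k^2) \sim \mathrm{Dir}(\alpha, \ldots, \alpha)$, while for $k > 3$, $(\eta_1, \ldots, \eta_k)$ is written through spherical coordinates:

$$\eta_i = \begin{cases} \sqrt{1-\varphi^2}\,\cos(\xi_i), & i = 1,\\ \sqrt{1-\varphi^2}\,\prod_{j=1}^{i-1}\sin(\xi_j)\,\cos(\xi_i), & 1 < i < k,\\ \sqrt{1-\varphi^2}\,\prod_{j=1}^{i-1}\sin(\xi_j), & i = k.\end{cases}$$

Unlike the $\varpi$'s, the support of all angles $\xi_1, \ldots, \xi_{k-1}$ is limited to $[0, \pi/2]$, due to the positivity requirement on the $\eta_i$'s:

$$(\xi_1, \ldots, \xi_{k-1}) \sim \mathcal{U}\big([0, \pi/2]^{k-1}\big).$$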
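The prior on the constrained parameters can be simulated directly, which is handy for prior-predictive checks. This sketch rests on three assumptions:

- The improper prior on $(\mu, \sigma)$ is left out.
- "Small $k$" is taken to mean $k = 3$, since the code requires $k \ge 3$ for the angles $\varpi$.
- The poster does not state the prior on $\varphi$ when $k > 3$, so a Beta distribution on $\varphi^2$ is used as a purely hypothetical placeholder.

All function names are illustrative.

```python
import numpy as np

def eta_from_angles(phi, xi):
    """eta_i from the spherical coordinates above, so that sum_i eta_i^2 = 1 - phi^2."""
    xi = np.asarray(xi, dtype=float)
    k = xi.size + 1
    radius = np.sqrt(1.0 - phi ** 2)
    eta = np.empty(k)
    sin_prod = 1.0                                # prod_{j < i} sin(xi_j)
    for i in range(k - 1):
        eta[i] = radius * sin_prod * np.cos(xi[i])
        sin_prod *= np.sin(xi[i])
    eta[k - 1] = radius * sin_prod
    return eta

def sample_prior(k, alpha0, alpha, rng, phi2_beta=(1.0, 1.0)):
    """One draw of (p, phi, varpi, eta); the improper prior on (mu, sigma) is not sampled."""
    if k < 3:
        raise ValueError("this sketch assumes k >= 3")
    p = rng.dirichlet(np.full(k, alpha0))
    varpi = np.concatenate([rng.uniform(0.0, np.pi, size=k - 3),
                            rng.uniform(0.0, 2.0 * np.pi, size=1)])
    if k == 3:
        w = rng.dirichlet(np.full(k + 1, alpha))   # (phi^2, eta_1^2, ..., eta_k^2)
        phi, eta = np.sqrt(w[0]), np.sqrt(w[1:])
    else:
        phi = np.sqrt(rng.beta(*phi2_beta))        # HYPOTHETICAL prior on phi^2, not from the text
        xi = rng.uniform(0.0, np.pi / 2, size=k - 1)
        eta = eta_from_angles(phi, xi)
    return p, phi, varpi, eta

rng = np.random.default_rng(0)
draw3 = sample_prior(3, 1.0, 1.0, rng)
draw6 = sample_prior(6, 1.0, 1.0, rng)
```

In both branches, $\varphi^2 + \sum_i \eta_i^2 = 1$, matching the variance constraint once $\sum_i \gamma_i^2 = \varphi^2$.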

MCMC algorithm

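Metropolis-within-Gibbs algorithm for the reparametrised mixture model:

1. Generate initial values $(\mu^{(0)}, \sigma^{(0)}, \mathbf{p}^{(0)}, \varphi^{(0)}, \xi_1^{(0)}, \ldots, \xi_{k-1}^{(0)}, \varpi_1^{(0)}, \ldots, \varpi_{k-2}^{(0)})$.
2. For $t = 1, \ldots, T$, update $(\mu^{(t)}, \sigma^{(t)}, \mathbf{p}^{(t)}, \varphi^{(t)}, \xi_1^{(t)}, \ldots, \xi_{k-1}^{(t)}, \varpi_1^{(t)}, \ldots, \varpi_{k-2}^{(t)})$ as follows. Each proposal is accepted or rejected by a Metropolis-Hastings step against the indicated full conditional.
   2.1 Generate a proposal $\mu' \sim \mathcal{N}(\mu^{(t-1)}, \varepsilon_\mu)$ and update $\mu^{(t)}$ against $\pi(\cdot \mid \mathbf{x}, \sigma^{(t-1)}, \mathbf{p}^{(t-1)}, \varphi^{(t-1)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
   2.2 Generate a proposal $\log(\sigma)' \sim \mathcal{N}(\log(\sigma^{(t-1)}), \varepsilon_\sigma)$ and update $\sigma^{(t)}$ against $\pi(\cdot \mid \mathbf{x}, \mu^{(t)}, \mathbf{p}^{(t-1)}, \varphi^{(t-1)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
   2.3 Generate a proposal $(\varphi^2)' \sim \mathrm{Beta}\big((\varphi^2)^{(t-1)}\varepsilon_\varphi + 1,\ (1 - (\varphi^2)^{(t-1)})\varepsilon_\varphi + 1\big)$ and update $\varphi^{(t)}$ against $\pi(\cdot \mid \mathbf{x}, \mu^{(t)}, \sigma^{(t)}, \mathbf{p}^{(t-1)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
   2.4 Generate a proposal $\mathbf{p}' \sim \mathrm{Dir}(p_1^{(t-1)}\varepsilon_p + 1, \ldots, p_k^{(t-1)}\varepsilon_p + 1)$ and update $\mathbf{p}^{(t)}$ against $\pi(\cdot \mid \mathbf{x}, \mu^{(t)}, \sigma^{(t)}, \varphi^{(t)}, \xi^{(t-1)}, \varpi^{(t-1)})$.
   2.5 Generate proposals $\xi_i' \sim \mathcal{U}[\xi_i^{(t-1)} - \varepsilon_\xi, \xi_i^{(t-1)} + \varepsilon_\xi]$, $i = 1, \ldots, k-1$, and update $(\xi_1^{(t)}, \ldots, \xi_{k-1}^{(t)})$ against $\pi(\cdot \mid \mathbf{x}, \mu^{(t)}, \sigma^{(t)}, \mathbf{p}^{(t)}, \varphi^{(t)}, \varpi^{(t-1)})$.
   2.6 Generate proposals $\varpi_i' \sim \mathcal{U}[\varpi_i^{(t-1)} - \varepsilon_\varpi, \varpi_i^{(t-1)} + \varepsilon_\varpi]$, $i = 1, \ldots, k-2$, and update $(\varpi_1^{(t)}, \ldots, \varpi_{k-2}^{(t)})$ against $\pi(\cdot \mid \mathbf{x}, \mu^{(t)}, \sigma^{(t)}, \mathbf{p}^{(t)}, \varphi^{(t)}, \xi^{(t)})$.

Here $\mathbf{p}^{(t)} = (p_1^{(t)}, \ldots, p_k^{(t)})$, $\mathbf{x} = (x_1, \ldots, x_n)$, $\xi^{(t)} = (\xi_1^{(t)}, \ldots, \xi_{k-1}^{(t)})$ and $\varpi^{(t)} = (\varpi_1^{(t)}, \ldots, \varpi_{k-2}^{(t)})$.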
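Steps 2.2 and 2.4 use proposals that are not symmetric random walks on the original scale. Their acceptance ratios therefore need a Jacobian term (step 2.2) or a Hastings correction (step 2.4). The sketch below shows both corrections in NumPy under stated assumptions:

- The caller supplies the full conditionals as log-densities `log_cond`.
- $\varepsilon_\sigma$ is treated as a standard deviation.
- The helper names are hypothetical.

The sketch is checked on toy targets with known means, not on the mixture posterior.

```python
import numpy as np
from math import lgamma

def dirichlet_logpdf(x, a):
    """Log-density of Dir(a) evaluated at a point x of the open simplex."""
    return (lgamma(float(a.sum())) - sum(lgamma(float(ai)) for ai in a)
            + float(np.sum((a - 1.0) * np.log(x))))

def update_sigma(sigma, log_cond, eps, rng):
    """Step 2.2 sketch: Gaussian random walk on log(sigma). Moving on the log scale
    adds the Jacobian term log(sigma') - log(sigma) to the acceptance ratio."""
    sigma_new = float(np.exp(rng.normal(np.log(sigma), eps)))
    log_ratio = log_cond(sigma_new) - log_cond(sigma) + np.log(sigma_new / sigma)
    return sigma_new if np.log(rng.uniform()) < log_ratio else sigma

def update_p(p, log_cond, eps, rng):
    """Step 2.4 sketch: p' ~ Dir(p * eps + 1) is not symmetric, so the Hastings
    correction log q(p | p') - log q(p' | p) enters the acceptance ratio."""
    p_new = rng.dirichlet(p * eps + 1.0)
    log_ratio = (log_cond(p_new) - log_cond(p)
                 + dirichlet_logpdf(p, p_new * eps + 1.0)
                 - dirichlet_logpdf(p_new, p * eps + 1.0))
    return p_new if np.log(rng.uniform()) < log_ratio else p

# Toy check on known targets (NOT the mixture posterior):
# Dir(2, 3, 5) for p and a half-normal density for sigma.
rng = np.random.default_rng(1)
a = np.array([2.0, 3.0, 5.0])
p, sigma = np.full(3, 1.0 / 3.0), 1.0
p_draws, sigma_draws = [], []
for _ in range(20000):
    p = update_p(p, lambda q: float(np.sum((a - 1.0) * np.log(q))), 20.0, rng)
    sigma = update_sigma(sigma, lambda s: -0.5 * s * s, 1.0, rng)
    p_draws.append(p)
    sigma_draws.append(sigma)
p_mean = np.mean(p_draws[2000:], axis=0)          # compare with a / a.sum() = (0.2, 0.3, 0.5)
sigma_mean = float(np.mean(sigma_draws[2000:]))   # compare with sqrt(2 / pi)
```

The other steps follow the same pattern:

- The Beta proposal of step 2.3 can be corrected exactly like the Dirichlet one, since it is its two-component case.
- The uniform proposals of steps 2.5 and 2.6 are symmetric, so no correction is needed. A proposed angle outside its support has zero posterior density and is simply rejected.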


Ultimixt package

▸ Implementation of the Metropolis-within-Gibbs algorithm for the reparametrised mixture distribution;
▸ Calibration of the scales of the various proposals, aiming at an average acceptance rate of either 0.44 or 0.234 depending on the dimension of the simulated parameter;
▸ Accurate estimation of the component parameters;
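The poster does not detail how Ultimixt calibrates its proposal scales. One standard way to aim for a given average acceptance rate is batch-wise adaptation of the log-scale, in the spirit of Roberts and Rosenthal's adaptive Metropolis-within-Gibbs scheme. The NumPy sketch below illustrates that idea on a one-dimensional standard normal target. It is an illustration under assumptions, not Ultimixt's code or API, and the function names are hypothetical.

```python
import numpy as np

def target_acceptance(dim):
    """Rule of thumb quoted above: 0.44 for scalar updates, 0.234 otherwise."""
    return 0.44 if dim == 1 else 0.234

def tune_rw_scale(log_target, x0, rng, n_batches=500, batch_size=50, log_scale=0.0):
    """Random-walk Metropolis whose log-scale moves up by min(0.01, n**-0.5) after
    batch n if the batch acceptance rate exceeds the target, and down otherwise."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    target = target_acceptance(x.size)
    lp = log_target(x)
    rates = []
    for n in range(1, n_batches + 1):
        accepted = 0
        for _ in range(batch_size):
            prop = x + np.exp(log_scale) * rng.normal(size=x.shape)
            lp_prop = log_target(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
                accepted += 1
        rate = accepted / batch_size
        rates.append(rate)
        step = min(0.01, n ** -0.5)
        log_scale += step if rate > target else -step
    return float(np.exp(log_scale)), rates

rng = np.random.default_rng(3)
scale, rates = tune_rw_scale(lambda x: -0.5 * float(x @ x), [0.0], rng)
late_rate = float(np.mean(rates[-200:]))   # should settle near 0.44 for this 1-d target
```

A common safeguard, assumed rather than taken from the poster, is to stop adapting after burn-in. The final chain is then an ordinary Metropolis-Hastings sampler.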

Point estimators of the component parameters in the presence of label switching:
▸ K-means clustering algorithm;
▸ Reordering labels so as to produce the shortest distance between the current posterior sample and the (or a) maximum a posteriori (MAP) estimate [1].
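The second relabelling strategy can be sketched as a brute-force search over the $k!$ permutations of the component labels. The search keeps the permutation whose permuted draw is closest, in squared Euclidean distance, to the MAP estimate. Names are hypothetical, each row is assumed to hold one component's $(\mu_i, \sigma_i, p_i)$, and the numbers are illustrative only.

```python
import itertools
import numpy as np

def relabel_to_map(theta, theta_map):
    """Permute the rows (components) of theta to minimise the squared Euclidean
    distance to the MAP estimate theta_map; both arrays have shape (k, d)."""
    theta = np.asarray(theta, dtype=float)
    theta_map = np.asarray(theta_map, dtype=float)
    k = theta.shape[0]
    best = min(itertools.permutations(range(k)),
               key=lambda perm: float(np.sum((theta[list(perm)] - theta_map) ** 2)))
    return theta[list(best)], best

theta_map = np.array([[-8.0, 2.0, 0.65], [-0.5, 1.0, 0.35]])
draw = np.array([[-0.4, 1.1, 0.36], [-7.9, 2.1, 0.64]])   # labels swapped
relabelled, perm = relabel_to_map(draw, theta_map)
```

The total distance is a sum of row-to-row distances, so for larger $k$ the same minimisation can be solved as an assignment problem. One option is `scipy.optimize.linear_sum_assignment` applied to the $k \times k$ cost matrix. Rescaling the columns before computing distances is a further design choice.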

Mixture of two normal distributions

A sample of size 50 is simulated from $0.65\,\mathcal{N}(-8, 2) + 0.35\,\mathcal{N}(-0.5, 1)$.

Figure: Empirical densities from 10 parallel runs of the Metropolis-within-Gibbs algorithm, each with $2 \times 10^5$ iterations.

▸ Outcomes of 10 parallel chains, started randomly from different starting values, are indistinguishable;
▸ Chains are well mixed;
▸ Sampler output covers the entire sample space;
▸ Estimated densities converge to a neighborhood of the true values;
▸ Estimated mixture density is remarkably


Mixture of three normal distributions

A sample of size 50 is simulated from the model $0.27\,\mathcal{N}(-4.5, 1) + 0.4\,\mathcal{N}(10, 1) + 0.33\,\mathcal{N}(3, 1)$.

Figure: Sequences of $\mu_i$, $\sigma_i$ and $p_i$, and the estimated mixture density; the mixture density estimate is based on $10^4$ MCMC iterations.
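Simulated samples like the two above can be generated, and their log-likelihood evaluated, with a short NumPy sketch. Some caveats apply:

- The second argument of $\mathcal{N}(\cdot,\cdot)$ is assumed to be a standard deviation; if it denotes a variance, pass its square root.
- The seed is arbitrary, so the draws will not match the poster's data.
- The function names are hypothetical.

```python
import numpy as np

def simulate_mixture(n, weights, means, sds, rng):
    """Draw n observations from sum_i w_i N(mean_i, sd_i^2); also return the labels."""
    z = rng.choice(len(weights), size=n, p=weights)
    x = rng.normal(np.asarray(means, dtype=float)[z], np.asarray(sds, dtype=float)[z])
    return x, z

def mixture_loglik(x, weights, means, sds):
    """log prod_j sum_i w_i N(x_j | mean_i, sd_i^2), summed over components with logaddexp."""
    x = np.asarray(x, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    log_comp = (np.log(np.asarray(weights, dtype=float)) - np.log(sds)
                - 0.5 * np.log(2.0 * np.pi) - 0.5 * ((x - means) / sds) ** 2)
    return float(np.sum(np.logaddexp.reduce(log_comp, axis=1)))

rng = np.random.default_rng(2024)
x2, _ = simulate_mixture(50, [0.65, 0.35], [-8.0, -0.5], [2.0, 1.0], rng)
x3, _ = simulate_mixture(50, [0.27, 0.4, 0.33], [-4.5, 10.0, 3.0], [1.0, 1.0, 1.0], rng)
ll_true = mixture_loglik(x2, [0.65, 0.35], [-8.0, -0.5], [2.0, 1.0])
ll_wrong = mixture_loglik(x2, [0.65, 0.35], [8.0, 20.0], [2.0, 1.0])
```

The log-likelihood is computed with `np.logaddexp.reduce` over components, which avoids underflow when an observation lies far from some components.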

Overfitting case

Extreme-valued posterior samples for an overfitted model.

Galaxy dataset: point estimates of the parameters of a mixture with 6 components (left) and 4 components (right).


[1] S. Frühwirth-Schnatter (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. American Statist. Assoc., 96, 194–209.

[2] K. Mengersen and C. Robert (1996). Testing for mixtures: A Bayesian entropic approach (with discussion). In Bayesian Statistics 5 (J. Berger, J. Bernardo, A. Dawid, D. Lindley and A. Smith, eds.). Oxford University Press, Oxford, 255–276.

[3] J. Diebolt and C. Robert (1994). Estimation of finite mixture distributions by Bayesian sampling. J. Royal Statist. Society Series B, 56, 363–375.
