A derivation of Eqs. (4) and (5) of
Sparse stochastic inference for latent Dirichlet allocation
Tomonari MASADA @ Nagasaki University
May 23, 2013
The evidence can be written as follows:
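\[
p(w \mid \alpha, \eta) = \int \sum_{z} \prod_k p(\beta_k \mid \eta) \prod_d \bigl\{ p(\theta_d \mid \alpha)\, p(z_d \mid \theta_d)\, p(w \mid z_d, \beta) \bigr\}\, d\theta\, d\beta . \tag{1}
\]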
By integrating θ out, we have
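\[
p(w \mid \alpha, \eta) = \int \sum_{z} \prod_k p(\beta_k \mid \eta) \prod_d \bigl\{ p(z_d \mid \alpha)\, p(w \mid z_d, \beta) \bigr\}\, d\beta . \tag{2}
\]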
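The step from Eq. (1) to Eq. (2) relies on Dirichlet-multinomial conjugacy. As a worked sketch, assume a symmetric Dirichlet prior with parameter α over K topics (the form that appears later in Eq. (8)), and write n_{dk} for the number of tokens in document d assigned to topic k (a count symbol introduced here only for illustration):

```latex
\begin{aligned}
p(z_d \mid \alpha)
&= \int p(\theta_d \mid \alpha)\, p(z_d \mid \theta_d)\, d\theta_d \\
&= \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K} \int \prod_k \theta_{dk}^{\alpha + n_{dk} - 1}\, d\theta_d \\
&= \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_d)} \prod_k \frac{\Gamma(\alpha + n_{dk})}{\Gamma(\alpha)},
\qquad n_{dk} = \sum_{i=1}^{N_d} I(z_{di} = k).
\end{aligned}
```

The last line uses the Dirichlet normalization integral together with \sum_k n_{dk} = N_d.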
By applying Jensen’s inequality, we have a lower bound of the evidence as follows:
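\[
\begin{aligned}
\log p(w \mid \alpha, \eta)
&= \log \int \sum_{z} \prod_k p(\beta_k \mid \eta) \prod_d p(z_d \mid \alpha)\, p(w \mid z_d, \beta)\, d\beta \\
&= \log \int \sum_{z} \Bigl( \prod_d q(z_d) \prod_k q(\beta_k)\, \frac{\prod_k p(\beta_k \mid \eta) \prod_d p(z_d \mid \alpha)\, p(w \mid z_d, \beta)}{\prod_d q(z_d) \prod_k q(\beta_k)} \Bigr)\, d\beta \\
&\ge \int \sum_{z} \prod_d q(z_d) \prod_k q(\beta_k) \log \Bigl( \frac{\prod_k p(\beta_k \mid \eta) \prod_d p(z_d \mid \alpha)\, p(w \mid z_d, \beta)}{\prod_d q(z_d) \prod_k q(\beta_k)} \Bigr)\, d\beta \\
&= \sum_k \int q(\beta_k) \log p(\beta_k \mid \eta)\, d\beta_k + \sum_d \sum_{z_d} q(z_d) \log p(z_d \mid \alpha) \\
&\quad + \int \prod_k q(\beta_k) \sum_d \sum_{z_d} q(z_d) \log p(w \mid z_d, \beta)\, d\beta + H(q) \\
&= \sum_k \int q(\beta_k) \log p(\beta_k \mid \eta)\, d\beta_k + \sum_d \sum_{z_d} q(z_d) \log p(z_d \mid \alpha) \\
&\quad + \int \prod_k q(\beta_k) \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}}\, d\beta + H(q) ,
\end{aligned}
\tag{3}
\]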
where $H(q) = -\sum_k \int q(\beta_k) \log q(\beta_k)\, d\beta_k - \sum_d \sum_{z_d} q(z_d) \log q(z_d)$ is the entropy of $q$. Let $L$ denote the lower bound in Eq. (3).
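The inequality step in Eq. (3) is Jensen's inequality applied to the concave logarithm: the log of a q-weighted average of the ratio p/q is at least the q-weighted average of its log. A minimal numerical sketch on a three-point discrete distribution follows; the weights and ratio values are arbitrary illustrative numbers, not taken from the model.

```python
import math

def jensen_gap(weights, values):
    """Return (log E[X], E[log X]) for a discrete distribution over positive values."""
    log_of_mean = math.log(sum(w * v for w, v in zip(weights, values)))
    mean_of_log = sum(w * math.log(v) for w, v in zip(weights, values))
    return log_of_mean, mean_of_log

# Illustrative weights q and ratio values p/q (assumed numbers).
q = [0.2, 0.5, 0.3]
ratio = [0.4, 1.5, 2.0]

lhs, rhs = jensen_gap(q, ratio)
print(lhs >= rhs)  # the log of the mean is never below the mean of the log
```

The two sides coincide only when the ratio is constant on the support of q, which is why the bound is tight exactly when q matches the true posterior.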
By picking up the terms related to $z_d$ from $L$, we define $L_{z_d}$ as follows:
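\[
L_{z_d} = \sum_{z_d} q(z_d) \log p(z_d \mid \alpha) + \int \prod_k q(\beta_k) \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}}\, d\beta - \sum_{z_d} q(z_d) \log q(z_d) . \tag{4}
\]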
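We obtain the functional derivative of $L_{z_d}$ with respect to $q(z'_d)$ as follows:
\[
\begin{aligned}
\frac{\delta L_{z_d}}{\delta q(z'_d)}
&= \lim_{\varepsilon \to 0} \frac{\sum_{z_d} \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} \log p(z_d \mid \alpha) - \sum_{z_d} q(z_d) \log p(z_d \mid \alpha)}{\varepsilon} \\
&\quad + \lim_{\varepsilon \to 0} \frac{\int \prod_k q(\beta_k) \Bigl[ \sum_{z_d} \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}} - \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}} \Bigr]\, d\beta}{\varepsilon} \\
&\quad - \lim_{\varepsilon \to 0} \frac{\sum_{z_d} \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} \log \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} - \sum_{z_d} q(z_d) \log q(z_d)}{\varepsilon} ,
\end{aligned}
\tag{5}
\]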
where
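\[
\begin{aligned}
&\lim_{\varepsilon \to 0} \frac{\sum_{z_d} \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} \log \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} - \sum_{z_d} q(z_d) \log q(z_d)}{\varepsilon} \\
&= \lim_{\varepsilon \to 0} \frac{\sum_{z_d} q(z_d) \log \frac{q(z_d) + \varepsilon \delta(z_d - z'_d)}{q(z_d)} + \sum_{z_d} \varepsilon \delta(z_d - z'_d) \log \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \}}{\varepsilon} \\
&= \lim_{\varepsilon \to 0} \frac{\sum_{z_d} q(z_d) \bigl\{ \frac{\varepsilon \delta(z_d - z'_d)}{q(z_d)} + O(\varepsilon^2) \bigr\}}{\varepsilon} + \lim_{\varepsilon \to 0} \sum_{z_d} \delta(z_d - z'_d) \log \{ q(z_d) + \varepsilon \delta(z_d - z'_d) \} \\
&= \sum_{z_d} \delta(z_d - z'_d) + \sum_{z_d} \delta(z_d - z'_d) \log q(z_d) = 1 + \log q(z'_d) .
\end{aligned}
\tag{6}
\]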
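The result in Eq. (6) can be checked numerically on a small discrete distribution: perturbing one entry of q by ε and differencing the sum of q log q should approach 1 + log q(z'). The sketch below uses an arbitrary three-state q and a forward difference; the function names and the step size are choices made for this illustration.

```python
import math

def sum_q_log_q(q):
    """Compute sum_z q(z) log q(z) for a discrete q given as a list."""
    return sum(p * math.log(p) for p in q if p > 0.0)

def finite_difference(q, index, eps=1e-6):
    """Forward-difference version of the limit in Eq. (6): add eps to one entry only."""
    perturbed = list(q)
    perturbed[index] += eps
    return (sum_q_log_q(perturbed) - sum_q_log_q(q)) / eps

q = [0.1, 0.6, 0.3]  # illustrative values
for j, p in enumerate(q):
    numeric = finite_difference(q, j)
    closed_form = 1.0 + math.log(p)
    print(j, numeric, closed_form)
```

As in the derivation, the perturbation is not renormalized; normalization of q is restored afterwards by the proportionality in Eq. (8).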
Therefore,
\[
\frac{\delta L_{z_d}}{\delta q(z'_d)} = \log p(z'_d \mid \alpha) + \int \prod_k q(\beta_k) \sum_{i=1}^{N_d} \log \beta_{z'_{di} w_{di}}\, d\beta - 1 - \log q(z'_d) . \tag{7}
\]
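By solving $\delta L_{z_d} / \delta q(z'_d) = 0$, we obtain
\[
\begin{aligned}
q(z_d) &\propto p(z_d \mid \alpha) \cdot \exp \Bigl( \int \prod_k q(\beta_k) \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}}\, d\beta \Bigr) \\
&= p(z_d \mid \alpha) \cdot \prod_{i=1}^{N_d} \exp \Bigl( \int \prod_k q(\beta_k) \log \beta_{z_{di} w_{di}}\, d\beta \Bigr) \\
&\propto \frac{\Gamma(K\alpha)}{\Gamma(K\alpha + N_d)} \prod_k \frac{\Gamma(\alpha + \sum_i I(z_{di} = k))}{\Gamma(\alpha)} \times \prod_i \exp \bigl( E_q [ \log \beta_{z_{di} w_{di}} ] \bigr) .
\end{aligned}
\tag{8}
\]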
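To make Eq. (8) concrete, the following sketch evaluates the logarithm of its right-hand side for one topic assignment vector and then normalizes over all assignments of a tiny document by brute-force enumeration. The function name, the table `e_log_beta` of expected log word probabilities, and all numeric values are assumptions made for illustration; enumeration is only feasible for very short documents.

```python
import math
from collections import Counter
from itertools import product

def log_q_unnormalized(z_d, w_d, alpha, num_topics, e_log_beta):
    """Log of the right-hand side of Eq. (8) for one assignment vector z_d."""
    n_d = len(z_d)
    topic_counts = Counter(z_d)
    # Dirichlet-multinomial prior term p(z_d | alpha), symmetric alpha.
    log_prior = math.lgamma(num_topics * alpha) - math.lgamma(num_topics * alpha + n_d)
    for k in range(num_topics):
        log_prior += math.lgamma(alpha + topic_counts.get(k, 0)) - math.lgamma(alpha)
    # Sum over tokens of E_q[log beta_{z_di, w_di}].
    log_lik = sum(e_log_beta[k][w] for k, w in zip(z_d, w_d))
    return log_prior + log_lik

# Illustrative setup: 2 topics, 3 word types, a 3-token document.
alpha, num_topics = 0.5, 2
e_log_beta = [[-0.5, -1.5, -2.5], [-2.0, -1.0, -0.7]]  # assumed values
w_d = [0, 2, 2]

assignments = list(product(range(num_topics), repeat=len(w_d)))
log_weights = [log_q_unnormalized(z, w_d, alpha, num_topics, e_log_beta) for z in assignments]
max_log = max(log_weights)
weights = [math.exp(lw - max_log) for lw in log_weights]  # subtract the max for stability
total = sum(weights)
q_z = {z: wt / total for z, wt in zip(assignments, weights)}
print(max(q_z, key=q_z.get))  # most probable assignment under q(z_d)
```

Because q(z_d) couples all tokens of a document through the Gamma-function term, the number of assignments grows as K^{N_d}; the enumeration here is purely a check on small inputs.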
We assume that $q(\beta_k) = \frac{\Gamma(\sum_w \lambda_{kw})}{\prod_w \Gamma(\lambda_{kw})} \prod_w \beta_{kw}^{\lambda_{kw} - 1}$. By picking up the terms related to $\lambda$ from $L$, we define $L_\lambda$ as follows:
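\[
L_\lambda = \sum_k \int q(\beta_k) \log p(\beta_k \mid \eta)\, d\beta_k + \int \prod_k q(\beta_k) \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}}\, d\beta - \sum_k \int q(\beta_k) \log q(\beta_k)\, d\beta_k . \tag{9}
\]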
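Each term in Eq. (9) can be rewritten as below.
\[
\int q(\beta_k) \log p(\beta_k \mid \eta)\, d\beta_k = \log \Gamma(W\eta) - \sum_w \log \Gamma(\eta) + \sum_w (\eta - 1) \Bigl\{ \Psi(\lambda_{kw}) - \Psi\bigl(\textstyle\sum_{w'} \lambda_{kw'}\bigr) \Bigr\} \tag{10}
\]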
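\[
\int q(\beta_k) \log q(\beta_k)\, d\beta_k = \log \Gamma\bigl(\textstyle\sum_w \lambda_{kw}\bigr) - \sum_w \log \Gamma(\lambda_{kw}) + \sum_w (\lambda_{kw} - 1) \Bigl\{ \Psi(\lambda_{kw}) - \Psi\bigl(\textstyle\sum_{w'} \lambda_{kw'}\bigr) \Bigr\} \tag{11}
\]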
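\[
\begin{aligned}
&\int \prod_k q(\beta_k) \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} \log \beta_{z_{di} w_{di}}\, d\beta \\
&= \int \prod_k q(\beta_k) \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} \sum_k \sum_w \bigl\{ I(z_{di} = k, w_{di} = w) \cdot \log \beta_{kw} \bigr\}\, d\beta \\
&= \int \prod_k q(\beta_k) \sum_k \sum_w \log \beta_{kw} \Bigl\{ \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} I(z_{di} = k, w_{di} = w) \Bigr\}\, d\beta \\
&= \sum_k \sum_w \Bigl\{ \Psi(\lambda_{kw}) - \Psi\bigl(\textstyle\sum_{w'} \lambda_{kw'}\bigr) \Bigr\} \Bigl\{ \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} I(z_{di} = k, w_{di} = w) \Bigr\}
\end{aligned}
\tag{12}
\]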
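Eqs. (10) to (12) all rest on the Dirichlet identity E_q[log β_kw] = Ψ(λ_kw) − Ψ(∑_w λ_kw). The sketch below compares that closed form against a Monte Carlo estimate using only the Python standard library. The `digamma` routine is a hand-written approximation (recurrence plus a truncated asymptotic series) supplied for this illustration, and the λ values, sample size, and seed are arbitrary.

```python
import math
import random

def digamma(x):
    """Approximate Psi(x) for x > 0 via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift upward using Psi(x) = Psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def sample_dirichlet(lam, rng):
    """Draw one Dirichlet(lam) vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in lam]
    total = sum(draws)
    return [g / total for g in draws]

def monte_carlo_e_log_beta(lam, num_samples=20000, seed=0):
    """Estimate E[log beta_w] for each w by averaging over Dirichlet samples."""
    rng = random.Random(seed)
    sums = [0.0] * len(lam)
    for _ in range(num_samples):
        for w, b in enumerate(sample_dirichlet(lam, rng)):
            sums[w] += math.log(b)
    return [s / num_samples for s in sums]

lam = [2.0, 3.0, 5.0]  # illustrative variational parameters for one topic
closed_form = [digamma(a) - digamma(sum(lam)) for a in lam]
estimate = monte_carlo_e_log_beta(lam)
for c, e in zip(closed_form, estimate):
    print(round(c, 3), round(e, 3))
```

The two columns should agree up to Monte Carlo noise, which shrinks as the number of samples grows.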
Therefore,
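\[
\begin{aligned}
\frac{\partial L_\lambda}{\partial \lambda_{kw}}
&= \Bigl( \eta - \lambda_{kw} + \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} I(z_{di} = k, w_{di} = w) \Bigr) \Psi'(\lambda_{kw}) \\
&\quad - \sum_{w'} \Bigl( \eta - \lambda_{kw'} + \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} I(z_{di} = k, w_{di} = w') \Bigr) \Psi'\bigl(\textstyle\sum_{w''} \lambda_{kw''}\bigr) .
\end{aligned}
\tag{13}
\]
The second term carries a sum over all words $w'$ because every summand of $L_\lambda$ for topic $k$ depends on $\lambda_{kw}$ through $\Psi(\sum_{w''} \lambda_{kw''})$. Both terms vanish when each parenthesized factor is zero.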
By solving $\partial L_\lambda / \partial \lambda_{kw} = 0$, we obtain
\[
\lambda_{kw} = \eta + \sum_d \sum_{z_d} q(z_d) \sum_{i=1}^{N_d} I(z_{di} = k, w_{di} = w) . \tag{14}
\]
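Eq. (14) says that each λ_kw equals η plus the expected number of tokens of word w assigned to topic k under q. A minimal sketch of that update follows, assuming q(z_d) is represented for each document as a list of (weight, assignment) pairs whose weights sum to one. That representation, the function name, and the toy numbers are all assumptions of this example; the pairs could come from exact enumeration or from samples.

```python
def update_lambda(eta, num_topics, vocab_size, docs, q_z):
    """Apply Eq. (14): lambda_kw = eta + expected count of (topic k, word w)."""
    lam = [[eta] * vocab_size for _ in range(num_topics)]
    for w_d, q_d in zip(docs, q_z):
        for weight, z_d in q_d:
            # Each token contributes its assignment's probability mass to one cell.
            for k, w in zip(z_d, w_d):
                lam[k][w] += weight
    return lam

# Toy corpus: one document with word ids [0, 2], 2 topics, 3 word types.
docs = [[0, 2]]
# q(z_d) puts mass 0.75 on assignment (0, 0) and 0.25 on (1, 0) (assumed values).
q_z = [[(0.75, (0, 0)), (0.25, (1, 0))]]

lam = update_lambda(eta=0.1, num_topics=2, vocab_size=3, docs=docs, q_z=q_z)
for row in lam:
    print([round(v, 2) for v in row])
```

Cells that receive no expected count stay at the prior value η, so the update never drives a parameter to zero.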