
Statistical Theory 550.630

Solution to HW5

PROBLEM 1. Here we show a proof of (2.4.8) for the case of discrete random variables. The proof is similar for the continuous case under the assumption that q(s, θ) is positive.

$$
E_{\theta_0}\!\left(\frac{p(X,\theta)}{p(X,\theta_0)}\,\Big|\,S(X)=s\right)
= \sum_{x\in\mathcal{X}} \frac{p(x,\theta)}{p(x,\theta_0)}\,\frac{p(x,s,\theta_0)}{q(s,\theta_0)}
= \sum_{x\in\mathcal{X}} \frac{p(x,\theta)}{p(x,\theta_0)}\,\frac{p(x,\theta_0)\,\mathbf{1}(S(x)=s)}{q(s,\theta_0)}
$$

$$
= \sum_{x\in\mathcal{X}} \frac{p(x,\theta)\,\mathbf{1}(S(x)=s)}{q(s,\theta_0)}
= \sum_{x\in\mathcal{X}} \frac{p(x,s,\theta)}{q(s,\theta_0)}
= \frac{q(s,\theta)}{q(s,\theta_0)}. \tag{1}
$$

Using the above result we have

$$
\frac{\partial}{\partial\theta}\log q(s,\theta)\Big|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}\bigl(\log q(s,\theta)-\log q(s,\theta_0)\bigr)\Big|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}\log\frac{q(s,\theta)}{q(s,\theta_0)}\Big|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}\log E_{\theta_0}\!\left(\frac{p(X,\theta)}{p(X,\theta_0)}\,\Big|\,S(X)=s\right)\Big|_{\theta=\theta_0}
$$

$$
= \frac{\frac{\partial}{\partial\theta}E_{\theta_0}\!\left(\frac{p(X,\theta)}{p(X,\theta_0)}\,\big|\,S(X)=s\right)\Big|_{\theta=\theta_0}}
       {E_{\theta_0}\!\left(\frac{p(X,\theta)}{p(X,\theta_0)}\,\big|\,S(X)=s\right)\Big|_{\theta=\theta_0}}
= \frac{E_{\theta_0}\!\left(\frac{\partial p(X,\theta)/\partial\theta}{p(X,\theta_0)}\,\big|\,S(X)=s\right)\Big|_{\theta=\theta_0}}
       {E_{\theta_0}\!\left(\frac{p(X,\theta_0)}{p(X,\theta_0)}\,\big|\,S(X)=s\right)}
$$

$$
= \frac{E_{\theta_0}\!\left(\frac{\partial}{\partial\theta}\log p(X,\theta)\,\big|\,S(X)=s\right)\Big|_{\theta=\theta_0}}
       {E_{\theta_0}(1\,|\,S(X)=s)}
= E_{\theta_0}\!\left(\frac{\partial}{\partial\theta}\log p(X,\theta)\,\Big|\,S(X)=s\right)\Big|_{\theta=\theta_0}, \tag{2}
$$

where the interchange of differentiation and expectation in the second line is justified under the usual regularity conditions.
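As a concrete illustration of (1), the following Python snippet verifies the identity by brute-force enumeration in a toy model of our choosing (not part of the assignment): X1, ..., Xn i.i.d. Bernoulli(θ) with sufficient statistic S(X) = ΣXi, so that q(s, θ) is the Binomial(n, θ) pmf.

```python
from itertools import product
from math import comb

n, s = 5, 3               # sample size and the value of S we condition on
theta, theta0 = 0.7, 0.4  # illustrative parameter values (our choice)

def p(x, th):
    """Joint pmf of a Bernoulli(th) sample x in {0,1}^n."""
    k = sum(x)
    return th ** k * (1 - th) ** (n - k)

def q(s, th):
    """Pmf of S = sum(X), i.e. Binomial(n, th)."""
    return comb(n, s) * th ** s * (1 - th) ** (n - s)

# E_{theta0}( p(X,theta)/p(X,theta0) | S(X)=s ): sum the likelihood ratio
# against the conditional pmf p(x, theta0) 1(S(x)=s) / q(s, theta0).
lhs = sum(p(x, theta) / p(x, theta0) * p(x, theta0) / q(s, theta0)
          for x in product([0, 1], repeat=n) if sum(x) == s)
rhs = q(s, theta) / q(s, theta0)
print(lhs, rhs)  # both print 1.3398..., confirming (1)
```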

PROBLEM 2. We have X1, ..., Xn i.i.d. with probability density function

$$
f_\mu(x) = \frac{1}{2}\varphi(x;0,1) + \frac{1}{2}\varphi(x;\mu,1)
= \frac{1}{2\sqrt{2\pi}}e^{-x^2/2} + \frac{1}{2\sqrt{2\pi}}e^{-(x-\mu)^2/2}. \tag{3}
$$

To derive the method of moments estimator for this model we consider the first moment:

$$
EX = \int_{-\infty}^{\infty}\left(\frac{x}{2\sqrt{2\pi}}e^{-x^2/2} + \frac{x}{2\sqrt{2\pi}}e^{-(x-\mu)^2/2}\right)dx = \frac{\mu}{2}. \tag{4}
$$

Since µ = 2EX, a method of moments estimator for µ is $\hat\mu_{\mathrm{mom}} = \frac{2}{n}\sum_{i=1}^n X_i$.
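For illustration, here is a minimal Python sketch of this estimator; the sampling helper, variable names, and seed are our assumptions, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, mu):
    """Draw n points from (1/2) N(0,1) + (1/2) N(mu,1) via the latent Z."""
    z = rng.integers(0, 2, size=n)      # Z_i ~ Bernoulli(1/2)
    return rng.normal(loc=z * mu, scale=1.0)

def mom_estimate(x):
    """Method of moments: mu = 2 EX, so mu_hat = 2 * sample mean."""
    return 2.0 * np.mean(x)

x = sample_mixture(1000, mu=1.0)
print(mom_estimate(x))                  # should be close to 1
```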

Obtaining a closed-form solution for the maximum likelihood estimator of µ seems intractable, so we resort to the EM algorithm. We introduce the auxiliary variable Z ∼ Bernoulli(1/2) and observe that X | Z ∼ N(Zµ, 1). The E-step consists of defining the function

$$
J(\mu\,|\,\mu_0) = E_{\mu_0}\bigl(\log p(X,Z,\mu)\,\big|\,X=x\bigr)
= E_{\mu_0}\!\left(\log\prod_{i=1}^n p(X_i,Z_i,\mu)\,\Big|\,X=x\right)
= \sum_{i=1}^n E_{\mu_0}\bigl(\log p(X_i,Z_i,\mu)\,\big|\,X_i=x_i\bigr)
$$

$$
= \sum_{i=1}^n E_{\mu_0}\!\left(-\log\sqrt{2\pi} - \frac{(X_i-Z_i\mu)^2}{2}\,\Big|\,X_i=x_i\right)
= \mathrm{const} - \sum_{i=1}^n E_{\mu_0}\!\left(\frac{(X_i-Z_i\mu)^2}{2}\,\Big|\,X_i=x_i\right)
$$

$$
= \mathrm{const} - \sum_{i=1}^n \sum_{z=0,1} \frac{(x_i-z\mu)^2}{2}\,p_{Z_i|X_i}(z\,|\,x_i,\mu_0)
= \mathrm{const} - \sum_{i=1}^n \sum_{z=0,1} \frac{-2x_i z\mu + z^2\mu^2}{2}\,p_{Z_i|X_i}(z\,|\,x_i,\mu_0)
$$

$$
= \mathrm{const} - \sum_{i=1}^n \frac{-2x_i\mu + \mu^2}{2}\,p_{Z_i|X_i}(1\,|\,x_i,\mu_0), \tag{5}
$$

where terms not involving µ are absorbed into the constant, and the last equality holds because the z = 0 term vanishes and z² = z for z ∈ {0, 1}.


The M-step sets the derivative of J(µ|µ0) with respect to µ to zero; since $\frac{\partial}{\partial\mu}J(\mu\,|\,\mu_0) = \sum_{i=1}^n (x_i - \mu)\,p_{Z_i|X_i}(1\,|\,x_i,\mu_0)$, this gives

$$
\mu = \frac{\sum_{i=1}^n x_i\, p_{Z_i|X_i}(1\,|\,x_i,\mu_0)}{\sum_{i=1}^n p_{Z_i|X_i}(1\,|\,x_i,\mu_0)}. \tag{6}
$$

The steps of the algorithm are then iterated until convergence (a Python sketch follows the list):

1. $q_i = \dfrac{\varphi(x_i;\mu_0,1)}{\varphi(x_i;0,1)+\varphi(x_i;\mu_0,1)}$

2. $\hat\mu_{\mathrm{mle}} = \dfrac{\sum_{i=1}^n q_i x_i}{\sum_{i=1}^n q_i}$

3. $\mu_0 \leftarrow \hat\mu_{\mathrm{mle}}$
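Here is a minimal Python sketch of steps 1–3, assuming NumPy from the earlier snippet and SciPy's norm.pdf for ϕ; it is a sketch of the iteration above, not a definitive implementation.

```python
from scipy.stats import norm

def em_estimate(x, mu0=0.0, n_iter=100):
    """Run n_iter EM iterations for the mixture (1/2) N(0,1) + (1/2) N(mu,1)."""
    mu = mu0
    for _ in range(n_iter):
        # Step 1 (E-step): q_i = P(Z_i = 1 | X_i = x_i) at the current mu;
        # the equal mixing weights 1/2 cancel in the ratio.
        q = norm.pdf(x, loc=mu) / (norm.pdf(x, loc=0.0) + norm.pdf(x, loc=mu))
        # Steps 2-3 (M-step): the weighted mean of equation (6) becomes the new mu.
        mu = np.sum(q * x) / np.sum(q)
    return mu

print(em_estimate(sample_mixture(1000, mu=1.0)))  # should be close to 1
```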

A suitable starting guess for µ is either 0 or µ̂mom.

We perform a Monte Carlo experiment to analyze the behavior of the estimators. We simulate data from the above Gaussian mixture model with µ = 1 and compute the MoM and MLE (EM with 100 iterations and an initial value of zero) estimates for samples of size N, with N taken to be 10, 50, 100, 500, 1000, 5000, and 10000. The method of moments is a direct method and is therefore favorable in terms of computational speed. The EM algorithm provides estimates closer to the true value but is slower; to speed it up, a good starting point should be selected, and these results suggest that the method of moments estimate is a good candidate.
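A sketch of this experiment, reusing sample_mixture, mom_estimate, and em_estimate from the snippets above; the number of replications per sample size is our choice, as the write-up does not state it.

```python
mu_true = 1.0
sizes = [10, 50, 100, 500, 1000, 5000, 10000]
n_reps = 200  # Monte Carlo replications per sample size (assumed)

def mse(estimates):
    """Mean squared error of a list of estimates around the true mu."""
    return np.mean((np.array(estimates) - mu_true) ** 2)

for n in sizes:
    mom = [mom_estimate(sample_mixture(n, mu_true)) for _ in range(n_reps)]
    em = [em_estimate(sample_mixture(n, mu_true)) for _ in range(n_reps)]
    print(f"n={n:6d}  MSE(MoM)={mse(mom):.4f}  MSE(EM)={mse(em):.4f}")
```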


Figure 1: Final estimates of µ as a function of sample size for each experiment. As expected, the variance decreases as the sample size increases. For very small sample sizes the method of moments has smaller variance, while "asymptotically" the variation of the EM estimates is smaller.

Figure 2: Mean squared error for both methods (MoM in blue, MLE in red), indicating that the EM algorithm gives better estimates for small sample sizes; for large samples the two methods behave similarly.
