Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state...

37
Steven L. Scott Presented by Ahmet Engin Ural

Transcript of Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state...

Page 1: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Steven L. Scott

Presented by Ahmet Engin Ural

Page 2: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Overview of HMM Evaluating likelihoods

▪ The Likelihood Recursion▪ The Forward-Backward Recursion

Sampling HMM DG and FB samplers Autocovariance of samplers Some issues with samplers (in general)

Estimation Marginal MAP Size of the state space

Page 3: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

h (h1 … hn)

Q Qij = q (hi , hj ) (stationary)

π0 Initial state

θ P0 …. Ps-1

Page 4: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Sum over all possible hidden state sequences,

the probability of the observed generated by that

hidden state sequence

Page 5: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Forward variable:

Instead, likelihood recursion O(S2 n) steps

Page 6: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Forward recursion is as likelihood recursion Backward variable:

where

Transition probabilities, p( r - > s at time t | we observed until t)

Backward recursion

Page 7: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Forward recursion is as likelihood recursion Backward variable:

where

Transition probabilities, p( r - > s at time t | we observed until t)

Backward recursion

Page 8: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Forward recursion is as likelihood recursion Backward variable:

where

Transition probabilities, p( r - > s at time t | we observed until t)

Backward recursion

Page 9: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Direct Gibbs Sampling

?

Page 10: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Forward Backward Recursion sampling

At the forward step, the transition matrices, (P) are produced;

Page 11: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Forward Backward Recursion sampling

At the backward step, the state is sampled by

?

Page 12: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

.

T1 =

I ( h1, h1)

I ( h1, h2)

.

.

.

I ( hs, hs)

Page 13: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emissions.

Page 14: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ).

θ(τ)

Page 15: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ).

T(τ)

Page 16: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG: Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

Tj(τ)Tk

(τ - 1)

Page 17: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG: Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

Tk(τ - 1) cond indepTj

(τ) , if θ(τ) , Tp(τ-1) and Tl

(τ) are given.

Tj(τ)Tk

(τ - 1)

{ p > j } { l < j }

Page 18: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG : Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

FB DG

Page 19: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG : Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

Page 20: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG : Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

= 0 for FB

Page 21: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG : Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

same for FB and DG <=

Page 22: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

T is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1)

Let θ(τ) be the sufficient statistics for emission probabilities. FB: T(τ) is conditionally independent of T(τ+1) given θ(τ). DG : Tj

(τ - 1) is conditionally independent of Tk(τ) given θ(τ).

Page 23: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Label switching

1 0 0 11

0 1 1 00

Page 24: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Label switching

Implications

Solution: constraints

Page 25: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Label switching Collapsed states

May be evidence for over parameterizations

Priors

Page 26: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Stating πt(s) is usually sufficient.

Page 27: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Stating πt(s) is usually sufficient. Overall configuration may be needed;

Marginal Distributions

MAP estimates

Page 28: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Stating πt(s) is usually sufficient. Overall configuration may be needed;

Marginal Distributions

▪ Averaging over all runs (1 – m) with indicator function:

Page 29: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Stating πt(s) is usually sufficient. Overall configuration may be needed;

Marginal Distributions

▪ Averaging over all runs (1 – m) with indicator function

▪ Averaging over all runs (1 – m) probabilities

(Rao-Blackwellized estimate)

Page 30: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Stating πt(s) is usually sufficient. Overall configuration may be needed;

Marginal Distributions

▪ Averaging over all runs (1 – m) with indicator function

▪ Averaging over all runs (1 – m) probabilities

MAP estimate ( L = max p ( h, d | θ )

Page 31: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Stating πt(s) is usually sufficient. Overall configuration may be needed;

Marginal Distributions

▪ Averaging over all runs (1 – m) with indicator function

▪ Averaging over all runs (1 – m) probabilities

MAP estimate: to find ĥ

converges when it is same for all s in ht+1.

Page 32: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Calculating p(S | D)

Page 33: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Calculating p(S | D)

Page 34: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Calculating p(S | D) Schwartz criterion C(S):

Page 35: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Calculating p(S | D) Schwartz criterion C(S): Bayesian Information Criterion BIC:

p(S | D) – 2 C(S)

Page 36: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)

Calculating p(S | D) Schwartz criterion C(S): Bayesian Information Criterion BIC

Page 37: Steven L. Scott - Brown UniversityT is a vector that has sufficient statistics for state transitions. T(τ) is the set of all such vectors iteration τ. (T1 is for time 1) Let θ(τ)