Identification of Static and Dynamic Models of Strategic Interactions
Lecture notes by Han Hong
Department of Economics
Stanford University
5th June 2007
Identifying Dynamic Discrete Decision Processes: two periods
• Thierry Magnac and David Thesmar (2002)
• Two period model.
• i ∈ I = {1, . . . , K}.
• State variable h = (x, ε).
• ε = (ε_1, . . . , ε_K).
• Period 1 utility: u_i(x, ε).
• Next period state variables: h′ = (x′, ε′), drawn conditional on h = (x, ε).
Assumptions
• Additive separability:
  ∀i ∈ I, u_i(x, ε) = u*_i(x) + ε_i,
  where ε is independent of x.
• Conditional independence: the random preference shocks ε′ and ε in the two periods are independent of each other, and independent of x and d = i.
• Discrete support: the support of the first period state variable x (resp. the second period x′) is X (resp. X′). The joint support 𝒳 = X ∪ X′ is discrete and finite, i.e.,
  𝒳 = { x_1, . . . , x_{#𝒳} }.
• Transition matrix of the (x, ε) process:
  P(h′|h, d) = P(x′, ε′|x, d) = G(ε′) P(x′|x, d).
• Bellman equation:
  v_i(x, ε) = u*_i(x) + ε_i + β E( max_j v_j(x′, ε′) | x, d = i ).
• Decompose
  v_i(x, ε) = v*_i(x) + ε_i,
  where
  v*_i(x) = u*_i(x) + β E( max_j ( v*_j(x′) + ε′_j ) | x, d = i ).
• What can be recovered from the data:
  ∀(d, d′) ∈ I², ∀(x, x′) ∈ X × X′,
  P(d′, x′, d|x) = P(d′|x′) P(x′|x, d) P(d|x).
• P(x′|x, d) is nonparametrically specified and is exactly identified from the data.
• Can P(d′|x′) and P(d|x) be used to identify the structural parameters
  b = ( u*_1(X), . . . , u*_K(X), v*_1(X′), . . . , v*_K(X′), G, β ),
  where f(X) is shorthand for {f(x), ∀x ∈ X}? For example,
  u*_1(X) = { u*_1(x), ∀x ∈ X }.
• The probability that the agent chooses d = i, given the structure b and the observable state variable x: ∀(x, i) ∈ X × I,
  p_i(x; b) = P( v*_i(x; b) + ε_i = max_j ( v*_j(x; b) + ε_j ) | x, b ).
• Definition of identification
• Observable vector of choice probabilities:
  p(x) = ( p_1(x), . . . , p_K(x) ).
• Number of observable choice probabilities versus number of parameters that can be identified.
• Mapping between the observable choice probabilities and the vector of value functions:
  ∀(x, i) ∈ X × I, v*_i(x) = v*_K(x) + q_i( p(x); G ).
• There is no loss of generality in setting v*_K(X′) = 0. Then for i = 1, . . . , K − 1:
  v*_i(X′) = q_i( p(X′); G ).
• There is also no loss of generality in setting u*_K(x) = 0.
• Expected second period value function: for
  v*(x′) = ( v*_1(x′), . . . , v*_K(x′) ),
  define
  R( v*(x′); G ) = E_G max_{i∈I} ( v*_i(x′) + ε_i ).
• Decompose the first period total utility function:
  v*_i(x) = u*_i(x) + β E[ R( v*(x′); G ) | x, d = i ].
• Recall
  q_i(x) = v*_i(x) − v*_K(x).
• Now
  v*_i(x) = u*_i(x) + β E[ R( v*(x′); G ) | x, d = i ],
  and since u*_K(x) = 0:
  v*_K(x) = β E[ R( v*(x′); G ) | x, d = K ].
• Combine these relations:
  q_i(x) = v*_i(x) − v*_K(x)
         = u*_i(x) + β E[ R( v*(x′); G ) | x, d = i ] − β E[ R( v*(x′); G ) | x, d = K ].
• Therefore u*_i(x) can be recovered by
  u*_i(x) = q_i(x) − β E[ R( v*(x′); G ) | x, d = i ] + β E[ R( v*(x′); G ) | x, d = K ].
• So given β, G, u*_K(x) = 0 and v*_K(x′) = 0, the other utility functions u*_i(x), i = 1, . . . , K − 1, and the other second period value functions v*_i(x′), i = 1, . . . , K − 1, can be identified.
• Estimation follows from identification (see the numerical sketch below).
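To make this recovery concrete, here is a minimal numerical sketch for the special case where G is the i.i.d. extreme value distribution, so that q_i(p(x); G) = log p_i(x) − log p_K(x) and R(v*(x′); G) = log Σ_j exp(v*_j(x′)) + γ, with γ Euler's constant. The function name and array layout are illustrative, not from the original notes.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def identify_two_period(p1, p2, trans, beta):
    """Recover u*_i(x) and v*_i(x') from observed choice probabilities,
    assuming extreme value errors. A sketch; all names are illustrative.

    p1[x, i]       : first-period choice probabilities P(d = i | x)
    p2[xp, i]      : second-period choice probabilities P(d' = i | x')
    trans[x, i, xp]: transition probabilities P(x' | x, d = i)
    beta           : known discount factor
    """
    # Hotz-Miller inversion with logit errors: q_i = log p_i - log p_K.
    # The normalization v*_K(X') = 0 pins down the second-period values.
    v2 = np.log(p2) - np.log(p2[:, -1:])        # v*_i(x'); last column is 0

    # R(v*(x'); G) = E max_j (v*_j + eps_j) = log sum_j exp(v*_j) + gamma
    R = np.log(np.exp(v2).sum(axis=1)) + EULER_GAMMA    # one value per x'

    # E[R(v*(x'); G) | x, d = i]: average R over the transition distribution
    ER = trans @ R                              # shape (num_x, K)

    # u*_i(x) = q_i(x) - beta*E[R|x,d=i] + beta*E[R|x,d=K], with u*_K(x) = 0
    q1 = np.log(p1) - np.log(p1[:, -1:])
    u1 = q1 - beta * ER + beta * ER[:, -1:]
    return u1, v2
```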
• Exclusion restrictions might identify β: if
  ∃(x_1, x_2) ∈ X², ∃i ∈ I, such that x_1 ≠ x_2 and u*_i(x_1) = u*_i(x_2),
  but x_1 and x_2 still generate different transition probabilities P(x′|x, d), then q_i(x_1) should differ from q_i(x_2).
• Identifying β:
  q_i(x_1) − β E[ R( v*(x′); G ) | x_1, d = i ] + β E[ R( v*(x′); G ) | x_1, d = K ]
  − ( q_i(x_2) − β E[ R( v*(x′); G ) | x_2, d = i ] + β E[ R( v*(x′); G ) | x_2, d = K ] ) = 0.
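• Note that v*(X′) is already identified from the second period choice probabilities alone (given G and the normalization v*_K(X′) = 0), so the display above is linear in β. Writing Δ_m = E[ R( v*(x′); G ) | x_m, d = i ] − E[ R( v*(x′); G ) | x_m, d = K ] for m = 1, 2, and using u*_i(x_1) = u*_i(x_2), it solves to
  β = ( q_i(x_1) − q_i(x_2) ) / ( Δ_1 − Δ_2 ),
which is well defined exactly when the two transition probabilities differ enough that Δ_1 ≠ Δ_2.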
• Parametric restriction: they said it can be used to identify G. Is this true?
Single agent dynamic discrete choice model: infinite horizon
• Players are forward looking.
• Infinite Horizon, Stationary, Markov Transition
• Now players maximize expected discounted utility using discount factor β.
W_i(s, ε_i; σ) = max_{a_i ∈ A_i} { Π_i(a_i, s) + ε_i(a_i)
  + β ∫ Σ_{a_{−i}} W_i(s′, ε′_i; σ) g(s′|s, a_i, a_{−i}) σ_{−i}(a_{−i}|s) f(ε′_i) dε′_i }
• Definition: a Markov Perfect Equilibrium is a collection of δ_i(s, ε_i), i = 1, . . . , n, such that for all i, all s and all ε_i, δ_i(s, ε_i) maximizes W_i(s, ε_i; σ_i, σ_{−i}).
• Conditional independence:
  • ε is distributed i.i.d. over time.
  • State variables evolve according to g(s′|s, a_i, a_{−i}).
• Define the choice specific value function
  V_i(a_i, s) = Π_i(a_i, s) + β E[ V_i(s′) | s, a_i ].
• Players choose a_i to maximize V_i(a_i, s) + ε_i(a_i).
• Ex ante value function (social surplus function):
  V_i(s) = E_{ε_i} max_{a_i} [ V_i(a_i, s) + ε_i(a_i) ]
         = G( V_i(a_i, s), ∀a_i = 0, . . . , K )
         = G( V_i(a_i, s) − V_i(0, s), ∀a_i = 1, . . . , K ) + V_i(0, s).
• When the error terms are extreme value distributed,
  V_i(s) = log Σ_{k=0}^{K} exp( V_i(k, s) )
         = log Σ_{k=0}^{K} exp( V_i(k, s) − V_i(0, s) ) + V_i(0, s).
• Relationship between Π_i(a_i, s) and V_i(a_i, s):
  V_i(a_i, s) = Π_i(a_i, s) + β E[ G( V_i(a_i, s′), ∀a_i = 0, . . . , K ) | s, a_i ]
              = Π_i(a_i, s) + β E[ G( V_i(k, s′) − V_i(0, s′), ∀k = 1, . . . , K ) | s, a_i ]
                + β E[ V_i(0, s′) | s, a_i ].
• With extreme value distributed error terms:
  V_i(a_i, s) = Π_i(a_i, s) + β E[ log Σ_{k=0}^{K} exp( V_i(k, s′) − V_i(0, s′) ) | s, a_i ]
                + β E[ V_i(0, s′) | s, a_i ].
• Hotz and Miller (1993): one to one mapping between σ_i(a_i|s) and differences in choice specific value functions:
  ( V_i(1, s) − V_i(0, s), . . . , V_i(K, s) − V_i(0, s) ) = Ω_i( σ_i(0|s), . . . , σ_i(K|s) ).
• Example: i.i.d. extreme value f(ε_i):
  σ_i(a_i|s) = exp( V_i(a_i, s) − V_i(0, s) ) / Σ_{k=0}^{K} exp( V_i(k, s) − V_i(0, s) ).
• Inverse mapping:
  log σ_i(k|s) − log σ_i(0|s) = V_i(k, s) − V_i(0, s).
• Since we can recover V_i(k, s) − V_i(0, s), we only need to know V_i(0, s) to recover V_i(k, s), ∀k.
• If we know V_i(0, s), the mapping between V_i(a_i, s) and Π_i(a_i, s) is one to one.
• Identify V_i(0, s) first. Set a_i = 0:
  V_i(0, s) = Π_i(0, s) + β E[ log Σ_{k=0}^{K} exp( V_i(k, s′) − V_i(0, s′) ) | s, 0 ] + β E[ V_i(0, s′) | s, 0 ].
• This is a contraction mapping with a unique fixed point, computed by iteration (see the sketch below).
• Add V_i(0, s) to V_i(k, s) − V_i(0, s) to identify all V_i(k, s).
• Then all Π_i(k, s) are calculated from V_i(k, s) through
  Π_i(k, s) = V_i(k, s) − β E[ V_i(s′) | s, k ].
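A minimal sketch of this fixed point iteration on a finite state space, assuming logit errors and the normalization Π_i(0, s) = 0 discussed next; the function name and array layout are illustrative.

```python
import numpy as np

def solve_v0(dV, g0, beta, tol=1e-10, max_iter=100_000):
    """Fixed-point iteration for V_i(0, s) with logit errors (a sketch).

    dV[s, k] : recovered differences V_i(k, s) - V_i(0, s), k = 0, ..., K
               (column k = 0 is identically zero)
    g0[s, s']: transition probabilities of s given s and a_i = 0
    beta     : discount factor; Pi_i(0, s) is normalized to zero
    """
    logsum = np.log(np.exp(dV).sum(axis=1))     # log sum_k exp(dV[s, k])
    v0 = np.zeros(dV.shape[0])
    for _ in range(max_iter):
        # V(0, s) = beta * E[ logsum(s') + V(0, s') | s, a_i = 0 ]
        v0_new = beta * g0 @ (logsum + v0)
        if np.max(np.abs(v0_new - v0)) < tol:
            return v0_new
        v0 = v0_new
    return v0
```

Since β < 1, the map is a contraction and the iteration converges from any starting point.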
• Why normalize Π_i(0, s) = 0?
• Why not V_i(0, s) = 0?
• If a firm stays out of the market in period t, its current profit is 0, but the option value of future entry might depend on market size, the number of other firms, etc.
• These state variables might evolve stochastically.
• The rest of the identification arguments are identical to the static model.
• Nonparametric and Semiparametric Estimation
• The Hotz-Miller inversion recovers V_i(k, s) − V_i(0, s) instead of Π_i(k, s) − Π_i(0, s).
• Nonparametrically compute V_i(0, s) using
  V_i(0, s) = β E[ log Σ_{k=0}^{K} exp( V_i(k, s′) − V_i(0, s′) ) | s, 0 ] + β E[ V_i(0, s′) | s, 0 ].
• Obtain V_i(k, s) and forward compute Π_i(k, s).
• The rest is identical to the static model.
• In semiparametric models, θ converges at a T^{1/2} rate and has normal asymptotics.
• Apply the results of Newey (1994) to derive the appropriate "influence functions".
• The asymptotic distribution is invariant to the choice of method used to estimate the first stage.
• With a proper weighting function (which needs to be estimated nonparametrically), one can achieve the same efficiency as full information maximum likelihood.
• These results hold for both static and dynamic models.
Discrete Games
• Dynamic and static discrete games.
• Private information assumption.
• No unobserved state variables.
• No distinction for identification purposes.
• Therefore it suffices to study static games.
• Results translate immediately to dynamic models.
Notations
• Players i = 1, . . . , n.
• Actions a_i ∈ {0, 1, . . . , K}.
• A = {0, 1, . . . , K}^n and a = (a_1, . . . , a_n).
• s_i ∈ S_i: state for player i.
• S = Π_i S_i and s = (s_1, . . . , s_n) ∈ S.
• s is common knowledge and also observed by the econometrician.
• For each agent i, K + 1 state variables ε_i(a_i).
• ε_i(a_i): private information of each agent.
• ε_i = ( ε_i(0), . . . , ε_i(K) ).
• Density f(ε_i), i.i.d. across i = 1, . . . , n.
• Period utility for player i with action profile a:
  u_i(a, s, ε_i; θ) = Π_i(a_i, a_{−i}, s; θ) + ε_i(a_i).
• Example: the period profit of firm i for entering the market.
• This generalizes a standard discrete choice model.
• Agents act in isolation in standard discrete choice models.
• Unlike a standard discrete choice model, a_{−i} enters the utility.
• Player i's decision rule is a function a_i = δ_i(s, ε_i).
• Note that ε_{−i} does not enter.
• ε_{−i} is the private information of other players.
• Conditional choice probability σ_i(a_i|s) for player i:
  σ_i(a_i = k|s) = ∫ 1{ δ_i(s, ε_i) = k } f(ε_i) dε_i.
• The choice probability is conditional on s, the public information.
• Choice specific expected payoff for player i:
  Π_i(a_i, s; θ) = Σ_{a_{−i}} Π_i(a_i, a_{−i}, s; θ) σ_{−i}(a_{−i}|s).
• This is the expected utility from choosing a_i, excluding the preference shock.
• The optimal action for player i satisfies:
  σ_i(a_i|s) = Prob{ ε_i | Π_i(a_i, s; θ) + ε_i(a_i) > Π_i(a′_i, s; θ) + ε_i(a′_i) for all a′_i ≠ a_i }.
• Π_i(a_i, a_{−i}, s; θ) is often a linear function (see the sketch below), e.g.:
  Π_i(a_i, a_{−i}, s) = s′β + δ Σ_{j≠i} 1{a_j = 1}  if a_i = 1,
  Π_i(a_i, a_{−i}, s) = 0                           if a_i = 0.
• The mean utility from not entering is normalized to zero.
• δ measures the influence of firm j's entry choice on firm i's profit.
• If firms compete with each other: δ < 0.
• β measures the impact of the state variables on profits.
• The ε_i(a_i) capture shocks to the profitability of entry.
• Often the ε_i(a_i) are assumed to be i.i.d. extreme value distributed:
  f( ε_i(k) ) = e^{−ε_i(k)} e^{−e^{−ε_i(k)}}.
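As an illustration of this entry example, the sketch below computes equilibrium entry probabilities in a two-firm market by iterating on the best-response map. It assumes a symmetric equilibrium; with multiple equilibria (discussed later) the iteration finds at most one of them. Names are illustrative.

```python
import numpy as np

def entry_game_probs(x_beta, delta, tol=1e-12, max_iter=5_000):
    """Symmetric equilibrium entry probabilities in a two-firm entry game
    with i.i.d. extreme value shocks. A sketch; names are illustrative.

    x_beta : array of mean entry profits s'beta, one entry per market
    delta  : effect of the rival's entry on profit (delta < 0 if firms compete)
    """
    sigma = np.full_like(x_beta, 0.5, dtype=float)
    for _ in range(max_iter):
        # Expected entry profit: s'beta + delta * P(rival enters); the
        # payoff from staying out is normalized to zero, so the logit
        # choice probability is Lambda(expected entry profit).
        sigma_new = 1.0 / (1.0 + np.exp(-(x_beta + delta * sigma)))
        if np.max(np.abs(sigma_new - sigma)) < tol:
            return sigma_new
        sigma = sigma_new
    return sigma
```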
Nonparametric Identification
A1 Assume that the error terms ε_i(a_i) are distributed i.i.d. across actions a_i and agents i, and come from a known parametric family.
• It is not possible to allow nonparametric mean utility and nonparametric error terms at once, even in simple single agent problems (e.g. a probit).
• In Bajari, Hong and Ryan (2005): even a single agent model is not identified without an independence assumption.
• It is well known that Π_i(0, s) is not identified.
• The σ_i(a_i|s) are only functions of Π_i(a_i, s) − Π_i(0, s).
• Suppose ε_i(a_i) is extreme value; then
  σ_i(a_i|s) = exp( Π_i(a_i, s) − Π_i(0, s) ) / Σ_{k=0}^{K} exp( Π_i(k, s) − Π_i(0, s) ).
A2 For all i and all a−i and s, Πi (ai = 0, a−i , s) = 0.
• We can only learn the choice specific value functions up to a first difference, so a normalization is needed.
• Similar to the "outside good" assumption in single agent models.
• Entry: the utility from not entering is normalized to zero.
• Hotz and Miller (1993) inversion, for any k, k′:
  log σ_i(k|s) − log σ_i(k′|s) = Π_i(k, s) − Π_i(k′, s).
• More generally, let Γ_i denote the mapping from payoff differences to choice probabilities:
  ( σ_i(0|s), . . . , σ_i(K|s) ) = Γ_i( Π_i(1, s) − Π_i(0, s), . . . , Π_i(K, s) − Π_i(0, s) ),
• and the inverse Γ_i^{−1}:
  ( Π_i(1, s) − Π_i(0, s), . . . , Π_i(K, s) − Π_i(0, s) ) = Γ_i^{−1}( σ_i(0|s), . . . , σ_i(K|s) ).
• Invert the equilibrium choice probabilities to nonparametrically recover Π_i(1, s) − Π_i(0, s), . . . , Π_i(K, s) − Π_i(0, s).
• Π_i(a_i, s) is then known from this inversion, since the probabilities σ_i can be observed by the econometrician.
• Next step: how to recover Π_i(a_i, a_{−i}, s) from Π_i(a_i, s).
• Requires inversion of the following system:
  Π_i(a_i, s) = Σ_{a_{−i}} σ_{−i}(a_{−i}|s) Π_i(a_i, a_{−i}, s),
  ∀i = 1, . . . , n, a_i = 1, . . . , K.
• Given s, there are n × K × (K + 1)^{n−1} unknown utilities across all agents,
• but only n × K known expected utilities.
• Obvious solution: impose exclusion restrictions.
• Partition s = (s_i, s_{−i}), and suppose
  Π_i(a_i, a_{−i}, s) = Π_i(a_i, a_{−i}, s_i)
  depends only on the subvector s_i. Then
  Π_i(a_i, s_{−i}, s_i) = Σ_{a_{−i}} σ_{−i}(a_{−i}|s_{−i}, s_i) Π_i(a_i, a_{−i}, s_i).
• Identification: given each s_i, the second moment matrix of the "regressors" σ_{−i}(a_{−i}|s_{−i}, s_i),
  E[ σ_{−i}(a_{−i}|s_{−i}, s_i) σ_{−i}(a_{−i}|s_{−i}, s_i)′ ],
  is nonsingular (see the regression sketch below).
• This needs at least (K + 1)^{n−1} points in the support of the conditional distribution of s_{−i} given s_i.
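The identification condition is the rank condition of a least squares problem: for fixed (i, a_i, s_i), the expected payoffs at different values of s_{−i} are linear in the unknown primitives, with the opponents' choice probabilities as regressors. A hedged sketch, with illustrative names:

```python
import numpy as np

def recover_primitive_payoffs(Pi_bar, sigma_opponents):
    """Recover Pi_i(a_i, a_{-i}, s_i) for one (i, a_i, s_i). A sketch.

    Pi_bar          : length-M vector of expected payoffs Pi_i(a_i, s_-i, s_i),
                      one per support point of s_{-i} given s_i
    sigma_opponents : (M, (K+1)**(n-1)) matrix; row m holds the opponents'
                      joint choice probabilities sigma_{-i}(a_{-i} | s_-i, s_i)
    """
    # Pi_bar = sigma_opponents @ pi; pi is identified iff the second moment
    # matrix of the rows is nonsingular, which requires M >= (K+1)**(n-1).
    pi, _, rank, _ = np.linalg.lstsq(sigma_opponents, Pi_bar, rcond=None)
    assert rank == sigma_opponents.shape[1], "rank condition fails"
    return pi   # one entry per opponent action profile a_{-i}
```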
• Nonparametric estimation
• Semiparametric estimation
• Linear probability model
• Fixed effect panel data
• Multiple equilibria computation.
• Unobserved heterogeneity
• Consider the single agent dynamic model first.
• Random coefficient model:
  Dan Ackerberg, "A New Use of Importance Sampling to Reduce Computational Burden in Simulation Estimation", working paper, 2001.
• Bayesian methods:
  Imai et al., "Bayesian Estimation of Dynamic Discrete Choice Models", working paper, 2005.
  Andriy Norets, "Dynamic Discrete Choice Models with Serially Correlated Unobservables", working paper, 2006.
Ackerberg 2001
• Static utility: x′_i u + ε_i, with ε_i extreme value.
• Forward looking agent with discounting.
• Random coefficient: u ∼ g(·|x_i, θ).
• Moments to match:
  (1/n) Σ_{i=1}^{n} [ 1(y_i = a) − P(y_i = a|x_i, θ) ] t(x_i).
• Conditional choice probabilities:
  P(y_i = a|x_i, θ) = ∫ [ e^{V(a, x_i, u)} / Σ_{a′∈A} e^{V(a′, x_i, u)} ] g(u|x_i, θ) du
                    = ∫ [ e^{V(a, x_i, u)} / Σ_{a′∈A} e^{V(a′, x_i, u)} ] [ g(u|x_i, θ) / q(u|x_i) ] q(u|x_i) du.
• Using S draws u_{is}, s = 1, . . . , S, from q(u|x_i), the conditional choice probability can be simulated as:
  P̂(y_i = a|x_i, θ) = (1/S) Σ_{s=1}^{S} [ e^{V(a, x_i, u_{is})} / Σ_{a′∈A} e^{V(a′, x_i, u_{is})} ] [ g(u_{is}|x_i, θ) / q(u_{is}|x_i) ].
• Simulated moment conditions:
  (1/n) Σ_{i=1}^{n} [ 1(y_i = a) − P̂(y_i = a|x_i, θ) ] t(x_i),
  which is
  (1/n) Σ_{i=1}^{n} [ 1(y_i = a) − (1/S) Σ_{s=1}^{S} ( e^{V(a, x_i, u_{is})} / Σ_{a′∈A} e^{V(a′, x_i, u_{is})} ) ( g(u_{is}|x_i, θ) / q(u_{is}|x_i) ) ] t(x_i).
• Separation of the simulation and value function computation from the estimation step (see the sketch below):
• Draw u_{is}, i = 1, . . . , n, s = 1, . . . , S, before estimation starts.
• Compute all V(a, x_i, u_{is}), for all i = 1, . . . , n and s = 1, . . . , S, beforehand.
• No need to recompute V(a, x_i, u_{is}) when estimating θ.
• θ only reweights the density ratio
  g(u_{is}|x_i, θ) / q(u_{is}|x_i)
  during the optimization.
• The same logic applies to simulated MLE and to Bayesian analysis.
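A minimal sketch of the reweighting step, assuming the value functions have already been solved once at the fixed importance draws; array names are illustrative.

```python
import numpy as np

def simulated_ccp(V, g_dens, q_dens):
    """Ackerberg-style simulated choice probabilities (a sketch).

    V      : (n, S, A) array of precomputed value functions V(a, x_i, u_is)
             at S fixed draws u_is ~ q(u | x_i); never recomputed in theta
    g_dens : (n, S) densities g(u_is | x_i, theta) at the current theta
    q_dens : (n, S) importance densities q(u_is | x_i)
    """
    # Logit choice probability conditional on each draw u_is (stabilized;
    # subtracting the per-(i, s) max leaves the ratio unchanged).
    expV = np.exp(V - V.max(axis=2, keepdims=True))
    ccp_given_u = expV / expV.sum(axis=2, keepdims=True)   # (n, S, A)

    # theta enters only through the density ratio g/q, so the optimizer
    # can re-evaluate this function cheaply at every trial theta.
    w = (g_dens / q_dens)[:, :, None]                      # (n, S, 1)
    return (ccp_given_u * w).mean(axis=1)                  # (n, A)
```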
Norets 2006
• Bayesian method: serially correlated unobserved state variables.
• s_t = (y_t, x_t); x_t is observed, y_t is not observed.
• Use a Gibbs sampler:
  • Given θ, d_t, x_t, draw y_t.
  • Given d_t, x_t, y_t, draw θ.
• Joint likelihood:
  π(θ) p( y_{T,i}, x_{T,i}, d_{T,i}, . . . , y_{1,i}, x_{1,i}, d_{1,i} | θ )
  = π(θ) Π_{t=1}^{T} p( d_{t,i} | y_{t,i}, x_{t,i}, θ ) f( x_{t,i}, y_{t,i} | x_{t−1,i}, y_{t−1,i}, d_{t−1,i}; θ ).
• The conditional choice probability can be an indicator:
  p( d_{t,i} | y_{t,i}, x_{t,i}, θ ) = 1( V(y_{t,i}, x_{t,i}, d_{t,i}; θ) ≥ V(y_{t,i}, x_{t,i}, d; θ), ∀d ∈ D ).
• Break the unobservables into serially independent and serially dependent components: y_t = (ν_t, ε_t), with
  f( x_{t+1}, ν_{t+1}, ε_{t+1} | x_t, ν_t, ε_t, d; θ ) = p( ν_{t+1} | x_{t+1}, ε_{t+1}; θ ) p( x_{t+1}, ε_{t+1} | x_t, ε_t, d; θ ).
• The joint likelihood becomes
  π(θ) Π_{i,t} p( d_{t,i} | V_{t,i} ) p( V_{t,i} | x_{t,i}, ε_{t,i}; θ ) p( x_{t,i}, ε_{t,i} | x_{t−1,i}, ε_{t−1,i}, d_{t−1,i}; θ ).
• V_{t,i} can be drawn "analytically" conditional on (x_{t,i}, ε_{t,i}; θ), subject to the constraints specified by the p(d_{t,i}|V_{t,i}) indicators.
• θ and ε_{t,i} are drawn using Metropolis-Hastings steps.
• "Analytic" drawing of V_{t,i} = V_{t,d,i} = V(s_{t,i}, d; θ), d ∈ D, where s = (x, ε, ν), requires value function updating.
• For example:
  u( s_{t,i}, d; θ ) = u( x_{t,i}, d; θ ) + ν_{t,d,i} + ε_{t,d,i}.
• Then
  V( s_{t,i}, d; θ ) = u( x_{t,i}, d; θ ) + ν_{t,d,i} + ε_{t,d,i} + β E[ V(s_{t+1}; θ) | ε_{t,i}, x_{t,i}, d; θ ].
• At every step θ^m, the expected value function
  E[ V(s_{t+1}; θ^m) | ε^m_{t,i}, x^m_{t,i}, d; θ^m ]
  is updated by averaging over the nearby history of θ draws on the MCMC chain and over the importance sampling draws of the ε's.
• How to update the approximate value function V^m(s^{m,j}; θ^m):
  V^m(s; θ^m) = max_{d∈D} { u(s, d; θ^m) + β E^{(m)}[ V(s′; θ^m) | s, d; θ^m ] }.
• At each iteration m, draw random states s^{m,j}, j = 1, . . . , N(m), from an i.i.d. density g(·) > 0.
• At each iteration m, only keep track of a history of length N(m):
  { θ^k; s^{k,j}, V^k(s^{k,j}; θ^k), j = 1, . . . , N(k) } for k = m − N(m), . . . , m − 1.
• In this history, find the parameter draws θ^{k_i}, i = 1, . . . , N(m), closest to θ.
• Only the value functions at the importance sampling draws that correspond to these nearest neighbors are used in the approximation by averaging.
• Update the expected value function as:
  E^{(m)}[ V(s′; θ) | s, d; θ ]
  = Σ_{i=1}^{N(m)} Σ_{j=1}^{N(k_i)} V^{k_i}( s^{k_i,j}; θ^{k_i} ) · [ f( s^{k_i,j} | s, d; θ ) / g( s^{k_i,j} ) ] / [ Σ_{r=1}^{N(m)} Σ_{q=1}^{N(k_r)} f( s^{k_r,q} | s, d; θ ) / g( s^{k_r,q} ) ]
  = Σ_{i=1}^{N(m)} Σ_{j=1}^{N(k_i)} V^{k_i}( s^{k_i,j}; θ^{k_i} ) W_{k_i,j,m}(s, d, θ).
• The weights simplify with i.i.d. known unobservable components (see the sketch below).
• The expected max value function can be integrated out analytically with extreme value errors.
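A rough sketch of the nearest-neighbor importance sampling average, with illustrative names; the density arguments are assumed to be vectorized over the stored states, and the decision d is taken as pre-bound in the density callable.

```python
import numpy as np

def expected_value_update(theta, history, f_dens, g_dens, n_neighbors):
    """Norets-style approximation of E[V(s'; theta) | s, d; theta]. A sketch.

    history     : list of (theta_k, states_k, values_k) tuples from the last
                  MCMC iterations, where states_k were drawn i.i.d. from g
    f_dens      : callable, f(states_k | s, d; theta) for the current (s, d)
    g_dens      : callable, the importance sampling density g(states_k)
    n_neighbors : how many of the stored theta draws to use
    """
    # Keep only the iterations whose parameter draws are closest to theta.
    nearest = sorted(history, key=lambda h: np.linalg.norm(h[0] - theta))
    nearest = nearest[:n_neighbors]

    num = den = 0.0
    for theta_k, states_k, values_k in nearest:
        w = f_dens(states_k, theta) / g_dens(states_k)   # IS weights
        num += np.sum(values_k * w)
        den += np.sum(w)
    return num / den
```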
Particle filtering
• Used in macro dynamic models by Fernández-Villaverde and Rubio-Ramírez.
• Particle filtering is an importance sampling method.
• x is the latent state, y is the observable random variable.
• We are interested in the posterior distribution of x given y.
• Want to compute E( t(X) | y ):
  ∫ t(x) p(x|y) dx = ∫ t(x) [ p(y|x) p(x) / p(y) ] dx
                   = ∫ t(x) [ p(y|x) p(x) / g(x) ] g(x) dx / ∫ [ p(y|x) p(x) / g(x) ] g(x) dx.
• If we have r = 1, . . . , R draws from the density g(x), then we can approximate
  E( t(X) | y ) ≈ [ (1/R) Σ_{r=1}^{R} t(x_r) w(x_r|y) ] / [ (1/R) Σ_{r=1}^{R} w(x_r|y) ],
  where
  w(x_r|y) = p(y|x_r) p(x_r) / g(x_r).
• Or one can write
  E( t(X) | y ) ≈ Σ_{r=1}^{R} t(x_r) w̄(x_r|y),
  where
  w̄(x_r|y) = w(x_r|y) / Σ_{r=1}^{R} w(x_r|y).
• In other words, given R draws from g(x), t(X) is integrated against a discrete distribution that places weight w̄(x_r|y) on each of the r = 1, . . . , R points of the discrete support.
• Alternatively, one can compute
  E( t(X) | y ) ≈ (1/R) Σ_{r=1}^{R} t(x̃_r),
  where x̃_r, r = 1, . . . , R, are R draws from the weighted empirical distribution on the points x_r, r = 1, . . . , R, with weights w̄(x_r|y).
• When g(x) = p(x), w(x|y) = p(y|x), and
  w̄(x_r|y) = p(y|x_r) / Σ_{r=1}^{R} p(y|x_r)
  (see the sketch below).
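A minimal sketch of self-normalized importance sampling for the case g(x) = p(x), where the weights reduce to the likelihood; the callables are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_mean(t, likelihood, prior_draws, R=10_000):
    """E[t(X) | y] by self-normalized importance sampling (a sketch).

    t           : function whose posterior expectation is wanted
    likelihood  : vectorized function x -> p(y | x) for the observed y
    prior_draws : function returning R draws from the prior p(x)
    """
    x = prior_draws(R)
    w = likelihood(x)              # unnormalized weights p(y | x_r)
    w_bar = w / w.sum()            # normalized weights
    return np.sum(t(x) * w_bar)

def posterior_mean_resampled(t, likelihood, prior_draws, R=10_000):
    """Same target via the weighted empirical distribution: resample R
    points with probabilities w_bar and take a plain average."""
    x = prior_draws(R)
    w = likelihood(x)
    idx = rng.choice(R, size=R, p=w / w.sum())
    return np.mean(t(x[idx]))
```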
• In a Bayesian setup, the coefficients θ are the latent state variables x; p(x) is then the prior distribution of θ.
• A random coefficient model is just like a hierarchical Bayesian model, where the µ are the hyperparameters of the prior distribution and the hyperparameters are to be estimated. The prior for θ can be made dependent on both µ and the covariates (state variables) s.
• Computing the weights p(y|θ) involves value function iteration and is computationally difficult.
• If we use a GMM objective function instead of the likelihood to form p(y|θ), there might be some computational savings, because only one simulation r is needed for each observation.
• A more interesting case is when there are dynamic unobservable state variables.
• Suppose s = (x, v), such that x is observed but v is not observed.
• We might be interested in the posterior distribution of the entire sample path of v, in addition to that of θ, the "parameters".
• Particle filtering can be used in this case.
• How can computation be maximally sped up?
• Do particles lend themselves to parallel processing?
p( v_{0:t} | y_{1:t}, x_{0:t} )
= p( v_{0:t}, y_{1:t}, x_{0:t} ) / p( y_{1:t}, x_{0:t} )
= [ p( v_{0:t−1}, y_{1:t−1}, x_{0:t−1} ) p( v_t, y_t, x_t | v_{0:t−1}, y_{1:t−1}, x_{0:t−1} ) ] / [ p( y_{1:t−1}, x_{0:t−1} ) p( y_t, x_t | y_{1:t−1}, x_{0:t−1} ) ]
= p( v_{0:t−1} | y_{1:t−1}, x_{0:t−1} ) p( v_t, y_t, x_t | v_{0:t−1}, y_{1:t−1}, x_{0:t−1} ) / p( y_t, x_t | y_{1:t−1}, x_{0:t−1} )
= p( v_{0:t−1} | y_{1:t−1}, x_{0:t−1} ) p( y_t | x_t, v_t ) p( v_t, x_t | v_{0:t−1}, y_{1:t−1}, x_{0:t−1} ) / p( y_t, x_t | y_{1:t−1}, x_{0:t−1} )
= p( v_{0:t−1} | y_{1:t−1}, x_{0:t−1} ) p( y_t | x_t, v_t ) p( v_t, x_t | v_{t−1}, y_{t−1}, x_{t−1} ) / p( y_t, x_t | y_{1:t−1}, x_{0:t−1} ).
The fourth equality follows from the conditional independence assumption and the Markov structure, and the fifth equality follows from the Markov structure of the state variable transition.
• Therefore we only need to make the following small modifications to the filtering algorithm:
• First, for each particle in period t − 1, with its associated value of v_{t−1}, simulate from
  p( v_t | x_t, v_{t−1}, y_{t−1}, x_{t−1} ) = p( v_t, x_t | v_{t−1}, y_{t−1}, x_{t−1} ) / p( x_t | v_{t−1}, y_{t−1}, x_{t−1} ).
• If v_t and x_t are conditionally independent given v_{t−1}, y_{t−1}, x_{t−1}, then we can directly simulate from
  p( v_t | v_{t−1}, y_{t−1}, x_{t−1} ).
• Then reweight by p( y_t | x_t, v_t ).
• The weights are not easy to compute, because they involve either value function iteration or backward recursion.
• In summary, the recursive particle filtering method has three components (see the sketch below):
  p( v_{0:t} | x_{1:t}, y_{1:t} ) ∝ p( v_{0:t−1} | y_{1:t−1}, x_{0:t−1} ) p( v_t | x_t, v_{t−1}, y_{t−1}, x_{t−1} ) p( y_t | x_t, v_t ).
• The first part, p( v_{0:t−1} | y_{1:t−1}, x_{0:t−1} ), defines the recursion.
• The second part, p( v_t | x_t, v_{t−1}, y_{t−1}, x_{t−1} ), defines the importance sampling density.
• The third part, p( y_t | x_t, v_t ), defines the weights on the particles.
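A hedged sketch of one recursion of the filter, mirroring the three components above; all names are illustrative, and the expensive part is hidden inside the weight function.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(v_particles, x_prev, y_prev, x_t, y_t,
                         draw_transition, obs_density):
    """One step of the particle filter for the latent path v_{0:t}. A sketch.

    v_particles     : (R,) particles approximating p(v_{t-1} | y_{1:t-1}, x_{0:t-1})
    draw_transition : draws v_t ~ p(v_t | x_t, v_{t-1}, y_{t-1}, x_{t-1})
    obs_density     : evaluates p(y_t | x_t, v_t); in the dynamic discrete
                      choice case this is where the value function iteration
                      (or backward recursion) enters
    """
    # (2) propagate each particle through the importance sampling density
    v_new = draw_transition(rng, v_particles, x_t, y_prev, x_prev)
    # (3) weight by the conditional density of the observables
    w = obs_density(y_t, x_t, v_new)
    w = w / w.sum()
    # (1) resample to carry the recursion p(v_{0:t-1} | ...) forward
    idx = rng.choice(len(v_new), size=len(v_new), p=w)
    return v_new[idx]
```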
• Relation to the macro model of Jesús Fernández-Villaverde and Juan Rubio-Ramírez.
• Their paper: "Estimating Macroeconomic Models: A Likelihood Approach", forthcoming, Review of Economic Studies.
• Basically, very similar.
• They also allow feedback from y_t to the latent state variables x_{t+1}, v_{t+1}:
  p( v_t, x_t | v_{0:t−1}, y_{1:t−1}, x_{0:t−1} ) = p( v_t, x_t | v_{t−1}, y_{t−1}, x_{t−1} ),
  unlike stochastic volatility models, where the transitions of v_t, x_t are autonomous.
• They introduce this dependence by allowing for singularity in the measurement equation:
  Y_t = g( S_t, V_t; γ ).
• The states in their transition equation are not necessarily all latent:
  S_t = f( S_{t−1}, W_t; γ ).
• Feedback from Y_t to the latent states is allowed when Y_t is a subcomponent of S_t and is directly observed, so that that particular component of g(·) has no noise V_t.
• In their Assumption 2, they assume that both Y and V are continuously distributed and that g(·) (and f(·) as well) is invertible.
• Then the conditional density of Y_t (which is basically what is used to calculate the weights) can be determined from the density of V_t and the Jacobian of the transformation.
• They do mention, though, in one brief sentence, that this assumption can possibly be relaxed as long as the weights can be computed.
• Our discrete Y_t model falls into this extension, where there is no invertibility.
• The weights can still be computed through the logit probability form and the value function iterations (or backward recursion).
• Calculating value functions by backward induction (see the sketch below):
• Assume a binary choice model.
• At time T:
  V_T(s) = log[ exp( Π(s, 1) ) + exp( Π(s, 0) ) ].
• Suppose V_{t+1}(s) is known; then at time t:
  V_t(s, 1) = Π(s, 1) + β E[ V_{t+1}(s′) | s, 1 ],
  V_t(s, 0) = Π(s, 0) + β E[ V_{t+1}(s′) | s, 0 ],
  V_t(s) = log[ exp( V_t(s, 1) ) + exp( V_t(s, 0) ) ].
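A minimal sketch of this backward recursion on a finite state space, with illustrative array names:

```python
import numpy as np

def backward_induction(Pi1, Pi0, g1, g0, beta, T):
    """Binary-choice logit value functions by backward induction (a sketch).

    Pi1, Pi0 : per-state flow payoffs Pi(s, 1) and Pi(s, 0)
    g1, g0   : transition matrices over s for choices 1 and 0
    beta     : discount factor
    Returns [V_1, ..., V_T], the ex ante value functions by period.
    """
    V = np.log(np.exp(Pi1) + np.exp(Pi0))     # terminal period T
    values = [V]
    for _ in range(T - 1):                    # step back from T-1 down to 1
        v1 = Pi1 + beta * g1 @ V              # V_t(s, 1)
        v0 = Pi0 + beta * g0 @ V              # V_t(s, 0)
        V = np.log(np.exp(v1) + np.exp(v0))   # V_t(s) = log-sum-exp
        values.append(V)
    return values[::-1]
```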