Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC)....

38
THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE BARNAB ´ E DJEGN ´ EN ´ E AND WILLIAM J. MCCAUSLAND Abstract. We extend a method for simulation smoothing in state space models with univariate states to a more general class of models. The state sequence α =(α 1 ,...,α n ) > is Gaussian and the sequence y =(y > 1 ,...,y > n ) > of observed vectors may be conditionally non-Gaussian. We allow the measurement density (or mass) function f (y t |α) to depend not only on α t but also on α t+1 . This allows conditional dependance between the observed vector and the innovation of the state equation. Many important models feature this, including stochastic volatility models with leverage. Our method extends the HESSIAN method, described in McCausland (2008), designed for models where f (y t |α) depends only on α t . We develop a close approximation g(α|y) of the conditional density f (α|y) of the state vector given observed data. The approximate distribution can be used as an importance distribution for importance sampling, or as a proposal distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior inference. We illustrate using Gaussian and Student’s t stochastic volatility models with leverage. We construct an independence Metropolis-Hastings chain and an im- portance sampling chain for simulating the joint posterior distribution of param- eters and latent states. Experiments using simulated and observed data suggest that our methods, which are not model specific, are more numerically efficient than the model specific procedure of Omori, Chib, Shephard, and Nakajima (2007). The importance sampling posterior simulation gives a relative numeri- cally efficiency close to 100%, at least 4 times higher than those using Omori, Chib, Shephard, and Nakajima (2007). 1. Introduction State space models, which govern the interaction of observable data and latent states, are very useful in capturing dynamic relationships, especially where there are changing, but latent, economic conditions: the states may be unobserved state variables in macroeconomic models, log volatility in asset markets or time varying model parameters. The fact that latent states are unknown complicates inference. Monte Carlo simulation methods have proven useful to Bayesians and frequentists alike for overcoming this difficulty. Bayesians typically do inference using a sample from Date : Current version: March 30, 2011. Mailing address: D´ epartement de sciences ´ economiques, CIREQ, C.P. 6128, succursale Centre- ville, Montr´ eal QC H3C 3J7, Canada. e-mail: [email protected]. 1

Transcript of Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC)....

Page 1: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE

BARNABE DJEGNENE AND WILLIAM J. MCCAUSLAND

Abstract. We extend a method for simulation smoothing in state space modelswith univariate states to a more general class of models. The state sequenceα = (α1, . . . , αn)> is Gaussian and the sequence y = (y>1 , . . . , y

>n )> of observed

vectors may be conditionally non-Gaussian. We allow the measurement density(or mass) function f(yt|α) to depend not only on αt but also on αt+1. Thisallows conditional dependance between the observed vector and the innovationof the state equation. Many important models feature this, including stochasticvolatility models with leverage. Our method extends the HESSIAN method,described in McCausland (2008), designed for models where f(yt|α) dependsonly on αt.

We develop a close approximation g(α|y) of the conditional density f(α|y)of the state vector given observed data. The approximate distribution can beused as an importance distribution for importance sampling, or as a proposaldistribution for Markov chain Monte Carlo (MCMC). Applications include theapproximate evaluation of likelihood functions and Bayesian posterior inference.We illustrate using Gaussian and Student’s t stochastic volatility models withleverage. We construct an independence Metropolis-Hastings chain and an im-portance sampling chain for simulating the joint posterior distribution of param-eters and latent states. Experiments using simulated and observed data suggestthat our methods, which are not model specific, are more numerically efficientthan the model specific procedure of Omori, Chib, Shephard, and Nakajima(2007). The importance sampling posterior simulation gives a relative numeri-cally efficiency close to 100%, at least 4 times higher than those using Omori,Chib, Shephard, and Nakajima (2007).

1. Introduction

State space models, which govern the interaction of observable data and latentstates, are very useful in capturing dynamic relationships, especially where thereare changing, but latent, economic conditions: the states may be unobserved statevariables in macroeconomic models, log volatility in asset markets or time varyingmodel parameters.

The fact that latent states are unknown complicates inference. Monte Carlosimulation methods have proven useful to Bayesians and frequentists alike forovercoming this difficulty. Bayesians typically do inference using a sample from

Date: Current version: March 30, 2011.Mailing address: Departement de sciences economiques, CIREQ, C.P. 6128, succursale Centre-ville, Montreal QC H3C 3J7, Canada. e-mail: [email protected].

1

Page 2: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

2 B. DJEGNENE AND W. J. MCCAUSLAND

the conditional distribution of states and parameters given data. Frequentists oftenuse samples from the conditional distribution of states given data to integrate outthe states numerically. Either way, it is important to be able to simulate theconditional distribution of states given data and parameters. We will call thisdistribution the target distribution.

Currently, drawing from the exact target distribution is feasible for linear Gauss-ian state space models and little else. However, even draws from an approximatedistribution can be used to obtain simulation consistent approximations of valuesof the likelihood function, posterior means, and other integrals of interest. Forexample, one can use draws from the approximate distribution as an importancesample for estimating the value of the likelihood function at a particular value ofthe parameter vector. For another example, one can use the approximate distri-bution as a proposal distribution for a Metropolis-Hastings update of latent statesin a Markov chain Monte Carlo (MCMC) posterior simulator of unknown statesand parameters. Here, the update of latent states is useful for inference aboutparameters, whether or not the posterior distribution of states is of direct interest.We will discuss these examples in more depth in Section 4. The present paperintroduces an efficient method for simulation smoothing in state space models ofthe form

(1)

α1 = d0 + u0, αt+1 = dt + φtαt + ut,

f(y|α) =

[n−1∏t=1

f(yt|αt, αt+1)

]f(yn|αn),

where α ≡ (α1, . . . , αn) is a vector of univariate latent states, the ut are indepen-dent Gaussian random variables with mean 0 and precision (inverse of variance) ωt,the yt are observable random vectors, and the f(yt|αt, αt+1) are measurement den-sity or mass functions. We do not require them to be Gaussian, linear or univariate.Throughout most of the paper, we condition on dt, φt, ωt and any other parameterson which the f(yt|αt, αt+1) might depend, and suppress notation for this condi-tioning. Only in Section 4, where we consider joint inference for parameters andstates, are we explicit about this conditioning. We emphasize that the observationvector yt and the contemporaneous state innovation ut = αt+1 − dt − φtαt maybe conditionally dependent, given the contemporaneous state αt. There is littlereason beyond computational convenience to suppose conditional independence.

There is also evidence suggesting that models with conditional dependance canbe more realistic. Perhaps the best known models featuring conditional depen-dance are stochastic volatility models with an asymmetric volatility effect oftenknown as the leverage effect. In the stochastic volatility model introduced byHarvey and Shephard (1996), the latent states αt are log volatilities, given by

(2) α1 = α +1√

(1− φ2)ωu0, αt+1 = (1− φ)α + φαt +

1√ωut,

Page 3: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 3

and observed returns yt are given by

(3) yt = exp(αt/2)vt,

where the (ut, vt) are serially independent with

(4) u0 ∼ N(0, 1),

[utvt

]∼ i.i.d. N

(0,

[1 ρρ 1

]),

and (ω, φ, ρ, α) is a vector of parameters.It is easy to see that this model is of the form given by (1). We use (2) to write

ut =√ω[αt+1 − (1− φ)α− φαt],

then use the standard formula for conditional Gaussian distributions to obtain

(5) yt|α ∼ N(ρ√ω exp(αt/2)(αt+1 − (1− φ)α− φαt), (1− ρ2) exp(αt)

).

Others have extended this model. Jacquier, Polson, and Rossi (2004) and Omori,Chib, Shephard, and Nakajima (2007) consider inference in stochastic volatilitymodels with asymmetric volatility and heavy-tailed conditional return distribu-tions. This and other empirical work has shown convincingly that stochasticvolatility models with asymmetric volatility are more realistic descriptions of stockreturns than models without. Conditional dependence may be useful in other mod-els as well. Feng, Jiang, and Song (2004) show that conditional dependence is morerealistic in stochastic conditional duration models.

For models that are fully linear and Gaussian, there are well established methodsfor simulation smoothing. We can distinguish two broad approaches. One is basedon Kalman filtering, and includes the methods introduced by Carter and Kohn(1994), Fruhwirth-Schnatter (1994), de Jong and Shephard (1995) and Durbin andKoopman (2002). The other is based on the precision matrix and covector of theconditional distribution of α given y, and includes the methods proposed by Rue(2001), Chan and Jeliazkov (2009) and McCausland, Miller, and Pelletier (2011).Both approaches accommodate conditional dependence of the observed vector andthe contemporaneous state innovation, given the contemporaneous state.

Simulation smoothing is more difficult for models that are non-linear or non-Gaussian, since the target distribution is not multivariate Gaussian. McCausland(20010) summarizes two broad approaches to this problem, a direct approach and aauxiliary mixture sampling approach. We will now discuss particular applicationsof these approaches to models with conditional dependence.

The direct approach to this problem is to use a Metropolis-Hastings update forblocks (αt, αt+1, . . . , αt+T−1) of length T , where 1 ≤ T ≤ n. In the context ofstochastic volatility models with leverage, Jacquier, Polson, and Rossi (2004) andYu (2005) use blocks of length T = 1. Watanabe and Omori (2004) use blockof various length, and find that for multivariate Gaussian proposals, the optimalblock length is larger than 1 and smaller than n for samples of reasonable size.Simulation efficiency can be much higher for the optimal block length than for a

Page 4: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

4 B. DJEGNENE AND W. J. MCCAUSLAND

block length of one, due to posterior autocorrelation in the states. At the sametime, several authors have noted that a multivariate Gaussian approximation ofthe entire target distribution (T = n) is not a close enough approximation fortypical sample sizes. In practical terms, the importance weights are too variablefor effective importance sampling and the MCMC acceptance probabilities are toolow for tractable posterior simulation. See Shephard and Pitt (1997), Gamerman(1998) and Pitt (2000) for demonstrations of these issues in a broader context.Jungbacker and Koopman (2008) consider a stochastic volatility model with lever-age. Finding that T = n is infeasible for multivariate Gaussian proposals, theyresort to smaller blocks.

The auxiliary mixture sampling approach uses transformations, data augmen-tation and approximations of non-Gaussian distributions by Gaussian mixturedistributions to obtain an auxiliary mixture model. The auxiliary mixture modelis designed so that conditioning on auxiliary variables yields a Gaussian linearmodel, for which standard methods are appropriate. Omori, Chib, Shephard, andNakajima (2007) introduce such a method for a stochastic volatility model withleverage, based on the well established method of Kim, Shephard, and Chib (1998)for stochastic volatility models without asymmetric volatility.

Our method is an example of the direct approach, but we do not use a multivari-ate Gaussian proposal. Instead, we use a much closer approximation to the targetdistribution. This allows us to draw α as a single block (T = n), which overcomesthe problem of posterior autocorrelation in α. It also allows us to draw parame-ters and states as a single block, as we will see in Section 4. This leads to furtherefficiency improvements because of posterior dependence between parameters andstates. The approximation is similar to the approximation in McCausland (2008),but unlike the method in that paper, ours does not require conditional indepen-dence.

In Section 2 we describe our approximation g(α|y) of the target density f(α|y).We show how to evaluate it and how to draw from the distribution with densityg(α|y). In Section 3 we subject the code we use to compute g(α|y) and draw fromthe approximate distribution to tests of correctness similar to those described inGeweke (2004).

Section 4 illustrates the method using stochastic volatility models with leverage,with Gaussian and Student’s t measurement innovations. Section 5 concludes.

2. An approximation of the target density

In this section we define our approximation g(α|y) of the target density f(α|y).We do not provide a closed form expression for g(α|y), but instead show how toevaluate g(α|y) and draw from the distribution with this density. The densityg(α|y) is proper and fully normalized. Draws from the approximate distributionand evaluations of the approximate density are exact and have computationalcomplexity O(n).

Page 5: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 5

Our approximation is not model specific. We construct g(α|y) for a particularstate space model using a suitable description of the model, consisting of thefollowing quantities and computational routines.

We specify the state dynamics by providing Ω and c, the precision and covectorof the marginal distribution of α, the state sequence. This gives the distributionof α as α ∼ N(Ω−1c, Ω−1). The precision, being a band diagonal matrix, has O(n)elements, while the variance, its inverse, has O(n2). Section 2.1 of McCausland(2010) describes how to compute Ω and c in terms of the dt, φt and ωt.

We specify the measurement distributions by supplying routines to compute, fort = 1, . . . , n− 1, the functions

ψt(αt, αt+1).= log f(yt|αt, αt+1), ψn(αn) = log f(yn|αn),

and the partial derivatives

ψp,qt (αt, αt+1).=∂p+qψt(αt, αt+1)

∂αpt∂αqt+1

, ψpn(αn) =∂pψ(αn)

∂αpn

up to certain orders p and q. For convenience, Table 1 summarizes this and otherimportant notation.

The routines to compute the ψt(αt, αt+1) and ψn(αn) must give exact results,as they are used to evaluate f(α|y) up to a normalization factor. The partialderivatives, however, may be numerical derivatives or other approximations. Ap-proximation error may make g(α|y) a cruder approximation of f(α|y) and thusdiminish the numerical precision of IS or MCMC. But we will still be able to eval-uate and simulate g(α|y) without error, and so it does not compromise simulationconsistency.

The target density has the Markov property, which allows us to decompose it as

(6) f(α|y) = f(αn|y)1∏

t=n−1

f(αt|αt+1, y).

Our approximation g(α|y) also has the Markov property, and we decompose it as

(7) g(α|y) = g(an|y)1∏

t=n−1

g(αt|αt+1, y),

where each factor is a proper fully normalized density function closely approxi-mating the corresponding factor of f(α|y). Whether we need to evaluate g(α|y),simulate it or both, the decomposition allows us to do so sequentially, for t de-scending from n to 1.

The densities g(αt|αt+1, y) are members of the five-parameter perturbed Gauss-ian distribution described in Appendix F of McCausland (2010). The parame-ters give a mode of the distribution and the second through fifth derivatives oflog g(αt|αt+1, y) at that mode.

Page 6: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

6 B. DJEGNENE AND W. J. MCCAUSLAND

We do not provide closed form expressions for the five parameters in terms ofαt+1 and y. Rather, we describe procedures for computing approximations of themode of f(αt|αt+1, y) and the second through fifth derivatives of log f(αt|αt+1, y)at that mode, all as a function of αt+1 and y. We then use these values as theparameters of g(αt|αt+1, y).

The approximation of the factor g(αt|αt+1, y) is based on the following result:

(8)

∂ log f(αt|αt+1, y)

∂αt= ct − Ωt−1,tµt−1|t(αt)− Ωttαt − Ωt,t+1αt+1

+ E[ψ0,1t−1(αt−1, αt)|αt, y] + ψ1,0

t (αt, αt+1),

where µt−1|t(αt) is the conditional mean of αt−1 given αt and y. The result, andanalogous results for the cases t = 1 and t = n, are derived in Appendix C.1.

We compute the approximation g(α|y) using the following steps. We first com-pute the mode a = (a1, . . . , an) of the target distribution using the method de-scribed in McCausland (2010). Then we compute, for t = 1, . . . , n − 1, polyno-mial approximations Bt|t+1(αt+1) and Mt|t+1(αt+1) of the functions bt|t+1(αt+1) andµt|t+1(αt+1), the conditional mode and mean of αt given αt+1 and y. Finally, wecompute, for t = n, . . . , 1, a final approximation g(αt|αt+1, y) of f(αt|αt+1, y), us-ing a particular value of αt+1, available at iteration t. During this backward pass,we can draw αt, evaluate g(αt|αt+1, y), or both. In the rest of this section, wedescribe these steps in more detail. Full detail is left to various Appendices.

2.1. Precomputation. We first compute the precision Ω and covector c of theGaussian prior distribution of states as a function of the dt, φt and ωt of thespecification in (1). We then compute the mode a of the target distribution. This

gives, as bi-products, several quantities used later. This includes the precision ¯Ωand covector ¯c of a Gaussian approximation N(¯Ω−1¯c, ¯Ω−1) of the target density.It also gives the conditional variances Σt

.= Var[αt|αt+1], t = 1, . . . , n − 1, and

Σn.= Var[αn] implied by this Gaussian approximation.

This precomputation is similar to that described in McCausland (2010). Littlemodification is required, and we give details in Appendix A.

2.2. A Forward Pass. In order to describe the forward pass, it will be helpfulto introduce a sequence of multivariate Gaussian conditional distributions. Wedefine, for t = 1, . . . , n− 1, (a1|t+1(αt+1), . . . , at|t+1(αt+1)) as the conditional mode

of (α1, . . . , αt) given αt+1 and y, and ¯Ω1:t|t+1 as the negative Hessian matrix oflog f(α1, . . . , αt|αt+1, y) with respect to (α1, . . . , αt), evaluated at (a1|t+1, . . . , at|t+1).

Thus we can view the distribution N((a1|t+1, . . . , at|t+1),¯Ω−11:t|t+1) as an approxima-

tion of the conditional distribution of (α1, . . . , αt) given αt+1 and y. Result 2.1 of

McCausland, Miller, and Pelletier (2011) implies that if x ∼ N((a1|t+1, . . . , at|t+1),¯Ω−11:t|t+1),

Page 7: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 7

then xt|xt+1 ∼ N(at|t+1,Σt|t+1), where Σt|t+1 is the final value in the following for-ward recursion:

(9) Σ1|t+1.= ¯Ω−111 , Στ |t+1

.= (¯Ωττ − ¯Ω2

τ,τ−1Στ−1|t+1)−1, τ = 2, . . . , t.

We also define, for t = 1, . . . , n− 1, st|t+1(αt+1).= log Σt|t+1(αt+1).

The forward pass consists of performing the following steps, for t = 1, . . . , n−1:

(1) Compute

(10)

a(r)t

.=∂rat|t+1(αt+1)

∂αrt+1

∣∣∣∣αt+1=at+1

, r = 1, . . . , R,

s(r)t

.=∂rst|t+1(at+1)

∂αrt+1

∣∣∣∣αt+1=at+1

, r = 1, . . . , R− 1.

The choice of R determines how closely we can approximate the functionsat|t+1(αt+1) and st|t+1(αt+1) using Taylor series expansions. For our empir-ical illustration, we use R = 5.

Appendix B gives details. Equation (25) gives a(r)1 and for t > 1, (37)

gives a(r)t as a function of a

(i)t−1, i = 1, . . . , r, and a

(i)t , i = 1, . . . , r − 1.

Equations (34), (39), (42) and (45) give simplified expressions for r =1, . . . , 5 and t > 1.

Equation (27) gives s(r)1 and equations (37) and (38) give s

(r)t . Equations

(41), (44) and (46) give simplified expressions for s(r)t , r = 1, . . . , 4 and

t = 2, . . . , n− 1.Appendix B includes a proof that these computations are exact. The

proof uses a first order necessary condition for (a1|t+1, . . . , at|t+1) to max-imize f(α1, . . . , αt|αt+1, y), the identity at−1|t+1(αt+1) = at−1|t(αt+1(αt+1))and the difference equation (9) defining Σt|t+1(αt+1).

(2) Compute approximations Bt, B(1)t , B

(2)t and B

(3)t of the value and first three

derivatives of bt|t+1(αt+1) at at+1. Recall that bt|t+1(αt+1) is the conditionalmode of αt given αt+1 and y. For t = n, we only compute an approximationBn of the value bn, the conditional mode of αn given y. Appendix C.3 de-fines these approximations and shows how to compute them. Specifically,

equation (60) defines B(r)t as a function of the a

(i)t and s

(i)t . The approxi-

mations are based on the approximation of bt|t+1(αt+1)− at|t+1(αt+1) usinga first order necessary condition for bt|t+1(αt+1) to maximize f(αt|αt+1, y).

(3) Compute approximations Mt, M(1)t , and M

(2)t of the value and first two

derivatives of µt|t+1(αt+1) at at+1. Recall that µt|t+1(αt+1) is the conditionalmean of αt given αt+1 and y. Appendix (C.4) defines these approximations.

Equation (67) gives M(r)t as a function of the B

(i)t , a

(i)t and s

(i)t .

Page 8: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

8 B. DJEGNENE AND W. J. MCCAUSLAND

2.3. A Backward Pass. We use the backward pass to draw a random variateα∗ from the distribution with density g(α|y) and evaluate g(α∗|y). One can alsoevaluate g(α|y) at an arbitrary value α∗ without drawing.

To implement the backward pass, we use the following approximation of thederivative of log f(αt|αt+1, y), based on (8)

(11)Ht(αt;αt+1)

.= ct − Ωt−1,tMt−1|t(αt)− Ωttαt − Ωt,t+1αt+1

+Xt−1|t(αt) + ψ1,0t (αt, αt+1)

where Mt−1|t(αt) is an approximation of µt−1|t(αt), and X0,1t−1|t(αt) is an approxi-

mation of E[ψ0,1t−1(at−1|t(αt), αt)|αt, y].

The backward pass consists of performing the following steps, for t = n, . . . , 1.

(1) Evaluate Bt|t+1(α∗t+1), where Bt|t+1(αt+1) is the polynomial given by

(12)

Bt|t+1(αt+1).= Bt +B

(1)t (αt+1 − at+1) +

1

2B

(2)t (αt+1 − at+1)

2 +1

6B

(3)t (αt+1 − at+1)

3

+1

24B

(4)t (αt+1 − at+1)

4 +1

120a(5)t (αt+1 − at+1)

5

(2) Evaluate the value and first through fourth derivatives of Ht(αt;αt+1) atat|t+1 using (11).

(3) The density g(αt|αt+1, y) is a member of the five-parameter perturbedGaussian distribution described in Appendix F of McCausland (2010). Let

Ht(αt;αt+1).=

4∑r=0

H(r)t (at|t+1;αt+1)

r!(αt − at|t+1)

r

Then, the parameter values are given byBt|t+1(α∗t+1), and H

(r)t (Bt|t+1(α

∗t+1);α

∗t+1),

r = 1, . . . , 4, where H(r)t (αt;αt+1) is the rth derivatives of Ht(αt;αt+1) with

respect to αt. These approximate bt|t+1(αt+1), the second and third deriva-tives of log f(αt|αt+1, y).

(4) Draw α∗t from this distribution and evaluate g(αt|αt+1, y) at α∗t and α∗t+1.

3. Getting it right

In posterior simulation, analytical or coding errors can lead to reasonable butinaccurate results. Geweke (2004) develops tests for the correctness of posteriorsimulations, based on two alternative methods for simulating the joint distribu-tion of a model’s observable and unobservable variables. Correctness tests takethe form of tests for the equality of the two stationary distributions. Since the twosimulation methods have little in common, the tests have power against a widearray of conceptual and coding errors. We apply these ideas to build tests for thecorrectness of the independence Metropolis-Hastings update of the target distri-bution using the HESSIAN approximation g(α|y, θ) as a proposal distribution.

Page 9: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 9

We choose a fixed value of θ, the vector of parameters. Then we generate a largesample from the conditional distribution of α and y given θ. We initialize with adraw α(0) from the conditional distribution of α given θ, then draw α(m), y(m)Mm=1

as follows. For m = 1, . . . ,M ,

(1) Draw y(m) from the conditional distribution of y given θ and α, with α setto α(m−1).

(2) Update from α(m−1) to α(m) using an independence Metropolis-Hastingsstep, with g(α|y, θ) as a proposal distribution and y = y(m).

This is a Gibbs sampler for the conditional distribution of α and y given θ. Theinitial and stationary distributions of this chain are both equal to this distribution.By induction, so are the distributions of all the (α(m), y(m)). In particular, α(m) ∼N(αı, Ω−1) for all m, where ı is the n-vector with all elements equal to one. Thisimplies that for all m = 1, . . . ,M and q ∈ (0, 1), the following indicators areBernoulli with probability parameter q:

(13) I(m)t,q

.= 1

(α(m)t − α

σ/√

1− φ2≤ Φ−1(q)

), t = 1, . . . , n,

(14) I(m)t|t−1,q

.= 1

(α(m)t − (1− φ)α− φα(m)

t−1

σ≤ Φ−1(q)

), t = 1, . . . , n,

where Φ(x) is the cumulative distribution function of the univariate standardGaussian distribution.

We use sample means of the I(m)t,q and I

(m)t|t−1,q to test the hypotheses that the

corresponding population means are equal to q. We report results for the ASV-Student model. The parameter values are fixed to α = −9.0, φ = 0.97, σ = 0.20,ρ = −0.3 and ν = 12.0. We use a vector of length n = 20 and a sample size ofM = 107. We use the R package coda to compute time series numerical standarderrors and use Gaussian asymptotic approximations to construct symmetric 95%and 99% intervals. The 95% confidence interval does not include q in 10 cases outof 360 (2.78%). The 99% confidence interval does not include q in a single case(0.28%). The sample mean always lies well within the interval [q−0.001, q+0.001].These results fail to cast doubt on the correctness of the implementation.

4. Empirical example

4.1. Models. We consider two different stochastic volatility models with asym-metric volatility. The first model, which we will call ASV-Gaussian, is the basicasymmetric volatility model given in equations (2), (3) and (4).

The second model, which we will call ASV-Student, replaces the observationequation in (3) with

(15) yt = exp(αt/2)vt√λt/ν

,

Page 10: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

10 B. DJEGNENE AND W. J. MCCAUSLAND

where λt ∼ χ2(ν) and the λt and (ut, vt) are mutually independent.In order to allow us to draw parameters and states together in a single block,

we will now integrate out λt to obtain the conditional distribution of yt given αtand αt+1. This distribution is a scaled non-central Student’s t. To see this, writeyt = exp(αt/2)

√1− ρ2X, where

X.=ut/√

1− ρ2√λt/ν

.

Now condition on αt and αt+1. The numerator and denominator are independent;the numerator is Gaussian with mean

µ.= ρ

√ω

1− ρ2[αt+1 − dt − φtαt]

and unit variance; and λt is chi-squared with ν degrees of freedom. Therefore X isnon-central Student’s t with non-centrality parameter µ and ν degrees of freedom.The density of X is given by

fX(x; ν, µ) =νν/2

2νΓ(ν + 1)

Γ(ν/2)exp(−µ2/2)(ν + x2)−ν/2

×

√2µx

ν + x2

M(ν2

+ 1; 32; µ2x2

2(ν+x2)

)Γ(ν+1

2)

+1√

ν + x2

M(ν+12

; 12; µ2x2

2(ν+x2)

)Γ(ν/2 + 1)

,(16)

where Γ(ν) is the gamma function and M(a; b; z) is Kummer’s function of the firstkind, a confluent hypergeometric function given by

(17) M(a; b; z) =+∞∑k=0

(a)k(b)k

zk

k!,

where (a)k = a(a+ 1) . . . (a+k−1). See Scharf (1991). We obtain the conditional

density f(yt|αt, αt+1) using the change of variables yt = exp(αt/2)√

1− ρ2X. Thelog conditional density ψt(αt, αt+1) ≡ log f(yt|αt, αt+1) and its derivatives are givenin Appendix D.

For both models, the state equation parameters are ωt = ω, φt = φ and dt =(1 − φ)α for all t > 1. The marginal distribution of the initial state α1 is thestationary distribution, so that ω0 = (1− φ2)ω and d0 = α.

We express our prior uncertainty about the parameters in terms of a multivariateGaussian distribution over the transformed parameter vector

θ = (log σ, tanh−1 φ, α, tanh−1 ρ, log ν).

Based on a prior predictive analysis, in which we track the implied prior distribu-tion of the conditional and unconditional variance of αt, the coefficient of variation

Page 11: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 11

of αt and the half life of volatility shocks, we choose the following prior:

θ ∼ N

−1.82.5−11.0−0.42.5

,

0.125 −0.05 0 0 0−0.05 0.1 0 0 0

0 0 4 0 00 0 0 0.25 00 0 0 0 0.25

.

4.2. MCMC and IS methods for posterior simulation. To illustrate theperformance of the HESSIAN approximation, we use Markov chain Monte Carlo(MCMC) and importance sampling posterior simulations and compare with Omori,Chib, Shephard, and Nakajima (2007). For both posterior simulations, we drawjointly θ and α. We use as proposal density (resp. importance density) g(α, θ|y) =g(α|θ, y)g(θ|y), based on an approximation g(θ|y) of f(θ|y) and the HESSIANapproximation g(α|θ, y) of f(α|θ, y).

We construct g(θ|y) as follows. Just as g(α|θ, y) is a close approximation off(α|θ, y), g(θ|y)

.= f(α, θ, y)/g(α|θ, y) is a good approximation of f(θ|y). Let

θ be the maximizer of g(θ|y) and Σ be the inverse of the negative Hessian oflog g(θ|y) at θ. Also let nθ be the dimension of θ, equal to 4 for the Gaussianmodel and 5 for the Student’s t model.

We choose g(θ|y) to be a nθ-variate Student’s t density with location parameterθ, scale matrix Σ, and degrees of freedom equal to 30.

In the MCMC posterior simulation, we construct independence Metropolis-Hastings chain. The joint proposal (α?, θ?) is accepted with probability

π(θ?, α?, θ, α) = min

[1,f(θ?)f(α?|θ?)f(y|θ?, α?)f(θ)f(α|θ)f(y|θ, α)

g(θ)g(α|θ)g(θ?)g(α?|θ?)

].

In the importance sampling posterior simulation, to reduce variance, we usea combination of quasi-random and pseudo random sequences to draw θ. Weconstruct M blocks of length S each, for a total of MS draws. S must be a powerof two, which is convenient for Sobol quasi-random sequences.

We draw U (m), m = 1, . . . ,M , independently from the uniform distributionon the hypercube (0, 1)nθ . For s = 1, . . . , S, V (s) is the s′th element of the nθ-dimensional Sobol sequence. For m = 1, . . . ,M and s = 1, . . . , S, we computeU (m,s), defined as the modulo 1 sum of U (m) and V (s). Thus U (m,s) is uniformlydistributed on (0, 1)nθ and the M blocks of length S are independent. We useU (m,s) to draw θ(m,s) from g(θ|y) using the same method as in McCausland (2010).

Let h(θ, α) be any function of interest. The importance sampling estimator forE[h(θ, α)|y] is N/D, where

N.=

M∑m=1

S∑s=1

w(m,s)h(θ(m,s), α(m,s)), D.=

M∑m=1

S∑s=1

w(m,s),

Page 12: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

12 B. DJEGNENE AND W. J. MCCAUSLAND

and

w(m,s) =f(θ(m,s), α(m,s), y)

g(θ(m,s), α(m,s)|y).

If the posterior mean of h(θ, α) exists, then the ratio R = N/D is a simulationconvergent estimator of E[h(θ, α)|y].

Following Geweke (1989), we approximate the posterior variance of h(θ, α) by

σ2 .=

∑Mm=1

∑Ss=1[w

(m,s)(h(θ(m,s), α(m,s))−R)]2

D2.

We compute a numerical standard error for R using the delta method. This givesthe following approximation of the numerical variance of the ratio R:

σ2R.= (σ2

N − 2RσND +R2σ2D)(MS/D)2,

where σ2N and σ2

D are estimates of the variances of N and D and σND is an estimateof the covariance. Specifically, σ2

N is (1/M) times the sample variance of the Mindependent terms

Nm =1

S

S∑s=1

w(m,s)h(θ(m,s), α(m,s)), m = 1, . . . ,M,

and analogously for σ2D and σND. Then (σ/σR)2 is an estimate of the relative

numerical efficiency.

4.3. Results. For the ASV-Gaussian model, we report results of the HESSIANindependence Metropolis-Hastings and importance sampling posterior simulations.We implement the procedure of Omori, Chib, Shephard, and Nakajima (2007),denoted OCSN, and compare results. We apply the three methods to artificial andobserved data. We generate three artificial data sets from the model, using valuesof α, φ and σ equal to −9.0, 0.97 and 0.15 and respective values of the parameterρ: −0.3, −0.5 and −0.7. The sample size is n = 5000 in all three cases. We use tworeal data sets. The first consists of daily returns of the S&P 500 index from January1980 to December 1987, for a total of 2022 observations. This matches a sampleused by Yu (2005). The second data set consists of 1232 daily returns of the TOPIXindex. This data set, used by Omori, Chib, Shephard, and Nakajima (2007), isavailable at Nakajima’s website http://sites.google.com/site/jnakajimaweb/sv.

In the MCMC posterior simulation, the initial 10 draws are discarded and theindependence Metropolis-Hastings chain is of length 12,800. We choose this chainsize to match the total draws of the importance sampling chain where we useM = 100 and S = 128. In our replication of the OCSN chain, the initial 500values are discarded and we retain the 12,800 subsequent values. Table 2 gives thecomputational time by dataset and estimation procedure. For all three methods,the code is written in C++. The computation was made on a Window PC withan Intel Core Duo 2.00GHz processor.

Table 3 summarizes the estimation results for the artificial data.

Page 13: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 13

The labels HIS, HIM and OCSN indicate the HESSIAN importance sample,the HESSIAN independence Metropolis-Hastings chain, and the chain obtainedusing the Omori, Chib, Shephard, and Nakajima (2007) procedure. The first twocolumns show estimates of the posterior mean and standard deviation, for thevarious parameters.

The third and fourth columns give the numerical standard error (NSE) andthe relative numerical efficiency (RNE) of the posterior mean as an estimator ofthe posterior mean efficiency. The RNE is a measure of efficiency relative to theefficiency of a random sample from the posterior. We use the results of Section 4.2to compute the NSE and RNE of the importance sampling chain and the OCSNchain. We use the contributed coda library of the R software to compute those ofthe HESSIAN independence Metropolis-Hastings method. This software computesthe NSE using a time series method based on the estimated spectral density atfrequency zero.

The three methods produce estimates from the same limiting distribution. Thus,the absolute difference of the posterior mean estimates from two different methodsshould not exceed two times the NSE. The slight difference observed between theHESSIAN’s estimates and the OCSN’s is due to the difference of the parameters’prior. We implement the procedure of Omori, Chib, Shephard, and Nakajima(2007) using the prior described in their article.

The HESSIAN importance sampling outperforms the OCSN method in all cases.It’s numerical efficiency is higher compared to OCSN, and apart from the uncondi-tional mean α of the log volatility, at least four times higher. The efficiency of theimportance sample means are sometimes greater than 1. This is possible becauseof the variance reduction achieved by using quasi-random numbers. In addition,the HIS procedure has a lower computational time and thus a higher precision pertime (1/(Time× NSE2)). Also, except the unconditional mean of the log volatility,the HESSIAN independence Metropolis Hastings methods outperforms the OCSNprocedure for all other three parameters, with regard to the relative numericalefficiency and the precision per time.

Table 4 summarizes results of the ASV-Gaussian model for real data. Here too,we have the same conclusions. The HESSIAN importance sampling proceduregives the highest numerical efficiency and precision per time. Apart the uncondi-tional mean of the log volatility, the HESSIAN independence Metropolis-Hastingsmethod has a higher numerical efficiency and precision per time compared toOCSN. The reported posterior means of the parameters φ, σ and ρ are consistentwith the values reported by Omori, Chib, Shephard, and Nakajima (2007) for theTOPIX index. The difference in the posterior means of α suggest that these au-thors used daily returns in percentages. Similarly for Yu (2005) in the case of theS&P500.

We consider also posterior simulations of the ASV-Student model on artificialand real data, using only the HESSIAN procedures. For the artificial data, we fix

Page 14: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

14 B. DJEGNENE AND W. J. MCCAUSLAND

values of α, φ, σ and ρ at respectively −9.0, 0.97, 0.15 and −0.3. We consider twovalues of the parameter ν: 8 and 16. We consider a sample of size n = 2500. Thereal data are S&P500 and TOPIX indexes used previously. For both procedures,we construct a chain of length 12,800. Table 5 summarizes the results of bothartificial and real data. We can see that for the artificial data, the true valuelies in the 95% Bayes credible interval. The estimates of the parameters α, φ, σand ρ, for the real data, are close to those obtained with the ASV-Gaussian. Thenumerical efficiency is also substantially higher.

Page 15: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 15

5. Conclusion

We have derived an approximation g(α|θ, y) of the target density f(α|θ, y) thatcan be used as a proposal density for MCMC or as an importance density forimportance sampling. We have tested the correctness of the HESSIAN posteriorsimulators.

Simulations on artificial and real data suggest that the HESSIAN method, whichis not model specific, is more numerically efficient than the model specific methodof Omori, Chib, Shephard, and Nakajima (2007), which is in turn more efficientthan the methods of Jacquier, Polson, and Rossi (2004) and Omori and Watanabe(2008). The high numerical efficiency relies on g(α|θ, y) being extremely closeto the target density f(α|θ, y). Constructing a joint proposal of θ and α notonly solves the problem of posterior autocorrelation of α but also the problem ofposterior dependence between θ and α.

The scope of applications goes beyond the ASV-Gaussian and ASV-Studentmodels. Application to a new model of the form (1) only requires routines tocompute partial derivatives of the log conditional densities log f(yt|αt, αt+1) withrespect to αt and αt+1. This requirement is not as demanding as it might firstappear, for two reasons. First, we can use numerical derivatives or other approx-imations. Second, we do not require analytic expressions of these derivatives. Iflog f(yt|αt, αt+1) is a composition of primitive functions, we can combine evalu-ations of the derivatives of the primitive functions using routines applying FaaDi Bruno’s rule for multiple derivatives of compound functions. We have alreadycoded these routines, which do not depend on the particular functions involved.

We now require the state vector, α, to be Gaussian. We are currently trying toextend the HESSIAN method to models where the state vector is Markov, but notnecessary Gaussian. We are also working on approximations to filtering densities,useful for sequential learning.

Page 16: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

16 B. DJEGNENE AND W. J. MCCAUSLAND

Appendix A. Precomputation

Here we compute the precision Ω and covector c of the marginal distribution ofα, and the mode a = (a1, . . . , an) of the target distribution. Bi-products of the

computation of a include several quantities used elsewhere, including ¯Ω and ¯c, theprecision and covector of a Gaussian approximation N(¯Ω−1¯c, ¯Ω−1) of the targetdistribution, and the conditional variances Σ1, . . . ,Σt, . . . ,Σn.

As the state dynamics are no different, we compute Ω and c exactly as in Mc-Causland (2010). We have

Ωtt = ωt−1 + ωtφ2t , Ωt,t+1 = −ωtφt, t = 1, . . . , n− 1,

Ωnn = ωn−1,

(18) ct =

ωt−1dt−1 − ωtφtdt t = 1, . . . , n− 1,

ωn−1dn−1 t = n.

As in McCausland (2010), we use a modification of the Newton-Raphson methodto find the mode of the target distribution. At each iteration, we compute a pre-cision ¯Ω(α) and covector ¯c(α) of a Gaussian approximation to the target distribu-

tion at the current value of α. We define ¯Ω(α) as the negative Hessian matrix oflog f(α|y) with respect to α. It is a symmetric band diagonal matrix with a singlesub- and super-diagonal. The non-zero upper diagonal elements are given by

¯Ωtt(α) = Ωtt −(ψ2,0t (αt, αt+1) + ψ0,2

t−1(αt−1, αt−1+1)), t = 2, . . . , n− 1,

¯Ω11(α) = Ω11 − ψ2,01 (α1, α1+1),

¯Ωnn(α) = Ωnn −(ψ2n(αn) + ψ0,2

n−1(αn−1, αn−1+1)),

¯Ωt,t+1(α) = Ωt,t+1 − ψ1,1t (αt, αt+1), t = 1, . . . , n− 1.

We define ¯c(α) as

¯c(α).= ¯Ω(α)α +

∂ log f(y|α)

∂α>.

The elements of ¯c are given by(19)

¯ct(α) =

ct + ¯Ωttαt + ¯Ωt,t+1αt+1 + ψ1,0

t (αt, αt+1) t = 1

ct + ¯Ωt,t−1αt−1 + ¯Ωttαt + ¯Ωt,t+1αt+1 + ψ1,0t (αt, αt+1) + ψ0,1

t (αt−1, αt) t = 2, . . . , n− 1

cn + ¯Ωn,n−1αn−1 + ¯Ωnnαn + ψ1n(αn) + ψ0,1

n (αn−1, αn) t = n.

We define ¯Ω.= ¯Ω(a) and ¯c

.= ¯c(a). The mean (and mode) of the Gaussian

approximation N(¯Ω−1¯c, ¯Ω−1) is the mode of the target distribution and its logdensity has the same Hessian matrix as the log target density at this mode.

While the expressions for ¯Ω and ¯c are more complicated than those in Mc-Causland (2010), once we have them, we compute the mode a in the same way.

Roughly speaking, we iterate the computation α′ = ¯Ω(α)−1¯c(α) until numerical

Page 17: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 17

convergence. We use two modifications to this procedure, one to accelerate con-vergence using higher order derivatives and the other to resort to one-at-a-timeupdates of the αt in the rare cases of non-convergence.

Appendix B. Coefficients of the polynomial approximations of at|t+1

and st|t+1

Here we compute coefficients of polynomial approximations of at|t+1(αt+1) andst|t+1(αt+1). Recall that these are the conditional mean and log variance of αtgiven αt+1 according to a Gaussian approximation of the conditional distributionof α1, . . . , αt given αt+1 and y. The approximations are Taylor series expansions ofthese functions around at+1 and so the coefficients are based on their derivativesat at+1.

We derive recursive expressions for these derivatives that are correct for anyorder r. In practice, the computational cost rises quickly and the benefits diminish

quickly in r. We provide simplified expressions for a(r)t

.= a

(r)t|t+1(at+1) up to order

r = 5 and s(r)t

.= s

(r)t|t+1(at+1) up to order r = 4.

The basic strategy involves taking derivatives of two identities. The first is a firstorder necessary condition for at−1|t+1(αt+1) and at|t+1(αt+1) for (a1|t+1(αt+1), . . . , at|t+1(αt+1))to be the mode of (α1, . . . , αt) given αt+1 and y. The second is the identityat−1|t+1(αt+1) = at−1|t(at|t+1(αt+1)).

B.1. General Formula. We begin with the case t = 1. Since f(α1, α2|y) ∝f(α1, α2)f(y1|α1, α2), we can write

(20) log f(α1|α2, y) = −1

2Ω1,1α

21 + c1α1 − Ω1,2α1α2 + log f(y1|α1, α2) + k.

where k does not depend on α1. The mode of α1 given α2 and y, a1|2(α2), maximizeslog f(α1, α2|y) and must therefore satisfy

(21) − Ω1,1a1|2(α2) + c1 − Ω1,2α2 + ψ1,01 (a1|2(α2), α2) = 0.

Taking the derivative of (21) with respect to α2, and using the definitions ¯Ω1,1|2(α2) =

(Ω1,1 − ψ2,01 (a1|2(α2), α2)) and ¯Ω1,2|2(α2) = Ω1,2 − ψ1,1

1 (a1|2(α2), α2) gives

(22) ¯Ω1,1|2(α2)a(1)1|2(α2) = − ¯Ω1,2|2(α2),

Solving for a1|2(α2), we obtain

(23) a(1)1|2(α2) = −Σ1|2(α2)

¯Ω1,2|2(α2),

where Σ1|2(α2) = ¯Ω−11,1|2(α2). Using (22) and (23), we compute two alternative

expressions for a(r)1|2(α2), r ≥ 2. They will be useful to derive expressions for

Page 18: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

18 B. DJEGNENE AND W. J. MCCAUSLAND

s(r)1|2(α2), r ≥ 1. First, differentiate (22) (r − 1) times with respect to α2. Using

Leibniz’s rule, we obtain

r−1∑i=0

(r − 1

i

)¯Ω(r−1−i)1,1|2 (α2)a

(i+1)1|2 (α2) = − ¯Ω

(r−1)1,2|2 (α2).

Solving for a(r)1|2(α2), we obtain

(24) a(r)1|2(α2) = −Σ1|2(α2)

[r−2∑i=0

(r − 1

i

)¯Ω(r−1−i)1,1|2 (α2)a

(i+1)1|2 (α2) + ¯Ω

(r−1)1,2|2 (α2)

].

Evaluating (24) at α2 = a2 gives

(25) a(r)1 = −Σ1

[r−2∑i=0

(r − 1

i

)¯Ω(r−1−i)1,1 a

(i+1)1 + ¯Ω

(r−1)1,2

].

Second, differentiate (22) (r−1) times with respect to α2, recalling that Σ1|2(α2) =exp(s1|2(α2)). Using Faa Di Bruno’s formula for derivatives of compound functions,we obtain, for i ≥ 1,

Σ(i)1|2(α2) =

i∑j=1

exp(s1|2(α2))Bi,j(s(1)1|2(α2), . . . , s

(i−j+1)1|2 (α2))

= Σ1|2(α2)Bi(s(1)1|2(α2), . . . , s

(i)1|2(α2)),(26)

where the Bi,j are Bell polynomials and Bi is the i’th complete Bell polynomial.Appendix E shows how to compute these polynomials.

We now compute a(r)1|2(α2), r ≥ 2, using (22). We have

a(r)1|2(α2) =−

r−1∑i=0

(r − 1

i

(i)1|2(α2)

¯Ω(r−1−i)1,2|2 (α2)

=− Σ1|2(α2)r−1∑i=0

(r − 1

i

)Bi(s

(1)1|2(α2), . . . , s

(i)1|2(α2))

¯Ω(r−1−i)1,2|2 (α2).

Evaluating at α2 = a2, we have

(27) a(r)1 = −Σ1

r−1∑i=0

(r − 1

i

)Bi(s

(1)1 , . . . , s

(i)1 ) ¯Ω

(r−1−i)1,2 .

We now move on to the case 1 < t < n. We can write

(28) log f(α1:t|αt+1, y) = log f(α1:t+1) + log f(y1:t|α1:t, αt+1) + k,

Page 19: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 19

where α1:t = (α1, . . . , αt) and y1:t = (y1, . . . , yt). Expanding log f(α1:t+1) andcollecting all the terms not functions of α1:t in k, we obtain(29)

log f(α1:t|αt+1, y) = −1

2α>1:tΩ1:tα1:t+c

>1:tα1:t−Ωt,t+1αtαt+1+log f(y1:t|α1:t, αt+1)+k.

The conditional mode a1:t|t+1(αt+1) = (a1|t+1(αt+1), . . . , at|t+1(αt+1)) must solve

(30)0 =ct − Ωt−1,tat−1|t+1(αt+1)− Ωt,tat|t+1(αt+1)− Ωt,t+1αt+1

+ ψ0,1t−1(at−1|t+1(αt+1), at|t+1(αt+1)) + ψ1,0

t (at|t+1(αt+1), αt+1)

Taking the derivative of (30) with respect to αt+1 gives

(31) ¯Ωt−1,t|t+1(αt+1)a(1)t−1|t+1(αt+1)+ ¯Ωtt|t+1(αt+1)a

(1)t|t+1(αt+1)+ ¯Ωt,t+1|t+1(αt+1) = 0.

Using the identity at−1|t+1(αt+1) = at−1|t(at|t+1(αt+1)

)and the chain rule gives

(32) a(1)t−1|t+1(αt+1) = a

(1)t−1|t(at|t+1(αt+1))a

(1)t|t+1(αt+1).

Substituting (32) in (31), we obtain(¯Ωt−1,t|t+1(αt+1)a

(1)t−1|t(at|t+1(αt+1)) + ¯Ωt,t|t+1(αt+1)

)a(1)t|t+1(αt+1) = − ¯Ωt,t+1|t+1(αt+1).

Then, following an analogous development in McCausland (2010), we can show byinduction that

(33) a(1)t|t+1(αt+1) = −Σt|t+1(αt+1)

¯Ωt,t+1|t+1(αt+1), t = 2, . . . , n− 1,

where[Σt|t+1(αt+1)

]−1= ¯Ωt−1,t|t+1(αt+1)a

(1)t−1|t(at|t+1(αt+1))+ ¯Ωt,t|t+1(αt+1). Taking

αt+1 = at+1 in (33) gives

(34) a(1)t = −Σt

¯Ω.t,t+1

For r ≥ 2, we use Leibniz’s rule to differentiate (31) (r − 1) times with respectto αt+1 and obtain(35)r−1∑i=0

(r − 1

i

)(¯Ω(i)t−1,t|t+1(αt+1)a

(r−i)t−1|t+1(αt+1) + ¯Ω

(i)t,t|t+1(αt+1)a

(r−i)t|t+1(αt+1)

)= − ¯Ω

(r−1)t,t+1|t+1(αt+1).

Using Faa di Bruno’s formula for higher derivatives of a compound function, wecompute the i’th derivative of at−1|t+1(αt+1) with respect to αt+1 as

(36) a(i)t−1|t+1(αt+1) =

i∑j=1

a(j)t−1|t(at|t+1)Bi,j(a

(1)t|t+1(αt+1), . . . , a

(i−j+1)t|t+1 (αt+1)).

Page 20: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

20 B. DJEGNENE AND W. J. MCCAUSLAND

If we substitute a(i)t−1|t+1(αt+1) of (36) in (35) and set αt+1 = at+1, we obtain

(37)r−1∑i=0

(r − 1

i

)¯Ω(i)t,t−1

[r−i∑j=1

a(j)t−1Br−i,j(a

(1)t , . . . , a

(r−i−j+1)t )

]+ ¯Ω

(i)tt a

(r−i)t

= − ¯Ω

(r−1)t,t+1 .

This gives an expression for a(r)t in terms of a

(i)t , i = 0, . . . , r− 1; a

(i)t−1, i = 0, . . . , r;

¯Ω(i)t,t−1 and ¯Ω

(i)tt , i = 1, . . . , r − 1; and ¯Ω

(r−1)t,t+1 .

We now derive a result that will give us s(r)t in terms of a

(i)t and s

(i)t , i =

1, . . . , r − 1 and ait−1, i = 1, . . . , r + 1. Analogously with equation (26), we have

Σ(r)t|t+1(αt+1) = Σt|t+1(αt+1)Br(s

(1)t|t+1(αt+1), . . . , s

(r)t|t+1(αt+1)).

Using Leibniz’s rule to take derivatives of (33) with respect to αt+1, and evaluatingat αt+1 = at+1, we obtain

(38) a(r)t =

r−1∑i=0

(r − 1

i

)Bi(s

(1)t , . . . , s

(i)t )Σt

¯Ω(r−1−i)t,t+1 .

The quantities ¯Ω(r)t,s involved in the computation of a

(r)t and s

(r)t are mainly based

on higher derivatives of ψp,qt (at|t+1(αt+1), αt+1) with respect to αt+1, evaluated atat+1. These latters are computed using equations (84) and (85) in appendix E.

B.2. Explicit Formula for R = 5. We now derive simplified expressions for a(r)t ,

r = 1, . . . , 5 and s(r)t , r = 1, . . . , 4, for t = 1, . . . , n − 1. We give details of the

computation for t = 2, . . . , n−1. For the special case t = 1, it is easy to show thatwe can obtain analogous results simply by setting any terms with a time index ofzero to zero.

We have already have an expression for a(1)t , t = 1, . . . , n − 1, in (34). Taking

r = 2 in (37) gives

¯Ωt,t−1

(a(1)t−1a

(2)t + a

(2)t−1

(a(1)t

)2)+ ¯Ωtta

(2)t + ¯Ω

(1)t,t−1a

(1)t−1a

(1)t + ¯Ω

(1)tt a

(1)t = ¯Ω

(1)t,t+1.

With some algebra, it turns that

(39) a(2)t =

(γta

(1)t a

(2)t−1 − Σt

¯Ω(1)

t

)a(1)t − Σt

¯Ω(1)t,t+1,

where γt = −Σt¯Ωt,t−1 and ¯Ω

(i)

t = ¯Ω(i)t,t−1a

(1)t + ¯Ω

(i)tt . Let consider the alternative

expression of a(2)t given by (38) for r = 2. We have

(40) a(2)t = s

(1)t a

(1)t − Σt

¯Ω(1)t,t+1.

Then, we set (39)=(40) and solve for s(1)t to obtain

(41) s(1)t = γta

(1)t a

(2)t−1 − Σt

¯Ω(1)

t .

Page 21: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 21

For r = 3, development of equation (37) gives

− ¯Ω(2)t,t+1 =¯Ωt,t−1

(a(1)t−1a

(3)t + 3a

(2)t−1a

(1)t a

(2)t + a

(3)t−1

(a(1)t

)3)+ ¯Ωtta

(3)t

+ 2

(¯Ω(1)t,t−1

(a(1)t−1a

(2)t + a

(2)t−1

(a(1)t

)2)+ ¯Ω

(1)tt a

(1)t

)+ ¯Ω

(2)t,t−1a

(1)t−1a

(1)t + ¯Ω2

tta(1)t

Solving for a(3)t , we obtain

a(3)t =γt

(3a

(1)t a

(2)t a

(2)t−1 +

(a(1)t

)3a(3)t−1

)− 2Σt

(¯Ω(1)t,t−1

(a(1)t

)2a(2)t−1 + ¯Ω

(1)

t a(2)t

)− Σt

¯Ω(2)

t a(1)t − Σt

¯Ω(2)t,t+1

=2(γta

(1)t a

(2)t−1 − Σt

¯Ω(1)

t

)a(2)t +

(γta

(1)t a

(3)t−1 − 2Σt

¯Ω(1)t,t−1a

(2)t−1

)(a(1)t

)2+(γta

(2)t a

(2)t−1 − Σt

¯Ω(2)

t

)a(1)t − Σt

¯Ω(2)t,t+1

Using expression of s(1)t in (41), we finally compute that

(42)a(3)t =2s

(1)t a

(2)t +

(γta

(1)t a

(3)t−1 − 2Σt

¯Ω(1)t,t−1a

(2)t−1

)(a(1)t

)2+(γta

(2)t a

(2)t−1 − Σt

¯Ω(2)

t

)a(1)t − Σt

¯Ω(2)t,t+1

Let consider the alternative expression of a(3)t given by (38) for r = 3. We have

(43)

a(3)t =

(s(2)t +

(s(1)t

)2)a(1)t − Σt

¯Ω(2)t,t+1 − 2s

(1)t Σt

¯Ω(1)t,t+1

=

(s(2)t +

(s(1)t

)2)a(1)t − Σt

¯Ω(2)t,t+1 + 2s

(1)t

(a(2)t − s

(1)t a

(1)t

)=

(s(2)t −

(s(1)t

)2)a(1)t + 2s

(1)t a

(2)t − Σt

¯Ω(2)t,t+1

Setting (42)=(43) and solve for s(2)t , we obtain

(44) s(2)t =

(s(1)t

)2+(γta

(1)t a

(3)t−1 − 2Σt

¯Ω(1)t,t−1a

(2)t−1

)a(1)t +

(γta

(2)t a

(2)t−1 − Σt

¯Ω(2)

t

)

Page 22: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

22 B. DJEGNENE AND W. J. MCCAUSLAND

We follow similar procedure to compute explicit formula for a(4)t , s

(3)t , and a

(5)t , s

(4)t ,

which are:(45)

a(4)t =

(γta

(1)t a

(4)t−1 − 3Σt

¯Ω(1)t,t−1a

(3)t−1

)(a(1)t

)3+ 3

(γta

(2)t a

(3)t−1 − Σt

¯Ω(2)t,t−1a

(2)t−1

)(a(1)t

)2+(γta

(3)t a

(2)t−1 − 3Σt

¯Ω(1)t,t−1a

(2)t a

(2)t−1 − Σt

¯Ω(3)

t

)a(1)t − Σt

¯Ω(3)t,t+1

+ 3

(s(2)t −

(s(1)t

)2)a(2)t + 3s

(1)t a

(3)t ,

(46)

s(3)t =−

(s(1)t

)3+ 3s

(1)t s

(2)t +

(γta

(1)t a

(4)t−1 − 3Σt

¯Ω(1)t,t−1a

(3)t−1

)(a(1)t

)2+ 3

(γta

(2)t a

(3)t−1 − Σt

¯Ω(2)t,t−1a

(2)t−1

)a(1)t +

(γta

(3)t − 3Σt

¯Ω(1)t,t−1a

(2)t

)a(2)t−1 − Σt

¯Ω(3)

t

(47)

a(5)t =− Σt

¯Ω(4)t,t+1 +

(γta

(1)t a

(5)t−1 − 4Σt

¯Ω(1)t,t−1a

(4)t−1

)(a(1)t

)4+(γta

(2)t a

(4)t−1 − Σt

¯Ω2t,t−1a

(3)t−1

)(a(1)t

)3+(γta

(3)3 a

(3)t−1 − Σta

(3)t,t−1a

(2)t−1 − 3Σt

¯Ω(1)t,t−1a

(3)3 a

(3)t−1

)(a(1)t

)2+

(γt

(a(3)t a

(2)t−1 + 3

(a(2)2

)2a(3)t

)− Σt

¯Ω(4)

t − 6Σt¯Ω(2)t,t−1a

(2)2 a

(2)t−1

)a(1)t

+ 6s(1)t a

(4)t + 6

(s(2)t −

(s(1)t

)2)a(3)t + 5

(s(3)t +

(s(1)t

)3− 3s

(1)t s

(2)t

)a(2)t ,

and

(48)

s(4)t =− Σt

¯Ω(4)t,t+1 +

(γta

(1)t a

(5)t−1 − 4Σt

¯Ω(1)t,t−1a

(4)t−1

)(a(1)t

)3+(γta

(2)t a

(4)t−1 − Σt

¯Ω(2)t,t−1a

(3)t−1

)(a(1)t

)2+(γta

(3)3 a

(3)t−1 − Σta

(3)t,t−1a

(2)t−1 − 3Σt

¯Ω(1)t,t−1a

(3)3 a

(3)t−1

)a(1)t

+

(γt

(a(3)t a

(2)t−1 + 3

(a(2)2

)2a(3)t

)− Σt

¯Ω(4)

t − 6Σt¯Ω(2)t,t−1a

(2)2 a

(2)t−1

)+(s(1)t

)4+ 4s

(1)t s

(3)t + 3

(s(2)t − 2

(s(1)t

)2)s(2)t .

Appendix C. Coefficients of the polynomial approximations of b(r)t

and µrt

Page 23: Introduction - Université de Montréal · distribution for Markov chain Monte Carlo (MCMC). Applications include the approximate evaluation of likelihood functions and Bayesian posterior

THE HESSIAN METHOD WITH CONDITIONAL DEPENDANCE 23

C.1. Derivative of the log conditional density. In this subsection, we derivean exact expression of the derivative of the log conditional density log f(αt|αt+1, y).

The case t = 1 is straightforward using Bayes’ rule. We have

∂ log f(α1|α2, y)

∂α1

=∂ log f(y1|α1, α2)

∂α1

+∂ log f(α2|α1)

∂α1

+∂ log f(α1)

∂α1

= ψ1,01 (α1, α2) + c1 − Ω1,2α2 − Ω1,1α1.

For $t = 2, \ldots, n-1$, we compute $f(\alpha_t|\alpha_{t+1}, y)$ as a marginal density of $f(\alpha_1, \ldots, \alpha_t|\alpha_{t+1}, y)$. Thus, we have
\[
f(\alpha_t|\alpha_{t+1}, y) = \int f(\alpha_{1:t-1}, \alpha_t|\alpha_{t+1}, y)\, d\alpha_{1:t-1}
\propto f(\alpha_{t+1}|\alpha_t)\, f(y_t|\alpha_t, \alpha_{t+1})\, c(\alpha_t|y), \tag{49}
\]
where
\[
c(\alpha_t|y) = \int f(\alpha_t|\alpha_{t-1})\, f(y_{t-1}|\alpha_{t-1}, \alpha_t)\, f(y_{1:t-2}, \alpha_{1:t-1})\, d\alpha_{1:t-1}. \tag{50}
\]
Taking the logarithm of (49) and differentiating with respect to $\alpha_t$ gives
\[
\frac{\partial \log f(\alpha_t|\alpha_{t+1}, y)}{\partial\alpha_t}
= \frac{1}{c(\alpha_t|y)}\frac{\partial c(\alpha_t|y)}{\partial\alpha_t}
+ \frac{\partial \log f(\alpha_{t+1}|\alpha_t)}{\partial\alpha_t}
+ \frac{\partial \log f(y_t|\alpha_t, \alpha_{t+1})}{\partial\alpha_t}. \tag{51}
\]
Taking the derivative of the right hand side of (50) and dividing by $c(\alpha_t|y)$ gives\footnote{For $(X, Y)$ a pair of random variables with joint density $f(x, y)$ and $g$ any measurable function,
\[
E[g(X)|Y = y] = \frac{1}{\int f(x, y)\, dx}\int f(y|x)\, f(x)\, g(x)\, dx. \tag{52}
\]}
\[
\frac{1}{c(\alpha_t|y)}\frac{\partial c(\alpha_t|y)}{\partial\alpha_t}
= E\!\left[\left.\frac{\partial \log f(\alpha_t|\alpha_{t-1})}{\partial\alpha_t} + \frac{\partial \log f(y_{t-1}|\alpha_{t-1}, \alpha_t)}{\partial\alpha_t}\,\right|\,\alpha_t, y\right]. \tag{53}
\]
Then, the expression in (51) becomes
\[
\begin{aligned}
\frac{\partial \log f(\alpha_t|\alpha_{t+1}, y)}{\partial\alpha_t}
={}& E\!\left[\left.\frac{\partial \log f(\alpha_t|\alpha_{t-1})}{\partial\alpha_t} + \frac{\partial \log f(\alpha_{t+1}|\alpha_t)}{\partial\alpha_t}\,\right|\,\alpha_t, \alpha_{t+1}, y\right]
+ E\!\left[\left.\frac{\partial \log f(y_{t-1}|\alpha_{t-1}, \alpha_t)}{\partial\alpha_t} + \frac{\partial \log f(y_t|\alpha_t, \alpha_{t+1})}{\partial\alpha_t}\,\right|\,\alpha_t, \alpha_{t+1}, y\right] \\
={}& E\bigl[c_t - \Omega_{t,t-1}\alpha_{t-1} - \Omega_{t,t}\alpha_t - \Omega_{t,t+1}\alpha_{t+1} \,\big|\, \alpha_t, \alpha_{t+1}, y\bigr]
+ E\bigl[\psi^{0,1}_{t-1}(\alpha_{t-1}, \alpha_t) + \psi^{1,0}_t(\alpha_t, \alpha_{t+1}) \,\big|\, \alpha_t, \alpha_{t+1}, y\bigr] \\
={}& c_t - \Omega_{t-1,t}\mu_{t-1|t}(\alpha_t) - \Omega_{t,t}\alpha_t - \Omega_{t,t+1}\alpha_{t+1} + \psi^{1,0}_t(\alpha_t, \alpha_{t+1})
+ E\bigl[\psi^{0,1}_{t-1}(\alpha_{t-1}, \alpha_t) \,\big|\, \alpha_t, y\bigr],
\end{aligned} \tag{54}
\]


where $\mu_{t-1|t}(\alpha_t) = E[\alpha_{t-1}|\alpha_t, y]$. Finally, we have
\[
h_t(\alpha_t; \alpha_{t+1}) = c_t - \Omega_{t,t}\alpha_t - \Omega_{t,t+1}\alpha_{t+1} + \psi^{1,0}_t(\alpha_t, \alpha_{t+1})
- \Omega_{t-1,t}\mu_{t-1|t}(\alpha_t) + x_{t-1|t}(\alpha_t), \tag{55}
\]
with $x_{t-1|t}(\alpha_t) = E\bigl[\psi^{0,1}_{t-1}(\alpha_{t-1}, \alpha_t) \,\big|\, \alpha_t, y\bigr]$. In the case $t = n$, a similar development gives
\[
h_n(\alpha_n) = c_n - \Omega_{n,n}\alpha_n + \psi^1_n(\alpha_n) - \Omega_{n-1,n}\mu_{n-1|n}(\alpha_n) + x_{n-1|n}(\alpha_n). \tag{56}
\]

C.2. Approximation of the conditional derivatives. Due to the conditional expectations $\mu_{t-1|t}(\alpha_t)$ and $x_{t-1|t}(\alpha_t)$, the derivative $h_t(\alpha_t; \alpha_{t+1})$ cannot be computed easily. Thus, we propose an approximation, $H_t(\alpha_t; \alpha_{t+1})$, of $h_t(\alpha_t; \alpha_{t+1})$. For $t = 2, \ldots, n-1$, we have
\[
H_t(\alpha_t; \alpha_{t+1}) \doteq c_t - \Omega_{t,t}\alpha_t - \Omega_{t,t+1}\alpha_{t+1} + \psi^{1,0}_t(\alpha_t, \alpha_{t+1})
- \Omega_{t-1,t}M_{t-1|t}(\alpha_t) + X_{t-1|t}(\alpha_t), \tag{57}
\]
where $M_{t-1|t}(\alpha_t)$ is an approximation of $\mu_{t-1|t}(\alpha_t)$ and $X_{t-1|t}(\alpha_t)$ is an approximation of $x_{t-1|t}(\alpha_t)$.\footnote{For $t = n$, we need only replace $\psi^{1,0}_t(\alpha_t, \alpha_{t+1})$ by $\psi^1_n(\alpha_n)$ in (57) to obtain $H_n(\alpha_n)$, the approximation of $h_n(\alpha_n)$.} We describe these approximations in the rest of this subsection. Let
\[
M^{(j)}_{t-1} \approx \mu^{(j)}_{t-1} = \left.\frac{\partial^j \mu_{t-1|t}(\alpha_t)}{\partial\alpha_t^j}\right|_{\alpha_t = a_t}, \quad j = 0, 1, 2.
\]
We compute $M_{t-1|t}(\alpha_t)$ as a second order Taylor expansion around $a_t$. Thus, we have
\[
M_{t-1|t}(\alpha_t) \doteq M_{t-1} + M^{(1)}_{t-1}(\alpha_t - a_t) + \tfrac{1}{2}M^{(2)}_{t-1}(\alpha_t - a_t)^2.
\]
The approximation $X_{t-1|t}(\alpha_t)$ is constructed in two steps. First, we consider a second order expansion of $\psi^{0,1}_{t-1}(\alpha_{t-1}, \alpha_t)$, as a function of $\alpha_{t-1}$, around $a_{t-1|t}(\alpha_t)$:
\[
\begin{aligned}
\psi^{0,1}_{t-1}(\alpha_{t-1}, \alpha_t) \approx{}& \psi^{0,1}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t) + \psi^{1,1}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t)\bigl(\alpha_{t-1} - a_{t-1|t}(\alpha_t)\bigr) \\
&+ \tfrac{1}{2}\psi^{2,1}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t)\bigl(\alpha_{t-1} - a_{t-1|t}(\alpha_t)\bigr)^2. \tag{58}
\end{aligned}
\]
Taking the conditional expectation of (58), we have
\[
\begin{aligned}
x_{t-1|t}(\alpha_t) \approx{}& \psi^{0,1}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t) + \psi^{1,1}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t)\bigl(\mu_{t-1|t}(\alpha_t) - a_{t-1|t}(\alpha_t)\bigr) \\
&+ \tfrac{1}{2}\psi^{2,1}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t)\,\Sigma_{t-1|t}(\alpha_t).
\end{aligned}
\]
We use $\Sigma_{t-1|t}(\alpha_t)$ as the approximation of $E\bigl[(\alpha_{t-1} - a_{t-1|t}(\alpha_t))^2 \,\big|\, \alpha_t, y\bigr]$. Second, let $\Psi^{p,q}_{t-1|t}(\alpha_t)$ be an approximation of $\psi^{p,q}_{t-1}(a_{t-1|t}(\alpha_t), \alpha_t)$. We have
\[
\Psi^{p,q}_{t-1|t}(\alpha_t) \doteq \sum_{k=0}^{R} \psi^{p,q(k)}_{t-1}\,\frac{(\alpha_t - a_t)^k}{k!},
\]


where $\psi^{p,q(k)}_t$ is the $k$th derivative of $\psi^{p,q}_t(a_{t|t+1}(\alpha_{t+1}), \alpha_{t+1})$ with respect to $\alpha_{t+1}$, evaluated at $\alpha_{t+1} = a_{t+1}$. It is computed using equations (84) and (85) in Appendix E. We define similarly $A_{t-1|t}(\alpha_t)$ as the approximation of $a_{t-1|t}(\alpha_t)$,
\[
A_{t-1|t}(\alpha_t) \doteq \sum_{k=0}^{R} a^{(k)}_{t-1}\,\frac{(\alpha_t - a_t)^k}{k!},
\]
and $\Xi_{t-1|t}(\alpha_t)$ as the approximation of $\Sigma_{t-1|t}(\alpha_t)$,
\[
\Xi_{t-1|t}(\alpha_t) \doteq \sum_{k=0}^{R-1} \Sigma^{(k)}_{t-1}\,\frac{(\alpha_t - a_t)^k}{k!}.
\]
Finally, we take as the approximation $X_{t-1|t}(\alpha_t)$ of $x_{t-1|t}(\alpha_t)$
\[
X_{t-1|t}(\alpha_t) \doteq \Psi^{0,1}_{t-1|t}(\alpha_t) + \Psi^{1,1}_{t-1|t}(\alpha_t)\bigl(M_{t-1|t}(\alpha_t) - A_{t-1|t}(\alpha_t)\bigr) + \tfrac{1}{2}\Psi^{2,1}_{t-1|t}(\alpha_t)\,\Xi_{t-1|t}(\alpha_t).
\]
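Once the derivative arrays are available from the forward pass, the combination above is mechanical. The following is a minimal Python sketch, not the implementation used in the paper; the argument names (`psi01`, `psi11`, `psi21`, `M_coeffs`, `A_coeffs`, `Sigma_coeffs`) are our own labels for arrays holding the relevant derivatives at $a_t$.

```python
from math import factorial

def taylor_eval(coeffs, x, x0):
    # coeffs[k] holds the k-th derivative at x0; evaluates sum_k coeffs[k] (x - x0)^k / k!
    dx = x - x0
    return sum(c * dx ** k / factorial(k) for k, c in enumerate(coeffs))

def X_approx(alpha_t, a_t, psi01, psi11, psi21, M_coeffs, A_coeffs, Sigma_coeffs):
    # X_{t-1|t}(alpha_t) = Psi^{0,1} + Psi^{1,1} * (M - A) + (1/2) * Psi^{2,1} * Xi,
    # with every factor evaluated as a truncated Taylor polynomial around a_t.
    Psi01 = taylor_eval(psi01, alpha_t, a_t)
    Psi11 = taylor_eval(psi11, alpha_t, a_t)
    Psi21 = taylor_eval(psi21, alpha_t, a_t)
    M = taylor_eval(M_coeffs, alpha_t, a_t)       # M_{t-1|t}(alpha_t)
    A = taylor_eval(A_coeffs, alpha_t, a_t)       # A_{t-1|t}(alpha_t)
    Xi = taylor_eval(Sigma_coeffs, alpha_t, a_t)  # Xi_{t-1|t}(alpha_t)
    return Psi01 + Psi11 * (M - A) + 0.5 * Psi21 * Xi
```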

C.3. Coefficients of the polynomial approximation of the conditional mode $b_{t|t+1}$. The mode $b_{t|t+1}(\alpha_{t+1})$ of $\log f(\alpha_t|\alpha_{t+1}, y)$ is the root of $h_t(\alpha_t; \alpha_{t+1})$. Instead of computing it exactly, we can do quite well with one iteration of the Newton-Raphson root-finding algorithm, starting at $a_{t|t+1}$. Thus, we have
\[
b_{t|t+1}(\alpha_{t+1}) \approx a_{t|t+1}(\alpha_{t+1}) - \frac{h_t(a_{t|t+1}; \alpha_{t+1})}{h^{(1)}_t(a_{t|t+1}; \alpha_{t+1})}, \tag{59}
\]
where $h^{(1)}_t(\alpha_t; \alpha_{t+1})$ is the first order derivative of $h_t(\alpha_t; \alpha_{t+1})$ with respect to $\alpha_t$.
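The single Newton-Raphson step in (59) is generic. A minimal sketch follows, using an arbitrary illustrative function in place of $h_t$; it only assumes that the function and its first derivative can be evaluated at the starting point.

```python
def newton_one_step(h, h_prime, a):
    # One Newton-Raphson iteration started at a, as in (59): b is approximately a - h(a) / h'(a).
    return a - h(a) / h_prime(a)

# Toy illustration with an arbitrary function whose root is 2**(1/3):
b = newton_one_step(lambda x: x ** 3 - 2.0, lambda x: 3.0 * x ** 2, 1.3)
```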

Define
\[
N(\alpha_{t+1}) \doteq H_t(a_{t|t+1}; \alpha_{t+1}), \quad
D(\alpha_{t+1}) \doteq H^{(1)}_t(a_{t|t+1}; \alpha_{t+1}), \quad
V(\alpha_{t+1}) \doteq \frac{1}{D(\alpha_{t+1})},
\]
with $H^{(1)}_t(\alpha_t; \alpha_{t+1})$ the first order derivative of $H_t(\alpha_t; \alpha_{t+1})$ with respect to $\alpha_t$. Subtracting the first order condition (30), for $(a_{1|t+1}, \ldots, a_{t|t+1})$ to be the conditional mode of $\log f(\alpha_1, \ldots, \alpha_t|\alpha_{t+1}, y)$, from $H_t(a_{t|t+1}; \alpha_{t+1})$ gives
\[
N(\alpha_{t+1}) \doteq -\Omega_{t-1,t}\bigl(M_{t-1|t}(a_{t|t+1}) - A_{t-1|t}(a_{t|t+1})\bigr) - \Psi^{0,1}_{t-1}(a_{t-1|t}(a_{t|t+1}), a_{t|t+1}) + X_{t-1|t}(a_{t|t+1}).
\]
Also, let $N^{(i)}$ (resp. $V^{(i)}$ and $D^{(i)}$) be the $i$th derivative of $N(\alpha_{t+1})$ (resp. $V(\alpha_{t+1})$ and $D(\alpha_{t+1})$) with respect to $\alpha_{t+1}$, evaluated at $a_{t+1}$. We have
\[
N^{(i)} = \sum_{k=0}^{i}\Bigl[-\Omega_{t-1,t}\bigl(M^{(k)}_{t-1} - a^{(k)}_{t-1}\bigr) - \psi^{0,1(k)}_{t-1} + X^{(k)}_{t-1}\Bigr] B_{i,k}\bigl(a^{(1)}_t, \ldots, a^{(i-k+1)}_t\bigr),
\]


where $X^{(k)}_{t-1} = X^{(k)}_{t-1}(a_t)$,
\[
\begin{aligned}
D &= -\Omega_{t,t} + \psi^{2,0}_t - \Omega_{t-1,t}M^{(1)}_{t-1} + X^{(1)}_{t-1}, \\
D^{(1)} &= \psi^{2,0(1)}_t + \bigl(-\Omega_{t-1,t}M^{(2)}_{t-1} + X^{(2)}_{t-1}\bigr)a^{(1)}_t, \\
D^{(i)} &= \psi^{2,0(i-1)}_t + \sum_{k=0}^{i-1} X^{(k+1)}_{t-1} B_{i,k}\bigl(a^{(1)}_t, \ldots, a^{(i-k+1)}_t\bigr), \quad i \geq 2,
\end{aligned}
\]
and
\[
V^{(i)} = -D^{-1}\sum_{j=0}^{i-1}\binom{i}{j} V^{(j)} D^{(i-j)}.
\]
Using (59), we take as the approximation $B^{(r)}_t$ of $b^{(r)}_t$
\[
B^{(r)}_t \doteq a^{(r)}_t - (NV)^{(r)}. \tag{60}
\]
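Computing $(NV)^{(r)}$ requires only the derivatives of $V = 1/D$, obtained from the recursion above, and Leibniz's rule for products. A minimal sketch, assuming lists `N` and `D` (our names) hold $N^{(i)}$ and $D^{(i)}$ at $a_{t+1}$:

```python
from math import comb

def reciprocal_derivs(D):
    # Derivatives of V = 1/D at a point, from D = [D, D^{(1)}, ...], using
    # V^{(i)} = -D^{-1} * sum_{j=0}^{i-1} C(i, j) V^{(j)} D^{(i-j)}.
    V = [1.0 / D[0]]
    for i in range(1, len(D)):
        s = sum(comb(i, j) * V[j] * D[i - j] for j in range(i))
        V.append(-s / D[0])
    return V

def product_derivs(N, V):
    # Leibniz's rule: (NV)^{(r)} = sum_{j=0}^{r} C(r, j) N^{(j)} V^{(r-j)}.
    r_max = min(len(N), len(V)) - 1
    return [sum(comb(r, j) * N[j] * V[r - j] for j in range(r + 1))
            for r in range(r_max + 1)]

def B_approx(a_derivs, N, D):
    # B^{(r)}_t = a^{(r)}_t - (NV)^{(r)}, as in (60).
    NV = product_derivs(N, reciprocal_derivs(D))
    return [a_r - nv_r for a_r, nv_r in zip(a_derivs, NV)]
```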

In the case $t = n$, we have a value $b_n$ and not a function, and we seek only an approximation of this value. Using a development analogous to the case $t = 2, \ldots, n-1$, we take as the approximation of $b_n$:
\[
B_n \doteq a_n - \frac{H_n(a_n)}{H^{(1)}_n(a_n)}. \tag{61}
\]

C.4. Coefficients of the polynomial approximations of the conditional mean $\mu_{t|t+1}(\alpha_{t+1})$. We provide in this subsection the values of $M^{(j)}_t$, $j = 0, 1, 2$, the coefficients of the polynomial approximation $M_{t|t+1}(\alpha_{t+1})$ of $\mu_{t|t+1}(\alpha_{t+1})$. The conditional mode $b_{t|t+1}$ is the root of $h_t(\alpha_t; \alpha_{t+1})$, so we have
\[
h_t(b_{t|t+1}; \alpha_{t+1}) = 0. \tag{62}
\]
Differentiating (62) twice with respect to $\alpha_{t+1}$ gives
\[
h^{(1)}_t(b_{t|t+1}; \alpha_{t+1})\, b^{(1)}_{t|t+1} = \Omega_{t,t+1} - \psi^{1,1}_t(b_{t|t+1}, \alpha_{t+1}) \tag{63}
\]
and
\[
h^{(2)}_t(b_{t|t+1}; \alpha_{t+1})\bigl(b^{(1)}_{t|t+1}\bigr)^2 + h^{(1)}_t(b_{t|t+1}, \alpha_{t+1})\, b^{(2)}_{t|t+1}
= -2\psi^{2,1}_t(b_{t|t+1}, \alpha_{t+1})\, b^{(1)}_{t|t+1} - \psi^{1,2}_t(b_{t|t+1}, \alpha_{t+1}). \tag{64}
\]
Using arguments similar to those in ?, we show that
\[
\mu_{t|t+1} - b_{t|t+1} \approx \tfrac{1}{2}\, h^{(2)}_t(b_{t|t+1}, \alpha_{t+1})\bigl[h^{(1)}_t(b_{t|t+1}, \alpha_{t+1})\bigr]^{-2}. \tag{65}
\]


Using (63), (64) and (65), we derive that
\[
\begin{aligned}
\mu_{t|t+1} - b_{t|t+1} \approx{}& -\frac{1}{2}\,\frac{1}{\bar{\Omega}_{t,t+1|t+1}(b_{t|t+1}, \alpha_{t+1})}\left[\frac{b^{(2)}_{t|t+1}}{b^{(1)}_{t|t+1}}\right] \\
&- \frac{1}{2}\,\frac{1}{\bigl(\bar{\Omega}_{t,t+1|t+1}(b_{t|t+1}, \alpha_{t+1})\bigr)^2}\Bigl[2\psi^{2,1}_t(b_{t|t+1}, \alpha_{t+1})\, b^{(1)}_{t|t+1} + \psi^{1,2}_t(b_{t|t+1}, \alpha_{t+1})\Bigr]. \tag{66}
\end{aligned}
\]
Define
\[
N_1(\alpha_{t+1}) \doteq \frac{b^{(2)}_{t|t+1}}{b^{(1)}_{t|t+1}}, \quad
D_1(\alpha_{t+1}) \doteq \bar{\Omega}_{t,t+1|t+1}(b_{t|t+1}, \alpha_{t+1}), \quad
V_1(\alpha_{t+1}) \doteq \frac{1}{D_1(\alpha_{t+1})},
\]
\[
N_2(\alpha_{t+1}) \doteq 2\psi^{2,1}_t(b_{t|t+1}, \alpha_{t+1})\, b^{(1)}_{t|t+1} + \psi^{1,2}_t(b_{t|t+1}, \alpha_{t+1}), \quad
V_2(\alpha_{t+1}) \doteq V_1^2(\alpha_{t+1}).
\]

Then, we have
\[
N_1^{(i)} \doteq -\frac{1}{B^{(1)}_t}\sum_{j=0}^{i-1}\binom{i}{j} N_1^{(j)} B_t^{(i-j)}, \quad i = 1, 2,
\]
\[
\begin{aligned}
D_1 &\doteq \Omega_{t,t+1} - \psi^{1,1}_t(B_t, a_{t+1}), \\
D_1^{(1)} &\doteq -\psi^{2,1}_t(B_t, a_{t+1})\, B^{(1)}_t - \psi^{1,2}_t(B_t, a_{t+1}), \\
D_1^{(2)} &\doteq -\psi^{3,1}_t(B_t, a_{t+1})\bigl(B^{(1)}_t\bigr)^2 - 2\psi^{2,2}_t(B_t, a_{t+1})\, B^{(1)}_t - \psi^{2,1}_t(B_t, a_{t+1})\, B^{(2)}_t - \psi^{1,3}_t(B_t, a_{t+1}), \\
N_2 &\doteq 2\psi^{2,1}_t(B_t, a_{t+1})\, B^{(1)}_t + \psi^{1,2}_t(B_t, a_{t+1}), \\
N_2^{(1)} &\doteq 2\psi^{3,1}_t(B_t, a_{t+1})\bigl(B^{(1)}_t\bigr)^2 + 2\psi^{2,1}_t(B_t, a_{t+1})\, B^{(2)}_t + 3\psi^{2,2}_t(B_t, a_{t+1})\, B^{(1)}_t + \psi^{1,3}_t(B_t, a_{t+1}), \\
N_2^{(2)} &\doteq 2\psi^{4,1}_t(B_t, a_{t+1})\bigl(B^{(1)}_t\bigr)^3 + 6\psi^{3,1}_t(B_t, a_{t+1})\, B^{(1)}_t B^{(2)}_t + 2\psi^{2,1}_t(B_t, a_{t+1})\, B^{(3)}_t \\
&\qquad + 5\psi^{3,2}_t(B_t, a_{t+1})\bigl(B^{(1)}_t\bigr)^2 + 5\psi^{2,2}_t(B_t, a_{t+1})\, B^{(2)}_t + 4\psi^{2,3}_t(B_t, a_{t+1})\, B^{(1)}_t + \psi^{1,4}_t(B_t, a_{t+1}),
\end{aligned}
\]
where
\[
\psi^{p,q}_t(B_t, a_{t+1}) = \sum_{r=0}^{P-p} \psi^{p+r,q}_t\,\frac{(B_t - a_t)^r}{r!}.
\]
Values of $V_i^{(j)}$, $i = 1, 2$ and $j = 0, 1, 2$, can easily be derived from those of $D_1^{(j)}$, $j = 0, 1, 2$. We approximate the value and first two derivatives of $\mu_{t|t+1}$, evaluated at $a_{t+1}$, by
\[
M^{(i)}_t \doteq B^{(i)}_t - \tfrac{1}{2}\bigl((N_1 V_1)^{(i)} + (N_2 V_2)^{(i)}\bigr). \tag{67}
\]


Appendix D. Model derivatives

Here we show how to compute the required derivatives of $\psi_t(\alpha_t, \alpha_{t+1})$ and $\psi_n(\alpha_n)$ for the ASV-Gaussian and ASV-Student models.

D.1. ASV-Gaussian. Using (5), we can write
\[
\psi_t(\alpha_t, \alpha_{t+1}) = -\frac{1}{2}\bigl[\log(2\pi/\beta) + \alpha_t + \beta(\varphi_t - \theta u_t)^2\bigr], \quad t = 1, \ldots, n-1, \tag{68}
\]
\[
\psi_n(\alpha_n) = -\frac{1}{2}\bigl[\log(2\pi) + \alpha_n + \varphi_n^2\bigr], \tag{69}
\]
where $\beta \doteq (1 - \rho^2)^{-1}$, $\theta \doteq \rho\sqrt{\omega}$, $u_t \doteq \alpha_{t+1} - d_t - \phi\alpha_t$ and $\varphi_t \doteq y_t\exp(-\alpha_t/2)$.

For $t = 1, \ldots, n-1$ and $(p, q) \neq (0, 0)$ we have
\[
\psi^{p,q}_t(\alpha_t, \alpha_{t+1}) =
\begin{cases}
-\frac{1}{2} - \frac{\beta}{2}\bigl(\varphi_{t,p} - 2\theta^2\phi u_t\bigr) & q = 0,\ p = 1, \\
-\frac{\beta}{2}\bigl(\varphi_{t,p} + 2\theta^2\phi^2\bigr) & q = 0,\ p = 2, \\
-\frac{\beta}{2}\varphi_{t,p} & q = 0,\ p \geq 3, \\
\beta\theta\bigl(\varphi_t - \theta u_t\bigr) & q = 1,\ p = 0, \\
\beta\theta\bigl(-\frac{1}{2}\varphi_t + \theta\phi\bigr) & q = 1,\ p = 1, \\
\beta\theta\bigl(-\frac{1}{2}\bigr)^p\varphi_t & q = 1,\ p \geq 2, \\
-\beta\theta^2 & q = 2,\ p = 0, \\
0 & \text{otherwise},
\end{cases} \tag{70}
\]
where
\[
\varphi_{t,p} \doteq (-1)^p\varphi_t^2 - \Bigl(-\frac{1}{2}\Bigr)^{p-2}\theta\varphi_t\Bigl(p\phi + \frac{1}{2}u_t\Bigr), \quad t = 1, \ldots, n-1. \tag{71}
\]
For $t = n$,
\[
\psi^p_n(\alpha_n) =
\begin{cases}
-\frac{1}{2} - \frac{1}{2}\varphi_{n,p} & p = 1, \\
-\frac{1}{2}\varphi_{n,p} & p \geq 2,
\end{cases} \tag{72}
\]
where $\varphi_{n,p} = (-1)^p\varphi_n^2$.
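As a simple check on (68) and (70), the sketch below evaluates $\psi_t$ for the ASV-Gaussian model and compares the $(p, q) = (0, 1)$ entry of (70) with a central finite difference in $\alpha_{t+1}$. The function name and all numerical parameter values are purely illustrative and not taken from the paper.

```python
import numpy as np

def psi_t_gaussian(alpha_t, alpha_tp1, y_t, beta, theta, phi, d_t):
    # log f(y_t | alpha_t, alpha_{t+1}) for the ASV-Gaussian model, equation (68).
    u_t = alpha_tp1 - d_t - phi * alpha_t
    varphi_t = y_t * np.exp(-alpha_t / 2.0)
    return -0.5 * (np.log(2.0 * np.pi / beta) + alpha_t
                   + beta * (varphi_t - theta * u_t) ** 2)

# Compare the (q = 1, p = 0) entry of (70), beta*theta*(varphi_t - theta*u_t),
# with a central finite difference in alpha_{t+1}; parameter values are arbitrary.
alpha_t, alpha_tp1, y_t = -9.1, -9.0, 0.012
beta, theta, phi, d_t = 1.0 / (1.0 - 0.3 ** 2), -0.045, 0.97, -0.27
u_t = alpha_tp1 - d_t - phi * alpha_t
varphi_t = y_t * np.exp(-alpha_t / 2.0)
analytic = beta * theta * (varphi_t - theta * u_t)
eps = 1e-6
numeric = (psi_t_gaussian(alpha_t, alpha_tp1 + eps, y_t, beta, theta, phi, d_t)
           - psi_t_gaussian(alpha_t, alpha_tp1 - eps, y_t, beta, theta, phi, d_t)) / (2.0 * eps)
print(analytic, numeric)  # the two values should agree to several decimal places
```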

D.2. ASV-Student. We use the definitions of $\beta$, $\theta$, $u_t$ and $\varphi_t$ from D.1. Using (16) we can write $\psi_t(\alpha_t, \alpha_{t+1})$, for $t = 1, \ldots, n-1$, as
\[
\psi_t(\alpha_t, \alpha_{t+1}) = k + \psi_{1,t}(\alpha_t, \alpha_{t+1}) + \psi_{2,t}(\alpha_t) + \psi_{3,t}(\alpha_t, \alpha_{t+1}), \tag{73}
\]
where $k$ does not depend on $\alpha_t$ or $\alpha_{t+1}$,
\[
\begin{aligned}
\psi_{1,t}(\alpha_t, \alpha_{t+1}) &\doteq -\frac{1}{2}\bigl(\theta^2\beta u_t^2 + \alpha_t\bigr), \\
\psi_{2,t}(\alpha_t) &\doteq -\frac{\nu + 1}{2}\log\Bigl(1 + \frac{\beta}{\nu}\varphi_t^2\Bigr) = -(\nu + 1)\log d(\alpha_t), \\
\psi_{3,t}(\alpha_t, \alpha_{t+1}) &\doteq \log m(z(\alpha_t, \alpha_{t+1})), \\
m(z) &= 2\,\frac{\Gamma\bigl(\frac{\nu}{2} + 1\bigr)}{\Gamma\bigl(\frac{\nu+1}{2}\bigr)}\, z\, M\Bigl(\frac{\nu}{2} + 1;\ \frac{3}{2};\ z^2\Bigr) + M\Bigl(\frac{\nu + 1}{2};\ \frac{1}{2};\ z^2\Bigr), \\
z(\alpha_t, \alpha_{t+1}) &= \frac{n(\alpha_t, \alpha_{t+1})}{d(\alpha_t)}, \\
n(\alpha_t, \alpha_{t+1}) &= \theta\beta\sqrt{2\nu}\, u_t\varphi_t, \qquad d(\alpha_t) = \sqrt{1 + \frac{\beta}{\nu}\varphi_t^2}.
\end{aligned}
\]
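A minimal sketch of evaluating $\psi_t$ for the ASV-Student model follows, with Kummer's function $M(a; b; z^2)$ computed by SciPy's `hyp1f1` rather than the GSL routines used in the paper. The additive constant $k$ is omitted, the placement of the $\sqrt{2\nu}$ factor in $n(\alpha_t, \alpha_{t+1})$ follows the printed definition above, and all names and values are illustrative.

```python
import numpy as np
from scipy.special import hyp1f1, gammaln

def log_m(z, nu):
    # m(z) as defined above; M(a; b; z^2) is Kummer's function, here SciPy's hyp1f1.
    c = 2.0 * np.exp(gammaln(nu / 2.0 + 1.0) - gammaln((nu + 1.0) / 2.0))
    m = c * z * hyp1f1(nu / 2.0 + 1.0, 1.5, z * z) + hyp1f1((nu + 1.0) / 2.0, 0.5, z * z)
    return np.log(m)

def psi_t_student(alpha_t, alpha_tp1, y_t, beta, theta, phi, d_t, nu):
    # psi_t = k + psi_{1,t} + psi_{2,t} + psi_{3,t}, as in (73); the constant k is omitted.
    u_t = alpha_tp1 - d_t - phi * alpha_t
    varphi_t = y_t * np.exp(-alpha_t / 2.0)
    d_val = np.sqrt(1.0 + beta * varphi_t ** 2 / nu)
    n_val = theta * beta * np.sqrt(2.0 * nu) * u_t * varphi_t
    z = n_val / d_val
    psi1 = -0.5 * (theta ** 2 * beta * u_t ** 2 + alpha_t)
    psi2 = -(nu + 1.0) * np.log(d_val)
    psi3 = log_m(z, nu)
    return psi1 + psi2 + psi3
```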

Computing expressions for high order partial derivatives of $\psi_t(\alpha_t, \alpha_{t+1})$ is daunting, but fortunately we can avoid it. All we need to do is compute the derivatives at a point, and for this, we can use general purpose routines to combine derivatives of products, ratios and composite functions. We compute derivatives bottom up, starting with $n(\alpha_t, \alpha_{t+1})$. We obtain
\[
n^{(p,q)}(\alpha_t, \alpha_{t+1}) =
\begin{cases}
\beta\theta\sqrt{2\nu}\,\bigl(-\frac{1}{2}\bigr)^p(2p\phi + u_t)\varphi_t & p \geq 0,\ q = 0, \\
\beta\theta\sqrt{2\nu}\,\bigl(-\frac{1}{2}\bigr)^p\varphi_t & p \geq 0,\ q = 1, \\
0 & p \geq 0,\ q \geq 2.
\end{cases} \tag{74}
\]

Next, we compute derivatives of $d(\alpha_t)$. They satisfy the recursion
\[
d^{(p)}(\alpha_t) = \sum_{r=0}^{p} c_{p,r}\Bigl(1 + \frac{\beta}{\nu}\varphi_t^2\Bigr)^{\frac{1}{2} - r}, \tag{75}
\]
where $c_{0,0} = 1$, $c_{p-1,-1} = c_{p-1,p} = 0$ and
\[
c_{p,r} = c_{p-1,r-1}\Bigl(\frac{1}{2} - r + 1\Bigr) - c_{p-1,r}\Bigl(\frac{1}{2} - r\Bigr), \quad r = 0, \ldots, p.
\]
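The coefficient recursion is easy to tabulate. A small sketch (the function name is our own) fills the table $c_{p,r}$ used in (75):

```python
def d_coeffs(p_max):
    # Table of coefficients c_{p,r} in (75), built from c_{0,0} = 1 and
    # c_{p,r} = c_{p-1,r-1} * (1/2 - r + 1) - c_{p-1,r} * (1/2 - r).
    c = {(0, 0): 1.0}
    for p in range(1, p_max + 1):
        for r in range(p + 1):
            c[(p, r)] = (c.get((p - 1, r - 1), 0.0) * (0.5 - r + 1.0)
                         - c.get((p - 1, r), 0.0) * (0.5 - r))
    return c

# For example, c_{1,0} = -1/2 and c_{1,1} = 1/2, so that
# d'(alpha_t) = -(1/2)(1 + (beta/nu) varphi_t^2)^{1/2} + (1/2)(1 + (beta/nu) varphi_t^2)^{-1/2}.
coeffs = d_coeffs(4)
```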

We then combine derivatives of $n(\alpha_t, \alpha_{t+1})$ and $d(\alpha_t)$ to obtain derivatives of $z(\alpha_t, \alpha_{t+1})$ using the recursive formula
\[
z^{(p,q)}(\alpha_t, \alpha_{t+1}) = \frac{1}{d(\alpha_t)}\left[n^{(p,q)}(\alpha_t, \alpha_{t+1}) - \sum_{r=0}^{p-1}\binom{p}{r} z^{(r,q)}(\alpha_t, \alpha_{t+1})\, d^{(p-r)}(\alpha_t)\right], \tag{76}
\]
easily obtained by applying Leibniz's rule to the product $n(\alpha_t, \alpha_{t+1}) = z(\alpha_t, \alpha_{t+1})\, d(\alpha_t)$.

We now compute derivatives of $m(z)$ with respect to $z$. We obtain

\[
m^{(r)}(z) = 2\,\frac{\Gamma\bigl(\frac{\nu}{2} + 1\bigr)}{\Gamma\bigl(\frac{\nu+1}{2}\bigr)}\Bigl[z\, M^{(r)}\Bigl(\frac{\nu}{2} + 1;\ \frac{3}{2};\ z^2\Bigr) + r\, M^{(r-1)}\Bigl(\frac{\nu}{2} + 1;\ \frac{3}{2};\ z^2\Bigr)\Bigr] + M^{(r)}\Bigl(\frac{\nu + 1}{2};\ \frac{1}{2};\ z^2\Bigr), \tag{77}
\]


where we define $M^{(r)}(a; b; z^2)$ as the $r$th derivative of $M(a; b; z^2)$ with respect to $z$. Let $g(z) = z^2$. Then we compute $M^{(r)}(a; b; z^2)$ using the Faa di Bruno formula
\[
M^{(r)}(a; b; z^2) = \sum_{k=1}^{r}\left.\frac{\partial^k M(a; b; x)}{\partial x^k}\right|_{x = g(z)} B_{r,k}(g(z)),
\]
where $B_{r,k}(g(z))$ is computed using (83), and the derivatives of Kummer's function of the first kind $M(a; b; x)$ with respect to $x$ are straightforward using the property
\[
\left.\frac{\partial^k M(a; b; x)}{\partial x^k}\right|_{x = z} = \frac{(a)^{(k)}}{(b)^{(k)}}\, M(a + k; b + k; z). \tag{78}
\]
We compute them using routines in the GNU Scientific Library (GSL).

We then compute derivatives of $\log m(z)$ with respect to $z$ using the recursive formula

\[
\frac{\partial^p \log m(z)}{\partial z^p} = \frac{1}{m(z)}\left[m^{(p)}(z) - \sum_{r=1}^{p-1}\binom{p-1}{r-1}\frac{\partial^r \log m(z)}{\partial z^r}\, m^{(p-r)}(z)\right], \tag{79}
\]
obtained by applying Leibniz's rule to the product $m^{(1)}(z) = m(z)\,\frac{\partial \log m(z)}{\partial z}$.
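A small sketch of these two steps, assuming SciPy in place of the GSL routines: `kummer_x_derivs` applies the property (78) and `log_derivs` applies the recursion (79). The function names and the use of SciPy are our own assumptions.

```python
import numpy as np
from math import comb
from scipy.special import hyp1f1, poch

def kummer_x_derivs(a, b, x, k_max):
    # d^k/dx^k M(a; b; x) = ((a)^{(k)} / (b)^{(k)}) M(a+k; b+k; x), the property (78).
    return [poch(a, k) / poch(b, k) * hyp1f1(a + k, b + k, x)
            for k in range(k_max + 1)]

def log_derivs(m_derivs):
    # Derivatives of log m from [m, m^{(1)}, ..., m^{(P)}] via the recursion (79).
    out = [np.log(m_derivs[0])]
    for p in range(1, len(m_derivs)):
        s = sum(comb(p - 1, r - 1) * out[r] * m_derivs[p - r] for r in range(1, p))
        out.append((m_derivs[p] - s) / m_derivs[0])
    return out
```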

Next we compute partial derivatives of $\psi_{3,t}(\alpha_t, \alpha_{t+1})$ with respect to $\alpha_t$ and $\alpha_{t+1}$ by combining derivatives of $\log m(z)$ with respect to $z$ and partial derivatives of $z(\alpha_t, \alpha_{t+1})$ with respect to $\alpha_t$ and $\alpha_{t+1}$, using Faa di Bruno's rule
\[
\psi^{p,q}_{3,t}(\alpha_t, \alpha_{t+1}) = \sum_{r=1}^{p+q}\frac{\partial^r \log m(z(\alpha_t, \alpha_{t+1}))}{\partial z^r}\, B_{(p,q),r}(z(\alpha_t, \alpha_{t+1})), \tag{80}
\]
where $B_{(p,q),r}(z(\alpha_t, \alpha_{t+1}))$ is computed using (87).

The first component, $\psi_{1,t}(\alpha_t, \alpha_{t+1})$, is a quadratic function of $\alpha_t$ and $\alpha_{t+1}$. Its derivatives, for $(p, q) \neq (0, 0)$, are
\[
\psi^{p,q}_{1,t}(\alpha_t, \alpha_{t+1}) =
\begin{cases}
-\theta^2\beta u_t & p = 0,\ q = 1, \\
-\theta^2\beta & p = 0,\ q = 2, \\
-\frac{1}{2}\bigl(1 - 2\phi\theta^2\beta u_t\bigr) & p = 1,\ q = 0, \\
\phi\theta^2\beta & p = 1,\ q = 1, \\
-\phi^2\theta^2\beta & p = 2,\ q = 0, \\
0 & \text{otherwise}.
\end{cases}
\]

Recall that $\psi_{2,t}(\alpha_t) = -(\nu + 1)\log d(\alpha_t)$. We compute derivatives of $\log d(\alpha_t)$ with respect to $\alpha_t$ using
\[
\frac{\partial^p \log d(\alpha_t)}{\partial\alpha_t^p} = \frac{1}{d(\alpha_t)}\left[d^{(p)}(\alpha_t) - \sum_{r=1}^{p-1}\binom{p-1}{r-1}\frac{\partial^r \log d(\alpha_t)}{\partial\alpha_t^r}\, d^{(p-r)}(\alpha_t)\right], \tag{81}
\]
similar to (79). Derivatives of $\psi_{2,t}(\alpha_t)$ are simply $-(\nu + 1)$ times the derivatives of $\log d(\alpha_t)$.


The special case $t = n$ is easily handled. We have
\[
\psi_n(\alpha_n) = \log\frac{\Gamma\bigl(\frac{\nu+1}{2}\bigr)}{\Gamma\bigl(\frac{\nu}{2}\bigr)\sqrt{\nu\pi}} - \frac{1}{2}\Bigl[\alpha_n + (\nu + 1)\log\Bigl(1 + \frac{\varphi_n^2}{\nu}\Bigr)\Bigr],
\]
whose derivatives are the same as those of $\psi_{2,t}$ except that $\beta$ is replaced by $1$.

Appendix E. Faa di Bruno formula and Bell polynomials

This appendix summarises the use of Bell polynomials for evaluating the derivatives of compound functions. We treat the relatively familiar univariate case first. Let $h$ be the composition of univariate functions $f$ and $g$, $h(x) = f(g(x))$. The $p$th derivative of $h$ with respect to $x$ is given by Faa di Bruno's formula,
\[
h^{(p)}(x) = \sum_{r=1}^{p} f^{(r)}(g(x))\, B_{p,r}\bigl(g^{(1)}(x), \ldots, g^{(p-r+1)}(x)\bigr), \tag{82}
\]
where the $B_{p,r}(z_1, \ldots, z_{p-r+1})$ are Bell polynomials. The Bell polynomials can be computed using $B_{0,0}(z_1) = 1$, $B_{i,0}(z_1, \ldots, z_{i+1}) = 0$ for $i > 0$, and the recursion
\[
B_{p,r}(z_1, \ldots, z_{p-r+1}) = \sum_{i=r-1}^{p-1}\binom{p-1}{i} z_{p-i}\, B_{i,r-1}(z_1, \ldots, z_{i-r+2}), \quad r = 1, \ldots, p. \tag{83}
\]
For example, we have $B_{1,1}(z_1) = z_1 B_{0,0}(z_1) = z_1$, which gives $h'(x) = f'(g(x))g'(x)$, the chain rule. For the second derivative, we compute $B_{2,1}(z_1, z_2) = z_2 B_{0,0}(z_1) + z_1 B_{1,0}(z_1, z_2) = z_2$ and $B_{2,2}(z_1) = z_1 B_{1,1}(z_1) = z_1^2$, which gives
\[
h^{(2)}(x) = f^{(1)}(g(x))\, g^{(2)}(x) + f^{(2)}(g(x))\bigl(g^{(1)}(x)\bigr)^2.
\]
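The recursion (83) is easy to implement directly. The sketch below computes partial Bell polynomials and checks the second-derivative example above on a concrete composition; the test function is arbitrary and chosen only for illustration.

```python
from math import comb
import math

def bell(p, r, z):
    # Partial Bell polynomial B_{p,r}(z_1, ..., z_{p-r+1}) via the recursion (83);
    # z[i] holds z_{i+1}, i.e. the list z is zero-indexed.
    if p == 0 and r == 0:
        return 1.0
    if p == 0 or r == 0 or r > p:
        return 0.0
    return sum(comb(p - 1, i) * z[p - i - 1] * bell(i, r - 1, z)
               for i in range(r - 1, p))

# Check of (82) for h(x) = exp(sin(x)): h''(x) = exp(sin x)(cos^2 x - sin x).
x = 0.7
g_derivs = [math.cos(x), -math.sin(x), -math.cos(x)]  # g', g'', g''' with g = sin
f_at_g = math.exp(math.sin(x))                        # f = exp, so f^{(r)}(g(x)) = exp(g(x))
h2 = sum(f_at_g * bell(2, r, g_derivs) for r in (1, 2))
assert abs(h2 - f_at_g * (math.cos(x) ** 2 - math.sin(x))) < 1e-12
```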

Equations (3.1) and (3.5) in Savits (2006) provide recursive expressions of the Faa di Bruno formula for higher derivatives of compound functions. We report these equations, after slight modification, in the two special cases needed to compute the derivatives of the forward pass and the derivatives of the ASV-Student log density. In contrast to Savits (2006), we use the notation $B_{v,\gamma}(g(x))$ instead of $\alpha_{v,\gamma}(x)$ for two reasons. First, the function $\alpha_{v,\gamma}(x)$ corresponds to a Bell polynomial in the univariate case. Second, the Bell polynomial notation does not extend easily to the multivariate case and, since what enter the computation directly are derivatives of the function $g(x)$, we adopt this intermediate notation.

We now consider the two bivariate cases used in this article. First, let $h(x) = f(g_1(x), g_2(x))$. Then equations (3.1) and (3.5) in Savits (2006) are simply rewritten as follows:
\[
h^{(p)}(x) = \sum_{1 \leq r+s \leq p} f^{(r,s)}(g_1(x), g_2(x))\, B_{p,(r,s)}(g_1(x), g_2(x)) \tag{84}
\]


with $B_{0,(0,0)}(g_1(x), g_2(x)) = 1$, $B_{i,(j,k)}(g_1(x), g_2(x)) = 0$ if $i > 0$ and $(j, k) = (0, 0)$, or $i < j + k$, or either $j$ or $k$ is negative, and for $1 \leq r + s \leq p$, we have
\[
B_{p,(r,s)}(g_1(x), g_2(x)) = \sum_{i=0}^{p-1}\binom{p-1}{i}\Bigl[g_1^{(p-i)}(x)\, B_{i,(r-1,s)}(g_1(x), g_2(x)) + g_2^{(p-i)}(x)\, B_{i,(r,s-1)}(g_1(x), g_2(x))\Bigr]. \tag{85}
\]
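A sketch of the recursion (85) follows; `g1d` and `g2d` are hypothetical arrays of derivatives of $g_1$ and $g_2$ at $x$, and the final assertions check the first-order (chain rule) case of (84).

```python
from math import comb

def bell2(p, r, s, g1d, g2d):
    # B_{p,(r,s)}(g1(x), g2(x)) via the recursion (85); g1d[j] and g2d[j] hold the
    # j-th derivatives of g1 and g2 at x (index 0 is the function value itself).
    if p == 0 and r == 0 and s == 0:
        return 1.0
    if r < 0 or s < 0 or (r == 0 and s == 0) or p < r + s:
        return 0.0
    return sum(comb(p - 1, i) * (g1d[p - i] * bell2(i, r - 1, s, g1d, g2d)
                                 + g2d[p - i] * bell2(i, r, s - 1, g1d, g2d))
               for i in range(p))

# First-order check of (84): h'(x) = f^{(1,0)} g1'(x) + f^{(0,1)} g2'(x), so
# B_{1,(1,0)} = g1'(x) and B_{1,(0,1)} = g2'(x).
g1d, g2d = [0.3, 1.2, -0.5], [2.0, 0.7, 0.1]  # illustrative derivative values
assert bell2(1, 1, 0, g1d, g2d) == g1d[1]
assert bell2(1, 0, 1, g1d, g2d) == g2d[1]
```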

Second, let $h(x, y) = f(g(x, y))$. In this case, equation (3.5) of Savits (2006) can be adapted in different ways. We use the following rules. For $q = 0$ and $p \geq 1$, we consider the decomposition $(p, 0) = (p-1, 0) + (1, 0)$, and for $q \geq 1$ and $p \geq 0$, we use the decomposition $(p, q) = (p, q-1) + (0, 1)$. Thus, for $p + q \geq 1$ we have
\[
h^{(p,q)}(x, y) = \sum_{r=1}^{p+q} f^{(r)}(g(x, y))\, B_{(p,q),r}(g(x, y)) \tag{86}
\]
with $B_{(0,0),0}(g(x, y)) = 1$, $B_{(i,j),k}(g(x, y)) = 0$ if $i + j < k$, or $k < 0$, or $k = 0$ and $(i, j) \geq (0, 0)$ with at least one strict inequality, and for $r = 1, \ldots, p + q$,
\[
B_{(p,q),r}(g(x, y)) =
\begin{cases}
\displaystyle\sum_{i=r-1}^{p-1}\binom{p-1}{i} g^{(p-i,0)}(x, y)\, B_{(i,0),r-1}(g(x, y)), & q = 0,\ p \geq 1, \\[2ex]
\displaystyle\sum_{i=0}^{p}\sum_{j=0}^{q-1}\binom{p}{i}\binom{q-1}{j} g^{(p-i,q-j)}(x, y)\, B_{(i,j),r-1}(g(x, y)), & p \geq 0,\ q \geq 1.
\end{cases} \tag{87}
\]


References

Carter, C. K., and Kohn, R. (1994). 'On Gibbs Sampling for State Space Models', Biometrika, 81(3): 541-553.

Chan, J. C. C., and Jeliazkov, I. (2009). 'Efficient Simulation and Integrated Likelihood Estimation in State Space Models', Working paper.

de Jong, P., and Shephard, N. (1995). 'The Simulation Smoother for Time Series Models', Biometrika, 82(1): 339-350.

Durbin, J., and Koopman, S. J. (2002). 'A Simple and Efficient Simulation Smoother for State Space Time Series Analysis', Biometrika, 89(3): 603-615.

Feng, D., Jiang, J. J., and Song, P. (2004). 'Stochastic conditional durations models with "Leverage Effect" for financial transaction data', Journal of Financial Econometrics, 2: 390-421.

Fruhwirth-Schnatter, S. (1994). 'Data augmentation and Dynamic Linear Models', Journal of Time Series Analysis, 15: 183-202.

Gamerman, D. (1998). 'Markov chain Monte Carlo for dynamic generalised linear models', Biometrika, 85: 215-227.

Geweke, J. (1989). 'Bayesian Inference in Econometric Models Using Monte Carlo Integration', Econometrica, 57: 1317-1340.

Geweke, J. (2004). 'Getting it Right: Joint Distribution Tests of Posterior Simulators', Journal of the American Statistical Association, 99: 799-804.

Harvey, A. C., and Shephard, N. (1996). 'The estimation of an asymmetric stochastic volatility model for asset returns', Journal of Business and Economic Statistics, 14: 429-434.

Jacquier, E., Polson, N., and Rossi, P. (2004). 'Bayesian Analysis of Stochastic Volatility Models with Leverage Effect and Fat tails', Journal of Econometrics, pp. 185-212.

Jungbacker, B., and Koopman, S. J. (2008). 'Monte Carlo estimation for nonlinear non-Gaussian state space models', Biometrika, 94: 827-839.

Kim, S., Shephard, N., and Chib, S. (1998). 'Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models', Review of Economic Studies, 65(3): 361-393.

McCausland, W. J. (2008). 'The HESSIAN Method (Highly Efficient State Smoothing, In A Nutshell)', Cahiers de recherche du Departement de sciences economiques, Universite de Montreal, no. 2008-03.

McCausland, W. J., Miller, S., and Pelletier, D. (2011). 'Simulation smoothing for state-space models: A computational efficiency analysis', Computational Statistics and Data Analysis, 55: 199-212.

Omori, Y., Chib, S., Shephard, N., and Nakajima, J. (2007). 'Stochastic volatility with leverage: fast and efficient likelihood inference', Journal of Econometrics, 140: 425-449.

Omori, Y., and Watanabe, T. (2008). 'Block sampler and posterior mode estimation for asymmetric stochastic volatility models', Computational Statistics and Data Analysis, 52: 2892-2910.

Pitt, M. K. (2000). 'Discussion on "Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives" (by J. Durbin and S. J. Koopman)', Journal of the Royal Statistical Society Series B, pp. 38-39.

Rue, H. (2001). 'Fast Sampling of Gaussian Markov Random Fields with Applications', Journal of the Royal Statistical Society Series B, 63: 325-338.

Savits, T. H. (2006). 'Some statistical applications of Faa di Bruno', Journal of Multivariate Analysis, 97(10): 2131-2140.

Shephard, N., and Pitt, M. K. (1997). 'Likelihood Analysis of Non-Gaussian Measurement Time Series', Biometrika, 84(3): 653-667.

Watanabe, T., and Omori, Y. (2004). 'A multi-move sampler for estimating non-Gaussian time series models: Comments on Shephard and Pitt (1997)', Biometrika, 91: 246-248.


Yu, J. (2005). ‘On leverage in a stochastic volatility model’, Journal of Econometrics, 127: 165–178.

Departement de Sciences Economiques, Universite de Montreal

Departement de Sciences Economiques, Universite de Montreal


Notation: Description

$\psi_t(\alpha_t, \alpha_{t+1})$: $\log f(y_t|\alpha_t, \alpha_{t+1})$
$\psi^{p,q}_t(\alpha_t, \alpha_{t+1})$: derivative of $\psi_t(\alpha_t, \alpha_{t+1})$ with respect to $\alpha_t$ and $\alpha_{t+1}$, of orders $p$ and $q$
$\psi_n(\alpha_n)$: $\log f(y_n|\alpha_n)$
$\psi^p_n(\alpha_n)$: $p$th derivative of $\psi_n(\alpha_n)$ with respect to $\alpha_n$
$a = (a_1, \ldots, a_n)$: mode of $\log f(\alpha|y)$
$\Sigma_t$: $\mathrm{Var}(\alpha_t|\alpha_{t+1}, y)$ for the 1st reference distribution
$(a_{1|t+1}(\alpha_{t+1}), \ldots, a_{t|t+1}(\alpha_{t+1}))$: mode of the conditional density $f(\alpha_1, \ldots, \alpha_t|\alpha_{t+1}, y)$
$a^{(r)}_t$, $r = 1, \ldots, R$: $r$th derivative of $a_{t|t+1}(\alpha_{t+1})$ at $\alpha_{t+1} = a_{t+1}$
$\Sigma_{t|t+1}(\alpha_{t+1})$: $\mathrm{Var}(\alpha_t|\alpha_{t+1}, y)$ for the 2nd reference distribution
$s_{t|t+1}(\alpha_{t+1})$: $\log \Sigma_{t|t+1}(\alpha_{t+1})$
$s^{(r)}_t$, $r = 1, \ldots, R-1$: $r$th derivative of $s_{t|t+1}(\alpha_{t+1})$ at $\alpha_{t+1} = a_{t+1}$
$b_{t|t+1}(\alpha_{t+1})$: mode of the conditional density $f(\alpha_t|\alpha_{t+1}, y)$
$b_t, \dot{b}_t, \ddot{b}_t, \dddot{b}_t$: value and three derivatives of $b_{t|t+1}(\alpha_{t+1})$ at $\alpha_{t+1} = a_{t+1}$
$b_n$: mode of the conditional density $f(\alpha_n|y)$
$B_t, \dot{B}_t, \ddot{B}_t, \dddot{B}_t, B_n$: approximations of $b_t, \dot{b}_t, \ddot{b}_t, \dddot{b}_t$ and $b_n$
$\mu_{t|t+1}(\alpha_{t+1})$: $E[\alpha_t|\alpha_{t+1}, y]$
$\mu_t, \dot{\mu}_t, \ddot{\mu}_t$: value and two derivatives of $\mu_{t|t+1}(\alpha_{t+1})$ at $\alpha_{t+1} = a_{t+1}$
$M_t, \dot{M}_t, \ddot{M}_t$: approximations of $\mu_t, \dot{\mu}_t$ and $\ddot{\mu}_t$
$H^{(p)}_t(\alpha_t; \alpha_{t+1})$, $p \geq 1$: approximation of $\psi^{(p)}_t(\alpha_t; \alpha_{t+1})$
$H^{(p)}_n(\alpha_n)$, $p \geq 1$: approximation of the $p$th derivative of $\log f(\alpha_n|y)$

Table 1. Main notation used in the paper.

          ρ = −0.3   ρ = −0.5   ρ = −0.7   S&P500   TOPIX
HIS       413        414        413        180      115
HIM       555        526        563        259      135
OCSN      708        720        726        295      187

Table 2. Computational time by dataset and estimation procedure for the ASV-Gaussian model.


Parameters   Mean      Std      NSE         RNE

ρ = −0.3
α:his    -8.9640   0.0654   7.1866e-4   0.9945
α:him    -8.9651   0.0653   9.6575e-4   0.3573
α:ocsn   -8.9585   0.0861   7.8282e-4   0.9516
φ:his     0.9637   0.0061   6.9115e-5   0.8201
φ:him     0.9637   0.0060   7.9100e-5   0.4561
φ:ocsn    0.9639   0.0061   1.2471e-4   0.1856
σ:his     0.1669   0.0142   1.3011e-4   1.0322
σ:him     0.1668   0.0139   1.8364e-4   0.4452
σ:ocsn    0.1637   0.0139   3.6410e-4   0.1152
ρ:his    -0.3472   0.0560   5.1759e-4   1.0058
ρ:him    -0.3462   0.0566   6.0084e-4   0.6924
ρ:ocsn   -0.3536   0.0586   1.3224e-3   0.1607

ρ = −0.5
α:his    -8.9552   0.0648   7.1224e-4   0.8689
α:him    -8.9555   0.0645   7.2300e-4   0.6222
α:ocsn   -8.9503   0.0938   9.5047e-4   0.8125
φ:his     0.9724   0.0048   4.5081e-5   1.0750
φ:him     0.9724   0.0049   5.5004e-5   0.6122
φ:ocsn    0.9725   0.0048   9.8067e-5   0.1953
σ:his     0.1341   0.0121   1.2522e-4   0.7526
σ:him     0.1342   0.0122   1.0766e-4   1.0069
σ:ocsn    0.1321   0.0119   2.8799e-4   0.1425
ρ:his    -0.4804   0.0543   4.8195e-4   1.0470
ρ:him    -0.4790   0.0550   4.5585e-4   1.1356
ρ:ocsn   -0.4920   0.0554   1.1505e-3   0.2121

ρ = −0.7
α:his    -8.9600   0.0551   5.4038e-4   1.1708
α:him    -8.9597   0.0537   6.5265e-4   0.5299
α:ocsn   -8.9534   0.0720   8.6757e-4   0.7624
φ:his     0.9695   0.0045   3.9419e-5   1.2985
φ:him     0.9695   0.0046   4.1080e-5   0.9672
φ:ocsn    0.9692   0.0045   1.0915e-4   0.2118
σ:his     0.1421   0.0117   1.1278e-4   0.9769
σ:him     0.1420   0.0117   1.2211e-4   0.7129
σ:ocsn    0.1417   0.0117   3.2750e-4   0.1742
ρ:his    -0.6609   0.0450   3.8869e-4   1.0766
ρ:him    -0.6618   0.0443   4.9000e-4   0.6381
ρ:ocsn   -0.6750   0.0468   1.3899e-3   0.2366

Table 3. ASV-Gaussian parameter estimation using the HESSIAN method and the OCSN procedure on simulated data.


Parameters   Mean      Std      NSE         RNE

S&P500
α:his    -9.5165   0.1485   2.4592e-3   1.0375
α:him    -9.5213   0.1506   4.6847e-3   0.0808
α:ocsn   -9.5029   0.3378   3.4767e-3   0.7428
φ:his     0.9753   0.0080   5.7193e-5   1.0155
φ:him     0.9753   0.0081   1.3400e-4   0.3658
φ:ocsn    0.9776   0.0083   1.8947e-4   0.1506
σ:his     0.1522   0.0199   9.364e-5    0.9543
σ:him     0.1517   0.0199   3.2779e-4   0.3692
σ:ocsn    0.1394   0.0203   5.8443e-4   0.0945
ρ:his    -0.2028   0.0957   4.8370e-4   1.001
ρ:him    -0.2057   0.0961   1.3913e-3   0.4771
ρ:ocsn   -0.2007   0.1005   1.8453e-3   0.2374

TOPIX
α:his    -8.8528   0.1080   1.0839e-3   0.9642
α:him    -8.8545   0.1083   1.5951e-3   0.4609
α:ocsn   -8.8426   0.2172   2.0867e-3   0.8574
φ:his     0.9574   0.0158   9.3410e-5   0.8319
φ:him     0.9575   0.0155   2.2301e-4   0.4862
φ:ocsn    0.9520   0.0185   3.9992e-4   0.1664
σ:his     0.1411   0.0256   1.2390e-4   0.9667
σ:him     0.1412   0.0254   3.9845e-4   0.4073
σ:ocsn    0.1387   0.0266   5.9850e-4   0.1556
ρ:his    -0.3850   0.1180   6.2623e-4   0.9558
ρ:him    -0.3847   0.1181   1.6539e-3   0.5097
ρ:ocsn   -0.3715   0.1231   2.6536e-3   0.1792

Table 4. ASV-Gaussian parameter estimation using the HESSIAN method and the OCSN procedure on S&P500 and TOPIX.


Parameters   Mean      Std      NSE         RNE

α = 9.0, φ = 0.97, σ = 0.15, ρ = −0.3, ν = 8
α:his    -8.8593   0.1314   2.0461e-3   1.0330
α:him    -8.8571   0.1350   2.7729e-3   0.1852
φ:his     0.9785   0.0058   8.0966e-5   1.1404
φ:him     0.9786   0.0059   8.4845e-5   0.3828
σ:his     0.1321   0.0178   1.6158e-4   1.3000
σ:him     0.1321   0.0180   2.1685e-4   0.5399
ρ:his    -0.3022   0.1003   1.0431e-3   0.9418
ρ:him    -0.3022   0.0994   1.1506e-3   0.5829
ν:his     7.8992   1.2854   1.6840e-2   1.0384
ν:him     7.8847   1.2649   1.8975e-2   0.3472

α = 9.0, φ = 0.97, σ = 0.15, ρ = −0.3, ν = 16
α:his    -8.9765   0.1063   1.9491e-3   1.1758
α:him    -8.9759   0.1042   2.6700e-3   0.1190
φ:his     0.9769   0.0071   1.1406e-4   0.8530
φ:him     0.9772   0.0072   1.4877e-4   0.1834
σ:his     0.1177   0.0186   2.3780e-4   0.8415
σ:him     0.1171   0.0187   3.6936e-4   0.2005
ρ:his    -0.4551   0.0970   1.1140e-3   1.0718
ρ:him    -0.4558   0.0952   2.1637e-3   0.1513
ν:his    14.7262   4.1030   8.5466e-2   1.0548
ν:him    14.9494   4.4843   1.3942e-1   0.0808

S&P500
α:his    -9.7190   0.1730   3.2037e-3   0.8516
α:him    -9.7421   0.2131   1.6376e-2   0.0132
φ:his     0.9849   0.0053   8.3778e-5   1.0595
φ:him     0.9852   0.0053   1.7279e-4   0.0739
σ:his     0.1065   0.0164   1.9311e-4   0.9288
σ:him     0.1063   0.0160   2.5257e-4   0.3136
ρ:his    -0.2481   0.1199   1.7729e-3   0.8510
ρ:him    -0.2454   0.1216   2.1485e-3   0.2504
ν:his     9.9140   2.1669   3.1571e-2   0.8719
ν:him     9.8529   2.0441   3.4950e-2   0.2673

TOPIX
α:his    -8.9493   0.1138   2.4113e-3   0.8661
α:him    -8.9487   0.1145   2.8624e-3   0.1250
φ:his     0.9623   0.0143   2.1819e-4   0.9366
φ:him     0.9621   0.0144   2.2500e-4   0.31842
σ:his     0.1267   0.0240   2.6770e-4   0.7927
σ:him     0.1266   0.0242   3.3324e-4   0.4120
ρ:his    -0.4175   0.1256   1.3638e-3   0.9195
ρ:him    -0.4151   0.1298   2.1049e-3   0.2972
ν:his    20.5314   7.8733   1.4118e-1   0.9736
ν:him    20.5323   7.7660   1.1410e-1   0.3619

Table 5. ASV-Student model parameter estimation using the HESSIAN method with independence Metropolis-Hastings and importance sampling on artificial data, S&P500 and TOPIX.