    arXiv:math/0607798v1 [math.ST] 31 Jul 2006

    The Annals of Statistics

    2006, Vol. 34, No. 3, 1049–1074. DOI: 10.1214/009053606000000245. © Institute of Mathematical Statistics, 2006

    PSEUDO-MAXIMUM LIKELIHOOD ESTIMATION OF ARCH($\infty$) MODELS

    By Peter M. Robinson and Paolo Zaffaroni

    London School of Economics and Imperial College London

    Strong consistency and asymptotic normality of the Gaussian pseudo-maximum likelihood estimate of the parameters in a wide class of ARCH($\infty$) processes are established. The conditions are shown to hold in case of exponential and hyperbolic decay in the ARCH weights, though in the latter case a faster decay rate is required for the central limit theorem than for the law of large numbers. Particular parameterizations are discussed.

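    1. Introduction. ARCH($\infty$) processes comprise a wide class of models for conditional heteroscedasticity in time series. Consider, for $t \in \mathbb{Z} = \{0, \pm 1, \ldots\}$, the equations

    $$x_t = \sigma_t \varepsilon_t, \tag{1}$$

    $$\sigma_t^2 = \omega_0 + \sum_{j=1}^{\infty} \psi_{0j} x_{t-j}^2, \tag{2}$$

    where

    $$\omega_0 > 0, \qquad \psi_{0j} > 0 \quad (j \ge 1), \qquad \sum_{j=1}^{\infty} \psi_{0j}$$

    known functions $\psi_j(\zeta)$ of the $r \times 1$ vector $\zeta$, for $r$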

    observations $y_t^2$ (with $\mu_0$ known to be zero), this being asymptotically equivalent to constrained least squares regression of $y_t^2$ on the $y_{t-s}^2$, $s > 0$, a method employed in special cases of (2) by Engle [11] and Bollerslev [4]. In these the spectral density of $y_t^2$, when it exists, has a convenient closed form. This property, along with availability of the fast Fourier transform, makes discrete-frequency Whittle estimation based on the $y_t^2$ a computationally attractive option for point estimation, even in very long financial time series. However, it has a number of disadvantages, as discussed by Giraitis and Robinson [14]: it is not only asymptotically inefficient under Gaussian $\varepsilon_t$, but never asymptotically efficient; it requires finiteness of fourth moments of $y_t$ for consistency and of eighth moments for asymptotic normality, which are sometimes considered unacceptable for financial data; its limit covariance matrix is relatively complicated to estimate; it is less well motivated in ARCH models than in stochastic volatility and nonlinear moving average models, such as those of Taylor [33], Robinson and Zaffaroni [30, 31], Harvey [15], Breidt, Crato and de Lima [5] and Zaffaroni [36], where the actual likelihood is computationally relatively intractable, while Whittle estimation also plays a less special role in the short-memory-in-$y_t^2$ ARCH models of Giraitis and Robinson [14] than in the long-memory-in-$y_t^2$ models of the previous five references, where it entails automatic compensation for possible lack of square-integrability of the spectrum of $y_t^2$. Mikosch and Straumann [26] have shown that a finite fourth moment is necessary for consistency of Whittle estimates, and that convergence rates are slowed by fat tails in $\varepsilon_t$.

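    For Gaussian $\varepsilon_t$, a widely-used approximate maximum likelihood estimate is defined as follows. Denote by $\theta = (\omega, \mu, \zeta')'$ any admissible value of $\theta_0$ and define

    $$x_t(\mu) = y_t - \mu, \qquad \sigma_t^2(\theta) = \omega + \sum_{j=1}^{\infty} \psi_j(\zeta)\, x_{t-j}^2(\mu)$$

    for $t \in \mathbb{Z}$, and

    $$\bar\sigma_t^2(\theta) = \omega + \sum_{j=1}^{t-1} \psi_j(\zeta)\, x_{t-j}^2(\mu)\, 1(t \ge 2)$$

    for $t \ge 1$, where $1(\cdot)$ denotes the indicator function. Define also

    $$q_t(\theta) = \frac{x_t^2(\mu)}{\sigma_t^2(\theta)} + \ln \sigma_t^2(\theta), \qquad \bar q_t(\theta) = \frac{x_t^2(\mu)}{\bar\sigma_t^2(\theta)} + \ln \bar\sigma_t^2(\theta), \qquad 1 \le t \le T,$$

    $$Q_T(\theta) = T^{-1} \sum_{t=1}^{T} q_t(\theta), \qquad \bar Q_T(\theta) = T^{-1} \sum_{t=1}^{T} \bar q_t(\theta),$$

    $$\tilde\theta_T = \arg\min_{\theta \in \Theta} Q_T(\theta), \qquad \hat\theta_T = \arg\min_{\theta \in \Theta} \bar Q_T(\theta),$$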

    where $\Theta$ is a prescribed compact subset of $\mathbb{R}^{r+2}$. The quantities with over-bar are introduced due to $y_t$ being unobservable for $t \le 0$; $\tilde\theta_T$ is uncomputable. Because we do not assume Gaussianity in the asymptotic theory, we refer to $\hat\theta_T$ as a pseudo-maximum likelihood estimate (PMLE).
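The computable objective can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the hyperbolic weight family `hyperbolic_weights` and the lag cut-off used in the simulator are all assumptions introduced here. Only `q_bar` mirrors the text, in that the variance recursion for observation t uses the t-1 observed lags and nothing earlier.

```python
import math
import random


def hyperbolic_weights(scale, d, n):
    """Hypothetical weights psi_j = scale * j**(-d - 1), j = 1..n (illustration only)."""
    return [scale * j ** (-d - 1.0) for j in range(1, n + 1)]


def simulate_arch(omega, mu, psi, T, burn=300, seed=0):
    """Simulate y_t = mu + x_t with x_t = sigma_t * eps_t and Gaussian eps_t.

    The infinite sum in sigma_t^2 is cut at len(psi) lags; that cut is an
    approximation made for the simulation, not part of the model.
    """
    rng = random.Random(seed)
    x = []
    for t in range(T + burn):
        s2 = omega
        for j in range(1, min(t, len(psi)) + 1):
            s2 += psi[j - 1] * x[t - j] ** 2
        x.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    return [mu + v for v in x[burn:]]


def q_bar(y, omega, mu, psi):
    """Truncated objective: mean over t = 1..T of x_t^2 / s_t + ln s_t,
    where s_t = omega + sum_{j=1}^{t-1} psi_j * x_{t-j}^2 uses observed lags only.
    Requires len(psi) >= len(y) - 1.
    """
    x2 = [(v - mu) ** 2 for v in y]
    total = 0.0
    for t in range(1, len(y) + 1):
        s_t = omega                      # the sum is empty when t = 1
        for j in range(1, t):
            s_t += psi[j - 1] * x2[t - j - 1]
        total += x2[t - 1] / s_t + math.log(s_t)
    return total / len(y)
```

Minimizing `q_bar` over a compact parameter set with any numerical optimizer would give the PMLE for the chosen weight family; the double loop costs O(T^2) operations, which is acceptable for a sketch but not for very long series.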

    We establish strong consistency of $\hat\theta_T$ and asymptotic normality of $T^{1/2}(\hat\theta_T - \theta_0)$, as $T \to \infty$, for a class of $\psi_j(\zeta)$ sequences. In the case of the first property this is accomplished by first showing strong consistency of $\tilde\theta_T$ and then that $\hat\theta_T - \tilde\theta_T \to 0$, a.s. In the case of the second we likewise first show it for $T^{1/2}(\tilde\theta_T - \theta_0)$ and then show that $\hat\theta_T - \tilde\theta_T = o_p(T^{-1/2})$, but the latter property, and thus the asymptotic normality of $T^{1/2}(\hat\theta_T - \theta_0)$, is achieved only under a restricted set of possible $\zeta_0$ values, and this seems of practical concern in relation to some popular choices of the $\psi_j(\zeta)$. These results are presented in the following section, along with a description of regularity conditions and partial proof details. The structure of the proof is similar in several respects to earlier ones for the GARCH case of (2), especially that of Berkes, Horváth and Kokoszka [3]. Sections 3 and 4 apply the results to particular models.

    2. Assumptions and main results. Our assumptions are as follows.

    Assumption A($q$), $q \ge 2$. The $\varepsilon_t$ are i.i.d. random variables with $E\varepsilon_0 = 0$, $E\varepsilon_0^2 = 1$, $E|\varepsilon_0|^q$ 1 and a function $L$ that is slowly varying at the origin.

    Assumption B. There exist $\omega_L$, $\omega_U$, $\mu_L$, $\mu_U$ such that $0$

    Assumption E. There exists a strictly stationary and ergodic solution $x_t$ to (1) and (2), and for some

    $$\rho \in ((d+1)^{-1}, 1), \tag{9}$$

    with $d$ as in Assumption D, we have

    $E|x_0|^{2\rho}$ 0 and all $i_h = 1, \ldots, r$, $h = 1, \ldots, k$, $k \le l$.

    Assumption G. For each $\zeta \in \Upsilon$, there exist integers $j_i = j_i(\zeta)$, $i = 1, \ldots, r$, such that $1 \le j_1(\zeta) < \cdots < j_r(\zeta)$ $\tfrac{1}{2}$ (12)

    such that

    $$\psi_{0j} \le K j^{-1-d_0}, \tag{13}$$

    and (10) holds for

    $$\rho \in (4/(2d_0+3), 1). \tag{14}$$

    Assumption A($q$) allows some asymmetry in $\varepsilon_t$, but implies the less primitive condition (which does not even require existence of a density) employed in a similar context by Berkes, Horváth and Kokoszka [3]. Assumptions B and C are standard. Inequalities (7) and (13) together imply $d_0 \ge d$, while (8) with (3) is milder than monotonicity but implies $\psi_{0j} = o(j^{-1})$ as $j \to \infty$. We take $\delta > 0$ in Assumption F($l$) because $\psi_j(\zeta) < 1$ for all large enough $j$, by (7). Assumption G is crucial to the proof of consistency, being used in Lemmas 9 and 10 to show that in the limit $\theta_0$ globally minimizes $Q_T(\theta)$; it also ensures nonsingularity of the matrix $H_0$ in Proposition 2 and Theorem 2 below. This and other assumptions are discussed in Sections 3 and 4 in connection with some parameterizations of interest.

    We present asymptotic results for the uncomputable $\tilde\theta_T$ as propositions, those for $\hat\theta_T$ as theorems. All these, and the corollaries in Sections 3 and 4 and lemmas in Section 5, assume (1)–(5).

    Proposition 1. For some $\delta > 0$, let Assumptions A($2+\delta$), B, C, D, E, F(1) and G hold. Then

    $$\tilde\theta_T \to \theta_0 \quad \text{a.s. as } T \to \infty.$$

    Proof. The proof follows as in, for example, [17], Theorem 6, from uniform a.s. convergence over $\Theta$ of $Q_T(\theta)$ to $Q(\theta) = Eq_0(\theta)$ established in Lemma 7, the fact that $Q_T(\tilde\theta_T) \le Q_T(\theta)$, and Lemma 10.

    Theorem 1. For some $\delta > 0$, let Assumptions A($2+\delta$), B, C, D, E, F(1) and G hold. Then

    $$\hat\theta_T \to \theta_0 \quad \text{a.s. as } T \to \infty. \tag{15}$$

    Proof. From Lemmas 7 and 8, $\bar Q_T(\theta)$ converges uniformly to $Q(\theta)$ a.s., whence the proof is as indicated for Proposition 1.

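    Denote by $\kappa_j$ the $j$th cumulant of $\varepsilon_t$ and introduce

    $$G_0 = (2 + \kappa_4) M - 2\kappa_3 (N + N') + P, \qquad H_0 = M + \tfrac{1}{2} P,$$

    where

    $$M = E(\tau_0 \tau_0'), \qquad N = E(\sigma_0^{-1} \tau_0)\, e_2', \qquad P = E(\sigma_0^{-2})\, e_2 e_2',$$

    for $\tau_0 = \tau_0(\theta_0)$, $\tau_t(\theta) = (\partial/\partial\theta) \log \sigma_t^2(\theta)$, and $e_2$ the second column of the $(r+2) \times (r+2)$ identity matrix. In case $\mu_0$ is known (e.g., to be zero), we omit the second row and column from $M$, and have instead $G_0 = (2 + \kappa_4) M$, $H_0 = M$. In case $\varepsilon_t$ is Gaussian, $\kappa_3 = \kappa_4 = 0$, so $G_0 = 2H_0 = 2M + P$.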

    Proposition 2. Let Assumptions A(4), B, C, D, E, F(3) and G hold. Then

    $$T^{1/2}(\tilde\theta_T - \theta_0) \to_d N(0, H_0^{-1} G_0 H_0^{-1}) \quad \text{as } T \to \infty.$$

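    Proof. Write

    $$Q_T^{(1)}(\theta) = \frac{\partial Q_T(\theta)}{\partial \theta} = T^{-1} \sum_{t=1}^{T} u_t(\theta),$$

    where

    $$u_t(\theta) = \tau_t(\theta)(1 - \chi_t(\theta)) + \sigma_t^{-2}(\theta)\, \nu_t(\theta),$$

    with

    $$\chi_t(\theta) = \frac{x_t^2(\mu)}{\sigma_t^2(\theta)}, \qquad \nu_t(\theta) = \frac{\partial x_t^2(\mu)}{\partial \theta} = -2 x_t(\mu)\, e_2.$$

    By the mean value theorem,

    $$0 = Q_T^{(1)}(\tilde\theta_T) = Q_T^{(1)}(\theta_0) + \tilde H_T (\tilde\theta_T - \theta_0), \tag{16}$$

    where $\tilde H_T$ has as its $i$th row the $i$th row of $H_T(\theta) = T^{-1} \sum_{t=1}^{T} h_t(\theta)$ evaluated at $\theta = \tilde\theta_T^{(i)}$, where $h_t(\theta) = (\partial^2/\partial\theta\, \partial\theta')\, q_t(\theta)$, $\|\tilde\theta_T^{(i)} - \theta_0\| \le \|\tilde\theta_T - \theta_0\|$, where we define $\|A\| = \{\mathrm{tr}(A'A)\}^{1/2}$ for any real matrix $A$. Now $u_t(\theta_0) = \tau_t(\theta_0)(1 - \varepsilon_t^2) - 2 e_2 \varepsilon_t / \sigma_t$ is, by Lemmas 2, 3 and 7, a stationary ergodic martingale difference vector with finite variance, so from Brown [6] and the Cramér–Wold device, $T^{1/2} Q_T^{(1)}(\theta_0) \to_d N(0, G_0)$ as $T \to \infty$. Finally, by Lemma 7 and Theorem 1, $\tilde H_T \to_p H_0$, whence the proof is completed in standard fashion.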

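    Define

    $$\bar u_t(\theta) = \frac{\partial \bar q_t(\theta)}{\partial \theta}, \qquad \bar g_t(\theta) = \bar u_t(\theta)\, \bar u_t'(\theta), \qquad \bar h_t(\theta) = \frac{\partial^2 \bar q_t(\theta)}{\partial \theta\, \partial \theta'},$$

    $$\bar G_T(\theta) = T^{-1} \sum_{t=1}^{T} \bar g_t(\theta), \qquad \bar H_T(\theta) = T^{-1} \sum_{t=1}^{T} \bar h_t(\theta).$$

    Theorem 2. Let Assumptions A(4), B, C, D, E, F(3), G and H hold. Then

    $$T^{1/2}(\hat\theta_T - \theta_0) \to_d N(0, H_0^{-1} G_0 H_0^{-1}) \quad \text{as } T \to \infty, \tag{17}$$

    and $H_0^{-1} G_0 H_0^{-1}$ is strongly consistently estimated by $\bar H_T^{-1}(\hat\theta_T)\, \bar G_T(\hat\theta_T)\, \bar H_T^{-1}(\hat\theta_T)$.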
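The covariance estimate in the theorem has the familiar sandwich form. The following pure-Python sketch shows the arithmetic only, assuming the per-observation score vectors and Hessian matrices have already been evaluated at the estimate (for example by numerical differentiation); the helper names `mat_inv`, `mat_mul` and `sandwich` are introduced here for illustration and do not come from the paper.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_mul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sandwich(scores, hessians):
    """Return H^{-1} G H^{-1}, with G the average of u_t u_t' and H the average of h_t.

    scores:   list of T score vectors (each a list of k floats)
    hessians: list of T Hessian matrices (each k x k, as a list of rows)
    """
    T = len(scores)
    k = len(scores[0])
    G = [[sum(u[i] * u[j] for u in scores) / T for j in range(k)] for i in range(k)]
    H = [[sum(h[i][j] for h in hessians) / T for j in range(k)] for i in range(k)]
    H_inv = mat_inv(H)
    return mat_mul(mat_mul(H_inv, G), H_inv)
```

Dividing the returned matrix by T would give an approximate covariance matrix for the estimate itself, from which standard errors follow as square roots of the diagonal.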

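    Proof. We have

    $$0 = \bar Q_T^{(1)}(\hat\theta_T) = \bar Q_T^{(1)}(\theta_0) + \hat H_T (\hat\theta_T - \theta_0),$$

    where $\bar Q_T^{(1)}(\theta) = (\partial/\partial\theta)\, \bar Q_T(\theta)$ and $\hat H_T$ has as its $i$th row the $i$th row of $\bar H_T(\theta)$ evaluated at $\theta = \hat\theta_T^{(i)}$, for $\|\hat\theta_T^{(i)} - \theta_0\| \le \|\hat\theta_T - \theta_0\|$. Thus, from (16),

    $$\hat\theta_T - \tilde\theta_T = (\tilde H_T^{-1} - \hat H_T^{-1})\, Q_T^{(1)}(\theta_0) - \hat H_T^{-1} \{\bar Q_T^{(1)}(\theta_0) - Q_T^{(1)}(\theta_0)\},$$

    where the inverses exist a.s. for all sufficiently large $T$ by Lemma 9. In view of Proposition 2 and Lemma 8, (17) follows on showing that

    $$\bar Q_T^{(1)}(\theta_0) - Q_T^{(1)}(\theta_0) = o_p(T^{-1/2}).$$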

    The left-hand side can be written (B1T +B2T +B3T )/T , where

    B1T =T

    t=1

    2t b1t, B2T = T

    t=1

    (2t 1)b2t, B3T = 2e2T

    t=1

    tb3t,

    with

    b1t =

    2(1)t (

    2t 2t )4t

    , b2t =

    2(1)t

    2t

    2(1)t

    2t, b3t =

    2t 2t2t t

    ,

    for 2t = 2t (0),

    2(1)t =

    2(1)t (0),

    2(1)t =

    2(1)t (0), with

    2(1)t () =

    (/)2t (), 2(1)t () = (/)

    2t (). We show thatBiT = op(T

    1/2), i= 1,2,3.For the remainder of this proof, we drop the zero subscript in 0j .

    Consider first B1T . We have

    2(1)t =

    (1,2

    t1

    j=1

    jxtj,t1

    j=1

    (1)j x

    2tj

    ),(18)

    where (1)j =

    (1)j (0). From Assumption F(1),

    2(1)t 1 + 2t1

    j=1

    j|xtj |+Kt1

    j=1

    1j x2tj ,

    for all > 0. Now

    t1

    j=1

    j |xtj | (

    t1

    j=1

    jx2tj

    )1/2(

    j=1

    j

    )1/2Kt,

    so since t L > 0,

    2t

    t1

    j=1

    j|xtj | K1t

    Thus, by (8) and (14),

    Eb1t Kt

    j=t

    j K

    j=t

    (1)j Kt1(d0+1)(1),(21)

    choosing < 11/{(d0 +1)}, which (14) enables. Applying the cr-inequalityagain,

    EB1T KT

    t=1

    E|0|2Eb1t.

    Applying (21), this is O(1) when > 2/(d0 + 1), while when 2/(d0 + 1),we may choose so small to bound it by

    KT 2(d0+1)(1) KT /2{1+2(d0+1)(1)}[/22/{1+2(d0+1)(1)}] = o(T /2),using (12) [which requires (13)] and arbitrariness of . Thus, B1T = op(T

    1/2)by Markovs inequality.

    Consider B2T . By independence of t and b2t, by the cr-inequality when 12 , and by the inequality of von Bahr and Esseen [34] and the fact thatthe 2t are i.i.d. with mean 1 when >

    12 ,

    EB2T 2 KT

    t=1

    (E|0|4 + 1)Eb2t2 KT

    t=1

    (Eb4t2 +Eb5t2),

    where

    b4t =

    2(1)t

    2(1)t

    2t, b5t =

    2(1)t (

    2t 2t )

    2t 2t

    .

    Thus, from Assumptions F(1) and H,

    b4t (

    2

    j=t

    j|xtj |+

    j=t

    (1)j x2tj)/

    2t

    2t[2

    {

    j=t

    j

    }1/2+

    {

    j=t

    ((1)j 2/j)x

    2tj

    }1/2]{

    j=t

    jx2tj

    }1/2

    K{(

    j=t

    jd01)1/2

    +

    (

    j=t

    12j x2tj

    )1/2}

    K[td0/2 +

    {

    j=t

    j(d0+1)(12)x2tj

    }1/2],

    so

    Eb4t2 Ktd0 +K

    j=t

    j(d0+1)(12) Kt1(d0+1)(12)

    for sufficiently small . Thus,T

    t=1Eb4t2 is O(1) for > 2/(d0 +1), whilefor 2/(d0 + 1), it is bounded by

    KT 2(d0+1)(12) KT (d0+2){2/(d0+2)}+2(d0+1) = o(T )from (14) and arbitrariness of . Also, b5t K2(1)t /2t (2t 2t )1/2, sofrom (19) and (20) we have Eb5t2 Kt1(d0+1)(12), and proceeding asbefore,

    T

    t=1

    Eb5t2 = o(T ),

    and thence, B2T = op(T1/2).

    Next,

    EB3T 2 KE

    T

    t=1

    tb3t

    2

    KT

    t=1

    E|0|2Eb23t ,

    applying the cr-inequality when 12 and von Bahr and Esseen [34] when > 12 . Now b3t (2t 2t )1/2

    2t , so from (20),

    EB3T 2 KT

    t=1

    j=t

    j

    K{1( > 2/(d0 + 1)) + (lnT )1(= 2/(d0 + 1))+ T 2(d0+1)1( < 2/(d0 + 1))}

    = o(T ),

    much as before. Thence, B3T = op(T1/2).

    It remains to consider the last statement of the theorem, which follows on standard application of Propositions 1 and 2, Theorem 1 and Lemmas 7 and 8.

    In earlier versions of this paper we checked the conditions in the case of GARCH($n,m$) models in which the $\psi_j(\zeta)$ decay exponentially and we allow the possibility that the GARCH coefficients lie in a subspace of dimension less than $m+n$; the details are available from the authors on request. However, the literature on asymptotic theory for estimates of GARCH models is now extensive, recent references including [3, 7, 12, 16, 22, 32], along with investigations of the properties of the models themselves; see recently [2, 18, 25]. We focus instead on alternative models which have received less attention, and for which our theoretical framework is primarily intended.

    We introduce the generating function

    $$\psi(z; \zeta) = \sum_{j=1}^{\infty} \psi_j(\zeta)\, z^j, \qquad |z| \le 1. \tag{22}$$

    3. Fractional GARCH models. A slowly decaying class of ARCH($\infty$) weights was considered by Robinson [29], Ding and Granger [9] and Koulikov [20], generated by

    $$\psi(z; \zeta) = 1 - (1 - z)^{\zeta}, \qquad 0 < \zeta < 1, \tag{23}$$

    where $r = 1$ and formally

    $$(1 - z)^d = \sum_{j=0}^{\infty} \frac{\Gamma(j - d)}{\Gamma(-d)\, \Gamma(j+1)}\, z^j, \qquad |z| \le 1, \quad d > 0. \tag{24}$$
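The weights implied by (23) and (24) can be generated without evaluating any gamma function, because consecutive coefficients of $(1 - z)^d$ differ by the factor $(j - 1 - d)/j$. The sketch below uses that recursion; the function names are hypothetical, and the limiting constant $d/\Gamma(1-d)$ is a standard consequence of the gamma-ratio asymptotics rather than a formula quoted from the paper.

```python
import math


def fractional_weights(d, n):
    """Coefficients of z^j, j = 1..n, in 1 - (1 - z)**d for 0 < d < 1.

    Uses pi_0 = 1 and pi_j = pi_{j-1} * (j - 1 - d) / j for the expansion
    of (1 - z)**d; the ARCH weight is psi_j = -pi_j, which is positive.
    """
    if not 0.0 < d < 1.0:
        raise ValueError("d must lie in (0, 1)")
    weights = []
    pi_j = 1.0
    for j in range(1, n + 1):
        pi_j *= (j - 1 - d) / j
        weights.append(-pi_j)
    return weights


def hyperbolic_constant(d):
    """Limit of j**(d + 1) * psi_j as j grows, namely d / Gamma(1 - d)."""
    return d / math.gamma(1.0 - d)
```

With `d = 0.4` the first two weights are 0.4 and 0.12, every weight is positive, and the partial sums increase toward 1 only slowly, which is the slow hyperbolic decay the section refers to.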

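    In these references $\omega_0 = 0$ was assumed in (2), but we assume $\omega_0 > 0$ and generalize (23) as follows. Introduce the functions $a_j = a_j(\zeta)$, $b_j = b_j(\zeta)$ and, for $m \ge 1$, $n \ge 0$, $n + m$ $r$,

    $$a(z; \zeta) = \sum_{j=1}^{m} a_j z^j, \qquad b(z; \zeta) = 1 - \sum_{j=1}^{n} b_j z^j\, 1(n \ge 1); \tag{25}$$

    and for all $\zeta \in \Upsilon$,

    $$a_j > 0, \quad j = 1, \ldots, m; \qquad b_j > 0, \quad j = 1, \ldots, n; \tag{26}$$

    $$b(z; \zeta) \ne 0, \qquad |z| \le 1; \tag{27}$$

    $$a(z; \zeta) \text{ and } b(z; \zeta) \text{ have no common zeros in } z. \tag{28}$$

    Now take $\psi(z; \zeta)$ (22) to be given by

    $$\psi(z; \zeta) = \frac{a(z; \zeta)\{1 - (1 - z)^d\}}{z\, b(z; \zeta)}, \tag{29}$$

    with $d = d(\zeta)$ satisfying

    $$d \in (0, 1). \tag{30}$$

    We call $x_t$ based on (29) a fractional GARCH, FGARCH($n, d_0, m$) process, for $d_0 = d(\zeta_0)$.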
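One way to obtain the FGARCH weights numerically is to multiply two power series: the expansion of $a(z)/b(z)$ and the expansion of $z^{-1}\{1 - (1-z)^d\}$. The sketch below does this by direct convolution. It is an illustration under stated assumptions: the function name and argument layout are hypothetical, and the quadratic-time convolution is chosen for clarity rather than speed.

```python
def fgarch_weights(a, b, d, n_terms):
    """First n_terms coefficients psi_1..psi_n of a(z){1 - (1 - z)^d} / (z b(z)).

    a: [a_1, ..., a_m], coefficients of a(z) = sum_j a_j z^j
    b: [b_1, ..., b_n], coefficients in b(z) = 1 - sum_j b_j z^j (may be empty)
    d: memory parameter in (0, 1)
    """
    # c[j]: coefficient of z^j in a(z)/b(z), from c(z) b(z) = a(z).
    c = [0.0] * (n_terms + 1)
    for j in range(1, n_terms + 1):
        val = a[j - 1] if j <= len(a) else 0.0
        for i in range(1, min(len(b), j) + 1):
            val += b[i - 1] * c[j - i]
        c[j] = val
    # frac[k]: coefficient of z^k (k >= 0) in z^{-1}{1 - (1 - z)^d}.
    frac = []
    pi_j = 1.0
    for j in range(1, n_terms + 1):
        pi_j *= (j - 1 - d) / j
        frac.append(-pi_j)
    # psi_j = sum_{k=0}^{j-1} c_{j-k} * frac_k.
    return [sum(c[j - k] * frac[k] for k in range(j)) for j in range(1, n_terms + 1)]
```

With `a = [1.0]` and `b = []` the function returns the coefficients of $1 - (1-z)^d$, and for general inputs the partial sums approach $a(1)/b(1)$ from below.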

    Corollary 1. Let $\psi(z; \zeta)$ be given by (29) and (25) with $m \ge 1$, $n \ge 0$, and let $d$ and the $a_j$, $b_j$ be continuously differentiable. For some $\delta > 0$, let Assumptions A($2+\delta$), B, C and E hold, with all $\zeta \in \Upsilon$ satisfying (26)–(28), (30) and

    $$\mathrm{rank}\left\{ \frac{\partial (a_1, \ldots, a_m, b_1, \ldots, b_n, d)}{\partial \zeta} \right\} = r.$$

    Then (15) is true. Let also $d$ and the $a_j$, $b_j$ be thrice continuously differentiable and $d_0 > \tfrac{1}{2}$. Then (17) is true.

    Proof. Denoting by cj (j 1) and dj (j 0) the coefficients of zj inthe expansions of a(z; )/b(z; ), z1{1 (1 z)d}, respectively, we havej() =

    j1k=0 cjkdk, j 1. From [3], the cj are bounded above and be-

    low by positive, exponentially decaying sequences when n 1, and areall nonnegative when n = 0. Since the dj are all positive, it follows that(6) holds. Also, Stirlings approximation indicates that jd1/K dj Kjd1, so the j() satisfy the same inequalities. Compactness of ,smoothness of d, and (30), imply d() d, to check (7). The above argumentindicates that 0j Kjd01 Kkd01 K0k for j > k 1, so (8) holds,and thus Assumption D. With regard to (11), note that (/d)(z; ) ={a(z; )/b(z; )}z1(1z)d ln(1z), where the coefficient of zj in z1(1z)d ln(1 z) is jk=1 k1djk K(ln j)jd1 Kj(d+1)(1) K

    1j ()

    for any > 0. Derivatives with respect to the aj, bj are dominated, andhigher derivatives can be dealt with similarly, to complete the checking ofAssumption F(l). To check Assumption G, suppress reference to in a, b, and

    (z) = b(z)1{1 (1 z)d}, (z) = b(z)1a(z),and note that

    (z)

    aj= zj1(z), j = 1, . . . ,m,

    (z)

    bj= zj1(z)(z), j = 1, . . . , n,

    (z)

    d= (z)

    z(1 z)d log(1 z).

    Choose ji() = i for i= 1, . . . ,m+ n, , leaving jm+n+1() to be deter-mined subsequently. Fix and write U = (ji,...,jr)(), partitioning it in theratio m+ n : 1 and calling its (i, j)th submatrix Uij . We first show that the(m+ n) (m+ n) matrix U11 is nonsingular. Write R for the n (m+ n)matrix with (i, j)th element ji, and S for the (m+ n) (m+ n) matrixwith (i, j)th element ji+1, where j = j = 0 for j 0, and for j > 0, jand j are respectively given by

    (z) =

    j=1

    jzj , (z) =

    j=1

    jzj ,

    these series converging absolutely for |z| 1 in view of (30). Noting that

    (1)j is given by (/)(z) =

    j=1

    (1)j z

    j , we find that the first m rows ofU11 can be written (Im,O)S, where Im is the m-rowed identity matrix, Ois the m n matrix of zeroes and, when n 1 the last n rows of U11 canbe written RS. Now S is upper-triangular with nonzero diagonal elements.

    Thus, for n= 0, U11 = S is nonsingular. For n 1, U11 is nonsingular if andonly if the n n matrix R2 having (i, j)th element m+ji and consistingof the last n columns of R is nonsingular. This is not so if and only if thej , j =m, . . . ,m+ n 1, are generated by a homogeneous linear differenceequation of degree n 1, that is, if there exist scalars 0, 1, . . . , n1, notall zero, such that

    0j n1

    i=1

    iji = 0, j =m, . . . ,m+ n 1.

    But it follows from (25) and (27) that they are generated by the lineardifference equation

    j n1

    i=1

    biji = j, j =m, . . . ,m+ n 1,

    where m = am + bnmn, j = bnjn for j =m+ 1, . . . ,m+ n 1. Sincebn 6= 0, the j are all zero if and only if mn = am/bn and j = 0 forj =m+ 1 n, . . . ,m 1. But this implies m = 0 also, and thence, j = 0,all j mn+1. For m n, this is inconsistent with the requirement aj > 0,j = 1, . . . ,m, and for m>n, it implies a has a factor b, which is inconsistentwith (28). Thus, U11 is nonsingular when n 1. Nonsingularity of U followsif U22 6= U21U111 U12. For large enough jm+n+1 = jm+n+1(), this must betrue because U22 decays like (ln jm+n+1)j

    d1m+n+1, whereas the elements of

    U12 are O(jm+n+1) for some (0,1). Thus Assumption G is true, and

    thence (15). Clearly (13) is true, so under the additional conditions so isAssumption H, and thence (17).

    For $m = 1$, $n = 0$, (29) reduces to (23) when $a_1 = 1$, while when $a_1 \in (0, 1)$, it gives model (4.24) of Ding and Granger [9]. The important difference between these two cases is that the covariance stationarity condition $\psi(1; \zeta_0)$ $\tfrac{1}{2}$ for the central limit theorem in Corollary 1 would also be imposed in a corresponding result for FIGARCH. This is automatically satisfied in GARCH models but if only $d_0 \in (0, \tfrac{1}{2}]$ in (13) is possible in the general setting of Section 3, it appears that the asymptotic bias in $\hat\theta_T$ is of order at least $T^{-1/2}$, whereas that for $\tilde\theta_T$ is always $o(T^{-1/2})$. Assumption H copes with the replacement of $\sigma_t^2(\theta)$ by $\bar\sigma_t^2(\theta)$, the truncation error varying inversely with $d_0$. Inspection of the proof of Theorem 2 indicates that this bias problem is due to the term $\hat H_T^{-1} B_{1T}$. The factor $\sigma_t^2 - \bar\sigma_t^2$ in $b_{1t}$ is nonnegative, and if $j^{-d_0-1}$ is an exact rate for $\psi_{0j}$, $\sigma_t^2 - \bar\sigma_t^2$ exceeds $t^{-d_0}/K$ as $t \to \infty$ with probability approaching one. So far as the factor $\bar\sigma_t^{2(1)}/\bar\sigma_t^4$ in $b_{1t}$ is concerned, the second element of $\bar\sigma_t^{2(1)}$ [see (18)] has zero mean, but the first is positive, and though the $\psi_j^{(1)}$ can have elements of either sign, whenever $d_0 \le \tfrac{1}{2}$ it seems unlikely that the last $r$ elements of $B_{1T}$ can be $o_p(T^{1/2})$. Nor is there scope for relaxing (12) by strengthening other conditions. With regard to implications for choice of $\rho$, when $d_0 \ge 2d + \tfrac{1}{2}$, (14) entails no restriction over (9).
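The threshold in the last sentence can be checked directly. As a sketch, assume $\rho$ is the moment exponent restricted by (9) and (14) (the symbol itself does not survive in this transcript). Condition (14) adds nothing to (9) exactly when its lower endpoint does not exceed the lower endpoint in (9):

```latex
\[
\frac{4}{2d_0+3} \le \frac{1}{d+1}
\iff 4(d+1) \le 2d_0 + 3
\iff d_0 \ge 2d + \tfrac{1}{2}.
\]
```

For example, at $d = 0.25$ and $d_0 = 1$ both lower endpoints equal $0.8$, so the two intervals coincide at the boundary case.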

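    Though results of Giraitis, Kokoszka and Leipus [13] indicate existence of a stationary solution of (1)–(3) when $\psi(1; \zeta_0) < 1$, Kazakevičius and Leipus [19] have questioned the existence of strictly stationary FIGARCH processes, and thus the relevance of Assumption E here. The same reservations can be expressed about FGARCH when $a(1; \zeta_0) \ge b(1; \zeta_0)$, and more generally about ARCH($\infty$) processes with $\psi(1; \zeta_0) \ge 1$. A sufficient condition for (10) can be deduced as follows. Recursive substitution gives

    $$\sigma_t^2 \le K + K \sum_{l=1}^{\infty} \left( \sum_{j_1=1}^{\infty} \cdots \sum_{j_l=1}^{\infty} \psi_{0j_1} \cdots \psi_{0j_l}\, \varepsilon_{t-j_1}^2 \varepsilon_{t-j_1-j_2}^2 \cdots \varepsilon_{t-j_1-\cdots-j_l}^2 \right),$$

    so by the $c_r$-inequality,

    $$\sigma_t^{2\rho} \le K + K \sum_{l=1}^{\infty} \left( \sum_{j_1=1}^{\infty} \cdots \sum_{j_l=1}^{\infty} \psi_{0j_1}^{\rho} \cdots \psi_{0j_l}^{\rho}\, |\varepsilon_{t-j_1}|^{2\rho} |\varepsilon_{t-j_1-j_2}|^{2\rho} \cdots |\varepsilon_{t-j_1-\cdots-j_l}|^{2\rho} \right).$$

    Thus, from Lemma 2,

    $E|x_t|^{2\rho}$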

    The last bound is finite if and only if

    $$E|\varepsilon_0|^{2\rho} \sum_{j=1}^{\infty} \psi_{0j}^{\rho} < 1. \tag{32}$$

    Thus, (10) holds if there is a $\rho$ satisfying (9) and (32). Recursive substitution and the $c_r$-inequality were also used by Nelson ([27], Corollary) to upper-bound $E|\sigma_t|^{2\rho}$ in the GARCH(1,1) case, but he employed the simple dynamic structure available there, and (32) does not reduce to his necessary and sufficient condition.

    If $\psi(1; \zeta_0) < 1$, (32) adds nothing because we already know that $Ex_0^2$ 0, where $\lambda(\alpha) = \{\Gamma(\alpha)/\Gamma(3\alpha)\}^{1/2}$ (also used by Nelson [28] to model the innovation of the exponential GARCH model). We have $E\varepsilon_0 = 0$, $E\varepsilon_0^2 = 1$ as necessary, Assumption A($q$) is satisfied for all $q > 0$, and $E|\varepsilon_0|^{2\rho} = \Gamma(\alpha(2\rho + 1)) / \{\Gamma(\alpha)^{1-\rho}\, \Gamma(3\alpha)^{\rho}\}$. In case $\alpha = 0.5$, (33) is the normal density, for which $\hat\theta_T$ is asymptotically efficient. Here $E|\varepsilon_0|^{2\rho} = 2^{\rho}\, \Gamma(\rho + 0.5)/\sqrt{\pi}$, and numerical calculations for FIGARCH($0, d_0, 0$) cast doubt on (32). In case $\alpha = 1$, (33) is the Laplace density, with $E|\varepsilon_0|^{2\rho} = 2^{-\rho}\, \Gamma(2\rho + 1)$. As $\alpha$ increases, $E|\varepsilon_0|^{2\rho}$ can be made small for fixed $\rho < 1$, for example, with $\rho = 0.95$, it is 0.64 when $\alpha = 10$ and 0.42 when $\alpha = 20$.
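The moment formula above is easy to evaluate on the log scale. In the sketch below the function and argument names are hypothetical, and `shape` stands for the density's shape parameter, whose original symbol is lost in this transcript; the formula itself is the ratio of gamma functions stated in the text.

```python
import math


def ged_abs_moment(shape, rho):
    """E|eps_0|**(2*rho) for the unit-variance density family discussed above.

    Evaluates Gamma(shape*(2*rho + 1)) / (Gamma(shape)**(1 - rho) * Gamma(3*shape)**rho)
    through log-gamma values, which avoids overflow for large shape.
    """
    log_val = (math.lgamma(shape * (2.0 * rho + 1.0))
               - (1.0 - rho) * math.lgamma(shape)
               - rho * math.lgamma(3.0 * shape))
    return math.exp(log_val)
```

Evaluated at `rho = 0.95`, the function gives about 0.64 for `shape = 10` and about 0.42 for `shape = 20`, in line with the two figures quoted, and it returns 1 at `rho = 1` for any shape, as the unit-variance normalization requires.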

    4. Generalized exponential and hyperbolic models. FGARCH($n, d_0, m$) [and FIGARCH($n, d_0, m$)] processes require $d_0 \in (0, 1)$. For $d = 1$, (29) reduces to (23), and for $d > 1$, at least one coefficient in the expansion of (23) is negative, leading to the possibility of negative $\psi_j(\zeta)$. Because FGARCH $\psi_j(\zeta)$ decay like $j^{-d-1}$, a large mathematical gap is left relative to GARCH processes. Even if exponential decay is anticipated, there is a case for more direct modeling of the $\psi_j(\zeta)$ than provided by GARCH($n,m$), since it is the $\psi_j(\zeta)$ and their derivatives that must be formed in point and interval estimation based on the PMLE.

    Consider the choices

    $$\psi_j(\zeta) = \sum_{i=1}^{m} \Gamma(f_i + 1)^{-1} e_i\, d^{f_i+1} j^{f_i} e^{-dj}, \tag{34}$$

    $$\psi_j(\zeta) = \sum_{i=1}^{m} \Gamma(f_i + 1)^{-1} e_i\, d\, \ln^{f_i}(j+1)\, (j+1)^{-d-1}, \tag{35}$$

    where $d = d(\zeta)$ and the $e_i = e_i(\zeta)$, $f_i = f_i(\zeta)$ are such that $\zeta$ satisfies

    $$d \in (0, \infty), \tag{36}$$

    $$e_i > 0, \qquad i = 1, \ldots, m, \tag{37}$$

    $0 \le f_1 \le \cdots \le f_m$ 0, all $j \ge 1$. By choosing $m$ large enough in (34) or (35), any finite $\psi(1; \zeta)$ can be arbitrarily well approximated, but (34) and (35) can also achieve parsimony. For real $x \ge 1$, $x^f e^{-dx}$ and $(\ln x)^f x^{-d-1}$ decay monotonically if $f = 0$, and for $f > 0$, have single maxima at $f/d$ and $e^{f/(d+1)}$, respectively. Thus, with $m = 1$ and $f_1 = 0$, we have monotonic decay in (34) and (35); otherwise, both can exhibit lack of monotonicity, while eventually decaying exponentially or hyperbolically. The scale factors in (34) and (35) are so expressed because $x^f e^{-dx}$ and $(\ln x)^f x^{-d-1}$ integrate over $(0, \infty)$ to $\Gamma(f+1)/d^{f+1}$ and $\Gamma(f+1)/d$, respectively, so that $\psi(1; \zeta) \approx \sum_{i=1}^{m} e_i$ in both cases, but the approximation may not be very close and the integrated case is less easy to distinguish than in GARCH and FGARCH models (though it would be possible to alternatively scale the weights by infinite sums to achieve equality).
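The behaviour described for the exponential family (34) can be explored numerically. The sketch below covers (34) only; the function name and argument layout are assumptions made here, and nothing is claimed about the hyperbolic family (35).

```python
import math


def exponential_weights(e, f, d, n):
    """Weights psi_1..psi_n of the form
    sum_i e_i * d**(f_i + 1) * j**f_i * exp(-d * j) / Gamma(f_i + 1).

    e, f: equal-length lists of the scale parameters e_i and exponents f_i
    d:    positive decay rate
    """
    out = []
    for j in range(1, n + 1):
        s = 0.0
        for e_i, f_i in zip(e, f):
            s += (e_i * d ** (f_i + 1.0) * j ** f_i
                  * math.exp(-d * j) / math.gamma(f_i + 1.0))
        out.append(s)
    return out
```

With a single term and `f = [0.0]` the weights form a geometric sequence whose sum is `e[0] * d / (exp(d) - 1)`, slightly below `e[0]`, which illustrates why the sum of the weights only approximates the sum of the scale parameters. With `f = [2.0]` and `d = 0.5` the largest weight occurs at lag 4, matching the stated maximum at f/d.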

    The following corollary covers (34) and (35) simultaneously, and implies the special case when the $f_i$ are specified a priori, for example, to be nonnegative integers; strictly speaking, when the true value of $f_1$ is unknown, Assumption C prevents it from being zero.

    Corollary 2. Let $\psi(z; \zeta)$ be given by (22) and (34) or (35) with $m \ge 1$ and let $d$ and the $e_i$, $f_i$ be continuously differentiable. For some $\delta > 0$, let Assumptions A($2+\delta$), B, C and E hold, with all $\zeta \in \Upsilon$ satisfying (36)–(38) and

    $$\mathrm{rank}\left\{ \frac{\partial (e_1, f_1, \ldots, e_m, f_m, d)}{\partial \zeta} \right\} = r.$$

    Then (15) is true. Let also $d$ and the $e_i$, $f_i$ be thrice continuously differentiable and Assumption A(4) hold, and $d_0 = d(\zeta_0) > \tfrac{1}{2}$ in case of (35). Then (17) is true.

    Proof. Given (36)(38) and the proofs of Corollaries 1 and 2, the verifi-cation of Assumptions D and F(l) is straightforward. We check AssumptionG for (35) only, a very similar type of proof holding for (34). We have

    (1)j =

    [E(u1j , . . . , u

    mj)

    vj

    ]d(j + 1)d1,

    where

    uij = (ln ln(j + 1) (/fi) ln(fi+1),1) lnfi(j + 1), i= 1, . . . , r,

    vj = m

    i=1

    ei(fi+1)1 lnfi+1(j + 1),

    and E is the diagonal matrix whose (2i 1)st diagonal element is ei, andwhose even diagonal elements are all 1. Fixing , we show first that the lead-ing (r 1) (r 1) submatrix of (j1,...,jr)() has full rank, equivalently,that Um has full rank, where, for i= 1, . . . ,m, the (2i) (2i) matrix Ui has(k, )th 21 sub-vector ukj , k = 1, . . . , i, = 1, . . . ,2i. Suppose, for somei= 1, . . . ,m 1 and given j1, . . . , j2i, that Ui has full rank, and partition therows and columns of Ui+1 in the ratio 2i : 2, calling its (k, )th submatrix Uk(so U11 = Ui). Take j2i+2 = j

    22i+1. Because ln lnx strictly increases in x > 1, it

    follows that U22 is nonsingular and U122 =O(ln ln j2i+1 lnfi+1 j2i+1). Not-ing that U12 =O(ln ln j2i+1 lnfi j2i+1), while U11 and U21 depend only onj1, . . . , j2i, we can choose j2i+1 such that U11 U12U122 U21 differs negligiblyfrom U11. Thus, Ui+1 has full rank. Since, for f1 0, U1 has full rank (e.g.,when j1 = 1, j2 = 2), it follows by induction that Um has full rank. Sincevj is dominated by a term of order ln

    fm+1 j, while uij =O(ln ln j lnfi j), asimilar argument shows that jr can then be chosen large enough, to completeverification of Assumption G.

    5. Technical lemmas. Define

    2t () = +

    j=1

    j()x2tj ,

    2t = U +

    j=1

    sup

    j()x2tj .

    Lemma 1. Under Assumptions B and D, for all , t Z,

    K12t () 2t ()K2t () a.s.

    Proof. A simple extension of [21], Lemma 1.

    Lemma 2. Under Assumptions A(2), B, C, D and E, for all t Z,

    E|xt|2 0, sup

    2t ()

    Proof. With respect to (39), the first inequality follows from Jensensinequality, the second is obvious, the third follows from Lemma 1, the fourthfollows from the cr-inequality, (7) and (9), while the last one is due to (10).The proof of (40) uses Lemma 1, 2t () L, (10) and [23], page 121. Toprove (41), | lnx| x+ x1 for x > 0 and Lemma 2 give

    E sup

    | ln2t ()| 1E sup

    2t () +E

    {inf

    2t ()

    }1K.

    Lemma 3. Under Assumptions D, E and F(l), for all , 2t (), qt()and their first l derivatives are strictly stationary and ergodic.

    Proof. Follows straightforwardly from the assumptions.

    Lemma 4. Under Assumption A(2), for positive integer k < (b+ 1)n/2,

    E

    (n

    t=1

    2t

    )k 0, there exists > 0 such that L(1) , (0, ), so

    M20(t) =

    et

    2f()dK

    0et

    2b d+ 2et

    2.

    The last integral is bounded by

    Kt(b1)/2

    0e(b1)/2 dKt(b1)/2.

    Thus, (43) is finite if k + n( b 1)/2 < 0, that is, since is arbitrary, ifk < (b+ 1)n/2.

    The previous version of the paper included a longer, independently obtained, proof of the following lemma which we have been able to shorten in one respect by using an idea of Berkes, Horváth and Kokoszka [3] in a corresponding lemma covering the GARCH($n,m$) case.

    Lemma 5. Under Assumptions A(q), B, C and D, for p < q/2,

    E sup

    (2t2t ()

    )pK 0and k l,

    E sup

    1

    2t ()

    k2t ()

    i1 ik

    p

    It suffices to take p > 1. By Holders inequality,

    j=1

    |j()|x2tj {

    j=1

    |j()|p/j()1p/x2tj}/p{

    j=1

    j()x2tj

    }1/p,

    so{

    j=1 |j()|x2tj2t ()

    }pK

    j=1

    |j()|pj()p|xtj |2.

    By Assumption F(l), for all > 0

    sup

    |j()|pj()p K sup

    j()p Kj(d+1)(p).

    Since (d+ 1)> 1, we may choose such that (d+ 1)( p)> 1. Thus,

    E sup

    {j=1 |j()|x2tj2t ()

    }p 2,

    k2t ()

    i1 ik=2

    j=1

    j(),

    where now j() = k2j()/i32 ik2. In the first of these cases

    the proof is seen to be very similar to that above after noting that, by theCauchy inequality, (46) is bounded by

    K

    {

    j=1

    |j()|x2tj

    j=1

    |j()|}1/2

    +K

    j=1

    |j()|,

    while in the second it is more immediate; we thus omit the details. We areleft with the cases i1 = i2 = i3 = 2 and i1 = 1, both of which are trivial.The details for (45) are very similar (the truncations in numerator anddenominator match), and are thus omitted.

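    Define

    $$g_t(\theta) = u_t(\theta)\, u_t'(\theta), \qquad G_T(\theta) = T^{-1} \sum_{t=1}^{T} g_t(\theta).$$

    Lemma 7. For some $\delta > 0$, under Assumptions A($2+\delta$), B, C, D, E and F(1),

    $$\sup_{\theta \in \Theta} |Q_T(\theta) - Q(\theta)| \to 0 \quad \text{a.s. as } T \to \infty, \tag{47}$$

    and $Q(\theta)$ is continuous in $\theta$. If also Assumption F(2) holds,

    $$\sup_{\theta \in \Theta} \|G_T(\theta) - G(\theta)\| \to 0 \quad \text{a.s. as } T \to \infty, \tag{48}$$

    and $G(\theta)$ is continuous in $\theta$. If also Assumption F(3) holds,

    $$\sup_{\theta \in \Theta} \|H_T(\theta) - H(\theta)\| \to 0 \quad \text{a.s. as } T \to \infty, \tag{49}$$

    and $H(\theta)$ is continuous in $\theta$.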

    Proof. To prove (47), note first that, by Lemmas 1, 2, 3 and 5,

    supE|q0()| sup

    E| log20()|+ sup

    E0()

    Thus, E sup u0() is bounded by a constant times

    E sup

    0()+[E sup

    {2020()

    }p]1/p[E sup

    0()p/(p1)

    ]11/p

    +E sup

    {00()

    }+ 1

    for all p > 1. On choosing p < 1+ /2, this is finite by Lemmas 5 and 6. (Ouruse of Lemmas 5 and 6 is similar to Berkes, Horvath and Kokoszkas [3] useof their Lemmas 5.1 and 5.2 in the GARCH(n,m) case.) This completes theproof of (47). Then (48) and (49) follow by applying analogous arguments tothose above, and so we omit the details; indeed, (48) and (49) are only used

    in the proof of consistency of GT (T ), HT (T ) for G0,H0, where convergenceover only a neighborhood of 0 would suffice.

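    Lemma 8. Under Assumptions A($2+\delta$), B, C, D, E and F(1),

    $$\sup_{\theta \in \Theta} |Q_T(\theta) - \bar Q_T(\theta)| \to 0 \quad \text{a.s. as } T \to \infty. \tag{50}$$

    If also Assumption F(2) holds,

    $$\sup_{\theta \in \Theta} \|G_T(\theta) - \bar G_T(\theta)\| \to 0 \quad \text{a.s. as } T \to \infty. \tag{51}$$

    If also Assumption F(3) holds,

    $$\sup_{\theta \in \Theta} \|H_T(\theta) - \bar H_T(\theta)\| \to 0 \quad \text{a.s. as } T \to \infty. \tag{52}$$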

    Proof. We have QT ()QT () =AT () +BT (), where

    AT () = T1

    T

    t=1

    ln

    [2t ()

    2t ()

    ], BT () = T

    1T

    t=1

    x2t (){2t () 2t ()}.

    Because

    2t () = 2t () +

    j=0

    j+t()x2j(),

    ln(1 + x) x for x > 0 and 2t () L > 0, it follows that

    |AT ()| KT1T

    t=1

    {2t () 2t ()}

    KT1T

    t=1

    j=t

    j()x2tj()

    KT1

    t=0

    {t+T

    j=t+1

    j()

    }x2t().

    Now from (7),

    sup

    t+T

    j=t+1

    j()Kt+T

    j=t+1

    jd1 Kmin(t+ 1, T )(t+ 1)d1.

    Thus,

    supAT ()KT1

    T

    t=0

    (t+ 1)d(x2t + 1) +K

    t=T

    td1(x2t + 1).(53)

    From the cr-inequality, (9) and (10),

    t=1(t+ 1)d1x2t has finite th mo-

    ment, and thus, by Loeve ([23], page 121), is a.s. finite. Thus, the secondterm of (53) tends to zero a.s. as T , while the first does so for the samereasons combined with the Kronecker lemma. Next,

    |BT ()| KT1T

    t=1

    t()

    j=t

    j()x2tj()

    (54)

    KT1T

    t=1

    t()

    j=t

    jd1(x2tj + 1).

    From previous remarks,

    j=t jd1(x2tj + 1) 0 a.s. Also, for each , a.s.

    T1T

    t=1

    t()E0()K{E

    (2020()

    )+ 1

    }K

    by ergodicity and Lemma 5. Thus, (54) 0 a.s. by the Toeplitz lemma.The convergence is uniform in because, from the proof of Lemma 7, for all ,

    sup : …

    Lemma 9. For some $\delta > 0$, under Assumptions A(2 + $\delta$), B, C, D, E, F(1) and G, $M(\theta)$ is finite and positive definite for all $\theta \in \Theta$.

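    Proof. Fix $\theta \in \Theta$. Finiteness of $M(\theta)$ follows from Lemma 6. Positive definiteness follows (by an argument similar to that of Lumsdaine [24] in the GARCH(1,1) case) if, for all nonnull $(r+2) \times 1$ vectors $\lambda$, $\lambda' M(\theta)\lambda = E\{\lambda'\tau_0(\theta)\}^2 > 0$, that is, that

    $$\lambda'\tau_0(\theta)\,\sigma_0^2(\theta) \neq 0 \quad \text{a.s.,} \tag{55}$$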


    since $0 < \sigma_0^2(\theta)$ … $0$ a.s., the left-hand side involves the nondegenerate random variable $\varepsilon_{t-1}$, which is independent of the right-hand side, so (56) cannot hold. Thus, $\lambda_3'\psi_j^{(1)}(\zeta) = 0$. Repeated application of this argument indicates that, for all $\zeta$, $\lambda_3'\psi_j^{(1)}(\zeta) = 0$, $j = 1, \ldots, j_r(\zeta)$. This is contradicted by Assumption G, so (56) cannot hold. Next consider the case $\lambda_1 = 0$, $\lambda_2 \neq 0$, $\lambda_3 = 0$. If (56) does not hold, we must have

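    $$\sum_{j=1}^{\infty} \psi_j(\zeta)\,x_{t-j}(\mu) = 0 \quad \text{a.s.} \tag{57}$$

    Let k be the smallest integer such that $\psi_k(\zeta) \neq 0$. Then (57) implies

    $$\varepsilon_{t-k} = \sigma_{t-k}^{-1}\left\{\mu - \mu_0 - \psi_k^{-1}(\zeta)\sum_{j=k+1}^{\infty} \psi_j(\zeta)\,x_{t-j}(\mu)\right\}.$$

    But the left-hand side is nondegenerate and independent of the right-hand side, so (57) cannot hold. Next consider the case $\lambda_1 = 0$, $\lambda_2 \neq 0$, $\lambda_3 \neq 0$. If (55) is not true, then, taking $\lambda_2 = 1$, we must have

    $$\sum_{j=1}^{\infty} \{\lambda_3'\psi_j^{(1)}(\zeta)\,x_{t-j}(\mu) - 2\psi_j(\zeta)\}\,x_{t-j}(\mu) = 0 \quad \text{a.s.} \tag{58}$$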


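    Let k be the smallest integer such that either $\lambda_3'\psi_k^{(1)}(\zeta) \neq 0$ or $\psi_k(\zeta) \neq 0$; the preceding argument indicates that there exists such k. Then we have

    $$\{2\psi_k(\zeta) - \lambda_3'\psi_k^{(1)}(\zeta)(\sigma_{t-k}\varepsilon_{t-k} + \mu_0 - \mu)\}\{\sigma_{t-k}\varepsilon_{t-k} + \mu_0 - \mu\} = \sum_{j=k+1}^{\infty} \{\lambda_3'\psi_j^{(1)}(\zeta)\,x_{t-j}(\mu) - 2\psi_j(\zeta)\}\,x_{t-j}(\mu) \quad \text{a.s.}$$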

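    The left-hand side is a.s. nonzero and involves the nondegenerate random variable $\varepsilon_{t-k}$, which is independent of the right-hand side, so (58) cannot hold. We are left with the cases where $\lambda_1 \neq 0$. Taking $\lambda_1 = 1$ and noting that $\sigma_t^2(\theta)\tau_{t\omega}(\theta) \equiv 1$, the preceding arguments indicate that there exist no $\lambda_2$ and $\lambda_3$ such that

    $$\lambda_2\,\sigma_t^2(\theta)\tau_{t\mu}(\theta) + \lambda_3'\,\sigma_t^2(\theta)\tau_{t\zeta}(\theta) = -1 \quad \text{a.s.}$$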

    Lemma 10. For some $\delta > 0$, under Assumptions A(2 + $\delta$), B, C, D, E, F(1) and H,

    $$\inf_{\theta \neq \theta_0} Q(\theta) > Q(\theta_0).$$

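    Proof. We have

    $$Q(\theta) - Q(\theta_0) = E\left[\frac{\sigma_0^2}{\sigma_0^2(\theta)} - \ln\left\{\frac{\sigma_0^2}{\sigma_0^2(\theta)}\right\} - 1\right] + (\mu - \mu_0)^2 E\left[\frac{1}{\sigma_0^2(\theta)}\right].$$

    The second term on the right-hand side is zero only when $\mu = \mu_0$ and is positive otherwise. Because $x - \ln x - 1 \ge 0$ for $x > 0$, with equality only when $x = 1$, it remains to show that the following cannot hold:

    $$\ln \sigma_0^2(\theta) = \ln \sigma_0^2 \quad \text{a.s., some } \theta \neq \theta_0. \tag{59}$$

    By the mean value theorem, (59) implies that $(\theta - \theta_0)'\tau_0(\bar\theta) = 0$ a.s., for $\theta \neq \theta_0$ and some $\bar\theta$ such that $\|\bar\theta - \theta_0\| \le \|\theta - \theta_0\|$. But by Lemma 9 there is no such $\bar\theta$.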
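The elementary inequality $x - \ln x - 1 \ge 0$ for $x > 0$, with equality only at $x = 1$, is what makes the first expectation in the proof nonnegative. It can be sanity-checked numerically; the helper name `gap` and the grid of test points below are illustrative choices only.

```python
import math


def gap(x):
    """Value of x - ln(x) - 1, defined for x > 0."""
    return x - math.log(x) - 1.0


def gap_is_positive_away_from_one(points):
    """True if gap(x) > 0 at every supplied point (none should equal 1)."""
    return all(gap(x) > 0.0 for x in points)
```

In the proof the inequality is applied with $x$ equal to the ratio of the true conditional variance to the conditional variance evaluated at $\theta$, so the expectation vanishes only if that ratio equals 1 a.s.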

    Acknowledgments. We thank an Associate Editor and referees for a number of helpful comments that have led to a considerable improvement in the paper, and Fabrizio Iacone for help with the numerical calculations referred to in Section 3.


    REFERENCES

    [1] Baillie, R. T., Bollerslev, T. and Mikkelsen, H. O. (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. J. Econometrics 74 3–30. MR1409033

    [2] Basrak, B., Davis, R. A. and Mikosch, T. (2002). Regular variation of GARCH processes. Stochastic Process. Appl. 99 95–115. MR1894253

    [3] Berkes, I., Horváth, L. and Kokoszka, P. (2003). GARCH processes: Structure and estimation. Bernoulli 9 201–227. MR1997027

    [4] Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. J. Econometrics 31 307–327. MR0853051

    [5] Breidt, F. J., Crato, N. and de Lima, P. (1998). The detection and estimation of long memory in stochastic volatility. J. Econometrics 83 325–348. MR1613867

    [6] Brown, B. (1971). Martingale central limit theorems. Ann. Math. Statist. 42 59–66. MR0290428

    [7] Comte, F. and Lieberman, O. (2003). Asymptotic theory for multivariate GARCH processes. J. Multivariate Anal. 84 61–84. MR1965823

    [8] Cressie, N., Davis, A., Folks, J. L. and Policello, G., II (1981). The moment-generating function and negative integer moments. Amer. Statist. 35 148–150. MR0632425

    [9] Ding, Z. and Granger, C. W. J. (1996). Modelling volatility persistence of speculative returns: A new approach. J. Econometrics 73 185–215. MR1410004

    [10] Ding, Z., Granger, C. W. J. and Engle, R. F. (1993). A long memory property of stock market returns and a new model. J. Empirical Finance 1 83–106.

    [11] Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50 987–1007. MR0666121

    [12] Francq, C. and Zakoïan, J.-M. (2004). Maximum likelihood estimation of pure GARCH and ARMA–GARCH processes. Bernoulli 10 605–637. MR2076065

    [13] Giraitis, L., Kokoszka, P. and Leipus, R. (2000). Stationary ARCH models: Dependence structure and central limit theorem. Econometric Theory 16 3–22. MR1749017

    [14] Giraitis, L. and Robinson, P. M. (2001). Whittle estimation of ARCH models. Econometric Theory 17 608–631. MR1841822

    [15] Harvey, A. C. (1998). Long memory in stochastic volatility. In Forecasting Volatility in the Financial Markets (J. Knight and S. Satchell, eds.) 307–320. Butterworth-Heinemann, Oxford.

    [16] Jeantheau, T. (1998). Strong consistency of estimators for multivariate ARCH models. Econometric Theory 14 70–86. MR1613694

    [17] Jennrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist. 40 633–643. MR0238419

    [18] Kazakevičius, V. and Leipus, R. (2002). On stationarity in the ARCH(∞) model. Econometric Theory 18 1–16. MR1885347

    [19] Kazakevičius, V. and Leipus, R. (2003). A new theorem on the existence of invariant distributions with applications to ARCH processes. J. Appl. Probab. 40 147–162. MR1953772

    [20] Koulikov, D. (2003). Long memory ARCH(∞) models: Specification and quasi-maximum likelihood estimation. Working Paper 163, Centre for Analytical Finance, Univ. Aarhus. Available at www.cls.dk/caf/wp/wp-163.pdf.

    [21] Lee, S. and Hansen, B. (1994). Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory 10 29–52. MR1279689


    [22] Ling, S. and McAleer, M. (2003). Asymptotic theory for a vector ARMA–GARCH model. Econometric Theory 19 280–310. MR1966031

    [23] Loève, M. (1977). Probability Theory 1, 4th ed. Springer, New York. MR0651017

    [24] Lumsdaine, R. (1996). Consistency and asymptotic normality of the quasi-maximum likelihood estimator in IGARCH(1, 1) and covariance stationary GARCH(1, 1) models. Econometrica 64 575–596. MR1385558

    [25] Mikosch, T. and Stărică, C. (2000). Limit theory for the sample autocorrelations and extremes of a GARCH(1, 1) process. Ann. Statist. 28 1427–1451. MR1805791

    [26] Mikosch, T. and Straumann, D. (2002). Whittle estimation in a heavy-tailed GARCH(1, 1) model. Stochastic Process. Appl. 100 187–222. MR1919613

    [27] Nelson, D. (1990). Stationarity and persistence in the GARCH(1, 1) models. Econometric Theory 6 318–334. MR1085577

    [28] Nelson, D. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59 347–370. MR1097532

    [29] Robinson, P. M. (1991). Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. J. Econometrics 47 67–84. MR1087207

    [30] Robinson, P. M. and Zaffaroni, P. (1997). Modelling nonlinearity and long memory in time series. In Nonlinear Dynamics and Time Series (C. D. Cutler and D. T. Kaplan, eds.) 161–170. Amer. Math. Soc., Providence, RI. MR1426620

    [31] Robinson, P. M. and Zaffaroni, P. (1998). Nonlinear time series with long memory: A model for stochastic volatility. J. Statist. Plann. Inference 68 359–371. MR1629599

    [32] Straumann, D. and Mikosch, T. (2006). Quasi-maximum likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach. Ann. Statist. To appear.

    [33] Taylor, S. J. (1986). Modelling Financial Time Series. Wiley, Chichester.

    [34] von Bahr, B. and Esseen, C.-G. (1965). Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Ann. Math. Statist. 36 299–303. MR0170407

    [35] Whistler, D. E. N. (1990). Semiparametric models of daily and intra-daily exchange rate volatility. Ph.D. dissertation, Univ. London.

    [36] Zaffaroni, P. (2003). Gaussian inference on certain long-range dependent volatility models. J. Econometrics 115 199–258. MR1984776

    [37] Zaffaroni, P. (2004). Stationarity and memory of ARCH(∞) models. Econometric Theory 20 147–160. MR2028355

    Department of Economics

    London School of Economics

    Houghton Street

    London WC2A 2AE

    United Kingdom

    E-mail: [email protected]

    Tanaka Business School

    Imperial College London

    South Kensington Campus

    London SW7 2AZ

    United Kingdom

    E-mail: [email protected]
