Linear Regression for Trending Time Series with Endogeneity · 2015. 12. 2. · Nelson and Plosser...

Linear Regression for Trending Time Series with Endogeneity

Li Chen

Supervised by Professors Jiti Gao, Farshid Vahid, and David Harris

Monash University

Abstract

This paper studies a linear regression model with endogenous trending regressors. The

nonstationary regressors are supposed to contain bounded time trends and stationary α-

mixing innovations. The time trends are estimated nonparametrically in order to avoid

misspecification. Meanwhile, the innovations are allowed to be correlated with the error

term in the regression, thus causing the problem of endogeneity. We employ the control function approach to deal with the endogenous correlation, and extend the linear regression to a semiparametric partially linear model. The estimators of the regression coefficients are $\sqrt{n}$-consistent and converge to normal distributions. The asymptotic properties

are verified by Monte Carlo simulations under different forms of trends. We revisit the

long-run relationship between aggregate personal consumption and income as an empir-

ical application.

Keywords: Trend, Time series, Endogeneity, Control function, Semiparametric, Nonparametric, Aggregate consumption, Aggregate income.

1 Introduction

Trends are commonly observed among most economic and financial data. The trending

characteristic dominates the evolution path of particular time series and causes nonsta-

tionarity in the mean. Suppose that we are interested in analyzing the long-run relation-

ship between the aggregate personal income and consumption, while both of them contain

upward trends. Applying the classical regression models as well as their inference proce-

dures may give rise to misleading conclusions as such models are usually not applicable for

nonstationary time series. Eliminating the secular trends in the data and transforming the

nonstationary time series to their stationary alternatives appears to be a plausible method,

which, however, is quite arbitrary for several reasons. First, removing the trend at the same time discards the dominating information in the data and also alters the interpretation of the coefficients in the models. As in our example, regression on the detrended data

merely implies the relationship between short term fluctuations of income and consump-

tion, which is not our original concern. Second, it may cause difficulties in econometric

modeling and economic interpretation if we eliminate the trends when we are still unsure

or even ignorant about their source, form, and the way they evolve; see Phillips (2005).

Therefore, in this paper, we try to establish the theories of a linear regression model for the

trending time series directly instead of analyzing its stationary remainders. In addition, we

take into account the problem of endogeneity which frequently occurs in quite a few em-

pirical applications.

The trending time series models have gained much attention in the past decades. Part of

the related papers aim at explaining the trending time series using a set of stationary time

series with a separate trend function to characterize the trending phenomenon. The model

is usually formulated as follows.

$$y_t = f_t + x_t'\beta + e_t, \tag{1.1}$$

for t = 1,2, ...,n, where yt is the trending time series of interest, xt is a vector of stationary

variables that are expected to explain the variation around the long-term trend ft . For ex-

ample, Gao and Hawthorne (2006) investigated the behavior of the global and hemispheric

temperature data using (1.1). Rather than employing a pre-specified parametric form that

may cause mis-specification, they allowed the time trend to be a flexible nonparametric

function of time. They found that ft should be a nonlinear function of time and a simple

linear function could hardly approximate the long term trend. In a series of papers since

Robinson (1989), Robinson (1991), to Cai (2007), the coefficients were allowed to vary over

time. Therefore we are able to describe the dynamic influences of the explanatory vari-

ables to the trending time series. The trending characteristic has also been considered in

the panel data analysis. For instance, Robinson (2012) studied the nonparametric trend-

ing regression with cross-sectional dependence. Chen et al. (2012) extended his work to

a semiparametric panel data model by including trending time series as explanatory vari-

ables. But none of these papers considered the problem of endogeneity. In this paper, we

explore a linear regression model of endogenous trending time series. In other words, we

hope to explain one trending time series using one or several other trending time series in

the form of

$$y_t = x_t'\beta + e_t, \tag{1.2}$$

$$x_t = g(\tau_t) + v_t, \tag{1.3}$$

where τt = t/n, and xt is a k ×1 vector of trending time series, for t = 1,2, ..,n. The coef-

ficients represent long-run relationship between these trending time series. In (1.3), the

time trend g (·) is supposed to be an unknown nonparametric function of τt and therefore

g (·) does not necessarily follow a linear or polynomial form. The nonparametric approach

brings several advantages such as flexibility and adaptivity to various forms of trends. In

addition, we also allow the occurrence of the problem of endogeneity that the error terms

et and vt are correlated. As a consequence, the endogenous correlation creates bias in the

OLS (ordinary least squares) estimator and the bias does not vanish even for large sample

sizes.
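A short heuristic derivation makes the persistence of this bias explicit. It assumes that a law of large numbers applies to each sample average, that $E[e_t] = 0$, and that the innovations have mean zero; the symbol $\Sigma_v := E[v_t v_t']$ is introduced here only for illustration.

$$
\hat\beta_{OLS} - \beta = \Big(\frac{1}{n}\sum_{t=1}^{n} x_t x_t'\Big)^{-1} \frac{1}{n}\sum_{t=1}^{n} x_t e_t ,
\qquad
\frac{1}{n}\sum_{t=1}^{n} x_t x_t' \to_P \int_0^1 g(\tau) g(\tau)'\,d\tau + \Sigma_v ,
\qquad
\frac{1}{n}\sum_{t=1}^{n} x_t e_t \to_P E[v_t e_t] .
$$

The cross terms between the deterministic trend and the mean-zero errors average out, so

$$
\hat\beta_{OLS} - \beta \to_P \Big(\int_0^1 g(\tau) g(\tau)'\,d\tau + \Sigma_v\Big)^{-1} E[v_t e_t] ,
$$

which is nonzero whenever $e_t$ and $v_t$ are correlated, regardless of the sample size.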

From both perspectives of theory and application, equation (1.3) represents an important

category of nonstationary time series that should receive more attention. In finite sample,

they can appear to be quite similar to unit root processes, which have been intensively

studied in the past decades. Since Hall (1978), the logarithm of aggregate consumption has

been widely believed to follow a random walk process with drift as

$$x_t = c + x_{t-1} + v_t. \tag{1.4}$$

Nelson and Plosser (1982) examined fourteen U.S. macroeconomic time series using the

Dickey-Fuller unit root test, which suggested that most of the aggregate economic time

series contain unit roots. This conclusion has been frequently cited and borrowed as pre-

requisites in many empirical papers. Meanwhile, it has also been challenged all the time.

As suggested in Kwiatkowski et al. (1992) together with the references therein, the failure

to reject the null hypothesis of unit roots does not necessarily imply unit roots in the data

due to the following reasons. First, the Dickey-Fuller type tests usually have low power

against alternative data generating processes that have similar behaviors to unit roots, i.e.,

the unit root tests can not distinguish the null and the alternative, such as the stationary

autoregressive process with roots near unity (see DeJong et al. (1992)) and the fraction-

ally integrated time series (see Diebold and Rudebusch (1991)). The second reason is that

the form in alternative hypothesis is not properly specified, especially when it is falsely

restricted to a small range of data generating processes. For example, compared to usual

unit root tests with linear time trends assumed in the alternative hypothesis, Perron (1990) and Zivot and Andrews (1992) found that the unit root null is rejected more often when possible structural breaks in the time trend are taken into account. Bierens (1997) allowed the

time trend to be an arbitrary polynomial function of time, and argued that some of the time

series in the Nelson-Plosser data set should follow a nonlinear trend stationary process as

(1.3)¹ instead of the unit root process (1.4). Consequently, if the true data generating process follows equation (1.3), conventional unit root tests may lack power to reject the unit root null, especially when the innovations in (1.3) are serially correlated. In the context of such nonlinear trending time series, we therefore start by revisiting the simplest linear regression model, which was previously treated as a cointegrating regression between unit root processes.

¹ In Bierens (1997), g(·) is a polynomial function of time.

On the basis of our previous arguments, the proposed model in this paper can be regarded

as an analogue to the one in Phillips and Hansen (1990), which studies the cointegration

regression with endogeneity. Their paper suggests that instrumental variable estimators

are consistent even when the IVs are stochastically independent of the regressors. Instead

of using IVs, we use a control function that represents the correlation between et and vt .

Hence the error term et can be written as

$$e_t = \lambda(v_t) + u_t, \tag{1.5}$$

where vt and ut are independent. The control function approach has been frequently ap-

plied as an effective tool to deal with endogeneity problems. For example, Cai and Wang

(2014) investigated the predictive regression model and assumed that et and vt are correlated and follow a joint normal distribution. Hence λ(·) is specified as a linear function that

represents the projection of et on vt . Following the arguments in the previous discussion,

we do not impose parametric forms on λ(·) in order to avoid mis-specification. Hence it

can take any continuous functional form of vt .

Substitution of et in (1.2) with equation (1.5) leads to a semiparametric partially linear

model as

$$y_t = x_t'\beta + \lambda(v_t) + u_t. \tag{1.6}$$

As the error term ut is independent of xt and vt, the problem of endogeneity disappears once the regression equation is extended by the control function. Such partially linear models have been intensively studied and widely applied to various empirical problems;

see Robinson (1988), Gao (2007), Li and Racine (2007), etc. The identification conditions

require the matrix defined below to be positive definite.

$$\Sigma = E\left[(x_t - E[x_t \mid v_t])(x_t - E[x_t \mid v_t])'\right].$$

We find that the conventional methods proposed in the papers mentioned above are still

valid in a sample perspective for our model even though the matrix Σ degenerates to zero

(or a zero matrix) when xt is defined as (1.3).
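A heuristic sketch of why the sample version can still work may help here; it is an informal argument, not the proof given in the Appendices. Because g(·) is deterministic, conditioning on vt leaves the trend untouched, so the population residual vanishes:

$$
E[x_t \mid v_t] = g(\tau_t) + v_t
\quad\Longrightarrow\quad
x_t - E[x_t \mid v_t] = 0 .
$$

A kernel smoother in vt, however, averages over observations s whose innovations vs are close to vt, and such observations are scattered over the whole sample period. Writing $w_{ns}(t)$ for generic kernel weights in v that sum to one over s,

$$
\sum_{s=1}^{n} w_{ns}(t)\, x_s
= \sum_{s=1}^{n} w_{ns}(t)\, g(\tau_s) + \sum_{s=1}^{n} w_{ns}(t)\, v_s
\approx \int_0^1 g(\tau)\,d\tau + v_t ,
$$

so the smoothed residual is approximately $g(\tau_t) - \bar g$ with $\bar g = \int_0^1 g(\tau)\,d\tau$, and its sample second moment approximates $\int_0^1 (g(\tau) - \bar g)(g(\tau) - \bar g)'\,d\tau$. This limit is positive definite as long as the trend components vary over time and are not collinear.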

The remaining part of this paper is organized as follows. Section 2 proposes the estimation

steps following the conventional method in a sample representation. In Section 3, we intro-

duce the assumptions needed before we establish the asymptotic results of the proposed

estimators. Section 4 shows the Monte Carlo simulation results that verify the consistency

of the estimator. We also revisit the relationship between U.S. aggregate personal income and consumption as a real example. Section 5 concludes the paper, and the proofs of

the Theorems are provided in the Appendices.

2 Model estimation

2.1 Kernel estimation

In this section, we introduce the estimation methods of the semiparametric partially linear

model from a sample perspective. The semiparametric model (1.6) can be written in matrix

form as

Y = Xβ+λ(V )+U , (2.1)

where $Y = (y_1, y_2, \ldots, y_n)'$, $X = (x_1, x_2, \ldots, x_n)'$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})'$ for i = 1,2,...,n, $\lambda(V) = (\lambda(v_1), \lambda(v_2), \ldots, \lambda(v_n))'$, and $U = (u_1, u_2, \ldots, u_n)'$. Applying nonparametric kernel smoothing to both sides of equation (2.1), we obtain

$$W(V)Y = W(V)X\beta + W(V)\lambda(V) + W(V)U, \tag{2.2}$$

where W(V) is the smoothing matrix with respect to V. For example,² using local constant kernel smoothing, at time t, equation (2.2) becomes

$$\sum_{s=1}^{n} w_{ns}(t)\, y_s = \Big(\sum_{s=1}^{n} w_{ns}(t)\, x_s\Big)'\beta + \sum_{s=1}^{n} w_{ns}(t)\, \lambda(v_s) + \sum_{s=1}^{n} w_{ns}(t)\, u_s, \tag{2.3}$$

where $w_{ns}(t)$ is the (t, s) element of W(V), given by

$$W(t,s) = w_{ns}(t) = \frac{K\left(\frac{v_s - v_t}{h}\right)}{\sum_{q=1}^{n} K\left(\frac{v_q - v_t}{h}\right)}, \tag{2.4}$$

² For other smoothing methods, such as the local linear or local polynomial kernel method, a smoothing matrix W of this kind always exists, but in a more complex form.

in which h is the bandwidth. For simplicity, we abbreviate W(V) = W and subtract (2.2) from (2.1) to obtain

$$(I - W)Y = (I - W)X\beta + (I - W)\lambda(V) + (I - W)U, \tag{2.5}$$

where I is the n × n identity matrix. Let $\tilde Y = (I - W)Y$, $\tilde X = (I - W)X$, $\tilde\lambda(V) = (I - W)\lambda(V)$, and $\tilde U = (I - W)U$; equation (2.5) can then be written as

$$\tilde Y = \tilde X\beta + \tilde\lambda(V) + \tilde U. \tag{2.6}$$

As $\tilde\lambda(V)$ is negligible,³ the estimator for β is proposed as

$$\tilde\beta = (\tilde X'\tilde X)^{-1}(\tilde X'\tilde Y). \tag{2.7}$$
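The smoothing and residual regression steps can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes a scalar innovation vt that is observed (which is only possible in a simulation), uses a Gaussian kernel as one admissible choice of K, and the function names `nw_weights` and `beta_infeasible` as well as the simulated design are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_weights(v, h):
    """Local constant smoothing matrix: W[t, s] = K((v_s - v_t)/h) / sum_q K((v_q - v_t)/h)."""
    k = gaussian_kernel((v[None, :] - v[:, None]) / h)  # entry (t, s) uses v_s - v_t
    return k / k.sum(axis=1, keepdims=True)

def beta_infeasible(y, X, v, h):
    """Least squares of (I - W)y on (I - W)X; X has shape (n, k), v has shape (n,)."""
    W = nw_weights(v, h)
    X_res = X - W @ X
    y_res = y - W @ y
    coef, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    return coef

# Illustration on simulated data where v is known (hypothetical design, linear trend).
rng = np.random.default_rng(0)
n = 500
tau = np.arange(1, n + 1) / n
v = rng.uniform(-0.5, 0.5, n)
u = rng.uniform(-0.5, 0.5, n)
x = (-1.0 + 2.0 * tau) + v
y = 0.7 * x + 2.0 * v + u  # lambda(v) = 2v, true beta = 0.7

W = nw_weights(v, h=0.1)
print("rows of W sum to one:", np.allclose(W.sum(axis=1), 1.0))
print("residual-regression estimate:", beta_infeasible(y, x[:, None], v, h=0.1))
print("plain OLS estimate:", float(x @ y / (x @ x)))
```

In this design the plain OLS slope absorbs the correlation between vt and the error, while the residual regression removes the component of y and x that is explained by vt before estimating β.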

Meanwhile, letting z be a k × 1 vector of points at which the function λ(·) is to be evaluated, we can reveal the form of the endogenous correlation by computing

$$\tilde\lambda(z) = W(V, z)(Y - X\tilde\beta), \tag{2.8}$$

where W(V, z) is the smoothing matrix with respect to z. The estimators proposed above are infeasible because the smoothing variable V is unobservable. As V is the error term in the data generating process for X, once the trend component g(·) is properly estimated, the estimates of V follow immediately. Hence, we define a feasible estimator for β in which V is replaced by its estimator $\hat V$ as follows.

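$$\hat\beta = (\hat X'\hat X)^{-1}(\hat X'\hat Y), \tag{2.9}$$

where $\hat X = X - \hat W X$, $\hat Y = Y - \hat W Y$, and the new smoothing matrix $\hat W$ is defined as in equation (2.4) with $\hat V$ in place of V. We obtain $\hat V$ by $\hat V = X - \hat g = X - W^* X$, in which $W^*$ is the smoothing matrix used to estimate the time trends in X. For example, in local constant kernel smoothing,

$$\hat W(t,s) = \hat w_{ns}(t) = \frac{K\left(\frac{\hat v_s - \hat v_t}{h}\right)}{\sum_{q=1}^{n} K\left(\frac{\hat v_q - \hat v_t}{h}\right)}, \tag{2.10}$$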

³ This is shown in the Appendix.

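where $\hat v_t = x_t - \hat g(\tau_t)$ and

$$\hat g(\tau_t) = \sum_{s=1}^{n} w^*_{ns}(t)\, x_s, \tag{2.11}$$

in which

$$W^*(t,s) = w^*_{ns}(t) = \frac{K_2\left(\frac{\tau_s - \tau_t}{b}\right)}{\sum_{q=1}^{n} K_2\left(\frac{\tau_q - \tau_t}{b}\right)}. \tag{2.12}$$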

Therefore, the control function can be estimated by

$$\hat\lambda(z) = \hat W(\hat V, z)(Y - X\hat\beta), \tag{2.13}$$

where $\hat\beta$ and $\hat W$ are defined in (2.9) and (2.10).
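The feasible two-step procedure can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: a single regressor (k = 1), Gaussian kernels for both K and K2, fixed bandwidths chosen by hand, and hypothetical function names `kernel_weights`, `feasible_fit`, and `lambda_hat`.

```python
import numpy as np

def kernel_weights(at, points, bw):
    """Row-normalised Gaussian weights: entry (i, s) is proportional to K((points_s - at_i) / bw)."""
    k = np.exp(-0.5 * ((points[None, :] - at[:, None]) / bw) ** 2)
    return k / k.sum(axis=1, keepdims=True)

def feasible_fit(y, x, h, b):
    """Detrend x by kernel smoothing in time, then run the residual regression on v_hat."""
    n = len(y)
    tau = np.arange(1, n + 1) / n
    g_hat = kernel_weights(tau, tau, b) @ x      # kernel estimate of the trend
    v_hat = x - g_hat                            # estimated innovations
    W_hat = kernel_weights(v_hat, v_hat, h)      # smoothing matrix built from v_hat
    x_res = x - W_hat @ x
    y_res = y - W_hat @ y
    beta_hat = float(x_res @ y_res / (x_res @ x_res))
    return beta_hat, v_hat

def lambda_hat(z, y, x, beta_hat, v_hat, h):
    """Kernel regression of y - x * beta_hat on v_hat, evaluated at the points z."""
    z = np.asarray(z, dtype=float)
    return kernel_weights(z, v_hat, h) @ (y - x * beta_hat)

# Hypothetical simulated example: linear trend, lambda(v) = 2v, true beta = 0.7.
rng = np.random.default_rng(1)
n = 500
tau = np.arange(1, n + 1) / n
v = rng.uniform(-0.5, 0.5, n)
u = rng.uniform(-0.5, 0.5, n)
x = (-1.0 + 2.0 * tau) + v
y = 0.7 * x + 2.0 * v + u

beta_hat, v_hat = feasible_fit(y, x, h=0.1, b=0.03)
lam = lambda_hat([-0.2, 0.0, 0.2], y, x, beta_hat, v_hat, h=0.1)
print("feasible estimate of beta:", beta_hat)
print("estimated control function at -0.2, 0, 0.2:", lam)
```

The bandwidth b for the trend is kept small in this sketch because a local constant smoother is biased near the ends of the sample when the trend is steep, and that bias carries over into the estimated innovations.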

2.2 Non-zero intercept

Sometimes, the regression equation (1.2) may contain a non-zero intercept. For example,

$$y_t = \alpha + x_t'\beta + e_t, \tag{2.14}$$

where α ≠ 0. In order to eliminate the intercept, we subtract the sample means from the data. Let $\check y_t = y_t - \bar y$ and $\check x_t = x_t - \bar x$, where $\bar y$ and $\bar x$ are the sample means of yt and xt respectively. We can then rewrite the model as

$$\check y_t = \check x_t'\beta + e_t, \tag{2.15}$$

which has the same form as equation (1.2). The intercept term can be estimated by

$$\hat\alpha = \frac{1}{n}\sum_{t=1}^{n} (y_t - x_t'\hat\beta), \tag{2.16}$$

where $\hat\beta$ is the estimate of β obtained from equation (2.9).

2.3 Bandwidth selection

Note that the estimation results depend highly on the values of bandwidths h and b be-

cause they control the smoothness of the kernel estimations. Hence, it is critical to employ

appropriate values of bandwidths for kernel smoothing. We use leave-one-out and leave-

r-out Cross-Validation methods respectively to select the optimal bandwidths for h and b;

see Fan and Gijbels (1996). The selection is based on a grid search procedure by minimizing

the objective functions defined as follows.

$$h^* = \arg\min_h \sum_{t=1}^{n} \left(\eta_t - \hat h_{-1}(v_t, h)\right)^2, \tag{2.17}$$

and

$$b^* = \arg\min_b \sum_{t=1}^{n} \left(x_t - \hat g_{-r}(\tau_t, b)\right)^2, \tag{2.18}$$

where $\hat h_{-1}(v_t, h)$ and $\hat g_{-r}(\tau_t, b)$ are the leave-one-out and leave-r-out estimators. For example, when we estimate the conditional expectation of ηt given vt, we exclude the data point (vt, ηt) to obtain the fitted value $\hat h_{-1}(v_t, h)$ using the nonparametric kernel method.

While estimating the time trend in xt , we ignore the neighboring 2r + 1 values of xt , i.e.,

we estimate the fitted trend at time t without using values from t − r to t + r .

Remark 2.1. When the error terms are i.i.d., r = 1 is sufficient to eliminate the information

at time t . However, when the error terms are autocorrelated, then we may need to increase

r to ensure that information at time t has been mostly eliminated, otherwise, we may end

up with a smaller value of bandwidth than it should be.
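The leave-r-out grid search for the trend bandwidth b can be sketched as follows. It is a minimal illustration that assumes a scalar series, a Gaussian kernel, and a user-supplied grid; the function name `leave_r_out_cv` and the simulated series are hypothetical.

```python
import numpy as np

def leave_r_out_cv(x, bandwidths, r=1):
    """Grid search for b minimising sum_t (x_t - g_{-r}(tau_t, b))^2.

    The fit at time t ignores all observations s with |s - t| <= r.
    """
    n = len(x)
    tau = np.arange(1, n + 1) / n
    idx = np.arange(n)
    excluded = np.abs(idx[None, :] - idx[:, None]) <= r
    scores = []
    for b in bandwidths:
        k = np.exp(-0.5 * ((tau[None, :] - tau[:, None]) / b) ** 2)
        k[excluded] = 0.0                      # drop the 2r + 1 neighbouring points
        fitted = (k @ x) / k.sum(axis=1)
        scores.append(np.sum((x - fitted) ** 2))
    scores = np.array(scores)
    return bandwidths[int(np.argmin(scores))], scores

# Hypothetical example: sinusoidal trend plus bounded noise.
rng = np.random.default_rng(0)
n = 300
tau = np.arange(1, n + 1) / n
x = 0.8 * np.sin(2 * np.pi * tau) + rng.uniform(-0.5, 0.5, n)
grid = np.linspace(0.01, 0.3, 30)
b_star, scores = leave_r_out_cv(x, grid, r=2)
print("selected bandwidth:", b_star)
```

Increasing r widens the excluded window around t, which matters when the errors are autocorrelated and neighbouring observations would otherwise leak information about the point being predicted.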

In the following section, we establish the asymptotic properties of the estimators, and ad-

dress the relevant assumptions needed to achieve the main results.

3 Main results

3.1 Assumptions

We introduce the following assumptions that are necessary for deriving the asymptotic the-

ories of the estimators. In this section, we only provide the principal regularity conditions

that describe the basic properties of variables and functions in the model. Some other as-

sumptions are stated in Appendix A as technical conditions required for conducting the

proofs.⁴

⁴ Throughout the paper, a bracketed superscript on a function denotes the order of the derivative. For example, $\partial f(x)/\partial x = f^{(1)}(x)$ and $\partial^3 f_{t_1,t_2,t_3,t_4}(x_1,x_2,x_3,x_4)/(\partial x_1 \partial x_3 \partial x_4) = f^{(3)(1,3,4)}_{t_1,t_2,t_3,t_4}(x_1,x_2,x_3,x_4)$.

Assumption 3.1. Let $g(\tau) = (g_1(\tau), g_2(\tau), \ldots, g_k(\tau))'$ be a k × 1 vector of functions, where each $g_i(\cdot)$ is a bounded function from [0,1] to $\mathbb{R}^1$ with a continuous first order derivative, for i = 1,2,...,k. We assume that $\int_0^1 g(\tau)\,d\tau = 0$ and

$$\int_0^1 g(\tau) g(\tau)'\,d\tau \triangleq Q, \tag{3.1}$$

where the k × k matrix Q is positive definite.

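Assumption 3.2. λ(·) is a continuous and differentiable function from $\mathbb{R}^k$ to $\mathbb{R}^1$. For any k-dimensional vector $z = (z_1, z_2, \ldots, z_k)'$, denote its vector of first order partial derivatives (the gradient) by

$$\zeta(z) = \left(\frac{\partial\lambda(z)}{\partial z_1}, \ldots, \frac{\partial\lambda(z)}{\partial z_k}\right)'. \tag{3.2}$$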

Assumption 3.3. (i) Let wt = ut or wt = vt. Then wt is a stationary α-mixing time series with mixing coefficient α(·) satisfying

$$\sum_{t=1}^{\infty} \alpha^{\delta/(2+\delta)}(t) < \infty, \tag{3.3}$$

for some δ > 0 such that

$$E\left[\|v_t\|^{2+\delta}\right] < \infty, \tag{3.4}$$

and

$$E\left[|u_t|^{2+\delta}\right] < \infty. \tag{3.5}$$

(ii) The sequences ut and vs are independent for t, s = 1,2,...,n.

Assumption 3.4. Let $f(z)$ and $f_{t_1,\ldots,t_p}(z_1, \ldots, z_p)$ be the marginal and the joint probability density functions of the time series vt for t = 1,2,...,n. There always exists a function $\delta_p(t_1, t_2, \ldots, t_p)$ such that

$$\left| f_{t_1,t_2,\ldots,t_p}(v_1, v_2, \ldots, v_p) - \prod_{i=1}^{p} f(v_i) \right| \le \delta_p(t_1, t_2, \ldots, t_p), \tag{3.6}$$

for p = 2,3,...,6 and $t_1 \ne t_2 \ne \cdots \ne t_p$, and

$$\sum_{\substack{t_1,t_2,\ldots,t_p = 1 \\ t_1 \ne t_2 \ne \cdots \ne t_p}}^{n} \delta_p(t_1, t_2, \ldots, t_p) = O(n^{p-1}). \tag{3.7}$$

Assumption 3.5. K(·) and K2(·) are symmetric and continuous kernel functions that satisfy $\int K(u)\,du = 1$, $\int u^{2i+1} K^j(u)\,du = 0$, and $\int u^{2i} K^j(u)\,du < \infty$ for i = 0,1,2 and j = 1,2,3,4, together with $\int K_2(u)\,du = 1$, $\int u^2 K_2(u)\,du < \infty$, and $\int |K'(u)|\,du < \infty$.

Assumption 3.6. As n → ∞, the bandwidths h and b satisfy $h \to 0$, $b \to 0$, $nh^{2k} \to \infty$, $nb^2 \to \infty$, and $b^2/(nh^{3k}) \to 0$, where k is the number of regressors in the model.

Remark 3.1. Most of the assumptions listed above are quite common in research papers

with regard to α-mixing error terms and nonparametric kernel estimations. Assumption

3.1 regulates the trend component in the regressors. By ruling out the case of collinearity,

it ensures that the coefficients can be properly identified. The boundedness indicates a

weaker case of nonstationarity compared to unit roots, and affects the rate of convergence

of the estimators. The condition that $\int_0^1 g(\tau)\,d\tau = 0$ can be easily satisfied if we center

the data by subtracting the mean. Assumption 3.3 contains the standard requirements

for the stationary α-mixing time series, which is the weakest condition among the weakly

dependent time series. The independence of ut and vt guarantees that the endogenous

correlation is separable by the control function. The restriction in Assumption 3.4 is rea-

sonable for the weakly dependent time series vt as the joint probability density converges

to the product of marginal densities when the distances between the time index become

sufficiently large. This assumption is verified by an example of an AR(1) process at the beginning of Appendix A. The conditions in Assumption 3.5 can be easily satisfied, for example by the Epanechnikov kernel function. The conditions in Assumption 3.6 are valid when the bandwidth is selected as the usual optimal bandwidth, which is of order $n^{-1/5}$.
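As a quick check of these two claims, the following worked computation uses the Epanechnikov kernel $K(u) = \tfrac{3}{4}(1 - u^2)\mathbf{1}\{|u| \le 1\}$ and, for the bandwidth conditions, the illustrative choice $h \propto n^{-1/5}$, $b \propto n^{-1/5}$ with a single regressor (k = 1). The last condition is read here as $b^2/(nh^{3k}) \to 0$, which is an assumption about the intended grouping.

$$
\int_{-1}^{1} \tfrac{3}{4}(1 - u^2)\,du = \tfrac{3}{4}\left(2 - \tfrac{2}{3}\right) = 1,
\qquad
\int_{-1}^{1} u^2\, \tfrac{3}{4}(1 - u^2)\,du = \tfrac{3}{4}\left(\tfrac{2}{3} - \tfrac{2}{5}\right) = \tfrac{1}{5},
\qquad
\int_{-1}^{1} |K'(u)|\,du = \int_{-1}^{1} \tfrac{3}{2}|u|\,du = \tfrac{3}{2}.
$$

The odd moments of $K^j$ vanish by symmetry, and the even moments are finite because K is bounded with compact support. For the bandwidths,

$$
nh^{2} \propto n^{3/5} \to \infty,
\qquad
nb^{2} \propto n^{3/5} \to \infty,
\qquad
\frac{b^{2}}{nh^{3}} \propto \frac{n^{-2/5}}{n^{2/5}} = n^{-4/5} \to 0 .
$$

With the same rate and k regressors, $nh^{2k} \propto n^{1 - 2k/5}$, which diverges only for k ≤ 2, so a slower-shrinking h would be needed in higher dimensions.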

3.2 Asymptotic theory

The first Theorem ensures that the information matrix converges to a positive definite limit

and therefore, the estimators are identifiable and properly defined.

Theorem 3.1. Under Assumptions 3.1 to 3.6, as n →∞, we have

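(i)

$$\tilde\Sigma_n = \frac{1}{n}\sum_{t=1}^{n} \tilde x_t \tilde x_t' \longrightarrow_P Q; \tag{3.8}$$

(ii)

$$\hat\Sigma_n = \frac{1}{n}\sum_{t=1}^{n} \hat x_t \hat x_t' \longrightarrow_P Q, \tag{3.9}$$

where $\tilde x_t'$ and $\hat x_t'$ denote the t-th rows of $\tilde X = (I - W)X$ and $\hat X = (I - \hat W)X$ respectively.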

Remark 3.2. In fact, Q is the generalized ’variance-covariance matrix’ that measures the

variation in the time trends. Therefore, in the univariate case, a relatively flat time trend

leads to a small value of Q, which indicates insufficient information in the trend compo-

nent, and thus will cause inefficiency in the estimators as suggested in the next Theorem.

The detailed proofs of this Theorem are attached in Appendix B.

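Theorem 3.2. Let Assumptions 3.1 to 3.6 hold. Then, as n → ∞,

(i)

$$\sqrt{n}(\tilde\beta - \beta) \longrightarrow_D N(0, \Omega), \tag{3.10}$$

(ii)

$$\sqrt{n}(\hat\beta - \beta) \longrightarrow_D N(0, \Omega), \tag{3.11}$$

where $\Omega = Q^{-1}\Lambda Q^{-1}$, and Λ is the long-run variance-covariance matrix of $L_t = v_t u_t$. We then define an estimator for Ω as $\hat\Omega = \hat Q^{-1}\hat\Lambda\hat Q^{-1}$, in which $\hat Q = n^{-1}\sum_{t=1}^{n} \hat x_t \hat x_t'$ and⁵

$$\hat\Lambda = \sum_{l=-p}^{p} \hat\Gamma_L(l). \tag{3.12}$$

When computing the long-run variance, $\hat\Gamma_L(l)$ is the lag-l sample autocovariance matrix of $\hat L_t$, where $\hat L_t = \hat v_t \hat u_t$, $\hat v_t = x_t - \hat g(\tau_t)$, and $\hat u_t = y_t - x_t'\hat\beta - \hat\lambda(\hat v_t)$.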
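The variance estimator can be sketched numerically as below. This is a minimal illustration, assuming the equally weighted truncated sum in (3.12) with the series demeaned before the autocovariances are formed; the function names `long_run_variance` and `sandwich` and the simulated inputs are hypothetical. An equally weighted truncated sum is not guaranteed to be positive semi-definite in finite samples, which is why kernel-weighted versions such as Bartlett weights are a common alternative.

```python
import numpy as np

def long_run_variance(L, p=None):
    """Truncated sum of sample autocovariances, sum_{l=-p}^{p} Gamma_hat(l), for an (n, k) array L."""
    L = np.asarray(L, dtype=float)
    if L.ndim == 1:
        L = L[:, None]
    n = L.shape[0]
    p = int(np.sqrt(n)) if p is None else p     # integer part of sqrt(n) by default
    Lc = L - L.mean(axis=0)
    lam = Lc.T @ Lc / n                          # Gamma_hat(0)
    for l in range(1, p + 1):
        gamma = Lc[l:].T @ Lc[:-l] / n           # Gamma_hat(l) = (1/n) sum_t L_t L_{t-l}'
        lam += gamma + gamma.T                   # Gamma_hat(-l) is the transpose of Gamma_hat(l)
    return lam

def sandwich(x_res, v_hat, u_hat, p=None):
    """Q_hat^{-1} Lambda_hat Q_hat^{-1}; x_res and v_hat have shape (n, k), u_hat has shape (n,)."""
    n = x_res.shape[0]
    Q_inv = np.linalg.inv(x_res.T @ x_res / n)
    Lam = long_run_variance(v_hat * u_hat[:, None], p)
    return Q_inv @ Lam @ Q_inv

# Hypothetical inputs, only to show the shapes involved.
rng = np.random.default_rng(0)
n = 400
v_hat = rng.uniform(-0.5, 0.5, (n, 1))
u_hat = rng.uniform(-0.5, 0.5, n)
x_res = np.linspace(-1, 1, n)[:, None] + 0.1 * rng.standard_normal((n, 1))
omega_hat = sandwich(x_res, v_hat, u_hat, p=5)
print("standard error of beta_hat:", np.sqrt(np.diag(omega_hat) / n))
```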

Remark 3.3. Theorem 3.2 shows that the estimators are asymptotically unbiased and converge to the true value at the rate of $\sqrt{n}$. We show in the Appendix that the potential bias terms vanish faster than $n^{-1/2}$, so that they remain $o_p(1)$ even after multiplication by $\sqrt{n}$. The asymptotic variance depends on the matrix Q and the long-run variance of $L_t = v_t u_t$.

⁵ There are several methods to obtain consistent estimates of the long-run variance matrix. The integer p is usually taken as $[\sqrt{n}]$, the integer part of $\sqrt{n}$.

The estimation methods as well as the asymptotic properties can be applied to a broader

range of models. For example, when estimating a similar linear regression model with known

structural breaks, we can implement the proposed estimation methods for each regime.

We will demonstrate this generalization in the empirical example.

4 Examples

4.1 A simulated example

We consider a linear regression model with data generating process designed as follows.

$$\begin{cases} y_t = 0.7 x_t + e_t, \\ x_t = g(\tau_t) + v_t, \\ e_t = 2 v_t + u_t, \end{cases} \tag{4.1}$$

where xt contains deterministic time trends in the form of (1) linear, (2) polynomial (cubic), and (3) periodic (sine, cosine) functions. The endogenous correlation is λ(v) = 2v. vt and ut follow AR(1) processes, $v_t = 0.1 v_{t-1} + \xi_t$ and $u_t = 0.1 u_{t-1} + \eta_t$, where ξt and ηt are i.i.d. uniform on [−0.5, 0.5]. The three time trend specifications are as follows.

• case I: $g(\tau_t) = -1 + 2\tau_t$;

• case II: $g(\tau_t) = -0.25 - 2\tau_t + 3\tau_t^2 + \tau_t^3$;

• case III: g (τt ) = 0.8sin(2πτt ).
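The design above can be reproduced with a few lines of code. The sketch below simulates the three trend cases and reports the average bias of the no-intercept OLS slope; the function names, the seed, and the number of replications are illustrative choices and not those of the original study.

```python
import numpy as np

def ar1(innov, rho=0.1):
    """AR(1) recursion w_t = rho * w_{t-1} + innov_t, started at zero."""
    out = np.empty_like(innov)
    prev = 0.0
    for t, e in enumerate(innov):
        prev = rho * prev + e
        out[t] = prev
    return out

def simulate(n, case="I", beta=0.7, rng=None):
    """Draw (y, x, v) from the simulated design for one of the three trend cases."""
    rng = np.random.default_rng() if rng is None else rng
    tau = np.arange(1, n + 1) / n
    trends = {
        "I": -1 + 2 * tau,
        "II": -0.25 - 2 * tau + 3 * tau**2 + tau**3,
        "III": 0.8 * np.sin(2 * np.pi * tau),
    }
    v = ar1(rng.uniform(-0.5, 0.5, n))
    u = ar1(rng.uniform(-0.5, 0.5, n))
    x = trends[case] + v
    y = beta * x + 2 * v + u
    return y, x, v

def ols_slope(y, x):
    """No-intercept OLS slope, sum(x * y) / sum(x * x)."""
    return float(x @ y / (x @ x))

rng = np.random.default_rng(0)
for n in (400, 1000, 2000):
    draws = [ols_slope(*simulate(n, "I", rng=rng)[:2]) for _ in range(100)]
    print(n, "average OLS bias:", round(float(np.mean(draws)) - 0.7, 3))
```

The printed bias should stay roughly constant across the three sample sizes, which is the signature of inconsistency rather than small-sample noise.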

With sample size 400, 600 and 1000 respectively, we generate the data and estimate β. We

also compute the IV estimator and the OLS estimator to compare their asymptotic performance. As the time trends are deterministic components, and hence exogenous, it is natural to take the estimated time trends as IVs. Letting $z_t = \hat g(\tau_t)$, the IV estimator is defined as

$$\hat\beta_{IV} = \Big(\sum_{t=1}^{n} z_t x_t\Big)^{-1}\Big(\sum_{t=1}^{n} z_t y_t\Big). \tag{4.2}$$

Repeating the simulation procedure 500 times, we obtain 500 estimates of $\hat\beta$, $\hat\beta_{IV}$, and $\hat\beta_{OLS}$. Since the true value is β = 0.7, the bias, the standard deviation, and the root mean squared error are computed and reported in Table 1 below.

(I), (II) and (III) represent three cases of time trends in the data generating process.

Table 1: Estimation results for β = 0.7 with simulated data

                 β (proposed)                βIV                         βOLS
Case   n         400      600      1000      400      600      1000      400      600      1000
(I)    Bias      0.0141   0.0098   0.0084    0.0262   0.0174   0.0134    0.4042   0.4036   0.4048
       Std       0.0585   0.0504   0.0373    0.0598   0.0510   0.0382    0.0382   0.0320   0.0243
       RMSE      0.0602   0.0514   0.0382    0.0653   0.0539   0.0405    0.4060   0.4049   0.4056
(II)   Bias      0.0097   0.0072   0.0036    0.0195   0.0137   0.0072    0.3355   0.3365   0.3365
       Std       0.0529   0.0426   0.0330    0.0534   0.0434   0.0340    0.0371   0.0292   0.0232
       RMSE      0.0538   0.0432   0.0332    0.0568   0.0455   0.0347    0.3376   0.3378   0.3373
(III)  Bias     -0.0043  -0.0102  -0.0138    0.0324   0.0234   0.0172    0.4157   0.4163   0.4161
       Std       0.0607   0.0510   0.0378    0.0627   0.0525   0.0387    0.0381   0.0333   0.0246
       RMSE      0.0608   0.0520   0.0402    0.0706   0.0575   0.0423    0.4175   0.4176   0.4169

Table 1 shows that the proposed estimator $\hat\beta$ converges to its true value, as the values of bias are quite small and the values of the RMSE shrink roughly in proportion to $1/\sqrt{n}$ as the sample size increases. As expected, the OLS estimator is observed to contain a positive bias,

hence it is inconsistent due to the problem of endogeneity. The IV estimator underper-

forms the proposed estimator in all three cases because the IVs are still correlated with the

error term in the regression. In fact, they are just the smoothed versions of the regressors,

hence the correlation has not been removed, though it has been weakened in the process

of nonparametric kernel smoothing.
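The size of the OLS bias in Table 1 can be cross-checked by hand. The computation below is a back-of-the-envelope sketch that uses the large-sample limit of the no-intercept OLS slope for this design, $\mathrm{Cov}(v_t, e_t) / (\int_0^1 g^2(\tau)\,d\tau + \mathrm{Var}(v_t))$, and relies only on the design parameters stated above.

$$
\mathrm{Var}(\xi_t) = \tfrac{1}{12},
\qquad
\mathrm{Var}(v_t) = \frac{1/12}{1 - 0.1^2} \approx 0.0842,
\qquad
\mathrm{Cov}(v_t, e_t) = 2\,\mathrm{Var}(v_t) \approx 0.1684 .
$$

$$
\int_0^1 g^2\,d\tau =
\begin{cases}
1/3 \approx 0.3333 & \text{case I}, \\
\approx 0.4137 & \text{case II}, \\
0.32 & \text{case III},
\end{cases}
\qquad
\text{limit of the OLS bias} \approx
\begin{cases}
0.1684 / 0.4175 \approx 0.403 & \text{case I}, \\
0.1684 / 0.4979 \approx 0.338 & \text{case II}, \\
0.1684 / 0.4042 \approx 0.417 & \text{case III}.
\end{cases}
$$

These values are close to the simulated OLS biases reported in Table 1 (about 0.404, 0.336, and 0.416), and they do not depend on n.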

4.2 An empirical example

We consider the relationship between the aggregate personal consumption and income.

We use the quarterly data of U.S. aggregate personal income and personal consumption

expenditure from 1947 to 2009. The plot of the data is shown in Figure 1 below.

Figure 1: Logarithm of aggregate personal consumption and income (1947 - 2009, quarterly). [Plot omitted. Horizontal axis: Time. Vertical axis: Log(Income), Log(Consumption) (Billion Dollars). Series: Log(Personal Income), Log(Personal Consumption Expenditure).]

As the coefficients in the model may not be a constant value over 62 years, we divide the

whole time period into three subperiods with equal lengths and allow the coefficient β to

be three different values for each time period. Meanwhile, we subtract the mean of the data in each period so that the intercept term can be ignored in the regression. In fact, we are estimating a linear model with structural breaks formulated as

$$\begin{cases} c_{1t} = i_{1t}\beta_1 + \lambda(v_t) + u_t & \text{for } 1 \le t \le t_1, \\ c_{2t} = i_{2t}\beta_2 + \lambda(v_t) + u_t & \text{for } t_1 + 1 \le t \le t_2, \\ c_{3t} = i_{3t}\beta_3 + \lambda(v_t) + u_t & \text{for } t_2 + 1 \le t \le n, \end{cases} \tag{4.3}$$

where $t_1 = [n/3]$, $t_2 = [2n/3]$,⁶ and $c_{jt} = c_t - \bar c_j$, $i_{jt} = i_t - \bar i_j$, for j = 1,2,3. Here ct and it are the logarithms of aggregate personal consumption and personal income respectively, while $\bar c_j$ and $\bar i_j$ are the averages of ct and it in period j. Meanwhile, in each time period, the

⁶ [x] denotes the integer part of the real value x.

logarithm of aggregate income has an unknown time trend as

$$i_{jt} = g_j(\tau_t) + v_t, \tag{4.4}$$

where g j (τt ) can be estimated using nonparametric kernel method for τt ∈ [0,1]. In each

time period, we estimate β as well as its standard errors. The values of the estimates are

reported in Table 2.

            1947q1-1967q4      1968q1-1988q4      1989q1-2009q4
β (s.e.)    0.9470 (0.0054)    1.0006 (0.0080)    1.0828 (0.0049)

Table 2: Estimates of β with standard errors in the parentheses.

As the estimated coefficients are increasing over time, Table 2 suggests that the aggregate

consumption becomes more sensitive to income changes. Furthermore, the changes are

significantly non-zero, as the gaps between the coefficients are several times the standard errors. Once we have the estimates of the coefficients in each period, we can obtain the residuals by $\hat e_t = c_{jt} - i_{jt}\hat\beta_j$. We are then able to evaluate the control function λ(·) over a sequence of points zi from −0.018 to 0.018 at intervals of 0.0005 through nonparametric kernel estimation.

In Figure 2, the solid line represents the estimated values of λ(·), and the gray area forms its 90% confidence band. As in any nonparametric estimation, the data speak for themselves, hence the estimated solid line is quite informative for suggesting an adequate parametric form for the endogenous correlation. Particularly in this example, it can be closely approximated by a cubic function as

$$\hat\lambda_i = \underset{(0.0093)^{***}}{-0.3497}\, z_i + \underset{(0.2646)^{***}}{1.3071}\, z_i^2 - \underset{(42.93)^{***}}{2303.9}\, z_i^3, \tag{4.5}$$

for zi ∈ [−0.018, 0.018]. As the fitted cubic line lies entirely inside the 90% confidence band, it is reasonable to model the endogenous correlation with a cubic parametric form over this range.
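A cubic approximation without a constant term, of the kind reported in (4.5), can be obtained by ordinary least squares on the evaluation grid. The sketch below is illustrative only: the function name `fit_cubic_no_intercept` is hypothetical, and since the estimated control function values are not available here, the example feeds in values generated from the reported coefficients simply to show that the routine recovers them.

```python
import numpy as np

def fit_cubic_no_intercept(z, lam):
    """Least-squares fit of lam ~ a1*z + a2*z^2 + a3*z^3 (no constant term)."""
    design = np.column_stack([z, z**2, z**3])
    coef, *_ = np.linalg.lstsq(design, lam, rcond=None)
    return coef

# Grid from -0.018 to 0.018 in steps of 0.0005 (73 points).
z = np.linspace(-0.018, 0.018, 73)
# Stand-in for the kernel estimates of lambda(z): values implied by the reported cubic.
lam = -0.3497 * z + 1.3071 * z**2 - 2303.9 * z**3
print(fit_cubic_no_intercept(z, lam))
```

With real kernel estimates in place of the stand-in values, standard errors for the three coefficients would have to account for the fact that neighbouring grid values are strongly dependent.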

Figure 2: Nonparametric kernel estimation of λ(·), its 90% confidence band, and a cubic approximation. [Plot omitted. Horizontal axis: Estimated vt. Vertical axis: Estimated λ(vt). Legend: 90% Confidence Band, Estimated λ(vt), Cubic approximation.]

Remark 4.1. Since the dotted zero line is not contained in the 90% confidence band, the

control function is significantly nonzero, and this conclusion verifies the existence of the

endogenous correlation between aggregate income and consumption. Therefore, direct

OLS regression leads to biased and inconsistent estimations and it is necessary to take into

account the problem of endogeneity.

In addition, as the error terms are supposed to be stationary time series, we examine the

plots of residuals as well as their autocorrelation functions in Figure 3 and Figure 4 below.

By visual inspection, we are able to conclude that all the residuals fluctuate around a stable value and do not contain nonstationary trends. Figure 4 suggests that the residuals $\hat v_t$ behave much like a white noise process, though there are marginally significant autocorrelations at some lags. $\hat u_t$ seems to follow an AR process since its sample autocorrelation function decays over the lags.

Figure 3: Plot of the residuals, vt, et, and ut. [Plots omitted. Three panels with vertical axes Estimated vt, Estimated et, and Estimated ut; horizontal axis: t.]

Figure 4: Plot of the autocorrelation functions of vt and ut. [Plots omitted. Two panels: Sample Autocorrelation Function of Estimated vt, and Sample Autocorrelation Function of Estimated ut; horizontal axis: Lag; vertical axis: ACF.]

Since we have shown that the endogenous correlation does exist, the conventional OLS es-

timator would be biased and inconsistent. The bias can be corrected through the use of

control functions, and such correction brings better out-of-sample forecast performance

in real examples. We demonstrate this advantage by computing the l-step ahead forecasts

using rolling samples with length n0 = [n/3] for the proposed estimator and the OLS esti-

mator. The forecast procedures are stated below.

Suppose at time t, 1 ≤ t ≤ n − n0 − l,

Step 1: Construct the initial training data set as

$$D_0 = \{x_t, \ldots, x_{t+n_0-1};\; y_t, \ldots, y_{t+n_0-1}\}.$$

Using D0, we are able to estimate $\hat\beta$ and $\hat\beta_{ols}$, and the function $\hat\lambda(\cdot)$.

Step 2: For the one-step-ahead forecast, take $x_{t+n_0}$ as given. Estimate the nonlinear time trend g(·) using

$$D_1 = \{x_t, \ldots, x_{t+n_0-1}, x_{t+n_0}\}.$$

We can then estimate the shock $\hat v_{t+n_0} = x_{t+n_0} - \hat g_t(1)$.

Step 3: Calculate the forecast of $y_{t+n_0}$ for the proposed model,

$$\hat y_{t+n_0} = x_{t+n_0}\hat\beta + \hat\lambda(\hat v_{t+n_0}), \tag{4.6}$$

and the forecast of the conventional linear regression model using the OLS estimates,

$$\hat y^*_{t+n_0} = x_{t+n_0}\hat\beta_{ols}. \tag{4.7}$$

Step 4: For the case l > 1, update the training data sets by adding the forecast values of yt under the proposed model and the OLS regression respectively, that is,

$$D_0' = \{x_t, \ldots, x_{t+n_0-1}, x_{t+n_0};\; y_t, \ldots, y_{t+n_0-1}, \hat y_{t+n_0}\},$$

and

$$D_0^* = \{x_t, \ldots, x_{t+n_0-1}, x_{t+n_0};\; y_t, \ldots, y_{t+n_0-1}, \hat y^*_{t+n_0}\}.$$

Then repeat Step 2 and Step 3 for a further ’one-step-ahead forecast’ using $D_0'$ and $D_0^*$ until the l-step-ahead forecast is obtained. To measure the forecast performance, we compute the root mean squared forecast error as

$$RMSFE(l) = \sqrt{\frac{1}{n - n_0 - l + 1}\sum_{t=n_0}^{n-l} \frac{1}{l}\sum_{p=1}^{l} (y_{t+p} - \hat y_{t+p})^2}, \tag{4.8}$$

where $\hat y_{t+p}$ is the p-step-ahead forecast, for p = 1,2,...,l. We compute the RMSFE for l = 1,2,3,4. The results are reported in the table below.⁷

l              1        2        3        4
RMSFE          0.0144   0.0153   0.0157   0.0160
RMSFE (OLS)    0.0254   0.0254   0.0255   0.0255

Table 3: RMSFE for our model compared to conventional OLS regression.

Remark 4.2. Table 3 shows that, on average, the RMSFE is reduced by approximately 40% when the endogenous correlation is taken into account. Although the estimates of $\beta$ may be quite close, the forecast performance can be quite different because the time series contain nonstationary trends; hence the bias in $\hat\beta_{ols}$ is magnified when it is used to forecast future values of $y_t$.
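The error measure (4.8) and the "approximately 40%" figure in Remark 4.2 can be checked with a short calculation. The function name `rmsfe` and the array layout (one row per forecast origin, one column per horizon step) are choices made for this sketch; the numbers are those reported in Table 3.

```python
import numpy as np

def rmsfe(actual, forecasts):
    """RMSFE as in (4.8); inputs have shape (forecast origins, l)."""
    actual = np.asarray(actual, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    per_origin = np.mean((actual - forecasts) ** 2, axis=1)  # (1/l) * sum over p
    return float(np.sqrt(np.mean(per_origin)))               # mean over origins, then root

# Values reported in Table 3 for l = 1, 2, 3, 4.
rmsfe_model = np.array([0.0144, 0.0153, 0.0157, 0.0160])
rmsfe_ols = np.array([0.0254, 0.0254, 0.0255, 0.0255])

reduction = 1.0 - rmsfe_model / rmsfe_ols
print(np.round(reduction, 3))              # [0.433 0.398 0.384 0.373]
print(round(float(reduction.mean()), 3))   # 0.397
```

The relative reduction is largest at the one-step horizon and shrinks slowly as $l$ grows, with an average close to 40% over the four horizons.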

5 Conclusions and discussion

In this paper, we have studied the linear regression model with endogenous trending time series. The regressors are supposed to contain a flexible deterministic time trend that can be estimated nonparametrically. This form of nonstationarity is taken as a complement to the unit root process, which exhibits stochastic trends. We also consider the problem of endogeneity, which usually occurs in practice. The control function approach is employed to deal with the endogenous correlation, and it extends the linear regression to a semiparametric partially linear model.

We have proved that the conventional methods are still applicable for the model proposed in this paper, though the usual identification condition is not satisfied. Asymptotic properties show that the estimators are $\sqrt{n}$-consistent with a normal limiting distribution. Such properties are examined by Monte Carlo simulations.

Footnote 7: For some $t$, the estimated $\hat v_{t+n_0+l-1}$ is an outlier among the $\hat v_t$, hence we cannot calculate $\hat\lambda(\hat v_{t+n_0+l-1})$ using the nonparametric kernel. Therefore, we ignored the time points in this case.

The relationship between aggregate income and aggregate consumption is revisited as an empirical example. We found that endogeneity is significantly present in the regression. In addition, our proposed model performs better than the usual ordinary least squares in terms of out-of-sample forecasts.

The model can be extended in several directions. First, we can allow the coefficients to be time-varying, as the real example suggests that the coefficients can hardly remain at a constant value. Second, as the "curse of dimensionality" may occur when the dimension becomes large, we probably need to consider the control function in a single-index form. Finally, we can consider the nonlinear regression with a nonparametric regression form that avoids the risk of misspecification.

References

Bierens, H. J. (1997), ‘Testing the unit root with drift hypothesis against nonlinear trend stationarity, with an application to the US price level and interest rate’, Journal of Econometrics 81(1), 29–64.

Cai, Z. (2007), ‘Trending time-varying coefficient time series models with serially correlated errors’, Journal of Econometrics 136(1), 163–188.

Cai, Z. and Wang, Y. (2014), ‘Testing predictive regression models with nonstationary regressors’, Journal of Econometrics 178(1), 4–14.

Chen, J., Gao, J., Li, D. et al. (2012), ‘Estimation in semi-parametric regression with nonstationary regressors’, Bernoulli 18(2), 678–702.

DeJong, D. N., Nankervis, J. C., Savin, N. E. and Whiteman, C. H. (1992), ‘The power problems of unit root test in time series with autoregressive errors’, Journal of Econometrics 53(1), 323–343.

Diebold, F. X. and Rudebusch, G. D. (1991), ‘On the power of Dickey-Fuller tests against fractional alternatives’, Economics Letters 35(2), 155–160.

Fan, J. and Gijbels, I. (1996), Local polynomial modelling and its applications: monographs on statistics and applied probability 66, Vol. 66, CRC Press.

Gao, J. (2007), Nonlinear time series: semiparametric and nonparametric methods, Chapman & Hall, London.

Gao, J. and Hawthorne, K. (2006), ‘Semiparametric estimation and testing of the trend of temperature series’, The Econometrics Journal 9(2), 332–355.

Hall, R. E. (1978), ‘Stochastic implications of the life cycle-permanent income hypothesis: Theory and evidence’, The Journal of Political Economy 86(6), 971–987.

Kwiatkowski, D., Phillips, P. C., Schmidt, P. and Shin, Y. (1992), ‘Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?’, Journal of Econometrics 54(1), 159–178.

Li, Q. and Racine, J. S. (2007), Nonparametric econometrics: theory and practice, Princeton University Press.

Nelson, C. R. and Plosser, C. R. (1982), ‘Trends and random walks in macroeconomic time series: some evidence and implications’, Journal of Monetary Economics 10(2), 139–162.

Perron, P. (1990), ‘Testing for a unit root in a time series with a changing mean’, Journal of Business & Economic Statistics 8(2), 153–162.

Phillips, P. C. (2005), ‘Challenges of trending time series econometrics’, Mathematics and Computers in Simulation 68(5), 401–416.

Phillips, P. C. and Hansen, B. E. (1990), ‘Statistical inference in instrumental variables regression with I(1) processes’, The Review of Economic Studies 57(1), 99–125.

Robinson, P. (1988), ‘Root-n-consistent semiparametric regression’, Econometrica 56(4), 931–954.

Robinson, P. M. (1989), Nonparametric estimation of time-varying parameters, Springer.

Robinson, P. M. (1991), Time-varying nonlinear regression, Springer.

Robinson, P. M. (2012), ‘Nonparametric trending regression with cross-sectional dependence’, Journal of Econometrics 169(1), 4–14.

Zivot, E. and Andrews, D. (1992), ‘Further evidence on the great crash, the oil-price shock, and the unit-root hypothesis’, Journal of Business & Economic Statistics 10(3), 251–270.


Appendix A Some remarks on the Assumptions

A.1 Verification of Assumption 3.4

We assumed that the stationary α-mixing sequence $v_t$ satisfies

$$\left| f_{t_1,t_2,\ldots,t_p}(v_1, v_2, \ldots, v_p) - \prod_{i=1}^{p} f(v_i) \right| \le \delta_p(t_1, t_2, \ldots, t_p), \tag{A.1}$$

where

$$\sum_{\substack{t_1,t_2,\ldots,t_p=1 \\ t_1 \ne t_2 \ne \cdots \ne t_p}}^{n} \delta_p(t_1, t_2, \ldots, t_p) = O(n^{p-1}), \tag{A.2}$$

for $p = 2, 3, \ldots, 6$. In addition to the mixing conditions, this assumption describes the asymptotic independence of the mixing sequence in terms of the joint density and the marginal density. Suppose $p = 2$, and the sequence $v_t$ follows an AR(1) process,

$$v_t = \rho v_{t-1} + \varepsilon_t, \tag{A.3}$$

where $0 < \rho < 1$ and $\varepsilon_t \overset{i.i.d.}{\sim} N(0, 1-\rho^2)$. Therefore, the marginal distribution of $v_t$ is the standard normal distribution. Meanwhile, letting $j = |s - t|$, the joint density of $v_t$ and $v_s$ is

$$f_j(x, y) = \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right). \tag{A.4}$$

Therefore, our objective is to show that

$$\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \delta_2(t, s) = O(n), \tag{A.5}$$

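where $\delta_2(t, s)$ is an upper bound of $|f_{s,t}(x, y) - f(x) f(y)|$. Note that

$$\begin{aligned}
|f_j(x, y) - f(x) f(y)| &= \left| \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&\le \left| \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&\quad + \left| \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2}{2}\right) - \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&=: F_1(j) + F_2(j),
\end{aligned} \tag{A.6}$$

where

$$\begin{aligned}
F_1(j) &= \left| \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&\le \left| \frac{1}{2\pi\sqrt{1-\rho^{2j}}} \right| \left| \exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&\le \left| \frac{1}{2\pi} \right| \left| \exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right) + \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&\le \left| \frac{1}{2\pi} \right| \left| \exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right) \right| \\
&\quad + \left| \frac{1}{2\pi} \right| \left| \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&=: F_{11}(j) + F_{12}(j).
\end{aligned} \tag{A.7}$$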

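Let $A = (x^2 + y^2)/2 > 0$; therefore,

$$\begin{aligned}
\left| \sum_{j=1}^{\infty} F_{11}(j) \right| &\le \sum_{j=1}^{\infty} \left| \frac{1}{2\pi} \right| \left| \exp\left(-\frac{x^2 + y^2 - 2\rho^{j} x y}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right) \right| \\
&= \left| \frac{1}{2\pi} \right| \sum_{j=1}^{\infty} \left| \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right)\left( \exp\left(\frac{\rho^{j} x y}{1-\rho^{2j}}\right) - 1 \right) \right| \\
&\le \left| \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right) \right| \sum_{j=1}^{\infty} \left| \exp\left(\frac{\rho^{j} x y}{1-\rho^{2j}}\right) - 1 \right| \\
&= \left| \frac{\exp(-A)}{2\pi} \right| \sum_{j=1}^{\infty} \left| \sum_{k=1}^{\infty} \frac{(x y)^k}{k!}\left(\frac{\rho^{j}}{1-\rho^{2j}}\right)^{k} \right| \\
&\le \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!} \sum_{j=1}^{\infty} \left(\frac{\rho^{j}}{1-\rho^{2j}}\right)^{k}.
\end{aligned} \tag{A.8}$$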

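Also note that

$$\begin{aligned}
\left| \sum_{j=1}^{\infty} F_{12}(j) \right| &\le \sum_{j=1}^{\infty} \left| \frac{1}{2\pi} \right| \left| \exp\left(-\frac{x^2 + y^2}{2(1-\rho^{2j})}\right) - \exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&= \sum_{j=1}^{\infty} \left| \frac{\exp(-A)}{2\pi} \right| \left| \exp\left(-\frac{A}{1-\rho^{2j}} + A\right) - 1 \right| \\
&= \frac{\exp(-A)}{2\pi} \sum_{j=1}^{\infty} \left| \exp\left(-\frac{\rho^{2j} A}{1-\rho^{2j}}\right) - 1 \right| \\
&= \frac{\exp(-A)}{2\pi} \sum_{j=1}^{\infty} \left| \sum_{k=1}^{\infty} \frac{(-1)^k A^k}{k!}\left(\frac{\rho^{2j}}{1-\rho^{2j}}\right)^{k} \right| \\
&\le \frac{\exp(-A)}{2\pi} \sum_{k=1}^{\infty} \frac{A^k}{k!} \sum_{j=1}^{\infty} \left(\frac{\rho^{2j}}{1-\rho^{2j}}\right)^{k}.
\end{aligned} \tag{A.9}$$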

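Meanwhile,

$$\begin{aligned}
\left| \sum_{j=1}^{\infty} F_{2}(j) \right| &\le \sum_{j=1}^{\infty} \left| \frac{1}{2\pi\sqrt{1-\rho^{2j}}}\exp\left(-\frac{x^2 + y^2}{2}\right) - \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right) \right| \\
&= \sum_{j=1}^{\infty} \frac{\exp(-A)}{2\pi} \left| \frac{1}{\sqrt{1-\rho^{2j}}} - 1 \right| \\
&= \sum_{j=1}^{\infty} \frac{\exp(-A)}{2\pi} \left| \sum_{k=1}^{\infty} \frac{(-1)^k (2k-1)!!\, \rho^{2jk}}{2^k k!} \right| \\
&= \frac{\exp(-A)}{2\pi} \left| \sum_{k=1}^{\infty} \frac{(-1)^k (2k-1)!!}{2^k k!} \right| \sum_{j=1}^{\infty} \rho^{2jk} \\
&\le \frac{\exp(-A)}{2\pi} \sum_{k=1}^{\infty} \left| \frac{(-1)^k (2k-1)!!}{2^k k!} \right| \left| \frac{\rho^{2k}}{1-\rho^{2k}} \right| \\
&= \frac{\exp(-A)}{2\pi} \sum_{k=1}^{\infty} \left| \frac{2k-1}{2k} \times \frac{2k-3}{2(k-1)} \times \cdots \times \frac{5}{6} \times \frac{3}{4} \times \frac{1}{2} \right| \left| \frac{\rho^{2k}}{1-\rho^{2k}} \right| \\
&\le \frac{\exp(-A)}{2\pi} \sum_{k=1}^{\infty} \left| \frac{\rho^{2k}}{1-\rho^{2k}} \right|.
\end{aligned} \tag{A.10}$$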

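Define $k_1 = \left[\frac{\ln(1/2)}{2\ln\rho}\right] + 1$ for $0 < \rho < 1$ (see footnote 8). Therefore, we have $1 - \rho^{2j} > 1/2$ when $j > k_1$. Note that

$$\begin{aligned}
\left| \sum_{j=1}^{\infty} F_{11}(j) \right| &\le \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!} \sum_{j=1}^{\infty} \left(\frac{\rho^{j}}{1-\rho^{2j}}\right)^{k} \\
&\le \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!} \sum_{j=1}^{k_1} \left(\frac{\rho^{j}}{1-\rho^{2j}}\right)^{k} + \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!} \sum_{j=k_1+1}^{\infty} \left(\frac{\rho^{j}}{1-\rho^{2j}}\right)^{k} \\
&\le \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!}\, C + \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!} \sum_{j=k_1+1}^{\infty} \left(\frac{\rho^{j}}{1/2}\right)^{k} \\
&= \left| \frac{\exp(-A) C}{2\pi} \right| \sum_{k=1}^{\infty} \frac{|x y|^k}{k!} + \left| \frac{\exp(-A)}{2\pi} \right| \sum_{k=1}^{\infty} \frac{(2|x y|)^k}{k!} \frac{\rho^{k_1+1}}{1-\rho^{k}} \\
&= \left| \frac{\exp(-A) C}{2\pi} \right| \left(\exp(|x y|) - 1\right) + \left| \frac{\exp(-A)}{2\pi} \right| \rho^{k_1+1} \sum_{k=1}^{k_1} \frac{(2|x y|)^k}{k!\,(1-\rho^{k})} + \left| \frac{\exp(-A)}{2\pi} \right| \rho^{k_1+1} \sum_{k=k_1+1}^{\infty} \frac{(2|x y|)^k}{k!\,(1-\rho^{k})} \\
&\le \left| \frac{\exp(-A) C}{2\pi} \right| \left(\exp(|x y|) - 1\right) + \left| \frac{\exp(-A)}{2\pi} \right| \rho^{k_1+1} C + \left| \frac{\exp(-A)}{2\pi} \right| \rho^{k_1+1} \sum_{k=k_1+1}^{\infty} \frac{(2|x y|)^k}{k!\,(1/2)} \\
&\le \left| \frac{\exp(-A) C}{2\pi} \right| \left(\exp(|x y|) - 1\right) + \left| \frac{\exp(-A)}{2\pi} \right| \rho^{k_1+1} C + \left| \frac{\exp(-A)}{\pi} \right| \rho^{k_1+1} \left(\exp(2|x y|) - 1\right) < \infty.
\end{aligned} \tag{A.11}$$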

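Following the same method, it can be easily shown that

$$\left| \sum_{j=1}^{\infty} F_{12}(j) \right| < \infty, \tag{A.12}$$

and

$$\left| \sum_{j=1}^{\infty} F_{2}(j) \right| < \infty. \tag{A.13}$$

Therefore,

$$\sum_{j=1}^{\infty} |f_j(x, y) - f(x) f(y)| < \infty, \tag{A.14}$$

which is sufficient to show that

$$\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \delta_2(t, s) = \sum_{t=1}^{n-1}\sum_{s=t+1}^{n} \delta_2(t, s) = \sum_{t=1}^{n-1}\sum_{j=1}^{n-t} \delta_2(t, s) = O(n). \tag{A.15}$$

This result can be generalized to the cases with more than two variables, i.e.,

$$\sum_{\substack{t_1,t_2,\ldots,t_p=1 \\ t_1 \ne t_2 \ne \cdots \ne t_p}}^{n} \delta_p(t_1, t_2, \ldots, t_p) = O(n^{p-1}). \tag{A.16}$$

Footnote 8: $[x]$ denotes the integer part of $x$.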
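The AR(1) example can also be examined numerically. The sketch below evaluates the gap between the joint density (A.4) and the product of standard normal marginals on a grid, and computes the cutoff $k_1$ used before (A.11). The value $\rho = 0.5$, the grid, the truncation at 40 lags, and the function names are illustrative assumptions, not part of the proof.

```python
import numpy as np

def joint_density(x, y, rho, j):
    """Joint density (A.4) of (v_t, v_{t+j}) for the stationary AR(1)."""
    r = rho ** j
    quad = (x ** 2 + y ** 2 - 2.0 * r * x * y) / (2.0 * (1.0 - r ** 2))
    return np.exp(-quad) / (2.0 * np.pi * np.sqrt(1.0 - r ** 2))

def marginal_density(x):
    """Standard normal marginal density."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def sup_gap(rho, j, grid):
    """Largest |f_j(x, y) - f(x) f(y)| over a square grid."""
    x, y = np.meshgrid(grid, grid)
    gap = joint_density(x, y, rho, j) - marginal_density(x) * marginal_density(y)
    return float(np.max(np.abs(gap)))

def k1(rho):
    """Cutoff [ln(1/2) / (2 ln rho)] + 1, so that 1 - rho^(2j) > 1/2 for j > k1."""
    return int(np.floor(np.log(0.5) / (2.0 * np.log(rho)))) + 1

rho = 0.5
grid = np.linspace(-4.0, 4.0, 161)
gaps = np.array([sup_gap(rho, j, grid) for j in range(1, 41)])
partial_sums = np.cumsum(gaps)   # should level off if the sum over j is finite
```

On this grid the gaps shrink roughly geometrically in $j$, so the partial sums settle quickly, which is the behaviour that (A.14) asserts pointwise.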

A.2 Additional assumptions for proofs

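In addition to the Assumptions listed in the paper, we need some extra conditions to hold for the marginal and joint density of the stationary mixing time series $v_t$.

1. $\int \left| \frac{w}{f(z)^2} \right| dz < \infty$, where $w$ can be $1$, $z$, or $\xi_j(z)$ for $j = 1, 2, \ldots, k$;

2. $\left| \int \frac{f^{(2)(i_1,i_2)}_{t_1,t_2,t_3}(z, z, z)}{f(z)^2}\, dz \right| < \infty$, for $1 \le i_1 < i_2 \le 3$;

3. $\left| \int \zeta_i(z) f^{(1)}(z)\, dz \right| < \infty$, for $i = 1, 2, \ldots, k$;

4. $\left| \int z f^{(2)}(z)\, dz \right| < \infty$;

5. $\max_{s \ne t} \left| \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz \right| < \infty$;

6. $\max_{t \ne s} \left| \int \frac{z^{2+\delta} f_{t,s}(z, z)}{f(z)^{2+\delta}}\, dz \right| < \infty$;

7. $\max_{t_1 \ne t_2 \ne t_3 \ne t_4 \ne t_5} \left| \int \frac{\zeta_i(z)\zeta_j(z)}{f(z)^4} f^{(p)(i_1,\ldots,i_p)}_{t_1,t_2,t_3,t_4,t_5}(z, z, z, z, z)\, dz \right| < \infty$, for $1 \le i_1 < i_2 < \cdots < i_p \le 5$, $i, j = 1, 2, \ldots, k$;

8. $\max_{t_1 \ne t_2 \ne t_3 \ne t_4 \ne t_5 \ne t_6} \left| \iint \frac{\zeta_i(z_1)\zeta_j(z_2) f^{(p)(i_1,\ldots,i_p)}_{t_1,t_2,t_3,t_4,t_5,t_6}(z_1, z_2, z_1, z_1, z_2, z_2)}{f(z_1)^2 f(z_2)^2}\, dz_1 dz_2 \right| < \infty$, for $i, j = 1, 2, \ldots, k$;

9. $\max_{t_1 \ne t_2 \ne t_3} \left| \int \frac{\zeta_i(z)^{j} f_{t_1,t_2,t_3}(z, z, z)}{f(z)^4}\, dz \right| < \infty$, for $i = 1, 2, \ldots, k$ and $j = 0, 2$;

10. $\max_{t_1 \ne t_2 \ne t_3} \left| \int \frac{\zeta_i(z)\zeta_j(z) f^{(2)(i_1,i_2)}_{t_1,t_2,t_3}(z, z, z)}{f(z)^4}\, dz \right| < \infty$, for $i, j = 1, 2, \ldots, k$;

11. We require that for some vector of generic small values $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)'$, there exists a function $m(z, z, \ldots, z)$ such that

$$\left| f_{i_1,\ldots,i_p}(z + \varepsilon_1, \ldots, z + \varepsilon_p) - f_{i_1,\ldots,i_p}(z, \ldots, z) \right| \le m(z, z, \ldots, z)\,\|\varepsilon\|, \tag{A.17}$$

and

$$\int z^{i} f(z)^{j} \left(f^{(q)}(z)\right)^{r} m(z, \ldots, z)\, dz < \infty,$$

for $0 \le i \le 2+\delta$, $-2-\delta \le j \le 1$, $q = 1, 2$ and $r = 0, 1$.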

Remark A.1. These assumptions can be easily verified. For instance, let $k = 1$, $\lambda(z) = z$, and let $v_t$ follow an i.i.d. uniform distribution from $-1$ to $1$. Also let $\zeta(z) = \lambda^{(1)}(z) = 1$ and $f_{i_1,i_2,\ldots,i_p}(z, z, \ldots, z) = 1/2^p$ for $z \in [-1, 1]$. It is straightforward to check that Assumptions 1 to 10 are satisfied. In addition, Assumption 11 is the Lipschitz condition commonly imposed in nonparametric estimation.

Appendix B Proofs of the main Theorems

B.1 Proof of Theorem 3.1(i)

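Proof. Recall that $x_t$ is a $k$-dimensional vector of trending time series sequences, hence $\Sigma_n$ is a $k \times k$ matrix. To prove that $\Sigma_n \xrightarrow{p} Q$, it suffices to show that $\Sigma_n(i, j) \xrightarrow{p} Q(i, j)$ for $i, j = 1, \ldots, k$. Note that $\tilde x_{it} = x_{it} - \sum_{s=1}^{n} w_{ns}(t) x_{is}$ and $x_{it} = g_i(\tau_t) + v_{it}$; therefore,

$$\begin{aligned}
\Sigma_n(i, j) &= \frac{1}{n}\sum_{t=1}^{n} \tilde x_{it}\tilde x_{jt} = \frac{1}{n}\sum_{t=1}^{n} \left( x_{it} - \sum_{s=1}^{n} w_{ns}(t) x_{is} \right)\left( x_{jt} - \sum_{s=1}^{n} w_{ns}(t) x_{js} \right) \\
&= \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) + v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right) \\
&\qquad\quad \times \left( g_j(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) + v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right) \\
&= \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( g_j(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \left( v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right)\left( v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right) \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right) \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \left( v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right)\left( g_j(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \\
&=: S_1(i, j) + S_2(i, j) + S_{12}(i, j) + S_{21}(i, j).
\end{aligned} \tag{B.1}$$

Let $\bar g_{i,n} = n^{-1}\sum_{t=1}^{n} g_i(\tau_t)$ for $i = 1, 2, \ldots, k$, so that $\bar g_{i,n}$ denotes the sample average of the trend component of $x_i$. We can further decompose $S_1(i, j)$ as

$$\begin{aligned}
S_1(i, j) &= \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( g_j(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \\
&= \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \bar g_{i,n} + \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( g_j(\tau_t) - \bar g_{j,n} + \bar g_{j,n} - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \\
&= \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \bar g_{i,n} \right)\left( g_j(\tau_t) - \bar g_{j,n} \right) \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( \bar g_{j,n} - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \bar g_{i,n} \right)\left( \bar g_{j,n} - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \\
&\quad + \frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( g_j(\tau_t) - \bar g_{j,n} \right) \\
&=: M_1(i, j) + M_2(i, j) + M_{12}(i, j) + M_{21}(i, j).
\end{aligned} \tag{B.2}$$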

The following proposition states the probability limits of each term in (B.1) and (B.2), based

on which we are able to complete the proof.

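Proposition B.1. Under Assumptions 3.1 to 3.6, for any $i, j = 1, 2, \ldots, k$, as $n \to \infty$,

(i) $M_1(i, j) \xrightarrow{p} Q(i, j)$, $M_2(i, j) \xrightarrow{p} 0$, $M_{12}(i, j) \xrightarrow{p} 0$, $M_{21}(i, j) \xrightarrow{p} 0$, hence $S_1(i, j) \xrightarrow{p} Q(i, j)$;

(ii) $S_2(i, j) \xrightarrow{p} 0$;

(iii) $S_{12}(i, j) \xrightarrow{p} 0$, $S_{21}(i, j) \xrightarrow{p} 0$.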

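Proof. (i) (ii) See Appendix C.

(iii) By the Cauchy-Schwarz inequality, given that (i) and (ii) hold,

$$\begin{aligned}
|S_{12}(i, j)| &= \left| n^{-1}\sum_{t=1}^{n} \left( g_i(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right) \right| \\
&\le \left| n^{-1}\sum_{t=1}^{n} \left( g_i(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)^2 \right|^{1/2} \left| n^{-1}\sum_{t=1}^{n} \left( v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right)^2 \right|^{1/2} \\
&= |S_1(i, i)|^{1/2}\, |S_2(j, j)|^{1/2} = O_P(1)\, o_P(1) = o_P(1).
\end{aligned} \tag{B.3}$$

The same result holds for $S_{21}(i, j)$.

Hence, for $i, j = 1, \ldots, k$,

$$\frac{1}{n}\sum_{t=1}^{n} \tilde x_{it}\tilde x_{jt} \xrightarrow{p} \int_0^1 \left( g_i(\tau) - \bar g_i \right)\left( g_j(\tau) - \bar g_j \right) d\tau, \tag{B.4}$$

i.e.,

$$\Sigma_n(i, j) \xrightarrow{p} Q(i, j). \tag{B.5}$$

Therefore, the convergence of every element in $\Sigma_n$ yields

$$\Sigma_n \xrightarrow{p} Q, \tag{B.6}$$

as $n \to \infty$.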

B.2 Proof of Theorem 3.1(ii)

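Theorem 3.1(ii) shows that, for the feasible estimator $\hat\beta$, the matrix $\hat\Sigma_n = n^{-1}\sum_{t=1}^{n} \hat x_t \hat x_t'$ also converges in probability to the positive definite matrix $Q$ as in Theorem 3.1(i). To prove the convergence, we first introduce the following decomposition:

$$\begin{aligned}
\frac{1}{n}\sum_{t=1}^{n} \hat x_t \hat x_t' &= \frac{1}{n}\sum_{t=1}^{n} \left( \hat x_t - \tilde x_t + \tilde x_t \right)\left( \hat x_t - \tilde x_t + \tilde x_t \right)' \\
&= \frac{1}{n}\sum_{t=1}^{n} (\hat x_t - \tilde x_t)(\hat x_t - \tilde x_t)' + \frac{1}{n}\sum_{t=1}^{n} (\hat x_t - \tilde x_t)\tilde x_t' + \frac{1}{n}\sum_{t=1}^{n} \tilde x_t(\hat x_t - \tilde x_t)' + \frac{1}{n}\sum_{t=1}^{n} \tilde x_t \tilde x_t' \\
&= D_1(n) + D_2(n) + D_3(n) + D_4(n),
\end{aligned} \tag{B.7}$$

where we have proved that $D_4(n) \xrightarrow{p} Q$, and we only need to prove that $D_1(n)$ converges to 0 in probability. Since $x_t$ is bounded, by the Cauchy-Schwarz inequality, $D_2(n) \xrightarrow{p} 0$ and $D_3(n) \xrightarrow{p} 0$ if $D_1(n) = o_p(1)$. Without loss of generality, we assume $k = 1$ for simplicity, so that

$$D_1(n) = \frac{1}{n}\sum_{t=1}^{n} (\hat x_t - \tilde x_t)^2. \tag{B.8}$$

Hence we can complete the proof if the following proposition holds.

Proposition B.2. Under Assumptions 3.3 to 3.6,

$$D_1(n) = o_p(1). \tag{B.9}$$

The detailed proof of this proposition is provided in Appendix D.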
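The algebra behind (B.7) is a plain expansion and can be checked numerically for $k = 1$. In the sketch below the two residual series are arbitrary stand-ins generated at random, labelled `x_hat` and `x_tilde` for this illustration only; the point is the identity and the Cauchy-Schwarz bound on the cross terms, not the particular numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x_tilde = rng.normal(size=n)                  # stand-in for one residual series
x_hat = x_tilde + 0.05 * rng.normal(size=n)   # stand-in for the other, close to it

d = x_hat - x_tilde
D1 = float(np.mean(d * d))              # squared difference term
D2 = float(np.mean(d * x_tilde))        # cross term
D3 = float(np.mean(x_tilde * d))        # cross term (equal to D2 when k = 1)
D4 = float(np.mean(x_tilde * x_tilde))  # term already shown to converge to Q
lhs = float(np.mean(x_hat * x_hat))

# The four pieces add up to the left-hand side of (B.7), and the cross
# terms are controlled by D1 through the Cauchy-Schwarz inequality.
identity_gap = abs(lhs - (D1 + D2 + D3 + D4))
cross_bound = np.sqrt(D1 * D4)
```

Because $|D_2(n)| \le \sqrt{D_1(n) D_4(n)}$ and $D_4(n)$ stays bounded in probability, showing $D_1(n) = o_p(1)$ is enough to dispose of both cross terms.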

B.3 Proof of Theorem 3.2(i) and Theorem 3.2(ii)

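Substitute $y_t$ in the proposed estimator (2.7), and note that $x_t = g(\tau_t) + v_t$:

$$\begin{aligned}
\tilde\beta &= \left( \sum_{t=1}^{n} \tilde x_t \tilde x_t' \right)^{-1} \left( \sum_{t=1}^{n} \tilde x_t \left( \tilde x_t'\beta + \tilde\lambda(v_t) + \tilde u_t \right) \right) \\
&= \beta + \left( \sum_{t=1}^{n} \tilde x_t \tilde x_t' \right)^{-1} \left( \sum_{t=1}^{n} \tilde x_t \tilde\lambda(v_t) \right) + \left( \sum_{t=1}^{n} \tilde x_t \tilde x_t' \right)^{-1} \left( \sum_{t=1}^{n} \tilde g(\tau_t) \tilde u_t \right) + \left( \sum_{t=1}^{n} \tilde x_t \tilde x_t' \right)^{-1} \left( \sum_{t=1}^{n} \tilde v_t \tilde u_t \right) \\
&= \beta + B_n + C_n + \left( \sum_{t=1}^{n} \tilde x_t \tilde x_t' \right)^{-1} \left( \sum_{t=1}^{n} \tilde v_t \tilde u_t \right).
\end{aligned} \tag{B.10}$$

Therefore,

$$\sqrt{n}\left( \tilde\beta - \beta - B_n - C_n \right) = \left( \frac{1}{n}\sum_{t=1}^{n} \tilde x_t \tilde x_t' \right)^{-1} \left( \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde v_t \tilde u_t \right). \tag{B.11}$$

Theorem 3.1(i) implies that

$$\frac{1}{n}\sum_{t=1}^{n} \tilde x_t \tilde x_t' \xrightarrow{p} Q. \tag{B.12}$$

For the latter part,

$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde v_t \tilde u_t &= \frac{1}{\sqrt{n}}\sum_{t=1}^{n} (v_t - \bar v_t)(u_t - \bar u_t) \\
&= \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \left( v_t u_t - v_t \bar u_t - \bar v_t u_t + \bar v_t \bar u_t \right).
\end{aligned} \tag{B.13}$$

Under Assumptions 3.3 to 3.6, by the Central Limit Theorem (CLT) for mixing processes (see Fan and Yao (2003), Theorem 2.21), as $n \to \infty$,

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} v_t u_t \xrightarrow{d} N(0, \Lambda), \tag{B.14}$$

where $\Lambda$ is the long-run variance of $L_t = v_t u_t$, that is,

$$\Lambda = E\left[ L_t L_t' \right] + 2\sum_{j=1}^{\infty} E\left[ L_t L_{t-j}' \right]. \tag{B.15}$$

The remaining terms are all negligible, given that the following proposition holds.

Proposition B.3. Under Assumptions 3.3 to 3.6,

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} v_t \bar u_t = o_p(1), \tag{B.16}$$

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \bar v_t u_t = o_p(1), \tag{B.17}$$

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \bar v_t \bar u_t = o_p(1). \tag{B.18}$$

Therefore, as $n$ goes to infinity, we have

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde v_t \tilde u_t \xrightarrow{d} N(0, \Lambda). \tag{B.19}$$

By Slutsky's theorem,

$$\sqrt{n}\left( \tilde\beta - \beta - B_n - C_n \right) \xrightarrow{d} N(0, \Omega), \tag{B.20}$$

where $\Omega = Q^{-1}\Lambda Q^{-1}$. Similarly, we have

$$\sqrt{n}\left( \hat\beta - \beta - \hat B_n - \hat C_n \right) \xrightarrow{d} N(0, \Omega). \tag{B.21}$$

The following propositions suggest that the bias terms $B_n$, $\hat B_n$ and $C_n$, $\hat C_n$ are negligible.

Proposition B.4. Let Assumptions 3.3 to 3.6 hold. Then

$$\sqrt{n} B_n = o_p(1), \tag{B.22}$$

and

$$\sqrt{n} \hat B_n = o_p(1). \tag{B.23}$$

Proposition B.5. Let Assumptions 3.3 to 3.6 hold. Then

$$\sqrt{n} C_n = o_p(1), \tag{B.24}$$

and

$$\sqrt{n} \hat C_n = o_p(1). \tag{B.25}$$

Thus, based on the above propositions, we are able to complete the proof of Theorem 3.2, namely

$$\sqrt{n}(\tilde\beta - \beta) \xrightarrow{d} N(0, \Omega), \tag{B.26}$$

and

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, \Omega). \tag{B.27}$$

The proofs of Propositions B.3, B.4 and B.5 are given in Appendix E.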
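To make the long-run variance in (B.15) and the sandwich form $\Omega = Q^{-1}\Lambda Q^{-1}$ concrete, the following sketch evaluates both from a given sequence of autocovariances. The truncation of the infinite sum, the AR(1) autocovariances used as input, the value of $Q$, and the function names are all hypothetical choices for illustration; the paper does not specify how $\Lambda$ would be estimated in practice.

```python
import numpy as np

def long_run_variance(autocov):
    """Truncated version of (B.15): Gamma_0 + 2 * sum_{j>=1} Gamma_j.

    autocov[j] is the (k x k) autocovariance E[L_t L_{t-j}'] for j = 0..J.
    """
    mats = [np.atleast_2d(np.asarray(g, dtype=float)) for g in autocov]
    return mats[0] + 2.0 * sum(mats[1:], np.zeros_like(mats[0]))

def sandwich(Q, Lam):
    """Asymptotic variance Omega = Q^{-1} Lambda Q^{-1}."""
    Q_inv = np.linalg.inv(np.atleast_2d(np.asarray(Q, dtype=float)))
    return Q_inv @ np.atleast_2d(Lam) @ Q_inv

# Hypothetical scalar example: L_t behaves like an AR(1) with coefficient
# phi and unit innovation variance, so Gamma_j = phi^j / (1 - phi^2).
phi = 0.5
autocov = [phi ** j / (1.0 - phi ** 2) for j in range(61)]
Lam = long_run_variance(autocov)    # close to 1 / (1 - phi)^2 = 4
Omega = sandwich(0.5, Lam)          # close to 4 / 0.5^2 = 16
```

Positive serial correlation in $L_t$ inflates $\Lambda$ well above the contemporaneous variance $E[L_t L_t']$ (here $4/3$), which is why the dependence structure matters for inference on $\beta$.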

Appendix C Proof of Proposition B.1(i) and B.1(ii)

C.1 Proof of Proposition B.1(i)

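Proof. Recall that $\tau_t = t/n$ and

$$M_1(i, j) = \frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \bar g_{i,n} \right)\left( g_j(\tau_t) - \bar g_{j,n} \right), \tag{C.1}$$

where $\bar g_{i,n} = n^{-1}\sum_{t=1}^{n} g_i(\tau_t)$. Since $g(\cdot)$ is a continuously differentiable function of $\tau \in [0, 1]$, we immediately obtain

$$\bar g_{i,n} = \frac{1}{n}\sum_{t=1}^{n} g_i\left( \frac{t}{n} \right) \longrightarrow \int_0^1 g_i(\tau)\, d\tau = \bar g_i = 0, \tag{C.2}$$

as the Riemann sum converges to its integral limit. The same argument applies to $M_1(i, j)$, so that

$$\frac{1}{n}\sum_{t=1}^{n} \left( g_i(\tau_t) - \bar g_{i,n} \right)\left( g_j(\tau_t) - \bar g_{j,n} \right) \longrightarrow \int_0^1 (g_i(\tau) - \bar g_i)(g_j(\tau) - \bar g_j)\, d\tau = Q(i, j). \tag{C.3}$$

Therefore, as $n \to \infty$, $M_1(i, j) \longrightarrow Q(i, j)$ for $i, j = 1, \ldots, k$.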
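The Riemann-sum argument in (C.2) and (C.3) can be illustrated with two bounded trends whose integrals over $[0, 1]$ are zero. The trend functions `g1` and `g2` below are hypothetical examples chosen because their limits can be computed by hand: $\int_0^1 (\tau^2 - 1/3)^2 d\tau = 4/45$ and $\int_0^1 (\tau^2 - 1/3)\sin(2\pi\tau)\, d\tau = -1/(2\pi)$.

```python
import numpy as np

def m1(g_i, g_j, n):
    """Sample quantity M_1(i, j) from (C.1) with tau_t = t / n."""
    tau = np.arange(1, n + 1) / n
    gi, gj = g_i(tau), g_j(tau)
    return float(np.mean((gi - gi.mean()) * (gj - gj.mean())))

def g1(tau):
    return tau ** 2 - 1.0 / 3.0      # integrates to zero on [0, 1]

def g2(tau):
    return np.sin(2.0 * np.pi * tau)  # integrates to zero on [0, 1]

q11 = 4.0 / 45.0             # limit of M_1(1, 1)
q12 = -1.0 / (2.0 * np.pi)   # limit of M_1(1, 2)

errors_11 = [abs(m1(g1, g1, n) - q11) for n in (100, 1000, 10000)]
errors_12 = [abs(m1(g1, g2, n) - q12) for n in (100, 1000, 10000)]
```

Both error sequences shrink as $n$ grows, in line with the deterministic convergence $M_1(i, j) \to Q(i, j)$.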

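Applying the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
|M_2(i, j)| &= \left| \frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)\left( \bar g_{j,n} - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right) \right| \\
&\le \left| \frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)^2 \right|^{1/2} \left| \frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{j,n} - \sum_{s=1}^{n} w_{ns}(t) g_j(\tau_s) \right)^2 \right|^{1/2}.
\end{aligned} \tag{C.4}$$

Hence we can show that $M_2(i, j) = o_p(1)$ if, for any $i = 1, 2, \ldots, k$,

$$\frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)^2 = o_p(1). \tag{C.5}$$

Meanwhile, as the terms in the above summation are all non-negative, it is sufficient to show that

$$E\left[ \frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)^2 \right] = o(1). \tag{C.6}$$

To prove (C.6), we first write

$$\begin{aligned}
\frac{1}{n}\sum_{t=1}^{n} \left( \bar g_{i,n} - \sum_{s=1}^{n} w_{ns}(t) g_i(\tau_s) \right)^2 &= \frac{1}{n}\sum_{t=1}^{n} \left( \sum_{s=1}^{n} w_{ns}(t)\left( g_i(\tau_s) - \bar g_{i,n} \right) \right)^2 \\
&= \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} w_{ns}(t)^2 \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&\quad + \frac{1}{n}\sum_{t=1}^{n}\sum_{s_1=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne s_1}}^{n} w_{ns_1}(t) w_{ns_2}(t)\left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&=: M_{2,1} + M_{2,2}.
\end{aligned} \tag{C.7}$$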

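As for the kernel density estimator, we have $\hat f(v) = f(v) + o_p(1)$. Then the first term $M_{2,1}$ can be written as

$$\begin{aligned}
M_{2,1} &= \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} w_{ns}(t)^2 \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 = \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \left( \frac{\frac{1}{nh^k} K\left( \frac{v_s - v_t}{h} \right)}{\hat f(v_t)} \right)^2 \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{s=1}^{n} \left( \frac{K\left( \frac{v_s - v_t}{h} \right)}{f(v_t) + o_p(1)} \right)^2 \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 = \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n} \frac{K^2(0)}{f(v_t)^2} \left( g_i(\tau_t) - \bar g_{i,n} \right)^2 + \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_s) - \bar g_{i,n} \right)^2.
\end{aligned} \tag{C.8}$$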


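As $nh^k \to \infty$ when $n \to \infty$,

$$E\left[ \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n} \frac{K^2(0)}{f(v_t)^2} \left( g_i(\tau_t) - \bar g_i \right)^2 \right] = \frac{K^2(0)}{n^3 h^{2k}}\sum_{t=1}^{n} E\left[ \frac{1}{f(v_t)^2} \right] \left( g_i(\tau_t) - \bar g_i \right)^2 = O\left( \frac{1}{n^2 h^{2k}} \right) = o(1), \tag{C.9}$$

given that

$$E\left[ f^{-2}(v_1) \right] = \int f^{-1}(v_1)\, dv_1 < \infty, \tag{C.10}$$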

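since $v_t$ is a stationary time series. For the second part of (C.8), let $z = v_t$ and $w = \frac{v_s - v_t}{h}$; we have

$$\begin{aligned}
&E\left[ \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \right] \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} E\left[ \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} \right] \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \iint \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} f_{t,s}(v_t, v_s)\, dv_t\, dv_s \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \iint \frac{K^2(w)}{f(z)^2} f_{t,s}(z, z + wh)\, h^k\, dw\, dz \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \iint \frac{K^2(w)}{f(z)^2} \left( f_{t,s}(z, z) + O(wh) \right) h^k\, dw\, dz \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{1 + o(1)}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \int K^2(w)\, dw \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \\
&= \frac{(1 + o(1))\kappa_2}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz \left( g_i(\tau_s) - \bar g_{i,n} \right)^2,
\end{aligned} \tag{C.11}$$

where $\kappa_2 = \int K^2(w)\, dw < \infty$. As $f_{t,s}(z, z) \ge 0$ for any $t, s$, assume that

$$\max_{s \ne t} \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz < \infty. \tag{C.12}$$

Since, at the same time, $nh^k \to \infty$ as $n \to \infty$, we have

$$E\left[ \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_s) - \bar g_{i,n} \right)^2 \right] = O\left( \frac{1}{nh^k} \right) = o(1). \tag{C.13}$$

Therefore,

$$E\left[ M_{2,1} \right] = o(1). \tag{C.14}$$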


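For the second part of (C.7), $M_{2,2}$, we consider the cases (1) $t \ne s_1 \ne s_2$, (2) $s_1 = t$, $s_2 \ne t$, and (3) $s_2 = t$, $s_1 \ne t$ as follows.

$$\begin{aligned}
M_{2,2} &= \frac{1}{n}\sum_{t=1}^{n}\sum_{s_1=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne s_1}}^{n} w_{ns_1}(t) w_{ns_2}(t)\left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{s_1=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne s_1}}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right)}{\hat f(v_t)} \frac{K\left( \frac{v_{s_2} - v_t}{h} \right)}{\hat f(v_t)} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&\quad + \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \frac{K(0) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_t) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&\quad + \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K(0)}{f(v_t)^2} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_t) - \bar g_{i,n} \right) \\
&= (1 + o_p(1))\left( M_{2,2,1} + M_{2,2,2} + M_{2,2,3} \right),
\end{aligned} \tag{C.15}$$

where the expectations of $M_{2,2,1}$, $M_{2,2,2}$, and $M_{2,2,3}$ are $o(1)$ by the following arguments.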

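(1) For the first term, when $t \ne s_1 \ne s_2$, let $\eta_1 = v_t$, $\eta_2 = v_{s_1}$, $\eta_3 = v_{s_2}$. Then

$$\begin{aligned}
E\left[ M_{2,2,1} \right] &= E\left[ \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \right] \\
&= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} E\left[ \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} \right] \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} f(v_t, v_{s_1}, v_{s_2})\, dv_t\, dv_{s_1}\, dv_{s_2} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} f_{t,s_1,s_2}(\eta_1, \eta_2, \eta_3)\, d\eta_1\, d\eta_2\, d\eta_3 \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} \left( f_{t,s_1,s_2}(\eta_1, \eta_2, \eta_3) - f(\eta_1) f(\eta_2) f(\eta_3) \right) d\eta_1\, d\eta_2\, d\eta_3 \\
&\qquad\quad \times \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&\quad + \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} f(\eta_1) f(\eta_2) f(\eta_3)\, d\eta_1\, d\eta_2\, d\eta_3 \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&=: D_1 + G_1.
\end{aligned} \tag{C.16}$$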

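Let $w_1 = \frac{\eta_2 - \eta_1}{h}$, $w_2 = \frac{\eta_3 - \eta_1}{h}$, $z = \eta_1$, and note that

$$\begin{aligned}
|D_1| &= \left| \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} \left( f_{t,s_1,s_2}(\eta_1, \eta_2, \eta_3) - f(\eta_1) f(\eta_2) f(\eta_3) \right) d\eta_1\, d\eta_2\, d\eta_3 \right. \\
&\qquad\quad \left. \times \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \right| \\
&\le \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \left| \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} \right| \left| f_{t,s_1,s_2}(\eta_1, \eta_2, \eta_3) - f(\eta_1) f(\eta_2) f(\eta_3) \right| d\eta_1\, d\eta_2\, d\eta_3 \\
&\qquad\quad \times \left| \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \right| \\
&\le \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \left| \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} \right| \delta_3(t, s_1, s_2)\, d\eta_1\, d\eta_2\, d\eta_3 \left| \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \right| \\
&= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \delta_3(t, s_1, s_2) \iiint \left| \frac{K(w_1) K(w_2)}{f(z)^2} \right| h^{2k}\, dw_1\, dw_2\, dz \left| \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \right| \\
&= \frac{1}{n^3}\sum_{t \ne s_1 \ne s_2}^{n} \delta_3(t, s_1, s_2) \int |K(w_1)|\, dw_1 \int |K(w_2)|\, dw_2 \int |f(z)^{-2}|\, dz \left| \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \right| \\
&\le \frac{C}{n^3}\sum_{t \ne s_1 \ne s_2}^{n} \delta_3(t, s_1, s_2) = O(n^{-1}),
\end{aligned} \tag{C.17}$$

given that $\int |K(w)|\, dw < \infty$, $\int \left| f^{-2}(z) \right| dz < \infty$, and

$$\frac{1}{n^2}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \delta_3(t, s_1, s_2) = O(1), \tag{C.18}$$

where $\left| f_{t,s_1,s_2}(\eta_1, \eta_2, \eta_3) - f(\eta_1) f(\eta_2) f(\eta_3) \right| \le \delta_3(t, s_1, s_2)$. Meanwhile,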

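$$\begin{aligned}
G_1 &= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K\left( \frac{\eta_2 - \eta_1}{h} \right) K\left( \frac{\eta_3 - \eta_1}{h} \right)}{f(\eta_1)^2} f(\eta_1) f(\eta_2) f(\eta_3)\, d\eta_1\, d\eta_2\, d\eta_3 \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1}{n^3 h^{2k}}\sum_{t \ne s_1 \ne s_2}^{n} \iiint \frac{K(w_1) K(w_2)}{f(z)^2} f(z) f(z + w_1 h) f(z + w_2 h)\, h^{2k}\, dz\, dw_1\, dw_2 \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1 + o(1)}{n^3}\sum_{t \ne s_1 \ne s_2}^{n} \iiint K(w_1) K(w_2) f(z)\, dz\, dw_1\, dw_2 \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) \\
&= \frac{1 + o(1)}{n^3}\sum_{t \ne s_1 \ne s_2}^{n} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) = O(n^{-1}),
\end{aligned} \tag{C.19}$$

where $\int K(w)\, dw = 1$, $\int f(z)\, dz = 1$, and

$$\frac{1}{n^2}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \left( g_i(\tau_{s_1}) - \bar g_{i,n} \right)\left( g_i(\tau_{s_2}) - \bar g_{i,n} \right) = O(1). \tag{C.20}$$

Therefore, both $D_1$ and $G_1$ are $o(1)$, which implies that

$$E\left[ M_{2,2,1} \right] = o(1). \tag{C.21}$$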

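(2) We then move on to the second term, $M_{2,2,2}$. In the proof, let $z = v_t$ and $w = \frac{v_{s_2} - v_t}{h}$.

$$\begin{aligned}
E\left[ M_{2,2,2} \right] &= E\left[ \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \frac{K(0) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \right] \\
&= \frac{K(0)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} E\left[ \frac{K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} \right] \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \\
&= \frac{K(0)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \iint \frac{K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} f(v_t, v_{s_2})\, dv_t\, dv_{s_2} \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \\
&= \frac{K(0)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \iint \frac{K(w)}{f(z)^2} f_{t,s_2}(z, z + wh)\, h^k\, dw\, dz \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \\
&= \frac{(1 + o(1)) K(0)}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \int K(w)\, dw \int \frac{f_{t,s_2}(z, z)}{f(z)^2}\, dz \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \\
&= \frac{(1 + o(1)) K(0)}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \int \frac{f_{t,s_2}(z, z)}{f(z)^2}\, dz \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right),
\end{aligned} \tag{C.22}$$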

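where $\int K(w)\, dw = 1$. Note that $g(\cdot)$ is bounded, $nh^k \to \infty$ as $n \to \infty$, and

$$\max_{s \ne t} \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz < \infty, \tag{C.23}$$

uniformly for any $t, s$. Hence, for some positive constant $C$, we obtain

$$\begin{aligned}
&\left| \frac{1}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \int \frac{f_{t,s_2}(z, z)}{f(z)^2}\, dz \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \right| \\
&\le \frac{1}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} \left| \int \frac{f_{t,s_2}(z, z)}{f(z)^2}\, dz \right| \left| \left( g_i(\tau_t) - \bar g_i \right)\left( g_i(\tau_{s_2}) - \bar g_i \right) \right| \\
&\le \frac{1}{n^3 h^{k}}\sum_{t=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t}}^{n} C = O\left( \frac{1}{nh^k} \right) = o(1),
\end{aligned} \tag{C.24}$$

which implies

$$E\left[ M_{2,2,2} \right] = o(1). \tag{C.25}$$

Similarly, we have

$$E\left[ M_{2,2,3} \right] = o(1). \tag{C.26}$$

To summarize, equations (C.21), (C.25) and (C.26) lead to

$$E[M_{2,2}] = o(1). \tag{C.27}$$

Therefore, by equations (C.27) and (C.14), we can show (C.6), hence

$$M_2(i, j) = o_p(1). \tag{C.28}$$

Again, using the Cauchy-Schwarz inequality,

$$|M_{12}(i, j)| \le |M_1(i, i)|^{1/2}\, |M_2(j, j)|^{1/2} = O_p(1)\, o_p(1) = o_p(1). \tag{C.29}$$

The same result holds for $M_{21}(i, j)$ for $i, j = 1, \ldots, k$. Therefore, $M_1(i, j) \longrightarrow Q(i, j)$, $M_2(i, j) \xrightarrow{p} 0$, $M_{12}(i, j) \xrightarrow{p} 0$, and $M_{21}(i, j) \xrightarrow{p} 0$; hence, as $n \to \infty$,

$$S_1(i, j) = M_1(i, j) + M_2(i, j) + M_{12}(i, j) + M_{21}(i, j) \xrightarrow{p} Q(i, j). \tag{C.30}$$

This completes the proof.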
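The kernel weights $w_{ns}(t)$ that drive this proof can be built explicitly for $k = 1$. The sketch below assumes a Gaussian kernel and that $\hat f(v_t)$ is the kernel density estimate with the same kernel and bandwidth, so that each row of weights sums to one; the simulated shocks, the trend, the bandwidth rule $h = n^{-1/5}$ and the function name are illustrative assumptions.

```python
import numpy as np

def nw_weights(v, h):
    """Matrix with entries w[t, s] = K((v_s - v_t)/h) / sum_r K((v_r - v_t)/h).

    This equals K((v_s - v_t)/h) / (n h f_hat(v_t)) when f_hat is the kernel
    density estimate built from the same Gaussian kernel and bandwidth.
    """
    u = (v[None, :] - v[:, None]) / h
    k = np.exp(-0.5 * u ** 2)
    return k / k.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n = 500
v = rng.normal(size=n)                 # stationary shocks (i.i.d. here for simplicity)
tau = np.arange(1, n + 1) / n
g = np.sin(2.0 * np.pi * tau)          # a bounded trend with zero integral

W = nw_weights(v, h=n ** (-1.0 / 5.0))
g_bar = g.mean()
lhs = g_bar - W @ g                    # g_bar minus the weighted trend
rhs = -(W @ (g - g_bar))               # first equality in (C.7), rows sum to one
M1 = float(np.mean((g - g_bar) ** 2))
M2 = float(np.mean(lhs ** 2))
```

Because the shocks carry no information about $\tau$, the weighted average of the trend stays close to its overall mean, so `M2` comes out much smaller than `M1` in this simulation, consistent with $M_2(i, j) = o_p(1)$.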

C.2 Proof of Proposition B.1(ii)

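Proof. For (ii) in Proposition B.1, $S_2(i, j)$, using the Cauchy-Schwarz inequality, we have

$$\begin{aligned}
|S_2(i, j)| &= \left| \frac{1}{n}\sum_{t=1}^{n} \left( v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right)\left( v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right) \right| \\
&\le \left| \frac{1}{n}\sum_{t=1}^{n} \left( v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right)^2 \right|^{\frac{1}{2}} \left| \frac{1}{n}\sum_{t=1}^{n} \left( v_{jt} - \sum_{s=1}^{n} w_{ns}(t) v_{js} \right)^2 \right|^{\frac{1}{2}},
\end{aligned} \tag{C.31}$$

for $i, j = 1, \ldots, k$, where

$$\begin{aligned}
\frac{1}{n}\sum_{t=1}^{n} \left( v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right)^2 &= \frac{1}{n}\sum_{t=1}^{n} \left( \sum_{s=1}^{n} w_{ns}(t)(v_{is} - v_{it}) \right)^2 \\
&= \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} w_{ns}(t)^2 (v_{is} - v_{it})^2 + \frac{1}{n}\sum_{t=1}^{n}\sum_{s_1=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne s_1}}^{n} w_{ns_1}(t) w_{ns_2}(t)(v_{is_1} - v_{it})(v_{is_2} - v_{it}) \\
&=: S_{2,1} + S_{2,2}.
\end{aligned} \tag{C.32}$$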

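Note that (see footnote 9)

$$\begin{aligned}
S_{2,1} &= \frac{1}{n}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} w_{ns}(t)^2 (v_{is} - v_{it})^2 = \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{\hat f(v_t)^2} (v_{is} - v_{it})^2 \\
&= \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} (v_{is} - v_{it})^2,
\end{aligned} \tag{C.33}$$

where

$$\begin{aligned}
E[S_{2,1}] &= E\left[ \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} (v_{is} - v_{it})^2 \right] \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} E\left[ \frac{K^2\left( \frac{v_s - v_t}{h} \right)(v_{is} - v_{it})^2}{f(v_t)^2} \right] \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \iint \frac{K^2\left( \frac{v_s - v_t}{h} \right)}{f(v_t)^2} (v_{is} - v_{it})^2 f(v_t, v_s)\, dv_s\, dv_t \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \iint \frac{K^2(w)\, w_i^2 h^2 f_{t,s}(z, z + wh)}{f(z)^2}\, h^k\, dw\, dz \\
&= \frac{1 + o(1)}{n^3 h^{k-2}}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \int w_i^2 K^2(w)\, dw \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz \\
&= \frac{(1 + o(1)) h^2}{nh^k}\, \frac{1}{n^2}\sum_{t=1}^{n}\sum_{\substack{s=1 \\ s \ne t}}^{n} \int w_i^2 K^2(w)\, dw \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz = o(1),
\end{aligned} \tag{C.34}$$

where $\int w_i^2 K^2(w)\, dw < \infty$. Note that

$$\max_{t \ne s} \int \frac{f_{t,s}(z, z)}{f(z)^2}\, dz < \infty, \tag{C.35}$$

and that $h \to 0$ and $nh^k \to \infty$ as $n \to \infty$. Therefore, $E[S_{2,1}] = O\left( \frac{h^2}{nh^k} \right) = o(1)$, which immediately gives

$$E[S_{2,1}] = o(1). \tag{C.36}$$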

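For the second term $S_{2,2}$ (see footnote 10),

$$\begin{aligned}
S_{2,2} &= \frac{1}{n}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} w_{ns_1}(t) w_{ns_2}(t)(v_{is_1} - v_{it})(v_{is_2} - v_{it}) \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right)}{\hat f(v_t)} \frac{K\left( \frac{v_{s_2} - v_t}{h} \right)}{\hat f(v_t)} (v_{is_1} - v_{it})(v_{is_2} - v_{it}) \\
&= \frac{1 + o_p(1)}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} (v_{is_1} - v_{it})(v_{is_2} - v_{it}).
\end{aligned} \tag{C.37}$$

Footnote 9: When $s = t$, $v_{is} - v_{it} = 0$; therefore, we can exclude $s = t$.

Footnote 10: Again, we do not need to consider $s_1 = t$ and $s_2 = t$, as under these conditions $v_{is_1} - v_{it} = 0$ and $v_{is_2} - v_{it} = 0$.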

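Note that

$$\begin{aligned}
&E\left[ \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} (v_{is_1} - v_{it})(v_{is_2} - v_{it}) \right] \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} E\left[ \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} (v_{is_1} - v_{it})(v_{is_2} - v_{it}) \right] \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \iiint \frac{K\left( \frac{v_{s_1} - v_t}{h} \right) K\left( \frac{v_{s_2} - v_t}{h} \right)}{f(v_t)^2} (v_{is_1} - v_{it})(v_{is_2} - v_{it}) f(v_t, v_{s_1}, v_{s_2})\, dv_t\, dv_{s_1}\, dv_{s_2} \\
&= \frac{1}{n^3 h^{2k}}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \iiint \frac{K(w_1) K(w_2)}{f(z)^2} w_{1i} w_{2i}\, h^2 h^{2k} f_{t,s_1,s_2}(z, z + w_1 h, z + w_2 h)\, dz\, dw_1\, dw_2 \\
&= \frac{(1 + o(1)) h^2}{n^3}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \iiint \frac{K(w_1) K(w_2)}{f(z)^2} w_{1i}^2 w_{2i}^2 f^{(2)(2,3)(i,i)}_{t,s_1,s_2}(z, z, z)\, h^2\, dz\, dw_1\, dw_2 \\
&= \frac{(1 + o(1)) h^4}{n^3}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \int w_{1i}^2 K(w_1)\, dw_1 \int w_{2i}^2 K(w_2)\, dw_2 \int \frac{f^{(2)(2,3)(i,i)}_{t,s_1,s_2}(z, z, z)}{f(z)^2}\, dz \\
&= \frac{(1 + o(1)) h^4 \kappa_{21i}^2}{n^3}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \int \frac{f^{(2)(2,3)(i,i)}_{t,s_1,s_2}(z, z, z)}{f(z)^2}\, dz,
\end{aligned} \tag{C.38}$$

where

$$\begin{aligned}
\int w_{1i}^2 K(w_1)\, dw_1 &= \int K(w_{11})\, dw_{11} \cdots \int w_{1i}^2 K(w_{1i})\, dw_{1i} \cdots \int K(w_{1k})\, dw_{1k} \\
&= \int w_{1i}^2 K(w_{1i})\, dw_{1i} = \kappa_{21i} < \infty,
\end{aligned} \tag{C.39}$$

and

$$\begin{aligned}
\int w_{2i}^2 K(w_2)\, dw_2 &= \int K(w_{21})\, dw_{21} \cdots \int w_{2i}^2 K(w_{2i})\, dw_{2i} \cdots \int K(w_{2k})\, dw_{2k} \\
&= \int w_{2i}^2 K(w_{2i})\, dw_{2i} = \kappa_{21i} < \infty.
\end{aligned} \tag{C.40}$$

We also assumed that

$$\max_{t \ne s_1 \ne s_2} \left| \int f^{(2)(2,3)(i,i)}_{t,s_1,s_2}(z, z, z)\, dz \right| < \infty, \tag{C.41}$$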


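therefore,

$$\left| \frac{1}{n^3}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \int \frac{f^{(2)(2,3)(i,i)}_{t,s_1,s_2}(z, z, z)}{f(z)^2}\, dz \right| \le \frac{1}{n^3}\sum_{t=1}^{n}\sum_{\substack{s_1=1 \\ s_1 \ne t}}^{n}\sum_{\substack{s_2=1 \\ s_2 \ne t,\, s_2 \ne s_1}}^{n} \left| \int \frac{f^{(2)(2,3)(i,i)}_{t,s_1,s_2}(z, z, z)}{f(z)^2}\, dz \right| < \infty, \tag{C.42}$$

which implies that

$$E[S_{2,2}] = O(h^4) = o(1). \tag{C.43}$$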

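Remark C.1. In the above derivation, we used the Taylor expansion of a function $f(u, v, w)$ with respect to the $k$-dimensional vectors $u, v, w$, as follows.

$$\begin{aligned}
f(u + \Delta u, v + \Delta v, w + \Delta w) &= f(u, v, w) + \sum_{j=1}^{k} \frac{\partial f(u, v, w)}{\partial u_j}\Delta u_j + \sum_{j=1}^{k} \frac{\partial f(u, v, w)}{\partial v_j}\Delta v_j + \sum_{j=1}^{k} \frac{\partial f(u, v, w)}{\partial w_j}\Delta w_j \\
&\quad + \sum_{j=1}^{k} \frac{\partial^2 f(u, v, w)}{2\,\partial u_j^2}\Delta u_j^2 + \sum_{j=1}^{k} \frac{\partial^2 f(u, v, w)}{2\,\partial v_j^2}\Delta v_j^2 + \sum_{j=1}^{k} \frac{\partial^2 f(u, v, w)}{2\,\partial w_j^2}\Delta w_j^2 \\
&\quad + \sum_{j=1}^{k}\sum_{l=1}^{k} \frac{\partial^2 f(u, v, w)}{2\,\partial u_j \partial v_l}\Delta u_j \Delta v_l + \sum_{j=1}^{k}\sum_{l=1}^{k} \frac{\partial^2 f(u, v, w)}{2\,\partial u_j \partial w_l}\Delta u_j \Delta w_l \\
&\quad + \sum_{j=1}^{k}\sum_{l=1}^{k} \frac{\partial^2 f(u, v, w)}{2\,\partial v_j \partial w_l}\Delta v_j \Delta w_l + \cdots
\end{aligned} \tag{C.44}$$

As the integrals satisfy $\int w_i K(w_i)\, dw_i = 0$ and $\int w_i^2 K(w_i)\, dw_i < \infty$, we need the term with the cross product $w_{1i} w_{2i}$, which can be obtained from the Taylor expansion term with the second-order derivative with respect to the $i$th element of the second variable and the $j$th element of the third variable, denoted as

$$f^{(2)(2,3)(i,j)}(u, v, w) = \frac{\partial^2 f(u, v, w)}{\partial v_i \partial w_j}. \tag{C.45}$$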

Therefore,

$$E[S_{2,2}] = o(1). \tag{C.46}$$

Hence, equations (C.31), (C.32), (C.36) and (C.46) imply

$$E\left[ n^{-1}\sum_{t=1}^{n} \left( v_{it} - \sum_{s=1}^{n} w_{ns}(t) v_{is} \right)^2 \right] = o(1), \tag{C.47}$$

and, for any $i, j = 1, 2, \ldots, k$,

$$S_{2}(i, j) = o_p(1). \tag{C.48}$$

Appendix D Proof of Proposition B.2

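Substitute $\hat x_t$ and $\tilde x_t$ in $D_1(n)$:

$$\begin{aligned}
D_1(n) &= \frac{1}{n}\sum_{t=1}^{n} (\hat x_t - \tilde x_t)^2 \\
&= \frac{1}{n}\sum_{t=1}^{n} \left( x_t - \sum_{s=1}^{n} \hat w_{ns}(t) x_s - x_t + \sum_{s=1}^{n} w_{ns}(t) x_s \right)^2 \\
&= \frac{1}{n}\sum_{t=1}^{n} \left( \sum_{s=1}^{n} \left( w_{ns}(t) - \hat w_{ns}(t) \right) x_s \right)^2 \\
&= \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \left( w_{ns}(t) - \hat w_{ns}(t) \right)^2 x_s^2 + \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{\substack{p=1 \\ p \ne s}}^{n} \left( w_{ns}(t) - \hat w_{ns}(t) \right)\left( w_{np}(t) - \hat w_{np}(t) \right) x_s x_p \\
&= D_{11}(n) + D_{12}(n),
\end{aligned} \tag{D.1}$$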

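where, since $\hat{f}(\hat{v}_t) = f(v_t) + o_p(1)$,
\[
\begin{aligned}
D_{11}(n) ={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \left(w_{ns}(t) - \hat{w}_{ns}(t)\right)^2 x_s^2 \\
={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \left( \frac{\frac{1}{nh} K\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)} - \frac{\frac{1}{nh} K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right)}{\hat{f}(\hat{v}_t)} \right)^2 x_s^2 \\
={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} \left( \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{nh\,\hat{f}(\hat{v}_t) f(v_t)} \left(f(v_t) - \hat{f}(\hat{v}_t)\right) + \frac{1}{nh\,\hat{f}(\hat{v}_t)} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right) \right)^2 x_s^2 \\
\le{}& \frac{2}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{\hat{f}(\hat{v}_t)^2 f(v_t)^2} \left(f(v_t) - \hat{f}(\hat{v}_t)\right)^2 x_s^2 \\
&+ \frac{2}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{1}{\hat{f}(\hat{v}_t)^2} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right)^2 x_s^2 \\
={}& (1 + o_p(1))\left(J_1(n) + J_2(n)\right),
\end{aligned} \tag{D.2}
\]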

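and
\[
\begin{aligned}
D_{12}(n) ={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \left(w_{ns}(t) - \hat{w}_{ns}(t)\right)\left(w_{np}(t) - \hat{w}_{np}(t)\right) x_s x_p \\
={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \left( \frac{\frac{1}{nh} K\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)} - \frac{\frac{1}{nh} K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right)}{\hat{f}(\hat{v}_t)} \right)\left( \frac{\frac{1}{nh} K\!\left(\tfrac{v_p - v_t}{h}\right)}{f(v_t)} - \frac{\frac{1}{nh} K\!\left(\tfrac{\hat{v}_p - \hat{v}_t}{h}\right)}{\hat{f}(\hat{v}_t)} \right) x_s x_p \\
={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \left( \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{nh\,\hat{f}(\hat{v}_t) f(v_t)} \left(f(v_t) - \hat{f}(\hat{v}_t)\right) + \frac{1}{nh\,\hat{f}(\hat{v}_t)} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right) \right) \\
&\qquad \times \left( \frac{K\!\left(\tfrac{v_p - v_t}{h}\right)}{nh\,\hat{f}(\hat{v}_t) f(v_t)} \left(f(v_t) - \hat{f}(\hat{v}_t)\right) + \frac{1}{nh\,\hat{f}(\hat{v}_t)} \left( K\!\left(\tfrac{\hat{v}_p - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_p - v_t}{h}\right) \right) \right) x_s x_p \\
={}& \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right) K\!\left(\tfrac{v_p - v_t}{h}\right)}{n^2h^2\,\hat{f}(\hat{v}_t)^2 f(v_t)^2} \left(f(v_t) - \hat{f}(\hat{v}_t)\right)^2 x_s x_p \\
&+ \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \frac{K\!\left(\tfrac{v_p - v_t}{h}\right)}{n^2h^2 f(v_t)\,\hat{f}(\hat{v}_t)^2} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right)\left(f(v_t) - \hat{f}(\hat{v}_t)\right) x_s x_p \\
&+ \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{n^2h^2 f(v_t)\,\hat{f}(\hat{v}_t)^2} \left( K\!\left(\tfrac{\hat{v}_p - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_p - v_t}{h}\right) \right)\left(f(v_t) - \hat{f}(\hat{v}_t)\right) x_s x_p \\
&+ \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{p \neq s} \frac{1}{n^2h^2\,\hat{f}(\hat{v}_t)^2} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right)\left( K\!\left(\tfrac{\hat{v}_p - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_p - v_t}{h}\right) \right) x_s x_p \\
={}& (1 + o_p(1))\left(J_3(n) + J_4(n) + J_5(n) + J_6(n)\right),
\end{aligned} \tag{D.3}
\]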

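where
\[
J_1(n) = \frac{2}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4} \left(f(v_t) - \hat{f}(\hat{v}_t)\right)^2 x_s^2, \tag{D.4}
\]
\[
J_2(n) = \frac{2}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{1}{f(v_t)^2} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right)^2 x_s^2, \tag{D.5}
\]
\[
J_3(n) = \frac{1}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{\substack{p=1 \\ p \neq s}}^{n} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right) K\!\left(\tfrac{v_p - v_t}{h}\right)}{f(v_t)^4} \left(f(v_t) - \hat{f}(\hat{v}_t)\right)^2 x_s x_p, \tag{D.6}
\]
\[
J_4(n) = \frac{1}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{\substack{p=1 \\ p \neq s}}^{n} \frac{K\!\left(\tfrac{v_p - v_t}{h}\right)}{f(v_t)^3} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right)\left(f(v_t) - \hat{f}(\hat{v}_t)\right) x_s x_p, \tag{D.7}
\]
\[
J_5(n) = \frac{1}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{\substack{p=1 \\ p \neq s}}^{n} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^3} \left( K\!\left(\tfrac{\hat{v}_p - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_p - v_t}{h}\right) \right)\left(f(v_t) - \hat{f}(\hat{v}_t)\right) x_s x_p, \tag{D.8}
\]
\[
J_6(n) = \frac{1}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{\substack{p=1 \\ p \neq s}}^{n} \frac{x_s x_p}{f(v_t)^2} \left( K\!\left(\tfrac{\hat{v}_s - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_s - v_t}{h}\right) \right)\left( K\!\left(\tfrac{\hat{v}_p - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_p - v_t}{h}\right) \right). \tag{D.9}
\]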

Therefore, the result follows once the following proposition is established.

Proposition D.1. Let Assumptions 3.3 to 3.6 hold. Then, for $i = 1, 2, \ldots, 6$,
\[
J_i(n) = o_p(1). \tag{D.10}
\]
The proposition states that the $J_i(n)$, $i = 1, 2, \ldots, 6$, are $o_p(1)$ and are therefore negligible in magnitude relative to $D_4$. Hence, Theorem 3.1(ii) is proved. The detailed proof of Proposition D.1 is provided in Appendix G.

Appendix E Proof of Proposition B.3

Proposition B.3 states that the leading term of $n^{-1/2} \sum_{t=1}^{n} \tilde{v}_t \tilde{u}_t$ is $n^{-1/2} \sum_{t=1}^{n} v_t u_t$, which converges to a normal distribution, while the remaining three terms are negligible because each of them is $o_p(1)$. We prove (B.16); the other two results can be proved similarly using the same method.

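\[
\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} v_t \bar{u}_t &= \frac{1}{\sqrt{n}}\sum_{t=1}^{n} v_t \left( \sum_{s=1}^{n} w_{ns}(t) u_s \right) = \frac{1}{n\sqrt{n}\,h}\sum_{t=1}^{n} v_t \left( \sum_{s=1}^{n} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{\hat{f}(v_t)} u_s \right) \\
&= \frac{1 + o_p(1)}{n\sqrt{n}\,h}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)} v_t u_s = (1 + o_p(1))\,U_1,
\end{aligned} \tag{E.1}
\]
where
\[
U_1 = \frac{1}{n\sqrt{n}\,h}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)} v_t u_s. \tag{E.2}
\]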
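To make the double sum in (E.2) concrete, the following sketch evaluates $U_1$ on simulated data. It assumes a Gaussian kernel, standard normal $v_t$ with known density $f$, independent standard normal $u_t$, and the bandwidth $h = n^{-1/5}$; none of these choices come from the paper, and `u1_statistic` is a hypothetical helper name.

```python
import numpy as np

def gaussian_kernel(w):
    """Standard normal density, used both as kernel K and as the density f (assumption)."""
    return np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)

def u1_statistic(v, u, h, density):
    """U_1 = (n * sqrt(n) * h)^(-1) * sum_t sum_s K((v_s - v_t)/h) / f(v_t) * v_t * u_s."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    n = v.size
    diff = (v[None, :] - v[:, None]) / h   # entry [t, s] equals (v_s - v_t) / h
    kern = gaussian_kernel(diff)
    row_weights = v / density(v)           # v_t / f(v_t)
    return float(row_weights @ kern @ u / (n * np.sqrt(n) * h))

rng = np.random.default_rng(0)
n = 400
v = rng.standard_normal(n)
u = rng.standard_normal(n)                 # drawn independently of v
h = n ** (-0.2)
u1 = u1_statistic(v, u, h, gaussian_kernel)
```

The statistic is linear in $u$, which is the property the moment calculations below exploit when the expectation over $u_s$ is factored out.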

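Let $\xi(x, y) = K\!\left(\frac{y - x}{h}\right) \frac{x}{f(x)}$. Since $v_t$ and $u_t$ are independent,
\[
\begin{aligned}
E[U_1] &= E\left[ \frac{1}{n\sqrt{n}\,h}\sum_{t=1}^{n}\sum_{s=1}^{n} \xi(v_t, v_s) u_s \right] = \frac{1}{n\sqrt{n}\,h}\sum_{t=1}^{n}\sum_{s=1}^{n} E\left[\xi(v_t, v_s) u_s\right] \\
&= \frac{1}{n\sqrt{n}\,h}\sum_{t=1}^{n}\sum_{s=1}^{n} E\left[\xi(v_t, v_s)\right] E[u_s] = 0.
\end{aligned} \tag{E.3}
\]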

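For the second moment,
\[
U_1^2 = \left( \frac{1}{n\sqrt{n}\,h}\sum_{t=1}^{n}\sum_{s=1}^{n} \xi(v_t, v_s) u_s \right)^2 = \frac{1}{n^3h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n} \xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\, u_{s_1} u_{s_2}. \tag{E.4}
\]
Since the two time series $v_t$, $u_t$ are independent,
\[
\begin{aligned}
E\left[U_1^2\right] &= \frac{1}{n^3h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n} E\left[\xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\, u_{s_1} u_{s_2}\right] \\
&= \frac{1}{n^3h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n} E\left[\xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\right] E\left[u_{s_1} u_{s_2}\right].
\end{aligned} \tag{E.5}
\]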

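We only consider the case $t_1 \neq t_2 \neq s_1 \neq s_2$, which is denoted by $\Theta_4$.¹¹
\[
\begin{aligned}
E\left[U_1^2\right] ={}& \frac{1}{n^3h^2}\sum_{\Theta_4} E\left[\xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\right] E\left[u_{s_1} u_{s_2}\right] \\
={}& \frac{1}{n^3h^2}\sum_{\Theta_4} \left( E\left[\xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\right] - E\left[\xi(v_{t_1}, v_{s_1})\right] E\left[\xi(v_{t_2}, v_{s_2})\right] \right) E\left[u_{s_1} u_{s_2}\right] \\
&+ \frac{1}{n^3h^2}\sum_{\Theta_4} E\left[\xi(v_{t_1}, v_{s_1})\right] E\left[\xi(v_{t_2}, v_{s_2})\right] E\left[u_{s_1} u_{s_2}\right] \\
={}& D_2 + G_2.
\end{aligned} \tag{E.6}
\]
¹¹ Other cases are of smaller order than this case.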

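\[
\begin{aligned}
|D_2| ={}& \left| \frac{1}{n^3h^2}\sum_{\Theta_4} \left( E\left[\xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\right] - E\left[\xi(v_{t_1}, v_{s_1})\right] E\left[\xi(v_{t_2}, v_{s_2})\right] \right) E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} \left| E\left[\xi(v_{t_1}, v_{s_1})\,\xi(v_{t_2}, v_{s_2})\right] - E\left[\xi(v_{t_1}, v_{s_1})\right] E\left[\xi(v_{t_2}, v_{s_2})\right] \right| \left| E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} C_1\, \alpha^{\frac{\delta}{2+\delta}}(|t_1 - t_2|) \left( E\left[\xi^{2+\delta}(v_{t_1}, v_{s_1})\right] \right)^{\frac{1}{2+\delta}} \left( E\left[\xi^{2+\delta}(v_{t_2}, v_{s_2})\right] \right)^{\frac{1}{2+\delta}} \\
&\times C_2\, \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) \left( E\left[u_{s_1}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \left( E\left[u_{s_2}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \\
\le{}& \frac{C}{n^3h^2}\sum_{\Theta_4} \alpha^{\frac{\delta}{2+\delta}}(|t_1 - t_2|)\, \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) = O\!\left(\frac{1}{nh^2}\right) = o(1),
\end{aligned} \tag{E.7}
\]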

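where
\[
\begin{aligned}
E\left[\xi^{2+\delta}(v_t, v_s)\right] &= \iint \frac{K^{2+\delta}\!\left(\tfrac{v_s - v_t}{h}\right) v_t^{2+\delta}}{f(v_t)^{2+\delta}}\, f_{t,s}(v_t, v_s)\,dv_t\,dv_s \\
&= \iint \frac{K^{2+\delta}(w)\, z^{2+\delta}}{f(z)^{2+\delta}}\, f_{t,s}(z, z + wh)\,h\,dw\,dz \\
&= (1 + o(1))\,h \int K^{2+\delta}(w)\,dw \int \frac{z^{2+\delta} f_{t,s}(z, z)}{f(z)^{2+\delta}}\,dz < \infty,
\end{aligned} \tag{E.8}
\]
given that $\int K^{2+\delta}(w)\,dw < \infty$, and
\[
\max_{t \neq s} \int \frac{z^{2+\delta} f_{t,s}(z, z)}{f(z)^{2+\delta}}\,dz < \infty. \tag{E.9}
\]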

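We also applied the result that
\[
\frac{1}{n}\sum_{t_1=1}^{n}\sum_{\substack{t_2=1 \\ t_2 \neq t_1}}^{n} \alpha^{\frac{\delta}{2+\delta}}(|t_1 - t_2|) = \frac{2}{n}\sum_{t_1=1}^{n-1}\sum_{t_2=t_1+1}^{n} \alpha^{\frac{\delta}{2+\delta}}(|t_1 - t_2|) = \frac{2}{n}\sum_{t_1=1}^{n-1}\sum_{j=1}^{n-t_1} \alpha^{\frac{\delta}{2+\delta}}(j) < \infty, \tag{E.10}
\]
as $\sum_{j=1}^{n} \alpha^{\frac{\delta}{2+\delta}}(j) < \infty$. Meanwhile,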

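\[
\begin{aligned}
G_2 ={}& \frac{1}{n^3h^2}\sum_{\Theta_4} E\left[\xi(v_{t_1}, v_{s_1})\right] E\left[\xi(v_{t_2}, v_{s_2})\right] E\left[u_{s_1} u_{s_2}\right] \\
={}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(v_{t_1}, v_{s_1}) f(v_{t_1}, v_{s_1})\,dv_{s_1}dv_{t_1} \iint \xi(v_{t_2}, v_{s_2}) f(v_{t_2}, v_{s_2})\,dv_{s_2}dv_{t_2}\; E\left[u_{s_1} u_{s_2}\right] \\
={}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(\eta_1, \eta_2) f_{t_1,s_1}(\eta_1, \eta_2)\,d\eta_1 d\eta_2 \iint \xi(\eta_3, \eta_4) f_{t_2,s_2}(\eta_3, \eta_4)\,d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \\
={}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(\eta_1, \eta_2) \left( f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) + f(\eta_1) f(\eta_2) \right) d\eta_1 d\eta_2 \\
&\qquad \times \iint \xi(\eta_3, \eta_4) \left( f_{t_2,s_2}(\eta_3, \eta_4) - f(\eta_3) f(\eta_4) + f(\eta_3) f(\eta_4) \right) d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \\
={}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(\eta_1, \eta_2) \left( f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right) d\eta_1 d\eta_2 \iint \xi(\eta_3, \eta_4) \left( f_{t_2,s_2}(\eta_3, \eta_4) - f(\eta_3) f(\eta_4) \right) d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \\
&+ \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(\eta_1, \eta_2) \left( f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right) d\eta_1 d\eta_2 \iint \xi(\eta_3, \eta_4) f(\eta_3) f(\eta_4)\,d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \\
&+ \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(\eta_1, \eta_2) f(\eta_1) f(\eta_2)\,d\eta_1 d\eta_2 \iint \xi(\eta_3, \eta_4) \left( f_{t_2,s_2}(\eta_3, \eta_4) - f(\eta_3) f(\eta_4) \right) d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \\
&+ \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \xi(\eta_1, \eta_2) f(\eta_1) f(\eta_2)\,d\eta_1 d\eta_2 \iint \xi(\eta_3, \eta_4) f(\eta_3) f(\eta_4)\,d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \\
=:{}& Z_1 + Z_2 + Z_3 + Z_4,
\end{aligned} \tag{E.11}
\]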

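where
\[
\begin{aligned}
|Z_1| ={}& \left| \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \left( f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right) d\eta_1 d\eta_2 \iint \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} \left( f_{t_2,s_2}(\eta_3, \eta_4) - f(\eta_3) f(\eta_4) \right) d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \left| \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \right| \left| f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right| d\eta_1 d\eta_2 \iint \left| \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} \right| \left| f_{t_2,s_2}(\eta_3, \eta_4) - f(\eta_3) f(\eta_4) \right| d\eta_3 d\eta_4 \left| E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \left| \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \right| \delta_2(t_1, s_1)\,d\eta_1 d\eta_2 \iint \left| \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} \right| \delta_2(t_2, s_2)\,d\eta_3 d\eta_4 \left| E\left[u_{s_1} u_{s_2}\right] \right| \\
={}& \frac{1}{n^3h^2}\sum_{\Theta_4} \delta_2(t_1, s_1)\,\delta_2(t_2, s_2) \iint \left| \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \right| d\eta_1 d\eta_2 \iint \left| \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} \right| d\eta_3 d\eta_4 \left| E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{C}{n^3}\sum_{\Theta_4} \delta_2(t_1, s_1)\,\delta_2(t_2, s_2) = O(n^{-1}),
\end{aligned} \tag{E.12}
\]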

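for some positive value $C$ and $|E[u_{s_1} u_{s_2}]| < \infty$ for $1 \le s_1 \le n$, $1 \le s_2 \le n$. We also used the results that
\[
\iint \left| \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \right| d\eta_1 d\eta_2 = \iint \left| \frac{K(w)\, z}{f(z)} \right| h\,dw\,dz = h \int |K(w)|\,dw \int \left| z f^{-1}(z) \right| dz, \tag{E.13}
\]
with $\int |K(w)|\,dw < \infty$ and $\int |z f^{-1}(z)|\,dz < \infty$, and
\[
\sum_{t=1}^{n}\sum_{s=1}^{n} \delta_2(s, t) = O(n). \tag{E.14}
\]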

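For $Z_2$,
\[
\begin{aligned}
|Z_2| ={}& \left| \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \left( f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right) d\eta_1 d\eta_2 \iint \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} f(\eta_3) f(\eta_4)\,d\eta_3 d\eta_4\; E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} \left| \iint \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \left( f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right) d\eta_1 d\eta_2 \right| \left| \iint \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} f(\eta_3) f(\eta_4)\,d\eta_3 d\eta_4 \right| \left| E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \left| \frac{K\!\left(\tfrac{\eta_2 - \eta_1}{h}\right) \eta_1}{f(\eta_1)} \right| \left| f_{t_1,s_1}(\eta_1, \eta_2) - f(\eta_1) f(\eta_2) \right| d\eta_1 d\eta_2 \left| \iint \frac{K\!\left(\tfrac{\eta_4 - \eta_3}{h}\right) \eta_3}{f(\eta_3)} f(\eta_3) f(\eta_4)\,d\eta_3 d\eta_4 \right| \left| E\left[u_{s_1} u_{s_2}\right] \right| \\
\le{}& \frac{1}{n^3h^2}\sum_{\Theta_4} \iint \left| \frac{K(w_1) z_1}{f(z_1)} \right| \delta_2(t_1, s_1)\,h\,dw_1 dz_1 \left| \iint \frac{K(w_2) z_2}{f(z_2)} f(z_2) f(z_2 + w_2 h)\,h\,dw_2 dz_2 \right| C \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) \left( E\left[u_{s_1}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \left( E\left[u_{s_2}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \\
={}& \frac{1 + o(1)}{n^3h^2}\sum_{\Theta_4} \iint \left| \frac{K(w_1) z_1}{f(z_1)} \right| \delta_2(t_1, s_1)\,h\,dw_1 dz_1 \left| \iint w_2^2 K(w_2)\, z_2 f^{(2)}(z_2)\,h^3\,dw_2 dz_2 \right| C \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) \left( E\left[u_{s_1}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \left( E\left[u_{s_2}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \\
={}& \frac{(1 + o(1))\,h^2}{n^3}\sum_{\Theta_4} \delta_2(t_1, s_1) \int |K(w_1)|\,dw_1 \int \left| \frac{z_1}{f(z_1)} \right| dz_1 \left| \int w_2^2 K(w_2)\,dw_2 \right| \left| \int z_2 f^{(2)}(z_2)\,dz_2 \right| C \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) \left( E\left[u_{s_1}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \left( E\left[u_{s_2}^{2+\delta}\right] \right)^{\frac{1}{2+\delta}} \\
\le{}& \frac{(1 + o(1))\,h^2 C}{n^3}\sum_{\Theta_4} \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) = \frac{(1 + o(1))\,h^2 C}{n}\sum_{s_1=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \neq s_1}}^{n} \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) = O(h^2),
\end{aligned} \tag{E.15}
\]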

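where $|\delta_2(t_1, s_1)| < \infty$ uniformly for any $t_1, s_1$, $\int |K(w)|\,dw < \infty$, $\int w^2 K(w)\,dw < \infty$, $\int z f^{(2)}(z)\,dz < \infty$, $E[u^{2+\delta}] < \infty$, and
\[
\sum_{s_1=1}^{n}\sum_{\substack{s_2=1 \\ s_2 \neq s_1}}^{n} \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) = 2\sum_{s_1=1}^{n-1}\sum_{s_2=s_1+1}^{n} \alpha^{\frac{\delta}{2+\delta}}(|s_1 - s_2|) = 2\sum_{s_1=1}^{n-1}\sum_{j=1}^{n-s_1} \alpha^{\frac{\delta}{2+\delta}}(j) = O(n). \tag{E.16}
\]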
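The index manipulation in (E.16) can be verified directly for a concrete mixing rate. The sketch below assumes geometrically decaying coefficients $\alpha(j) = \rho^j$ with $\rho = 0.5$ and $\delta = 1$; these values and the helper name `mixing_sums` are illustrative assumptions, not part of the paper.

```python
def mixing_sums(n, rho=0.5, delta=1.0):
    """Compare the double sum over s1 != s2 with its collapsed form for alpha(j) = rho**j."""
    p = delta / (2.0 + delta)

    def a(j):
        # alpha(j) raised to the power delta / (2 + delta)
        return (rho ** j) ** p

    direct = sum(a(abs(s1 - s2))
                 for s1 in range(1, n + 1)
                 for s2 in range(1, n + 1) if s2 != s1)
    collapsed = 2.0 * sum(a(j)
                          for s1 in range(1, n)
                          for j in range(1, n - s1 + 1))
    return direct, collapsed

n_obs = 200
direct, collapsed = mixing_sums(n_obs)
ratio = direct / n_obs   # bounded in n whenever the series over j is summable
```

Under this assumed decay the ratio is bounded by twice the full series $\sum_{j \ge 1} \rho^{j\delta/(2+\delta)}$, which is the $O(n)$ statement in (E.16).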

Note that as $n \to \infty$ we have $h \to 0$, $nh \to \infty$, and $nh^2 \to \infty$; therefore, $Z_2 = o(1)$. Following the same method, we can show that $Z_3$ and $Z_4$ are $o(1)$. Consequently, both $D_2$ and $G_2$ converge to zero as $n$ goes to infinity, and this result implies
\[
E\left[U_1^2\right] = o(1), \tag{E.17}
\]
which immediately gives
\[
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} v_t \bar{u}_t = o_p(1). \tag{E.18}
\]
By a similar method, we can prove that
\[
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \bar{v}_t u_t = o_p(1), \tag{E.19}
\]
and
\[
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \bar{v}_t \bar{u}_t = o_p(1). \tag{E.20}
\]
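The passage from the second-moment bound (E.17) to the convergence in probability (E.18) is an application of Chebyshev's inequality. Spelled out, for any fixed $\varepsilon > 0$:

```latex
\[
P\left( |U_1| > \varepsilon \right) \le \frac{E\left[U_1^2\right]}{\varepsilon^2} \longrightarrow 0
\quad \text{as } n \to \infty ,
\]
```

so $U_1 = o_p(1)$, and by (E.1) the left-hand side of (E.1) equals $(1 + o_p(1))\,U_1$ and is therefore $o_p(1)$ as well.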

Appendix F Proof of Proposition B.4 and B.5

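Proof: Without loss of generality, we assume that $k = 1$. Note that
\[
\sqrt{n}\,B_n = \left( \frac{1}{n}\sum_{t=1}^{n} \tilde{x}_t \tilde{x}_t' \right)^{-1} \left( \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{x}_t \tilde{\lambda}(v_t) \right). \tag{F.1}
\]
Theorem 3.1 implies that $n^{-1}\sum_{t=1}^{n} \tilde{x}_t \tilde{x}_t' \xrightarrow{p} Q$, hence $\left( n^{-1}\sum_{t=1}^{n} \tilde{x}_t \tilde{x}_t' \right)^{-1} \xrightarrow{p} Q^{-1}$, which is $O_p(1)$. Our objective becomes showing that
\[
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{x}_t \tilde{\lambda}(v_t) = o_p(1). \tag{F.2}
\]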

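Note that $\tilde{x}_t$ can be written as
\[
\begin{aligned}
\tilde{x}_t &= x_t - \sum_{s=1}^{n} w_{ns}(t) x_s = v_t + g(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) v_s - \sum_{s=1}^{n} w_{ns}(t) g(\tau_s) \\
&= v_t - \sum_{s=1}^{n} w_{ns}(t) v_s + g(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g(\tau_s) = \tilde{v}_t + \tilde{g}(\tau_t),
\end{aligned} \tag{F.3}
\]
where $\tilde{v}_t = v_t - \sum_{s=1}^{n} w_{ns}(t) v_s$ and $\tilde{g}(\tau_t) = g(\tau_t) - \sum_{s=1}^{n} w_{ns}(t) g(\tau_s)$.

Meanwhile, since the weights satisfy $\sum_{s=1}^{n} w_{ns}(t) = 1$, $\tilde{\lambda}(v_t)$ can be written as
\[
\tilde{\lambda}(v_t) = \lambda(v_t) - \sum_{s=1}^{n} w_{ns}(t) \lambda(v_s) = \sum_{s=1}^{n} w_{ns}(t) \left( \lambda(v_t) - \lambda(v_s) \right). \tag{F.4}
\]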

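Apply the Taylor expansion for $\lambda(v_s)$,
\[
\lambda(v_s) = \lambda(v_t) + \lambda^{(1)}(v_t)(v_s - v_t) + \frac{1}{2}\lambda^{(2)}(v_t^*)(v_s - v_t)^2, \tag{F.5}
\]
where $v_t^*$ lies between $v_t$ and $v_s$, and $\lambda^{(2)}(v_t^*) < \infty$. Therefore,
\[
\begin{aligned}
\tilde{\lambda}(v_t) &= \sum_{s=1}^{n} w_{ns}(t) \left( \lambda(v_t) - \lambda(v_s) \right) = \sum_{s=1}^{n} w_{ns}(t) \left( \lambda^{(1)}(v_t)(v_t - v_s) - \frac{1}{2}\lambda^{(2)}(v_t^*)(v_s - v_t)^2 \right) \\
&= \lambda^{(1)}(v_t) \sum_{s=1}^{n} w_{ns}(t)(v_t - v_s) - \frac{1}{2}\lambda^{(2)}(v_t^*) \sum_{s=1}^{n} w_{ns}(t)(v_t - v_s)^2 \\
&= \lambda^{(1)}(v_t)\,\tilde{v}_t - \frac{1}{2}\lambda^{(2)}(v_t^*) \sum_{s=1}^{n} w_{ns}(t)(v_s - v_t)^2.
\end{aligned} \tag{F.6}
\]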
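The decomposition in (F.6) can be illustrated numerically. The sketch below assumes a Gaussian kernel and weights normalised by the kernel density estimate, so that each row of weights sums to one; the data, bandwidth, and function names are hypothetical. For a linear $\lambda$ the second-order term vanishes and the first-order term is exact, while for a smooth nonlinear $\lambda$ the gap is bounded by the weighted squared distances.

```python
import numpy as np

def gaussian_kernel(w):
    return np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)

def smoothing_weights(v, h):
    """Matrix W with W[t, s] = K((v_s - v_t)/h) / (n h f_hat(v_t)); rows sum to one."""
    v = np.asarray(v, dtype=float)
    kern = gaussian_kernel((v[None, :] - v[:, None]) / h)
    return kern / kern.sum(axis=1, keepdims=True)

def residualize(y, W):
    """Return y_t - sum_s W[t, s] * y_s for every t."""
    return y - W @ y

rng = np.random.default_rng(1)
v = rng.standard_normal(300)
W = smoothing_weights(v, h=0.3)
v_tilde = residualize(v, W)

# Linear lambda(v) = 2 + 3 v: lambda'' = 0, so lambda_tilde = 3 * v_tilde exactly.
gap_linear = residualize(2.0 + 3.0 * v, W) - 3.0 * v_tilde

# Nonlinear lambda(v) = sin(v): lambda' = cos and |lambda''| <= 1.
gap_smooth = residualize(np.sin(v), W) - np.cos(v) * v_tilde
second_order_bound = 0.5 * (W * (v[None, :] - v[:, None]) ** 2).sum(axis=1)
```

Here `gap_smooth` plays the role of the second-order remainder in (F.6), which is why only the first-order term is carried into the next step.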

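Substitute $\tilde{x}_t$ in (F.2) with (F.3) and $\tilde{\lambda}(v_t)$ with its leading term in (F.6), and denote $\lambda^{(1)}(v_t) = \zeta(v_t)$:
\[
\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{x}_t \tilde{\lambda}(v_t) &= \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{v}_t \tilde{\lambda}(v_t) + \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{g}(\tau_t) \tilde{\lambda}(v_t) \\
&= \left( \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \zeta(v_t)\,\tilde{v}_t^2 + \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{g}(\tau_t)\,\zeta(v_t)\,\tilde{v}_t \right)\left(1 + o_p(1)\right) = \left(I_{1n} + I_{2n}\right)\left(1 + o_p(1)\right),
\end{aligned} \tag{F.7}
\]
where we defined
\[
I_{1n} = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \zeta(v_t)\,\tilde{v}_t^2, \tag{F.8}
\]
and
\[
I_{2n} = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{g}(\tau_t)\,\zeta(v_t)\,\tilde{v}_t. \tag{F.9}
\]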

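We can complete the proof if the following results hold.

Proposition F.1. Let Assumptions 3.3 to 3.6 hold. Then
\[
I_{1n} = o_p(1), \tag{F.10}
\]
\[
I_{2n} = o_p(1). \tag{F.11}
\]
Therefore,
\[
\sqrt{n}\,B_n = \left( \frac{1}{n}\sum_{t=1}^{n} \tilde{x}_t \tilde{x}_t' \right)^{-1} \left( \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{x}_t \tilde{\lambda}(v_t) \right) \tag{F.12}
\]
\[
= O_p(1)\left(I_{1n} + I_{2n}\right)\left(1 + o_p(1)\right) = o_p(1). \tag{F.13}
\]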

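Proof: We have defined
\[
I_{1n} = \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \zeta(v_t)\,\tilde{v}_t^2. \tag{F.14}
\]
To prove that $I_{1n}$ is $o_p(1)$, we need to show that the second moment of $I_{1n}$ is $o(1)$. Denote $L(u) = uK(u)$.
\[
\begin{aligned}
I_{1n}^2 ={}& \left( \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \zeta(v_t)\,\tilde{v}_t^2 \right)^2 = \frac{1}{n}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \zeta(v_{t_1})\zeta(v_{t_2})\,\tilde{v}_{t_1}^2 \tilde{v}_{t_2}^2 \\
={}& \frac{1}{n}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \zeta(v_{t_1})\zeta(v_{t_2}) \left( \frac{\frac{1}{nh}\sum_{s=1}^{n} K\!\left(\tfrac{v_s - v_{t_1}}{h}\right)(v_s - v_{t_1})}{\hat{f}(v_{t_1})} \right)^2 \left( \frac{\frac{1}{nh}\sum_{l=1}^{n} K\!\left(\tfrac{v_l - v_{t_2}}{h}\right)(v_l - v_{t_2})}{\hat{f}(v_{t_2})} \right)^2 \\
={}& \frac{1 + o_p(1)}{n^5}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} \left( \sum_{s=1}^{n} L\!\left(\tfrac{v_s - v_{t_1}}{h}\right) \right)^2 \left( \sum_{l=1}^{n} L\!\left(\tfrac{v_l - v_{t_2}}{h}\right) \right)^2 \\
={}& \frac{1 + o_p(1)}{n^5}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n}\sum_{l_1=1}^{n}\sum_{l_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} L\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right).
\end{aligned} \tag{F.15}
\]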

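Therefore, we have to prove that the expectation of
\[
\frac{1}{n^5}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n}\sum_{l_1=1}^{n}\sum_{l_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} L\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \tag{F.16}
\]
is $o(1)$. We give the proof for some typical configurations of the index vector $(t_1, t_2, s_1, s_2, l_1, l_2)$; the other configurations can be handled similarly.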

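(1) Let $\Theta_6$ denote $t_1 \neq t_2 \neq s_1 \neq s_2 \neq l_1 \neq l_2$; therefore,
\[
\begin{aligned}
&\frac{1}{n^5}\sum_{\Theta_6} E\left[ \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} L\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \right] \\
&= \frac{1}{n^5}\sum_{\Theta_6} \int\!\cdots\!\int \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} L\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \\
&\qquad \times f_{t_1,t_2,s_1,s_2,l_1,l_2}(v_{t_1}, v_{t_2}, v_{s_1}, v_{s_2}, v_{l_1}, v_{l_2})\,dv_{t_1} dv_{t_2} dv_{s_1} dv_{s_2} dv_{l_1} dv_{l_2} \\
&= \frac{h^4}{n^5}\sum_{\Theta_6} \int\!\cdots\!\int \frac{\zeta(z_1)\zeta(z_2)}{f(z_1)^2 f(z_2)^2} L(w_1)L(w_2)L(w_3)L(w_4) \\
&\qquad \times f_{t_1,t_2,s_1,s_2,l_1,l_2}(z_1, z_2, z_1 + w_1 h, z_1 + w_2 h, z_2 + w_3 h, z_2 + w_4 h)\,dw_1 dw_2 dw_3 dw_4\,dz_1 dz_2 \\
&= \frac{(1 + o(1))\,h^4}{n^5}\sum_{\Theta_6} \int\!\cdots\!\int \frac{\zeta(z_1)\zeta(z_2)}{f(z_1)^2 f(z_2)^2} L(w_1)L(w_2)L(w_3)L(w_4) \\
&\qquad \times f^{(4)(3,4,5,6)}_{t_1,t_2,s_1,s_2,l_1,l_2}(z_1, z_2, z_1, z_1, z_2, z_2)\,w_1 w_2 w_3 w_4\,h^4\,dw_1 dw_2 dw_3 dw_4\,dz_1 dz_2 \\
&= \frac{(1 + o(1))\,h^8}{n^5}\sum_{\Theta_6} \int w_1 L(w_1)\,dw_1 \int w_2 L(w_2)\,dw_2 \int w_3 L(w_3)\,dw_3 \int w_4 L(w_4)\,dw_4 \\
&\qquad \times \iint \frac{\zeta(z_1)\zeta(z_2)\, f^{(4)(3,4,5,6)}_{t_1,t_2,s_1,s_2,l_1,l_2}(z_1, z_2, z_1, z_1, z_2, z_2)}{f(z_1)^2 f(z_2)^2}\,dz_1 dz_2 \\
&= \frac{(1 + o(1))\,n h^8 \kappa_{21}^4}{n^6}\sum_{\Theta_6} \iint \frac{\zeta(z_1)\zeta(z_2)\, f^{(4)(3,4,5,6)}_{t_1,t_2,s_1,s_2,l_1,l_2}(z_1, z_2, z_1, z_1, z_2, z_2)}{f(z_1)^2 f(z_2)^2}\,dz_1 dz_2,
\end{aligned} \tag{F.17}
\]
where $\kappa_{21} = \int wL(w)\,dw = \int w^2 K(w)\,dw < \infty$. Assume that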

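\[
\max_{t_1,t_2,s_1,s_2,l_1,l_2} \left| \iint \frac{\zeta(z_1)\zeta(z_2)\, f^{(4)(3,4,5,6)}_{t_1,t_2,s_1,s_2,l_1,l_2}(z_1, z_2, z_1, z_1, z_2, z_2)}{f(z_1)^2 f(z_2)^2}\,dz_1 dz_2 \right| < \infty, \tag{F.18}
\]
therefore, when $t_1 \neq t_2 \neq s_1 \neq s_2 \neq l_1 \neq l_2$,
\[
\frac{1}{n^5}\sum_{t_1 \neq t_2 \neq s_1 \neq s_2 \neq l_1 \neq l_2} E\left[ \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} L\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \right] = O(nh^8) = o(1). \tag{F.19}
\]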

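(2) Let $\Theta_5$ denote $t_1 = t_2 = t$, $t \neq s_1 \neq s_2 \neq l_1 \neq l_2$. Therefore,
\[
\begin{aligned}
&\frac{1}{n^5}\sum_{\Theta_5} E\left[ \frac{\zeta(v_t)^2}{f(v_t)^4} L\!\left(\tfrac{v_{s_1} - v_t}{h}\right) L\!\left(\tfrac{v_{s_2} - v_t}{h}\right) L\!\left(\tfrac{v_{l_1} - v_t}{h}\right) L\!\left(\tfrac{v_{l_2} - v_t}{h}\right) \right] \\
&= \frac{1}{n^5}\sum_{\Theta_5} \int\!\cdots\!\int \frac{\zeta(v_t)^2}{f(v_t)^4} L\!\left(\tfrac{v_{s_1} - v_t}{h}\right) L\!\left(\tfrac{v_{s_2} - v_t}{h}\right) L\!\left(\tfrac{v_{l_1} - v_t}{h}\right) L\!\left(\tfrac{v_{l_2} - v_t}{h}\right) f_{t,s_1,s_2,l_1,l_2}(v_t, v_{s_1}, v_{s_2}, v_{l_1}, v_{l_2})\,dv_t\,dv_{s_1} dv_{s_2} dv_{l_1} dv_{l_2} \\
&= \frac{1}{n^5}\sum_{\Theta_5} \int\!\cdots\!\int \frac{\zeta(z)^2}{f(z)^4} L(w_1)L(w_2)L(w_3)L(w_4)\, f_{t,s_1,s_2,l_1,l_2}(z, z + w_1 h, z + w_2 h, z + w_3 h, z + w_4 h)\,h^4\,dz\,dw_1 dw_2 dw_3 dw_4 \\
&= \frac{(1 + o(1))\,h^8 \kappa_{21}^4}{n^5}\sum_{\Theta_5} \int \frac{\zeta(z)^2}{f(z)^4} f^{(4)(2,3,4,5)}_{t,s_1,s_2,l_1,l_2}(z, z, z, z, z)\,dz = O(h^8) = o(1),
\end{aligned} \tag{F.20}
\]
where $\kappa_{21} = \int wL(w)\,dw = \int w^2 K(w)\,dw < \infty$, and
\[
\max_{t \neq s_1 \neq s_2 \neq l_1 \neq l_2} \left| \int \frac{\zeta(z)^2}{f(z)^4} f^{(4)(2,3,4,5)}_{t,s_1,s_2,l_1,l_2}(z, z, z, z, z)\,dz \right| < \infty. \tag{F.21}
\]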

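(3) Let $\Theta_3$ denote $t_1 = t_2 = t$, $s_1 = s_2 = s$, $l_1 = l_2 = l$, $t \neq s \neq l$.
\[
\begin{aligned}
&\frac{1}{n^5}\sum_{\Theta_3} E\left[ \frac{\zeta(v_t)^2}{f(v_t)^4} L^2\!\left(\tfrac{v_s - v_t}{h}\right) L^2\!\left(\tfrac{v_l - v_t}{h}\right) \right] \\
&= \frac{1}{n^5}\sum_{\Theta_3} \iiint \frac{\zeta(v_t)^2}{f(v_t)^4} L^2\!\left(\tfrac{v_s - v_t}{h}\right) L^2\!\left(\tfrac{v_l - v_t}{h}\right) f_{t,s,l}(v_t, v_s, v_l)\,dv_t\,dv_s\,dv_l \\
&= \frac{h^2}{n^5}\sum_{\Theta_3} \iiint \frac{\zeta(z)^2}{f(z)^4} L^2(w_1) L^2(w_2)\, f_{t,s,l}(z, z + w_1 h, z + w_2 h)\,dw_1 dw_2\,dz \\
&= \frac{(1 + o(1))\,h^2}{n^5}\sum_{\Theta_3} \iiint \frac{\zeta(z)^2}{f(z)^4} L^2(w_1) L^2(w_2)\, f_{t,s,l}(z, z, z)\,dw_1 dw_2\,dz \\
&= \frac{(1 + o(1))\,h^2}{n^5}\sum_{\Theta_3} \int L^2(w_1)\,dw_1 \int L^2(w_2)\,dw_2 \int \frac{\zeta(z)^2 f_{t,s,l}(z, z, z)}{f(z)^4}\,dz \\
&= \frac{(1 + o(1))\,h^2 \kappa_{22}^2}{n^2} \cdot \frac{1}{n^3}\sum_{\Theta_3} \int \frac{\zeta(z)^2 f_{t,s,l}(z, z, z)}{f(z)^4}\,dz,
\end{aligned} \tag{F.22}
\]
where $\kappa_{22} = \int L^2(w)\,dw = \int w^2 K^2(w)\,dw < \infty$. Assume that
\[
\max_{t,s,l} \left| \int \frac{\zeta(z)^2 f_{t,s,l}(z, z, z)}{f(z)^4}\,dz \right| < \infty, \tag{F.23}
\]
therefore, in the case $t_1 = t_2 = t$, $s_1 = s_2 = s$, $l_1 = l_2 = l$, $t \neq s \neq l$,
\[
\frac{1}{n^5}\sum_{t \neq s \neq l} E\left[ \frac{\zeta(v_t)^2}{f(v_t)^4} L^2\!\left(\tfrac{v_s - v_t}{h}\right) L^2\!\left(\tfrac{v_l - v_t}{h}\right) \right] = O\!\left(\frac{h^2}{n^2}\right) = o(1). \tag{F.24}
\]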
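The constants $\kappa_{21}$ and $\kappa_{22}$, and the fact that $\int L(w)\,dw = 0$ (which is why the lower-order Taylor terms vanish), can be evaluated numerically for a specific kernel. The sketch below assumes a Gaussian kernel purely for illustration; the helper `integrate` is a hypothetical name.

```python
import numpy as np

def gaussian_kernel(w):
    return np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)

def L(w):
    """L(u) = u K(u), as defined in the proof."""
    return w * gaussian_kernel(w)

def integrate(values, grid):
    """Trapezoid rule on an equally spaced grid."""
    dx = grid[1] - grid[0]
    return float(dx * (values.sum() - 0.5 * (values[0] + values[-1])))

w = np.linspace(-10.0, 10.0, 400_001)
int_L = integrate(L(w), w)          # integral of L(w): zero by symmetry
kappa21 = integrate(w * L(w), w)    # integral of w L(w) = integral of w^2 K(w)
kappa22 = integrate(L(w) ** 2, w)   # integral of L(w)^2 = integral of w^2 K(w)^2
```

For the Gaussian kernel these evaluate to $\kappa_{21} = 1$ and $\kappa_{22} = 1/(4\sqrt{\pi})$; other kernels give different finite constants, which is all the argument needs.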

Therefore,
\[
E\left[I_{1n}^2\right] = o(1), \tag{F.25}
\]
which implies
\[
I_{1n} = o_p(1). \tag{F.26}
\]
This completes the proof for $I_{1n}$.

We next consider the second moment of $I_{2n}$,

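\[
\begin{aligned}
I_{2n}^2 ={}& \left( \frac{1}{\sqrt{n}}\sum_{t=1}^{n} \tilde{g}(\tau_t)\,\zeta(v_t)\,\tilde{v}_t \right)^2 = \frac{1}{n}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \zeta(v_{t_1})\zeta(v_{t_2})\,\tilde{g}(\tau_{t_1})\tilde{g}(\tau_{t_2})\,\tilde{v}_{t_1}\tilde{v}_{t_2} \\
={}& \frac{1}{n}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \zeta(v_{t_1})\zeta(v_{t_2})\, \frac{\frac{1}{nh}\sum_{s=1}^{n} K\!\left(\tfrac{v_s - v_{t_1}}{h}\right)\left(g(\tau_s) - g(\tau_{t_1})\right)}{\hat{f}(v_{t_1})} \cdot \frac{\frac{1}{nh}\sum_{s=1}^{n} K\!\left(\tfrac{v_s - v_{t_1}}{h}\right)(v_s - v_{t_1})}{\hat{f}(v_{t_1})} \\
&\qquad \times \frac{\frac{1}{nh}\sum_{l=1}^{n} K\!\left(\tfrac{v_l - v_{t_2}}{h}\right)\left(g(\tau_l) - g(\tau_{t_2})\right)}{\hat{f}(v_{t_2})} \cdot \frac{\frac{1}{nh}\sum_{l=1}^{n} K\!\left(\tfrac{v_l - v_{t_2}}{h}\right)(v_l - v_{t_2})}{\hat{f}(v_{t_2})} \\
={}& \frac{1 + o_p(1)}{n^5h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} \left( \sum_{s_1=1}^{n} K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right)\left(g(\tau_{s_1}) - g(\tau_{t_1})\right) \right) \left( \sum_{s_2=1}^{n} K\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right)\tfrac{v_{s_2} - v_{t_1}}{h} \right) \\
&\qquad \times \left( \sum_{l_1=1}^{n} K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right)\left(g(\tau_{l_1}) - g(\tau_{t_2})\right) \right) \left( \sum_{l_2=1}^{n} K\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right)\tfrac{v_{l_2} - v_{t_2}}{h} \right) \\
={}& \frac{1 + o_p(1)}{n^5h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} \left( \sum_{s_1=1}^{n} K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right)\left(g(\tau_{s_1}) - g(\tau_{t_1})\right) \right) \left( \sum_{s_2=1}^{n} L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) \right) \\
&\qquad \times \left( \sum_{l_1=1}^{n} K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right)\left(g(\tau_{l_1}) - g(\tau_{t_2})\right) \right) \left( \sum_{l_2=1}^{n} L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \right) \\
={}& \frac{1 + o_p(1)}{n^5h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n}\sum_{l_1=1}^{n}\sum_{l_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} \left(g(\tau_{s_1}) - g(\tau_{t_1})\right)\left(g(\tau_{l_1}) - g(\tau_{t_2})\right) \\
&\qquad \times K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right),
\end{aligned} \tag{F.27}
\]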

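therefore,¹² we need to show that the expectation of
\[
\frac{1}{n^5h^2}\sum_{t_1=1}^{n}\sum_{t_2=1}^{n}\sum_{s_1=1}^{n}\sum_{s_2=1}^{n}\sum_{l_1=1}^{n}\sum_{l_2=1}^{n} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} \left(g(\tau_{s_1}) - g(\tau_{t_1})\right)\left(g(\tau_{l_1}) - g(\tau_{t_2})\right) K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \tag{F.28}
\]
is $o(1)$. We provide the proofs for some typical configurations of the index vector $(t_1, t_2, s_1, s_2, l_1, l_2)$; the proofs for the other configurations are similar.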

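(1) Let $\Theta_6$ denote $t_1 \neq t_2 \neq s_1 \neq s_2 \neq l_1 \neq l_2$ and let $G(s_1, t_1, l_1, t_2) = \left(g(\tau_{s_1}) - g(\tau_{t_1})\right)\left(g(\tau_{l_1}) - g(\tau_{t_2})\right)$,
\[
\begin{aligned}
&E\left[ \frac{1}{n^5h^2}\sum_{\Theta_6} \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} G(s_1, t_1, l_1, t_2)\, K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \right] \\
&= \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2)\, E\left[ \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \right] \\
&= \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \\
&\qquad \times f_{t_1,t_2,s_1,s_2,l_1,l_2}(v_{t_1}, v_{t_2}, v_{s_1}, v_{s_2}, v_{l_1}, v_{l_2})\,dv_{t_1} dv_{t_2} dv_{s_1} dv_{s_2} dv_{l_1} dv_{l_2} \\
&= \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \\
&\qquad \times f_{t_1,t_2,s_1,s_2,l_1,l_2}(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5, \eta_6)\,d\eta_1 d\eta_2 d\eta_3 d\eta_4 d\eta_5 d\eta_6 \\
&= \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \\
&\qquad \times \left( f_{t_1,t_2,s_1,s_2,l_1,l_2}(\eta_1, \ldots, \eta_6) - f(\eta_1) f(\eta_2) f(\eta_3) f(\eta_4) f(\eta_5) f(\eta_6) \right) d\eta_1 d\eta_2 d\eta_3 d\eta_4 d\eta_5 d\eta_6 \\
&\quad + \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \\
&\qquad \times f(\eta_1) f(\eta_2) f(\eta_3) f(\eta_4) f(\eta_5) f(\eta_6)\,d\eta_1 d\eta_2 d\eta_3 d\eta_4 d\eta_5 d\eta_6 \\
&= D_3 + G_3,
\end{aligned} \tag{F.29}
\]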

¹² The term in the summation becomes 0 when one of the following conditions holds: $s_1 = t_1$, $s_2 = t_1$, $s_1 = t_2$, or $s_2 = t_2$.

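where
\[
\begin{aligned}
|D_3| ={}& \left| \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \right. \\
&\qquad \left. \times \left( f_{t_1,t_2,s_1,s_2,l_1,l_2}(\eta_1, \ldots, \eta_6) - f(\eta_1) f(\eta_2) f(\eta_3) f(\eta_4) f(\eta_5) f(\eta_6) \right) d\eta_1 \cdots d\eta_6 \right| \\
\le{}& \frac{1}{n^5h^2}\sum_{\Theta_6} |G(s_1, t_1, l_1, t_2)| \int\!\cdots\!\int \left| \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \right| \\
&\qquad \times \left| f_{t_1,t_2,s_1,s_2,l_1,l_2}(\eta_1, \ldots, \eta_6) - f(\eta_1) f(\eta_2) f(\eta_3) f(\eta_4) f(\eta_5) f(\eta_6) \right| d\eta_1 \cdots d\eta_6 \\
\le{}& \frac{1}{n^5h^2}\sum_{\Theta_6} |G(s_1, t_1, l_1, t_2)| \int\!\cdots\!\int \left| \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} \right| \left| K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) \right| \left| L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) \right| \left| K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) \right| \left| L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \right| \\
&\qquad \times \delta_6(t_1, t_2, s_1, s_2, l_1, l_2)\,d\eta_1 \cdots d\eta_6 \\
={}& \frac{1}{n^5h^2}\sum_{\Theta_6} |G(s_1, t_1, l_1, t_2)|\,\delta_6(t_1, t_2, s_1, s_2, l_1, l_2) \int\!\cdots\!\int \left| \frac{\zeta(z_1)\zeta(z_2)}{f(z_1)^2 f(z_2)^2} \right| |K(w_1)|\,|L(w_2)|\,|K(w_3)|\,|L(w_4)|\,h^4\,dz_1 dz_2\,dw_1 dw_2 dw_3 dw_4 \\
={}& \frac{h^2}{n^5}\sum_{\Theta_6} |G(s_1, t_1, l_1, t_2)|\,\delta_6(t_1, t_2, s_1, s_2, l_1, l_2) \int \left| \frac{\zeta(z_1)}{f(z_1)^2} \right| dz_1 \int \left| \frac{\zeta(z_2)}{f(z_2)^2} \right| dz_2 \\
&\qquad \times \int |K(w_1)|\,dw_1 \int |L(w_2)|\,dw_2 \int |K(w_3)|\,dw_3 \int |L(w_4)|\,dw_4 \\
\le{}& \frac{C h^2}{n^5}\sum_{\Theta_6} \delta_6(t_1, t_2, s_1, s_2, l_1, l_2) = O(h^2),
\end{aligned} \tag{F.30}
\]
where $|G(s_1, t_1, l_1, t_2)| < \infty$, $\int \left| \frac{\zeta(z)}{f(z)^2} \right| dz < \infty$, $\int |K(w)|\,dw < \infty$, $\int |L(w)|\,dw < \infty$, and
\[
\sum_{\Theta_6} \delta_6(t_1, t_2, s_1, s_2, l_1, l_2) = O(n^5). \tag{F.31}
\]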

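For $G_3$,
\[
\begin{aligned}
G_3 ={}& \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(\eta_1)\zeta(\eta_2)}{f(\eta_1)^2 f(\eta_2)^2} K\!\left(\tfrac{\eta_3 - \eta_1}{h}\right) L\!\left(\tfrac{\eta_4 - \eta_1}{h}\right) K\!\left(\tfrac{\eta_5 - \eta_2}{h}\right) L\!\left(\tfrac{\eta_6 - \eta_2}{h}\right) \\
&\qquad \times f(\eta_1) f(\eta_2) f(\eta_3) f(\eta_4) f(\eta_5) f(\eta_6)\,d\eta_1 \cdots d\eta_6 \\
={}& \frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(z_1)\zeta(z_2)}{f(z_1)^2 f(z_2)^2} K(w_1)L(w_2)K(w_3)L(w_4)\, f(z_1) f(z_2) f(z_1 + w_1 h) \\
&\qquad \times f(z_1 + w_2 h) f(z_2 + w_3 h) f(z_2 + w_4 h)\,h^4\,dz_1 dz_2\,dw_1 dw_2 dw_3 dw_4 \\
={}& \frac{1 + o(1)}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int\!\cdots\!\int \frac{\zeta(z_1)\zeta(z_2)}{f(z_1)^2 f(z_2)^2} K(w_1)L(w_2)K(w_3)L(w_4)\, f(z_1) f(z_2) f(z_1) \\
&\qquad \times \left( f(z_1) + f^{(1)}(z_1) w_2 h \right) f(z_2) \left( f(z_2) + f^{(1)}(z_2) w_4 h \right) h^4\,dz_1 dz_2\,dw_1 dw_2 dw_3 dw_4 \\
={}& \frac{(1 + o(1))\,h^4}{n^5}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) \int \zeta(z_1) f^{(1)}(z_1)\,dz_1 \int \zeta(z_2) f^{(1)}(z_2)\,dz_2 \\
&\qquad \times \int K(w_1)\,dw_1 \int w_2 L(w_2)\,dw_2 \int K(w_3)\,dw_3 \int w_4 L(w_4)\,dw_4 \\
={}& \frac{(1 + o(1))\,h^4 \kappa_{21}^2}{n^5} \int \zeta(z_1) f^{(1)}(z_1)\,dz_1 \int \zeta(z_2) f^{(1)}(z_2)\,dz_2 \sum_{\Theta_6} G(s_1, t_1, l_1, t_2) = O(h^4),
\end{aligned} \tag{F.32}
\]
where $\kappa_{21} = \int wL(w)\,dw = \int w^2 K(w)\,dw < \infty$, $\int K(w)\,dw = 1$, $\int \zeta(z) f^{(1)}(z)\,dz < \infty$, and
\[
\sum_{\Theta_6} G(s_1, t_1, l_1, t_2) = \sum_{\Theta_6} \left(g(\tau_{s_1}) - g(\tau_{t_1})\right)\left(g(\tau_{l_1}) - g(\tau_{t_2})\right) = O(n^5). \tag{F.33}
\]
Therefore,
\[
G_3 = O(h^4) = o(1). \tag{F.34}
\]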

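Hence,
\[
\frac{1}{n^5h^2}\sum_{\Theta_6} G(s_1, t_1, l_1, t_2)\, E\left[ \frac{\zeta(v_{t_1})\zeta(v_{t_2})}{f(v_{t_1})^2 f(v_{t_2})^2} K\!\left(\tfrac{v_{s_1} - v_{t_1}}{h}\right) L\!\left(\tfrac{v_{s_2} - v_{t_1}}{h}\right) K\!\left(\tfrac{v_{l_1} - v_{t_2}}{h}\right) L\!\left(\tfrac{v_{l_2} - v_{t_2}}{h}\right) \right] = o(1). \tag{F.35}
\]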

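(2) Let $G(s, t, l) = G(s, t, l, t)$, let $\Theta_3$ denote $t \neq s \neq l$, and assume $t_1 = t_2 = t$, $s_1 = s_2 = s$, $l_1 = l_2 = l$.
\[
\begin{aligned}
&\frac{1}{n^5h^2}\sum_{\Theta_3} G(s, t, l)\, E\left[ \frac{\zeta(v_t)^2}{f(v_t)^4} K\!\left(\tfrac{v_s - v_t}{h}\right) L\!\left(\tfrac{v_s - v_t}{h}\right) K\!\left(\tfrac{v_l - v_t}{h}\right) L\!\left(\tfrac{v_l - v_t}{h}\right) \right] \\
&= \frac{1}{n^5h^2}\sum_{\Theta_3} G(s, t, l) \iiint \frac{\zeta(v_t)^2}{f(v_t)^4} K\!\left(\tfrac{v_s - v_t}{h}\right) L\!\left(\tfrac{v_s - v_t}{h}\right) K\!\left(\tfrac{v_l - v_t}{h}\right) L\!\left(\tfrac{v_l - v_t}{h}\right) f_{t,s,l}(v_t, v_s, v_l)\,dv_t\,dv_s\,dv_l \\
&= \frac{1}{n^5h^2}\sum_{\Theta_3} G(s, t, l) \iiint \frac{\zeta(z)^2}{f(z)^4} K(w_1)L(w_1)K(w_2)L(w_2)\, f_{t,s,l}(z, z + w_1 h, z + w_2 h)\,h^2\,dw_1 dw_2\,dz \\
&= \frac{1 + o(1)}{n^5}\sum_{\Theta_3} G(s, t, l) \iiint \frac{\zeta(z)^2}{f(z)^4} w_1^2 K^2(w_1)\, w_2^2 K^2(w_2)\, f^{(2)(2,3)}_{t,s,l}(z, z, z)\,h^2\,dw_1 dw_2\,dz \\
&= \frac{h^2 (1 + o(1))}{n^5}\sum_{\Theta_3} G(s, t, l) \int \frac{\zeta(z)^2 f^{(2)(2,3)}_{t,s,l}(z, z, z)}{f(z)^4}\,dz \int w_1^2 K^2(w_1)\,dw_1 \int w_2^2 K^2(w_2)\,dw_2 \\
&= \frac{h^2 \kappa_{22}^2 (1 + o(1))}{n^2} \cdot \frac{1}{n^3}\sum_{\Theta_3} G(s, t, l) \int \frac{\zeta(z)^2 f^{(2)(2,3)}_{t,s,l}(z, z, z)}{f(z)^4}\,dz,
\end{aligned} \tag{F.36}
\]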

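where
\[
\left| \frac{1}{n^3}\sum_{\Theta_3} G(s, t, l) \int \frac{\zeta(z)^2 f^{(2)(2,3)}_{t,s,l}(z, z, z)}{f(z)^4}\,dz \right| \le \frac{1}{n^3}\sum_{\Theta_3} \left| g(\tau_s) - g(\tau_t) \right| \left| g(\tau_l) - g(\tau_t) \right| \left| \int \frac{\zeta(z)^2 f^{(2)(2,3)}_{t,s,l}(z, z, z)}{f(z)^4}\,dz \right| < \infty, \tag{F.37}
\]
given that the function $g(\cdot)$ is bounded and
\[
\max_{t,s,l} \left| \int \frac{\zeta(z)^2 f^{(2)(2,3)}_{t,s,l}(z, z, z)}{f(z)^4}\,dz \right| < \infty. \tag{F.38}
\]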

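Therefore, when $t_1 = t_2 = t$, $s_1 = s_2 = s$, $l_1 = l_2 = l$, $s \neq t \neq l$,
\[
\frac{1}{n^5h^2}\sum_{\Theta_3} G(s, t, l, t)\, E\left[ \frac{\zeta(v_t)^2}{f(v_t)^4} K\!\left(\tfrac{v_s - v_t}{h}\right) L\!\left(\tfrac{v_s - v_t}{h}\right) K\!\left(\tfrac{v_l - v_t}{h}\right) L\!\left(\tfrac{v_l - v_t}{h}\right) \right] = O\!\left(\frac{h^2}{n^2}\right) = o(1). \tag{F.39}
\]
Therefore,
\[
E\left[I_{2n}^2\right] = o(1), \tag{F.40}
\]
which implies
\[
I_{2n} = o_p(1). \tag{F.41}
\]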

Appendix G Proofs of Proposition D.1

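The proposition states that $J_i(n)$ is $o_p(1)$ for $i = 1, 2, \ldots, 6$. Since the method of proof is the same for each $J_i(n)$, we only prove the result for $J_1(n)$; the others can be proved similarly. Recall that
\[
\begin{aligned}
J_1(n) ={}& \frac{2}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4} \left(f(v_t) - \hat{f}(\hat{v}_t)\right)^2 x_s^2 \\
={}& \frac{2}{n^3h^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4} \cdot \frac{1}{n^2h^2} \left( \sum_{q=1}^{n} \left( K\!\left(\tfrac{\hat{v}_q - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_q - v_t}{h}\right) \right) \right)^2 \\
={}& \frac{2}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4} \sum_{q=1}^{n} \left( K\!\left(\tfrac{\hat{v}_q - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_q - v_t}{h}\right) \right)^2 \\
&+ \frac{2}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4} \sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n} \left( K\!\left(\tfrac{\hat{v}_q - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_q - v_t}{h}\right) \right)\left( K\!\left(\tfrac{\hat{v}_j - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_j - v_t}{h}\right) \right) \\
={}& \frac{2}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4} \left( K\!\left(\tfrac{\hat{v}_q - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_q - v_t}{h}\right) \right)^2 \\
&+ \frac{2}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4} \left( K\!\left(\tfrac{\hat{v}_q - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_q - v_t}{h}\right) \right)\left( K\!\left(\tfrac{\hat{v}_j - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_j - v_t}{h}\right) \right) \\
={}& J_{11}(n) + J_{12}(n),
\end{aligned} \tag{G.1}
\]
where, denoting $k(q, t) = K'\!\left(\frac{v_q - v_t}{h}\right)$ and using $x_t = g(\tau_t) + v_t$ and $x_t = \hat{g}(\tau_t) + \hat{v}_t$, a Taylor expansion gives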

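\[
\begin{aligned}
J_{11}(n) ={}& \frac{2}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4} \left( K\!\left(\tfrac{\hat{v}_q - \hat{v}_t}{h}\right) - K\!\left(\tfrac{v_q - v_t}{h}\right) \right)^2 \\
={}& \frac{2(1 + o_p(1))}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \frac{\hat{v}_q - \hat{v}_t}{h} - \frac{v_q - v_t}{h} \right)^2 \\
={}& \frac{2(1 + o_p(1))}{n^5h^4}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \frac{\hat{v}_q - v_q}{h} - \frac{\hat{v}_t - v_t}{h} \right)^2 \\
={}& \frac{2(1 + o_p(1))}{n^5h^6}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \left(g(\tau_q) - \hat{g}(\tau_q)\right) - \left(g(\tau_t) - \hat{g}(\tau_t)\right) \right)^2 \\
\le{}& \frac{4(1 + o_p(1))}{n^5h^6}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left(g(\tau_q) - \hat{g}(\tau_q)\right)^2 \\
&+ \frac{4(1 + o_p(1))}{n^5h^6}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left(g(\tau_t) - \hat{g}(\tau_t)\right)^2 \\
={}& \frac{4(1 + o_p(1))}{n^5h^6}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \sum_{l=1}^{n} w^*_{nl}(q) \left(g(\tau_l) - g(\tau_q) + v_l\right) \right)^2 \\
&+ \frac{4(1 + o_p(1))}{n^5h^6}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \sum_{l=1}^{n} w^*_{nl}(t) \left(g(\tau_l) - g(\tau_t) + v_l\right) \right)^2 \\
={}& \frac{4(1 + o_p(1))}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \sum_{l=1}^{n} K_2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \left(g(\tau_l) - g(\tau_q) + v_l\right) \right)^2 \\
&+ \frac{4(1 + o_p(1))}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \left( \sum_{l=1}^{n} K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t) + v_l\right) \right)^2 \\
={}& \frac{4(1 + o_p(1))}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \sum_{l=1}^{n} K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \left(g(\tau_l) - g(\tau_q) + v_l\right)^2 \\
&+ \frac{4(1 + o_p(1))}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} K_2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \left(g(\tau_l) - g(\tau_q) + v_l\right) K_2\!\left(\tfrac{\tau_r - \tau_q}{b}\right) \left(g(\tau_r) - g(\tau_q) + v_r\right) \\
&+ \frac{4(1 + o_p(1))}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \sum_{l=1}^{n} K_2^2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t) + v_l\right)^2 \\
&+ \frac{4(1 + o_p(1))}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k^2(q, t) \sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t) + v_l\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t) + v_r\right) \\
={}& \frac{4(1 + o_p(1))}{n^7h^6b^2} \Bigg( \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qq}(t)\Pi_{ll}(q, q) + \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qq}(t)\Pi_{lr}(q, q) \\
&+ \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qq}(t)\Pi_{ll}(t, t) + \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qq}(t)\Pi_{lr}(t, t) \Bigg),
\end{aligned} \tag{G.2}
\]
where
\[
\Psi_{qj}(t) = k(q, t)\,k(j, t) = K'\!\left(\tfrac{v_q - v_t}{h}\right) K'\!\left(\tfrac{v_j - v_t}{h}\right), \tag{G.3}
\]
\[
\Pi_{lr}(t, j) = K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t) + v_l\right) K_2\!\left(\tfrac{\tau_r - \tau_j}{b}\right) \left(g(\tau_r) - g(\tau_j) + v_r\right), \tag{G.4}
\]
and
\[
\Phi_{1,ss}(t) = \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}. \tag{G.5}
\]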

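Following similar steps as above, we have
\[
\begin{aligned}
J_{12}(n) = \frac{2(1 + o_p(1))}{n^7h^6b^2} \Bigg( & \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{ll}(q, j) + \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{lr}(q, j) \\
& - \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{ll}(q, t) - \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{lr}(q, t) \\
& - \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{ll}(t, j) - \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{lr}(t, j) \\
& + \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{ll}(t, t) + \sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{lr}(t, t) \Bigg).
\end{aligned} \tag{G.6}
\]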

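We argue that the 12 summation terms that form $J_{11}(n)$ and $J_{12}(n)$ are all $o_p(1)$. As the proofs are similar for all 12 terms, we only treat the first term in $J_{11}$ and the last term in $J_{12}$.
\[
\begin{aligned}
&\frac{4}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n} \Phi_{1,ss}(t)\Psi_{qq}(t)\Pi_{ll}(q, q) \\
&= \frac{4}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4} \left(g(\tau_s) + v_s\right)^2 k^2\!\left(\tfrac{v_q - v_t}{h}\right) K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \left(g(\tau_l) - g(\tau_q) + v_l\right)^2,
\end{aligned} \tag{G.7}
\]
with the leading term
\[
P_1 = \frac{4}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{l=1}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4}\, g(\tau_s)^2\, k^2\!\left(\tfrac{v_q - v_t}{h}\right) K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \left(g(\tau_l) - g(\tau_q)\right)^2. \tag{G.8}
\]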

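We only consider the case $t \neq s \neq q \neq l$, denoted by $\Theta_4$. Since both functions $g(\cdot)$ and $K(\cdot)$ are bounded, $\kappa_2 = \int K^2(w)\,dw < \infty$, $\int k^2(w)\,dw = \int K'^2(w)\,dw < \infty$, and
\[
\max_{t,s,q} \left| \int \frac{f_{t,s,q}(z, z, z)}{f(z)^4}\,dz \right| < \infty, \tag{G.9}
\]
and, in addition, $nh^2 \to \infty$ and $nb^2 \to \infty$ as $n \to \infty$. Therefore,
\[
\begin{aligned}
E[P_1] ={}& E\left[ \frac{4}{n^7h^6b^2}\sum_{\Theta_4} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4}\, g(\tau_s)^2\, k^2\!\left(\tfrac{v_q - v_t}{h}\right) K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \left(g(\tau_l) - g(\tau_q)\right)^2 \right] \\
={}& \frac{4}{n^7h^6b^2}\sum_{\Theta_4} g(\tau_s)^2 \left(g(\tau_l) - g(\tau_q)\right)^2 K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) E\left[ \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4}\, k^2\!\left(\tfrac{v_q - v_t}{h}\right) \right] \\
={}& \frac{4}{n^7h^6b^2}\sum_{\Theta_4} g(\tau_s)^2 \left(g(\tau_l) - g(\tau_q)\right)^2 K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \iiint \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) k^2\!\left(\tfrac{v_q - v_t}{h}\right) f_{t,s,q}(v_t, v_s, v_q)}{f(v_t)^4}\,dv_t\,dv_s\,dv_q \\
={}& \frac{4}{n^7h^4b^2}\sum_{\Theta_4} g(\tau_s)^2 \left(g(\tau_l) - g(\tau_q)\right)^2 K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \iiint \frac{K^2(w_1)\,k^2(w_2)}{f(z)^4}\, f_{t,s,q}(z, z + w_1 h, z + w_2 h)\,dz\,dw_1 dw_2 \\
={}& \frac{4(1 + o(1))}{n^7h^4b^2}\sum_{\Theta_4} g(\tau_s)^2 \left(g(\tau_l) - g(\tau_q)\right)^2 K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \iiint \frac{K^2(w_1)\,k^2(w_2)\, f_{t,s,q}(z, z, z)}{f(z)^4}\,dz\,dw_1 dw_2 \\
={}& \frac{4(1 + o(1))}{n^3h^4b^2} \cdot \frac{1}{n^4}\sum_{\Theta_4} g(\tau_s)^2 \left(g(\tau_l) - g(\tau_q)\right)^2 K_2^2\!\left(\tfrac{\tau_l - \tau_q}{b}\right) \int K^2(w_1)\,dw_1 \int k^2(w_2)\,dw_2 \int \frac{f_{t,s,q}(z, z, z)}{f(z)^4}\,dz \\
={}& O\!\left(\frac{1}{n^3h^4b^2}\right) = o(1).
\end{aligned} \tag{G.10}
\]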

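We then consider the last term in $J_{12}$:
\[
\begin{aligned}
&\frac{2}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \Phi_{1,ss}(t)\Psi_{qj}(t)\Pi_{lr}(t, t) \\
&= \frac{2}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) x_s^2}{f(v_t)^4}\, k\!\left(\tfrac{v_q - v_t}{h}\right) k\!\left(\tfrac{v_j - v_t}{h}\right) \\
&\qquad \times K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t) + v_l\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t) + v_r\right),
\end{aligned} \tag{G.11}
\]
which has the leading term
\[
\begin{aligned}
P_2 ={}& \frac{2}{n^7h^6b^2}\sum_{t=1}^{n}\sum_{s=1}^{n}\sum_{q=1}^{n}\sum_{\substack{j=1 \\ j \neq q}}^{n}\sum_{l=1}^{n}\sum_{\substack{r=1 \\ r \neq l}}^{n} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) g(\tau_s)^2}{f(v_t)^4}\, k\!\left(\tfrac{v_q - v_t}{h}\right) k\!\left(\tfrac{v_j - v_t}{h}\right) \\
&\qquad \times K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right).
\end{aligned} \tag{G.12}
\]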

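We only consider the case $t \neq s \neq q \neq j \neq l \neq r$, denoted by $\Theta_6$. Therefore,
\[
\begin{aligned}
&E\left[ \frac{2}{n^7h^6b^2}\sum_{\Theta_6} \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right) g(\tau_s)^2}{f(v_t)^4}\, k\!\left(\tfrac{v_q - v_t}{h}\right) k\!\left(\tfrac{v_j - v_t}{h}\right) K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right) \right] \\
&= \frac{2}{n^7h^6b^2}\sum_{\Theta_6} E\left[ \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4}\, k\!\left(\tfrac{v_q - v_t}{h}\right) k\!\left(\tfrac{v_j - v_t}{h}\right) \right] g(\tau_s)^2 K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right) \\
&= \frac{2}{n^7h^6b^2}\sum_{\Theta_6} \iiiint \frac{K^2\!\left(\tfrac{v_s - v_t}{h}\right)}{f(v_t)^4}\, k\!\left(\tfrac{v_q - v_t}{h}\right) k\!\left(\tfrac{v_j - v_t}{h}\right) f(v_t, v_s, v_q, v_j)\,dv_t\,dv_s\,dv_q\,dv_j \\
&\qquad \times g(\tau_s)^2 K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right) \\
&= \frac{2}{n^7h^6b^2}\sum_{\Theta_6} \iiiint \frac{K^2(w_1)}{f(z)^4}\, k(w_2)\,k(w_3)\, f(z, z + w_1 h, z + w_2 h, z + w_3 h)\,h^3\,dz\,dw_1 dw_2 dw_3 \\
&\qquad \times g(\tau_s)^2 K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right) \\
&= \frac{2(1 + o(1))}{n^7h^3b^2}\sum_{\Theta_6} \int K^2(w_1)\,dw_1 \int k(w_2)\,dw_2 \int k(w_3)\,dw_3 \int \frac{f_{t,s,q,j}(z, z, z, z)}{f(z)^4}\,dz \\
&\qquad \times g(\tau_s)^2 K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right) \\
&= \frac{2(1 + o(1))}{n^7h^3b^2}\sum_{\Theta_6} C(t, s, q, j)\, g(\tau_s)^2 K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right).
\end{aligned} \tag{G.13}
\]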

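Since $\kappa_2 = \int K^2(w)\,dw < \infty$, $\int k(w)\,dw = \int K'(w)\,dw < \infty$, and
\[
\max_{t,s,q,j} \left| \int \frac{f_{t,s,q,j}(z, z, z, z)}{f(z)^4}\,dz \right| < \infty, \tag{G.14}
\]
we have
\[
\left| C(t, s, q, j) \right| = \left| \int K^2(w_1)\,dw_1 \int k(w_2)\,dw_2 \int k(w_3)\,dw_3 \int \frac{f_{t,s,q,j}(z, z, z, z)}{f(z)^4}\,dz \right| \le C, \tag{G.15}
\]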

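for some $C > 0$. Meanwhile,
\[
\begin{aligned}
&\left| \frac{2}{n^7h^3b^2}\sum_{\Theta_6} C(t, s, q, j)\, g(\tau_s)^2 K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \left(g(\tau_l) - g(\tau_t)\right) K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \left(g(\tau_r) - g(\tau_t)\right) \right| \\
&\le \frac{2}{n^7h^3b^2}\sum_{\Theta_6} \left| C(t, s, q, j) \right| g(\tau_s)^2 \left| K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \right| \left| g(\tau_l) - g(\tau_t) \right| \left| K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \right| \left| g(\tau_r) - g(\tau_t) \right| \\
&\le \frac{2C}{n^5h^3b^2}\sum_{t \neq s \neq l \neq r} g(\tau_s)^2 \left| K_2\!\left(\tfrac{\tau_l - \tau_t}{b}\right) \right| \left| g(\tau_l) - g(\tau_t) \right| \left| K_2\!\left(\tfrac{\tau_r - \tau_t}{b}\right) \right| \left| g(\tau_r) - g(\tau_t) \right| \\
&= \frac{(1 + o(1))\,2C}{nh^3b^2} \iiiint g(\eta_1)^2 \left| K_2\!\left(\tfrac{\eta_2 - \eta_3}{b}\right) \right| \left| g(\eta_2) - g(\eta_3) \right| \left| K_2\!\left(\tfrac{\eta_4 - \eta_3}{b}\right) \right| \left| g(\eta_4) - g(\eta_3) \right| d\eta_1 d\eta_2 d\eta_3 d\eta_4 \\
&= \frac{(1 + o(1))\,2C}{nh^3} \iiiint g(\eta_1)^2 \left| K_2(w_1) \right| \left| g(\eta_2 + w_1 b) - g(\eta_2) \right| \left| K_2(w_2) \right| \left| g(\eta_2 + w_2 b) - g(\eta_2) \right| d\eta_1 d\eta_2\,dw_1 dw_2 \\
&= \frac{(1 + o(1))\,2C}{nh^3} \iiiint g(\eta_1)^2 \left| K_2(w_1) \right| \left| g'(\eta_2) w_1 b \right| \left| K_2(w_2) \right| \left| g'(\eta_2) w_2 b \right| d\eta_1 d\eta_2\,dw_1 dw_2 \\
&= \frac{(1 + o(1))\,2 b^2 C}{nh^3} \int g(\eta_1)^2\,d\eta_1 \left( \int \left| w_1 K_2(w_1) \right| dw_1 \right)^2 \int \left| g'(\eta_2) \right|^2 d\eta_2 = O\!\left(\frac{b^2}{nh^3}\right) = o(1),
\end{aligned} \tag{G.16}
\]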

where we assumed that $\int g(\eta_1)^2\,d\eta_1 < \infty$, $\int |wK_2(w)|\,dw < \infty$, $\int |g'(\eta_2)|^2\,d\eta_2 < \infty$, and $b^2/(nh^3) \to 0$ as $n \to \infty$. Therefore, the expectation of the leading term of $J_{12}$ is $o(1)$. The expectations of the remaining terms in $J_{11}(n)$ and $J_{12}(n)$ are also $o(1)$. Since $J_1(n)$ is a sum of non-negative terms and $E[J_1(n)] = o(1)$, we obtain
\[
J_1(n) = o_p(1). \tag{G.17}
\]
Following similar methods, we can prove that $J_i(n) = o_p(1)$ for $i = 2, 3, \ldots, 6$. This completes the proof.
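The two remainder orders appearing in (G.10) and (G.16) impose joint restrictions on the bandwidths $h$ and $b$. As a rough sanity check, the sketch below assumes power-law bandwidths $h = n^{-a}$ and $b = n^{-c}$ (an illustrative parametrisation, not one prescribed by the paper) and reports whether both orders vanish; the function names are hypothetical.

```python
def rate_exponents(a, c):
    """Exponents of n for the two remainder orders when h = n**(-a) and b = n**(-c).

    1 / (n^3 h^4 b^2)  has exponent  -3 + 4a + 2c   (order in (G.10))
    b^2 / (n h^3)      has exponent  -1 + 3a - 2c   (order in (G.16))
    """
    return -3.0 + 4.0 * a + 2.0 * c, -1.0 + 3.0 * a - 2.0 * c

def both_vanish(a, c):
    """True when both remainder orders tend to zero as n grows."""
    return all(e < 0 for e in rate_exponents(a, c))

ok = both_vanish(0.2, 0.2)    # h = b = n^(-1/5): both exponents are negative
bad = both_vanish(0.5, 0.1)   # h shrinks too fast relative to b: second order diverges
```

A negative exponent means the corresponding term is $o(1)$; the second case shows that choosing $h$ much smaller than $b$ can violate $b^2/(nh^3) \to 0$.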
