Introduction to Regression Modeling, A Summary
Bovas Abraham & Johannes Ledolter

2. Simple Linear Regression

2.1 The Model

$y = \mu + \epsilon = \beta_0 + \beta_1 x + \epsilon$

With n pairs of observations $(x_i, y_i)$, where $i = 1, 2, \ldots, n$, we can characterize these observations as $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.

2.1.1 Important Assumptions

1. The $x_i$ are taken as constants, not random variables, for all $i = 1, 2, \ldots, n$.
2. $E(\epsilon_i) = 0$ for all $i = 1, 2, \ldots, n$, so $\mu_i = E(y_i) = \beta_0 + \beta_1 x_i$ for all $i$.
3. $V(\epsilon_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$; i.e., all observations have the same precision.
4. $Cov(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$; i.e., different errors $\epsilon_i$ and $\epsilon_j$, and hence different responses $y_i$ and $y_j$, are independent.

2.1.2 Objectives of the Analysis

1. Can we establish a relationship between y and x?
2. Can we predict y from x? To what extent can we predict y from x?
3. Can we control y by using x?

2.2 Estimation of Parameters

2.2.1 Maximum Likelihood Estimation

Maximum likelihood estimation selects the estimates of the parameters such that the likelihood function is maximized. A probability distribution for y must be specified if one wants to use this approach.

Assume $\epsilon_i \sim N(0, \sigma^2)$. This implies that $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$. The pdf for the i-th response $y_i$ is

$p(y_i \mid \beta_0, \beta_1, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right]$

and the joint pdf of $y_1, y_2, \ldots, y_n$ is

$p(y_1, \ldots, y_n \mid \beta_0, \beta_1, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right]$

Treating this as a function of the parameters leads us to the likelihood function $L(\beta_0, \beta_1, \sigma^2 \mid y_1, \ldots, y_n)$, and its logarithm

$l(\beta_0, \beta_1, \sigma^2 \mid y_1, \ldots, y_n) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$

The MLEs of $\beta_0, \beta_1, \sigma^2$ maximize $l(\beta_0, \beta_1, \sigma^2)$. Maximizing the log-likelihood function with respect to $\beta_0$ and $\beta_1$ is equivalent to minimizing $S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$, which is referred to as the method of least squares.

2.2.2 Least Squares Estimation

One wants to obtain a line $\mu_i = \beta_0 + \beta_1 x_i$ that is closest to the points $(x_i, y_i)$. The errors $\epsilon_i = y_i - \mu_i = y_i - \beta_0 - \beta_1 x_i$ should be as small as possible. One approach to achieve this is to minimize the function below with respect to $\beta_0$ and $\beta_1$:

$S(\beta_0, \beta_1) = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i - \mu_i)^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$

Solving for the first-order conditions leads to the normal equations:

$n\beta_0 + \left(\sum_{i=1}^n x_i\right)\beta_1 = \sum_{i=1}^n y_i$
$\left(\sum_{i=1}^n x_i\right)\beta_0 + \left(\sum_{i=1}^n x_i^2\right)\beta_1 = \sum_{i=1}^n x_i y_i$

The solutions to the normal equations are called the LSEs of $\beta_0$ and $\beta_1$:

$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$
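A minimal numpy sketch of these closed-form least squares estimates, assuming x and y are equal-length one-dimensional arrays (the names and simulated data are illustrative, not from the text):

```python
import numpy as np

def fit_simple_lr(x, y):
    """Least squares estimates for y = b0 + b1*x + error."""
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)          # s_xx
    sxy = np.sum((x - xbar) * (y - ybar))  # s_xy
    b1 = sxy / sxx                         # slope estimate
    b0 = ybar - b1 * xbar                  # intercept estimate
    return b0, b1

# example with simulated data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)
print(fit_simple_lr(x, y))
```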

2.3 Fitted Values, Residuals, and the Estimate of $\sigma^2$

The expression $\hat{y}_i = \hat{\mu}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ is called the fitted value that corresponds to the i-th observation with $x_i$ as the value for the explanatory variable. The difference between the observed value $y_i$ and the fitted value $\hat{\mu}_i$, $y_i - \hat{\mu}_i = e_i$, is referred to as the residual. It is the vertical distance between the observation $y_i$ and the estimated line $\hat{\mu}_i$ evaluated at $x_i$.

2.3.1 Consequences of the Least Squares Fit

LSEs set the derivatives of $S(\beta_0, \beta_1)$ equal to zero. The equations, evaluated at the least squares estimates,

$\frac{\partial S(\beta_0, \beta_1)}{\partial\beta_0} = -2\sum_{i=1}^n [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)] = 0$ and $\frac{\partial S(\beta_0, \beta_1)}{\partial\beta_1} = -2\sum_{i=1}^n [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]x_i = 0$

imply certain restrictions:

i. $\sum_{i=1}^n e_i = 0$ (F.O.C. with respect to $\beta_0$)
ii. $\sum_{i=1}^n e_i x_i = 0$ (F.O.C. with respect to $\beta_1$)
iii. $\sum_{i=1}^n \hat{\mu}_i e_i = 0$ (from the results in (i) and (ii))
iv. $(\bar{x}, \bar{y})$ is a point on the line $\hat{\mu} = \hat{\beta}_0 + \hat{\beta}_1 x$
v. $S(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^n e_i^2$ is the minimum of $S(\beta_0, \beta_1)$

2.3.2 Estimation of $\sigma^2$

Maximization of the log-likelihood function with respect to $\sigma^2$ leads to the MLE

$\hat{\sigma}^2 = \frac{S(\hat{\beta}_0, \hat{\beta}_1)}{n} = \frac{\sum_{i=1}^n e_i^2}{n}$

The numerator is called the residual sum of squares; it is the minimum of $S(\beta_0, \beta_1)$. The LSE of $\sigma^2$, also called the mean square error (MSE), is

$s^2 = \frac{S(\hat{\beta}_0, \hat{\beta}_1)}{n-2} = \frac{\sum_{i=1}^n e_i^2}{n-2}$

The residual sum of squares, $S(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^n e_i^2$, consists of n squared residuals. However, the minimization of $S(\beta_0, \beta_1)$ has introduced two constraints among these n residuals; see (i) and (ii) given previously. Hence, only $n-2$ residuals are needed for the computation. The remaining two residuals can always be calculated from $\sum_{i=1}^n e_i = \sum_{i=1}^n e_i x_i = 0$. One says that the residual sum of squares has $n-2$ degrees of freedom.
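Continuing the sketch above, the residuals, the MSE $s^2$, and the two least squares restrictions can be checked numerically (a hedged illustration; fit_simple_lr, x, and y are the hypothetical objects defined earlier):

```python
b0, b1 = fit_simple_lr(x, y)
fitted = b0 + b1 * x
e = y - fitted                      # residuals
n = y.size
s2 = np.sum(e ** 2) / (n - 2)       # mean square error, n-2 degrees of freedom

# the two constraints implied by the first-order conditions
print(np.isclose(e.sum(), 0.0), np.isclose((e * x).sum(), 0.0))
print("MSE:", s2)
```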

2.4 Properties of LSEs

Let us write the LSE of $\beta_1$ in slightly more convenient form:

$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})y_i - \bar{y}\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \sum_{i=1}^n c_i y_i$, where $c_i = \frac{x_i - \bar{x}}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{x_i - \bar{x}}{s_{xx}}$

The constants $c_i$ have several interesting properties:

i. $\sum_{i=1}^n c_i = \sum_{i=1}^n (x_i - \bar{x})/s_{xx} = 0$
ii. $\sum_{i=1}^n c_i x_i = \sum_{i=1}^n x_i(x_i - \bar{x})/s_{xx} = 1$
iii. $\sum_{i=1}^n c_i^2 = \sum_{i=1}^n (x_i - \bar{x})^2/s_{xx}^2 = 1/s_{xx}$

These results can be used to derive the expected values and the variances of the LSEs.

2.4.1 Expected Values of LSEs

1. $E(\hat{\beta}_1) = \beta_1$, so $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$:
$E(\hat{\beta}_1) = E\left(\sum_{i=1}^n c_i y_i\right) = \sum_{i=1}^n c_i E(y_i) = \sum_{i=1}^n c_i(\beta_0 + \beta_1 x_i) = \beta_0\sum_{i=1}^n c_i + \beta_1\sum_{i=1}^n c_i x_i = \beta_1$

2. $E(\hat{\beta}_0) = \beta_0$, so $\hat{\beta}_0$ is also unbiased for $\beta_0$:
$E(\hat{\beta}_0) = E(\bar{y} - \hat{\beta}_1\bar{x}) = E(\bar{y}) - \bar{x}E(\hat{\beta}_1) = E(\beta_0 + \beta_1\bar{x}) - \beta_1\bar{x} = \beta_0 + \beta_1\bar{x} - \beta_1\bar{x} = \beta_0$

3. $E(\hat{\mu}_0) = E(\hat{\beta}_0 + \hat{\beta}_1 x_0) = \beta_0 + \beta_1 x_0 = \mu_0$. Hence, $\hat{\mu}_0$ is unbiased for $\mu_0$.

4. $E(s^2) = \sigma^2$ can also be shown (see Chapter 4).

2.4.2 Variances of LSEs

1. $V(\hat{\beta}_1) = V\left(\sum_{i=1}^n c_i y_i\right) = \sum_{i=1}^n c_i^2 V(y_i) = \sigma^2\sum_{i=1}^n c_i^2 = \sigma^2/s_{xx}$

2. First show that
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \sum_{i=1}^n \frac{y_i}{n} - \bar{x}\sum_{i=1}^n \frac{(x_i - \bar{x})y_i}{s_{xx}} = \sum_{i=1}^n k_i y_i$, where $k_i = \frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{s_{xx}}$
Then
$V(\hat{\beta}_0) = \sum_{i=1}^n k_i^2\sigma^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right]$

3. For the variance of $\hat{\mu}_0$, we write
$\hat{\mu}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 = \bar{y} - \hat{\beta}_1\bar{x} + \hat{\beta}_1 x_0 = \bar{y} + \hat{\beta}_1(x_0 - \bar{x}) = \sum_{i=1}^n\left[\frac{y_i}{n} + \frac{(x_0 - \bar{x})(x_i - \bar{x})y_i}{s_{xx}}\right] = \sum_{i=1}^n d_i y_i$, where $d_i = \frac{1}{n} + \frac{(x_0 - \bar{x})(x_i - \bar{x})}{s_{xx}}$
Then
$V(\hat{\mu}_0) = \sum_{i=1}^n d_i^2\sigma^2 = \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right]$

2.5 Inferences about the Regression Parameters

The uncertainty in the estimates can be expressed through confidence intervals, and for this one needs to make assumptions about the distribution of the errors. Assume $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$.

2.5.1 Inference about $\beta_1$

$\hat{\beta}_1 = \sum_{i=1}^n c_i y_i$ is a linear combination of $y_i \sim N(\mu_i, \sigma^2)$. Therefore

$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{s_{xx}}\right)$, or, after standardization, $\frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{s_{xx}}} \sim N(0, 1)$

$\sigma^2$ is unknown and must be estimated. Substituting the factor with the estimated error variance $s^2$,

$T = \frac{\hat{\beta}_1 - \beta_1}{s/\sqrt{s_{xx}}} = \frac{(\hat{\beta}_1 - \beta_1)/(\sigma/\sqrt{s_{xx}})}{\sqrt{\dfrac{(n-2)s^2}{\sigma^2(n-2)}}} \sim t_{n-2}$, since $Z = \frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{s_{xx}}} \sim N(0,1)$ and $U = \frac{(n-2)s^2}{\sigma^2} \sim \chi^2_{n-2}$

The estimated standard deviation of $\hat{\beta}_1$, $s/\sqrt{s_{xx}}$, is also referred to as the standard error and tells us about the variability of the sampling distribution of $\hat{\beta}_1$.

Let $t(1-\alpha/2; n-2)$ denote the $100(1-\alpha/2)$ percentile of a t distribution with $n-2$ degrees of freedom. Since the t distribution is symmetric, the $100(\alpha/2)$ percentile is $t(\alpha/2; n-2) = -t(1-\alpha/2; n-2)$. Then the sampling distribution of T implies that

$P\left[-t\left(1-\frac{\alpha}{2}; n-2\right) \le \frac{\hat{\beta}_1 - \beta_1}{s/\sqrt{s_{xx}}} \le t\left(1-\frac{\alpha}{2}; n-2\right)\right] = 1-\alpha$

so that $\hat{\beta}_1 \pm t(1-\alpha/2; n-2)\,s/\sqrt{s_{xx}}$ is a $100(1-\alpha)\%$ confidence interval for $\beta_1$.
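A short numpy/scipy sketch of this interval, under the same illustrative setup as before (scipy.stats.t.ppf supplies the t percentile; x, y, and fit_simple_lr are the hypothetical objects above):

```python
from scipy.stats import t as t_dist

b0, b1 = fit_simple_lr(x, y)
e = y - (b0 + b1 * x)
n = y.size
s2 = np.sum(e ** 2) / (n - 2)
sxx = np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(s2 / sxx)                      # standard error of the slope

alpha = 0.05
tcrit = t_dist.ppf(1 - alpha / 2, df=n - 2)    # t(1 - alpha/2; n-2)
ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)  # 95% confidence interval
print(ci)
```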

Definite Matrices. A symmetric matrix A is called positive definite if the quadratic form $y'Ay > 0$ for all vectors $y \neq 0$. We call the matrix positive semidefinite if $y'Ay \ge 0$ for all y and $y'Ay = 0$ for some vector $y \neq 0$. We call the symmetric matrix A negative definite if for all vectors $y \neq 0$ the quadratic form $y'Ay < 0$. We call the matrix negative semidefinite if the quadratic form $y'Ay \le 0$ for all y and $y'Ay = 0$ for some vector $y \neq 0$.

Orthogonal Matrix. A square matrix A is called orthogonal if $AA' = I$. Since $A' = A^{-1}$ is the inverse of A, it follows that $A'A = I$. Hence, an orthogonal matrix satisfies $AA' = A'A = I$. The rows of an orthogonal matrix are mutually orthogonal and the length of the rows is one. The same can be said about the columns of A. Furthermore, the determinant $|A| = \pm 1$. This follows since $|AA'| = |A||A'| = |A|^2 = |I| = 1$.

Trace of a Square Matrix. The trace of a square m x m matrix A is defined as the sum of its diagonal elements; that is, $tr(A) = \sum_{i=1}^m a_{ii}$. The definition implies that $tr(A) = tr(A')$ and $tr(A + B) = tr(A) + tr(B)$. Provided that the matrices C, D, and E are conformable, $tr(CDE) = tr(ECD) = tr(DEC)$. Conformable means that the dimensions of the matrices are such that all these products are defined.

Idempotent Matrix. A square matrix A is idempotent if $AA = A$. The determinant of an idempotent matrix is either 0 or 1. The rank of an idempotent matrix is equal to the sum of its diagonal elements.

Vector Space. A vector space V is the set of all vectors that is closed under addition and multiplication by a scalar and that contains the null vector 0. A set of n linearly independent vectors in $R^n$, $x_1, x_2, \ldots, x_n$, is said to span a vector space of dimension n; any other member of this vector space can be generated through linear combinations of these vectors. We call this set a basis of the vector space. Basis vectors are not unique. Sometimes it is useful to work with orthonormal basis vectors, that is, vectors $u_1, u_2, \ldots, u_n$ that are orthogonal ($u_i'u_j = 0$ for $i \neq j$) and have length one ($u_i'u_i = 1$). Such orthonormal basis vectors always exist, and the Gram-Schmidt orthogonalization can construct them from a given set of linearly independent basis vectors $x_1, x_2, \ldots, x_n$.

Subspace of $R^n$. Consider $k < n$ linearly independent vectors in $R^n$; the set of all linear combinations of these vectors forms a subspace of $R^n$ of dimension k.

Weighted Least Squares. In weighted least squares (WLS) one minimizes $\sum_{i=1}^n w_i(y_i - x_i'\beta)^2$, where the $w_i > 0$ are known specified weights. This criterion is equivalent to the one for GLS, with $V^{-1}$ a diagonal matrix having diagonal elements $w_i$. The WLS estimator is given by

$\hat{\beta}_{WLS} = \left(\sum_{i=1}^n w_i x_i x_i'\right)^{-1}\left(\sum_{i=1}^n w_i x_i y_i\right)$

with variance

$V(\hat{\beta}_{WLS}) = \sigma^2\left(\sum_{i=1}^n w_i x_i x_i'\right)^{-1}$
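A small numpy sketch of this weighted least squares estimator, assuming X is an n x (p+1) design matrix, y the response vector, and w a vector of known positive weights (all names are illustrative):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i (y_i - x_i' beta)^2."""
    W = np.diag(w)
    XtWX = X.T @ W @ X
    XtWy = X.T @ W @ y
    beta_hat = np.linalg.solve(XtWX, XtWy)   # (sum w_i x_i x_i')^{-1} sum w_i x_i y_i
    cov_unscaled = np.linalg.inv(XtWX)       # multiply by sigma^2 for V(beta_hat)
    return beta_hat, cov_unscaled
```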

5. Specification Issues in Regression Models

5.1 Elementary Special Cases

5.1.1 One-sample Problem

Suppose the observations $y_1, \ldots, y_n$ are taken under uniform conditions for a stable process with mean level $\beta_0$: $y_i = \beta_0 + \epsilon_i$ with $E(y_i) = \beta_0$. In this case $E(y) = X\beta$, where $y = (y_1, \ldots, y_n)'$, $X = (1, \ldots, 1)'$, and $\beta = \beta_0$.

5.1.2 Two-sample Problem

Suppose $y_1, \ldots, y_m$ are taken under one set of conditions (standard process), whereas the remaining $n-m$ observations $y_{m+1}, \ldots, y_n$ are taken under a different set of conditions (new process). Let $\mu_1$ denote the mean of the standard process and $\mu_2$ the mean of the new process. Then

$y_i = \mu_1 + \epsilon_i$ for $i = 1, \ldots, m$, and $y_i = \mu_2 + \epsilon_i$ for $i = m+1, \ldots, n$.

This can also be written as $y_i = \mu_1 x_{i1} + \mu_2 x_{i2} + \epsilon_i$, where $x_{i1} = 1$ if $i = 1, \ldots, m$ and 0 if $i = m+1, \ldots, n$, and $x_{i2} = 0$ if $i = 1, \ldots, m$ and 1 if $i = m+1, \ldots, n$. In matrix form, $y = X\beta + \epsilon$, where the first column of X is the indicator of the standard process, the second column is the indicator of the new process, and $\beta = (\mu_1, \mu_2)'$.

Our interest is to test whether the two processes have the same mean, i.e., $\mu_1 = \mu_2$. Equivalently, we can write $\mu_2 = \mu_1 + \delta$ with $\delta = \mu_2 - \mu_1$, the difference of the process means. Then $y_i = \mu_1 + \delta x_{i2} + \epsilon_i$, or $y = X\beta + \epsilon$ where, in this case, $\beta = (\mu_1, \delta)'$. Our hypothesis is $\delta = \mu_2 - \mu_1 = 0$.

5.1.3 Polynomial Models

If one assumes that a process is quadratic in an explanatory variable, $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$, or $E(y) = X\beta$, where the i-th row of X is $(1, x_i, x_i^2)$ and $\beta = (\beta_0, \beta_1, \beta_2)'$.

5.2 Systems of Straight Lines

Suppose $y_i$ is expected to change linearly with a continuous explanatory variable $t_i$ and is observed under two different processes, for $i = 1, \ldots, m$ and $i = m+1, \ldots, n$; let $x_i$ be the indicator of the second process.

Case a) The process has an equal effect at all levels of $t_i$:

$E(y_i) = \beta_0 + \beta_1 t_i + \beta_2 x_i = \beta_0 + \beta_1 t_i$ for $i = 1, \ldots, m$, and $(\beta_0 + \beta_2) + \beta_1 t_i$ for $i = m+1, \ldots, n$

The model represents two parallel lines where $\beta_2$ represents the change due to the process.

Case b) The process has an effect that changes with the level of $t_i$:

$E(y_i) = \beta_0 + \beta_1 t_i + \beta_2 x_i + \beta_3 t_i x_i = \beta_0 + \beta_1 t_i$ for $i = 1, \ldots, m$, and $(\beta_0 + \beta_2) + (\beta_1 + \beta_3)t_i$ for $i = m+1, \ldots, n$

Graphically, this model has two straight lines with different slopes and different intercepts. To test whether the process has any effect, we test the hypothesis $\beta_2 = \beta_3 = 0$. A test of just $\beta_3 = 0$ tests whether the effect of the process differs over $t_i$.

5.3 Comparison of Several Treatments

This is also known as one-way classification, or the k-sample problem.

Process   Observations
1         $y_{11}, y_{12}, \ldots, y_{1n_1}$
2         $y_{21}, y_{22}, \ldots, y_{2n_2}$
...       ...
k         $y_{k1}, y_{k2}, \ldots, y_{kn_k}$

As in the two-process model, $E(y_{ij}) = \mu_i$ for all $j = 1, \ldots, n_i$, or $E(y) = X\mu = \mu_1 x_1 + \mu_2 x_2 + \cdots + \mu_k x_k$, where the $x_i$ are strings of zeros and ones indicating the group membership of the observations, $X = [x_1, \ldots, x_k]$, and $\mu = (\mu_1, \ldots, \mu_k)'$.

The LSE of $\mu$, $\hat{\mu} = (X'X)^{-1}X'y$, is easy: $\hat{\mu}_1 = \bar{y}_1, \ldots, \hat{\mu}_k = \bar{y}_k$. The hypothesis of interest is the equality of the k means, $\mu_1 = \cdots = \mu_k$. An equivalent but more convenient representation relates the group means to the mean of a reference group, group 1 for instance. Let $\mu_i = \mu_1 + \gamma_i$ for $i = 2, \ldots, k$:

$E(y_{ij}) = \mu_1$ for $i = 1$, and $\mu_1 + \gamma_i$ for $i = 2, \ldots, k$

Then $E(y) = X\beta$, where X now has a first column of ones and columns $2, \ldots, k$ containing the indicators of groups $2, \ldots, k$, $\beta = (\mu_1, \gamma_2, \ldots, \gamma_k)'$, and $\hat{\beta} = (\bar{y}_1, \bar{y}_2 - \bar{y}_1, \ldots, \bar{y}_k - \bar{y}_1)'$.

The null hypothesis is now expressed as $\gamma_2 = \cdots = \gamma_k = 0$. Then

$SSR = \hat{\beta}'X'y - n\bar{y}^2 = \sum_{i=1}^k n_i(\bar{y}_i - \bar{y})^2$

Since there are $k-1$ regressor variables (in addition to the intercept), the degrees of freedom for the SSR are $k-1$.

$SSE = S(\hat{\beta}) = y'y - \hat{\beta}'X'y = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$

with degrees of freedom $n-k$, where $n = \sum_{i=1}^k n_i$.

To test $\mu_1 = \mu_2 = \cdots = \mu_k$, or equivalently $\gamma_2 = \gamma_3 = \cdots = \gamma_k = 0$, we perform the F test where

$F = \frac{SSR/(k-1)}{SSE/(n-k)}$

which follows $F_{k-1,\,n-k}$ under $H_0$.
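A compact numpy sketch of this F statistic for the k-sample comparison, assuming groups is a list of one-dimensional arrays of responses (illustrative names only; the reference percentile would come from the F distribution with k-1 and n-k degrees of freedom):

```python
import numpy as np

def oneway_F(groups):
    """F = [SSR/(k-1)] / [SSE/(n-k)] for the k-sample (one-way) problem."""
    k = len(groups)
    n = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ssr = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    sse = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    return (ssr / (k - 1)) / (sse / (n - k))
```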

5.4 X Matrices with Nearly Linear-Dependent Columns

In the general linear model we assume that X has linearly independent columns. Now suppose the explanatory variables $x_1$ and $x_2$ are nearly linearly dependent in the model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$. Tests of the individual parameters (t tests) would fail to reject the hypotheses $\beta_i = 0$; i.e., if one of the variables is in the model, the extra contribution of the other variable towards the regression is not important. The two variables express the same information, so there is no point in including both. The F test, which tests whether the variables are simultaneously zero, will nevertheless show strong evidence against the null hypothesis $\beta_1 = \beta_2 = 0$. This phenomenon is known as multicollinearity.

With perfect multicollinearity of two variables, the $n \times (p+1)$ matrix X has rank p instead of $p+1$. As a consequence, the $(p+1) \times (p+1)$ matrix $X'X$ has rank p, and it is not possible to obtain the inverse $(X'X)^{-1}$.

5.4.1 Detection of Multicollinearity

Correlations Among Regressor Variables. Suppose we have p regressors and the sample correlations $r_{ij}$ between pairs of regressors $x_i$ and $x_j$,

$r_{ij} = \frac{\sum_{l=1}^n (x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j)}{\sqrt{\sum_{l=1}^n (x_{il} - \bar{x}_i)^2\sum_{l=1}^n (x_{jl} - \bar{x}_j)^2}}, \quad i, j = 1, 2, \ldots, p$

A matrix of the correlations

$C = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{12} & 1 & \cdots & r_{2p} \\ \vdots & & \ddots & \vdots \\ r_{1p} & r_{2p} & \cdots & 1 \end{pmatrix}$

provides an indication of the pairwise associations among the explanatory variables $x_1, \ldots, x_p$. If $r_{ij}$ is large in absolute value (close to $\pm 1$), then there is strong pairwise linear association between $x_i$ and $x_j$. It should be noted that if $r_{ij}$ is zero, then the regressor variables are orthogonal to each other.

Variance Inflation Factors. Suppose we standardize the y and x variables of a regression model,

$y_i^* = \frac{y_i - \bar{y}}{s_y}$ and $z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad j = 1, 2, \ldots, p$

The linear model can be expressed as $y^* = \beta_1^* z_1 + \beta_2^* z_2 + \cdots + \beta_p^* z_p + \epsilon$. There is no intercept in this model because the standardized variable $y^*$ has mean $E(y^*) = 0$, and the covariance matrix $V(\hat{\beta}^*) = (X^{*\prime}X^*)^{-1}\sigma^2$ reduces to $V(\hat{\beta}^*) = C^{-1}\sigma^2$. The diagonal elements of $C^{-1}$ are the scaled variances of the least squares estimates, $V(\hat{\beta}_i^*)/\sigma^2$. If $r_{ij}$, the off-diagonal elements of C, were zero, then $C^{-1}$ would have ones on its diagonal, and $V(\hat{\beta}_i^*)/\sigma^2 = 1$. If the correlations are large, then the diagonal elements of $C^{-1}$ are larger than one, and $V(\hat{\beta}_i^*)/\sigma^2 > 1$. The values $V(\hat{\beta}_i^*)/\sigma^2$ are called variance inflation factors (VIF) because they measure how the correlation among the regressor variables inflates the variance of the estimates. If these factors are much larger than one, then there is multicollinearity.

In the general case of p regressors,

$VIF_j = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the coefficient of determination from the regression of $x_j$ on all other regressors. Values of VIF larger than 10 are taken as solid evidence of multicollinearity.
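A small numpy sketch of the variance inflation factors, computing each $VIF_j = 1/(1 - R_j^2)$ by regressing column j of a regressor matrix Z on the remaining columns (an illustrative helper, not from the text):

```python
import numpy as np

def vif(Z):
    """Variance inflation factors for the columns of a regressor matrix Z."""
    n, p = Z.shape
    out = []
    for j in range(p):
        xj = Z[:, j]
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```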

5.5 X Matrices with Orthogonal Columns

Orthogonality is an attractive property and there are advantages to choosing orthogonal regressor vectors. Suppose a model $y = X\beta + \epsilon$ where the columns of X are orthogonal: $X = [1, x_1, x_2, \ldots, x_p]$ with every entry of the regressor columns equal to +1 or -1 and each pair of columns orthogonal, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)'$, and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)'$.

In an experimental situation, $x_i = +1$ would mean that the i-th variable is "on" or at its high level, and $x_i = -1$ would mean "off" or at its low level. Fitting the model with least squares leads to the estimates

$\hat{\beta} = (X'X)^{-1}X'y = \frac{1}{n}\left(\sum y_i,\ \sum x_{i1}y_i,\ \sum x_{i2}y_i,\ \ldots,\ \sum x_{ip}y_i\right)'$

Changing $x_i$ from low to high (or switching it on) would change the response by $2\hat{\beta}_i$ units. Moreover, reducing or adding regressor variables to the model does not affect the estimates $\hat{\beta}_i$. And the regression sums of squares are additive: if $SSR(x_1)$ and $SSR(x_2)$ are the regression sums of squares for the models with $x_1$ and $x_2$ respectively, and $SSR(x_1, x_2)$ is the regression sum of squares for the model with regressors $x_1$ and $x_2$, then

$SSR(x_1, x_2) = SSR(x_1) + SSR(x_2)$

These are special consequences of orthogonality. We also note that $V(\hat{\beta}) = (X'X)^{-1}\sigma^2$. With orthogonality, this is a diagonal matrix and the covariances between the elements of $\hat{\beta}$ are zero. The additional assumption of normal errors implies that the least squares estimators are statistically independent.

6. Model Checking

6.1 Introduction

A fitted model can be inadequate for several reasons:

1. the functional form may be inappropriate (polynomial, interaction terms)
2. the error specification may be incorrect (constant variance, normality, independence)
3. unusual observations may have undue influence on the model fit (outliers, influential points)

6.2 Residual Analysis

Residual plots can tell us whether the model is adequate or whether any standard assumptions are violated.

6.2.1 Residuals and Residual Plots

$e = y - \hat{\mu}$

The residual estimates the random component in the model. Misspecification and departures from the underlying assumptions are reflected in the pattern of the residuals. We know $e \perp L(X)$, and hence $e \perp \hat{\mu}$ (with $\hat{\mu} = X\hat{\beta}$), i.e.,

$\sum_{i=1}^n e_i = \sum_{i=1}^n e_i x_{i1} = \sum_{i=1}^n e_i x_{i2} = \cdots = \sum_{i=1}^n e_i x_{ip} = \sum_{i=1}^n e_i\hat{\mu}_i = 0$

These hold irrespective of whether or not the model is adequate. What happens to the residuals if some model assumptions are violated?

1. Since $E(e) = E(y - \hat{\mu}) = E[(I-H)y] = (I-H)E(y)$, we have $E(e) = 0$ if the true explanatory value $E(y)$ is in $L(X)$, i.e., $E(y) = X\beta$:
$(I-H)E(y) = (I-H)X\beta = (I - X(X'X)^{-1}X')X\beta = 0$
For a misspecified model where $E(y) \notin L(X)$, $E(e) \neq 0$. Assume that the true model is $E(y) = X\beta + u\gamma$, where $u \notin L(X)$. Then
$E(e) = (I-H)E(y) = (I-H)(X\beta + u\gamma) = (I-H)u\gamma \neq 0$
$(I-H)u\gamma \neq 0$ because $u \notin L(X)$. Since e is related to u in the equation above, a graph of e, the residuals obtained from the model without u, against u reveals a pattern.

2. Under the model assumptions, e and $\hat{\mu}$ are uncorrelated, i.e., the fitted values should not carry any information on the residuals. Violations of the model assumptions introduce correlations between e and $\hat{\mu}$, and a graph of e against $\hat{\mu}$ would reflect an association.

3. If the standard assumptions are met, $V(e) = \sigma^2(I-H)$, i.e.,
$V(e_i) = \sigma^2(1 - h_{ii})$ and $Cov(e_i, e_j) = -\sigma^2 h_{ij}$
Although $V(\epsilon_i) = \sigma^2$ is constant and $Cov(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$, the $V(e_i)$ are not identical and the $Cov(e_i, e_j)$ are not necessarily 0.

The standardized residuals are

$e_{si} = \frac{e_i}{s}$, where $s^2 = e'e/(n-p-1)$ is the LSE of $\sigma^2$

The studentized residuals are

$d_i = \frac{e_i}{s\sqrt{1 - h_{ii}}}$

While $e_{si}$ uses the variance of the errors, $d_i$ uses the correct variance of the i-th residual. Hence, the variance of $e_{si}$ is only approximately 1. The exact distribution of $d_i$ is complicated because $e_i$ and s are not statistically independent. However, with normal errors, $d_i \approx N(0, 1)$. Values $|d_i| > 2$ or 3 would make us question whether the model is adequate for that case i. Our model may be missing an important component.

6.2.2 Added Variable Plots

Let $e = y - X\hat{\beta}$ be the residuals from the regression of y on X, and let $e_u = (I-H)u$ be the residuals from regressing u on the columns of X, where u is linearly independent of all columns of X. A plot of e against $e_u$ is called an added variable plot. Systematic patterns in the plot indicate that u should be included in the model.

Justification: Assume u is part of the model, $y = X\beta + u\gamma + \epsilon$. The residuals of the regression of y on X alone are $e = (I-H)y$. The slope in the regression of e on $e_u$ is

$\tilde{\gamma} = \frac{e_u'e}{e_u'e_u} = \frac{u'(I-H)y}{u'(I-H)u}$

Now consider $y = X\beta + u\gamma + \epsilon$ and the regression of y on X and u. The LSE of $(\beta, \gamma)$ is

$\begin{pmatrix} \hat{\beta} \\ \hat{\gamma} \end{pmatrix} = \begin{pmatrix} X'X & X'u \\ u'X & u'u \end{pmatrix}^{-1}\begin{pmatrix} X'y \\ u'y \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1}\begin{pmatrix} X'y \\ u'y \end{pmatrix}$

We find that

$\hat{\gamma} = \frac{1}{u'(I-H)u}\left[-u'Hy + u'y\right] = \frac{u'(I-H)y}{u'(I-H)u} = \tilde{\gamma}$

Hence, a nonzero slope $\tilde{\gamma}$ in the added variable plot of e against $e_u$ indicates that u should be included.

6.2.3 Checking the Normality Assumption

The simplest approach is to compare the histogram of the studentized residuals with the N(0, 1) distribution. Another, preferable, approach is to prepare a normal probability plot. A straight-line pattern confirms a normal distribution. An S-shaped probability plot indicates a light-tailed distribution, whereas an inverted S-shaped probability plot indicates a heavy-tailed distribution. Other curved plots would indicate skewness of the underlying distribution.

6.2.4 Serial Correlation among the Errors

If a regression model is fit to time series data, it is likely that the errors are serially correlated. A straightforward approach to check for serial correlation is to calculate the lag k sample correlation of the residuals, between $e_t$ and its k-th lag $e_{t-k}$:

$r_k = \frac{\sum_{t=k+1}^n e_t e_{t-k}}{\sum_{t=1}^n e_t^2}, \quad k = 1, 2, \ldots$

If there is no serial correlation,

$E(r_k) \approx 0$ and $V(r_k) \approx \frac{1}{n}$ for $k > 0$

In addition, for large n, $r_k \approx N(0, 1/n)$. A simple check for serial correlation compares $r_k$ with its standard error $1/\sqrt{n}$.

The graph of $r_k$ as a function of the lag k is referred to as the autocorrelation function of the residuals. Sample correlations outside the bandwidth of $\pm 2/\sqrt{n}$ are indicators of autocorrelation.

The Durbin-Watson test statistic examines the lag 1 autocorrelation $r_1$ in more detail:

$D = \frac{\sum_{t=2}^n (e_t - e_{t-1})^2}{\sum_{t=1}^n e_t^2} = \frac{\sum_{t=2}^n e_t^2 + \sum_{t=2}^n e_{t-1}^2 - 2\sum_{t=2}^n e_t e_{t-1}}{\sum_{t=1}^n e_t^2} \approx 2(1 - r_1)$

For independent errors, $D \approx 2$; for correlated errors, $D \neq 2$.
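A numpy sketch of these diagnostics, assuming e is the vector of residuals ordered in time (illustrative code, not from the text):

```python
import numpy as np

def residual_acf(e, max_lag=10):
    """Lag-k sample correlations r_k of the residuals."""
    denom = np.sum(e ** 2)
    return np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, max_lag + 1)])

def durbin_watson(e):
    """Durbin-Watson statistic, approximately 2*(1 - r_1)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# values of r_k outside +/- 2/sqrt(n) suggest autocorrelation:
# n = e.size; band = 2 / np.sqrt(n)
```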

6.3 The Effect of Individual Cases

The next step after assessing the global adequacy of our fitted model, with respect to the form of the model and its error structure, is to examine whether all observations arise from the same model.

6.3.1 Outliers

An outlying case is defined as a particular observation $(y_i, x_{i1}, x_{i2}, \ldots, x_{ip})$ that differs from the majority of the cases in the data set. Outliers in the x dimension are cases that have unusual values on one or more of the covariates. Outliers in the y dimension are linked to the regression model, as one tries to explain the response as a function of the covariates. Outliers in the y dimension may be due to several reasons:

1. the random component of the regression model may be unusually large
2. an error in data recording
3. another covariate that is missing from the model may explain the strange observation

A simple first step in detecting outliers involves the studentized residuals $d_i = e_i/(s\sqrt{1 - h_{ii}})$, which follow an approximate N(0, 1) distribution as long as the model assumptions are satisfied. Values $|d_i| > 2.5$ are unexpected and are indicative of aberrant behaviour in the response (y) dimension. However, note that a case may be quite peculiar and nevertheless have a small studentized residual.

6.3.2 Leverage and Influence Measures

We say that an individual case has a major influence on a given statistical procedure if the conclusions of the analysis are significantly altered when the case is omitted from the analysis.

Leverage. Recall $\hat{\mu} = X(X'X)^{-1}X'y = Hy$. The i-th fitted value can be written as

$\hat{\mu}_i = h_{ii}y_i + \sum_{j \neq i} h_{ij}y_j$

The weight $h_{ii}$ indicates how heavily $y_i$ contributes to $\hat{\mu}_i$. If $h_{ii}$ is large (compared to the other $h_{ij}$'s), then $h_{ii}y_i$ dominates $\hat{\mu}_i$. Recall $V(e_i) = \sigma^2(1 - h_{ii})$, and hence $h_{ii} \le 1$. A large $h_{ii}$ means $V(e_i) \approx 0$ and $\hat{\mu}_i \approx y_i$, implying that the fitted model will pass very close to the data point $(y_i, x_{i1}, x_{i2}, \ldots, x_{ip})$. We refer to $h_{ii}$ as the leverage of case i.

Properties of leverage:

1. $h_{ii}$ is a function of the covariates but not the response y
2. $1/n \le h_{ii} \le 1$
3. $h_{ii}$ is small for cases with $x_{i1}, \ldots, x_{ip}$ near the centroid $(\bar{x}_1, \ldots, \bar{x}_p)$, and large when far from the centroid
4. $\sum_{i=1}^n h_{ii} = tr[H] = tr[X(X'X)^{-1}X'] = tr[X'X(X'X)^{-1}] = tr[I_{p+1}] = p+1$, and $\bar{h} = \sum_{i=1}^n h_{ii}/n = (p+1)/n$

A case for which $h_{ii} > 2\bar{h} = 2(p+1)/n$ is usually considered a high leverage case. A high leverage case may involve misrecorded covariates, or the case may reflect a design point that has been selected very differently from the rest. High leverage is a prerequisite for making a case a high influence point, but not every high leverage case is influential.
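A small numpy sketch that computes the leverages as the diagonal of the hat matrix and flags cases above 2(p+1)/n; the threshold rule follows the text, while the function name is illustrative:

```python
import numpy as np

def leverages(X):
    """Diagonal of H = X (X'X)^{-1} X' and the usual 2*(p+1)/n flag."""
    n, p_plus_1 = X.shape                 # X includes the column of ones
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    flagged = np.where(h > 2 * p_plus_1 / n)[0]
    return h, flagged
```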

Cook's Influence Measure. One way to express influence is to study how the deletion of a case affects the parameter estimates. Let $\hat{\beta}_{(i)}$ denote the estimate of $\beta$ without the i-th case and $\hat{\beta}$ the estimate with all cases. Then $\hat{\beta}_{(i)} - \hat{\beta}$ is a good measure of the influence of the i-th case on the parameter estimates. Large changes in any component make the i-th case influential. One does not have to compute $n+1$ LSEs to find the differences. One can, instead, show that

$\hat{\beta}_{(i)} - \hat{\beta} = -\frac{e_i}{1 - h_{ii}}(X'X)^{-1}x_i$

where $x_i'$ is the i-th row of X that corresponds to the deleted case. Hence,

$\hat{\beta} - \hat{\beta}_{(i)} = \frac{e_i}{1 - h_{ii}}(X'X)^{-1}x_i$

It is often useful to condense the vector into a single number (statistic). Since $V(\hat{\beta}) = s^2(X'X)^{-1}$, using the inverse of the covariance matrix as weights and standardizing the result leads to the summary measure

$D_i = \frac{(\hat{\beta} - \hat{\beta}_{(i)})'(X'X)(\hat{\beta} - \hat{\beta}_{(i)})}{(p+1)s^2} = \frac{e_i^2\,x_i'(X'X)^{-1}x_i}{(1 - h_{ii})^2(p+1)s^2} = \frac{h_{ii}\,d_i^2}{(1 - h_{ii})(p+1)}$

where $d_i$ is the studentized residual. $D_i$ is known as Cook's D statistic. One needs both a large leverage $h_{ii}$ and a large studentized residual $d_i$ to get a sizable influence measure $D_i$.

Cook's D statistic can alternatively be written as

$D_i = \frac{(X\hat{\beta} - X\hat{\beta}_{(i)})'(X\hat{\beta} - X\hat{\beta}_{(i)})}{(p+1)s^2} = \frac{(\hat{\mu} - \hat{\mu}_{(i)})'(\hat{\mu} - \hat{\mu}_{(i)})}{(p+1)s^2}$

where $\hat{\mu}_{(i)} = X\hat{\beta}_{(i)}$. Note that the i-th element of $\hat{\mu}_{(i)}$ is the out-of-sample prediction of $y_i$.

PRESS Residuals. A related quantity of interest is the prediction error

$e_{(i)} = y_i - \hat{y}_{(i)}$

where $\hat{y}_{(i)} = x_i'\hat{\beta}_{(i)}$ is the out-of-sample prediction. The $e_{(i)}$'s are called the PRESS residuals, and $\sum_{i=1}^n e_{(i)}^2$ is referred to as the prediction error sum of squares (PRESS). See that

$e_{(i)} = y_i - x_i'\hat{\beta}_{(i)} = y_i - x_i'\left[\hat{\beta} - \frac{e_i}{1 - h_{ii}}(X'X)^{-1}x_i\right] = e_i + \frac{e_i h_{ii}}{1 - h_{ii}} = \frac{e_i}{1 - h_{ii}}$

The DFFITS Statistic. Another related measure compares the fitted value $\hat{\mu}_i$ to the prediction $\hat{y}_{(i)}$ of $y_i$:

$\hat{\mu}_i - \hat{y}_{(i)} = (y_i - \hat{y}_{(i)}) - (y_i - \hat{\mu}_i) = e_{(i)} - e_i = \frac{e_i h_{ii}}{1 - h_{ii}}$

Note that this compares the fitted value $\hat{\mu}_i$ to the prediction $\hat{y}_{(i)}$, whereas the PRESS residual compares the observation $y_i$ to $\hat{y}_{(i)}$. We define the standardized difference (also known as delta fits) as

$DFFITS_i = \frac{\hat{\mu}_i - \hat{y}_{(i)}}{\left[s_{(i)}^2 h_{ii}\right]^{1/2}} = \frac{e_i h_{ii}}{(1 - h_{ii})\left[s_{(i)}^2 h_{ii}\right]^{1/2}} = \left[\frac{D_i(p+1)s^2}{s_{(i)}^2}\right]^{1/2}$

where $s_{(i)}^2 = \sum_{j \neq i}(y_j - x_j'\hat{\beta}_{(i)})^2/(n-p-2)$ is the unbiased estimator of $\sigma^2$ without the i-th case. It can also be written as

$s_{(i)}^2 = \frac{(n-p-1)s^2 - e_i^2/(1 - h_{ii})}{n-p-2}$

$DFFITS_i$ is a slight variant of $D_i$. The ratio $s^2/s_{(i)}^2$ will usually be close to one, and

$DFFITS_i \approx \sqrt{(p+1)D_i}$
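The deletion diagnostics above need only the full-model residuals and leverages; a hedged numpy sketch (deletion_diagnostics is an illustrative helper, with X including the intercept column):

```python
import numpy as np

def deletion_diagnostics(X, y):
    """Cook's D, PRESS residuals, and DFFITS from one least squares fit."""
    n, k = X.shape                                   # k = p + 1
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages
    s2 = e @ e / (n - k)
    d = e / np.sqrt(s2 * (1 - h))                    # studentized residuals
    cooks_d = h * d ** 2 / ((1 - h) * k)
    e_press = e / (1 - h)                            # PRESS residuals
    s2_i = ((n - k) * s2 - e ** 2 / (1 - h)) / (n - k - 1)
    dffits = e * h / ((1 - h) * np.sqrt(s2_i * h))
    return cooks_d, e_press, dffits
```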

6.4 Assessing the Adequacy of the Functional Form: Testing for Lack of Fit

A formal test of model adequacy can be performed if one has repeated observations at some of the constellations of the explanatory variables. Consider the case of a single regressor variable first:

at $x_1$: $y_{11}, y_{12}, \ldots, y_{1n_1}$
at $x_2$: $y_{21}, y_{22}, \ldots, y_{2n_2}$
...
at $x_k$: $y_{k1}, y_{k2}, \ldots, y_{kn_k}$

The data resemble observations from k groups and can be characterized as

$y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, \ldots, k, \quad j = 1, \ldots, n_i$

with $E(\epsilon_{ij}) = 0$ and $V(\epsilon_{ij}) = \sigma^2$. This can also be written as $y = X\mu + \epsilon$, where X is an $n \times k$ matrix of ones and zeros and $\mu = (\mu_1, \ldots, \mu_k)'$.

The LSE is $\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_k)' = (\bar{y}_1, \ldots, \bar{y}_k)'$. The SSE,

$S(\hat{\mu}) = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$

has $n-k$ degrees of freedom. This SSE is also referred to as the pure error sum of squares (PESS).

For a parametric model $\mu_i = \beta_0 + \beta_1 x_i$,

$S(\beta) = \sum_{i=1}^k\sum_{j=1}^{n_i}(y_{ij} - \beta_0 - \beta_1 x_i)^2$

Minimizing $S(\beta)$, we obtain $\hat{\beta}_0$ and $\hat{\beta}_1$ and can calculate $SSE = S(\hat{\beta}_A)$, which is larger than $S(\hat{\mu}) = PESS$ since the minimization is restricted: it involves only two parameters $\beta_0$ and $\beta_1$ compared to the k parameters (means). $S(\hat{\beta}_A)$ has $n-2$ degrees of freedom. The additional sum of squares is given by

$S(\hat{\beta}_A) - S(\hat{\mu}) = \sum_{i=1}^k n_i(\bar{y}_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$

It measures the lack of fit of the linear model. Hence, it is referred to as the lack-of-fit sum of squares (LFSS) with $(n-2) - (n-k) = k-2$ degrees of freedom. The test of the restriction is given by the F statistic

$F = \frac{[S(\hat{\beta}_A) - S(\hat{\mu})]/(k-2)}{S(\hat{\mu})/(n-k)} = \frac{LFSS/(k-2)}{PESS/(n-k)}$

6.4.1 Lack-of-Fit Test with More than One Explanatory Variable

at $(x_{11}, x_{12}, \ldots, x_{1p})$: $y_{11}, y_{12}, \ldots, y_{1n_1}$
at $(x_{21}, x_{22}, \ldots, x_{2p})$: $y_{21}, y_{22}, \ldots, y_{2n_2}$
...
at $(x_{k1}, x_{k2}, \ldots, x_{kp})$: $y_{k1}, y_{k2}, \ldots, y_{kn_k}$

$PESS = S(\hat{\mu}) = S(\bar{y}_1, \ldots, \bar{y}_k)$ is unchanged. Now $S(\hat{\beta}_A)$ has $n-p-1$ degrees of freedom. The LFSS, $S(\hat{\beta}_A) - S(\hat{\mu})$, has $k-p-1$ degrees of freedom, and

$F = \frac{[S(\hat{\beta}_A) - S(\hat{\mu})]/(k-p-1)}{S(\hat{\mu})/(n-k)}$
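A numpy sketch of the lack-of-fit decomposition for the single-regressor case, assuming x_levels is an array of k distinct x values and groups a list of replicate response arrays at those levels (names are illustrative):

```python
import numpy as np

def lack_of_fit_F(x_levels, groups):
    """F = [LFSS/(k-2)] / [PESS/(n-k)] for a straight-line fit with replicates."""
    k = len(groups)
    n = sum(g.size for g in groups)
    x_all = np.concatenate([np.full(g.size, xv) for xv, g in zip(x_levels, groups)])
    y_all = np.concatenate(groups)
    b1, b0 = np.polyfit(x_all, y_all, 1)          # least squares line
    pess = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    lfss = sum(g.size * (g.mean() - (b0 + b1 * xv)) ** 2
               for xv, g in zip(x_levels, groups))
    return (lfss / (k - 2)) / (pess / (n - k))
```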

6.5 Variance-Stabilizing Transformations

In situations where the variance of the errors is not constant, we need a certain transformation to stabilize the variance. Consider the general model

$y_i = f(x_i; \beta) + \epsilon_i = \mu_i + \epsilon_i$

where $\mu_i = E(y_i) = f(x_i; \beta)$. In addition, assume that the variance of $y_i$ is related to the mean $\mu_i$ such that

$V(y_i) = V(\epsilon_i) = [h(\mu_i)]^2\sigma^2$

where $h(\cdot)$ is a known function. To find a transformation $g(y_i)$ such that the variance of $g(y_i)$ is constant, approximate the function by a first-order Taylor series around $\mu_i$:

$g(y_i) \approx g(\mu_i) + (y_i - \mu_i)g'(\mu_i)$

Then

$V[g(y_i)] \approx V[g(\mu_i) + (y_i - \mu_i)g'(\mu_i)] = [g'(\mu_i)]^2 V(y_i) = [g'(\mu_i)]^2[h(\mu_i)]^2\sigma^2$

To stabilize the variance, choose the transformation $g(\cdot)$ such that

$g'(\mu_i) = \frac{1}{h(\mu_i)}$

6.5.1 Box-Cox Transformations

The Box-Cox transformations, or power transformations, are

$g(y_i) = (y_i^\lambda - 1)/\lambda$

If $\lambda = 1$, no transformation is needed. If $\lambda = -1$, we analyze the reciprocal $1/y_i$. If $\lambda = 1/2$, we analyze $y_i^{1/2}$. It can be shown that

$\lim_{\lambda \to 0}[(y_i^\lambda - 1)/\lambda] = \ln(y_i)$

We analyze $\ln(y_i)$ if $\lambda = 0$. Box and Cox (1964) show that the MLE of $\lambda$ minimizes $SSE(\lambda)$, where $SSE(\lambda)$ is the SSE from fitting the regression model with transformed response

$y_i^{(\lambda)} = \frac{y_i^\lambda - 1}{\lambda\,\bar{y}_g^{\lambda-1}}$

where $\bar{y}_g = \left[\prod_{i=1}^n y_i\right]^{1/n}$ is the geometric mean. If $\lambda = 0$, we take

$y_i^{(\lambda=0)} = \lim_{\lambda \to 0} y_i^{(\lambda)} = \bar{y}_g\ln(y_i)$
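A short numpy sketch that profiles SSE(lambda) over a grid, using the scaled response above so the sums of squares are comparable across lambda (X, y, and the grid are illustrative; y must be positive):

```python
import numpy as np

def boxcox_sse_profile(X, y, lambdas=np.linspace(-2, 2, 81)):
    """SSE(lambda) from regressing the scaled Box-Cox response on X."""
    gm = np.exp(np.mean(np.log(y)))                 # geometric mean of y
    sse = []
    for lam in lambdas:
        if abs(lam) < 1e-12:
            z = gm * np.log(y)
        else:
            z = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
        resid = z - X @ np.linalg.lstsq(X, z, rcond=None)[0]
        sse.append(resid @ resid)
    return lambdas, np.array(sse)                   # MLE of lambda minimizes SSE
```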

7. Model Selection

7.1 Introduction

The goal of empirical model building may be to build a model for prediction or, even simpler, to decide which of the potential explanatory variables influence the response. In this chapter, we focus on observational studies. Often these models are not motivated by any theory but are completely descriptive.

In model building, we want to select the "best" models, and this requires a definition of what we mean by best. The definition needs to incorporate the concepts of model fit and model simplicity (parsimony). Some sources of difficulty with model building are:

1. In observational studies, the constellations of the covariates may reflect poor combinations.
2. The real (observational) world and the idealized (model) world are not the same. There may be no true model after all, because all models are at best approximations.
3. The presence of high-leverage cases, outliers, and influential cases may have an impact on the model selection.
4. Variables may be given in a certain metric that makes model building very difficult.
5. The purpose of the model may be unclear. A model that is good for prediction may not be best in terms of providing the most accurate historic fit. The fit to historic data can always be improved by adding more regressors to the model. However, each additional regressor variable requires the estimation of a parameter. If the covariate is not needed, then the unnecessary estimation error adds variability to the prediction. See that the prediction of the response $y_{new}$ for a new case with covariates $x_{new}$ is given by $\hat{y}_{new} = x_{new}'\hat{\beta}$. The prediction error $y_{new} - \hat{y}_{new} = \epsilon_{new} + (\mu_{new} - \hat{y}_{new})$ combines the variability of the new observation and the error in the fitted value.

Model Selection Procedure. We consider different strategies for model selection. Given observations on a response y and q potential explanatory variables $v_1, \ldots, v_q$, we want to select a model of the form $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$, where

i. $x_1, \ldots, x_p$ is a subset of the original q regressors $v_1, \ldots, v_q$;
ii. no important variable is left out of the model; and
iii. no unimportant variable is included in the model.

7.2 All Possible Regressions

Fitting all possible regressions requires us to fit $2^q$ models if q variables are involved.

7.2.1 $R^2$, $R^2_{adj}$, $s^2$, and Akaike's Information Criterion

Let $R_p^2$ denote the $R^2$ from a model containing p variables and $(p+1)$ regression coefficients:

$R_p^2 = 1 - \frac{SSE_p}{SST}$

Everything else equal, one prefers models with larger $R_p^2$. Note that if you allow more flexibility by adding another variable to the model, then you will automatically decrease (or at least not increase) $SSE_p$. In the limit, if you have n observations and if the model contains n parameters, $SSE = 0$ and $R^2 = 1$. For these reasons, we consider the adjusted $R^2$, which incorporates a penalty for each estimated coefficient,

$R_{adj,p}^2 = 1 - \frac{SSE_p/(n-p-1)}{SST/(n-1)} = 1 - \frac{s_p^2}{s_y^2}$

$R_{adj,p}^2$ does not necessarily increase with p. If there is no improvement in $R^2$ from the additional variable (no change in $SSE_p$), the numerator will increase and actually lower the statistic. Note that the adjusted $R^2$ compares the variance in the data before any regressors are put in the model with the variance that remains after the regressors have been incorporated. An equivalent model selection tool examines the mean square error of the residuals, $s_p^2$, and finds the model that leads to the smallest value.

Akaike's information criterion (AIC) is another summary statistic for model selection:

$AIC_p = n\ln(SSE_p/n) + 2(p+1)$

Models with smaller values of AIC are preferred.

7.2.2 $C_p$ Statistic

Mallows' $C_p$ statistic (see Mallows, 1973) is another useful summary statistic that helps us choose among candidate models. Denote the MSE from the full model with q regressors and $(q+1)$ parameters by $s^2$. We assume that the largest model gives an adequate description, and hence $E(s^2) = \sigma^2$.

Consider a candidate model with p regressors ($p \le q$) written as $y = X_1\beta_1 + \epsilon$. If this smaller model is already adequate, then

$\frac{SSE_p}{\sigma^2} \sim \chi^2_{n-p-1}$

Hence, $E(SSE_p) = (n-p-1)\sigma^2$ and $E\left[\frac{SSE_p}{n-p-1}\right] = \sigma^2$. Otherwise, the MSE will be inflated. The $C_p$ statistic for a model with p regressors and $(p+1)$ parameters is defined as

$C_p = \frac{SSE_p}{s^2} - [n - 2(p+1)]$

where $E(C_p) \approx p+1$ when the candidate model is adequate and larger than $p+1$ otherwise. We graph $C_p$ against $p+1$, the number of parameters, and add a line through the points $(0, 0)$ and $(q+1, q+1)$. Note that for the largest model with q regressors, $C_q = q+1$. Good candidate models are those with few variables and $C_p \approx p+1$.

7.2.3 PRESS Statistic

Recall the PRESS residuals $e_{(i)} = y_i - \hat{y}_{(i)}$, $i = 1, 2, \ldots, n$, where $\hat{y}_{(i)} = x_i'\hat{\beta}_{(i)}$. The prediction error sum of squares (PRESS) is taken as a model selection criterion,

$PRESS_p = \sum_{i=1}^n e_{(i)}^2 = \sum_{i=1}^n\left[\frac{e_i}{1 - h_{ii}}\right]^2$

Models with smaller PRESS statistics are preferred.

7.3 Automatic Methods

7.3.1 Forward Selection

The algorithm starts with the simplest model and adds variables as necessary. The algorithm proceeds as follows:

1. Fit the q models with a single covariate, $y = \beta_0 + \beta_1 v_k + \epsilon$, $k = 1, \ldots, q$. Set $x_1 = v_k$, where $v_k$ is the variable that has the most significant regression coefficient. We look at the t ratio for testing $\beta_1 = 0$ or, equivalently, at the F statistic. If the most significant regressor is not significant enough (i.e., its probability value is larger than a preset significance level $\alpha$), the algorithm stops; there is no need to include any of the variables, and the model that includes just the constant is appropriate. If the smallest probability value is smaller than the preset $\alpha$, we include this variable in the model and go to Step 2.
2. Lock in the covariate you have found in Step 1, and repeat the procedure in Step 1 with models that include two regressors, $y = \beta_0 + \beta_1 x_1 + \beta_2 v_k + \epsilon$, $k = 1, \ldots, q$, $v_k \neq x_1$. Set $x_2 = v_k$, where $v_k$ is the variable that is most significant. We establish significance by looking at the partial t test of $\beta_2 = 0$ (or, equivalently, the partial F test that compares the full model with the restricted one found in Step 1). If the probability value associated with the partial test is larger than the preset $\alpha$, the procedure stops. If it is smaller than $\alpha$, the variable $v_k$ is added to the model.
3. Continue this algorithm until no remaining $v_k$ generates a probability value that is smaller than the preset significance level.

One refers to the significance level $\alpha$ as the "alpha to enter". Note that the preset significance level matters a great deal.

7.3.2 Backward Elimination

1. Start by fitting the very largest model, $y = \beta_0 + \beta_1 v_1 + \cdots + \beta_q v_q + \epsilon$. Consider dropping the variable $v_k$ which has the least significant regression coefficient. Calculate the partial t tests (or partial F tests), and determine the coefficient with the smallest t ratio and largest probability value. If this probability value is smaller than some preset significance level, then you cannot simplify the model further and you stop the algorithm. If it is larger than the preset $\alpha$, you omit this regressor from the model and go to Step 2.
2. Repeat the procedure in Step 1 with the simplified model; that is, fit the model that does not include the dropped $v_k$, $y = \beta_0 + \beta_1 v_1 + \cdots + \beta_{k-1}v_{k-1} + \beta_{k+1}v_{k+1} + \cdots + \beta_q v_q + \epsilon$. Find the least significant regression coefficient. If the probability value of this coefficient is smaller than a preset $\alpha$, then you stop because you cannot simplify the model further. If the probability value is larger, you remove this variable from the model and continue until the maximum probability value for any variable left in the model is less than the preset value.

The preset significance level is called the "alpha to drop".

7.3.3 Stepwise Regression

This method combines forward selection and backward elimination.

1. Start as in forward selection using the specified significance level to enter.
2. At each stage, once a variable has been included in the model, check all other variables in the model for their partial significance. Remove the least significant regressor variable for which the probability value for testing the hypothesis $\beta_k = 0$ is greater than the preset significance level to drop.
3. Continue until no variables can be added and none removed, according to the specified criteria.

The most commonly used values for the preset significance levels are between 0.15 and 0.05. All automatic algorithms should be used with caution. In situations in which there is an appreciable degree of multicollinearity among the explanatory variables, the three methods may lead to quite different final models. In such situations, it is preferable to examine all possible regressions, because such an analysis can show that several different models perform quite similarly (in terms of $R^2$, $s^2$, $C_p$).
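The Section 7.2 criteria can all be computed from a candidate model's residuals and leverages; a minimal numpy sketch for one candidate design matrix Xp (with intercept column) inside a full model whose MSE is s2_full (illustrative names, not from the text):

```python
import numpy as np

def selection_criteria(Xp, y, s2_full):
    """Adjusted R^2, AIC, Mallows' Cp, and PRESS for one candidate model."""
    n, k = Xp.shape                                   # k = p + 1 parameters
    beta = np.linalg.lstsq(Xp, y, rcond=None)[0]
    e = y - Xp @ beta
    sse = e @ e
    sst = np.sum((y - y.mean()) ** 2)
    r2_adj = 1 - (sse / (n - k)) / (sst / (n - 1))
    aic = n * np.log(sse / n) + 2 * k
    cp = sse / s2_full - (n - 2 * k)
    h = np.diag(Xp @ np.linalg.inv(Xp.T @ Xp) @ Xp.T)
    press = np.sum((e / (1 - h)) ** 2)
    return r2_adj, aic, cp, press
```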
9. Nonlinear Regression Models

9.1 Introduction

Linearity of the regression function in the parameters allows us to develop a closed-form solution for the LSE. Models that are nonlinear in the parameters, on the other hand, often require iterative methods to obtain the estimates. Some nonlinear models can sometimes be transformed into linear representations. However, in certain applications the regression models are intrinsically nonlinear in the parameters, meaning that the model cannot be transformed into a linear model.

9.2 Overview of Useful Deterministic Models, with Emphasis on Nonlinear Growth Curve Models

The simplest growth model is the linear trend model, $\mu_t = \alpha + \beta t$. The exponential trend model is given by $\mu_t = \alpha\exp(\beta t)$. Note that the exponential trend model can be transformed into a linear model by taking the logarithm on both sides.

While linear and exponential trends imply unbounded growth, the following models achieve saturation levels. The modified exponential trend model

$\mu_t = \alpha - \gamma\exp(-\beta t), \quad \beta > 0,\ \gamma > 0,\ \alpha > 0$

has starting value $\alpha - \gamma$ at $t = 0$ and limiting value, as t approaches infinity, of $\alpha$. Likewise, limiting values exist for the logistic trend model,

$\mu_t = \frac{\alpha}{1 + \gamma\exp(-\beta t)}, \quad \alpha > 0,\ \gamma > 0,\ \beta > 0$

the Gompertz growth model,

$\mu_t = \alpha\exp[-\gamma\exp(-\beta t)], \quad \alpha > 0,\ \gamma > 0,\ \beta > 0$

the Weibull growth model,

$\mu_t = \alpha - (\alpha - \gamma)\exp[-(\beta t)^\delta], \quad \alpha > 0,\ \gamma < \alpha,\ \beta > 0,\ \delta > 0$

the Richards family (Richards, 1959),

$\mu_t = \alpha[1 + (\delta - 1)\exp(-\beta(t - \gamma))]^{1/(1-\delta)}$

and the Morgan-Mercer-Flodin family (Morgan et al., 1975),

$\mu_t = \alpha - \frac{\alpha - \gamma}{1 + (\beta t)^\delta}$

Until now, we have focused on growth models where $\mu_t$ increases with time. An interesting nonlinear model that does not have this increasing property is given by

$\mu_t = \frac{\beta_1}{\beta_1 - \beta_2}[\exp(-\beta_2 t) - \exp(-\beta_1 t)]$ for $\beta_1 \neq \beta_2$, and $\mu_t = \beta_1 t\exp(-\beta_1 t)$ for $\beta_1 = \beta_2$

9.3 Nonlinear Regression Models

The nonlinear functions in 9.2 model the signal in the observation y. We need to add a probabilistic error component. We assume that the errors are additive, independent, and normal with mean zero and constant variance $\sigma^2$. The nonlinear regression model is given by

$y_i = \mu_i + \epsilon_i = \mu(x_i, \beta) + \epsilon_i$

The log-likelihood for this model can be written as

$\ln L(\beta, \sigma^2 \mid y) = -\frac{n\ln(2\pi)}{2} - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n[y_i - \mu(x_i, \beta)]^2$

The MLE of $\beta$ minimizes the sum of squares

$S(\beta) = \sum_{i=1}^n[y_i - \mu(x_i, \beta)]^2$

and is identical to the LSE of $\beta$. The MLE and LSE of the variance $\sigma^2$, respectively, are

$\hat{\sigma}^2 = \frac{S(\hat{\beta})}{n} = \frac{\sum_{i=1}^n[y_i - \mu(x_i, \hat{\beta})]^2}{n}$ and $s^2 = \frac{S(\hat{\beta})}{n-p} = \frac{\sum_{i=1}^n[y_i - \mu(x_i, \hat{\beta})]^2}{n-p}$

The difference of these statistics from those of the linear regression model is that $S(\beta)$ is no longer a quadratic function of the parameters, and that it is no longer possible to write down a closed-form expression for the estimator. Iterative estimation schemes must be used. For a comprehensive discussion of the iterative methods, refer to the books by Gallant (1987), Bates and Watts (1988), Seber and Wild (1989), and Huet et al. (1996).

9.4 Inference in the Nonlinear Regression Model

9.4.1 The Newton-Raphson Method of Determining the Minimum of a Function

Consider minimizing a function $f(\theta)$ with respect to the p-dimensional vector $\theta$. Here $f(\theta)$ can be $-\ln L(\theta, \sigma^2 \mid y)$ or $S(\theta)$. A second-order Taylor series around the optimum $\hat{\theta}$ gives

$f(\theta) \approx f(\hat{\theta}) + g(\hat{\theta})'(\theta - \hat{\theta}) + \frac{1}{2}(\theta - \hat{\theta})'G(\hat{\theta})(\theta - \hat{\theta})$

where $g(\theta)$ is the column vector of first derivatives of $f(\theta)$ with respect to $\theta$ and $G(\theta)$ is the $p \times p$ matrix of second derivatives of $f(\theta)$, also known as the Hessian matrix. Differentiating the approximate equation yields

$g(\theta) \approx g(\hat{\theta}) + G(\hat{\theta})(\theta - \hat{\theta}) = G(\hat{\theta})(\theta - \hat{\theta})$

Solving for $\hat{\theta}$ leads to

$\hat{\theta} \approx \theta - [G(\hat{\theta})]^{-1}g(\theta)$

The equation suggests an iterative procedure in which we revise the current value $\theta$ to obtain a revised value $\theta^*$ according to $\theta^* = \theta - [G(\hat{\theta})]^{-1}g(\theta)$. However, the iterative procedure raises a difficulty because the Hessian matrix is evaluated at the true optimum, which of course is unknown at the outset. The solution adopted by Newton-Raphson evaluates the Hessian at the current value, on the grounds that this will provide a good approximation if the current value is reasonably close to the optimum. The Newton-Raphson procedure revises the values according to

$\theta^* = \theta - [G(\theta)]^{-1}g(\theta)$

The procedure progresses towards a minimum only if the Hessian matrix is positive definite. For convex functions (such as the negative of a concave log-likelihood), the Hessian can be shown to be positive (semi)definite. The procedure breaks down if the matrix G is singular, and may move towards a maximum if the matrix is negative definite.

9.4.2 Application to Nonlinear Regression: Newton-Raphson and Gauss-Newton Methods

Apply the method to the negative log-likelihood function, $-\ln L(\theta)$. First and second derivatives need to be evaluated at the current value. Sometimes the matrix of negative second derivatives is replaced by its expectation,

$E\left[-\frac{\partial^2\ln L(\theta)}{\partial\theta\,\partial\theta'}\right] = I(\theta)$

$I(\theta)$ is called the information matrix. The information matrix is always positive definite, so the problems with Hessian matrices are avoided. The resulting recursion,

$\theta^* = \theta + [I(\theta)]^{-1}\frac{\partial\ln L(\theta)}{\partial\theta}$

is called the method of scoring.

For the least squares (or maximum likelihood with normal errors) estimation of parameters in nonlinear regression models, the vector $g(\theta)$ and the Hessian matrix can be written as

$g(\theta) = \frac{\partial S(\theta)}{\partial\theta} = -2\sum_{i=1}^n\frac{\partial\mu_i(\theta)}{\partial\theta}[y_i - \mu_i(\theta)]$

$G(\theta) = \frac{\partial^2 S(\theta)}{\partial\theta\,\partial\theta'} = 2\sum_{i=1}^n\left\{\left[\frac{\partial\mu_i(\theta)}{\partial\theta}\right]\left[\frac{\partial\mu_i(\theta)}{\partial\theta}\right]' - [y_i - \mu_i(\theta)]\frac{\partial^2\mu_i(\theta)}{\partial\theta\,\partial\theta'}\right\}$

Ignoring the second derivatives in the Hessian matrix, the recursions simplify to

$\theta^* = \theta + \left[\sum_{i=1}^n\left[\frac{\partial\mu_i(\theta)}{\partial\theta}\right]\left[\frac{\partial\mu_i(\theta)}{\partial\theta}\right]'\right]^{-1}\sum_{i=1}^n\frac{\partial\mu_i(\theta)}{\partial\theta}[y_i - \mu_i(\theta)]$

This is known as the Gauss-Newton method. The estimation methods will, in almost all situations, converge to the same estimates. Let us denote this estimate by $\hat{\theta}$.
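A compact numpy sketch of one possible Gauss-Newton loop; mu(theta, x) and dmu(theta, x) (the n x p Jacobian of the mean function) are assumed to be user-supplied, and the iteration limit and tolerance are illustrative choices, not from the text:

```python
import numpy as np

def gauss_newton(mu, dmu, x, y, theta0, n_iter=50, tol=1e-8):
    """Iterate theta <- theta + (J'J)^{-1} J'(y - mu) until the step is small."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = y - mu(theta, x)                 # residuals y_i - mu(x_i, theta)
        J = dmu(theta, x)                    # n x p matrix of d mu_i / d theta
        step = np.linalg.solve(J.T @ J, J.T @ r)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```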

9.4.3 Standard Errors of the Maximum Likelihood Estimates

General maximum likelihood theory establishes the asymptotic normality of the estimates $\hat{\theta}$. The asymptotic covariance matrix of the MLEs is given by the inverse of the information matrix, i.e.,

$V(\hat{\theta}) \approx [I(\hat{\theta})]^{-1}$

Since the information matrix can be estimated by the Hessian matrix,

$-\frac{\partial^2\ln L(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{2\sigma^2}\frac{\partial^2 S(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{\sigma^2}\sum_{i=1}^n\left\{\left[\frac{\partial\mu_i(\theta)}{\partial\theta}\right]\left[\frac{\partial\mu_i(\theta)}{\partial\theta}\right]' - [y_i - \mu_i(\theta)]\frac{\partial^2\mu_i(\theta)}{\partial\theta\,\partial\theta'}\right\}$

the asymptotic covariance matrix of $\hat{\theta}$ is estimated by

$V(\hat{\theta}) \approx \sigma^2\left[\sum_{i=1}^n\left\{\left[\frac{\partial\mu_i(\hat{\theta})}{\partial\theta}\right]\left[\frac{\partial\mu_i(\hat{\theta})}{\partial\theta}\right]' - [y_i - \mu_i(\hat{\theta})]\frac{\partial^2\mu_i(\hat{\theta})}{\partial\theta\,\partial\theta'}\right\}\right]^{-1} \approx \sigma^2\left[\sum_{i=1}^n\left[\frac{\partial\mu_i(\hat{\theta})}{\partial\theta}\right]\left[\frac{\partial\mu_i(\hat{\theta})}{\partial\theta}\right]'\right]^{-1}$

The unknown parameter $\sigma^2$ is replaced by its LSE, $s^2$.

10. Regression Models for Time Series Situations

$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_p x_{tp} + \epsilon_t$

where $(y_t, x_{t1}, x_{t2}, \ldots, x_{tp})$ are the measurements on the response y and the p explanatory variables at time t.

10.1 A Brief Introduction to Time Series Models

We assume that the observations become available at equally spaced time periods.

10.1.1 First-Order Autoregressive Model

In a first-order autoregressive model, all correlations among observations one step apart are the same. The lag 1 autocorrelations are

$Corr(\epsilon_1, \epsilon_2) = Corr(\epsilon_2, \epsilon_3) = \cdots = Corr(\epsilon_{n-1}, \epsilon_n) = \phi$

and the covariance matrix of $\epsilon$ is

$V(\epsilon) = \sigma^2 V = \sigma^2\begin{pmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{n-3} \\ \vdots & & & \ddots & \vdots \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & 1 \end{pmatrix}$

where $|\phi| < 1$. Note that we exclude the case of perfect autocorrelation $\phi = 1$. The autocorrelations of observations k steps apart are then

$Corr(\epsilon_1, \epsilon_{k+1}) = Corr(\epsilon_2, \epsilon_{k+2}) = \cdots = Corr(\epsilon_{n-k}, \epsilon_n) = \phi^k$

The autocorrelation function of errors that follow a first-order autoregressive model exhibits an exponential decay; the farther apart the observations, the weaker the autocorrelation.

Why do we call this an autoregressive model? This will become clear now as we specify the model that implies this particular autocorrelation structure. Consider the model

$\epsilon_t = \phi\epsilon_{t-1} + a_t$

which regresses the correlated error at time t, $\epsilon_t$, on the previous error $\epsilon_{t-1}$. The $a_t$'s, also referred to as a white noise sequence or random shocks, are the usual uncorrelated errors that satisfy the standard regression assumptions. The model above can be expanded through repeated substitution:

$\epsilon_t = \phi\epsilon_{t-1} + a_t = a_t + \phi(\phi\epsilon_{t-2} + a_{t-1}) = a_t + \phi a_{t-1} + \phi^2\epsilon_{t-2} = a_t + \phi a_{t-1} + \phi^2(\phi\epsilon_{t-3} + a_{t-2}) = \cdots = a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \phi^3 a_{t-3} + \cdots$

The mean $E(\epsilon_t) = 0$ because the weights in the expansion converge to zero (since $|\phi| < 1$). The autocorrelations that are implied by this autoregressive model are exactly the ones given by the matrix V. Multiplying the model by $\epsilon_{t-k}$ for $k > 0$,

$\epsilon_t\epsilon_{t-k} = \phi\epsilon_{t-1}\epsilon_{t-k} + a_t\epsilon_{t-k}$

and taking the expectation on both sides of this equation leads to

$Cov(\epsilon_t, \epsilon_{t-k}) = \phi\,Cov(\epsilon_{t-1}, \epsilon_{t-k})$

Dividing both sides by the time-invariant variance $V(\epsilon_t)$ leads to $\rho_k = \phi\rho_{k-1}$ for all $k > 0$, implying that

$\rho_k = \phi\rho_{k-1} = \phi^2\rho_{k-2} = \cdots = \phi^k\rho_0 = \phi^k$

which are exactly the autocorrelations given by the matrix V. The variance of $\epsilon_t$ is given by

$V(\epsilon_t) = V(a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \phi^3 a_{t-3} + \cdots) = \sigma_a^2[1 + \phi^2 + \phi^4 + \phi^6 + \cdots] = \frac{\sigma_a^2}{1 - \phi^2}$
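A numpy sketch that simulates AR(1) errors and checks that their sample autocorrelations decay roughly like phi^k (phi, sigma_a, and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, sigma_a, n = 0.7, 1.0, 2000

eps = np.zeros(n)
a = rng.normal(0.0, sigma_a, size=n)
for t in range(1, n):
    eps[t] = phi * eps[t - 1] + a[t]        # eps_t = phi*eps_{t-1} + a_t

centered = eps - eps.mean()
denom = np.sum(centered ** 2)
acf = [np.sum(centered[k:] * centered[:-k]) / denom for k in range(1, 6)]
print(np.round(acf, 3), "vs", np.round(phi ** np.arange(1, 6), 3))
```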

10.1.2 Random Walk Model

The random walk model is a variant of the first-order autoregressive model for $\phi = 1$. Then

$\epsilon_t = \epsilon_{t-1} + a_t = a_t + a_{t-1} + a_{t-2} + a_{t-3} + \cdots$

a cumulative sum of all random shocks up to time t.

The first-order AR model has a fixed level, and in the model without a constant the level is zero. Realizations from the model scatter around the fixed level and sample paths do not leave this level for long periods. This stationary behaviour is different from that of the random walk, because the random walk does not have a fixed level. We call such a model nonstationary. Differences of a random walk,

$w_t = \epsilon_t - \epsilon_{t-1} = a_t$

are uncorrelated. Although the $\epsilon_t$ are nonstationary, their first differences $w_t$ are stationary and well behaved. For a random walk, the successive differences are in fact uncorrelated.

10.1.3 Second-Order Autoregressive Model

The second-order AR model regresses the error at time t on the errors at times $t-1$ and $t-2$:

$\epsilon_t = \phi_1\epsilon_{t-1} + \phi_2\epsilon_{t-2} + a_t$

One can derive the autocorrelations that this model implies:

$\rho_1 = \frac{\phi_1}{1 - \phi_2}$

and the remaining autocorrelations can be obtained from the difference equation

$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2}$ for $k > 1$

As in the first-order AR model, one needs to put restrictions on the autoregressive parameters for stationarity. These parameters must satisfy the constraints

$\phi_2 + \phi_1 < 1, \quad \phi_2 - \phi_1 < 1, \quad -1 < \phi_2 < 1$

10.1.4 Noisy Random Walk Model

The noisy random walk model adds an uncorrelated error (noise) $a_t$ to a fraction of the random walk that we considered previously:

$\epsilon_t = (1-\theta)[a_{t-1} + a_{t-2} + a_{t-3} + \cdots] + a_t = (1-\theta)\sum_{j=1}^{\infty}a_{t-j} + a_t, \quad -1 < \theta < 1$

Simple manipulation shows that $\epsilon_t = \epsilon_{t-1} + a_t - \theta a_{t-1}$. The difference between $\epsilon_t$ and $\epsilon_{t-1}$ then equals

$\epsilon_t - \epsilon_{t-1} = a_t - \theta a_{t-1}$

Successive differences of a noisy random walk are linear combinations (or moving averages) of the present and the previous white noise errors. Since the $\epsilon_t$ are the result of integrating (summing) the differences, the literature refers to the model as the integrated moving average model of order 1, IMA(1,1).

AR models and the integrated models involving differences can be extended. Box and Jenkins discuss autoregressive integrated moving average (ARIMA) models that contain autoregressive components, differences, and moving average components.

10.2 The Effects of Ignoring the Autocorrelation in the Errors

10.2.1 Inefficiency of Least Squares Estimation

Consider $y = X\beta + \epsilon$ where $V(\epsilon) = \sigma^2 V$. The standard least squares estimator $\hat{\beta} = (X'X)^{-1}X'y$ is still unbiased, but now with covariance matrix

$V(\hat{\beta}) = \sigma^2(X'X)^{-1}X'VX(X'X)^{-1}$

On the other hand, generalized least squares and the Gauss-Markov result show that the GLS estimator

$\hat{\beta}_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y$

has the smallest variance among all linear unbiased estimators. The LSE is inefficient because its covariance matrix exceeds

$V(\hat{\beta}_{GLS}) = \sigma^2(X'V^{-1}X)^{-1}$

by a positive semidefinite matrix.
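A numpy sketch of the GLS estimator with an AR(1) error covariance, building V from a given phi (illustrative; in practice phi would itself have to be estimated):

```python
import numpy as np

def gls_ar1(X, y, phi):
    """GLS estimate (X' V^{-1} X)^{-1} X' V^{-1} y with AR(1) correlation matrix V."""
    n = y.size
    V = phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # V_ij = phi^|i-j|
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    cov_unscaled = np.linalg.inv(X.T @ Vinv @ X)   # times sigma^2 gives V(beta_GLS)
    return beta, cov_unscaled
```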

10.2.2 Spurious Regression Results when Nonstationary Errors are Involved

In many observational studies with time series data, both the response and the regressor variables are nonstationary, and in this situation the standard regression analysis is misleading. With random variables that follow random walks, the standard error $s.e.(\hat{\beta}_i)$ grossly underestimates the true variability of the estimate. Consequently, the t ratio $\hat{\beta}_i/s.e.(\hat{\beta}_i)$ is too large, and the null hypothesis is rejected far too often. The standard analysis errs on the side of finding spurious regression relationships; one is led incorrectly to conclude that there is a relationship when none is present. Equivalently, one could talk about the effect of autocorrelation on the F ratio, and hence on $R^2$.

A theoretical explanation for the inflated $R^2$ when regressing two autocorrelated but independent time series can be given. The variance of the correlation coefficient $r_{xy}$ between two independent but autocorrelated series x and y with autocorrelation functions $\rho_x(k)$ and $\rho_y(k)$ is given by

$V(r_{xy}) \approx n^{-1}\sum_{j=-\infty}^{\infty}\rho_x(j)\rho_y(j)$

If $\rho_x(j) = \rho_y(j) = \phi^{|j|}$, this variance simplifies to

$V(r_{xy}) \approx n^{-1}\frac{1 + \phi^2}{1 - \phi^2}$

If the autoregressive parameter $\phi$ is large, indicating that the series approach nonstationarity, the ratio $(1 + \phi^2)/(1 - \phi^2)$ can become quite large. Hence, it is not uncommon to obtain an unusually large sample correlation coefficient, or an unusually large $R^2$. One needs to be on the alert and watch for possible autocorrelations in the residuals at all times to guard against accepting spurious regressions.

10.3 The Estimation of Combined Regression Time Series Models

A Regression Model with First-Order Autoregressive Errors