Statistical Analysis in Climate Research - The Library of...

Statistical Analysis in Climate Research Hans von Storch and Francis W. Zwiers


Page 1: Statistical Analysis in Climate Research - The Library of …catdir.loc.gov/catdir/samples/cam032/98017416.pdf ·  · 2003-01-17Statistical analysis in climate research / Hans von


Statistical Analysis in Climate Research

Hans von Storch and Francis W. Zwiers


PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE

The Pitt Building, Trumpington Street, Cambridge CB2 1RP, United Kingdom

CAMBRIDGE UNIVERSITY PRESS

The Edinburgh Building, Cambridge CB2 2RU, UK http://www.cup.cam.ac.uk
40 West 20th Street, New York, NY 10011-4211, USA http://www.cup.org
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1999

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1999

Printed in the United Kingdom at the University Press, Cambridge

Typeset in Times 10/12pt [DBD]

A catalogue record for this book is available from the British Library

Library of Congress Cataloguing in Publication data

Storch, H. V. (Hans von), 1949–
Statistical analysis in climate research / Hans von Storch and Francis W. Zwiers.
p. cm.
Includes index.
ISBN 0 521 45071 3
1. Climatology – Statistical methods. I. Title.
QC981.S735 1998
551.5’072–dc21 98-17416 CIP

ISBN 0 521 45071 3 hardback


Contents

Preface
Thanks

1 Introduction
1.1 The Statistical Description
1.2 Some Typical Problems and Concepts

I Fundamentals

2 Probability Theory
2.1 Introduction
2.2 Probability
2.3 Discrete Random Variables
2.4 Examples of Discrete Random Variables
2.5 Discrete Multivariate Distributions
2.6 Continuous Random Variables
2.7 Example of Continuous Random Variables
2.8 Random Vectors
2.9 Extreme Value Distributions

3 Distributions of Climate Variables
3.1 Atmospheric Variables
3.2 Some Other Climate Variables

4 Concepts in Statistical Inference
4.1 General
4.2 Random Samples
4.3 Statistics and Sampling Distributions

5 Estimation
5.1 General
5.2 Examples of Estimators
5.3 Properties of Estimators
5.4 Interval Estimators
5.5 Bootstrapping

II Confirmation and Analysis
Overview


6 The Statistical Test of a Hypothesis
6.1 The Concept of Statistical Tests
6.2 The Structure and Terminology of a Test
6.3 Monte Carlo Simulation
6.4 On Establishing Statistical Significance
6.5 Multivariate Problems
6.6 Tests of the Mean
6.7 Test of Variances
6.8 Field Significance Tests
6.9 Univariate Recurrence Analysis
6.10 Multivariate Recurrence Analysis

7 Analysis of Atmospheric Circulation Problems
7.1 Validating a General Circulation Model
7.2 Analysis of a GCM Sensitivity Experiment
7.3 Identification of a Signal in Observed Data
7.4 Detecting the ‘CO2 Signal’

III Fitting Statistical Models
Overview

8 Regression
8.1 Introduction
8.2 Correlation
8.3 Fitting and Diagnosing Simple Regression Models
8.4 Multiple Regression
8.5 Model Selection
8.6 Some Other Topics

9 Analysis of Variance
9.1 Introduction
9.2 One Way Analysis of Variance
9.3 Two Way Analysis of Variance
9.4 Two Way ANOVA with Mixed Effects
9.5 Tuning a Basin Scale Ocean Model

IV Time Series
Overview

10 Time Series and Stochastic Processes
10.1 General Discussion
10.2 Basic Definitions and Examples
10.3 Auto-regressive Processes
10.4 Stochastic Climate Models
10.5 Moving Average Processes

11 Parameters of Univariate and Bivariate Time Series
11.1 The Auto-covariance Function
11.2 The Spectrum
11.3 The Cross-covariance Function
11.4 The Cross-spectrum
11.5 Frequency–Wavenumber Analysis


12 Estimating Covariance Functions and Spectra
12.1 Non-parametric Estimation of the Auto-correlation Function
12.2 Identifying and Fitting Auto-regressive Models
12.3 Estimating the Spectrum
12.4 Estimating the Cross-correlation Function
12.5 Estimating the Cross-spectrum

V Eigen Techniques
Overview

13 Empirical Orthogonal Functions
13.1 Definition of Empirical Orthogonal Functions
13.2 Estimation of Empirical Orthogonal Functions
13.3 Inference
13.4 Examples
13.5 Rotation of EOFs
13.6 Singular Systems Analysis

14 Canonical Correlation Analysis
14.1 Definition of Canonical Correlation Patterns
14.2 Estimating Canonical Correlation Patterns
14.3 Examples
14.4 Redundancy Analysis

15 POP Analysis
15.1 Principal Oscillation Patterns
15.2 Examples
15.3 POPs as a Predictive Tool
15.4 Cyclo-stationary POP Analysis
15.5 State Space Models

16 Complex Eigentechniques
16.1 Introduction
16.2 Hilbert Transform
16.3 Complex and Hilbert EOFs

VI Other Topics
Overview

17 Specific Statistical Concepts in Climate Research
17.1 The Decorrelation Time
17.2 Potential Predictability
17.3 Composites and Associated Correlation Patterns
17.4 Teleconnections
17.5 Time Filters

18 Forecast Quality Evaluation
18.1 The Skill of Categorical Forecasts
18.2 The Skill of Quantitative Forecasts
18.3 The Murphy–Epstein Decomposition
18.4 Issues in the Evaluation of Forecast Skill
18.5 Cross-validation


VII Appendices

A Notation
B Elements of Linear Analysis
C Fourier Analysis and Fourier Transform
D Normal Density and Cumulative Distribution Function
E The χ² Distribution
F Student’s t Distribution
G The F Distribution
H Table-Look-Up Test
I Critical Values for the Mann–Whitney Test
J Quantiles of the Squared-ranks Test Statistic
K Quantiles of the Spearman Rank Correlation Coefficient
L Correlations and Probability Statements
M Some Proofs of Theorems and Equations

References


1 Introduction

1.1 The Statistical Description and Understanding of Climate

Climatology was originally a sub-discipline of geography, and was therefore mainly descriptive (see, e.g., Bruckner [70], Hann [155], or Hann and Knoch [156]). Description of the climate consisted primarily of estimates of its mean state and estimates of its variability about that state, such as its standard deviations and other simple measures of variability. Much of climatology is still focused on these concerns today. The main purpose of this description is to define ‘normals’ and ‘normal deviations,’ which are eventually displayed as maps. These maps are then used for regionalization (in the sense of identifying homogeneous geographical units) and planning. The paradigm of climate research evolved from the purely descriptive approach towards an understanding of the dynamics of climate with the advent of computers and the ability to simulate the climatic state and its variability. Statistics plays an important role in this new paradigm.

The climate is a dynamical system influenced not only by immense external factors, such as solar radiation or the topography of the surface of the solid Earth, but also by seemingly insignificant phenomena, such as butterflies flapping their wings. Its evolution is controlled by more or less well-known physical principles, such as the conservation of angular momentum. If we knew all these factors, and the state of the full climate system (including the atmosphere, the ocean, the land surface, etc.), at a given time in full detail, then there would not be room for statistical uncertainty, nor a need for this book. Indeed, if we repeat a run of a General Circulation Model, which is supposedly a model of the real climate system, on the same computer with exactly the same code, operating system, and initial conditions, we obtain a second realization of the simulated climate that is identical to the first simulation.

Of course, there is a ‘but.’ We do not know all factors that control the trajectory of climate in its enormously large phase space.¹ Thus it is not possible to map the state of the atmosphere, the ocean, and the other components of the climate system in full detail. Also, the models are not deterministic in a practical sense: an insignificant change in a single digit in the model’s initial conditions causes the model’s trajectory through phase space to diverge quickly from the original trajectory (this is Lorenz’s [260] famous discovery, which leads to the concept of chaotic systems).
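The two behaviours described above, exact repeatability and rapid divergence after a tiny change in the initial conditions, can be sketched with a toy system. The example below assumes the classic three-variable Lorenz equations with their usual parameter values as a stand-in; it is not a climate model, and the integration scheme, step size, and perturbation of 1e-10 are illustrative choices rather than anything prescribed by the text.

```python
# Toy illustration (assumed setup): the three-variable Lorenz system,
# integrated with a fourth-order Runge-Kutta scheme.

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side dx/dt = f(x) of the Lorenz system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one time step dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def run(initial, n_steps=4000, dt=0.01):
    """Return the whole trajectory as a list of states."""
    trajectory = [initial]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory

def distance(a, b):
    """Euclidean distance between two states."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

reference = run((1.0, 1.0, 1.0))
repeat = run((1.0, 1.0, 1.0))               # same code, same initial state
perturbed = run((1.0 + 1e-10, 1.0, 1.0))    # change far beyond the last printed digit

identical = reference == repeat
max_separation = max(distance(a, b) for a, b in zip(reference, perturbed))
print("repeat run identical:", identical)
print("largest separation of perturbed run:", max_separation)
```

In this sketch the repeated run reproduces the reference run exactly, while the perturbed run eventually separates from it by an amount comparable to the size of the attractor itself.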

Therefore, in a strict sense, we have a ‘deterministic’ system, but we do not have the ability to analyse and describe it with ‘deterministic’ tools, as in thermodynamics. Instead, we use probabilistic ideas and statistics to describe the ‘climate’ system.

Four factors ensure that the climate system is amenable to statistical thinking.

• The climate is controlled by innumerable factors. Only a small proportion of these factors can be considered, while the rest are necessarily interpreted as background noise. The details of the generation of this ‘noise’ are not important, but it is important to understand that this noise is an internal source of variation in the climate system (see also the discussion of ‘stochastic climate models’ in Section 10.4).

• The dynamics of climate are nonlinear. Nonlinear components of the hydrodynamic part include important advective terms, such as $u\,\partial u/\partial x$. The thermodynamic part contains various other nonlinear processes, including many that can be represented by step functions (such as condensation).

¹ We use the expression ‘phase space’ rather casually. It is the space spanned by the state variables $\mathbf{x}$ of a system $d\mathbf{x}/dt = f(\mathbf{x})$. In the case of the climate system, the state variables consist of the collection of all climatic variables at all geographic locations (latitude, longitude, height/depth). At any given time, the state of the climate system is represented by one point in this space; its development in time is represented by a smooth curve (‘trajectory’).
This concept deviates from the classical mechanical definition where the phase space is the space of generalized coordinates. Perhaps it would be better to use the term ‘state space.’


• The dynamics include linearly unstable processes, such as the baroclinic instability in the midlatitude troposphere.

• The dynamics of climate are dissipative. The hydrodynamic processes transport energy from large spatial scales to small spatial scales, while molecular diffusion takes place at the smallest spatial scales. Energy is dissipated through friction with the solid earth and by means of gravity wave drag at larger spatial scales.²

The nonlinearities and the instabilities make the climate system unpredictable beyond certain characteristic times. These characteristic time scales are different for different subsystems, such as the ocean, midlatitude troposphere, and tropical troposphere. The nonlinear processes in the system amplify minor disturbances, causing them to evolve irregularly in a way that allows their interpretation as finite-amplitude noise.

In general, the dissipative character of the system guarantees its ‘stationarity.’ That is, it does not ‘run away’ from the region of phase space that it currently occupies, an effect that can happen in general nonlinear systems or in linearly unstable systems. The two factors, noise and damping, are the elements required for the interpretation of climate as a stationary stochastic system (see also Section 10.4).

Under what circumstances should the output of climate models be considered stochastic? A major difference between the real climate and any climate model is the size of the phase space. The phase space of a model is much smaller than that of the real climate system because the model’s phase space is truncated in both space and time. That is, the background noise, due to unknown factors, is missing. Therefore a model run can be repeated with identical results, provided that the computing environment is unchanged and the same initial conditions are used. To make the climate model output realistic we need to make the model unpredictable. Most Ocean General Circulation Models are strongly dissipative and behave almost linearly. Explicit noise must therefore be added to the system as an explicit forcing term to create statistical variations in the simulated system (see, for instance [276] or [418]). In dynamical atmospheric models (as opposed to energy-balance models) the nonlinearities are strong enough to create their own unpredictability. These models behave in such a way that a repeated run will diverge quickly from the original run even if only minimal changes are introduced into the initial conditions.

² The gravity wave drag maintains an exchange of momentum between the solid earth and the atmosphere, which is transported by means of vertically propagating gravity waves. See McFarlane et al. [269] for details.

1.1.1 The Paradigms of the Chaotic and Stochastic Model of Climate. In the paradigm of the chaotic model of the climate, and particularly the atmosphere, a small difference introduced into the system at some initial time causes the system to diverge from the trajectory it would otherwise have travelled. This is the famous Butterfly Effect³ in which infinitesimally small disturbances may provoke large reactions. In terms of climate, however, there is not just one small disturbance, but myriads of such disturbances at all times. In the metaphor of the butterfly: there are millions of butterflies that flap their wings all the time. The paradigm of the stochastic climate model is that this omnipresent noise causes the system to vary on all time and space scales, independently of the degree of nonlinearity of the climate’s dynamics.
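The combination of damping and omnipresent noise can be sketched with the simplest linear example. The code below assumes a first-order auto-regressive process, x(t+1) = a·x(t) + noise, as a stand-in for a strongly dissipative, almost linear model; the damping coefficient 0.9, the unit noise variance, and the seed are hypothetical values chosen only for illustration.

```python
import random

def simulate(a, noise_std, n_steps, seed, x0=1.0):
    """Iterate x <- a*x + noise; a < 1 makes the system dissipative."""
    rng = random.Random(seed)
    x = x0
    series = []
    for _ in range(n_steps):
        forcing = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        x = a * x + forcing
        series.append(x)
    return series

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

a = 0.9                                           # assumed damping coefficient
damped_only = simulate(a, 0.0, 500, seed=0)       # no noise: the anomaly dies out
forced = simulate(a, 1.0, 20000, seed=0)          # noise forcing sustains variability
theoretical_variance = 1.0 / (1.0 - a ** 2)       # stationary variance for unit noise

print("final value without noise:", damped_only[-1])
print("sample variance with noise:", variance(forced))
print("theoretical stationary variance:", theoretical_variance)
```

Without the forcing term the damped system simply decays towards zero; with it, the variance settles near a finite value instead of growing or vanishing, which is the sense in which noise and damping together produce a stationary stochastic system.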

1.2 Some Typical Problems and Concepts

1.2.0 Introduction. The following examples, which we have subjectively chosen as being typical of problems encountered in climate research, illustrate the need for statistical analysis in atmospheric and climatic research. The order of the examples is somewhat random and it is certainly not necessary to read all of them; the purpose of this ‘potpourri’ is to offer a flavour of typical questions, answers, and errors.

1.2.1 The Mean Climate State: Interpretation and Estimation. From the point of view of the climatologist, the most fundamental statistical parameter is the mean state. This seemingly trivial animal in the statistical zoo has considerable complexity in the climatological context.

First, the computed mean is not entirely reliable as an estimate of the climate system’s true long-term mean state. The computed mean will contain errors caused by taking observations over a limited observing period, at discrete times and a finite number of locations. It may also be affected by the presence of instrumental, recording, and transmission errors. In addition, reliability is not likely to be uniform as a function of location.

³ Inaudil et al. [194] claimed to have identified a Lausanne butterfly that caused a rainfall in Paris.

Figure 1.1: The 300 hPa geopotential height fields in the Northern Hemisphere: the mean 1967–81 January field, the January 1971 field, which is closer to the mean field than most others, and the January 1981 field, which deviates significantly from the mean field. Units: 10 m [117].

Reliability may be compromised if the data has been ‘analysed’, that is, interpolated to a regular grid using techniques that make assumptions about atmospheric dynamics. The interpolation is performed either subjectively by someone who has experience and knowledge of the shape of dynamical structures typically observed in the atmosphere, or it is performed objectively using a combination of atmospheric and statistical models. Both kinds of analysis are apt to introduce biases not present in the ‘raw’ station data, and errors at one location in analysed data will likely be correlated with those at another. (See Daley [98] or Thiebaux and Pedder [362] for comprehensive treatments of objective analysis.)

Second, the mean state is not a typical state. To demonstrate this we consider the January Northern Hemisphere 300 hPa geopotential height field⁴ (Figure 1.1). The mean January height field, obtained by averaging monthly mean analyses for each January between 1967 and 1981, has contours of equal height which are primarily circular with minor irregularities. Two troughs are situated over the eastern coasts of Siberia and North America. The Siberian trough extends slightly farther south than the North American trough. A secondary trough can be identified over eastern Europe and two minor ridges are located over the northeast Pacific and the east Atlantic.

⁴ The geopotential height field is a parameter that is frequently used to describe the dynamical state of the atmosphere. It is the height of the surface of constant pressure at, e.g., 300 hPa and, being a length, is measured in metres. We will often simply refer to ‘height’ when we mean ‘geopotential height’.

Some individual January mean fields (e.g., 1971) are similar to the long-term mean field. There are differences in detail, but they share the zonal wavenumber 2 pattern⁵ of the mean field. The secondary ridges and troughs have different intensities and longitudinal phases. The 300 hPa geopotential height fields of other Januaries (e.g., 1981) are very different from the mean state. They are characterized by a zonal wavenumber 3 pattern rather than a zonal wavenumber 2 pattern.

The long-term mean masks a great deal of interannual variability. For example, the minimum of the long-term mean field is larger than the minima of all but one of the individual January states. Also, the spatial variability of each of the individual monthly means is larger than that of the long-term mean. Thus, the long-term mean field is not a ‘typical’ field, as it is very unlikely to be observed as an individual monthly mean. In that sense, the long-term mean field is a rare event.
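Why averaging flattens a field can be seen in an idealized sketch. The example below assumes five synthetic ‘monthly’ fields on a single latitude circle, each a zonal wavenumber 2 wave of unit amplitude with a different longitudinal phase; the grid spacing and the phases are hypothetical and are not taken from the height data in Figure 1.1.

```python
import math

n_points = 72                                   # assumed 5-degree longitude grid
longitudes = [2.0 * math.pi * j / n_points for j in range(n_points)]
phases_deg = [0, 30, 60, 90, 120]               # hypothetical year-to-year phase shifts

# Each individual field is a unit-amplitude zonal wavenumber 2 wave.
fields = [[math.cos(2.0 * lon + math.radians(p)) for lon in longitudes]
          for p in phases_deg]

# The 'long-term mean' is the point-by-point average of the individual fields.
mean_field = [sum(f[j] for f in fields) / len(fields) for j in range(n_points)]

def spatial_variance(field):
    """Variance of a field along the latitude circle."""
    m = sum(field) / len(field)
    return sum((v - m) ** 2 for v in field) / len(field)

print("minima of individual fields:", [round(min(f), 3) for f in fields])
print("minimum of mean field:", round(min(mean_field), 3))
print("spatial variance, individual:", [round(spatial_variance(f), 3) for f in fields])
print("spatial variance, mean field:", round(spatial_variance(mean_field), 3))
```

Because the troughs and ridges of the individual fields do not line up, they partly cancel in the average: the mean field has a shallower minimum and a smaller spatial variance than any of the fields that went into it.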

Characterization of the ‘typical’ January requires more than the long-term mean. Specifically, it is necessary to describe the dominant patterns of spatial variability about the long-term mean and to say something about the range of patterns one is likely to see in a ‘typical’ January. This can be accomplished to a limited extent through the use of a technique called Empirical Orthogonal Function analysis (Chapter 13).

Third, a climatological mean should be understood to be a moving target. Today’s climate is different from that which prevailed during the Holocene (6000 years before present) or even during the Little Ice Age a few hundred years ago.

⁵ A zonal wavenumber 2 pattern contains two ridges and two troughs in the zonal, or east–west, direction.


We therefore need a clear understanding of our interpretation of the ‘true’ mean state before interpreting an estimate computed from a set of observations.

To accomplish this, it is necessary to think of the ‘January 300 hPa height field’ as a random field, and we need to determine whether the observed height fields in our 15-year sample are representative of the ‘true’ mean state we have in mind (presumably that of the ‘current’ climate). From a statistical perspective, the answer is a conditional ‘yes,’ provided that:

1. the time series of January mean 300 hPa height fields is stationary (i.e., their statistical properties do not drift with time), and

2. the memory of this time series is short relative to the length of the 15-year sample.

Under these conditions, the mean state is representative of the random sample, in the sense that it lies in the ‘centre’ of the scatter of the individual points in the state space. As we noted above, however, it is not representative in many other ways.

The characteristics of the 15-year sample may not be representative of the properties of January mean 300 hPa height fields on longer time scales when assumption 1 is not satisfied. The uncertainty of the 15-year mean height field as an estimator of the long-term mean will be almost as great as the interannual variability of the individual January means when assumption 2 is not satisfied. We can have confidence in the 15-year mean as an estimator of the long-term mean January 300 hPa height field when assumptions 1 and 2 hold in the following sense: the law of large numbers dictates that a multi-year mean becomes an increasingly better estimator of the long-term mean as the number of years in the sample increases. However, there is still a considerable amount of uncertainty in an estimate based on a 15-year sample.

Statements to the effect that a certain estimate of the mean is ‘wrong’ or ‘right’ are often made in discussions of data sets and climatologies. Such an assessment indicates that the speakers do not really understand the art of estimation. An estimate is by definition an approximation, or guess, based on the available data. It is almost certain that the exact value will never be determined. Therefore estimates are never ‘wrong’ or ‘right;’ rather, some estimates will be closer to the truth than others on average.

To demonstrate the point, consider the following two procedures for estimating the long-term mean January air pressure in Hamburg (Germany). Two data sets, consisting of 104 observations each, are available. The first data set is taken at one minute intervals, the second is taken at weekly intervals, and a mean is computed from each. Both means are estimates of the long-term mean air pressure in Hamburg, and each tells us something about our parameter.

The reliability of the first estimate is questionable because air pressure varies on time scales considerably longer than the 104 minutes spanned by the data set. Nonetheless, the estimate does contain information useful to someone who has no prior information about the climate of locations near sea level: it indicates that the mean air pressure in Hamburg is neither 2000 mb nor 20 hPa but somewhere near 1000 mb.

The second data set provides us with a much more reliable estimate of long-term mean air pressure because it contains 104 almost independent observations of air pressure spanning two annual cycles. The first estimate is not ‘wrong,’ but it is not very informative; the second is not ‘right,’ but it is adequate for many purposes.
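The difference in reliability between the two Hamburg means can be made quantitative under a simple assumption. The sketch below assumes that the correlation between pressure values decays exponentially with time separation, with a decorrelation time of three days (a hypothetical value chosen only for illustration), and uses the standard expression for the variance of the mean of n equally spaced, serially correlated observations, Var(mean) = (σ²/n)·[1 + 2 Σ (1 − k/n) ρ(k)], where the sum runs over lags k = 1, …, n − 1.

```python
import math

def variance_ratio_of_mean(n, rho):
    """Var(sample mean) / Var(single observation) when the lag-k correlation is rho**k."""
    total = 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))
    return total / n

decorrelation_days = 3.0    # assumed decorrelation time, not taken from the text
n_obs = 104

# Lag-1 correlation for the two sampling intervals (one minute and one week).
rho_minute = math.exp(-(1.0 / 1440.0) / decorrelation_days)
rho_week = math.exp(-7.0 / decorrelation_days)

ratio_minute = variance_ratio_of_mean(n_obs, rho_minute)
ratio_week = variance_ratio_of_mean(n_obs, rho_week)

print("one-minute sampling: variance ratio", ratio_minute,
      "equivalent independent observations", 1.0 / ratio_minute)
print("weekly sampling:     variance ratio", ratio_week,
      "equivalent independent observations", 1.0 / ratio_week)
```

Under these assumptions the 104 one-minute observations are worth little more than a single independent observation, whereas the 104 weekly observations behave almost like 104 independent ones.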

1.2.2 Correlation. In the statistical lexicon, the word correlation is used to describe a linear statistical relationship between two random variables. The phrase ‘linear statistical’ indicates that the mean of one of the random variables is linearly dependent upon the random component of the other (see Section 8.2). The stronger the linear relationship, the stronger the correlation. A correlation coefficient of +1 (−1) indicates a pair of variables that vary together precisely, one variable being related to the other by means of a positive (negative) scaling factor.

Figure 1.2: The monthly mean Southern Oscillation Index, computed as the difference between Darwin (Australia) and Papeete (Tahiti) monthly mean sea-level pressure (‘Jahr’ is German for ‘year’).

Figure 1.3: Auto-correlation function of the index shown in Figure 1.2. Units: %.

While this concept seems to be intuitively simple, it does warrant scrutiny. For example, consider a satellite instrument that makes radiance observations in two different frequency bands. Suppose that these radiometers have been designed in such a way that instrumental error in one channel is independent of that in the other. This means that knowledge of the noise in one channel provides no information about that in the other. However, suppose also that the radiometers drift (go out of calibration) together as they age because both share the same physical environment, share the same power supply and are exposed to the same physical abuse. Reasonable models for the total error as a function of time in the two radiometer channels might be:

$e_{1t} = \alpha_1 (t - t_0) + \epsilon_{1t}$,

$e_{2t} = \alpha_2 (t - t_0) + \epsilon_{2t}$,

where $t_0$ is the launch time of the satellite and $\alpha_1$ and $\alpha_2$ are fixed constants describing the rates of drift of the two radiometers. The instrumental errors, $\epsilon_{1t}$ and $\epsilon_{2t}$, are statistically independent of each other, implying that the correlation between the two, $\rho(\epsilon_{1t}, \epsilon_{2t})$, is zero. Consequently the total errors, $e_{1t}$ and $e_{2t}$, are also statistically independent even though they share a common systematic component. However, simple estimates of correlation between $e_{1t}$ and $e_{2t}$ that do not account for the deterministic drift will suggest that these two quantities are correlated.
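The effect of the common drift on a naive correlation estimate can be sketched numerically. The example below simulates the two error models with independent Gaussian noise; the drift rates, noise variance, record length, and seed are hypothetical values, the launch time is taken as t = 0, and the drift is removed by an ordinary least-squares line, which is one possible way of accounting for it rather than a method prescribed by the text.

```python
import random

def pearson(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def detrend(t, e):
    """Remove a least-squares straight line in t from the series e."""
    n = len(t)
    mt, me = sum(t) / n, sum(e) / n
    slope = (sum((a - mt) * (b - me) for a, b in zip(t, e))
             / sum((a - mt) ** 2 for a in t))
    return [b - me - slope * (a - mt) for a, b in zip(t, e)]

rng = random.Random(42)
t = list(range(500))                 # time since launch (t0 = 0), arbitrary units
alpha1, alpha2 = 0.02, 0.03          # assumed drift rates of the two channels

# Total errors: deterministic drift plus independent instrumental noise.
e1 = [alpha1 * ti + rng.gauss(0.0, 1.0) for ti in t]
e2 = [alpha2 * ti + rng.gauss(0.0, 1.0) for ti in t]

r_raw = pearson(e1, e2)                                   # ignores the drift
r_detrended = pearson(detrend(t, e1), detrend(t, e2))     # drift removed first

print("correlation ignoring drift:", r_raw)
print("correlation after removing drift:", r_detrended)
```

The naive estimate is large because both series share the trend, while the estimate computed from the detrended series is close to zero, in line with the independence of the instrumental noise.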

Correlations manifest themselves in several different ways in observed and simulated climates. Several adjectives are used to describe correlations depending upon whether they describe relationships in time (serial correlation, lagged correlation), space (spatial correlation, teleconnection), or between different climate variables (cross-correlation).

A good example of serial correlation is the monthly Southern Oscillation Index (SOI),⁶ which is defined as the anomalous monthly mean pressure difference between Darwin (Australia) and Papeete (Tahiti) (Figure 1.2).

The time series is basically stationary, although variability during the first 30 years seems to be somewhat weaker than that of late. Despite the noisy nature of the time series, there is a distinct tendency for the SOI to remain positive or negative for extended periods, some of which are indicated in Figure 1.2. This persistence in the sign of the index reflects the serial correlation of the SOI.

A quantitative measure of the serial correlation is the auto-correlation function, $\rho_{SOI}(t, t + \Delta)$, shown in Figure 1.3, which measures the similarity of the SOI at any time difference $\Delta$. The auto-correlation is greater than 0.2 for lags up to about six months and varies smoothly around zero with typical magnitudes between 0.05 and 0.1 for lags greater than about a year. This tendency of estimated auto-correlation functions not to converge to zero at large lags, even though the real auto-correlation is zero at long lags, is a natural consequence of the uncertainty due to finite samples (see Section 11.1).

A good example of a cross-correlation is the relationship that exists between the SOI and various alternative indices of the Southern Oscillation [426]. The characteristic low-frequency variations in Figure 1.2 are also present in area-averaged Central Pacific sea-surface temperature (Figure 1.4).⁷ The correlation between the two time series displayed in Figure 1.4 is 0.67.

Pattern analysis techniques, such as Empirical Orthogonal Function analysis (Chapter 13), Canonical Correlation Analysis (Chapter 14) and Principal Oscillation Patterns (Chapter 15), rely upon the assumption that the fields under study are spatially correlated. The Southern Oscillation Index (Figure 1.2) is a manifestation of the negative correlation between surface pressure at Papeete and that at Darwin. Variables such as pressure, height, wind, temperature, and specific humidity vary smoothly in the free atmosphere and consequently exhibit strong spatial interdependence. This correlation is present in each weather map (Figure 1.5, left). Indeed, without this feature, routine weather forecasts would be all but impossible given the sparseness of the global observing network as it exists even today. Variables derived from moisture, such as cloud cover, rainfall and snow amounts, and variables associated with land surface processes tend to have much smaller spatial scales (Figure 1.5, right), and also tend not to have normal distributions (Sections 3.1 and 3.2). While mean sea-level pressure (Figure 1.5, left) will be more or less constant on spatial scales of tens of kilometres, we may often travel in and out of localized rain showers in just a few kilometres. This dichotomy is illustrated in Figure 1.5, where we see a cold front over Ontario (Canada). The left panel, which displays mean sea-level pressure, shows the front as a smooth curve. The right panel displays a radar image of precipitation occurring in southern Ontario as the front passes through the region.

⁶ The Southern Oscillation is the major mode of natural climate variability on the interannual time scale. It is frequently used as an example in this book.
It has been known since the end of the last century (Hildebrandson [177]; Walker, 1909–21) that sea-level pressure (SLP) in the Indonesian region is negatively correlated with that over the southeast tropical Pacific. A positive SLP anomaly (i.e., a deviation from the long-term mean) over, say, Darwin (Northern Australia) tends to be associated with a negative SLP anomaly over Papeete (Tahiti). This seesaw is called the Southern Oscillation (SO). The SO is associated with large-scale and persistent anomalies of sea-surface temperature in the central and eastern tropical Pacific (El Niño and La Niña). Hence the phenomenon is often referred to as the ‘El Niño/Southern Oscillation’ (ENSO). Large zonal displacements of the centres of precipitation are also associated with ENSO. They reflect anomalies in the location and intensity of the meridionally (i.e., north–south) oriented Hadley cell and of the zonally oriented Walker cell.
The state of the Southern Oscillation may be monitored with the monthly SLP difference between observations taken at surface stations in Darwin, Australia and Papeete, Tahiti. It has become common practice to call this difference the Southern Oscillation Index (SOI) although there are also many other ways to define equivalent indices [426].

⁷ Other definitions, such as West Pacific rainfall, sea-level pressure at Darwin alone or the surface zonal wind in the central Pacific, also yield indices that are highly correlated with the usual SOI. See Wright [427].
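The auto-correlation and cross-correlation estimates discussed in this section can be sketched on synthetic data. The example below does not use the SOI or sea-surface temperature records; it assumes a hypothetical ‘index’ generated by a first-order auto-regressive process and a second, noisy ‘proxy’ of that index, and it uses one common form of the sample auto-correlation estimator (sums of lagged products divided by the lag-zero sum).

```python
import random

def autocorrelation(x, max_lag):
    """Sample auto-correlation of x for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def correlation(x, y):
    """Sample correlation between two series at zero lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(7)
n = 2000

# Hypothetical persistent index: x(t+1) = 0.8 x(t) + white noise.
index = []
value = 0.0
for _ in range(n):
    value = 0.8 * value + rng.gauss(0.0, 1.0)
    index.append(value)

# Hypothetical alternative index: the same signal plus independent noise.
proxy = [v + rng.gauss(0.0, 1.0) for v in index]

acf = autocorrelation(index, 60)
r_cross = correlation(index, proxy)

print("estimated auto-correlation at lags 1, 6, 12:", acf[1], acf[6], acf[12])
print("estimated auto-correlation at lags 40-45:", acf[40:46])
print("cross-correlation of index and proxy:", r_cross)
```

In this sketch the true auto-correlation is practically zero beyond a few dozen lags, yet the estimates at those lags wander around zero with small but nonzero values, which illustrates the finite-sample effect noted above for Figure 1.3.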

1.2.3 Stationarity, Cyclo-stationarity, and Non-stationarity. An important concept in statistical analysis is stationarity. A random variable, or a random process, is said to be stationary if all of its statistical parameters are independent of time. Most statistical techniques assume that the observed process is stationary.

However, most climate parameters that are sampled more frequently than once per year are not stationary but cyclo-stationary, simply because of the seasonal forcing of the climate system. Long-term averages of monthly mean sea-level pressure exhibit a marked annual cycle, which is almost sinusoidal (with one maximum and one minimum) in most locations. However, there are locations (Figure 1.6) where the annual cycle is dominated by a semiannual variation (with two maxima and minima). In most applications the mean annual cycle is simply subtracted from the data before the remaining anomalies are analysed. The process is cyclo-stationary in the mean if it is stationary after the annual cycle has been removed.
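The subtraction of the mean annual cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `monthly_anomalies` is ours, the monthly series is assumed to be complete, to start in January and to span whole years, and the data are synthetic.

```python
import numpy as np

def monthly_anomalies(x):
    """Remove the mean annual cycle from a monthly series.

    Assumes x has a length that is a multiple of 12 and starts in January.
    Returns (anomalies, mean_annual_cycle).
    """
    x = np.asarray(x, dtype=float)
    by_year = x.reshape(-1, 12)       # rows: years, columns: calendar months
    cycle = by_year.mean(axis=0)      # long-term mean of each calendar month
    anomalies = (by_year - cycle).ravel()
    return anomalies, cycle

# Synthetic sea-level pressure: sinusoidal annual cycle plus noise (made-up values).
rng = np.random.default_rng(0)
months = np.arange(30 * 12)
slp = 1013.0 + 5.0 * np.cos(2 * np.pi * months / 12) + rng.normal(0.0, 1.0, months.size)
anom, cycle = monthly_anomalies(slp)
```

By construction the anomalies have zero mean in every calendar month. This removes cyclo-stationarity in the mean only; the variance or the percentiles of the anomalies may still depend on the season.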

Other statistical parameters (e.g., the percentiles of rainfall) may also exhibit cyclo-stationary behaviour. Figure 1.7 shows the annual cycles


Figure 1.4: The conventional Southern Oscillation Index (SOI = pressure difference between Darwin and Tahiti; dashed curve) and a sea-surface temperature (SST) index of the Southern Oscillation (solid curve) plotted as a function of time. The conventional SOI has been doubled in this figure.

Figure 1.5: State of the atmosphere over North America on 23 May 1992. Left: Analysis of the sea-level pressure field (12:00 UTC (Universal Time Coordinated); from Europäischer Wetterbericht 17, Band 144; with permission of the Deutscher Wetterdienst). Right: Weather radar image, showing rainfall rates, for southern Ontario (19:30 local time; courtesy Paul Joe, AES Canada [94]). Note that the radar image and the weather map refer to different times, namely 12:00 UTC on 23 May and 00:30 UTC on 24 May.

of the 70th, 80th, and 90th percentiles8 of 24-hour rainfall amounts for each calendar month at

8 Or ‘quantiles,’ that is, thresholds selected so that 70%, 80%, or 90% of all 24-hour rainfall amounts are less than the respective threshold [2.6.4].

Vancouver (British Columbia) and Sable Island (off the coast of Nova Scotia) [450].
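An annual cycle of percentiles of this kind can be estimated by pooling the daily amounts for each calendar month. The following is a minimal sketch, assuming a hypothetical helper `monthly_percentiles`, a simplified calendar of twelve 30-day months, and synthetic exponentially distributed rainfall; none of this is the procedure used in [450].

```python
import numpy as np

def monthly_percentiles(values, month, q=(70, 80, 90)):
    """Percentiles of `values` pooled by calendar month (1..12).

    Returns an array of shape (12, len(q)).
    """
    values = np.asarray(values, dtype=float)
    month = np.asarray(month)
    return np.array([np.percentile(values[month == m], q) for m in range(1, 13)])

# Synthetic daily rainfall with wetter winters (all numbers are made up).
rng = np.random.default_rng(1)
n_days = 360 * 20                                   # 20 'years' of 30-day months
month = (np.arange(n_days) // 30) % 12 + 1
scale = 4.0 + 3.0 * np.cos(2 * np.pi * (month - 1) / 12)
rain = rng.exponential(scale)
pct = monthly_percentiles(rain, month)              # columns: 70th, 80th, 90th
```

Each row of `pct` holds the three thresholds for one calendar month, so plotting the columns against month gives curves analogous to those in Figure 1.7.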

The Southern Oscillation Index is not strictly stationary. Wright [427] showed that the linear serial correlation of the SOI depends upon the time


Figure 1.6: Annual cycle of sea-level pressure at extratropical locations. a) Northern Hemisphere Ocean Weather Stations: A = 62° N, 33° W; D = 44° N, 41° W; E = 35° N, 48° W; J = 52° N, 25° W; P = 50° N, 145° W. b) Southern Hemisphere.

Figure 1.7: Monthly 90th, 80th, and 70th percentiles (from top to bottom) of 24-hour rainfall amounts at Vancouver and Sable Island [450].

of the year. The serial correlation is plotted as a function of time of year and lag in Figure 1.8. Correlations between values of the SOI in May and values in subsequent months decay slowly with increasing lag, while similar correlations with values in April decay quickly. Because of this behaviour, Wright defined an ENSO year that begins in May and ends in April.
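A seasonally dependent lag correlation of this kind can be estimated by correlating, across years, the values in a fixed calendar month with the values a fixed number of months later. The sketch below is an assumption-laden illustration: `seasonal_lag_correlation` is a hypothetical name, and the synthetic series is a simple autoregressive process whose memory is deliberately erased on the April-to-May transition; it is not the SOI and not Wright's calculation.

```python
import numpy as np

def seasonal_lag_correlation(x, max_lag=12):
    """corr[m, k-1]: correlation between values in calendar month m (0 = Jan)
    and values k months later, estimated across years.

    Assumes x is a monthly series that starts in January.
    """
    x = np.asarray(x, dtype=float)
    n_years = x.size // 12
    corr = np.empty((12, max_lag))
    for m in range(12):
        start = np.arange(n_years) * 12 + m
        for lag in range(1, max_lag + 1):
            end = start + lag
            ok = end < x.size                 # drop pairs that run off the record
            corr[m, lag - 1] = np.corrcoef(x[start[ok]], x[end[ok]])[0, 1]
    return corr

# Synthetic series: persistence 0.9 in every month except into May (index 4).
rng = np.random.default_rng(2)
n = 300 * 12
a = np.full(12, 0.9)
a[4] = 0.0                                    # memory is lost from April to May
x = np.zeros(n)
for t in range(1, n):
    x[t] = a[t % 12] * x[t - 1] + rng.normal()
corr = seasonal_lag_correlation(x, max_lag=6)
```

In this synthetic case the row for May (`corr[4]`) decays slowly with lag while the row for April (`corr[3]`) is near zero already at lag one, which is the qualitative behaviour described above.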

Regular observations taken over extended periods at a certain station sometimes exhibit changes in their statistical properties. These might be abrupt or gradual (such as changes that might occur when the exposure of a rain gauge changes slowly over time, as a consequence of the growth of vegetation or changes in local land use). Abrupt

Figure 1.8: Seasonal dependence of the lag correlations of the SST index of the Southern Oscillation. The correlations are given in hundredths so that isolines represent lag correlations of 0.8, 0.6, 0.4, and 0.2. The row labelled ‘Jan’ lists correlations between January values of the index and the index observed ‘lag’ months later [427].

changes in the observational record may take place if the instrument (or the observer) changes, the site is moved,9 or recording practices are changed. Such non-natural or artificial changes are

9 Karl et al. [213] describe a case in which a precipitation gauge recorded significantly different values after being raised one metre from its original position.


Figure 1.9: Annual mean daily minimum temperature time series at two neighbouring sites in Quebec. Sherbrooke has experienced considerable urbanization since the beginning of the century whereas Shawinigan has maintained more of its rural character. Top: The raw records. The abrupt drop of several degrees in the Sherbrooke series in 1963 reflects the move of the instrument from downtown Sherbrooke to its suburban airport. The reason for the downward dip before 1915 in the Shawinigan record is unknown. Bottom: Corrected time series for Sherbrooke and Shawinigan. The Sherbrooke data from 1963 onward are increased by 3.2°C. The straight lines are trend lines fitted to the corrected Sherbrooke data and the 1915–90 Shawinigan record. Courtesy L. Vincent, AES Canada.

called inhomogeneities. An example is contained in the temperature records of Sherbrooke and Shawinigan (Quebec) shown in the upper panel of Figure 1.9. The Sherbrooke observing site was moved from a downtown location to a suburban airport in 1963, and the recorded temperature abruptly dropped by more than 3°C. The Shawinigan record may also be contaminated by observational errors made before 1915.

Geophysical time series often exhibit a trend. Such trends can originate from various sources. One source is urbanization, that is, the increasing density and height of buildings around an observation location and the corresponding changes in the properties of the land surface. The temperature at Sherbrooke, a location heavily affected by development, exhibits a marked upward trend after correction for the systematic change in 1963

(Figure 1.9, bottom). This temperature trend is much weaker for the neighbouring Shawinigan, perhaps due to a weaker urbanization effect at that site or natural variations of the climate system. The temperature trends at both Sherbrooke and Shawinigan are real, not observational artifacts. The strong trend at Sherbrooke must not be mistaken for an indication of global warming.
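The effect of an uncorrected step change on a fitted trend can be illustrated with synthetic data. In the sketch below, the helper names `adjust_step` and `linear_trend`, the underlying warming rate and the noise level are all assumptions made for illustration; only the break year (1963) and the size of the adjustment (3.2°C) are taken from the Sherbrooke example.

```python
import numpy as np

def adjust_step(years, temps, break_year, offset):
    """Add `offset` to every value from `break_year` onward."""
    adjusted = np.asarray(temps, dtype=float).copy()
    adjusted[np.asarray(years) >= break_year] += offset
    return adjusted

def linear_trend(years, temps):
    """Least-squares slope (units per year); years are centred for conditioning."""
    years = np.asarray(years, dtype=float)
    slope, _intercept = np.polyfit(years - years.mean(), temps, 1)
    return slope

# Synthetic record: a steady warming plus noise (made-up numbers).
rng = np.random.default_rng(3)
years = np.arange(1900, 1991)
true = 0.02 * (years - 1900) + rng.normal(0.0, 0.3, years.size)

raw = true.copy()
raw[years >= 1963] -= 3.2            # artificial drop mimicking a site move
corrected = adjust_step(years, raw, 1963, 3.2)

trend_raw = linear_trend(years, raw)
trend_corrected = linear_trend(years, corrected)
```

In this example the trend fitted to the raw record even has the wrong sign, while the corrected record recovers a slope close to the prescribed 0.02 per year. In practice the size of the adjustment is itself an estimate and adds uncertainty to the trend.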

Trends in the large-scale state of the climate system may reflect systematic forcing changes of the climate system (such as variations in the Earth’s orbit, or increased CO2 concentration in the atmosphere) or low-frequency internally generated variability of the climate system. The latter may be deceptive because low-frequency variability, in short time series, may be mistakenly interpreted as a trend. However, if the length of such time series is increased, a metamorphosis of the former ‘trend’ takes place and it becomes apparent that the trend is a part of the natural variation of the system.10

1.2.4 Quality of Forecasts. The Old Farmer’s Almanac publishes regular outlooks for the climate for the coming year. The method used to prepare these outlooks is kept secret, and scientists question the existence of skill in the predictions. To determine whether these skeptics are right or wrong, measures of the skill of the forecasting scheme are needed. These skill scores can be used to compare forecasting schemes objectively.

The Almanac makes categorical forecasts of future temperature and precipitation amount in two categories, ‘above’ or ‘below’ normal. A suitable skill score in this case is the number of correct forecasts. Trivial forecasting schemes such as persistence (no change), climatology, or pure chance can be used as reference forecasts if no other forecasting scheme is available. Once we have counted the number of correct forecasts made with both the tested and the reference schemes, we can estimate the improvement (or degradation) of forecast skill by computing the difference in the counts. Relatively simple probabilistic methods can be used to make a judgement about the

10 This is an example of the importance of time scales in climate research, an illustration that our interpretation of a given process depends on the time scales considered. A short-term trend may be just another swing in a slowly varying system. An example is the Madden-and-Julian Oscillation (MJO, [264]), which is the strongest intra-seasonal mode in the tropical troposphere. It consists of a wavenumber 1 pattern that travels eastward round the globe. The MJO has a mean period of 45 days and has significant memory on time scales of weeks; on time scales of months and years, however, the MJO has no temporal correlation.


Figure 1.10: Correlation skill scores for three forecasts of the low-frequency variations within the Southern Oscillation Index (Figure 1.2). A score of 1 indicates a perfect forecast, while a zero indicates a forecast unrelated to the predictand [432].

significance of the change. We will return to the Old Farmer’s Almanac in Section 18.1.
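The counting of correct categorical forecasts, and one simple probabilistic judgement about the counts, can be sketched as follows. The category sequences are invented, the names `hits` and `binom_tail` are ours, and the binomial calculation assumes that the forecasts are independent and that pure chance gives each category a probability of one half; it is an illustration, not the analysis of Section 18.1.

```python
from math import comb

def hits(forecast, observed):
    """Number of correct categorical forecasts."""
    return sum(f == o for f, o in zip(forecast, observed))

def binom_tail(k, n, p=0.5):
    """P(K >= k) for K ~ Binomial(n, p): chance of at least k hits by pure chance."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Hypothetical outcomes: 'A' = above normal, 'B' = below normal.
observed = list("AABABBABAABB")
tested   = list("AABABBBBAABA")

# Persistence reference: forecast for period t is the outcome of period t-1,
# so both schemes are scored on periods 1..11 only.
hits_tested = hits(tested[1:], observed[1:])
hits_reference = hits(observed[:-1], observed[1:])
improvement = hits_tested - hits_reference

n_cases = len(observed) - 1
p_chance = binom_tail(hits_tested, n_cases)   # chance of doing this well by guessing
```

Here the tested scheme scores 9 of 11 and persistence 4 of 11; the probability of at least 9 hits in 11 fair coin tosses is 67/2048, or about 0.03. Serial correlation in real climate records violates the independence assumption and generally makes such a probability too small.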

Now consider another forecasting scheme in which quantitative rather than categorical statements are made. For example, a forecast might consist of a statement such as: ‘the SOI will be x standard deviations above normal next winter.’ One way to evaluate such forecasts is to use a measure called the correlation skill score ρ (Chapter 18). A score of ρ = 1 corresponds to a perfect forecasting scheme in the sense that forecast changes exactly mirror SOI changes even though the dynamic range of the forecast may be different from that of the SOI. In other words, the correlation skill score is one when there is an exact linear relationship between forecasts and reality. Forecasts that are (linearly) unrelated to the predictand yield zero correlation.
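The insensitivity of the correlation skill score to the dynamic range of the forecast is easy to verify numerically. The sketch below uses synthetic numbers and a hypothetical helper `correlation_skill`; it computes the ordinary sample correlation between forecast and predictand.

```python
import numpy as np

def correlation_skill(forecast, observed):
    """Sample correlation between forecast and predictand."""
    return float(np.corrcoef(forecast, observed)[0, 1])

rng = np.random.default_rng(4)
soi = rng.normal(size=200)            # stand-in for the observed index

damped = 0.3 * soi + 1.0              # exact linear relation, reduced range, biased
unrelated = rng.normal(size=200)      # forecast with no relation to the predictand

rho_damped = correlation_skill(damped, soi)
rho_unrelated = correlation_skill(unrelated, soi)
```

The damped and biased forecast scores ρ = 1 up to rounding, while the unrelated forecast scores close to zero. A forecast can therefore have a perfect correlation skill score and still be wrong in amplitude and mean, which is why the score is usually read together with other measures.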

The correlation skill scores for several methods of forecasting the SOI are displayed in Figure 1.10. The methods are persistence forecasts (Chapter 18), POP forecasts (Chapter 15), and forecasts made with a univariate linear time series model (Chapters 11 and 12). Forecasts based on persistence and the univariate time series model are superior at one and two month lead times. The POP forecast becomes more skilful beyond that time scale.

Regretfully, forecasting schemes generally do not have the same skill under all circumstances. The skill often exhibits a marked annual cycle

(e.g., skill may be high during the dry season, and low during the wet season). The skilfulness of a forecast also often depends on the low-frequency state of the atmospheric flow (e.g., blocking or westerly regime). Thus, in most forecasting problems there are physical considerations (state dependence and the memory of the system) that must be accounted for when using statistical tools to analyse forecast skill. This is done either by conducting a statistical analysis of skill that incorporates the effects of state dependence and serial correlation, or by using physical intuition to temper the precise interpretation of a simpler analysis that compromises the assumptions of stationarity and non-correlation.

There are various pitfalls in the art of forecast evaluation. An excellent overview is given by Livezey [255], who presents various examples in which forecast skill is overestimated. Chapter 18 is devoted to the art of forecast evaluation.

1.2.5 Characteristic Times and Characteristic Spatial Patterns. What are the temporal characteristics of the Southern Oscillation Index illustrated in Figure 1.2? Visual inspection suggests that the time series is dominated by at least two time scales: a high-frequency mode that describes month-to-month variations, and a low-frequency mode associated with year-to-year variations. How can one objectively quantify these characteristic times and the amount of variance attributed to these time scales? The appropriate tool is referred to as time series analysis (Chapters 10 and 11).

Indices, such as the SOI, are commonly used in climate research to monitor the temporal development of a process. They can be thought of as filters that extract physical signals from a multivariate environment. In this environment the signal is masked by both spatial and temporal variability unrelated to the signal, that is, by spatial and temporal noise.

The conventional approach used to identify indices is largely subjective. The characteristic patterns of variation of the process are identified and associated with regions or points. Corresponding areal averages or point values are then used to indicate the state of the process.

Another approach is to extract characteristic patterns from the data by means of analytical techniques, and subsequently use the coefficients of these patterns as indices. The advantages of this approach are that it is based on an objective algorithm and that it yields the characteristic patterns explicitly. Eigentechniques such as Empirical Orthogonal Function (EOF)


Figure 1.11: Empirical Orthogonal Functions (EOFs; Chapter 13) of monthly mean wind stress over the tropical Pacific [394]. a,b) The first two EOFs. The two patterns are spatially orthogonal. c) Low-frequency filtered coefficient time series of the two EOFs shown in a,b). The solid curve corresponds to the first EOF, which is displayed in panel a). The two curves are orthogonal.

analysis and Principal Oscillation Pattern (POP) analysis are tools that can be used to define patterns and indices objectively (Chapters 13 and 15).
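One common way to compute EOFs and their coefficient time series is through a singular value decomposition of the anomaly data matrix. The sketch below is a minimal illustration under our own assumptions: the function name `eof_analysis` is hypothetical, no area weighting is applied, and the field is a synthetic propagating wave plus noise rather than wind stress.

```python
import numpy as np

def eof_analysis(field, n_modes=2):
    """EOFs of a data matrix of shape (n_times, n_points).

    Returns (patterns, coefficients, explained_variance_fraction), where
    patterns has shape (n_modes, n_points) and coefficients (n_times, n_modes).
    """
    anomalies = field - field.mean(axis=0)          # remove the time mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    patterns = vt[:n_modes]                         # orthonormal spatial patterns
    coefficients = u[:, :n_modes] * s[:n_modes]     # orthogonal time coefficients
    fraction = s[:n_modes] ** 2 / np.sum(s ** 2)
    return patterns, coefficients, fraction

# Synthetic eastward-propagating wave with a 30-step period, plus weak noise.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 2 * np.pi, 40, endpoint=False)
t = np.arange(240)
field = np.cos(x[None, :] - 2 * np.pi * t[:, None] / 30)
field = field + 0.1 * rng.normal(size=field.shape)

patterns, coefficients, fraction = eof_analysis(field)
```

For a propagating wave the first two EOFs come out as a pair of patterns in spatial quadrature whose coefficients are a quarter period out of phase, which is the kind of structure that suggests a travelling signal. The patterns are orthonormal and the coefficient series are mutually orthogonal by construction of the decomposition.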

An example is the EOF analysis of monthly mean wind stress over the tropical Pacific [394]. The first two EOFs, shown in Figure 1.11a and Figure 1.11b, are primarily confined to the equator. The two fields are (by construction) orthogonal to each other. Figure 1.11c shows the time coefficients of the two fields. An analysis of the coefficient time series, using the techniques of cross-spectral analysis (Section 11.4), shows that they vary coherently on a time scale T ≈ 2 to 3 years. One curve leads the other by a time lag of approximately T/4. The temporal lag-relationship of the time coefficients together with the spatial quadrature leads to the interpretation that the two patterns and their time coefficients describe an eastward propagating signal that,

Figure 1.12: A schematic representation of the spatial distributions of simultaneous SST and SLP anomalies at Northern Hemisphere midlatitudes in winter, when the SLP anomaly induces the SST anomaly (top), and when the SST anomaly excites the SLP anomaly (bottom). The large arrows represent the mean atmospheric flow. The ‘L’ is an atmospheric low-pressure system connected with geostrophic flow indicated by the circular arrow. The hatching represents warm (W) and cool (C) SST anomalies [438].

in fact, may be associated with the Southern Oscillation.

1.2.6 Pairs of Characteristic Patterns. Almost all climate components are interrelated. When one component exhibits anomalous conditions, there will likely be characteristic anomalies in other components at the same time. The relative shapes of the patterns in related climate components are often indicative of the processes that dominate the coupling of the components.

To illustrate this idea we consider large-scale air–sea interactions on seasonal time scales at midlatitudes in winter [438] [312]. Figure 1.12


illustrates the two mechanisms that might be involved in air–sea interactions in the North Atlantic. The lower panel illustrates how a sea-surface temperature (SST) anomaly pattern might induce a simultaneous sea-level pressure (SLP) anomaly pattern. The argument is linear so we may assume that the SST anomaly is positive. This positive SST anomaly enhances the sensible and latent heat fluxes into the atmosphere above and downstream of the SST anomaly. Thus SLP is reduced in that area and anomalous cyclonic flow is induced.

The upper panel of Figure 1.12 illustrates how an SLP anomaly might induce an anomalous SST pattern. The anomalous SLP distribution alters the wind stress across the region by creating stronger zonal winds in the southwest part of the anomalous cyclonic circulation and weaker zonal winds in the northeast sector. This configuration induces anomalous mixing of the ocean’s mixed layer and anomalous air–sea fluxes of sensible and latent heat (cf. [3.2.3]). Stronger winds intensify mixing and enhance the upward heat flux whereas weaker winds correspond to reduced mixing and weaker vertical fluxes. The result is anomalous cooling of the sea surface in the southwest sector and anomalous heating in the northeast sector of the cyclonic circulation.

One strategy for finding out which of the two proposed mechanisms dominates air–sea interaction is to identify the dominant patterns in SST and SLP that tend to occur simultaneously. This can be accomplished by performing a Canonical Correlation Analysis (CCA, Chapter 14). In the CCA two vector variables $\vec{X}$ and $\vec{Y}$ are considered, and sets of orthogonal patterns $\vec{p}^{\,i}_X$ and $\vec{p}^{\,i}_Y$ are constructed so that the expansion coefficients $\alpha^x_i$ and $\alpha^y_j$ in $\vec{X} = \sum_i \alpha^x_i \vec{p}^{\,i}_X$ and $\vec{Y} = \sum_j \alpha^y_j \vec{p}^{\,j}_Y$ are optimally correlated for $i = j$ or uncorrelated for $i \neq j$.
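The defining property of the CCA coefficients (maximal correlation for matching indices, zero correlation otherwise) can be checked numerically. The sketch below is one standard textbook construction, whitening each variable and taking a singular value decomposition of the whitened cross-covariance; it is not necessarily the algorithm of Chapter 14. The names `inv_sqrt` and `cca` are ours, the data are synthetic, and only the coefficient time series and canonical correlations are computed, not the patterns themselves.

```python
import numpy as np

def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v / np.sqrt(w)) @ v.T

def cca(x, y):
    """Canonical correlations and coefficient series for x (n, p) and y (n, q)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)          # whitening transforms
    u, s, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    a = x @ wx @ u                                  # coefficient series for x
    b = y @ wy @ vt.T                               # coefficient series for y
    return s, a, b

# Two synthetic fields sharing one common signal z (made-up patterns and noise).
rng = np.random.default_rng(7)
n = 500
z = rng.normal(size=n)
x = z[:, None] * np.array([1.0, -1.0, 0.5, 0.0]) + 0.5 * rng.normal(size=(n, 4))
y = z[:, None] * np.array([1.0, 1.0, -1.0]) + 0.5 * rng.normal(size=(n, 3))

s, a, b = cca(x, y)
cross = a.T @ b / (n - 1)       # sample cross-covariance of the coefficients
```

The matrix `cross` is diagonal with the canonical correlations `s` on its diagonal, so coefficients with different indices are uncorrelated. With real fields the covariance matrices are usually ill-conditioned, and the data are commonly projected onto a few leading EOFs first.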

Zorita, Kharin, and von Storch [438] applied CCA to winter (DJF) mean anomalies of North Atlantic SST and SLP and found two pairs of CCA patterns $\vec{p}^{\,i}_{SST}$ and $\vec{p}^{\,j}_{SLP}$ that were associated with physically significant correlations. The pair of patterns with the largest correlation (0.56) is shown in Figure 1.13. The SLP pattern represents 21% of the total DJF SLP variance whereas the SST pattern explains 19% of the total SST variance.11 Clearly the two patterns support the hypothesis that the anomalous atmospheric circulation is responsible for the generation of SST

11 The proportion of variance represented by the patterns is unrelated to the correlation.

anomalies off the North American coast. Peng and Fyfe [312] refer to this as the ‘atmosphere driving the ocean’ mode. See also Luksch [261].

Canonical Correlation Analysis is explained in detail in Chapter 14 and we return to this example in [14.3.1–2].

1.2.7 Atmospheric General Circulation Model Experimentation: Evaluation of Paired Sensitivity Experiments and Verification of Control Simulation. Atmospheric General Circulation Models (AGCMs) are powerful tools used to simulate the dynamics of the atmospheric circulation. There are two main applications of these GCMs, one being the simulation of the present, past (e.g., paleoclimatic conditions), or future (e.g., climate change) statistics of the atmospheric circulation. The other involves the study of the simulated climate’s sensitivity to the effect of different boundary conditions (e.g., sea-surface temperature) or parameterizations of sub-grid scale processes (e.g., planetary boundary layer).12

In both modes of operation two sets of statistics are compared. In the first, the statistics of the simulated climate are compared with those of the observed climate, or sometimes with those of another simulated climate. In the second mode of experimentation, the statistics obtained in the run with anomalous conditions are compared with those from the run with the control conditions. The simulated atmospheric circulation is turbulent as is that of the real atmosphere (see Section 1.1). Therefore the true signal (excited by the prescribed change in boundary conditions, parameterization, etc.) or the true model error is masked by random variations.

Even when the modifications in the experimental run have no effect on the simulated climate, the difference field will be nonzero and will show structure reflecting the random variations in the control and experimental runs. Similarly, the mean difference field between an observed distribution and its simulated counterpart will exhibit features, possibly of large scale, even if the model is perfect.

12 Sub-grid scale processes take place on spatial scales too small to be resolved by a climate model. Regardless of the resolution of the climate model, there are unresolved processes at smaller scales. Despite the small scale of these processes, they influence the large-scale evolution of the climate system because of the nonlinear character of the climate system. Climate modellers therefore attempt to specify the ‘net effect’ of such processes as a transfer function of the large-scale state itself. This effect is a forcing term for the resolved scales, and is usually expressed as an expected value which is conditional upon the large-scale state. The transfer function is called a ‘parameterization.’


Figure 1.13: The dominant pair of CCA patterns that describe the connection between simultaneous winter (DJF) mean anomalies of sea-level pressure (SLP, top) and sea-surface temperature (SST, bottom) in the North Atlantic. The largest features of the SLP field are indicated by shading in the SST map, and vice versa. See also [14.3.1]. From Zorita et al. [438].

Therefore, it is necessary to apply statistical techniques to distinguish between the deterministic signal (or model error) and the internal noise.

Appropriate methodologies designed to diagnose the presence of a signal include the use of interval estimation methods (Section 5.4) or hypothesis testing methods (Chapter 6). Interval estimation methods use statistical models to produce a range of signal estimates consistent with the realizations of control and experimental mean fields obtained from the simulation. Hypothesis testing methods use statistical models to determine whether information in the realizations is consistent with the null hypothesis that the difference fields, such as in Figures 1.14 and 1.15, do not contain a deterministic signal and thus reflect only the effects of random variation.

We illustrate the problem with two examples: an experiment in which there is no significant signal, and another in which modifications to the model result in a strong change in the atmospheric flow.

Figure 1.14: The mean SLP difference field between control and experimental atmospheric GCM runs. Evaporation over the Iberian Peninsula was artificially suppressed in the experimental run. The signal is not statistically significant [402].

Figure 1.15: The mean 500 hPa height difference field between a control run and an experimental run in which a positive (El Niño) SST anomaly was imposed in the equatorial Central and Eastern Pacific. The signal is statistically significant. See also Figures 9.1 and 9.2 [393].

In the first case, the surface properties of the Iberian peninsula were modified so as to turn it into a desert in the experimental climate. That is, evaporation at the grid points representing the Iberian peninsula was arbitrarily set to zero. The response, in terms of January Northern Hemisphere sea-level pressure, is shown in Figure 1.14 [402]. The statistical analysis revealed


that the signal, which appears to be of very large scale, is mainly due to noise and is not statistically significant.

In the second case, anomalously warm sea-surface temperatures were prescribed in the tropical Pacific, in order to simulate the effect of the 1982/83 El Niño event on the atmosphere. The resulting anomalous mean January 500 hPa height field is shown in Figure 1.15. In this case the signal is statistically distinguishable from the background noise.

Before using statistical tests, we must account for several methodological considerations (see Chapter 6). Straightforward statistical assessments that compare the mean states of two simulated climates generally use simple statistical tests that are performed locally at grid points. More complex field tests, often called field significance tests in the climate literature, are used less frequently.

Grid point tests, while popular because of their simplicity, may have interpretation problems. The result of a set of statistical tests, one conducted at each grid point, is a field of decisions denoting where differences are, and are not, statistically significant. However, statistical tests cannot be conducted with absolute certainty. Rather, they are conducted in such a way that there is an a priori specified risk 1 − p of rejecting the null hypothesis: ‘no difference’ when it is true.13

The specified risk (1 − p) × 100% is often referred to as the significance level of the test.14

A consequence of setting the risk of false rejection to 1 − p, 0 < p < 1, is that we can expect approximately (1 − p) × 100% of the decisions to be reject decisions when the null hypothesis is valid. However, many fields of interest in climate experiments exhibit substantial

13 The standard, rather mundane statistical nomenclature for this kind of error is Type I error; failure to reject the null hypothesis when it is false is termed a Type II error. Specifying a smaller risk reduces the chance of making a Type I error but also reduces the sensitivity of the test and hence increases the likelihood of a Type II error. More or less standard practice is to set the risk of a Type I error to (1 − p) × 100% = 5% in tests of the mean and to (1 − p) × 100% = 10% in tests of variability. A higher level of risk is usually felt to be acceptable in variance tests because they are generally less powerful than tests concerning the mean state. The reasons for specifying the risk in the form 1 − p, where p is a large probability near 1, will become apparent later.

14 There is some ambiguity in the climate literature about how to specify a ‘significance level.’ Many climatologists use the expression ‘significant at the 95% level,’ although standard statistical convention is to use the expression ‘significant at the 5% level.’ With the latter convention, which we use throughout this book, rejection at the 1% significance level indicates the presence of stronger evidence against the null hypothesis than rejection at the 10% significance level.

spatial correlation (e.g., smooth fields such as the geopotential heights displayed in Figure 1.1).

The spatial coherence of these fields has two consequences for hypothesis testing at grid points. The first is that the proportion of the field covered by reject decisions becomes highly variable from one realization of the climate experiment to the next. In some problems a rejection rate of 20% may still be globally consistent with the null hypothesis at the 5% significance level. The second is that the spatial coherence of the studied fields also leads to fields of decisions that are spatially coherent: if the difference between two mean 500 hPa height fields is large at a particular point, it is also likely to be large at neighbouring points because of the spatial continuity of 500 hPa height. A decision made at one location is generally not statistically independent of decisions made at other locations. This makes regions of significant change difficult to identify. Methods that can be used to assess the field significance of a field of reject/retain decisions are discussed in Section 6.8. Local, or univariate, significance tests are discussed in Sections 6.6 and 6.7.
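The first consequence can be made concrete with a small Monte Carlo experiment in which the null hypothesis is true everywhere. The sketch below is a deliberately simplified illustration: the function name `rejection_fractions` is ours, the local test is a two-sided z-test with known unit variance rather than a t-test, and spatial coherence is imitated by giving every pair of grid points the same correlation `rho`.

```python
import numpy as np

def rejection_fractions(n_exp, n, m, rho, rng):
    """Fraction of m grid points with a 5%-level 'reject' decision in each of
    n_exp experiments that compare two samples of n fields with equal means."""
    def sample_means():
        common = rng.normal(size=(n_exp, n, 1))      # shared by all grid points
        local = rng.normal(size=(n_exp, n, m))       # independent at each point
        fields = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * local
        return fields.mean(axis=1)                   # shape (n_exp, m)
    diff = sample_means() - sample_means()           # control minus experiment
    z = diff / np.sqrt(2.0 / n)                      # unit variance at every point
    return (np.abs(z) > 1.96).mean(axis=1)

rng = np.random.default_rng(6)
independent = rejection_fractions(2000, 10, 100, 0.0, rng)
coherent = rejection_fractions(2000, 10, 100, 0.8, rng)

share_large_indep = (independent >= 0.20).mean()
share_large_coher = (coherent >= 0.20).mean()
```

In both cases the average rejection rate is close to 5%, but for the coherent fields the rate varies far more from experiment to experiment, and a noticeable share of the experiments show rejections at 20% or more of the grid points even though no signal is present.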

Another approach to the comparison of observed and simulated mean fields involves the use of classical multivariate statistical tests (Sections 6.6 and 6.7). The word multivariate is used somewhat differently in the statistical lexicon than it is in climatology: it describes tests and other inference procedures that operate on vector objects, such as the difference between two mean fields, rather than scalar objects, such as a difference of means at a grid point. Thus a multivariate test is a field significance test; it is used to make a single inference about a field of differences between the observed and simulated climate.

Classical multivariate inference methods cannot generally be applied directly to difference of means or variance problems in climatology. These methods are usually unable to cope with fields under study, such as seasonal geopotential means, that are generally ‘observed’ at numbers of grid points one to three orders of magnitude greater than the number of realizations available.15

15 A typical climate model validation problem involves the comparison of simulated monthly mean fields obtained from a 5–100 year simulation, with corresponding observed mean fields from a 20–50 year climatology. Such a problem therefore uses a combined total of n = 25 to 150 realizations of mean January 500 hPa height, for example. On the other hand, the horizontal resolution of typical present day climate models is such that these mean fields are represented on global grids with m = 2000 to 8000 points. Except on relatively small regional scales, the dimension of (or number of points in) the difference field is greater than the combined number of realizations from the simulated and observed climates.


One solution to this difficulty is to reduce the dimension of the observed and simulated fields to less than the number of realizations before using any inference procedure. This can be done using pattern analysis techniques, such as EOF analysis, that try to identify the climate’s principal modes of variation empirically. Another solution is to abandon classical inference techniques and replace them with ad hoc methods, such as the ‘PPP’ test (Preisendorfer and Barnett [320]).

Both grid point and field significance tests are plagued with at least two other problems that result in interpretation difficulties. The first of these is that the word significance does not have a specific physical interpretation. The statistical significance of the difference between a simulated and observed climate depends upon both location and sample size. Location is a factor that affects interpretation because variability is not uniform in space. A 5 m difference between an observed and a simulated mean January 500 hPa height field may be statistically very significant in the tropics, but such a difference is not likely to be statistically, or physically, significant at midlatitudes where interannual variability is large. Sample size is a factor because the sensitivity of statistical tests is affected by the amount of

information about the mean state contained in the observed and simulated realizations. Larger samples have greater information content and consequently result in more powerful tests. Thus, even though a 5 m difference at midlatitudes may not be physically important, it will be found to be significant given large enough simulated and observed climatologies. The statistical strength of the signal (or model error) may be quantified by a parameter called the level of recurrence, which is the probability that the signal’s signature will not be masked by the noise in another identical but statistically independent run with the GCM (Sections 6.9–6.10).
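The dependence of statistical significance on local variability and on sample size follows directly from the form of a two-sample test statistic. The short calculation below is a sketch under our own assumptions: the standard deviations of 5 m (tropics) and 50 m (midlatitudes) and the sample sizes are hypothetical round numbers, the realizations are taken to be independent, and the variance is treated as known, so that the statistic is compared with the normal 5% critical value of 1.96.

```python
from math import sqrt

def z_statistic(diff, sigma, n1, n2):
    """Two-sample z statistic for a difference of means `diff`, given a common
    standard deviation `sigma` and sample sizes n1 and n2."""
    return diff / (sigma * sqrt(1.0 / n1 + 1.0 / n2))

CRITICAL = 1.96                      # two-sided 5% level, normal distribution

# The same 5 m difference under different (hypothetical) conditions.
z_tropics = z_statistic(5.0, 5.0, 20, 20)          # small interannual variability
z_midlat = z_statistic(5.0, 50.0, 20, 20)          # large interannual variability
z_midlat_large = z_statistic(5.0, 50.0, 1000, 1000)  # same place, huge samples

significant = {
    "tropics, n=20": abs(z_tropics) > CRITICAL,
    "midlatitudes, n=20": abs(z_midlat) > CRITICAL,
    "midlatitudes, n=1000": abs(z_midlat_large) > CRITICAL,
}
```

With these numbers the statistic is about 3.2 in the tropics, about 0.3 at midlatitudes with 20 realizations in each sample, and about 2.2 at midlatitudes with 1000 realizations in each sample. The physical size of the difference is the same in all three cases; only the noise level and the amount of data change.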

The second problem is that objective statistical validation techniques are more honest than modellers would like them to be. GCMs and analysis systems have various biases that ensure that objective tests of their differences will reject the null hypothesis of no difference with certainty, given large enough samples. Modellers seem to have an intuitive grasp of the size and spatial structure of biases and seem to be able to discount their effects when making climate comparisons. If these biases can be quantified, statistical inference procedures can be adjusted to account for them (see Chapter 6).