rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* [email protected] * Acknowledgement:...

26
EnKF overview Rahul Mahajan* [email protected] * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1

Transcript of rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* [email protected] * Acknowledgement:...

Page 1: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

EnKFoverviewRahulMahajan*

[email protected]

*Acknowledgement:JeffWhitaker,DarylKleist,andGSIdevelopmentteam.

1

Page 2: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

2

3DVarCostfunc6on

J:Penalty(Fittobackground+FittoobservaLons)δx:Analysisincrement(xa–xb)xa,xb:Analysis,BackgroundB:Backgrounderrorcovariance(esLmatedoffline)H:ObservaLons(forward)operatorR:ObservaLonerrorcovariance(Instrument+

representaLveness)d=yo-Hxb,whereyoaretheobservaLonsCostfuncLon(J)isminimizedtofindsoluLon,δx[xa=xb+δx]

J δx( ) = 12δxTB-1δx + 1

2Hδx -d( )T R-1 Hδx -d( )

2

Page 3: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

3

HybridEnsembleVarCostfunc6on

Bs:(Sta6c)background-errorcovariance(esLmatedoffline)Be:(Flow-dependent)background-errorcovariance(esLmated

fromensemble)β: WeighLngfactor(0.25meanstotalBis¾ensemble).

Jhybrid δx( ) = β2δxTBs

−1δx+1−β2

δxTBe−1δx+ 1

2Hδx -d( )TR−1 Hδx -d( )

3

Page 4: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

WhatdoesBedo?

Increment(allsta6c) Increment(allensemble)

TemperatureobservaLonnearawarmfront

4

Bs Be

Page 5: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

WhatdoesBedo?

½Sta6c,½ensemble

FirstguessSLP,wind Allsta6c

Allensemble

ZonalwindobservaLonnearahurricane(Ike)

5

Page 6: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

WhatdoesBedo?

Ps ob

First-Guess SLP contours

3DVarincrementwouldbezero!(cross-variablecovarianceshardtomodelwithsta9cBs)

SurfacepressureobservaLonnearan“atmosphericriver”

6

Page 7: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

WhatdoesBedo?

•  Addssitua6on-dependencetoanalysisincrements.

•  SparseobservaLonsnearcoherentdynamicalfeaturesusedmoreeffecLvely.

•  Changesintheobservingnetworkcanbecapturedinbackground-errorvariance.

•  Moreinforma9onextractedfromobserva9ons=>Moreskillfulforecasts

7

Page 8: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Sowhat’sthecatch?

•  Rank Deficiency –  Need a fairly large O(100) ensemble to represent uncertainty

•  Too few degrees of freedom available to fit the observations •  Low rank approximation yields spurious long-distance

correlations

•  Mistreatment of “system error/uncertainty” –  Sampling (as above), model error, observation operator

error, representativeness, etc.

•  State estimate is ensemble average –  This can produce unphysical estimates, smooth out high

fidelity information, etc. 8

Page 9: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Sowhat’sthecatch?

9

•  TheGSIvariaLonalsystemdoesnotprovidetheensemble–itprovidesananalysisthatcanbeinterpretedastheensemblemean,givenanensemblethatrepresentsforecastuncertainty.

•  InNCEPoperaLons,an“EnsembleKalmanFilter”(EnKF)isusedtogeneratethebackgroundensemble.

Page 10: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Dataassimila6onterminology•  y:ObservaLonvector(weatherballoons,satellite

radiances,etc.)•  x:thestateoftheatmosphereasrepresentedbythe

model•  xb:Backgroundstatevector(“prior”)•  xa:Analysisstatevector(“posterior”)•  H:(hopefullylinear)operatortoconvertmodelstateà

observaLonlocaLon&type•  R:ObservaLon-errorcovariancematrix•  Pb:Background-errorcovariancematrix•  Pa:Analysis-errorcovariancematrix

10

Page 11: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

ApproachestoSolu6on•  TwomainperspecLvesofpracLcaldataassimilaLon&hybrid

approach

11

Varia6onalApproach:LeastsquareesLmaLon[maximumlikelihood]–  3D-Var(3diminspace)–  4D-Var(4thdimisLme)

Sequen6al(KF)Approach:MinimumVarianceesLmate[minimumuncertainty]–  OpLmalInterpolaLon(OI)–  (Extended/Ensemble)Kalman

Filter

p(x)

Courtesy:KayoIde

p(x)

Page 12: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

FromBayestheoremto4DVarandthe(Ensemble)KalmanFilter

Variational methods maximize the posterior PDF to find the state trajectory x that best fits the obs y in a least-squares sense. In practice, this is done by minimizing a cost function, which is what’s inside the exp:

The minimum can be found analytically if H is linear (see Lorenc 1986) This gives the equations for the Kalman Filter

•  Matrix Pb is too big for any computer, covariance update step impractical.

•  Instead, represent PDFs of x and y by an ensemble, compute sample estimate of Pb and xb. Evolve the sample, not the full covariance. EnKF gives same result as full KF if ensemble size becomes infinite.

12

p(x|y) / exp

⇣�(x� xb)

TPb�1(x� xb)� (y �Hx)TR�1

(y �Hx)⌘

J (x) / (x� xb)TPb�1(x� xb) + (y �Hx)TR�1(y �Hx)

Page 13: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Computa6onalshortcutsinEnKF:(1)SimplifyingKalmangaincalculaLon

K = PbH T HPbH T +R( )−1

define Hxb = 1m

Hx ib

i=1

m

PbH T =1m−1

x ib − xb( ) Hx ib −Hxb( )

i=1

m

∑T

HPbH T =1m−1

Hx ib −Hxb( ) Hx ib −Hxb( )

i=1

m

∑T

ThekeyhereisthatthehugematrixPbisneverexplicitlyformed13

Page 14: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Computa6onalshortcutsinEnKF:(2)serialprocessingofobservaLons(requiresobservaLonerrorcovarianceRtobediagonal)

EnKFBackgroundforecasts

ObservaLons1and2

Analyses

EnKFBackgroundforecasts

ObservaLon1

Analysesaqerobs1 EnKF

ObservaLon2

Analyses

Method1

Method2

14

Page 15: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

TheserialEnKF–arecipeGivenasingleobyowithexpectederrorvarianceR,anensembleofmodelforecastsxb(modelpriors),andanensembleofpredictedobservaLonsyb=Hxb(observaLonpriors):

Step1:UpdateobservaLonpriors.(1a)updateforobpriormeans(1b)rescalingofobpriorperturba4onswherethescalarK=var(yb)/(var(yb)+R),overbardenotesmeans,primedenotesperturbaLons,superscriptbdenotesprior,adenotesanalysis.Linearinterpola9onbetweenobserva9onandobserva9onpriormeanwithweightK(0<=K<=1),rescalingofobserva4onpriorensemblesoposteriorvarianceisconsistentwithKalmanfilter,i.e.var(ya)=(1-K)var(yb)=var(yb)R/(var(yb)+R).whenvar(yb)<<R,allweightgiventoprior.whenvar(yb)>>R,allweightgiventoobservaLon.

ya

= (1�K)yb

+ Kyo

y0

a =p

(1�K)y0

b

15

Page 16: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

TheserialEnKF–arecipe(2)

Step2:Updatemodelpriors.LetΔx=xa-xbbeanalysisincrementformodelpriors,Δy=ya-ybisanalysisincrementforobservaLonpriors.(2)Δx=GΔycomputa4onofincrementstomodelpriorwhereG=cov(xb,ybT)/var(yb)Linearregressionofmodelpriorsonobserva9onpriors.Onlychangesmodelpriorswhenxbandybarecorrelatedwithintheensemble.Ifthereismorethanoneobtobeassimilated,theobservaLonpriorsforother(notyetassimilated)obs(Yb)shouldbealsobeupdatedusing(2)withΔxreplacedbyΔY.NextiteraLon,replaceybwithnextcolumnofYb,removingthatcolumnfromYb.AqereachiteraLonthemodelpriorsandobservaLonpriorsaresettothelatestanalysisvalues(xareplacesxb,YareplacesYb).ConLnueiteraLngunLlYbisempty.

16

Page 17: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Infla6onandLocaliza6on

•  Infla6on– UsedtoinflateensembleesLmateofuncertaintytoavoidfilterdivergence(addiLveandmulLplicaLve)

•  Localiza6on– DomainLocaliza6on

•  SolvesequaLonsindependentlyforeachgridpoint(LETKF)–  CovarianceLocaliza6on

•  Performedelementwise(Schurproduct)oncovariancesthemselves

17

Page 18: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Methodsforrepresen6ngmodelerror(infla6on)

•  Mul6plica6veinfla6on:–  RelaxaLon-to-priorspread(RTPS)–  RelaxaLon-to-priorperturbaLon(Zhangetal.2004)–  AdapLve(Anderson2007)

•  Addi6veinfla6on:Addrandomsamplesfromaspecified

distribuLontoeachensemblememberaqertheanalysisstep.–  Env.Canadausesrandomsamplesofisotropic3DVarcovariancematrix.

– NCEPusedrandomsamplesof48-h–24-hforecasterror(fcstsvalidatsameLme).

18

Page 19: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

ImperfectModel(Addi9ve+Mul9plica9veInfla9onExample)

•  AddiLveinflaLonaloneoutperformsmulLplicaLveinflaLonalone(comparevaluesy-axistovaluesalongx-axis)

•  AcombinaLonofbothisbewerthaneitheralone.

•  MulLplicaLveandaddiLveinflaLonrepresenLngdifferenterrorsourcesintheDAcycle?

19

Page 20: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Covariance Localization – Simple Example

20

EsLmatesofcovariancesfromasmallensemblewillbenoisy,withsignal-to-noisesmallespeciallywhencovarianceissmall

Page 21: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Covariancelocaliza6on–NWPEx.•  AMSUANOAA15channel6radianceat150E,-50S.

•  Incrementtolevel30(~310mb)temperaturefora1KO-Ffor40,80,160,320and640ensmemberswithnolocalizaLon.

21

N=40 N=80

N=160 N=320 N=640

Page 22: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Factorslimi6ngEnKFperformance1)Treatmentofmodelerror

Mustaccountforthebackgrounderrorcovarianceassociatedwith“modelerror”(anydifferencebetweensimulatedandtrueenvironment).Methodsusedsofar:

1)  mulLplicaLveinflaLon(mult.enspertsbyafactor>1).2)  model-basedschemes(e.g.stochasLckineLcenergy

backscawerforrepresenLngunresolvedprocesses,stochasLcallyperturbedphysicstendeciesforrepresenLngparameterizaLonuncertainty).

3)  addiLveinflaLon(randompertsaddedtoeachmember–e.g.differencesbetween24and48-hforecastsvalidatthesameLme).Opera9onalNCEPsystemusesthefirsttwo.

22

Page 23: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Factorslimi6ngEnKFperformance2)Treatmentofsamplingerror(localiza9on)

• AllEnKFimplementaLonslocalizethespaLalimpactofobservaLonsonthemodelstate.• DonebyspaLallymodulaLngcovariancebetweenobs.priorandmodelstate,orbyonlyusingobservaLons‘close’toamodelstatevariabletoupdatethatvariable.• Neededtoaccountforlowrankofensemble(comparedtomodelstate).• Methodsusedcurrentlyarenotflowdependent,andassumethereisnosamplingerroratoblocaLon.

23

Page 24: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

WhycombineEnKFandVar?FeaturesfromEnKF FeaturesfromVarCanpropagatePbfromacrossassimilaLonwindows

TreatmentofsamplingerrorinensemblePbesLmatedoesnotdependonH.

Moreflexibletreatmentofmodelerror(canbetreatedinensemble)

Dual-resoluLoncapability–canproduceahigh-res“control”analysis.

AutomaLciniLalizaLonofensembleforecasts.

EaseofaddingextraconstraintstocostfuncLon,includingastaLcPbcomponent.

24

Page 25: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

EnKFmemberupdate

member2analysis

highresforecast

GSIEns/Var highresanalysis

member1analysis

member2forecast

member1forecast

recenteranalysisensemble

Ensemble-Varworkflow

member3forecast

member3analysis

Previous Cycle Current Update Cycle 25

Page 26: rahul.mahajan@noaa · 2018. 10. 29. · Rahul Mahajan* rahul.mahajan@noaa.gov * Acknowledgement: Jeff Whitaker, Daryl Kleist, and GSI development team. 1 2 3DVar Cost func6on J:

Summary•  TheEnKFusesanensembleoffirst-guessforecaststoesLmatethebackground-errorcovariance.EveryensemblememberisupdatedateachanalysisLme.–  Parallelcode,scalableouttoO(1000’s)ofprocessorsaslongasnumberofobs<<numberofstatevars.

–  Requiresstatevectorinmodelandobspace,plusobs,asinput.

–  GSIusedtocomputeforwardobservaLonoperator(separatesteprunbeforeEnKF).

•  NeedtocarefullytunelocalizaLonlengthscales(dependsonmodelresoluLon,observingnetwork).

•  Ensemble(co)variancesmustberepresentaLveofcontrolforecasterror.Treatmentofmodelerroriscrucial.

26