GAM for large datasets and load forecasting
Simon Wood, University of Bath, U.K.
in collaboration with
Yannig Goude, EDF
French Grid load
[Figure: half-hourly French grid load (GW) against day; full series (left) and detail for days 1740-1820 (right).]
I Half hourly load data from 1st Sept 2002.
I Pierrot and Goude (2011) successfully predicted one day ahead using 48 separate models, one for each half hour period of the day.
Why 48 models?
I The 48 separate models each have a form something like

    MW_t = f(MW_{t−48}) + ∑_j g_j(x_{jt}) + ε_t

where f and the g_j are smooth functions to be estimated, and the x_j are other covariates.
I 48 separate models have some disadvantages:
  1. Continuity of effects across the day is not enforced.
  2. Information is wasted, as the correlation between neighbouring time points is not utilised.
  3. Fitting 48 models, each to 1/48 of the data, promotes estimate instability and creates a huge model checking task each time the model is updated (every day?).
I But existing methods for fitting such smooth additive models could not cope with fitting one model to all the data.
Smooth additive models
I The basic model is

    y_i = ∑_j f_j(x_{ji}) + ε_i

where the f_j are smooth functions to be estimated.
I Need to estimate the f_j (including how smooth they are).
I Represent each f_j using a spline basis expansion: f_j(x_j) = ∑_k b_{jk}(x_j) β_{jk}, where the b_{jk} are basis functions and the β_{jk} are coefficients (maybe 10-100 of each).
I So the model becomes

    y_i = ∑_j ∑_k b_{jk}(x_{ji}) β_{jk} + ε_i = X_i β + ε_i

. . . a linear model. But it is much too flexible.
Penalizing overfit
I To avoid overfitting, we penalize function wiggliness (lack of smoothness) while fitting.
I In particular, let

    J_j(f_j) = β^T S_j β

measure the wiggliness of f_j.
I Estimate β to minimize

    ∑_i (y_i − X_i β)^2 + ∑_j λ_j β^T S_j β

where the λ_j are smoothing parameters.
I High λ_j ⇒ smooth f_j. Low λ_j ⇒ wiggly f_j.
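As a minimal numpy sketch of the penalized fit (simulated data; a single smooth term with a second-difference penalty matrix S, an assumption borrowed from the P-spline construction): the estimate solves the ridge-like system (X^T X + λS)β = X^T y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))               # model matrix for one smooth term
beta_true = np.sin(np.linspace(0, 3, p))      # smooth-ish true coefficients
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# second-difference penalty matrix S = D^T D (assumed; see P-splines)
D = np.diff(np.eye(p), n=2, axis=0)
S = D.T @ D

lam = 5.0                                      # smoothing parameter
beta_hat = np.linalg.solve(X.T @ X + lam * S, X.T @ y)
```

By construction the penalized estimate is never wigglier (in the β^T S β sense) than the unpenalized least squares estimate.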
Example basis-penalty: P-splines
I Eilers and Marx have popularized the use of B-spline bases with discrete penalties.
I If b_k(x) is a B-spline and β_k an unknown coefficient, then

    f(x) = ∑_{k=1}^K β_k b_k(x).

I Wiggliness can be penalized by e.g.

    P = ∑_{k=2}^{K−1} (β_{k−1} − 2β_k + β_{k+1})^2 = β^T S β.
[Figure: B-spline basis functions b_k(x) on [0, 1] (left) and a resulting function f(x) (right).]
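A short sketch of this basis-penalty construction (all sizes and knot choices are illustrative assumptions): build a clamped cubic B-spline basis with scipy, and the penalty matrix S from the second-difference operator.

```python
import numpy as np
from scipy.interpolate import BSpline

K, deg = 10, 3                            # 10 cubic basis functions
# clamped knot vector on [0, 1]: deg extra knots repeated at each end
t = np.concatenate([np.zeros(deg), np.linspace(0, 1, K - deg + 1), np.ones(deg)])
x = np.linspace(0, 1, 200)

# evaluate each basis function b_k(x) by giving it a unit coefficient vector
B = np.column_stack([BSpline(t, np.eye(K)[k], deg)(x) for k in range(K)])

# discrete second-difference penalty: beta^T S beta = sum_k (b_{k-1} - 2 b_k + b_{k+1})^2
D = np.diff(np.eye(K), n=2, axis=0)
S = D.T @ D
```

A function evaluation is then just `B @ beta`; the fitted curve gets smoother as the penalty `beta @ S @ beta` is pushed down.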
Example varying P-spline penalty
[Figure: three P-spline fits to the same data, with decreasing wiggliness ∑_k (β_{k−1} − 2β_k + β_{k+1})^2 = 207, 16.4 and 0.04.]
Choosing how much to smooth
[Figure: example fits with λ too high, λ about right, and λ too low.]
I Choose the λ_j to optimize the model's ability to predict data it wasn't fitted to.
I Generalized cross validation chooses the λ_j to minimize

    V(λ) = ‖y − Xβ‖^2 / [n − tr(F)]^2

where F = (X^T X + ∑_j λ_j S_j)^{−1} X^T X, and tr(F) is the effective degrees of freedom.
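As a numerical sketch of the GCV criterion (simulated data; a crude grid search stands in for the careful optimization actually used): for each λ, compute F, the effective degrees of freedom tr(F), and V(λ).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 12
X = rng.standard_normal((n, p))
D = np.diff(np.eye(p), n=2, axis=0)
S = D.T @ D                               # single-term penalty, as before
y = X @ np.sin(np.linspace(0, 2, p)) + rng.standard_normal(n)

def gcv(lam):
    A = X.T @ X + lam * S
    F = np.linalg.solve(A, X.T @ X)       # F = (X'X + lam*S)^{-1} X'X
    beta = np.linalg.solve(A, X.T @ y)
    edf = np.trace(F)                     # effective degrees of freedom
    return float(np.sum((y - X @ beta) ** 2) / (n - edf) ** 2)

lams = 10.0 ** np.arange(-3.0, 4.0)
best = min(lams, key=gcv)                 # pick the grid point minimizing V
```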
The main problem
I We estimate β as the minimizer of

    ‖y − Xβ‖^2 + ∑_j λ_j β^T S_j β

and the λ_j to minimize

    V(λ) = ‖y − Xβ‖^2 / [n − tr(F)]^2

I If X is too large, then we can easily run out of computer memory when trying to fit the model.
I A simple approach will let us fit the model without forming X whole.
A smaller equivalent fitting problem
I Let X be n × p.
I Consider the QR decomposition X = QR, where R is p × p and Q has orthogonal columns.
I Let f = Q^T y and γ = ‖y‖^2 − ‖f‖^2.
I Then ‖y − Xβ‖^2 = ‖f − Rβ‖^2 + γ and

    V(λ) = ‖y − Xβ‖^2 / [n − tr(F)]^2 = (‖f − Rβ‖^2 + γ) / [n − tr(F)]^2

where F = (R^T R + ∑_j λ_j S_j)^{−1} R^T R.
I So once we have R and f we can estimate β and λ without X. . .
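The identity is easy to check numerically (simulated X and y; any β works, since the identity is algebraic):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 8
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

Q, R = np.linalg.qr(X)          # thin QR: Q is n x p with orthonormal columns
f = Q.T @ y
gamma = y @ y - f @ f           # part of ||y||^2 outside the column space of X

beta = rng.standard_normal(p)   # an arbitrary coefficient vector
lhs = np.sum((y - X @ beta) ** 2)
rhs = np.sum((f - R @ beta) ** 2) + gamma
```

The p × p problem (R, f) is all that is needed to fit, however large n is.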
QR updating for R and f
I Let X = [X_0; X_1] and y = [y_0; y_1] (stacked row-blocks).
I Form X_0 = Q_0 R_0 and [R_0; X_1] = Q_1 R.
I Then X = QR, where Q = diag(Q_0, I) Q_1.
I Also f = Q^T y = Q_1^T [Q_0^T y_0; y_1].
I Applying these results recursively, R and f can be accumulated from sub-blocks of X without forming X whole.
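The recursion is only a few lines of numpy (simulated row-blocks; block sizes are arbitrary): at each step the previous R is stacked on the new rows and re-decomposed, and f is carried along.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 6
blocks = [rng.standard_normal((40, p)) for _ in range(5)]   # row-blocks of X
yblocks = [rng.standard_normal(40) for _ in range(5)]

R = np.zeros((0, p))   # start with an empty R
f = np.zeros(0)
for Xj, yj in zip(blocks, yblocks):
    Q1, R = np.linalg.qr(np.vstack([R, Xj]))   # [R_old; X_j] = Q1 R_new
    f = Q1.T @ np.concatenate([f, yj])         # f_new = Q1' [f_old; y_j]
```

Only O(p^2) storage is needed at any time, however many row-blocks arrive.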
XTX updating and Choleski
I R is the Choleski factor of X^T X.
I Partitioning X row-wise into sub-matrices X_1, X_2, . . ., we have

    X^T X = ∑_j X_j^T X_j

which can be used to accumulate X^T X without needing to form X whole.
I At the same time accumulate X^T y = ∑_j X_j^T y_j.
I Choleski decomposition gives R^T R = X^T X, and f = R^{−T} X^T y.
I This is twice as fast as the QR approach, but less numerically stable.
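A sketch of the Choleski route (simulated row-blocks again): accumulate X^T X and X^T y blockwise, factor once, and recover f by a triangular solve.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 6
blocks = [rng.standard_normal((50, p)) for _ in range(4)]
yblocks = [rng.standard_normal(50) for _ in range(4)]

XtX = np.zeros((p, p))
Xty = np.zeros(p)
for Xj, yj in zip(blocks, yblocks):      # only p x p storage ever needed
    XtX += Xj.T @ Xj
    Xty += Xj.T @ yj

R = np.linalg.cholesky(XtX).T            # upper-triangular R with R'R = X'X
f = np.linalg.solve(R.T, Xty)            # f = R^{-T} X'y
```

Forming X^T X squares the condition number of X, which is the source of the reduced numerical stability relative to QR.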
Generalizations
1. Generalized additive models (GAMs) allow distributions other than normal (and non-identity link functions).
2. GAMs can be estimated by iteratively estimating working linear models, using the R and f accumulation tricks on each.
3. REML, AIC etc. can also be used for λ_j estimation. REML can be made especially efficient in this context.
4. The accumulation methods are easy to parallelize.
5. Updating a model fit with new data is very easy.
6. AR1 residuals can also be handled easily in the non-generalized case.
Air pollution example
I Around 5000 daily death rates for Chicago, along with time, ozone, pm10 and tmp (the last 3 averaged over the preceding 3 days). Peng and Welty (2004).
I An appropriate GAM is death_i ~ Poi,

    log{E(death_i)} = f_1(time_i) + f_2(ozone_i, tmp_i) + f_3(pm10_i).

I f_1 and f_3 are penalized cubic regression splines, f_2 a tensor product spline.
I Results suggest a very strong ozone-temperature interaction.
Air pollution Chicago results
[Figure: estimated effects s(time, 136.85), te(o3, tmp, 38.63) and s(pm10, 3.42), with ±1 s.e. bands.]
Air pollution Chicago results
I To test the truth of the ozone-temperature interaction it would be good to fit to the equivalent data for around 100 other cities, simultaneously.
I The model is

    log{E(death_i)} = γ_j + α_k + f_k(o3_i, tmp_i) + f_4(t_i)

if observation i is from city j and age group k (there are 3 age groups recorded).
I The model has 802 coefficients and is estimated from 1.2M data.
I Fitting takes 12.5 minutes using 4 cores of a $600 PC.
I The same model fitted to just Chicago takes 11.5 minutes by the previous methods.
Air pollution all cities results
[Figure: s(time, 277.36) and contour plots of the ozone-temperature interaction for age groups <65, 65-74 and >74.]
Back to grid load
[Figure: half-hourly French grid load (GW) against day; full series and detail for days 1740-1820, as before.]
I Now the 48 separate grid load models can be replaced by

    L_i = γ_j + f_k(I_i, L_{i−48}) + g_1(t_i) + g_2(I_i, toy_i) + g_3(T_i, I_i) + g_4(T.24_i, T.48_i) + g_5(cloud_i) + S_i^T h(I_i) + e_i

if observation i is from day of the week j and day class k.
I e_i = ρ e_{i−1} + ε_i, where ε_i ~ N(0, σ^2) (AR1).
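The AR1 error structure reduces to the i.i.d. case by pre-whitening: subtracting ρ times the previous row from each row of y and of the model matrix leaves independent errors, after which the ordinary (penalized) least squares machinery applies unchanged. A sketch on simulated data with ρ treated as known (model matrix and coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 1000, 0.98

# simulate AR1 errors e_i = rho * e_{i-1} + eps_i
eps = rng.standard_normal(n)
e = np.empty(n)
e[0] = eps[0] / np.sqrt(1 - rho**2)       # stationary start
for i in range(1, n):
    e[i] = rho * e[i - 1] + eps[i]

X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # toy model matrix
y = X @ np.array([2.0, -1.0]) + e

# whitening: row i -> row i - rho * row (i-1); first row scaled to unit variance
yw = np.concatenate([[y[0] * np.sqrt(1 - rho**2)], y[1:] - rho * y[:-1]])
Xw = np.vstack([X[:1] * np.sqrt(1 - rho**2), X[1:] - rho * X[:-1]])
beta_gls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)  # errors now i.i.d.
```

The transform is applied row-block by row-block, so it combines directly with the R and f accumulation above.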
Fitting grid load model
I Using QR update fitting takes under 30 minutes, and we estimate ρ = 0.98.
I Updating with new data then takes under 2 minutes.
I We fit up to 31 August 2008, and then predicted the next year's data, one day at a time, with model updates.
I Predictive RMSE was 1156MW (1024MW for fit). Predictive MAPE was 1.62% (1.46% for fit).
I Setting ρ = 0 gives overfit, and worse predictive performance.
Residuals
[Figure: residuals and MAPE (%) by half-hour instant and day of week (Mo-Su); residual series over the fitting period (1 Sept 2002 - 31 Aug 2008) and the prediction year (1 Sept 2008 - 31 Aug 2009).]
Temperature effect
[Figure: estimated temperature effect: load L[t] surface over instant I[t] and temperature T[t].]
Lagged load Effects
[Figure: lagged load effect: load L[t] surfaces over instant I[t] and lagged load L[t − 1], for day-class combinations WW, WH, HH and HW.]