
Combining seasonality tests: a random forest approach

Karsten Webel ([email protected])1, Daniel Ollech ([email protected])1

Keywords: classification trees, JDemetra+, simulation study, supervised learning.

1. INTRODUCTION

When deciding whether an observed time series is seasonal or non-seasonal, a variety of tests can be used. As an example, the output of release version 2.1.0 of JDemetra+ (JD+) reports the results of six different seasonality tests, among many other things. This raises two general questions. First, if the tests come to the same decision, are they indeed informative or rather redundant, and what is their value added? Second, if the tests arrive at different decisions, which ensemble decision regarding seasonality of the series should be made, and which tests are the most reliable ones? Focussing primarily on the latter question, we demonstrate that random forests can be used (1) to combine the outcomes of the seasonality tests in order to find an ensemble decision and (2) to identify the most informative tests.

This paper is organised as follows: Section 2 briefly introduces the seasonality tests currently implemented in JD+ and provides basic theory of random forests. Section 3 contains the results of a simulation study that aims at comparing the performances of the random forest approach and the seasonality tests. Finally, Section 4 concludes.

2. METHODS

2.1. Seasonality tests in JD+

JD+ incorporates six seasonality tests, where the null hypothesis ($H_0$) always states absence of seasonality. We briefly outline the basic ideas of five of these tests in the order of their appearance in the JD+ output, ignoring the test for seasonal peaks. To this end, let $\{z_t\}$ be a stationary series of length $T$ with $\tau$ observations per year:

The modified QS-test (QS) checks the series for significant positive autocorrelation at seasonal lags. Let $\gamma(h) = \mathrm{E}(z_{t+h} z_t) - [\mathrm{E}(z_t)]^2$ and $\rho(h) = \gamma(h)/\gamma(0)$ denote the series' lag-$h$ autocovariance and autocorrelation, respectively. The test statistic is defined as follows: if $\hat{\rho}(\tau) < 0$, then $\mathrm{QS} = 0$; otherwise,
\[
\mathrm{QS} = T(T+2) \left( \frac{\hat{\rho}^2(\tau)}{T-\tau} + \frac{[\max\{0, \hat{\rho}(2\tau)\}]^2}{T-2\tau} \right).
\]
The null distribution of this statistic is unknown, but it can be shown that $\mathrm{QS} \sim \chi^2_2$ holds approximately under $H_0$, see [1].

The Friedman-test (FT) checks for significant differences between the period-specific mean ranks of the observations. Assume that each period $i = 1, \ldots, \tau$ has $n$ observations and let $r_i$ be the mean rank of the observations in period $i$, where the ranks are assigned separately for each year. The test statistic is then given by
\[
\mathrm{FT} = \frac{\tau-1}{\tau} \sum_{i=1}^{\tau} \frac{n \, [r_i - (\tau+1)/2]^2}{(\tau^2-1)/12},
\]
where $\mathrm{FT} \sim \chi^2_{\tau-1}$ holds approximately under $H_0$. Thus, the Friedman-test is a one-way ANOVA with repeated measures on the ranks of the observations.
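As a minimal illustration, the QS statistic can be computed directly from sample autocorrelations. The following Python sketch is our own simplification, not the JDemetra+ implementation; the function name and the use of centred sample moments are illustrative choices:

```python
import numpy as np
from scipy.stats import chi2

def qs_statistic(z, tau=12):
    """Modified QS statistic for a stationary series z with tau observations
    per year; a sketch following the formula above, with sample
    autocorrelations replacing their theoretical counterparts."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    zc = z - z.mean()
    denom = np.sum(zc * zc)

    def rho_hat(h):                      # sample lag-h autocorrelation
        return np.sum(zc[h:] * zc[:-h]) / denom

    r_tau, r_2tau = rho_hat(tau), rho_hat(2 * tau)
    if r_tau < 0:                        # QS = 0 by definition
        return 0.0, 1.0
    qs = T * (T + 2) * (r_tau**2 / (T - tau)
                        + max(0.0, r_2tau)**2 / (T - 2 * tau))
    return qs, chi2.sf(qs, df=2)         # approximate chi^2_2 p-value under H0
```

A strongly seasonal series (e.g. a monthly sine wave plus noise) yields a large statistic and a p-value near zero.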

1 Deutsche Bundesbank, Wilhelm-Epstein-Strasse 14, 60431 Frankfurt am Main, Germany


The Kruskal-Wallis-test (KW) follows the same idea as the Friedman-test but allows for period-specific numbers $n_i$ of observations and assigns the ranks over the entire observation period. In the absence of ties the test statistic is

\[
\mathrm{KW} = \frac{T-1}{T} \sum_{i=1}^{\tau} \frac{n_i \, [r_i - (T+1)/2]^2}{(T^2-1)/12}.
\]
Under $H_0$, $\mathrm{KW} \sim \chi^2_{\tau-1}$ holds asymptotically.

Hence, the Kruskal-Wallis-test can be interpreted as a one-way ANOVA without repeated measures.
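Since KW coincides with the standard Kruskal-Wallis statistic applied to the $\tau$ period-specific samples, it can be sketched with SciPy's implementation; the grouping of a monthly series by calendar month and the function name are illustrative:

```python
import numpy as np
from scipy.stats import kruskal

def kw_test(z, tau=12):
    """Kruskal-Wallis test for seasonality: ranks the whole series and
    compares the tau period-specific samples (H0: absence of seasonality)."""
    z = np.asarray(z, dtype=float)
    groups = [z[i::tau] for i in range(tau)]  # observations of period i
    stat, pvalue = kruskal(*groups)           # chi^2 with tau-1 df under H0
    return stat, pvalue
```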

The periodogram-test (PD) checks if a weighted sum of the periodogram estimator at seasonal frequencies is significantly different from zero. This estimator is given by $(2\pi)^{-1} I(\omega_j)$ with
\[
I(\omega_j) = \begin{cases} \sum_{|h| \le T} \hat{\gamma}(h) \, e^{-ih\omega_j}, & \omega_j \neq 0, \\ T \, |\bar{z}|^2, & \omega_j = 0, \end{cases}
\]
where $\omega_j = 2\pi j / T$ is the $j$-th Fourier frequency for $j = -\lfloor (T-1)/2 \rfloor, \ldots, \lfloor T/2 \rfloor$ and $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$. The weighted sum of the periodogram estimator evaluated at seasonal frequencies then follows an $F$-distribution with $\tau-1$ and $T-\tau$ degrees of freedom.

The F-test on seasonal dummies (SD) checks if the effects of the $\tau-1$ seasonal dummies in the regARIMA model $(p\,d\,q)(0\,0\,0)$ noise + mean + seasonal dummies are simultaneously zero. In the first version, the non-seasonal orders $(p\,d\,q)$ are set to $(0\,1\,1)$, whereas they are determined via automatic model identification in a second version. Either way, the test statistic is given by
\[
\mathrm{SD} = \frac{\hat{\beta}' \, [\mathrm{Cov}(\hat{\beta})]^{-1} \hat{\beta}}{\tau-1} \times \frac{T-d-p-q-\tau-1}{T-d-p-q},
\]
where $\hat{\beta}$ is the vector of the estimated effects of the seasonal dummies. It follows that $\mathrm{SD} \sim F_{\tau-1,\, T-d-p-q-\tau-1}$ under $H_0$.

2.2. Random forests

The random forest approach originally developed by [2] can be applied to regression and classification problems. In the latter case, the basic idea is to use the majority vote of an ensemble of classification trees as a classifier. Further information on classification trees is provided by [3]. For a set of paired observations $(x_i, y_i)$ with $i = 1, \ldots, N$, where $x_i = (x_{i1}, \ldots, x_{ip})$ is a set of $p$ predictors and $y_i$ is a categorical response, we draw $B$ bootstrap samples with replacement from the training data (either the original data or a random sample thereof). For each bootstrap sample $b = 1, \ldots, B$, an unpruned classification tree is grown, but at each terminal node only a random sample from the set of predictors is used. From this sample, the predictor which minimises the node's impurity is used to create a binary split of the node. New nodes for each tree are grown until the minimum size of terminal nodes is reached.

To evaluate the performance of the random forest, misclassification rates are typically considered for the “out-of-bag” (OOB) data, which is the training data not in the bootstrap samples, and/or external validation (VAL) data.
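In scikit-learn terms, such a forest with an out-of-bag error estimate can be sketched as follows; the data here are randomly generated stand-ins for illustration, not the simulated series of Section 3:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in data: five p-value predictors and a binary label
# that, for demonstration only, depends on the first predictor alone.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 5))
y = (X[:, 0] < 0.05).astype(int)

forest = RandomForestClassifier(
    n_estimators=500,     # number of trees B
    max_features="sqrt",  # floor(sqrt(p)) candidate predictors per split
    oob_score=True,       # evaluate on the out-of-bag data
    random_state=0,
).fit(X, y)

oob_error = 1.0 - forest.oob_score_   # OOB misclassification rate
```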

As the number of trees in the random forest is usually very large, the interaction of the predictors tends to be rather complex. Therefore, importance measures are calculated separately for each predictor. These include (1) the decrease of node impurity obtained by splitting into daughter nodes using the predictor, averaged over all nodes of all trees, and (2) the difference between the misclassification rates before and after random permutation of the values of the predictor in the OOB sample, averaged over all trees and normalised by the standard deviation of the differences.
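Both importance measures have close analogues in scikit-learn: `feature_importances_` is the impurity-based measure (1), and `permutation_importance` approximates measure (2), although it permutes on whatever data it is given rather than on the OOB samples as described above. The data below are again random stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))
y = (X[:, 0] < 0.05).astype(int)   # label driven by the first predictor only

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

gini_importance = forest.feature_importances_   # mean impurity decrease
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
perm_importance = perm.importances_mean         # mean accuracy drop
```

Since only the first predictor drives the label here, both measures single it out as the most important.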

3. RESULTS

In total, we simulated 48000 seasonal and non-seasonal ARIMA time series with three different lengths: 60, 120, and 240 months.2 Using JD+, we ran the five seasonality tests3 for all of these series and used their p-values as predictors and their seasonal/non-seasonal nature as the binary response. We split the entire sample into training and validation data, each of size 24000, according to a simple random selection without replacement. We used the training data to grow a random forest with $B = 500$ trees, in line with the suggestions of [3], and a bootstrap sample size of 24000. The Gini-index
\[
Q_m = \sum_k \hat{p}_{mk} (1 - \hat{p}_{mk}),
\]
where $\hat{p}_{mk}$ is the proportion of training data in node $m$ from class $k$, was taken as the impurity measure. The number of candidate predictors at each terminal node was $\lfloor \sqrt{p} \rfloor = 2$.
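For concreteness, the Gini-index of a node can be computed directly from its class counts; a minimal sketch (the helper name is ours):

```python
def gini_index(counts):
    """Gini impurity Q_m = sum_k p_mk * (1 - p_mk) for the class counts of a
    node m; 0 for a pure node, maximal for a uniform class distribution."""
    total = sum(counts)
    proportions = [c / total for c in counts]
    return sum(p * (1.0 - p) for p in proportions)
```

For example, a pure node has impurity 0 and an evenly split binary node has impurity 0.5.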

Table 1. Misclassification rates of the random forest and the seasonality tests as a percentage (N-S = non-seasonal series, S = seasonal series).

Classifier            All lengths    5-year         10-year        20-year
                      N-S     S      N-S     S      N-S     S      N-S     S
Random forest
  OOB                 1.6    6.7     2.1    9.0     1.2    6.5     1.5    4.8
  VAL                 1.7    6.4     1.7    8.6     1.5    6.4     1.8    4.3
Seasonality tests with α = 0.01
  QS                  1.6    8.1     1.5   11.6     1.3    7.4     2.0    5.2
  FT                  1.3   13.4     0.8   17.1     1.4   12.0     1.7   11.0
  KW                  1.8   11.0     1.2   12.8     1.9   10.5     2.4    9.7
  PD                  3.1   10.1     3.2   11.6     3.0    9.8     3.0    9.0
  SD                  1.9    9.1     2.4    9.4     1.7    9.2     1.7    8.7
Seasonality tests with α = 0.05
  QS                  4.2    6.4     3.9    6.0     3.8    6.9     4.8    4.5
  FT                  5.7    9.3     4.9   10.8     5.6    8.9     6.5    8.0
  KW                  7.2    7.7     6.3    8.5     7.0    7.7     8.2    6.9
  PD                  8.5    7.3     8.8    8.3     7.9    7.1     8.9    6.6
  SD                  7.0    6.6     7.9    6.9     6.4    6.6     6.8    6.3

Table 1 reports the misclassification rates of the random forest and the seasonality tests, broken down by the seasonal/non-seasonal nature and the length of the simulated series. As expected, the misclassification rates of the seasonality tests depend on the chosen significance level, whereas those of the random forest do not depend on any user-specified significance level, since the p-values of the seasonality tests have been used as its input. More precisely, the seasonality tests yield misclassification rates that tend to be low (high) for non-seasonal (seasonal) series if the significance level is small, and high (low) for non-seasonal (seasonal) series if the significance level is higher, regardless of series length. In contrast, the random forest yields misclassification rates close to the best seasonality test for both OOB and VAL data, independent of the seasonal/non-seasonal nature and the length of the series. Relative to the seasonality tests, the results of the random forest improve noticeably with the length of the series.

2 For each of the following ARIMA models, 1000 time series were simulated for each of the three lengths considered. Stationary non-seasonal: (002), (100), (201). Non-stationary non-seasonal: (011), (011) + outliers, (110), (212), (311). Stationary seasonal: (000)(101), (011) + sine + cosine, (100)(100). Non-stationary seasonal: (011)(011) for three parameter constellations, (110)(110), (212)(111).

3 For the F-test on seasonal dummies, we used the variant (pdq) = (011).

Table 2 reports the two importance measures mentioned in Section 2.2 for the assessment of each predictor's contribution to the ensemble decision. Here, the Gini-index was used again to measure node impurity. Either measure identifies the QS-test and the F-test on seasonal dummies as the most informative seasonality tests. Intuitively, this may be explained by the fact that the F-test is designed to capture deterministic seasonality while the QS-test also allows for stochastic seasonality. Combining these two tests in a sensible manner will therefore cover both time-constant and time-varying seasonal patterns.

Table 2. Importance measures of the p-values of the seasonality tests.

Seasonality    Mean decrease of node     Standardised mean difference
test           impurity (Gini-index)     of misclassification rate
QS             4771.95                   68.75
FT              349.44                   28.55
KW             2364.82                   27.58
PD             1223.01                   22.55
SD             3276.29                   33.34

4. CONCLUSIONS

We showed by means of a simulation study that random forests can be applied to combine the results of different seasonality tests in order to reduce the misclassification rates for non-seasonal and seasonal series. In addition, we identified the QS-test and the F-test on seasonal dummies as the most informative tests for the forest decision on seasonality.

Since the p-values of the candidate seasonality tests, which we used as predictors during tree growing, are likely to be positively correlated, we intend to cross-check the latter finding by calculating conditional rather than unconditional importance measures. Apart from that, future research could extend our simulation design by including quarterly series as well as other seasonality tests, such as the test for seasonal peaks and the F-test on seasonal dummies with automatic identification of the non-seasonal model part. Also, the random forest approach could be investigated in unsupervised mode, working with real-world rather than simulated data. Finally, the results reported here and future findings of the random forest approach could be used to create a meta test from the candidate seasonality tests.

REFERENCES

[1] A. Maravall, Seasonality Tests and Automatic Model Identification in TRAMO-SEATS, 2011, Mimeo, Bank of Spain.

[2] L. Breiman, Random Forests, Machine Learning 45 (2001), 5-32.

[3] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd Edition), 2009, Springer.
