On Clustering Financial Time Series - Beyond Correlation


  • ON CLUSTERING FINANCIAL TIME SERIES

    GAUTIER MARTI, PHILIPPE DONNAT AND FRANK NIELSEN

    NOISY CORRELATION MATRICES

    Let X be the matrix storing the standardized returns of N = 560 assets (credit default swaps) over a period of T = 2500 trading days.

    Then, the empirical correlation matrix of the returns is

    $$C = \frac{1}{T} X X^\top.$$

    We can compute the empirical density of its eigenvalues

    $$\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda},$$

    where $n(\lambda)$ counts the number of eigenvalues of C less than $\lambda$.
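As a rough illustration of the two definitions above, the sketch below builds C from an array of returns and estimates the eigenvalue density with a histogram. The function names, the N x T layout (one row per asset), the bin count, and the synthetic Gaussian data standing in for the CDS returns are all assumptions made for this example.

```python
import numpy as np

def empirical_correlation(returns):
    """Correlation matrix C = X X^T / T from an (N, T) array of raw returns."""
    # Standardize each asset (row) to zero mean and unit variance.
    mean = returns.mean(axis=1, keepdims=True)
    std = returns.std(axis=1, keepdims=True)
    X = (returns - mean) / std
    T = X.shape[1]
    return X @ X.T / T

def eigenvalue_density(C, bins=50):
    """Histogram estimate of the eigenvalue density rho(lambda)."""
    eigenvalues = np.linalg.eigvalsh(C)  # C is symmetric, so eigenvalues are real
    density, edges = np.histogram(eigenvalues, bins=bins, density=True)
    return eigenvalues, density, edges

# Synthetic stand-in for the returns (pure noise, hypothetical sizes).
rng = np.random.default_rng(0)
returns = rng.standard_normal((50, 500))
C = empirical_correlation(returns)
eigenvalues, density, edges = eigenvalue_density(C)
```

The histogram is only one possible estimator of the density; a kernel estimate would serve equally well here.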

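    From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as $N \to \infty$, $T \to \infty$ with $T/N$ fixed. It reads:

    $$\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda},$$

    where $\lambda_{\min}^{\max} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.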
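The Marchenko-Pastur density can be evaluated directly from the formula. The sketch below does so for the poster's N = 560 and T = 2500 and checks numerically that the density integrates to roughly one. The function names and the grid resolution are assumptions made for this example.

```python
import numpy as np

def marchenko_pastur_bounds(N, T):
    """Edges of the Marchenko-Pastur support: 1 + N/T -/+ 2 sqrt(N/T)."""
    q = N / T
    return 1 + q - 2 * np.sqrt(q), 1 + q + 2 * np.sqrt(q)

def marchenko_pastur_density(lam, N, T):
    """Density rho(lambda); zero outside [lam_min, lam_max]."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lam_min, lam_max = marchenko_pastur_bounds(N, T)
    inside = (lam >= lam_min) & (lam <= lam_max)
    rho = np.zeros_like(lam)
    x = lam[inside]
    rho[inside] = (T / N) / (2 * np.pi) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return rho

N, T = 560, 2500
lam_min, lam_max = marchenko_pastur_bounds(N, T)
grid = np.linspace(lam_min, lam_max, 20001)
rho = marchenko_pastur_density(grid, N, T)

# Trapezoidal rule: the total mass should be close to 1 when T >= N.
mass = np.sum((rho[1:] + rho[:-1]) / 2 * np.diff(grid))
```

With these values the support is approximately [0.28, 2.17], so any eigenvalue well above 2.17 falls outside the pure-noise bulk.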

    Figure 1: Marchenko-Pastur density vs. empirical density of the correlation matrix eigenvalues (vertical axis: $\rho(\lambda)$)

    Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than $\lambda_{\max}$. The highest eigenvalue corresponds to the market; the 25 others can be associated with industrial sectors.
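The counting argument can be tried on toy data. The sketch below generates returns from a hypothetical one-factor "market" model and counts the eigenvalues above the Marchenko-Pastur upper edge. The sizes, the factor loading of 0.5, and the variable names are assumptions; the poster's CDS data is not reproduced.

```python
import numpy as np

N, T = 100, 500  # hypothetical sizes, smaller than the poster's 560 x 2500
rng = np.random.default_rng(1)

# One-factor toy model: every asset loads on a common "market" series.
market = rng.standard_normal(T)
returns = 0.5 * market + rng.standard_normal((N, T))

C = np.corrcoef(returns)              # N x N empirical correlation matrix
eigenvalues = np.linalg.eigvalsh(C)   # sorted in ascending order

lam_max = 1 + N / T + 2 * np.sqrt(N / T)   # Marchenko-Pastur upper edge
n_signal = int(np.sum(eigenvalues > lam_max))
```

In this toy setting the largest eigenvalue sits far above `lam_max` and plays the role of the market mode, while the bulk stays inside the noise band.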

    CLUSTERING TIME SERIES

    Given a correlation matrix of the returns,

    Figure 2: An empirical and noisy correlation matrix

    one can re-order assets using a hierarchical clustering algorithm to make the hierarchical correlation pattern blatant,

    Figure 3: The same noisy correlation matrix re-ordered by a hierarchical clustering algorithm

    and finally filter the noise according to the correlation pattern:

    Figure 4: The resulting filtered correlation matrix
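The re-ordering and filtering steps illustrated in Figures 2 to 4 can be sketched with SciPy's hierarchical clustering routines. The poster does not state which linkage or filtering rule was used, so the choices here are assumptions: average linkage on the dissimilarity 1 - C, and a filter that replaces each block of correlations by its mean. The function names and the two-group synthetic data are likewise hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform

def reorder_by_hierarchy(C, method="average"):
    """Re-order assets so that the dendrogram leaves are contiguous."""
    D = 1.0 - C                      # assumed dissimilarity
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    order = leaves_list(Z)
    return C[np.ix_(order, order)], order, Z

def block_filter(C, Z, n_clusters):
    """Replace each cluster-by-cluster block of C by its average (assumed rule)."""
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    groups = [np.flatnonzero(labels == k) for k in np.unique(labels)]
    F = np.ones_like(C)
    for a in groups:
        for b in groups:
            block = C[np.ix_(a, b)]
            if a is b:
                # Average the off-diagonal entries only.
                n = len(a)
                value = (block.sum() - np.trace(block)) / (n * (n - 1)) if n > 1 else 1.0
            else:
                value = block.mean()
            F[np.ix_(a, b)] = value
    np.fill_diagonal(F, 1.0)
    return F, labels

# Synthetic data: two planted groups of 10 assets, shuffled to hide the blocks.
rng = np.random.default_rng(2)
truth = np.repeat([0, 1], 10)
factors = rng.standard_normal((2, 500))
returns = 0.8 * factors[truth] + 0.6 * rng.standard_normal((20, 500))
perm = rng.permutation(20)
returns, truth = returns[perm], truth[perm]

C = np.corrcoef(returns)
C_sorted, order, Z = reorder_by_hierarchy(C)
C_filtered, labels = block_filter(C, Z, n_clusters=2)
```

Plotting `C`, `C_sorted` and `C_filtered` as heatmaps gives pictures analogous to the three figures, on toy data.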

    BEYOND CORRELATION

    Sklar's Theorem. For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is uniquely expressed as

    $$F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N)),$$

    where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.
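In practice the marginals $F_i$ are unknown, and a common approach is to approximate $F_i(X_i)$ by normalized ranks. The sketch below applies that rank transform and checks that strictly increasing transformations of the margins leave the resulting pseudo-observations unchanged, which is the sense in which the copula separates dependence from the marginals. The function name, the T + 1 normalization, and the synthetic data are assumptions made for this example.

```python
import numpy as np
from scipy.stats import rankdata

def to_uniform_margins(samples):
    """Map each column of a (T, N) sample to normalized ranks in (0, 1)."""
    T = samples.shape[0]
    return rankdata(samples, axis=0) / (T + 1)

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
y = 0.7 * x + rng.standard_normal(1000)

U = to_uniform_margins(np.column_stack([x, y]))

# Strictly increasing transforms change the marginals but not the ranks.
U_transformed = to_uniform_margins(np.column_stack([x ** 3, np.exp(y)]))
```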

    Figure 5: ArcelorMittal and Société Générale prices are projected on dependence ⊕ distribution space; notice their heavy-tailed exponential distribution.

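    Let $\theta \in [0, 1]$. Let $(X, Y) \in \mathcal{V}^2$. Let $G = (G_X, G_Y)$, where $G_X$ and $G_Y$ are respectively the marginal cdfs of $X$ and $Y$. We define the following distance

    $$d_\theta^2(X, Y) = \theta \, d_1^2(G_X(X), G_Y(Y)) + (1 - \theta) \, d_0^2(G_X, G_Y),$$

    where $d_1^2(G_X(X), G_Y(Y)) = 3\,\mathbb{E}[|G_X(X) - G_Y(Y)|^2]$, and

    $$d_0^2(G_X, G_Y) = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dG_X}{d\lambda}} - \sqrt{\frac{dG_Y}{d\lambda}} \right)^2 d\lambda.$$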
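A minimal empirical version of this distance can be sketched as follows. The estimators are assumptions, not necessarily those used by the authors: the dependence term uses normalized ranks in place of $G_X(X)$ and $G_Y(Y)$, and the distribution term approximates the integral with histograms on a shared grid. Function names, the bin count, and the Gaussian test data are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def d1_squared(x, y):
    """Dependence term: 3 E[|G_X(X) - G_Y(Y)|^2], estimated with ranks."""
    T = len(x)
    u = rankdata(x) / T
    v = rankdata(y) / T
    return 3.0 * np.mean((u - v) ** 2)

def d0_squared(x, y, bins=50):
    """Distribution term: squared Hellinger distance between histograms."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()   # bin probabilities approximate density times bin width
    q = q / q.sum()
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def distance_squared(x, y, theta=0.5, bins=50):
    """Weighted combination of the dependence and distribution terms."""
    return theta * d1_squared(x, y) + (1 - theta) * d0_squared(x, y, bins)

rng = np.random.default_rng(4)
x = rng.standard_normal(2000)
y = rng.standard_normal(2000)

same = distance_squared(x, x)            # identical series
dep_scaled = d1_squared(x, 2 * x)        # same ranks, so no dependence distance
dist_scaled = d0_squared(x, 2 * x)       # but the distributions differ
dep_opposite = d1_squared(x, -x)         # perfectly anti-monotone series
independent = distance_squared(x, y)
```

The pair `x` and `2 * x` shows why both terms are needed: the series are perfectly dependent, so the rank term vanishes, yet their distributions differ and only the second term detects it.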

    CLUSTERING RESULTS & STABILITY

    Standard Deviations Histogram (horizontal axis: standard deviation in basis points; vertical axis: number of occurrences)

    Figure 6: (Top) The returns correlation structure appears more clearly using rank correlation; (Bottom) Clusters of returns distributions can be partly described by the returns volatility

    Figure 7: Stability test on odd/even trading days subsampling: our approach (GNPR) yields more stable clusters with respect to this perturbation than standard approaches (using Pearson correlation or L2 distances).
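One way to set up such an odd/even stability test is sketched below: cluster the assets once on odd trading days and once on even trading days, then measure how often the two partitions agree on pairs of assets. This sketch uses a Pearson-correlation distance with average linkage as the baseline and a plain Rand index as the agreement score; both are assumptions, the poster does not specify the score, and GNPR itself is not reimplemented here. All names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_labels(returns, n_clusters):
    """Flat clusters from average linkage on a correlation-based distance."""
    C = np.corrcoef(returns)
    D = np.sqrt(np.clip(2.0 * (1.0 - C), 0.0, None))
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def rand_index(a, b):
    """Fraction of asset pairs on which two partitions agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    pairs = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[pairs] == same_b[pairs]))

def odd_even_stability(returns, n_clusters):
    """Compare the clusterings obtained on odd and on even trading days."""
    labels_odd = cluster_labels(returns[:, 0::2], n_clusters)
    labels_even = cluster_labels(returns[:, 1::2], n_clusters)
    return rand_index(labels_odd, labels_even)

# Synthetic data: three planted groups of 10 assets over 1000 days.
rng = np.random.default_rng(5)
truth = np.repeat([0, 1, 2], 10)
factors = rng.standard_normal((3, 1000))
returns = 0.8 * factors[truth] + 0.6 * rng.standard_normal((30, 1000))

score = odd_even_stability(returns, n_clusters=3)
```

A score near 1 means the clusters barely change between the two subsamples. Swapping the distance inside `cluster_labels` would allow different approaches to be compared on the same perturbation.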