N.D.Gagunashvili University of Akureyri, Iceland

Pearson's $\chi^2$ Test Modifications for Comparison of Unweighted and Weighted Histograms and Two Weighted Histograms

Nikolai Gagunashvili

School of Computing, University of Akureyri, Iceland

Contents

• Introduction
• $\chi^2$ test for comparison of two (unweighted) histograms
• Unweighted and weighted histograms comparison
• Two weighted histograms comparison
• Numerical example and experiments
• Conclusions
• References

Introduction

A frequently used technique in data analysis is the comparison of histograms.

First suggested by Pearson in 1904, the $\chi^2$ test of homogeneity is widely used for comparing usual (unweighted) histograms.

A modification of the $\chi^2$ test for the comparison of weighted and unweighted histograms was proposed in 2005 (see Proceedings of PHYSTAT2005, Oxford, 2005).

This report develops the ideas presented at the PHYSTAT2005 conference.

$\chi^2$ test for comparison of two (unweighted) histograms

Let us consider two histograms with the same binning and the number of bins equal to $r$.

Let us denote:

The number of events in the $i$th bin of the first histogram by $n_i$

The number of events in the $i$th bin of the second histogram by $m_i$

The total numbers of events are equal to $N = \sum_{i=1}^{r} n_i$ for the first histogram and $M = \sum_{i=1}^{r} m_i$ for the second histogram.

The hypothesis of homogeneity: the two histograms represent random values with identical distributions.

Equivalently, there exist $r$ constants $p_1, \ldots, p_r$ such that

$$\sum_{i=1}^{r} p_i = 1,$$

and the probability of belonging to the $i$th bin for some measured value in both experiments is equal to $p_i$.

The number of events in the $i$th bin is a random variable with a distribution approximated by a Poisson probability distribution:

$$\frac{e^{-Np_i}(Np_i)^{n_i}}{n_i!}$$ for the first histogram,

$$\frac{e^{-Mp_i}(Mp_i)^{m_i}}{m_i!}$$ for the second histogram.

If the hypothesis of homogeneity is valid, then the maximum likelihood estimator of $p_i$, $i = 1, \ldots, r$, is

$$\hat{p}_i = \frac{n_i + m_i}{N + M},$$

and then

$$X^2 = \sum_{i=1}^{r} \frac{(n_i - N\hat{p}_i)^2}{N\hat{p}_i} + \sum_{i=1}^{r} \frac{(m_i - M\hat{p}_i)^2}{M\hat{p}_i} = \frac{1}{MN} \sum_{i=1}^{r} \frac{(Mn_i - Nm_i)^2}{n_i + m_i}$$

has approximately a $\chi^2_{r-1}$ distribution.
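As a rough illustration of this statistic, the following Python sketch computes $X^2$ for two unweighted histograms using the compact form $X^2 = \frac{1}{MN}\sum_i (Mn_i - Nm_i)^2/(n_i + m_i)$. The function name and the choice to skip bins that are empty in both histograms are assumptions of this sketch, not part of the slides.

```python
def chi2_two_unweighted(n, m):
    """Return (X2, degrees of freedom) for two unweighted histograms.

    n, m: lists of bin counts with identical binning.
    """
    if len(n) != len(m):
        raise ValueError("histograms must have the same binning")
    N, M = sum(n), sum(m)
    x2 = 0.0
    used = 0
    for ni, mi in zip(n, m):
        if ni + mi == 0:
            # Assumption of this sketch: a bin empty in both histograms
            # carries no information and is dropped from the sum.
            continue
        x2 += (M * ni - N * mi) ** 2 / (ni + mi)
        used += 1
    return x2 / (M * N), used - 1


# Histograms with identical shapes give X2 = 0.
print(chi2_two_unweighted([10, 20, 30], [20, 40, 60]))
```

The returned degrees of freedom are the number of bins actually used minus one, which matches $r - 1$ when no bin is dropped.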

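The comparison procedure can include an analysis of the residuals, which is often helpful in identifying the bins of the histograms responsible for a significant overall $X^2$ value. Most convenient for analysis are the normalized residuals

$$r_i = \frac{n_i - N\hat{p}_i}{\sqrt{N\hat{p}_i \left(1 - \frac{N}{N + M}\right) \left(1 - \frac{n_i + m_i}{N + M}\right)}}, \qquad \hat{p}_i = \frac{n_i + m_i}{N + M}.$$

If the hypothesis of homogeneity is valid, then the residuals $r_i$ are approximately independent and identically distributed random variables having the standard normal distribution $\mathcal{N}(0, 1)$.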
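A minimal sketch of the residual calculation is given below. It assumes the normalized residual $r_i = (n_i - N\hat{p}_i)/\sqrt{N\hat{p}_i (1 - N/(N+M))(1 - \hat{p}_i)}$ with $\hat{p}_i = (n_i + m_i)/(N + M)$; the function name is hypothetical, and bins with $\hat{p}_i$ equal to 0 or 1 are not handled.

```python
import math


def residuals_two_unweighted(n, m):
    """Normalized residuals for the first of two unweighted histograms."""
    N, M = sum(n), sum(m)
    residuals = []
    for ni, mi in zip(n, m):
        p_hat = (ni + mi) / (N + M)  # pooled estimate of the bin probability
        # Approximate variance of n_i - N * p_hat under homogeneity.
        var = N * p_hat * (1.0 - N / (N + M)) * (1.0 - p_hat)
        residuals.append((ni - N * p_hat) / math.sqrt(var))
    return residuals


res = residuals_two_unweighted([10, 10], [10, 30])
print(res)
```

Bins whose residual is large in absolute value (for example beyond 2 or 3) are natural candidates for closer inspection.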

The application of the $\chi^2$ test has restrictions related to the values of the expected frequencies $Np_i$, $Mp_i$, $i = 1, \ldots, r$.

A conservative rule is that all the expectations must be 1 or greater for both histograms.

In practical cases, when the expected frequencies are not known, the estimated expected frequencies can be used.
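The rule can be checked before running the test. The sketch below, with a hypothetical helper name, uses the estimated expected frequencies $N\hat{p}_i$ and $M\hat{p}_i$ with $\hat{p}_i = (n_i + m_i)/(N + M)$ in place of the unknown true ones.

```python
def min_estimated_expectation(n, m):
    """Smallest estimated expected frequency over both histograms."""
    N, M = sum(n), sum(m)
    p_hat = [(ni + mi) / (N + M) for ni, mi in zip(n, m)]
    return min(min(N * p, M * p) for p in p_hat)


smallest = min_estimated_expectation([10, 10], [10, 30])
# The conservative rule asks for every expectation to be at least 1.
print(smallest, smallest >= 1.0)
```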

Unweighted and weighted histograms comparison

A simple modification of the ideas described above can be used for the comparison of the usual (unweighted) and weighted histograms.

Let us denote:

The number of events in the $i$th bin of the unweighted histogram by $n_i$

The weight of events in the $i$th bin of the weighted histogram by $w_i$

The number of events in the unweighted histogram is equal to $N = \sum_{i=1}^{r} n_i$.

The total weight of events in the weighted histogram is equal to $W = \sum_{i=1}^{r} w_i$.

The hypothesis of identity of an unweighted histogram to a weighted histogram: there exist $r$ constants $p_1, \ldots, p_r$ such that

$$\sum_{i=1}^{r} p_i = 1,$$

and the probability of belonging to the $i$th bin for some measured value is equal to $p_i$ for the unweighted histogram,

and the expectation value of the weight $w_i$ is equal to $Wp_i$ for the weighted histogram.

The number of events in the $i$th bin of the unweighted histogram is a random variable with a distribution approximated by the Poisson probability distribution

$$\frac{e^{-Np_i}(Np_i)^{n_i}}{n_i!}.$$

The weight $w_i$ is a random variable with a distribution approximated by the normal probability distribution

$$\mathcal{N}(Wp_i, \sigma_i^2),$$

where $\sigma_i^2$ is the variance of the weight $w_i$.

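If we replace the variance $\sigma_i^2$ with the estimate $s_i^2$ (the sum of squares of the weights of events in the $i$th bin) and the hypothesis of identity is valid, then the maximum likelihood estimator of $p_i$, $i = 1, \ldots, r$, is

$$\hat{p}_i = \frac{Ww_i - Ns_i^2 + \sqrt{(Ww_i - Ns_i^2)^2 + 4W^2 s_i^2 n_i}}{2W^2}.$$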
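One way to see where an estimator of this kind comes from is sketched below. The sketch assumes the Poisson approximation for $n_i$ and the normal approximation for $w_i$, uses $s_i^2$ in place of $\sigma_i^2$, and maximizes the likelihood bin by bin. Leaving the normalization $\sum_i p_i = 1$ unenforced is an assumption of this sketch rather than a statement taken from the slides.

```latex
\ln L_i(p_i) = -N p_i + n_i \ln (N p_i) - \frac{(w_i - W p_i)^2}{2 s_i^2} + \text{const}

\frac{\partial \ln L_i}{\partial p_i}
  = -N + \frac{n_i}{p_i} + \frac{W (w_i - W p_i)}{s_i^2} = 0
\;\Longrightarrow\;
W^2 p_i^2 - (W w_i - N s_i^2)\, p_i - n_i s_i^2 = 0

\hat{p}_i = \frac{W w_i - N s_i^2 + \sqrt{(W w_i - N s_i^2)^2 + 4 W^2 s_i^2 n_i}}{2 W^2}
```

The positive root of the quadratic is taken because $p_i$ is a probability. When $n_i/N = w_i/W$, the root reduces to that common value.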

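We may then use the test statistic

$$X^2 = \sum_{i=1}^{r} \frac{(n_i - N\hat{p}_i)^2}{N\hat{p}_i} + \sum_{i=1}^{r} \frac{(w_i - W\hat{p}_i)^2}{s_i^2},$$

and it is plausible that this has approximately a $\chi^2_{r-1}$ distribution, where $\hat{p}_i$ is the maximum likelihood estimator of $p_i$.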
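The computation for an unweighted and a weighted histogram can be sketched in Python as follows. The sketch assumes the estimator $\hat{p}_i$ obtained as the positive root of $W^2 p^2 - (Ww_i - Ns_i^2)p - n_i s_i^2 = 0$ and the statistic $X^2 = \sum_i (n_i - N\hat{p}_i)^2/(N\hat{p}_i) + \sum_i (w_i - W\hat{p}_i)^2/s_i^2$. The function name and argument layout are hypothetical, and every bin is assumed to have $s_i^2 > 0$.

```python
import math


def chi2_unweighted_weighted(n, w, s2):
    """Return (X2, degrees of freedom).

    n:  counts per bin of the unweighted histogram
    w:  sum of weights per bin of the weighted histogram
    s2: sum of squared weights per bin (estimate of the variance of w)
    """
    N, W = sum(n), sum(w)
    x2 = 0.0
    for ni, wi, s2i in zip(n, w, s2):
        a = W * wi - N * s2i
        # Positive root of the quadratic likelihood equation for p_i.
        p_hat = (a + math.sqrt(a * a + 4.0 * W * W * s2i * ni)) / (2.0 * W * W)
        x2 += (ni - N * p_hat) ** 2 / (N * p_hat)
        x2 += (wi - W * p_hat) ** 2 / s2i
    return x2, len(n) - 1


# Every weighted event has weight 2 and the shapes agree, so X2 is (numerically) 0.
print(chi2_unweighted_weighted([10, 20, 30], [20.0, 40.0, 60.0], [40.0, 80.0, 120.0]))
```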

Let $z_i^2$ denote the approximate variance of the difference between the weight $w_i$ and the estimated expectation value of the weight, $W\hat{p}_i$.

The residuals

$$r_i = \frac{w_i - W\hat{p}_i}{z_i}$$

have approximately a normal distribution with mean equal to 0 and standard deviation equal to 1.

Restrictions

The minimal expected frequency for the unweighted histogram must be 1.

The recommended expected frequency for the weighted histogram is more than 25.

Two weighted histograms comparison

Let us consider two histograms with the same binning and the number of bins equal to $r$.

Let us denote:

The weight of events in the $i$th bin of the first histogram by $w_{1i}$

The weight of events in the $i$th bin of the second histogram by $w_{2i}$

The total weight of events in the first histogram is equal to $W_1 = \sum_{i=1}^{r} w_{1i}$.

The total weight of events in the second histogram is equal to $W_2 = \sum_{i=1}^{r} w_{2i}$.

The hypothesis of identity of two weighted histograms: there exist $r$ constants $p_1, \ldots, p_r$ such that

$$\sum_{i=1}^{r} p_i = 1,$$

and the expectation values of the weights $w_{1i}$ are equal to $W_1 p_i$ for the first histogram, and the expectation values of the weights $w_{2i}$ are equal to $W_2 p_i$ for the second histogram.

The weights in both histograms are random variables with distributions that can be approximated

by a normal probability distribution $\mathcal{N}(W_1 p_i, \sigma_{1i}^2)$ for the first histogram

and by a normal probability distribution $\mathcal{N}(W_2 p_i, \sigma_{2i}^2)$ for the second histogram.

Here $\sigma_{1i}^2$ and $\sigma_{2i}^2$ are the variances of $w_{1i}$ and $w_{2i}$, with estimators $s_{1i}^2$ and $s_{2i}^2$ respectively.

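If the hypothesis of identity is valid, then the maximum likelihood and least squares estimator of $p_i$, $i = 1, \ldots, r$, is

$$\hat{p}_i = \frac{w_{1i} W_1 / s_{1i}^2 + w_{2i} W_2 / s_{2i}^2}{W_1^2 / s_{1i}^2 + W_2^2 / s_{2i}^2}.$$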

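We may then use the test statistic

$$X^2 = \sum_{i=1}^{r} \frac{(w_{1i} - W_1\hat{p}_i)^2}{s_{1i}^2} + \sum_{i=1}^{r} \frac{(w_{2i} - W_2\hat{p}_i)^2}{s_{2i}^2} = \sum_{i=1}^{r} \frac{(W_1 w_{2i} - W_2 w_{1i})^2}{W_1^2 s_{2i}^2 + W_2^2 s_{1i}^2},$$

and it is plausible that this has approximately a $\chi^2_{r-1}$ distribution.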
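For two weighted histograms the statistic can be evaluated directly from the bin weights and their variance estimates. The sketch below assumes the weighted least squares estimator $\hat{p}_i = (w_{1i}W_1/s_{1i}^2 + w_{2i}W_2/s_{2i}^2)/(W_1^2/s_{1i}^2 + W_2^2/s_{2i}^2)$ and the compact form $X^2 = \sum_i (W_1 w_{2i} - W_2 w_{1i})^2/(W_1^2 s_{2i}^2 + W_2^2 s_{1i}^2)$. Function names are hypothetical and all variance estimates are assumed to be positive.

```python
def p_hat_two_weighted(w1, s1sq, w2, s2sq):
    """Estimated bin probabilities under the hypothesis of identity."""
    W1, W2 = sum(w1), sum(w2)
    return [
        (a * W1 / va + b * W2 / vb) / (W1 ** 2 / va + W2 ** 2 / vb)
        for a, va, b, vb in zip(w1, s1sq, w2, s2sq)
    ]


def chi2_two_weighted(w1, s1sq, w2, s2sq):
    """Return (X2, degrees of freedom) for two weighted histograms."""
    W1, W2 = sum(w1), sum(w2)
    x2 = 0.0
    for a, va, b, vb in zip(w1, s1sq, w2, s2sq):
        x2 += (W1 * b - W2 * a) ** 2 / (W1 ** 2 * vb + W2 ** 2 * va)
    return x2, len(w1) - 1


w1, s1sq = [10.0, 10.0], [10.0, 10.0]
w2, s2sq = [10.0, 30.0], [10.0, 30.0]
print(chi2_two_weighted(w1, s1sq, w2, s2sq))
```

With unit weights, as in the usage lines, the variance estimates equal the bin contents, which makes the arithmetic easy to check by hand.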

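The normalized residuals

$$r_i = \frac{w_{1i} - W_1\hat{p}_i}{s_{1i}\sqrt{1 - 1 \Big/ \left(1 + \dfrac{W_2^2 s_{1i}^2}{W_1^2 s_{2i}^2}\right)}},$$

where $\hat{p}_i$ is the estimator of $p_i$ under the hypothesis of identity, have approximately a normal distribution with mean equal to 0 and standard deviation equal to 1.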
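A possible implementation of these residuals is sketched below. It assumes that the variance of $w_{1i} - W_1\hat{p}_i$ is $s_{1i}^2\,(1 - 1/(1 + W_2^2 s_{1i}^2/(W_1^2 s_{2i}^2)))$, which follows from propagating the variances $s_{1i}^2$ and $s_{2i}^2$ through the estimator $\hat{p}_i$. The function name is hypothetical.

```python
import math


def residuals_two_weighted(w1, s1sq, w2, s2sq):
    """Normalized residuals of the first histogram against the pooled estimate."""
    W1, W2 = sum(w1), sum(w2)
    out = []
    for a, va, b, vb in zip(w1, s1sq, w2, s2sq):
        p_hat = (a * W1 / va + b * W2 / vb) / (W1 ** 2 / va + W2 ** 2 / vb)
        # Fraction of the variance s1^2 left after subtracting the estimate.
        shrink = 1.0 - 1.0 / (1.0 + (W2 ** 2 * va) / (W1 ** 2 * vb))
        out.append((a - W1 * p_hat) / math.sqrt(va * shrink))
    return out


res = residuals_two_weighted([10.0, 10.0], [10.0, 10.0], [10.0, 30.0], [10.0, 30.0])
print(res)
```

Under these assumptions the squared residual of each bin equals that bin's contribution to $X^2$, which gives a convenient consistency check.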

Restriction

A recommended minimal expected frequency is equal to 25 for the proposed test.

Numerical example and experiments

The method described herein is now illustrated with an example. We take a distribution

defined on the interval [4; 16]. Events distributed according to the formula are simulated to create the unweighted histogram. Uniformly distributed events are simulated for the weighted histogram with weights calculated by formula. Each histogram has the same number of bins: 20.
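The simulation setup can be sketched with the Python standard library alone. The two-peak density `phi` used below is an assumption chosen for illustration, since this transcript does not preserve the formula. The function names, the seed, and the accept-reject bound of 3.0 are likewise hypothetical. Only the interval [4, 16], the 20 bins, and the use of uniformly distributed events weighted by the density come from the text.

```python
import random

LO, HI, BINS = 4.0, 16.0, 20


def phi(x):
    # ASSUMED unnormalized two-peak density; not taken from the transcript.
    return 2.0 / ((x - 10.0) ** 2 + 1.0) + 1.0 / ((x - 14.0) ** 2 + 1.0)


def bin_index(x):
    # Clamp so that x == HI falls into the last bin.
    return min(int((x - LO) / (HI - LO) * BINS), BINS - 1)


def simulate(n_unweighted=200, n_weighted=500, seed=1):
    rng = random.Random(seed)

    # Unweighted histogram: accept-reject sampling from phi.
    n = [0] * BINS
    filled = 0
    while filled < n_unweighted:
        x = rng.uniform(LO, HI)
        if rng.uniform(0.0, 3.0) < phi(x):  # phi(x) < 3 everywhere
            n[bin_index(x)] += 1
            filled += 1

    # Weighted histogram: uniform events, each carrying weight phi(x).
    w = [0.0] * BINS
    s2 = [0.0] * BINS
    for _ in range(n_weighted):
        x = rng.uniform(LO, HI)
        i = bin_index(x)
        w[i] += phi(x)
        s2[i] += phi(x) ** 2  # sum of squared weights estimates the variance
    return n, w, s2


n, w, s2 = simulate()
print(sum(n), round(sum(w), 2))
```

The three returned lists are the per-bin counts, summed weights, and summed squared weights, which are the inputs required by the unweighted versus weighted comparison.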

An example of comparison of the unweighted histogram with 200 events and the weighted histogram with 500 events. Panels: unweighted histogram, weighted histogram, residuals, Q-Q plot.

The value of the test statistic $X^2$ is equal to 21.09 with a p-value equal to 0.33; therefore the hypothesis of identity of the two histograms can be accepted.

The behavior of the normalized residuals plot and of the normal Q-Q plot of residuals is regular, and we cannot identify outliers or bins with a big influence on $X^2$.
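To obtain a p-value of this kind without external libraries, one possible sketch evaluates the upper tail of the chi-square distribution through the power series of the regularized incomplete gamma function. The function name `chi2_sf` is hypothetical, and the 19 degrees of freedom assume $r - 1$ with $r = 20$ bins.

```python
import math


def chi2_sf(x, k):
    """Upper tail probability P(X > x) for a chi-square variable with k d.o.f."""
    a, t = 0.5 * k, 0.5 * x
    if t <= 0.0:
        return 1.0
    # Series: P(a, t) = t^a e^{-t} / Gamma(a + 1) * sum_n t^n / ((a+1)...(a+n))
    term, total, j = 1.0, 1.0, 0
    while term > 1e-16 * total:
        j += 1
        term *= t / (a + j)
        total += term
    lower = math.exp(-t + a * math.log(t) - math.lgamma(a + 1.0)) * total
    return 1.0 - lower


p_value = chi2_sf(21.09, 19)
print(round(p_value, 2))
```

The series is adequate for moderate values of $X^2$ such as the one quoted here; for values far in the tail a continued-fraction evaluation would be numerically safer.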

Chi-square Q-Q plots of $X^2$ statistics for two unweighted histograms with different minimal expected frequencies.

Chi-square Q-Q plots of $X^2$ statistics for unweighted and weighted histograms with different minimal expected frequencies.

Chi-square Q-Q plots of $X^2$ statistics for two weighted histograms with different minimal expected frequencies.

Conclusions

A test for comparing the usual (unweighted) histogram and the weighted histogram was proposed.

A test for comparing two weighted histograms was proposed.

In both cases, formulas for normalized residuals were presented that can be useful for identifying bins that are outliers or bins that have a big influence on $X^2$.

The approach proposed in this paper can be generalized to the comparison of several unweighted and weighted histograms, or of weighted histograms only. The test statistic has approximately a $\chi^2_{(r-1)(s-1)}$ distribution for $s$ histograms with $r$ bins.
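The counting of degrees of freedom for several histograms can be illustrated with the simplest case, $s$ unweighted histograms, where the statistic is the classical Pearson homogeneity sum. This sketch covers only that unweighted special case and is not the generalized weighted test mentioned in the conclusions. The function name is hypothetical, and every bin is assumed to be non-empty in at least one histogram.

```python
def chi2_many_unweighted(hists):
    """Return (X2, degrees of freedom) for s unweighted histograms with r bins."""
    s, r = len(hists), len(hists[0])
    totals = [sum(h) for h in hists]
    grand_total = sum(totals)
    x2 = 0.0
    for i in range(r):
        # Pooled estimate of the probability of bin i.
        p_hat = sum(h[i] for h in hists) / grand_total
        for h, n_total in zip(hists, totals):
            expected = n_total * p_hat
            x2 += (h[i] - expected) ** 2 / expected
    return x2, (r - 1) * (s - 1)


print(chi2_many_unweighted([[10, 10], [10, 30]]))
print(chi2_many_unweighted([[1, 2, 3], [2, 4, 6], [3, 6, 9]]))
```

For $s = 2$ the result coincides with the two-histogram statistic and $r - 1$ degrees of freedom.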

References

[7] Gagunashvili, N., Comparison of weighted and unweighted histograms, arXiv:physics/0605123, 2006.