
Page 1: Practical Statistics for Particle Physicists Lecture 2

Practical Statistics for Particle Physicists: Lecture 2

Harrison B. Prosper, Florida State University

European School of High-Energy Physics, Parádfürdő, Hungary

5 – 18 June, 2013

Page 2: Practical Statistics for Particle Physicists Lecture 2

Example 4: Higgs to γγ

Likelihood – Higgs to γγ (CMS)

Observable: x = m_γγ

Background model: f_b(x | c, a) = A exp[−(cx + ax²)], normalized so that ∫ f_b(x | c, a) dx = 1

Signal model: f_s(x | m, w) = Gaussian(x, m, w), normalized so that ∫ f_s(x | m, w) dx = 1

Likelihood:

p(x | s, m, w, c, b, a) = exp[−(s + b)] ∏_{i=1}^{N} [ s f_s(x_i | m, w) + b f_b(x_i | c, a) ]
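To make the structure concrete, here is a minimal sketch of this extended unbinned likelihood in Python (the mass window and any data are illustrative assumptions, not the CMS values):

```python
import numpy as np

def f_b(x, c, a, lo=100.0, hi=180.0):
    # Background density A*exp(-(c*x + a*x^2)); A fixed by normalizing on [lo, hi]
    grid = np.linspace(lo, hi, 2001)
    norm = np.trapz(np.exp(-(c * grid + a * grid**2)), grid)
    return np.exp(-(c * x + a * x**2)) / norm

def f_s(x, m, w):
    # Gaussian signal density with mean m and width w
    return np.exp(-0.5 * ((x - m) / w)**2) / (w * np.sqrt(2 * np.pi))

def nll(params, x):
    # Negative log of p(x|s,m,w,c,b,a) = exp[-(s+b)] * prod_i [s f_s + b f_b]
    s, m, w, c, b, a = params
    return (s + b) - np.sum(np.log(s * f_s(x, m, w) + b * f_b(x, c, a)))
```

Minimizing nll over the six parameters (e.g., with scipy.optimize.minimize) gives the maximum likelihood fit of the kind discussed later in this lecture.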

Page 3: Practical Statistics for Particle Physicists Lecture 2

Likelihood – Higgs to γγ (CMS)

[Figure: CMS H → γγ diphoton mass data for the 7 TeV and 8 TeV datasets.]

Page 4: Practical Statistics for Particle Physicists Lecture 2

Outline

Lecture 1
- Descriptive Statistics
- Probability & Likelihood

Lecture 2
- The Frequentist Approach
- The Bayesian Approach

Lecture 3
- Analysis Example

Page 5: Practical Statistics for Particle Physicists Lecture 2

The Frequentist Approach: Confidence Intervals

Page 6: Practical Statistics for Particle Physicists Lecture 2

The Frequentist Principle

The Frequentist Principle (Neyman, 1937)

Construct statements so that a fraction f ≥ p of them are guaranteed to be true over an ensemble of statements.

The fraction f is called the coverage probability and p is called the confidence level (C.L.).

Note: The confidence level is a property of the ensemble to which the statements belong. Consequently, the confidence level may change if the ensemble changes.


Page 7: Practical Statistics for Particle Physicists Lecture 2

Confidence Intervals – 1

Consider an experiment that observes D events with expected signal s and no background.

Neyman devised a way to make interval statements of the form s ∈ [l(D), u(D)]

with the guarantee that at least a fraction p of them will be true.

But, since we don’t know the true value of s, this property must hold whatever the true value of s may be.


Page 8: Practical Statistics for Particle Physicists Lecture 2

Confidence Intervals – 2

[Figure: the Neyman construction, with the observation space on one axis and the parameter space on the other.]

For each value of s, find a region [a_L, a_R] in the observation space with probability f ≥ C.L.

Page 9: Practical Statistics for Particle Physicists Lecture 2

Confidence Intervals – 3

- Central intervals (Neyman): have equal probabilities on either side
- Feldman-Cousins intervals: contain the largest values of the ratio p(D | s) / p(D | s = D)
- Mode-centered intervals: contain the largest probabilities p(D | s)

By construction, all these intervals satisfy the frequentist principle: coverage probability ≥ confidence level

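For the Poisson case used throughout, the central intervals can be computed in closed form with chi-squared quantiles (the standard Garwood construction); a minimal sketch, assuming SciPy:

```python
from scipy import stats

def poisson_central_interval(D, cl=0.683):
    # Exact central interval for a Poisson mean given an observed count D:
    # probability (1 - cl)/2 is left in each tail.
    alpha = 1.0 - cl
    lo = 0.5 * stats.chi2.ppf(alpha / 2, 2 * D) if D > 0 else 0.0
    hi = 0.5 * stats.chi2.ppf(1 - alpha / 2, 2 * D + 2)
    return lo, hi

print(poisson_central_interval(17))  # central 68.3% interval for D = 17
```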

Page 10: Practical Statistics for Particle Physicists Lecture 2

Confidence Intervals – 4

[Figure: intervals for the Poisson distribution, parameter vs. count, comparing Central, Feldman-Cousins, Mode-Centered, and D ± √D intervals.]

Page 11: Practical Statistics for Particle Physicists Lecture 2

Confidence Intervals – 5

[Figure: width of the intervals vs. count, comparing Central, Feldman-Cousins, Mode-Centered, and D ± √D intervals.]

Page 12: Practical Statistics for Particle Physicists Lecture 2

Confidence Intervals – 6

[Figure: coverage probability vs. Poisson parameter for the Central, Feldman-Cousins, Mode-Centered, and D ± √D intervals; the nominal 68.3% level is marked.]
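Curves like these are straightforward to reproduce; for example, a short sketch computing the exact coverage of the D ± √D intervals (my own illustration, not the lecture's code):

```python
import numpy as np
from scipy import stats

def coverage_sqrtD(s, Dmax=200):
    # Probability that [D - sqrt(D), D + sqrt(D)] covers the true mean s,
    # summed exactly over the Poisson distribution of the count D.
    D = np.arange(Dmax)
    prob = stats.poisson.pmf(D, s)
    covered = (D - np.sqrt(D) <= s) & (s <= D + np.sqrt(D))
    return prob[covered].sum()

for s in (1, 5, 10, 20):
    print(s, coverage_sqrtD(s))  # coverage dips well below 68.3% at small s
```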

Page 13: Practical Statistics for Particle Physicists Lecture 2

The Frequentist Approach: The Profile Likelihood

Page 14: Practical Statistics for Particle Physicists Lecture 2

Maximum Likelihood – 1

Example: Top Quark Discovery (1995), D0 results
D = 17 events
B = 3.8 ± 0.6 events

where B = Q / k and δB = √Q / k.

Page 15: Practical Statistics for Particle Physicists Lecture 2


Maximum Likelihood – 2

knowns:
D = 17 events
B = 3.8 ± 0.6 background events

unknowns:
b, the expected background count
s, the expected signal count

Find the maximum likelihood estimates (MLE):

∂ ln p(17 | s, b)/∂s = ∂ ln p(17 | s, b)/∂b = 0 ⇒ ŝ, b̂

ŝ = D − B, b̂ = B
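These estimates are easy to verify numerically; a minimal sketch, assuming the full likelihood p(D | s, b) given later (Profile Likelihood – 1), with constant factors dropped:

```python
import numpy as np
from scipy.optimize import minimize

D, B, dB = 17.0, 3.8, 0.6
Q, k = (B / dB)**2, B / dB**2        # Q ≈ 40.1, k ≈ 10.6

def nll(p):
    # -ln p(D|s,b), where p(D|s,b) = Pois(D|s+b) * (kb)^Q e^{-kb} / Γ(Q+1),
    # keeping only the terms that depend on s and b
    s, b = p
    return (s + b) - D * np.log(s + b) + k * b - Q * np.log(k * b)

fit = minimize(nll, x0=[10.0, 4.0], bounds=[(0, None), (1e-6, None)])
print(fit.x)  # ≈ (D - B, B) = (13.2, 3.8)
```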

Page 16: Practical Statistics for Particle Physicists Lecture 2


Maximum Likelihood – 3

The Good
- Maximum likelihood estimates (MLE) are consistent: the RMS goes to zero as more and more data are acquired.
- If an unbiased estimate for a parameter exists, the maximum likelihood procedure will find it.
- Given the MLE ŝ for s, the MLE for y = g(s) is just ŷ = g(ŝ).

The Bad (according to some!)
- In general, MLEs are biased.
  Exercise 7: Show this. Hint: consider a Taylor expansion about the MLE.

The Ugly
- Correcting for bias, however, can waste data and sometimes yield absurdities.

Page 17: Practical Statistics for Particle Physicists Lecture 2

Example: The Seriously Ugly

The moment generating function of a probability distribution P(k) is the average:

M(x) = ⟨e^{xk}⟩

For the binomial, this is

M(x) = (1 − p + p e^x)^n,

which is useful for calculating moments, M_r = d^r M/dx^r |_{x=0}, e.g.,

M₂ = (np)² + np − np²

Exercise 8a: Show this

Page 18: Practical Statistics for Particle Physicists Lecture 2

Example: The Seriously Ugly

Given that k events out of n pass a set of cuts, the MLE of the event selection efficiency is

p̂ = k / n,

and the obvious estimate of p² is k² / n². But

⟨k² / n²⟩ = p² + V / n, where V = p(1 − p) is the single-trial variance,

so k² / n² is a biased estimate of p². (Exercise 8b: Show this.)

The best unbiased estimate of p² is

k (k − 1) / [n (n − 1)].

(Exercise 8c: Show this.)

Note: for a single success in n trials, p̂ = 1/n, but the unbiased estimate of p² is 0!

Page 19: Practical Statistics for Particle Physicists Lecture 2

The Profile Likelihood – 1


In order to make an inference about the signal, s, the 2-parameter problem,

must be reduced to one involving s only by getting rid of all nuisance parameters, such as b.

In principle, this must be done while respecting the frequentist principle: coverage prob. ≥ confidence level.

This is very difficult to do exactly.

p(D | s, b) = [(s + b)^D e^{−(s + b)} / D!] × [(kb)^Q e^{−kb} / Γ(Q + 1)]

Page 20: Practical Statistics for Particle Physicists Lecture 2

The Profile Likelihood – 2


In practice, we replace all nuisance parameters by their conditional maximum likelihood estimates (CMLE), which yields a function called the profile likelihood, p_PL(D | s).

In the top quark discovery example, we find an estimate of b as a function of s,

b̂ = f(s).

Then, in the likelihood p(D | s, b), b is replaced with its estimate.

Since this is an approximation, the frequentist principle is not guaranteed to be satisfied exactly.

Page 21: Practical Statistics for Particle Physicists Lecture 2

The Profile Likelihood – 3


Wilks’ Theorem (1938)

If certain conditions are met, and p_max is the value of the likelihood p(D | s, b) at its maximum, the quantity

y(s) = −2 ln [ p_PL(D | s) / p_max ]

has a density that is asymptotically χ². Therefore, by setting y(s) = 1 and solving for s, we can compute approximate 68% confidence intervals.

This is what Minuit (now TMinuit) has been doing for 40 years!

Page 22: Practical Statistics for Particle Physicists Lecture 2

The Profile Likelihood – 4

The CMLE of b is

b̂(s) = [γ + √(γ² + 4(1 + k)Q s)] / [2(1 + k)], where γ = D + Q − (1 + k)s,

and ŝ = D − B, b̂ = B is the mode (peak) of the likelihood.

Page 23: Practical Statistics for Particle Physicists Lecture 2

The Profile Likelihood – 5

By solving y(s) = 1 for s, we can make an interval statement about s @ 68% C.L.

Exercise 9: Show this
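Pulling the last few slides together, a minimal sketch that profiles b, forms y(s), and solves y(s) = 1 numerically (the bracketing values are my own guesses):

```python
import numpy as np
from scipy.optimize import brentq

D, B, dB = 17.0, 3.8, 0.6
Q, k = (B / dB)**2, B / dB**2

def lnL(s, b):
    # ln p(D|s,b) up to constants
    return D * np.log(s + b) - (s + b) + Q * np.log(k * b) - k * b

def b_hat(s):
    # conditional MLE of b from the quadratic on the previous slide
    g = D + Q - (1 + k) * s
    return (g + np.sqrt(g * g + 4 * (1 + k) * Q * s)) / (2 * (1 + k))

s_mle, b_mle = D - B, B                    # global MLEs

def y(s):
    # y(s) = -2 ln [ p_PL(D|s) / p_max ]
    return -2.0 * (lnL(s, b_hat(s)) - lnL(s_mle, b_mle))

lo = brentq(lambda s: y(s) - 1.0, 1e-3, s_mle)
hi = brentq(lambda s: y(s) - 1.0, s_mle, 40.0)
print(lo, hi)  # approximate 68% C.L. interval for s
```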

Page 24: Practical Statistics for Particle Physicists Lecture 2

The Frequentist Approach: Hypothesis Tests

Page 25: Practical Statistics for Particle Physicists Lecture 2

Hypothesis Tests


Fisher’s Approach: Null hypothesis (H0), background-only.

Compute the p-value: the probability, under H0, of obtaining data at least as extreme as those observed. The null hypothesis is rejected if the p-value is judged to be small enough.

Note: this can be calculated only if p(x | H0) is known.

Page 26: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery

Background, B = 3.8 events (ignoring uncertainty)

p(D | H0) = Poisson(D | B)

p-value = Σ_{D=17}^{∞} Poisson(D | 3.8) = 5.7 × 10⁻⁷,

where D = 17 is the observed count.

This is equivalent to 4.9 σ.
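This is a one-line calculation with SciPy's Poisson survival function:

```python
from scipy import stats

p_value = stats.poisson.sf(16, 3.8)   # P(D >= 17 | B = 3.8); sf(x) = P(X > x)
z = stats.norm.isf(p_value)           # one-sided Gaussian equivalent
print(p_value, z)                     # ≈ 5.7e-7, ≈ 4.9 sigma
```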

Page 27: Practical Statistics for Particle Physicists Lecture 2

Hypothesis Tests – 2

Neyman’s Approach: Null hypothesis (H0) + alternative hypothesis (H1)

Neyman argued forcefully that it is necessary to consider alternative hypotheses H1.

A fixed significance α, also called the size of the test, is chosen before the data are analyzed:

α = ∫_{x_α}^{∞} p(x | H0) dx

[Figure: the densities p(x | H0) and p(x | H1), with the critical region x > x_α.]

Page 28: Practical Statistics for Particle Physicists Lecture 2

The Neyman-Pearson Test

In Neyman’s approach, hypothesis tests are a contest between significance and power, i.e., the probability to accept a true alternative:

significance of test: α = ∫_{x_α}^{∞} p(x | H0) dx

power: p = ∫_{x_α}^{∞} p(x | H1) dx

[Figure: the densities p(x | H0) and p(x | H1), with the critical region x > x_α.]

Page 29: Practical Statistics for Particle Physicists Lecture 2

The Neyman-Pearson Test

Power curve: power p vs. significance α, with

α = ∫_{x_α}^{∞} p(x | H0) dx, p = ∫_{x_α}^{∞} p(x | H1) dx.

Note: in general, no analysis is uniformly the most powerful.

[Figure: power curves for two analyses. Blue is the more powerful below the cross-over point and green is the more powerful above it.]
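The trade-off is easy to explore with a toy example; a sketch assuming two Gaussian densities for p(x | H0) and p(x | H1) (purely illustrative choices, not taken from the lecture):

```python
import numpy as np
from scipy import stats

# Toy hypotheses: x ~ N(0,1) under H0 and x ~ N(2,1) under H1.
# The test rejects H0 when x > x_alpha; sweeping x_alpha traces the power curve.
x_alpha = np.linspace(-3, 6, 10)
alpha = stats.norm.sf(x_alpha, loc=0.0, scale=1.0)  # significance (size)
power = stats.norm.sf(x_alpha, loc=2.0, scale=1.0)  # P(reject H0 | H1 true)

for a, p in zip(alpha, power):
    print(f"alpha = {a:.4f}   power = {p:.4f}")
```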

Page 30: Practical Statistics for Particle Physicists Lecture 2

The Bayesian Approach

Page 31: Practical Statistics for Particle Physicists Lecture 2

The Bayesian Approach

Definition: A method is Bayesian if

1. it is based on the degree-of-belief interpretation of probability, and
2. it uses Bayes’ theorem

p(θ, ω | D) = p(D | θ, ω) π(θ, ω) / p(D)

for all inferences.

D: observed data
θ: parameter of interest
ω: nuisance parameters
π: prior density

Page 32: Practical Statistics for Particle Physicists Lecture 2

The Bayesian Approach – 2

Bayesian analysis is just applied probability theory.

Therefore, the method for eliminating nuisance parameters is “simply” to integrate them out,

p(θ | D) = ∫ p(θ, ω | D) dω ∝ ∫ p(D | θ, ω) π(θ, ω) dω,

a procedure called marginalization. The integral is a weighted average of the likelihood.
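For the top-quark likelihood used in these lectures, the integral over b can be done numerically; a minimal sketch with a flat prior π(b | s) = 1, anticipating the example that follows (the grid limits are my own choices):

```python
import numpy as np
from scipy import stats

D, Q, k = 17, 40.1, 10.6

def likelihood(s, b):
    # p(D|s,b) = Pois(D|s+b) * (kb)^Q e^{-kb} / Γ(Q+1);
    # the second factor equals the gamma(Q+1) density evaluated at kb
    return stats.poisson.pmf(D, s + b) * stats.gamma.pdf(k * b, Q + 1)

def marginal(s, bmax=12.0, n=2001):
    # p(D|s) ∝ ∫ p(D|s,b) π(b|s) db with flat π(b|s) = 1
    b = np.linspace(1e-9, bmax, n)
    return np.trapz(likelihood(s, b), b)

print(marginal(13.2))  # weighted-average likelihood at s = 13.2
```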

Page 33: Practical Statistics for Particle Physicists Lecture 2

The Bayesian Approach: An Example

Page 34: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 1

D0 1995 Top Discovery Data
D = 17 events
B = 3.8 ± 0.6 events

Calculations
1. Compute the posterior density p(s | D).
2. Compute the Bayes factor B10 = p(D | H1) / p(D | H0).
3. Compute the p-value, including the effect of background uncertainty.

Page 35: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 2

Step 1: Construct a probability model for the observations,

p(D | s, b) = [e^{−(s + b)} (s + b)^D / D!] × [e^{−kb} (kb)^Q / Γ(Q + 1)],

then put in the data

D = 17 events
B = 3.8 ± 0.6 background events

with B = Q / k and δB = √Q / k, that is,

Q = (B / δB)² = 40.1
k = B / δB² = 10.6

to arrive at the likelihood.

Page 36: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 3

Step 2: Write down Bayes’ theorem,

p(s, b | D) = p(D, s, b) / p(D) = p(D | s, b) π(s, b) / p(D),

and specify the prior:

π(s, b) = π(b | s) π(s).

It is useful to compute the following marginal likelihood,

p(D | s) = ∫ p(D | s, b) π(b | s) db,

sometimes referred to as the evidence for s.

Page 37: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 4

The Prior: What do π(b | s) and π(s) represent?

They encode what we know, or assume, about the mean background and signal in the absence of new observations. We shall assume that s and b are non-negative.

After a century of argument, the consensus today is that there is no unique way to represent such vague information.

Page 38: Practical Statistics for Particle Physicists Lecture 2


Example – Top Quark Discovery – 5

For simplicity, we take π(b | s) = 1. We may now eliminate b from the problem:

p(D | s, H1) = ∫_0^∞ p(D | s, b) π(b | s) d(kb)
             = (1/Q)(1 − x)² Σ_{r=0}^{D} Beta(x, r + 1, Q) Poisson(D − r | s),

where

x = 1 / (1 + k), Beta(x, n, m) = [Γ(n + m) / (Γ(n) Γ(m))] x^{n−1} (1 − x)^{m−1},

and where we have introduced the symbol H1 to denote the background + signal hypothesis.

Exercise 10: Show this
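The closed-form sum is simple to evaluate; a minimal sketch (note that scipy's beta.pdf matches the Beta density defined above):

```python
import numpy as np
from scipy import stats

D, Q, k = 17, 40.1, 10.6
x = 1.0 / (1.0 + k)

def marginal_likelihood(s):
    # p(D|s,H1) = (1/Q)(1-x)^2 * sum_r Beta(x, r+1, Q) * Poisson(D-r|s)
    r = np.arange(D + 1)
    return ((1.0 / Q) * (1 - x)**2
            * np.sum(stats.beta.pdf(x, r + 1, Q) * stats.poisson.pmf(D - r, s)))

print(marginal_likelihood(14.0))  # used below: p(D | 14, H1)
print(marginal_likelihood(0.0))   # used below: p(D | H0)
```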

Page 39: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 6

[Figure: p(17 | s, H1) as a function of the expected signal s.]

Page 40: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 7

The posterior density

Given the marginal likelihood p(D | s, H1), we can compute the posterior density

p(s | D, H1) = p(D | s, H1) π(s | H1) / p(D | H1),

where

p(D | H1) = ∫_0^∞ p(D | s, H1) π(s | H1) ds.

Page 41: Practical Statistics for Particle Physicists Lecture 2

Example – Top Quark Discovery – 8

Assuming a flat prior for the signal, π(s | H1) = 1, the posterior density is given by

p(s | D, H1) = p(D | s, H1) / ∫_0^∞ p(D | s, H1) ds,

from which we can compute the central interval

s ∈ [9.9, 18.4] @ 68% C.L.

Exercise 11: Derive an expression for p(s | D, H1) assuming a gamma prior Gamma(qs, U + 1) for π(s | H1).
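A sketch of this last step, reusing marginal_likelihood from the earlier sketch and reading the central interval off the normalized posterior (the grid choices are mine):

```python
import numpy as np

# assumes marginal_likelihood(s) from the sketch a few slides back
s_grid = np.linspace(0.0, 40.0, 4001)
post = np.array([marginal_likelihood(s) for s in s_grid])
post /= np.trapz(post, s_grid)          # flat prior: posterior ∝ p(D|s,H1)

cdf = np.cumsum(post) * (s_grid[1] - s_grid[0])
lo = s_grid[np.searchsorted(cdf, (1 - 0.683) / 2)]
hi = s_grid[np.searchsorted(cdf, 1 - (1 - 0.683) / 2)]
print(lo, hi)                           # should come out near [9.9, 18.4]
```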

Page 42: Practical Statistics for Particle Physicists Lecture 2

The Bayesian Approach: Hypothesis Testing

Page 43: Practical Statistics for Particle Physicists Lecture 2


Bayesian Hypothesis Testing – 1

Conceptually, Bayesian hypothesis testing proceeds in exactly the same way as any other Bayesian calculation: one computes the posterior density

p(θ, φ, H | D) = p(D | θ, φ, H) π(θ, φ, H) / p(D)

(posterior ∝ likelihood × prior) and marginalizes it with respect to all parameters except those that label the hypotheses,

p(H | D) = ∫∫ p(θ, φ, H | D) dθ dφ,

and of course you get your pHD!

Page 44: Practical Statistics for Particle Physicists Lecture 2


Bayesian Hypothesis Testing – 2

However, just like your PhD, it is usually more convenient, and instructive, to arrive at p(H | D) in stages.

1. Factorize the priors: π(θ, φ, H) = π(θ, φ | H) π(H).

2. Then, for each hypothesis H, compute the function

p(D | H) = ∫∫ p(D | θ, φ, H) π(θ, φ | H) dθ dφ.

3. Then, compute the probability of each hypothesis H:

p(H | D) = p(D | H) π(H) / Σ_H p(D | H) π(H).

Page 45: Practical Statistics for Particle Physicists Lecture 2


Bayesian Hypothesis Testing – 3

It is clear, however, that to compute p(H | D), it is necessary to specify the priors π(H).

Unfortunately, consensus on these numbers is highly unlikely!

Instead of asking for the probability of an hypothesis, p(H | D), we could compare probabilities:

The ratio in the first bracket is called the Bayes factor, B10.

p(H1 | D) / p(H0 | D) = [ p(D | H1) / p(D | H0) ] [ π(H1) / π(H0) ]

Page 46: Practical Statistics for Particle Physicists Lecture 2

Bayesian Hypothesis Testing – 4

In principle, in order to compute the number

p(D | H1) = ∫_0^∞ p(D | s, H1) π(s | H1) ds,

we need to specify a proper prior for the signal, that is, a prior that integrates to one.

For simplicity, let’s assume π(s | H1) = δ(s − 14).

This yields:

p(D | H0) = 3.86 × 10⁻⁶
p(D | H1) = p(D | 14, H1) = 9.28 × 10⁻²

Page 47: Practical Statistics for Particle Physicists Lecture 2

Bayesian Hypothesis Testing – 5

Since

p(D | H0) = 3.86 × 10⁻⁶
p(D | H1) = 9.28 × 10⁻²,

we conclude that the hypothesis with s = 14 events is favored over that with s = 0 by 24,000 to 1.

The Bayes factor can be mapped to a measure akin to “n-sigma”:

Z = √(2 ln B10) = 4.5

Exercise 12: Compute Z for the D0 results
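A quick numerical check of these numbers, again reusing marginal_likelihood from the earlier sketch:

```python
import numpy as np

p_H0 = marginal_likelihood(0.0)       # ≈ 3.86e-6
p_H1 = marginal_likelihood(14.0)      # ≈ 9.28e-2 for the δ(s - 14) prior
B10 = p_H1 / p_H0
print(B10, np.sqrt(2 * np.log(B10)))  # ≈ 24000 and Z ≈ 4.5
```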

Page 48: Practical Statistics for Particle Physicists Lecture 2

Hypothesis Testing – A Hybrid Approach

Background, B = 3.8 ± 0.6 events

This time the background uncertainty is included by using the marginal likelihood with s = 0:

p(D | H0) = p(D | s = 0, H1) = (1/Q)(1 − x)² Beta(x, D + 1, Q)

p-value = Σ_{D=17}^{∞} p(D | H0) = 5.4 × 10⁻⁶,

where D is the observed count.

This is equivalent to 4.4 σ, which may be compared with the 4.5 σ obtained with B10.

Exercise 13: Verify this calculation
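For Exercise 13, a sketch of the calculation (summing the s = 0 marginal model over counts D ≥ 17; the cutoff at 200 is an assumption that the tail beyond is negligible):

```python
from scipy import stats

Q, k = 40.1, 10.6
x = 1.0 / (1.0 + k)

def p_D_given_H0(D):
    # p(D|H0) = (1/Q)(1-x)^2 * Beta(x, D+1, Q)
    return (1.0 / Q) * (1 - x)**2 * stats.beta.pdf(x, D + 1, Q)

p_value = sum(p_D_given_H0(D) for D in range(17, 200))
print(p_value, stats.norm.isf(p_value))  # ≈ 5.4e-6, ≈ 4.4 sigma
```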

Page 49: Practical Statistics for Particle Physicists Lecture 2

Summary

Frequentist Approach
1) Models statistical uncertainties with probabilities
2) Uses the likelihood
3) Ideally, respects the frequentist principle
4) In practice, eliminates nuisance parameters through the approximate procedure of profiling

Bayesian Approach
5) Models all uncertainties with probabilities
6) Uses likelihood and prior probabilities
7) Eliminates nuisance parameters through marginalization