Transcript of adakdh1/ada/2017/ada04c.pdf


Page 1

Review: Functions of Random Variables

$Y = y(X), \qquad \frac{dY}{dX} = y'(X)$

Conserve probability: $d(\mathrm{Prob}) = f(Y)\,dY = f(X)\,dX$, so

$f(Y) = f(X) \left| \frac{dX}{dY} \right| = \frac{f(X)}{|y'(X)|}$

Mean value (biased):

$\langle Y \rangle = y(\langle X \rangle) + \frac{1}{2}\, y''(\langle X \rangle)\, \sigma_X^2 + \ldots$

Standard deviation (stretched):

$\sigma_Y = \sigma_X \left| \frac{dy}{dX} \right|_{X = \langle X \rangle} + \ldots$

Example: $X = 100 \pm 10$, $Y = \sqrt{X}$. $\langle Y \rangle = {?}$, $\sigma_Y = {?}$

[Figure: a distribution f(X) mapped through the curve Y = y(X) into a distribution f(Y).]
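The two propagation formulas are easy to check by Monte Carlo. A minimal sketch for the slide's example $Y = \sqrt{X}$ with $X = 100 \pm 10$; the Gaussian model for X, the seed, and the sample count are illustrative choices, not from the slides:

```python
import math
import random
import statistics

random.seed(7)

# X = 100 +/- 10, modelled as Gaussian; push samples through Y = sqrt(X).
xs = [random.gauss(100.0, 10.0) for _ in range(200_000)]
ys = [math.sqrt(x) for x in xs]   # x < 0 would be a ~10-sigma event: none occur

mean_y = statistics.fmean(ys)
std_y = statistics.stdev(ys)

# Predictions from the slide:
#   <Y>     = sqrt(100) + (1/2) y''(100) sigma_X^2 = 10 - 0.0125 = 9.9875
#   sigma_Y = sigma_X |dy/dX| = 10 / (2 sqrt(100)) = 0.5
print(f"<Y>     = {mean_y:.4f}   (formula: 9.9875)")
print(f"sigma_Y = {std_y:.4f}   (formula: 0.5)")
```

The small negative bias correction to $\langle Y \rangle$ comes from the curvature $y''(X) < 0$ of the square root.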

The Central Limit Theorem
•  (closely related to the Law of Large Numbers)
•  Sum up a large number N of independent random variables Xi.
•  The result resembles a Gaussian:

$\sum_{i=1}^{N} X_i \to G(\mu, \sigma^2) \quad \text{as } N \to \infty$

•  The means and variances accumulate:

$\mu = \sum_{i=1}^{N} \langle X_i \rangle, \qquad \sigma^2 = \sum_{i=1}^{N} \sigma^2(X_i)$

•  But higher moments are forgotten.
•  The original distributions f(Xi) don’t matter -- all shape information is lost.
•  This is why Gaussians are special.
•  This is why measurements often give Gaussian error distributions.
•  (Fast computers let us do more exact Monte Carlo analysis.)

Example: Coin Toss

$C = +1$ if heads, $-1$ if tails; $\langle C \rangle = 0$, $\sigma_C^2 = 1$.

$S_N \equiv \sum_{i=1}^{N} C_i$

$\langle S_N \rangle = \sum_{i=1}^{N} \langle C_i \rangle = 0, \qquad \sigma^2(S_N) = \sum_{i=1}^{N} \sigma^2(C_i) = N\, \sigma_C^2 = N$

Outcomes for small N:
  S1 = +1 (H), −1 (T)
  S2 = +2 (HH), 0 (HT, TH), −2 (TT)
  S3 = +3 (HHH), +1 (HHT, HTH, THH), −1 (HTT, THT, TTH), −3 (TTT)

Coin Toss => Gaussian. Uniform => Gaussian. Biased Coin => Gaussian.
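The accumulation of means and variances can be checked by simulation. A minimal sketch for the coin-toss sum; the seed, N = 100, and 20,000 trials are illustrative choices, not from the slides:

```python
import random
import statistics

random.seed(42)

def coin_sum(n_flips):
    """One experiment: sum of n_flips fair coins scored +1 (H) / -1 (T)."""
    return sum(random.choice((+1, -1)) for _ in range(n_flips))

N = 100           # coins per sum; CLT predicts S_N ~ G(0, N)
trials = 20_000   # independent repetitions of the experiment

samples = [coin_sum(N) for _ in range(trials)]
mean = statistics.mean(samples)
var = statistics.variance(samples)

print(f"mean(S_N) = {mean:+.2f}   (CLT prediction: 0)")
print(f"var(S_N)  = {var:.1f}   (CLT prediction: N = {N})")
```

Replacing the coins with any other independent variables (uniforms, biased coins) changes only μ and σ², not the emerging Gaussian shape.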


Page 2

Poisson => Gaussian
•  Poisson distribution P(λ):
   –  ⟨X⟩ = λ, Var(X) = λ, x = 0, 1, 2, …
•  Add N independent xi values: $\sum x_i \sim P(N\lambda)$
•  The CLT ensures that for large λ, Poisson -> Gaussian:
   –  P(λ) => G(µ, σ²)
   –  with µ = λ, σ² = λ

[Figure: four panels comparing Poisson (points) and Gaussian (curve) probabilities versus number of photons detected, for λ = 1, 4, 16, and 100. The Gaussian matches the Poisson distribution ever more closely as λ increases.]
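The convergence shown in the figure can be quantified directly. A sketch comparing the Poisson pmf with the matching Gaussian density at integer points; the mismatch measure (largest discrepancy scaled by the peak height) is an illustrative choice:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Largest |Poisson - Gaussian| over k, scaled by the peak height.
mismatch = {}
for lam in (1, 4, 16, 100):
    sigma = math.sqrt(lam)
    ks = range(int(lam + 6 * sigma) + 1)
    err = max(abs(poisson_pmf(k, lam) - gaussian_pdf(k, lam, sigma)) for k in ks)
    mismatch[lam] = err / gaussian_pdf(lam, lam, sigma)
    print(f"lambda = {lam:3d}: relative mismatch = {mismatch[lam]:.3f}")
```

The mismatch shrinks steadily with λ, as the CLT predicts for a sum of many unit-rate Poisson variables.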

Definition: What is a Statistic?
•  Anything you measure or compute from the data: any function of the data.
•  Because the data “jiggle”, every statistic also “jiggles”.
•  Example: the average of N data points is a statistic:

$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$

•  It has a definite value for a particular dataset.
•  It has a probability distribution describing how it “jiggles” with the ensemble of repeated datasets.
•  Note that $\bar{X} \neq X$. Why? If $\langle X_i \rangle = X$, then $\langle \bar{X} \rangle = X$.

Sample Mean: Average of N data points

The Sample Mean is a statistic:

$\bar{X} \equiv \frac{1}{N} \sum_{i=1}^{N} X_i$

It has a probability distribution, with a mean value:

$\langle \bar{X} \rangle = \left\langle \frac{1}{N} \sum_i X_i \right\rangle = \frac{1}{N} \sum_i \langle X_i \rangle$

and a variance:

$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left( \frac{1}{N} \sum_i X_i \right) = \frac{1}{N^2}\, \mathrm{Var}\left( \sum_i X_i \right) = \frac{1}{N^2} \sum_i \mathrm{Var}(X_i)$

assuming $\mathrm{Cov}[X_a, X_b] = \mathrm{Var}[X_a]\, \delta_{ab}$.
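The $1/N^2$ scaling is easy to confirm by simulation. A minimal sketch using Uniform(0,1) data; the choice of distribution, the seed, and the trial counts are illustrative, not from the slides:

```python
import random
import statistics

random.seed(1)

def var_of_mean(n, trials=20_000):
    """Empirical variance of the mean of n Uniform(0,1) draws."""
    means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]
    return statistics.variance(means)

SIGMA2 = 1.0 / 12.0   # variance of a single Uniform(0,1) value

results = {n: var_of_mean(n) for n in (1, 4, 16)}
for n, v in results.items():
    print(f"N = {n:2d}: Var(mean) = {v:.5f}   sigma^2/N = {SIGMA2 / n:.5f}")
```

Each factor of 4 in N cuts the variance of the sample mean by 4, matching $\mathrm{Var}(\bar{X}) = \sigma^2/N$.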

Sample Mean: Unbiased and lower Variance

If the Xi have the same mean, ⟨Xi⟩ = ⟨X⟩, then:

$\langle \bar{X} \rangle = \frac{1}{N} \sum_{i=1}^{N} \langle X_i \rangle = \frac{N \langle X \rangle}{N} = \langle X \rangle$

∴ $\bar{X}$ is an unbiased estimator of ⟨X⟩.

If the Xi all have the same variance, Var[Xi] = σ², and are uncorrelated, Cov[Xi, Xj] = σ² δij, then:

$\mathrm{Var}(\bar{X}) = \frac{1}{N^2} \sum_i \mathrm{Var}(X_i) = \frac{N \sigma^2}{N^2} = \frac{\sigma^2}{N}$

∴ $\sigma(\bar{X}) = \sigma / \sqrt{N}$, i.e. $\bar{X}$ "jiggles" much less than a single data value Xi does.

Many other Unbiased Statistics
•  Sample median (half the points above, half below)
•  (Xmax + Xmin) / 2
•  Any single point Xi chosen at random from the sequence
•  Weighted average:

$\frac{\sum_i w_i X_i}{\sum_i w_i}$

   (the Sample Mean $\bar{X}$ uses weights wi = 1)
•  Which unbiased statistic is best? (best = minimum variance)

Inverse-variance weights are best!
•  Variance of the weighted mean (assume $\mathrm{Cov}[X_i, X_j] = \sigma_i^2 \delta_{ij}$):

$\mathrm{Var}\left( \frac{\sum_i w_i X_i}{\sum_i w_i} \right) = \frac{\mathrm{Var}\left( \sum_i w_i X_i \right)}{\left( \sum_i w_i \right)^2} = \frac{\sum_i w_i^2\, \mathrm{Var}[X_i]}{\left( \sum_i w_i \right)^2} = \frac{\sum_i w_i^2 \sigma_i^2}{\left( \sum_i w_i \right)^2}$

•  What are the optimal weights?
•  The variance of the weighted average is minimised when:

$w_i = \frac{1}{\mathrm{Var}(X_i)} \equiv \frac{1}{\sigma_i^2}.$

•  Let’s verify this -- it’s important!


Page 3

Optimising the weights
•  To minimise the variance of the weighted average, set:

$0 = \frac{\partial}{\partial w_k} \frac{\sum_i w_i^2 \sigma_i^2}{\left( \sum_i w_i \right)^2} = \frac{2 w_k \sigma_k^2}{\left( \sum_i w_i \right)^2} - \frac{2 \sum_i w_i^2 \sigma_i^2}{\left( \sum_i w_i \right)^3} \frac{\partial \sum_i w_i}{\partial w_k} = \frac{2}{\left( \sum_i w_i \right)^2} \left( w_k \sigma_k^2 - \frac{\sum_i w_i^2 \sigma_i^2}{\sum_i w_i} \right)$

•  Thus $w_k \sigma_k^2$ must take the same value for every k, i.e. $w_k \propto 1/\sigma_k^2$; choosing the constant of proportionality to be 1 gives:

$w_k = \frac{1}{\sigma_k^2}.$

(Note: $\sum_i w_i^2 \sigma_i^2 = \sum_i w_i$ for $w_i = 1/\sigma_i^2$.)
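A quick numerical check that no nearby choice of weights does better. The error bars σ = 1, 2, 4 and the ±0.05 perturbation grid are hypothetical, illustrative choices:

```python
import itertools

def weighted_mean_variance(weights, sigmas):
    """Var[(sum w_i X_i)/(sum w_i)] for independent X_i with Var[X_i] = sigmas[i]^2."""
    num = sum(w * w * s * s for w, s in zip(weights, sigmas))
    return num / sum(weights) ** 2

sigmas = [1.0, 2.0, 4.0]                  # hypothetical error bars
best = [1.0 / s**2 for s in sigmas]       # inverse-variance weights
v_best = weighted_mean_variance(best, sigmas)

# Nudge every weight up/down in all combinations: none beats 1/sigma^2.
v_perturbed = min(
    weighted_mean_variance([w + d for w, d in zip(best, delta)], sigmas)
    for delta in itertools.product((-0.05, 0.0, 0.05), repeat=3)
    if any(delta)
)

print(f"Var at w_i = 1/sigma_i^2 : {v_best:.4f}")   # equals 1 / sum(1/sigma_i^2)
print(f"best Var among perturbed : {v_perturbed:.4f}")
```

Only rescaling all the weights by the same factor leaves the variance unchanged, which is why the normalisation of $w_i = 1/\sigma_i^2$ is free.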

The Optimal Average
•  Good principles for constructing statistics:
   –  Unbiased -> no systematic error
   –  Minimum variance -> smallest possible statistical error
•  N data points: $X_i = \langle X \rangle \pm \sigma_i$, with $\langle X_i \rangle = \langle X \rangle$ and $\mathrm{Cov}[X_i, X_j] = \sigma_i^2 \delta_{ij}$.
•  Optimal (inverse-variance weighted) average:

$\hat{X} \equiv \frac{\sum_i X_i / \sigma_i^2}{\sum_i 1 / \sigma_i^2}$

•  Is unbiased, since:

$\langle \hat{X} \rangle = \langle X \rangle$

•  And minimum variance:

$\sigma^2(\hat{X}) = \frac{1}{\sum_i 1 / \sigma_i^2}$

Memorise!

Compare: Equal vs Optimal Weights
•  Both are unbiased: $\langle \hat{X} \rangle = \langle \bar{X} \rangle = \langle X_i \rangle = \langle X \rangle$
•  Bad data spoils the Sample Mean (information lost).
•  The Optimal Average ALWAYS improves with more data.
•  Consider N = 2:

$\bar{X} = \frac{X_1 + X_2}{2}, \qquad \hat{X} = \frac{X_1/\sigma_1^2 + X_2/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2}$

$\mathrm{Var}[\bar{X}] = \frac{\sigma_1^2 + \sigma_2^2}{4}, \qquad \mathrm{Var}[\hat{X}] = \frac{1}{1/\sigma_1^2 + 1/\sigma_2^2}$

Averaging with Equal Error Bars

Two data points with equal error bars, 0 ± 1 and 1 ± 1:

$\bar{X} = \frac{0 + 1}{2} = \frac{1}{2}, \qquad \sigma^2(\bar{X}) = \frac{1^2 + 1^2}{4} = \frac{1}{2}.$

$\hat{X} = \frac{0/1^2 + 1/1^2}{1/1^2 + 1/1^2} = \frac{1}{2}, \qquad \sigma^2(\hat{X}) = \frac{1}{1/1^2 + 1/1^2} = \frac{1}{2}.$

In this case $\hat{X} = \bar{X}$ since the σi are all the same: either way, 0 ± 1 and 1 ± 1 combine to 0.5 ± 0.71.

Averaging with Unequal Error Bars

Two data points with unequal error bars, 0 ± 1 and 1 ± 4:

$\bar{X} = \frac{0 + 1}{2} = \frac{1}{2}, \qquad \sigma^2(\bar{X}) = \frac{1^2 + 4^2}{4} = \frac{17}{4},$

i.e. the Sample Mean gives 0.5 ± 2.06. Information is lost, since $\sigma(\bar{X}) > \sigma(X_1)$.

$\hat{X} = \frac{0/1^2 + 1/4^2}{1/1^2 + 1/4^2} = \frac{1}{17}, \qquad \sigma^2(\hat{X}) = \frac{1}{1/1^2 + 1/4^2} = \frac{1}{17/16} = \frac{16}{17},$

i.e. the Optimal Average gives 0.06 ± 0.97. Now $\sigma(\hat{X}) < \sigma(X_1)$.

Optimal weights retain all the information. The Optimal Average always improves with new data.
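Both worked examples can be reproduced in a few lines. A sketch; the helper name `combine` is illustrative, not from the slides:

```python
def combine(data):
    """Equal-weight and inverse-variance averages of (value, sigma) pairs."""
    n = len(data)
    xbar = sum(x for x, _ in data) / n
    sig_xbar = (sum(s * s for _, s in data) / n**2) ** 0.5
    wsum = sum(1.0 / s**2 for _, s in data)
    xhat = sum(x / s**2 for x, s in data) / wsum
    sig_xhat = (1.0 / wsum) ** 0.5
    return xbar, sig_xbar, xhat, sig_xhat

for pair in ([(0, 1), (1, 1)], [(0, 1), (1, 4)]):
    xbar, sb, xhat, sh = combine(pair)
    print(f"data {pair}: Xbar = {xbar:.2f} +/- {sb:.2f}, Xhat = {xhat:.2f} +/- {sh:.2f}")

# data [(0, 1), (1, 1)]: Xbar = 0.50 +/- 0.71, Xhat = 0.50 +/- 0.71
# data [(0, 1), (1, 4)]: Xbar = 0.50 +/- 2.06, Xhat = 0.06 +/- 0.97
```

With equal error bars the two averages coincide; with the 1 ± 4 point added, only the inverse-variance average beats the error bar of the best single point.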

Compare: Equal vs Optimal Weights

Equal weights:

$\bar{X} \equiv \frac{1}{N} \sum_i X_i, \qquad \sigma^2(\bar{X}) = \frac{1}{N^2} \sum_i \sigma_i^2$

Poor data degrades the result. Better to ignore “bad” data: information is lost.

Optimal weights:

$\hat{X} \equiv \frac{\sum_i X_i / \sigma_i^2}{\sum_i 1 / \sigma_i^2}, \qquad \sigma^2(\hat{X}) = \frac{1}{\sum_i 1 / \sigma_i^2}$

New data always improves the result. Use ALL the data, but with appropriate 1/Variance weights. Must have good error bars.