Transcript of adakdh1/ada/2017/ada05c.pdf

Page 1

Recap: Central Limit Theorem (e.g. Coin Toss ⇒ Gaussian)

$$X_i = \mu_i \pm \sigma_i, \qquad \sum_{i=1}^{N} X_i \;\Rightarrow\; G(\mu, \sigma^2), \qquad \mu = \sum_{i=1}^{N} \mu_i = 0, \qquad \sigma^2 = \sum_{i=1}^{N} \sigma_i^2$$

Convolution of Gaussians = Gaussian

Convolution of Lorentzians = Lorentzian

$$X_i \sim G(\mu_i, \sigma_i) \;\Rightarrow\; \sum_{i=1}^{N} X_i \sim G(\mu, \sigma), \qquad \mu = \sum_{i=1}^{N} \mu_i, \qquad \sigma^2 = \sum_{i=1}^{N} \sigma_i^2$$

$$X_i \sim L(\mu_i, \sigma_i) \;\Rightarrow\; \sum_{i=1}^{N} X_i \sim L(\mu, \sigma), \qquad \mu = \sum_{i=1}^{N} \mu_i, \qquad \sigma = \sum_{i=1}^{N} \sigma_i$$

Note that the Lorentzian widths add linearly, not in quadrature: a Lorentzian has no finite variance, so the Central Limit Theorem does not apply.
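To make the recap concrete, here is a minimal simulation sketch (assuming NumPy; the variable names and sample sizes are illustrative) showing that sums of ±1 coin tosses approach a Gaussian with μ = 0 and σ² = N:

```python
import numpy as np

# Each toss: X_i = ±1, so mu_i = 0 and sigma_i = 1.
N = 100          # tosses per sum
trials = 20_000  # number of sums
rng = np.random.default_rng(42)

# Sum N tosses, many times, and compare with the CLT prediction.
tosses = rng.choice([-1, 1], size=(trials, N))
sums = tosses.sum(axis=1)

print(f"mean of sums:     {sums.mean():+.3f}  (CLT predicts 0)")
print(f"variance of sums: {sums.var():.1f}   (CLT predicts N = {N})")
```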

Review: Normal vs Optimal Average

Equal weights: Poor data degrades the result.

Better to ignore “bad” data, but then information is lost.

Optimal weights: New data always improves the result.

Use ALL the data, but with appropriate 1 / Variance weights.

Must have good error bars.

Optimal (inverse-variance weighted) average:

$$\hat{X} \equiv \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}, \qquad \sigma^2(\hat{X}) = \frac{1}{\sum_i 1/\sigma_i^2}$$

Normal (equal-weight) average:

$$\bar{X} \equiv \frac{1}{N}\sum_i X_i, \qquad \sigma^2(\bar{X}) = \frac{1}{N^2}\sum_i \sigma_i^2$$
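A minimal sketch (assuming NumPy; the data values are invented for illustration) comparing the two averages on data with one very noisy point:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = 5.0
sigma = np.array([0.1, 0.1, 0.1, 5.0])   # one very noisy point
X = true_mu + sigma * rng.standard_normal(4)

# Normal average: equal weights, degraded by the bad point.
xbar = X.mean()
var_xbar = (sigma**2).sum() / len(X)**2

# Optimal average: 1/variance weights.
w = 1.0 / sigma**2
xhat = (w * X).sum() / w.sum()
var_xhat = 1.0 / w.sum()

print(f"normal : {xbar:.3f} +/- {np.sqrt(var_xbar):.3f}")
print(f"optimal: {xhat:.3f} +/- {np.sqrt(var_xhat):.3f}")
```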

Measuring a Feature

$A$ = area under the curve, e.g. flux of the star, strength of a spectral line. Assume (for now) zero background.

Model: if $P_i$ = fraction of photons in pixel $i$,

$$\sum_{i=1}^{N} P_i = 1, \qquad \mu_i \equiv \langle X_i \rangle = A\,P_i, \qquad \mathrm{Cov}[X_i, X_j] = \sigma_i^2\,\delta_{ij}$$

How to measure $A$? Simple method: integrate the data:

$$\hat{A} \equiv \sum_{i=1}^{N} X_i, \qquad \langle \hat{A} \rangle = A \sum_{i=1}^{N} P_i, \qquad \sigma^2\big[\hat{A}\big] = \sum_{i=1}^{N} \sigma_i^2$$

Dilemma: how many data points $N$ to include? Biased if $N$ too small; noisy if $N$ too large. (Figure: $\langle \hat{A} \rangle$ vs $N$.)

Can we do better? Yes, if the pattern $P$ is known.
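A small simulation sketch of the dilemma (assuming NumPy; the Gaussian pattern, noise level, and aperture sizes are illustrative): summing too few pixels misses flux (bias), summing too many only adds noise:

```python
import numpy as np

rng = np.random.default_rng(7)
A_true, sigma_noise = 100.0, 2.0
x = np.arange(-50, 51)

# Illustrative pattern: unit-sum Gaussian profile, width 3 pixels.
P = np.exp(-0.5 * (x / 3.0)**2)
P /= P.sum()

X = A_true * P + sigma_noise * rng.standard_normal(x.size)

# Sum the central 2k+1 pixels for increasing aperture half-width k.
for k in (2, 10, 50):
    core = np.abs(x) <= k
    A_hat = X[core].sum()
    err = sigma_noise * np.sqrt(core.sum())  # sigma^2[A] = sum of sigma_i^2
    print(f"half-width {k:2d}: A = {A_hat:7.2f} +/- {err:5.2f}"
          f"  (flux enclosed: {P[core].sum():.3f})")
```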

Optimal Scaling of a Pattern

Scale the pattern $P_i$ by a factor $A$ to fit the data.
1: Construct independent unbiased estimates.
2: Take their optimal average, with $1/\sigma^2$ weights.

$$A_i \equiv \frac{X_i}{P_i}, \qquad \text{unbiased: } \langle A_i \rangle = A, \qquad \mathrm{Cov}[A_i, A_j] = \left(\frac{\sigma_i}{P_i}\right)^2 \delta_{ij}$$

Optimal average, with weights $w_i = 1/\mathrm{Var}[A_i] = (P_i/\sigma_i)^2$:

$$\hat{A} = \frac{\sum_i w_i A_i}{\sum_i w_i} = \frac{\sum_i (P_i/\sigma_i)^2 \,(X_i/P_i)}{\sum_i (P_i/\sigma_i)^2} = \frac{\sum_i X_i\,P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}$$

$$\mathrm{Var}\big[\hat{A}\big] = \frac{\sum_i \mathrm{Var}[X_i]\,\big(P_i/\sigma_i^2\big)^2}{\left(\sum_i P_i^2/\sigma_i^2\right)^2} = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
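A minimal sketch of the optimal-scaling estimator (assuming NumPy; it reuses the illustrative Gaussian pattern from the sketch above):

```python
import numpy as np

def optimal_scale(X, P, sigma):
    """Optimal scaling of a known pattern: returns A_hat and Var[A_hat]."""
    w = P**2 / sigma**2
    A_hat = (X * P / sigma**2).sum() / w.sum()
    return A_hat, 1.0 / w.sum()

rng = np.random.default_rng(7)
x = np.arange(-50, 51)
P = np.exp(-0.5 * (x / 3.0)**2)
P /= P.sum()
sigma = np.full(x.size, 2.0)
X = 100.0 * P + sigma * rng.standard_normal(x.size)

A_hat, var_A = optimal_scale(X, P, sigma)
print(f"A = {A_hat:.2f} +/- {np.sqrt(var_A):.2f}")   # true A = 100
```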

Sum the Data vs Optimal Scaling

Data: $X_i \pm \sigma_i$. Model: $\mu_i \equiv \langle X_i \rangle = A\,P_i$. Pattern: $P_i$.

Sum the data:
$$\hat{A} \equiv \sum_{i=1}^{N} X_i, \qquad \langle \hat{A} \rangle = A \sum_{i=1}^{N} P_i, \qquad \sigma^2\big[\hat{A}\big] = \sum_{i=1}^{N} \sigma_i^2$$
Biased if $N$ too small; noisy if $N$ too large.

Optimal scaling of the known pattern:
$$\hat{A} = \frac{\sum_i X_i\,P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}, \qquad \mathrm{Var}\big[\hat{A}\big] = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
No bias; the result improves with $N$.

(Figure: $\langle \hat{A} \rangle$ vs $N$ for both estimators.)


Page 2

Optimal Scaling

Data: $X_i \pm \sigma_i$. Model: $\langle X_i \rangle \equiv \mu_i = A\,P_i$.

Optimal scaling:
$$\hat{A} = \frac{\sum_i X_i\,P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}, \qquad \mathrm{Var}\big[\hat{A}\big] = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$

Optimal Average is a special case of Optimal Scaling, with pattern $P_i = 1$.

“Golden Rule” of Optimal Data Analysis:

Memorise this result. Know how to derive it.

Fitting Models by minimising χ²

Data: $X_i \pm \sigma_i$, $i = 1 \ldots N$. Model: $\langle X_i \rangle \equiv \mu_i(\alpha)$. Parameters: $\alpha_k$, $k = 1 \ldots M$.

Error: $\varepsilon_i \equiv X_i - \mu_i(\alpha)$. Normalised error: $\chi_i \equiv \dfrac{\varepsilon_i}{\sigma_i} = \dfrac{X_i - \mu_i(\alpha)}{\sigma_i}$.

"Badness-of-Fit" statistic:
$$\chi^2(X, \sigma, \alpha) \equiv \sum_{i=1}^{N} \chi_i^2 = \sum_{i=1}^{N} \left[\frac{X_i - \mu_i(\alpha)}{\sigma_i}\right]^2$$

Best-fit parameters $\hat{\alpha}$ minimise $\chi^2$. (Figure: $\chi^2$ vs $\alpha$, with minimum $\chi^2_{\min}$ at $\hat{\alpha}$.)

( BoF a.k.a. "Goodness-of-Fit" statistic. )
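A minimal sketch of the badness-of-fit statistic (assuming NumPy; the data and the brute-force grid scan are illustrative, not the lecture's method), for a constant model $\mu_i(\alpha) = \alpha$; as the next slide derives, the minimum lands on the optimal average:

```python
import numpy as np

def chi2(X, sigma, mu):
    """Badness-of-fit: sum of squared normalised errors."""
    return (((X - mu) / sigma)**2).sum()

X = np.array([4.8, 5.1, 5.3, 9.0])
sigma = np.array([0.2, 0.2, 0.2, 2.0])

# Brute-force scan of the constant model mu_i(alpha) = alpha.
alphas = np.linspace(4.0, 7.0, 3001)
c2 = np.array([chi2(X, sigma, a) for a in alphas])
alpha_hat = alphas[c2.argmin()]

# The optimal (1/variance weighted) average, for comparison.
w = 1.0 / sigma**2
print(alpha_hat, (w * X).sum() / w.sum())  # agree, to the grid resolution
```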

Example: Estimate ⟨X⟩ by χ² Fitting

Model: $\langle X_i \rangle = \mu$, $\mathrm{Cov}[X_i, X_j] = \sigma_i^2\,\delta_{ij}$.

Badness-of-fit statistic:
$$\chi^2 = \sum_i \left[\frac{X_i - \mu}{\sigma_i}\right]^2$$

Minimise $\chi^2$:
$$\frac{\partial \chi^2}{\partial \mu} = -2 \sum_i \frac{X_i - \mu}{\sigma_i^2} = 0 \ \text{ at } \ \mu = \hat{\mu}$$

$$\sum_i \frac{X_i}{\sigma_i^2} = \hat{\mu} \sum_i \frac{1}{\sigma_i^2} \quad\Rightarrow\quad \hat{\mu} = \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} = \hat{X}.$$

(Figure: $\chi^2$ vs $\mu$, with minimum $\chi^2_{\min}$ at $\hat{\mu}$.) The Optimal Average minimises χ²!

Parameter Error Bar: 1-σ at Δχ² = 1

From the χ² fit: $\hat{\mu} = \hat{X}$ = Optimal Average, so we must have
$$\sigma^2(\hat{\mu}) = \sigma^2(\hat{X}) = \frac{1}{\sum_i 1/\sigma_i^2}$$

$$\frac{\partial \chi^2}{\partial \mu} = -2 \sum_i \frac{X_i - \mu}{\sigma_i^2}, \qquad \frac{\partial^2 \chi^2}{\partial \mu^2} = +2 \sum_i \frac{1}{\sigma_i^2}$$

Taylor-expand χ² about the minimum:
$$\chi^2 = \chi^2_{\min} + \frac{1}{2} \left.\frac{\partial^2 \chi^2}{\partial \mu^2}\right|_{\mu = \hat{\mu}} (\mu - \hat{\mu})^2 + \ldots = \chi^2_{\min} + \sum_i \frac{1}{\sigma_i^2}\,(\mu - \hat{\mu})^2 + \ldots = \chi^2_{\min} + \left[\frac{\mu - \hat{\mu}}{\sigma(\hat{\mu})}\right]^2 + \ldots$$

$$\therefore\ \Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} \approx \left[\frac{\mu - \hat{\mu}}{\sigma(\hat{\mu})}\right]^2 = 1 \ \text{ for } \ \mu = \hat{\mu} \pm \sigma(\hat{\mu})$$

(Figure: parabola of $\chi^2$ vs $\mu$; "fill with water" to a depth of 1 above $\chi^2_{\min}$ to read off $\hat{\mu} \pm \sigma(\hat{\mu})$.)
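A minimal sketch of the "fill with water" recipe (assuming NumPy; it reuses the illustrative data above): scan χ²(μ) and read off where it rises 1 above the minimum:

```python
import numpy as np

X = np.array([4.8, 5.1, 5.3, 9.0])
sigma = np.array([0.2, 0.2, 0.2, 2.0])

mus = np.linspace(4.0, 7.0, 30001)
c2 = (((X[:, None] - mus[None, :]) / sigma[:, None])**2).sum(axis=0)

# "Fill with water" to depth 1: the interval with chi^2 <= chi^2_min + 1.
inside = c2 <= c2.min() + 1.0
mu_lo, mu_hi = mus[inside][0], mus[inside][-1]
mu_hat = mus[c2.argmin()]
print(f"mu = {mu_hat:.3f} -{mu_hat - mu_lo:.3f}/+{mu_hi - mu_hat:.3f}")

# Check against the analytic error bar of the optimal average.
print(f"analytic sigma(mu_hat) = {1.0/np.sqrt((1/sigma**2).sum()):.3f}")
```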

Parameter Error Bar: 1-σ from χ² Curvature

$$\Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} \approx \frac{1}{2} \left.\frac{\partial^2 \chi^2}{\partial \alpha^2}\right|_{\alpha = \hat{\alpha}} (\alpha - \hat{\alpha})^2 = \left[\frac{\alpha - \hat{\alpha}}{\sigma(\hat{\alpha})}\right]^2 = 1 \ \text{ for } \ \alpha = \hat{\alpha} \pm \sigma(\hat{\alpha})$$

$$\therefore\ \sigma^2(\hat{\alpha}) = \frac{2}{\left.\dfrac{\partial^2 \chi^2}{\partial \alpha^2}\right|_{\alpha = \hat{\alpha}}}$$

(Figure: $\chi^2$ vs $\alpha$, with $\hat{\alpha} \pm \sigma(\hat{\alpha})$ at $\Delta\chi^2 = 1$.)

Exact for linear models, BoF(α) quadratic in α.

Approximate for non-linear models, BoF(α) not quadratic in α.
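A minimal sketch of the curvature recipe (assuming NumPy; the finite-difference step and data are illustrative), estimating $\sigma(\hat{\mu})$ from the numerical second derivative of χ²:

```python
import numpy as np

X = np.array([4.8, 5.1, 5.3, 9.0])
sigma = np.array([0.2, 0.2, 0.2, 2.0])

def chi2(mu):
    return (((X - mu) / sigma)**2).sum()

w = 1.0 / sigma**2
mu_hat = (w * X).sum() / w.sum()

# Numerical second derivative of chi^2 at the minimum.
h = 1e-3
curv = (chi2(mu_hat + h) - 2*chi2(mu_hat) + chi2(mu_hat - h)) / h**2

print(f"sigma from curvature: {np.sqrt(2.0/curv):.4f}")
print(f"analytic            : {1.0/np.sqrt(w.sum()):.4f}")
```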

Test Understanding

(Figure: two curves of $\chi^2$ vs $\alpha$, labelled "From Dataset 1" and "From Dataset 2".)

Which dataset (1 or 2) gives the better estimate of parameter $\alpha$?

How would you combine these datasets?
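One way to explore these questions numerically: for independent datasets the χ² curves simply add, and the curve with the sharper minimum constrains α better. A minimal sketch, assuming NumPy, with invented datasets (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha_true = 2.0

# Two invented datasets measuring the same constant alpha.
X1, s1 = alpha_true + 0.5 * rng.standard_normal(5), np.full(5, 0.5)
X2, s2 = alpha_true + 0.1 * rng.standard_normal(5), np.full(5, 0.1)

alphas = np.linspace(1.0, 3.0, 2001)
chi2_1 = (((X1[:, None] - alphas) / s1[:, None])**2).sum(axis=0)
chi2_2 = (((X2[:, None] - alphas) / s2[:, None])**2).sum(axis=0)

# Independent datasets: the chi^2 curves simply add.
chi2_tot = chi2_1 + chi2_2

for name, c2 in [("1", chi2_1), ("2", chi2_2), ("1+2", chi2_tot)]:
    inside = alphas[c2 <= c2.min() + 1.0]    # Delta chi^2 = 1 interval
    half_width = 0.5 * (inside[-1] - inside[0])
    print(f"dataset {name:>3}: alpha = {alphas[c2.argmin()]:.3f} "
          f"+/- {half_width:.3f}")
```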


Page 3

Scaling a Pattern by χ² Minimisation

Model: $\mu_i \equiv \langle X_i \rangle = A\,P_i$. Badness-of-fit:
$$\chi^2 = \sum_i \left[\frac{X_i - A\,P_i}{\sigma_i}\right]^2$$

Minimise $\chi^2$:
$$0 = \frac{\partial \chi^2}{\partial A} = -2 \sum_i \frac{(X_i - A\,P_i)\,P_i}{\sigma_i^2} \quad\Rightarrow\quad \sum_i \frac{X_i\,P_i}{\sigma_i^2} = \hat{A} \sum_i \frac{P_i^2}{\sigma_i^2} \quad\Rightarrow\quad \hat{A} = \frac{\sum_i X_i\,P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}.$$

Same result as Optimal Scaling.

$$\frac{\partial^2 \chi^2}{\partial A^2} = +2 \sum_i \frac{P_i^2}{\sigma_i^2}, \qquad \sigma^2(\hat{A}) = \frac{2}{\partial^2 \chi^2 / \partial A^2} = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
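A minimal sketch (assuming NumPy and SciPy; the pattern and data are invented) checking that a numerical χ² minimisation reproduces the optimal-scaling formula:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
P = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
sigma = np.array([1.0, 1.0, 2.0, 1.0, 1.0])
X = 50.0 * P + sigma * rng.standard_normal(5)   # true A = 50

chi2 = lambda A: (((X - A * P) / sigma)**2).sum()

A_numeric = minimize_scalar(chi2).x
A_formula = (X * P / sigma**2).sum() / (P**2 / sigma**2).sum()
print(A_numeric, A_formula)   # these agree
```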

Summary

• Two 1-parameter models:
  – Estimating ⟨X⟩: $\mu_i \equiv \langle X_i \rangle = \mu$
  – Scaling a pattern: $\mu_i \equiv \langle X_i \rangle = A\,P_i$

• Two equivalent methods:
  – Algebra of Random Variables: Optimal Average and Optimal Scaling:
$$\hat{X} = \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}, \qquad \sigma^2(\hat{X}) = \frac{1}{\sum_i 1/\sigma_i^2}$$
$$\hat{A} = \frac{\sum_i X_i\,P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}, \qquad \sigma^2(\hat{A}) = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
  – Minimising χ² gives the same result, and its curvature gives the error bar:
$$\Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} = \left[\frac{\alpha - \hat{\alpha}}{\sigma(\hat{\alpha})}\right]^2 + \ldots, \qquad \sigma^2(\hat{\alpha}) = \frac{2}{\left.\dfrac{\partial^2 \chi^2}{\partial \alpha^2}\right|_{\alpha = \hat{\alpha}}}$$

(Figure: $\chi^2$ vs $\alpha$, with $\hat{\alpha} \pm \sigma(\hat{\alpha})$ at $\Delta\chi^2 = 1$.)