
Transcript

Page 1

Recap: Central Limit Theorem (e.g. coin toss => Gaussian)

$$X_i = \mu_i \pm \sigma_i \qquad \sum_{i=1}^{N} X_i \;\Rightarrow\; G(\mu, \sigma^2) \qquad \mu = \sum_{i=1}^{N} \mu_i \;(= 0 \text{ for the coin-toss example}) \qquad \sigma^2 = \sum_{i=1}^{N} \sigma_i^2$$
Convolution of Gaussians = Gaussian

Convolution of Lorentzians = Lorentzian

$$X_i \sim G(\mu_i, \sigma_i): \qquad \sum_{i=1}^{N} X_i \sim G(\mu, \sigma) \qquad \mu = \sum_{i=1}^{N} \mu_i \qquad \sigma^2 = \sum_{i=1}^{N} \sigma_i^2$$

$$X_i \sim L(\mu_i, \sigma_i): \qquad \sum_{i=1}^{N} X_i \sim L(\mu, \sigma) \qquad \mu = \sum_{i=1}^{N} \mu_i \qquad \sigma = \sum_{i=1}^{N} \sigma_i$$
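These two addition rules can be checked numerically: Gaussian widths add in quadrature, while Lorentzian widths add linearly (a Lorentzian has no finite variance, so its width must be measured by a robust statistic such as the median absolute deviation, which equals the half-width for a Lorentzian). A minimal sketch, with all means and widths assumed for illustration:

```python
# Numerical check of the two convolution rules: Gaussian sums stay
# Gaussian with variances adding; Lorentzian (Cauchy) sums stay
# Lorentzian with half-widths adding linearly.
import numpy as np

rng = np.random.default_rng(42)
N, n_trials = 10, 100_000

# N independent Gaussians with assumed example means and widths.
mus = np.linspace(-1.0, 1.0, N)
sigmas = np.linspace(0.5, 2.0, N)
gauss_sum = rng.normal(mus, sigmas, size=(n_trials, N)).sum(axis=1)
print("Gaussian sum:  mean %.3f (predict %.3f), var %.2f (predict %.2f)"
      % (gauss_sum.mean(), mus.sum(), gauss_sum.var(), (sigmas**2).sum()))

# N independent Lorentzians: the median absolute deviation of the sum
# should match a single Lorentzian of width sum(sigmas).
cauchy_sum = (mus + sigmas * rng.standard_cauchy((n_trials, N))).sum(axis=1)
mad = np.median(np.abs(cauchy_sum - np.median(cauchy_sum)))
print("Lorentzian sum: MAD %.2f vs predicted width %.2f"
      % (mad, sigmas.sum()))
```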

Review: Normal vs Optimal Average

Equal weights: Poor data degrades the result.

Better to ignore “bad” data, but then information is lost.

Optimal weights: New data always improves the result.

Use ALL the data, but with appropriate 1 / Variance weights.

Must have good error bars.

Optimal average:
$$\hat X \equiv \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} \qquad \sigma^2(\hat X) = \frac{1}{\sum_i 1/\sigma_i^2}$$

Normal average:
$$\bar X \equiv \frac{1}{N} \sum_i X_i \qquad \sigma^2(\bar X) = \frac{1}{N^2} \sum_i \sigma_i^2$$
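A minimal sketch of the two estimators, with assumed example data (the helper names are illustrative, not from the lecture):

```python
# Equal-weight mean vs inverse-variance weighted ("optimal") mean.
import numpy as np

def normal_average(x, sigma):
    """Equal-weight mean and its variance."""
    x, sigma = np.asarray(x), np.asarray(sigma)
    return x.mean(), (sigma**2).sum() / len(x)**2

def optimal_average(x, sigma):
    """Inverse-variance weighted mean and its variance."""
    w = 1.0 / np.asarray(sigma)**2
    return (w * x).sum() / w.sum(), 1.0 / w.sum()

# Example: four good points and one very noisy one (assumed values).
x     = [10.1,  9.8, 10.3,  9.9, 25.0]
sigma = [ 0.5,  0.5,  0.5,  0.5, 20.0]
print(normal_average(x, sigma))   # dragged around by the bad point
print(optimal_average(x, sigma))  # bad point gets negligible weight
```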

Measuring a Feature

A = area under the curve, e.g. flux of the star, strength of a spectral line. Assume (for now) zero background.

Model: if $P_i$ = fraction of photons in pixel $i$, so that $\sum_{i=1}^{N} P_i = 1$, then
$$\mu_i \equiv \langle X_i \rangle = A\,P_i \qquad \mathrm{Cov}[X_i, X_j] = \sigma_i^2\,\delta_{ij}$$

How to measure A?

Simple method: integrate the data:
$$\hat A \equiv \sum_{i=1}^{N} X_i \qquad \langle \hat A \rangle = A \sum_{i=1}^{N} P_i \qquad \sigma^2[\hat A] = \sum_{i=1}^{N} \sigma_i^2$$

Dilemma: how many data points to include? Biased if N too small. Noisy if N too large.

Can we do better? Yes, if the pattern $P_i$ is known.

Optimal Scaling of a Pattern

Scale the pattern $P_i$ by a factor $A$ to fit the data.
1: Construct independent, unbiased estimates.
2: Take their optimal average, with $1/\sigma^2$ weights.

$$\hat A_i \equiv \frac{X_i}{P_i} \qquad \text{unbiased: } \langle \hat A_i \rangle = A \qquad \mathrm{Cov}[\hat A_i, \hat A_j] = \left(\frac{\sigma_i}{P_i}\right)^2 \delta_{ij}$$

Optimal average, with weights $w_i = 1/\mathrm{Var}[\hat A_i] = (P_i/\sigma_i)^2$:

$$\hat A = \frac{\sum_i w_i \hat A_i}{\sum_i w_i} = \frac{\sum_i \left(\frac{P_i}{\sigma_i}\right)^2 \frac{X_i}{P_i}}{\sum_i \left(\frac{P_i}{\sigma_i}\right)^2} = \frac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}$$

$$\mathrm{Var}[\hat A] = \frac{\sum_i \mathrm{Var}[X_i]\left(P_i/\sigma_i^2\right)^2}{\left(\sum_i P_i^2/\sigma_i^2\right)^2} = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
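The closed-form result lends itself to a few lines of code. The sketch below fits the amplitude of an assumed Gaussian pattern (a stand-in for a stellar profile) to noisy toy data; all numbers are illustrative:

```python
# Optimal scaling: fit amplitude A of a known pattern P_i to data X_i
# with 1/variance weights, using the closed-form result derived above.
import numpy as np

def optimal_scaling(X, P, sigma):
    """Return (A_hat, var(A_hat)) for the model <X_i> = A * P_i."""
    X, P, s2 = np.asarray(X), np.asarray(P), np.asarray(sigma)**2
    denom = (P**2 / s2).sum()
    A_hat = (X * P / s2).sum() / denom
    return A_hat, 1.0 / denom

# Toy example: Gaussian pattern, true amplitude A = 100.
rng = np.random.default_rng(1)
pix = np.arange(-10, 11)
P = np.exp(-0.5 * (pix / 2.0)**2)
P /= P.sum()                       # normalise so sum(P_i) = 1
sigma = np.full(pix.size, 0.3)
X = 100.0 * P + rng.normal(0.0, sigma)

A_hat, varA = optimal_scaling(X, P, sigma)
print(f"A = {A_hat:.2f} +/- {np.sqrt(varA):.2f}")
```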

Sum the Data vs Optimal Scaling

Data: $X_i \pm \sigma_i$. Model: $\mu_i \equiv \langle X_i \rangle = A\,P_i$. Pattern: $P_i$.

Sum up the data:
$$\hat A \equiv \sum_{i=1}^{N} X_i \qquad \langle \hat A \rangle = A \sum_{i=1}^{N} P_i \qquad \sigma^2[\hat A] = \sum_{i=1}^{N} \sigma_i^2$$
Biased if N too small. Noisy if N too large.

Optimal scaling of the known pattern:
$$\hat A = \frac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2} \qquad \mathrm{Var}[\hat A] = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
No bias. Result improves with N.


Page 2

Optimal Scaling

Data: $X_i \pm \sigma_i$. Model: $\langle X_i \rangle \equiv \mu_i = A\,P_i$.

Optimal scaling:
$$\hat A = \frac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2} \qquad \mathrm{Var}[\hat A] = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$

Optimal Average is a special case of Optimal Scaling, with pattern Pi = 1.

“Golden Rule” of Optimal Data Analysis:

Memorise this result. Know how to derive it.

Fitting Models by minimising χ²

Data: $X_i \pm \sigma_i$, $i = 1 \ldots N$. Model: $\langle X_i \rangle \equiv \mu_i(\alpha)$. Parameters: $\alpha_k$, $k = 1 \ldots M$. Error: $\varepsilon_i \equiv X_i - \mu_i(\alpha)$.

Normalised error:
$$\chi_i \equiv \frac{\varepsilon_i}{\sigma_i} = \frac{X_i - \mu_i(\alpha)}{\sigma_i}$$

"Badness-of-Fit" statistic:
$$\chi^2(X, \sigma, \alpha) \equiv \sum_{i=1}^{N} \chi_i^2 = \sum_{i=1}^{N} \left(\frac{X_i - \mu_i(\alpha)}{\sigma_i}\right)^2$$

Best-fit parameters $\hat\alpha$ minimise $\chi^2$.

[Figure: $\chi^2$ vs $\alpha$, reaching $\chi^2_{\min}$ at $\alpha = \hat\alpha$.]

(BoF a.k.a. "Goodness-of-Fit" statistic.)
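As a sketch, the statistic translates directly into code for any model $\mu_i(\alpha)$ supplied as a callable (the chi2 helper and the constant-model example below are illustrative assumptions, not from the lecture):

```python
# Badness-of-fit: chi^2 = sum_i ((X_i - mu_i(alpha)) / sigma_i)^2.
import numpy as np

def chi2(X, sigma, mu, alpha):
    """chi^2 for data X +/- sigma under model mu(alpha)."""
    resid = (np.asarray(X) - mu(alpha)) / np.asarray(sigma)
    return (resid**2).sum()

# Example with the constant model mu_i(alpha) = alpha:
X, sigma = np.array([1.2, 0.8, 1.1]), np.array([0.1, 0.2, 0.1])
print(chi2(X, sigma, lambda a: a, 1.0))
```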

Example: Estimate ⟨X⟩ by χ² Fitting

Model: $\langle X_i \rangle = \mu$, $\mathrm{Cov}[X_i, X_j] = \sigma_i^2\,\delta_{ij}$.

Badness-of-fit statistic:
$$\chi^2 = \sum_i \left(\frac{X_i - \mu}{\sigma_i}\right)^2$$

Minimise χ²:
$$\frac{\partial \chi^2}{\partial \mu} = -2 \sum_i \frac{X_i - \mu}{\sigma_i^2} = 0 \quad \text{at } \mu = \hat\mu$$

$$\sum_i \frac{X_i}{\sigma_i^2} = \hat\mu \sum_i \frac{1}{\sigma_i^2} \;\Rightarrow\; \hat\mu = \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} = \hat X$$

[Figure: $\chi^2$ vs $\mu$, minimum $\chi^2_{\min}$ at $\mu = \hat\mu$.]

The Optimal Average minimises χ²!
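A quick numerical check of this result, on assumed data: a brute-force scan of $\chi^2(\mu)$ should bottom out at the inverse-variance weighted mean.

```python
# Verify that the optimal average is the minimiser of chi^2
# for the constant model <X_i> = mu.
import numpy as np

X     = np.array([1.2, 0.8, 1.1, 2.0])
sigma = np.array([0.1, 0.2, 0.1, 1.0])
w = 1.0 / sigma**2

mu_hat = (w * X).sum() / w.sum()          # optimal average
mu_grid = np.linspace(0.5, 2.5, 2001)
chi2 = (((X[:, None] - mu_grid) / sigma[:, None])**2).sum(axis=0)
print(mu_hat, mu_grid[np.argmin(chi2)])   # agree to grid resolution
```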

Parameter Error Bar: 1-σ at Δχ² = 1

From the χ² fit: $\hat\mu = \hat X$ = Optimal Average. Must have
$$\sigma^2(\hat\mu) = \sigma^2(\hat X) = \frac{1}{\sum_i 1/\sigma_i^2}$$

$$\frac{\partial \chi^2}{\partial \mu} = -2 \sum_i \frac{X_i - \mu}{\sigma_i^2} \qquad \frac{\partial^2 \chi^2}{\partial \mu^2} = +2 \sum_i \frac{1}{\sigma_i^2}$$

Taylor-expand χ² about the minimum:
$$\chi^2 = \chi^2_{\min} + \frac{1}{2} \left. \frac{\partial^2 \chi^2}{\partial \mu^2} \right|_{\mu = \hat\mu} (\mu - \hat\mu)^2 + \ldots = \chi^2_{\min} + \sum_i \frac{1}{\sigma_i^2} (\mu - \hat\mu)^2 + \ldots = \chi^2_{\min} + \left(\frac{\mu - \hat\mu}{\sigma(\hat\mu)}\right)^2 + \ldots$$

$$\therefore \; \Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} \approx \left(\frac{\mu - \hat\mu}{\sigma(\hat\mu)}\right)^2 = 1 \quad \text{for } \mu = \hat\mu \pm \sigma(\hat\mu)$$

[Figure: $\chi^2$ vs $\mu$; the points where $\Delta\chi^2 = 1$ mark $\hat\mu \pm \sigma(\hat\mu)$.]

“Fill with water” to a depth of 1.
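The “fill with water” recipe is easy to implement: scan $\chi^2(\mu)$ on a grid and read off where it rises 1 above the minimum. A sketch with assumed data:

```python
# 1-sigma error bar from the Delta chi^2 = 1 contour.
import numpy as np

X     = np.array([1.2, 0.8, 1.1, 0.9])
sigma = np.array([0.1, 0.2, 0.1, 0.15])

mu = np.linspace(0.7, 1.4, 20001)
chi2 = (((X[:, None] - mu) / sigma[:, None])**2).sum(axis=0)
inside = mu[chi2 <= chi2.min() + 1.0]     # region with Delta chi^2 <= 1
mu_hat = mu[np.argmin(chi2)]
print(f"mu = {mu_hat:.3f} -{mu_hat - inside[0]:.3f} +{inside[-1] - mu_hat:.3f}")

# Analytic check: sigma(mu_hat) = (sum 1/sigma_i^2)^(-1/2)
print("predicted sigma:", (1.0 / sigma**2).sum()**-0.5)
```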

Parameter Error Bar: 1-σ from χ² Curvature

$$\Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} \approx \frac{1}{2} \left. \frac{\partial^2 \chi^2}{\partial \alpha^2} \right|_{\alpha = \hat\alpha} (\alpha - \hat\alpha)^2 = \left(\frac{\alpha - \hat\alpha}{\sigma(\hat\alpha)}\right)^2 = 1 \quad \text{for } \alpha = \hat\alpha \pm \sigma(\hat\alpha)$$

$$\therefore \; \sigma^2(\hat\alpha) = \frac{2}{\left. \dfrac{\partial^2 \chi^2}{\partial \alpha^2} \right|_{\alpha = \hat\alpha}}$$

[Figure: $\chi^2$ vs $\alpha$; $\Delta\chi^2 = 1$ gives $\hat\alpha \pm \sigma(\hat\alpha)$.]

Exact for linear models, BoF(α) quadratic in α.

Approximate for non-linear models, BoF(α) not quadratic in α.
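For comparison, a sketch of the curvature recipe on the same kind of assumed data, using a finite-difference second derivative (exact here, since the model is linear and χ² is exactly quadratic in μ):

```python
# 1-sigma error bar from the chi^2 curvature at the minimum.
import numpy as np

X     = np.array([1.2, 0.8, 1.1, 0.9])
sigma = np.array([0.1, 0.2, 0.1, 0.15])

def chi2(mu):
    return (((X - mu) / sigma)**2).sum()

w = 1.0 / sigma**2
mu_hat = (w * X).sum() / w.sum()          # minimiser (optimal average)

h = 1e-4                                  # finite-difference step
curv = (chi2(mu_hat + h) - 2 * chi2(mu_hat) + chi2(mu_hat - h)) / h**2
print("sigma from curvature:", np.sqrt(2.0 / curv))
print("analytic sigma      :", (1.0 / sigma**2).sum()**-0.5)
```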

Test Understanding

[Figure: two $\chi^2$ vs $\alpha$ curves, one from Dataset 1 and one from Dataset 2.]

Which dataset (1 or 2) gives the better estimate of parameter α?

How would you combine these datasets?


Page 3

Scaling a Pattern by χ² minimization

Model: $\mu_i \equiv \langle X_i \rangle = A\,P_i$. Badness-of-fit:
$$\chi^2 = \sum_i \left(\frac{X_i - A\,P_i}{\sigma_i}\right)^2$$

Minimise χ²:
$$0 = \frac{\partial \chi^2}{\partial A} = -2 \sum_i \frac{(X_i - A\,P_i)\,P_i}{\sigma_i^2} \;\Rightarrow\; \sum_i \frac{X_i P_i}{\sigma_i^2} = \hat A \sum_i \frac{P_i^2}{\sigma_i^2}$$

$$\hat A = \frac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2}$$

Same result as Optimal Scaling.

$$\frac{\partial^2 \chi^2}{\partial A^2} = +2 \sum_i \frac{P_i^2}{\sigma_i^2} \qquad \sigma^2(\hat A) = \frac{2}{\partial^2 \chi^2/\partial A^2} = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$
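A sketch confirming the equivalence numerically: minimising χ²(A) with a generic optimiser (scipy's minimize_scalar, on assumed toy data) reproduces the closed-form optimal-scaling answer.

```python
# chi^2 minimisation over A agrees with the closed-form optimal scaling.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
pix = np.arange(-10, 11)
P = np.exp(-0.5 * (pix / 2.0)**2)
P /= P.sum()                       # normalised pattern
sigma = np.full(pix.size, 0.3)
X = 100.0 * P + rng.normal(0.0, sigma)   # toy data, true A = 100

chi2 = lambda A: (((X - A * P) / sigma)**2).sum()
A_numeric = minimize_scalar(chi2).x
A_analytic = (X * P / sigma**2).sum() / (P**2 / sigma**2).sum()
print(A_numeric, A_analytic)       # identical to solver precision
```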

Summary

•  Two 1-parameter models:
–  Estimating ⟨X⟩: $\mu_i \equiv \langle X_i \rangle = \mu$
–  Scaling a pattern: $\mu_i \equiv \langle X_i \rangle = A\,P_i$

•  Two equivalent methods:
–  Algebra of Random Variables: Optimal Average and Optimal Scaling
–  Minimising χ² gives the same result:

$$\hat X = \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} \qquad \sigma^2(\hat X) = \frac{1}{\sum_i 1/\sigma_i^2}$$

$$\hat A = \frac{\sum_i X_i P_i/\sigma_i^2}{\sum_i P_i^2/\sigma_i^2} \qquad \sigma^2(\hat A) = \frac{1}{\sum_i P_i^2/\sigma_i^2}$$

•  1-σ error bars from Δχ² = 1 or from the χ² curvature:

$$\Delta\chi^2 \equiv \chi^2 - \chi^2_{\min} = \left(\frac{\alpha - \hat\alpha}{\sigma(\hat\alpha)}\right)^2 + \ldots \qquad \sigma^2(\hat\alpha) = \frac{2}{\left. \partial^2 \chi^2/\partial \alpha^2 \right|_{\alpha = \hat\alpha}}$$

[Figure: $\chi^2$ vs $\alpha$; $\Delta\chi^2 = 1$ gives $\hat\alpha \pm \sigma(\hat\alpha)$.]