adakdh1/ada/2017/ada05c.pdf · Page 1 Recap : Central Limit Theorem : (e.g. Coin Toss => Gaussian)X...
Transcript of adakdh1/ada/2017/ada05c.pdf · Page 1 Recap : Central Limit Theorem : (e.g. Coin Toss => Gaussian)X...
Page 1
Recap : Central Limit Theorem : (e.g. Coin Toss => Gaussian)
Xi = µi ±σ i Xii=1
N
∑ ⇒G µ,σ 2( ) µ = µii=1
N
∑ = 0 σ 2 = σ 2i
i=1
N
∑
Convolution of Gaussians = Gaussian
Convolution of Lorentzians = Lorentzian
Xi ~G µi,σ i( ) Xii=1
N
∑ ~G(µ,σ ) µ = µii=1
N
∑ σ 2 = σ 2i
i=1
N
∑
Xi ~ L µi,σ i( ) Xii=1
N
∑ ~ L(µ,σ ) µ = µii=1
N
∑ σ = σ ii=1
N
∑
Review: Normal vs Optimal Average
Equal weights: Poor data degrades the result.
Better to ignore “bad” data.Information lost.
Optimal weights: New data always improves the result.
Use ALL the data, but with appropriate 1 / Variance weights.
Must have good error bars.
€
ˆ X ≡
Xi/σ
i
2
i
∑
1/σi
2
i
∑
σ 2 ˆ X ( ) =1
1/σi
2
i
∑
€
X ≡1
NX
i
i
∑
σ 2X ( ) =
1
N2
σi
i
∑2
A = area under the curve, e.g. flux of the star, strength of a spectral line.
Assume (for now) zero background.Model:How to measure A ?
Simple method: Integrate the Data:
If Pi = fraction of photons in pixel i ,
Measuring a Feature
A ≡ Xii=1
N
∑
A = A Pii=1
N
∑ σ 2 A#$ %& = σ i2
i=1
N
∑
Dilemma: How many data points to include ?
€
A
€
NBiased if N too small. Noisy if N too large.
Can we do better? Yes, if the pattern P is known.
€
Pi
i=1
N
∑ =1
µi ≡ Xi = APi Cov Xi,Xj"# $%=σ i
2 δ i j
Optimal Scaling of a PatternScale the pattern Pi by factor A to fit the data.1: Construct independent unbiased estimates.2: Optimal average, with 1/σ2 weights. Ai ≡ Xi Pi unbiased: Ai = A Cov Ai,Aj
=σ i
Pi
2
δ i j
Optimal average : wi =1 Var Ai[ ] = Pi σ i( )2
A =
wi Aii
∑
wi
i
∑=
Pi
σ i
2
Xi
Pi
i
∑
Pi
σ i
2
i
∑=
Xi Pi σ i
2
i
∑
Pi2 σ i
2
i
∑
Var A =
Var Xi[ ] Pi σ i
2( )i
∑
Pi2 σ i
2
i
∑
2
2
=1
Pi2 σ i
2
i
∑
Data: Xi ±σ i
Model: µi ≡ Xi = A Pi Pattern: Pi
Sum the Data vs Optimal Scaling
€
ˆ A =
Xi
Piσ
i
2
i
∑
Pi
2
σi
2
i
∑
Var ˆ A [ ] =1
Pi
2σ
i
2
i
∑
€
A ≡ Xi
i=1
N
∑
A = A Pi
i=1
N
∑ σ 2A [ ] = σ
i
2
i=1
N
∑
€
A
€
N
€
ˆ A
€
NBiased if N too small. Noisy if N too large.
No bias.Result improves with N.
Optimal Scaling of known Pattern.Sum up the data.
Page 2
Optimal Scaling
€
Data : Xi±σ
i
Model : Xi≡ µ
i= A P
i
Optimal Scaling :
ˆ A =
Xi
Piσ
i
2
i
∑
Pi
2σ
i
2
i
∑
Var ˆ A [ ] =1
Pi
2σ
i
2
i
∑
Optimal Average is a special case of Optimal Scaling, with pattern Pi = 1.
“Golden Rule” of Optimal Data Analysis:
Memorise this result. Know how to derive it.
Fitting Models by minimising χ2
Data: Xi ±σ i i =1... N Model: Xi ≡ µi α( ) Parameters: αk k =1... M Error: εi ≡ Xi −µi α( )
Normalised Error: χ i ≡εiσ i
=Xi −µi α( )
σ i
"Badness - of -Fit" statistic:
χ 2 X,σ ,α( ) ≡ χ i2
i=1
N
∑ =Xi -µi α( )
σ i
$
%&
'
()
2
i=1
N
∑
Best-fit parameters α minimise χ 2.
€
ˆ α
€
χ 2
€
α
€
χmin
2
( BoF a.k.a. “Goodness-of-Fit” statistic ).
Example: Estimate < X > by χ2 Fitting Model: Xi = µ Cov Xi,Xj
!" #$=σ i2 δ i j
Badness-of-Fit statistic:
χ 2 =Xi −µσ i
&
'(
)
*+
2
i∑
Minimise χ 2 :∂ χ 2
∂µ= −2 Xi −µ
σ i2
i∑ = 0 at µ=µ
Xi
σ i2
i∑ =
µσ i
2i∑ ⇒ µ =
Xi /σ i2
i∑
1/σ i2
i∑
= X.
€
ˆ µ
€
χ 2
€
µ
€
χmin
2The Optimal Average minimises χ2 !
Parameter Error Bar: 1-σ at Δχ2 = 1
€
From χ 2 fit : ˆ µ = ˆ X = Optimal Average
Must have σ 2( ˆ µ ) = σ 2( ˆ X ) =1
1/σi
2
i
∑
€
∂ χ 2
∂µ= −2
Xi−µ
σi
2
i
∑
∂ 2χ 2
∂µ2= +2
1
σi
2
i
∑
χ 2= χ
min
2+
1
2
∂ 2χ 2
∂µ 2
µ = µ
(µ − µ)2+...
= χmin
2+
1
σi
2
i
∑
(µ − µ)
2+...
= χmin
2+µ − µ
σ µ( )
2
+...
∴ Δχ 2 ≡ χ 2 − χmin
2 ≈µ − µ
σ µ( )
2
=1 for µ = µ ±σ µ( )€
ˆ µ ±σ( ˆ µ )
Δχ 2= 1
€
χ 2
€
χmin
2
€
µ
“Fill with water” to depth of 1
Parameter Error Bar: 1-σ from χ2 Curvature
Δχ 2 ≡ χ 2 − χmin
2 ≈1
2
∂ 2χ 2
∂α 2
α = α
(α −α )2
=α −α
σ α( )
2
=1 for α = α ±σ α( )
∴ σ 2 α( ) =2
∂ 2χ 2
∂α 2
α = α
€
ˆ α ±σ( ˆ α )
Δχ 2= 1
€
χ 2
€
χmin
2
€
α
Exact for linear models, BoF(α) quadratic in α.
Approximate for non-linear models, BoF(α) not quadratic in α.
Test Understanding
€
χ 2
€
α
Which dataset ( 1 or 2 ) gives the better estimate of parameter α ?
How would you combine these datasets ?
From Dataset 1 From Dataset 2
0
1
2
Page 3
Scaling a Pattern by χ2 minimization
€
ˆ A =
Xi
Pi/σ
i
2
i
∑
Pi
2
/σi
2
i
∑.
Model: µi ≡ Xi = A PiBadness-of-fit:
χ 2 =Xi − A Piσ i
#
$%
&
'(
2
i∑
Minimise χ 2:
0 = ∂ χ2
∂ A= −2 (Xi − A Pi ) Pi
σ i2
i∑
⇒Xi Piσ i
2i∑ =
A Pi2
σ i2
i∑
Same result as Optimal Scaling.
€
∂ 2 χ 2
∂ A2= +2
Pi
2
σi
2
i
∑
€
σ 2 ˆ A ( ) =2
∂ 2 χ 2
∂ A2
=1
Pi
2
σi
2
i
∑
Summary• Two 1-parameter models:
– Estimating < X > :– Scaling a pattern:
• Two equivalent methods:– Algebra of Random Variables: Optimal Average and Optimal Scaling
– Minimising χ2 gives same result:
€
ˆ A =
XiP
i/σ
i
2
i
∑
Pi
2
/σi
2
i
∑σ 2 ˆ A ( ) =
1
Pi
2
/σi
2
i
∑
€
ˆ X =
Xi/σ
i
2
i
∑
1/σi
2
i
∑σ 2 ˆ X ( ) =
1
1/σi
2
i
∑
µi ≡ Xi = µ
µi ≡ Xi = A Pi
Δχ 2 ≡ χ 2 − χmin
2=α −α
σ α( )
2
+...
σ 2 α( ) =2
∂ χ 2
∂α 2
α = α
€
ˆ α ±σ( ˆ α )
Δχ 2= 1
€
χ 2
€
χmin
2
€
α