9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

16
9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties Confidence Intervals and Regions Fitting a Linear Trend Orthogonal vs Correlated Parameters (KDH: START THE RECORDING!)

Transcript of 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Page 1: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

9am Thu 2021 Sep 30 -- ADA 09 Parameter UncertaintiesConfidence Intervals and RegionsFitting a Linear Trend Orthogonal vs Correlated Parameters(KDH: START THE RECORDING!)

Page 2: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Max Likelihood and Bayesian Inference

P(X |α)

α1

α2

Parameter Space

X1

X2

Data Space

α1€

α2

Model

P(α | X) =P(X |α) P(α)

P(X |α) P(α) dα∫∝L(α) P(α)

X1

X2 Specific Dataset

Likelihood Function

Bayesian Inference:€

αML

maximises L(α)

L(α) ≡ P(X |α)

+

Likelihood updates the Prior.

Posterior Probability

αMAP maximises posterior : L α( )P α( )

Page 3: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Monte-Carlo Error Propagation1. Create mock datasets.

2. Fit the model to each mock dataset.

3. Observe how the best-fit parameter values “dance”.

4. Accumulate histograms approximating the parameter probability distributions.5. Compute mean, median, variance, etc. of the parameters, or any function of the parameters.

Xi±σ

i

Xi= a t

i+ b

a

b

1b. (and/or) “Bootstrap” samples: Pick N data points at random, with replacement (some points omitted, some repeated). * Requires more data than parameters ( N > M ). * Works with no error bars available.

1a. “Jiggle” the data points (using Gaussian random numbers). * Requires good error bars.

Page 4: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

-0.2

0

0.2

0.4

0.6

0.8

1

1.2

0 2 4 6 8 10

-48

-47

-46

-45

-44

-43

-42

-41

-40

0 2 4 6 8 10

χ2

L(α) P(α)

α

α

Δ χ2=1

α

Confidence interval on a single parameter(1-parameter, k-sigma confidence interval)

The 1-σ confidence interval on α includes 68% of the area under the likelihood function:

or posterior probability distribution, for non-uniform prior P(α) :

L(α) ≡ P(X |α)∝e−χ 2 2

σi

i

P(α | X)∝L(α) P(α)

For a k-σ (1-parameter ) confidence interval, use Δχ2 = k2,

Δχ2 = 1 for 1-σ, 68% probability Δχ2 = 4 for 2-σ, 95.4% probability Δχ2 = 9 for 3-σ, 99.73% probability …

Generalise: χ2 => -2 ln( L(α) P(α) )

Page 5: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

2-parameter 1-sigma Confidence RegionIf Y is a “nuisance parameter”, use the 1-parameter 1-sigma confidence interval in X, tangent to the Δχ2 = 1 contour in (X,Y). This interval encloses 68% probability.

Δχ 2 =1

Δχ 2 = 2.30If both X and Y are of interest, use the 2-parameter 1-sigma confidence region, the Δχ2 = 2.30 contour in (X,Y). This contour encloses 68% probability.

Use -2 ln( L(α) P(α) ) instead of χ2, if needed.

Note: Contour enclosing 68% probability must be wider than the 1-parameter confidence interval.

Why?

L(α) ≡ P(X |α)∝e−χ 2 2

σi

i

Page 6: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

M - parameter k - σ Confidence Regions

The M -parameter confidence region is enclosed by the Δχ2 surface including the desired probability.

All nuisance parameters must be re-fitted (or integrated over) for each set of fixed values for the M parameters in the sub-space of interest. The Δχ2 in the M-parameter sub-space has a χ2

M distribution, with M degrees of freedom.

Δχ 2 =1

Δχ 2 = 2.30

Prob M =1 2 3 4

1−σ 68% 1 2.30 3.53 4.72

2 −σ 95.4% 4 6.17 8.02 9.70

3−σ 99.73% 9 11.8 14.2 16.3

Δχ2 thresholds for M-parameter k-σ Confidence Regions

Page 7: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Example: Estimate both µ and σL(µ,σ ) ≡ P(X |µ,σ ) = e−χ

2 /2

2π( )N /2σ N

−2ln L = Xi −µσ

⎝⎜

⎠⎟

2

i=1

N

∑ + 2N lnσ + const

0 = ∂∂µ

−2ln L⎡⎣ ⎤⎦= −2Xi −µσ 2

i=1

N

0 = ∂∂σ

−2ln L⎡⎣ ⎤⎦= −2Xi −µ( )

2

σ 3i=1

N

∑ +2Nσ

µML =1N

Xii∑ σML

2 =1N

Xi −µML( )2

i∑

Posterior ∝ Likelihood × PriorP(µ,σ | X )∝ L(µ,σ ) P(µ,σ ) Note: ML gives biased estimate for σ.

2-parameter 1,2,3-σ confidence regions:

Page 8: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Example: Estimate both µ and σ

Contours: 1,2,3-sigma 2-parameter confidence regions for µ and σ. Dashed curves: maximum-likelihood estimates for µML and σML.

True values: µ = 0 and σ = 1.

Page 9: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Fit a line to N=1 data pointFit y = a x + b to N=1 data point: Blue lines : χ2 = 0 Red lines : χ2 = 1

χ2 contours in the (a,b) plane:

Solution is degenerate, since M=2 parameters are constrained by only N=1 data point.

Bayes: prior P(a,b) needed to determine a unique solution.

a

b

χ2 = 0

χ2 = 1

χ2 = 1

Page 10: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Fit a line to N = 2 data points•  Fit y = a x + b to N = 2 data points:

–  red lines give χ2 = 2–  blue line gives χ2 = 0

•  Note that a, b are not independent.

b

aχ2 = 0

χ2 = 2

All solutions (a,b) on the red ellipsegive χ2 = 2

bx

y

y = a x + b

Page 11: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Correlated Parameters L•  Parameters a and b are correlated : (•  To find the optimal ( a, b ) we must:

–  minimize χ2 with respect to a at a sequence of fixed b values–  then minimise the resulting χ2 values with

respect to b.

•  If a and b were independent, then all slices through the χ2 surface at each fixed b would have same shape and minimum.

•  Similarly for a.

•  We could then optimize a and b independently, saving a lot of calculation

•  How to make a and b independent of each other?

b

aχ2 = 0

χ2 = 2

bx

y

y = a x + b

Page 12: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Orthogonal Parametersfor fitting a line to N = 2 data points

•  Fit •  Different parameters for same model.•  Note: a, b are now independent !

b

x

y b

a

χ2 = 0

χ2 = 2

y = a (x − ˆ x ) + b

ˆ x =x

1+ x

2

2

y = a (x − x)+ b

Page 13: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Orthogonal slope and intercept

b = y = y1 + y22

σ 2 (b) = 2σ2

4

σ (b) = σ2

ˆ a =y

2− y

1

(x2− x

1)

σ 2( ˆ a ) =

2σ 2

(x2− x

1)

2

σ( ˆ a ) = 2σ

(x2− x

1)

b

χ2

b

y = a (x − ˆ x ) + b

ˆ x =x

1+ x

2

2

b

a

χ2 = 0

χ2 = 2

b

x

y

a

χ2

a

Corresponds to Δχ2 = 1.

Analysis using the algebra of random variables:

Page 14: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Orthogonal vs Correlated Parameters

b

χ2

ba

χ2

a

y = a (x − ˆ x ) + bb

a

χ2 = 0

χ2 = 2

Orthogonal Parameters J

y = a x + b

Correlated Parameters L

b

a

χ2 = 0χ2 = 2

a

χ2

a

b

χ2

b

For each a, a different b minimises χ2.

For each b, a different a minimises χ2.For any a, the same b minimises χ2.

For any b, the same a minimises χ2.

Page 15: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Fit a line to N data points•  If we use

then a, b are correlated.•  Make a, b orthogonal:

•  Intercept: Set a = 0 and optimise b: optimal average: •  Slope: Set b = 0 and optimise a: optimal scaling of pattern:

y = a x + b

ˆ x =x

i

2∑1 σ

i

2∑

y = a (x − ˆ x ) + b

ˆ b = ˆ y =yi σ i

2∑1 σ i

2∑, Var ˆ b [ ] =

1

1 σ i

2∑

a =yi xi − x( ) σ i

2∑xi − x( )2 σ i

2∑, Var a[ ] = 1

xi − x( )2 σ i2∑

ˆ x

ˆ y

Pi = xi − x

( to be proven later)

Page 16: 9am Thu 2021 Sep 30 -- ADA 09 Parameter Uncertainties ...

Choose Orthogonal Parameters•  Good practice (when possible). •  Results for any one parameter don’t depend on

values of other parameters.•  Example: fit a gaussian profile.

2 fit parameters:–  Width, w–  Area or peak value. Which is best?

f (x) = Pe−1

2

x−x0w

2

g(x) =A

w 2πe−1

2

x− x0w

2

Area is (more nearly) independentof width – good

Peak value depends on width – bad

PA