Transcript of AS2019 11 26 MaximumLikelihood (petersen/Teaching/Stat2019/Week2/AS2019_11…)

Page 1:

Applied Statistics: Principle of maximum likelihood

“Statistics is merely a quantisation of common sense”

Troels C. Petersen (NBI)


Page 2:

Likelihood function

“I shall stick to the principle of likelihood…” [Plato, in Timaeus]


Page 3:

Likelihood function

Given a PDF f(x) with parameter(s) θ, what is the chance that, in N observations, each value xi falls in the small interval [xi, xi + dxi]?

L(\theta) = \prod_i f(x_i, \theta) \, dx_i


Page 4:

Likelihood function

Given a set of measurements x and parameter(s) θ, the likelihood function is defined as:

L(x_1, x_2, \ldots, x_N; \theta) = \prod_i p(x_i, \theta)

The principle of maximum likelihood for parameter estimation consists of maximising the likelihood of the parameter(s) (here θ) given some data (here x).
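To see the definition in action, here is a minimal Python sketch (my own illustration, not from the slides, assuming numpy and scipy are available): it maximises the likelihood of a Gaussian mean by numerically minimising -ln L.

# Minimal sketch (not from the slides): maximise L(mu) = prod_i p(x_i, mu)
# for a Gaussian with known sigma = 1, by minimising -ln L.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.0, size=100)   # toy "measurements"

def nll(mu):
    # Work with -ln L rather than L itself to avoid numerical underflow.
    return -np.sum(norm.logpdf(x, loc=mu, scale=1.0))

result = minimize_scalar(nll, bounds=(0.0, 4.0), method="bounded")
print(f"ML estimate of mu: {result.x:.3f}  (sample mean: {x.mean():.3f})")

For a Gaussian with known width, the ML estimate coincides with the sample mean, which the printout confirms.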

Page 5:

Likelihood function

Given a set of measurements x and parameter(s) θ, the likelihood function is defined as:

L(x_1, x_2, \ldots, x_N; \theta) = \prod_i p(x_i, \theta)

The principle of maximum likelihood for parameter estimation consists of maximising the likelihood of the parameter(s) (here θ) given some data (here x).

The likelihood function plays a central role in statistics, as it can be shown to be:
• Consistent (converges to the right value).
• Asymptotically normal (converges with Gaussian errors).
• Efficient (reaches the Minimum Variance Bound (MVB, Cramer-Rao) for large N):

V(a) \geq \frac{1}{\langle (d \ln L / da)^2 \rangle}

To some extent, this means that the likelihood function is “optimal” - that is, when it can be applied in practice.

Page 6:


Barlow 5.1 (slightly rewritten) - possible ways to estimate the mean of N values:
1. Add up all values and divide by N.
2. Add up the first 10 values and divide by 10 - ignore the rest.
3. Add up all values and divide by N-1.
4. Throw away the data and give the answer 42.
5. Multiply all values and take the Nth root.
6. Choose the most popular/common value (the mode).
7. Add the largest and smallest value and divide by 2.
8. Add up every second value and divide by N/2 for N even or (N-1)/2 for N odd.
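A quick numerical try-out of a few of these recipes (my own sketch, assuming numpy) on Gaussian toy data shows which ones land near the true mean of 10:

# Sketch (not from Barlow): try a few of the estimators above on toy data
# drawn from a Gaussian with true mean 10, and see which land close.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=1000)

print("1. mean (sum/N):          ", x.sum() / len(x))
print("2. first 10 only:         ", x[:10].sum() / 10)
print("5. geometric mean:        ", np.exp(np.log(x).mean()))  # needs x > 0
print("7. (max+min)/2 (midrange):", (x.max() + x.min()) / 2)

Recipe 1 is the sample mean; the others are wasteful (2), biased (5), or fragile against outliers (7).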

Page 7:

Likelihood vs. Chi-Square

For computational reasons, it is often much easier to work with the logarithm of the likelihood function, minimising -ln L rather than maximising L itself.

In problems with Gaussian errors, it turns out that the likelihood function boils down to the Chi-Square, up to a constant offset and a factor of -2.
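Concretely, for measurements y_i with Gaussian errors σ_i around a model f(x_i; θ), this standard identity (not written out on the slide) reads:

-2 \ln L(\theta) = \sum_i \frac{(y_i - f(x_i; \theta))^2}{\sigma_i^2} + \sum_i \ln(2\pi\sigma_i^2) = \chi^2(\theta) + \mathrm{const}

so minimising χ² and maximising ln L select the same parameter values.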

The likelihood function for fits comes in two versions:
• Binned likelihood (using Poisson) for histograms.
• Unbinned likelihood (using PDF) for single values.

The “trouble” with the likelihood is that, unlike for the Chi-Square, there is NO simple way to obtain a probability of obtaining a certain likelihood value!

\left. \frac{\partial \ln L}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0


See Barlow 5.6

Page 8:

Binned Likelihood

The binned likelihood is a product over the bins of a histogram (a sum, once the logarithm is taken):

L(\theta)_{\mathrm{binned}} = \prod_i^{N_{\mathrm{bins}}} \mathrm{Poisson}(N_i^{\mathrm{expected}}, N_i^{\mathrm{observed}})

where the Poisson probability for n observed events given an expectation λ is:

f(n, \lambda) = \frac{\lambda^n}{n!} e^{-\lambda}
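A minimal sketch of such a fit (my own example, assuming numpy/scipy): each histogram bin count is compared to the model expectation through the Poisson probability, and -ln L is minimised.

# Minimal binned-likelihood sketch (not from the slides): fit the mean of a
# Gaussian to histogram counts, treating each bin count as Poisson-distributed
# around the expected content.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, poisson

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=2000)
counts, edges = np.histogram(data, bins=40, range=(0.0, 6.0))

def binned_nll(mu):
    # Expected counts per bin from the Gaussian CDF difference across each bin.
    cdf = norm.cdf(edges, loc=mu, scale=1.0)
    expected = len(data) * np.diff(cdf)
    return -np.sum(poisson.logpmf(counts, expected))

res = minimize_scalar(binned_nll, bounds=(2.0, 4.0), method="bounded")
print(f"binned ML estimate of mu: {res.x:.3f}")

Taking the expectation from the CDF difference avoids any bin-centre approximation.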

Page 9:

Unbinned Likelihood

The unbinned likelihood is a product over the single measurements:

L(\theta)_{\mathrm{unbinned}} = \prod_i^{N_{\mathrm{meas.}}} \mathrm{PDF}(x_i^{\mathrm{observed}})
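A matching unbinned sketch (again my own, assuming numpy/scipy): estimate the lifetime τ of an exponential decay from the individual measured times.

# Minimal unbinned-likelihood sketch (not from the slides): the PDF of each
# single measurement enters the likelihood directly, with no histogramming.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
t = rng.exponential(scale=2.0, size=500)   # toy decay times, true tau = 2

def unbinned_nll(tau):
    # -ln L = -sum_i ln PDF(t_i; tau), with PDF(t) = exp(-t/tau) / tau
    return np.sum(t / tau + np.log(tau))

res = minimize_scalar(unbinned_nll, bounds=(0.5, 5.0), method="bounded")
print(f"unbinned ML estimate of tau: {res.x:.3f}  (mean: {t.mean():.3f})")

Here -ln L has a closed form, and the ML estimate is simply the mean decay time.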

Page 10:

Notes on the likelihood

For a large sample, the maximum likelihood (ML) estimate is indeed unbiased and has the minimum variance - that is hard to beat! Also, the binned LLH approaches the unbinned version. However...

For the ML, you have to know your PDF. This is also true for the Chi-Square, but unlike for the Chi-Square, you get no goodness-of-fit measure to check it!

Also, at small statistics, the ML is not necessarily unbiased, though it still fares much better than the Chi-Square! But be careful with small statistics. The way to avoid this problem is using simulation - more to follow.
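A tiny Monte Carlo along those lines (my own illustration, assuming numpy): the ML estimator of a Gaussian width, which divides by N rather than N-1, is visibly biased low at N = 5.

# Sketch (not from the slides): simulation check of ML bias at small N.
import numpy as np

rng = np.random.default_rng(3)
N, trials = 5, 100_000
samples = rng.normal(0.0, 1.0, size=(trials, N))
sigma_ml = samples.std(axis=1, ddof=0)      # ML estimator (divides by N)
print(f"mean ML sigma at N={N}: {sigma_ml.mean():.3f}  (true value: 1.0)")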


Page 11:

The likelihood ratio test

Not unlike the Chi-Square, where one can compare χ² values, the likelihoods of two competing hypotheses can be compared (with the SAME offset constant and factor!).

While their individual LLH values do not say much, their RATIO says everything!

As with the likelihood, one often takes the logarithm and multiplies by -2 to match the Chi-Square, thus the “test statistic” becomes:

D = -2 \ln \frac{L_{\mathrm{null}}}{L_{\mathrm{alternative}}} = -2 \left( \ln L_{\mathrm{null}} - \ln L_{\mathrm{alternative}} \right)

If the two hypotheses are simple (i.e. no free parameters), then the Neyman-Pearson Lemma states that this is the best possible test one can make.

If the alternative model is not simple but nested (i.e. contains the null hypothesis), this difference approximately follows a Chi-Square distribution with Ndof = Ndof(alternative) - Ndof(null). This is called Wilks' Theorem.
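A small numerical illustration of Wilks' Theorem (my own sketch, assuming numpy/scipy): testing a fixed Gaussian mean against a free one gives one extra degree of freedom, so -2 Δln L is compared to a χ² with 1 dof.

# Sketch (not from the slides): likelihood ratio test for nested hypotheses.
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(4)
x = rng.normal(0.3, 1.0, size=200)

lnL_null = np.sum(norm.logpdf(x, loc=0.0, scale=1.0))       # H0: mu = 0 (fixed)
lnL_alt = np.sum(norm.logpdf(x, loc=x.mean(), scale=1.0))   # H1: mu free (ML)

D = -2.0 * (lnL_null - lnL_alt)   # test statistic
p = chi2.sf(D, df=1)              # Wilks: ~ chi2 with 1 dof for nested models
print(f"D = {D:.2f}, p-value = {p:.4f}")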
