Transcript of AS2020 11 24 MaximumLikelihood (nbi.dkpetersen/Teaching/Stat2020/Week2/AS2020_11_24...)

  • Applied Statistics: Principle of maximum likelihood

    “Statistics is merely a quantisation of common sense”

    Troels C. Petersen (NBI)


  • Likelihood function

    “I shall stick to the principle of likelihood…” [Plato, in Timaeus]


  • Likelihood function

    Given a PDF f(x) with parameter(s) θ, what is the chance that, in N observations, each xᵢ falls in the interval [xᵢ, xᵢ + dxᵢ]?

    $L(\theta) = \prod_i f(x_i, \theta)\, dx_i$

  • Likelihood function

    Given a set of measurements x, and parameter(s) θ, the likelihood function is defined as:

    $L(x_1, x_2, \ldots, x_N; \theta) = \prod_i p(x_i, \theta)$

    The principle of maximum likelihood for parameter estimation consists of maximising the likelihood of the parameter(s) (here θ) given some data (here x). There is nothing strange about this - it is exactly what we do for the ChiSquare!

    The likelihood function plays a central role in statistics, as it can be shown to be:
    • Consistent (converges to the right value).
    • Asymptotically normal (converges with Gaussian errors).
    • Efficient (reaches the Minimum Variance Bound (MVB, Cramer-Rao) for large N):

    $V(\hat{a}) \geq \frac{1}{\langle (d \ln L / da)^2 \rangle}$

    To some extent, this means that the likelihood function is "optimal", that is, if it can be applied in practice.
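As a minimal illustration (not from the slides): one writes down -ln L for an assumed PDF and minimises it numerically. Here an exponential PDF stands in for p(x, θ); the data and the parameter name tau are invented for the example.

```python
# Minimal sketch: ML estimate of the decay constant tau of an exponential
# PDF f(x; tau) = exp(-x/tau)/tau, by minimising -ln L numerically.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=1000)   # pretend these are the measurements

def neg_log_likelihood(tau):
    # -ln L(tau) = -sum_i ln f(x_i; tau) = N*ln(tau) + sum(x)/tau
    return len(x) * np.log(tau) + np.sum(x) / tau

res = minimize_scalar(neg_log_likelihood, bounds=(0.1, 10.0), method="bounded")
print(f"ML estimate: tau = {res.x:.3f} (analytic answer is the mean, {x.mean():.3f})")
```

For the exponential, the ML estimate has a closed form (the sample mean), which makes it easy to check that the numerical minimisation lands in the right place.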

  • Likelihood vs. Chi-Square

    For computational reasons, it is often much easier to maximise the logarithm of the likelihood function (in practice, to minimise -ln L), requiring:

    $\left. \frac{\partial \ln L}{\partial \theta} \right|_{\theta = \hat{\theta}} = 0$

    In problems with Gaussian errors, it turns out that the likelihood function boils down to the Chi-Square: -2 ln L and the Chi-Square differ only by a constant offset.

    The likelihood function for fits comes in two versions:
    • Binned likelihood (using Poisson) for histograms.
    • Unbinned likelihood (using PDF) for single values.

    The "trouble" with the likelihood is that, unlike for the Chi-Square, there is NO simple way to obtain a probability of obtaining a certain likelihood value!

    See Barlow 5.6
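A quick numerical check of the Gaussian-errors claim, with invented measurements: -2 ln L and the ChiSquare differ by the same constant at every parameter value.

```python
# Sketch: for Gaussian errors, -2*lnL equals the ChiSquare up to a constant
# (here: a constant mu fitted to measurements y_i with errors sigma_i).
import numpy as np
from scipy.stats import norm

y     = np.array([9.8, 10.3, 10.1, 9.6, 10.4])
sigma = np.array([0.3, 0.4, 0.2, 0.5, 0.3])

def chi2(mu):
    return np.sum(((y - mu) / sigma) ** 2)

def minus2lnL(mu):
    return -2.0 * np.sum(norm.logpdf(y, loc=mu, scale=sigma))

for mu in (9.5, 10.0, 10.5):
    # the difference is sum(ln(2*pi*sigma^2)), independent of mu
    print(f"mu={mu}: -2lnL - chi2 = {minus2lnL(mu) - chi2(mu):.6f}")
```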

  • ChiSquare

    Recall, the ChiSquare is a sum over bins in a histogram:

    $\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \left( \frac{N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}}}{\sigma(N_i^{\mathrm{observed}})} \right)^2$

    [Figure: histogram with the observed counts $N_i^{\mathrm{observed}}$ and the expected counts $N_i^{\mathrm{expected}}$ marked in each bin.]

  • ChiSquare

    Recall, the ChiSquare is a sum over bins in a histogram:

    $\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \frac{(N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}})^2}{N_i^{\mathrm{observed}}}$

  • ChiSquare

    Recall, the ChiSquare is a sum over bins in a histogram:

    $\chi^2(\theta) = \sum_i^{N_{\mathrm{bins}}} \frac{(N_i^{\mathrm{observed}} - N_i^{\mathrm{expected}})^2}{N_i^{\mathrm{expected}}}$
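To make the difference between the denominators concrete, here is a small Python sketch (bin counts invented for the example) computing the two variants on one histogram. For Poisson-distributed counts, $\sigma(N_i^{\mathrm{observed}}) \approx \sqrt{N_i^{\mathrm{observed}}}$, so the general formula reduces to the first variant.

```python
# Sketch of the ChiSquare variants above, on one (invented) histogram.
import numpy as np

n_obs = np.array([12, 25, 40, 31, 18, 7])               # observed bin counts
n_exp = np.array([10.0, 27.0, 38.0, 33.0, 16.0, 8.0])   # model expectation per bin

resid2 = (n_obs - n_exp) ** 2

chi2_obs = np.sum(resid2 / n_obs)   # denominator N_i^observed (needs N_obs > 0!)
chi2_exp = np.sum(resid2 / n_exp)   # denominator N_i^expected

print(f"chi2 (obs denom) = {chi2_obs:.2f}, chi2 (exp denom) = {chi2_exp:.2f}")
```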

  • Binned Likelihood

    The binned likelihood is a product over bins in a histogram:

    $L(\theta)_{\mathrm{binned}} = \prod_i^{N_{\mathrm{bins}}} \mathrm{Poisson}(N_i^{\mathrm{expected}}, N_i^{\mathrm{observed}})$

    where the Poisson distribution is:

    $f(n, \lambda) = \frac{\lambda^n}{n!} e^{-\lambda}$
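A minimal sketch of the binned likelihood for the same kind of (invented) histogram: scipy's poisson.logpmf supplies ln Poisson, and taking the logarithm turns the product over bins into a sum.

```python
# Sketch: binned log-likelihood, ln L = sum_i ln Poisson(N_i^exp, N_i^obs).
import numpy as np
from scipy.stats import poisson

n_obs = np.array([12, 25, 40, 31, 18, 7])               # observed bin counts
n_exp = np.array([10.0, 27.0, 38.0, 33.0, 16.0, 8.0])   # expectation from the model

lnL_binned = np.sum(poisson.logpmf(n_obs, mu=n_exp))    # log turns product into sum
print(f"binned lnL = {lnL_binned:.3f}")
```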

  • Unbinned Likelihood

    The unbinned likelihood is a product over single measurements:

    $L(\theta)_{\mathrm{unbinned}} = \prod_i^{N_{\mathrm{meas.}}} \mathrm{PDF}(x_i^{\mathrm{observed}})$
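A sketch of an unbinned likelihood fit, assuming a Gaussian PDF with parameters (mu, sigma) as the model; the single measurements are simulated for the example.

```python
# Sketch: unbinned log-likelihood of single measurements under a Gaussian PDF.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(5.0, 1.5, size=500)        # the single measurements

def neg_lnL(params):
    mu, sigma = params
    if sigma <= 0:                        # keep the minimiser in the valid region
        return np.inf
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

res = minimize(neg_lnL, x0=[4.0, 1.0], method="Nelder-Mead")
print("ML estimates (mu, sigma):", res.x)
```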

  • Notes on the likelihood

    For a large sample, the maximum likelihood (ML) estimate is indeed unbiased and has the minimum variance - that is hard to beat! Also, the binned LLH approaches the unbinned version. However...

    For the ML, you have to know your PDF. This is also true for the Chi-Square, but unlike for the Chi-Square, you get no goodness-of-fit measure to check it!

    Also, at small statistics, the ML is not necessarily unbiased, but it still fares much better than the ChiSquare! But be careful with small statistics. The way to avoid this problem is using simulation - more to follow.

  • Evaluating the likelihood

    As mentioned, it is possible to evaluate a likelihood fit (i.e. get a p-value). This is done by first fitting the data, and then (given the fit parameters) simulating data according to these parameters, and fitting these many times.

    Finally, you see what fraction of the fits have a worse LLH. Note the Gaussian shape!
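A sketch of this toy-simulation procedure, under the simplest possible model (a Gaussian, whose ML fit is just the sample mean and standard deviation); all data here are simulated for illustration.

```python
# Sketch: fit once, generate many toy datasets from the fitted parameters,
# refit each, and take the fraction with a worse (smaller) lnL as the p-value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
data = rng.normal(5.0, 1.5, size=200)

def fit_lnL(sample):
    # Gaussian ML fit has a closed form: mu-hat = mean, sigma-hat = std (ddof=0)
    mu, sigma = sample.mean(), sample.std()
    return np.sum(norm.logpdf(sample, mu, sigma)), mu, sigma

lnL_data, mu_hat, sigma_hat = fit_lnL(data)

lnL_toys = []
for _ in range(1000):
    toy = rng.normal(mu_hat, sigma_hat, size=len(data))
    lnL_toys.append(fit_lnL(toy)[0])

p_value = np.mean(np.array(lnL_toys) < lnL_data)   # fraction of worse fits
print(f"p-value = {p_value:.3f}")
```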

  • The likelihood ratio test

    Not unlike the Chi-Square, where one can compare χ2 values, the likelihoods of two competing hypotheses can be compared (with the SAME offset constant/factor!).

    While their individual LLH values do not say much, their RATIO says everything!

    As with the likelihood, one often takes the logarithm and multiplies by -2 to match the Chi-Square, thus the "test statistic" becomes:

    $D = -2 \ln \frac{L_{\mathrm{null}}}{L_{\mathrm{alternative}}} = -2 \ln L_{\mathrm{null}} + 2 \ln L_{\mathrm{alternative}}$

    If the two hypotheses are simple (i.e. no free parameters), then the Neyman-Pearson Lemma states that this is the best possible test one can make.

    If the alternative model is not simple but nested (i.e. contains the null hypothesis), this difference approximately behaves like a Chi-Square distribution with Ndof = Ndof(alternative) - Ndof(null). This is called Wilks' Theorem.
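A sketch of the nested case and Wilks' Theorem: the null hypothesis fixes a Gaussian mean at 0, the alternative fits the mean (with known width), so the difference in Ndof is 1. Data and model choice are invented for the example.

```python
# Sketch: likelihood ratio test for nested hypotheses, compared to chi2(1).
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(3)
x = rng.normal(0.3, 1.0, size=400)

lnL_null = np.sum(norm.logpdf(x, loc=0.0, scale=1.0))       # mean fixed at 0
lnL_alt  = np.sum(norm.logpdf(x, loc=x.mean(), scale=1.0))  # mean fitted (ML = mean)

D = -2.0 * (lnL_null - lnL_alt)     # the test statistic from the slide
p = chi2.sf(D, df=1)                # Wilks: D ~ chi2 with Ndof(alt) - Ndof(null) = 1
print(f"D = {D:.2f}, p = {p:.4f}")
```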