
  • 1 / 17

    Bayesian perspective on QCD global analysis

    Nobuo Sato (University of Connecticut/JLab)
    DIS18, Kobe, Japan, April 16-20, 2018

    In collaboration with: A. Accardi, E. Nocera, W. Melnitchouk

  • Bayesian methodology in a nutshell

    2 / 17

    In QCD global analysis, PDFs are parametrized at some scale Q0, e.g.

    f(x) = N x^a (1−x)^b (1 + c√x + d x + ...)

    f(x) = N x^a (1−x)^b NN(x; {θ, w_i})

    with parameters a = (N, a, b, c, d, ...). "Fitting" is essentially the estimation of

    E[f] = ∫ dⁿa P(a|data) f(a)

    V[f] = ∫ dⁿa P(a|data) (f(a) − E[f])²

    The probability density P is given by Bayes' theorem:

    P(f|data) = (1/Z) L(data|f) π(f)
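    As a concrete reference for the sketches on the following slides, here is a minimal Python encoding of the first parametrization above (all parameter values are hypothetical toy numbers, not fitted ones):

    import numpy as np

    def f(x, pars):
        # standard functional form evaluated at the input scale Q0
        N, a, b, c, d = pars
        return N * x**a * (1 - x)**b * (1 + c * np.sqrt(x) + d * x)

    x = np.linspace(0.01, 0.99, 5)
    print(f(x, (1.0, -0.5, 3.0, 0.0, 0.0)))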

  • Bayesian methodology in a nutshell

    3 / 17

    The likelihood function is not unique. A standard choice is the Gaussian likelihood

    L(d|a) = exp[ −(1/2) Σ_i ( (d_i − thy_i(a)) / δd_i )² ]

    Priors are designed to veto unphysical regions of parameter space, e.g.

    π(a) = ∏_i θ(a_i − a_i^min) θ(a_i^max − a_i)

    How do we compute E[f] and V[f]? (a sketch of this likelihood and prior follows below)
    + Maximum likelihood
    + Monte Carlo methods
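    A minimal sketch of this Gaussian likelihood and box prior; the function `thy` and the array arguments are hypothetical stand-ins for real theory predictions and data:

    import numpy as np

    def log_likelihood(a, d, delta_d, thy):
        # log L = -(1/2) sum_i ((d_i - thy_i(a)) / delta_d_i)^2
        res = (d - thy(a)) / delta_d
        return -0.5 * np.sum(res**2)

    def log_prior(a, a_min, a_max):
        # product of theta functions: flat inside the box, zero outside
        inside = np.all((a >= a_min) & (a <= a_max))
        return 0.0 if inside else -np.inf

    def log_posterior(a, d, delta_d, thy, a_min, a_max):
        lp = log_prior(a, a_min, a_max)
        if not np.isfinite(lp):
            return -np.inf
        return lp + log_likelihood(a, d, delta_d, thy)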

  • Maximum Likelihood

    4 / 17

    Estimation of the expectation value:

    E[f] = ∫ dⁿa P(a|data) f(a) ≈ f(a₀)

    where a₀ is estimated from an optimization algorithm,

    max[P(a|data)] = P(a₀|data),   max[L(data|a) π(a)] = L(data|a₀) π(a₀)

    or, equivalently, from chi-squared minimization:

    min[−2 log(L(data|a) π(a))] = −2 log(L(data|a₀) π(a₀))
                                = Σ_i ( (d_i − thy_i(a₀)) / δd_i )² − 2 log π(a₀)
                                = χ²(a₀) − 2 log π(a₀)
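    A toy chi-squared minimization in this spirit, on synthetic data with a hypothetical three-parameter form (nothing here is the actual fit; with a flat prior, minimizing χ² maximizes the posterior):

    import numpy as np
    from scipy.optimize import minimize

    def thy(a, x):
        N, alpha, beta = a
        return N * x**alpha * (1 - x)**beta

    rng = np.random.default_rng(0)
    x = np.linspace(0.1, 0.9, 20)
    delta_d = 0.05 * np.ones_like(x)
    d = thy((1.0, -0.5, 3.0), x) + delta_d * rng.standard_normal(x.size)

    chi2 = lambda a: np.sum(((d - thy(a, x)) / delta_d)**2)
    res = minimize(chi2, x0=(0.8, -0.4, 2.5), method="Nelder-Mead")
    a0 = res.x   # maximum-likelihood parameters: E[f] ~ f(a0)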

  • Maximum Likelihood

    5 / 17

    Estimation of the variance (Hessian method):

    V[f] = ∫ dⁿa P(a|data) (f(a) − E[f])²
         ≈ Σ_k ( (f(t_k = 1) − f(t_k = −1)) / 2 )²

    It relies on the factorization of P(a|data) along eigendirections,

    P(a|data) ∝ ∏_k exp(−(1/2) t_k²) + O(Δa³)

    and on a linear approximation of f(a):

    (f(a) − E[f])² = ( Σ_k (∂f/∂t_k) t_k )² + O(Δa³)
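    A sketch of this recipe under the stated assumptions; `chi2`, `a0`, and `f` are hypothetical stand-ins for a real fit, and the factor 1/2 converts the χ² Hessian into the inverse covariance:

    import numpy as np

    def numerical_hessian(fun, a0, eps=1e-4):
        # symmetric finite-difference Hessian of a scalar function
        n = len(a0)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
                H[i, j] = (fun(a0 + ei + ej) - fun(a0 + ei - ej)
                           - fun(a0 - ei + ej) + fun(a0 - ei - ej)) / (4 * eps**2)
        return H

    def hessian_variance(f, chi2, a0):
        Cinv = 0.5 * numerical_hessian(chi2, a0)   # P ~ exp(-1/2 da^T Cinv da)
        w, V = np.linalg.eigh(Cinv)
        var = 0.0
        for k in range(len(a0)):
            step = V[:, k] / np.sqrt(w[k])         # displacement with t_k = +/-1
            var += ((f(a0 + step) - f(a0 - step)) / 2.0)**2
        return var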

  • Maximum Likelihood

    6 / 17

    Pros:
    + Very practical: most PDF groups use this method
    + Computationally inexpensive
    + f and its eigendirections can be precalculated/tabulated

    Cons:
    + Assumes a local Gaussian approximation of the likelihood
    + Assumes a linear approximation of the observables O around a₀
    + The assumptions are strictly valid only for linear models
    + Computation of the Hessian matrix is numerically unstable if flat directions are present

    Examples:

    → if f(x) = a + b x + c x², then E[f(x)] = E[a] + E[b] x + E[c] x²
    → but if f(x) = N x^a (1−x)^b, then E[f(x)] ≠ E[N] x^{E[a]} (1−x)^{E[b]}
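    A quick numerical check of the second example, with toy (hypothetical) parameter distributions:

    import numpy as np

    rng = np.random.default_rng(1)
    N = rng.normal(1.0, 0.1, 100_000)
    a = rng.normal(-0.5, 0.05, 100_000)
    b = rng.normal(3.0, 0.2, 100_000)
    x = 0.3

    # E[f(x)] over the parameter ensemble vs f(x) built from expected parameters
    print(np.mean(N * x**a * (1 - x)**b))
    print(N.mean() * x**a.mean() * (1 - x)**b.mean())   # differs: f is nonlinear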

  • Monte Carlo Methods

    7 / 17

    Recall that we are interested in computing

    E[f] = ∫ dⁿa P(a|data) f(a)

    V[f] = ∫ dⁿa P(a|data) (f(a) − E[f])²

    Any MC method attempts to do this using MC sampling,

    E[f] ≈ Σ_k w_k f(a_k)

    V[f] ≈ Σ_k w_k (f(a_k) − E[f])²

    i.e., to construct a sample distribution {w_k, a_k} of the parent distribution P(a|data).
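    The weighted estimators are one-liners in numpy; a minimal sketch (the sample arrays are hypothetical):

    import numpy as np

    def mc_moments(w, f_vals):
        # weighted sample estimates of E[f] and V[f]; weights need not be normalized
        w = np.asarray(w, float) / np.sum(w)
        mean = np.sum(w * f_vals)
        return mean, np.sum(w * (f_vals - mean)**2)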

  • Monte Carlo Methods

    8 / 17

    + Resampling + cross validation
    + Nested Sampling (NS)
    + Hybrid Markov chain (HMC); Gabin Gbedo, Mangin-Brinet (2017)

    [Figure: helicity PDFs x∆u⁺, x∆d⁺, x(∆ū + ∆d̄), x(∆ū − ∆d̄), x∆s⁺, x∆s⁻ vs x, comparing JAM17, JAM15, DSSV09, and JAM17 + SU(3); resampling + CV, Ethier et al (2017)]

    [Figure: transversities h₁ᵘ, h₁ᵈ vs x; Collins functions zH₁⊥(1) (fav/unf) vs z; tensor charges δu, δd and g_T from SIDIS vs SIDIS+lattice; Nested Sampling, Lin et al (2018)]

  • Resampling+cross validation (R+CV)

    9 / 17

    Resample the data points within quoted uncertainties using Gaussian statistics,

    d_{k,i}^(pseudo) = d_i^(exp) + σ_i^(exp) R_{k,i}

    Fit each pseudo-data sample k = 1, ..., N to obtain parameter vectors a_k:

    P(a|data) → {w_k = 1/N, a_k}

    For a large number of parameters, split the data into training and validation sets and keep the a_k that best describes the validation sample (a sketch follows the workflow diagram below).

    [Workflow diagram: priors feed a sampler that turns the original data into pseudo-data samples; each sample is split into training and validation data; fits run on the training data, with each posterior reused as the prior/initial guess for subsequent fits; the parameters from the minimization steps are checked against the validation data to select the posteriors]
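    A minimal version of this loop under simplifying assumptions: Gaussian pseudo-data, a Nelder-Mead fit on the training set only, and a trivial selection rule. `thy`, the data arrays, and the selection rule are hypothetical stand-ins for the full procedure:

    import numpy as np
    from scipy.optimize import minimize

    def rcv_fit(d, sigma, thy, a_init, n_samples=100, train_frac=0.5, seed=0):
        rng = np.random.default_rng(seed)
        replicas = []
        npts = d.size
        for k in range(n_samples):
            pseudo = d + sigma * rng.standard_normal(npts)     # d^(pseudo)_k,i
            idx = rng.permutation(npts)
            ntr = int(train_frac * npts)
            tr, va = idx[:ntr], idx[ntr:]
            chi2 = lambda a, s: np.sum(((pseudo[s] - thy(a)[s]) / sigma[s])**2)
            res = minimize(lambda a: chi2(a, tr), a_init, method="Nelder-Mead")
            # full method: keep the minimization step with the best chi2(a, va)
            replicas.append(res.x)
        return np.array(replicas)   # equal-weight sample {w_k = 1/N, a_k}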

  • Nested Sampling (NS)

    10 / 17

    The basic idea: compute

    Z = ∫ L(data|a) π(a) dⁿa = ∫₀¹ L(X) dX

    + The procedure collects samples from iso-likelihood contours, weighted by their likelihood values
    + Insensitive to local minima → faithful conversion of P(a|data) → {w_k, a_k}
    + Multiple runs can be combined into one single run → the procedure can be parallelized

    [Figure: L(data|a) in a-space vs L(X) in X-space]

    - arXiv:astro-ph/0508461
    - arXiv:astro-ph/0701867
    - arXiv:1703.09701
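    A from-scratch toy version of the procedure, only to illustrate the idea; the unit-box prior and the brute-force replacement step are simplifying assumptions, and production analyses use the dedicated samplers described in the references above:

    import numpy as np

    def toy_nested_sampling(loglike, ndim, nlive=100, niter=1000, seed=0):
        rng = np.random.default_rng(seed)
        live = rng.uniform(size=(nlive, ndim))          # draws from a unit-box prior
        logl = np.array([loglike(p) for p in live])
        samples, weights = [], []
        X_prev = 1.0
        for i in range(niter):
            worst = np.argmin(logl)                     # lowest iso-likelihood shell
            X = np.exp(-(i + 1) / nlive)                # expected prior-volume shrinkage
            samples.append(live[worst].copy())
            weights.append(np.exp(logl[worst]) * (X_prev - X))   # w_k = L_k dX_k
            X_prev = X
            # replace the worst point by a prior draw above the likelihood floor
            # (brute-force rejection; real samplers do this far more cleverly)
            floor = logl[worst]
            while True:
                trial = rng.uniform(size=ndim)
                lt = loglike(trial)
                if lt > floor:
                    live[worst], logl[worst] = trial, lt
                    break
        w = np.array(weights)
        return np.array(samples), w / w.sum()           # sample {w_k, a_k}

    # usage: a 2d Gaussian likelihood centered in the box
    ll = lambda p: -0.5 * np.sum(((p - 0.5) / 0.1)**2)
    samp, w = toy_nested_sampling(ll, ndim=2)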

  • Comparison between the methods

    11 / 17

    Given a likelihood, does the evaluation of E[f] and V[f] depend on the method? → stress-test with a numerical example

    Setup:
    + Simulate synthetic data via rejection sampling (see the sketch at the end of this slide)
    + Estimate E[f] and V[f] using the different methods

    [Figure: two panels of the synthetic data and fitted f(x) vs x on a logarithmic x axis]
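    A minimal rejection-sampling sketch for generating synthetic x values; the target density and the bound M are hypothetical choices, not the ones used in the talk:

    import numpy as np

    def rejection_sample(target_pdf, n, M, seed=0):
        # accept uniform proposals x with probability target_pdf(x) / M
        rng = np.random.default_rng(seed)
        out = []
        while len(out) < n:
            x = rng.uniform()
            if rng.uniform() < target_pdf(x) / M:
                out.append(x)
        return np.array(out)

    toy = lambda x: 12 * x * (1 - x)**2      # Beta(2,3) density on [0,1]
    xs = rejection_sample(toy, 1000, M=2.0)  # M must bound the density from above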

  • Comparison between the methods

    12 / 17

    [Figure: f(x) vs x from NS; relative uncertainty δf(x) for HESS, NS, R, and RCV(50/50); ratio (δf/f)/(δf/f)_NS vs training fraction tf at x = 0.1, 0.3, 0.5, 0.7]

    HESS, NS, and R provide the same uncertainty.

    R+CV overestimates the uncertainty by roughly a factor of 2.

    The uncertainties also depend on the training fraction (tf).

    The results are also confirmed within a neural-net parametrization.

  • Beyond Gaussian likelihood

    13 / 17

    Gaussian likelihoods are not adequate to describe uncertainties in the presence of incompatible data sets. Example:

    + Two measurements of a quantity m: (m₁, δm₁), (m₂, δm₂)

    + The expectation value and variance can be computed exactly:

    E[m] = (m₁ δm₂² + m₂ δm₁²) / (δm₂² + δm₁²)

    V[m] = δm₂² δm₁² / (δm₂² + δm₁²)

    + Note: V[m] is independent of |m₁ − m₂| (see the numeric check below)

    To obtain more realistic uncertainties, the likelihood function needs to be modified (e.g., a tolerance criterion).
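    A quick numeric check of these formulas with toy numbers: moving the second measurement far from the first changes E[m] but leaves V[m] untouched.

    def combine(m1, dm1, m2, dm2):
        # exact Gaussian combination of two measurements
        E = (m1 * dm2**2 + m2 * dm1**2) / (dm2**2 + dm1**2)
        V = (dm2**2 * dm1**2) / (dm2**2 + dm1**2)
        return E, V

    print(combine(1.0, 0.1, 1.05, 0.1))  # compatible pair
    print(combine(1.0, 0.1, 2.00, 0.1))  # incompatible pair: same V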

  • Likelihood profile in CJ15

    14 / 17

    [Figure: likelihood and Δχ² profiles in CJ15 as functions of the parameter shift, total and per data set. Legend: (0) TOTAL, (1) HerF2pCut, (2) slac p, (3) d0Lasy13, (4) e866pd06xf, (5) BNS F2nd, (6) NmcRatCor, (7) slac d, (8) D0 Z, (9) H2 NC ep 3, (10) H2 NC ep 2, (11) H2 NC ep 1, (12) H2 NC ep 4, (13) CDF Wasy, (14) H2 CC ep, (15) cdfLasy05, (16) NmcF2pCor, (17) e866pp06xf, (18) H2 CC em, (19) d0run2cone, (20) d0 gamjet1, (21) CDFrun2jet, (22) d0 gamjet3, (23) d0 gamjet2, (24) d0 gamjet4, (25) jl00106F2d, ...]