
1 / 17

Bayesian perspective on QCD global analysis

Nobuo Sato
University of Connecticut / JLab
DIS18, Kobe, Japan, April 16-20, 2018

In collaboration with: A. Accardi, E. Nocera, W. Melnitchouk

Bayesian methodology in a nutshell

2 / 17

In QCD global analysis, PDFs are parametrized at some scale $Q_0$, e.g.

$$ f(x) = N x^{a} (1 - x)^{b} \left(1 + c\sqrt{x} + d x + \dots\right) $$

$$ f(x) = N x^{a} (1 - x)^{b}\, \mathrm{NN}(x; \{\theta, w_i\}) $$

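“Fitting” is essentially estimation of

$$ E[f] = \int d^n a \, P(\mathbf{a}|\mathrm{data}) \, f(\mathbf{a}) $$

$$ V[f] = \int d^n a \, P(\mathbf{a}|\mathrm{data}) \, \left(f(\mathbf{a}) - E[f]\right)^2 $$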

The probability density $P$ is given by Bayes' theorem

$$ P(f|\mathrm{data}) = \frac{1}{Z} L(\mathrm{data}|f) \, \pi(f) $$

$$ \mathbf{a} = (N, a, b, c, d, \dots) $$

Bayesian methodology in a nutshell

3 / 17

The likelihood function is not unique. A standard choice is the Gaussian likelihood

$$ L(d|\mathbf{a}) = \exp\left[ -\frac{1}{2} \sum_i \left( \frac{d_i - \mathrm{thy}_i(\mathbf{a})}{\delta d_i} \right)^2 \right] $$

Priors are designed to veto unphysical regions in parameter space, e.g.

$$ \pi(\mathbf{a}) = \prod_i \theta(a_i - a_i^{\min}) \, \theta(a_i^{\max} - a_i) $$
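The Gaussian likelihood and the box prior above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the analysis: the three-parameter toy model, the data points and the parameter ranges are hypothetical, and the functions work with logarithms so that the prior veto shows up as minus infinity.

```python
import math

def log_likelihood(a, data, theory):
    """Gaussian log-likelihood: -1/2 * sum(((d_i - thy_i(a)) / delta_d_i)^2)."""
    return -0.5 * sum(((d - theory(x, a)) / dd) ** 2 for x, d, dd in data)

def log_prior(a, bounds):
    """Box prior: log(1) = 0 inside the allowed ranges, -inf outside."""
    inside = all(lo <= ai <= hi for ai, (lo, hi) in zip(a, bounds))
    return 0.0 if inside else -math.inf

def log_posterior(a, data, theory, bounds):
    """Unnormalized log posterior log(L * pi); the constant -log(Z) is dropped."""
    lp = log_prior(a, bounds)
    if lp == -math.inf:
        return lp  # vetoed region: the theory is never evaluated
    return lp + log_likelihood(a, data, theory)

# Hypothetical toy model f(x) = N x^alpha (1 - x)^beta, for illustration only.
def toy_theory(x, a):
    n, alpha, beta = a
    return n * x ** alpha * (1.0 - x) ** beta

# Hypothetical data points (x, d, delta_d) and parameter ranges (min, max).
toy_data = [(0.1, 1.30, 0.05), (0.3, 1.20, 0.05), (0.5, 0.90, 0.05)]
toy_bounds = [(0.0, 10.0), (-1.0, 2.0), (0.0, 10.0)]

inside = log_posterior((2.0, 0.3, 1.0), toy_data, toy_theory, toy_bounds)
outside = log_posterior((2.0, -1.5, 1.0), toy_data, toy_theory, toy_bounds)
print(inside, outside)
```

The second call has alpha below its lower bound, so the prior vetoes it regardless of how well it would describe the data.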

How do we compute $E[f]$, $V[f]$?
+ Maximum likelihood
+ Monte Carlo methods

Maximum Likelihood

4 / 17

Estimation of expectation value

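$$ E[f] = \int d^n a \, P(\mathbf{a}|\mathrm{data}) \, f(\mathbf{a}) \simeq f(\mathbf{a}_0) $$

$\mathbf{a}_0$ is estimated with an optimization algorithm

$$ \max\left[P(\mathbf{a}|\mathrm{data})\right] = P(\mathbf{a}_0|\mathrm{data}) $$

$$ \max\left[L(\mathrm{data}|\mathbf{a}) \, \pi(\mathbf{a})\right] = L(\mathrm{data}|\mathbf{a}_0) \, \pi(\mathbf{a}_0) $$

or, equivalently, chi-squared minimization

$$ \min\left[-2 \log\left(L(\mathrm{data}|\mathbf{a}) \, \pi(\mathbf{a})\right)\right] = -2 \log\left(L(\mathrm{data}|\mathbf{a}_0) \, \pi(\mathbf{a}_0)\right) $$

$$ = \sum_i \left( \frac{d_i - \mathrm{thy}_i(\mathbf{a}_0)}{\delta d_i} \right)^2 - 2 \log\left(\pi(\mathbf{a}_0)\right) $$

$$ = \chi^2(\mathbf{a}_0) - 2 \log\left(\pi(\mathbf{a}_0)\right) $$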
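As an illustration of the chi-squared minimization, the sketch below fits only the normalization N of a fixed shape x^0.5 (1 - x)^3 to hypothetical data, using a golden-section search inside the box prior. The shape, the data and the choice of optimizer are assumptions made for this example; inside the box the prior is constant, so minimizing -2 log(L π) reduces to minimizing χ². Because χ² is quadratic in N, the result can be compared with the closed-form solution.

```python
import math

def golden_section_minimize(func, lo, hi, tol=1e-9):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

def shape(x):
    """Hypothetical fixed shape; only the normalization N is fitted."""
    return x ** 0.5 * (1.0 - x) ** 3

# Hypothetical data points (x, d, delta_d).
data = [(0.1, 0.47, 0.02), (0.3, 0.37, 0.02), (0.5, 0.18, 0.01), (0.7, 0.044, 0.005)]

def chi2(n):
    return sum(((d - n * shape(x)) / dd) ** 2 for x, d, dd in data)

# Box prior 0 <= N <= 10: the search is simply restricted to the allowed range.
n_fit = golden_section_minimize(chi2, 0.0, 10.0)

# Closed-form minimum of the quadratic chi2(N), used as a cross-check.
n_exact = (sum(d * shape(x) / dd ** 2 for x, d, dd in data)
           / sum(shape(x) ** 2 / dd ** 2 for x, d, dd in data))
print(n_fit, n_exact, chi2(n_fit))
```

In a realistic fit the parameters enter non-linearly and a multi-dimensional minimizer is needed; the one-parameter case is used here only because it can be verified analytically.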

Maximum Likelihood

5 / 17

Estimation of variance (Hessian method)


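$$ V[f] \simeq \sum_k \left( \frac{f(t_k = 1) - f(t_k = -1)}{2} \right)^2 $$

It relies on factorization of $P(\mathbf{a}|\mathrm{data})$ along eigen directions

$$ P(\mathbf{a}|\mathrm{data}) \propto \prod_k \exp\left( -\frac{1}{2} t_k^2 \right) + \mathcal{O}(\Delta a^3) $$

and linear approximation of $f(\mathbf{a})$

$$ \left(f(\mathbf{a}) - E[f]\right)^2 = \left( \sum_k \frac{\partial f}{\partial t_k} t_k \right)^2 + \mathcal{O}(a^3) $$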
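The Hessian error formula can be checked on a toy case where everything is known in closed form. The sketch below assumes a straight-line model fitted to hypothetical data at x values symmetric about zero with equal errors, so that the Hessian is diagonal and the eigen directions coincide with the parameter axes; in a real fit the Hessian has to be diagonalized numerically. The function `hessian_error` returns the square root of the sum over eigen directions.

```python
import math

# Hypothetical toy data: a straight line measured at x values symmetric about 0
# with equal errors, so the Hessian of chi^2 / 2 is diagonal and the eigen
# directions coincide with the parameter axes (a, b).
xs = [-1.0, 0.0, 1.0]
ds = [0.9, 2.1, 2.9]
sigma = 0.1

def line(x, params):
    a, b = params
    return a + b * x

# Best fit from the normal equations (this simple form needs sum(xs) == 0).
best_fit = (sum(ds) / len(ds),
            sum(x * d for x, d in zip(xs, ds)) / sum(x * x for x in xs))

# One-sigma width along each eigen direction: 1 / sqrt(Hessian eigenvalue).
widths = (sigma / math.sqrt(len(xs)),
          sigma / math.sqrt(sum(x * x for x in xs)))

def hessian_error(f, a0, widths, x):
    """sqrt( sum_k ((f(t_k = 1) - f(t_k = -1)) / 2)^2 ) evaluated at x."""
    total = 0.0
    for k, width in enumerate(widths):
        plus, minus = list(a0), list(a0)
        plus[k] += width    # t_k = +1, all other t_j = 0
        minus[k] -= width   # t_k = -1, all other t_j = 0
        total += (0.5 * (f(x, plus) - f(x, minus))) ** 2
    return math.sqrt(total)

for x in (0.0, 0.5, 2.0):
    print(x, line(x, best_fit), hessian_error(line, best_fit, widths, x))
```

For this linear model the result agrees with the exact propagation, sigma² / 3 + x² sigma² / 2 for the variance, which is the situation in which the Hessian assumptions hold strictly.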

Maximum Likelihood

6 / 17

pros
+ Very practical. Most PDF groups use this method
+ It is computationally inexpensive
+ f and its eigen directions can be precalculated/tabulated

cons
+ Assumes local Gaussian approximation of the likelihood
+ Assumes linear approximation of the observables O around $\mathbf{a}_0$
+ The assumptions are strictly valid for linear models.
+ Computation of the Hessian matrix is numerically unstable if flat directions are present

examples

→ if $f(x) = a + bx + cx^2$ then $E[f(x)] = E[a] + E[b]x + E[c]x^2$

→ but if $f(x) = N x^{a} (1 - x)^{b}$ then $E[f(x)] \neq E[N] \, x^{E[a]} (1 - x)^{E[b]}$
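The two examples above can be verified numerically with a handful of equally weighted parameter samples. The sample values below are hypothetical and chosen only to make the effect visible; the non-linear case uses the single factor x^a, which is enough to show that the expectation of f differs from f evaluated at the expected parameters.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical equally weighted samples of the parameters (not from the talk).
a_samples = [0.2, 0.4, 0.6, 0.8, 1.0]
b_samples = [1.0, 3.0, 2.0, 5.0, 4.0]
x = 0.01

# Linear model f(x) = a + b x: averaging f equals f at the averaged parameters.
linear_expect_of_f = mean([a + b * x for a, b in zip(a_samples, b_samples)])
linear_f_of_expect = mean(a_samples) + mean(b_samples) * x

# Non-linear model f(x) = x^a: the two quantities differ.
expect_of_f = mean([x ** a for a in a_samples])
f_of_expect = x ** mean(a_samples)

print(linear_expect_of_f, linear_f_of_expect)
print(expect_of_f, f_of_expect)
```

Since x^a is convex in a, Jensen's inequality guarantees that the averaged function is at least as large as the function of the averaged exponent.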

Monte Carlo Methods

7 / 17

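Recall that we are interested in computing

$$ E[f] = \int d^n a \, P(\mathbf{a}|\mathrm{data}) \, f(\mathbf{a}) $$

$$ V[f] = \int d^n a \, P(\mathbf{a}|\mathrm{data}) \, \left(f(\mathbf{a}) - E[f]\right)^2 $$

Any MC method attempts to do this using MC sampling

$$ E[f] \simeq \sum_k w_k f(\mathbf{a}_k) $$

$$ V[f] \simeq \sum_k w_k \left(f(\mathbf{a}_k) - E[f]\right)^2 $$

i.e. to construct the sample distribution $\{w_k, \mathbf{a}_k\}$ of the parent distribution $P(\mathbf{a}|\mathrm{data})$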
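Once a weighted sample {w_k, a_k} is available, the two sums are straightforward to evaluate. The helper names and the three parameter vectors in this sketch are hypothetical; the weights are assumed to be normalized to one.

```python
def expectation(weights, samples, f):
    """E[f] ~ sum_k w_k f(a_k), with weights normalized to sum to 1."""
    return sum(w * f(a) for w, a in zip(weights, samples))

def variance(weights, samples, f):
    """V[f] ~ sum_k w_k (f(a_k) - E[f])^2."""
    mean = expectation(weights, samples, f)
    return sum(w * (f(a) - mean) ** 2 for w, a in zip(weights, samples))

# Hypothetical weighted sample of parameter vectors a_k = (N, a, b).
weights = [0.25, 0.5, 0.25]
samples = [(1.9, 0.45, 3.1), (2.0, 0.50, 3.0), (2.1, 0.55, 2.9)]

def pdf_at(x):
    """Return f(a) = N x^a (1 - x)^b as a function of the parameters at fixed x."""
    return lambda p: p[0] * x ** p[1] * (1.0 - x) ** p[2]

e_f = expectation(weights, samples, pdf_at(0.3))
v_f = variance(weights, samples, pdf_at(0.3))
print(e_f, v_f ** 0.5)
```

The same two functions apply to any sampling method, since the methods differ only in how the weights and samples are produced.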

Monte Carlo Methods

8 / 17

+ Resampling + cross validation
+ Nested Sampling (NS)
+ Hybrid Markov chain (HMC); Gabin Gbedo, Mangin-Brinet (2017)

[Figure: panels of x∆u+, x∆d+, x(∆u + ∆d), x(∆u − ∆d), x∆s+ and x∆s− versus x, with curves labelled JAM17, JAM15, DSSV09 and JAM17 + SU(3). Caption: resampling + CV, Ethier et al (2017)]

[Figure: panels of $h_1^u$, $h_1^d$ versus x and $zH_1^{\perp(1)}$(fav), $zH_1^{\perp(1)}$(unf) versus z; panel (a) δd versus δu; panel (b) normalized yield versus gT; curves labelled SIDIS and SIDIS+lattice. Caption: Nested Sampling, Lin et al (2018)]

Resampling+cross validation (R+CV)

9 / 17

Resample the data points within quoted uncertainties using Gaussian statistics

$$ d^{(\mathrm{pseudo})}_{k,i} = d^{(\mathrm{exp})}_i + \sigma^{(\mathrm{exp})}_i R_{k,i} $$

Fit each pseudo-data sample $k = 1, \dots, N$ to obtain parameter vectors $\mathbf{a}_k$:

$$ P(\mathbf{a}|\mathrm{data}) \to \{w_k = 1/N, \mathbf{a}_k\} $$

For a large number of parameters, split the data into training and validation sets and find the $\mathbf{a}_k$ that best describes the validation sample

[Diagram: R+CV workflow with boxes labelled sampler, priors, original data, pseudodata, training data, validation data, fit, parameters from minimization steps, validation, posteriors; the posterior is fed back as initial guess for the prior]
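The resampling step can be sketched on a toy problem in which each fit is available in closed form. The example assumes a single constant m measured several times, so that each pseudo-data fit is an inverse-variance weighted mean, and it takes R_{k,i} to be standard normal random numbers. The data values are hypothetical, and the training/validation split is left out of this sketch.

```python
import random

def weighted_mean(values, sigmas):
    """Chi-squared fit of a constant: the inverse-variance weighted mean."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def resample_fits(data, sigmas, n_replicas, seed=0):
    """Fit n_replicas pseudo-data sets d_i + sigma_i * R_ki, with R_ki ~ N(0, 1)."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_replicas):
        pseudo = [d + s * rng.gauss(0.0, 1.0) for d, s in zip(data, sigmas)]
        fits.append(weighted_mean(pseudo, sigmas))
    return fits

# Hypothetical measurements of a single quantity and their quoted uncertainties.
data = [1.02, 0.95, 1.10, 0.99]
sigmas = [0.05, 0.05, 0.10, 0.08]

replicas = resample_fits(data, sigmas, n_replicas=5000)

# Equal weights w_k = 1 / N for every replica.
mean_m = sum(replicas) / len(replicas)
var_m = sum((m - mean_m) ** 2 for m in replicas) / len(replicas)

# Exact result for this linear toy problem, used as a cross-check.
exact_mean = weighted_mean(data, sigmas)
exact_var = 1.0 / sum(1.0 / s ** 2 for s in sigmas)
print(mean_m, exact_mean, var_m, exact_var)
```

For this linear model the replica spread reproduces the exact variance up to statistical fluctuations of the finite replica sample.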

Nested Sampling (NS)

10 / 17

The basic idea: compute

$$ Z = \int L(\mathrm{data}|\mathbf{a}) \, \pi(\mathbf{a}) \, d^n a = \int_0^1 L(X) \, dX $$

+ The procedure collects samples from isolikelihoods and they are weighted by their likelihood values

+ Insensitive to local minima → faithful conversion of

$$ P(\mathbf{a}|\mathrm{data}) \to \{w_k, \mathbf{a}_k\} $$

+ Multiple runs can be combined into one single run → the procedure can be parallelized

[Figure: two panels, $L(\mathrm{data}|\mathbf{a})$ in $\mathbf{a}$ space and $L(X)$ in $X$ space]

- arXiv:astro-ph/0508461v2
- arXiv:astro-ph/0701867v2
- arxiv.org/abs/1703.09701
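A one-dimensional toy version of nested sampling shows how the evidence integral and the weighted sample are built up. Everything specific here is an assumption for illustration: a flat prior on [0, 1], a Gaussian likelihood centred at 0.5 with width 0.1, and the standard estimate X_i = exp(-i / n_live) for the prior volume. Because the likelihood is symmetric and unimodal, the region with likelihood above the current threshold is an interval, so the replacement point can be drawn directly; realistic implementations need a constrained sampler for this step.

```python
import math
import random

CENTER, WIDTH = 0.5, 0.1  # hypothetical toy likelihood parameters

def likelihood(a):
    return math.exp(-0.5 * ((a - CENTER) / WIDTH) ** 2)

def nested_sampling(n_live=400, n_iter=4000, seed=1):
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]  # draws from the flat prior
    dead = []                                     # (unnormalized weight, a)
    x_prev = 1.0
    for i in range(1, n_iter + 1):
        # Lowest-likelihood live point = farthest from the peak (toy-specific).
        worst = max(range(n_live), key=lambda j: abs(live[j] - CENTER))
        radius = abs(live[worst] - CENTER)
        x_i = math.exp(-i / n_live)               # estimated prior volume X_i
        dead.append((likelihood(live[worst]) * (x_prev - x_i), live[worst]))
        x_prev = x_i
        # Replace it by a prior draw restricted to higher likelihood.
        live[worst] = CENTER + radius * (2.0 * rng.random() - 1.0)
    # Share the remaining prior volume among the surviving live points.
    for a in live:
        dead.append((likelihood(a) * x_prev / n_live, a))
    z = sum(w for w, _ in dead)
    return z, [(w / z, a) for w, a in dead]

z_est, posterior = nested_sampling()
z_exact = WIDTH * math.sqrt(2.0 * math.pi) * math.erf(0.5 / (WIDTH * math.sqrt(2.0)))
mean_est = sum(w * a for w, a in posterior)
print(z_est, z_exact, mean_est)
```

The returned list plays the role of {w_k, a_k}: each discarded point carries the weight L_k (X_{k-1} - X_k) / Z, so posterior expectation values follow from the same weighted sums used for any other MC sample.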

Comparison between the methods

11 / 17

Given a likelihood, does the evaluation of $E[f]$ and $V[f]$ depend on the method? → use a numerical stress test

Setup:
+ Simulate synthetic data via rejection sampling
+ Estimate $E[f]$ and $V[f]$ using different methods
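The slides do not spell out how the synthetic data were generated, so the following is only one plausible reading of the rejection-sampling step: drawing x values distributed according to a shape x^a (1 - x)^b with a uniform proposal. The exponents a = 1 and b = 3 are hypothetical, and the peak formula used for the envelope requires a > 0 and b > 0.

```python
import random

def shape(x, a=1.0, b=3.0):
    """Unnormalized target density x^a (1 - x)^b on (0, 1)."""
    return x ** a * (1.0 - x) ** b

def rejection_sample(n, a=1.0, b=3.0, seed=0):
    """Draw n values of x with density proportional to shape(x, a, b)."""
    rng = random.Random(seed)
    g_max = shape(a / (a + b), a, b)  # maximum of the shape (needs a, b > 0)
    accepted = []
    while len(accepted) < n:
        x = rng.random()                        # uniform proposal on [0, 1)
        if rng.random() * g_max < shape(x, a, b):
            accepted.append(x)                  # accept with probability g(x) / g_max
    return accepted

events = rejection_sample(20000)
sample_mean = sum(events) / len(events)
print(sample_mean)
```

With these exponents the target is a Beta(2, 4) distribution, whose mean of 1/3 gives a quick sanity check on the generated sample.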

[Figure: two panels of f(x) versus x, with x on a logarithmic scale from 10^-2 to 1]

Comparison between the methods

12 / 17

[Figure: f(x) versus x from NS; δf(x) versus x for HESS, NS, R and RCV(50/50); ratio (δf/f)/(δf/f)_NS versus training fraction tf at x = 0.1, 0.3, 0.5, 0.7]

HESS, NS and R provide the same uncertainty

R+CV overestimates the uncertainty by roughly a factor of 2

Uncertainties also depend on the training fraction (tf)

The results were also confirmed with a neural net parametrization

Beyond gaussian likelihood

13 / 17

Gaussian likelihoods are not adequate to describe uncertainties in the presence of incompatible data sets.

Example:

+ Two measurements of a quantity m: $(m_1, \delta m_1)$, $(m_2, \delta m_2)$

+ The expectation value and variance can be computed exactly

$$ E[m] = \frac{m_1 \delta m_2^2 + m_2 \delta m_1^2}{\delta m_1^2 + \delta m_2^2} $$

$$ V[m] = \frac{\delta m_1^2 \, \delta m_2^2}{\delta m_1^2 + \delta m_2^2} $$

+ note: $V[m]$ is independent of $|m_1 - m_2|$

To obtain more realistic uncertainties, the likelihood function needs to be modified (e.g. tolerance criterion).
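A short numerical check makes the last point concrete. The function below evaluates the closed-form mean and variance for a product of two Gaussian likelihoods with a flat prior; the measurement values are hypothetical.

```python
def combine_joint(m1, dm1, m2, dm2):
    """Mean and variance of m for a product of two Gaussian likelihoods."""
    denom = dm1 ** 2 + dm2 ** 2
    mean = (m1 * dm2 ** 2 + m2 * dm1 ** 2) / denom
    var = dm1 ** 2 * dm2 ** 2 / denom
    return mean, var

# Hypothetical measurements: same uncertainties, very different separations.
compatible = combine_joint(1.0, 0.1, 1.1, 0.1)
incompatible = combine_joint(1.0, 0.1, 3.0, 0.1)
print(compatible, incompatible)
```

The second pair of measurements is separated by twenty standard deviations, yet the combined variance is identical to that of the compatible pair.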

Likelihood profile in CJ15

14 / 17

[Figure: panels of Likelihood and ∆χ2 over the ranges −100 to 100 and −10 to 10, with curves numbered by data set, and a panel of Projection with points numbered by parameter.
Data sets: (0) TOTAL, (1) HerF2pCut, (2) slac p, (3) d0Lasy13, (4) e866pd06xf, (5) BNS F2nd, (6) NmcRatCor, (7) slac d, (8) D0 Z, (9) H2 NC ep 3, (10) H2 NC ep 2, (11) H2 NC ep 1, (12) H2 NC ep 4, (13) CDF Wasy, (14) H2 CC ep, (15) cdfLasy05, (16) NmcF2pCor, (17) e866pp06xf, (18) H2 CC em, (19) d0run2cone, (20) d0 gamjet1, (21) CDFrun2jet, (22) d0 gamjet3, (23) d0 gamjet2, (24) d0 gamjet4, (25) jl00106F2d, (26) HerF2dCut, (27) BcdF2dCor, (28) CDF Z, (29) D0 Wasy, (30) H2 NC em, (31) jl00106F2p, (32) d0Lasy e15, (33) BcdF2pCor.
Parameters: (0) a1uv, (1) a2uv, (2) a4uv, (3) a1dv, (4) a2dv, (5) a3dv, (6) a4dv, (7) a0ud, (8) a1ud, (9) a2ud, (10) a4ud, (11) a1du, (12) a2du, (13) a4du, (14) a1g, (15) a2g, (16) a3g, (17) a4g, (18) a6dv, (19) off1, (20) off2, (21) ht1, (22) ht2, (23) ht3]

24 parameters, 33 data sets

Eigen direction without incompatibilities

Likelihood profile in CJ15

15 / 17

[Figure: panels of Likelihood and ∆χ2 over the ranges −100 to 100 and −10 to 10, with curves numbered by data set, and a panel of Projection with points numbered by parameter. Legend: (0) TOTAL and data sets (1) HerF2pCut through (33) BcdF2pCor; parameters (0) a1uv through (23) ht3]

24 parameters, 33 data sets

Eigen direction with incompatibilities

Modified likelihood function is needed

Beyond gaussian likelihood

16 / 17

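+ Tolerance criterion (standard choice)
+ Disjoint likelihood function, e.g.

joint:

$$ L(m_1, m_2|m; \delta m_1, \delta m_2) = L(m_1|m; \delta m_1) \, L(m_2|m; \delta m_2) $$

$$ E[m] = \frac{m_1 \delta m_2^2 + m_2 \delta m_1^2}{\delta m_1^2 + \delta m_2^2} \qquad V[m] = \frac{\delta m_1^2 \, \delta m_2^2}{\delta m_1^2 + \delta m_2^2} $$

disjoint:

$$ L(m_1, m_2|m; \delta m_1, \delta m_2) = \frac{1}{2} \left( L(m_1|m; \delta m_1) + L(m_2|m; \delta m_2) \right) $$

$$ E[m] = \frac{1}{2}(m_1 + m_2) \qquad V[m] = \frac{1}{2}\left(\delta m_1^2 + \delta m_2^2\right) + \left( \frac{m_1 - m_2}{2} \right)^2 $$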
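The joint and disjoint results can be cross-checked by integrating the two likelihoods numerically over m with a flat prior. The grid integration, the integration range and the measurement values in this sketch are assumptions made for illustration; each single-measurement likelihood is taken as a normalized Gaussian.

```python
import math

def gauss(m_obs, m, dm):
    """Normalized Gaussian likelihood of one measurement m_obs given m."""
    return math.exp(-0.5 * ((m_obs - m) / dm) ** 2) / (dm * math.sqrt(2.0 * math.pi))

def moments(likelihood, lo, hi, n=20001):
    """E[m] and V[m] for a flat prior on [lo, hi], via a Riemann sum on a grid."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    weights = [likelihood(m) for m in grid]
    z = sum(weights)
    mean = sum(w * m for w, m in zip(weights, grid)) / z
    var = sum(w * (m - mean) ** 2 for w, m in zip(weights, grid)) / z
    return mean, var

# Hypothetical pair of measurements, three standard deviations of m2 apart.
m1, dm1, m2, dm2 = 1.0, 0.1, 1.6, 0.2

def joint(m):
    return gauss(m1, m, dm1) * gauss(m2, m, dm2)

def disjoint(m):
    return 0.5 * (gauss(m1, m, dm1) + gauss(m2, m, dm2))

e_joint, v_joint = moments(joint, -2.0, 5.0)
e_dis, v_dis = moments(disjoint, -2.0, 5.0)
print(e_joint, v_joint)
print(e_dis, v_dis)
```

For these inputs the disjoint variance is dominated by the ((m1 - m2) / 2)² term, which is how the disjoint form reflects the tension between the two measurements.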

Empirical Bayes, hierarchical Bayes ...
Many alternatives still to be explored

Summary and outlook

17 / 17

+ Bayesian formulation for global analysis provides a more general perspective for global fits than the traditional chi-squared minimization

+ MC approaches are useful to explore new likelihood functions and priors

+ Uncertainties on PDFs depend on parametrization as well as assumptions about the likelihood function and the priors

+ Given the likelihood function and priors, uncertainties on PDFs should be independent of the parametrization in the region where PDFs can be constrained

+ Also the results should be independent of the MC sampling method
