
• Bayesian perspective on QCD global analysis

1 / 17

Nobuo Sato, University of Connecticut/JLab
DIS18, Kobe, Japan, April 16-20, 2018

In collaboration with: A. Accardi, E. Nocera, W. Melnitchouk

• Bayesian methodology in a nutshell

2 / 17

In QCD global analysis, PDFs are parametrized at some scale Q0, e.g.

f(x) = N x^a (1 − x)^b (1 + c√x + d x + ...)

f(x) = N x^a (1 − x)^b NN(x; {θ, w_i})

“Fitting” is essentially the estimation of

E[f] = ∫ d^n a P(a|data) f(a)

V[f] = ∫ d^n a P(a|data) (f(a) − E[f])^2

The probability density P is given by Bayes’ theorem

P(a|data) = (1/Z) L(data|a) π(a)

a = (N, a, b, c, d, ...)

• Bayesian methodology in a nutshell

3 / 17

The likelihood function is not unique. A standard choice is the Gaussian likelihood

L(data|a) = exp[ −(1/2) Σ_i ( (d_i − thy_i(a)) / δd_i )^2 ]

Priors are designed to veto unphysical regions of parameter space, e.g.

π(a) = ∏_i θ(a_i − a_i^min) θ(a_i^max − a_i)

How do we compute E[f], V[f]?
+ Maximum likelihood
+ Monte Carlo methods
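The Gaussian likelihood and box prior above combine into an unnormalized log-posterior. A minimal Python sketch, assuming illustrative names (`theory_fn`, `log_posterior` and the function signatures are not from the talk):

```python
import numpy as np

def log_likelihood(theory, data, errors):
    """Gaussian log-likelihood: -(1/2) * sum_i ((d_i - thy_i)/deltad_i)^2."""
    residuals = (data - theory) / errors
    return -0.5 * np.sum(residuals**2)

def log_prior(a, a_min, a_max):
    """Box prior: product of step functions vetoing points outside [a_min, a_max]."""
    inside = np.all((a >= a_min) & (a <= a_max))
    return 0.0 if inside else -np.inf

def log_posterior(a, theory_fn, data, errors, a_min, a_max):
    """Unnormalized log P(a|data) = log L + log pi (normalization Z dropped)."""
    lp = log_prior(a, a_min, a_max)
    if not np.isfinite(lp):
        return -np.inf  # vetoed region: skip the (possibly expensive) theory call
    return lp + log_likelihood(theory_fn(a), data, errors)
```

Working with log-probabilities avoids underflow when many data points enter the product.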

• Maximum Likelihood

4 / 17

Estimation of expectation value

E[f] = ∫ d^n a P(a|data) f(a) ≈ f(a0)

a0 is estimated with an optimization algorithm

max[ P(a|data) ] = P(a0|data)

max[ L(data|a) π(a) ] = L(data|a0) π(a0)

or equivalently chi-squared minimization

min[ −2 log(L(data|a) π(a)) ] = −2 log(L(data|a0) π(a0))

                              = Σ_i ( (d_i − thy_i(a0)) / δd_i )^2 − 2 log π(a0)

                              = χ^2(a0) − 2 log π(a0)
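The χ² minimization can be sketched on a toy version of the parametrization from slide 2. The data, errors and starting point below are invented for illustration; `scipy.optimize.minimize` stands in for whatever optimizer a real analysis uses:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data drawn from f(x) = N x^a (1 - x)^b with known parameters.
x = np.linspace(0.05, 0.9, 30)
true = np.array([2.0, 0.7, 3.0])                 # (N, a, b)

def model(p, x):
    N, a, b = p
    return N * x**a * (1 - x)**b

err = 0.02 * np.ones_like(x)
data = model(true, x) + err * rng.standard_normal(x.size)

def chi2(p):
    """chi^2(a) = sum_i ((d_i - thy_i(a)) / deltad_i)^2; a flat prior drops out."""
    return np.sum(((data - model(p, x)) / err) ** 2)

res = minimize(chi2, x0=np.array([1.0, 0.5, 2.0]), method="Nelder-Mead")
a0 = res.x   # maximum-likelihood estimate; E[f] ~ f(a0)
```

Gradient-free Nelder-Mead is used here only because the toy χ² is cheap; it always returns a point no worse than the starting guess.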

• Maximum Likelihood

5 / 17

Estimation of variance (Hessian method)

V[f] = ∫ d^n a P(a|data) (f(a) − E[f])^2

     ≈ Σ_k ( (f(t_k = 1) − f(t_k = −1)) / 2 )^2

It relies on factorization of P(a|data) along eigen-directions

P(a|data) ∝ ∏_k exp( −(1/2) t_k^2 ) + O(∆a^3)

and a linear approximation of f(a)

(f(a) − E[f])^2 = ( Σ_k (∂f/∂t_k) t_k )^2 + O(a^3)

• Maximum Likelihood

6 / 17

pros
+ Very practical; most PDF groups use this method
+ Computationally inexpensive
+ f and its eigen-directions can be precalculated/tabulated

cons
+ Assumes a local Gaussian approximation of the likelihood
+ Assumes a linear approximation of the observables O around a0
+ The assumptions are strictly valid only for linear models
+ Computation of the Hessian matrix is numerically unstable if flat directions are present

examples
→ if f(x) = a + b x + c x^2, then E[f(x)] = E[a] + E[b] x + E[c] x^2
→ but if f(x) = N x^a (1 − x)^b, then E[f(x)] ≠ E[N] x^{E[a]} (1 − x)^{E[b]}

• Monte Carlo Methods

7 / 17

Recall that we are interested in computing

E[f] = ∫ d^n a P(a|data) f(a)

V[f] = ∫ d^n a P(a|data) (f(a) − E[f])^2

Any MC method attempts to do this using MC sampling

E[f] ≈ Σ_k w_k f(a_k)

V[f] ≈ Σ_k w_k (f(a_k) − E[f])^2

i.e. to construct the sample distribution {w_k, a_k} of the parent distribution P(a|data)
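The weighted-sample estimators are a few lines of code. A minimal sketch, with toy standard-normal draws standing in for actual samples of P(a|data):

```python
import numpy as np

def mc_expectation(weights, samples, f):
    """E[f] ~ sum_k w_k f(a_k) and V[f] ~ sum_k w_k (f(a_k) - E[f])^2."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                         # ensure the weights sum to one
    fk = np.array([f(a) for a in samples])
    mean = np.sum(w * fk)
    var = np.sum(w * (fk - mean) ** 2)
    return mean, var

# Toy parent distribution: standard normal, f the identity,
# so E[f] -> 0 and V[f] -> 1 as the sample grows.
rng = np.random.default_rng(1)
samples = rng.standard_normal(100_000)
mean, var = mc_expectation(np.ones(samples.size), samples, lambda a: a)
```

Equal weights w_k = 1/N correspond to resampling-style samples; nested sampling instead supplies likelihood-dependent weights.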

• Monte Carlo Methods

8 / 17

+ Resampling + cross validation
+ Nested Sampling (NS)
+ Hybrid Markov chain (HMC); Gabin Gbedo, Mangin-Brinet (2017)

[Figure: helicity PDF panels x∆u+, x∆d+, x(∆ū + ∆d̄), x(∆ū − ∆d̄), x∆s+ and x∆s− vs x, comparing JAM17, JAM15, DSSV09 and JAM17 + SU(3)]

resampling + CV, Ethier et al (2017)

[Figure: transversity h1u, h1d vs x; Collins functions zH1⊥(1)(fav), zH1⊥(1)(unf) vs z; (a) δu–δd plane and (b) normalized yield of gT, for SIDIS vs SIDIS+lattice]

Nested Sampling, Lin et al (2018)

• Resampling+cross validation (R+CV)

9 / 17

Resample the data points within quoted uncertainties using Gaussian statistics

d_{k,i}^(pseudo) = d_i^(exp) + σ_i^(exp) R_{k,i}

Fit each pseudo-data sample k = 1, ..., N to obtain parameter vectors a_k:

P(a|data) → {w_k = 1/N, a_k}

For a large number of parameters, split the data into training and validation sets and find the a_k that best describes the validation sample

[Workflow diagram: a sampler with priors turns the original data into pseudo data; each pseudo-data set is split into training and validation data; parameters from the minimization steps on the training data are checked against the validation data to select the posteriors; the posterior serves as prior / initial guess for subsequent fits]
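The resampling step can be sketched on a toy linear "experiment" (all numbers invented; `np.polyfit` stands in for the actual fitter, and the cross-validation split is omitted since this toy has only two parameters):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy experiment: straight-line truth y = 1 + 2x with Gaussian errors.
x = np.linspace(0.0, 1.0, 20)
sigma = 0.1 * np.ones_like(x)
data = 1.0 + 2.0 * x + sigma * rng.standard_normal(x.size)

# Resample: d^(pseudo)_{k,i} = d^(exp)_i + sigma^(exp)_i R_{k,i}, R ~ N(0,1),
# then fit each replica; the set {w_k = 1/N, a_k} samples P(a|data).
nrep = 200
replicas = []
for _ in range(nrep):
    pseudo = data + sigma * rng.standard_normal(x.size)
    slope, intercept = np.polyfit(x, pseudo, 1)   # equal errors: plain LSQ fit
    replicas.append((slope, intercept))
replicas = np.array(replicas)

mean_slope = replicas[:, 0].mean()   # E[slope] from the replica ensemble
err_slope = replicas[:, 0].std()     # its uncertainty, no Hessian needed
```

Expectation values of any observable follow by evaluating it on every replica and averaging, exactly the MC estimators of slide 7.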

• Nested Sampling (NS)

10 / 17

The basic idea: compute

Z = ∫ L(data|a) π(a) d^n a = ∫_0^1 L(X) dX

+ The procedure collects samples from iso-likelihood contours, weighted by their likelihood values

+ Insensitive to local minima → faithful conversion of P(a|data) → {w_k, a_k}

+ Multiple runs can be combined into a single run → the procedure can be parallelized

[Figure: L(data|a) in a-space mapped onto L(X) in X-space]

- arXiv:astro-ph/0508461v2 - arXiv:astro-ph/0701867v2 - arxiv.org/abs/1703.09701
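A minimal 1D nested-sampling loop makes the X-space integral concrete. This sketch draws the constrained replacement point by simple rejection from the prior, which production implementations replace with smarter proposals; all settings below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# 1D toy: uniform prior on [-5, 5], likelihood L(x) = exp(-x^2/2).
# Exact evidence Z = (1/10) * ∫ exp(-x^2/2) dx = sqrt(2*pi)/10 ≈ 0.2507.
lo, hi = -5.0, 5.0

def L(x):
    return np.exp(-0.5 * x * x)

nlive, niter = 400, 1500
live = rng.uniform(lo, hi, nlive)       # live points drawn from the prior
Z, X_prev = 0.0, 1.0
for i in range(1, niter + 1):
    worst = np.argmin(L(live))
    L_min = L(live[worst])
    X = np.exp(-i / nlive)              # prior volume shrinks geometrically
    Z += L_min * (X_prev - X)           # weight each shell by its likelihood
    X_prev = X
    while True:                         # replace the worst point by a prior
        x_new = rng.uniform(lo, hi)     # draw inside the iso-likelihood contour
        if L(x_new) > L_min:
            live[worst] = x_new
            break
Z += X_prev * L(live).mean()            # remaining live points close the sum
```

The discarded points, weighted by w_k ∝ L_min (X_prev − X) / Z, are exactly the posterior sample {w_k, a_k} quoted on the slide.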

• Comparison between the methods

11 / 17

Given a likelihood, does the evaluation of E[f] and V[f] depend on the method? → use a stress-testing numerical example

Setup:
+ Simulate synthetic data via rejection sampling
+ Estimate E[f] and V[f] using the different methods

[Figure: synthetic f(x) samples vs x on a logarithmic x scale]

• Comparison between the methods

12 / 17

[Figure: f(x) vs x for NS; relative uncertainty δf(x) vs x for HESS, NS, R and RCV(50/50); ratio (δf/f)/(δf/f)_NS vs training fraction tf at x = 0.1, 0.3, 0.5, 0.7]

HESS, NS and R provide the same uncertainty

R+CV overestimates the uncertainty by roughly a factor of 2

The uncertainties also depend on the training fraction (tf)

The results are also confirmed with a neural-net parametrization

• Beyond Gaussian likelihood

13 / 17

Gaussian likelihoods are not adequate to describe uncertainties in the presence of incompatible data sets. Example:

+ Two measurements of a quantity m: (m1, δm1), (m2, δm2)

+ The expectation value and variance can be computed exactly

E[m] = (m1 δm2^2 + m2 δm1^2) / (δm2^2 + δm1^2)

V[m] = (δm2^2 δm1^2) / (δm2^2 + δm1^2)

+ Note: V[m] is independent of |m1 − m2|

To obtain more realistic uncertainties, the likelihood function needs to be modified (e.g. tolerance criterion)

• Likelihood profile in CJ15

14 / 17

[Figure: likelihood profiles (top) and ∆χ² profiles (bottom), over the ranges (−100, 100) and zoomed to (−10, 10), with numbered curves labeled by data set:]

(0) TOTAL, (1) HerF2pCut, (2) slac p, (3) d0Lasy13, (4) e866pd06xf, (5) BNS F2nd, (6) NmcRatCor, (7) slac d, (8) D0 Z, (9) H2 NC ep 3, (10) H2 NC ep 2, (11) H2 NC ep 1, (12) H2 NC ep 4, (13) CDF Wasy, (14) H2 CC ep, (15) cdfLasy05, (16) NmcF2pCor, (17) e866pp06xf, (18) H2 CC em, (19) d0run2cone, (20) d0 gamjet1, (21) CDFrun2jet, (22) d0 gamjet3, (23) d0 gamjet2, (24) d0 gamjet4, (25) jl00106F2d, (