Bayesian perspective on QCD global analysis...1/17 Bayesian perspective on QCD global analysis Nobuo...
Transcript of Bayesian perspective on QCD global analysis...1/17 Bayesian perspective on QCD global analysis Nobuo...
1 / 17
Bayesian perspective on QCD global analysis
Nobuo SatoUniversity of Connecticut/JLabDIS18,Kobe, Japan,April 16-20, 2018
In collaboration with:A. AccardiE. NoceraW. Melnitchouk
Bayesian methodology in a nutshell
2 / 17
In QCD global analysis PDFs are parametrized at somescale Q0. e.g.
f(x) = Nxa(1− x)b(1 + c√x+ dx+ ...)
f(x) = Nxa(1− x)bNN(x; {θ, wi})
“fitting” is essentially estimation of
E[f ] =∫dna P(a|data) f(a)
V[f ] =∫dna P(a|data) (f(a)− E[f ])2
The probability density P is given by the Bayes’ theorem
P(f |data) = 1ZL(data|f)π(f)
a = (N, a, b, c, d, ...)
Bayesian methodology in a nutshell
3 / 17
The likelihood function is not unique. A standard choiceis the Gaussian likelihood
L(d|a) = exp[−1
2∑
i
(di − thyi(a)
δdi
)2]
Priors are design to veto unphysical regions in parameterspace. e.g.
π(a) =∏
i
θ(ai − amini )θ(amax
i − ai)
How do we compute E[f ],V[f ]?+ Maximum likelihood+ Monte Carlo methods
Maximum Likelihood
4 / 17
Estimation of expectation value
E[f ] =∫dna P(a|data) f(a) ' f(a0)
a0 is estimated from optimization algorithm
max [P(a|data)] = P(a0|data)max [L(data|a)π(a)] = L(data|a0)π(a0)
or equivalently Chi-squared minimization
min [−2 log (L(data|a)π(a))] = −2 log (L(data|a0)π(a0))
=∑
i
(di − thyi(a0)
δdi
)2− 2 log (π(a0))
= χ2(a0)− 2 log (π(a0))
Maximum Likelihood
5 / 17
Estimation of variance (Hessian method)
V[f ] =∫dna P(a|data) (f(a)− E[f ])2
'∑
k
(f(tk = 1)− f(tk = −1)
2
)2
It relies on factorization of P(a|data) along eigendirections
P(a|data) ∝∏k
exp(−1
2 t2k
)+O(∆a3)
and linear approximation of f(a)
(f(a)− E[f ])2 =(∑
k
∂f
∂tktk
)2
+O(a3)
Maximum Likelihood
6 / 17
pros+ Very practical. Most PDF groups use this method+ It is computationally inexpensive+ f and its eigen directions can be precalculated/tabulated
cons+ Assumes local Gaussian approximation of the likelihood+ Assumes linear approximation of the observables O around a0
+ The assumptions are strictly valid for linear models.+ Computation of the Hessian matrix is numerically unstable if flat
directions are presentexamples
→ if f(x) = a+ bx+ cx2 then E[f(x)] = E[a] + E[b]x+ E[c]x2
→ but f(x) = Nxa(1− x)b then E[f(x)] 6= E[N ]xE[a](1− x)E[b]
Monte Carlo Methods
7 / 17
Recall that we are interested in computing
E[f ] =∫dna P(a|data) f(a)
V[f ] =∫dna P(a|data) (f(a)− E[f ])2
Any MC method attempts to do this using MC sampling
E[f ] '∑
k
wkf(ak)
V[f ] '∑
k
wk(f(ak)− E[f ])2
i.e to construct the sample distribution {wk,ak} of theparent distribution P(a|data)
Monte Carlo Methods
8 / 17
Resampling + cross validationNested Sampling (NS)Hybrid Markov chain (HMC); Gabin Gbedo, Mangin-Brinet (2017)
0 0.2 0.4 0.6 0.8 10
0.1
0.2
0.3
0.4 x∆u+
JAM17
JAM15
0 0.2 0.4 0.6 0.8 1
−0.15
−0.10
−0.05
0 x∆d+
0.4 0.810−3 10−2 10−1
−0.04
−0.02
0
0.02
0.04 x(∆u + ∆d)
DSSV09
0.4 0.810−3 10−2 10−1
−0.04
−0.02
0
0.02
0.04
x(∆u−∆d)
0.4 0.8x
10−3 10−2 10−1
−0.04
−0.02
0
0.02
0.04 x∆s+
JAM17 + SU(3)0.4 0.8
x10−3 10−2 10−1
−0.1
−0.05
0
0.05
0.1 x∆s−
resampling + CV, Ethier et al (2017)
0 0.2 0.4 0.6 x–3
–2
–1
0
1 hu1
hd1
0.2 0.4 0.6 z–0.4
–0.2
0
0.2
0.4 zH⊥(1)1(fav)
zH⊥(1)1(unf)
0 0.2 0.4δu
–1.2
–0.8
–0.4
0
δd SIDIS
SIDIS+lattice
(a)
0 0.5 1 gT0
2
4
6
norm
alize
dyie
ld (b)SIDIS+lattice
SIDIS
Nested Sampling, Lin et al (2018)
Resampling+cross validation (R+CV)
9 / 17
Resample the data points withinquoted uncertainties using Gaussianstatistics
d(pseudo)k,i = d
(exp)i + σ
(exp)i Rk,i
Fit each pseudo data samplek = 1, .., N to obtain parametervectors ak:
P(a|data)→ {wk = 1/N,ak}
For large number of parameters, splitthe data into tranining and validationsets and find ak that best describesthe validation sample
sampler priors
fit
fit
fit
posteriors
original data
pseudodata
trainingdata
fit
parameters fromminimization steps
validationdata
validation
posterior
as initialguess
prior
Nested Sampling (NS)
10 / 17
The basic idea: compute
Z =∫L(data|a)π(a)dna =
∫ 1
0L(X)dX
+ The procedure collects samples fromisolikelihoods and they are weighted by theirlikelihood values
+ Insensitive to local minima → faithfulconversion of
P(a|data)→ {wk,ak}
+ Multiple runs can be combined into onesingle run → the procedure can beparallelized
L(data|a) in a space
L(X) in X space
- arXiv:astro-ph/0508461v2- arXiv:astro-ph/0701867v2- arxiv.org/abs/1703.09701
Comparison between the methods
11 / 17
Given a likelihood, does the evaluation of E[f ] and V[f ]depend on the method? → use stress testing numericalexample
Setup:+ Simulate a synthetic data via rejection sampling+ Estimate E[f ] and V[f ] using different methods
10−2 10−1 100
x
0
1
2
3
f(x
)
10−2 10−1 100
x
0
1
2
3
f(x
)
Comparison between the methods
12 / 17
10−3 10−2 10−1 100
x
0
1
2
3
f(x
)
NS
0.00 0.25 0.50 0.75 1.00
x
0.00
0.01
0.02
0.03
0.04
0.05
δf(x
)
HESS
NS
R
RCV(50/50)
40 50 60 70 80
tf
0.6
0.8
1.0
1.2
1.4
(δf/f
)/(δf/f
) NS
x = 0.1
x = 0.3
x = 0.5
x = 0.7
HESS, NS and R provide the sameuncertainty
R+CV over estimates theuncertainty by roughly a factor of 2
Uncertainties also depends ontraining fraction (tf)
The results confirmed also within aneural net parametrization
Beyond gaussian likelihood
13 / 17
The Gaussian likelihoods are not adequate to describeuncertainties in the presence of incompatible data setsExample:
+ Two measurements of a quantity m:(m1, δm1), (m2, δm2)
+ The expectation value and variance can be computedexactly
E[m] = m1δm2 +m2δm1
δm22 + δm2
1
V[m] = δm22δm
21
δm22 + δm2
1
+ note: V[m] is independent of |m1 −m2|To obtain more realistic uncertainties, the likelihoodfunction needs to be modified. (e.g. Tolerance criterion)
Likelihood profile in CJ15
14 / 17
−100 −50 0 50 1000.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40
Lik
elih
ood
1
23
4
5
67
8
9
10
11
12
13
141516
1718
1920 212223
2425 26
27
28
29
3031
3233
34
−100 −50 0 50 1000
2000
4000
6000
8000
10000
∆χ
2
1
23
4
5
67
8
91011
12
13141516
1718
1920212223242526
27
28
29
3031323334
−10 −5 0 5 100.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40
Lik
elih
ood
1
23
5 812
17 18 27
28
29
3031
34
−10 −5 0 5 100
20
40
60
80
100
∆χ
2
(0) TOTAL
(1) HerF2pCut
(2) slac p
(3) d0Lasy13
(4) e866pd06xf
(5) BNS F2nd
(6) NmcRatCor
(7) slac d
(8) D0 Z
(9) H2 NC ep 3
(10) H2 NC ep 2
(11) H2 NC ep 1
(12) H2 NC ep 4
(13) CDF Wasy
(14) H2 CC ep
(15) cdfLasy05
(16) NmcF2pCor
(17) e866pp06xf
(18) H2 CC em
(19) d0run2cone
(20) d0 gamjet1
(21) CDFrun2jet
(22) d0 gamjet3
(23) d0 gamjet2
(24) d0 gamjet4
(25) jl00106F2d
(26) HerF2dCut
(27) BcdF2dCor
(28) CDF Z
(29) D0 Wasy
(30) H2 NC em
(31) jl00106F2p
(32) d0Lasy e15
(33) BcdF2pCor−1.0
−0.8
−0.6
−0.4
−0.2
0.0
0.2
0.4
0.6
Pro
ject
ion
0
1
23
4
5 6
7
8
9
10 11 12 13 14
15
16 17
18
19
20
21 22 23
(0) a1uv(1) a2uv(2) a4uv(3) a1dv(4) a2dv(5) a3dv(6) a4dv(7) a0ud(8) a1ud(9) a2ud(10) a4ud(11) a1du
(12) a2du(13) a4du(14) a1g(15) a2g(16) a3g(17) a4g(18) a6dv(19) off1(20) off2(21) ht1(22) ht2(23) ht3
24 parameters,33 data sets
Eigen directionwithoutincompatibilities
Likelihood profile in CJ15
15 / 17
−100 −50 0 50 1000.0
0.1
0.2
0.3
0.4
0.5
0.6
Lik
elih
ood
1
2
3
4
5
6
7
8
910
11
12
13
14
1516
1718
19
20
21
22
2324
25
26
2728
29
3031
32
33
34
−100 −50 0 50 1000
2000
4000
6000
8000
10000
∆χ
2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
1920
2122
232425
26
27
28
29
30
31
32
33
34
−10 −5 0 5 100.0
0.1
0.2
0.3
0.4
0.5
0.6
Lik
elih
ood
1
2
3
4
5
6
7
8
910
11
12
13
14
16
1718
19
2022
23
26
2728
29
3031
32
33
34
−10 −5 0 5 100
20
40
60
80
100
∆χ
2
(0) TOTAL
(1) HerF2pCut
(2) slac p
(3) d0Lasy13
(4) e866pd06xf
(5) BNS F2nd
(6) NmcRatCor
(7) slac d
(8) D0 Z
(9) H2 NC ep 3
(10) H2 NC ep 2
(11) H2 NC ep 1
(12) H2 NC ep 4
(13) CDF Wasy
(14) H2 CC ep
(15) cdfLasy05
(16) NmcF2pCor
(17) e866pp06xf
(18) H2 CC em
(19) d0run2cone
(20) d0 gamjet1
(21) CDFrun2jet
(22) d0 gamjet3
(23) d0 gamjet2
(24) d0 gamjet4
(25) jl00106F2d
(26) HerF2dCut
(27) BcdF2dCor
(28) CDF Z
(29) D0 Wasy
(30) H2 NC em
(31) jl00106F2p
(32) d0Lasy e15
(33) BcdF2pCor−0.8
−0.6
−0.4
−0.2
0.0
0.2
0.4
0.6
0.8
Pro
ject
ion
0
1
2
3
4
5
6
7 89
10
1112
13 14
15
16
17
18
19
20
21
22
23
(0) a1uv(1) a2uv(2) a4uv(3) a1dv(4) a2dv(5) a3dv(6) a4dv(7) a0ud(8) a1ud(9) a2ud(10) a4ud(11) a1du
(12) a2du(13) a4du(14) a1g(15) a2g(16) a3g(17) a4g(18) a6dv(19) off1(20) off2(21) ht1(22) ht2(23) ht3
24 parameters,33 data sets
Eigen direction withincompatibilities
Modified likelihoodfunction is needed
Beyond gaussian likelihood
16 / 17
Tolerance criterion (standard choice)Disjoint likelihood function. e.g.joint:
L(m1,m2|m; δm1δm2) = L(m1|m; δm1)L(m2|m; δm2)
E[m] = m1δm2 +m2δm1δm2
2 + δm21
V[m] = δm22δm
21
δm22 + δm2
1
disjoint:
L(m1,m2|m; δm1δm2) = 12(L(m1|m; δm1) + L(m2|m; δm2))
E[m] = 12(m1 +m2) V[m] = 1
2(δm21 + δm2
2) +(m1 −m2
2
)2
Empirical Bayes, hierarchical Bayes ...Many alternatives still to be explored
Summary and outlook
17 / 17
+ Bayesian formulation for global analysis provides a more generalperspective for global fits than the traditional chi-squaredminimization
+ MC approaches are useful to explore new likelihood functions andpriors
+ Uncertainties on PDFs depend on parametrization as well asassumptions about the likelihood function and the priors
+ Given the likelihood function and priors, uncertainties on PDFsshould be independent of the parametrization in the region wherePDFs can be constrained
+ Also the results should be independent of the MC sampling method