Optimisation of the B+ → K+π−π+γ selection at LHCb

Cyrille Praz

Master’s thesis

Directed by Prof. Dr. Olivier Schneider

Supervised by Dr. Preema Rennee Pais

22.06.2018

Abstract

The rare b → sγ radiative electroweak transition is a powerful probe of physics

beyond the Standard Model. This document presents a selection of B± → K±π∓π±γ

candidates in LHCb Run 2 data samples collected at a centre-of-mass energy of 13 TeV.

The selection uses a cut-based strategy followed by a multivariate analysis and a

characterisation of the signal and background sources. Approximately 3’000, 18’000

and 18’000 B± → K±π∓π±γ decays are selected in 2015, 2016 and 2017 data samples

corresponding to integrated luminosities of 0.29, 1.64 and 1.71 fb−1, respectively.

Contents

1 Introduction

2 Theoretical background

2.1 The Standard Model of particle physics

2.2 Radiative B decays in the Standard Model

2.3 Measurement of the photon polarisation parameter

3 The LHCb experiment

4 Optimisation of the B+ → K+π−π+γ selection

4.1 Data samples

4.2 Stripping selection

4.3 Trigger lines

4.4 Cut-based strategy

4.5 Multivariate analysis

4.5.1 XGBoost

4.5.2 Training

4.5.3 Results and choice of the final cut

5 Study of the B+ → K+π−π+γ signal

5.1 Signal study

5.2 Background study

5.2.1 Combinatorial background

5.2.2 Partially reconstructed b-hadron background

5.2.3 Peaking backgrounds

5.3 Mass fit

6 Conclusion and outlook

A Appendix

A.1 Uncertainty on efficiency

A.2 Background coming from B+ → D0ρ+ decays

A.3 2015 and 2017 data

1 Introduction

In the last decades, particle physics experiments involving hundreds or even thousands

of scientists have drastically improved our understanding of how Nature works. So far,

the vast majority of results obtained are compatible with the predictions of the Standard

Model of particle physics (SM), which describes with high precision the elementary par-

ticles and their fundamental interactions. However, one knows that the SM is not yet the

theory of everything. In particular, large scale phenomena such as gravity and dark matter

are not included in the SM. In the latter case, whereas many observations have already

provided evidence for the existence of dark matter [1–3], its nature remains unknown.

Many extensions of the SM have been developed, and one of the roles of experimental

particle physics is to test and constrain these new models. The photon polarisation in

the rare b → sγ transition is very sensitive to potential New Physics (NP) effects: in the

SM, this transition is not allowed at tree level and it implies that the photon is predicted

to be mostly left-handed, because the W boson appearing in the electroweak penguin

loop couples only to a left-handed s quark. New particles could appear at loop level and

enhance the right-handed component of the photon polarisation.

By studying the rare B+ → K+res(→ K+π−π+)γ decay, which has three pseudoscalar

mesons in its final state, one can access the polarisation of the photon by using its direction

with respect to the plane defined by the momenta of the three mesons in the rest frame

of the kaon resonance [4–6]. Several experimental studies of this decay have already been

conducted with data collected during the first run (Run 1) of the LHC [7–9] resulting in

the first observation of a non-zero photon polarisation [10].

This thesis documents the selection of B+ → K+π−π+γ candidates¹ in data samples

collected with the LHCb detector at a centre-of-mass energy of 13 TeV in 2015, 2016 and

2017, corresponding to integrated luminosities of 0.29, 1.64 and 1.71 fb−1 respectively.

Section 2 gives a theoretical background, Sec. 3 describes briefly the main components

of the LHCb detector, Sec. 4 explains the cut-based and multivariate analysis strategies

used to select the signal decays and Sec. 5 presents a study of the signal and background

sources and the obtained results. Several strategies presented in this study are based on

what was done for Run 1 data [7–11].

¹ Unless explicitly stated otherwise, the charge-conjugate process is implied throughout this document.

2 Theoretical background

This section explains the theoretical motivation to measure the photon polarisation in the

rare b → sγ transition. Sections 2.1 and 2.2 present a quick overview of the Standard

Model of particle physics and its prediction about the photon polarisation, and Sec. 2.3

introduces two methods to investigate the photon polarisation.

2.1 The Standard Model of particle physics

The Standard Model of particle physics (SM) unifies in a single framework most of the

current knowledge that we have about the fundamental particles and their interactions.

As a relativistic quantum field theory, the SM is based on the symmetry group

$$SU(3)_C \otimes SU(2)_L \otimes U(1)_{Y_W}, \tag{1}$$

where C stands for the color charge, L the left chirality and YW the weak hypercharge.

SU(3)C is the symmetry group of the strong interaction, described by Quantum Chromo-

dynamics (QCD), whereas SU(2)L ⊗ U(1)YW is the symmetry group of the Electroweak

theory (EW), which describes the electromagnetic and the weak interactions [12, 13]. Figure 1 summarises the 17 elementary particles² of the SM. The matter particles (fermions) are 6 quarks and 6 leptons divided into 3 generations. In addition, there are 5 bosons: 4 gauge bosons, which mediate the fundamental interactions, and the Higgs boson.

As stated in the introduction, the weak interaction is of particular interest in this

study, because it allows for the b→ sγ transition. The charged-current weak interaction,

mediated by the W boson, acts only on the left-handed weak doublets, listed in Table 1.

No transition is possible between the generations of leptons. However, in the quark sector,

the mass eigenstates do not coincide with the weak eigenstates, allowing for transitions

between the generations. The weak eigenstates d′, s′, b′ can be obtained from the mass

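eigenstates d, s, b through the Cabibbo-Kobayashi-Maskawa (CKM) matrix [15]:

$$\begin{pmatrix} d' \\ s' \\ b' \end{pmatrix} = \begin{pmatrix} V_{ud} & V_{us} & V_{ub} \\ V_{cd} & V_{cs} & V_{cb} \\ V_{td} & V_{ts} & V_{tb} \end{pmatrix} \begin{pmatrix} d \\ s \\ b \end{pmatrix}. \tag{2}$$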

In the SM, the neutral mediator of the weak interaction, Z0, does not couple different

generations of quarks. In particular, this implies that the b→ sγ transition is not possible

at the tree level and can occur only through more complex diagrams, such as the one depicted in Fig. 2.

² By taking into account the anti-particles and the color charge, one obtains a total of 61 elementary particles (6 × 2 × 3 quarks, 6 × 2 leptons, 8 gluons and 5 other bosons).

[Figure 1 image: chart of the SM particles. Quarks: up (u, 2.3 MeV), down (d, 4.8 MeV), charm (c, 1.28 GeV), strange (s, 95 MeV), top (t, 173.2 GeV), bottom (b, 4.7 GeV). Leptons: electron (e, 511 keV), e neutrino (νe, < 2 eV), muon (µ, 105.7 MeV), µ neutrino (νµ, < 190 keV), tau (τ, 1.777 GeV), τ neutrino (ντ, < 18.2 MeV). Bosons: W± (80.4 GeV), Z (91.2 GeV), photon (γ), gluon (g), Higgs (H, 125.1 GeV). The 12 fermions (+12 anti-fermions) are arranged in 3 generations of increasing mass.]

Figure 1: Components of the Standard Model of particle physics. The masses are given in units where c = 1. Figure adapted from Ref. [14].

Figure 2: Feynman diagram for the rare b → sγ transition [16].

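Table 1: Weak isospin doublets.

Leptons: $\begin{pmatrix} \nu_e \\ e^- \end{pmatrix}_L$, $\begin{pmatrix} \nu_\mu \\ \mu^- \end{pmatrix}_L$, $\begin{pmatrix} \nu_\tau \\ \tau^- \end{pmatrix}_L$

Quarks: $\begin{pmatrix} u \\ d' \end{pmatrix}_L$, $\begin{pmatrix} c \\ s' \end{pmatrix}_L$, $\begin{pmatrix} t \\ b' \end{pmatrix}_L$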

2.2 Radiative B decays in the Standard Model

Following M. Gronau and D. Pirjol [5], the effective weak radiative Hamiltonian describing

the b → sγ transition is given by

$$\mathcal{H}^{\text{eff}}_{\text{rad}} = -\frac{4G_F}{\sqrt{2}} V_{tb}V_{ts}^{*} \left[ C_{7R}(m_b)\,O_{7R}(m_b) + C_{7L}(m_b)\,O_{7L}(m_b) \right], \tag{3}$$

where $G_F$ is the Fermi constant, V denotes the CKM matrix, $C_{7L,R}$ are effective Wilson coefficients [6], $m_b$ the mass of the b quark [6] and $O_{7L,R}$ the electromagnetic penguin operators corresponding to a left- or right-handed photon respectively. These operators follow

$$O_{7R} \propto \bar{s}_R \sigma_{\mu\nu} b_L F^{\mu\nu}, \qquad O_{7L} \propto \bar{s}_L \sigma_{\mu\nu} b_R F^{\mu\nu}, \tag{4}$$

with $\sigma_{\mu\nu} = \frac{i}{2}[\gamma_\mu, \gamma_\nu]$ and $F^{\mu\nu}$ the electromagnetic tensor. In the SM, the Wilson coefficients are such that

$$\frac{|C_{7R}|}{|C_{7L}|} \approx \frac{m_s}{m_b} \approx 0.02, \tag{5}$$

which implies that the photon is predominantly left-handed in a b → sγ transition.

When considering the decay $\bar{B}(b\bar{q}) \to \bar{K}^{(i)}_{\text{res}}(s\bar{q})\gamma$, the photon polarisation parameter $\lambda_\gamma$ is defined as

$$\lambda^{(i)}_\gamma := \frac{|c^{(i)}_R|^2 - |c^{(i)}_L|^2}{|c^{(i)}_R|^2 + |c^{(i)}_L|^2}, \tag{6}$$

where

$$c^{(i)}_{L,R} := \mathcal{M}(\bar{B} \to \bar{K}^{(i)}_{\text{res}}\gamma_{L,R}) \tag{7}$$

denotes the weak radiative amplitude for the resonance (i). It can be shown [5] that the amplitude ratio does not depend on the resonance and is linked to the Wilson coefficients:

$$\frac{|c^{(i)}_R|}{|c^{(i)}_L|} = \frac{|C_{7R}|}{|C_{7L}|}. \tag{8}$$

This implies that the photon polarisation parameter is also the same for all the resonances. One finds

$$\lambda^{(i)}_\gamma = \frac{|C_{7R}|^2 - |C_{7L}|^2}{|C_{7R}|^2 + |C_{7L}|^2} \equiv \lambda_\gamma. \tag{9}$$

Thus, for a radiative $\bar{B}$ ($B$) decay, the SM predicts (up to QCD corrections [6])

$$\lambda_\gamma = -1\ (+1) + \mathcal{O}(m_s^2/m_b^2). \tag{10}$$
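As a quick numerical check of Eqs. 5, 9 and 10, the following Python sketch evaluates λγ for a given ratio |C7R|/|C7L|. The function name is purely illustrative and not part of the analysis; the input 0.02 is the SM estimate quoted in Eq. 5, and QCD corrections are ignored.

```python
def photon_polarisation(ratio_r_over_l):
    """Photon polarisation parameter of Eq. 9 for r = |C7R| / |C7L|."""
    r2 = ratio_r_over_l ** 2
    return (r2 - 1.0) / (r2 + 1.0)


# SM estimate from Eq. 5: |C7R| / |C7L| ~ ms / mb ~ 0.02
lam = photon_polarisation(0.02)

# Deviation from the fully left-handed value -1; it equals 2 r^2 / (1 + r^2),
# i.e. it is of order (ms / mb)^2 as stated in Eq. 10.
deviation = lam + 1.0

print(f"lambda_gamma = {lam:.4f}")
print(f"deviation from -1 = {deviation:.1e}")
```

With this input the deviation from −1 is below the per-mille level, which illustrates why a sizeable right-handed component would point to physics beyond the SM.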

In the simple case where only one resonance is available, M. Gronau and D. Pirjol [5]

show that λγ coincides with the photon polarisation Pγ defined by

$$P_\gamma := \frac{\Gamma(B \to K\pi\pi\gamma_R) - \Gamma(B \to K\pi\pi\gamma_L)}{\Gamma(B \to K\pi\pi\gamma_R) + \Gamma(B \to K\pi\pi\gamma_L)}, \tag{11}$$

where Γ denotes the decay rate.

2.3 Measurement of the photon polarisation parameter

Several methods have been suggested to measure the photon polarisation parameter [5, 6].

A general method currently under investigation goes through a full amplitude analysis

of the B+ → K+π−π+γ decay, parametrising the phase space with the variable set

$$\{ m^2_{K^+\pi^-\pi^+},\ m^2_{K^+\pi^-},\ m^2_{\pi^-\pi^+},\ \cos\theta,\ \chi \},$$

where θ is the angle between the normal to the K+π−π+ plane and the momentum of the

photon in the kaon resonance rest frame (Fig. 3) and χ the angle between the K+−γ plane

and the π+− γ plane in the kaon resonance rest frame (Fig. 4) [5]. The main challenge of

this method is the complexity of the K+π−π+ mass spectrum; in particular, the higher

mass resonance contributions are not well known.

A simplified method, which does not require a full characterisation of the K+π−π+

mass spectrum, uses the up-down asymmetry Aud, defined as

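$$A_{ud} = \frac{\int_0^1 \mathrm{d}\cos\theta\, \frac{\mathrm{d}\Gamma}{\mathrm{d}\cos\theta} - \int_{-1}^0 \mathrm{d}\cos\theta\, \frac{\mathrm{d}\Gamma}{\mathrm{d}\cos\theta}}{\int_{-1}^1 \mathrm{d}\cos\theta\, \frac{\mathrm{d}\Gamma}{\mathrm{d}\cos\theta}}. \tag{12}$$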

It can be shown that Aud is proportional to λγ , even in the case where multiple reso-

nances are considered [5]. However, the proportionality constant is not well known from

theory [5, 6], and so a precise value of the photon polarisation cannot be computed from

the up-down asymmetry. Nevertheless, a non-zero Aud implies a non-zero photon polari-

sation. Using data collected during the first run of the LHC, a non-zero Aud was observed

at a significance level of 5.2σ [10].
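For a background-free sample with uniform acceptance in cos θ, the integrals of Eq. 12 reduce to event counts in the two hemispheres. The Python sketch below illustrates this counting estimate together with its binomial uncertainty. It is only a toy under these simplifying assumptions (a real measurement has to subtract the background and correct for the acceptance), and the function name and input values are hypothetical.

```python
import math


def up_down_asymmetry(cos_theta_values):
    """Counting estimate of A_ud (Eq. 12) and its binomial uncertainty."""
    n_up = sum(1 for c in cos_theta_values if c > 0)
    n_down = sum(1 for c in cos_theta_values if c < 0)
    n_total = n_up + n_down
    if n_total == 0:
        raise ValueError("no events with non-zero cos(theta)")
    a_ud = (n_up - n_down) / n_total
    # Binomial uncertainty on an asymmetry built from n_total events
    sigma = math.sqrt((1.0 - a_ud**2) / n_total)
    return a_ud, sigma


# Hypothetical toy sample of cos(theta) values: 6 "up" and 2 "down" events
toy_sample = [0.9, 0.4, 0.1, -0.3, 0.7, -0.8, 0.2, 0.5]
a_ud, sigma = up_down_asymmetry(toy_sample)
print(f"A_ud = {a_ud:.2f} +- {sigma:.2f}")
```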

Figure 3: Definition of the angle θ in the hadronic rest frame. Credits to V. Bellee.

Figure 4: Definition of the angle χ in the hadronic rest frame. Credits to V. Bellee.

3 The LHCb experiment

The Large Hadron Collider beauty (LHCb) experiment is one of the main experiments located along the Large Hadron Collider (LHC), a proton-proton³ collider built and run by the European Organization for Nuclear Research (CERN) across the Franco-Swiss border near Geneva. The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity⁴ range 2 < η < 5 and designed to study the decays of b and c hadrons [17]. Figure 5 shows a schematic view of the detector and its main subdetectors, which are listed and briefly described in the following sections.
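To give a feeling for the acceptance quoted above, the short Python sketch below converts the pseudorapidity limits into polar angles using the definition η = −ln tan(θ/2). The function names are illustrative only.

```python
import math


def pseudorapidity(theta):
    """Pseudorapidity for a polar angle theta (in rad) w.r.t. the beam axis."""
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Inverse relation: polar angle (in rad) for a given pseudorapidity."""
    return 2.0 * math.atan(math.exp(-eta))


# Edges of the LHCb acceptance 2 < eta < 5
theta_max = polar_angle(2.0)  # largest angle, at eta = 2
theta_min = polar_angle(5.0)  # smallest angle, at eta = 5

print(f"eta = 2 -> theta = {1e3 * theta_max:.0f} mrad")
print(f"eta = 5 -> theta = {1e3 * theta_min:.1f} mrad")
```

The acceptance therefore corresponds to polar angles between roughly 13 mrad and 270 mrad, i.e. to a narrow cone around the beam axis.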

Vertex locator

The vertex locator (VELO) is the closest subdetector to the interaction region and is

designed to identify the production and decay vertices, namely the primary vertex (PV)

and the secondary vertex (SV), of b and c hadrons (Fig. 6). The VELO consists of two

halves, each containing a series of 21 semi-circular silicon modules perpendicular to the

beam direction. The two halves can be retracted from the beam during injection. When

the detector is in its closed position, the innermost parts of the sensors are less than 1 cm

away from the LHC beams [18]. Each silicon module provides the radial coordinate r and

the azimuthal coordinate φ of the charged tracks with a hit resolution of ≈ 10µm.

Silicon tracker

Together with the VELO, the silicon tracker (ST) allows for the reconstruction of the

trajectories of charged particles. Thanks to a dipole magnet providing an integrated

magnetic field of 4 Tm for tracks of 10 m length, the momentum p of charged particles can

be computed using the relation

p = qBρ, (13)

where q is the electric charge, B is the magnetic field and ρ the radius of curvature of the track.
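In practical units, Eq. 13 reads p [GeV/c] ≈ 0.3 · |q/e| · B [T] · ρ [m], with ρ taken as the radius of curvature. The Python sketch below illustrates this conversion under the assumption of a uniform field perpendicular to the track, which is a simplification of the real field map. The function names are illustrative, and the deflection estimate uses the small-angle approximation with the integrated field of 4 Tm quoted above.

```python
# Conversion factor c / 1e9: momentum in GeV/c per (T m) for a unit charge
C_GEV = 0.299792458


def momentum_gev(b_tesla, radius_m, charge=1):
    """Momentum (GeV/c) transverse to a uniform field B for a curvature radius."""
    return C_GEV * abs(charge) * b_tesla * radius_m


def curvature_radius_m(p_gev, b_tesla, charge=1):
    """Inverse relation: curvature radius (m) for a given momentum."""
    return p_gev / (C_GEV * abs(charge) * b_tesla)


def deflection_angle(p_gev, integrated_field_tm=4.0, charge=1):
    """Small-angle estimate of the bending angle (rad) after the magnet."""
    return C_GEV * abs(charge) * integrated_field_tm / p_gev


# A 10 GeV/c singly charged track in an integrated field of 4 Tm
angle = deflection_angle(10.0)
print(f"deflection = {1e3 * angle:.0f} mrad")
```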

The ST consists of 4 planar tracking stations: the tracker turicensis or trigger tracker

(TT) upstream of the magnet (1 station) and the inner tracker (IT), which forms the innermost part⁵ of the 3 downstream tracking stations. The TT and the IT are made of 4 layers of silicon microstrips arranged in a y-u-v-y geometry, where y is the vertical direction and u, v two directions corresponding to ±5° from the vertical. The ST has a hit resolution of ≈ 50 µm.

³ The LHC can also be used to collide heavy ions, mainly for another experiment called ALICE.

⁴ The pseudorapidity η is defined as $\eta = -\ln\tan(\theta/2)$, where θ is the angle with respect to the beam axis.

⁵ i.e. the part closest to the beam pipe.

Figure 5: Side view of the LHCb detector [17].

Figure 6: Definition of the primary vertex (PV), secondary vertex (SV) and impact parameter (IP). Adapted from Ref. [19].

Outer tracker

The outer tracker (OT), which is a drift-time detector, forms the outside part of the

downstream tracking stations. Similarly to the ST, each of the 3 stations of the OT has

4 layers arranged in a y-u-v-y geometry. Each layer is made of 2 staggered sublayers of

drift-tubes filled with a mixture of Ar/CO2/O2 (70/28.5/1.5) [20]. Together with the IT,

the OT constitutes the 3 tracking stations T1, T2 and T3.

Ring-imaging Cherenkov detectors

Two ring-imaging Cherenkov counters (RICH1 and RICH2), upstream and downstream

of the magnet respectively, are used to identify charged particles. If a charged particle travels through a medium of refractive index n faster than light in this medium, it emits a light cone of angle θc related to its velocity β through

$$\beta = \frac{1}{n\cos\theta_c}. \tag{14}$$

RICH1 uses a C4F10 radiator and covers the momentum range ≈ 1 − 60 GeV/c, while

RICH2 uses a CF4 radiator and covers the range ≈ 15 − 100 GeV/c. Both detectors

contain mirrors to reflect the Cherenkov light out of the LHCb acceptance. By combining

the information about momentum (Eq. 13) and velocity (Eq. 14), the mass of the particle

(its identity) follows from $m = p/(\gamma\beta)$, where $\gamma = 1/\sqrt{1-\beta^2}$.
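The combination of Eqs. 13 and 14 can be illustrated with a short Python sketch: the Cherenkov angle is first computed for a kaon and a pion of the same momentum, then the kaon mass is recovered from the momentum and the angle. The refractive index used for C4F10 is an assumed, typical value and not a number taken from this document; the function names are illustrative, and detector resolution is ignored.

```python
import math


def cherenkov_angle(p_gev, mass_gev, n):
    """Cherenkov angle (rad) for momentum p and mass m, in units where c = 1."""
    beta = p_gev / math.hypot(p_gev, mass_gev)  # beta = p / E
    if n * beta < 1.0:
        raise ValueError("particle is below the Cherenkov threshold")
    return math.acos(1.0 / (n * beta))


def mass_from_cherenkov(p_gev, theta_c, n):
    """Mass (GeV/c^2) from momentum and Cherenkov angle: m = p / (gamma beta)."""
    beta = 1.0 / (n * math.cos(theta_c))  # Eq. 14
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return p_gev / (gamma * beta)


N_C4F10 = 1.0014   # assumed refractive index of the RICH1 radiator
M_KAON = 0.4937    # GeV/c^2
M_PION = 0.1396    # GeV/c^2
p = 20.0           # GeV/c

theta_k = cherenkov_angle(p, M_KAON, N_C4F10)
theta_pi = cherenkov_angle(p, M_PION, N_C4F10)
m_rec = mass_from_cherenkov(p, theta_k, N_C4F10)

print(f"theta_c(K) = {1e3 * theta_k:.1f} mrad, theta_c(pi) = {1e3 * theta_pi:.1f} mrad")
print(f"reconstructed mass = {m_rec:.4f} GeV/c^2")
```

At a given momentum the lighter particle is faster and therefore emits light at a larger angle, which is what allows kaons and pions to be separated.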

Calorimeters

The calorimeters give information about the identity, energy and position of the final

state electrons, photons and hadrons. They consist of an electromagnetic calorimeter (ECAL) of 25 radiation lengths followed by a hadronic calorimeter (HCAL) of 5.6 interaction lengths. The ECAL (HCAL) has a scintillator/lead (scintillator/iron) sampling structure. The scintillating light produced by both calorimeters is transmitted to phototubes

by wavelength-shifting fibres.

The identification of electrons is challenging due to a high background of pions. Two

subdetectors in front of the ECAL are designed to reject this background: a scintillator

pad detector (SPD) and a preshower detector (PS). The SPD is used to separate electrons

from photons and neutral pions, and the PS segments the electromagnetic shower detection

for charged pion identification.

Muon system

The muon system is composed of 5 stations (M1−M5). M1 is located upstream of the

calorimeter and is based on a triple gas electron multiplier (GEM). M2−M5 are located

downstream of the calorimeter and use iron absorbers and multi-wire proportional chambers (MWPC). M1−M3 have a high spatial resolution and provide a momentum resolution

of ≈ 20%, whereas M4−M5 have a lower spatial resolution and are mainly used to identify

penetrating particles.

Trigger⁶

The LHCb trigger system is represented in Fig. 7. The LHC provides a bunch crossing

rate of 40 MHz, which is currently too high for the readout. The level-0 trigger (L0),

implemented in hardware, performs a first online selection based on the transverse momentum and energy of single particles (tracks or calorimeter clusters) and also rejects high-multiplicity events. After L0, two high level triggers (HLT1 and HLT2), consisting of software algorithms, are used to achieve a final storage rate of 12.5 kHz in Run 2. HLT1 reconstructs tracks in events that pass the L0 stage and selects high quality tracks. The event rate after HLT1 is low enough to buffer the events on a local disk. Thanks to this buffer, an online calibration and alignment is executed before running the HLT2 on the

selected events. HLT2 reconstructs the full event and includes information on particle

identification.

⁶ Because the trigger system was changed between Run 1 and Run 2, this section is based on Refs. [21, 22].

[Figure 7 image: LHCb 2015 trigger diagram. 40 MHz bunch crossing rate; L0 hardware trigger (1 MHz readout, high ET/pT signatures: 450 kHz h±, 400 kHz µ/µµ, 150 kHz e/γ); software high level trigger (partial event reconstruction, selecting displaced tracks/vertices and dimuons; buffering of events to disk with online detector calibration and alignment; full offline-like event selection with a mixture of inclusive and exclusive triggers); 12.5 kHz (0.6 GB/s) to storage.]

Figure 7: LHCb trigger in 2015 [22].

4 Optimisation of the B+ → K+π−π+γ selection

This section describes the selection of B+ → K+π−π+γ candidates in several data samples

collected with the LHCb detector. First, the data and Monte Carlo simulated samples con-

sidered are summarised (Sec. 4.1), the event preselection (stripping) is explained (Sec. 4.2)

and the trigger lines required are listed (Sec. 4.3). Then, a more precise background re-

jection strategy is developed: it consists of a set of cuts (Sec. 4.4) followed by the training

and application of a multivariate classifier (Sec. 4.5).

4.1 Data samples

The three data samples used in this study were collected with the LHCb detector at a

centre-of-mass energy of 13 TeV during the years 2015, 2016 and 2017. They correspond

to integrated recorded luminosities of 0.29, 1.64 and 1.71 fb−1, respectively. Monte Carlo

(MC) samples of signal and several sources of background are generated with Pythia8

[23] and fully simulated with Geant4 [24].

Table 2 lists the MC samples used, which are simulated using 2016 data-taking condi-

tions. The signal is simulated by the exclusive B+ → K1(1270)+γ decay, which is expected

to provide the highest contribution [10]. The other samples are used in Sec. 5 to model

several sources of background.

Based on the results of studies of the simulated signal sample (see Sec. 5.1), one intro-

duces three regions in the B mass distribution defined in Table 3: a signal region, and two

sidebands expected to contain mainly background.
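The three regions can be expressed as a simple classification of the B candidate mass. The Python sketch below uses the boundaries of Table 3 (5080 and 5480 MeV/c2); the function name and the returned labels are illustrative only.

```python
SIGNAL_LOW = 5080.0   # MeV/c^2, lower edge of the signal region (Table 3)
SIGNAL_HIGH = 5480.0  # MeV/c^2, upper edge of the signal region (Table 3)


def mass_region(mass_mev):
    """Return the region of Table 3 that contains the B candidate mass."""
    if mass_mev < SIGNAL_LOW:
        return "low-mass sideband"
    if mass_mev <= SIGNAL_HIGH:
        return "signal region"
    return "high-mass sideband"


for mass in (5000.0, 5279.0, 5600.0):
    print(mass, "->", mass_region(mass))
```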

4.2 Stripping selection

The first stage of the selection, called stripping, is a set of loose cuts aimed at preferentially

selecting events of interest to physics analyses. This is done during the central offline

processing of the data samples in order to save storage space and computational resources

[25]. The stripping configurations corresponding to 2015, 2016 and 2017 data considered

are S24, S28r1p1 and S29r2, respectively. Table 4 lists the main selection criteria applied

at this stage:

• The momentum p and transverse momentum pT of each track and the sum Σ pT of all the transverse momenta coming from the resonance SV are required to be large enough in order to reject low momentum background. For the same reason, a minimum value is also imposed on the photon transverse energy ET and on the sum of all the track transverse momenta and the photon transverse momentum.

• The χ2 of each track and each vertex is a measure of the reconstruction quality. It is near unity for a well-reconstructed track.

• The impact parameter (IP) χ2 indicates if a track is compatible with coming from the PV. It has to be close to unity in the case of the B candidate track and ≫ 1 for the other tracks.

• The ghost probability of a track is the probability that this track was reconstructed from a set of random hits in the detector. An upper bound is set for this probability.

• Wide mass windows are defined for the B candidate and the K resonance.

Table 2: Monte Carlo simulated samples used to model signal and background. The numbers of events are given after the generator-level cuts but before the stripping; in particular, they correspond to events generated within the detector acceptance.

| Simulated decay | Number of events |
|---|---|
| B+ → K1(1270)+γ | 3.007 × 10^6 |
| B0 → K∗0γ | 2.019 × 10^6 |
| B+ → K∗0π+γ | 2.011 × 10^6 |
| B+ → K1(1270)+η | 5.138 × 10^5 |
| B0 → K1(1270)0γ | 5.183 × 10^5 |

Table 3: Regions defined in the B candidate mass distribution.

| Region | Definition | Unit |
|---|---|---|
| Low-mass sideband | M(K+π−π+γ) < 5080 | MeV/c2 |
| Signal region | M(K+π−π+γ) ∈ [5080, 5480] | MeV/c2 |
| High-mass sideband | M(K+π−π+γ) > 5480 | MeV/c2 |

4.3 Trigger lines

Trigger signals are associated with reconstructed particles and it is therefore possible to

select events where the trigger decision was made on signal (TOS) or on other particles

present in the event (TIS) [21, 22]. In this study, a combination of 5 trigger lines is

required:

• At the hardware stage, one selects events firing the TOS lines corresponding to a photon or an electron decision: B_L0PhotonDecision_TOS or B_L0ElectronDecision_TOS.

• At the HLT1 level, one requires TOS lines corresponding to a decision based on the output of a multivariate algorithm (MVA) considering 1 or 2 tracks: B_Hlt1TrackMVADecision_TOS or B_Hlt1TwoTrackMVADecision_TOS.

• At the HLT2 level, one considers events firing the inclusive 3 hadrons and 1 photon line: B_Hlt2RadiativeIncHHHGammaDecision_TOS.
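The logical combination of the five lines listed above is an OR within each trigger level and an AND between the levels. The Python sketch below illustrates this logic. It assumes that each candidate is available as a mapping from decision names to booleans and that the names are written with underscores; both are assumptions made for the illustration, and the function name is hypothetical.

```python
# Trigger lines grouped by level: OR within a level, AND between levels
L0_LINES = ("B_L0PhotonDecision_TOS", "B_L0ElectronDecision_TOS")
HLT1_LINES = ("B_Hlt1TrackMVADecision_TOS", "B_Hlt1TwoTrackMVADecision_TOS")
HLT2_LINES = ("B_Hlt2RadiativeIncHHHGammaDecision_TOS",)


def passes_trigger(candidate):
    """True if at least one line fired at each of the three trigger levels."""
    levels = (L0_LINES, HLT1_LINES, HLT2_LINES)
    return all(any(candidate[name] for name in level) for level in levels)


# Hypothetical candidate: photon line at L0, two-track line at HLT1, HLT2 line
accepted = {
    "B_L0PhotonDecision_TOS": True,
    "B_L0ElectronDecision_TOS": False,
    "B_Hlt1TrackMVADecision_TOS": False,
    "B_Hlt1TwoTrackMVADecision_TOS": True,
    "B_Hlt2RadiativeIncHHHGammaDecision_TOS": True,
}
# Same candidate, but with no HLT1 line fired
rejected = dict(accepted, B_Hlt1TwoTrackMVADecision_TOS=False)

print(passes_trigger(accepted), passes_trigger(rejected))
```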

Table 4: Stripping selection requirements.

| Variable | 2015 | 2016 | 2017 | Unit |
|---|---|---|---|---|
| Track pT | > 300 | > 300 | > 300 | MeV/c |
| Track p | > 1000 | > 1000 | > 1000 | MeV/c |
| Track χ2 | < 3 | < 3 | < 3 | |
| Track IP χ2 | > 16 | > 20 | > 20 | |
| Track ghost probability | < 0.4 | < 0.4 | < 0.4 | |
| Resonance tracks Σ pT | > 1500 | > 1000 | > 1000 | MeV/c |
| Resonance vertex χ2 | < 10 | < 9 | < 9 | |
| Resonance IP χ2 | > 0 | > 0 | > 0 | |
| Resonance mass | ∈ [0, 7900] | ∈ [0, 7900] | ∈ [0, 7900] | MeV/c2 |
| Photon ET | > 2000 | > 2000 | > 2000 | MeV |
| Photon CL | > 0 | > 0 | > 0 | |
| Photon and tracks Σ pT | > 5000 | > 3000 | > 3000 | MeV/c |
| B+ DIRA | > 0 | > 0 | > 0 | |
| B+ vertex χ2 | < 9 | < 9 | < 9 | |
| B+ IP χ2 | < 9 | < 9 | < 9 | |
| B+ mass | ∈ [2900, 6500] | ∈ [2900, 6500] | ∈ [2900, 6500] | MeV/c2 |

Table 5 summarises the trigger efficiencies on signal MC and both data sidebands. The

computation of the uncertainties is presented in Appendix A.1.

4.4 Cut-based strategy

Following the coarse selection applied by the stripping and the trigger lines, this section

and Sec. 4.5 describe the next steps of the selection: a set of more stringent cuts followed

by the training and application of a multivariate classifier. In order to optimise some of

the cuts made at this stage, a figure of merit (significance) is utilised; it is defined as

$$\text{Significance} = \left. \frac{N_{\text{sig}}}{\sqrt{N_{\text{sig}} + N_{\text{bkg}}}} \right|_{5080\ \text{MeV}/c^2 < m_B < 5480\ \text{MeV}/c^2}, \tag{15}$$

where Nsig and Nbkg are the expected numbers of signal and background events in the

signal region, respectively. For each cut on a particular variable, Nbkg is estimated by

fitting the high-mass sideband with a linear function and integrating the resulting fit

function over the signal region (Fig. 8). To estimate Nsig, one uses

Nsig = L · σ(pp→ B±X) · B(B+ → K+π−π+γ) · ε, (16)

where L is the integrated recorded luminosity, σ(pp → B±X) = 86.6 ± 6.4µb is the B±

production cross section at 13 TeV summed over both charges [26], B(B+ → K+π−π+γ) =

(2.76 ± 0.22) × 10−5 is the branching fraction of interest [27] and ε is the efficiency determined from signal MC samples⁷. One does not optimise significance for the PID variables because they are not well simulated [28].

Table 5: Trigger efficiencies in percent. SB stands for sideband. At each level, the efficiencies are computed by also imposing the last requirement of each of the preceding levels. For example, HLT2(hhhγ) is actually [L0(γ) or L0(e)] and [HLT1(1t) or HLT1(2t)] and HLT2(hhhγ).

| Trigger line | Signal MC | 2015 low-mass SB | 2015 high-mass SB | 2016 low-mass SB | 2016 high-mass SB | 2017 low-mass SB | 2017 high-mass SB |
|---|---|---|---|---|---|---|---|
| L0(γ) | 46.? ± 0.3 | 16.0? ± 0.01 | 13.1? ± 0.01 | 16.184 ± 0.002 | 14.02 ± 0.01 | 15.604 ± 0.003 | 14.03 ± 0.01 |
| L0(e) | 37.7 ± 0.3 | 40.2? ± 0.01 | 29.5? ± 0.01 | 51.977 ± 0.003 | 51.0? ± 0.01 | 47.829 ± 0.004 | 46.4? ± 0.01 |
| L0(γ) or L0(e) | 82.? ± 0.2 | 54.3? ± 0.01 | 41.5? ± 0.01 | 65.890 ± 0.003 | 63.80 ± 0.01 | 61.341 ± 0.003 | 59.17 ± 0.01 |
| HLT1(1t) | 69.1 ± 0.2 | 45.6? ± 0.01 | 22.2? ± 0.01 | 54.059 ± 0.003 | 35.07 ± 0.01 | 54.267 ± 0.004 | 38.76 ± 0.01 |
| HLT1(2t) | 76.2 ± 0.2 | 47.8? ± 0.01 | 28.8? ± 0.01 | 59.226 ± 0.003 | 51.4? ± 0.01 | 54.555 ± 0.004 | 45.4? ± 0.01 |
| HLT1(1t) or HLT1(2t) | 82.? ± 0.2 | 52.1? ± 0.01 | 34.5? ± 0.01 | 65.860 ± 0.003 | 63.58 ± 0.01 | 61.320 ± 0.003 | 58.99 ± 0.01 |
| HLT2(hhhγ) | 71.3 ± 0.2 | 18.1? ± 0.01 | 1.608 ± 0.003 | 48.239 ± 0.003 | 18.71 ± 0.01 | 44.981 ± 0.004 | 18.377 ± 0.008 |

[Figure 8 image: B+ candidate mass distribution, events / (60 MeV/c2) as a function of M(K+π−π+γ) [MeV/c2] between 5000 and 6400 MeV/c2, with the linear fit and a pull panel; labelled "LHCb preliminary".]

Figure 8: Illustration of the background estimation. The B+ candidate mass distribution is fitted with a linear function in the range [5700, 6500] MeV/c2 and the resulting fit function is integrated in the signal region [5080, 5480] MeV/c2.
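The evaluation of the figure of merit for one set of cuts can be sketched in Python as follows. The background is extrapolated from a straight-line fit to the high-mass sideband (here an unweighted least-squares fit on a binned histogram, which is a simplification of a proper fit), the signal yield follows Eq. 16 and both enter Eq. 15. All function names are illustrative; the toy histogram and the efficiency of 0.5% are hypothetical inputs chosen only to make the example run, while the luminosity, cross section and branching fraction are the values quoted above for 2016 data.

```python
import math


def fit_line(xs, ys):
    """Unweighted least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def expected_background(centres, counts, bin_width, fit_range, signal_range):
    """Fit the sideband bins and integrate the line over the signal region."""
    points = [(x, y) for x, y in zip(centres, counts)
              if fit_range[0] <= x <= fit_range[1]]
    slope, intercept = fit_line([x for x, _ in points], [y for _, y in points])
    lo, hi = signal_range
    area = 0.5 * slope * (hi**2 - lo**2) + intercept * (hi - lo)
    return max(area / bin_width, 0.0)  # convert the integral into events


def expected_signal(lumi_fb, sigma_ub, branching_fraction, efficiency):
    """Eq. 16, using 1 fb^-1 = 1e9 microbarn^-1."""
    return lumi_fb * 1e9 * sigma_ub * branching_fraction * efficiency


def significance(n_sig, n_bkg):
    """Figure of merit of Eq. 15."""
    return n_sig / math.sqrt(n_sig + n_bkg)


# Hypothetical toy histogram: 60 MeV/c^2 bins, exactly linear background
centres = [5030.0 + 60.0 * i for i in range(25)]
counts = [8000.0 - c for c in centres]

n_bkg = expected_background(centres, counts, 60.0, (5700.0, 6500.0), (5080.0, 5480.0))
n_sig = expected_signal(1.64, 86.6, 2.76e-5, 0.005)  # efficiency is hypothetical
z = significance(n_sig, n_bkg)
print(f"N_bkg = {n_bkg:.0f}, N_sig = {n_sig:.0f}, significance = {z:.1f}")
```

Scanning a cut value then amounts to recomputing the efficiency and the sideband histogram for each threshold and keeping the threshold that maximises the significance.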

Figures 9 and 10 show the distribution of the variables on which the cuts are made;

note that each histogram is normalised to unit area. The cuts on the PID variables are

not very stringent at this stage, knowing that the final optimisation is done by cutting on

the output of a classifier (Sec. 4.5). Table 6 summarises all the requirements described

below:

• Low momentum background is rejected by minimum conditions on the transverse

momentum pT of the B meson and the transverse energy of the photon. The asso-

ciated significance plots are shown in Fig. 11.

• Soft PID cuts are set to discriminate between kaons and pions. Given a charged track

reconstructed as a particle X, XP(Y ) is defined as the probability for this track to

actually originate from a Y particle. This probability is based on the results of a

neural network (NN) that combines information from several subdetectors.

• A high momentum π0 can be mis-reconstructed as a photon if its decay products (2γ with a probability of ≈ 98.8% [27]) are reconstructed as a single cluster in the ECAL. A multivariate algorithm inspects the geometry of the showers in the PS and the ECAL to allow for a γ − π0 separation.

⁷ Note that the MC simulated samples used for this computation are not reweighted to match data (for example, the PID calibration is not applied [28]). Therefore, Eq. 16 provides only a rough estimation of Nsig. The signal yield observed in data after the selection is ∼ 25% lower than the result of this computation. However, it was checked by rescaling Nsig that this difference does not significantly change the value of the cuts maximising the significance.

[Figure 9 image: normalised distributions (1/N) dN/dx for the low-mass sideband, the high-mass sideband and signal MC. Panels: Max track pT [MeV/c]; B pT [MeV/c]; M(K+π−π+) [MeV/c2]; Photon ET [MeV]; K+π−π+ vertex isolation ∆χ2.]

Figure 9: Offline selection variables after applying the requirements on the trigger lines (2016 data). Each histogram is normalised to unit area.

[Figure 10 image: normalised distributions (1/N) dN/dx for the low-mass sideband, the high-mass sideband and signal MC. Panels: KP(K)(1 − KP(π)); π+P(π+)(1 − π+P(K)); π−P(π−)(1 − π−P(K)); Photon CL; Photon/π0 separation.]

Figure 10: PID variables after applying the requirements on the trigger lines (2016 data). Each histogram is normalised to unit area.

Table 6: Offline selection requirements.

| Variable | 2015 | 2016 | 2017 | Unit |
|---|---|---|---|---|
| Max track pT | > 1100 | > 1100 | > 1100 | MeV/c |
| KP(K)(1 − KP(π)) | > 0.2 | > 0.2 | > 0.2 | |
| π+P(π+)(1 − π+P(K)) | > 0.2 | > 0.2 | > 0.2 | |
| π−P(π−)(1 − π−P(K)) | > 0.2 | > 0.2 | > 0.2 | |
| K∗ vertex isolation ∆χ2 | > 4 | > 8 | > 8 | |
| M(K+π−π+) mass window | ∈ [1100, 1900] | ∈ [1100, 1900] | ∈ [1100, 1900] | MeV/c2 |
| Photon ET | > 2800 | > 3100 | > 3100 | MeV |
| Photon/π0 separation | > 0.5 | > 0.5 | > 0.5 | |
| Photon CL | > 0.2 & ≠ 0.5 | > 0.2 & ≠ 0.5 | > 0.2 & ≠ 0.5 | |
| B pT | > 3500 | > 5500 | > 5500 | MeV/c |
| M(K+π−π0) | > 2200 | > 2200 | > 2200 | MeV/c2 |
| M(π+π0) | > 1100 | > 1100 | > 1100 | MeV/c2 |

[Figure 11 image: significance (right axis), signal efficiency and background efficiency (left axis) as a function of the cut value. Panels: cut on Photon ET [MeV]; cut on K+π−π+ vertex isolation ∆χ2; cut on B pT [MeV/c].]

Figure 11: Significance, defined in Eq. 15, MC signal efficiency and combinatorial background efficiency in the signal region, as a function of cuts on several variables.

• A photon-electron distinction is made based on the photon confidence level (CL) defined as

$$\mathrm{CL} = \frac{\tanh(\gamma\mathrm{DLL}_{\gamma-e}) + 1}{2}, \tag{17}$$

where γDLLγ−e is the difference, for a particle identified as a photon, of the log-likelihoods (DLL) of the photon and the electron hypotheses. These log-likelihoods are obtained from information from the calorimeters. The value γDLLγ−e = 0, which gives CL = 0.5, corresponds to an error in the PID variable; for this reason, the condition CL ≠ 0.5 is imposed.

• Partially reconstructed background can be suppressed by checking that combining

any new track with the reconstructed resonance vertex causes a drop in the vertex

quality. For this purpose, one defines the vertex isolation ∆χ2 as

$$\Delta\chi^2 = \min_{\text{track}} \left[ \chi^2(\text{reconstructed vertex} + \text{track}) - \chi^2(\text{reconstructed vertex}) \right], \tag{18}$$

where the minimum is taken over all the tracks in the event not belonging to the

original reconstructed vertex. A significance plot for this variable is shown in Fig. 11.

• A mass window is defined for the K+π−π+ system. Figure 12 shows that most

of the signal is contained in the window [1100, 1900] MeV/c2. The peak around

1970 MeV/c2 seen in the K+π−π+ mass distribution is interpreted as coming from a

D+s → K+π−π+ decay, whose branching fraction is (6.6±0.4)×10−3 and which can

occur through intermediate states such as K∗0π+ or K+ρ0 [27]. This observation

justifies the upper bound of the window.

• The last two cuts presented in Table 6 correspond to excluded regions needed to suppress the background coming from the decay

B+ → D0(→ K+ρ−(→ π−π0))ρ+(→ π+π0).

This decay needs to be considered very carefully because it has a large branching

fraction (see Table 7). Two cases are considered:

– The π0 coming from ρ− is reconstructed as a photon and the π0 coming from

ρ+ is not reconstructed. This background is suppressed by requiring

M(K+π−π0) > 2200 MeV/c2 > M(D0),

where M(K+π−π0) is computed by assigning the π0 mass to the photon can-

didate. Figure 13 shows that this cut does not affect the signal. The fact that

most of this background is outside of the signal region is caused by the nar-

rowness of the resonance mass window imposed above (see Appendix A.2 for a

more detailed discussion).

Figure 12: M(K+π−π+) and M(K+π−π+γ) for 2016 data. All the requirements listed in Table 6 are applied to the distributions except for the resonance mass window. [Plot not reproduced.]

– The π0 coming from ρ+ is reconstructed as a photon and the π0 coming from

ρ− is not reconstructed. This background is suppressed by requiring

M(π+π0) > 1100 MeV/c2 > M(ρ+),

where M(π+π0) is computed by assigning the π0 mass to the photon candidate.

Figure 14 shows that this cut does not strongly affect the signal.
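The two vetoes against the B+ → D0ρ+ background rely on recomputing an invariant mass after giving the photon candidate the π0 mass. The following is a minimal Python sketch of that recomputation, assuming each particle is available as a momentum three-vector in MeV/c. The function names and the input layout are illustrative choices and are not taken from the analysis code; the kaon and pion masses are rounded PDG values.

```python
import math

M_PI0 = 135.0    # MeV/c^2, value quoted in Table 11
M_KAON = 493.68  # MeV/c^2, charged kaon (rounded PDG value, assumption of this sketch)
M_PION = 139.57  # MeV/c^2, charged pion (rounded PDG value, assumption of this sketch)


def four_vector(p3, mass):
    """Return (E, px, py, pz) for a momentum three-vector and a mass hypothesis."""
    px, py, pz = p3
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(vectors):
    """Invariant mass of the sum of several four-vectors."""
    e = sum(v[0] for v in vectors)
    px = sum(v[1] for v in vectors)
    py = sum(v[2] for v in vectors)
    pz = sum(v[3] for v in vectors)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


def passes_d0_rho_vetoes(kaon_p3, pim_p3, pip_p3, photon_p3):
    """Apply M(K+ pi- pi0) > 2200 and M(pi+ pi0) > 1100 MeV/c^2 (Table 6)."""
    pi0 = four_vector(photon_p3, M_PI0)  # photon candidate with the pi0 mass assigned
    kaon = four_vector(kaon_p3, M_KAON)
    pim = four_vector(pim_p3, M_PION)
    pip = four_vector(pip_p3, M_PION)
    return (invariant_mass([kaon, pim, pi0]) > 2200.0
            and invariant_mass([pip, pi0]) > 1100.0)
```

Only the photon energy changes under the π0 hypothesis; its measured momentum direction and magnitude are kept as they are.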

4.5 Multivariate analysis

The signal and background separation is a binary classification problem and many modern

algorithms based on machine learning techniques can be used [29]. A binary classifier is

an algorithm which takes as input a set of variables (features) in a recorded event and

gives as output a single variable representing a predicted probability for this event to be

signal rather than background.

4.5.1 XGBoost

The algorithm chosen in this study is called XGBoost [30] and belongs to the family of gradient boosted trees, which are widely used in experimental particle physics. Boosted trees, similarly to random forests [31], combine the outputs of many trees (weak learners) to give a prediction. Figure 15 shows an example of one of the decision trees (see footnote 8) obtained by training the algorithm. The specificity of boosting is to add weak learners sequentially during the training and to give more importance at each iteration to the events misclassified at the previous stage [29].

Figure 13: M(K+π−π0) and M(K+π−π+γ) for 2016 data (top) and signal MC (bottom), where M(K+π−π0) is computed by assigning the π0 mass to the photon candidate. In data, the small peak around 1900 MeV/c2 corresponds to D0 (see text for details). All the requirements listed in Table 6 are applied to the distributions except for the last two. [Plots not reproduced.]

Figure 14: M(π+π0) and M(K+π−π+γ) for 2016 data (top) and signal MC (bottom), where M(π+π0) is computed by assigning the π0 mass to the photon candidate. In data, the peak around 800 MeV/c2 corresponds to ρ+ (see text for details). All the requirements listed in Table 6 are applied to the distributions except for the last two. [Plots not reproduced.]

Table 7: Branching fractions involved in the background coming from B+ → D0ρ+ [27].

Decay mode | Fraction
B+ → D0ρ+ | (1.34 ± 0.18)%
D0 → K+ρ− | (11.1 ± 0.7)%
ρ± → π±π0 | ≈ 100%

A general feature of machine learning is the bias-variance tradeoff: if no limit is imposed on the model complexity, it is easy to train a model which perfectly separates signal from background when applied to its own training set (low bias), but which makes very poor predictions when applied to a new dataset (high variance); this regime is called overfitting. On the other hand, if the model is too simple, then its performance depends little on the dataset (low variance), but its predictions are never accurate (high bias); this regime is called underfitting [29].

The XGBoost algorithm, together with the scikit-learn library [32], provides many

parameters which limit the overfitting. In this study, four such parameters are used:

• The maximal depth of each tree (weak learner) is set to a low value (2 in the case of the example shown in Fig. 15).

• A shrinkage parameter 0 < η < 1 scales down the weights of newly added trees in

order to reduce the influence of each individual weak learner [30].

• An L1-regularisation parameter α penalises the complexity of each weak learner by adding a term $\alpha \sum_i |w_i|$ to the loss function, where the wi are the weights of the tree and the loss function is the function minimised by the algorithm at each iteration [30, 33].

• A subsample parameter 0 < s < 1 ensures that each tree is trained only on a random

subset of the total training sample. For example, a value s = 0.5 corresponds to a

subsample size of 50%.
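As an illustration, the four parameters listed above can be mapped onto the keyword arguments of the scikit-learn interface of XGBoost (`max_depth`, `learning_rate`, `reg_alpha`, `subsample`). The sketch below is not the configuration used in this study: apart from the depth of 2 (the Fig. 15 example) and the subsample value 0.5 (the example given above), all numerical values are placeholders chosen for illustration, and `make_classifier` is a hypothetical helper.

```python
# Illustrative settings only; the final values used in the analysis are not quoted in the text.
XGB_PARAMS = {
    "max_depth": 2,                  # maximal tree depth (value of the Fig. 15 example)
    "learning_rate": 0.1,            # shrinkage eta, 0 < eta < 1 (placeholder value)
    "reg_alpha": 1.0,                # L1 regularisation alpha (placeholder value)
    "subsample": 0.5,                # fraction s of the training sample seen by each tree
    "n_estimators": 200,             # number of boosted trees (placeholder value)
    "objective": "binary:logistic",  # output usable as a signal probability
}


def make_classifier(params=None):
    """Build an untrained classifier; requires the xgboost package to be installed."""
    # Imported lazily so that the parameter dictionary can be inspected without xgboost.
    from xgboost import XGBClassifier
    return XGBClassifier(**(XGB_PARAMS if params is None else params))
```

A classifier built this way would then be trained with `fit(features, labels)` and evaluated with `predict_proba(features)[:, 1]`, the usual scikit-learn pattern.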

4.5.2 Training

To avoid the creation of an artificial peaking background in the signal region, it is important to use variables that are not correlated with the B candidate mass. Moreover, some variables are not well simulated and could introduce a bias in the model [8]. For this reason, the PID variables are not used as input features.

8 Here, the name decision tree is an abuse of language, because each leaf of the tree is associated not with a decision but with a weight.

Figure 15: Example of decision tree. Each node (circle) is associated with a cut and each leaf (rectangle) is associated with a weight. A more positive (or less negative) weight corresponds to a higher predicted probability to have a signal event. [Diagram not reproduced; root node: K+ IPχ2 < 75; second-level nodes: π− IPχ2 < 89 (leaf weights −0.51 and −0.30) and π+ IPχ2 < 98 (leaf weights −0.26 and 0.24).]

The variables used to train the classifier are listed below:

• The B+,K+, π− and π+ impact parameter χ2, introduced in Sec. 4.2.

• The resonance vertex quality χ2.

• The cosine of the angle between the B candidate momentum and the direction

defined by the PV and SV of the B candidate (DIRA). For a signal event, one

expects DIRA to be near unity.

• One also introduces cone isolation variables [34, 35]. For a given candidate X and a given cone radius R, one defines the cone set CX,R as the set of all the tracks not belonging to the X decay and satisfying $\sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} < R$, where ∆η and ∆φ are the differences in pseudorapidity and azimuthal angle between the track and the X candidate, respectively. Based on this definition, two cone isolation variables are defined:

– The cone multiplicity, which is simply the number of tracks in CX,R. The cone multiplicity is near zero for a well-isolated candidate.

– The cone asymmetry, which is given by

$$\text{Cone asymmetry}(X, R) = \frac{P_T(X) - \sum_{\text{track} \in C_{X,R}} P_T(\text{track})}{P_T(X) + \sum_{\text{track} \in C_{X,R}} P_T(\text{track})}, \tag{19}$$

where PT(X) is the transverse momentum of X. The cone asymmetry is near unity for a well-isolated candidate (Fig. 16).
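The two cone isolation variables can be computed in a few lines once the candidate and the other tracks of the event are described by their transverse momentum, pseudorapidity and azimuthal angle. The sketch below follows Eq. 19 as written, with a scalar sum of track transverse momenta; the dictionary layout and the function names are assumptions made for illustration.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def cone_variables(candidate, tracks, radius=1.0):
    """Return (cone multiplicity, cone pT asymmetry) for one candidate.

    `candidate` and each element of `tracks` are dicts with keys 'pt', 'eta', 'phi'.
    `tracks` must already exclude the decay products of the candidate.
    """
    in_cone = [
        t for t in tracks
        if math.hypot(t["eta"] - candidate["eta"],
                      delta_phi(t["phi"], candidate["phi"])) < radius
    ]
    sum_pt = sum(t["pt"] for t in in_cone)
    asymmetry = (candidate["pt"] - sum_pt) / (candidate["pt"] + sum_pt)
    return len(in_cone), asymmetry
```

For a candidate with no other track inside the cone, the function returns a multiplicity of 0 and an asymmetry of exactly 1, the well-isolated limit described above.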

These variables have already shown a good discrimination power in a previous analysis of

Run 1 data [8]. Their correlation matrix with the B candidate mass is drawn in Fig. 17.

In order to ensure that the classifier is trained, optimised and applied on different datasets, one follows the strategy depicted in Fig. 18 and inspired by Ref. [8]:

1. The MC signal and both data sidebands are randomly divided into two subsamples A0 and A1 of equal size, in such a manner that both subsamples contain the same proportion of events coming from the MC signal and from the data sidebands.

2. A0 itself is divided into two subsamples, A0B0 and A0B1, of relative size 2/3 and 1/3 respectively. A0B0 is used to train a first classifier and A0B1 is used to test it and optimise the cut on its output. Similarly, A1 is divided into two subsamples A1B0 and A1B1 and a second classifier is trained, tested and optimised following the same steps.

3. The classifier trained on A0B0 (A1B0) is applied to A1 (A0), so that no classifier is applied to the data on which it was trained.
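The splitting strategy can be sketched with the Python standard library alone. In this illustration the signal and background samples are split separately, which is one simple way to keep the same signal-to-background proportion in A0 and A1; the function names and the use of plain lists of events are assumptions of the sketch, not a description of the actual analysis code.

```python
import random


def split_sample(events, seed=0):
    """Split one sample into A0B0, A0B1, A1B0 and A1B1 (halves, then 2/3 and 1/3)."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    parts = {}
    for name, block in (("A0", shuffled[:half]), ("A1", shuffled[half:])):
        n_train = (2 * len(block)) // 3
        parts[name + "B0"] = block[:n_train]  # used to train the classifier
        parts[name + "B1"] = block[n_train:]  # used to test it and optimise the cut
    return parts


def split_all(signal_mc, sidebands, seed=0):
    """Split signal and background separately so that A0 and A1 keep the same proportions."""
    sig = split_sample(signal_mc, seed)
    bkg = split_sample(sidebands, seed + 1)
    return {key: {"signal": sig[key], "background": bkg[key]} for key in sig}
```

The classifier trained on `A0B0` would then be applied to the union of `A1B0` and `A1B1`, and vice versa.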

4.5.3 Results and choice of the final cut

Figure 19 shows the distribution of the output of the two classifiers and compares the results obtained for the respective training and test sets, together with the results of a Kolmogorov-Smirnov test used as an overfitting check [36]. The associated significance plots are drawn in Fig. 20: the optimal cuts on the classifier outputs are found to be 0.16 (0.14) for the classifier trained on A0B0 (A1B0). Although the classifiers show a good separation power, the significance does not increase appreciably because the signal is already dominant in the [5080, 5480] MeV/c2 mass region thanks to the pre-selection criteria applied. In order to choose the final cut, another figure of merit, the purity, is investigated; it is defined as

$$\text{Purity} = \left. \frac{N_{sig}}{N_{sig} + N_{bkg}} \right|_{5080\ \text{MeV}/c^2 < m_B < 5480\ \text{MeV}/c^2}, \tag{20}$$

where Nsig and Nbkg are the expected numbers of signal and background events in the signal region. The purity is also depicted in Fig. 20; note that Nbkg is still estimated by fitting the high-mass sideband, which means that the signal significance and purity are only computed with respect to the combinatorial background. It can be seen that a better purity can be achieved without strongly affecting the significance. The final cut is chosen to be 0.2 for the two classifiers. Depending on the needs of future studies, a more stringent cut may be imposed.
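The purity of Eq. 20 and its evolution with the cut on the classifier output can be sketched as follows. The sketch assumes that the expected yields in the signal region before the cut are known and that the classifier scores of a signal sample and of a background sample are only used to estimate the efficiencies; all names are illustrative.

```python
def purity(n_sig, n_bkg):
    """Eq. 20: fraction of signal among the events expected in the signal region."""
    return n_sig / (n_sig + n_bkg)


def scan_purity(sig_scores, bkg_scores, n_sig0, n_bkg0, cuts):
    """Return a list of (cut, purity) after requiring score > cut.

    n_sig0 and n_bkg0 are the expected yields in the signal region before the cut.
    """
    results = []
    for cut in cuts:
        eff_sig = sum(s > cut for s in sig_scores) / len(sig_scores)
        eff_bkg = sum(s > cut for s in bkg_scores) / len(bkg_scores)
        results.append((cut, purity(n_sig0 * eff_sig, n_bkg0 * eff_bkg)))
    return results
```

A cut that removes every event would make the purity undefined; a production version would need to handle that case explicitly.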

Figure 16: Cone isolation variable used to train the classifier (2016 data). [Plot not reproduced; the panel shows the B+ cone PT asymmetry (R = 1) for the low-mass sideband, the high-mass sideband and the signal MC.]

Figure 17: Correlation between the B candidate mass and the 2016 training variables. [Plot not reproduced; the correlation coefficients printed in the figure are listed below.]

Variables: (1) M(K+π−π+γ), (2) K+ IPχ2, (3) π+ IPχ2, (4) π− IPχ2, (5) B+ IPχ2, (6) B+ DIRA, (7) B+ FDχ2, (8) K∗ vertex isolation ∆χ2 (1 track), (9) K∗ vertex isolation ∆χ2 (2 tracks), (10) K∗ vertex χ2, (11) B+ cone PT asymmetry (R = 1).

 | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11)
(1) | 1.00 | 0.12 | 0.12 | 0.11 | −0.15 | −0.09 | 0.05 | 0.02 | 0.02 | −0.05 | 0.12
(2) | 0.12 | 1.00 | 0.49 | 0.41 | 0.01 | 0.09 | 0.59 | 0.04 | 0.08 | −0.00 | 0.03
(3) | 0.12 | 0.49 | 1.00 | 0.37 | 0.01 | 0.09 | 0.55 | 0.04 | 0.07 | −0.00 | 0.04
(4) | 0.11 | 0.41 | 0.37 | 1.00 | 0.01 | 0.09 | 0.54 | 0.04 | 0.07 | 0.01 | 0.04
(5) | −0.15 | 0.01 | 0.01 | 0.01 | 1.00 | −0.19 | 0.02 | −0.00 | −0.00 | 0.05 | −0.06
(6) | −0.09 | 0.09 | 0.09 | 0.09 | −0.19 | 1.00 | 0.07 | 0.01 | 0.01 | −0.04 | 0.05
(7) | 0.05 | 0.59 | 0.55 | 0.54 | 0.02 | 0.07 | 1.00 | 0.04 | 0.09 | 0.02 | 0.02
(8) | 0.02 | 0.04 | 0.04 | 0.04 | −0.00 | 0.01 | 0.04 | 1.00 | 0.06 | −0.01 | 0.04
(9) | 0.02 | 0.08 | 0.07 | 0.07 | −0.00 | 0.01 | 0.09 | 0.06 | 1.00 | −0.01 | 0.05
(10) | −0.05 | −0.00 | −0.00 | 0.01 | 0.05 | −0.04 | 0.02 | −0.01 | −0.01 | 1.00 | −0.11
(11) | 0.12 | 0.03 | 0.04 | 0.04 | −0.06 | 0.05 | 0.02 | 0.04 | 0.05 | −0.11 | 1.00

Figure 18: Strategy followed to train, test and apply the classifiers. The MC signal and data sidebands are randomly divided in two equally-sized subsamples A0 and A1. 2/3 of A0 and A1 (A0B0 and A1B0) are used to train 2 classifiers; the remaining thirds (A0B1 and A1B1 respectively) are used to test each classifier and optimise the cut on its output. The classifier trained, tested and optimised on A0 is then applied to A1 and vice-versa [8]. [Diagram not reproduced; it shows the MC signal and data sidebands split 1/2 and 1/2 into A0 and A1, each of which is split 2/3 and 1/3 into A0B0, A0B1 and A1B0, A1B1.]

Figure 19: Distribution of the two classifier outputs and comparison between the results obtained with the training and test sets. The first classifier (left) is trained on A0B0 and tested on A0B1; the second classifier (right) is trained on A1B0 and tested on A1B1. [Plots not reproduced. Kolmogorov-Smirnov test, signal (background) p-value, in the order in which the panels appear: 0.464 (0.352) and 0.134 (0.267).]

Figure 20: Significance, purity, signal efficiency and background efficiency as a function of the cut on the two classifier outputs. The first classifier (left) is trained on A0B0 and tested on A0B1; the second classifier (right) is trained on A1B0 and tested on A1B1. [Plots not reproduced.]

5 Study of the B+ → K+π−π+γ signal

In this section, the different components present in the B candidate mass distribution

after the selection are discussed and modelled (Secs. 5.1 and 5.2); this aims to build a full

mass fit (Sec. 5.3). All the fits presented are made with the RooFit package [37, 38].

5.1 Signal study

The signal is modelled with a double-tail Crystal Ball function (CB) [38, 39] defined as

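$$\mathrm{CB}(m;\, \mu, \sigma, \alpha_L, n_L, \alpha_R, n_R) = N \cdot
\begin{cases}
\left(\frac{n_L}{\alpha_L}\right)^{n_L} \exp\left(-\frac{\alpha_L^2}{2}\right) \left(\frac{n_L}{\alpha_L} - \alpha_L - \frac{m-\mu}{\sigma}\right)^{-n_L} & \text{if } \frac{m-\mu}{\sigma} \le -\alpha_L, \\
\exp\left(-\frac{(m-\mu)^2}{2\sigma^2}\right) & \text{if } -\alpha_L < \frac{m-\mu}{\sigma} < \alpha_R, \\
\left(\frac{n_R}{\alpha_R}\right)^{n_R} \exp\left(-\frac{\alpha_R^2}{2}\right) \left(\frac{n_R}{\alpha_R} - \alpha_R + \frac{m-\mu}{\sigma}\right)^{-n_R} & \text{if } \frac{m-\mu}{\sigma} \ge \alpha_R,
\end{cases} \tag{21}$$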

where N is a normalisation constant and {µ, σ, αL, nL, αR, nR} is a set of 6 positive pa-

rameters described in Table 8. Out of these 6 parameters, only the mean µ and the width

σ are left free in the final fit; all the other parameters are fixed by fitting the B mass

distribution in the signal MC after applying the selection cuts presented in Sec. 4. Figure

21 presents the result of such a fit.
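For readers who want to evaluate Eq. 21 outside RooFit, a direct transcription in Python is given below. It is an unnormalised sketch (N = 1) and the function name is illustrative; it is not the RooFit implementation used for the fits.

```python
import math


def double_cb(m, mu, sigma, alpha_l, n_l, alpha_r, n_r):
    """Unnormalised double-tail Crystal Ball function of Eq. 21 (N = 1)."""
    t = (m - mu) / sigma
    if t <= -alpha_l:
        # Left power-law tail, matched to the Gaussian core at t = -alpha_l.
        a = (n_l / alpha_l) ** n_l * math.exp(-0.5 * alpha_l ** 2)
        return a * (n_l / alpha_l - alpha_l - t) ** (-n_l)
    if t >= alpha_r:
        # Right power-law tail, matched to the Gaussian core at t = +alpha_r.
        a = (n_r / alpha_r) ** n_r * math.exp(-0.5 * alpha_r ** 2)
        return a * (n_r / alpha_r - alpha_r + t) ** (-n_r)
    return math.exp(-0.5 * t * t)  # Gaussian core


# Parameter values read from the fit to the signal MC (Fig. 21, Table 11).
PARS = dict(mu=5279.1, sigma=89.53, alpha_l=2.207, n_l=1.16, alpha_r=1.415, n_r=8.9)
value_at_peak = double_cb(5279.1, **PARS)
```

The prefactors of the two tails are exactly those that make the function continuous at the transition points, which is why α and n are the only tail parameters.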

Based on the results of this simulation, one introduces three regions in the B mass

distribution defined in Table 3 and described below:

• The signal region is chosen to correspond approximately to the mean of the distri-

bution ±2σ.

• The high-mass sideband is expected to contain mainly combinatorial background,

which occurs when the reconstructed candidate contains random tracks coming from

an interaction point or from another decay chain.

• The low-mass sideband contains combinatorial and partially reconstructed back-

ground, the latter corresponding to b-hadron decays with more final state particles

than B+ → K+π−π+γ, but where one or several particles are not reconstructed.

5.2 Background study

As stated in the previous section, two main background components are still present after

the selection [8]: the combinatorial and the partially reconstructed backgrounds. In the

following paragraphs, these two sources of background and the possible combination of

them are discussed and modelled.


Table 8: Parameters present in the definition of a double-tail Crystal Ball (Eq. 21).

Parameter | Description
µ | mean of the core
σ | width of the core
αR,L | transition points of the tails
nL,R | exponents of the tails

Figure 21: Mass distribution of MC generated B+ → K1(1270)+γ → K+π−π+γ decays. The results of an unbinned maximum likelihood fit with a double-tail Crystal Ball PDF are shown. [Plot not reproduced. Fit results: µ = 5279.1 ± 1.0 MeV/c2, σ = 89.53 ± 0.99 MeV/c2, αL = 2.207 ± 0.063, nL = 1.16 ± 0.14, αR = 1.415 ± 0.071, nR = 8.9 ± 2.2, χ2/ndf = 41.9/24 = 1.7.]

5.2.1 Combinatorial background

The mass distribution of the combinatorial background is modelled with a simple expo-

nential exp(τm), where τ is given in c2/MeV. Figure 22 depicts the result of such a fit. A

linear model, showing more stability when the number of background events is high com-

pared to the number of signal events, was used during the selection (Sec. 4) to estimate

the background in the signal region. The exponential model can also be replaced by a

linear model for systematic uncertainties studies.
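With an exponential model, the number of combinatorial events expected in the signal region follows from the sideband yield through a ratio of integrals. The sketch below illustrates this extrapolation; the function names are illustrative, the sideband yield of 1000 events is a placeholder, and the ranges [5700, 6500] MeV/c2 (fit range of Fig. 22) and [5080, 5480] MeV/c2 (signal region) are taken from the text.

```python
import math


def exp_integral(tau, lo, hi):
    """Integral of exp(tau * m) between lo and hi, for tau != 0."""
    return (math.exp(tau * hi) - math.exp(tau * lo)) / tau


def extrapolate_background(n_sideband, tau, sideband, signal_region):
    """Scale the sideband yield by the ratio of the exponential integrals."""
    ratio = exp_integral(tau, *signal_region) / exp_integral(tau, *sideband)
    return n_sideband * ratio


# tau in c^2/MeV as fitted on 2016 data; 1000 sideband events is a placeholder yield.
n_bkg_signal_region = extrapolate_background(
    1000.0, -0.00108, sideband=(5700.0, 6500.0), signal_region=(5080.0, 5480.0)
)
```

Because τ is negative, the signal region, which lies at lower mass, receives more events per unit of mass than the sideband; a flat model would simply give the ratio of the two widths.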

5.2.2 Partially reconstructed b-hadron background

An immediate property implied by the definition given above of partially reconstructed b-hadron background is that the mass distribution of this background has an upper bound

mB − mmiss,

where mB is the mass of the parent b-hadron and mmiss is the sum of the masses of the missing particles. One distinguishes two cases: exactly one pion is missing, or at least two pions are missing. In what follows, one refers to these two cases as missing pion background and partially reconstructed background, respectively. This separation is made because the missing pion background contributes strongly in the signal region and needs to be modelled carefully.

Figure 22: B candidate mass distribution after the selection. The result of an unbinned maximum likelihood fit in the range [5700, 6500] MeV/c2 with an exponential function is shown. [Plot not reproduced.]

Because of this upper bound property, both missing pion and partially reconstructed backgrounds are described by a generalised Argus function [38, 40]:

$$A(m;\, m_0, c, p) =
\begin{cases}
N \cdot \frac{m}{m_0} \left(1 - \frac{m^2}{m_0^2}\right)^{p} \exp\left[-\frac{1}{2} c^2 \left(1 - \frac{m^2}{m_0^2}\right)\right] & \text{if } 0 \le m < m_0, \\
0 & \text{otherwise},
\end{cases} \tag{22}$$

where N is a normalisation constant, m0 the endpoint of the function given in MeV/c2, and c and p two free parameters. If µ is the mean of the signal distribution, m0 is chosen to be µ − mπ0 and µ − 2mπ0 for the missing pion background and the partially reconstructed background respectively, where mπ0 is the π0 mass.

Furthermore, one takes into account the photon energy resolution by convolving the generalised Argus function with a Gaussian function

$$G(m;\, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2} \left(\frac{m - \mu}{\sigma}\right)^2\right], \tag{23}$$

where the mean µ is fixed to zero and the width σ is chosen to be the same as the signal one. In summary, the missing pion background is described with the function

$$A(m;\, \mu - m_{\pi^0}, c_{miss-\pi}, p_{miss-\pi}) \otimes G(m;\, 0, \sigma), \tag{24}$$

where ⊗ denotes a convolution, and the partially reconstructed background is assumed to follow the law

$$A(m;\, \mu - 2m_{\pi^0}, c_{part}, p_{part}) \otimes G(m;\, 0, \sigma). \tag{25}$$
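The effect of the Gaussian smearing on the sharp Argus endpoint can be checked numerically. The sketch below transcribes Eqs. 22 and 23 with N = 1 and evaluates the convolution with a simple midpoint rule; it is an illustration only, since the fits themselves are made with RooFit. The function names, the number of integration steps and the ±5σ integration range are choices made for this example.

```python
import math


def argus(m, m0, c, p):
    """Unnormalised generalised Argus function of Eq. 22 (N = 1)."""
    if m < 0.0 or m >= m0:
        return 0.0
    u = 1.0 - (m / m0) ** 2
    return (m / m0) * u ** p * math.exp(-0.5 * c * c * u)


def gauss(x, sigma):
    """Gaussian of Eq. 23 with its mean fixed to zero."""
    return math.exp(-0.5 * (x / sigma) ** 2) / math.sqrt(2.0 * math.pi * sigma ** 2)


def argus_conv_gauss(m, m0, c, p, sigma, n_steps=400, n_sigma=5.0):
    """Midpoint-rule estimate of the convolution integral of A(m - x) G(x) over x."""
    lo, hi = -n_sigma * sigma, n_sigma * sigma
    step = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = lo + (i + 0.5) * step
        total += argus(m - x, m0, c, p) * gauss(x, sigma)
    return total * step


# Missing pion model with the fixed values of Table 11 and the signal MC mean and width.
M0 = 5279.1 - 135.0
above_endpoint = argus_conv_gauss(5200.0, M0, -6.425, 0.058, 89.53)
```

Above the endpoint the bare Argus function vanishes, whereas the smeared function does not; this is how the missing pion background leaks into the signal region.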

Single missing pion background

The missing pion background parameters cmiss−π and pmiss−π are fixed by simulation.

Due to the lack of B+ → K+π−π+π0γ MC samples, one uses simulated B0 → K∗0γ

and B+ → K∗0π+γ decays by analogy. A 3-step method is used to parametrise this

background contribution [8]:

1. One fits the mass distribution of offline-selected (genuine) B0 candidates in a sample

of simulated B0 → K∗0γ decays with a double-tail Crystal Ball (Fig. 23).

2. One fits the mass distribution of offline-selected B0 → K∗0γ candidates in a sample of simulated B+ → K∗0π+γ decays with an Argus function convolved with a Gaussian. The endpoint of the Argus function is chosen to be µ − mπ0, where µ is the mean obtained at the first step; the resolution of the Gaussian is fixed to the width obtained from the fit of the first step (Fig. 24).

3. The parameters cmiss−π and pmiss−π of the missing pion model are fixed according

to the results of the fit obtained at the second step.

5.2.3 Peaking backgrounds

Peaking backgrounds are decays of B+ and B0 mesons whose final state can be mis-

reconstructed as K+π−π+γ. Assuming the same production fractions for B+ and B0 [27],

the contamination coming from a given peaking background is given by

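$$\text{Contamination} \equiv \frac{N_{bkg}}{N_{sig}} = \frac{\mathcal{B}_{bkg} \cdot \varepsilon_{bkg}}{\mathcal{B}_{sig} \cdot \varepsilon_{sig}}, \tag{26}$$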

where Nsig (Nbkg) is the number of signal (background) events after the selection, B is the

branching fraction and ε is the total selection efficiency.
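Equation 26 is a simple ratio, written out below as a small helper. The function name is illustrative, and the numbers in the example call are round placeholder values chosen to make the arithmetic easy to follow; they are not the branching fractions or efficiencies of this analysis.

```python
def contamination(bf_bkg, eff_bkg, bf_sig, eff_sig):
    """Eq. 26: expected background-to-signal ratio after the selection."""
    return (bf_bkg * eff_bkg) / (bf_sig * eff_sig)


# Placeholder inputs: a background ten times rarer and ten times less efficient
# than the signal gives a contamination of one percent.
example = contamination(bf_bkg=1e-6, eff_bkg=1e-3, bf_sig=1e-5, eff_sig=1e-2)
```

The formula assumes equal production fractions for B+ and B0, as stated above; otherwise the ratio of production fractions would enter as an additional factor.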

Table 9 lists several peaking backgrounds and their respective branching fractions. The

description of each of them is given below:

• The decay B+ → K+π−π+η (→ γγ), where one photon is not reconstructed, is very similar to the signal. Its branching fraction is estimated by comparing the decays B+ → K∗+η and B+ → K∗+γ and taking into account that η decays into 2γ with a probability of ∼ 40%. The selection efficiency for this decay is estimated using a MC sample of ∼ 5 × 10^5 simulated events. Figure 25 shows the results of a double-tail Crystal Ball fit to the mass distribution of offline-selected events. With a selection efficiency of ∼ 2.4 × 10^−3, one obtains an estimated contamination of ∼ 8% over the full mass range and ∼ 3% in the signal region.

Figure 23: Mass distribution of MC generated B0 → K∗0γ decays. The results of an unbinned maximum likelihood fit with a double-tail Crystal Ball PDF are shown. [Plot not reproduced. Fit results: µ = 5278.07 ± 0.74 MeV/c2, σ = 92.06 ± 0.70 MeV/c2, αL = 2.302 ± 0.040, nL = 0.701 ± 0.068, αR = 1.404 ± 0.045, nR = 7.7 ± 1.0, χ2/ndf = 83.2/28 = 3.0.]

Figure 24: Mass distribution of B0 → K∗0γ candidates selected in MC generated B+ → K∗0π+γ decays. The results of an unbinned maximum likelihood fit with a generalised Argus PDF convolved with a Gaussian PDF are shown. The mean of the Gaussian is fixed to zero and its width to the width of the double-tail Crystal Ball in Fig. 23. The endpoint of the generalised Argus is fixed to µ − mπ0, where µ is the mean of the double-tail Crystal Ball in Fig. 23. [Plot not reproduced. Fit results: c = −6.425 ± 0.41, p = 0.058 ± 0.049, χ2/ndf = 40.1/18 = 2.2.]

Table 9: Peaking backgrounds and corresponding branching fractions as listed in Ref. [8] with updated values for branching fractions computed from Ref. [27]. The symbol ∼ denotes estimated values.

Decay | Branching fraction
B+ → K+π−π+η (→ γγ) | ∼ 5.4 × 10^−6
B0 → K1^0 (→ K+π−π0)γ | ∼ 2.8 × 10^−5
B+ → D0(→ K+π−π0)π+ | (6.9 ± 0.3) × 10^−4
B+ → D∗0(→ D0(→ K+π−)π0)π+ | (1.23 ± 0.05) × 10^−4
B+ → D∗0(→ D0(→ K+π−)γ)π+ | (6.7 ± 0.3) × 10^−5
B+ → K∗+(→ K+π0)π+π− | (2.5 ± 0.3) × 10^−5
B+ → π+π−π+γ | ∼ 1.3 × 10^−6

Figure 25: Mass distribution of B+ → K+π−π+γ candidates selected in B+ → K+π−π+η events. [Plot not reproduced. Fit results: µ = 5123 ± 26 MeV/c2, σ = 121 ± 30 MeV/c2, αL = 0.240 ± 0.073, nL = 93.21 ± 0.90, αR = 1.42 ± 0.58, nR = 144.6 ± 1.1, χ2/ndf = 5.0/4 = 1.2.]

• The decay B0 → K01 (→ K+π−π0)γ, where π0 is not reconstructed and a charged

track is wrongly associated to the resonance vertex, has a branching fraction esti-

mated to be the same as the signal. Only 12 events out of ∼ 5×105 simulated decays

survive the full selection. Figure 26 shows the results of a Gaussian fit on the events

surviving the stripping selection. This gives an estimated contamination of ∼ 0.3%

over the full mass range and ∼ 0.04% in the signal region. Therefore, this source of

background is found to be negligible.

• The decay B+ → π+π−π+γ, where a pion is mis-reconstructed as a kaon, has a branching fraction expected to be lower than the signal by a factor of order ∼ (Vtd/Vts)^2 = 0.05 [27]. No 2016 MC sample was available at the time of writing this document to estimate the contamination from this background, but a similar study on 2011 and 2012 data samples found a contamination of ∼ 6 × 10^−4 [8], which is negligible.

Figure 26: Mass distribution of B+ → K+π−π+γ candidates selected in B0 → K1^0 γ → K+π−π0γ events. Only the stripping requirements are applied. [Plot not reproduced. Fit results: µ = 4656 ± 38 MeV/c2, σ = 1031 ± 32 MeV/c2, χ2/ndf = 19.5/8 = 2.4.]

• For the other background sources presented in Table 9, no 2016 MC sample was available at the time of writing this document, but they are expected to be strongly suppressed by the hard photon cut and the cut made in Sec. 4.4 to remove the B+ → D0ρ+ background. The same study cited above found negligible contaminations [8].

• Finally, the three peaking backgrounds

B+ → K+ω(→ π+π−π0)

B+ → K+η′(→ π+π−η(→ γγ))

B+ → K+η′(→ ρ0(→ π+π−)γ)

are not considered, because ω and η′ are outside of the available phase space after the

selection, as it was checked by looking at the π−π+π0 and π−π+γ mass distributions

[8].

5.3 Mass fit

By putting together all the components defined above and introducing Nsig, Ncomb, Nmiss−π and Npart as the numbers of signal, combinatorial, missing pion and partially reconstructed events respectively, one obtains a final fit function M defined as

$$\begin{aligned}
M(m;\, \mu, \sigma, \tau, c_{part}, p_{part}, N_{sig}, N_{comb}, N_{miss-\pi}, N_{part}) =\ & N_{sig}\, \mathrm{CB}(m;\, \mu, \sigma, \alpha_L^*, n_L^*, \alpha_R^*, n_R^*) \\
& + N_{comb} \exp(\tau m) \\
& + N_{miss-\pi}\, A(m;\, \mu - m_{\pi^0}^*, c_{miss-\pi}^*, p_{miss-\pi}^*) \otimes G(m;\, 0, \sigma) \\
& + N_{part}\, A(m;\, \mu - 2m_{\pi^0}^*, c_{part}, p_{part}) \otimes G(m;\, 0, \sigma),
\end{aligned} \tag{27}$$

where the stars (*) denote fixed parameters listed in Table 11, whose values are determined in Secs. 5.1 and 5.2.2.

Figure 27 presents the result of the final mass fits on the 2015, 2016 and 2017 data

samples and Table 10 compares these results with results obtained by a similar study on

2011 and 2012 data collected at a centre-of-mass energy of 7 and 8 TeV and corresponding

to integrated luminosities of 0.98 and 1.97 fb−1, respectively [8, 10]. The mass resolution

in 2016 and 2017 samples is in good agreement with what was obtained from 2012 data.

In 2015, the resolution is ∼ 15% larger; this is probably caused by differences in the ECAL

calibration.

Assuming that the fragmentation fraction does not depend on the collision energy √s and that the pp → bb̄ cross-section is proportional to √s, it is relevant to compare among the years the signal yield Nsig per unit of integrated luminosity L and collision energy √s, as done in Table 10 (this quantity is approximately proportional to the total efficiency). It can be seen that the yield increases in Run 2 by a factor larger than the simple energy and integrated luminosity ratio, indicating an improved efficiency. Note that this factor would be lower if a higher purity were required for the Run 2 signal.

Table 10: Collision energy, integrated luminosity, number of signal events for each year of data-taking, together with the associated mean and width of the mass distribution and yield per unit of integrated luminosity and collision energy. The results presented in the upper part (years 2011−2012) are taken from Refs. [8, 10], the lower part (years 2015−2017) summarises the results of this study. Because the uncertainty on the 2017 integrated luminosity was not available, it was estimated by assuming the same relative uncertainty in 2016 and 2017.

Year | √s [TeV] | L [fb−1] | Nsig | µ [MeV/c2] | σ [MeV/c2] | Nsig/(L√s) [fb TeV−1]
2011 | 7 | 0.98 ± 0.01 | 4084 ± 83 | 5279.4 ± 2.2 | 93.8 ± 2.0 | 596 ± 13
2012 | 8 | 1.97 ± 0.01 | 9787 ± 129 | 5279.3 ± 1.3 | 85.9 ± 1.2 | 622 ± 8
2015 | 13 | 0.29 ± 0.01 | 3163 ± 99 | 5271.1 ± 2.9 | 96.6 ± 3.2 | 829 ± 29
2016 | 13 | 1.64 ± 0.06 | 18382 ± 206 | 5275.5 ± 1.0 | 84.7 ± 1.0 | 862 ± 32
2017 | 13 | 1.71 ± 0.06 | 18110 ± 206 | 5258.3 ± 0.9 | 83.4 ± 1.0 | 815 ± 30
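The last column of Table 10 can be recomputed from the other columns as a cross-check. The sketch below uses the rounded central values printed in the table, so small differences with respect to the quoted numbers are expected (the 2015 entry in particular, presumably because its luminosity is rounded to two digits); uncertainties are not propagated. The Run 1 and Run 2 averages are unweighted and serve only as an illustration.

```python
# (year, sqrt(s) [TeV], L [fb^-1], N_sig), central values copied from Table 10
ROWS = [
    (2011, 7.0, 0.98, 4084),
    (2012, 8.0, 1.97, 9787),
    (2015, 13.0, 0.29, 3163),
    (2016, 13.0, 1.64, 18382),
    (2017, 13.0, 1.71, 18110),
]


def normalised_yield(n_sig, lumi, sqrt_s):
    """Signal yield per unit of integrated luminosity and collision energy."""
    return n_sig / (lumi * sqrt_s)


yields = {year: normalised_yield(n, lumi, e) for year, e, lumi, n in ROWS}

# Unweighted averages, for illustration only.
run1 = (yields[2011] + yields[2012]) / 2
run2 = (yields[2015] + yields[2016] + yields[2017]) / 3
```

The ratio `run2 / run1` then gives a rough measure of the efficiency gain discussed above, under the stated assumptions on the fragmentation fraction and the cross-section.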

Figure 27: Mass distribution of the B+ → K+π−π+γ candidates selected in 2015 (top), 2016 (middle) and 2017 (bottom) data. The results of an unbinned maximum likelihood fit with the function defined by Eq. 27 are shown (blue line). The components of the fit are also depicted: the signal (dashed blue line), the combinatorial background (dashed red line), the single missing pion background (dashed cyan line) and the partially reconstructed background (dashed magenta line). [Plots not reproduced; the fit results printed on the panels are listed below.]

Parameter | 2015 | 2016 | 2017
µ [MeV/c2] | 5271.1 ± 2.9 | 5275.49 ± 0.96 | 5258.29 ± 0.92
σ [MeV/c2] | 96.6 ± 3.2 | 84.68 ± 0.99 | 83.45 ± 0.97
τ [c2/MeV] | −0.000888 ± 0.00026 | −0.001080 ± 0.00014 | −0.001352 ± 0.00013
Nsig | 3163 ± 99 | 18382 ± 206 | 18110 ± 206
Ncomb | 1533 ± 386 | 5961 ± 833 | 7640 ± 1062
Nmiss−π | 2711 ± 190 | 14466 ± 625 | 14920 ± 457
Npart | 2775 ± 268 | 18038 ± 809 | 14104 ± 792
cpart | −13.17 ± 4.7 | −1.75 ± 1.9 | −8.23 ± 2.7
ppart | 4.0 ± 1.0 | 1.84 ± 0.42 | 3.23 ± 0.57
χ2/ndf | 30.3/27 = 1.1 | 27.0/27 = 1.0 | 47.6/27 = 1.8

Table 11: Fixed parameters present in the definition of the final fit function (Eq. 27).

Fixed parameter   Value

α∗L               2.207
n∗L               1.16
α∗R               1.415
n∗R               8.9
c∗miss−π          −6.425
p∗miss−π          0.058
m∗π0              135.0 MeV/c2 [27]


6 Conclusion and outlook

This thesis has presented a selection of B± → K±π∓π±γ candidates collected by the LHCb

experiment at a centre-of-mass energy of 13 TeV. A cut-based strategy followed by the

training and application of a multivariate classifier has been described; several cuts and the

output of the classifier were optimised to maximise the significance. Approximately 3’000,

18’000 and 18’000 B± → K±π∓π±γ decays were selected in 2015, 2016 and 2017 data

samples corresponding to integrated luminosities of 0.29, 1.64 and 1.71 fb−1, respectively.

Depending on the needs of future studies, it may be useful to make a stricter cut on

the classifier output in order to increase the signal purity. Moreover, the characterisation

of background sources should be completed when more MC samples using 2016 and 2017

data-taking conditions become available.

Together, the signal candidates found in the first run of the LHC [8, 9] and those

selected in this study amount to approximately 50’000 signal decays, now available for the

measurement of the photon polarisation and the search for a possible signal of physics

beyond the Standard Model.

As a final note, the civil-engineering work for the High-Luminosity LHC (HL-LHC)

started exactly one week before the submission of this report [41]; the upgraded machine

is designed to increase the luminosity by a factor of five to seven with respect to

its current value [42]. Together with the results of other experiments such as Belle II [43],

the next decade will give us many opportunities to pursue a better understanding of how

Nature works at its most fundamental level.

Acknowledgements

I would like to express my gratitude to CERN and the LHCb collaboration without which

this project would not have been possible; to my director Prof. Dr. Olivier Schneider, for his

guidance throughout my work and for having given me the opportunity to be a student-

assistant in his introduction to particle physics course; to my supervisor Dr. Preema

Pais, for her unconditional support, advice and enthusiasm; to Violaine Bellee, for all

her help and the time needed to generate the data samples; to my colleagues, for their

encouragement and comments.


A Appendix

A.1 Uncertainty on efficiency

Following Ref. [44], one presents here two methods to estimate the efficiency ε of a selection

where k out of n events survive a set of cuts. In this study, due to the high number of

candidates, both methods give very similar results. Unless otherwise stated, the results of

the Bayesian approach are used throughout this document.

Binomial error

In the classical approach, one considers that the selection is a binomial process described

by the probability function

\[ P(k; \varepsilon, n) = \binom{n}{k}\, \varepsilon^{k} (1 - \varepsilon)^{n-k}. \tag{28} \]

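The estimators of the efficiency and its uncertainty are then given by [44]

\[ \varepsilon = \frac{k}{n}, \qquad \sigma_{\varepsilon} = \sqrt{\frac{\varepsilon (1 - \varepsilon)}{n}} = \sqrt{\frac{k (n - k)}{n^{3}}}. \tag{29} \]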

This method cannot be correct in the general case, because the estimated uncertainty vanishes

in the limits k → 0 and k → n, which is unphysical for a finite sample.

Bayesian approach

In a Bayesian approach, one starts from Bayes' theorem and writes

\[ P(\varepsilon; k, n) = \frac{P(k; \varepsilon, n)\, P(\varepsilon; n)}{C}, \tag{30} \]

where P(ε; n) is a prior probability and C a normalisation constant. By developing this

last equation, T. Ullrich and Z. Xu obtain the estimators [44]

\[ \varepsilon = \frac{k}{n}, \qquad \sigma_{\varepsilon} = \sqrt{\frac{(k+1)(k+2)}{(n+2)(n+3)} - \left(\frac{k+1}{n+2}\right)^{2}}. \tag{31} \]
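The two estimators of Eqs. 29 and 31 can be compared numerically with a minimal Python sketch. The function names are hypothetical and chosen for illustration only; the formulas are those quoted above from Ref. [44].

```python
import math


def binomial_efficiency(k, n):
    """Efficiency and uncertainty from the binomial estimate (Eq. 29)."""
    eps = k / n
    sigma = math.sqrt(k * (n - k) / n**3)
    return eps, sigma


def bayesian_efficiency(k, n):
    """Efficiency and uncertainty from the Bayesian estimate (Eq. 31)."""
    eps = k / n
    variance = ((k + 1) * (k + 2) / ((n + 2) * (n + 3))
                - ((k + 1) / (n + 2)) ** 2)
    return eps, math.sqrt(variance)


# Compare both estimates for a small sample, a large sample and the
# limiting cases k = 0 and k = n.
for k, n in [(90, 100), (9000, 10000), (0, 100), (100, 100)]:
    eps, s_binomial = binomial_efficiency(k, n)
    _, s_bayesian = bayesian_efficiency(k, n)
    print(k, n, eps, s_binomial, s_bayesian)
```

For large n the two uncertainties agree closely, which is consistent with the statement that both methods give very similar results in this study. In the limits k = 0 and k = n, the binomial uncertainty is exactly zero whereas the Bayesian one stays positive.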


A.2 Background coming from B+ → D0ρ+ decays

As explained in Sec. 4.4, the background coming from the decay

B+ → D0(→ K+ρ−(→ π−π0))ρ+(→ π+π0)

is suppressed by introducing the following two cuts:

1. M(K+π−π0) > 2200 MeV/c2 > M(D0).

2. M(π+π0) > 1100 MeV/c2 > M(ρ+).

By looking at Fig. 13 in Sec. 4.4, one may conclude that the first cut is unnecessary,

because the background is outside of the signal region. The reason for this is that the

resonance mass requirement M(K+π−π+) ∈ [1100, 1900] MeV/c2 is already applied to

produce this correlation plot. Figure 28 shows the same plot without the resonance mass

window requirement; in this case the D0 peak is much higher and this background clearly

populates the signal region. A similar result was obtained

for the ρ+ peak in Fig. 14. This remark should be taken into account if the resonance mass

window is chosen to be wider in future studies.
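The two veto masses are obtained by re-interpreting the photon candidate as a π0. A minimal Python sketch of this computation is given here, in natural units (c = 1) with momenta and masses in MeV. The function names and the tuple layout are hypothetical, the charged kaon and pion masses are the usual PDG values, and the π0 mass is the value quoted in Table 11; this is an illustration and not the code used for the selection.

```python
import math

M_KAON = 493.677  # charged kaon mass [MeV/c2]
M_PION = 139.570  # charged pion mass [MeV/c2]
M_PI0 = 135.0     # neutral pion mass [MeV/c2], as in Table 11


def four_vector(px, py, pz, mass):
    """Build (E, px, py, pz) from a momentum and a mass hypothesis."""
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(*vectors):
    """Invariant mass of the sum of several four-vectors."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against rounding below zero


def passes_d0_rho_veto(kaon_p, pim_p, pip_p, photon_p):
    """Apply the two cuts of this appendix to one candidate.

    Each argument is a (px, py, pz) momentum in MeV/c.
    """
    kaon = four_vector(*kaon_p, M_KAON)
    pim = four_vector(*pim_p, M_PION)
    pip = four_vector(*pip_p, M_PION)
    # The photon candidate is given the pi0 mass hypothesis.
    pi0 = four_vector(*photon_p, M_PI0)
    return (invariant_mass(kaon, pim, pi0) > 2200.0
            and invariant_mass(pip, pi0) > 1100.0)


# Toy candidate, chosen only to exercise the functions.
print(passes_d0_rho_veto((-3000, 0, 0), (0, 0, 0), (-2000, 0, 0), (5000, 0, 0)))
```

Only the photon candidate changes its mass hypothesis; the three charged tracks keep the masses assigned by the selection.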

[Figure 28: correlation plot of M(K+π−π0) [MeV/c2] (vertical axis, 1000 to 6000) versus M(K+π−π+γ) [MeV/c2] (horizontal axis, 3500 to 6500), with projections in dN/(30 MeV/c2).]

Figure 28: M(K+π−π0) and M(K+π−π+γ) for 2016 data, where M(K+π−π0) is computed by assigning the π0 mass to the photon candidate. In data, the large peak around 1900 MeV/c2 corresponds to D0. All the requirements listed in Table 6 are applied to the distributions except for the last two and the resonance mass window.


A.3 2015 and 2017 data

In this appendix, several figures obtained with the 2015 and 2017 data samples are shown.

All of them correspond to figures that were presented in the main text for the 2016 data

sample. They are listed below:

• Figures 29−32 are overlay plots and correspond to Figs. 9 and 10 in the main text.

• Figures 33, 34 and 35 are correlation plots and correspond to Figs. 12, 13 and 14 in

the main text, respectively.

• Figure 36 shows correlation matrices and corresponds to Fig. 17 in the main text.

• Figure 37 shows the classifier output distributions and corresponds to Fig. 19 in the

main text.

[Figure 29 panels (2015 data): distributions, normalised as (1/N) dN/dx, of the maximum track pT [MeV/c], the B pT [MeV/c], M(K+π−π+) [MeV/c2], the photon ET [MeV] and the K+π−π+ vertex isolation Δχ2. Each panel overlays the low-mass sideband, the high-mass sideband and the signal MC.]

Figure 29: Offline selection variables after applying the requirements on the trigger lines (2015 data). Each histogram is normalised to unit area.

[Figure 30 panels (2015 data): distributions, normalised as (1/N) dN/dx, of K P(K)(1 − K P(π)), π+ P(π+)(1 − π+ P(K)), π− P(π−)(1 − π− P(K)), the photon CL and the photon/π0 separation. Each panel overlays the low-mass sideband, the high-mass sideband and the signal MC.]

Figure 30: PID variables after applying the requirements on the trigger lines (2015 data). Each histogram is normalised to unit area.

[Figure 31 panels (2017 data): distributions, normalised as (1/N) dN/dx, of the maximum track pT [MeV/c], the B pT [MeV/c], M(K+π−π+) [MeV/c2], the photon ET [MeV] and the K+π−π+ vertex isolation Δχ2. Each panel overlays the low-mass sideband, the high-mass sideband and the signal MC.]

Figure 31: Offline selection variables after applying the requirements on the trigger lines (2017 data). Each histogram is normalised to unit area.

[Figure 32 panels (2017 data): distributions, normalised as (1/N) dN/dx, of K P(K)(1 − K P(π)), π+ P(π+)(1 − π+ P(K)), π− P(π−)(1 − π− P(K)), the photon CL and the photon/π0 separation. Each panel overlays the low-mass sideband, the high-mass sideband and the signal MC.]

Figure 32: PID variables after applying the requirements on the trigger lines (2017 data). Each histogram is normalised to unit area.

[Figure 33 panels: 2015 data (top) and 2017 data (bottom). Horizontal axis: M(K+π−π+γ) [MeV/c2], from 4500 to 6500; vertical axis: M(K+π−π+) [MeV/c2], from 1000 to 2400; projections in dN/(30 MeV/c2) for 2015 and dN/(20 MeV/c2) for 2017.]

Figure 33: M(K+π−π+) and M(K+π−π+γ) for 2015 data (top) and 2017 data (bottom). All the requirements listed in Table 6 are applied to the distributions except for the resonance mass window.

[Figure 34 panels: 2015 data (top) and 2017 data (bottom). Horizontal axis: M(K+π−π+γ) [MeV/c2], from 3500 to 6500; vertical axis: M(K+π−π0) [MeV/c2], from 1000 to 6000; projections in dN/(50 MeV/c2) for 2015 and dN/(30 MeV/c2) for 2017.]

Figure 34: M(K+π−π0) and M(K+π−π+γ) for 2015 data (top) and 2017 data (bottom), where M(K+π−π0) is computed by assigning the π0 mass to the photon candidate. In data, the small peak around 1900 MeV/c2 corresponds to D0 (see text for details). All the requirements listed in Table 6 are applied to the distributions except for the last two.

[Figure 35 panels: 2015 data (top) and 2017 data (bottom). Horizontal axis: M(K+π−π+γ) [MeV/c2], from 3500 to 6500; vertical axis: M(π+π0) [MeV/c2], from 0 to 5000; projections in dN/(50 MeV/c2) for 2015 and dN/(30 MeV/c2) for 2017.]

Figure 35: M(π+π0) and M(K+π−π+γ) for 2015 data (top) and 2017 data (bottom), where M(π+π0) is computed by assigning the π0 mass to the photon candidate. In data, the peak around 800 MeV/c2 corresponds to ρ+ (see text for details). All the requirements listed in Table 6 are applied to the distributions except for the last two.

[Figure 36 panels: correlation matrices (colour scale from −0.8 to 0.8) for 2015 data (top, eight training variables) and 2017 data (bottom, ten training variables). Correlation coefficients between M(K+π−π+γ) and each training variable, as read from the panels:]

Training variable                       2015          2017
K+ IP χ2                                0.10          0.13
π+ IP χ2                                0.09          0.12
π− IP χ2                                0.09          0.12
B+ IP χ2                                −0.10         −0.17
B+ DIRA                                 −0.10         −0.08
B+ FD χ2                                0.06          0.06
K* vertex isolation Δχ2 (1 track)       0.04          0.02
K* vertex isolation Δχ2 (2 tracks)      (not shown)   0.02
K* vertex χ2                            −0.04         −0.06
B+ cone pT asymmetry (R = 1)            (not shown)   0.12

Figure 36: Correlation between the B candidate mass and the 2015 (top) and 2017 (bottom) training variables.

[Figure 37 panels: 2015 data (top) and 2017 data (bottom), two panels each. Horizontal axis: classifier output, from 0 to 1; vertical axis: (1/N) dN/dx. Each panel overlays signal and background for the training and test sets. Kolmogorov-Smirnov p-values for signal (background), in the order the panels appear: 0.916 (0.287) and 0.288 (0.890) for 2015; 0.316 (0.303) and 0.279 (0.383) for 2017.]

Figure 37: Distributions of the outputs of two classifiers and comparison between the results obtained with the training and test sets. The first classifier (left) is trained on A0B0 and tested on A0B1; the second classifier (right) is trained on A1B0 and tested on A1B1. Comparison between 2015 data (top) and 2017 data (bottom).
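The comparison between training and test outputs quoted on the panels of Figure 37 relies on a two-sample Kolmogorov-Smirnov test. A minimal pure-Python sketch of such a test follows. It is an assumption made for illustration: the text does not state which implementation produced the quoted p-values, the function name is hypothetical, and the p-value uses the standard asymptotic series with a small-sample correction, which may differ slightly from other implementations.

```python
import math
import random
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, p), where D is the largest distance between the two
    empirical cumulative distributions and p is an asymptotic p-value.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs only change at data points, so it is enough
    # to compare them there.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.05:
        # The series converges too slowly here; the p-value is 1
        # to very good accuracy.
        return d, 1.0
    # Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * lam^2)
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Toy classifier outputs for a training and a test set, drawn from the
# same distribution (stand-ins for the real samples).
rng = random.Random(0)
train = [rng.betavariate(5, 2) for _ in range(2000)]
test = [rng.betavariate(5, 2) for _ in range(2000)]
print(ks_two_sample(train, test))
```

A p-value close to zero would indicate that the training and test distributions differ, which is the usual symptom of overtraining; the values quoted in Figure 37 do not show such a discrepancy.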


References

[1] F. Zwicky, On the masses of nebulae and of clusters of nebulae, Astrophys. J. 86

(1937) 217.

[2] D. Clowe, M. Bradac, A. H. Gonzalez, M. Markevitch, S. W. Randall, C. Jones

et al., A direct empirical proof of the existence of dark matter, Astrophys. J. 648

(2006) L109 [astro-ph/0608407].

[3] Planck collaboration, P. A. R. Ade et al., Planck 2015 results. XIII. Cosmological

parameters, Astron. Astrophys. 594 (2016) A13 [1502.01589].

[4] M. Gronau, Y. Grossman, D. Pirjol and A. Ryd, Measuring the photon polarization

in B → Kππγ, Phys. Rev. Lett. 88 (2002) 051802 [hep-ph/0107254].

[5] M. Gronau and D. Pirjol, Photon polarization in radiative B decays, Phys. Rev.

D66 (2002) 054008 [hep-ph/0205065].

[6] D. Becirevic, E. Kou, A. Le Yaouanc and A. Tayduganov, Future prospects for the

determination of the Wilson coefficient C′7γ, JHEP 08 (2012) 090 [1206.1502].

[7] I. Leboucq, Observation of the decay B+ → K+π−π+γ at LHCb, Master thesis,

EPFL, 2012.

[8] G. Veneziano, Towards the measurement of photon polarisation in the decay

B+ → K+π−π+γ, Ph.D. thesis, EPFL, 2016.

[9] A. Puig Navarro, First measurements of radiative B decays in LHCb, Ph.D. thesis,

Barcelona U., 2012.

[10] LHCb collaboration, R. Aaij et al., Observation of photon polarization in the b → sγ

transition, Phys. Rev. Lett. 112 (2014) 161801 [1402.6852].

[11] C. Mordasini, Study of the B+ → K+π−π+γ selection at LHCb, Master thesis,

EPFL, 2017.

[12] S. L. Glashow, Partial symmetries of weak interactions, Nucl. Phys. 22 (1961) 579.

[13] S. Weinberg, A model of leptons, Phys. Rev. Lett. 19 (1967) 1264.

[14] D. Galbraith and C. Burgard, Standard model, standard infographic, 2012.

[15] M. Kobayashi and T. Maskawa, CP violation in the renormalizable theory of weak

interaction, Prog. Theor. Phys. 49 (1973) 652.

[16] J. Ellis, TikZ-Feynman: Feynman diagrams with TikZ, Comput. Phys. Commun.

210 (2017) 103 [1601.05437].

[17] LHCb collaboration, A. A. Alves, Jr. et al., The LHCb detector at the LHC, JINST

3 (2008) S08005.


[18] R. Aaij et al., Performance of the LHCb Vertex Locator, JINST 9 (2014) P09007

[1405.7808].

[19] A. Coccaro, Track reconstruction and b-jet identification for the ATLAS trigger

system, J. Phys. Conf. Ser. 368 (2012) 012034 [1112.0180].

[20] LHCb Outer Tracker group, P. d’Argent et al., Improved performance of the LHCb

Outer Tracker in LHC Run 2, JINST 12 (2017) P11016 [1708.00819].

[21] LHCb collaboration, E. Michielin, The LHCb trigger in Run II, PoS ICHEP2016

(2016) 996.

[22] LHCb collaboration, B. Sciascia, LHCb Run 2 trigger performance, PoS

BEAUTY2016 (2016) 029.

[23] T. Sjostrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten et al., An

introduction to PYTHIA 8.2, Comput. Phys. Commun. 191 (2015) 159 [1410.3012].

[24] GEANT4 collaboration, S. Agostinelli et al., GEANT4: A simulation toolkit, Nucl.

Instrum. Meth. A506 (2003) 250.

[25] LHCb Starterkit team, A. Puig, The LHCb Starterkit, J. Phys. Conf. Ser. 898

(2017) 082054.

[26] LHCb collaboration, R. Aaij et al., Measurement of the B± production cross-section

in pp collisions at √s = 7 and 13 TeV, JHEP 12 (2017) 026 [1710.04921].

[27] Particle Data Group, C. Patrignani et al., Review of Particle Physics, Chin. Phys.

C40 (2016) 100001.

[28] R. Aaij et al., Selection and processing of calibration samples to measure the particle

identification performance of the LHCb experiment in Run 2, 1803.00824.

[29] T. Hastie, R. Tibshirani and J. Friedman, The elements of statistical learning,

Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.

[30] T. Chen and C. Guestrin, XGBoost: A scalable tree boosting system, in Proceedings

of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and

Data Mining, KDD ’16, (New York, NY, USA), pp. 785–794, ACM, 2016,

1603.02754.

[31] L. Breiman, Random forests, Machine Learning 45 (2001) 5.

[32] F. Pedregosa et al., Scikit-learn: Machine learning in Python, J. Machine Learning

Res. 12 (2011) 2825 [1201.0490].

[33] A. Muller and S. Guido, Introduction to machine learning with Python: A guide for

data scientists. O’Reilly, 2016.


[34] CDF collaboration, A. Abulencia et al., Search for Bs → µ+µ− and Bd → µ+µ−

decays in pp̄ collisions with CDF II, Phys. Rev. Lett. 95 (2005) 221805

[hep-ex/0508036].

[35] M. Chrzaszcz, Search for charged lepton flavour violation at LHCb experiment,

Ph.D. thesis, Cracow, INP, 2014-11-07.

[36] A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. von Toerne and H. Voss,

TMVA: Toolkit for Multivariate Data Analysis, PoS ACAT (2007) 040

[physics/0703039].

[37] R. Brun and F. Rademakers, ROOT: An object oriented data analysis framework,

Nucl. Instrum. Meth. A389 (1997) 81.

[38] W. Verkerke and D. P. Kirkby, The RooFit toolkit for data modeling, eConf

C0303241 (2003) MOLT007 [physics/0306116].

[39] T. Skwarnicki, A study of the radiative CASCADE transitions between the

Upsilon-Prime and Upsilon resonances, Ph.D. thesis, Cracow, INP, 1986.

[40] ARGUS collaboration, H. Albrecht et al., Search for hadronic b → u decays, Phys.

Lett. B241 (1990) 278.

[41] C. Pralavorio, Major work starts to boost the luminosity of the LHC, www.cern.ch

(2018) Accessed: 15.06.2018.

[42] G. Apollinari, O. Brüning, T. Nakamoto and L. Rossi, High Luminosity Large

Hadron Collider HL-LHC, CERN Yellow Report (2015) 1 [1705.08830].

[43] J. Bennett, The Belle II experiment: status and physics prospects, Int. J. Mod.

Phys. Conf. Ser. 46 (2018) 1860082.

[44] T. Ullrich and Z. Xu, Treatment of errors in efficiency calculations, 2007,

physics/0701199v1.
