Evidence for the Higgs-boson Yukawa Coupling to Tau Leptons With the ATLAS Detector

EUROPEAN ORGANISATION FOR NUCLEAR RESEARCH (CERN)
CERN-PH-EP-2014-262
Submitted to: Journal of High Energy Physics

Evidence for the Higgs-boson Yukawa coupling to tau leptons with the ATLAS detector

The ATLAS Collaboration

Abstract

Results of a search for H → ττ decays are presented, based on the full set of proton–proton collision data recorded by the ATLAS experiment at the LHC during 2011 and 2012. The data correspond to integrated luminosities of 4.5 fb⁻¹ and 20.3 fb⁻¹ at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV respectively. All combinations of leptonic (τ → ℓνν̄ with ℓ = e, μ) and hadronic (τ → hadrons ν) tau decays are considered. An excess of events over the expected background from other Standard Model processes is found with an observed (expected) significance of 4.5 (3.4) standard deviations. This excess provides evidence for the direct coupling of the recently discovered Higgs boson to fermions. The measured signal strength, normalised to the Standard Model expectation, of μ = 1.43 +0.43/−0.37 is consistent with the predicted Yukawa coupling strength in the Standard Model.

© 2015 CERN for the benefit of the ATLAS Collaboration. Reproduction of this article or parts of it is allowed as specified in the CC-BY-3.0 license.

arXiv:1501.04943v2 [hep-ex] 26 Jan 2015



Contents

1 Introduction
2 The ATLAS detector and object reconstruction
3 Data and simulated samples
4 Event selection and categorisation
4.1 Event selection
4.2 Analysis categories
4.3 Higgs boson candidate mass reconstruction
5 Boosted decision trees
6 Background estimation
6.1 Background from Z → ττ production
6.2 Background from misidentified leptons or hadronically decaying taus
6.3 Z → ee and Z → μμ background
6.4 W+jets background
6.5 Background from top-quark production
6.6 Diboson background
6.7 Contributions from other Higgs boson decays
6.8 Validation of background estimates
7 Systematic uncertainties
7.1 Experimental uncertainties
7.2 Background modelling uncertainties
7.3 Theoretical uncertainties
8 Signal extraction procedure
9 Results
10 Cut-based analysis
11 Conclusions
12 Acknowledgements

1 Introduction

The investigation of the origin of electroweak symmetry breaking and, related to this, the experimental confirmation of the Brout–Englert–Higgs mechanism [1–6] is one of the prime goals of the physics programme at the Large Hadron Collider (LHC) [7]. With the discovery of a Higgs boson with a mass of approximately 125 GeV by the ATLAS [8] and CMS [9] collaborations, an important milestone has been reached. More precise measurements of the properties of the discovered particle [10, 11] as well as tests of the spin–parity quantum numbers [12–14] continue to be consistent with the predictions for the Standard Model (SM) Higgs boson.

These measurements rely predominantly on studies of the bosonic decay modes, H → γγ, H → ZZ* and H → WW*. To establish the mass generation mechanism for fermions as implemented in the SM, it is of prime importance to demonstrate the direct coupling of the Higgs boson to fermions and the proportionality of its strength to mass [15].
The most promising candidate decay modes are the decays into tau leptons, H → ττ, and bottom quarks (b-quarks), H → bb̄. Due to the high background, the search for decays to bb̄ is restricted to Higgs bosons produced in modes which have a more distinct signature but a lower cross section, such as H production with an associated vector boson. The smaller rate of these processes in the presence of still large background makes their detection challenging. More favourable signal-to-background conditions are expected for H → ττ decays. Recently, the CMS Collaboration published evidence for H → ττ decays at a significance in terms of standard deviations of 3.2σ [16], and an excess corresponding to a significance of 2.1σ in the search for H → bb̄ decays [17]. The combination of channels provides evidence for fermionic couplings with a significance of 3.8σ [18]. The yield of events in the search for H → bb̄ decays observed by the ATLAS Collaboration has a signal significance of 1.4σ [19]. The Tevatron experiments have observed an excess corresponding to 2.8σ in the H → bb̄ search [20].

In this paper, the results of a search for H → ττ decays are presented, based on the full proton–proton dataset collected by the ATLAS experiment during the 2011 and 2012 data-taking periods, corresponding to integrated luminosities of 4.5 fb⁻¹ at a centre-of-mass energy of √s = 7 TeV and 20.3 fb⁻¹ at √s = 8 TeV. These results supersede the earlier upper limits on the cross section times the branching ratio obtained with the 7 TeV data [21]. All combinations of leptonic (τ → ℓνν̄ with ℓ = e, μ) and hadronic (τ → hadrons ν) tau decays are considered [footnote 1]. The corresponding three analysis channels are denoted by τ_lep τ_lep, τ_lep τ_had, and τ_had τ_had in the following. The search is designed to be sensitive to the major production processes of a SM Higgs boson, i.e. production via gluon fusion (ggF) [22], vector-boson fusion (VBF) [23], and associated production (VH) with V = W or Z. These production processes lead to different final-state signatures, which are exploited by defining an event categorisation.
Two dedicated categories are considered to achieve both a good signal-to-background ratio and good resolution for the reconstructed invariant mass. The VBF category, enriched in events produced via vector-boson fusion, is defined by the presence of two jets with a large separation in pseudorapidity [footnote 2]. The boosted category contains events where the reconstructed Higgs boson candidate has a large transverse momentum. It is dominated by events produced via gluon fusion with additional jets from gluon radiation. In view of the signal-to-background conditions, and in order to exploit correlations between final-state observables, a multivariate analysis technique, based on boosted decision trees (BDTs) [24–26], is used to extract the final results. As a cross-check, a separate analysis where cuts on kinematic variables are applied is carried out.

Footnote 1: Throughout this paper the inclusion of charge-conjugate decay modes is implied.

2 The ATLAS detector and object reconstruction

The ATLAS detector [27] is a multi-purpose detector with a cylindrical geometry. It comprises an inner detector (ID) surrounded by a thin superconducting solenoid, a calorimeter system and an extensive muon spectrometer in a toroidal magnetic field. The ID tracking system consists of a silicon pixel detector, a silicon microstrip detector (SCT), and a transition radiation tracker (TRT). It provides precise position and momentum measurements for charged particles and allows efficient identification of jets containing b-hadrons (b-jets) in the pseudorapidity range |η| < 2.5. The ID is immersed in a 2 T axial magnetic field and is surrounded by high-granularity lead/liquid-argon (LAr) sampling electromagnetic calorimeters which cover the pseudorapidity range |η| < [...]

[...] p_T > 30 GeV. To reduce the contamination of jets by additional interactions in the same or neighbouring bunch crossings (pile-up), tracks originating from the primary vertex must contribute a large fraction of the p_T when summing the scalar p_T of all tracks in the jet.
This jet vertex fraction (JVF) is required to be at least 75% (50%) for jets with |η| < 2.4 in the 7 TeV (8 TeV) dataset. Moreover, for the 8 TeV dataset, the JVF selection is applied only to jets with p_T < 50 GeV. Jets with no associated tracks are retained.

In the pseudorapidity range |η| < 2.5, b-jets are selected using a tagging algorithm [38]. The b-jet tagging algorithm has an efficiency of 60–70% for b-jets in simulated tt̄ events. The corresponding light-quark jet misidentification probability is 0.1–1%, depending on the jet's p_T and η.

Hadronically decaying tau leptons are reconstructed starting from clusters of energy in the electromagnetic and hadronic calorimeters. The τ_had [footnote 3] reconstruction is seeded by the anti-kt jet finding algorithm with a radius parameter R = 0.4. Tracks with p_T > 1 GeV within a cone of radius 0.2 around the cluster barycentre are matched to the τ_had candidate, and the τ_had charge is determined from the sum of the charges of its associated tracks. The rejection of jets is provided in a separate identification step using discriminating variables based on tracks with p_T > 1 GeV and the energy deposited in calorimeter cells found in the core region (ΔR < 0.2) and in the region 0.2 < ΔR < 0.4 around the τ_had candidate's direction. Such discriminating variables are combined in a boosted decision tree and three working points, labelled tight, medium and loose [39], are defined, corresponding to different τ_had identification efficiency values.

Footnote 3: In the following, the τ_had symbol always refers to the visible decay products of the τ hadronic decay.

In this analysis, τ_had candidates with p_T > 20 GeV and |η| < 2.47 are used. The τ_had candidates are required to have charge ±1, and must be 1- or 3-track (prong) candidates. In addition, a sample without the charge and track multiplicity requirements is retained for background modelling in the τ_had τ_had channel, as described in section 6.2.
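The jet vertex fraction requirement quoted earlier in this section (at least 75% at 7 TeV and 50% at 8 TeV for jets with |η| < 2.4, applied at 8 TeV only to jets with p_T < 50 GeV, with track-less jets retained) can be illustrated with a short Python sketch. The `Jet` record and the function names are hypothetical and are not part of any ATLAS software; only the thresholds come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Jet:
    """Hypothetical minimal jet record (momenta in GeV), for illustration only."""
    pt: float
    eta: float
    track_pts: List[float] = field(default_factory=list)     # pT of all tracks in the jet
    pv_track_pts: List[float] = field(default_factory=list)  # pT of tracks from the primary vertex


def jet_vertex_fraction(jet: Jet) -> Optional[float]:
    """Scalar-pT fraction of the jet's tracks that come from the primary vertex."""
    total = sum(jet.track_pts)
    if total <= 0.0:
        return None  # no associated tracks: JVF is undefined
    return sum(jet.pv_track_pts) / total


def passes_jvf(jet: Jet, sqrt_s_tev: int) -> bool:
    """Apply the JVF requirement as quoted in the text for the 7 or 8 TeV dataset."""
    if sqrt_s_tev not in (7, 8):
        raise ValueError("only the 7 TeV and 8 TeV datasets are described")
    if abs(jet.eta) >= 2.4:
        return True  # requirement is stated only for jets with |eta| < 2.4
    if sqrt_s_tev == 8 and jet.pt >= 50.0:
        return True  # at 8 TeV the cut applies only to jets with pT < 50 GeV
    jvf = jet_vertex_fraction(jet)
    if jvf is None:
        return True  # jets with no associated tracks are retained
    threshold = 0.75 if sqrt_s_tev == 7 else 0.50
    return jvf >= threshold
```

A jet with half of its track p_T from the primary vertex would therefore pass at 8 TeV and fail at 7 TeV under this sketch.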
The identification efficiency for τ_had candidates satisfying the medium criteria is of the order of 55–60%. Dedicated criteria [39] to separate τ_had candidates from misidentified electrons are also applied, with a selection efficiency for true τ_had decays of 95%. The probability to misidentify a jet with p_T > 20 GeV as a τ_had candidate is typically 1–2%.

Following their reconstruction, candidate leptons, hadronically decaying taus and jets may point to the same energy deposits in the calorimeters (within ΔR < [...]

[...] 40 GeV, Z → μμ events can be selected from the data with high efficiency and purity. To replace the muons in the selected events, all tracks associated with the muons are removed and calorimeter cell energies associated with the muons are corrected by subtracting the corresponding energy depositions in a single simulated Z → μμ event with the same kinematics. Finally, both the track information and the calorimeter cell energies from a simulated Z → ττ decay are added to the data event. The decays of the tau leptons are simulated by Tauola [71]. The tau lepton kinematics are matched to the kinematics of the muons they are replacing, including polarisation and spin correlations [72], and the mass difference between the muons and the tau leptons is accounted for. This hybrid sample is referred to as embedded data in the following.

Footnote 4: These processes are hereafter for simplicity denoted by Z → ττ and Z → ℓℓ respectively, even though the whole continuum above and below the Z peak is considered.

Other background processes are simulated using different generators, each interfaced to Pythia [46, 73] or Herwig [74, 75] to provide the parton shower, hadronisation and the modelling of the underlying event, as indicated in table 1. For the Herwig samples, the decays of tau leptons are also simulated using Tauola [71]. Photon radiation from charged leptons for all samples is provided by Photos [76].
The samples for W/Z+jets production are generated with Alpgen [77], employing the MLM matching scheme [78] between the hard process (calculated with LO matrix elements for up to five partons) and the parton shower. For WW production, the loop-induced gg → WW process is also generated, using the gg2WW [79] program. In the AcerMC [80], Alpgen, and Herwig event generators, the Cteq6L1 parameterisation of the PDFs is used, while the CT10 parameterisation is used for the generation of events with gg2WW. The normalisation of these background contributions is either estimated from control regions using data, as described in section 6, or the cross sections quoted in table 1 are used.

For all samples, a full simulation of the ATLAS detector response [81] using the Geant4 program [82] was performed. In addition, events from minimum-bias interactions were simulated using the AU2 [83] parameter tuning of Pythia8. The AU2 tune includes the set of optimised parameters for the parton shower, hadronisation, and multiple parton interactions. These minimum-bias events are overlaid on the simulated signal and background events according to the luminosity profile of the recorded data. The contributions from these pile-up interactions are simulated both within the same bunch crossing as the hard-scattering process and in neighbouring bunch crossings. Finally, the resulting simulated events are processed through the same reconstruction programs as the data.

4 Event selection and categorisation

4.1 Event selection

Single lepton, dilepton and di-τ_had triggers were used to select the events for the analysis. A summary of the triggers used by each channel at the two centre-of-mass energies is reported in table 2. Due to the increasing luminosity and the different pile-up conditions, the online p_T thresholds increased during data-taking in 2011 and again for 2012, and more stringent identification requirements were applied for the data-taking in 2012.
The p_T requirements on the objects in the analysis are usually 2 GeV higher than the trigger requirements, to ensure that the trigger is fully efficient.

In addition to applying criteria to ensure that the detector was functioning properly, requirements to increase the purity and quality of the data sample are applied by rejecting non-collision events such as cosmic rays and beam-halo events. At least one reconstructed vertex is required with at least four associated tracks with p_T > 400 MeV and a position consistent with the beam spot.

Signal (m_H = 125 GeV) | MC generator | σ × B [pb], √s = 8 TeV | QCD order
ggF, H → ττ | Powheg [42–45] + Pythia8 [46] | 1.22 | NNLO+NNLL [48–53, 84]
VBF, H → ττ | Powheg + Pythia8 | 0.100 | (N)NLO [57–59, 84]
WH, H → ττ | Pythia8 | 0.0445 | NNLO [62, 84]
ZH, H → ττ | Pythia8 | 0.0262 | NNLO [62, 84]

Background | MC generator | σ × B [pb], √s = 8 TeV | QCD order
W(→ ℓν), (ℓ = e, μ, τ) | Alpgen [77] + Pythia8 | 36800 | NNLO [85, 86]
Z/γ*(→ ℓℓ), 60 GeV < m_ℓℓ < 2 TeV | Alpgen + Pythia8 | 3910 | NNLO [85, 86]
Z/γ*(→ ℓℓ), 10 GeV < m_ℓℓ < 60 GeV | Alpgen + Herwig [87] | 13000 | NNLO [85, 86]
VBF Z/γ*(→ ℓℓ) | Sherpa [88] | 1.1 | LO [88]
tt̄ | Powheg + Pythia8 | 253† | NNLO+NNLL [89–94]
Single top: Wt | Powheg + Pythia8 | 22† | NNLO [95]
Single top: s-channel | Powheg + Pythia8 | 5.6† | NNLO [96]
Single top: t-channel | AcerMC [80] + Pythia6 [73] | 87.8† | NNLO [97]
qq̄ → WW | Alpgen + Herwig | 54† | NLO [98]
gg → WW | gg2WW [79] + Herwig | 1.4† | NLO [79]
WZ, ZZ | Herwig | 30† | NLO [98]
H → WW | same as for H → ττ signal | 4.7† |

Table 1. Monte Carlo generators used to model the signal and the background processes at √s = 8 TeV. The cross sections times branching fractions (σ × B) used for the normalisation of some processes (many of these are subsequently normalised to data) are included in the last column together with the perturbative order of the QCD calculation. For the signal processes the H → ττ SM branching ratio is included, and for the W and Z/γ* background processes the branching ratios for leptonic decays (ℓ = e, μ, τ) of the bosons are included.
For all other background processes, inclusive cross sections are quoted (marked with a †).

With respect to the object identification requirements described in section 2, tighter criteria are applied to address the different background contributions and compositions in the different analysis channels. Higher p_T thresholds are applied to electrons, muons, and τ_had candidates according to the trigger conditions satisfied by the event, as listed in table 2. For the channels involving leptonic tau decays, τ_lep τ_lep and τ_lep τ_had, additional isolation criteria for electrons and muons, based on tracking and calorimeter information, are used to suppress the background from misidentified jets or from semileptonic decays of charm and bottom hadrons. The calorimeter isolation variable I(E_T, ΔR) is defined as the sum of the total transverse energy in the calorimeter in a cone of size ΔR around the electron cluster or the muon track, divided by the E_T of the electron cluster or the p_T of the muon respectively. The track-based isolation I(p_T, ΔR) is defined as the sum of the transverse momenta of tracks within a cone of ΔR around the electron or muon track, divided by the E_T of the electron cluster or the muon p_T respectively. The isolation requirements applied are slightly different for the two centre-of-mass energies and are listed in table 3.

√s = 7 TeV
Trigger | Trigger-level thresholds, p_T [GeV] | Analysis-level thresholds [GeV]: τ_lep τ_lep | τ_lep τ_had | τ_had τ_had
Single electron | 20–22 | eμ: p_T^e > 22–24, p_T^μ > 10 | eτ: p_T^e > 25, p_T^τ > 20 | -
Single muon | 18 | μμ: p_T^μ1 > 20, p_T^μ2 > 10; eμ: p_T^μ > 20, p_T^e > 15 | μτ: p_T^μ > 22, p_T^τ > 20 | -
Di-electron | 12/12 | ee: p_T^e1 > 15, p_T^e2 > 15 | - | -
Di-τ_had | 29/20 | - | - | ττ: p_T^τ1 > 35, p_T^τ2 > 25

√s = 8 TeV
Trigger | Trigger-level thresholds, p_T [GeV] | Analysis-level thresholds [GeV]: τ_lep τ_lep | τ_lep τ_had | τ_had τ_had
Single electron | 24 | eμ: p_T^e > 26, p_T^μ > 10; ee: p_T^e1 > 26, p_T^e2 > 15 | eτ: p_T^e > 26, p_T^τ > 20 | -
Single muon | 24 | - | μτ: p_T^μ > 26, p_T^τ > 20 | -
Di-electron | 12/12 | ee: p_T^e1 > 15, p_T^e2 > 15 | - | -
Di-muon | 18/8 | μμ: p_T^μ1 > 20, p_T^μ2 > 10 | - | -
Electron+muon | 12/8 | eμ: p_T^e > 15, p_T^μ > 10 | - | -
Di-τ_had | 29/20 | - | - | ττ: p_T^τ1 > 35, p_T^τ2 > 25

Table 2. Summary of the triggers used to select events for the different analysis channels at the two centre-of-mass energies. The transverse momentum thresholds applied at trigger level and in the analysis are listed. When more than one trigger is used, a logical OR is taken and the trigger efficiencies are calculated accordingly.

In the τ_had τ_had channel, isolated taus are selected by requiring that there are no tracks with p_T > 0.5 GeV in an isolation region of 0.2 < [...]

[...] 25 GeV.

Within the collinear approximation [99], i.e. assuming that the tau directions are given by the directions of the visible tau decay products and that the momenta of the neutrinos constitute the missing transverse momentum, the tau momenta can be reconstructed. For tau decays, the fractions of the tau momenta carried by the visible decay products [footnote 6], x_τi = p_vis,i / (p_vis,i + p_mis,i), with i = 1, 2, are expected to lie in the interval 0 < [...]

[...] 70 GeV are rejected. Contributions from tt̄ events are reduced by rejecting events with a b-jet with p_T > 30 GeV.

The τ_had τ_had channel: one isolated medium τ_had candidate and one isolated tight τ_had candidate with OS charges are required. Events with electron or muon candidates are rejected. For all data, the missing transverse momentum must satisfy E_T^miss > 20 GeV and its direction must either be between the two visible τ_had candidates in φ or within [...]

[...] 40 GeV, to eliminate low-mass Z/γ* events. Although this category is dominated by VBF events, it also includes smaller contributions from ggF and VH production.

The boosted category targets events with a boosted Higgs boson produced via ggF. Higgs boson candidates are therefore required to have large transverse momentum, p_T^H > 100 GeV. The p_T^H is reconstructed using the vector sum of the missing transverse momentum and the transverse momentum of the visible tau decay products. In the τ_lep τ_lep channel, at least one jet with p_T > 40 GeV is required. The jet requirement selects a region of the phase space where the E_T^miss of same-flavour events is well modelled by simulation. In order to define an orthogonal category, events passing the VBF category selection are not considered. This category also includes small contributions from VBF and VH production.

Footnote 7: m_T = √(2 p_T^ℓ E_T^miss (1 − cos Δφ)), where Δφ is the azimuthal separation between the directions of the lepton and the missing transverse momentum.

While these categories are conceptually identical across the three channels, differences in the dominant background contributions require different selection criteria. For both categories, the requirement on jets is inclusive and additional jets, apart from those passing the category requirements, are allowed.

For the τ_had τ_had channel, the so-called rest category is used as a control region. In this category, events passing the preselection requirements but not passing the VBF or boosted category selections are considered. This category is used to constrain the Z → ττ and multijet background contributions. The signal contamination in this category is negligible.

Preselection cuts
τ_lep τ_lep:
- Exactly two isolated opposite-sign leptons
- Events with τ_had candidates are rejected
- 30 GeV < m_vis < 100 (75) GeV for DF (SF) events
- Δφ_ℓℓ < 2.5
- E_T^miss > 20 (40) GeV for DF (SF) events
- E_T^miss,HPTO > 40 GeV for SF events
- p_T^ℓ1 + p_T^ℓ2 > 35 GeV
- Events with a b-tagged jet with p_T > 25 GeV are rejected
- 0.1 < x_τ1, x_τ2 < 1
- m_coll > m_Z − 25 GeV
τ_lep τ_had:
- Exactly one isolated lepton and one medium τ_had candidate with opposite charges
- m_T < 70 GeV
- Events with a b-tagged jet with p_T > 30 GeV are rejected
τ_had τ_had:
- One isolated medium and one isolated tight opposite-sign τ_had candidate
- Events with leptons are vetoed
- E_T^miss > 20 GeV
- E_T^miss points between the two visible taus in φ, or min[Δφ(τ, E_T^miss)] < π/4
- 0.8 < ΔR(τ_had1, τ_had2) < 2.4
- Δη(τ_had1, τ_had2) < 1.5

VBF category selection cuts
τ_lep τ_lep:
- At least two jets with p_T^j1 > 40 GeV and p_T^j2 > 30 GeV
- Δη(j1, j2) > 2.2
τ_lep τ_had:
- At least two jets with p_T^j1 > 50 GeV and p_T^j2 > 30 GeV
- Δη(j1, j2) > 3.0
- m_vis > 40 GeV
τ_had τ_had:
- At least two jets with p_T^j1 > 50 GeV and p_T^j2 > 30 GeV
- p_T^j2 > 35 GeV for jets with |η| > 2.4
- Δη(j1, j2) > 2.0

Boosted category selection cuts
τ_lep τ_lep:
- At least one jet with p_T > 40 GeV
All channels:
- Failing the VBF selection
- p_T^H > 100 GeV

Table 4. Summary of the event selection for the three analysis channels. The requirements used in both the preselection and for the definition of the analysis categories are given. The labels (1) and (2) refer to the leading (highest p_T) and subleading final-state objects (leptons, τ_had, jets). The variables are defined in the text.

4.3 Higgs boson candidate mass reconstruction

The di-tau invariant mass (m_MMC) is reconstructed using the missing mass calculator (MMC) [100]. This requires solving an underconstrained system of equations for six to eight unknowns, depending on the number of neutrinos in the ττ final state. These unknowns include the x-, y-, and z-components of the momentum carried by the neutrinos for each of the two tau leptons in the event, and the invariant mass of the two neutrinos from any leptonic tau decays. The calculation uses the constraints from the measured x- and y-components of the missing transverse momentum, and the visible masses of both tau candidates. A scan is performed over the two components of the missing transverse momentum vector and the yet undetermined variables. Each scan point is weighted by its probability according to the E_T^miss resolution and the tau decay topologies. The estimator for the ττ mass is defined as the most probable value of the scan points.

The MMC algorithm provides a solution for 99% of the H → ττ and Z → ττ events. This is a distinct advantage compared to the mass calculation using the collinear approximation where the failure rate is higher due to the implicit collinearity assumptions. The small loss rate of about 1% for signal events is due to large fluctuations of the E_T^miss measurement or other scan variables.

Figure 1 shows reconstructed m_MMC mass distributions for H → ττ and Z → ττ events in the τ_lep τ_had VBF and boosted categories.
The mass resolution, defined as the ratio between the full width at half maximum (FWHM) and the peak value of the mass distribution (m_peak), is found to be 30% for all categories and channels.

[Figure 1, panels (a) VBF and (b) Boosted, μτ_had + eτ_had: fraction of events per 5 GeV versus m_MMC [GeV]. (a) m_peak(Z → ττ) = 92.4 GeV, m_peak(H → ττ) = 123.2 GeV; (b) m_peak(Z → ττ) = 90.4 GeV, m_peak(H → ττ) = 122.3 GeV; FWHM/m_peak ≈ 30% in both panels.]

Figure 1. The reconstructed invariant mass, m_MMC, for H → ττ (m_H = 125 GeV) and Z → ττ events in MC simulation and embedding respectively, for events passing (a) the VBF category selection and (b) the boosted category selection in the τ_lep τ_had channel.

5 Boosted decision trees

Boosted decision trees are used in each category to extract the Higgs boson signal from the large number of background events. Decision trees [24] recursively partition the parameter space into multiple regions where signal or background purities are enhanced. Boosting is a method which improves the performance and stability of decision trees and involves the combination of many trees into a single final discriminant [25, 26]. After boosting, the final score undergoes a transformation to map the scores on the interval −1 to +1. The most signal-like events have scores near 1 while the most background-like events have scores near −1.

Separate BDTs are trained for each analysis category and channel with signal and background samples, described in section 6, at √s = 8 TeV. They are then applied to the analysis of the data at both centre-of-mass energies. The separate training naturally exploits differences in event kinematics between different Higgs boson production modes. It also allows different discriminating variables to be used to address the different background compositions in each channel.
For the training in the VBF category, only a VBF Higgs production signal sample is used, while training in the boosted category uses ggF, VBF, and VH signal samples. The Higgs boson mass is chosen to be m_H = 125 GeV for all signal samples. The BDT input variables used at both centre-of-mass energies are listed in table 5. Most of these variables have straightforward definitions, and the more complex ones are defined in the following.

- ΔR(τ1, τ2): the distance ΔR between the two leptons, between the lepton and τ_had, or between the two τ_had candidates, depending on the decay mode.

- p_T^Total: magnitude of the vector sum of the transverse momenta of the visible tau decay products, the two leading jets, and E_T^miss.

- Sum p_T: scalar sum of the p_T of the visible components of the tau decay products and of the jets.

- E_T^miss φ centrality: a variable that quantifies the relative angular position of the missing transverse momentum with respect to the visible tau decay products in the transverse plane. The transverse plane is transformed such that the directions of the tau decay products are orthogonal, and that the smaller φ angle between the tau decay products defines the positive quadrant of the transformed plane. The E_T^miss φ centrality is defined as the sum of the x- and y-components of the E_T^miss unit vector in this transformed plane.

- Sphericity: a variable that describes the isotropy of the energy flow in the event [101]. It is based on the quadratic momentum tensor

  $$ S^{\alpha\beta} = \frac{\sum_i p_i^{\alpha} p_i^{\beta}}{\sum_i |\vec{p}_i|^2} . \qquad (5.1) $$

  In this equation, α and β are the indices of the tensor. The summation is performed over the momenta of the selected leptons and jets in the event. The sphericity of the event (S) is then defined in terms of the two smallest eigenvalues of this tensor, λ2 and λ3,

  $$ S = \frac{3}{2} (\lambda_2 + \lambda_3) . \qquad (5.2) $$

- min(Δη_ℓ1ℓ2,jets): the minimum Δη between the dilepton system and either of the two jets.

- Object η centrality: a variable that quantifies the η position of an object (an isolated lepton, a τ_had candidate or a jet) with respect to the two leading jets in the event.
  It is defined as

  $$ C_{\eta_1,\eta_2}(\eta) = \exp\left[ \frac{-4}{(\eta_1 - \eta_2)^2} \left( \eta - \frac{\eta_1 + \eta_2}{2} \right)^2 \right] , \qquad (5.3) $$

  where η, η1 and η2 are the pseudorapidities of the object and the two leading jets respectively. This variable has a value of 1 when the object is halfway in η between the two jets, 1/e when the object is aligned with one of the jets, and < 1/e when the object is not between the jets in η. In the τ_lep τ_lep channel the η centrality of a third jet in the event, C_η1,η2(η_j3), and the product of the η centralities of the two leptons are used as BDT input variables, while in the τ_lep τ_had channel the η centrality of the lepton, C_η1,η2(η_ℓ), is used, and in the τ_had τ_had channel the η centrality of each τ, C_η1,η2(η_τ1) and C_η1,η2(η_τ2), is used. Events with only two jets are assigned a dummy value of −0.5 for C_η1,η2(η_j3).

Among these variables the most discriminating ones include m_MMC, ΔR(τ1, τ2) and Δη(j1, j2). Figure 2 shows the distributions of selected BDT input variables. For the VBF category, the distributions of Δη(j1, j2) are shown for all three channels. For the boosted category, the distributions of ΔR(τ1, τ2) are shown for the τ_lep τ_had and τ_had τ_had channels and the distribution of the p_T of the leading jet is shown for the τ_lep τ_lep channel. For all distributions, the data are compared to the predicted SM backgrounds at √s = 8 TeV. The corresponding uncertainties are indicated by the shaded bands. All input distributions are well described, giving confidence that the background models (from simulation and data) describe well the relevant input variables of the BDT. Similarly, good agreement is found for the distributions at √s = 7 TeV.

Variable | VBF (τ_lep τ_lep, τ_lep τ_had, τ_had τ_had) | Boosted (τ_lep τ_lep, τ_lep τ_had, τ_had τ_had)
m_MMC | [...]
ΔR(τ1, τ2) | [...]
Δη(j1, j2) | [...]
m_j1,j2 | [...]
η_j1 × η_j2 | [...]
p_T^Total | [...]
Sum p_T | [...]
p_T^τ1 / p_T^τ2 | [...]
E_T^miss φ centrality | [...]
m_ℓ,ℓ,j1 | [...]
m_ℓ1,ℓ2 | [...]
Δφ(ℓ1, ℓ2) | [...]
Sphericity | [...]
p_T^ℓ1 | [...]
p_T^j1 | [...]
E_T^miss / p_T^ℓ2 | [...]
m_T | [...]
min(Δη_ℓ1ℓ2,jets) | [...]
C_η1,η2(η_ℓ1) · C_η1,η2(η_ℓ2) | [...]
C_η1,η2(η_ℓ) | [...]
C_η1,η2(η_j3) | [...]
C_η1,η2(η_τ1) | [...]
C_η1,η2(η_τ2) | [...]

Table 5. Discriminating variables used in the training of the BDT for each channel and category at √s = 8 TeV. The more complex variables are described in the text. The filled circles indicate which variables are used in each case.
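The object η centrality of equation (5.3) is simple enough to check numerically. The following Python sketch evaluates it and reproduces the three reference values quoted in the text (1 at the midpoint, 1/e on a jet, below 1/e outside the jets). The function names are hypothetical, and the default dummy value for events without a third jet is taken from the text as read here.

```python
import math
from typing import Sequence


def eta_centrality(eta: float, eta_j1: float, eta_j2: float) -> float:
    """Equation (5.3): exp[-4 / (eta1 - eta2)^2 * (eta - (eta1 + eta2) / 2)^2]."""
    gap = eta_j1 - eta_j2
    if gap == 0.0:
        raise ValueError("the two leading jets must differ in pseudorapidity")
    midpoint = 0.5 * (eta_j1 + eta_j2)
    return math.exp(-4.0 / gap ** 2 * (eta - midpoint) ** 2)


def third_jet_centrality(jet_etas: Sequence[float], dummy: float = -0.5) -> float:
    """Centrality of the third jet; `jet_etas` is assumed to be ordered by
    decreasing pT. Events with only two jets receive the dummy value."""
    if len(jet_etas) < 3:
        return dummy
    return eta_centrality(jet_etas[2], jet_etas[0], jet_etas[1])
```

With leading jets at η = −2 and η = +2, an object at η = 0 has centrality 1, an object at η = 2 has centrality 1/e, and an object at η = 3 falls below 1/e.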
Figure 2. Distributions of important BDT input variables for the three channels and the two categories (VBF, left) and (boosted, right) for data collected at √s = 8 TeV. The distributions are shown for (a) the separation in pseudorapidity of the jets, Δη(j1, j2), and (b) the transverse momentum of the leading jet p_T^j1 in the τ_lep τ_lep channel, for (c) Δη(j1, j2) and (d) ΔR(τ1, τ2), the distance ΔR between the lepton and τ_had, in the τ_lep τ_had channel and for (e) Δη(j1, j2) and (f) ΔR(τ1, τ2), the distance ΔR between the two τ_had candidates, in the τ_had τ_had channel. The contributions from a Standard Model Higgs boson with m_H = 125 GeV are superimposed, multiplied by a factor of 50. These figures use background predictions made without the global fit defined in section 8. The error band includes statistical and pre-fit systematic uncertainties.

6 Background estimation

The different final-state topologies of the three analysis channels have different background compositions which necessitate different strategies for the background estimation.
In general, the number of expected background events and the associated kinematic distributions are derived from a mixture of data-driven methods and simulation. The normalisation of several important background contributions is performed by comparing the simulated samples of individual background sources to data in regions which only have a small or negligible contamination from signal or other background events. The control regions used in the analysis are summarised in table 6.

Common to all channels is the dominant Z → ττ background, for which the kinematic distributions are taken from data by employing the embedding technique, as described in section 3. Background contributions from jets that are misidentified as hadronically decaying taus (fake backgrounds) are estimated by using either a fake-factor method or samples of non-isolated τ_had candidates. Likewise, samples of non-isolated leptons are used to estimate fake-lepton contributions from jets or hadronically decaying taus and leptons from other sources, such as heavy-quark decays [footnote 8].

Contributions from various other physics processes with leptons and/or τ_had candidates in the final state are estimated using the simulation, normalised to the theoretical cross sections, as given in table 1. A more detailed discussion of the estimation of the various background components in the different channels is given in the following.

6.1 Background from Z → ττ production

A reliable modelling of the irreducible Z → ττ background is an important ingredient of the analysis. It has been shown in other ATLAS analyses that existing Z+jets Monte Carlo simulation needs to be reweighted to model data correctly [102–104]. Additionally, it is not possible to select a sufficiently pure and signal-free Z → ττ control sample from data to model the background in the signal region. Therefore this background is estimated using embedded data, as described in section 3. This procedure was extensively validated using both data and simulation.
To validate the subtraction procedure of the muon cell energies and tracks from data and the subsequent embedding of the corresponding information from simulation, the muons in $Z \to \mu\mu$ events are replaced by simulated muons. The calorimeter isolation energy in a cone of $\Delta R = 0.3$ around the muons from data before and after embedding is compared in figure 3(a). Good agreement is found, which indicates that no deterioration (e.g. possible energy biases) in the muon environment is introduced. Another important test validates the embedding of more complex $Z \to \tau\tau$ events, which can only be performed in the simulation. To achieve a meaningful validation, the same MC generator with identical settings was used to simulate both the $Z \to \mu\mu$ and $Z \to \tau\tau$ events. The sample of embedded events is corrected for the bias due to the trigger, reconstruction and acceptance of the original muons. These corrections are determined from data as a function of the muon $p_{\mathrm{T}}$ and $\eta$, and allow the acceptance of the original selection to be corrected. The tau decay products are treated like any other objects obtained from the simulation, with one important difference due to the absence of trigger simulation in this sample. Trigger effects are parameterised from the simulation as a function of the tau decay product $p_{\mathrm{T}}$. After replacing the muons with simulated taus, kinematic distributions of the embedded sample can be directly compared to the fully simulated ones. As an example, the reconstructed invariant mass, $m^{\mathrm{MMC}}_{\tau\tau}$, is shown in figure 3(b), for the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ final state. Good agreement is found and the observed differences are covered by the systematic uncertainties. Similarly, good agreement is found for other variables, such as the missing transverse momentum, the kinematic variables of the hadronically decaying tau lepton or of the associated jets in the event.

⁸ For simplicity, leptons from heavy-quark decays are considered as fake leptons in the following.

A direct comparison of the $Z \to \tau\tau$ background in data and the modelling using the embedding technique also shows good agreement.
This can be seen in several kinematic quantity distributions, which are dominated by $Z \to \tau\tau$ events, shown in figure 2. The normalisation of this background process is taken from the final fit described in section 8. The normalisation is independent for the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$, $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$, and $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ analysis channels.

Figure 3. (a) The distribution of the calorimeter isolation energy $I(E_{\mathrm{T}}, 0.3) \times p_{\mathrm{T}}(\mu)$ within a cone of radius $\Delta R = 0.3$ around the muons in $Z \to \mu\mu$ events from data, before and after the embedding of simulated muons. (b) The distribution of the reconstructed invariant $\tau\tau$ mass, $m^{\mathrm{MMC}}_{\tau\tau}$, in the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ final state, for simulated $Z \to \tau\tau$ events, compared to the one obtained from simulated $Z \to \mu\mu$ events after tau embedding. The ratios of the values before and after the embedding and between the embedded $Z \to \mu\mu$ and $Z \to \tau\tau$ events are given in (a) and (b) respectively. The errors in (a) and (b) on the ratios (points) represent the statistical uncertainties, while the systematic uncertainties are indicated by the hatched bands in (b). The shaded bands represent the statistical uncertainties from the $Z \to \mu\mu$ data events in (a) and from the $Z \to \tau\tau$ simulation in (b).

6.2 Background from misidentified leptons or hadronically decaying taus

For the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ channel, all background sources resulting from misidentified leptons are treated together. In this approach, contributions from multijet and W+jets production, as well as the part of the $t\bar{t}$ background resulting from decays to leptons and hadrons ($t\bar{t} \to \ell\nu b\, q\bar{q} b$) are included.
A control sample is defined in data by inverting the isolation requirements for one of the two leptons, while applying all other signal region requirements. The contributions from other background channels (dileptonic $t\bar{t}$ decays ($t\bar{t} \to \ell\nu b\, \ell\nu b$), $Z \to ee$, $Z \to \mu\mu$, and diboson production) are obtained from the simulation and are subtracted. From this control sample a template is created. The normalisation factor is obtained by fitting the $p_{\mathrm{T}}$ distribution of the subleading lepton at an early stage of the preselection.

For the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel, the fake-factor method is used to derive estimates for the multijet, W+jets, Z+jets, and semileptonic $t\bar{t}$ background events that pass the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ selection due to a misidentified $\tau_{\mathrm{had}}$ candidate. The fake factor is defined as the ratio of the number of jets identified as medium $\tau_{\mathrm{had}}$ candidates to the number satisfying the loose, but not the medium, criteria. Since the fake factor depends on the type of parton initiating the jet and on the $p_{\mathrm{T}}$ of the jet, it is determined as a function of $p_{\mathrm{T}}$ separately for samples enriched in quark- and gluon-initiated jets. In addition, the fake factor is found to be different for 1-track and 3-track candidates. Three different, quark-jet dominated samples are used separately for the W+jets, $t\bar{t}$ and Z+jets background components. They are defined by selecting the high-$m_{\mathrm{T}}$ region ($m_{\mathrm{T}} > 70$ GeV), by inverting the b-jet veto and by requiring two leptons with an invariant mass consistent with $m_Z$ (80 GeV < $m_{\ell\ell}$ < 100 GeV) respectively. In addition, a multijet sample dominated by gluon-initiated jets is selected by relaxing the lepton identification and requiring it to satisfy the loose identification criteria. The derived fake factors are found to vary from 0.124 (0.082) for $p_{\mathrm{T}} = 20$ GeV to 0.088 (0.038) for $p_{\mathrm{T}} = 150$ GeV for 1-track (3-track) candidates in the VBF category.
The corresponding values for the boosted category are 0.146 (0.084) for $p_{\mathrm{T}} = 20$ GeV and 0.057 (0.033) for $p_{\mathrm{T}} = 150$ GeV. To obtain the fake-background estimate for the VBF and boosted signal regions, these factors are then applied, weighted by the expected relative W+jets, Z+jets, multijet, and $t\bar{t}$ fractions, to the events in regions defined by applying the selections of the corresponding signal region, except that the $\tau_{\mathrm{had}}$ candidate is required to pass the loose and to fail the medium $\tau_{\mathrm{had}}$ identification. As an example, the good agreement between data and background estimates is shown in figure 4(a) for the reconstructed $\tau\tau$ mass for events in the high-$m_{\mathrm{T}}$ region, which is dominated by W+jets production.

For the $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ channel, the multijet background is modelled using a template extracted from data that pass the VBF or boosted category selection, where, however, the taus fail the isolation and opposite-sign charge requirements (the number-of-tracks requirement is not enforced). The normalisation of the multijet background is first determined by performing a simultaneous fit of the multijet (modelled by the data sample just mentioned) and $Z \to \tau\tau$ (modelled by embedding) templates after the preselection cuts. The fit is performed for the distribution of the difference in pseudorapidity between the two hadronic tau candidates, $\Delta\eta(\tau_{\mathrm{had}}, \tau_{\mathrm{had}})$. The signal contribution is expected to be small in this category. The agreement between data and the background estimate for this distribution is shown in figure 4(b) for the rest category defined in section 4. The preselection normalisation is used as a reference point and starting value for the global fit (see below) and is used for validation plots. The final normalisations of the two important background components, from multijet and $Z \to \tau\tau$ events, are extracted from the final global fit, as described in section 8, in which the $\Delta\eta(\tau_{\mathrm{had}}, \tau_{\mathrm{had}})$ distribution for the rest category is included.

Figure 4. (a) The distribution of the reconstructed invariant $\tau\tau$ mass, $m^{\mathrm{MMC}}_{\tau\tau}$, for events in the W+jets control region, for the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel. (b) The separation in pseudorapidity of the $\tau_{\mathrm{had}}$ candidates, $\Delta\eta(\tau_{\mathrm{had}}, \tau_{\mathrm{had}})$, for the $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ channel in the rest control region. The expected SM Higgs boson signal contribution is superimposed, multiplied by a factor of 50. These figures use background predictions made without the global fit defined in section 8. The error band includes statistical and pre-fit systematic uncertainties.

6.3 $Z \to ee$ and $Z \to \mu\mu$ background

The Drell–Yan $Z/\gamma^{*} \to ee$ and $Z/\gamma^{*} \to \mu\mu$ background channels are important contributions to the final states with two same-flavour leptons. They also contribute to the other channels. As described below, a simulation based on Alpgen is used to estimate these background sources. Correction factors are applied to account for differences between data and simulation.

In the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ channel, the Alpgen simulation is normalised to the data in the Z-mass control region, 80 GeV < $m_{\ell\ell}$ < 100 GeV, for each category, and separately for $Z \to ee$ and $Z \to \mu\mu$ events. The normalisation factors are determined from the final fit described in section 8. The distribution of the reconstructed $\tau\tau$ mass for events in this control region is shown in figure 5(a).

In the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel, the $Z \to ee$ and $Z \to \mu\mu$ background estimates are also based on simulation. The corrections applied for a $\tau_{\mathrm{had}}$ candidate depend on whether it originates from a lepton from the Z boson decay or from a jet. In the first case, corrections from data, derived from dedicated tag-and-probe studies, are applied to account for the difference in the rate of misidentified $\tau_{\mathrm{had}}$ candidates resulting from leptons [21, 105]. This is particularly important for $Z \to ee$ events with a misidentified $\tau_{\mathrm{had}}$ candidate originating from a true electron.
In the second case, the fake-factor method described in section 6.2 is applied.

In the $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ channel, the contribution of this background is very small and is taken from simulation.

Figure 5. (a) The distribution of the reconstructed invariant $\tau\tau$ mass, $m^{\mathrm{MMC}}_{\tau\tau}$, for events in the $Z \to \ell\ell$ control region, for the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ channel. (b) The distribution of the separation in pseudorapidity of the two leading jets, $\Delta\eta(j_1, j_2)$, for events in the top control region, for the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel. This figure uses background predictions made without the global fit defined in section 8. The error band includes statistical and pre-fit systematic uncertainties.

6.4 W+jets background

Events with W bosons and jets constitute a background to all channels since leptonic W decays can feed into all signatures when the true lepton is accompanied by a jet which is falsely identified as a $\tau_{\mathrm{had}}$ or a lepton candidate. This process can also contribute via semileptonic heavy-quark decays that provide identified leptons.

As stated in section 6.2, for the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ and $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channels, the W+jets contributions are determined with data-driven methods. For the $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ channel, the $W \to \tau_{\mathrm{had}}\nu$ background is estimated from simulation. A correction is applied to account for differences in the $\tau_{\mathrm{had}}$ misidentification probability between data and simulation.

6.5 Background from top-quark production

Background contributions from $t\bar{t}$ and single top-quark production, where leptons or hadronically decaying taus appear in decays of top quarks, are estimated from simulation in the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ and $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channels. The normalisation is obtained from data control regions defined by requiring a b-jet instead of a b-veto.
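As a rough illustration of how a control-region normalisation of this kind can be computed, the sketch below assumes a simple counting estimate: the simulated yield of the background being normalised is scaled so that, after subtracting the other predicted processes, it matches the data count in the control region. The final normalisations in this analysis come from the global fit of section 8, so the function name, the counting formula and the toy yields are assumptions made purely for illustration.

```python
import math


def control_region_scale_factor(n_data, n_other, n_target_mc):
    """Scale factor that makes a simulated background match data in a control region.

    n_data      -- observed event count in the control region
    n_other     -- predicted count from all other processes (subtracted from data)
    n_target_mc -- simulated count of the background being normalised
    Returns (k, stat_unc); only the Poisson uncertainty on n_data is propagated.
    """
    if n_target_mc <= 0:
        raise ValueError("simulated yield must be positive")
    k = (n_data - n_other) / n_target_mc
    stat_unc = math.sqrt(n_data) / n_target_mc
    return k, stat_unc


# Toy yields, purely illustrative (not ATLAS numbers).
k, dk = control_region_scale_factor(n_data=1000.0, n_other=100.0, n_target_mc=900.0)
print(f"k = {k:.3f} +/- {dk:.3f}")
```

A value of k close to unity indicates that the simulation, normalised to the theoretical cross section, already describes the control-region yield.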
In the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel, a large value of the transverse mass $m_{\mathrm{T}}$ is also required, to enhance the background from top-quark production and to suppress the signal contribution. This background is also found to be small for the $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ channel and it is estimated using simulation. The distribution of $\Delta\eta(j_1, j_2)$ for events in the top control region, for the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel, is shown in figure 5(b).

6.6 Diboson background

The production of pairs of vector bosons ($W^{+}W^{-}$, $ZZ$ and $W^{\pm}Z$), with subsequent decays to leptons or jets, contributes especially to the background in the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ channel. For all analysis channels, these contributions are estimated from simulation, normalised to the NLO cross sections indicated in table 1.

6.7 Contributions from other Higgs boson decays

In the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ channel, a non-negligible contribution from $H \to WW \to \ell\nu\ell\nu$ exists and this process is considered as background. Its contribution is estimated for $m_H = 125$ GeV using simulation. The corresponding signal cross section is assumed to be the SM value and is indicated in table 1.

6.8 Validation of background estimates

As described above, the normalisations of important background sources that are modelled with simulation are determined by fitting to data in control regions. These normalisations are compared in table 7 to predictions based on the theoretical cross sections for the 8 TeV analysis. In most cases, the values obtained are compatible with unity within the statistical uncertainties shown. For the top control region in the VBF category of the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channel, the value is also in agreement with unity if the experimental and theoretical systematic uncertainties are included. The control-region normalisations are used for validation plots, and they are used as starting values in the final global fit described in section 8. The global fit does not change any of these normalisations by more than 2%.

It is important to verify that the BDT output distributions in data control regions are well described after the various background determinations.
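Figure 6 shows distributions from important control regions for the $\sqrt{s} = 8$ TeV dataset, i.e. the $Z \to \tau\tau$-enriched control regions for the $\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$ and $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channels, and the reconstructed $\tau\tau$ invariant mass side-band control region (defined as $m^{\mathrm{MMC}}_{\tau\tau} < 100$ GeV or $m^{\mathrm{MMC}}_{\tau\tau} > 150$ GeV) for the $\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ channel. The distributions are shown for both the VBF and the boosted categories. All distributions are found to be well described, within the systematic uncertainties.

ProcesslepleplephadhadhadZ``-enriched80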