
Detector reconstruction of γ-rays
An application of artificial neural networks in experimental subatomic physics

Bachelor Thesis in Engineering Physics: TIFX04-20-03

PETER HALLDESTAM
CODY HESSE
DAVID RINMAN

Department of Physics
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2020


Bachelor's thesis: TIFX04-20-03

Detector reconstruction of γ-rays

An application of artificial neural networks in experimental subatomic physics

PETER HALLDESTAM
CODY HESSE
DAVID RINMAN

Department of Physics
Division of Subatomic, High Energy and Plasma Physics
Experimental Subatomic Physics
Chalmers University of Technology
University of Gothenburg
Gothenburg, Sweden 2020


Detector reconstruction of γ-rays
An application of artificial neural networks in experimental subatomic physics
Peter Halldestam
Cody Hesse
David Rinman

© Peter Halldestam, Cody Hesse, David Rinman, 2020.

Supervisors: Andreas Heinz and Håkan T. Johansson, Department of Physics
Examiner: Lena Falk, Department of Physics

Bachelor's Thesis TIFX04-20-03
Department of Physics
Division of Subatomic, High Energy and Plasma Physics
Experimental Subatomic Physics
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: Reconstruction of the energy, polar angle and azimuthal angle for γ-rays with beam-frame energy up to 10 MeV. For more details, see section 4.4.

Typeset in LaTeX
Printed by Chalmers Reproservice
Gothenburg, Sweden 2020


Abstract

The study of nuclear reactions through measuring emitted γ-rays becomes convoluted due to complex interactions with the detector crystals, leading to cross-talk between neighbouring elements. To distinguish γ-rays of higher multiplicity, the detector data needs to be reconstructed. As a continuation of the work of earlier students, this thesis investigates the application of neural networks as a reconstruction method and compares it to the conventional addback algorithm. Three types of neural networks are proposed: a Fully Connected Network (FCN), a Convolutional Neural Network (CNN) and a Graph Neural Network (GNN). Each model is optimized in terms of structure and hyperparameters, and trained on simulated data containing isotropic γ-rays, before finally being evaluated on realistic detector data.

Compared to previous projects, all presented networks showed a more consistent reconstruction across the studied energy range, which is credited to the newly introduced momentum-based loss function. Among the three networks, the fully connected one performed best in terms of the smallest average absolute difference between the correct and reconstructed energies, while having the fewest trainable parameters. By the same metric, none of the presented networks performed better than addback. They did, however, show a higher signal-to-background ratio in the energy range of 3–6 MeV. Suggestions for further studies are also given.

Keywords: artificial neural networks, convolutional neural networks, graph neuralnetworks, gamma ray reconstruction, addback, Crystal Ball, TensorFlow, Keras


Sammanfattning (Swedish abstract)

When ions are accelerated and collided with a target, the reaction can be studied by, among other methods, measuring the emitted γ-radiation. Such experiments are, however, complicated by the interactions of the γ-radiation within the detectors, which lead to cross-talk between neighbouring detector elements. To distinguish radiation of higher multiplicity, a reconstruction method is therefore required; conventionally, this has been the addback algorithm. This report is a follow-up to previous years' bachelor projects, and investigates the application of artificial neural networks as an alternative to addback. Three types of neural networks are presented: a fully connected network, a convolutional network, and a graph network. Each model is optimized in terms of both hyperparameters and network structure, and trained on simulated data in the form of isotropic γ-radiation. Finally, the networks are evaluated on more realistic data.

All networks exhibit a more consistent reconstruction across the studied energy range compared with the previous years' networks, which is attributed to the new momentum-based loss function. Among the networks tested, the fully connected one performed best in terms of the smallest mean absolute difference between reconstructed and correct energy, while also containing the fewest trainable parameters. Addback, however, remained the best by these measures, while the networks exhibited a better signal-to-background ratio in the 3–6 MeV range. Finally, suggestions for further studies are presented.


Acknowledgements

First of all, we would like to express our gratitude to our supervisors Andreas Heinz and Håkan T. Johansson, both for presenting us with this project and for offering their guidance to see it through. The task has been equal parts challenging, intriguing and educational, and could not have been completed without your advice.

We would also like to thank our predecessors J. Jönsson, R. Karlsson, M. Lidén, R. Martin, as well as J. Olander, M. Skarin, P. Svensson, and J. Wadman, for laying a foundation for us to work upon. Building upon what you have accomplished has been a luxury, and we would not have come as far without you.

A special thanks also goes out to the bright minds at Google for developing the machine learning frameworks TensorFlow and Keras, allowing us to focus on building our models rather than on getting our code to compile.

Lastly, we would like to thank the Swedish National Infrastructure for Computing (SNIC) and the High Performance Computing Center North (HPC2N) for granting us the resources to train our networks without ever having to worry about running out of either memory or time.

The authors, Gothenburg, May 2020


Contents

1 Introduction
  1.1 Background
  1.2 Previous work
  1.3 Purpose and aims
  1.4 Delimitations

2 Detection of γ-rays
  2.1 The Crystal Ball
  2.2 Interactions of γ-rays with scintillators
  2.3 Relativistic effects
  2.4 Data analysis using addback routines

3 Artificial neural networks
  3.1 Basic concepts
    3.1.1 Forward propagation
    3.1.2 The loss function
    3.1.3 Backpropagation
  3.2 Convolutional networks
  3.3 Graph networks
  3.4 Batch normalization

4 Method development
  4.1 Building the networks
  4.2 Data generation through simulation
  4.3 Loss functions
    4.3.1 Investigating different loss functions
    4.3.2 Photon momentum loss
    4.3.3 Permutation loss
  4.4 Metrics of performance
  4.5 Fully connected networks
    4.5.1 Maximum multiplicity of detected γ-rays
    4.5.2 Using batch normalization
    4.5.3 Optimization of network dimensions
    4.5.4 Comparison of architectures
  4.6 Convolutional networks
    4.6.1 Rearranging the input
    4.6.2 CNN models
    4.6.3 Optimizing hyperparameters of the CNN
    4.6.4 Dealing with existence
  4.7 Graph networks
  4.8 Binary classification

5 Results
  5.1 Methods of comparison
  5.2 Comparison with addback

6 Discussion
  6.1 Conclusion
  6.2 Convolutional networks
  6.3 Graph networks
  6.4 Classification networks
  6.5 Methods of comparison


Chapter 1

Introduction

The fascination of better understanding one's place in the universe drives us to seek answers to one of the most fundamental questions: what are we made of? In nuclear physics, one way to approach this question is through the study of nuclear reactions with particle accelerators. This thesis aims to investigate the possibility of using neural networks in the reconstruction of the energies and emission angles of γ-rays in such experiments.

1.1 BACKGROUND

When a beam of ions is accelerated to relativistic velocities and collides with a stationary target, there is a possibility of energy being released in the form of γ-radiation. This energy can be measured using an array of detectors surrounding the target, thus revealing some properties of the studied nuclei.

One detector designed for this purpose is the Darmstadt-Heidelberg Crystal Ball. Consisting of 162 scintillating NaI crystals arranged in a sphere, as illustrated in Fig. 1.1, it is capable of measuring both the energies and angles of incident γ-rays.

However, due to cross-talk between neighbouring detector elements, mainly caused by Compton scattering, it can in some cases be difficult to correctly assign higher multiplicities of photons [1]. Because of this, the data from the detector needs to be carefully reconstructed in order to determine whether the energy of a single photon was deposited in several crystals or not.

A method of dealing with this reconstruction problem is a set of algorithms known as the addback routines. They have previously been investigated using simulations [2], and were found to correctly identify around 70% of 1 MeV γ-rays. This accuracy, however, rapidly decreases for higher energies.

Figure 1.1: Cross-sectional diagram of the Crystal Ball with its 4 shapes of crystal detectors [1].


Due to the complexity of the γ-reconstruction, the question arises whether a neural network could outperform addback at this task. Neural networks have seen great development in recent years and are being implemented in ever more fields of science and technology. Ranging from deep neural networks, praised for their good approximation of complicated relationships, to convolutional neural networks, which excel at computationally efficient pattern recognition, they all seem fit to offer a competitive alternative to the addback algorithms.

1.2 PREVIOUS WORK

This thesis is based upon two previous bachelor projects at the Department of Physics, Chalmers University of Technology, Gothenburg, and is a continuation aiming to evaluate the application of machine learning in γ-ray event reconstruction. In 2018, Olander et al. [3] investigated this question and obtained results comparable to the addback algorithm; however, their results show artefacts in the reconstruction of photon energies and angles, as well as in their signal-to-background comparison with addback. In 2019, Karlsson et al. [4] further developed the same concepts, focusing on a different machine learning structure, convolutional neural networks, and reached results similar to those of Olander et al.

1.3 PURPOSE AND AIMS

The goal of this project is to further investigate the application of machine learning in the reconstruction of the energies and emission angles of γ-rays for the Crystal Ball detector. In particular, the aim is to study and optimize different types of neural networks, including Fully Connected Networks (FCN), Convolutional Neural Networks (CNN) and Graph Neural Networks (GNN), in order to assess their performance at this task, and whether they can achieve a more accurate reconstruction than the addback algorithm.

1.4 DELIMITATIONS

Even though newer and more advanced particle detectors such as CALIFA [5] exist today, we will only be performing tests on the Crystal Ball detector, due to its simpler geometry, in order to provide a proof of concept. The results should, however, be generalisable to other similar detectors, including CALIFA [3].

Since the nuclei in the accelerated beams travel at 50–70% of the speed of light, it is necessary to apply relativistic corrections to the detector data in order to study the nuclear collisions in their proper frame of reference. Both Olander et al. [3] and Karlsson et al. [4] dealt with such corrections directly in the loss function (see section 3.1.2), which was found to improve the performance of the neural networks. However, this limits the applicability of the networks, as they have to be retrained to accommodate different beam velocities. Furthermore, the relativistic corrections consist of rather simple calculations that can easily be applied to the reconstructed data. Therefore, this study is limited to reconstructions in the laboratory frame.

Lastly, this thesis focuses mainly on the case where there are at most two coinciding γ-rays in the initial reaction, in order to shorten the time required to train each neural network. The software is, however, generalizable, and thus capable of dealing with higher maximum multiplicities, as briefly discussed in sec. 4.5.1.


Chapter 2

Detection of γ-rays

The underlying problems investigated in this thesis encompass a multitude of subjects, including machine learning, data simulation and subatomic physics. This chapter focuses on the latter, introducing the necessary underlying concepts of detector physics and the most commonly used set of analysis methods.

2.1 THE CRYSTAL BALL

The Darmstadt-Heidelberg Crystal Ball is a detector built in collaboration between GSI Darmstadt, the Max-Planck-Institute for Nuclear Physics and the University of Heidelberg. The detector is made of 162 scintillating NaI crystals forming a sphere with an inner radius of 25 cm enclosing the target, leaving openings only for the radioactive beam to pass through.

Figure 2.1: Arrangement of the four shapes of detector elements in the Crystal Ball. This is the projection of one of the 20 equilateral spherical triangles that together constitute the 162 detectors [1].

The crystals each cover an equal solid angle of 0.076 sr, and in total 98% of the full solid angle [1]. They come in 4 different shapes: regular pentagons called A crystals, and three different kinds of irregular hexagons called B, C and D crystals. The arrangement of these crystals, resulting in a spatial resolution of ±8 degrees, can be seen in Figs. 1.1 and 2.1.

2.2 INTERACTIONS OF γ-RAYS WITH SCINTILLATORS

A beam of γ-radiation passing through matter decreases exponentially in intensity, while any individual γ-ray loses energy in discrete steps (interactions), unlike the gradual behaviour of charged particles such as protons and electrons. In the energy range of interest, this mostly occurs in one of three ways [6]: the photoelectric effect, Compton scattering, or pair production.

The photoelectric effect is the emission of a charged particle, often an e⁻, due to the total absorption of a γ-ray's energy, as schematically shown in Fig. 2.2a. The kinetic energy of the emitted particle is the difference between the energy of the photon and the binding energy of the electron.


Figure 2.2: Feynman diagrams showcasing the three main processes by which photons interact with matter. The first (a) shows an electron being released after absorbing a photon, and Compton scattering is shown in (b). The rightmost diagram (c) illustrates the production of an e±-pair near a nucleus, marked as Z.

The scattering of γ-rays by charged particles, again usually an e⁻, is what is commonly referred to as Compton scattering. Only a part of the photon energy is transferred to the recoiling electron, leaving a less energetic γ with a longer wavelength.

Finally, pair production is the creation of a subatomic particle and its anti-particle from the annihilation of a neutral boson. In this case it is

\gamma \longrightarrow e^+ + e^-,

i.e. an e±-pair created from a γ, as illustrated in Fig. 2.2c. Due to conservation of momentum, this can only occur within close proximity of a nucleus.

Of these three processes, at the energy range of interest (100 keV–20 MeV), Compton scattering is usually the dominant mode of interaction with the NaI scintillators [4].

Scintillators such as those in the Crystal Ball absorb the energy of an incoming particle and then re-emit it in the form of visible light. The emitted light intensity from the scintillators is approximately proportional to the deposited energy. To detect any ionizing γ-radiation¹, these signals are detected and amplified with a photomultiplier.

¹ This works with α and β radiation as well.

2.3 RELATIVISTIC EFFECTS

The beams used to study colliding nuclei at the Crystal Ball usually travel at high speeds, β = 0.5–0.7 in terms of the speed of light; however, the γ-rays are measured in the laboratory frame of reference. This means that relativistic effects must be taken into account. These are the relativistic Doppler effect and the headlight shift, where the latter refers to the fact that the emitted γ-rays appear to be scattered conically in the forward direction [3]. The γ-ray energy E and polar emission angle θ in the laboratory frame relate to E′, θ′ in the beam's center-of-mass frame of reference [7, pp. 78–82] as

E' = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}}\, E, \qquad \cos\theta = \frac{\beta + \cos\theta'}{1 + \beta\cos\theta'}.   (2.1)
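Since these corrections are simple closed-form expressions, they can be applied directly to reconstructed laboratory-frame data. A minimal Python sketch of eq. (2.1), with an illustrative function name not taken from the thesis code:

```python
import numpy as np

def to_beam_frame(E_lab, theta_lab, beta):
    """Transform a gamma-ray's lab-frame energy and polar angle into the
    beam's centre-of-mass frame, following eq. (2.1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)                    # Lorentz factor
    E_cm = gamma * (1.0 - beta * np.cos(theta_lab)) * E_lab
    # Invert the aberration relation of eq. (2.1) to obtain theta':
    cos_theta_cm = (np.cos(theta_lab) - beta) / (1.0 - beta * np.cos(theta_lab))
    return E_cm, np.arccos(cos_theta_cm)
```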

2.4 DATA ANALYSIS USING ADDBACK ROUTINES

When an energetic γ-ray hits a detector in the Crystal Ball, part of its energy may be deposited in the surrounding crystals due to scattering and the other interactions mentioned in section 2.2. The scattered γ-rays can, in turn, re-scatter, resulting in a chain of detector interactions, hence making the reconstruction of a single γ-ray a demanding task. The classical solution is to implement an addback algorithm, typically carried out by taking the sum over the energies measured by a group of adjacent detectors, often referred to as a cluster [2, 8].

In order to identify a cluster, and to find the energy deposited by each particle interaction, it is usually assumed that the first interaction deposits the largest proportion of the total deposited energy, E_max [8]. Scattered γ-rays are likely to interact with neighbouring crystals and may re-scatter, depositing energies E′_i and E″_j, respectively.

The most commonly used addback routines are Neighbour and Second Neighbour. As the name implies, the Neighbour algorithm calculates the sum of the initial hit and its direct neighbours, ignoring energies deposited from re-scattering, as

E_\mathrm{tot} = E_\mathrm{max} + \sum_i E'_i,

where i runs over the neighbours of the central crystal. Second Neighbour is expanded to also include the second-ring crystals j, to account for chain-scattering events. The total energy is then calculated as

E_\mathrm{tot} = E_\mathrm{max} + \sum_i E'_i + \sum_j E''_j.

These two addback routines are illustrated in Fig. 2.3.

Figure 2.3: Schematic representations of the two clusters containing (a) the crystal neighbours and (b) the second neighbours with respect to the initial hit [8].
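To make the clustering concrete, the following is a minimal Python sketch of the Neighbour routine. It assumes an adjacency mapping `neighbours` derived from the Crystal Ball geometry (not constructed here) and ignores detector thresholds; it is an illustration of the idea, not the reference implementation:

```python
def neighbour_addback(energies, neighbours):
    """Sum each local energy maximum E_max with its direct, unused
    neighbours, as in the Neighbour routine of section 2.4.

    energies   : deposited energy per crystal (length 162)
    neighbours : dict mapping crystal index -> adjacent crystal indices
    """
    used = set()
    reconstructed = []
    # Visit crystals in order of decreasing energy, so the largest hit in
    # a region is taken as the first interaction point E_max.
    for i in sorted(range(len(energies)), key=lambda k: -energies[k]):
        if energies[i] <= 0 or i in used:
            continue
        cluster = [i] + [j for j in neighbours[i] if j not in used]
        used.update(cluster)
        reconstructed.append(sum(energies[j] for j in cluster))
    return reconstructed
```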


Chapter 3

Artificial neural networks

Artificial neural networks (ANN) have in recent years become an integral part of modern society, making their way into our everyday lives in trivial matters such as individually targeted recommendations of music and movies, while their computational prowess is used to model a multitude of essential processes, ranging from the natural sciences to economics and politics. The following chapter aims to give a brief introduction to the extensive subject of ANNs.

Table 3.1: Notation used when describing the key mathematical aspects of artificial neural networks. These are introduced in this chapter and used throughout this thesis.

Notation    Description
x           Input
y           Network output
ŷ           Correct output
h^(ℓ)       Hidden layer of index ℓ
W^(ℓ)       Weight matrix
b^(ℓ)       Bias vector
D           Depth (# hidden layers)
N_ℓ         Width (# neurons per layer)
L           Loss function
A           Activation function

3.1 BASIC CONCEPTS

An ANN is a computing system capable of modelling various types of non-linear problems. Mathematically, it can be thought of as an approximation of some function f : x ↦ ŷ, where x is some input feature data and ŷ the correct output. This approximation is iteratively refined in a process called training, to be discussed in sections 3.1.2 and 3.1.3.

Getting its name from the analogy to the human brain, an ANN is comprised of fundamental components commonly referred to as neurons, each containing a number called its activation. These neurons are interconnected to form a network, which is primarily defined by its structure, i.e. how the neurons are connected, but also by hyperparameters defining certain quantities and operating conditions of the network, allowing for tuning of the model.


Figure 3.1: Diagram of a simple fully connected network with two hidden layers (depth two), showing the relationship between the neurons. The interconnecting arrows represent the elements of the three weight matrices W^(1), W^(2) and W^(3).

3.1.1 Forward propagation

A common network structure design is to group the neurons into subsequent layers, with every neuron connected to each neuron in the layers before and after. This is known as a Fully Connected Network (FCN), and an example of a small¹ such network is illustrated in Fig. 3.1. These layers can be represented by vectors containing the activations of each of their neurons. The first layer of the network serves as the input x, which is connected to the output y through a network of hidden layers h^(ℓ). The superscript ℓ denotes the layer index, such that h^(0) = x and h^(D+1) = y. The depth D is the total number of hidden layers, and similarly the number of neurons in each layer is called its width, denoted N_ℓ. Both of these quantities are referred to as hyperparameters. Consider a hidden layer h^(ℓ): its state is determined by the layer before it through the forward propagation rule

h^{(\ell)} = \mathcal{A}\left( W^{(\ell)} h^{(\ell-1)} + b^{(\ell)} \right),   (3.1)

where W^(ℓ) is the weight matrix and b^(ℓ) the bias vector, both of which are trainable parameters, discussed in sec. 3.1.3. This is commonly referred to as a Dense connection. The function A, acting elementwise, is called the activation function and is the key component of any ANN. If the activations were linear functions, the network would reduce completely to a linear function, capable only of solving linear tasks.

To successfully apply networks to non-linear tasks, non-linear activations are needed. Perhaps the most common activation is the Rectified Linear Unit, given by

\mathrm{ReLU}: x \mapsto \max(0, x).   (3.2)

This is the default activation used throughout this thesis, unless otherwise mentioned.
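As an illustration of eqs. (3.1) and (3.2), the forward pass of a small FCN can be written in a few lines of NumPy. This is a minimal sketch assuming the weights and biases are given; the linear output layer follows the convention used for the regression networks later in this thesis:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # eq. (3.2), applied elementwise

def forward(x, weights, biases):
    """Forward propagation, eq. (3.1): h = A(W h + b), layer by layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]  # linear output layer
```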

Another commonly used activation function is the sigmoid function, given by

\sigma: x \mapsto \frac{1}{1 + e^{-x}},

which has the characteristic of producing values in the interval between 0 and 1. This makes it suitable for output neurons expected to represent some probability, as explored in section 4.8.

3.1.2 The loss function

In order for a neural network to function, it must first be trained. The process of supervised training, in the context of FCNs, refers to optimizing the set of weights and biases {W^(ℓ), b^(ℓ)} that minimizes the so-called loss function² over a given set of training data {x, ŷ}. The loss function, denoted L, is a metric of how well the output y of the network corresponds to the correct labels ŷ of the training set.

There are a multitude of different loss functions, some more suitable for certain kinds of tasks.

¹ In relative terms, compared to those examined in this thesis (see section 4.5).
² Other commonly used names are cost function, error function and objective function.


In regression problems, where f is a continuous distribution, the standard choice of L is the Mean Squared Error, defined as

\mathrm{MSE}(\hat{y}, y) \equiv \frac{1}{n} \left\| \hat{y} - y \right\|^2,   (3.3)

where n = N_{D+1} is the number of neurons in the output layer.

Another type of regression problem is that of binary classification, i.e. true or false. The question could, for example, be: is this a picture of a cat? The standard choice in this case would be the Binary Cross-Entropy H, given by

H(\hat{\beta}, \beta) \equiv -\hat{\beta} \log\beta - (1 - \hat{\beta}) \log(1 - \beta),

where β̂ ∈ {0, 1} is the correct binary value and 0 < β < 1 is the output of the network.

3.1.3 Backpropagation

According to eq. (3.1), the output layer y = h^(D+1) depends on every preceding layer and the corresponding weights and biases of the network. Optimization is thus done by first calculating the gradient of L with respect to these parameters and then updating them through gradient descent,

\theta \mapsto \theta - \eta \nabla_\theta \mathcal{L},

where θ is any such trainable parameter and η is a hyperparameter called the learning rate. This procedure is not restricted to FCNs, and the general principle is the same for propagation rules other than that of eq. (3.1).

While these gradients may be calculated analytically, they can be computationally expensive to evaluate due to the non-linear activation functions. Because of this, it is often preferred to use the backpropagation algorithm, which instead relies on the chain rule to produce good approximations of ∇_θ L.

The computational complexity can be reduced further by dividing the training set into smaller batches and estimating the gradient using the average value of the loss function over each batch. This method of minimizing the loss function with respect to the weights and biases of the network, i.e. training the network, is referred to as Stochastic Gradient Descent [9] and is the principle upon which the optimizer used for this thesis, Adaptive Moment Estimation (often abbreviated Adam) [10], is based.

During training, the entire training data set is presented to the network several times, where each pass is called an epoch. The training data only consists of a portion of the total data, with the remainder used as a validation set. This enables a good metric for gauging the performance of the network, called the validation loss: the loss function calculated over the validation set. Should the validation loss increase over several epochs while the training loss decreases, the network is being over-trained, i.e. overfitted to the training data. To prevent this, one can implement early stopping, which automatically halts the training whenever the validation loss stops decreasing over a set number of epochs, called the patience.
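In Keras, the training loop described above (batches, a validation split, and early stopping with a patience) condenses into a few calls. A sketch with illustrative values, assuming `model`, `x_train` and `y_train` are already defined:

```python
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss',  # watch validation loss
                                           patience=20,         # epochs of patience
                                           restore_best_weights=True)
model.compile(optimizer='adam', loss='mse')  # Adam optimizer, MSE loss of eq. (3.3)
model.fit(x_train, y_train,
          validation_split=0.2,  # hold out part of the data as a validation set
          epochs=300, batch_size=256,
          callbacks=[early_stop])
```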

3.2 CONVOLUTIONAL NETWORKS

A Convolutional Neural Network (CNN) is a neural network implementing one or several convolutional layers, usually followed by a FCN. Convolutional neural networks excel at local feature recognition, making them widespread in machine learning applications such as image recognition.

A convolutional layer applies a mathematical operation on an input tensor in order to produce a number of output tensors.


Focusing on the one-dimensional case, as used in this thesis, one such operation takes a vector as input and produces a number of vectors, or feature maps, specified by a hyperparameter. Associated with each feature map is a smaller vector of trainable parameters called a kernel (or filter). The elements of each feature map are produced by convolving the kernel with the input, i.e. calculating the sum of each element in the kernel multiplied with the respective value of the input, as seen in Fig. 3.2. Here, a uniform bias can also be added to the output. To produce the next element of the feature map, the kernel is shifted over the input by a number of indices given by a hyperparameter called the stride.

Figure 3.2: Visualization of how the output of a 1-dimensional convolutional layer is formed. This layer features a stride and filter size of 3. The application of the bias is not shown [4].

The stride should be smaller than or equal to the dimension of the kernel in order for the convolutions to include each value of the input at least once. Beyond that, there are several design choices to consider. Choosing a larger stride relative to the kernel size effectively downsamples the feature map, thus reducing the number of trainable parameters. In contrast, choosing a smaller stride can result in better recognition of details, i.e. larger sensitivity to variations over fewer indices of the input. Another thing to account for is that a stride strictly smaller than the kernel size always results in values along the edges of the input contributing less to the resulting feature map than those in the center. This can be avoided by utilizing padding, which is the concept of introducing an interval of zeroes at the start and end of the input vector. If the length of this interval is such that the resulting feature map has the same dimension as the input, this is called Same Padding. On the contrary, Valid Padding means that no padding is applied.

As mentioned above, a larger stride downsamples the data and therefore results in fewer trainable parameters. In a CNN, downsampling can also be achieved by applying pooling. A pooling layer collects the information of the previous layer by, similarly to the convolutional layer, applying a kernel over the input and calculating the corresponding value of the output according to some rule. Common rules for pooling are max pooling and average pooling, meaning the kernel selects the maximum or average value from its overlap with the input vector, respectively. The stride of the pooling kernel is always equal to its size, hence the pooling operation produces an output downsampled by a factor equal to the kernel size [9].

3.3 GRAPH NETWORKS

In the last decade, a whole new subgenre of ANNs has been developed: Graph Neural Networks (GNN). First introduced by Scarselli et al. [11], these networks process data in graph domains. A graph G is a data structure comprising a set of nodes and their relationships to one another. Currently, most GNNs have a rather universal design in common, and are distinguished only by how a layer h^(ℓ) is aggregated from this graph.

An adjacency matrix A is a representation of any finite G; its binary elements indicate whether pairs of nodes are adjacent or not. For the purposes of this thesis, G is an undirected graph and A is therefore symmetric.


Each network layer h^(ℓ) is aggregated as h^(ℓ) ↦ a(h^(ℓ), A). The next layer in the forward propagation can thus be described by

h^{(\ell+1)} = \mathcal{A}\left( W^{(\ell+1)} a + b^{(\ell+1)} \right).

Different GNNs vary in the choice of the parametrization of a. One such choice is the spectral rule (3.4), as described by T. N. Kipf and M. Welling [12]. The aggregation takes the form

a(h^{(\ell)}, A) = D^{-1/2} \tilde{A} D^{-1/2} h^{(\ell)},   (3.4)

where D is the diagonal degree matrix of G and Ã = A + I (I being the identity matrix). The latter is needed in order to include self-loops in the aggregation. This is similar to the Dense connection of eq. (3.1), the only difference being the aggregation of h^(ℓ). It is therefore referred to as a GraphDense connection.

Let us consider the aggregation of the i:th node in the hidden layer h^(ℓ). The spectral rule eq. (3.4) gives

a_i(h^{(\ell)}, A) = \left( D^{-1/2} \tilde{A} D^{-1/2} h^{(\ell)} \right)_i
= \sum_n D^{-1/2}_{in} \sum_j \tilde{A}_{nj} \sum_m D^{-1/2}_{jm} h^{(\ell)}_m
= \sum_j \frac{\tilde{A}_{ij}}{\sqrt{D_{ii} D_{jj}}}\, h^{(\ell)}_j,

where the final simplification comes from D being diagonal. This can roughly be thought of as the sum over h^(ℓ)_i and its neighbouring nodes, which explains the inclusion of the identity matrix in Ã.⁴

In conclusion, a network using these principles has the advantage of having the relationships of every node, i.e. neuron, known from the start. This could potentially reduce the time of training, since a non-GNN would need to learn these relationships.

⁴ Without it, the aggregation of the i:th node would just be the mean of its neighbours, excluding the node itself! For the interested reader, a more comprehensive explanation of the normalization using the degree matrix can be found in [13].
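A minimal NumPy sketch of the spectral aggregation in eq. (3.4); the binary, symmetric adjacency matrix `adj` (e.g. the 162 × 162 crystal-neighbour matrix of the Crystal Ball) is assumed given and is not constructed here:

```python
import numpy as np

def graph_dense_aggregate(h, adj):
    """Aggregate node features h (nodes x features) with self-loops."""
    a_tilde = adj + np.eye(adj.shape[0])  # A~ = A + I adds self-loops
    deg = a_tilde.sum(axis=1)             # node degrees
    d_inv_sqrt = np.diag(deg ** -0.5)     # D^{-1/2}
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt @ h
```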

3.4 BATCH NORMALIZATION

It has been shown that the training rate of a single neuron is greatly improved if every element in h^(ℓ) is normalized, giving them a mean value of zero and unit variance [14]. In ANNs, each layer h^(ℓ) gets its trainable parameters updated proportionally to the partial derivative of the loss function with respect to the following layer h^(ℓ+1). Hence, if the outputs from two layers differ greatly in value, the gradient from the smaller output may become vanishingly small in comparison, which is often referred to as the vanishing gradient problem. This may effectively prevent certain neurons from changing their values and, in the worst case, completely stop the neural network from further training.

The normalization thus ensures that the activations of all neurons are of the same order of magnitude, which gives them equal opportunity to influence the learning of their connected neurons. For a layer of width N, each element h^(ℓ)_i can be normalized as

h^{(\ell)}_i \mapsto \frac{h^{(\ell)}_i - \mathrm{E}[h^{(\ell)}_i]}{\sqrt{\mathrm{Var}[h^{(\ell)}_i]}},   (3.5)

where the expectation E[h^(ℓ)_i] ≈ μ_b and variance Var[h^(ℓ)_i] ≈ s_b² are estimated with the mean μ_b and sample standard deviation s_b over a batch b of training data, hence the name batch normalization [15].

The process of normalization presented thus far can, however, change or even limit what the layer can represent. For instance, normalizing the inputs of the sigmoid function will limit it to being approximately linear. The power of representation is easily regained by ensuring that the transformation inserted in the neural network can represent the identity transform. This is achieved by introducing two trainable parameters for each layer to be normalized, one for scaling and another for shifting, allowing it to take on any range of numbers.


Chapter 4

Method development

The core of this thesis lies in exploring various methods using neural networks and the investigation of how well they perform event reconstruction of γ-rays. This chapter presents the key aspects of the procedural development of such methods and the resulting set of networks selected to be compared with the addback routine in the following chapter. Although a fair amount of effort was put into the analysis of e.g. FCNs and CNNs, perhaps the most noteworthy aspect is the redesigned loss function described in section 4.3.2.

4.1 BUILDING THE NEURAL NETWORKS

The objective of these networks is to reconstruct the energies and directions of each of the γ-rays in the initial reaction from a set of detected energies {E_i}. In the case of the Crystal Ball detector, the input consists of values for each of the N₀ = 162 crystal detectors. Every network is trained on computer-generated data, to be discussed in sec. 4.2.

All neural networks have been implemented using Keras [16], a Python API running on top of Google's machine learning framework TensorFlow [17]. This allows us to work at a high level, using Keras' predefined environments for common features of ANNs (described in chapter 3), while utilizing the hardware-optimized computational backend of TensorFlow. Keras describes the structure of a neural network through the abstract Model class, which is implemented to create the different network structures described in sections 4.5 and 4.6. The resulting product can be found at https://github.com/PajterMclovin/DetNet.

The majority of the network training was carried out using one of two local machines, one equipped with an Nvidia GTX 1060 GPU and the other with a 1080 Ti, both operating on quad-core 6th generation Intel Xeon processors and 32 GB of RAM. For the final training and evaluation of the convolutional networks, the HPC system Kebnekaise at HPC2N, Umeå University, Sweden, as part of the Swedish National Infrastructure for Computing (SNIC) [18], was utilized. These tests ran on Nvidia K80 GPUs on nodes with 14-core 5th generation Xeon processors and 64 GB of RAM.

4.2 DATA GENERATION THROUGH SIMULATION

The training of an ANN requires an extensive pre-labeled data set. For the purposes of this study, such data sets must contain the original and measured energies in every detector crystal of the Crystal Ball for each individual event.


The data is generated using GEANT4, a state-of-the-art, C++-based toolkit for modeling the interactions of particles with matter. Modeling these interactions, however, is a difficult task due to the large number of coupled degrees of freedom and the quantum-mechanical, i.e. probabilistic, nature of the processes. Hence, GEANT4 utilizes Monte Carlo simulations to produce numerical solutions [19].

The final modeling was performed using ggland, a wrapper program built around the GEANT4 library, which allows for the simulation of high-energy particle events within the Crystal Ball geometry [20]. The ggland simulations are constructed using an arbitrary number of particle sources, referred to as guns, placed at the center of the geometry. Each gun simulates the emission of a single particle, with an energy and angle randomized within the boundaries of a predetermined distribution, e.g. to emulate the scattering from target nuclei used in experiments.

To handle the large amounts of data produced within the ggland simulations, a C++ framework for data processing called ROOT was used. The ROOT framework was originally developed by CERN in order to easily save, access and handle the petabytes of data generated in accelerator-based physics experiments [21].

Figure 4.1: Semi-log plot showing the energy distribution from the GEANT4 simulations which was used for training the networks. It is uniform in the interval 0.01–10 MeV, with an additional number of about 10⁶ empty events.

For a given maximum multiplicity m, i.e. the maximum number of γ-rays emitted during an event, there are m + 1 subsets of the correct label set ŷ: one representing the case where there is no γ and ŷ is all zeros, and the rest containing between one and m γ-rays. Since the simulated data does not include the former, such events are subsequently added in a given amount, see Fig. 4.1.

4.3 LOSS FUNCTIONS

The loss function is undeniably a critical part of the training of a neural network, and needs to be treated accordingly. This section explains some different approaches to defining L that were investigated during the development of this thesis. Ultimately, the so-called photon momentum loss was selected: a rather natural definition using the momentum vectors of the γ-rays in Cartesian coordinates.

4.3.1 Investigating different loss functions

During early development of the neural networks in the Keras environment, a number of loss functions were implemented. Primarily, the non-relativistic modified MSE loss functions of previous studies [3, 4] were tested. One such MSE variant is shown in eq. (4.1), where ΔE_i = Ê_i − E_i and analogously for the angles; variables marked with a hat represent the correct label, whereas those without are the output of the network:

\mathcal{L}\{E_i, \theta_i, \varphi_i\} = \frac{1}{m} \sum_{i=1}^{m} \left[ \lambda_E (\Delta E_i)^2 + \lambda_\theta (\Delta\theta_i)^2 + \lambda_\varphi \big( (\Delta\varphi_i + \pi) \bmod 2\pi - \pi \big)^2 \right],   (4.1)

\mathcal{L}\{E_i, \theta_i, \varphi_i\} = \frac{1}{m} \sum_{i=1}^{m} \left[ \lambda_E (\Delta E_i)^2 + \lambda_\theta (\Delta\theta_i)^2 + \lambda_\varphi \big( 1 - \cos\Delta\varphi_i \big) \right].   (4.2)

The coefficients labeled λ are used to combine the different terms and were optimized by Karlsson et al. [4]; their final choice for eq. (4.1) was to set all λ = 1. Whether this holds for other network structures is a different question, which was not investigated in this work. They also showed in their plots that this cost function, in combination with their data sets, produced artefacts, mainly in the spatial reconstruction: the polar angle θ tended to be reconstructed too low, whereas the azimuthal angle φ was reconstructed too low for φ < π and too high for φ > π.

When implementing the same loss function in Keras, the artefacts persisted. This is expected, as the artefacts are likely a result of inconsistencies in the treatment of the angles and not of the computational framework. For instance, consider an event with perfect reconstruction of the polar angle, Δθ = 0, but with Δφ = π. The spatial part of the function would in this case yield a cost of λ_φ π², regardless of θ. This is bad, because for θ = 0 this would be a perfect reconstruction, whereas for θ = π/2 it would be the worst possible. Because the coordinates are interdependent in terms of distance, a means of addressing this issue could be to weight the azimuthal term with cos θ, as suggested by Karlsson et al., making the expression look more and more like a transformation to Cartesian coordinates. This, among other things, inspired the use of a momentum loss function, discussed in section 4.3.2.

Another reason for the artefacts in the azimuthal angles when using the loss function in eq. (4.1) was suspected to stem from the use of the modulo function. Being non-continuous, it is not a preferred choice in the context of optimization with gradient descent. Because of this, a continuous replacement, as seen in eq. (4.2), was tested.

Unlike the modulo function, this expression has a continuous gradient everywhere. Furthermore, the gradient also points towards the closest minimum, which was thought to be another advantage of this loss function. However, through testing it became clear that this function produced artefacts similar to those of eq. (4.1), likely due to the separate treatment of the angles.

4.3.2 Photon momentum loss

Since the energy and direction of a photon can be described by its momentum vector p, it is a natural choice of quantity for the network to reconstruct. Continuing along this path, p is here represented using natural units (E = ‖p‖, c = 1). Let m be the maximum multiplicity. The output of the network then takes the form

Y = (p_1, \dots, p_m).

In practice, each vector is identified with its three Cartesian coordinates, hence Y ∈ ℝ^{3m}. For the case m = 2, the output would be

Y = (p_{1x}, p_{1y}, p_{1z}, p_{2x}, p_{2y}, p_{2z}).

Any non-existing photon, in either input or output, is represented by the zero vector.


With the output being the reconstructed momentum vectors of the γ-rays, perhaps the most logical choice of loss function, based on the mean squared error introduced in eq. (3.3), is simply

\mathcal{L}\{p_i\} = \sum_{i=1}^{m} \left\| p_i - \hat{p}_i \right\|^2,   (4.3)

where p̂_i is the corresponding correct vector. Since m is the same for each individual network, the factor 1/m is omitted in the Python scripts to reduce the number of unnecessary floating-point operations.

This error is still, as usual, averaged over the entire batch during training. Note that the use of Cartesian coordinates handles the angular relationships automatically through the change of coordinates, also removing the need for periodicity in the loss function.

From testing with simpler networks, it was verified that this loss function does not cause any angular artefacts such as those discussed in section 4.3.1. Furthermore, it features an affine gradient with respect to p and is therefore more computationally efficient to optimize.
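For a fixed multiplicity, and without the pairing problem treated in section 4.3.3, eq. (4.3) translates almost directly into TensorFlow. A minimal sketch, with the batch average made explicit:

```python
import tensorflow as tf

def momentum_loss(y_true, y_pred):
    """Eq. (4.3): summed squared momentum error per event, batch-averaged.
    Both tensors have shape (batch, 3*m)."""
    return tf.reduce_mean(tf.reduce_sum(tf.square(y_true - y_pred), axis=-1))
```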

4.3.3 Permutation loss

For multiplicities m > 1, each p_i has to be paired with its corresponding p̂_i. Since there is no inclination for the network to produce (p, p′) over (p′, p), for instance, these would result in two completely different values of L. Olander et al. [3] solved this problem by calculating the loss function for all m! possible pairings and then selecting the one with minimal loss, i.e. the least mean squared error. This could be done by having a script perform the assignment in a series of for-loops, but backpropagation would then require an additional, hand-written gradient calculation.

The alternative is to use the default symbolic operations of TensorFlow. Let

Y = (y_1, y_2, \dots, y_n)^T \in \mathbb{R}^{n \times 3m}

be an output batch of size n and Ŷ its corresponding labels. To calculate the mean loss as described above, consider the permutation tensor P of order 3, which comprises each of the m! permutation matrices. With m = 2, for instance, the 2! = 2 such matrices are the identity matrix P_{12} = I and

P_{21} =
\begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix}.

Together with the identity tensor E, a tensor similar to P but containing m! identity matrices, the error 𝓔 for all possible pairings in the batch can be described in index notation as

\mathcal{E}_{ijk} = Y_{i\ell} E_{\ell jk} - \hat{Y}_{i\ell} P_{\ell jk},

where the last index of 𝓔, E and P runs over the m! identity and permutation matrices, respectively. The purpose of E is just to make copies of Y in order to pair them with every permutation of Ŷ. The squared error is then deduced from

L_{ij} = \mathcal{E}_{ikj} \mathcal{E}_{mkn} \delta_{im} \delta_{jn},

which is an ℝ^{n × m!} matrix containing the loss for every pairing. Here δ is the Kronecker delta, defined as

\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}. \end{cases}

The returned loss is thus given by minimizing these over the pairings for each event and then taking the batch average, i.e.

\mathcal{L} = \frac{1}{n} \sum_{j=1}^{n} \min_{i \le m!} L_{ij}.

This is the loss function used throughout this thesis.
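The tensor formulation above avoids explicit Python loops. An equivalent, more readable sketch that enumerates the m! pairings with itertools and keeps the per-event minimum is shown below; it illustrates the same idea but is not the implementation used in the thesis code:

```python
import itertools
import tensorflow as tf

def permutation_loss(m):
    perms = list(itertools.permutations(range(m)))

    def loss(y_true, y_pred):
        pred = tf.reshape(y_pred, (-1, m, 3))  # (batch, m photons, 3 components)
        true = tf.reshape(y_true, (-1, m, 3))
        per_perm = []
        for p in perms:
            permuted = tf.gather(true, list(p), axis=1)  # reorder correct photons
            per_perm.append(tf.reduce_sum(tf.square(pred - permuted), axis=[1, 2]))
        # Minimum over the m! pairings, then the batch average.
        return tf.reduce_mean(tf.reduce_min(tf.stack(per_perm, axis=1), axis=1))

    return loss
```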

4.4 METRICS OF PERFORMANCE

When analyzing the networks, a good way to represent the photon reconstruction is through the "lightsaber" plot. Used by both previous projects [3, 4], these come in the form of three histograms, with the correct and reconstructed values of E, θ and φ on the abscissa and ordinate, respectively. For a perfect reconstruction, the plot should show the identity, represented by the blue diagonal lines. By utilizing the momentum as mentioned in section 4.3.2, we also introduce a new metric for evaluating network performance, the Mean Momentum Error (MME):

\mathrm{MME} = \frac{1}{N} \sum_{i=1}^{N} \left\| p_i - \hat{p}_i \right\|,   (4.4)

where N is the number of data points. This is also simply referred to as the Mean Error, and serves as a good scalar metric of how well a network performs. Since this value treats all data points in ℝ³ equally, it accounts for misidentification of the existence of γ-rays by introducing an error equal to either the momentum of a reconstructed non-existent photon, or that of a non-reconstructed existing photon.

However, in order to quantify whether a network consistently succeeds at reconstructing existing photons as such, the events can be ordered into four categories Q_e, Q_i, Q_m, and Q_o¹, based on their correct and reconstructed energies, as shown in table 4.1. As mentioned in section 4.2, the standard way of labeling a non-existent photon is to set its energy and all its angles to zero. Since a network without classification neurons cannot explicitly label a photon as non-existent, a threshold ε is defined, with energies reconstructed below this threshold signifying non-existent photons. The value ε = 0.01 MeV is used, which is the lower limit of the energies generated in our data sets. Counting the number of events in each category gives additional information about the behavior of the network.

Table 4.1: The different categories of γ-ray reconstructions. The labels Q_e and Q_o refer to those correctly registered as existing or non-existing, respectively, while Q_i and Q_m are those invented or missed by the network.

        Ê = 0   Ê > 0
E > ε   Q_i     Q_e
E < ε   Q_o     Q_m

Note how the number of events that end up in each of these categories is highly dependent on the threshold: a Q_i photon could just as well be counted as a Q_o photon using a higher threshold. Still, these numbers provide an idea of how a certain alteration of a network changes its behavior. To more clearly visualize the workings of the network, the mappings of these categories are altered slightly compared to the previous reports [3, 4], to form a new type of plot shown in Fig. 4.2. Here, the Q_i events are displaced horizontally to a random point within the vertical bar, so as to show the energy at which the γ-rays in this category are invented, all while maintaining an appropriate z-axis for the plot.

¹ Those in the category Q_o are mapped close to the origin, hence the "o".
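Counting the four categories reduces to two boolean masks. A small sketch, assuming arrays of correct (`E_true`) and reconstructed (`E_reco`) energies in MeV:

```python
import numpy as np

def categorize(E_true, E_reco, eps=0.01):
    """Count events per category of table 4.1, with threshold eps in MeV."""
    exists, found = E_true > 0, E_reco > eps
    return {'Qe': int(np.sum(exists & found)),    # correctly existing
            'Qi': int(np.sum(~exists & found)),   # invented
            'Qm': int(np.sum(exists & ~found)),   # missed
            'Qo': int(np.sum(~exists & ~found))}  # correctly non-existing
```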


Figure 4.2: Reconstruction of energy E, polar angle θ and azimuthal angle φ for γ-rays of maximum multiplicity m = 2, using a FCN of uniform width N = 80 and depth D = 4. The network is trained on a total of 1.5 × 10⁶ events, including an additional 5 × 10⁵ fully empty events. Notice the lack of missing reconstructions in the Q_m area.

4.5 FULLY CONNECTED NETWORKS

Perhaps the most elementary type of neural network is the FCN, making it a reasonable choice to investigate first. As described in section 3.1.1, a FCN of depth D consists of D + 1 Dense connections, each with a width of N_ℓ neurons. The FCNs examined in this thesis are predominantly of uniform width, N_ℓ = N ∀ℓ; however, network architectures with alternating widths are investigated in section 4.5.4.

Figure 4.3 shows a flowchart describing the general structure of the FCN. The input is the detected energies {E_i} from the N₀ = 162 crystal detectors, and the output is the m momentum vectors {p_j} of the reconstructed γ-rays. After each Dense connection a ReLU function, given by eq. (3.2), is applied, with the exception of the connection to the output. Here a linear activation is used, so that the network is able to generate negative values, since ReLU ≥ 0 by definition.

Figure 4.3: Flowchart of a FCN with arbitrary depth, connecting the input detector energies E_i with the reconstructed photon momenta p_j. The coloured boxes represent the Dense connections between the layers, described in eq. (3.1).
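The structure of Fig. 4.3 maps directly onto the Keras functional API. A sketch with the dimensions of the network in Fig. 4.2 as defaults; the helper name is illustrative, not taken from the thesis code:

```python
from tensorflow import keras

def build_fcn(depth=4, width=80, n_inputs=162, max_multiplicity=2):
    inputs = keras.Input(shape=(n_inputs,))  # detected energies {E_i}
    h = inputs
    for _ in range(depth):
        h = keras.layers.Dense(width, activation='relu')(h)
    # Linear output so the network can produce negative momentum components.
    outputs = keras.layers.Dense(3 * max_multiplicity)(h)
    return keras.Model(inputs, outputs)
```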


4.5.1 Maximum multiplicity of detected γ-rays

While the number of output neurons is fixed, the ANNs are still required to be able to train on any given maximum multiplicity of γ-rays. However, this report mostly considers the case m = 2.

Fig. 4.4 shows not only that the FCNs are capable of this, but also how well they perform for different m. A range of FCNs of varying depths, all with a uniform width of 80, was trained on 1.5 × 10⁶ simulated events with an additional 5 × 10⁵ completely empty events. The case m = 2 is also presented in Fig. 4.2. Higher maximum multiplicities give rise to more uncertain reconstructions, as suggested by the ordering of the mean errors for different m in the figure. This is not surprising, considering that the calculation of the permutation loss function scales with m!, making the training of the network more difficult.

4.5.2 Utilizing batch normalization

To determine whether batch normalization is beneficial to the network's performance, a series of FCNs with and without batch normalization is compared. The normalization is applied before the ReLU activations. Figure 4.5 shows a flowchart of these blocks of Dense connections and batch normalizations.

Every network, of uniform width 64, was trained for a maximum of 300 epochs, using early stopping with a patience of 20 epochs. Figure 4.6 shows that using batch normalization did not improve the performance of the FCNs for depths below 20 in this case. The training times did not differ significantly either. Networks deeper than 20 layers seem to benefit from the normalization, but with an overall higher mean error than networks of lower depths.


Figure 4.4: Mean errors in reconstructed energies for models of depths between 0 and 30. Every hidden layer is of width 64. The models were trained on data sets of maximum multiplicities 1 to 5, and the curves are coloured accordingly.

Input {Ei} → Dense → BatchNorm → Dense → BatchNorm → ... → Dense → Output {pj}

Figure 4.5: FCN of arbitrary depth using batch normalization. Every hidden layer is of uniform width N = 64, save the last one connected to the output, which has width 3m. Note that the activation function is applied at the end of each block.
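A single block of Fig. 4.5 can be sketched in Keras as follows, with the normalization placed between the Dense connection and the ReLU as described above (a sketch, not the exact training code):

    from tensorflow.keras import layers

    def dense_bn_block(x, width=64):
        x = layers.Dense(width)(x)           # Dense connection, no activation yet
        x = layers.BatchNormalization()(x)   # normalize the pre-activations
        return layers.ReLU()(x)              # activation at the end of the block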



Figure 4.6: Mean errors in reconstructed energies for models of depths between 0 and 60, with or without batch normalization. Every hidden layer is of width 64.

4.5.3 Optimization of network dimensions

Every Dense connection is defined by a weight matrix W⁽ℓ⁾ and a bias vector b⁽ℓ⁾, as introduced in section 3.1.1. Between layers ℓ and ℓ + 1 there are Nℓ₊₁(Nℓ + 1) trainable parameters, and for a network with a fixed width N throughout, the total number of parameters in a FCN with D > 0 is thus equal to

    N(N0 + 1) + N(N + 1)(D − 1) + 3m(N + 1),

where N0 is the number of neurons in the input layer and 3m the number in the output layer.
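The expression can be checked directly; plugging in the configurations used elsewhere in this thesis reproduces the quoted parameter counts:

    def fcn_params(N, D, N0=162, m=2):
        # total number of trainable parameters of a uniform-width FCN (D > 0)
        return N * (N0 + 1) + N * (N + 1) * (D - 1) + 3 * m * (N + 1)

    print(fcn_params(100, 4))  # 47206, the 'about 5 x 10^4' of this section
    print(fcn_params(80, 4))   # 32966, matching the FCN selected in section 5.1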

To deduce the optimal configuration of D and N, in the sense of best performance for the least number of parameters, a series of different FCNs was trained systematically and then evaluated. With the mean error of the reconstructed momenta as the measure of performance, a contour plot was created using Clough-Tocher interpolation, implemented with the Python library SciPy [22]. This produced the piecewise cubic C¹ surface shown in figure 4.7. Each network was trained for a maximum of 300 epochs, using early stopping with a patience of 5. Furthermore, a learning rate of 10⁻⁴ was used throughout the testing, as it was deemed optimal for similar networks on the same task [4].
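The interpolation step can be sketched as follows, using the SciPy class cited in [22] (the (depth, width) samples and MME values below are placeholders, not the measured data):

    import numpy as np
    from scipy.interpolate import CloughTocher2DInterpolator

    configs = np.array([(d, w) for d in (2, 4, 8, 16, 30)
                               for w in (20, 80, 140, 200)])
    mme = np.linspace(0.4, 3.5, len(configs))           # placeholder measurements
    surface = CloughTocher2DInterpolator(configs, mme)  # piecewise cubic C1 surface
    print(surface(4, 100))                              # interpolated MME at D=4, N=100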

Fig. 4.7 shows that FCNs with more than about 5 hidden layers do not improve the performance. In fact, a depth D = 4 and a width of about N = 100 neurons per layer seems to be the most favorable configuration, considering its low number of parameters of about 5 × 10⁴. This configuration is marked with a red cross in the figure.


Figure 4.7: Contour plot showing the interpolated mean errors for FCNs of different depths and widths, where each tested configuration is marked with a dot. The red cross shows the selected FCN configuration used in the comparison with addback in the following chapter.

4.5.4 Comparison of architectures

Until now, only FCNs of uniform width have been investigated. To see how these compare to non-uniform networks, three such FCNs of different architectures are analyzed. These are the Triangle, Bottle and Inverted bottle designs.


Input {Ei} → (140) → (118) → (96) → (74) → (52) → (30) → (6) → Output {pj}

Figure 4.8: The Triangle architecture with depth 6 and maximum multiplicity 2. Each dense connection (coloured) is marked with the width Nℓ of the subsequent hidden layer.

Firstly, figure 4.8 shows the Triangle design with depth D = 6. The width Nℓ decreases linearly between the input and the output layer as

    Nℓ = N0 − ℓ (N0 − 3m)/(D + 1).

Every dense connection is activated by the ReLU function, save the last one, which yields the output through a linear activation.
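Since the expression gives non-integer widths, some rounding is needed in practice. A sketch with ordinary rounding gives values close to, but not exactly, those in Fig. 4.8, so the precise rounding scheme used there is assumed rather than known:

    def triangle_width(l, N0=162, D=6, m=2):
        return N0 - (N0 - 3 * m) / (D + 1) * l  # linear decrease from N0 to 3m

    print([round(triangle_width(l)) for l in range(1, 8)])
    # [140, 117, 95, 73, 51, 28, 6], cf. the widths (140, 118, ..., 30, 6) in Fig. 4.8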

The second design is shown in figure 4.9. It has two separate uniform parts: the first has a width of 64 and the other, narrower half has a width of 32, hence the name Bottle. Because this design consists of two parts of equal depth, it must have an even total depth.

Input {Ei} → (64) → (64) → (64) → (32) → (32) → (32) → (6) → Output {pj}

Figure 4.9: The Bottle architecture with depth 6 and maximum multiplicity 2. The Inverted bottle is the same, but with the two halves interchanged.


Figure 4.10: Mean momentum error for the four different FCN architectures against the number of trainable parameters. Since networks with fewer parameters are often much faster to train, this shows that there is no real reason not to use the uniform design.


4.6 CONVOLUTIONAL NETWORKS

4.6.1 Rearranging the input

For a CNN to extract local information from neighboring crystals, the input data must first be reshaped in a way that accommodates the geometry of the detector. Based on the work of Karlsson et al. [4], this is done by applying a sparse binary transformation matrix that selects energies from specific crystals and arranges them into cluster arrays representing different areas of neighbouring crystals. The first element of each cluster array contains the energy of the central crystal, which is either an A or a D crystal, followed by the energies of its neighbours and second-neighbours. Note that in this context, a cluster refers to the collection containing the central crystal along with all of its neighbours and second-neighbours, not only the subset of those registering a significant energy as the term is used in section 2.4. This results in clusters of size SA = 16 for each of the NA = 12 A crystals and SD = 19 for each of the ND = 30 D crystals, which combined cover the entire Crystal Ball with some overlap. An example of a D cluster can be seen in Fig. 4.11.

Figure 4.11: Schematic representation of a D cluster. Note that the shapes and angular relationships of the crystals differ slightly from those in the real detector.
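The transformation itself is only a gather operation; a sketch of one block of such rows could look as follows (the crystal indices below are placeholders for the real detector numbering):

    import numpy as np

    def cluster_block(indices, n_crystals=162):
        # indices: centre crystal first, then neighbours, then second-neighbours
        T = np.zeros((len(indices), n_crystals))
        T[np.arange(len(indices)), indices] = 1.0
        return T  # the cluster array is then T @ E for detected energies E

    d_cluster = list(range(19))  # placeholder: the 19 crystal indices of a D cluster
    E = np.random.rand(162)      # placeholder detected energies {Ei}
    print(cluster_block(d_cluster) @ E)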

Karlsson et al. [4] chose to sort each cluster array by selecting the crystal closest to the beam exit as the first element among the neighbours and continuing in counter-clockwise order (as seen from inside the detector). The same sorting was then made separately for the second-neighbours. This type of sorting, here abbreviated BE for Beam Exit sorting, has problems due to occasional shifts among the indices of the second-neighbours, and thus does not correctly take the local neighbourhood of the geometry into account. Due to this, we introduce a new sorting called CCT sorting, for Consistent Crystal Type.

For CCT, the cluster array centered around the crystal c can be written as (c n m), where n and m are arrays of the neighbours and second-neighbours of c, respectively. The closest neighbours n are sorted counter-clockwise in the same way as for BE, with the exception of taking crystal shapes into account whenever applicable (i.e. for D crystals). In this case, the first neighbour element in the cluster array is the B crystal closest to the beam exit. To sort the second-neighbours, we select the first element m1 as the second-neighbour of c which is also a neighbour of the first two entries of n. Mathematically, we do this by defining two sets N1 and N2 as the neighbours of the first and second elements of n, respectively. The first element of the second-neighbours can then be chosen as

m1 = {N1 ∩ N2} \ {c}.

After this, m is formed by continuing counter-clockwise, completing the cluster array (c n m).
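A sketch of this selection rule, assuming a lookup table from crystal index to its set of neighbours (the table itself is given by the detector geometry and is not shown here):

    def first_second_neighbour(c, n, neighbours):
        N1, N2 = neighbours[n[0]], neighbours[n[1]]
        candidates = (N1 & N2) - {c}   # common neighbours of n1 and n2, minus c
        assert len(candidates) == 1    # the geometry leaves exactly one candidate
        return candidates.pop()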

For the pattern recognition of a CNN to be effective, one would prefer not to distinguish between the rotational symmetries of these clusters. For instance, a specific pattern produced by a photon scattered between three C crystals of a D cluster should ideally be detected by the same kernel regardless of their position relative to the central D crystal. A similar argument can be made for including mirror images of the clusters, enabling a specific kernel to operate symmetrically on opposing clusters. To account for these symmetries, the cluster vectors are extended to include each rotated and reflected state of the cluster.

Figure 4.12: Schematic representation of how the input vector to a branch of the CNN contains the rotations and reflections of a specific neighbourhood of crystals.

Each rotated state features a copy of the previous one, but with the indices of the neighbours increased by one and those of the second-neighbours increased by two (with wrapping), resulting in a counter-clockwise rotation. To handle reflections, the initial sorting of the neighbours is instead carried out clockwise, resulting in a reflection through the axis from the central crystal to the first neighbour. Incorporating these concepts in the transformation matrix generates the complete input to the CNN, as shown in Fig. 4.12.
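A sketch of how the rotated copies can be generated from a sorted cluster array (c n m); the reflected copies would start instead from the clockwise-sorted n and m:

    import numpy as np

    def rotated_states(c, n, m):
        # one rotation step shifts the neighbour indices by one and the
        # second-neighbour indices by two, with wrapping
        return [np.concatenate(([c], np.roll(n, -k), np.roll(m, -2 * k)))
                for k in range(len(n))]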

4.6.2 CNN models

Due to the different geometries of the A and D clusters, different kernel and stride lengths need to be applied for each central crystal type in order to summarize the information from the cluster. Therefore, the cluster vectors are fed to a sequence of parallel 1D convolution layers. For each cluster type, the first convolutional layer has a stride and kernel length equal to the number of entries in the cluster, i.e. SA = 16 for the A clusters and SD = 19 for the D clusters, summarizing the information of each cluster for a specific orientation. To handle the rotations and reflections, the last convolutional layers each have a stride and kernel length equal to rA = 10 for the A clusters and rD = 12 for the D clusters. Since the kernel and stride lengths of the convolutional layers are chosen with regard to the structure of the cluster vector, no padding is applied. By doing so, the outputs of these convolutional layers correspond to the NA = 12 A clusters and the ND = 30 D clusters. At this point, the A and D branches of the network can be joined to a series of fully connected hidden layers, and finally connected to an output layer.

A branch: Input {Ei} → Transf. A → Conv A (cluster) → BatchNorm → Conv 1×1 layers (CFCF only) → Conv A (orient.) → BatchNorm
D branch: Input {Ei} → Transf. D → Conv D (cluster) → BatchNorm → Conv 1×1 layers (CFCF only) → Conv D (orient.) → BatchNorm
Both branches → Concat. → FCN → Output {pj}

Figure 4.13: Overview of the CCF and CFCF models with their two branches. This picture shows a version where batch normalization is applied. The Conv 1×1 layers are bypassed in the CCF model.
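The D branch of Fig. 4.13 can be sketched in Keras as follows, using the kernel and stride lengths of section 4.6.2 (a sketch under the stated sizes, not the exact model code):

    from tensorflow.keras import layers

    def d_branch(x, f_first=32, f_last=16):
        # x has shape (batch, 30 * 12 * 19, 1): N_D clusters, r_D orientations,
        # S_D entries per orientation
        x = layers.Conv1D(f_first, kernel_size=19, strides=19,
                          activation='relu')(x)  # one output per orientation copy
        x = layers.Conv1D(f_last, kernel_size=12, strides=12,
                          activation='relu')(x)  # one output per D cluster
        return layers.Flatten()(x)               # 30 * f_last features to concatenate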


Due to this structure, the model is referred to as the CCF model (Convolutional - Convolutional - Fully connected) and is shown in Fig. 4.13. ReLU activations are applied to all convolutional and FC layers, except for the last layer, which has a linear activation.

The CCF model can be extended by introducing additional layers between the first and last convolutional layers. Specifically, a number of convolutional layers with kernel and stride lengths of 1 are added, which introduces more trainable parameters without reshaping the output. These layers are set to produce as many filters as their input layer, and are shown as "Conv 1×1" layers in Fig. 4.13. The introduction of these layers can be interpreted as "emulating" the transformations of fully connected layers for the feature maps, and this extended design is therefore referred to as the CFCF model. In theory, this design should introduce greater headroom for learning patterns in the first convolutional layer. Furthermore, batch normalization can be added to both models after each convolutional layer.

4.6.3 Optimizing hyperparameters of the CNN

While the kernel and stride lengths of the CNN are given by the model as mentioned above, there are still many design considerations regarding the hyperparameters of the network. To simplify things, the learning rate is kept fixed at 10⁻⁴, and the number of feature maps f is kept constant between the corresponding layers of the different branches.

The initial tests consisted of optimizing the hidden layers of the CCF model. The tests were done with and without BN, training on data with a maximum multiplicity of 2 until halted by early stopping. The final hidden layers used a fixed width of 80. While this is wide in comparison to the number of output neurons, it is still not as wide as the concatenated output of the two branches, given that 32 and 16 filters are used for the first and last convolutional layers, respectively. Training such models with depths of 3, 5, 7 and 10 and evaluating them on a validation set yields the results shown in Fig. 4.14. From this figure, it becomes obvious that the CCF model does not benefit from BN, although BN may be practical for higher multiplicities, where the increased depth may result in a relatively lower MME. The figure also shows that this particular configuration of the networks does not improve with more hidden layers.


Figure 4.14: Mean momentum error for CCF models with and without batch normalization.

To investigate the effect of the number of filters in the CCF model, the model is trained as before, but with its FCN depth set to 3 and BN disabled. Models with various numbers of filters in the first and last convolutional layers were compiled, again trained until halted by early stopping, and then tested on validation data. The MME for the tested filter configurations is shown in Fig. 4.15, where one can see that filter configurations with more than 16 filters per layer do not impact the performance of the network significantly. The smaller configurations with 8 filters in the last layer, however, prove to be detrimental in terms of MME. Note that the models with more filters in the first layer tend to perform worse than those with an equal number in both layers.


Figure 4.15: Mean momentum error for different filter configurations of the CCF model, ranging from [8, 8] to [64, 64]. The numbers [a, b] represent the number of filters in the first and last layer, respectively.

From this, one may draw the conclusion that 16 filters in each layer is the optimal configuration. However, with hopes of detecting greater multiplicities, the number of filters would likely have to increase; the same argument can be made for the depth of the network. To test this, four different CCF models were again trained, using data with a maximum multiplicity of 5. These models consisted of all combinations of the [16, 16] and [32, 32] filter configurations and depths of 3 and 5, with other hyperparameters as before. Here, the most expensive network to train in terms of number of parameters, i.e. [32, 32] filters and D = 5, turned out to yield an MME 0.041 MeV smaller than the cheapest, while having around twice as many trainable parameters.

To compare the CFCF model to the CCF, a number of models were trained on data with a maximum multiplicity of 2. The models use 32 convolutional filters per layer and 3 hidden layers with a width of 80. Each CFCF features 1–10 intermediate convolutional layers. The validation MME is shown in Fig. 4.16. Comparing the results to the 32-filter configuration of the CCF from before, one can see a slight improvement in MME when using 2–3 intermediate layers. However, the absence of a trend in this plot suggests that the training of these networks is limited by the learning rate.


Figure 4.16: Mean momentum error for the CFCF model when using different numbers of intermediate layers.

4.6.4 Dealing with existence

Based on the category system introduced in Table 4.1, the events reconstructed by the CNN in the aforementioned tests have very nearly consisted of 75% Qe photons and 25% Qi photons for data with a maximum multiplicity of m = 2. For m = 5, these numbers were very close to 60% Qe and 40% Qi. These numbers seem fitting since, for instance in the case of m = 2, the data contained a 1:1 ratio of single and double γ-ray events. The reason for the other categories remaining (almost) empty is that the network rarely reconstructs energies low enough to fall below the threshold ε = 0.01 MeV, let alone when never having been presented with empty events (no registered energy, all labels set to zero).


Figure 4.17: Histogram over the reconstructions of the CFCF model trained on data with m = 2, including empty events. This particular model uses [16, 16] filters, a depth of D = 3 and 3 intermediate convolutional layers.

By training with empty events, a more accurate reconstruction was hoped to be achieved, both in terms of MME and in that fewer γ-rays are missed or invented by the network (false positives). This goal can only somewhat be formalized, by finding a network such that the Qi events are of lower energy. To observe this, several CNN models were trained on data with m = 2 and an additional 5 × 10⁵ empty events. The best in terms of MME was a CFCF model with 16 filters in each layer, 2 intermediate layers and 3 hidden layers. Its reconstruction, shown in Fig. 4.17, yielded an MME of 0.296 MeV.

Figure 4.18: Histogram comparison between the Qi events of CFCF models trained with and without empty events.

Comparing this to the models trained earlier, it is clear that the MME is greatly improved by training with empty events.

Retraining the same model on data without empty events and comparing the distributions of Qi events, as seen in Fig. 4.18, shows that training with empty events also improves the network in this regard.

4.7 GRAPH NETWORKS

The motivation for using the graph G of the neighbouring detector crystals is understood through the aggregation rule (3.4) and its relation to the addback routines described in section 2.4. Using this rule at the beginning of a network integrates the detected mean energy in each neighbourhood of the input neurons (i.e. the crystal detectors) into a fully connected network, just as addback does. Without this, the network does not know how the neurons are related to each other and has to learn this during training. Since G is known, the construction of such a GNN is rather straightforward.

First, the adjacency matrix A, which encodes most of the information regarding G, is constructed. It is a symmetric 162 × 162 matrix where

    Aij = 1, if crystals i and j are neighbours,
          0, otherwise.

As mentioned in section 3.3, the matrix should include self loops as well. This is done by adding the identity matrix, i.e. Â = A + I. The diagonal node degree matrix D simply holds the row-wise sums of Â on its diagonal.

Figure 4.19: The graph G representing the neighbour relationships of every crystal detector in the Crystal Ball. Each node is mapped to the surface center of the corresponding crystal.
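A sketch of this construction, where the mean aggregation D⁻¹Â is assumed as the concrete form of the spectral rule (3.4), consistent with the neighbourhood mean energy described above (the edge list is detector geometry input):

    import numpy as np

    def propagation_matrix(edges, n=162):
        A = np.zeros((n, n))
        for i, j in edges:
            A[i, j] = A[j, i] = 1.0        # symmetric adjacency
        A_hat = A + np.eye(n)              # add self loops
        D = np.diag(A_hat.sum(axis=1))     # diagonal node degree matrix
        return np.linalg.inv(D) @ A_hat    # each row averages one neighbourhood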

A GraphDense connection is applied directly on the input layer and preserves the width N0 to the following layer. Being very similar to a Dense connection, with the same kinds of trainable parameters W and b, it has N0(N0 + 1) = 26 406 trainable parameters in the case of the Crystal Ball detector.

GNNs can be tested in multiple ways. The selected approach is to compare three such networks with the uniform FCN investigated in section 4.5. The first network, GNN1, is illustrated in figure 4.20 and consists of a single graph convolution followed by a FCN.

Input {Ei} → GraphDense → FCN → Output {pj}

Figure 4.20: The GNN1 model with a single GraphDense connected to a consecutive FCN. GNN2 is similar in design, with one additional GraphDense directly after the first one.

Input {Ei} → three parallel GraphDense branches → Concat. → FCN → Output {pj}

Figure 4.21: The GNN3 design with its three branches, which are concatenated into an N = 3 × 162 wide hidden layer. This is followed by a FCN before the output layer.

The second, called GNN2, is the same as GNN1 with an additional convolution directly after the first one. While GNN1 feeds the neighbourhood mean energy for each node, GNN2 covers the more extensive area including the second-neighbours as well.

The last one, unsurprisingly named GNN3, stands out from the other two with three branches between the input neurons and the following FCN. This covers both the smaller neighbourhood, as in GNN1, and the larger one, as in the second design. The third branch of GNN3 is meant to also include the individual energy deposits. As shown in figure 4.21, the three branches are concatenated, feeding the FCN with a 3 × 162 wide input layer.


Figure 4.22: The mean momentum error in reconstructions made with a FCN and the three types of GNNs, varying the depth of the subsequent FCNs shown in figures 4.20 and 4.21. Each network was trained on data with m = 2.

Varying the depth of the FCNs, these four network designs were trained on data of maximum multiplicity 2. The mean momentum error in the event reconstruction for each network is shown in figure 4.22.

One should note that a GNN of zero FCN depth still has its GraphDense connections, and therefore has more trainable parameters than the corresponding uniform FCN. With this in mind, there is no significant difference in MME between the four networks at higher depths. Nevertheless, the GNNs with a number of subsequent dense connections seem to perform slightly better than the FCN at any depth.

4.8 BINARY CLASSIFICATION

The ANNs examined so far are trained to model a continuous spectrum, in this case the γ-ray momenta {pi}. Nevertheless, this method has the limitation of not being able to explicitly indicate whether a photon is really there or not in the reconstruction.

Introducing a binary classification neuron for each event is nothing new. Karlsson et al. [4] tried adding different contributions to L by training with an additional neuron µi for every γ-ray. Their approach was to modify L in eq. (4.1) in a variety of ways to include these classification neurons. The loss function would, in other words, depend on four variables per hit, as L{Ei, θi, φi, µi}. During training, the correct µi ∈ {0, 1} was set to zero for non-existent γ-rays and to one otherwise. They concluded that these networks were in fact able to classify missing photons to some extent, but performed even worse at reconstructing the existing ones in comparison to the pure regression networks.

With the completely refurbished permutation loss function (4.3), a myriad of new possible ways to tackle this problem opens up. One idea is to add an extra term to L, proportional to the binary cross-entropy H(βi) introduced in section 3.1.2, where βi is a floating-point number between 0 and 1 representing the network's best guess of whether the γ-ray exists or not. This classification neuron is analogous to µi, but another name is used to avoid confusion with that method.


Figure 4.23: Reconstruction of energy E, polar angle θ and azimuthal angle φ for γ-rays, using the binary classification model. The FCN is of uniform width N = 80 and depth D = 4, and was trained on data with maximum multiplicity m = 2 comprising a total of 1.5 × 10⁶ events, including an additional 5 × 10⁵ fully empty events. The line artefacts in the two angle plots are manifestations of the 162 detector positions.

The loss function L{pi, βi} is given by

    L = ∑ᵢ₌₁ᵐ [ ‖p̂i − pi‖² + λH(βi) ].    (4.5)

The coefficient λ decides how much the binary classification may affect L, and is here simply set to one. Further investigation would be needed to find a more suitable value.
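For a single, fixed pairing of reconstructed and correct photons, this loss can be sketched as follows in TensorFlow (the permutation search of section 4.3.3 is omitted for brevity; the function name is ours):

    import tensorflow as tf

    def classification_loss(p_true, p_pred, beta_true, beta_pred,
                            lam=1.0, eps=1e-7):
        # p_*: shape (batch, m, 3); beta_*: shape (batch, m)
        mse = tf.reduce_sum(tf.square(p_pred - p_true), axis=-1)  # ||p_hat - p||^2
        ce = -(beta_true * tf.math.log(beta_pred + eps)
               + (1.0 - beta_true) * tf.math.log(1.0 - beta_pred + eps))  # H(beta)
        return tf.reduce_sum(mse + lam * ce, axis=-1)  # sum over the m photons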

A problem relating to the network design arises when adding these extra classification neurons. As before, the reconstruction of {pi} must still be treated as a regression problem, and the last activation producing these values needs to be linear. Since {βi} are probabilities in the interval (0, 1), their activation function needs to constrain the output accordingly, which is exactly what the sigmoid function does. Therefore, the output activations are separated as illustrated in Fig. 4.24. Before calculating L, every classification neuron is paired with a reconstructed p̂i. The minimum permutation loss is then acquired as described in section 4.3.3, but now with 4 variables instead of 3

Input {Ei} → FCN → [{p̂i}: Dense with linear activation; {βi}: Dense with sigmoid activation] → Concat. → Output {p̂i, βi}

Figure 4.24: The binary classification model with the two types of output activations shown. These are concatenated together before the output.

for each event, being (px, py, pz; β).

This method of using the loss function (4.5) was not examined thoroughly enough to be able to conclude whether the approach of implementing an extra term for the binary classification has any potential or not. A simple test was made using an FCN with N = 80 and D = 4, a configuration which had been shown to perform rather well with the ordinary momentum loss (4.3). The resulting reconstruction is illustrated in Fig. 4.23, where λ = 1 is used.

The left plot shows the reconstructed energies against the correct ones, and suggests that while a large number of empty events are correctly registered, i.e. end up in Qo, the overall reconstruction is substandard.

Compared to the networks examined in section 4.5.3, with mean momentum errors usually lower than 1 MeV, this classification network has an MME of about 1.7 MeV. Considering that many of these events are registered in Qo, meaning they do not contribute to this metric, this network is profoundly worse than most of those explored in this thesis. Furthermore, a fair number of events are registered in Qm and thus falsely omitted from the event reconstruction.

Perhaps another choice of the proportionality coefficient λ would result in more acceptable reconstructions; thus, further investigation is required. The use of terms other than the binary cross-entropy in (4.5) could improve this method as well. For further discussion, see section 6.4.


Chapter 5

Results

To assess how well the ANNs performed in reconstructing detector data, the networks were compared to the first-neighbour addback routine described in sec. 2.4. This was done by having them reconstruct a more realistic evaluation set that involves a Doppler shifted signal of some fixed energy and background radiation. This method of analysis and comparison was inherited from [3].

5.1 METHODS OF COMPARISON

The previous chapters outline a comprehensive investigation with the goal of determining the optimal design of an ANN for multiplicities equal to or lower than two. The three networks selected to be compared with addback are described below.

The first model is a FCN with 4 hidden layers, each with 80 neurons. This configuration of depth and width is motivated by the work in section 4.5, specifically by the results presented in Fig. 4.7. This network yielded a low MME considering that it only has 32 966 trainable parameters, the lowest number among the three networks.

The second model is a CNN based on the CFCF model detailed in section 4.6.4. This network contains a total of 125 574 trainable parameters.

The last model is the GNN with three branches and a total of three GraphDense connections, illustrated in Fig. 4.21. Its subsequent FCN includes a single hidden layer of width 80, resulting in a network with a total of 118 178 parameters.

Each network was trained on a data set containing a uniform distribution of energies in the range 0.01–10 MeV with a maximum multiplicity of m = 2, including empty events, as shown in Fig. 4.1. After this, they were evaluated on a second, more realistic data set. In this set, each event has a 50% chance of containing a γ-ray from background radiation in the range 0.01–1 MeV, and a 10% chance of containing a Doppler shifted signal γ-ray with a fixed energy E in the beam frame of reference. The speed of the latter's source is uniformly distributed between 0.6 c and 0.7 c (β = 0.6–0.7). Since the probabilities for each type of γ-ray to occur are independent, the validation set also has a maximum multiplicity of m = 2. Due to the relativistic effects discussed in section 2.3, the reconstructed energies are transformed to the beam frame of reference using eq. (2.1). The outputs of the networks are then compared with addback, as shown in the following section 5.2.
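Assuming eq. (2.1) takes the form of the standard relativistic Doppler shift for a source moving along the beam axis, the transformation can be sketched as:

    import numpy as np

    def to_beam_frame(E_lab, theta_lab, beta):
        # E_lab: lab-frame energy; theta_lab: polar angle w.r.t. the beam axis
        gamma = 1.0 / np.sqrt(1.0 - beta**2)
        return gamma * (1.0 - beta * np.cos(theta_lab)) * E_lab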

To each output, whether from addback or a neural network, the superposition of a Gaussian and an exponential curve is fitted. The Gaussian curve, labeled fs, represents the signal, whereas the exponential curve, fb, represents the background radiation. Serving as a metric of how well the signal can be distinguished from the background, the signal-to-background ratio R is calculated as

    R ≡ ∫ fs(E) dE / ∫ fb(E) dE,    (5.1)

where the integrals are evaluated on the interval µ ± σ, with µ and σ being the mean value and standard deviation of the Gaussian fit, respectively. The R-values obtained for each network can be seen in Fig. 5.4.
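A sketch of the fitting and integration steps behind eq. (5.1), using SciPy (the histogram data and the initial guesses p0 are placeholders, not the values used in the analysis):

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.integrate import quad

    def fs(E, a, mu, sig):   # Gaussian signal
        return a * np.exp(-0.5 * ((E - mu) / sig) ** 2)

    def fb(E, b, k):         # exponential background
        return b * np.exp(-k * E)

    def signal_to_background(E, counts, p0):
        (a, mu, sig, b, k), _ = curve_fit(
            lambda E, a, mu, sig, b, k: fs(E, a, mu, sig) + fb(E, b, k),
            E, counts, p0=p0)
        I_s, _ = quad(fs, mu - sig, mu + sig, args=(a, mu, sig))
        I_b, _ = quad(fb, mu - sig, mu + sig, args=(b, k))
        return I_s / I_b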

Finally, the absolute difference between the reconstructed and the correct energy is calculated for each network. Here, the reconstructed energy Ê is the mean value µ of the signal fit fs. This is done for correct signal energies E in the range 1.5–8.0 MeV, and the results are presented in Fig. 5.5.

5.2 COMPARISON WITH ADDBACK

To verify the effects of a momentum-based Cartesian loss function, we consider the problem of reconstructing Doppler shifted γ-rays with an energy of 3.5 MeV, as investigated previously by Karlsson et al. [4]. The preceding study utilized a modulus-based loss function which, despite yielding accurate reconstructions for high energies, gave rise to artefacts in the reconstruction of lower energies. The equivalent task using the momentum loss can be seen in figure 5.1, where the blue line represents our reconstruction using a FCN, and the red line represents the reconstruction made by addback. This image clearly portrays the eradication of the previously mentioned artefacts, which took the shape of an additional peak in the region of lower energies.

Figure 5.1: Beam frame energy spectrum reconstructed by addback, shown in red, and by the FCN, shown in blue. The Doppler shifted γ-ray signal has an energy of 3.5 MeV.

Eliminating these artefacts allows for the reconstruction of low energy events. The FCN reconstruction of such an event, at 1.5 MeV, is presented in Fig. 5.2. Here, the fitted signal and background curves can be seen as the dashed and dotted lines, respectively. Moreover, the reconstructed energies for each signal energy are presented in table 5.1, which further demonstrates the networks' ability to reconstruct low-energy γ-rays. Figure 5.3 shows a zoomed-in version of the 3.5 MeV case from figure 5.1, to give a visual comparison with the equivalent image of the preceding study.

Table 5.1: Reconstructed signal energies of addback and the three networks. All energies are in MeV.

    Correct   Addback   FCN    CNN    GNN
    1.5       1.48      1.50   1.54   1.53
    2.0       1.94      2.00   2.02   2.01
    2.5       2.49      2.57   2.59   2.58
    3.5       3.46      3.57   3.59   3.56
    5.0       4.90      5.09   5.22   5.13
    6.0       5.88      5.74   5.82   5.79
    7.5       7.33      7.34   7.47   6.92
    8.0       7.82      7.79   7.86   7.38


Figure 5.2: The energy spectrum in the beam frame of reference, reconstructed by addback (red) and the FCN (blue). A Gaussian curve fs and an exponential fb are fitted to the Doppler shifted γ-ray signal of energy 1.5 MeV and to the background radiation, respectively.

Figure 5.3: The energy spectrum in the beam frame of reference, reconstructed by addback (red) and the FCN (blue). The Doppler shifted γ-ray signal has an energy of 3.5 MeV.


The signal-to-background ratio, as seen in Fig. 5.4, implies similar reconstruction performance between addback and the trained neural networks, indicating that the networks have little difficulty reconstructing any of the investigated energies. The best R-value for the ANNs is reached between 3–6 MeV, surpassing that of addback, before decreasing at higher energies.

The means and standard deviations of the reconstruction errors for the ANNs and addback are illustrated in Fig. 5.5. Each error is calculated from the evaluations at 1.5 to 8.0 MeV, using the data presented in table 5.1. The FCN and CNN evaluations resulted in means and standard deviations showing similar, yet slightly worse, performance than addback. The GNN performed significantly worse in this metric.

The best performance among the ANNs is shown by the CNN, which resulted in a mean error of 1.01 × 10⁻¹ MeV, compared to the addback mean error of 8.75 × 10⁻² MeV, both with a standard deviation of about 6.2 × 10⁻² MeV. However, in terms of the relative difference of the energies, it is clear that the FCN produced the most accurate predictions across the energy range. Though resulting in higher mean errors than previous studies, each network performed exceptionally well in terms of consistency over a large range of energies.

Figure 5.4: Signal-to-background ratio R for the evaluation of energies ranging from 1.0–8.0 MeV in the beam frame. The marker style of each method is given in the legend.


Figure 5.5: Illustration of the absolute difference between the reconstructed energies Ê and the correct energies E listed in table 5.1. The height of each box represents the mean error, and the lines represent the standard deviations.


Chapter 6

Discussion

With the aim of this thesis being to examine methods using ANNs for the task of event reconstruction of γ-rays, progress in further developing such methods has been made on the foundation of the previous work [3, 4]. This chapter discusses these improvements, other less significant results, and suggestions for further investigations.

6.1 CONCLUSION

In contrast to the results of [4], the reconstructions of the energy spectrum presented in section 5.2 do not introduce any artefacts, such as peaks in the background radiation region. In fact, the reconstructions are sometimes even hard to distinguish from those of addback, as is the case in figure 5.1. However, the closer the energy of the signal γ-rays is to the upper or lower limit of the trained energy distribution, the less accurate the reconstruction. This is not surprising, or even necessarily bad: if one wishes to perform measurements in other energy ranges, the simple solution would be to train on energies accordingly.

As shown in Fig. 5.5, the FCN seems to produce an overall more accurate reconstruction than the other two networks. Considering that it has about 27% fewer trainable parameters, the FCN using the permutation loss function is the most appealing method. The CNN and GNN do not improve upon this simpler design, even though they were more carefully planned out and perhaps even overengineered.

A recurring pattern when optimizing the hyperparameters of the networks trained in this thesis has been that the deeper networks performed worse than those with fewer hidden layers. With our approach to testing, this may mean that the final designs end up being approximations of addback, simply weighing the energies of different crystals together. This could be bad, since the main incentive for using neural networks is the chance of them finding more complex relationships within the data, not otherwise easily modeled by means of conventional algorithms. To do this, while still facilitating the option of learning simpler patterns, one could implement a model based on the ResNet or ResNeXt architectures [23, 24]. Such models feature a data flow which can bypass several layers, which would in theory enable deeper network structures while simultaneously having shorter forward-propagation connections for learning simpler relationships. For the same reasons, alternative methods utilizing parallel layers with, or combined through, max pooling could also see success.
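As an illustration of the suggested bypass (not a design evaluated in this thesis), a residual block of Dense connections could be sketched in Keras as:

    from tensorflow.keras import layers

    def residual_dense_block(x, width=80):
        # x must already have the given width for the addition to be valid
        y = layers.Dense(width, activation='relu')(x)
        y = layers.Dense(width)(y)                  # linear before the addition
        return layers.ReLU()(layers.Add()([x, y]))  # skip connection bypasses the block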


Perhaps the most decisive feat of this thesis is the momentum loss function (4.3), the squared error in the γ-ray momentum vector. Not only is it based on the standard choice of loss for this type of regression problem, as mentioned in section 3.1.2, it also eliminates the need for relative weighting factors such as those in eqs. (4.1) and (4.2).

Using batch normalization does not seem to do much for the networks, and sometimes even makes them perform worse. This is not so problematic in this case, considering that the networks investigated are not very deep, making the vanishing gradient problem, as discussed in sec. 3.4, barely noticeable. However, batch normalization may still see use in the context of deeper networks or when using a classification node.

6.2 CONVOLUTIONAL NETWORKS

As briefly mentioned in sec. 4.6.3, some results indicate that the networks may have underperformed due to their training. Even though the networks were trained on extensive data sets, using early stopping to ensure a balance of validation and training data loss, the use of a fixed learning rate can lead to some parameters "overshooting" their local optima. One way to address this is to reinitialize the training with a lower learning rate once halted by early stopping. This would likely also increase the performance of the FCN and GNN.

Furthermore, the optimization of the hyperparameters of the CNN (see sec. 4.6.3) was mostly done using simulated data containing only one or two γ-rays, i.e. no empty events. While the process itself most certainly led to a more refined model, it is possible that some choices of hyperparameters would have been different if the networks had been trained on data more similar to that of the final evaluation.

6.3 GRAPH NETWORKS

While the GNN did not improve upon the reconstruction of the FCN, there might be potential in this method when scaling up the number of inputs. It would be interesting to see if encoding the graph into the network a priori could reduce the number of trainable parameters needed when dealing with much larger detectors, where more prevalent Compton scattering also leads to more crystal detectors registering a single initial γ-ray. Perhaps the Crystal Ball has too low a solid angle resolution for GNNs to be an appropriate method for this task. Further investigations of other aggregation rules, as discussed in section 3.3, would then be more interesting to make. In [11], a variety of alternatives to the spectral rule (3.4) are mentioned, where some even make use of convolutions.

6.4 CLASSIFICATION NETWORKS

The use of additional classification neurons has been speculated in [3, 4] to give the network the ability to better determine the existence of an event. While more events are correctly registered as empty, the accuracy in the reconstruction of existing γ-rays has so far only been shown to decrease. This result was documented in [4] and was yet again observed in Fig. 4.23.

Further investigation should be focused on the design of the loss function, since this seems to have the largest impact on the performance of a neural network. A first suggestion is to optimize the proportionality coefficient λ in eq. (4.5). Using another logistic loss function instead of the binary cross-entropy could perhaps be a good idea as well. An example is the Hinge loss, defined as

    L = max(0, 1 − ββ̂),

where β ∈ {−1, 1} is the correct binary label and β̂ the predicted value in an appropriate domain. For instance, when ββ̂ < 1, the classification is either wrong (ββ̂ < 0) or correct but with small confidence (0 ≤ ββ̂ < 1), meaning that the Hinge loss also penalizes predictions made with low confidence.

Furthermore, one could remove the classification term from the loss function and approach the problem in another way. One suggestion is the slightly modified

    L = ∑ᵢᵐ [ β̂i ‖p̂i − pi‖² + (1 − β̂i) ‖p̂i‖² ],

with β̂i being sigmoid activated. In this way, no ambiguous factor λ is needed.
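A sketch of this alternative in TensorFlow (beta_pred is the sigmoid-activated existence output paired with each reconstructed momentum; the function name is ours):

    import tensorflow as tf

    def existence_weighted_loss(p_true, p_pred, beta_pred):
        # p_*: shape (batch, m, 3); beta_pred: shape (batch, m)
        on = tf.reduce_sum(tf.square(p_pred - p_true), axis=-1)   # ||p_hat - p||^2
        off = tf.reduce_sum(tf.square(p_pred), axis=-1)           # ||p_hat||^2
        return tf.reduce_sum(beta_pred * on + (1.0 - beta_pred) * off, axis=-1)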

6.5 METHODS OF COMPARISON

An important aspect of all scientific studies is to find accurate methods for evaluating the final result. The primary methods used in this study were the signal-to-background ratio R and the absolute difference between the correct and the reconstructed energies. Both methods were adapted from previous theses, with little to no change, in order to give a fair comparison to previous studies. However, after some investigation, we found that the evaluation of the R-value is greatly affected by the choice of the parameters used as starting approximations for the curve fits. Simply changing the initial angle of the exponential fit by a factor of 0.5 changed the final R-value by a factor of 10⁴, implying that the fit is highly unstable.


Bibliography

[1] R. S. Simon, "The Darmstadt-Heidelberg Crystal Ball," Journal de Physique Colloques, vol. 41, 1980.

[2] S. Lindberg, "Optimised use of detector systems for relativistic radioactive beams," Master's thesis, Inst. för fysik, CTH, Göteborg, 2013.

[3] J. Olander, M. Skarin, P. Svensson, and J. Wadman, "Rekonstruktion från detektordata med hjälp av neurala nätverk: En studie i tvärsnittet mellan kärnfysik och maskininlärning," Tech. Rep., 2018.

[4] R. Karlsson, J. Jönsson, M. Lidén, and R. Martin, "Event reconstruction of γ-rays using neural networks: Going deeper with machine learning into the analysis of detector data," Tech. Rep., 2019.

[5] A. Knyazev, J. Park, P. Golubev, and J. Cederkäll, "Properties of the CsI(Tl) detector elements of the CALIFA detector," Nuclear Instruments and Methods in Physics Research, vol. 940, pp. 393–404, October 2019.

[6] W. R. Leo, Techniques for nuclear and particle physics experiments: a how-to approach. Springer, 1987.

[7] W. Rindler, Relativity: Special, General and Cosmological, 2nd ed. Oxford University Press, 2006.

[8] V. Panin, "Fully exclusive measurements of quasi-free single-nucleon knockout reactions in inverse kinematics," Ph.D. dissertation, Technische Universität Darmstadt, 2012. [Online]. Available: https://tuprints.ulb.tu-darmstadt.de/id/eprint/3170

[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org

[10] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," ICLR conference paper, 2015.

[11] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The graph neural network model," IEEE TNN, vol. 20, 2009.

[12] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," Presented at the ICLR, Tech. Rep., 2017.

[13] T. S. Jepsen. (2019) How to do deep learning on graphs with graph convolutional networks. [Online]. Available: https://towardsdatascience.com/how-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-62acf5b143d0

[14] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, Efficient BackProp. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 9–48. [Online]. Available: https://doi.org/10.1007/978-3-642-35289-8_3

[15] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," Presented at the International Conference on Machine Learning, Tech. Rep., 2015.

[16] ONEIROS. (2020) Keras: The Python deep learning library. [Online]. Available: https://keras.io/

[17] G. B. Team. (2020) TensorFlow: Essential documentation. [Online]. Available: https://www.tensorflow.org/guide/

[18] HPC2N. (2016) Kebnekaise. [Online]. Available: https://www.hpc2n.umu.se/resources/hardware/kebnekaise

[19] Geant4 Collaboration CERN. (2020) Geant4. [Online]. Available: http://www.geant4.org/geant4/

[20] H. T. Johansson, “Ggland – command line simulations,” GSI, vol. 2017-1, 2013.

[21] ROOT Data Analysis Framework – Users Guide, May 2018. [Online]. Available: https://root.cern.ch/root/htmldoc/guides/users-guide/ROOTUsersGuideA4.pdf

[22] The SciPy Community. (2019) Clough-Tocher interpolation. [Online]. Available: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.CloughTocher2DInterpolator.html#scipy.interpolate.CloughTocher2DInterpolator

[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[24] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," IEEE Conference on Computer Vision and Pattern Recognition, 2017.
