
Efficient algorithms of multidimensional

γ-ray spectra compression

V. Matoušek and M. Morháč

Institute of Physics, Slovak Academy of Sciences, Bratislava, Slovakia

Vladislav.Matousek@savba.sk Miroslav.Morhac@savba.sk

ACAT 2005, Zeuthen May 22 - 27, 2005

Measurements in nuclear physics experiments are oriented towards gathering large amounts of multidimensional data.

The data are collected in the form of events.

In a typical experiment with spectrometers (Gammasphere, Euroball), each coincidence event consists of a set of n integers (e1, e2, …, en), which are proportional to the energies of the coincident γ-rays.

Such a coincidence specifies a point in an n-dimensional hypercube.

The volume of stored multidimensional data very frequently goes beyond the capacity of the available storage media.

Multiparameter nuclear data taken from experiments are typically stored:

• directly event by event, with indexed coincidences - list-mode storage;

• analyzed and stored as multidimensional histograms (hypercubes) - nuclear spectra.

The list-mode storage has several disadvantages:

• enormous amount of information that has to be written onto storage media (primarily tapes),

• long time needed to process the data.

Multidimensional histograms - nuclear spectra:

Advantages:

• Interactive handling of the data is possible.

• Slices of lower dimensionality can easily be created.

Disadvantages:

• The multidimensional amplitude analysis must be done.

• Storage requirements for multidimensional hypercubes are enormous - e.g. a 3-D γ-γ-γ coincidence nuclear spectrum with a resolution of 14 bits (16 384 channels) per axis and 2 Bytes per channel requires 8 TB of memory.

• Data often need to be stored in RAM for interactive handling.

The multidimensional nuclear spectra therefore need to be compressed to the size of the available memory.
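The 8 TB figure quoted above can be verified in a few lines (a quick check, not part of the original talk):

```python
# A 3-D coincidence histogram: 14-bit ADCs give 16 384 channels per
# axis, each channel stored in 2 Bytes.
channels = 2 ** 14                         # 16 384 channels per axis
bytes_per_channel = 2
total = channels ** 3 * bytes_per_channel  # full hypercube in Bytes
print(total / 2 ** 40)                     # -> 8.0 (TB)
```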

Suitable data compression techniques must satisfy these requirements:

• Less storage space after compression of the multidimensional nuclear spectra

• Preservation of as much information as possible - minimum data distortion

• Fast enough to be suitable for on-line compression during the experiment

Constraints:

• The size of the original multidimensional spectrum goes beyond the capacity of available memory.

• Data from nuclear experiments are received as a train of events - they need to be analyzed and compressed separately, event by event.

Thus, the multidimensional amplitude analysis must be performed together with compression, event by event in on-line acquisition mode.

Suitable methods widely used:

• Binning - neighboring channels are summed together - loss of information.

• Employing natural properties of data - e.g. symmetry removal from the multidimensional γ-ray spectra from Gammasphere - no loss of information.

• Use of fast orthogonal transformation algorithms.

• Storing the descriptors of events with counts of occurrences.

For instance in multidimensional γ-ray spectra from Gammasphere one can utilize the property of symmetry of the data. It holds:

• for 2-dimensional spectra: E(e1, e2) = E(e2, e1)

• for 3-dimensional spectra:

E(e1, e2, e3) = E(e1, e3, e2) = E(e2, e1, e3) = E(e2, e3, e1) = E(e3, e1, e2) = E(e3, e2, e1)

Symmetry removal:

Principle of storage of 2-dimensional symmetrical data:

The size of reduced space can be simply expressed:

Two-dimensional symmetrical spectra with resolution R = 4.

N = Σ_{i=1}^{R} i = (R² + R) / 2
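The principle can be sketched in a few lines of Python (the row-wise layout of the triangle is an assumption for illustration; only the ordered pair x ≤ y is stored):

```python
def tri_index(x, y):
    """Location of channel (x, y) in the reduced triangular array.
    Symmetry E(x, y) = E(y, x): only ordered pairs x <= y are stored."""
    if x > y:
        x, y = y, x                 # symmetry removal: order the coordinates
    return y * (y + 1) // 2 + x     # row y starts after a triangle of size y

R = 4                               # resolution per axis
size = R * (R + 1) // 2             # (R^2 + R) / 2 = 10 channels
assert tri_index(1, 3) == tri_index(3, 1)            # symmetric pair, one cell
assert sorted(tri_index(x, y) for y in range(R)
              for x in range(y + 1)) == list(range(size))   # bijection
```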

By composition of triangles of the sizes R, R-1, ..., 2, 1 we get the geometrical shape called tetrahedron.

An example of storage of 3-dimensional symmetrical data in the form of a tetrahedron. The size (volume) of the reduced space of the tetrahedron is

N = Σ_{i=1}^{R} (i² + i) / 2 = (R³ + 3R² + 2R) / 6

In the case of 4-dimensional data, by composing tetrahedrons we obtain a hyperhedron of the 4-th order (here for R = 4).

The volume of the hyperhedron of 4-th order can be expressed as

N = Σ_{i=1}^{R} (i³ + 3i² + 2i) / 6 = (R⁴ + 6R³ + 11R² + 6R) / 24
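These three formulas can be cross-checked numerically; the sketch below builds each reduced space by summing the sizes of the lower-dimensional shapes, exactly as in the composition of triangles and tetrahedrons described above:

```python
def reduced_size(R, d):
    """Channels with x1 <= x2 <= ... <= xd, each coordinate in 1..R."""
    if d == 1:
        return R
    # compose the d-dimensional shape from shapes of dimension d - 1
    return sum(reduced_size(i, d - 1) for i in range(1, R + 1))

R = 16
assert reduced_size(R, 2) == (R**2 + R) // 2                        # triangle
assert reduced_size(R, 3) == (R**3 + 3*R**2 + 2*R) // 6             # tetrahedron
assert reduced_size(R, 4) == (R**4 + 6*R**3 + 11*R**2 + 6*R) // 24  # hyperhedron
```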

The achievable compression ratios and storage requirements for typical spectra (14 bit ADCs and 2 Bytes per channel):

Dimensionality of spectra | Compression ratio CR | Storage requirements [MB]
2 | 2 | 256
3 | 6 | ≈ 1.25 ∙ 10⁶
n | n! | 2 ∙ 16384ⁿ / n! (in Bytes)

Radware package - the author combines utilization of the symmetry property with binning. Three-fold coincidences are stored in the form of cubes with the sizes 8 x 8 x 8. Inside each cube the data are binned so that they span entirely the resolution of 8192 channels in each dimension.

Compression methods using orthogonal transformations:

The multidimensional array, the hypercube, is transformed into a new data array in the transform domain, where the maximum amount of information is concentrated into a smaller number of elements.

The basic premise is that the transformation of a signal has an energy distribution more amenable to retaining the shape of the data than the spatial domain representation.

Because of the inherent element-to-element correlation, the energy of the signal in the transform domain tends to be clustered in a relatively small number of transform coefficients.

The advantages of using fast orthogonal transforms:

• Existing fast algorithms allow their on-line implementation.

• Linearity of the transforms. The signal that is being compressed need not be stored statically in the memory. Each event can be transformed separately in time. The predetermined transform coefficients are summed (analysis with on-line compression).

Fixed kernel orthogonal transforms usually employed in data compression:

• Discrete Cosine, Walsh-Hadamard, Fourier, Hartley and other transforms.

• Haar transform - the first and simplest scaling function of the mother wavelet suitable for generating an orthonormal wavelet basis.

The use of classical orthogonal transforms is very efficient provided that the form of compressed data resembles the form of the transform base functions.

The efficiency of the compression strongly depends on the nature of the experimental data.

Fourier transform and DCT are well suited to compress cosine and sine data shapes, whereas the Walsh-Hadamard transform is suitable to compress rectangular shapes in the input data.

There arose an idea to modify the shape of the base functions of the orthogonal transform so that the maximum possible compression of the multidimensional spectra can be achieved:

• We have proposed a fast orthogonal transform with the transform kernel adaptable to the reference vectors representing the processed data.

• The structure of the signal flow graph is of the Cooley-Tukey type.

The principle of the method consists in direct modification of the multiplicative coefficients a, b, c, d of the signal flow graph in such a way that the base functions approximate the shape of the reference vector.

Let us illustrate the method for the case of size of the transform N = 4.

Signal flow graph of the fast adaptive orthogonal transform.

The coefficients of the basic element of the signal flow graph are calculated as

a = x0 / √(x0² + x1²), b = x1 / √(x0² + x1²), c = b, d = -a,

where x0, x1 are values of the reference vector.

The values y0, y1 at the output are:

y0 = √(x0² + x1²), y1 = 0.

Basic element of the signal flow graph.

The transform coefficients are calculated in such a way that the reference vector at the input is transformed into a single point at the output.
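A minimal numerical sketch of the basic element (the sign convention c = b, d = -a is one choice that keeps the element orthogonal; the talk's flow graph may use the opposite sign):

```python
import math

def basic_element(x0, x1):
    """Adapt the 2-point element to the reference values (x0, x1)."""
    n = math.hypot(x0, x1)               # sqrt(x0^2 + x1^2)
    a, b = x0 / n, x1 / n
    c, d = b, -a                         # orthogonal complement row
    return a, b, c, d

x0, x1 = 3.0, 4.0                        # reference vector values
a, b, c, d = basic_element(x0, x1)
y0 = a * x0 + b * x1                     # = sqrt(x0^2 + x1^2) = 5.0
y1 = c * x0 + d * x1                     # = 0: the reference vector is
assert abs(y0 - 5.0) < 1e-12             #   collected into a single point
assert abs(y1) < 1e-12
```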

We have proposed the fast algorithm of on-line multidimensional amplitude analysis with compression using adaptive Walsh transform:

• it removes the necessity to store the whole spectrum before compression - compression is performed event by event,

• it is optimized so that only a minimum number of operations is needed.

The above mentioned principle of adaptability can be applied also for other transform structures.

The compression is achieved by discarding pre-selected elements in the transformed multidimensional array.

Two basic methods for element selection:

• zonal sampling

• threshold sampling.

Block data compression using orthogonal transforms with symmetry removal:

In the case of a 3-dimensional space, it is divided into cubes. Each cube of the size S × S × S will be compressed to a cube of the size C × C × C.

We assume:

• The sizes of cubes are equal in all dimensions.

• The number of cubes in each dimension and their sizes S, C are powers of 2.

The number of cubes in the tetrahedron is

N = (M³ + 3M² + 2M) / 6, where M = R / S,

R is the number of channels (e.g. the resolution of the ADC) and S is the size of the cube before the compression.

For each cube we have to define an adaptive transform and consequently we need to store its coefficients.

The number of transform coefficients for one dimension is

T = 2 (S - C).

The elements are stored in the float format (4 Bytes). The transform coefficients must be stored for each dimension, thus to store 3-dimensional compressed data we need

Bytes of storage media.

Then in general for D-dimensional data the size of needed memory is

We have to adhere to the following rules:

• the size of the cube of original data, S, should be as small as possible,

• the size of the cube after compression, C, should be the biggest possible (C ≤ S), i.e., we desire the smallest possible compression,

• the data volume for the chosen combination C, S must fit the size of the memory available.

N₃ = N ∙ [C³ + 3 ∙ 2 (S - C)] ∙ 4 Bytes,

N_D = N ∙ [C^D + D ∙ 2 (S - C)] ∙ 4 Bytes.
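The storage formula can be checked against the quoted 3-fold case (S = 256, C = 8, R = 16 384); here N is the 3-D tetrahedron cube count, so this sketch covers D = 3 only (for D = 4, 5 the corresponding hyperhedron counts apply):

```python
def storage_bytes_3d(R, S, C):
    """N * [C^3 + 3 * 2(S - C)] * 4 Bytes for 3-fold spectra."""
    M = R // S                                # cubes per axis
    N = (M**3 + 3 * M**2 + 2 * M) // 6        # cubes in the tetrahedron
    return N * (C**3 + 3 * 2 * (S - C)) * 4   # float elements, 4 Bytes each

b = storage_bytes_3d(R=16_384, S=256, C=8)
print(b / 10**6)   # -> 366.08, i.e. the ~366 MB of the 3-fold case
```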

The following sizes of cubes were chosen for block transform compression of multidimensional γ-ray spectra of 16 384 channels per axis and 4 Bytes for each channel.

Dim. of spectra | S [Channels] | C [Channels] | Storage [MB] | Compression ratio CR
3 | 256 | 8 | 366 | 8010
4 | 1024 | 8 | 189.5 | 63.3 ∙ 10⁶
5 | 2048 | 8 | 168.4 | 2.33 ∙ 10¹¹

We have compressed histograms for 3-, 4-, 5-fold γ-ray coincidences of the event data from Gammasphere.

Examples achieved by employing compression on 3-fold γ-ray spectra with symmetry removal:

Slice from original data (thin line) and decompressed slice (thick line) from data compressed by employing the binning operation (Radware).

Slice from original data (thin line) and decompressed slice (thick line) from data compressed by employing adaptive Walsh transform.

Two-dimensional slice from original data.

Two-dimensional slice from data compressed by employing binning operation (Radware).

Two-dimensional decompressed slice from data compressed via adaptive Walsh transform.

Three-dimensional original spectrum (sizes of spheres are proportional to counts the channels contain).

Three-dimensional spectrum decompressed from data compressed via adaptive Walsh transform. Due to the smoothing effect of the adaptive transform some information is lost.

Similar experiments were done with 4-fold coincidence γ-ray spectra.

One-dimensional slice from original 4-dimensional spectrum (thin line) and the same slice decompressed from data compressed via adaptive Walsh transform (thick line). Due to enormous compression ratio the distortion of data in some regions is considerable. On the other hand in some regions the fidelity of the method is satisfactory.

Two-dimensional slice from original 4-dimensional spectrum.

Two-dimensional slice decompressed from 4-dimensional data compressed via adaptive Walsh transform.

Compression of multidimensional γ-ray coincidence spectra using list of descriptors.

The input data describing an external event can be expressed using a descriptor. Each descriptor fully describes the event.

This method is based on maintaining the list of descriptors.

The number of different descriptors which actually occurred during an experiment is much smaller than the number of all possible descriptors.

So, the multidimensional space has empty regions.

Conventional analyzer - the descriptor defines the location in the memory at which the counts (number of occurrences of the descriptor) are stored. The range of descriptors is defined by the size of the memory.

An alternative technique - Store only those descriptors that actually occurred in the experiment.

The correspondence between the location and the descriptor is lost; it is necessary to store the descriptor as well as the associated counts.

When a new event comes, it must be sorted into its channel in a list by using its descriptor:

• The problem is to devise a procedure for assigning the descriptor location number so that the time needed to store or read out a descriptor is minimized.

There exist several retrieval algorithms:

Sequential method: An obvious routine for searching the list in the memory is to compare the descriptor of a new event with the descriptor in each location, starting at the first one. When a match is found, the associated count is increased by one. Such an algorithm is time consuming and cannot be accepted for on-line applications.

Sequential retrieval of events

Tree method: A considerable reduction of access time can be achieved by using a tree search algorithm. The descriptor of a new event is compared repeatedly with descriptors arranged in a tree. The main disadvantage of this technique is its complexity and amount of redundant information given by address pointers.

Tree search algorithm of event retrieval.

Partitioning and indexing method - a combination of the two previous methods, implemented e.g. in the database Blue for high-fold γ-ray coincidence data*.

The hypercube is partitioned into high and low density regions. Each node of the tree represents a subvolume of the n-dimensional hypercube. The left and right child nodes represent the bisected volume of the parent. Associated with each leaf node is a sublist of descriptors falling into the appropriate geometric volume. They are arranged according to the sequential retrieval algorithm.

[*] Cromaz M. et al.: Blue: a database for high-fold γ-ray coincidence data, NIM A 462 (2001) 519.

Pseudo-random transformation of addresses of locations of descriptors. Requirements:

• Uniform (or quasi-uniform) distribution of descriptors over memory addresses for any shape of multidimensional spectra.

• Clusters of descriptors in the physical field (hypercube) must be spread over the whole range of possible addresses, and adjacent descriptors must go to addresses far away from each other.

• The transformation must be fast, so that it can be applied on-line in high-rate experiments.

There is an unlimited number of methods of generation of pseudorandom numbers:

• residues of modulo operations, Hamming's code technique, transformation through the division of polynomials, etc.

One of the methods satisfying the above stated criteria and giving a pseudorandom distribution is based on the assignment of an inverse number (in the sense of modulo arithmetic) to each address a in the original space:

b = a⁻¹ (mod M),

where M is prime.

This operation can be carried out through the use of a look-up table of pre-computed inverse numbers.

Through the transformation each descriptor uniquely derives its storage address.

There is a possibility of more descriptors being transformed to the same address. To overcome this serious limitation, the transformation is used only to generate an address at which to start searching in the bucket of descriptors.
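A sketch of this address transformation for the prime module M = 601 used later in the talk (Python's pow(a, -1, M) computes the modular inverse; on-line, the pre-computed look-up table would replace it):

```python
M = 601                                   # prime module

# Look-up table of inverse numbers: b = a^(-1) (mod M) for a in 1..M-1.
inverse = {a: pow(a, -1, M) for a in range(1, M)}

assert all(a * b % M == 1 for a, b in inverse.items())
# adjacent descriptors are scattered far apart in the transform domain:
print([inverse[a] for a in (1, 2, 3, 4)])   # -> [1, 301, 401, 451]
```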

A list of successive locations, where d is the depth of searching, is checked:

• If the descriptor in a location coincides with the read-out descriptor, the counts in this location are incremented.

• If no descriptor coincides with the read-out descriptor and there is an empty location within the search depth, the descriptor is written to this location and its count is set to 1.

• If there is no empty location within the search depth and no descriptor coincides with the read-out descriptor, additional processing is done.
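The three rules above can be sketched as a small open-addressing routine (the linear probing order and the table layout are assumptions for illustration):

```python
def store_event(table, descriptor, address, depth):
    """Sort an event into the list of descriptors, checking at most
    `depth` successive locations starting at the transformed address."""
    for step in range(depth):
        loc = (address + step) % len(table)
        if table[loc] is None:              # empty location within depth:
            table[loc] = [descriptor, 1]    # write descriptor, count = 1
            return True
        if table[loc][0] == descriptor:     # descriptor coincides:
            table[loc][1] += 1              # increment its counts
            return True
    return False                            # full bucket: additional processing

table = [None] * 8
store_event(table, (10, 20, 30), address=3, depth=4)
store_event(table, (10, 20, 30), address=3, depth=4)
assert table[3] == [(10, 20, 30), 2]
```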

During the experiment, the events with higher counts (statistics) occur earlier and therefore there is a higher probability that free positions will be occupied by statistically relevant events.

One can utilize additional information and store the events with the highest weights, i.e., the highest probability of occurrence.

Provided that all locations for the depth d are occupied and the descriptor did not occur in this region, we scan the region once more and find the event with the smallest probability of occurrence

p_j = min_{i ≤ d} p_i.

Then we compare the probability of the processed event p_k with p_j. If p_k > p_j we replace the descriptor in the position j with the descriptor of the processed event and we set the counts of the event to 1. Otherwise, the processed event is ignored.

How to determine the probabilities of the occurrences of events?

Several approaches are possible in practice.

One of them is to utilize marginal (projection) spectra for each dimension. Then for an n-dimensional event with the event values a1, a2, …, an this probability can be defined as

p_e = Π_{i=1}^{n} s_i(a_i),

where s_i is the marginal spectrum for dimension i.
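This marginal-spectrum definition can be sketched directly (the marginal spectra and channel values below are made-up numbers for illustration):

```python
from math import prod

def event_probability(marginals, event):
    """p_e = product over dimensions i of s_i(a_i)."""
    return prod(s[a] for s, a in zip(marginals, event))

s1 = [0, 5, 2, 1]          # hypothetical marginal spectrum, dimension 1
s2 = [1, 0, 4, 3]          # hypothetical marginal spectrum, dimension 2
assert event_probability([s1, s2], (1, 2)) == 20   # 5 * 4
```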

However, many other definitions and approaches are possible.

Example of 3-fold coincidence γ-ray spectra storing:

The descriptor of each event contains the addresses x, y, z and counts (short integers), i.e., each event takes 8 Bytes.

We utilize again the property of symmetry of the multidimensional γ-ray spectra. Then the chosen prime module M has to satisfy the condition

Memory ≥ 8 ∙ (M³ + 3M² + 2M) / 6 Bytes.

For the 384 MB memory we have chosen the prime module M = 601.

Assignment between numbers from ring <1,600> and their modulo inverse numbers.

Spectrum of distances between two adjacent modulo inverse numbers.

One can observe great scattering in these distances. This enables a quasi-uniform distribution of descriptors in the transformed area.

We utilize the property of symmetry in γ-ray coincidence spectra.

The algorithm of calculation of the address of an event in the transformed area:

1. arrange the coordinates so that x ≤ y ≤ z

2. calculate x_m = x (mod 601), y_m = y (mod 601), z_m = z (mod 601)

3. calculate x_i = x_m⁻¹ (mod 601), y_i = y_m⁻¹ (mod 601), z_i = z_m⁻¹ (mod 601)

4. calculate the address in the transformed area

adr = (x_i³ + 3 x_i² + 2 x_i) / 6 + (y_i² + y_i) / 2 + z_i

This defines the beginning position of the searching for a given descriptor.
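The four steps can be sketched as follows (channels equal to 0 mod 601 have no inverse; mapping them to 0 is an assumption made here to keep the sketch total):

```python
def transformed_address(x, y, z, M=601):
    x, y, z = sorted((x, y, z))                 # 1. arrange x <= y <= z
    xm, ym, zm = x % M, y % M, z % M            # 2. reduce modulo the prime
    inv = lambda v: pow(v, -1, M) if v else 0   # 3. modulo inverse numbers
    xi, yi, zi = inv(xm), inv(ym), inv(zm)
    # 4. tetrahedron address of the scattered coordinates
    return (xi**3 + 3*xi**2 + 2*xi) // 6 + (yi**2 + yi) // 2 + zi

tetra = (601**3 + 3 * 601**2 + 2 * 601) // 6    # size of the transformed area
assert 0 <= transformed_address(100, 7, 350) < tetra
```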

The whole linear array of descriptors (36 361 808 items) has been mapped to a 16 384 channel spectrum. One can observe a quasi-constant distribution, which testifies to the quasi-uniform distribution of descriptors over all memory addresses in the transform domain.

Distribution of descriptor counts in the transformed domain.

Prime module M, memory requirements and achieved compression ratio for 3-, 4- and 5-fold γ-ray spectra (16 384 channels in each dimension):

The searching depth for all cases is 1000 events.

Dim. of spectra | Prime module M | Storage [MB] | Compression ratio CR
3 | 601 | 290.9 | 30 239
4 | 157 | 262.9 | 33 452
5 | 73 | 237.0 | 37 100

Three-fold coincidence spectra.

High counts region of 1-dimensional slice from original data (thick line) and corresponding region from compressed data (thin line).

Low counts region of slice from original data (thick line) and corresponding region from compressed data (thin line).

Influence of searching depth on quality of decompressed spectra

Increasing the length of searching in the buffer of compressed events improves the preservation of the peak heights.

In all spectra we subtracted background.

Narrow (one peak wide) 1-dimensional slice from non-compressed original data (thick line) and compressed 3-dimensional array (thin line), for the searching depth = 1000 events.

Two-dimensional slice from original 3-dimensional data.

Reconstructed 2-dimensional slice from compressed 3-dimensional data.

Three-dimensional slices from both original and compressed events.

Original 3-dimensional data.

Decompressed 3-dimensional data.

Four-fold coincidence events.

Part of 1-dimensional slices from non-compressed original (thick line) and compressed 4-dimensional (thin line) arrays.

Two-dimensional slice from original 4-dimensional data.

Two-dimensional slice from compressed 4-dimensional data.

Three-dimensional slice from original 4-dimensional data.

Three-dimensional slice from compressed 4-dimensional data.

Examples of 4-dimensional slices from 4-fold coincidence data in pies display mode.

Original 4-dimensional data. The sizes of balls are proportional to the volumes and the colors in the pies correspond to the content of channels in the 4-th dimension (64 channels in x, y, z dimensions and 16 channels in v dimension).

Four-dimensional slice from compressed 4-dimensional data.

Decompressed slice. Big peaks correspond in both data sets; however, in small peaks some differences can be observed.

Examples of applying compression methods to 5-fold coincidence data.

A part of one-dimensional slices from original (thick line) and compressed (thin line) 5-fold coincidence events.

Two-dimensional slice from original 5-dimensional data.

Two-dimensional slice from compressed 5-dimensional data.

Conclusion:

In this talk, new methods of multidimensional coincident γ-ray spectra compression were presented:

• Using fast adaptive orthogonal transforms.

• Using the method of retaining a list of descriptors scattered in the compressed area by a pseudorandom address transformation.

The processed data have the property of symmetry in their nature. In both cases the symmetry removal methods are implemented directly in compression algorithms.

A new class of adaptive transforms, with the transformation kernel modifiable to the reference vectors that reflect the shape of the compressed data, was presented.

Methods of compression used:

Orthogonal transforms - the compression is achieved by removal of redundant and irrelevant data components.

List of descriptors - the compression is achieved on account of quasi-uniform distribution of the data in the transform domain space and thus by its more efficient utilization.

The algorithms are designed to be employed for on-line compression during experiment.

After the experiment the operator can decompress any slices of equal or lower dimensionality from the compressed data.

For nuclear spectra, both methods proved to give better results than the classical ones and allow higher compression ratios to be achieved with less distortion.

Some relevant publications:

1. Morháč M., Matoušek V.: Multidimensional nuclear spectra compression using fast adaptive Fourier - based transforms, Computer Physics Communications, 165 (2005) 127.

2. Matoušek V., Morháč M., Kliman J., Turzo I., Krupa L., Jandel M., Efficient storing of multidimensional histograms using advanced compression techniques, NIM A 502 (2003) 725.

3. Morháč M., Matoušek V., A new method of on-line multiparameter analysis with compression, NIM A 370 (1996) 499.

4. Morháč M., Matoušek V., Turzo I., Multiparameter data acquisition and analysis system with on-line compression. IEEE Transaction on Nuclear Science 43 (1996) 140.

5. Morháč M., Kliman J., Matoušek V., Turzo I., Integrated multi-parameter nuclear data analysis package, NIM A 389 (1997) 89.