COMPARATIVE ANALYSIS OF DATA DISTRIBUTION PATTERNS IN PLANT COMET ASSAY

M. Georgieva1, L. Stoilov1, E. Rancheva2, E. Todorovska3, D. Vassilev3
1 Institute of Plant Physiology and Genetics, Bulgarian Academy of Sciences, Sofia, Bulgaria
2 Agricultural University, Plovdiv, Bulgaria
3 AgroBioInstitute, Sofia, Bulgaria
Correspondence to: Dimitar Vassilev
E-mail: [email protected]

Biotechnol. & Biotechnol. Eq. 2010, 24(4), 2142-2148
Article DOI: 10.2478/V10133-010-0082-7

ABSTRACT

The screening capabilities of the comet assay give it real potential for studying the impact of radiation and different mutagenic sources on the induction of DNA damage in plant nuclei. Heterogeneity of DNA damage data obtained by the application of different agents leads to inconsistency and variation in the experimental outcomes obtained after assessment of the comet populations. Although the potential of the comet assay technology has been clearly demonstrated, many important and interesting statistical questions remain. In this respect, different types of data need different statistical designs, and this usually makes statistical analysis problematic. We advocate here greater attention to the different classical statistical distributions which best fit plant comet data. We also demonstrate that all distribution patterns of the % of DNA in tail can be fitted by a Johnson SB distribution.

Keywords: comet assay, distribution patterns, DNA damage data, % DNA in tail

Abbreviations: BLM: bleomycin; KS: Kolmogorov-Smirnov; χ2: Chi-Square; GEV: Generalized Extreme Value distribution; JSB: Johnson SB distribution; PF: Power Function distribution; D4: Dagum (4P) distribution; D: Dagum distribution; F: Frechet (3P) distribution; GP: Generalized Pareto distribution; LL: Log-Logistic (3P) distribution; P: Pert distribution; GM: Gumbel Max distribution; R: Rayleigh (2P) distribution; T: Triangular distribution; GG: Generalized Gamma distribution; K: Kumaraswamy distribution.

Introduction

The comet assay is a useful and sensitive method for quantification of DNA damage and repair in vivo and in vitro in eukaryotic and some prokaryotic cells, and a tool to study factors affecting mutagenicity. It has been used as a versatile and comparatively new approach in genotoxic research, ecotoxicology, radiation and microbeam biology, as well as in biomonitoring and experimental data modelling. Initially, studies with this methodology were performed in animal (21) and human model systems (reviewed in 26), but in the last decade of the 20th century the standard comet procedure underwent changes that allowed adequate assessment of damage to plant DNA. Hence, research interests are directed towards predicting the direct or indirect influence of physical agents, different types of radiation, or genotoxic compounds such as alkylating, crosslinking, and oxidizing agents (12, 20).

The correct selection of statistical models for experimental data analysis is a key factor in comet assay studies. The choice of the directly observed experimental unit (8, 14) is considered essential for a successful application of statistical tests in comet assay experimental work. Assumptions about the distribution of the values of this experimental unit may cause differences in the measured response and influence further conclusions. To assess the effect, an empirical method of visual scoring (10, 17) or conversion to a pseudo-percentage score (5) was used at first, followed by advanced image analysis systems (17). The latter calls for more detailed statistical analysis. An essential part of the statistical analysis is fitting different distributions to the experimental comet assay data, which provides a broad background for unbiased estimates as well as improved estimates of the mean and variance of exposures (7). Moreover, all models for statistical analysis depend on the sample size. A small number of records (fewer than 50 comets per slide) might both mask significant levels of the effect after treatment with different concentrations of agents and bias the outcomes regardless of the statistical approach. In some recent studies, the preferred number of comets exceeds 100 per slide.

In most plant studies, the experimental unit is selected from the values of the tail moment, the tail length, the tail intensity, or the percentage of DNA in tail. These observed parameters are usually presented by their mean values (16, 18, 28) or median ratio (13), together with the standard deviation for the studied populations.

The diverse data in comet populations rarely fit a normal distribution, which limits the usefulness of parametric statistical tools. Among several approaches intended to solve this problem, transformation of the data followed by modifying through a function (3, 15) has emerged as a very popular one.


Here we compare and evaluate the distribution effects of treatment with bleomycin (BLM), using an already verified experimental data set (11). We describe the distribution patterns of real data from plant comet assay experiments compared with the most frequently accepted and fitted distributions, which can be used as a sensitive indicator of differences in DNA damage response.

Materials and Methods

Experimental design and data set

A compiled sample of real comet data published by Georgieva and Stoilov (2008) was used. Control and treatment data, obtained after treatment with 50, 100, 150 and 200 μg/ml BLM, were selected using the first 50 randomly characterized barley nucleoids, dispersed on three separate slides. The analyzed parameter was the % DNA in tail.

Data frequency distribution and goodness of fit tests

Frequency distributions of the observed comet parameter values (% DNA in tail) were examined. Results were analyzed with Easy-Fit software (version 5.3, MathWave Technologies). Distribution fitting was used to obtain estimators of variation and to specify the most likely and appropriate distributions (23). This was done both for the control and for the four treatment groups.

Two statistical tests were used to define goodness of fit criteria: the Kolmogorov-Smirnov (KS) and Chi-Square (χ2) goodness of fit tests. Based on them, a rank selection of the distributions was done. The KS test relies on the empirical cumulative distribution function; its statistic is the largest vertical difference between the theoretical and the empirical cumulative distribution functions:

$$D = \sup_x \left| F_n(x) - F(x) \right|,$$

where $F_n$ is the empirical and $F$ the theoretical cumulative distribution function.
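As an illustration of the KS statistic described above, the following minimal sketch computes the largest vertical difference between an empirical and a theoretical cumulative distribution function. It is not the Easy-Fit implementation; the function name `ks_statistic` and the sample values are hypothetical, and a fitted normal distribution is used only as an example candidate.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so the gap has to be checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical % DNA in tail values, for illustration only.
tail_dna = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 18.6, 27.4]

# Example candidate: a normal distribution fitted by sample mean and SD.
normal = NormalDist.from_samples(tail_dna)
d_normal = ks_statistic(tail_dna, normal.cdf)
print(f"KS statistic against a fitted normal: {d_normal:.3f}")
```

A smaller value indicates a closer agreement between the candidate distribution and the data.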

The main objective of the χ2 test is to find out whether the observed values for a given treatment differ significantly from the expected values. The value of the χ2 goodness of fit statistic was calculated according to the formula:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$

where χ2 is the Chi-Square goodness of fit test value, and $O_i$ and $E_i$ are the observed and expected values in the i-th interval, respectively.
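The χ2 calculation can be sketched in a few lines, assuming the data range has been divided into intervals. The helper names (`bin_counts`, `expected_counts`, `chi_square_statistic`), the interval edges, the sample values and the uniform candidate distribution are all hypothetical choices made for illustration; they are not taken from the study.

```python
def bin_counts(sample, edges):
    """Observed counts in the half-open intervals [edges[k], edges[k + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        for k in range(len(counts)):
            if edges[k] <= x < edges[k + 1]:
                counts[k] += 1
                break
    return counts


def expected_counts(cdf, edges, n):
    """Expected counts per interval for n observations under a candidate CDF."""
    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges, edges[1:])]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical % DNA in tail values and interval edges, for illustration only.
tail_dna = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 18.6, 27.4]
edges = [0, 5, 10, 20, 100]

observed = bin_counts(tail_dna, edges)
# A uniform distribution on [0, 100] serves as a deliberately poor candidate.
expected = expected_counts(lambda x: x / 100.0, edges, len(tail_dna))
chi2 = chi_square_statistic(observed, expected)
print(f"observed = {observed}, chi-square = {chi2:.3f}")
```

The resulting value would then be compared with the χ2 distribution for the appropriate degrees of freedom, a step that is omitted here.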

With these two goodness of fit criteria in mind, we selected among the distributions fitted to the data, taking into account their rank from the preprocessing, in order to find the most suitable distribution that could fully describe the analyzed scattered experimental data set.
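The rank selection can be pictured as fitting each candidate distribution and ordering the candidates by their goodness of fit statistic. The sketch below does this for the KS statistic only, with three simple candidates (normal, exponential, log-normal) chosen because they can be fitted with the Python standard library; the study itself ranked a much larger set of distributions in Easy-Fit. All names and data values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


def fit_normal(sample):
    return NormalDist.from_samples(sample).cdf


def fit_exponential(sample):
    rate = 1.0 / fmean(sample)  # maximum-likelihood estimate of the rate
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def fit_lognormal(sample):
    # Requires strictly positive data: a normal law is fitted to log(x).
    log_dist = NormalDist.from_samples([math.log(x) for x in sample])
    return lambda x: log_dist.cdf(math.log(x)) if x > 0 else 0.0


CANDIDATES = {
    "normal": fit_normal,
    "exponential": fit_exponential,
    "log-normal": fit_lognormal,
}


def rank_by_ks(sample):
    """Return (name, KS statistic) pairs, best-fitting candidate first."""
    scores = [(name, ks_statistic(sample, fit(sample)))
              for name, fit in CANDIDATES.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical % DNA in tail values, for illustration only.
tail_dna = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 18.6, 27.4]
ranking = rank_by_ks(tail_dna)
for rank, (name, d) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: D = {d:.3f}")
```

Repeating the ranking with the χ2 statistic may give a different order, which is why both criteria were considered.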

Results and Discussion

The precise processing and interpretation of a given data set in the comet assay is one of the most important problems in plant toxicology studies concerning environmental response. Computer-assisted measurements at the level of individual nuclei, expressed as recorded values, are associated with a large amount of scattered data (6). The reasons for that can be either the

Table 1. Type of distributions of comet assay data in animal and human studies

| Type of distribution | Type of model system | Agent | Number of measured comets as a data point in data analysis | Parameter | Source of literature |
|---|---|---|---|---|---|
| Chi-square (χ2) | Human | UV A | 100 cells per sample | Tail moment | (1) |
| Chi-square (χ2) | Human | UV A | About 50 to 100 cells | Tail moment | (2) |
| Gaussian | Human | Neutron-irradiation | 80–100 images | Tail moment | (9) |
| Gaussian | Animal | H2O2 | 50 nucleoids per slide; two slides | Tail moment | (26) |
| Unsymmetrical Weibull | Human | Bleomycin | 48 comets per slide; 2 replicates for each sample | Tail moment; % DNA in tail | (7) |
| Bimodal | Animal | UVA; UVB | 100 and 150 observations | Olive tail moment | (22) |
| Mixture of Gamma | Human | H2O2 | 50± 60 cells per slide | Tail moment | (23) |
| Weibull | Human | High LET protons; Co-60 gamma rays | - | Olive tail moment | (5) |
| Weibull | Human | Benzene | - | Tail moment; % DNA in the tail | (24) |
| Weibull; exponential; logistic; log normal; log-logistic | Animal | H2O2 | 50 nucleoids per slide; two slides | Tail moment | (26) |
| Binomial | Human | Polycyclic aromatic hydrocarbons | 150 cells | Tail length; tail intensity; tail moment | (19) |

Fig. 1. Calculated distribution frequency by KS test compared to the real distributions of the observations of % DNA in tail. (A) control; (B) 50 μg/ml BLM; (C) 100 μg/ml BLM; (D) 150 μg/ml BLM; (E) 200 μg/ml BLM. Normal distribution is presented only for comparison.

Fig. 2. Calculated distribution frequency by χ2 test compared to the real distributions of the observations of % DNA in tail. (A) control; (B) 50 μg/ml BLM; (C) 100 μg/ml BLM; (D) 150 μg/ml BLM; (E) 200 μg/ml BLM. Normal distribution is presented only for comparison.

Fig. 3. Comparative analysis between probabilities and quantiles of normal and JSB distributions of the % of DNA in tail values presented on the P-P plots (A, C, E, G and I) and the Q-Q plots (B, D, F, H and J).

asynchronous developmental stage of the comet population or differences in the individual response to the studied agent. Moreover, the observed dispersion of the values of the recorded parameters after plant comet assay experiments is tightly related to the status of the nuclei after isolation. Results obtained in other model systems therefore suggest that it is often not possible to standardize the distribution that most accurately describes the comet parameters. In animal and human studies the resulting distribution of the values has been described in different ways: as Gaussian (normal) distribution (9, 27), chi-square (1, 2), Weibull standard (5, 25, 27) or unsymmetrical (7) distribution, bimodal distribution (22), mixture of Gamma distributions (24), exponential, logistic, log normal, log-logistic (LL) (27), binomial distribution (19), etc. (Table 1).

The departure of our comet data from the normal distribution is detected either by comparing the probability density function to the histogram of the data (Fig. 1 and Fig. 2) or by the cumulative distribution function expressed as a correlation on the probability-probability (P-P) plot (Fig. 3).
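To make the construction of such diagnostic plots concrete, the sketch below computes the coordinates of a P-P plot (empirical against theoretical cumulative probabilities) and a Q-Q plot (theoretical against sample quantiles) for a fitted normal distribution. The function name `pp_qq_points`, the plotting positions (i - 0.5)/n and the data values are assumptions made for illustration; other plotting-position conventions exist.

```python
from statistics import NormalDist


def pp_qq_points(sample, dist):
    """Return P-P and Q-Q plot coordinates for `sample` against `dist`.

    `dist` must provide `cdf(x)` and `inv_cdf(p)`, as NormalDist does.
    """
    xs = sorted(sample)
    n = len(xs)
    # Plotting positions (i - 0.5)/n: one common convention, assumed here.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    # P-P: empirical probability against theoretical probability.
    pp = [(p, dist.cdf(x)) for p, x in zip(probs, xs)]
    # Q-Q: theoretical quantile against observed (sample) quantile.
    qq = [(dist.inv_cdf(p), x) for p, x in zip(probs, xs)]
    return pp, qq


# Hypothetical % DNA in tail values, for illustration only.
tail_dna = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 18.6, 27.4]
normal = NormalDist.from_samples(tail_dna)
pp, qq = pp_qq_points(tail_dna, normal)

for (p_emp, p_theo), (q_theo, q_obs) in zip(pp, qq):
    print(f"P-P: ({p_emp:.2f}, {p_theo:.2f})   Q-Q: ({q_theo:6.2f}, {q_obs:6.2f})")
```

Points lying close to the diagonal indicate agreement with the candidate distribution; systematic curvature indicates departure from it.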

Since the distributions of both control and treatment data differ from the normal probability law, several non-Gaussian distributions were examined using the KS and χ2 goodness of fit tests. Methodologically, the KS criterion determines the maximal difference for a single parameter between two populations without requiring assumptions regarding the distribution of the data. As a competing approach to the empirical distribution function, the χ2 goodness of fit test, a non-parametric statistical inference (4), works when the range of the recorded data is divided into a number of intervals.

The comparison between different distribution types (bounded, unbounded, non-negative and advanced), providing comparable results for both types of estimation, is summarized in Fig. 1. Our data showed that most of the fitted distributions describing the control samples were of continuous type. For example, the Power Function (PF) and the 4-parameter Dagum distribution (D4) (Fig. 1A) were the most accurately fitted according to the KS goodness of fit test, whilst Johnson SB (JSB), Gumbel Max (GM) and Generalized Pareto (GP) (Fig. 2A) matched better under the χ2 goodness of fit test. Among them, the JSB distribution appears to provide the best fit to the data (Fig. 2A).

The 4-parameter Dagum distribution often describes data that are more variable than the exponential distribution (heavy-tailed). With respect to the probability density function, this type of distribution also provides an appropriate fit for the data obtained after treatment with 50 μg/ml BLM (Fig. 1B).

According to the χ2 criterion, the best matching distribution for the data acquired after treatment with 50 μg/ml BLM is the advanced Generalized Pareto distribution (Fig. 2B). This distribution is appropriate for describing extreme values. In the case of a weak effect of the bleomycin treatment, part of the comet population is composed of comets with low values of the parameter, and another part forms a group with considerably higher values. Ranked as the second most suitable distribution, the JSB (Fig. 2B) also fits the data obtained after the 50 μg/ml BLM treatment.

The presence of extreme values in our case is used to model the distribution of exceedances above a high fixed threshold. The existence of an extreme value (probably arising from accumulation of hedgehog-type comets) can be explained as an excessive event and is captured by the Generalized Extreme Value (GEV) distribution according to the KS criterion (Fig. 1C, Fig. 1D and Fig. 1E) at all high bleomycin concentrations (100, 150 and 200 μg/ml BLM).

For treatments with higher bleomycin concentrations we refer to the bounded JSB distribution (Fig. 1C, Fig. 1D, Fig. 1E and Fig. 2), which is a flexible, theoretically appealing probability function for characterizing ratios, or random variables bounded by extremes, and is often used in human exposure and environmental studies (8).
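The bounded nature of the JSB distribution can be illustrated with a short sketch. A variable X on (ξ, ξ + λ) follows a JSB law when γ + δ ln((X - ξ)/(ξ + λ - X)) is standard normal. The code below assumes that the bounds are known in advance (0 and 100 for a percentage), in which case the shape parameters γ and δ follow directly from the mean and standard deviation of the transformed values. This is a simplification for illustration and not the four-parameter fit performed by Easy-Fit; the function names and data values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()


def _logit(x, lower, upper):
    """Map a value strictly inside (lower, upper) onto the real line."""
    return math.log((x - lower) / (upper - x))


def fit_johnson_sb(sample, lower=0.0, upper=100.0):
    """Estimate the JSB shape parameters (gamma, delta) for fixed bounds.

    Assumes every observation lies strictly between `lower` and `upper`;
    values exactly on a bound cannot be transformed.
    """
    y = [_logit(x, lower, upper) for x in sample]
    delta = 1.0 / pstdev(y)      # transformed values have SD 1/delta
    gamma = -fmean(y) * delta    # and mean -gamma/delta
    return gamma, delta


def johnson_sb_cdf(x, gamma, delta, lower=0.0, upper=100.0):
    """Cumulative distribution function of the JSB law on (lower, upper)."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return _STD_NORMAL.cdf(gamma + delta * _logit(x, lower, upper))


# Hypothetical % DNA in tail values, for illustration only.
tail_dna = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 18.6, 27.4]
gamma, delta = fit_johnson_sb(tail_dna)
print(f"gamma = {gamma:.3f}, delta = {delta:.3f}")
print(f"P(% DNA in tail <= 10) = {johnson_sb_cdf(10.0, gamma, delta):.3f}")
```

Because the fitted distribution cannot place probability outside the bounds, it avoids the negative or above-100 values that a normal model would allow for a percentage.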

In all cases where a fit was obtained, the Chi-Square goodness of fit test was suitable for all distributions and indicated that the null hypothesis that the sample was drawn from a JSB distribution should not be rejected (Fig. 2A, Fig. 2B, Fig. 2C, Fig. 2D and Fig. 2E). These results support the hypothesis that the JSB function can adequately describe the distribution of the studied comet parameter (% DNA in the tail).

The cumulative distribution function values presented on the P-P plots (Fig. 3A, Fig. 3C, Fig. 3E, Fig. 3G and Fig. 3I) and the quantiles presented on the quantile-quantile (Q-Q) plots (Fig. 3B, Fig. 3D, Fig. 3F, Fig. 3H and Fig. 3J) were used to compare the normal distribution with the JSB distribution selected under the χ2 criterion, chosen as the most appropriate after our analysis. The P-P plots and the Q-Q plots (Fig. 3) confirm the inadequacy of the normal distribution. The analysis of the Q-Q plots (Fig. 3B, Fig. 3D, Fig. 3F, Fig. 3H and Fig. 3J) reinforces our inference that the JSB distribution selected under the χ2 criterion provides the best fit to the empirical distribution for all groups.

As discussed above, multiple distributions can be fitted to most comet data, and it is not easy to find a unique and general solution for various data categories (27). Our results confirm that, as a consequence, it is not always possible to have a single, generally applicable goodness of fit test for all groups of the experimental data. The application of both criteria in our study can thus be regarded as a way to address this problem (Fig. 1 and Fig. 2). Nevertheless, according to the χ2 criterion, the JSB distribution stood out in fitting the control and treatment data groups (Fig. 2). This is also confirmed by the obtained scale parameters of the fitted distributions, which were more consistent for the JSB distribution (Fig. 2).

Conclusions

This study has demonstrated some useful ways for statistical description of scattered input data from plant comet assay experiments. A qualitative examination of the distribution of comet data provides models with better consistency and accuracy. Our study suggests that, for the description of plant comet data, the most appropriate fitting procedure is a chi-square JSB. Further development of this study can reveal ways of applying better statistical techniques for unbiased estimation of the factors influencing mutagenicity in the plant comet assay.

REFERENCES

1. Bauer E., Recknagel R.D., Fiedler U., Wollweber L., Bock C., Greulich K.O. (1998) Mutat. Res., 398(1-2), 101-110.

2. Bock C., Dittmar H., Gemeinhardt H., Bauer E., Greulich K.O. (1998) Mutat. Res., 408, 111-120.

3. Collins A., Dusinska M., Franklin M., Somorovska M., Petrovska H., Duthie S., Fillion L., Panayiotidis M., Raslova K., Vaughan N. (1997) Environ. Mol. Mutagen., 30, 139-146.

4. DasGupta A. (2008) Asymptotic Theory of Statistics and Probability, Springer, New York.

5. Di Giorgio M., Kreiner A.J., Schuff J.A., Vallerga M.B., Taja M.R., Lopez F.O., Alvarez D.E., Saint Martin G., Burlon A., Debray M.E., Kesque J.M., Somacal H., Stoliar P., Valda A., Davidson J., Davidson M., Ozafran M.J., Vazquez M.E. (2004) 11th International Congress on the International Radiation Protection Association, Madrid, Spain, 23-28 May.

6. Duez P., Dehon G., Kumps A., Dubois J. (2003) Mutagenesis, 2, 159-166.

7. Ejchart A. and Sadlej-Sosnowska N. (2003) Mutat. Res., 534(1-2), 85-92.

8. Flynn M.R. (2004) Ann. Occup. Hyg., 48(7), 617-622.

9. Gajendiran N., Tanaka K., Kamada N. (2001) J. Radiat. Res., 42, 157-163.

10. Gedik C.M., Ewen S.W., Collins A.R. (1992) Int. J. Radiat. Biol., 62, 313-320.

11. Georgieva M. and Stoilov L. (2008) Environ. Mol. Mutagen., 49(5), 381-387.

12. Gichner T. and Plewa M.J. (1998) Mutat. Res., 401, 143-152.

13. Gichner T., Ptacek O., Stavreva D.A., Wagner E.D., Plewa M.J. (2000) Mutat. Res., 470, 1-9.

14. Lovell D.P., Thomas G., Dunbow R. (1999) Teratog. Carcinog. Mutagen., 19, 109-119.

15. Lovell D.P. and Omori T. (2008) Mutagenesis, 23(3), 171-182.

16. Koppen G. and Cerda H. (1997) Lebenson-Wiss Technol., 30, 452-457.

17. Kumaravel T.S., Vilhar B., Faux S.P., Jha A.N. (2009) Cell Biol. Toxicol., 25, 53-64.

18. Menke M., Chen I., Angelis K.J., Schubert I. (2001) Mutat. Res., 493, 87-93.

19. Moretti M., Dell’Omo M., Villarini M., Pastorelli R., Muzi G., Airoldi L., Pasquini R. (2007) BMC Public Health, 7, 270.

20. Navarrete M.H., Carrera P., de Miguel M., de la Torre C. (1997) Mutat. Res., 389, 271-277.

21. Ostling O. and Johanson K.J. (1984) Biochem. Biophys. Res. Commun., 123, 291-298.

22. Patton W.P., Chakravarthy U., Davies R.J., Archer D.B. (1999) Invest. Ophthalmol. Vis. Sci., 40(13), 3268-3275.

23. Schittkowski K. (2002) Struct. Multidisc. Optim., 23, 153-169.

24. Shepherd B.E. and Schaalje G.B. (2000) Quantitative Microbiology, 2, 69-79.

25. Swanepoel A.J., Pretorius P.J., Laubscher P.J., Labuschagne W.P. (2005) iona Pilanesperg: Paper G2-2.

26. Tice R.R., Agurell E., Anderson D., Burlinson B., Hartmann A., Kobayashi H., Miyamae Y., Rojas E., Ryu J.C., Sasaki Y.F. (2000) Environ. Mol. Mutagen., 35, 206-221.

27. Verde P.E., Geracitano L.A., Amado L.L., Rosa C.E., Bianchini A., Monserrat J.M. (2006) Mutat. Res., 604, 71-82.

28. Wojewodzka M., Buraczewska I., Kruszewski M. (2002) Mutat. Res., 518, 9-20.