ACT 119153 (NISR+Τ) 3rd Phase “Open Access Repositories and Electronic Journals” supervised by the Greek Information Society / FP6

Subtask 6: Development of an integrated DNA microarray data processing and meta-analysis platform plus a microarray experimental data repository, in Grid.

Overview

• Introduction
• MicroArray Experiments
• Problems
• The GRISSOM Portal
• System Architecture Overview
• Case Study
• Technical Issues
• GRISSOM Platform Benefits

MicroArray Experiments

Gene expression microarrays are a promising high-throughput method for measuring the expression of an organism's whole genome simultaneously at a specific instant.

In practice they can be used to compare transcription levels across different conditions in order to:

i) Understand the mechanisms implicated in the various stages of the biological system under investigation

ii) Classify diseases or, more generally, pathologies, e.g. tumours with different prognoses that are indistinguishable by microscopic histological examination

iii) Monitor the response to therapy

iv) Identify and categorize diagnostic or prognostic biomarkers

MicroArray Experiment Workflow

Problems

• The computational processing steps for microarray experiment data are laborious and represent by far the most significant bottleneck in the successful exploitation of the technology.

• Consequently, large storage and computing facilities are required.

• This compounds the costs of an important yet already expensive technology, setting back research progress in the field.

• Technical setbacks: array artifacts, scratches, scanner sensitivity and settings.

• The curse of dimensionality: tens of thousands of genes (variables) measured on a small number of samples pose major challenges for statistical inference.

• Noise: non-specific hybridization, as well as the difference between the actual amount of mRNA per cell and the relative differential expression measured by microarrays, introduces variance and noise into experiments.
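To make the dimensionality point concrete, the sketch below estimates how many genes would pass a per-gene significance test purely by chance, and how strict a simple Bonferroni-corrected threshold becomes. The gene count (20,000) and the 0.05 threshold are illustrative assumptions, not values taken from the GRISSOM experiments, and the slides do not say which correction, if any, the platform applies.

```python
def expected_false_positives(num_genes: int, alpha: float) -> float:
    """Expected number of genes passing a per-gene test by chance alone,
    assuming no gene is truly differentially expressed."""
    return num_genes * alpha


def bonferroni_threshold(num_genes: int, alpha: float) -> float:
    """Per-gene threshold that keeps the family-wise error rate at alpha."""
    return alpha / num_genes


if __name__ == "__main__":
    genes = 20_000   # assumed array size, for illustration only
    alpha = 0.05     # assumed per-gene significance level
    print("expected chance hits:", expected_false_positives(genes, alpha))
    print("Bonferroni per-gene threshold:", bonferroni_threshold(genes, alpha))
```

With only a handful of replicates per condition, very few genes can reach a threshold that strict, which is why gene selection on microarray data is statistically delicate.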

The GRISSOM Portal

• http://www.grissom.gr
• Access
– Restricted Web Access
– Registered Users
– Special Security Mechanism

The GRISSOM Portal

• http://www.grissom.gr
• Web Portal Access (SSL)
– Two Access Modes:

HellasGrid Certificates Validation

Custom Certificates Validation

Signed by NHRF

The GRISSOM Portal

• HellasGrid Authentication & Access
– MyProxy
• MyProxy Certification Authority

[Diagram labels: User, MyProxy-logon, MyProxy-Server, Grid.Auth.GR]
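As a rough illustration of the MyProxy step named above, the sketch below assembles a `myproxy-logon` command line for retrieving a short-lived proxy credential. The server name, user name, lifetime and output path are hypothetical placeholders; the slides do not give the actual server settings or describe how the portal invokes MyProxy, so this only shows the general shape of such a call. The command is built but not executed.

```python
from typing import List


def build_myproxy_logon(server: str, username: str,
                        hours: int, out_path: str) -> List[str]:
    """Build (but do not run) a myproxy-logon command line.

    -s  MyProxy server host
    -l  account name on the MyProxy server
    -t  requested credential lifetime in hours
    -o  file in which to store the retrieved credential
    """
    return [
        "myproxy-logon",
        "-s", server,
        "-l", username,
        "-t", str(hours),
        "-o", out_path,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_myproxy_logon("myproxy.example.org", "alice", 12,
                              "/tmp/x509up_alice")
    print(" ".join(cmd))
```

Returning the argument list instead of a shell string avoids quoting problems if the command is later passed to `subprocess.run`.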

The GRISSOM Portal

• http://www.grissom.gr
• Features
– Experimental data upload
– Versatile Data Processing:
• Normalization, Filtering, Statistical Selection, Clustering, Gene Annotation
– Automated experiment submission to the HellasGrid Infrastructure and monitoring
– Biological Experiment Repository
– Meta-Analysis Methods including gene annotation and GO Analysis
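The first two processing steps listed above, normalization and filtering, can be sketched as follows. This is a minimal illustration under assumptions: it uses log2 ratios, median centering and a fold-change cut-off, which are common choices for two-channel data, but the slides do not state which methods GRISSOM actually implements. All function names and values are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def log2_ratios(treated: List[float], control: List[float]) -> List[float]:
    """Per-gene log2(treated / control) intensity ratios."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def median_center(values: List[float]) -> List[float]:
    """Normalize by subtracting the array-wide median log ratio."""
    m = median(values)
    return [v - m for v in values]


def filter_by_fold_change(ratios: Dict[str, float],
                          min_abs_log2: float) -> List[str]:
    """Keep genes whose absolute normalized log2 ratio reaches the cut-off."""
    return sorted(g for g, r in ratios.items() if abs(r) >= min_abs_log2)


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]   # made-up names
    treated = [400.0, 200.0, 100.0, 200.0, 800.0]           # made-up intensities
    control = [100.0, 100.0, 100.0, 100.0, 100.0]
    normalized = median_center(log2_ratios(treated, control))
    print(filter_by_fold_change(dict(zip(genes, normalized)), 1.0))
```

Median centering removes a global intensity bias between the two channels before any per-gene cut-off is applied, so the filter reacts to relative rather than absolute shifts.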

The GRISSOM Portal

• http://www.grissom.gr
• Input:
– Raw Dataset Files (various image formats, for cDNA/Affy)
– Analysis Parameters
• Output:
– Expressed Gene Lists
– Interactive Graphs
– Annotated Genes
– References to similar Experiments

The GRISSOM Portal

• http://www.grissom.gr
• Distributed Database:
– Data records on the Portal, actual data on Grid Storage Elements (SEs)
– Interconnection with other open biological databases (EBI ArrayExpress, NCBI GEO) for finding related experiments
– Annotation of genes performed using specialized databases (BioMart)
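The split between lightweight records on the portal and bulk data on Storage Elements can be pictured with the sketch below. It is an assumption-laden illustration: the class names, fields and the storage URL format are hypothetical and are not taken from the GRISSOM schema, which the slides do not show.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExperimentRecord:
    """Metadata kept on the portal; the raw files live elsewhere."""
    experiment_id: str
    title: str
    storage_url: str                   # hypothetical pointer to data on an SE
    external_id: Optional[str] = None  # e.g. a GEO accession, if any


class PortalCatalogue:
    """In-memory stand-in for the portal's local database."""

    def __init__(self) -> None:
        self._records: Dict[str, ExperimentRecord] = {}

    def register(self, record: ExperimentRecord) -> None:
        self._records[record.experiment_id] = record

    def locate(self, experiment_id: str) -> str:
        """Return where the actual data for an experiment is stored."""
        return self._records[experiment_id].storage_url

    def find_by_external_id(self, external_id: str) -> List[ExperimentRecord]:
        """Look up records linked to an accession in a public database."""
        return [r for r in self._records.values()
                if r.external_id == external_id]


if __name__ == "__main__":
    catalogue = PortalCatalogue()
    catalogue.register(ExperimentRecord(
        experiment_id="exp-001",                             # hypothetical
        title="Wounded keratinocytes",
        storage_url="lfn:/grid/example-vo/exp-001/raw.tar",  # hypothetical
        external_id="GSE6820",
    ))
    print(catalogue.locate("exp-001"))
```

Keeping only a pointer in the record means the portal database stays small while the large raw files remain on grid storage.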

System Architecture Overview

• Main Components:
– Web Portal (User Interface)
– Local DB
– Grid Middleware
• PHP + Java
• gLite 3.1
– Parallel Execution Code (MPI + Octave); job submission through gLite DAG, for fully distributed code execution, is under development
– Grid Storage Elements

System Architecture Overview

System Architecture Overview

Analysis steps are executed over multiple nodes using MPI.

The number of nodes equals the number of experimental conditions in each experiment.
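The one-node-per-condition scheme can be sketched as follows. This is not the GRISSOM code, which the slides describe as MPI + Octave: a thread pool stands in for the MPI ranks so the example runs anywhere, and the per-condition step (averaging replicates) is a placeholder. All names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Dict, List


def assign_conditions_to_ranks(conditions: List[str]) -> Dict[int, str]:
    """Give each experimental condition its own worker (rank)."""
    return {rank: cond for rank, cond in enumerate(conditions)}


def process_condition(replicates: List[float]) -> float:
    """Placeholder per-condition step: average the replicate values."""
    return mean(replicates)


def run_parallel(data: Dict[str, List[float]]) -> Dict[str, float]:
    """Process every condition concurrently, one worker per condition."""
    conditions = sorted(data)
    # A thread pool stands in for MPI ranks in this sketch.
    with ThreadPoolExecutor(max_workers=len(conditions)) as pool:
        results = list(pool.map(process_condition,
                                (data[c] for c in conditions)))
    return dict(zip(conditions, results))


if __name__ == "__main__":
    experiment = {"t0": [1.0, 2.0, 3.0], "t1": [4.0, 6.0]}  # made-up values
    print(assign_conditions_to_ranks(sorted(experiment)))
    print(run_parallel(experiment))
```

Because the worker count is tied to the number of conditions, an experiment with few conditions can use only a few nodes, however many replicates it has.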

Case Study – Test Scenarios

The system was tested using multiple datasets that differ in size and architecture:

| Description | GEO ID | Num. of nodes required | Num. of conditions | Num. of replicates |
|---|---|---|---|---|
| Transcriptional signature of wounded keratinocytes reveals selective roles for ERK1/2, P38 and PI3K signaling pathways | GSE6820 | 18 | 18 | 36 |
| Gene expression variation in lymphocyte subpopulations in response to low dose of ionizing radiations | GSE6978 | 6 | 6 | 34 |
| Embryonic stem cell differentiation induced by various chemicals: time course | GDS1823 | 3 or 9 | 3 or 9 | 18 |
| Compensated and decompensated right ventricular hypertrophy at onset: time course | GDS1928 | 3 | 3 | 27 |
| Spatial gene expression in flowers (cDNA array) | GDS865 | 5 | 5 | 15 |
| NHRF provided Dataset | - | 5 | 5 | 15 |

Case Study – Performance Measures

| Scenario | HellasGrid nodes used for GRISSOM run | GRISSOM run (minutes) | Single node run (minutes) | Speed-up ratio |
|---|---|---|---|---|
| GEO-GSE6820 - 18/36 | 18 | 80 | 400 | 5 |
| GEO-GSE6978 - 6/34 | 6 | 79 | 340 | 4.30 |
| GEO-GDS1823 - 9/18 | 9 | 38 | 186 | 4.89 |
| GEO-GDS1928 - 3/27 | 3 | 35 | 63 | 1.83 |
| GEO-GDS865 - 5/15 | 5 | 16 | 47 | 2.94 |
| NHRF - 5/15 | 5 | 17 | 54 | 3.18 |

System used: Intel Core 2 Duo E4300 (1.8 GHz) with 2.0 GB RAM, running GNU Octave 2.1.73 on Ubuntu Linux 7.10.
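The speed-up ratios reported above are the single-node run time divided by the GRISSOM run time. The sketch below recomputes them from the listed minutes and also derives parallel efficiency (speed-up divided by node count), a standard metric that the slides themselves do not report.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    scenario: str
    nodes: int
    grid_minutes: float
    single_minutes: float


# Values copied from the performance table.
RUNS: List[Run] = [
    Run("GEO-GSE6820", 18, 80, 400),
    Run("GEO-GSE6978", 6, 79, 340),
    Run("GEO-GDS1823", 9, 38, 186),
    Run("GEO-GDS1928", 3, 35, 63),
    Run("GEO-GDS865", 5, 16, 47),
    Run("NHRF", 5, 17, 54),
]


def speedup(run: Run) -> float:
    """Single-node time divided by grid time."""
    return run.single_minutes / run.grid_minutes


def efficiency(run: Run) -> float:
    """Speed-up per node; 1.0 would be ideal linear scaling."""
    return speedup(run) / run.nodes


if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.scenario}: speed-up {speedup(run):.2f}, "
              f"efficiency {efficiency(run):.2f}")
```

Recomputing from the listed minutes reproduces the reported ratios except for GEO-GDS1928, where it gives 1.80 against the reported 1.83; this may simply reflect rounding of the run times. Efficiency is well below 1.0 in every scenario, so each added node contributes less than a full node's worth of speed-up.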

Case Study – Performance Measures

Analysis runs on the same dataset at different parallelization levels.

First run: 3 nodes. Second run: 9 nodes.

Grid-related Performance Limitations

• Different node H/W generations
• Heterogeneity of node-installed S/W (esp. regarding biocomputing packages like Bioconductor)
• Maintenance Issues

GRISSOM Platform Screenshots

GRISSOM Platform Benefits

• Parallelization
• Time optimization
• User Transparency
• Automated Job Submission + Monitoring
• Open Access Biological Experiment Repository
• Shell fully concealing the Grid

• GRISSOM Development Team
Aristotle Chatziioannou
Ilias Maglogiannis
Ioannis Kanaris
Charalambos Doukas
Eleftherios Pilalis
Panagiotis Moulos

• Under the supervision of the Institute of Biological Research & Biotechnology, National Hellenic Research Foundation
Fragiskos Kolisis

• In collaboration with the National Documentation Center, National Hellenic Research Foundation

• Funded by ACT 119153 (NISR+Τ) 3rd Phase “Open Access Repositories and Electronic Journals” supervised by the Greek Information Society / FP6

http://www.grissom.gr

Thank you

Questions?