ACT 119153 (NISR+Τ) 3rd Phase ”Open Access Repositories and Electronic Journals” supervised by the Greek Information Society /FP6
Subtask 6 Development of an integrated DNA microarray data processing and
meta-analysis platform plus a microarray experimental data repository, in Grid.
Overview
• Introduction
• MicroArray Experiments
• Problems
• The GRISSOM Portal
• System Architecture Overview
• Case Study
• Technical Issues
• GRISSOM Platform Benefits
MicroArray Experiments
Gene expression microarrays are a promising high-throughput methodology for measuring the simultaneous expression of the whole genome of an organism at a specific instant.
In practice they can be used to compare the level of transcription among different conditions in order to:
i) Understand the mechanisms implicated in various stages of the biological system investigated
ii) Classify diseases, or pathologies in general, e.g. tumours with different prognosis status that are indistinguishable by microscopic histological examination
iii) Monitor the response to therapy
iv) Identify and categorize diagnostic or prognostic biomarkers
MicroArray Experiment Workflow
Problems
• The computational processing steps for microarray experiment data are laborious and represent by far the most considerable bottleneck in the successful exploitation of the technology.
• Consequently, there is an imperative requirement for large storage and computing facilities.
• This compounds the costs of an important yet already expensive technology, setting back research progress in the field.
• Technical setbacks: array artifacts, scratches, scanner sensitivity and settings.
• The curse of dimensionality: tens of thousands of genes (variables) measured on a small number of samples pose major challenges for statistical inference.
• Noise: non-specific hybridization, as well as the difference between the actual amount of mRNA per cell and the relative differential expression measured by microarrays, introduces variance and noise into experiments.
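To make the dimensionality problem concrete: when tens of thousands of genes are each tested at a fixed threshold, many pass by chance alone. The sketch below (Python, standard library only) counts the expected false positives and applies the Benjamini-Hochberg procedure, a common multiple-testing correction. The slides do not say which correction GRISSOM uses, so the procedure, the function names and the example numbers are all illustrative assumptions.

```python
def expected_false_positives(num_genes, alpha):
    """Expected number of unchanged genes that still pass a per-gene threshold."""
    return num_genes * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given false discovery rate."""
    m = len(p_values)
    # Sort p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose p-value is <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical example: 20,000 genes tested at p < 0.05 with no real effect.
chance_hits = expected_false_positives(20000, 0.05)

# Hypothetical p-values for four genes, in array order.
rejected = benjamini_hochberg([0.041, 0.001, 0.216, 0.008])
```

With these made-up inputs, roughly a thousand genes would pass the uncorrected threshold by chance, while the corrected procedure keeps only the two genes with the smallest p-values.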
The GRISSOM Portal
• http://www.grissom.gr
• Access
  – Restricted Web Access
  – Registered Users
  – Special Security Mechanism
The GRISSOM Portal
• http://www.grissom.gr
• Web Portal Access (SSL)
  – Two Access Modes:
    • HellasGrid Certificates Validation
    • Custom Certificates Validation
    • Signed by NHRF
The GRISSOM Portal
• HellasGrid Authentication & Access
  – MyProxy
    • MyProxy Certification Authority
[Figure: User, MyProxy-logon, MyProxy-Server, Grid.Auth.GR]
The GRISSOM Portal
• http://www.grissom.gr
• Features
  – Experimental data upload
  – Versatile Data Processing:
    • Normalization, Filtering, Statistical Selection, Clustering, Gene Annotation
  – Automated experiment submission to the HellasGrid infrastructure, with monitoring
  – Biological Experiment Repository
  – Meta-Analysis Methods, including gene annotation and GO Analysis
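As a rough illustration of what the first processing steps involve, the sketch below (Python, standard library only) computes per-gene log2 ratios, applies a global median-centering normalization, and keeps genes above a fold-change threshold. The slides do not specify which normalization or selection methods GRISSOM implements, and its own code runs in Octave, so the function names, the threshold and the intensities here are assumptions chosen only to show the shape of such a pipeline. A fold-change cut-off stands in for statistical selection.

```python
import math
from statistics import median


def log2_ratios(treated, control):
    """Per-gene log2(treated / control) from paired intensity lists."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def median_center(values):
    """Global normalization: shift all values so that their median is zero."""
    m = median(values)
    return [v - m for v in values]


def select_genes(gene_ids, normalized, min_abs_log2=1.0):
    """Keep genes whose normalized |log2 ratio| reaches the threshold."""
    return [g for g, v in zip(gene_ids, normalized) if abs(v) >= min_abs_log2]


# Hypothetical intensities for five genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
treated = [800.0, 200.0, 400.0, 100.0, 1600.0]
control = [200.0, 200.0, 200.0, 200.0, 200.0]

ratios = log2_ratios(treated, control)
normalized = median_center(ratios)
selected = select_genes(genes, normalized)
```

Median centering assumes that most genes do not change between conditions, which is why the gene sitting at the median ratio (g3 here) drops out after normalization.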
The GRISSOM Portal
• http://www.grissom.gr
• Input:
  – Raw Dataset Files (various image formats, for cDNA/Affy)
  – Analysis Parameters
• Output:
  – Expressed Gene Lists
  – Interactive Graphs
  – Annotated Genes
  – References to similar Experiments
The GRISSOM Portal
• http://www.grissom.gr
• Distributed Database:
  – Data records are kept on the Portal, while the actual data reside on Storage Elements (SEs)
  – Interconnection with other open biological databases (EBI ArrayExpress, NCBI GEO) for finding related experiments
  – Annotation of genes performed using specialized databases (Biomart)
System Architecture Overview
• Main Components:
  – Web Portal (User Interface)
  – Local DB
  – Grid Middleware
    • PHP + Java
    • gLite 3.1
  – Parallel Execution Code (MPI + Octave); job submission through gLite DAG, for fully distributed code execution, is under development
  – Grid Storage Elements
System Architecture Overview
Analysis steps are executed with MPI across multiple nodes.
The number of nodes is equal to the number of experimental conditions in each experiment.
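The one-node-per-condition mapping can be sketched as follows. This is not the platform's code: GRISSOM runs MPI with Octave on grid nodes, whereas this Python sketch uses a local thread pool as a stand-in, with one worker per condition. The per-condition work (averaging replicates), the condition names and the values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def summarize_condition(item):
    """Stand-in for the per-condition analysis: average the replicates."""
    condition, replicates = item
    return condition, mean(replicates)


def run_per_condition(data):
    """Process every condition on its own worker, one worker per condition."""
    workers = len(data)  # mirrors "number of nodes = number of conditions"
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(summarize_condition, data.items()))


# Hypothetical experiment: three conditions with three replicates each.
data = {
    "control": [1.0, 2.0, 3.0],
    "treated_6h": [4.0, 5.0, 6.0],
    "treated_24h": [7.0, 8.0, 9.0],
}
results = run_per_condition(data)
```

Because the worker count follows the number of conditions, an experiment with few conditions gains little parallelism regardless of how many replicates it has, which is worth keeping in mind when reading the performance figures.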
Case Study – Test Scenarios
The system was tested using multiple datasets that differ in size and architecture:
Description | GEO ID | Num. of nodes required | Num. of Conditions | Num. of Replicates
Transcriptional signature of wounded keratinocytes reveals selective roles for ERK1/2, P38 and PI3K signaling pathways | GSE6820 | 18 | 18 | 36
Gene expression variation in lymphocyte subpopulations in response to low dose of ionizing radiations | GSE6978 | 6 | 6 | 34
Embryonic stem cell differentiation induced by various chemicals: time course | GDS1823 | 3 or 9 | 3 or 9 | 18
Compensated and decompensated right ventricular hypertrophy at onset: time course | GDS1928 | 3 | 3 | 27
Spatial gene expression in flowers (cDNA array) | GDS865 | 5 | 5 | 15
NHRF provided Dataset | - | 5 | 5 | 15
Case Study – Performance Measures
Scenario | HellasGrid nodes used for GRISSOM run | GRISSOM Run (minutes) | Single Node Run (minutes) | Speed-up Ratio
GEO-GSE6820 - 18/36 | 18 | 80 | 400 | 5
GEO-GSE6978 - 6/34 | 6 | 79 | 340 | 4.30
GEO-GDS1823 - 9/18 | 9 | 38 | 186 | 4.89
GEO-GDS1928 - 3/27 | 3 | 35 | 63 | 1.83
GEO-GDS865 - 5/15 | 5 | 16 | 47 | 2.94
NHRF - 5/15 | 5 | 17 | 54 | 3.18
Single-node runs used a system with an Intel Core 2 Duo E4300 1.8 GHz processor and 2.0 GB RAM, running GNU Octave 2.1.73 on the Ubuntu Linux 7.10 operating system.
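The speed-up ratio in the table is the single-node run time divided by the GRISSOM run time. The sketch below (Python) recomputes it from the reported minutes and also derives parallel efficiency, that is speed-up divided by node count. Efficiency is not reported on the slides and is added here only as an aid to reading the numbers. Recomputing from the rounded minutes gives 1.80 for GDS1928 where the slide shows 1.83, presumably because the slide used unrounded timings.

```python
def speedup(single_node_minutes, grid_minutes):
    """Speed-up ratio: single-node run time divided by grid run time."""
    return single_node_minutes / grid_minutes


def parallel_efficiency(single_node_minutes, grid_minutes, nodes):
    """Speed-up per node; 1.0 would mean perfectly linear scaling."""
    return speedup(single_node_minutes, grid_minutes) / nodes


# (scenario, nodes, GRISSOM run minutes, single-node run minutes) from the table.
RUNS = [
    ("GEO-GSE6820", 18, 80, 400),
    ("GEO-GSE6978", 6, 79, 340),
    ("GEO-GDS1823", 9, 38, 186),
    ("GEO-GDS1928", 3, 35, 63),
    ("GEO-GDS865", 5, 16, 47),
    ("NHRF", 5, 17, 54),
]

# Map each scenario to (speed-up, efficiency).
report = {
    name: (speedup(single, grid), parallel_efficiency(single, grid, nodes))
    for name, nodes, grid, single in RUNS
}
```

Read this way, the 18-node run has the largest speed-up (5) but an efficiency of only about 0.28, while the 6-node run reaches about 0.72.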
Case Study – Performance Measures
Analysis Run using the Same Dataset with different Parallelization Level.
First Run: 3 Nodes - Second Run: 9 Nodes
Grid-related Performance Limitations
• Different node H/W generations
• Heterogeneity of node-installed S/W (esp. regarding biocomputing packages like Bioconductor)
• Maintenance Issues
GRISSOM Platform Screenshots
GRISSOM Platform Benefits
• Parallelization
• Time optimization
• User Transparency
• Automated Job Submission + Monitoring
• Open Access Biological Experiment Repository
• Shell fully concealing the Grid
• GRISSOM Development Team
  – Aristotle Chatziioannou
  – Ilias Maglogiannis
  – Ioannis Kanaris
  – Charalambos Doukas
  – Eleftherios Pilalis
  – Panagiotis Moulos
• Under the supervision of the Institute of Biological Research & Biotechnology, National Hellenic Research Foundation
  – Fragiskos Kolisis
• In collaboration with the National Documentation Center, National Hellenic Research Foundation
• Funded by ACT 119153 (NISR+Τ) 3rd Phase ”Open Access Repositories and Electronic Journals” supervised by the Greek Information Society /FP6
http://www.grissom.gr
Thank you
Questions?