Introduction to grid technologies and practical usage of gLite middleware
Pavol Štefko, Lukáš Chlad, Lukáš Fajt, Ondřej Ticháček, Alina Pranovich
Grid Computing
● Grid computing is the collection of computer resources from multiple locations to reach a common goal.
● A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.
● Grid computing tends to be more loosely coupled, heterogeneous and geographically dispersed than cluster computing.
● A virtual organization (VO) is a set of individuals and/or institutions that share grid resources under common rules.
Large Hadron Collider
● Enormous data stream: 15 PB per year.
● More than 1000 active participants.
● Need for joint access to the experimental data on a global scale, and for processing and analysing it on computing resources worldwide.
● Development of a new distributed computing technology: the grid.
● Worldwide LHC Computing Grid (WLCG): Tier-0 (CERN), Tier-1 (11), Tier-2 (JINR)
● Tier-3 monitoring toolkit, DDM DQ2 deletion service, remote access to ATLAS and CMS,
PrimeGrid
● Distributed computing project searching for prime numbers of world-record size.
● As of September 2011, there were about 7500 active participants from 114 countries, contributing 1.663 petaflops (quadrillion operations per second) of processing power.
● Sub-projects
○ 321 Prime Search
○ Twin Prime Search
○ AP26 Search
● You can also join one of the PrimeGrid projects.
● With a little patience, you may find a large or even record-breaking prime and enter Chris Caldwell's The Largest Known Primes Database as a Titan!
Heart Simulator
● Current computational models track the electro-mechanics of the heart from sub-cellular to the whole-organ level.
● The models allow a better comprehension of important cardiac diseases, such as Ventricular Arrhythmia, Myocarditis, Infarct, Chagas Disease, Diabetes, etc.
● The models form a complicated system of partial and ordinary differential equations, so they benefit from the computational power of the grid.
● This application was gridified on the EELA (E-Science Grid Facility for Europe and Latin America) infrastructure.
Pierre Auger Experiment
● Using the grid for simulation and modeling of ultra-high-energy cosmic rays (UHE CR)
● Physics background
○ UHE CR have energies in the range 10^18 to 10^20 eV
○ Detectors are of two types (using Cherenkov radiation and ultraviolet light)
● Computational needs
○ Middleware: gLite
○ Software packages: Fortran, CORSIKA (COsmic Ray SImulations for KAscade), Aires (AIRshower Extended Simulations), ROOT, Geant4, ...
[Figures: modeling of particle showers; simulation of particle showers]
Genealogical Sorting Index (GSI)
The Cummings Laboratory
● A statistic to quantify the degree of exclusive ancestry of individuals in labeled groups on a rooted tree.
● Open source package
● Web interface
1. Upload a tree file.
2. View any tree in your input file. Individuals will be in different colors according to their class assignments.
3. Assign class labels for the individuals either by uploading a class label assignment file.
● Search for Extra-Terrestrial Intelligence (SETI)
● Analysis of radio signals from space
● Distributed computing among home PCs
○ 150 000 active participants / 225 000 hosts
○ BOINC software
○ free CPU/GPU time
Work with gLite
We worked on an educational grid, the virtual organization "edu". It consists of storage elements (SEs) and computing elements (CEs), which can be listed with lcg-info --list-se and lcg-info --list-ce.
Workload Management System (WMS)
● The purpose of the WMS:
- accept user jobs
- assign them to an appropriate CE
- record the status of a job
- retrieve the results of a job
● Each grid job is described using a Job Description Language (JDL) file
● This file specifies the job's characteristics and constraints used to find the most suitable CE
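To make the idea of a JDL file concrete, here is a minimal sketch in Python that assembles the text of a description for a normal job. The attribute names (Executable, Arguments, StdOutput, StdError, OutputSandbox, Requirements) are standard JDL attributes; the helper make_jdl and the file names std.out and std.err are our own illustrative choices, not part of gLite.

```python
def make_jdl(executable, arguments="", stdout="std.out", stderr="std.err",
             requirements=None):
    """Return the text of a minimal JDL description for a normal job.

    make_jdl is a hypothetical helper; the file names are assumptions.
    """
    lines = [f'Executable = "{executable}";']
    if arguments:
        lines.append(f'Arguments = "{arguments}";')
    lines.append(f'StdOutput = "{stdout}";')
    lines.append(f'StdError = "{stderr}";')
    # Files listed in the output sandbox are returned to the user.
    lines.append(f'OutputSandbox = {{"{stdout}", "{stderr}"}};')
    if requirements:
        # The constraint expression is passed through verbatim.
        lines.append(f"Requirements = {requirements};")
    return "\n".join(lines) + "\n"


# A job that simply runs the hostname command on the chosen CE.
jdl_text = make_jdl("/bin/hostname")
print(jdl_text)
```

Saving jdl_text to a file such as job.jdl would give an input for the submission commands shown later.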
More information about CEs (such as the number of CPUs and the number of jobs being executed) can be obtained with lcg-infosites --vo edu ce
In a similar way we can get more info about SEs - lcg-infosites --vo edu se
Basic job types
● normal job - the simplest type of job
● job collection - submission of a set of jobs whose description files are placed in one directory
● parametric job - submission of a set of jobs with identical descriptions apart from the values of the parametric attributes. Use it when your jobs differ only in argument values or input/output files.
● MPI (Message Passing Interface) job
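The idea behind a parametric job can be illustrated with a short Python sketch. We assume the gLite JDL convention in which the placeholder _PARAM_ is replaced by the current parameter value; on the grid this expansion is done by the WMS, and the function expand_parametric and the file names below are only our illustration.

```python
def expand_parametric(template, start, stop, step=1):
    """Return one job description per parameter value.

    Imitates the substitution of the _PARAM_ placeholder; this is a
    hypothetical helper, not a gLite API.
    """
    return [template.replace("_PARAM_", str(value))
            for value in range(start, stop, step)]


# Hypothetical template: each job reads a different input image.
template = 'Arguments = "input_PARAM_.jpg";\nStdOutput = "out_PARAM_.txt";'
jobs = expand_parametric(template, 0, 3)

for job in jobs:
    print(job)
    print("---")
```

Three descriptions are produced, differing only in the file names, which is exactly the case where a parametric job saves writing three separate JDL files.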
MPI Jobs
● Message Passing Interface - currently the most commonly used interface for the development of parallel applications.
● Parallel applications span several processes, which are usually mapped onto an equal number of processors or physical CPUs.
● These processes communicate with one another by exchanging information, which MPI terminology calls message passing.
● The goal is to reduce the overall execution time of the application.
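As a sketch of how work is divided among processes, the following plain Python code estimates π by integrating 4/(1+x²) over [0, 1] with the midpoint rule. It does not use MPI: the ranks are simulated by a loop, and the final sum stands in for the reduction step that a real MPI program would perform. The function names and the choice of 4 processes are our assumptions.

```python
import math


def partial_pi(rank, size, n):
    """Sum the midpoint-rule terms assigned to one simulated process."""
    h = 1.0 / n
    total = 0.0
    # Cyclic distribution: rank r handles intervals r, r+size, r+2*size, ...
    for i in range(rank, n, size):
        x = h * (i + 0.5)
        total += 4.0 / (1.0 + x * x)
    return total * h


def estimate_pi(size=4, n=100_000):
    """Combine the partial sums of all simulated processes."""
    # In a real MPI program each rank would compute its partial sum
    # concurrently and a reduction would add the results together.
    return sum(partial_pi(rank, size, n) for rank in range(size))


pi_estimate = estimate_pi()
print(pi_estimate, abs(pi_estimate - math.pi))
```

Each simulated process does roughly 1/size of the work, which is where the reduction in execution time comes from when the processes really run in parallel.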
Job Submission
● List the CEs that can accept the job: glite-wms-job-list-match -a job.jdl
● Submit job.jdl: glite-wms-job-submit -a -o job.id job.jdl
● Check the status: glite-wms-job-status -i job.id
○ Possible states: Submitted, Waiting, Ready, Scheduled, Running, Done, Cleared, Aborted, Cancelled and Purged
● Retrieve the results: glite-wms-job-output --dir results -i job.id
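The submission cycle above could be scripted, for example, as follows. This Python sketch only builds the command lines listed above as argument lists and does not run anything; the function names are ours. Treating Done, Cleared, Aborted, Cancelled and Purged as the states in which polling can stop is also our assumption.

```python
JOB_STATES = ("Submitted", "Waiting", "Ready", "Scheduled", "Running",
              "Done", "Cleared", "Aborted", "Cancelled", "Purged")

# Assumption: no further progress is expected once a job is in these states.
STOP_POLLING = {"Done", "Cleared", "Aborted", "Cancelled", "Purged"}


def job_workflow(jdl="job.jdl", id_file="job.id", out_dir="results"):
    """Return the four commands of the cycle as argument lists, in order."""
    return [
        ["glite-wms-job-list-match", "-a", jdl],
        ["glite-wms-job-submit", "-a", "-o", id_file, jdl],
        ["glite-wms-job-status", "-i", id_file],
        ["glite-wms-job-output", "--dir", out_dir, "-i", id_file],
    ]


def should_keep_polling(state):
    """True while the job is in a known state that may still change."""
    if state not in JOB_STATES:
        raise ValueError(f"unknown job state: {state}")
    return state not in STOP_POLLING


commands = job_workflow()
for argv in commands:
    print(" ".join(argv))
```

On a machine with the gLite user interface installed, each argument list could be handed to subprocess.run; here they are only printed.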
Data Management System
● Storing data on different storage systems creates the need for a common interface: SRM (Storage Resource Manager)
● On the grid there are files and replicas of files
● Each file has a GUID (Grid Unique IDentifier) and may have one or more LFN (Logical File Name) aliases
● The client always communicates with an SE through SRM and keeps track of all their files through the LFC (LCG File Catalogue)
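The relation between GUIDs, LFN aliases and replicas can be pictured with a toy model in Python. This is not the LFC interface; the class ToyCatalogue, the LFNs and the storage locations are hypothetical and only show which identifier maps to what.

```python
import uuid


class ToyCatalogue:
    """Toy model of a file catalogue (not the real LFC API)."""

    def __init__(self):
        self.replicas = {}  # GUID -> list of replica locations on SEs
        self.aliases = {}   # LFN -> GUID

    def register(self, lfn, location):
        """Register a new file: create a GUID, first replica and first alias."""
        guid = str(uuid.uuid4())
        self.replicas[guid] = [location]
        self.aliases[lfn] = guid
        return guid

    def add_alias(self, guid, lfn):
        """Give an existing file one more logical name."""
        if guid not in self.replicas:
            raise KeyError(guid)
        self.aliases[lfn] = guid

    def add_replica(self, guid, location):
        """Record another copy of the same file on a different SE."""
        self.replicas[guid].append(location)

    def locate(self, lfn):
        """Resolve a logical name to all known replica locations."""
        return list(self.replicas[self.aliases[lfn]])


catalogue = ToyCatalogue()
# Hypothetical names, for illustration only.
guid = catalogue.register("lfn:/grid/edu/demo/image.jpg", "srm://se1.example.org/image.jpg")
catalogue.add_alias(guid, "lfn:/grid/edu/demo/picture.jpg")
catalogue.add_replica(guid, "srm://se2.example.org/image.jpg")
print(catalogue.locate("lfn:/grid/edu/demo/picture.jpg"))
```

Both logical names resolve to the same GUID and therefore to the same two replicas, which is why a file can be renamed or replicated without breaking references to it.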
Data Management System
Examples of use:
Get a list of files and directories
Then you might want to create a new directory
And copy a file to an SE and register it in the catalogue
Or download a file from an SE
Data Management System
Examples of use:
You can add a new alias for the uploaded file and list all its aliases
Create a replica on an SE for a file identified by its GUID
And when the work is done you can delete the file
Our experience with the grid
● A very simple data management task in which we used the hostname command
● Image processing - resizing our images (*.jpg, *.png) on the grid
● Video processing - converting and resizing our videos (*.avi, *.flv) with ffmpeg
● Blender model visualization - rendering a few models
● MPI examples - the value of π, numerical integration
● ELMER, FDS, MEEP
Thank you for your attention!