Posted on 02-Mar-2018
Use SLURM job scheduling system on π supercomputer
SJTU HPC Center hpc@sjtu.edu.cn
Jan 7th, 2016
1 Part I: A brief introduction to SJTU π supercomputer
2 Part II: Job submission via SLURM scheduling system
3 Part III: Smart environment modules
4 Part IV: Tips for monitoring your jobs
Part I: A brief introduction to SJTU π supercomputer
SJTU π: A computer cluster
- Multiple nodes connected by ultra-high-speed networks
- A virtual computer under programming abstractions (OpenMP, MPI)
- CPUs with low clock frequency, high parallelism, and high aggregated compute power
Different Compute Queues on π
Queues are pools of compute nodes on π:
- cpu: 2x CPUs with 16 cores in total, 64 GB memory
- fat: 2x CPUs with 16 cores in total, 256 GB memory
- gpu: 2x CPUs with 16 cores in total, 2x K20m GPUs, 64 GB memory
- k40: 2x CPUs with 16 cores in total, 2x K40 GPUs, 64 GB memory
Assistant Programs for HPC users
- SLURM: the job scheduling system
- Environment Modules: load and unload libraries with ease
- Ganglia monitoring page: http://pi.sjtu.edu.cn/ganglia
Part II: Job submission via SLURM scheduling system
Why another job scheduling system (SLURM)?
SLURM just works:
- Free and open source
- Proven scalability and reliability
- Fast
- Debug friendly: SSH login to compute hosts while jobs are running
Known issues of SLURM on π
- NO QoS or job quotas yet
- Unexpected library behaviors on SLURM
- A hard limit of 24 hours of walltime per job
- Job privacy is NOT enabled yet
Resources on SLURM will be available until Jan 31st, 2016.
SLURM Overview
LSF             SLURM             Function
                sinfo             Cluster status
bjobs           squeue            Job status
bsub            sbatch            Job submission
bkill [JOB_ID]  scancel [JOB_ID]  Job deletion
sinfo: check cluster status
Host states: drain (something wrong), alloc (in use), idle, down.
PARTITION AVAIL TIMELIMIT  NODES STATE  NODELIST
cpu*      up    1-00:00:00 1     drain  node001
cpu*      up    1-00:00:00 31    alloc  node[002-032]
gpu       up    1-00:00:00 4     alloc  gpu[47-50]
fat       up    1-00:00:00 2     alloc  fat[19-20]
k40       up    1-00:00:00 2     alloc  mic[01-02]
k40       up    1-00:00:00 2     idle   mic[03-04]
fail      up    2-00:00:00 1     down*  node222
squeue: check job status
Job statuses: R (Running), PD (Pending).

JOBID PARTITION NAME    USER    ST TIME     NODES NODELIST(REASON)
2402  fat       add_upc hpctheo PD 0:00     2     (Resources)
2313  cpu       hbn310  physh   R  23:49:00 2     node[003,008]
Prepare to submit a job
Make sure you know:
- which partition or queue to use
- how many CPU cores in total
- how many CPU cores on each host
- whether GPUs are required
- the maximum expected runtime
sbatch usage
SLURM:

    sbatch jobscript.slurm

vs. LSF:

    bsub < jobscript.lsf
sbatch options
LSF                     SLURM                      Meaning
-n [count]              -n [count]                 Total processes
-R "span[ptile=count]"  --ntasks-per-node=[count]  Processes per host
-q [queue]              -p [partition]             Job queue/partition
-J [name]               --job-name=[name]          Job name
-o [file_name]          --output=[file_name]       Standard output file
-e [file_name]          --error=[file_name]        Standard error file
-W [hh:mm:ss]           --time=[dd-hh:mm:ss]       Max walltime
SJTU HPC Center hpc@sjtu.edu.cn Use SLURM job scheduling system on π supercomputer Jan 7th, 2016 15 / 32
Part II: Job submission via SLURM scheduling system
sbatch options (continued)
LSF                SLURM                        Meaning
-x                 --exclusive                  Use the hosts exclusively
                   --mail-type=[type]           Notification type
-u [mail_address]  --mail-user=[mail_address]   Email for notification
                   --nodelist=[nodes]           Preferred job hosts
                   --exclude=[nodes]            Job hosts to avoid
                   --dependency=[state:job_id]  Job dependency
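As an illustrative sketch of the dependency option (spelled --dependency in sbatch), the script below would be held until an earlier job exits successfully. The job ID 12345, the job name, and the resource counts are placeholders, not values from this cluster:

```shell
#!/bin/bash
# Hypothetical sketch: hold this job until job 12345 (a placeholder ID)
# exits successfully, then run a post-processing step.
#SBATCH --job-name=postprocess
#SBATCH --partition=cpu
#SBATCH -n 16
#SBATCH --ntasks-per-node=16
#SBATCH --dependency=afterok:12345
#SBATCH --time=00:10:00

echo "predecessor finished, starting post-processing"
```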
A sbatch example (CPU)
#SBATCH --job-name=LINPACK
#SBATCH --partition=cpu
#SBATCH -n 64
#SBATCH --ntasks-per-node=16
#SBATCH --mail-type=end
#SBATCH --mail-user=YOU@EMAIL.COM
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --time=00:20:00
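The directives above only declare resources; a complete script also needs a shebang and a launch line. A minimal sketch follows, where the module names and the xhpl binary are assumptions for illustration, not the actual setup on π:

```shell
#!/bin/bash
#SBATCH --job-name=LINPACK
#SBATCH --partition=cpu
#SBATCH -n 64
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:20:00

# Assumed toolchain and launch line; substitute your own modules and program.
module load icc impi mkl
mpirun ./xhpl
```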
A sbatch example (GPU)
#SBATCH --job-name=GPU_HPL
#SBATCH --partition=k40
#SBATCH -n 4
#SBATCH --ntasks-per-node=2
#SBATCH --exclusive
#SBATCH --mail-type=end
#SBATCH --mail-user=YOU@MAIL.COM
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --time=00:30:00
Part III: Smart environment modules
Fresh new modules for SLURM: /lustre/usr/modulefiles/pi
- Loaded automatically on the mu07 login node
- Smart enough to derive the combination of compilers and MPI libraries
- The number of modules is still growing
- Samples for job submission: /lustre/utility/pi-code-sample
Simplified module loading
$ module load gcc/4.9 openmpi/gcc49/1.8 fftw3/gcc49/openmpi8/3.3
vs.
$ module load gcc/4.9 openmpi/1.8 fftw3/3.3
vs.
$ module load gcc openmpi fftw3
Optimized builds of libraries and software

gcc icc jdk perl python R pgi
impi openmpi mvapich2 mpich
mkl atlas lapack openblas mpc gmp mpfr gsl eigen
abyss samtools smufin gatk maq bwa bowtie
openfoam cgal gromacs
hdf5 netcdf scotch ffmpeg szip
cuda (cublas included) cudnn caffe
Open-source alternatives to MATLAB and IDL

- GNU Octave: https://www.gnu.org/software/octave/
- GNU Data Language (GDL)
Part IV: Tips for monitoring your jobs
Wait, is my application running well?
You can confirm your application's state by:
- Comparing performance between π and your laptop
- Comparing performance between π and existing traces or benchmarks
- Monitoring the application (more precisely, the compute nodes) via http://pi.sjtu.edu.cn/ganglia
- Asking the administrators at support@lists.hpc.sjtu.edu.cn for help
Communicate with HPC administrators
Yelling “XXX is slow” doesn’t help. Please report the following information:
- Job ID
- Website of your application
- Expected results and what actually happened
Email to support@lists.hpc.sjtu.edu.cn is always preferable to a phone call.
Monitor what? Load, CPU, Mem, Network
Please check http://pi.sjtu.edu.cn/ganglia:
- Load: the number of “threads”; should be approximately 16, the number of CPU cores
  - Below 16: starving
  - Above 16: overloaded
- CPU report: charts in yellow are good; sys and wait should be less than 5%
- Mem: do NOT exceed the physical capacity
- Network: Ethernet traffic should be less than 1 MB/s
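As a rough local check of the load rule above, you can compare the 1-minute load average against the core count directly on a node; a minimal sketch using standard Linux tools (nproc, /proc/loadavg):

```shell
# Compare the 1-minute load average with the number of CPU cores.
cores=$(nproc)
load=$(cut -d ' ' -f1 /proc/loadavg)
echo "load=${load} cores=${cores}"
# Flag overload when load exceeds the core count (cf. the 16-core rule above).
awk -v l="$load" -v c="$cores" 'BEGIN { if (l+0 > c+0) print "overloaded"; else print "ok" }'
```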
Case 1: Overload due to too many processes
Caused by an incorrect setting of the number of cores, or imbalanced load between nodes.
Case 2: Excessively high sys utilization
Caused by linking or loading incorrect MPI libraries, or a hardware issue.
Case 3: Memory usage exceeds capacity
The data set is simply too large. Please try the fat queue.
Case 4: Inefficient use of Ethernet
Caused by linking or loading incorrect MPI libraries, or an Infiniband driver issue.
A workaround: export I_MPI_FABRICS=shm:dapl
References

- SJTU π documents: http://pi.sjtu.edu.cn/pi
- ACCRE's SLURM documentation: http://www.accre.vanderbilt.edu/?page_id=2154
- Job samples for the π supercomputer: http://pi.sjtu.edu.cn/doc/samples/
- Remote desktop via NoMachine: http://pi.sjtu.edu.cn/doc/rdp/
- Environment Modules on π: http://pi.sjtu.edu.cn/doc/module/