Use SLURM job scheduling system on π supercomputer

SJTU HPC Center hpc@sjtu.edu.cn

Jan 7th, 2016


1 Part I: A brief introduction to SJTU π supercomputer

2 Part II: Job submission via SLURM scheduling system

3 Part III: Smart environment modules

4 Part IV: Tips for monitoring your jobs


Part I: A brief introduction to SJTU π supercomputer


SJTU π: A computer cluster

- Multiple nodes connected by ultra-high-speed networks
- A virtual computer under programming abstraction (OpenMP, MPI)
- CPUs with low clock frequency, high parallelism, and high aggregated compute power


Different Compute Queues on π

Queues are pools of compute nodes on π:
- cpu: 2x CPUs with 16 cores in total, 64 GB memory
- fat: 2x CPUs with 16 cores in total, 256 GB memory
- gpu: 2x CPUs with 16 cores in total, 2x K20m GPUs, 64 GB memory
- k40: 2x CPUs with 16 cores in total, 2x K40 GPUs, 64 GB memory


Assistant Programs for HPC users

- SLURM: the job scheduling system
- Environment Modules: load and unload libraries with ease
- Ganglia monitoring page: http://pi.sjtu.edu.cn/ganglia


Part II: Job submission via SLURM scheduling system


Why another job scheduling system (SLURM)?

SLURM just works:
- Free and open source
- Proven scalability and reliability
- Fast
- Debug friendly: SSH login to compute hosts while jobs are running


Known issues of SLURM on π

- No QoS or job quotas yet
- Unexpected library behaviors on SLURM
- A hard limit of 24 hours on walltime per job
- Job privacy is NOT enabled yet

Resources on SLURM will be available until Jan 31st, 2016.


SLURM Overview

LSF              SLURM              Function
                 sinfo              Cluster status
bjobs            squeue             Job status
bsub             sbatch             Job submission
bkill [JOB_ID]   scancel [JOB_ID]   Job deletion
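For example, a day-to-day LSF workflow maps one-to-one onto SLURM commands; the script name and job ID below are just placeholders:

$ sinfo                # check cluster status
$ sbatch job.slurm     # submit a job script
$ squeue               # check job status
$ scancel 2402         # delete a job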


sinfo: check cluster status

Host states: drain (something wrong), alloc (in use), idle, down.

PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
cpu*      up     1-00:00:00     1  drain  node001
cpu*      up     1-00:00:00    31  alloc  node[002-032]
gpu       up     1-00:00:00     4  alloc  gpu[47-50]
fat       up     1-00:00:00     2  alloc  fat[19-20]
k40       up     1-00:00:00     2  alloc  mic[01-02]
k40       up     1-00:00:00     2  idle   mic[03-04]
fail      up     2-00:00:00     1  down*  node222
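To narrow the output to a single queue, sinfo accepts a partition filter, for example:

$ sinfo -p k40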


squeue: check job status

Job states: R (Running), PD (Pending).

JOBID  PARTITION     NAME     USER  ST      TIME  NODES  NODELIST(REASON)
 2402        fat  add_upc  hpctheo  PD      0:00      2  (Resources)
 2313        cpu   hbn310    physh   R  23:49:00      2  node[003,008]
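To list only your own jobs rather than everyone's, squeue can filter by user name:

$ squeue -u $USER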


Prepare to submit a job

Make sure you know:
- which partition or queue to use
- how many CPU cores in total
- how many CPU cores on each host
- whether GPUs are required
- the expected maximum runtime


sbatch usage

SLURM:
sbatch jobscript.slurm

vs. LSF:
bsub < jobscript.lsf


sbatch options

LSF                     SLURM                       Meaning
-n [count]              -n [count]                  Total processes
-R "span[ptile=count]"  --ntasks-per-node=[count]   Processes per host
-q [queue]              -p [partition]              Job queue/partition
-J [name]               --job-name=[name]           Job name
-o [file_name]          --output=[file_name]        Standard output file
-e [file_name]          --error=[file_name]         Standard error file
-W [hh:mm:ss]           --time=[dd-hh:mm:ss]        Max walltime
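All of these options can be given either as #SBATCH directives inside the job script (as in the examples that follow) or directly on the sbatch command line; the values here are only illustrative:

$ sbatch -p cpu -n 32 --ntasks-per-node=16 --time=02:00:00 jobscript.slurm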


sbatch options (continued)

LSF                SLURM                        Meaning
-x                 --exclusive                  Use the hosts exclusively
                   --mail-type=[type]           Notification type
-u [mail_address]  --mail-user=[mail_address]   Email for notification
                   --nodelist=[nodes]           Job host preference
                   --exclude=[nodes]            Job hosts to avoid
                   --depend=[state:job_id]      Job dependency
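The dependency option is handy for chaining jobs, e.g. starting a post-processing step only after an earlier job finishes successfully. The job ID and script name below are placeholders, and --dependency is the full spelling of the option:

$ sbatch --dependency=afterok:2402 postprocess.slurm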


A sbatch example (CPU)

#SBATCH --job-name=LINPACK
#SBATCH --partition=cpu
#SBATCH -n 64
#SBATCH --ntasks-per-node=16
#SBATCH --mail-type=end
#SBATCH --mail-user=[email protected]
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --time=00:20:00
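The slide lists only the #SBATCH directives; a complete script also needs a shebang and the commands to run. A minimal sketch, assuming an Intel toolchain and an HPL binary named xhpl (both assumptions, adjust to your application):

#!/bin/bash
# (the #SBATCH directives shown above go here)
module load icc impi mkl     # assumed toolchain; load whatever your code needs
mpirun ./xhpl                # xhpl is a placeholder executable name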


A sbatch example (GPU)

#SBATCH --job-name=GPU_HPL
#SBATCH --partition=k40
#SBATCH -n 4
#SBATCH --ntasks-per-node=2
#SBATCH --exclusive
#SBATCH --mail-type=end
#SBATCH --mail-user=[email protected]
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH --time=00:30:00
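Again, only the directives are shown on the slide. A minimal sketch of the rest of the script, assuming a CUDA-enabled HPL binary called xhpl_cuda (a placeholder name):

#!/bin/bash
# (the #SBATCH directives shown above go here)
module load icc impi cuda    # assumed toolchain for a GPU build
mpirun ./xhpl_cuda           # placeholder executable name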


Part III: Smart environment modules


Fresh new modules for SLURM: /lustre/usr/modulefiles/pi

- Loaded automatically on the mu07 login node
- Smart enough to derive the combination of compilers and MPI libraries
- The number of modules is still growing
- Samples for job submission: /lustre/utility/pi-code-sample
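The module path is loaded automatically on mu07, as noted above; on a shell where it is not, it can be added by hand and browsed with the standard Environment Modules commands:

$ module use /lustre/usr/modulefiles/pi
$ module avail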


Simplified module loading

$ module load gcc/4.9 openmpi/gcc49/1.8 fftw3/gcc49/openmpi8/3.3

vs.

$ module load gcc/4.9 openmpi/1.8 fftw3/3.3

vs.

$ module load gcc openmpi fftw3
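Two other standard module commands are useful when switching toolchains: listing what is currently loaded, and starting again from a clean environment:

$ module list
$ module purge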


Optimized builds of libraries and software

gcc icc jdk perl python R pgi
impi openmpi mvapich2 mpich
mkl atlas lapack openblas mpc gmp mpfr gsl eigen
abyss samtools smufin gatk maq bwa bowtie
openfoam cgal gromacs
hdf5 netcdf scotch ffmpeg szip
cuda (cublas included) cudnn caffe


Open-source alternatives to MATLAB and IDL

- GNU Octave: https://www.gnu.org/software/octave/
- GNU Data Language (GDL)


Part IV: Tips for monitoring your jobs


Wait, is my application running well?

You can confirm your application's state by:
- Comparing performance between π and your laptop.
- Comparing performance between π and existing traces or benchmarks.
- Monitoring the application (the compute nodes, more exactly) via http://pi.sjtu.edu.cn/ganglia.
- Asking the administrators (hpc@sjtu.edu.cn) for help.
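If job accounting is enabled on the SLURM deployment (an assumption; it may not be on π yet), sacct also gives a quick command-line summary of a running or finished job; 2313 is a placeholder job ID:

$ sacct -j 2313 --format=JobID,State,Elapsed,MaxRSS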


Communicate with HPC administrators

Yelling “XXX is slow” doesn’t help. Please report the following information:
- Job ID;
- Website of your application;
- Expected results and what actually happened.

Email (hpc@sjtu.edu.cn) is always preferable to a phone call.


Monitor what? Load, CPU, Mem, Network

Please check http://pi.sjtu.edu.cn/ganglia:
- Load: the number of “threads”; should be approximately 16, the number of CPU cores.
  - Below 16: starving
  - Above 16: overload
- CPU report: charts in yellow are good; sys and wait should be less than 5%.
- Mem: do NOT exceed the physical capacity.
- Network: Ethernet traffic should be less than 1 MB/s.
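Because SSH login to compute hosts is allowed while your job is running (see Part II), the same metrics can be checked directly on a node; node002 is just an example host from the sinfo output earlier:

$ ssh node002
$ uptime    # load averages
$ top       # per-process CPU and memory usage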


Case 1: Overload due to too many processes

Caused by an incorrect setting of the number of cores, or by unbalanced load between nodes.


Case 2: Too high sys utilization

Caused by linking or loading incorrect MPI libraries, or by a hardware issue.


Case 3: Exceeding Memory Capacity

The data is just too fat. Please try the fat queue.


Case 4: Inefficient Use of Ethernet

Caused by linking or loading incorrect MPI libraries, or by an InfiniBand driver issue.

A workaround: export I_MPI_FABRICS=shm:dapl


References

- SJTU π documents: http://pi.sjtu.edu.cn/pi
- ACCRE’s SLURM documentation: http://www.accre.vanderbilt.edu/?page_id=2154
- Job samples for the Pi supercomputer: http://pi.sjtu.edu.cn/doc/samples/
- Remote desktop via NoMachine: http://pi.sjtu.edu.cn/doc/rdp/
- Environment Modules on Pi: http://pi.sjtu.edu.cn/doc/module/
