Real Time Power and Performance Monitoring of Supercomputer Application
Shankar Prajapati BS in Computer Science
Claflin University
Nate Rini (Mentor)
History and Progress of Supercomputers
3 flops vs. 34,000,000,000,000,000 flops
Our demand for faster supercomputers is increasing faster than the combined might of Moore’s law and Dennard scaling.
Most of the world’s computing superpowers have announced their intentions to create exascale (1000 petaflops) supercomputers by 2020.
Supercomputer in progress
The speed limit for modern supercomputers is now set by power consumption.
Relationship
Performance ∝ Power
How can applications best be optimized to use the available system resources efficiently?
We are monitoring from the system level
How?
Objectives
Understand the relationship between power consumption and performance
Monitor the power and performance of supercomputer applications
Do so without interfering with the users
Benchmarking
Measuring Performance
High Performance LINPACK (HPL)
Tools to measure Efficiency and Performance
High Performance LINPACK (HPL)
Since 1993, the fastest supercomputers have been ranked on the TOP500 list according to their LINPACK benchmarking results.
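As a rough illustration of how a LINPACK-style result turns into a flops figure, the sketch below uses the conventional operation count for solving a dense n x n system by LU factorization, 2/3 n^3 plus a lower-order n^2 term. The exact lower-order coefficient, the function names, and the sample numbers are assumptions for illustration, not values from this project.

```python
def hpl_flop_count(n: int) -> float:
    """Approximate floating-point operations for a dense n x n LU solve.

    The 2/3 n^3 leading term is standard; the 2 n^2 term is the
    conventional lower-order estimate (an assumption here).
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Sustained rate in Gflop/s for a problem of size n solved in `seconds`."""
    return hpl_flop_count(n) / seconds / 1e9


def efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of theoretical peak actually sustained (Rmax / Rpeak)."""
    return rmax_gflops / rpeak_gflops


if __name__ == "__main__":
    # Hypothetical run: problem size and wall time are made up.
    n, seconds = 20_000, 120.0
    rate = hpl_gflops(n, seconds)
    print(f"sustained: {rate:.1f} Gflop/s")
    print(f"efficiency vs. a hypothetical 60 Gflop/s peak: {efficiency(rate, 60.0):.2%}")
```

Dividing the sustained rate by measured power gives a flops-per-watt figure, which is the quantity a power and performance study ultimately cares about.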
Message Passing Interface (MPI)
High performance
Portability
Scalability
Most programs now running on highly parallel computers are built on the Message-Passing Interface, or MPI.
OSU Micro-Benchmarking
Bandwidth test
Latency test
Message Rate test
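The three tests above each reduce raw timings to a single figure. The sketch below shows that arithmetic only; the real OSU micro-benchmarks are C programs running over MPI, and the function names and sample timings here are hypothetical. It assumes a ping-pong pattern for latency (one-way latency is half the average round trip) and decimal megabytes for bandwidth.

```python
def one_way_latency_us(total_seconds: float, iterations: int) -> float:
    """One-way latency in microseconds from a ping-pong loop.

    Each iteration is one round trip, so divide by 2 * iterations.
    """
    return total_seconds / (2 * iterations) * 1e6


def bandwidth_mb_s(bytes_sent: int, seconds: float) -> float:
    """Bandwidth in MB/s, using 1 MB = 10^6 bytes (an assumption)."""
    return bytes_sent / seconds / 1e6


def message_rate(messages: int, seconds: float) -> float:
    """Messages delivered per second."""
    return messages / seconds


if __name__ == "__main__":
    # Made-up timings, purely to show the units.
    print(one_way_latency_us(total_seconds=2e-3, iterations=1000), "us")
    print(bandwidth_mb_s(bytes_sent=1_000_000, seconds=0.5), "MB/s")
    print(message_rate(messages=1000, seconds=0.5), "msg/s")
```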
The Ganglia test cluster we set up mimics the real cluster on a smaller scale.
Ganglia test cluster
In VirtualBox, we installed the latest version of CentOS.
We installed Ganglia from source on the main node.
We also installed the packages Ganglia depends on, including APR, libConfuse, expat, pkg-config, Python, PCRE, RRDtool, and a few others.
Ganglia Architecture
The Ganglia MONitor Daemon (GMOND)
The Ganglia METAdata Daemon (GMETAD)
Round Robin Database Tool (RRDtool)
PHP-based web interface
Web server
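To make the data flow between these components concrete: gmond serves its current cluster state as an XML document over TCP (port 8649 in the default configuration), and gmetad polls that document before storing values with RRDtool. The sketch below reads and parses such a document. The host name, the sample XML, and the function names are hypothetical, and the fetch function is not exercised here.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down example of the XML a gmond might return.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
  <CLUSTER NAME="test" LOCALTIME="0" OWNER="unspecified">
    <HOST NAME="node01" IP="10.0.0.1" REPORTED="0">
      <METRIC NAME="load_one" VAL="0.52" TYPE="float" UNITS=" "/>
      <METRIC NAME="cpu_user" VAL="12.5" TYPE="float" UNITS="%"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2" REPORTED="0">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float" UNITS=" "/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host: str, port: int = 8649) -> str:
    """Connect to a gmond TCP channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_metrics(xml_text: str) -> dict:
    """Return {host: {metric_name: value}}; values are floats when numeric."""
    result = {}
    for host in ET.fromstring(xml_text).iter("HOST"):
        metrics = {}
        for metric in host.iter("METRIC"):
            raw = metric.get("VAL")
            try:
                metrics[metric.get("NAME")] = float(raw)
            except (TypeError, ValueError):
                metrics[metric.get("NAME")] = raw  # keep non-numeric values as text
        result[host.get("NAME")] = metrics
    return result


if __name__ == "__main__":
    for host, metrics in parse_metrics(SAMPLE_XML).items():
        print(host, metrics)
```

On a live cluster, `parse_metrics(fetch_gmond_xml("headnode"))` would return the same structure, where `"headnode"` stands in for whichever node runs gmond.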
Ganglia Web Interface for Jellystone test cluster
Ganglia load vs. time graph: LINPACK over 28 nodes on the Jellystone test cluster
Model-Specific Registers (MSRs)
Various control registers in the x86 instruction set used for debugging, program execution tracing, computer performance monitoring, and toggling certain CPU features.
MSR and Librapl
Librapl simplifies access to the RAPL values in the MSRs of modern Intel CPUs such as Sandy Bridge processors.
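The RAPL values are raw counters, so some arithmetic is needed before they read as watts. The sketch below is not librapl's API; it only illustrates the conversion, assuming the register layout documented by Intel: bits 12:8 of MSR_RAPL_POWER_UNIT (0x606) give the energy unit as 1/2^ESU joules, and MSR_PKG_ENERGY_STATUS (0x611) holds a 32-bit counter that wraps around. The raw values used below are illustrative, and actually reading MSRs requires privileged access that this sketch does not attempt.

```python
MSR_RAPL_POWER_UNIT = 0x606     # unit definitions (read once)
MSR_PKG_ENERGY_STATUS = 0x611   # package energy counter (read periodically)


def energy_unit_joules(power_unit_msr: int) -> float:
    """Decode the energy status unit (bits 12:8) into joules per count."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_power_watts(raw_before: int, raw_after: int,
                        unit_joules: float, seconds: float) -> float:
    """Average power between two counter reads, tolerating one 32-bit wrap."""
    delta = (raw_after - raw_before) & 0xFFFFFFFF
    return delta * unit_joules / seconds


if __name__ == "__main__":
    # 0xA1003 is an illustrative unit-register value with ESU = 16.
    unit = energy_unit_joules(0xA1003)
    print(f"energy unit: {unit * 1e6:.2f} microjoules per count")
    # Two made-up counter reads taken one second apart.
    print(f"{average_power_watts(1_000_000, 1_000_000 + 65536 * 50, unit, 1.0):.1f} W")
```

Masking the difference to 32 bits is what keeps the result correct when the counter overflows between two samples, which happens regularly on a busy package.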
Intel® Power Gadget
Software-based power usage monitoring tool for 2nd generation Intel Core processors or later.
• Package power limit
• Energy of the CPU/processor cores
• Energy of the processor graphics
Logfile data
Host metrics from rvitals
rvitals retrieves hardware vital information from the on-board Service Processor for a single or range of nodes and groups.
IBM iDataplex iPDU
The PDU is queried via SNMP, a set of protocols for managing complex networks.
The data collection includes power usage from Intel processors, PDUs and node power supplies.
The data is collected while running selected jobs.
Analysis of the sampled data helps us understand how jobs affect real-time power usage in supercomputers.
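One way to turn sampled power readings into a per-job figure is to integrate them over the job's run time. The sketch below does this with the trapezoidal rule; the sample values and function names are hypothetical, and the choice of integration method is an assumption rather than the method used in this project.

```python
def energy_joules(samples):
    """Integrate (timestamp_seconds, watts) samples with the trapezoidal rule.

    `samples` must be sorted by timestamp.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_power_watts(samples):
    """Mean power over the sampled interval."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) / duration


if __name__ == "__main__":
    # Made-up samples taken once a minute while a job runs.
    job_samples = [(0, 100.0), (60, 200.0), (120, 200.0)]
    print(f"energy: {energy_joules(job_samples):.0f} J")
    print(f"average power: {average_power_watts(job_samples):.1f} W")
```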
Data Collection
Intel® Power Gadget
IPMI via rvitals
IBM iDataplex iPDU via SNMP
Tools
VS
Observations
Conclusion
We were able to monitor the power usage of different nodes at one-minute granularity while jobs were running.
We also compared the data output from three different power measurement tools.
We achieved our goal of monitoring the real-time power and performance of a supercomputer.
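Comparing tools that sample at different moments requires putting their readings on a common time base first. The sketch below averages each tool's samples into one-minute buckets and reports the per-minute difference. The sample data and function names are hypothetical. Note also that the tools measure at different points (processor package, node, PDU outlet), so their absolute values are not expected to match.

```python
from collections import defaultdict


def per_minute_mean(samples):
    """Average (timestamp_seconds, watts) samples into one-minute buckets."""
    buckets = defaultdict(list)
    for timestamp, watts in samples:
        buckets[int(timestamp // 60)].append(watts)
    return {minute: sum(vals) / len(vals) for minute, vals in buckets.items()}


def per_minute_difference(samples_a, samples_b):
    """Return {minute: mean_a - mean_b} for minutes both tools cover."""
    mean_a = per_minute_mean(samples_a)
    mean_b = per_minute_mean(samples_b)
    return {m: mean_a[m] - mean_b[m] for m in sorted(mean_a.keys() & mean_b.keys())}


if __name__ == "__main__":
    # Made-up readings from two tools with different sampling times.
    tool_a = [(0, 50.0), (30, 70.0), (60, 80.0)]
    tool_b = [(10, 65.0), (70, 90.0)]
    print(per_minute_difference(tool_a, tool_b))
```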
Future
Acknowledgement
Supercomputer Systems Group
Shawn Strande
Nathan Rini (Mentor)
Irfan Elahi
Shawn Needham
Rohan Rodrigues
Jonathan Roberts
Stormy Knight
Tom Gowan
Benjamin Mathews
Ananta Tiwari
Laura Carrington
Questions