Deployment of Data-flow Applications in Many-core Architectures

UltrasoundToGo RTD 2013

Stefanos Skalistis and Alena Simalatsar

Rigorous System Design Laboratory (RiSD), EPFL

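[Figure: example data-flow graph with Task A, Task B, and Task C, and with Cluster 1 and Cluster 2. Labels: edge weights ω(e_AB), ω(e_AC); buffers b(e_AB), b(e_AC), b(e_BA); nodes I_A, T_A, F_A, I_B, T_B.]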

[Figure: deployment flow. Stages: Application Partitioning; Application Placement; Mapping, Scheduling, Buffer Allocation; Power Management; Real-Time Scheduling. Other labels: SMT Solver, Offline, Online.]

Image credit: Marcio Castro et al.

Many-core architectures:

+ Efficient for highly parallelizable applications

+ Support for power management

- Complex communication mechanisms

- Safe code generation is not straightforward

Kalray MPPA-256:

• 256 cores (400 MHz) in 16 clusters, each with 2 MB of shared scratchpad memory, connected by a 2D-torus NoC

• Scheduling constraints

• Resource constraints

• Power constraints
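As a rough illustration of the resource constraints listed above: the quoted MPPA-256 figures imply 16 cores per cluster, and the buffers mapped to one cluster have to fit in its scratchpad. The sketch below is a minimal check in Python. It assumes the 2 MB figure applies per cluster and that nothing else (code, stacks) occupies the scratchpad. The buffer names and sizes are hypothetical.

```python
# Platform figures as quoted on the poster for the Kalray MPPA-256.
TOTAL_CORES = 256
CLUSTERS = 16
SCRATCHPAD_BYTES = 2 * 1024 * 1024  # assumption: 2 MB available per cluster

CORES_PER_CLUSTER = TOTAL_CORES // CLUSTERS


def fits_in_cluster(buffer_sizes, capacity=SCRATCHPAD_BYTES):
    """Return True if all buffers placed on one cluster fit in its scratchpad."""
    return sum(buffer_sizes.values()) <= capacity


# Hypothetical buffer sizes in bytes, for illustration only.
cluster_1_buffers = {"b_AB": 512 * 1024, "b_AC": 1024 * 1024}
cluster_2_buffers = {"b_BA": 3 * 1024 * 1024}

print(CORES_PER_CLUSTER)
print(fits_in_cluster(cluster_1_buffers))
print(fits_in_cluster(cluster_2_buffers))
```

A real deployment would also have to budget for code and runtime data in the same memory, so the usable capacity would be smaller than the value assumed here.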

Providing real-time guarantees is challenging

Existing methods:

• Based on WCET → largely overestimated

• Do not focus on optimizing performance

Real-time performance and power efficiency:

• Real-time scheduling respecting time constraints

• Operating within power constraints

• Optimal use of resources

Motivation

Multi-criteria optimization decisions:

• Right degree of parallelism?

• Application partitioning and mapping to processing elements (PEs)?

• Size of buffers?

• Task scheduling with real-time constraints?

Deploying on many-core: Bridging safety-critical and best-effort

Hybrid approach:

Offline: Guarantees based on worst-case execution time

Online: Optimizations based on actual execution times
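The poster does not say which online optimizations are used. As one possible reading of the hybrid approach, the Python sketch below builds a static schedule from WCETs offline, then reclaims slack online by slowing a task down whenever its predecessor finished early. The task chain, the speed-scaling model (execution time inversely proportional to speed), and all numbers are assumptions made for illustration.

```python
def offline_schedule(wcets, deadline):
    """Offline: static start times for a task chain, budgeting each task its WCET."""
    starts, t = [], 0.0
    for w in wcets:
        starts.append(t)
        t += w
    if t > deadline:
        raise ValueError("chain is not schedulable under WCET")
    return starts


def online_run(wcets, starts, actual_times):
    """Online: pick the lowest speed factor (1.0 = full speed) per task such that
    the task still meets its offline finish time even if it takes its full WCET."""
    now, speeds = 0.0, []
    for w, s, a in zip(wcets, starts, actual_times):
        offline_finish = s + w
        window = offline_finish - now   # time available, including reclaimed slack
        speed = w / window              # <= 1.0 because now <= s
        speeds.append(speed)
        now += a / speed                # actual duration at the reduced speed
    return speeds, now


# Hypothetical task chain: WCETs and observed full-speed execution times.
wcets = [4.0, 6.0, 5.0]
actual = [2.0, 6.0, 2.5]
starts = offline_schedule(wcets, deadline=15.0)
speeds, finish = online_run(wcets, starts, actual)
print(starts, speeds, finish)
```

The offline guarantee is preserved by construction: no task is ever slowed beyond the point where its WCET would overrun the finish time fixed offline.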

Case study: 3D Ultrasound Imaging

• Data-flow application

• High degree of parallelism

• Medical nature requires guarantees

• Portability imposes power limitations

Finding solutions using SMT solvers:

Safe deployment without undermining performance
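To give a feel for the kind of problem handed to the solver, the Python sketch below encodes scheduling, resource, and deadline constraints for a three-task graph and searches for a mapping and start times with minimal makespan. It uses plain exhaustive search as a stand-in for an SMT solver. The task graph, WCETs, number of PEs, and deadline are hypothetical, and power constraints are left out.

```python
from itertools import product

WCET = {"A": 2, "B": 3, "C": 3}    # hypothetical worst-case execution times
EDGES = [("A", "B"), ("A", "C")]   # hypothetical precedence edges
PES = [0, 1]                       # two processing elements
DEADLINE = 6


def feasible(mapping, start):
    """Check one candidate mapping and schedule against all constraints."""
    tasks = list(WCET)
    # Scheduling constraint: a consumer starts after its producer finishes.
    for u, v in EDGES:
        if start[v] < start[u] + WCET[u]:
            return False
    # Resource constraint: tasks mapped to the same PE must not overlap.
    for i, u in enumerate(tasks):
        for v in tasks[i + 1:]:
            overlap = (start[u] < start[v] + WCET[v]
                       and start[v] < start[u] + WCET[u])
            if mapping[u] == mapping[v] and overlap:
                return False
    # Real-time constraint: every task finishes by the deadline.
    return all(start[t] + WCET[t] <= DEADLINE for t in tasks)


def solve():
    """Enumerate mappings and integer start times, keeping the smallest makespan."""
    tasks = list(WCET)
    best = None
    for pes in product(PES, repeat=len(tasks)):
        for times in product(range(DEADLINE + 1), repeat=len(tasks)):
            mapping = dict(zip(tasks, pes))
            start = dict(zip(tasks, times))
            if feasible(mapping, start):
                makespan = max(start[t] + WCET[t] for t in tasks)
                if best is None or makespan < best[0]:
                    best = (makespan, mapping, start)
    return best


makespan, mapping, start = solve()
print(makespan, mapping, start)
```

Enumeration only works at this toy size. An SMT solver would take the same constraints as formulas and prune the search space symbolically.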