Parallel and Asynchronous Programming - ITProDevConnections 2012 (Greek)


Parallel and Asynchronous Programming

Or how we built a Dropbox clone without a PhD in Astrophysics

Panagiotis Kanavos, DotNetZone

• Processors are getting smaller

• Networks are getting worse

• Operating Systems demand it

• Only a subset of the code can run in parallel

Why

• Once, a single-thread process could use 100% of the CPU

• 16% MAX on a quad-core laptop with HyperThreading

• 8% MAX on an 8-core server

Processors are getting smaller

• Hand-coded threads and synchronization

• BackgroundWorker: heavy, cumbersome, single-threaded, inadequate progress reporting

• EAP: from event to event; complicated, loss of continuity

• APM: BeginXXX/EndXXX; cumbersome, imagine socket programming with Begin/End!

or rather ...

What we used to have

• Collisions: reduced throughput

Deadlocks

• Solution: limit the number of threads (ThreadPools)

Extreme: Stackless Python

Copy data instead of shared access

Extreme: Immutable programming

The problem with threads

• How can I speed up my algorithm?

• Which parts can run in parallel?

• How can I partition my data?

Why should I care about threads?

Example

Revani

• Beat the yolks with 2/3 of the sugar until fluffy

• Beat the whites with 1/3 of the sugar into a stiff meringue and add half of it to the yolk mixture.

• Mix the semolina with the flour and ground coconut, add the rest of the meringue and mix

• Mix and pour into the cake pan

• Bake in a pre-heated oven at 170°C for 20-25 mins.

• Allow to cool

• Prepare the syrup: boil water, sugar and lemon for 3 mins.

• Pour warm syrup over revani

• Sprinkle with ground coconut.

Synchronous Revani

Parallel Revani

• Beat yolks | • Beat whites (in parallel)

• Add half mixture

• Mix semolina

• Add rest of meringue

• Mix

• Pour in cake pan

• Bake | • Prepare syrup (in parallel)

• Pour syrup

• Sprinkle
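The parallel recipe can be sketched with TPL tasks: independent steps start together, and `Task.WhenAll` joins them before the steps that depend on both. A minimal sketch; the `ParallelRevani` class and the `Step` helper are hypothetical stand-ins that only log the step name instead of doing real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical kitchen: each step just records its name.
public static class ParallelRevani
{
    // Thread-safe log of finished steps (parallel steps may appear in either order).
    public static readonly ConcurrentQueue<string> Log = new ConcurrentQueue<string>();

    static Task Step(string name)
    {
        return Task.Run(() => Log.Enqueue(name));
    }

    public static async Task MakeAsync()
    {
        // Yolks and whites are independent: beat them in parallel.
        await Task.WhenAll(Step("beat yolks"), Step("beat whites"));

        // These steps need both mixtures, so they run one after the other.
        await Step("add half mixture");
        await Step("mix semolina");
        await Step("add rest of meringue");
        await Step("mix");
        await Step("pour in cake pan");

        // Baking and preparing the syrup overlap.
        await Task.WhenAll(Step("bake"), Step("prepare syrup"));

        await Step("pour syrup");
        await Step("sprinkle");
    }
}
```

Only the two `WhenAll` pairs can run concurrently; everything else is still sequential, which is why the speed-up of the whole recipe is limited.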

• Support for multiple concurrency scenarios

• Overall improvements in threading

• Highly Concurrent collections

What we have now

Scenarios

• Faster processing of large data: number crunching

• Execute long operations

• Serve a high volume of requests: social sites, web sites, billing, log aggregators

• Tasks with frequent blocking: REST clients, IT management apps

• Data Parallelism

• Task Parallelism

• Asynchronous programming

• Agents/Actors

• Dataflows

Scenario Classification

• Partition the data

• Implement the algorithm in a function

• TPL creates the necessary tasks

• The tasks are assigned to threads

• I DON’T have to define the number of Tasks/Threads!

Data Parallelism – Recipe

• Parallel.For / Parallel.ForEach

• PLINQ

• Partitioners

Data Parallelism - Tools
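Following the recipe above, a minimal sketch combining `Parallel.ForEach` with a range `Partitioner`: the data is split into index ranges, the algorithm lives in one lambda, and TPL decides how many tasks to run. Summing squares is just a placeholder workload, and `DataParallel.SumOfSquares` is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class DataParallel
{
    // Sums i*i for i in [0, n). Partitioner.Create(0, n) hands out [from, to)
    // ranges; each task keeps a local subtotal and merges it once at the end,
    // so there is no locking per element.
    public static long SumOfSquares(int n)
    {
        if (n <= 0) return 0;                      // Partitioner.Create needs a non-empty range
        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, n),              // partition the data
            () => 0L,                              // per-task local subtotal
            (range, state, local) =>               // the algorithm, applied to one range
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local += (long)i * i;
                return local;
            },
            local => Interlocked.Add(ref total, local)); // merge subtotals atomically
        return total;
    }
}
```

Note that no thread or task count appears anywhere; the partitioner and the TPL scheduler choose it.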

• Parallel execution of lambdas

• Blocking calls!

• We can specify (via ParallelOptions):

A cancellation token

The maximum number of concurrent threads (MaxDegreeOfParallelism)

A task scheduler

Parallel class Methods
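A short sketch of the `ParallelOptions` settings listed above. `Parallel.For` blocks the caller until all iterations finish; if the token is already cancelled it throws `OperationCanceledException` instead. The `ParallelOptionsDemo.Run` wrapper is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelOptionsDemo
{
    // Runs `count` iterations with at most `maxThreads` running concurrently.
    // This call blocks until the loop completes or is cancelled.
    public static int Run(int count, int maxThreads, CancellationToken token)
    {
        int done = 0;
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxThreads,
            CancellationToken = token
            // TaskScheduler = ... can also be supplied here
        };
        Parallel.For(0, count, options, i => Interlocked.Increment(ref done));
        return done;
    }
}
```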

• LINQ Queries

• Potentially multiple threads

• Parallel operators

• Unordered results

• Beware of races:

List<int> list = new List<int>();
// src: any IEnumerable<int>
var q = src.AsParallel()
    .Select(x => { list.Add(x); return x; })  // RACE when q is enumerated: List<T>.Add is not thread-safe
    .Where(x => true)
    .Take(100);

PLINQ
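A race-free counterpart to the query above: instead of mutating a shared `List<T>` from inside `Select`, let PLINQ produce and collect the results itself. A minimal sketch; the filter, the projection and the name `FirstEvenSquares` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlinqDemo
{
    // No shared state: every element is filtered and projected independently,
    // and PLINQ merges the results itself, so nothing needs locking.
    // AsOrdered() preserves source order, so Take(100) yields the first 100 matches
    // (without it, PLINQ may return any 100 matching elements, in any order).
    public static List<int> FirstEvenSquares(IEnumerable<int> src)
    {
        return src.AsParallel()
                  .AsOrdered()
                  .Where(x => x % 2 == 0)
                  .Select(x => x * x)
                  .Take(100)
                  .ToList();
    }
}
```

`AsOrdered()` has a cost, so it is worth adding only when the order of the results matters.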

• Doesn’t use SSE instructions

• Doesn’t use the GPU

• Isn’t using the CPU at 100%

What it can’t do

• Data Parallelism

• Task Parallelism

• Asynchronous programming

• Agents/Actors

• Dataflows

Scenarios

• Break the problem into steps

• Convert each step to a function

• Combine steps with Continuations

• TPL assigns tasks to threads as needed

• I DON’T have to define number of Tasks/Threads!

• Cancellation of the entire task chain

Task Parallelism – Recipe
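A minimal sketch of the recipe: three steps written as functions, chained with `ContinueWith`, and sharing one `CancellationToken` so that cancelling it cancels the rest of the chain. The arithmetic steps and the `ContinuationDemo.Pipeline` name are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Step 1 -> step 2 -> step 3. OnlyOnRanToCompletion means a step runs only
    // if the previous one succeeded; a cancelled step cancels everything after it.
    public static Task<int> Pipeline(int input, CancellationToken token)
    {
        return Task.Run(() => input + 1, token)                  // step 1
            .ContinueWith(t => t.Result * 2, token,
                TaskContinuationOptions.OnlyOnRanToCompletion,
                TaskScheduler.Default)                            // step 2
            .ContinueWith(t => t.Result - 3, token,
                TaskContinuationOptions.OnlyOnRanToCompletion,
                TaskScheduler.Default);                           // step 3
    }
}
```

With C# 5 the same chain can be written with `await`, but `ContinueWith` shows the continuation structure explicitly.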

• Tasks wherever code blocks

• Cancellation

• Lazy Initialization

• Progress Reporting

• Synchronization Contexts

The Improvements

• Problem: How do you cancel multiple tasks without leaving trash behind?

• Solution: everyone monitors a CancellationToken; TPL cancels subsequent Tasks or Parallel operations

Created by a CancellationTokenSource

Can execute code when Cancel is called

Cancellation
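A sketch of the three cancellation points above: a `CancellationTokenSource` creates the token, `Register` attaches code that runs when `Cancel` is called, and the worker cooperatively monitors the token. The class and method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // A cooperative worker: it polls the token and, once cancellation is requested,
    // throws OperationCanceledException so its Task ends up in the Canceled state.
    public static Task<int> CountUntilCancelled(CancellationToken token)
    {
        return Task.Run(() =>
        {
            int i = 0;
            while (!token.IsCancellationRequested)
            {
                i++;
                Thread.Sleep(1);          // stand-in for a unit of real work
            }
            token.ThrowIfCancellationRequested();
            return i;
        }, token);
    }

    // Creates the source, registers a callback, starts the worker and cancels it.
    public static bool RunAndCancel(out bool callbackRan)
    {
        bool ran = false;
        using (var cts = new CancellationTokenSource())
        {
            cts.Token.Register(() => ran = true);   // executes synchronously inside Cancel()
            Task<int> worker = CountUntilCancelled(cts.Token);
            cts.Cancel();
            try { worker.Wait(); }
            catch (AggregateException) { }         // Wait() wraps the cancellation
            callbackRan = ran;
            return worker.IsCanceled;
        }
    }
}
```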

• Problem: How do you update the UI from inside a task?

• Solution: use an IProgress<T> object; the out-of-the-box Progress<T> posts updates to the SynchronizationContext captured when it was created

Any type can be a message

Replace with our own implementation

Progress Reporting
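A sketch of the two last points: the message can be any type (here a hypothetical `CopyProgress` class), and `Progress<T>` can be replaced with our own `IProgress<T>`. `RecordingProgress<T>` is a hypothetical implementation that simply records reports on the reporting thread, which is handy in tests or console tools.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical progress message: any type can be used as T.
public class CopyProgress
{
    public string File;
    public int Percent;
}

// Our own IProgress<T>: records each report synchronously on the reporting thread.
// The built-in Progress<T> instead posts each report to the SynchronizationContext
// captured at construction (e.g. the UI thread), or to the thread pool if there was none.
public class RecordingProgress<T> : IProgress<T>
{
    private readonly List<T> _reports = new List<T>();
    public void Report(T value) { lock (_reports) _reports.Add(value); }
    public List<T> Snapshot() { lock (_reports) return new List<T>(_reports); }
}

public static class ProgressDemo
{
    // A background job that reports progress without knowing who is listening.
    public static Task CopyAsync(string file, IProgress<CopyProgress> progress)
    {
        return Task.Run(() =>
        {
            for (int p = 0; p <= 100; p += 25)
            {
                // ... copy a chunk here ...
                if (progress != null)
                    progress.Report(new CopyProgress { File = file, Percent = p });
            }
        });
    }
}
```

In a WPF view model the same `CopyAsync` would receive a `new Progress<CopyProgress>(handler)` created on the UI thread.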

• Calculate a value only when needed

• Lazy<T>(Func<T> …)

• Synchronous or asynchronous calculation:

Lazy.Value

Lazy.GetValueAsync<T>()

Lazy Initialization
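A sketch of lazy initialization with `Lazy<T>`: the factory runs on first access to `.Value`, and only once even under concurrent access. For the asynchronous case it uses the common `Lazy<Task<T>>` pattern rather than `GetValueAsync`, whose definition the slide does not show. `LazyDemo` and its members are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LazyDemo
{
    public static int Calls;   // how many times the factory actually ran

    // Synchronous: the value is computed on first .Value access, exactly once,
    // even if several threads race (ExecutionAndPublication, the default mode).
    public static readonly Lazy<int> Expensive = new Lazy<int>(() =>
    {
        Interlocked.Increment(ref Calls);
        return 42;
    }, LazyThreadSafetyMode.ExecutionAndPublication);

    // Asynchronous: the first .Value starts the work, and every caller
    // awaits (or waits on) the same Task.
    public static readonly Lazy<Task<string>> Config =
        new Lazy<Task<string>>(() => Task.Run(() => "loaded"));
}
```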

• Since .NET 2.0!

• Hides WinForms, WPF, ASP.NET differences: SynchronizationContext.Post/Send instead of Dispatcher.Invoke etc.

Synchronous (Send) and asynchronous (Post) versions

• Automatically created by the environment: SynchronizationContext.Current

• We can create our own, e.g. for a command-line application

Synchronization Context
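A sketch of the last point, a SynchronizationContext of our own for a command-line application. `Post` queues callbacks and `Run` pumps them on the calling thread, so code after `await` returns to that thread, much like it would on a UI thread. `SingleThreadContext` is hypothetical and deliberately minimal (no `Send`, no nested pumps).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<Tuple<SendOrPostCallback, object>> _queue =
        new BlockingCollection<Tuple<SendOrPostCallback, object>>();

    // Asynchronous version: queue the callback for the pumping thread.
    public override void Post(SendOrPostCallback d, object state)
    {
        _queue.Add(Tuple.Create(d, state));
    }

    // Synchronous version: not needed for await, so this sketch leaves it out.
    public override void Send(SendOrPostCallback d, object state)
    {
        throw new NotSupportedException("Send is not supported in this sketch.");
    }

    // Installs the context, runs `main`, and pumps posted callbacks on the
    // calling thread until the Task returned by `main` completes.
    public static void Run(Func<Task> main)
    {
        var previous = SynchronizationContext.Current;
        var ctx = new SingleThreadContext();
        SynchronizationContext.SetSynchronizationContext(ctx);
        try
        {
            Task task = main();
            task.ContinueWith(_ => ctx._queue.CompleteAdding(), TaskScheduler.Default);
            foreach (var item in ctx._queue.GetConsumingEnumerable())
                item.Item1(item.Item2);
            task.GetAwaiter().GetResult();   // rethrow any exception from main
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

Usage: `SingleThreadContext.Run(async () => { await SomethingAsync(); /* back on the main thread here */ });`.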

• Data Parallelism

• Task Parallelism

• Asynchronous programming

• Agents/Actors

• Dataflows

Scenarios

• Support at the language level

• Debugging support

• Exception handling

• After await, execution returns to the original “thread” (the captured SynchronizationContext): beware in servers and libraries

• Does NOT always execute asynchronously: only when an incomplete task is awaited or the code explicitly yields

Task.Yield

Async/Await
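A sketch of the last two points. On a cache hit `GetAsync` never awaits an incomplete task, so it runs synchronously and returns an already-completed Task; `Task.Yield` forces the rest of a method to run asynchronously. The cache, the key format and the 50 ms delay are hypothetical stand-ins for a slow network call.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class AsyncDemo
{
    private static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    public static async Task<string> GetAsync(string key)
    {
        string value;
        if (Cache.TryGetValue(key, out value))
            return value;                  // cache hit: completes synchronously
        await Task.Delay(50);              // stand-in for a slow network call
        value = "value of " + key;
        Cache[key] = value;
        return value;
    }

    // Everything after Task.Yield runs asynchronously, so the caller
    // gets control back before the computation happens.
    public static async Task<int> YieldThenCompute()
    {
        await Task.Yield();
        return 1 + 1;
    }
}
```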

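using System;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Runs func on the thread pool; if it throws, retries up to retryCount more times
    public static async Task<T> Retry<T>(Func<T> func, int retryCount)
    {
        while (true)
        {
            try
            {
                var result = await Task.Run(func);
                return result;
            }
            catch
            {
                if (retryCount == 0)
                    throw;          // no retries left: rethrow the last exception
                retryCount--;
            }
        }
    }
}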

Asynchronous Retry

• Highly concurrent

• Thread-safe

• Not only for TPL/PLINQ

• Producer/Consumer scenarios

More Goodies - Collections

• ConcurrentQueue

• ConcurrentStack

• ConcurrentDictionary

Concurrent Collections - 2
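A producer/consumer sketch built on the collections above: `BlockingCollection<T>` wraps a `ConcurrentQueue<T>` and adds a bounded capacity, blocking `Take` and an explicit "no more items" signal. The workload (integers 0..n-1) and the `ProducerConsumer.Run` name are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumer
{
    public static List<int> Run(int items)
    {
        // FIFO queue with room for at most 10 pending items.
        var queue = new BlockingCollection<int>(new ConcurrentQueue<int>(), 10);
        var consumed = new List<int>();

        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < items; i++)
                queue.Add(i);            // blocks while the queue is full
            queue.CompleteAdding();      // no more items will arrive
        });

        Task consumer = Task.Run(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding.
            foreach (int item in queue.GetConsumingEnumerable())
                consumed.Add(item);      // single consumer, so List<T> is safe here
        });

        Task.WaitAll(producer, consumer);
        return consumed;
    }
}
```

With one producer and one consumer the items arrive in FIFO order; with several consumers each item is still delivered exactly once, but ordering across consumers is not guaranteed.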

• Duplicates allowed

• List per Thread

• Reduced collisions for each thread’s Add/Take

• BAD for Producer/Consumer

The Odd one - ConcurrentBag

• NOT faster than plain collections in low concurrency scenarios

• DO NOT consume less memory

• DO NOT provide thread safe enumeration

• DO NOT ensure atomic operations on content

• DO NOT fix unsafe code

Concurrent Collections - Gotchas
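An illustration of "do not ensure atomic operations on content": each `ConcurrentDictionary` call is thread-safe on its own, but `counts[key]++` is a read followed by a separate write, so concurrent increments can be lost. `AddOrUpdate` performs the read-modify-write as one operation. The counter key and class name are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class AtomicCounts
{
    // Read, then write: two thread-safe calls, but not one atomic operation.
    public static int UnsafeCount(int n)
    {
        var counts = new ConcurrentDictionary<string, int>();
        counts["hits"] = 0;
        Parallel.For(0, n, i => counts["hits"]++);   // lost updates are possible
        return counts["hits"];
    }

    // AddOrUpdate retries until its update succeeds, so no increment is lost.
    // Note: the update delegate may run more than once under contention,
    // so it must not have side effects.
    public static int SafeCount(int n)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.For(0, n, i => counts.AddOrUpdate("hits", 1, (key, old) => old + 1));
        return counts["hits"];
    }
}
```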

• Visual Studio 2012

• Async Targeting Pack

• HttpClient package (System.Net.Http)

Also in .NET 4

• F# async

• C++ Parallel Patterns Library

• C++ Concurrency Runtime

• C++ Agents

• C++ AMP

Other Technologies

• Object storage similar to Amazon S3/Azure Blob storage

• A Service of Synnefo – IaaS by GRNet

• Written in Python

• Clients for Web, Windows, iOS, Android, Linux

• Versioning, Permissions, Sharing

Synnefo

• REST API based on Rackspace CloudFiles; compatible with Cyberduck etc.

• Block storage

• Uploads only using blocks

• Uses Merkle Hashing

Pithos API

• Multiple accounts per machine

• Synchronize local folder to a Pithos account

• Detect local changes and upload

• Detect server changes and download

• Calculate Merkle Hash for each file

Pithos Client for Windows

The Architecture

UI: WPF, MVVM, Caliburn.Micro

Core: File Agent, Poll Agent, Network Agent, Status Agent

Networking: CloudFiles, HttpClient

Storage: SQLite, SQL Server Compact

• .NET 4, due to Windows XP compatibility

• Visual Studio 2012 + Async Targeting Pack

• UI - Caliburn.Micro

• Concurrency - TPL, Parallel, Dataflow

• Network - HttpClient

• Hashing - OpenSSL: faster than the native hashing provider

• Storage - NHibernate, SQLite/SQL Server Compact

• Logging - log4net

Technologies

• Handle potentially hundreds of file events

• Hashing of many/large files

• Multiple slow calls to the server

• Unreliable network

• And yet it shouldn’t hang

• Update the UI with enough information

The challenges

• Use producer/consumer pattern

• Store events in ConcurrentQueue

• Process ONLY after idle timeout

Events Handling
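A sketch of the event-handling idea above: file-system events go into a `ConcurrentQueue` as they arrive, and are processed only after no new event has arrived for an idle timeout. `IdleEventBatcher<T>` is hypothetical; the current time is passed in explicitly so the logic is deterministic, whereas a real agent would call `TryDrain` from a timer or polling loop and feed `Add` from e.g. `FileSystemWatcher` handlers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class IdleEventBatcher<T>
{
    private readonly ConcurrentQueue<T> _events = new ConcurrentQueue<T>();
    private readonly TimeSpan _idleTimeout;
    private long _lastEventTicks;

    public IdleEventBatcher(TimeSpan idleTimeout) { _idleTimeout = idleTimeout; }

    // Producer side: may be called from any thread.
    public void Add(T item, DateTime now)
    {
        _events.Enqueue(item);
        Interlocked.Exchange(ref _lastEventTicks, now.Ticks);
    }

    // Consumer side: returns all queued events if the queue has been quiet for
    // at least the idle timeout, otherwise an empty batch. Events that arrive
    // while draining simply join this batch; the sketch does not separate them.
    public List<T> TryDrain(DateTime now)
    {
        var batch = new List<T>();
        long last = Interlocked.Read(ref _lastEventTicks);
        if (now.Ticks - last < _idleTimeout.Ticks)
            return batch;                    // events still arriving: wait
        T item;
        while (_events.TryDequeue(out item))
            batch.Add(item);
        return batch;
    }
}
```

Waiting for a quiet period lets a burst (e.g. an editor saving a file as delete + create + change) be handled as one batch instead of triggering several uploads.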

• Why I hate Game of Thrones

• Asynchronous reading of blocks

• Parallel Hashing of each block

• Use of OpenSSL for its SSE support

• Concurrency Throttling

• Beware of memory consumption!

Merkle Hashing
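A sketch of parallel block hashing: split the data into blocks, hash the blocks in parallel with a throttle (`MaxDegreeOfParallelism`), then combine pairs of hashes level by level into a Merkle root. Several assumptions apply: it uses .NET's SHA-256 instead of OpenSSL, holds the whole file in memory instead of reading blocks asynchronously, and uses a generic tree rule (an odd node moves up unchanged); Pithos's exact block size and tree rules are not given here.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class MerkleSketch
{
    // One SHA256 instance per call: HashAlgorithm instances are not thread-safe.
    static byte[] Hash(byte[] buffer, int offset, int count)
    {
        using (var sha = SHA256.Create())
            return sha.ComputeHash(buffer, offset, count);
    }

    public static byte[] Root(byte[] data, int blockSize, int maxParallel)
    {
        int blocks = Math.Max(1, (data.Length + blockSize - 1) / blockSize);
        var leaves = new byte[blocks][];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Hash every block in parallel; each iteration writes only its own slot.
        Parallel.For(0, blocks, options, i =>
        {
            int offset = i * blockSize;
            int count = Math.Min(blockSize, data.Length - offset);
            leaves[i] = Hash(data, offset, count);
        });

        // Combine pairs of hashes until a single root remains.
        byte[][] level = leaves;
        while (level.Length > 1)
        {
            var next = new byte[(level.Length + 1) / 2][];
            for (int i = 0; i < next.Length; i++)
            {
                byte[] left = level[2 * i];
                if (2 * i + 1 == level.Length) { next[i] = left; continue; } // odd node moves up
                byte[] right = level[2 * i + 1];
                var pair = new byte[left.Length + right.Length];
                Buffer.BlockCopy(left, 0, pair, 0, left.Length);
                Buffer.BlockCopy(right, 0, pair, left.Length, right.Length);
                next[i] = Hash(pair, 0, pair.Length);
            }
            level = next;
        }
        return level[0];
    }
}
```

The throttle matters for memory as well as CPU: in a streaming version, every block being hashed concurrently is a buffer held in memory.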

• Each call a task

• Concurrent REST calls per account and share

• Task.WhenAll to process results

Multiple slow calls
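A sketch of "each call a task" plus `Task.WhenAll`, with a `SemaphoreSlim` capping how many calls are in flight at once. `ConcurrentCalls.RunAllAsync` is hypothetical, and the calls are passed as delegates so the sketch does not depend on any particular REST client. It uses .NET 4.5 APIs; on .NET 4 with the Async Targeting Pack, `Task.WhenAll` is exposed as `TaskEx.WhenAll`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentCalls
{
    // Starts one task per call but lets at most `maxConcurrent` run at once,
    // then returns all results in the same order as the input calls.
    public static async Task<T[]> RunAllAsync<T>(
        IEnumerable<Func<Task<T>>> calls, int maxConcurrent)
    {
        var gate = new SemaphoreSlim(maxConcurrent);
        var tasks = calls.Select(async call =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { return await call(); }
            finally { gate.Release(); }      // free the slot even if the call fails
        }).ToList();                         // ToList starts all the wrappers now
        return await Task.WhenAll(tasks);
    }
}
```

If any call fails, awaiting `RunAllAsync` rethrows the first exception after all calls have finished.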

• Use System.Net.Http.HttpClient

• Store blocks in a cache folder

• Check and reuse orphans

• Asynchronous Retry of calls

Unreliable network

• Use Transactional NTFS if available (thanks, MS, for killing it!)

• Otherwise, update a copy and swap it in with File.Replace

Resilience to crashes
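A sketch of the fallback above: write the new contents to a temporary copy and swap it in with `File.Replace`, so a crash while writing only damages the temporary file, never the original. The `.tmp`/`.bak` naming and the `SafeFile` class are hypothetical.

```csharp
using System.IO;

public static class SafeFile
{
    public static void WriteAllTextSafely(string path, string contents)
    {
        string temp = path + ".tmp";
        string backup = path + ".bak";

        // 1. Write the complete new version next to the target.
        File.WriteAllText(temp, contents);

        // 2. Swap it in. File.Replace requires the target to exist,
        //    and keeps the previous version as a backup.
        if (File.Exists(path))
            File.Replace(temp, path, backup);
        else
            File.Move(temp, path);              // first write: nothing to replace
    }
}
```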

• Use of independent agents

• Asynchronous operations wherever possible

Should not hang

• Use WPF, MVVM

• Use Progress to update the UI

Provide Sufficient user feedback

• Create Windows 8 Desktop and WinRT clients

• Use Reactive Framework

Next Steps

VOLUNTEERS WANTED

• Avoid Side Effects

• Use Functional Style

• Clean Coding

• THE BIG SECRET: Use existing, tested algorithms

• IEEE, ACM Journals and libraries

Clever Tricks

• Simplify asynchronous or parallel code

• Use out-of-the-box libraries

• Scenarios that SUIT Task or Data Parallelism

YES TPL

• To accelerate “bad” algorithms

• To “accelerate” database access: use proper SQL and indexes!

Avoid cursors

• Reporting DBs, Data Warehouse, OLAP Cubes

NO TPL

• Functional languages like F#, Scala

• Distributed Frameworks like Hadoop, {m}brace

When TPL is not enough

• C# 5.0 in a Nutshell, O’Reilly

• Parallel Programming with Microsoft .NET, Microsoft

• Professional Parallel Programming with C#, Wiley

• Concurrent Programming on Windows, Pearson

• The Art of Concurrency, O’Reilly

Books

• Parallel FX Team: http://blogs.msdn.com/b/pfxteam/

• IEEE Computer Society http://www.computer.org

• ACM http://www.acm.org

Useful Links