Atmosphere 2014: Lockless programming - Tomasz Barański

Post on 09-May-2015


description

In the world of multi-core programming, traditional parallel programming techniques based on locks (mutexes and similar mechanisms) create performance bottlenecks. Lockless programming is a set of techniques that use atomic operations to synchronize data exchange between threads. The talk introduces the audience to lockless programming and presents its benefits and pitfalls. The presenter will talk about support for atomic operations in different CPU families, as well as in lower- and higher-level languages. He will also cover reordering and memory barriers. He will end the talk with tips on designing lockless algorithms and practical examples of lockless data structures. Tomasz Barański is a software developer working in Kraków for IBM T.J. Watson Research on projects related to high-performance computing. He has over 12 years of experience in the enterprise world, in the roles of developer, tester, interaction designer and go-to guy.

Transcript of Atmosphere 2014: Lockless programming - Tomasz Barański

Lockless Programming

Tomasz Barański, IBM Research

Me

Making software for 15 years

IBM Research @ KRK

Lockless?

Programming with multiple threads that access shared memory, in which threads cannot block each other.

Why?

And also

(Dead|Live)locks

Priority inversion

Lock convoy

How?

Atomic operations (ἄτομος, indivisible): CAS, FAA|AAF

Memory barriers: LoadLoad, LoadStore, StoreLoad, StoreStore

Compare-And-Swap

cas(val, old, new) =
    if val == old
        val = new
        return SUCCESS
    else
        return FAIL
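As a rough C++11 illustration of the pseudocode above, the sketch below wraps `std::atomic<int>::compare_exchange_strong` in a helper with the slide's interface. The names `cas` and `atomic_double` are made up for this example and are not from the talk.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper mirroring the slide's cas(val, old, new):
// returns true (SUCCESS) if val held `expected` and was replaced by `desired`.
bool cas(std::atomic<int>& val, int expected, int desired) {
    // compare_exchange_strong writes the observed value back into `expected`
    // on failure; taking it by value keeps the slide's simpler interface.
    return val.compare_exchange_strong(expected, desired);
}

// Typical use: a retry loop that applies an update atomically.
// Returns the value that this call stored.
int atomic_double(std::atomic<int>& val) {
    int seen = val.load();
    while (!cas(val, seen, seen * 2)) {
        seen = val.load();  // another thread changed val: reread and retry
    }
    return seen * 2;
}
```

The retry loop is the usual shape of CAS-based code: read, compute, attempt the swap, and start over if some other thread got there first.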

Fetch-And-Add

faa(val, i) =
    tmp = val
    val += i
    return tmp
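In C++11 the fetch-and-add above corresponds to `std::atomic<T>::fetch_add`, which returns the value held before the addition. The following is a minimal sketch of a shared counter; the function name `count_with_faa` and its parameters are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Runs `threads` workers that each add 1 to a shared counter `per_thread`
// times, then returns the final count. No lock is taken anywhere.
int count_with_faa(int threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1);  // atomic read-modify-write
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

With a plain `int` and `++counter` the same program would have a data race and could lose increments; with `fetch_add` every increment is counted.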

Sequential consistency

Pseudo-assembly:

    acquire lock
    read X
    read Y
    (…)
    store Y
    store X
    release lock

Reordering (compiler, JVM, CPU):

    acquire lock
    read Y
    (…)
    store X
    (…)
    read X
    (…)
    store Y
    release lock

    Thread 1        Thread 2
    read Y          read Y
    (…)             (…)
    store X         store X
    (…)             (…)
    read X          read X
    (…)             (…)
    store Y         store Y

What are X and Y?

Sequential consistency

All threads (on all CPUs) agree on the order of all memory operations, and that order is consistent with the order of operations in the source code.
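One way to see this definition at work is the classic "store buffering" test, sketched here in C++11. Operations on `std::atomic` default to `memory_order_seq_cst`, so the outcome where both threads read 0 cannot occur. The function name `count_both_zero` is an assumption made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering test `rounds` times and returns how often both
// threads read 0. Under sequential consistency the answer must be 0: in any
// single global order, one of the two stores comes first, so the other
// thread's later load has to see it.
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] { x.store(1); r1 = y.load(); });
        std::thread t2([&] { y.store(1); r2 = x.load(); });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

If the stores and loads were weakened to `memory_order_relaxed`, the both-zero outcome would be permitted, because each store could be delayed past the following load.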

Memory barriers

    read X
    LoadLoad Barrier
    read Y
    (…)
    store Y
    store X

Reordering (compiler, JVM, CPU):

    read X
    (…)
    store X
    (…)
    read Y
    (…)
    store Y

    read X
    read Y
    (…)
    store Y
    StoreStore Barrier
    store X

Reordering (compiler, JVM, CPU):

    read Y
    (…)
    store Y
    (…)
    read X
    (…)
    store X

    read X
    read Y
    (…)
    LoadStore Barrier
    store Y
    store X

Reordering (compiler, JVM, CPU):

    read Y
    (…)
    read X
    (…)
    store X
    (…)
    store Y

    store X
    store Y
    (…)
    StoreLoad Barrier
    read X
    read Y

Reordering (compiler, JVM, CPU):

    store Y
    (…)
    store X
    (…)
    read X
    (…)
    read Y

Full barrier
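C++11 does not name the four barrier types directly; it offers `std::atomic_thread_fence` with a memory order instead. As a rough correspondence, a release fence behaves like LoadStore + StoreStore, an acquire fence like LoadLoad + LoadStore, and a `memory_order_seq_cst` fence is the full barrier that also covers StoreLoad. The sketch below uses the first two to publish plain data through a flag; all names in it are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;
    // Release fence: earlier reads and writes may not move below it.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // spin until the flag becomes visible
    }
    // Acquire fence: later reads and writes may not move above it.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // guaranteed to observe 42
}

// Hypothetical driver: runs one producer and one consumer and returns
// the payload value the consumer observed.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Without the two fences, the relaxed flag operations alone would not order the accesses to `payload`, and the program would contain a data race.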

Let's get practical!

Lock-free (FIFO) queue

(by John D. Valois)

enqueue(x) =
    acquire(lock)
    q = new Node
    q.value = x
    q.next = NULL
    tail.next = q
    tail = q
    release(lock)

enqueue(x) =
    q = new Node
    q.value = x
    q.next = NULL
    do
        p = tail
        succ = CAS(p.next, NULL, q)
        if !succ
            CAS(tail, p, p.next)
    while !succ
    CAS(tail, p, q)

dequeue() =
    do
        p = head
        if p.next == NULL
            error QUEUE_EMPTY
    while !CAS(head, p, p.next)
    return p.next.value
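The enqueue and dequeue pseudocode above can be sketched in C++11 roughly as follows. This is an illustration under simplifying assumptions, not the presenter's code: the queue starts with a dummy node, values are `int`, and nodes are deliberately never freed, which sidesteps memory reclamation and the ABA problem at the cost of leaking memory. The names `LockFreeQueue` and `demo_sum` are made up.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node {
    int value;
    std::atomic<Node*> next;
    explicit Node(int v) : value(v), next(nullptr) {}
};

class LockFreeQueue {
    std::atomic<Node*> head;  // always points at a dummy node
    std::atomic<Node*> tail;  // may lag behind the real last node
public:
    LockFreeQueue() {
        Node* dummy = new Node(0);
        head.store(dummy);
        tail.store(dummy);
    }

    void enqueue(int x) {
        Node* q = new Node(x);
        Node* p;
        bool succ;
        do {
            p = tail.load();
            Node* next = nullptr;
            succ = p->next.compare_exchange_strong(next, q);
            if (!succ) {
                // tail is lagging: help move it forward, then retry.
                // On failure `next` holds the observed successor of p.
                Node* expected = p;
                tail.compare_exchange_strong(expected, next);
            }
        } while (!succ);
        Node* expected = p;
        tail.compare_exchange_strong(expected, q);  // may fail if someone helped
    }

    // Returns false instead of raising QUEUE_EMPTY.
    bool dequeue(int& out) {
        Node* p;
        Node* next;
        do {
            p = head.load();
            next = p->next.load();
            if (next == nullptr) return false;
        } while (!head.compare_exchange_strong(p, next));
        out = next->value;  // `next` is the new dummy; old dummy p is leaked
        return true;
    }
};

// Hypothetical demo: two producers enqueue 1..n each; returns the sum dequeued.
long demo_sum(int n) {
    LockFreeQueue q;
    std::vector<std::thread> producers;
    for (int t = 0; t < 2; ++t) {
        producers.emplace_back([&q, n] {
            for (int i = 1; i <= n; ++i) q.enqueue(i);
        });
    }
    for (auto& t : producers) t.join();
    long sum = 0;
    int v = 0;
    while (q.dequeue(v)) sum += v;
    return sum;
}
```

A production version would need a safe way to reclaim dequeued nodes, which is exactly where the difficulties discussed on the following slides come from.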

Never waits
Never blocks

Silver bullet?

More difficult
ABA problem

Solution?

Tagged reference
Intermediate nodes
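To make the tagged-reference idea concrete, here is a minimal C++11 sketch in which a 32-bit value (for instance an index into a node pool) and a 32-bit tag share one 64-bit atomic word. Every successful update bumps the tag, so a value that went A, B, A no longer compares equal to a stale snapshot. The class `TaggedRef` and its layout are assumptions for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class TaggedRef {
    std::atomic<std::uint64_t> word;  // high 32 bits: tag, low 32 bits: value

    static std::uint64_t pack(std::uint32_t value, std::uint32_t tag) {
        return (static_cast<std::uint64_t>(tag) << 32) | value;
    }
public:
    explicit TaggedRef(std::uint32_t value) : word(pack(value, 0)) {}

    std::uint64_t snapshot() const { return word.load(); }

    static std::uint32_t value_of(std::uint64_t snap) {
        return static_cast<std::uint32_t>(snap);
    }
    static std::uint32_t tag_of(std::uint64_t snap) {
        return static_cast<std::uint32_t>(snap >> 32);
    }

    // Succeeds only if neither the value nor the tag changed since `snap`.
    bool cas(std::uint64_t snap, std::uint32_t new_value) {
        return word.compare_exchange_strong(
            snap, pack(new_value, tag_of(snap) + 1));
    }
};
```

A thread that took a snapshot while the value was 1, and then slept through updates to 2 and back to 1, will see its CAS fail because the tag moved from 0 to 2. The tag can wrap around after 2^32 updates, so this reduces the risk rather than removing it in principle.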

LL/SC

Load-Link / Store-Conditional

Separates "storage has this value" from "storage has been changed".

Available on PowerPC and ARM, but NOT on x86 or SPARC.

LoadLink(x) =
    read(x)
    mark(x)

StoreConditional(x) =
    if x marked
        store(x)
        unmark(x)
        return SUCCESS
    else
        return FAILURE
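C++ has no direct LL/SC primitive, but `compare_exchange_weak` is allowed to fail spuriously, which lets compilers implement it with a load-link / store-conditional pair on architectures that have one. That is why it is always used in a loop, as in this sketch. The helper `atomic_fetch_max` is a made-up name for this example.

```cpp
#include <atomic>
#include <cassert>

// Atomically raises `target` to at least `candidate`.
// Returns the value `target` held before this call took effect.
int atomic_fetch_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    // compare_exchange_weak may fail even when target == seen (for example
    // when a store-conditional loses its reservation). On any failure it
    // refreshes `seen` with the current value, so the loop simply retries.
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate)) {
    }
    return seen;
}
```

On x86 the same source compiles to a single locked compare-and-exchange instruction, so the weak and strong forms behave alike there.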

Language support

C (gcc)

__sync_fetch_and_add (_sub, _or...)
__sync_add_and_fetch (_sub, _or...)

__sync_bool_compare_and_swap
__sync_val_compare_and_swap

__sync_synchronize
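The builtins listed above are available in both the C and C++ front ends of GCC (and in Clang). The sketch below exercises each of them from C++ on a local variable, purely to show their return values; it assumes a GCC-compatible compiler, and the names `SyncDemo` and `run_sync_demo` are invented.

```cpp
#include <cassert>

struct SyncDemo {
    int before;       // result of __sync_fetch_and_add
    int after;        // result of __sync_add_and_fetch
    bool swapped;     // result of __sync_bool_compare_and_swap
    int old_value;    // result of __sync_val_compare_and_swap
    int final_value;  // counter at the end
};

SyncDemo run_sync_demo() {
    int counter = 10;
    SyncDemo r;
    r.before = __sync_fetch_and_add(&counter, 5);  // returns old value
    r.after = __sync_add_and_fetch(&counter, 5);   // returns new value
    // Swaps only if counter == 20; reports whether it did.
    r.swapped = __sync_bool_compare_and_swap(&counter, 20, 1);
    // Swaps only if counter == 99; always returns what counter held.
    r.old_value = __sync_val_compare_and_swap(&counter, 99, 2);
    __sync_synchronize();  // full memory barrier
    r.final_value = counter;
    return r;
}
```

The difference between the two families is only in what comes back: `fetch_and_add` returns the value before the operation, `add_and_fetch` the value after it.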

C++11

#include <atomic>

template <class T> struct atomic;

atomic_thread_fence(...)

::store(...)
::load(...)
::compare_exchange_weak(...) / ::compare_exchange_strong(...)
::fetch_add(...)

Java

java.util.concurrent.atomic

AtomicInteger
.addAndGet
.getAndAdd
.compareAndSet

AtomicIntegerArray

AtomicReference
AtomicStampedReference

?