Atmosphere 2014: Lockless programming - Tomasz Barański
Lockless Programming
Tomasz Barański, IBM Research
Me
Making software for 15 years
IBM Research @ KRK
Lockless?
Programming with multiple threads that access
shared memory in such a way that the threads
cannot block each other.
Why?
And also
(Dead|Live)locks
Priority inversion
Lock convoy
How?
Atomic operations (ἄτομος: indivisible)
  CAS
  FAA | AAF
Memory barriers
  LoadLoad, LoadStore
  StoreLoad, StoreStore
Compare-And-Swap
cas(val, old, new) =
  if val == old
    val = new
    return SUCCESS
  else
    return FAIL
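A minimal Java sketch of the CAS pseudocode above, assuming `java.util.concurrent.atomic.AtomicInteger` as the atomic cell. The class `CasDemo` and the helper `updateMax` are illustrative names, not part of the talk. The helper shows the usual retry loop: read the value, compute the new one, and try again if another thread changed it in between.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasDemo {
    // Atomically raises the stored value to max(current, candidate)
    // with a CAS retry loop; returns the value stored afterwards.
    static int updateMax(AtomicInteger value, int candidate) {
        while (true) {
            int old = value.get();
            if (candidate <= old) {
                return old;                       // nothing to update
            }
            if (value.compareAndSet(old, candidate)) {
                return candidate;                 // our CAS won
            }
            // another thread changed the value first: retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(5);
        System.out.println(v.compareAndSet(5, 7)); // true: value was 5, now 7
        System.out.println(v.compareAndSet(5, 9)); // false: value is 7, not 5
        System.out.println(updateMax(v, 10));      // 10
    }
}
```

The loop never holds a lock; a failed CAS only means another thread made progress first.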
Fetch-And-Add
faa(val, i) =
  tmp = val
  val += i
  return tmp
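As a sketch of fetch-and-add in Java, assuming `AtomicInteger.getAndAdd` plays the role of FAA and `addAndGet` the role of AAF. The class `FaaDemo` and the method `countConcurrently` are hypothetical names used only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class FaaDemo {
    // Several threads bump one shared counter without any lock.
    static int countConcurrently(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.getAndAdd(1);          // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(10);
        System.out.println(v.getAndAdd(5));        // 10: returns the old value (FAA)
        System.out.println(v.addAndGet(5));        // 20: returns the new value (AAF)
        System.out.println(countConcurrently(4, 10000)); // 40000
    }
}
```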
Sequential consistency
Pseudo-assembly

  acquire lock
  read X
  read Y
  (…)
  store Y
  store X
  release lock

  acquire lock
  read Y
  (…)
  store X
  (...)
  read X
  (...)
  store Y
  release lock

reordering: compiler (JVM), CPU
Thread 1          Thread 2
read Y            read Y
(…)               (…)
store X           store X
(...)             (...)
read X            read X
(...)             (...)
store Y           store Y
What are X and Y?
Sequential consistency
All threads (on all CPUs) agree on the order of all memory operations, and that order is consistent with the order of operations in the source code.
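The definition above can be probed with the classic store-buffering test. The sketch below is an illustration in Java, assuming `volatile` fields, for which the Java memory model gives sequentially consistent behaviour; the class `StoreBufferingDemo` and its methods are hypothetical names. Each thread stores to one variable and then reads the other. Under sequential consistency at least one thread must see the other's store, so both reads returning 0 is forbidden. With plain (non-volatile) fields that outcome would be permitted.

```java
final class StoreBufferingDemo {
    static volatile int x;
    static volatile int y;
    static int r1;   // written by thread 1, read after join()
    static int r2;   // written by thread 2, read after join()

    // Runs one round; returns true if both threads read 0.
    static boolean bothReadZero() {
        x = 0;
        y = 0;
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r1 == 0 && r2 == 0;
    }

    // Counts how many rounds showed the forbidden outcome.
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            if (bothReadZero()) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println(countForbidden(1000));  // 0 with volatile x and y
    }
}
```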
Memory barriers
  read X
  LoadLoad Barrier
  read Y
  (…)
  store Y
  store X

  read X
  (…)
  store X
  (...)
  read Y
  (...)
  store Y

reordering: compiler (JVM), CPU

  read X
  read Y
  (…)
  store Y
  StoreStore Barrier
  store X

  read Y
  (…)
  store Y
  (...)
  read X
  (...)
  store X

reordering: compiler (JVM), CPU

  read X
  read Y
  (…)
  LoadStore Barrier
  store Y
  store X

  read Y
  (…)
  read X
  (…)
  store X
  (...)
  store Y

reordering: compiler (JVM), CPU

  store X
  store Y
  (…)
  StoreLoad Barrier
  read X
  read Y

  store Y
  (…)
  store X
  (…)
  read X
  (...)
  read Y

reordering: compiler (JVM), CPU

Full barrier
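One way the barriers above might be used from Java (9 or later), assuming the static fence methods on `java.lang.invoke.VarHandle`. The talk does not name this API, and the class `FenceDemo` with its fields and methods is purely illustrative. A writer publishes a payload and then raises a flag; a StoreStore fence keeps the two stores in order, and a LoadLoad fence on the reader side keeps the two loads in order. The flag uses opaque access so the spin loop is guaranteed to observe the update eventually.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicBoolean;

final class FenceDemo {
    static int payload;                                   // plain field
    static final AtomicBoolean ready = new AtomicBoolean(false);

    static void publish(int value) {
        payload = value;
        VarHandle.storeStoreFence();   // store payload before store ready
        ready.setOpaque(true);
    }

    static int consume() {
        while (!ready.getOpaque()) {
            Thread.onSpinWait();       // busy-wait until the flag is raised
        }
        VarHandle.loadLoadFence();     // load ready before load payload
        return payload;
    }

    // Publishes from a second thread and consumes on the calling thread.
    static int roundTrip(int value) {
        ready.setOpaque(false);
        Thread writer = new Thread(() -> publish(value));
        writer.start();
        int seen = consume();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));   // 42
        VarHandle.fullFence();               // full barrier: orders all loads and stores
    }
}
```

In everyday Java code a `volatile` flag or `setRelease`/`getAcquire` would usually be the simpler choice; explicit fences are shown here only because they line up with the four barrier kinds on the slides.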
Let's get practical!
Lock-free (FIFO) queue
(by John D. Valois)
enqueue(x) =
  acquire(lock)
  q = new Node
  q.value = x
  q.next = NULL
  tail.next = q
  tail = q
  release(lock)
enqueue(x) =
  q = new Node
  q.value = x
  q.next = NULL
  do
    p = tail
    succ = CAS(p.next, NULL, q)
    if !succ
      CAS(tail, p, p.next)
  while !succ
  CAS(tail, p, q)
dequeue() =
  do
    p = head
    if p.next == NULL
      error QUEUE_EMPTY
  while !CAS(head, p, p.next)
  return p.next.value
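The enqueue and dequeue pseudocode above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the speaker's code: `AtomicReference` stands in for the CAS-able pointers, the queue starts with a dummy node, and `dequeue` returns `null` instead of raising QUEUE_EMPTY. The names `LockFreeQueue` and `concurrentCount` are hypothetical. Because the garbage collector never reuses a node that is still reachable, this sketch does not need extra protection against node reuse.

```java
import java.util.concurrent.atomic.AtomicReference;

final class LockFreeQueue<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    LockFreeQueue() {
        Node<T> dummy = new Node<>(null);          // head always points at a dummy
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T x) {
        Node<T> q = new Node<>(x);
        Node<T> p;
        boolean succ;
        do {
            p = tail.get();
            succ = p.next.compareAndSet(null, q);  // try to link after the tail
            if (!succ) {
                // tail is lagging: help move it forward, then retry
                tail.compareAndSet(p, p.next.get());
            }
        } while (!succ);
        tail.compareAndSet(p, q);                  // swing tail to the new node
    }

    // Returns the oldest value, or null when the queue is empty (assumption).
    T dequeue() {
        Node<T> p;
        Node<T> next;
        do {
            p = head.get();
            next = p.next.get();
            if (next == null) {
                return null;
            }
        } while (!head.compareAndSet(p, next));
        return next.value;
    }

    // Demo helper: several producers enqueue, then everything is drained.
    static int concurrentCount(int producers, int perProducer) {
        LockFreeQueue<Integer> queue = new LockFreeQueue<>();
        Thread[] workers = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perProducer; j++) {
                    queue.enqueue(j);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        while (queue.dequeue() != null) {
            count++;
        }
        return count;
    }
}
```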
Never waits
Never blocks
Silver bullet?
More difficult
ABA problem
Solution?
Tagged reference
Intermediate nodes
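To make the ABA problem and the tagged-reference fix concrete, here is a small single-threaded sketch in Java. It assumes `AtomicStampedReference` as the tagged reference, with the stamp acting as a version counter; the class `AbaDemo` and its methods are illustrative names. A value goes from A to B and back to A; a plain CAS cannot tell that anything happened, while the stamped CAS fails because the stamp has moved on.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

final class AbaDemo {
    // Plain CAS: succeeds even though the value changed A -> B -> A.
    static boolean plainCasAfterAba() {
        AtomicReference<String> ref = new AtomicReference<>("A");
        String seen = ref.get();               // a slow thread reads A
        ref.set("B");                          // meanwhile: A -> B
        ref.set("A");                          //            B -> A
        return ref.compareAndSet(seen, "C");   // true: the change went unnoticed
    }

    // Stamped CAS: every update bumps the stamp, so the stale CAS fails.
    static boolean stampedCasAfterAba() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);    // reads A together with stamp 0
        ref.compareAndSet("A", "B", 0, 1);     // meanwhile: A -> B, stamp 1
        ref.compareAndSet("B", "A", 1, 2);     //            B -> A, stamp 2
        return ref.compareAndSet(seen, "C", stampHolder[0], stampHolder[0] + 1);
    }

    public static void main(String[] args) {
        System.out.println(plainCasAfterAba());    // true
        System.out.println(stampedCasAfterAba());  // false
    }
}
```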
LL/SC
Load-Link / Store-Conditional
Separates "storage has value" from "storage has been changed".
PowerPC, ARM
but NOT: x86, SPARC
LoadLink(x) =
  read(x)
  mark(x)

StoreConditional(x) =
  if x marked
    store(x)
    unmark(x)
    return SUCCESS
  else
    return FAILURE
Language support
C (gcc)
__sync_fetch_and_add (_sub, _or...)
__sync_add_and_fetch (_sub, _or...)
__sync_bool_compare_and_swap
__sync_val_compare_and_swap
__sync_synchronize
C++11
#include <atomic>
template <class T> struct atomic;
atomic_thread_fence(...)
::store(...)
::load(...)
::compare_exchange_weak(...), ::compare_exchange_strong(...)
::fetch_add(...)
Java
java.util.concurrent.atomic
AtomicInteger
  .addAndGet
  .getAndAdd
  .compareAndSet
AtomicIntegerArray
AtomicReference
AtomicStampedReference
?