Wavelets @ CPU
David Barina
April 15, 2014
David Barina Wavelets @ CPU April 15, 2014 1 / 16
Wavelet
Discrete Wavelet Transform
Lifting
[figure: lifting scheme with steps α, β, γ, δ]
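The α, β, γ, δ labels match the four lifting steps of the usual Daubechies and Sweldens factorization of CDF 9/7, the wavelet named on the results slide. Below is a minimal C sketch of one forward and one inverse 1-D level under that assumption. The constants are the commonly published values. The function names, the whole-sample symmetric boundary extension, and the choice to scale even samples by ζ are illustrative assumptions, not necessarily what the talk used.

```c
/* CDF 9/7 lifting constants (Daubechies and Sweldens factorization, assumed). */
static const float ALPHA = -1.586134342f;
static const float BETA  = -0.05298011854f;
static const float GAMMA =  0.8829110762f;
static const float DELTA =  0.4435068522f;
static const float ZETA  =  1.149604398f;

/* Whole-sample symmetric extension: index -1 maps to 1, index n maps to n-2. */
static int mirror(int p, int n) {
    if (p < 0) return -p;
    if (p >= n) return 2 * (n - 1) - p;
    return p;
}

/* One lifting step: every sample of the given parity (0 = even, 1 = odd)
   is updated from its two neighbours, which have the other parity. */
static void lift(float *x, int n, int parity, float c) {
    for (int i = parity; i < n; i += 2)
        x[i] += c * (x[mirror(i - 1, n)] + x[mirror(i + 1, n)]);
}

/* Forward transform in place (n >= 2): even samples become lowpass, odd highpass. */
void fwd97(float *x, int n) {
    lift(x, n, 1, ALPHA); /* predict */
    lift(x, n, 0, BETA);  /* update  */
    lift(x, n, 1, GAMMA); /* predict */
    lift(x, n, 0, DELTA); /* update  */
    for (int i = 0; i < n; i++)
        x[i] *= (i % 2) ? 1.0f / ZETA : ZETA;
}

/* Inverse: undo the scaling, then the steps in reverse order with negated constants. */
void inv97(float *x, int n) {
    for (int i = 0; i < n; i++)
        x[i] *= (i % 2) ? ZETA : 1.0f / ZETA;
    lift(x, n, 0, -DELTA);
    lift(x, n, 1, -GAMMA);
    lift(x, n, 0, -BETA);
    lift(x, n, 1, -ALPHA);
}
```

Each step reads only samples of the other parity. The inverse can therefore undo it by subtracting the same quantity, so forward followed by inverse returns the input up to float rounding. For a constant signal, the odd (highpass) outputs vanish and the even (lowpass) outputs come out close to √2.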
2-D DWT
2-D Separability
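A single-level 2-D DWT can be computed separably: first a 1-D transform along every row, then one along every column. The sketch below assumes the commonly published CDF 9/7 lifting constants and whole-sample symmetric extension. `fwd97_strided` and `fwd97_2d` are hypothetical names. The stride parameter lets one routine serve both directions.

```c
/* CDF 9/7 lifting constants (alpha, beta, gamma, delta) and scaling zeta, assumed. */
static const float COEF[4] = { -1.586134342f, -0.05298011854f, 0.8829110762f, 0.4435068522f };
static const float ZETA = 1.149604398f;

/* Whole-sample symmetric extension: index -1 maps to 1, index n maps to n-2. */
static int mirror(int p, int n) {
    if (p < 0) return -p;
    if (p >= n) return 2 * (n - 1) - p;
    return p;
}

/* Forward 1-D CDF 9/7 on n >= 2 samples located at x[0], x[stride], x[2*stride], ... */
static void fwd97_strided(float *x, int n, int stride) {
    for (int s = 0; s < 4; s++)          /* alpha, gamma on odd; beta, delta on even */
        for (int i = (s % 2 == 0) ? 1 : 0; i < n; i += 2)
            x[i * stride] += COEF[s] * (x[mirror(i - 1, n) * stride] + x[mirror(i + 1, n) * stride]);
    for (int i = 0; i < n; i++)
        x[i * stride] *= (i % 2) ? 1.0f / ZETA : ZETA;
}

/* One level of the separable 2-D DWT on a row-major w x h image, in place. */
void fwd97_2d(float *img, int w, int h) {
    for (int y = 0; y < h; y++)          /* horizontal pass: contiguous rows */
        fwd97_strided(img + y * w, w, 1);
    for (int col = 0; col < w; col++)    /* vertical pass: columns, stride w */
        fwd97_strided(img + col, h, w);
}
```

The subbands stay interleaved in place. The sample at (even column, even row) holds the lowpass-lowpass coefficient, and the other three parity combinations hold the detail subbands. The column pass walks memory with a stride of `w` floats. That access pattern is what makes the vertical direction sensitive to cache behaviour.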
What have I done?
loop fusion
removed prologs/epilogs
influence of CPU cache
SIMD-vectorization
parallelization
Loop Fusion
[figure: loop fusion, labels: read, write, F, F]
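Loop fusion merges the four lifting passes into a single sweep over the data. One way to do this is to let each step trail the previous one by a single sample. This is sketched below under the same assumed CDF 9/7 constants and symmetric extension, with no scaling. `lift97_passes` is a plain four-pass reference and `lift97_fused` is the fused version. Both names and the lag schedule are illustrative choices, not necessarily the ones in the talk.

```c
/* CDF 9/7 lifting constants: alpha, beta, gamma, delta (scaling omitted). */
static const float COEF[4] = { -1.586134342f, -0.05298011854f, 0.8829110762f, 0.4435068522f };

/* Whole-sample symmetric extension: index -1 maps to 1, index n maps to n-2. */
static int mirror(int p, int n) {
    if (p < 0) return -p;
    if (p >= n) return 2 * (n - 1) - p;
    return p;
}

/* Apply lifting step s (0..3) at position p, but only if p lies inside the signal. */
static void step(float *x, int n, int s, int p) {
    if (p >= 0 && p < n)
        x[p] += COEF[s] * (x[mirror(p - 1, n)] + x[mirror(p + 1, n)]);
}

/* Reference: four separate passes, each sweeping the whole signal (n >= 2). */
void lift97_passes(float *x, int n) {
    for (int s = 0; s < 4; s++)
        for (int p = (s % 2 == 0) ? 1 : 0; p < n; p += 2)
            step(x, n, s, p);
}

/* Fused: one sweep. Iteration t applies steps 0..3 at positions 2t-1, 2t-2, 2t-3, 2t-4,
   so each step trails the previous one by one sample and finds both neighbours
   updated by exactly the previous step. */
void lift97_fused(float *x, int n) {
    for (int t = 0; 2 * t - 4 < n; t++)
        for (int s = 0; s < 4; s++)
            step(x, n, s, 2 * t - 1 - s);
}
```

Both functions perform the same updates on the same operands, only in a different order, so they produce the same output. In the first and last few iterations of the fused loop, some of the four positions fall outside the signal. This sketch handles those iterations with the bounds check in `step`.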
Removed Prologs and Epilogs
Influence of CPU Cache
SIMD Vectorization
[figure labels: 4 × 4, 6 × 2]
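The 4 × 4 and 6 × 2 schemes on this slide are not reproduced here. As a simpler illustration of SIMD in a lifting transform, the vertical pass vectorizes naturally: one lifting step applied to a whole row updates many independent columns with identical arithmetic. The sketch below assumes x86 SSE (`<xmmintrin.h>`, four floats per register) and falls back to scalar code when `__SSE__` is not defined. `lift_row` and `vert97` are hypothetical names, the CDF 9/7 constants are the commonly published ones, and the ζ scaling is omitted.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* CDF 9/7 lifting constants: alpha, beta, gamma, delta (scaling omitted). */
static const float COEF[4] = { -1.586134342f, -0.05298011854f, 0.8829110762f, 0.4435068522f };

/* Whole-sample symmetric extension: index -1 maps to 1, index n maps to n-2. */
static int mirror(int p, int n) {
    if (p < 0) return -p;
    if (p >= n) return 2 * (n - 1) - p;
    return p;
}

/* dst[k] += c * (a[k] + b[k]) for k in [0, w): one lifting step for w columns at once. */
static void lift_row(float *dst, const float *a, const float *b, float c, size_t w) {
    size_t k = 0;
#if defined(__SSE__)
    const __m128 vc = _mm_set1_ps(c);
    for (; k + 4 <= w; k += 4) {             /* four columns per SSE register */
        __m128 s = _mm_add_ps(_mm_loadu_ps(a + k), _mm_loadu_ps(b + k));
        _mm_storeu_ps(dst + k, _mm_add_ps(_mm_loadu_ps(dst + k), _mm_mul_ps(vc, s)));
    }
#endif
    for (; k < w; k++)                        /* scalar remainder (or whole row) */
        dst[k] += c * (a[k] + b[k]);
}

/* Vertical CDF 9/7 lifting (no scaling) of a row-major w x h image, h >= 2. */
void vert97(float *img, size_t w, int h) {
    for (int s = 0; s < 4; s++)               /* alpha, gamma on odd rows; beta, delta on even */
        for (int y = (s % 2 == 0) ? 1 : 0; y < h; y += 2)
            lift_row(img + (size_t)y * w,
                     img + (size_t)mirror(y - 1, h) * w,
                     img + (size_t)mirror(y + 1, h) * w, COEF[s], w);
}
```

Processing one row at a time also keeps memory accesses contiguous, whereas walking a single column steps through memory `w` floats at a time.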
Image Processing and Buffers
Parallelization
[figure labels: prolog, overlay, overlay, segment]
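One way to parallelize a lifting transform is to split the signal into segments. Each thread works on a private copy of its segment extended by a few extra samples on both sides, so no thread reads data that another thread is writing. This is only an assumption about what the segment and overlay labels depict. With the four CDF 9/7 lifting steps, a margin of 4 samples per side is enough for every segment's core to match the whole-signal result. The sketch uses OpenMP; without OpenMP support the pragma is ignored and the loop runs serially. `lift97_segmented`, `HALO`, and the 1-D setting are illustrative assumptions, and the ζ scaling is omitted.

```c
#include <stdlib.h>
#include <string.h>

#define HALO 4 /* extra samples per side; four radius-1 lifting steps reach 4 samples */

/* CDF 9/7 lifting constants: alpha, beta, gamma, delta (scaling omitted). */
static const float COEF[4] = { -1.586134342f, -0.05298011854f, 0.8829110762f, 0.4435068522f };

/* Whole-sample symmetric extension: index -1 maps to 1, index n maps to n-2. */
static int mirror(int p, int n) {
    if (p < 0) return -p;
    if (p >= n) return 2 * (n - 1) - p;
    return p;
}

/* Four-pass in-place CDF 9/7 lifting on n >= 2 samples. */
static void lift97(float *x, int n) {
    for (int s = 0; s < 4; s++)
        for (int i = (s % 2 == 0) ? 1 : 0; i < n; i += 2)
            x[i] += COEF[s] * (x[mirror(i - 1, n)] + x[mirror(i + 1, n)]);
}

/* Transform src (n >= 5 samples) into dst using segments of `seg` samples (seg even,
   so every segment starts at an even index and keeps the lifting parity). */
void lift97_segmented(const float *src, float *dst, int n, int seg) {
    int nseg = (n + seg - 1) / seg;
#pragma omp parallel for
    for (int k = 0; k < nseg; k++) {
        int start = k * seg;
        int end = (start + seg < n) ? start + seg : n;
        int len = end - start + 2 * HALO;
        float *buf = malloc(sizeof(float) * (size_t)len);
        if (buf == NULL) abort();
        for (int j = 0; j < len; j++)        /* segment plus halo, mirrored at signal ends */
            buf[j] = src[mirror(start - HALO + j, n)];
        lift97(buf, len);                     /* only the outer halo samples come out wrong */
        memcpy(dst + start, buf + HALO, sizeof(float) * (size_t)(end - start));
        free(buf);
    }
}
```

Each segment recomputes 2 × `HALO` extra samples, so segments should be much longer than the halo. An image can be split the same way into horizontal strips for the vertical pass.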
Results
Intel Core2 Quad @ 2.00 GHz
10 Mpx
CDF 9/7, 1 level, in-place
approach       best algorithm   time/px    speed-up
separable      diag.            17.23 ns    1.0×
single-loop    diag. 2 × 2       9.55 ns    1.8×
core           diag. 2 × 2       8.79 ns    2.0×
super-core     vert. 4 × 4       5.33 ns    3.2×
parallel (4)   vert. 4 × 4       1.55 ns   11.1×
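The speed-up column matches the separable time per pixel divided by each approach's time per pixel. Two rows are worked through below, together with the gain of the 4-way parallel version over super-core and an approximate per-image time. The per-image time assumes that 10 Mpx means 10^7 pixels.

```latex
\begin{aligned}
S &= \frac{t_{\text{separable}}}{t}, &
S_{\text{super-core}} &= \frac{17.23\,\text{ns}}{5.33\,\text{ns}} \approx 3.2, &
S_{\text{parallel (4)}} &= \frac{17.23\,\text{ns}}{1.55\,\text{ns}} \approx 11.1, \\
\frac{t_{\text{super-core}}}{t_{\text{parallel (4)}}} &= \frac{5.33\,\text{ns}}{1.55\,\text{ns}} \approx 3.4, &
T_{10\,\text{Mpx}} &\approx 10^{7} \times 1.55\,\text{ns} \approx 15.5\,\text{ms}
\end{aligned}
```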
Future Work
merge several levels
merge forward and inverse cores
other wavelets
combine with EAW
other platforms (ARM, GPU, FPGA)
other transforms
Example (AMD Opteron)
[plot: time / pixel (1 ns to 100 ns) vs. pixels (1k to 100M), logarithmic axes; series: naive vertical, naive diagonal, single-loop vertical, single-loop diagonal]