Wavelets @ CPU


Page 1: Wavelets @ CPU

Wavelets @ CPU

David Barina

April 15, 2014

David Barina Wavelets @ CPU April 15, 2014 1 / 16

Page 2: Wavelets @ CPU

Wavelet

Page 3: Wavelets @ CPU

Discrete Wavelet Transform
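The slide's diagram did not survive extraction. As a minimal illustration of what one DWT level does, the sketch below splits a signal into approximation (low-pass) and detail (high-pass) halves. It uses the Haar wavelet in its unnormalized averaging/differencing form, chosen here only for simplicity; the results later in the talk use CDF 9/7. The function name `haar_dwt_1d` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One level of a 1-D Haar DWT (averaging/differencing form, not normalized).
 * x has n samples (n even); approx and detail each receive n/2 values. */
void haar_dwt_1d(const double *x, size_t n, double *approx, double *detail)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        approx[i] = (x[2 * i] + x[2 * i + 1]) / 2.0; /* low-pass: local mean */
        detail[i] = (x[2 * i] - x[2 * i + 1]) / 2.0; /* high-pass: local half-difference */
    }
}
```

For the input {4, 2, 5, 5} this yields approximation {3, 5} and detail {1, 0}: the detail is zero wherever neighbouring samples are equal.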

Page 4: Wavelets @ CPU

Lifting

[Figure: lifting scheme with steps labeled α, β, γ, δ]
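The four steps α, β, γ, δ are presumably the lifting steps of the CDF 9/7 wavelet used in the results. The sketch below applies them in place to an interleaved 1-D signal (even indices end up holding approximation coefficients, odd indices detail coefficients) with symmetric extension at the borders. The coefficient values are the commonly published ones from the Daubechies and Sweldens factorization; the final scaling constant ζ and where it is applied follow one of several conventions in use. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CDF 9/7 lifting coefficients (Daubechies-Sweldens factorization). */
static const double CDF97_ALPHA = -1.586134342059924;
static const double CDF97_BETA  = -0.052980118572961;
static const double CDF97_GAMMA =  0.882911075530934;
static const double CDF97_DELTA =  0.443506852043971;
static const double CDF97_ZETA  =  1.149604398860241; /* scaling; conventions vary */

/* Odd samples: x[i] += c * (x[i-1] + x[i+1]); mirror x[n] -> x[n-2]. */
static void lift_odd(double *x, size_t n, double c)
{
    for (size_t i = 1; i < n; i += 2) {
        double right = (i + 1 < n) ? x[i + 1] : x[i - 1];
        x[i] += c * (x[i - 1] + right);
    }
}

/* Even samples: x[i] += c * (x[i-1] + x[i+1]); mirror x[-1] -> x[1]. */
static void lift_even(double *x, size_t n, double c)
{
    for (size_t i = 0; i < n; i += 2) {
        double left = (i > 0) ? x[i - 1] : x[i + 1];
        x[i] += c * (left + x[i + 1]);
    }
}

/* Forward transform, in place on n samples (n even, n >= 2). */
void cdf97_forward(double *x, size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    lift_odd(x, n, CDF97_ALPHA);  /* predict 1 */
    lift_even(x, n, CDF97_BETA);  /* update 1 */
    lift_odd(x, n, CDF97_GAMMA);  /* predict 2 */
    lift_even(x, n, CDF97_DELTA); /* update 2 */
    for (size_t i = 0; i < n; i += 2) {
        x[i] *= CDF97_ZETA;
        x[i + 1] /= CDF97_ZETA;
    }
}

/* Inverse: undo the scaling, then the lifting steps in reverse order. */
void cdf97_inverse(double *x, size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        x[i] /= CDF97_ZETA;
        x[i + 1] *= CDF97_ZETA;
    }
    lift_even(x, n, -CDF97_DELTA);
    lift_odd(x, n, -CDF97_GAMMA);
    lift_even(x, n, -CDF97_BETA);
    lift_odd(x, n, -CDF97_ALPHA);
}
```

Because every step modifies only one parity using values of the other parity, each step is undone exactly by subtracting the same term, so the inverse reconstructs the input up to floating-point rounding.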

Page 5: Wavelets @ CPU

2-D DWT
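One level of a 2-D DWT splits an image into four subbands: LL (low-pass in both directions) plus three detail subbands. As a minimal sketch, again using the unnormalized Haar wavelet for simplicity rather than CDF 9/7, the four coefficients can be computed directly from each 2×2 block. The function name and the subband naming (which varies between sources) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One level of a 2-D Haar DWT on a row-major w x h image (w, h even),
 * computed directly from each 2x2 block. Every output subband is
 * (w/2) x (h/2), row-major. */
void haar_dwt_2d(const double *img, size_t w, size_t h,
                 double *ll, double *hl, double *lh, double *hh)
{
    assert(w % 2 == 0 && h % 2 == 0);
    size_t hw = w / 2;
    for (size_t r = 0; r < h / 2; r++) {
        for (size_t c = 0; c < hw; c++) {
            double a = img[(2 * r) * w + 2 * c];         /* top-left */
            double b = img[(2 * r) * w + 2 * c + 1];     /* top-right */
            double p = img[(2 * r + 1) * w + 2 * c];     /* bottom-left */
            double q = img[(2 * r + 1) * w + 2 * c + 1]; /* bottom-right */
            size_t k = r * hw + c;
            ll[k] = (a + b + p + q) / 4.0; /* block mean */
            hl[k] = (a - b + p - q) / 4.0; /* left-right difference */
            lh[k] = (a + b - p - q) / 4.0; /* top-bottom difference */
            hh[k] = (a - b - p + q) / 4.0; /* diagonal difference */
        }
    }
}
```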

Page 6: Wavelets @ CPU

2-D Separability
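Separability means the 2-D transform can be computed as a 1-D transform over every row followed by the same 1-D transform over every column. The sketch below shows that structure with a hypothetical strided 1-D Haar lifting step (predict: detail = odd − even; update: even becomes the pair average); the same stride trick would apply to a CDF 9/7 routine. The coefficients stay interleaved in place rather than being reordered into quadrants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-D Haar lifting on n samples spaced `stride` apart, in place:
 * even positions become pair averages, odd positions become differences. */
static void haar_lift_1d(double *x, size_t n, size_t stride)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i + 1 < n; i += 2) {
        double *even = &x[i * stride];
        double *odd  = &x[(i + 1) * stride];
        *odd -= *even;       /* predict: detail = odd - even */
        *even += *odd / 2.0; /* update: even becomes (even + odd) / 2 */
    }
}

/* Separable 2-D transform of a row-major w x h image:
 * a horizontal pass over every row, then a vertical pass over every column. */
void haar_2d_separable(double *img, size_t w, size_t h)
{
    for (size_t r = 0; r < h; r++)
        haar_lift_1d(&img[r * w], w, 1); /* rows: contiguous samples */
    for (size_t c = 0; c < w; c++)
        haar_lift_1d(&img[c], h, w);     /* columns: stride of one row */
}
```

The vertical pass is where the memory access pattern matters, since consecutive samples of a column are a whole row apart.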

Page 7: Wavelets @ CPU

What have I done?

loop fusion

removed prologs/epilogs

influence of CPU cache

SIMD-vectorization

parallelization

Page 8: Wavelets @ CPU

Loop Fusion

[Figure: diagram labeled "read", "write", and two "F" blocks]
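The figure labels (read, write, two F blocks) suggest two processing steps performed within a single read/write pass. As a hypothetical illustration, the sketch fuses two lifting steps (a predict step and an update step with placeholder coefficients `a` and `b`) into one loop. The update for index i can run right after the predict for index i because it needs only d[i-1] and d[i]. The even (`s`) and odd (`d`) samples live in separate arrays; clamping the neighbour index reproduces symmetric extension for this even/odd split. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: one full pass for predict, then another full pass for update. */
void lift_two_pass(double *s, double *d, size_t n, double a, double b)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t next = (i + 1 < n) ? i + 1 : n - 1;
        d[i] += a * (s[i] + s[next]); /* predict */
    }
    for (size_t i = 0; i < n; i++) {
        size_t prev = (i > 0) ? i - 1 : 0;
        s[i] += b * (d[prev] + d[i]); /* update */
    }
}

/* Fused: a single pass over the data. */
void lift_fused(double *s, double *d, size_t n, double a, double b)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t next = (i + 1 < n) ? i + 1 : n - 1;
        size_t prev = (i > 0) ? i - 1 : 0;
        d[i] += a * (s[i] + s[next]); /* s[i], s[i+1] not yet updated here */
        s[i] += b * (d[prev] + d[i]); /* d[i-1] done last iteration, d[i] just now */
    }
}
```

Both functions perform the same arithmetic on the same inputs, so they give identical results; the fused version simply reads and writes each array once instead of twice.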

Page 9: Wavelets @ CPU

Removed Prologs and Epilogs
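A filter with symmetric border extension is often written as a prolog (left border), a steady-state loop, and an epilog (right border). One common way to remove those special cases from the compute loop, not necessarily the technique used in the talk, is to copy the input into a buffer padded with the mirrored samples, so that a single uniform loop covers every output. The sketch uses a hypothetical 3-tap smoothing filter y[i] = (x[i−1] + 2x[i] + x[i+1]) / 4.

```c
#include <assert.h>
#include <stddef.h>

/* Symmetric extension: x[-1] = x[1], x[n] = x[n-2]. */

/* Version with an explicit prolog and epilog for the two border samples. */
void smooth_with_borders(const double *x, double *y, size_t n)
{
    assert(n >= 2);
    y[0] = (x[1] + 2.0 * x[0] + x[1]) / 4.0;                 /* prolog */
    for (size_t i = 1; i + 1 < n; i++)
        y[i] = (x[i - 1] + 2.0 * x[i] + x[i + 1]) / 4.0;     /* steady state */
    y[n - 1] = (x[n - 2] + 2.0 * x[n - 1] + x[n - 2]) / 4.0; /* epilog */
}

/* Version without border cases in the compute loop; buf must hold n + 2. */
void smooth_padded(const double *x, double *y, size_t n, double *buf)
{
    assert(n >= 2);
    buf[0] = x[1];                 /* mirrored left neighbour */
    for (size_t i = 0; i < n; i++)
        buf[i + 1] = x[i];
    buf[n + 1] = x[n - 2];         /* mirrored right neighbour */
    for (size_t i = 0; i < n; i++) /* one uniform loop */
        y[i] = (buf[i] + 2.0 * buf[i + 1] + buf[i + 2]) / 4.0;
}
```

The border handling moves out of the arithmetic loop into a cheap copy step, which keeps the hot loop free of branches and easier to vectorize.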

Page 10: Wavelets @ CPU

Influence of CPU Cache
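This slide's content did not survive extraction. As a sketch of one well-known cache effect in wavelet transforms, assuming that is the kind of effect meant here, the vertical pass can be written with two loop orders. Both compute the same vertical Haar step on a row-major image; only the memory access pattern differs. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Vertical Haar step (pairwise mean and half-difference of rows 2k, 2k+1)
 * on a row-major w x h image, h even. */

/* Column by column: consecutive inner iterations are w elements apart, so for
 * wide images each access lands on a different cache line, and those lines may
 * be evicted before the next column reuses them. */
void vertical_by_columns(double *img, size_t w, size_t h)
{
    assert(h % 2 == 0);
    for (size_t c = 0; c < w; c++)
        for (size_t r = 0; r < h; r += 2) {
            double a = img[r * w + c], b = img[(r + 1) * w + c];
            img[r * w + c] = (a + b) / 2.0;
            img[(r + 1) * w + c] = (a - b) / 2.0;
        }
}

/* Row pair by row pair: the inner loop walks two rows contiguously,
 * so each fetched cache line is fully used before moving on. */
void vertical_by_rows(double *img, size_t w, size_t h)
{
    assert(h % 2 == 0);
    for (size_t r = 0; r < h; r += 2)
        for (size_t c = 0; c < w; c++) {
            double a = img[r * w + c], b = img[(r + 1) * w + c];
            img[r * w + c] = (a + b) / 2.0;
            img[(r + 1) * w + c] = (a - b) / 2.0;
        }
}
```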

Page 11: Wavelets @ CPU

SIMD Vectorization

[Figure: panels labeled 4 × 4 and 6 × 2]
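The figure (panels labeled 4 × 4 and 6 × 2) did not survive extraction, so the sketch below only illustrates the general idea of SIMD-vectorizing a lifting step: a predict-type step d[i] += a·(s[i] + s[i+1]) computed four float lanes at a time with SSE intrinsics, with a scalar loop for the remainder (and for builds without SSE). The function name and array layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* d[i] += a * (s[i] + s[i + 1]) for i in [0, n); s must hold n + 1 samples
 * (the caller supplies the extra border sample). */
void predict_step(float *d, const float *s, size_t n, float a)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 va = _mm_set1_ps(a);             /* broadcast the coefficient */
    for (; i + 4 <= n; i += 4) {            /* four outputs per iteration */
        __m128 left  = _mm_loadu_ps(&s[i]);
        __m128 right = _mm_loadu_ps(&s[i + 1]);
        __m128 vd    = _mm_loadu_ps(&d[i]);
        vd = _mm_add_ps(vd, _mm_mul_ps(va, _mm_add_ps(left, right)));
        _mm_storeu_ps(&d[i], vd);
    }
#endif
    for (; i < n; i++)                      /* scalar tail, or everything without SSE */
        d[i] += a * (s[i] + s[i + 1]);
}
```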

Page 12: Wavelets @ CPU

Image Processing and Buffers

Page 13: Wavelets @ CPU

Parallelization

[Figure: data split into segments, labeled prolog, overlay, overlay, segment]
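The figure labels suggest the data are split into segments that overlap their neighbours. A minimal sketch of that idea, using a hypothetical out-of-place 3-tap filter rather than the in-place lifting scheme of the talk: each segment writes only its own outputs y[lo..hi) but reads one extra input sample on each side (the overlap), so segments are independent and can run concurrently. The `#pragma omp parallel for` is ignored when compiling without OpenMP, in which case the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Symmetric extension: index -1 maps to 1, index n maps to n - 2. */
static double mirror_at(const double *x, size_t n, long i)
{
    if (i < 0) i = -i;
    if (i >= (long)n) i = 2 * (long)n - 2 - i;
    return x[i];
}

/* y[i] = (x[i-1] + 2 x[i] + x[i+1]) / 4, computed in nseg independent segments. */
void smooth_segmented(const double *x, double *y, size_t n, size_t nseg)
{
    assert(n >= 2 && nseg >= 1);
    #pragma omp parallel for
    for (long k = 0; k < (long)nseg; k++) {
        size_t lo = n * (size_t)k / nseg;       /* first output of this segment */
        size_t hi = n * ((size_t)k + 1) / nseg; /* one past its last output */
        for (size_t i = lo; i < hi; i++)        /* reads x[lo-1] .. x[hi]: the overlap */
            y[i] = (mirror_at(x, n, (long)i - 1) + 2.0 * x[i]
                    + mirror_at(x, n, (long)i + 1)) / 4.0;
    }
}
```

The result does not depend on the number of segments, which is the property a segmented parallel scheme has to preserve.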

Page 14: Wavelets @ CPU

Results

Intel Core2 Quad @ 2.00 GHz

10 Mpx

CDF 9/7, 1 level, in-place

approach       best algorithm   time/px    speed-up
separable      diag.            17.23 ns    1.0×
single-loop    diag. 2 × 2       9.55 ns    1.8×
core           diag. 2 × 2       8.79 ns    2.0×
super-core     vert. 4 × 4       5.33 ns    3.2×
parallel (4)   vert. 4 × 4       1.55 ns   11.1×
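The speed-up column is the separable baseline's time per pixel divided by each approach's time per pixel. A worked check for the fastest row, plus the implied time per transform, assuming "10 Mpx" means 10^7 pixels:

```latex
\begin{aligned}
S &= t_{\text{separable}} / t \\
S_{\text{parallel (4)}} &= 17.23\,\text{ns} / 1.55\,\text{ns} \approx 11.1 \\
t_{\text{super-core}} / t_{\text{parallel (4)}} &= 5.33\,\text{ns} / 1.55\,\text{ns} \approx 3.4 \\
10^{7}\,\text{px} \times 17.23\,\text{ns/px} &\approx 172\,\text{ms} \\
10^{7}\,\text{px} \times 1.55\,\text{ns/px} &= 15.5\,\text{ms}
\end{aligned}
```

So the 11.1× figure combines roughly 3.2× from the single-threaded optimizations with roughly 3.4× from running on four cores.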

Page 15: Wavelets @ CPU

Future Work

merge several levels

merge forward and inverse cores

other wavelets

combine with EAW (edge-avoiding wavelets)

other platforms (ARM, GPU, FPGA)

other transforms

Page 16: Wavelets @ CPU

Example (AMD Opteron)

[Plot: time/pixel versus number of pixels, log-log axes; series: naive vertical, naive diagonal, single-loop vertical, single-loop diagonal]
