
Divergence matching criteria for registration, indexing and retrieval

Alfred O. Hero

Dept. EECS, Dept. Biomed. Eng., Dept. Statistics

University of Michigan - Ann Arbor

hero@eecs.umich.edu

http://www.eecs.umich.edu/~hero

Collaborators: Olivier Michel, Bing Ma, John Gorman

Outline

1. Statistical framework: entropy measures, error exponents

2. Registration, indexing and retrieval

3. α-entropy and α-MI estimation

4. Graph theoretic entropy estimation methods


(a) Image I1 (b) Image I0 (c) Registration result

Figure 1: A multidate image registration example

Statistical Framework

• X: an image
• Z = Z(X): an image feature vector
• Θ: a parameter space
• f(z|θ): feature density (likelihood)
• X_R: a reference image
• {X(i)}: a database of K images

Z_R = Z(X_R) ~ f(z|θ_R)

Z_i = Z(X_i) ~ f(z|θ_i), i = 1, …, K

⇒ Similarity between X_i and X_R lies in similarity between the models

Divergence Measures

Refs: [Csiszar:67, Basseville:SP89]

Define densities

f_i = f(z|θ_i),  f_R = f(z|θ_R)

The Rényi α-divergence of fractional order α ∈ [0,1] [Renyi:61,70]:

D_α(f_i ‖ f_R) = D(θ_i ‖ θ_R) = 1/(α−1) ln ∫ f_R (f_i/f_R)^α dx = 1/(α−1) ln ∫ f_i^α f_R^{1−α} dx
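As a quick numerical check of this definition (my own illustration, not from the talk; the function name `renyi_divergence` and the Gaussian example are assumptions), the divergence can be evaluated by grid integration and compared against the known closed form D_α = α·Δμ²/2 for two unit-variance Gaussians:

```python
import numpy as np

def renyi_divergence(f_i, f_r, dx, alpha):
    """Renyi alpha-divergence D_alpha(f_i || f_r) on a uniform grid with
    spacing dx, using the form (1/(alpha-1)) ln \int f_i^alpha f_r^(1-alpha) dx."""
    integrand = f_i**alpha * f_r**(1.0 - alpha)
    return np.log(np.sum(integrand) * dx) / (alpha - 1.0)

# Two unit-variance Gaussian densities shifted by 1 (illustrative choice)
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
f0 = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
f1 = np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2 * np.pi)

# For these densities the closed form is D_alpha = alpha * dmu^2 / 2
alpha = 0.7
d = renyi_divergence(f1, f0, dx, alpha)
```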


Rényi α-Divergence: Special cases

• α-Divergence vs. Bhattacharyya-Hellinger distance:

D_{1/2}(f_i ‖ f_R) = −ln( ∫ √(f_i f_R) dx )²

D²_BH(f_i ‖ f_R) = ∫ (√f_i − √f_R)² dx = 2( 1 − ∫ √(f_i f_R) dx )

• α-Divergence vs. Kullback-Leibler divergence:

lim_{α→1} D_α(f_i, f_R) = ∫ f_R ln(f_R/f_i) dx.
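The α = 1/2 relations can be verified numerically (an illustration I added; the variable names are mine): the Bhattacharyya coefficient ∫√(f_i f_R) dx links D_{1/2} and the squared Hellinger distance exactly as above:

```python
import numpy as np

# Two unit-variance Gaussians on a grid (illustrative densities)
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
f0 = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
f1 = np.exp(-0.5 * (x - 1.5)**2) / np.sqrt(2 * np.pi)

bc = np.sum(np.sqrt(f0 * f1)) * dx                    # Bhattacharyya coefficient
d_half = -2.0 * np.log(bc)                            # Renyi divergence at alpha = 1/2
d2_bh = np.sum((np.sqrt(f0) - np.sqrt(f1))**2) * dx   # squared Hellinger distance
# Check: d2_bh = 2(1 - bc)  and  d_half = -2 ln(1 - d2_bh/2)
```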


Rényi α-divergence and Error Exponents

Observe i.i.d. sample W = [W_1, …, W_n]

H0 : W_j ~ f(w|θ_0)
H1 : W_j ~ f(w|θ_1)

Bayes probability of error

P_e(n) = β(n)P(H1) + α(n)P(H0)

LDP gives the Chernoff bound [Dembo&Zeitouni:98]:

liminf_{n→∞} (1/n) log P_e(n) = − sup_{α∈[0,1]} { (1−α) D_α(θ_1 ‖ θ_0) }
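A small numerical sketch of the bound's content, for a toy case I chose (two unit-variance Gaussian hypotheses, where D_α(θ_1‖θ_0) = α·Δμ²/2 in closed form): the Chernoff exponent sup_α (1−α)D_α is attained at α = 1/2 and equals Δμ²/8:

```python
import numpy as np

# (1 - alpha) * D_alpha for unit-variance Gaussians separated by dmu:
# (1-alpha) * alpha * dmu^2 / 2, a concave parabola maximized at alpha = 1/2.
dmu = 2.0
alphas = np.linspace(0.0, 1.0, 10001)
exponents = alphas * (1.0 - alphas) * dmu**2 / 2.0
best = alphas[np.argmax(exponents)]   # should be 1/2
# and the optimal exponent should equal dmu^2 / 8
```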

Indexing via α-divergence

Refs: Vasconcelos&Lippman:DCC98, Stoica&etal:ICASSP98, Do&Vetterli:ICIP00

H0 : Z_R^i ~ f(z|θ_0)
H1 : Z_R^i ~ f(z|θ_1)

Clairvoyant indexing rule:

X(i) ≻ X(j) ⇔ D_α(f_i ‖ f_R) < D_α(f_j ‖ f_R)

Indexing problem: find θ_i attaining min_{θ_i ≠ θ_R} D_α(θ_i ‖ θ_R)

1. Image classification: the f_i index model classes [Stoica&etal:INRIA98]

2. Target detection: f_R is the noise reference and the f_i are target references. Declare detection if min_{θ_i ≠ θ_R} D_α(θ_i ‖ θ_R) > threshold

Registration via α-Mutual-Information

Ref: Viola&Wells:ICCV95

1. Reference X_R and target X_T.

2. Set of rigid transformations {T_i}.

3. Derived feature vectors

Z_R = Z(X_R), Z_i = Z(T_i(X_T))

H0 : {Z_R^j, Z_i^j} independent
H1 : {Z_R^j, Z_i^j} dependent

Error exponent is the α-MI (Pluim&etal:SPIE01, Neemuchwala&etal:ICIP01):

MI_α(Z_R; Z_i) = 1/(α−1) ln ∫ f^α(Z_R, Z_i) ( f(Z_R) f(Z_i) )^{1−α} dZ_R dZ_i
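A crude plug-in sketch of MI_α from a joint histogram (my own illustration, not the talk's estimator, which is graph-based; `alpha_mi` is a hypothetical name). It shows the key qualitative property: MI_α ≈ 0 for independent pairs and grows with dependence:

```python
import numpy as np

def alpha_mi(x, y, alpha, bins=16):
    """Histogram plug-in estimate of MI_alpha for a pair of 1-d samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                      # joint pmf estimate
    px = pxy.sum(axis=1, keepdims=True)        # marginals
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    s = np.sum(pxy[mask]**alpha * (px @ py)[mask]**(1.0 - alpha))
    return np.log(s) / (alpha - 1.0)

rng = np.random.default_rng(0)
z = rng.standard_normal(20000)
dependent = alpha_mi(z, z + 0.1 * rng.standard_normal(20000), alpha=0.5)
independent = alpha_mi(z, rng.standard_normal(20000), alpha=0.5)
# dependent pair should give a much larger alpha-MI than the independent pair
```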


Ultrasound Registration Example

Figure 2: Three ultrasound breast scans. From top to bottom: case 151, case 142 and case 162.

MI Scatterplots

[Figure: four scatterplots over intensity ranges 50–250 × 50–250]

Figure 3: MI scatterplots. 1st col: target = reference slice. 2nd col: target = reference+1 slice.

[Figure: α-divergence (α-MI) versus angle of rotation (deg), one curve per α ∈ {0, 0.1, …, 0.9}, for single pixel intensity]

Figure 4: α-divergence as a function of angle for ultrasound image registration

[Figure: curvature of the α-MI vs. α curve, plotted against α, for single pixel intensity]

Figure 5: Resolution of α-divergence as a function of α

Feature Trees

[Figure: feature tree data structure showing the root node, depth 1 and depth 2; unpromising branches are not examined further]

Figure 6: Part of feature tree data structure.

[Figure: terminal nodes of the tree at depth 16]

Figure 7: Leaves of feature tree data structure.

ICA Features

Figure 8: Estimated ICA basis set for ultrasound breast image database

Simple Example

Figure 9: Bar images with contrast 1.02, 1.07 and 1.78. Background is low-variance white Gaussian while the bar is of uniform intensity.

Single Pixel vs Feature Tag

[Figure: two plots of mutual information (α ≈ 1) versus degree of rotation, for intensity ratios 1.02, 1.07 and 1.78; top panel: single pixel based MI peak with submergence of structure in background; bottom panel: tag based MI peak with submergence of structure in background]

Figure 10: Upper curves are single pixel based MI trajectories while lower curves are 4×4 tag based MI trajectories for bar images.

US Registration Comparisons

              151         142        162
pixel         0.6/0.9     0.6/0.3    0.6/0.3
tag           0.5/3.6     0.5/3.8    0.4/1.4
spatial-tag   0.99/14.6   0.99/8.4   0.6/8.3

              151/8       151/16     151/32
ICA           0.7/4.1     0.7/3.9    0.99/7.7

Table 1: Numerator = optimal value of α and denominator = maximum resolution of mutual α-information for registering various images (cases 151, 142, 162) using various features (pixel, tag, spatial-tag, ICA). 151/8, 151/16, 151/32 correspond to the ICA algorithm with 8, 16 and 32 basis elements run on case 151.

Methods of Divergence Estimation

• Z = Z(X): a statistic (MI, reduced-rank feature, etc.)
• {Z_i}: n i.i.d. realizations from f(z; θ)

Objective: estimate D_α(f_i ‖ f_R) from the Z_i's

1. Parametric density estimation methods
2. Non-parametric density estimation methods
3. Non-parametric minimal-graph estimation methods

Non-parametric estimation methods

Given i.i.d. sample X = {X_1, …, X_n}

Density “plug-in” estimator:

H_α(f̂_n) = 1/(1−α) ln ∫_{ℝ^d} f̂^α(x) dx

Previous work limited to the Shannon entropy H(f) = −∫ f(x) ln f(x) dx

• Histogram plug-in [Gyorfi&VanDerMeulen:CSDA87]
• Kernel density plug-in [Ahmad&Lin:IT76]
• Sample-spacing plug-in [Hall:JMS86] (d = 1)

• Performance degrades as the density f becomes non-smooth
• Unclear how to robustify f̂ against outliers
• d-dimensional integration might be difficult
• ⇒ the function {f(x) : x ∈ ℝ^d} over-parameterizes the entropy functional
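For contrast with the direct graph methods that follow, here is a minimal 1-d histogram plug-in sketch of H_α (my own illustration of the general idea; the cited works analyze more refined estimators):

```python
import numpy as np

def alpha_entropy_plugin(samples, alpha, bins=32):
    """Histogram plug-in estimate of the Renyi alpha-entropy H_alpha(f), 1-d."""
    counts, edges = np.histogram(samples, bins=bins)
    width = edges[1] - edges[0]
    f_hat = counts / (counts.sum() * width)        # piecewise-constant density
    mask = f_hat > 0
    integral = np.sum(f_hat[mask]**alpha) * width  # \int f_hat^alpha dx
    return np.log(integral) / (1.0 - alpha)

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, 100000)
# For Uniform[0,1], H_alpha = 0 for every alpha, so the estimate should be near 0
h = alpha_entropy_plugin(u, alpha=0.5)
```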


Direct α-entropy estimation

• MST estimator of α-entropy [Hero&Michel:IT99]:

Ĥ_α = 1/(1−α) ln( L_γ(X_n)/n^α )

• Direct entropy estimator: faster convergence for non-smooth densities
• The parameter α is varied by varying the interpoint distance measure
• Optimally pruned k-MST graphs robustify the estimator against outliers
• Greedy multi-scale MST approximations reduce the combinatorial complexity
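A sketch of this estimator for d = 2 and γ = 1, so α = (d−γ)/d = 1/2 (my code, not the authors' implementation; like the slide's formula it omits the additive constant involving β_{L_γ,d}, which does not depend on f):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_alpha_entropy(points, gamma=1.0):
    """MST estimate of H_alpha(f), alpha = (d - gamma)/d, up to an additive
    constant that is independent of the underlying density f."""
    n, d = points.shape
    alpha = (d - gamma) / d
    # The MST topology is invariant under the monotone map t -> t^gamma,
    # so build the tree on Euclidean distances, then power-weight its edges.
    tree = minimum_spanning_tree(squareform(pdist(points)))
    L_gamma = np.sum(tree.data**gamma)
    return np.log(L_gamma / n**alpha) / (1.0 - alpha)

rng = np.random.default_rng(2)
x = rng.uniform(size=(500, 2))
# Sanity checks: the estimate is shift-invariant, and shrinking the support
# (a more concentrated density) lowers the entropy estimate.
h = mst_alpha_entropy(x)
```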


Minimal Graphs: Minimal Spanning Tree (MST)

Let M_n = M(X_n) denote the possible sets of edges in the class of acyclic graphs spanning X_n (spanning trees).

The Euclidean power-weighted MST achieves

L_MST(X_n) = min_{M_n} Σ_{e∈M_n} ‖e‖^γ

[Figure: left, 256 random samples on [0,1]²; right, their MST]

Figure 11: A sample data set and the MST

Minimal Graphs: Pruned MST

Fix k, 1� k� n.

Let Mn;k = M (xi1; : : : ;xik) be a minimal graph connectingk distinct

verticesxi1; : : : ;xik.

Thek-MST T�n;k = T�(xi�1

; : : : ;xi�k

) is minimum of allk-point MST’s

L�n;k = L�(Xn;k) = mini1;:::;ik

minMn;k

∑e2Mn;k

kekγ
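To make the definition concrete, L*_{n,k} can be computed exactly for toy n by exhausting the C(n,k) vertex subsets (my illustration only; this is exponential in general, which is why greedy approximations are used in practice):

```python
import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def k_mst_length(points, k, gamma=1.0):
    """Exact k-MST weight L*_{n,k} by brute force over all k-point subsets."""
    best = np.inf
    for subset in combinations(range(len(points)), k):
        sub = points[list(subset)]
        tree = minimum_spanning_tree(squareform(pdist(sub)))
        best = min(best, np.sum(tree.data**gamma))
    return best

rng = np.random.default_rng(3)
pts = rng.uniform(size=(9, 2))
pts = np.vstack([pts, [[10.0, 10.0]]])   # one far outlier outside [0,1]^2
full = k_mst_length(pts, k=10)           # ordinary MST, must reach the outlier
pruned = k_mst_length(pts, k=9)          # best 9-point tree drops the outlier
# the pruned tree avoids the long outlier edge, so it is far shorter
```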


[Figure: four panels of k-MSTs on [0,1]²: k=99 (1 outlier rejected), k=98 (2 rejected), k=62 (38 rejected), k=25 (75 rejected)]

Figure 12: k-MST for 2D torus density with and without the addition of uniform “outliers”.

Convergence of MST

[Figure: uniform and triangular 2D samples (n=100) on [0,1]², their MSTs, and mean MST length as a function of n for each distribution]

Figure 13: 2D triangular vs. uniform sample study for MST.

[Figure: left, MST length versus N for uniform (o) and triangular (*) samples; right, 2·log(L_n/√n) versus number of points]

Figure 14: MST and log MST weights as a function of number of samples for the 2D uniform vs. triangular study.

[Figure: unit square partitioned into subsquares]

Figure 15: A continuous quasi-additive Euclidean functional satisfies a “self-similarity” property on any scale.

Asymptotics: the BHH Theorem and entropy estimation

Theorem 1 [Beardwood&etal:Camb59, Steele:95, Redmond&Yukich:SPA96] Let L be a continuous quasi-additive Euclidean functional with power exponent γ, and let X_n = {X_1, …, X_n} be an i.i.d. sample drawn from a distribution on [0,1]^d with an absolutely continuous component having (Lebesgue) density f(x). Then

lim_{n→∞} L_γ(X_n)/n^{(d−γ)/d} = β_{L_γ,d} ∫ f(x)^{(d−γ)/d} dx  (a.s.)   (1)

Or, letting α = (d−γ)/d,

lim_{n→∞} L_γ(X_n)/n^α = β_{L_γ,d} exp( (1−α) H_α(f) )  (a.s.)
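The first limit can be watched numerically for uniform samples on [0,1]² with γ = 1, where α = 1/2 and ∫ f^{1/2} dx = 1, so L_γ(X_n)/√n should settle near the constant β_{L_γ,d} (an illustration I added; the sample sizes are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_length(points, gamma=1.0):
    """Power-weighted MST length L_gamma of a point set."""
    tree = minimum_spanning_tree(squareform(pdist(points)))
    return np.sum(tree.data**gamma)

rng = np.random.default_rng(4)
# normalized lengths L(X_n)/sqrt(n) for growing n should hover near one constant
ratios = [mst_length(rng.uniform(size=(n, 2))) / np.sqrt(n)
          for n in (200, 400, 800)]
```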


Extension to Pruned Graphs

Fix α ∈ [0,1] and let k = ⌊αn⌋. Then as n → ∞ (Hero&Michel:IT99)

L(X*_{n,k})/(⌊αn⌋)^ν → β_{L_γ,d} min_{A: P(A)≥α} ∫ f^ν(x | x ∈ A) dx  (a.s.)

or, alternatively, with

H_ν(f | x ∈ A) = 1/(1−ν) ln ∫ f^ν(x | x ∈ A) dx,

L(X*_{n,k})/(⌊αn⌋)^ν → β_{L_γ,d} exp( (1−ν) min_{A: P(A)≥α} H_ν(f | x ∈ A) )  (a.s.)

Asymptotics: Plug-in estimation of H_α(f)

Class of Hölder continuous functions over [0,1]^d:

Σ_d(κ, c) = { f(x) : |f(x) − p_x^{⌊κ⌋}(z)| ≤ c ‖x − z‖^κ }

Class of functions of bounded variation (BV) over [0,1]^d:

BV_d(c) = { f(x) : sup_{{x_i}} Σ_i |f(x_i) − f(x_{i−1})| ≤ c }

Proposition 1 (Hero&Ma:IT01) Assume that f^α ∈ Σ_d(κ, c). Then, if f̂^α is a minimax estimator,

sup_{f^α ∈ Σ_d(κ,c)} E^{1/p}[ | ∫ f̂^α(x) dx − ∫ f^α(x) dx |^p ] = O( n^{−κ/(2κ+d)} )

Asymptotics: Minimal-graph estimation of Hα( f )

Proposition 2 (Hero&Ma:IT01) Let d� 2 and

α = (d� γ)=d 2 [1=2;(d�1)=d]. Assume that fα 2 Σd(κ;c) whereκ� 1

and c< ∞. Then for any continuous quasi-additive Euclidean functional

supf α2Σd(κ;c)

E1=p

�����Lγ(X1; : : : ;Xn)nα �βLγ;d

Z

f α(x)dx

����p��O

n�1=(d+1)�

Conclude: minimal-graph estimator converges faster for

κ <

dd�1

32

As Σ_d(1, c) ⊂ BV_d(c), we have

Corollary 1 (Hero&Ma:IT01) Let d ≥ 2 and α = (d−γ)/d ∈ [1/2, (d−1)/d]. Assume that f^α is of bounded variation over [0,1]^d. Then

sup_{f^α ∈ BV_d(c)} E^{1/p}[ | ∫ f̂^α(x) dx − ∫ f^α(x) dx |^p ] ≤ O( n^{−1/(d+2)} )

sup_{f^α ∈ BV_d(c)} E^{1/p}[ | L_γ(X_1, …, X_n)/n^α − β_{L_γ,d} ∫ f^α(x) dx |^p ] ≤ O( n^{−1/(d+1)} )

Observations

• Minimal-graph rates are valid for the MST, k-NN graph, TSP, Steiner tree, etc.

• An analogous rate bound holds for the progressive-resolution algorithm

L^G_γ(X_n) = Σ_{i=1}^{m^d} L_γ(X_n ∩ Q_i)

where {Q_i} is a uniform partition of [0,1]^d into cells of volume 1/m^d

• The optimal sequence of cell volumes is m^{−d} = n^{−1/(d+1)}

• These results also apply to the greedy multi-resolution k-MST

Application: Image Registration

Two independent data samples from unknown distributions:

• X = [X_1, …, X_m] ~ f(x)
• Y = [Y_1, …, Y_n] ~ g(x)

Suppose g(x) = f(Ax + b), AᵀA = I

Objective: find the rigid transformation A, b

Two methods:

1. α-MI of {(X_i, Y_i)}_{i=1}^n
2. α-entropy of {X_i}_{i=1}^m ∪ {Y_i}_{i=1}^n

[Figure: (a) reference image; (b) image at 300, 0, 110 rotation; (c) image at 290, −20, 130 rotation]

Figure 16: Reference and target SAR/DEM images

O(n^{−1/(2d+1)}) algorithm for α-MI estimation

MI_α(X; Y) = 1/(α−1) ln ∫ f^α_{X,Y}(x, y) ( f_X(x) f_Y(y) )^{1−α} dx dy.

Algorithm:

1. Kernel estimates f̂_X, f̂_Y  (O(n^{−1/(d+2)}))

2. Uniformizing probability transformations:

X̃ = F̂_X(X), Ỹ = F̂_Y(Y)

3. Graph entropy estimate of MI_α(X̃; Ỹ)  (O(n^{−1/(2d+1)})):

L_γ({(X̃_1, Ỹ_1), …, (X̃_n, Ỹ_n)})/n^α → β_{L_γ,d} ∫ f̃^α_{X̃,Ỹ}(x̃, ỹ) dx̃ dỹ = β_{L_γ,d} ∫ f^α_{X,Y}(x, y) ( f_X(x) f_Y(y) )^{1−α} dx dy  (w.p. 1)
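A compact sketch of the three steps for scalar features (my code, not the authors'; it substitutes rank-based empirical CDFs for the kernel/CDF estimates of steps 1-2 and drops the β constant, so only comparisons between estimates are meaningful):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mi_alpha_graph(x, y, gamma=1.0):
    """Graph-based alpha-MI sketch: uniformize each marginal by its empirical
    CDF, then apply an MST entropy estimate to the transformed pairs."""
    n = len(x)
    # Step 2: rank-based uniformizing transforms onto [0,1]
    u = np.argsort(np.argsort(x)) / (n - 1.0)
    v = np.argsort(np.argsort(y)) / (n - 1.0)
    pts = np.column_stack([u, v])
    d = 2
    alpha = (d - gamma) / d
    # Step 3: MST entropy of the transformed sample; MI_alpha = -H_alpha
    # of the transformed (copula-like) density, up to a constant
    tree = minimum_spanning_tree(squareform(pdist(pts)))
    L = np.sum(tree.data**gamma)
    return -np.log(L / n**alpha) / (1.0 - alpha)

rng = np.random.default_rng(6)
z = rng.standard_normal(400)
dep = mi_alpha_graph(z, z + 0.2 * rng.standard_normal(400))
ind = mi_alpha_graph(z, rng.standard_normal(400))
# dependent pairs concentrate near the diagonal, shrinking the MST and
# raising the alpha-MI estimate relative to the independent pairs
```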


O(n^{−1/(d+1)}) criterion: α-Jensen difference

• Jensen difference between f_0, f_1:

∆J_α = H_α(ε f_1 + (1−ε) f_0) − ε H_α(f_1) − (1−ε) H_α(f_0) ≥ 0

• f_0, f_1 are two densities; ε satisfies 0 ≤ ε ≤ 1

• Let X, Y be i.i.d. features extracted from two images:

X = {X_1, …, X_m}, Y = {Y_1, …, Y_n}

• Each realization in the unordered sample Z = {X, Y} has marginal

f_Z(z) = ε f_X(z) + (1−ε) f_Y(z),  ε = m/(n+m)

• α-Jensen difference for rigid transformation T:

∆J_α(T) = H_α(ε f_X + (1−ε) f_Y) − [ ε H_α(f_X) + (1−ε) H_α(f_Y) ]

where the bracketed term is constant in T
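Since the bracketed term does not vary with T, only the mixture entropy matters, and it can be estimated from the pooled unordered sample with the MST estimator from the earlier slides. A sketch I wrote under that reading (function names are mine; smaller pooled entropy indicates better alignment):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_alpha_entropy(points, gamma=1.0):
    """MST entropy estimate, up to an additive f-independent constant."""
    n, d = points.shape
    alpha = (d - gamma) / d
    tree = minimum_spanning_tree(squareform(pdist(points)))
    return np.log(np.sum(tree.data**gamma) / n**alpha) / (1.0 - alpha)

def alpha_jensen_criterion(x, y, gamma=1.0):
    """Alignment-dependent term H_alpha(eps*fX + (1-eps)*fY), estimated
    from the pooled unordered sample {X, Y}."""
    return mst_alpha_entropy(np.vstack([x, y]), gamma)

rng = rng = np.random.default_rng(5)
x = rng.uniform(size=(300, 2))
aligned = alpha_jensen_criterion(x, rng.uniform(size=(300, 2)))        # same support
shifted = alpha_jensen_criterion(x, rng.uniform(size=(300, 2)) + 2.0)  # misaligned
# misalignment spreads the mixture over disjoint supports, raising H_alpha
```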


[Figure: (a) reference image; (b) image at 300, 0, 110 rotation; (c) Voronoi regions for image (290, −20, 130); (d) Voronoi regions for the reference image]

Figure 17: Reference and target SAR/DEM images

[Figure: (a) misaligned points plotted in (X, Y, intensity) space; (b) the MST over those points]

Figure 18: MST demonstration for misaligned images

[Figure: (a) aligned points plotted in (X, Y, intensity) space; (b) the MST over those points]

Figure 19: MST demonstration for aligned images

Conclusions

1. α-divergence for indexing can be justified via decision theory

2. Non-parametric estimation of the Jensen difference is a low-complexity alternative to α-divergence estimation

3. Non-parametric estimation of the Jensen difference is possible without density estimation

4. Minimal-graph estimation outperforms plug-in estimation for non-smooth densities

Divergence vs. Jensen: Asymptotic Comparison

For ε ∈ [0,1] and g a p.d.f. define

f_ε = ε f_1 + (1−ε) f_0,  E_g[Z] = ∫ Z(x) g(x) dx,  f̃^α_{1/2} = f^α_{1/2} / ∫ f^α_{1/2} dx

Then

∆J_α = (α ε(1−ε)/2) [ E_{f̃^α_{1/2}}( ((f_1 − f_0)/f_{1/2})² ) + (α/(1−α)) ( E_{f̃^α_{1/2}}[ (f_1 − f_0)/f_{1/2} ] )² ] + O(∆)

D_α(f_1 ‖ f_0) = (α/4) ∫ f_{1/2} ((f_1 − f_0)/f_{1/2})² dx + O(∆)