Scale-Invariant Feature Transform (SIFT) Jinxiang Chai.

Date posted: 22-Dec-2015

Transcript of Scale-Invariant Feature Transform (SIFT) Jinxiang Chai.

Page 1: Scale-Invariant Feature Transform (SIFT) Jinxiang Chai.

Scale-Invariant Feature Transform (SIFT)

Jinxiang Chai

Page 2

Review

Image Processing

- Median filtering
- Bilateral filtering
- Edge detection
- Corner detection

Page 3

Review: Corner Detection

1. Compute image gradients

2. Construct the matrix C(i,j) from each pixel and its neighborhood values:

$$C(i,j) = \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$

3. Determine the 2 eigenvalues λ(i,j) = [λ1, λ2].

4. If both λ1 and λ2 are big, we have a corner
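
The four review steps can be sketched in plain Python. This is only an illustration under stated assumptions: the gradients use central differences, the matrix entries are summed over a small square window, and the names `corner_eigenvalues`, `is_corner`, `radius` and `threshold` are hypothetical rather than taken from the slides.

```python
import math

def corner_eigenvalues(img, i, j, radius=1):
    """Return (lam1, lam2), lam1 >= lam2, of the matrix C summed over a window."""
    sxx = sxy = syy = 0.0
    for r in range(i - radius, i + radius + 1):
        for c in range(j - radius, j + radius + 1):
            ix = (img[r][c + 1] - img[r][c - 1]) / 2.0  # central difference in x
            iy = (img[r + 1][c] - img[r - 1][c]) / 2.0  # central difference in y
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    delta = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy * sxy)
    return mean + delta, mean - delta

def is_corner(img, i, j, threshold=0.5):
    """Both eigenvalues are big exactly when the smaller one is big."""
    _, lam2 = corner_eigenvalues(img, i, j)
    return lam2 > threshold

# 12x12 test image: a bright square whose top-left corner sits at (6, 6).
img = [[1.0 if (r >= 6 and c >= 6) else 0.0 for c in range(12)] for r in range(12)]
print(is_corner(img, 6, 6))  # corner of the square
print(is_corner(img, 9, 6))  # straight edge: one eigenvalue is zero
```

The pixel must be at least `radius + 1` pixels away from the image border, since the sketch does no bounds checking.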

Page 4

The Orientation Field

Corners are detected where both λ1 and λ2 are big

Page 5

Good Image Features

• What are we looking for?
  – Strong features
  – Invariant to changes (affine and perspective/occlusion)
  – Solve the problem of correspondence
    • Locate an object in multiple images (i.e. in video)
    • Track the path of the object, infer 3D structures, object and camera movement

Page 6

Scale Invariant Feature Transform (SIFT)

• Choosing features that are invariant to image scaling and rotation

• Also, partially invariant to changes in illumination and 3D camera viewpoint

Page 7

Invariance

• Illumination

• Scale

• Rotation

• Affine

Page 8

Required Readings

• David G. Lowe, "Object recognition from local scale-invariant features," ICCV 1999

• David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110

Page 9

Motivation for SIFT

• Earlier methods
  – Harris corner detector
    • Sensitive to changes in image scale
    • Finds locations in image with large gradients in two directions
  – No method was fully affine invariant
    • Although the SIFT approach is not fully invariant, it allows for considerable affine change
    • SIFT also allows for changes in 3D viewpoint

Page 10

SIFT Algorithm Overview

1. Scale-space extrema detection

2. Keypoint localization

3. Orientation Assignment

4. Generation of keypoint descriptors.

Page 11

Scale Space

• Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.

Page 12

Scale space (Cont.)

• Looking for features (locations) that are stable (invariant) across all possible scale changes
  – Use a continuous function of scale (scale space)
• Which scale-space kernel will we use?
  – The Gaussian function

Page 13

Scale-Space of Image

$$L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y)$$

- $G(x, y, k\sigma)$: variable-scale Gaussian
- $I(x, y)$: input image

Page 17

Scale-Space of Image

$$L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y)$$

- $G(x, y, k\sigma)$: variable-scale Gaussian
- $I(x, y)$: input image

• To detect stable keypoint locations, find the scale-space extrema in the difference-of-Gaussian function:

$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$

Look familiar?

- bandpass filter!
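
A small numeric check can make the two forms of D and the bandpass remark concrete. The sketch below works in 1D for brevity (the Gaussian is separable, so the 2D case follows the same pattern) and assumes sampled, normalised kernels with edge replication; `gaussian_kernel`, `convolve`, and the chosen values of σ, k and the kernel radius are illustrative assumptions.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1D Gaussian, normalised to sum to 1."""
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def convolve(signal, kernel):
    """1D convolution with edge replication (the kernels here are symmetric)."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for t, k in enumerate(kernel):
            idx = min(max(i + t - radius, 0), n - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out

sigma, k, radius = 1.0, math.sqrt(2.0), 6
g_small = gaussian_kernel(sigma, radius)        # G(sigma)
g_large = gaussian_kernel(k * sigma, radius)    # G(k * sigma)
dog_kernel = [a - b for a, b in zip(g_large, g_small)]

signal = [0.0] * 10 + [1.0] * 10                # a step edge
l_small = convolve(signal, g_small)
l_large = convolve(signal, g_large)
d_direct = convolve(signal, dog_kernel)                 # (G_k - G) * I
d_diff = [a - b for a, b in zip(l_large, l_small)]      # L_k - L

# The two forms agree because convolution is linear.
print(max(abs(a - b) for a, b in zip(d_direct, d_diff)))
# The kernel sums to zero, so constant (zero-frequency) input is suppressed.
print(sum(dog_kernel))
```

Since each Gaussian also attenuates high frequencies, a kernel with zero response to constant input passes only a middle band, which is the sense in which the difference of Gaussians acts as a bandpass filter.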

Page 18

Difference of Gaussian

1. A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2)

2. B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2)

3. DOG (Difference of Gaussian) = A – B

4. So how to deal with different scales?

Page 19

Difference of Gaussian

1. A = Convolve image with vertical and horizontal 1D Gaussians, σ=sqrt(2)

2. B = Convolve A with vertical and horizontal 1D Gaussians, σ=sqrt(2)

3. DOG (Difference of Gaussian) = A – B

4. Downsample B with bilinear interpolation with pixel spacing of 1.5 (linear combination of 4 adjacent pixels)
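
Steps 1 to 3 can be sketched as follows for a single pyramid level. This is a minimal illustration, not the author's code: the image is a list of rows, the kernel is truncated at three standard deviations, borders are handled by edge replication, and the helper names are hypothetical. Step 4 (resampling B) is left out here.

```python
import math

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian truncated at 3 sigma, normalised to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def blur_rows(img, kernel):
    """Convolve every row with the 1D kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [[sum(k * row[min(max(c + t - radius, 0), width - 1)]
                 for t, k in enumerate(kernel))
             for c in range(width)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Horizontal pass, then vertical pass (separable 2D Gaussian)."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def dog_level(img, sigma=math.sqrt(2.0)):
    a = gaussian_blur(img, sigma)   # step 1: A = blurred image
    b = gaussian_blur(a, sigma)     # step 2: B = blurred A
    dog = [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]  # step 3
    return a, b, dog

# A single bright pixel: A stays more peaked than B, so DOG is positive there.
img = [[0.0] * 15 for _ in range(15)]
img[7][7] = 1.0
a, b, dog = dog_level(img)
print(dog[7][7] > 0)
```

Blurring A again with σ = sqrt(2) gives B an effective smoothing of σ = 2 relative to the input, which is why A - B is a difference of two Gaussians at different scales.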

Page 20

Difference of Gaussian Pyramid

[Figure: Input Image → Blur → A1 → Blur → B1; B1 → Downsample → A2 → Blur → B2; B2 → Downsample → A3 → Blur → B3. DOG1 = A1 - B1, DOG2 = A2 - B2, DOG3 = A3 - B3.]

Page 23

Other issues

• Initial smoothing ignores highest spatial frequencies of images

- expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid

• How to do downsampling with bilinear interpolation?

Page 24

Bilinear Filter

Weighted sum of four neighboring pixels

[Figure: sample point among four pixels; axes x, y and u, v]

Page 26

Bilinear Filter

Sampling at S(x,y):

S(x,y) = a*b*S(i,j) + a*(1-b)*S(i+1,j)
       + (1-a)*b*S(i,j+1) + (1-a)*(1-b)*S(i+1,j+1)

Here a is the weight of index j against j+1 and b is the weight of index i against i+1, so a = b = 1 places the sample exactly on S(i,j).

To optimize the above, do the following:

Si = S(i,j+1) + a*(S(i,j)-S(i,j+1))

Sj = S(i+1,j+1) + a*(S(i+1,j)-S(i+1,j+1))

S(x,y) = Sj + b*(Si-Sj)

[Figure: sample point S(x,y) between pixels (i,j), (i,j+1), (i+1,j) and (i+1,j+1); axes x, y and u, v]
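
The bilinear filter and the 1.5-pixel resampling step can be sketched as below. It assumes that a and b are the weights of column j and row i (the reading implied by the expanded formula, where a = b = 1 returns S(i,j)), that the image is a list of rows with at least 2x2 pixels, and that sample coordinates are non-negative; `bilinear_sample` and `resample` are hypothetical names.

```python
def bilinear_sample(img, row, col):
    """Sample img at fractional (row, col) from its four neighbours."""
    i = min(int(row), len(img) - 2)      # top row of the 2x2 cell
    j = min(int(col), len(img[0]) - 2)   # left column of the 2x2 cell
    b = 1.0 - (row - i)                  # weight of row i (1 means exactly on row i)
    a = 1.0 - (col - j)                  # weight of column j
    # Two interpolations along j, then one along i: three multiplies in total.
    s_i = img[i][j + 1] + a * (img[i][j] - img[i][j + 1])
    s_j = img[i + 1][j + 1] + a * (img[i + 1][j] - img[i + 1][j + 1])
    return s_j + b * (s_i - s_j)

def resample(img, spacing=1.5):
    """Resample img on a grid whose pixel spacing is `spacing` input pixels."""
    rows = int((len(img) - 1) / spacing) + 1
    cols = int((len(img[0]) - 1) / spacing) + 1
    return [[bilinear_sample(img, r * spacing, c * spacing) for c in range(cols)]
            for r in range(rows)]

# A linear ramp is reproduced exactly by bilinear interpolation.
ramp = [[10.0 * r + c for c in range(4)] for r in range(4)]
print(bilinear_sample(ramp, 1.5, 2.25))
print(resample(ramp))
```

With a spacing of 1.5, a 4x4 input yields a 3x3 output sampled at positions 0, 1.5 and 3 along each axis.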

Page 27

Bilinear Filter

[Figure: pixel grid cell with corners (i,j), (i,j+1), (i+1,j) and (i+1,j+1); axes x, y]

Page 28

Pyramid Example

[Figure: example pyramid images, one row per level: A1, B1, DOG1; A2, B2, DOG2; A3, B3, DOG3]

Page 29

Feature Detection

• Find maxima and minima of scale space
• For each point on a DOG level:
  – Compare to 8 neighbors at same level
  – If max/min, identify corresponding point at pyramid level below
  – Determine if the corresponding point is max/min of its 8 neighbors
  – If so, repeat at pyramid level above
• Repeat for each DOG level
• Those that remain are key points
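
The neighbor test can be sketched as below. One simplifying assumption is labeled loudly: all DOG levels are taken to have the same resolution, so the "corresponding point" at the level below or above has the same (row, column). In the pyramid described earlier the levels differ in size, and the coordinates would first have to be mapped between levels. The function names are hypothetical.

```python
def is_extremum(dog, level, r, c):
    """True if dog[level][r][c] is strictly above or below all 26 neighbours."""
    value = dog[level][r][c]
    neighbours = [dog[l][rr][cc]
                  for l in (level - 1, level, level + 1)
                  for rr in (r - 1, r, r + 1)
                  for cc in (c - 1, c, c + 1)
                  if (l, rr, cc) != (level, r, c)]
    return value > max(neighbours) or value < min(neighbours)

def find_keypoints(dog):
    """Scan every interior point of every interior level (same-size levels assumed)."""
    keys = []
    for level in range(1, len(dog) - 1):
        for r in range(1, len(dog[level]) - 1):
            for c in range(1, len(dog[level][0]) - 1):
                if is_extremum(dog, level, r, c):
                    keys.append((level, r, c))
    return keys

# Three flat 5x5 levels with a single peak in the middle level.
dog = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
dog[1][2][2] = 1.0
print(find_keypoints(dog))
```

Checking the 8 same-level neighbors first and moving to the adjacent levels only on success, as the slide describes, gives the same result while rejecting most points after a few comparisons.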

Page 30

Identifying Max/Min

DOG L-1

DOG L

DOG L+1

Page 31

Refining Key List: Illumination

• For all levels, use the “A” smoothed image to compute
  – Gradient Magnitude
• Threshold gradient magnitudes:
  – Remove all key points with M_ij less than 0.1 times the max gradient value
• Motivation: Low contrast is generally less reliable than high for feature points
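
A possible sketch of this thresholding step is shown below. The slides do not spell out the gradient formula, so simple differences between adjacent pixels are assumed here; `gradient_maps`, `filter_low_contrast` and the `fraction` parameter are hypothetical names.

```python
import math

def gradient_maps(a):
    """Pixel-difference gradient magnitude and orientation of smoothed image a."""
    rows, cols = len(a) - 1, len(a[0]) - 1
    mag = [[0.0] * cols for _ in range(rows)]
    ori = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            di = a[i][j] - a[i + 1][j]       # difference along i
            dj = a[i][j + 1] - a[i][j]       # difference along j
            mag[i][j] = math.hypot(di, dj)
            ori[i][j] = math.atan2(di, dj)   # radians in (-pi, pi]
    return mag, ori

def filter_low_contrast(keys, mag, fraction=0.1):
    """Drop key points whose magnitude is below fraction * max magnitude."""
    limit = fraction * max(max(row) for row in mag)
    return [(i, j) for (i, j) in keys if mag[i][j] >= limit]

a = [[0.0, 0.0, 0.0],
     [0.0, 10.0, 0.0],
     [0.0, 0.0, 0.0]]
mag, ori = gradient_maps(a)
print(filter_low_contrast([(0, 0), (0, 1), (1, 1)], mag))
```

The orientation map is returned alongside the magnitude because the later orientation-assignment step uses the same smoothed image.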

Page 32

Results: Eliminating Features

• Removing features in low-contrast regions

Page 34

Assigning Canonical Orientation

• For each remaining key point:
  – Choose surrounding N x N window at the DOG level where it was detected

[Figure: DOG image]

Page 35

Assigning Canonical Orientation

• For all levels, use the “A” smoothed image to compute
  – Gradient Orientation

[Figure: Gaussian Smoothed Image; Gradient Orientation + Gradient Magnitude]

Page 36

Assigning Canonical Orientation

• Gradient magnitude weighted by 2D Gaussian with σ of 3 times that of the current smoothing scale

[Figure: Gradient Magnitude * 2D Gaussian = Weighted Magnitude]

Page 37

Assigning Canonical Orientation

• Accumulate in histogram based on orientation
• Histogram has 36 bins with 10° increments

[Figure: Weighted Magnitude and Gradient Orientation maps accumulated into a histogram; x-axis: Gradient Orientation, y-axis: Sum of Weighted Magnitudes]

Page 38

Assigning Canonical Orientation

• Identify peak and assign orientation and sum of magnitude to key point

[Figure: Weighted Magnitude and Gradient Orientation maps accumulated into a histogram with its peak marked; x-axis: Gradient Orientation, y-axis: Sum of Weighted Magnitudes]
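
The orientation assignment can be sketched as below, assuming gradient magnitude and orientation maps (orientation in radians) are already available for the level. The window `radius`, the use of the bin centre as the reported angle, and the name `canonical_orientation` are assumptions made for illustration.

```python
import math

def canonical_orientation(mag, ori, center, scale_sigma, radius):
    """Return (peak orientation in degrees, summed weighted magnitude at the peak)."""
    ci, cj = center
    sigma = 3.0 * scale_sigma            # Gaussian window: 3x the smoothing scale
    hist = [0.0] * 36                    # 36 bins of 10 degrees each
    for i in range(max(ci - radius, 0), min(ci + radius + 1, len(mag))):
        for j in range(max(cj - radius, 0), min(cj + radius + 1, len(mag[0]))):
            d2 = (i - ci) ** 2 + (j - cj) ** 2
            weight = math.exp(-d2 / (2.0 * sigma * sigma))
            degrees = math.degrees(ori[i][j]) % 360.0
            hist[int(degrees // 10) % 36] += weight * mag[i][j]
    peak_bin = max(range(36), key=lambda b: hist[b])
    return peak_bin * 10.0 + 5.0, hist[peak_bin]   # bin centre, peak height

# A patch whose gradients all point at 47 degrees falls into the 40-50 bin.
mag = [[1.0] * 5 for _ in range(5)]
ori = [[math.radians(47.0)] * 5 for _ in range(5)]
angle, strength = canonical_orientation(mag, ori, (2, 2), scale_sigma=1.0, radius=2)
print(angle, strength > 0)
```

Reporting the bin centre limits the angular resolution to 10°; interpolating around the peak would refine it, but that goes beyond what the slide states.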

Page 39

Eliminating edges

• Difference-of-Gaussian function will be strong along edges
  – So how can we get rid of these edges?

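Page 40

Eliminating edges

• Difference-of-Gaussian function will be strong along edges
  – Similar to Harris corner detector
  – We are not concerned about the actual values of the eigenvalues, just the ratio of the two

Hessian of the difference-of-Gaussian function:

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$$

Harris corner detector matrix, for comparison:

$$H(i,j) = \begin{pmatrix} I_x \\ I_y \end{pmatrix} \begin{pmatrix} I_x \\ I_y \end{pmatrix}^T = \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$

With eigenvalues $\alpha \geq \beta$ of $H$ and ratio $r = \alpha / \beta$:

$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha \beta} = \frac{(r\beta + \beta)^2}{r \beta^2} = \frac{(r + 1)^2}{r}$$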
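
The ratio test can be sketched as follows. The Hessian entries are estimated with finite differences on one DOG level, and a key point is kept only when Tr(H)^2 / Det(H) stays below (r + 1)^2 / r. The default r = 10 is the value suggested in Lowe's 2004 paper, not something stated on the slide, so it should be treated as a tunable assumption; `passes_edge_test` is a hypothetical name.

```python
def passes_edge_test(dog, r, c, ratio=10.0):
    """Keep a key point only if Tr(H)^2 / Det(H) < (ratio + 1)^2 / ratio."""
    dxx = dog[r][c + 1] - 2.0 * dog[r][c] + dog[r][c - 1]
    dyy = dog[r + 1][c] - 2.0 * dog[r][c] + dog[r - 1][c]
    dxy = (dog[r + 1][c + 1] - dog[r + 1][c - 1]
           - dog[r - 1][c + 1] + dog[r - 1][c - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:   # flat direction or curvatures of opposite sign: reject
        return False
    return trace * trace / det < (ratio + 1.0) ** 2 / ratio

def grid(f):
    """3x3 patch of f(x, y) centred on (0, 0), indexed as patch[row][col]."""
    return [[f(c - 1, r - 1) for c in range(3)] for r in range(3)]

blob = grid(lambda x, y: -(x * x + y * y))   # equal curvature in both directions
ridge = grid(lambda x, y: -(x * x))          # curvature across the edge only
print(passes_edge_test(blob, 1, 1), passes_edge_test(ridge, 1, 1))
```

Only the trace and determinant are needed, so the eigenvalues themselves are never computed, which matches the remark that only their ratio matters.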

Page 42

Local Image Description

• SIFT keys each assigned:
  – Location
  – Scale (analogous to level it was detected)
  – Orientation (assigned in previous canonical orientation steps)
• Now: Describe local image region invariant to the above transformations

Page 43

SIFT: Local Image Description

• Needs to be invariant to changes in location, scale and rotation

Page 44

SIFT Key Example

Page 45

Local Image Description

For each key point:

1. Identify 8x8 neighborhood (from DOG level it was detected)

2. Align orientation to x-axis

Page 46

Local Image Description

3. Calculate gradient magnitude and orientation map

4. Weight by Gaussian

Page 47

Local Image Description

5. Calculate histogram of each 4x4 region. 8 bins for gradient orientation. Tally weighted gradient magnitude.

Page 48

Local Image Description

6. This histogram array is the image descriptor. (The example here is a vector of length 8*4 = 32: four 4x4 regions with 8 bins each. Best suggestion: a 128-element vector for a 16x16 neighborhood.)
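
Steps 3 to 6 can be sketched as below, assuming the magnitude and orientation maps of the neighborhood are already aligned to the key point orientation (steps 1 and 2). The Gaussian window width of half the neighborhood size is an assumption, as are the function and parameter names.

```python
import math

def describe(mag, ori, top, left, size=8, cell=4, bins=8):
    """Concatenate one orientation histogram per cell x cell region."""
    sigma = size / 2.0                   # Gaussian window width (assumed)
    centre = (size - 1) / 2.0
    descriptor = []
    for cell_r in range(0, size, cell):
        for cell_c in range(0, size, cell):
            hist = [0.0] * bins
            for r in range(cell_r, cell_r + cell):
                for c in range(cell_c, cell_c + cell):
                    d2 = (r - centre) ** 2 + (c - centre) ** 2
                    weight = math.exp(-d2 / (2.0 * sigma * sigma))
                    angle = ori[top + r][left + c] % (2.0 * math.pi)
                    b = int(angle / (2.0 * math.pi) * bins) % bins
                    hist[b] += weight * mag[top + r][left + c]   # tally weighted magnitude
            descriptor.extend(hist)
    return descriptor

mag = [[1.0] * 16 for _ in range(16)]
ori = [[0.0] * 16 for _ in range(16)]
print(len(describe(mag, ori, 0, 0)))            # 8x8 neighborhood
print(len(describe(mag, ori, 0, 0, size=16)))   # 16x16 neighborhood
```

The descriptor length is (size / cell)^2 * bins, which gives 32 for the 8x8 example and 128 for the 16x16 neighborhood.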

Page 49

Applications: Image Matching

• Find all key points identified in source and target image
  – Each key point will have 2D location, scale and orientation, as well as invariant descriptor vector
• For each key point in source image, search corresponding SIFT features in target image.
• Find the transformation between two images using epipolar geometry constraints or affine transformation.

Page 50

Image matching via SIFT features

Feature detection

Page 51

Image matching via SIFT features

• Image matching via nearest neighbor search
  - If the ratio of the closest distance to the 2nd closest distance is greater than 0.8, reject the match as false.
• Remove outliers using epipolar line constraints.
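
The nearest-neighbor search with the 0.8 ratio test can be sketched as a brute-force loop. Descriptors are plain lists of numbers and Euclidean distance is assumed; `match_features` and `max_ratio` are hypothetical names, and the epipolar outlier removal is not covered.

```python
import math

def distance(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def match_features(source, target, max_ratio=0.8):
    """Return (source index, target index) pairs that pass the ratio test."""
    matches = []
    for si, s in enumerate(source):
        ranked = sorted((distance(s, t), ti) for ti, t in enumerate(target))
        if len(ranked) < 2:
            continue                     # the ratio needs two candidates
        (best, best_ti), (second, _) = ranked[0], ranked[1]
        # Reject when the closest match is not clearly better than the runner-up.
        if second > 0.0 and best / second <= max_ratio:
            matches.append((si, best_ti))
    return matches

source = [[0.0, 0.0], [5.0, 5.0]]
target = [[0.0, 1.0], [10.0, 10.0], [5.0, 6.0], [6.0, 5.0]]
print(match_features(source, target))
```

The second source descriptor has two equally close candidates, so its ratio is 1 and it is rejected as ambiguous, while the first has one clearly closest candidate and is kept.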

Page 52

Image matching via SIFT features

Page 53

Summary

• SIFT features are reasonably invariant to rotation, scaling, and illumination changes.

• We can use them for image matching and object recognition, among other things.

• Efficient on-line matching and recognition can be performed in real time.