Date posted: 22-Dec-2015

### Transcript of Scale-Invariant Feature Transform (SIFT) Jinxiang Chai

• Slide 1
• Scale-Invariant Feature Transform (SIFT) Jinxiang Chai
• Slide 2
• Review Image Processing - Median filtering - Bilateral filtering - Edge detection - Corner detection
• Slide 3
• Review: Corner Detection 1. Compute image gradients 2. Construct the matrix M from the gradients in each pixel's neighborhood 3. Determine its 2 eigenvalues λ(i,j) = [λ1, λ2] 4. If both λ1 and λ2 are big, we have a corner
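The eigenvalue test above can be sketched in a few lines of NumPy (a hypothetical helper, not code from the slides; gradients via central differences, a (2r+1)×(2r+1) window):

```python
import numpy as np

def structure_matrix_eigvals(img, y, x, r=1):
    """Eigenvalues (ascending) of the gradient structure matrix
    summed over a (2r+1) x (2r+1) window centred on (y, x)."""
    Iy, Ix = np.gradient(img.astype(float))      # image gradients
    win = np.s_[y - r:y + r + 1, x - r:x + r + 1]
    gx, gy = Ix[win].ravel(), Iy[win].ravel()
    M = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])           # 2x2 structure matrix
    return np.linalg.eigvalsh(M)                 # [lambda1, lambda2]
```

A corner (intensity changing in two directions) yields two large eigenvalues; a straight edge yields one large and one near-zero eigenvalue.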
• Slide 4
• The Orientation Field Corners are detected where both λ1 and λ2 are big
• Slide 5
• Good Image Features What are we looking for? Strong features that are invariant to changes (affine, perspective, occlusion). These solve the problem of correspondence: locate an object in multiple images (i.e. in video), track its path, and infer 3D structure and object/camera motion.
• Slide 6
• Scale Invariant Feature Transform (SIFT) Choosing features that are invariant to image scaling and rotation Also, partially invariant to changes in illumination and 3D camera viewpoint
• Slide 7
• Invariance: Illumination, Scale, Rotation, Affine
• Slide 8
• Required Readings David G. Lowe, "Object recognition from local scale-invariant features," ICCV 1999. David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
• Slide 9
• Motivation for SIFT Earlier methods such as the Harris corner detector find locations with large gradients in two directions, but are sensitive to changes in image scale, and no method was fully affine invariant. Although the SIFT approach is not fully invariant either, it allows for considerable affine change, and also for changes in 3D viewpoint.
• Slide 10
• SIFT Algorithm Overview 1. Scale-space extrema detection 2. Keypoint localization 3. Orientation assignment 4. Generation of keypoint descriptors
• Slide 11
• Scale Space Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.
• Slide 12
• Scale space (Cont.) Looking for features (locations) that are stable (invariant) across all possible scale changes: use a continuous function of scale (scale space). Which scale-space kernel will we use? The Gaussian function.
• Slide 13
• Scale-Space of Image L(x, y, σ) = G(x, y, σ) * I(x, y), where G(x, y, σ) is a variable-scale Gaussian and I(x, y) is the input image
• Slide 14
• Scale-Space of Image To detect stable keypoint locations, find the scale-space extrema of the difference-of-Gaussian function D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ)
• Slide 17
• Scale-Space of Image To detect stable keypoint locations, find the scale-space extrema of the difference-of-Gaussian function D(x, y, σ) = L(x, y, kσ) − L(x, y, σ). Look familiar? It is a bandpass filter!
• Slide 18
• Difference of Gaussian 1. A = convolve the image with vertical and horizontal 1D Gaussians, σ = √2 2. B = convolve A with vertical and horizontal 1D Gaussians, σ = √2 3. DOG (difference of Gaussian) = A − B 4. So how do we deal with different scales?
• Slide 19
• Difference of Gaussian 1. A = convolve the image with vertical and horizontal 1D Gaussians, σ = √2 2. B = convolve A with vertical and horizontal 1D Gaussians, σ = √2 3. DOG (difference of Gaussian) = A − B 4. Downsample B with bilinear interpolation at a pixel spacing of 1.5 (a linear combination of 4 adjacent pixels)
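Steps 1-3 above can be sketched with a separable Gaussian in NumPy (helper names are mine; this is a minimal sketch, not the lecture's code):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable blur: 1D Gaussian along columns, then along rows."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()                                  # normalise the kernel
    out = np.apply_along_axis(np.convolve, 0, img.astype(float), k, mode='same')
    return np.apply_along_axis(np.convolve, 1, out, k, mode='same')

sigma = np.sqrt(2)
img = np.random.default_rng(0).random((32, 32))
A = gaussian_blur(img, sigma)                     # step 1
B = gaussian_blur(A, sigma)                       # step 2
DOG = A - B                                       # step 3: difference of Gaussian
```

On a constant image the interior of the DOG response is zero, which is what makes it a bandpass filter: only intensity *changes* at the matching scale survive.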
• Slide 20
• Difference of Gaussian Pyramid [diagram: the input image is blurred to give A1, blurred again to give B1, with DOG1 = A1 − B1; B1 is downsampled and the blur/subtract steps repeat to give A2, B2, DOG2 and then A3, B3, DOG3]
• Slide 21
• Other issues Initial smoothing ignores highest spatial frequencies of images
• Slide 22
• Other issues Initial smoothing ignores highest spatial frequencies of images - expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid
• Slide 23
• Other issues Initial smoothing ignores highest spatial frequencies of images - expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid. How do we downsample with bilinear interpolation?
• Slide 24
• Bilinear Filter Weighted sum of four neighboring pixels
• Slide 25
• Bilinear Filter Sampling at S(x, y), with (i, j), (i+1, j), (i, j+1), (i+1, j+1) the four neighboring pixels: S(x, y) = a·b·S(i, j) + a·(1−b)·S(i+1, j) + (1−a)·b·S(i, j+1) + (1−a)·(1−b)·S(i+1, j+1)
• Slide 26
• Bilinear Filter To optimize the weighted sum, interpolate along one axis first and then the other: S_i = S(i, j) + a·(S(i, j+1) − S(i, j)) S_j = S(i+1, j) + a·(S(i+1, j+1) − S(i+1, j)) S(x, y) = S_i + b·(S_j − S_i)
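The optimized three-interpolation form can be sketched as follows (a hypothetical helper; here a and b are the fractional offsets of the sample point past the base pixel):

```python
import numpy as np

def bilinear_sample(S, y, x):
    """Sample image S at fractional position (y, x) by bilinear
    interpolation: two lerps along x, then one lerp along y."""
    i = min(int(np.floor(y)), S.shape[0] - 2)   # clamp so (i+1, j+1) exists
    j = min(int(np.floor(x)), S.shape[1] - 2)
    b, a = y - i, x - j                         # fractional offsets
    s_top = S[i, j]     + a * (S[i, j + 1]     - S[i, j])
    s_bot = S[i + 1, j] + a * (S[i + 1, j + 1] - S[i + 1, j])
    return s_top + b * (s_bot - s_top)
```

Downsampling with a pixel spacing of 1.5, as the slides describe, would then sample at positions (1.5m, 1.5n); the three-lerp form does the same work as the four-product sum with fewer multiplications.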
• Slide 27
• Bilinear Filter [diagram: the four neighboring samples (i, j), (i+1, j), (i, j+1), (i+1, j+1) around the sample point (x, y)]
• Slide 28
• Pyramid Example [images: A1, B1, DOG1 through A3, B3, DOG3 across three pyramid levels]
• Slide 29
• Feature Detection Find the maxima and minima of scale space. For each point on a DOG level: compare to its 8 neighbors at the same level; if it is a max/min, identify the corresponding point at the pyramid level below and determine whether it is a max/min of its 8 neighbors there; if so, repeat at the pyramid level above. Repeat for each DOG level. The points that remain are the key points.
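The scan above can be sketched as a brute-force search over a stack of DOG levels (a simplification: all levels here share one resolution, whereas the real pyramid maps neighbors through the downsampling step):

```python
import numpy as np

def dog_extrema(dog_stack):
    """Return (level, y, x) triples where a pixel is the strict max or
    min of its 3x3 neighbourhoods on its own level and the levels
    directly below and above."""
    D = np.asarray(dog_stack, dtype=float)
    keys = []
    for l in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                cube = D[l - 1:l + 2, y - 1:y + 2, x - 1:x + 2]
                v = D[l, y, x]
                # strict extremum: the value occurs exactly once in the cube
                if (v == cube.max() or v == cube.min()) and (cube == v).sum() == 1:
                    keys.append((l, y, x))
    return keys
```

An isolated bright or dark blob at the middle level survives; everything else is pruned.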
• Slide 30
• Identifying Max/Min [diagram: a pixel's 3×3 neighborhoods on DOG levels L−1, L, and L+1]
• Slide 31
• Refining Key List: Illumination For all levels, use the A smoothed image to compute the gradient magnitude. Threshold the gradient magnitudes: remove all key points with M(i, j) less than 0.1 times the maximum gradient value. Motivation: low contrast is generally less reliable than high contrast for feature points.
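The 0.1-of-maximum threshold is a one-liner (a sketch; the function name and keypoint representation are mine):

```python
import numpy as np

def filter_low_contrast(keypoints, grad_mag, frac=0.1):
    """Keep only keypoints whose gradient magnitude grad_mag[y, x] is at
    least frac * max(grad_mag) -- the 0.1 threshold from the slide."""
    cutoff = frac * grad_mag.max()
    return [(y, x) for (y, x) in keypoints if grad_mag[y, x] >= cutoff]
```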
• Slide 32
• Results: Eliminating Features Removing features in low-contrast regions
• Slide 33
• Results: Eliminating Features Removing features in low-contrast regions
• Slide 34
• Assigning Canonical Orientation For each remaining key point: choose the surrounding N × N window at the DOG level where it was detected
• Slide 35
• Assigning Canonical Orientation For all levels, use the A smoothed image to compute the gradient magnitude and gradient orientation [images: Gaussian smoothed image, gradient magnitude, gradient orientation]
• Slide 36
• Assigning Canonical Orientation The gradient magnitude is weighted by a 2D Gaussian with σ of 3 times that of the current smoothing scale: Weighted Magnitude = Gradient Magnitude * 2D Gaussian
• Slide 37
• Assigning Canonical Orientation Accumulate the weighted magnitudes in a histogram based on orientation. The histogram has 36 bins with 10° increments.
• Slide 38
• Assigning Canonical Orientation Identify the histogram peak and assign its orientation and sum of magnitude to the key point
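The 36-bin histogram and peak search can be sketched as follows (helper name is mine; the Gaussian weighting from the previous slide is assumed to have been applied to `mag` already):

```python
import numpy as np

def canonical_orientation(mag, theta, n_bins=36):
    """Accumulate weighted gradient magnitudes into a 36-bin (10-degree)
    orientation histogram; return (peak orientation in degrees, histogram).
    `theta` holds orientations in degrees in [0, 360)."""
    width = 360 // n_bins                            # 10-degree bins
    hist = np.zeros(n_bins)
    bins = (np.asarray(theta).ravel() // width).astype(int) % n_bins
    np.add.at(hist, bins, np.asarray(mag, dtype=float).ravel())
    return int(hist.argmax()) * width, hist
```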
• Slide 39
• Eliminating edges The difference-of-Gaussian function has strong responses along edges. So how can we get rid of these edges?
• Slide 40
• Eliminating edges The difference-of-Gaussian function has strong responses along edges. Similar to the Harris corner detector, we are not concerned with the actual eigenvalues, just the ratio of the two.
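Lowe's 2004 paper (cited in the readings) carries out this ratio test without ever computing eigenvalues, by thresholding Tr(H)²/Det(H) of the 2×2 Hessian of D; a sketch with finite differences:

```python
import numpy as np

def is_edge_like(D, y, x, r=10.0):
    """True if the DoG response at (y, x) lies on an edge: the ratio of
    principal curvatures exceeds r.  Tests Tr(H)^2/Det(H) >= (r+1)^2/r,
    equivalent to an eigenvalue-ratio test (Lowe, 2004)."""
    Dxx = D[y, x + 1] - 2 * D[y, x] + D[y, x - 1]
    Dyy = D[y + 1, x] - 2 * D[y, x] + D[y - 1, x]
    Dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    tr, det = Dxx + Dyy, Dxx * Dyy - Dxy * Dxy
    if det <= 0:                 # curvatures of opposite sign: reject
        return True
    return tr * tr / det >= (r + 1) ** 2 / r
```

An isotropic blob has equal curvatures (ratio 4 = (1+1)²/1 for the trace form), well under the r = 10 cutoff; a ridge has one near-zero curvature and is rejected.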
• Slide 42
• Local Image Description Each SIFT key is assigned: a location; a scale (analogous to the level at which it was detected); an orientation (assigned in the previous canonical-orientation steps). Now: describe the local image region in a way invariant to the above transformations.
• Slide 43
• SIFT: Local Image Description Needs to be invariant to changes in location, scale and rotation
• Slide 44
• SIFT Key Example
• Slide 45
• Local Image Description For each key point: 1. Identify the 8×8 neighborhood (from the DOG level at which it was detected) 2. Align the orientation to the x-axis
• Slide 46
• Local Image Description 3. Calculate the gradient magnitude and orientation map 4. Weight by a Gaussian
• Slide 47
• Local Image Description 5. Calculate a histogram for each 4×4 region, with 8 bins for gradient orientation, tallying the weighted gradient magnitudes
• Slide 48
• Local Image Description 6. This histogram array is the image descriptor. (The example here is a vector of length 8×4 = 32; the suggested best choice is a 128-vector from a 16×16 neighborhood.)
• Slide 49
• Applications: Image Matching Find all key points identified in the source and target images. Each key point has a 2D location, scale and orientation, as well as an invariant descriptor vector. For each key point in the source image, search for corresponding SIFT features in the target image, then find the transformation between the two images.
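The correspondence search can be sketched as a nearest-neighbour match over descriptor vectors (names are mine; the distance-ratio test is the one from the 2004 paper, not stated on this slide):

```python
import numpy as np

def match_keypoints(src_desc, dst_desc, ratio=0.8):
    """For each source descriptor, find its nearest neighbour among the
    target descriptors; keep the match only if the nearest distance is
    below `ratio` times the second-nearest (Lowe's ratio test)."""
    dst = np.asarray(dst_desc, dtype=float)
    matches = []
    for i, d in enumerate(np.asarray(src_desc, dtype=float)):
        dists = np.linalg.norm(dst - d, axis=1)
        order = np.argsort(dists)
        if len(dists) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```

The surviving (source, target) index pairs would then feed a robust fit (e.g. of an affine transform) between the two images.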