Scale-Invariant Feature Transform (SIFT)


Scale-Invariant Feature Transform (SIFT). Jinxiang Chai. PowerPoint presentation.

Transcript of Scale-Invariant Feature Transform (SIFT)

  • Scale-Invariant Feature Transform (SIFT)

    Jinxiang Chai

  • Review

    Image Processing

    - Median filtering

    - Bilateral filtering

    - Edge detection

    - Corner detection

  • Review: Corner Detection

    1. Compute image gradients

    2. Construct the matrix from the gradient at each pixel and its neighborhood values

    3. Determine the 2 eigenvalues λ(i,j) = [λ1, λ2]

    4. If both λ1 and λ2 are big, we have a corner
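The four review steps can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names, the central-difference gradient, the 3x3 window, and the threshold value are all assumptions chosen for the example, and the image is a plain list of rows to keep the sketch dependency-free.

```python
import math

def structure_matrix(image, r, c, radius=1):
    """Sum gradient products over a (2*radius+1)^2 window centred on (r, c)."""
    sxx = sxy = syy = 0.0
    for i in range(r - radius, r + radius + 1):
        for j in range(c - radius, c + radius + 1):
            # Central differences approximate the image gradient.
            ix = (image[i][j + 1] - image[i][j - 1]) / 2.0
            iy = (image[i + 1][j] - image[i - 1][j]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx, sxy, syy

def eigenvalues(sxx, sxy, syy):
    """Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]."""
    mean = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return mean + spread, mean - spread

def is_corner(image, r, c, threshold=0.5):
    """A corner needs both eigenvalues to be large (threshold is illustrative)."""
    lam1, lam2 = eigenvalues(*structure_matrix(image, r, c))
    return lam2 > threshold  # lam2 is the smaller eigenvalue
```

The caller must keep (r, c) at least radius + 1 pixels away from the image border. On a straight edge only one eigenvalue is large, so the test on the smaller eigenvalue rejects it.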

  • The Orientation Field

    Corners are detected where both λ1 and λ2 are big

  • Good Image Features

    What are we looking for?

    - Strong features

    - Invariant to changes (affine and perspective/occlusion)

    - Solve the problem of correspondence

    - Locate an object in multiple images (e.g., in video)

    - Track the path of the object, infer 3D structures, object and camera movement

  • Scale Invariant Feature Transform (SIFT)

    - Choosing features that are invariant to image scaling and rotation

    - Also, partially invariant to changes in illumination and 3D camera viewpoint

  • Invariance

    - Illumination

    - Scale

    - Rotation

    - Affine

  • Required Readings

    David G. Lowe, "Object recognition from local scale-invariant features," ICCV 1999

    David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110

  • Motivation for SIFT

    Earlier Methods

    - Harris corner detector

    - Sensitive to changes in image scale

    - Finds locations in image with large gradients in two directions

    - No method was fully affine invariant

    Although the SIFT approach is not fully invariant, it allows for considerable affine change

    SIFT also allows for changes in 3D viewpoint

  • SIFT Algorithm Overview

    1. Scale-space extrema detection

    2. Keypoint localization

    3. Orientation assignment

    4. Generation of keypoint descriptors

  • Scale Space

    Different scales are appropriate for describing different objects in the image, and we may not know the correct scale/size ahead of time.

  • Scale space (Cont.)

    - Looking for features (locations) that are stable (invariant) across all possible scale changes

    - Use a continuous function of scale (scale space)

    Which scale-space kernel will we use? The Gaussian function

  • Scale-Space of Image

    $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$, where $G(x, y, \sigma)$ is the variable-scale Gaussian and $I(x, y)$ is the input image

    To detect stable keypoint locations, find the scale-space extrema in the difference-of-Gaussian function

    $D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$

    Look familiar? It is a bandpass filter!

  • Difference of Gaussian

    A = Convolve image with vertical and horizontal 1D Gaussians, σ = sqrt(2)

    B = Convolve A with vertical and horizontal 1D Gaussians, σ = sqrt(2)

    DOG (Difference of Gaussian) = A - B

    So how to deal with different scales?

    Downsample B with bilinear interpolation with pixel spacing of 1.5 (linear combination of 4 adjacent pixels)
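The A, B, and DOG images for one pyramid level can be sketched as below. This is an illustrative sketch only: the helper names, the kernel truncation at roughly 3σ, and the clamped border handling are assumptions that the slides do not specify, and images are plain lists of rows so that no external library is needed.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian taps, truncated at about 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    taps = [math.exp(-(k * k) / (2 * sigma * sigma))
            for k in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def convolve_rows(image, kernel):
    """Filter each row with a 1D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[k + radius] * row[min(max(j + k, 0), width - 1)]
             for k in range(-radius, radius + 1))
         for j in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable blur: horizontal 1D pass, then vertical 1D pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))

def difference_of_gaussian(image, sigma=math.sqrt(2)):
    a = gaussian_blur(image, sigma)  # A: image blurred once
    b = gaussian_blur(a, sigma)      # B: A blurred again
    dog = [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    return a, b, dog
```

A constant image gives a DOG of zero everywhere, while an isolated bright pixel gives a positive centre, which matches the bandpass behaviour noted earlier.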

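  • Difference of Gaussian Pyramid

    [Figure: the input image is blurred to give A1 and blurred again to give B1; B1 is downsampled and the blur steps repeat to give A2, B2 and then A3, B3. DOG1 = A1 - B1, DOG2 = A2 - B2, DOG3 = A3 - B3.]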

  • Other issues

    Initial smoothing ignores the highest spatial frequencies of the image

    - expand the input image by a factor of 2, using bilinear interpolation, prior to building the pyramid

    How to do downsampling with bilinear interpolation?

  • Bilinear Filter

    Weighted sum of four neighboring pixels

    [Figure: sample point (x, y) inside the cell with corners (i,j), (i+1,j), (i,j+1), (i+1,j+1); axes labelled x, y and u, v]

    Sampling at S(x,y), with a and b the fractional offsets of the sample from (i,j) toward j+1 and i+1:

    S(x,y) = (1-a)*(1-b)*S(i,j) + (1-a)*b*S(i+1,j) + a*(1-b)*S(i,j+1) + a*b*S(i+1,j+1)

    To optimize the above, do the following:

    Si = S(i,j) + a*(S(i,j+1)-S(i,j))

    Sj = S(i+1,j) + a*(S(i+1,j+1)-S(i+1,j))

    S(x,y) = Si+b*(Sj-Si)
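The factored form (Si, Sj, then the final blend) translates directly into code. In this sketch the function names, the clamping at the last row and column, and the output-size rule are assumptions made for illustration; a and b are taken as the fractional offsets from (i,j) toward j+1 and i+1.

```python
def bilinear_sample(image, y, x):
    """Sample `image` at fractional row y and column x (both >= 0).

    i, j are the integer row and column; b and a are the fractional
    offsets along rows and columns respectively.
    """
    i, j = int(y), int(x)
    # Clamp so that (i + 1, j + 1) stays inside the image.
    i = min(i, len(image) - 2)
    j = min(j, len(image[0]) - 2)
    b, a = y - i, x - j
    s_i = image[i][j] + a * (image[i][j + 1] - image[i][j])
    s_j = image[i + 1][j] + a * (image[i + 1][j + 1] - image[i + 1][j])
    return s_i + b * (s_j - s_i)

def resample(image, spacing=1.5):
    """Resample on a grid with the given pixel spacing (1.5 in the slides)."""
    rows = int((len(image) - 1) / spacing) + 1
    cols = int((len(image[0]) - 1) / spacing) + 1
    return [[bilinear_sample(image, r * spacing, c * spacing)
             for c in range(cols)]
            for r in range(rows)]
```

The factored form needs three multiplications per sample instead of the eight in the direct four-term sum, which is the optimization the slide refers to. The image must be at least 2x2 pixels.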

  • Pyramid Example

    [Figure: example A, B, and DOG images at pyramid levels 1 to 3]

  • Feature Detection

    Find maxima and minima of scale space

    For each point on a DOG level:

    - Compare to 8 neighbors at same level

    - If max/min, identify corresponding point at pyramid level below

    - Determine if the point is still max/min relative to the corresponding point and its 8 neighbors

    - If so, repeat at pyramid level above

    Repeat for each DOG level

    Those that remain are key points

  • Identifying Max/Min

    [Figure: a point on DOG level L compared with its neighbors on DOG L-1, DOG L, and DOG L+1]
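The max/min search can be sketched as a comparison against the 26 neighbours on the same, lower, and upper DOG levels. This sketch makes one loud simplification that the slides do not: it assumes all DOG levels share the same resolution, so the "corresponding point" on an adjacent level has the same row and column. In the pyramid described here, levels differ by a pixel spacing of 1.5 and the corresponding point would need to be looked up by scaling the coordinates. The function names are hypothetical.

```python
def is_extremum(dog_levels, level, r, c):
    """True if dog_levels[level][r][c] is strictly above or below all 26 neighbours."""
    value = dog_levels[level][r][c]
    neighbours = [
        dog_levels[lev][i][j]
        for lev in (level - 1, level, level + 1)
        for i in (r - 1, r, r + 1)
        for j in (c - 1, c, c + 1)
        if (lev, i, j) != (level, r, c)
    ]
    return value > max(neighbours) or value < min(neighbours)

def find_keypoints(dog_levels):
    """Scan interior points of interior levels; return (level, row, col) tuples."""
    keypoints = []
    for level in range(1, len(dog_levels) - 1):
        rows = len(dog_levels[level])
        cols = len(dog_levels[level][0])
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if is_extremum(dog_levels, level, r, c):
                    keypoints.append((level, r, c))
    return keypoints
```

Checking the same level first and the adjacent levels only for survivors, as the slide describes, gives the same result but skips most of the work, since most points fail the first 8-neighbour test.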

  • Refining Key List: Illumination

    For all levels, use the "A" smoothed image to compute the gradient magnitude M_ij

    Threshold gradient magnitudes: remove all key points with M_ij less than 0.1 times the max gradient value

    Motivation: low contrast is generally less reliable than high contrast for feature points
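The gradient formula itself did not survive in this transcript. The sketch below assumes the pixel-difference form from Lowe's 1999 paper, M_ij = sqrt((A_ij - A_{i+1,j})^2 + (A_ij - A_{i,j+1})^2) with orientation R_ij = atan2(A_ij - A_{i+1,j}, A_{i,j+1} - A_ij). It also assumes that "max gradient value" means the largest magnitude found in the image. Both are assumptions, as are the function names.

```python
import math

def gradient_maps(a):
    """Pixel-difference gradient magnitude and orientation of a smoothed image A.

    The outputs are one row and one column smaller than the input because
    each value uses the pixel below and the pixel to the right.
    """
    rows, cols = len(a) - 1, len(a[0]) - 1
    mag = [[0.0] * cols for _ in range(rows)]
    ori = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            d_row = a[i][j] - a[i + 1][j]
            d_col = a[i][j + 1] - a[i][j]
            mag[i][j] = math.hypot(d_row, d_col)
            ori[i][j] = math.atan2(d_row, d_col)  # radians in (-pi, pi]
    return mag, ori

def drop_low_contrast(keypoints, mag, fraction=0.1):
    """Keep (row, col) key points whose magnitude reaches fraction * max magnitude."""
    limit = fraction * max(max(row) for row in mag)
    return [(i, j) for i, j in keypoints if mag[i][j] >= limit]
```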

  • Results: Eliminating Features

    Removing features in low-contrast regions

  • Assigning Canonical Orientation

    For each remaining key point:

    - Choose surrounding N x N window at DOG level it was detected

    [Figure: DOG image]

  • Assigning Canonical Orientation

    For all levels, use the "A" smoothed image to compute the gradient orientation

    [Figure: Gaussian smoothed image, with its gradient orientation and gradient magnitude maps]

  • Assigning Canonical Orientation

    Gradient magnitude weighted by 2D Gaussian with σ of 3 times that of the current smoothing scale

    [Figure: Gradient Magnitude * 2D Gaussian = Weighted Magnitude]

  • Assigning Canonical Orientation

    Accumulate in histogram based on orientation

    Histogram has 36 bins with 10° increments

    [Figure: weighted magnitude and gradient orientation maps feeding a histogram of sum of weighted magnitudes against gradient orientation]

  • Assigning Canonical Orientation

    Identify peak and assign orientation and sum of magnitude to key point

    [Figure: histogram of sum of weighted magnitudes against gradient orientation, with the peak marked]
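The orientation steps above can be put together in one short routine. This is a sketch under stated assumptions: `mag` and `ori` are precomputed gradient magnitude and orientation maps (orientation in radians), the window is a square of the given radius clipped to the image, and the reported orientation is the centre of the winning 10° bin. The function name and parameters are hypothetical.

```python
import math

def canonical_orientation(mag, ori, r, c, sigma, radius):
    """Return (peak angle in degrees, peak sum) from a 36-bin histogram."""
    bins = [0.0] * 36
    window_sigma = 3.0 * sigma  # Gaussian weight: 3x the smoothing scale
    for i in range(max(r - radius, 0), min(r + radius + 1, len(mag))):
        for j in range(max(c - radius, 0), min(c + radius + 1, len(mag[0]))):
            dist2 = (i - r) ** 2 + (j - c) ** 2
            weight = math.exp(-dist2 / (2.0 * window_sigma ** 2))
            degrees = math.degrees(ori[i][j]) % 360.0
            # Each bin covers 10 degrees; the extra % 36 guards rounding at 360.
            bins[int(degrees // 10) % 36] += weight * mag[i][j]
    peak = max(range(36), key=lambda k: bins[k])
    return peak * 10 + 5, bins[peak]
```

Returning the bin centre limits the precision to 10°. Interpolating between neighbouring bins would refine the estimate, but that is beyond what the slides describe.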

  • Eliminating edges

    Difference-of-Gaussian function will be strong along edges

    So how can we get rid of these edges?

    Similar to Harris corner detector

    We are not concerned about the actual values of the eigenvalues, just the ratio of the two
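One way to test the eigenvalue ratio without computing the eigenvalues is the trace/determinant check from Lowe's 2004 paper: with H the 2x2 Hessian of the DOG image, keep the point only if Tr(H)^2 / Det(H) < (r + 1)^2 / r. The sketch below assumes that test with r = 10 (the value used in that paper) and finite-difference derivatives; the function name is hypothetical, and the slides' own formula is not preserved in this transcript.

```python
def passes_edge_test(dog, r, c, ratio=10.0):
    """Reject points whose principal-curvature ratio exceeds `ratio`."""
    # Finite-difference Hessian of the DOG image at (r, c).
    dxx = dog[r][c + 1] - 2.0 * dog[r][c] + dog[r][c - 1]
    dyy = dog[r + 1][c] - 2.0 * dog[r][c] + dog[r - 1][c]
    dxy = (dog[r + 1][c + 1] - dog[r + 1][c - 1]
           - dog[r - 1][c + 1] + dog[r - 1][c - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures have opposite signs or one is zero: not a stable blob.
        return False
    return trace * trace / det < (ratio + 1.0) ** 2 / ratio
```

The trace is the sum of the eigenvalues and the determinant is their product, so the quotient depends only on their ratio. A round blob passes, while a ridge (one curvature near zero) fails.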

  • Local Image Description

    SIFT keys each assigned:

    - Location

    - Scale (analogous to level it was detected)

    - Orientation (assigned in previous canonical orientation steps)

    Now: Describe local image region invariant to the above transformations

  • SIFT: Local Image Description

    Needs to be invariant to changes in location, scale and rotation

  • SIFT Key Example

  • Local Image Description

    For each key point:

    - Identify 8x8 neighborhood (from DOG level it was detected)

    - Align orientation to x-axis

  • Local Image Description

    - Calculate gradient magnitude and orientation map

    - Weight by Gaussian

  • Local Image Description

    Calculate histogram of each 4x4 region. 8 bins for gradient orientation. Tally weighted gradient magnitude.

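  • Local Image Description

    This histogram array is the image descriptor. (The example here, an 8x8 neighborhood split into four 4x4 regions, gives a vector of length 8*4=32. Best suggestion: a 128-element vector for a 16x16 neighborhood, i.e. sixteen 4x4 regions with 8 bins each.)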
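The descriptor layout can be sketched as follows. Several things here are assumptions rather than the lecture's method: the function name, a Gaussian window with σ equal to half the patch width (the choice in Lowe's 2004 paper), and the shortcut of aligning to the key point's orientation by subtracting its angle from each gradient orientation. A full implementation would also rotate the sampling grid itself.

```python
import math

def describe(mag, ori, keypoint_angle, cell=4, bins=8):
    """Concatenate per-cell orientation histograms of a square patch.

    mag and ori are patch-sized maps (e.g. 16x16) centred on the key point,
    with ori in radians; keypoint_angle is the canonical orientation.
    """
    size = len(mag)
    centre = (size - 1) / 2.0
    sigma = size / 2.0  # assumed Gaussian window width
    descriptor = []
    for top in range(0, size, cell):
        for left in range(0, size, cell):
            hist = [0.0] * bins
            for i in range(top, top + cell):
                for j in range(left, left + cell):
                    dist2 = (i - centre) ** 2 + (j - centre) ** 2
                    weight = math.exp(-dist2 / (2.0 * sigma ** 2))
                    # Measure orientation relative to the key point's angle.
                    angle = (ori[i][j] - keypoint_angle) % (2.0 * math.pi)
                    k = int(angle / (2.0 * math.pi) * bins) % bins
                    hist[k] += weight * mag[i][j]
            descriptor.extend(hist)
    return descriptor
```

A 16x16 patch yields 16 cells of 8 bins, so 128 values, and an 8x8 patch yields 32, matching the lengths quoted on the slide.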

  • Applications: Image Matching

    - Find all key points identified in source and target image

    - Each key point will have 2D location, scale and orientation, as well as invariant descriptor vector

    - For each key point in source image, search corresponding SIFT features in target image

    - Find the transformation between two images using epipolar geometry constraints or affine transformation

  • Image matching via SIFT features

    Feature detection

  • Image matching via SIFT features

    - Image matching via nearest neighbor search: if the ratio of the closest distance to the second-closest distance is greater than 0.8, reject the match as false

    - Remove outliers using epipolar line constraints
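The nearest-neighbour ratio test can be sketched with a brute-force search. The function name and the use of Euclidean distance are assumptions for illustration; the 0.8 threshold is the one quoted on the slide. A practical system would replace the exhaustive search with an approximate nearest-neighbour index.

```python
import math

def match_descriptors(source, target, max_ratio=0.8):
    """Return (source_index, target_index) pairs that pass the ratio test."""
    matches = []
    for s_idx, s_desc in enumerate(source):
        # Distances to every target descriptor, smallest first.
        ranked = sorted((math.dist(s_desc, t_desc), t_idx)
                        for t_idx, t_desc in enumerate(target))
        if len(ranked) < 2:
            continue  # the ratio needs a second-closest candidate
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Reject when closest / second-closest exceeds max_ratio.
        if second > 0.0 and best / second <= max_ratio:
            matches.append((s_idx, best_idx))
    return matches
```

A source descriptor with two almost equally close candidates is ambiguous, so it is dropped rather than matched to either one.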

  • Image matching via SIFT features

  • Summary

    SIFT features are reasonably invariant to rotation, scaling, and illumination changes.

    We can use them for image matching and object recognition among other things.

    Efficient on-line matching and recognition can be performed in real time.