CS231A TA SESSION - cvgl.stanford.edu

16
CS231A TA SESSION PROBLEM SET 4 Christopher B. Choy

Transcript of CS231A TA SESSION - cvgl.stanford.edu

Page 1: CS231A TA SESSION - cvgl.stanford.edu

CS231A TA SESSIONPROBLEM SET 4

Christopher B. Choy

Page 2: CS231A TA SESSION - cvgl.stanford.edu

QUESTION 1

Page 3: CS231A TA SESSION - cvgl.stanford.edu

HOUGH TRANSFORMTransform a point in a cartesian coordinate toall lines (line parameters) that pass the point in the polar coordinate

H : (x, y) → {(r, θ)|r(θ) = x cos θ + y sin θ}

y = + x + → r(θ) = x cos θ + y sin θcos θsin θ

rsin θ

Page 4: CS231A TA SESSION - cvgl.stanford.edu

GENERALIZED HOUGH TRANSFORMfeature to points in the 2D Hough space

is the reference origin of the shapeFor points on the boundary, store : R-Table

As a function of the gradient Rotation and scale : 4D Hough Space

Φ(x) S = { }a⃗ a⃗

x ⃗ = +r ⃗ a⃗ x ⃗ Φ( )x ⃗

D.H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes", Pattern Recognition, Vol.13, No.2, p.111-122, 1981

Page 5: CS231A TA SESSION - cvgl.stanford.edu

IMPLICIT SHAPE MODELR-Table as a function of an image patchtransform an image patch to object centers

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV 2004 

Page 6: CS231A TA SESSION - cvgl.stanford.edu

QUESTION 2

Page 7: CS231A TA SESSION - cvgl.stanford.edu

BAG OF VISUAL WORDS

Feature Extraction

Clustering

Encoding

Spatial Binning

Kernel SVM k(x, y) = ⟨Ψ(x),Ψ(y)⟩

Page 8: CS231A TA SESSION - cvgl.stanford.edu

BAG OF VISUAL WORDSFeature Extraction

Local image featureRobust to typical image transformations

Dense SIFT SIFT at every location vs a key point

Interest points might not correspond toforeground

Clustering (Dictionary)Visual words should be distinctive anddiverseCommon visual words (wheel) will form acluster

K-means clustering

Page 9: CS231A TA SESSION - cvgl.stanford.edu

BAG OF VISUAL WORDSEncoding

Maps local features to visual wordsHard quantization

assign to the NN

Spatial PyramidEncode spatial informationDivide the image into subsectionsFor each subsection, create aVW histogram

Page 10: CS231A TA SESSION - cvgl.stanford.edu

SIMILARITY METRIC FOR DISTRIBUTIONSTrain a linear SVM on features (distribution): Distance metric??

How similar is a distribution from a distribution : -divergence!P Q f

, convexf : (0,7) →

(P||Q) = ∫ f ( ) dQDfdPdQ

Kullback-Leibler Divergence

Divergencef (x) = x log x

χ2

f (x) = (x + 1)2

Page 11: CS231A TA SESSION - cvgl.stanford.edu

FOR A DISCRETE CASEIntersection kernel :

kernel :

k(x, y) = *min( , )xi yi

χ2 k(x, y) = *12

( +xi yi)2

+xi yi

In the problem 2, we will use kernel for the similarity metricχ2

Page 12: CS231A TA SESSION - cvgl.stanford.edu

HOMOGENEOUS KERNEL AND FINITE APPROXIMATION expensive, non linear!

But in the feature space it becomes linear ( dimension)K(x, y)

Ψ(Þ) 7

Feature map using the Homogeneous KernelGeneric histogram kernel is

Homogeneous if intersection, , Hellinger's, Jensen-Shannon's

Find a finite feature map that

Map distributions using then it becomes a linear classificationproblem in the feature space!

Ψ(x)K(x, y) = k( , )*i xi yi

k(cx, cy) = ck(x, y)χ2

(Þ)Ψ̂ k(x, y) = ⟨Ψ(x),Ψ(y)⟩ a ⟨ (x), (y)⟩Ψ̂ Ψ̂

(Þ)Ψ̂

Page 13: CS231A TA SESSION - cvgl.stanford.edu

HOMOGENEOUS KERNELUse the homogeneity,

Ex, intersection kernel

Define as the Fourier transformation of

Continuous, infinite dimensional featureSample at points

In the problem 2, we use vl_homkermap()

k(x, y) = k ( , ) = (log(y/x))xy‾‾√ x/y‾ ‾‾3 y/x‾ ‾‾3 xy‾‾√

k(x, y) = min(x, y) = (log )xy‾‾√yx

(w) = e+|w|/2

κ(λ) (w)Ψ(x) = e+iλ log x xκ(λ)‾ ‾‾‾‾3

λ = +nL, (+n + 1)L, . . . , nL(x) !Ψ̂ 2n+1

k(x, y) a ⟨ (x), (y)⟩Ψ̂ Ψ̂ n = 1

Page 14: CS231A TA SESSION - cvgl.stanford.edu

BAG OF VISUAL WORDS

Feature Extraction

Clustering

Encoding

Spatial Binning

Kernel SVM k(x, y) = ⟨Ψ(x),Ψ(y)⟩

Page 15: CS231A TA SESSION - cvgl.stanford.edu

BAG OF VISUAL WORDSInstall VL Feat, an extensive CV library

Set up the path correctly in the starter code p2.m

Vary options and see how it affects the performance

http://www.vlfeat.org/

Page 16: CS231A TA SESSION - cvgl.stanford.edu

Code extract_dense_sift.m

Image feature extraction

Dense SIFT features

use vl_dsift()

randomly select 10,000 features

Code create_dictionary.m

Create dictionary of visual words

Use k-means to find the centroid of clusters.

use vl_kmeans() 

Code create_histograms.m

Represent images as histograms

Spatial pyramid