
EE269 Signal Processing for Machine Learning

Lecture 8

Instructor: Mert Pilanci
Stanford University
February 4, 2019

Recap: Linear and Quadratic Discriminant Analysis

- Suppose $x[n] = [x_1, \dots, x_N] \sim \mathcal{N}(\mu_k, \Sigma)$ when $y = k$:

  $$g_k(x) = p_{x \mid y = k}(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}$$

- $K$ classes:

  $$f(x) = \arg\max_{k = 1, \dots, K} \; \pi_k \, g_k(x)$$
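As a concrete illustration of the decision rule, here is a minimal NumPy/SciPy sketch; the two means, shared covariance, and priors below are made-up values, not from the lecture:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-class setup: shared covariance, different means
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
Sigma = np.eye(2)
priors = [0.5, 0.5]

x = np.array([1.0, 0.8])
# f(x) = argmax_k pi_k g_k(x), with g_k the Gaussian density of class k
scores = [pi * multivariate_normal(mu, Sigma).pdf(x)
          for mu, pi in zip(means, priors)]
print(np.argmax(scores))
```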

Scaled identity covariances $\Sigma_k = \sigma^2 I$

- Suppose $x[n] = [x_1, \dots, x_N] \sim \mathcal{N}(\mu_k, \Sigma_k)$ when $y = k$:

  $$g_k(x) = p_{x \mid y = k}(x) = \frac{1}{(2\pi)^{N/2} |\Sigma_k|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}$$

- $K$ classes
- Decision boundary between classes $i$ and $j$: the hyperplane

  $$w^T (x - x_0) = 0, \qquad w = \mu_i - \mu_j,$$

  $$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \log\frac{\pi_i}{\pi_j} \, (\mu_i - \mu_j)$$

- The hyperplane passes through the point $x_0$ and is orthogonal to $w$.

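In code, computing this boundary takes a couple of lines; a minimal sketch, assuming made-up means, variance, and priors (none of these numbers come from the lecture):

```python
import numpy as np

mu_i = np.array([1.0, 0.0])       # hypothetical class means
mu_j = np.array([-1.0, 0.0])
sigma2 = 0.5                      # shared variance: Sigma_k = sigma2 * I
pi_i, pi_j = 0.7, 0.3             # class priors

d = mu_i - mu_j
w = d                             # hyperplane normal
x0 = 0.5 * (mu_i + mu_j) - sigma2 / (d @ d) * np.log(pi_i / pi_j) * d

# Classify: w^T (x - x0) > 0 picks class i, otherwise class j
x = np.array([0.2, 0.5])
print("i" if w @ (x - x0) > 0 else "j")
```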

Identical covariances $\Sigma_k = \Sigma$

- Suppose $x[n] = [x_1, \dots, x_N] \sim \mathcal{N}(\mu_k, \Sigma)$ when $y = k$:

  $$g_k(x) = p_{x \mid y = k}(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)}$$

- Decision boundary: hyperplane

  $$w^T (x - x_0) = 0, \qquad w = \Sigma^{-1}(\mu_i - \mu_j),$$

  $$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\log(\pi_i / \pi_j)}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)} \, (\mu_i - \mu_j)$$

- The hyperplane passes through $x_0$ but is not necessarily orthogonal to the line between the means.

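The same sketch adapts to a full shared covariance; only $w$ and the prior term change (again, all parameter values below are hypothetical):

```python
import numpy as np

mu_i = np.array([1.0, 0.0])
mu_j = np.array([-1.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])     # shared (hypothetical) covariance
pi_i, pi_j = 0.5, 0.5

d = mu_i - mu_j
w = np.linalg.solve(Sigma, d)      # w = Sigma^{-1} (mu_i - mu_j)
x0 = 0.5 * (mu_i + mu_j) - np.log(pi_i / pi_j) / (d @ w) * d

x = np.array([0.2, 0.5])
print("i" if w @ (x - x0) > 0 else "j")
```

Note that `d @ w` equals $(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)$, the denominator in $x_0$.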

Quadratic Discriminant Analysis: $\Sigma_k$ arbitrary

- Suppose $x[n] = [x_1, \dots, x_N] \sim \mathcal{N}(\mu_k, \Sigma_k)$ when $y = k$:

  $$g_k(x) = p_{x \mid y = k}(x) = \frac{1}{(2\pi)^{N/2} |\Sigma_k|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}$$

- Discriminant functions: $h_k(x) = x^T W_k x + w_k^T x + w_{k0}$
- Classify as class $k$ if $h_k(x) > h_{k'}(x)$ for all $k' \neq k$, where

  $$W_k = -\tfrac{1}{2} \Sigma_k^{-1}, \qquad w_k = \Sigma_k^{-1} \mu_k,$$

  $$w_{k0} = -\tfrac{1}{2} \mu_k^T \Sigma_k^{-1} \mu_k - \tfrac{1}{2} \log|\Sigma_k| + \log \pi_k$$

Quadratic decision regions: hyperquadrics
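A minimal sketch of these discriminants in NumPy; the two classes' parameters are invented for illustration, and `slogdet` is used for a numerically stable $\log|\Sigma_k|$:

```python
import numpy as np

def qda_score(x, mu, Sigma, pi):
    """Evaluate h_k(x) = x^T W_k x + w_k^T x + w_k0 for one class."""
    Sinv = np.linalg.inv(Sigma)
    Wk = -0.5 * Sinv
    wk = Sinv @ mu
    wk0 = -0.5 * mu @ Sinv @ mu - 0.5 * np.linalg.slogdet(Sigma)[1] + np.log(pi)
    return x @ Wk @ x + wk @ x + wk0

# Hypothetical two-class parameters
classes = [
    (np.array([0.0, 0.0]), np.eye(2), 0.6),
    (np.array([2.0, 1.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.4),
]
x = np.array([1.0, 0.5])
scores = [qda_score(x, mu, S, pi) for mu, S, pi in classes]
print(np.argmax(scores))           # index of the predicted class
```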

Estimating parameters: univariate Gaussian

- Suppose $x_1, x_2, \dots, x_n$ i.i.d. $\sim \mathcal{N}(\mu, \sigma^2)$
- Estimating the mean:

  $$\mu_{ML} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

- Estimating the variance:

  $$\sigma^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_{ML})^2$$

Estimating parameters: multivariate Gaussian

- Suppose $x_1, x_2, \dots, x_n$ i.i.d. $\sim \mathcal{N}(\mu, \Sigma)$
- Estimating the mean:

  $$\mu_{ML} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

- Estimating the covariance:

  $$\Sigma_{ML} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_{ML})(x_i - \mu_{ML})^T$$

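These estimators are one-liners in NumPy; a sketch on synthetic data (standard-normal samples, chosen only so the script runs):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))     # n = 1000 samples in N = 3 dimensions

mu_ml = X.mean(axis=0)             # (1/n) sum_i x_i
Xc = X - mu_ml
Sigma_ml = (Xc.T @ Xc) / len(X)    # (1/n) sum_i (x_i - mu)(x_i - mu)^T
# Caution: this is the ML estimate (divide by n); np.cov defaults to n - 1.
print(mu_ml, Sigma_ml, sep="\n")
```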

Linear vs Quadratic Discriminant Analysis

- LDA: estimate $\mu_k$ for $k = 1, \dots, K$ and a single $\Sigma$:

  $$Kn + \binom{n}{2} + n \;\text{ parameters}$$

- QDA: estimate $\mu_k$, $\Sigma_k$ for $k = 1, \dots, K$:

  $$Kn + K\left(\binom{n}{2} + n\right) \;\text{ parameters}$$

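For a sense of scale (a worked example, not from the slides): with $n = 10$ features and $K = 3$ classes, LDA needs $3 \cdot 10 + \binom{10}{2} + 10 = 85$ parameters, while QDA needs $3 \cdot 10 + 3\left(\binom{10}{2} + 10\right) = 195$; the gap grows quadratically in $n$, which is why QDA requires substantially more data to fit reliably.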

Regularized Linear Discriminant Analysis

- Maximum likelihood covariance estimate:

  $$\Sigma_{ML} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_{ML})(x_i - \mu_{ML})^T$$

- Regularized estimate:

  $$\hat{\Sigma} = (1 - \alpha)\, \mathrm{diag}(\Sigma_{ML}) + \alpha\, \Sigma_{ML}$$

- Diagonal Linear Discriminant Analysis ($\alpha = 0$):

  $$\hat{\Sigma} = \mathrm{diag}(\Sigma_{ML})$$

(slide credit: T. Hastie et al.)
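This shrinkage is a one-line function; a sketch (the sample covariance here is made up):

```python
import numpy as np

def regularized_cov(Sigma_ml, alpha):
    """Shrink the ML covariance toward its diagonal.
    alpha = 1 recovers Sigma_ml; alpha = 0 gives diagonal LDA."""
    diag = np.diag(np.diag(Sigma_ml))   # keep only the variances
    return (1 - alpha) * diag + alpha * Sigma_ml

Sigma_ml = np.array([[1.0, 0.8],
                     [0.8, 2.0]])       # hypothetical ML estimate
print(regularized_cov(Sigma_ml, alpha=0.5))
```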

Optimal basis change and dimension reduction

- Decision boundary $w^T(x - x_0) = 0$; e.g., in LDA with equal covariances, $w = \Sigma^{-1}(\mu_i - \mu_j)$
- Classifies based on the scalar projection $w^T x \in \mathbb{R}$

Mean of the projected data

- Project onto $a$: $y = a^T x$

  $$\mu_1 = \frac{1}{N_1} \sum_{i \in \text{class } 1} x_i, \qquad \mu_2 = \frac{1}{N_2} \sum_{i \in \text{class } 2} x_i$$

Fisher's LDA

- $\mu_k = E[x \mid x \text{ comes from class } k]$
- $\Sigma_k = E[(x - \mu_k)(x - \mu_k)^T \mid x \text{ comes from class } k]$
- Classify using a scalar feature $y = a^T x$:

  $$\beta_k = E[y \mid x \text{ comes from class } k], \qquad \sigma_k^2 = E[(y - \beta_k)^2 \mid x \text{ comes from class } k]$$

- Fisher's criterion:

  $$\max_a \frac{(\beta_1 - \beta_2)^2}{\sigma_1^2 + \sigma_2^2}$$

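To make the objective concrete, here is a small sketch that estimates $\beta_k$ and $\sigma_k^2$ from synthetic data and evaluates Fisher's criterion for a candidate direction (the data and the direction are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(500, 2))  # class 1
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(500, 2))  # class 2

def fisher_criterion(a, X1, X2):
    y1, y2 = X1 @ a, X2 @ a                # projected samples y = a^T x
    beta1, beta2 = y1.mean(), y2.mean()    # projected class means
    s1, s2 = y1.var(), y2.var()            # projected class variances
    return (beta1 - beta2) ** 2 / (s1 + s2)

print(fisher_criterion(np.array([1.0, 0.0]), X1, X2))
```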

Fisher's LDA

$$\beta_k = E[y \mid x \text{ comes from class } k] = a^T \mu_k$$

$$\sigma_k^2 = E[(y - \beta_k)^2 \mid x \text{ comes from class } k] = E[(a^T(x - \mu_k))^2] = E[a^T (x - \mu_k)(x - \mu_k)^T a] = a^T \Sigma_k a$$

$$\max_a \frac{(\beta_1 - \beta_2)^2}{\sigma_1^2 + \sigma_2^2} = \max_a \frac{(a^T(\mu_1 - \mu_2))^2}{a^T(\Sigma_1 + \Sigma_2)\,a} = \max_a \frac{a^T Q a}{a^T P a}$$

where $Q = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$ and $P = \Sigma_1 + \Sigma_2$.


Maximizing quadratic forms

$$\max_a \frac{a^T Q a}{a^T a}$$

- Eigenvalue decomposition $Q = U \Lambda U^T$
- Change of basis $b = U^T a$, i.e., $U b = a$:

  $$\max_a \frac{a^T U \Lambda U^T a}{a^T a} = \max_b \frac{b^T \Lambda b}{b^T U^T U b} = \max_b \frac{b^T \Lambda b}{b^T b}$$

- The optimum is $b = \delta[n - k^*]$, where $k^* = \arg\max_k \Lambda_{kk} = 1$ when the eigenvalues are sorted in decreasing order

Solution: $a = u_1$, the maximal eigenvector, i.e., $Q u_1 = \lambda_1 u_1$
Optimal value: $\lambda_1$

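A quick numerical check of this fact (a sketch; note `np.linalg.eigh` returns eigenvalues in ascending order, so the maximal eigenvector is the last column):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Q = A @ A.T                        # a symmetric test matrix

evals, U = np.linalg.eigh(Q)       # ascending eigenvalues
u1, lam1 = U[:, -1], evals[-1]     # maximal eigenpair

rayleigh = lambda a: (a @ Q @ a) / (a @ a)
print(rayleigh(u1), lam1)          # the maximum of the quotient is lambda_1
```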

Maximizing quadratic forms: two quadratics

$$\max_a \frac{a^T Q a}{a^T P a}$$

- Theorem (Simultaneous Diagonalization). Let $P, Q \in \mathbb{R}^{n \times n}$ be real symmetric matrices with $P$ positive definite. Then there exists a matrix $V$ such that

  $$V^T P V = I, \qquad V^T Q V = \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n),$$

  where $V, \Lambda$ satisfy the generalized eigenvalue equation $Q v_i = \lambda_i P v_i$.


Proof: Let $P = U_P \Lambda_P U_P^T$ be its eigenvalue decomposition. The matrix $V' = U_P \Lambda_P^{-1/2}$ will only diagonalize $P$ (it whitens it: $V'^T P V' = I$), but not $Q$ in general. So let $V'^T Q V' = U' \Lambda' U'^T$ be its EVD and set $V = V' U'$. Then $V^T P V = U'^T I U' = I$ since $U'$ is orthogonal, and $V^T Q V = \Lambda'$.
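The proof is constructive, so it translates directly to code; a sketch with randomly generated symmetric $Q$ and positive definite $P$:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
P = B @ B.T + 4 * np.eye(4)        # positive definite
C = rng.normal(size=(4, 4))
Q = C + C.T                        # symmetric

# Step 1: whiten P with V' = U_P Lambda_P^{-1/2}
lp, Up = np.linalg.eigh(P)
Vp = Up / np.sqrt(lp)              # scales column j by 1/sqrt(lambda_j)

# Step 2: diagonalize the whitened Q, then combine
lq, Uq = np.linalg.eigh(Vp.T @ Q @ Vp)
V = Vp @ Uq

print(np.allclose(V.T @ P @ V, np.eye(4)))    # V^T P V = I
print(np.allclose(V.T @ Q @ V, np.diag(lq)))  # V^T Q V = Lambda
```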

Maximizing quadratic forms: two quadratics

$$\max_a \frac{a^T Q a}{a^T P a}$$

- Let $V$ and $\Lambda$ satisfy the generalized eigenvalue equation $Q v_i = \lambda_i P v_i$. Basis change $a = V b$, i.e., $b = V^{-1} a$:

  $$\max_b \frac{b^T V^T Q V b}{b^T V^T P V b} = \max_b \frac{b^T \Lambda b}{b^T b}$$

- Solution: $a = v_1$, the maximal generalized eigenvector
  Optimal value: $\lambda_1$, the maximum generalized eigenvalue

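In code, this generalized symmetric-definite problem is handled directly by `scipy.linalg.eigh(Q, P)`, which returns eigenvalues in ascending order; a sketch with made-up matrices:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3))
P = B @ B.T + 3 * np.eye(3)        # positive definite
d = rng.normal(size=3)
Q = np.outer(d, d)                 # rank one, like Fisher's Q

evals, V = eigh(Q, P)              # solves Q v = lambda P v
a, lam1 = V[:, -1], evals[-1]      # maximal generalized eigenpair

quotient = (a @ Q @ a) / (a @ P @ a)
print(np.isclose(quotient, lam1))  # optimal value equals lambda_1
```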

Fisher's LDA

$$\max_a \frac{a^T Q a}{a^T P a}, \qquad Q = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T, \quad P = \Sigma_1 + \Sigma_2$$

- Solution: $Q a = \lambda P a$, therefore $P^{-1} Q a = \lambda a$:

  $$P^{-1}(\mu_1 - \mu_2)\,(\mu_1 - \mu_2)^T a = \lambda a$$

  Since $(\mu_1 - \mu_2)^T a$ is a scalar, any eigenvector with $\lambda \neq 0$ must satisfy

  $$a = \text{constant} \times P^{-1}(\mu_1 - \mu_2),$$

  which can be normalized as

  $$a := \frac{P^{-1}(\mu_1 - \mu_2)}{\|P^{-1}(\mu_1 - \mu_2)\|_2}$$

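Putting the pieces together, here is a minimal end-to-end sketch of Fisher's LDA on synthetic two-class data; the threshold at the projected midpoint is a simple choice that implicitly assumes equal priors:

```python
import numpy as np

rng = np.random.default_rng(5)
cov = [[1.0, 0.4], [0.4, 1.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=400)  # class 1
X2 = rng.multivariate_normal([2.0, 1.0], cov, size=400)  # class 2

# Plug-in estimates of class means and covariances (ML, divide by n)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
P = np.cov(X1, rowvar=False, bias=True) + np.cov(X2, rowvar=False, bias=True)

# Fisher direction a = P^{-1}(mu1 - mu2), normalized
a = np.linalg.solve(P, mu1 - mu2)
a /= np.linalg.norm(a)

# Classify a new point by thresholding y = a^T x at the projected midpoint
threshold = a @ (mu1 + mu2) / 2
x = np.array([1.5, 0.2])
print(1 if a @ x > threshold else 2)
```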