06a math essentials 2 - courses.washington.edu


Transcript of 06a math essentials 2 - courses.washington.edu

Page 1: 06a math essentials 2 - courses.washington.edu

Machine Learning

Math Essentials Part 2

Jeff Howbert, Introduction to Machine Learning, Winter 2012

Page 2: 06a math essentials 2 - courses.washington.edu

Gaussian distribution

Most commonly used continuous probability distribution

Also known as the normal distribution

Two parameters define a Gaussian:
– Mean μ: location of center
– Variance σ²: width of curve


Page 3: 06a math essentials 2 - courses.washington.edu

Gaussian distribution

In one dimension:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right)$$


Page 4: 06a math essentials 2 - courses.washington.edu

Gaussian distribution

In one dimension:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right)$$

– The exponent −(x − μ)² / (2σ²) causes the pdf to decrease as distance from the center increases
– The variance σ² controls the width of the curve
– The normalizing constant (2πσ²)^(−1/2) ensures that the distribution integrates to 1
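As a quick sanity check (my own, not on the slide; the values μ = 2, σ² = 1, x = 2.5 are arbitrary), the density can be evaluated directly in MATLAB:

mu = 2; sigma2 = 1; x = 2.5;                        % arbitrary example values
p = 1 / sqrt( 2 * pi * sigma2 ) * exp( -(x - mu)^2 / (2 * sigma2) );
% p is approximately 0.3521; with the Statistics Toolbox,
% normpdf( x, mu, sqrt( sigma2 ) ) should return the same value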

Page 5: 06a math essentials 2 - courses.washington.edu

Gaussian distribution

In one dimension, the standard normal density (μ = 0, σ² = 1) is

$$\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$

[Figure: Gaussian pdfs for μ = 0, σ² = 1; μ = 2, σ² = 1; μ = 0, σ² = 5; μ = −2, σ² = 0.3]
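Curves like the four in the figure can be reproduced with a short MATLAB sketch (mine, not from the slides); the parameter pairs are taken from the figure labels:

x = linspace( -8, 8, 400 );
params = [ 0 1; 2 1; 0 5; -2 0.3 ];                 % rows of [ mu  sigma^2 ]
hold on;
for i = 1 : size( params, 1 )
    mu = params( i, 1 );  s2 = params( i, 2 );
    plot( x, 1 / sqrt( 2 * pi * s2 ) * exp( -(x - mu).^2 / (2 * s2) ) );
end
hold off;
legend( '\mu = 0, \sigma^2 = 1', '\mu = 2, \sigma^2 = 1', ...
        '\mu = 0, \sigma^2 = 5', '\mu = -2, \sigma^2 = 0.3' );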

Page 6: 06a math essentials 2 - courses.washington.edu

Multivariate Gaussian distribution

In d dimensions:

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

x and μ are now d-dimensional vectors
– μ gives the center of the distribution in d-dimensional space

σ² is replaced by Σ, the d × d covariance matrix
– Σ contains the pairwise covariances of every pair of features
– Diagonal elements of Σ are the variances σ² of the individual features
– Σ describes the distribution's shape and spread
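A small MATLAB check of the formula (my own sketch; the 2-d values of μ, Σ, and x below are arbitrary):

mu    = [ 0; 0 ];                                   % arbitrary example values
Sigma = [ 1 0.5; 0.5 2 ];
x     = [ 0.5; 0.2 ];
d = length( x );
p = 1 / ( (2*pi)^(d/2) * sqrt( det( Sigma ) ) ) * ...
    exp( -0.5 * (x - mu)' * ( Sigma \ (x - mu) ) );
% with the Statistics Toolbox, mvnpdf( x', mu', Sigma ) should match p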

Page 7: 06a math essentials 2 - courses.washington.edu

Multivariate Gaussian distribution

Covariance
– Measures the tendency for two variables to deviate from their means in the same (or opposite) directions at the same time

[Figure: scatter plots contrasting high (positive) covariance with no covariance]
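The two cases in the figure can be illustrated with a quick MATLAB sketch (mine; the data are synthetic):

rng( 1 );
a = randn( 500, 1 );
b = a + 0.5 * randn( 500, 1 );                      % b tracks a: high positive covariance
c = randn( 500, 1 );                                % c is independent of a: covariance near 0
cov( a, b )                                         % off-diagonal entry is clearly positive
cov( a, c )                                         % off-diagonal entry is near zero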


Page 8: 06a math essentials 2 - courses.washington.edu

Multivariate Gaussian distribution

In two dimensions

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 0.25 & 0.3 \\ 0.3 & 1 \end{bmatrix}$$
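As a small worked note (mine, not on the slide), the off-diagonal entry of this Σ implies a correlation between the two features of

$$\rho = \frac{\operatorname{cov}(x_1, x_2)}{\sigma_1 \sigma_2} = \frac{0.3}{\sqrt{0.25}\,\sqrt{1}} = 0.6$$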


Page 9: 06a math essentials 2 - courses.washington.edu

Multivariate Gaussian distribution

In two dimensions

$$\boldsymbol{\Sigma} = \begin{bmatrix} 2 & 0.6 \\ 0.6 & 2 \end{bmatrix} \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \qquad \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
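A sketch (mine; it assumes the three matrices reconstructed above) that visualizes how each Σ shapes the sample cloud:

rng( 1 );
Sigmas = { [ 2 0.6; 0.6 2 ], [ 2 0; 0 1 ], [ 1 0; 0 1 ] };
z = randn( 1000, 2 );                               % standard normal samples
for i = 1 : 3
    x = z * chol( Sigmas{ i } );                    % give the samples covariance Sigmas{ i }
    subplot( 1, 3, i );
    scatter( x( :, 1 ), x( :, 2 ), '.' );
    axis equal;
end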


Page 10: 06a math essentials 2 - courses.washington.edu

Multivariate Gaussian distribution

In three dimensions

rng( 1 );                                            % seed the random number generator
mu    = [ 2; 1; 1 ];                                 % mean vector
sigma = [ 0.25 0.30 0.10;
          0.30 1.00 0.70;
          0.10 0.70 2.00 ];                          % covariance matrix
x = randn( 1000, 3 );                                % 1000 standard normal samples in 3-d
x = x * sigma;                                       % shape the cloud using sigma
x = x + repmat( mu', 1000, 1 );                      % shift the cloud to the mean
scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );    % 3-d scatter plot of the samples
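A side note (mine, not on the slide): multiplying by sigma itself yields samples whose covariance is sigma*sigma rather than sigma. To draw samples with covariance exactly sigma, one would typically use a Cholesky factor instead:

R = chol( sigma );                                   % upper-triangular factor, sigma = R' * R
x = randn( 1000, 3 ) * R;                            % cov( x ) is approximately sigma
x = x + repmat( mu', 1000, 1 );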

Page 11: 06a math essentials 2 - courses.washington.edu

Vector projection

Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– Unit vector in direction of x is x / || x ||
– Length of projection of y in direction of x is || y || ⋅ cos(θ)
– Orthogonal projection of y onto x is the vector
  projx( y ) = x ⋅ || y || ⋅ cos(θ) / || x || = [ ( x ⋅ y ) / || x ||² ] x   (using dot product alternate form)

[Figure: vectors x and y separated by angle θ, with projx( y ) lying along x]
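A small worked example (mine, with arbitrary vectors): for x = (3, 4) and y = (2, 1),

$$\mathrm{proj}_{\mathbf{x}}(\mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert^{2}}\, \mathbf{x} = \frac{3 \cdot 2 + 4 \cdot 1}{3^{2} + 4^{2}}\,(3, 4) = \frac{10}{25}\,(3, 4) = (1.2,\ 1.6)$$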


Page 12: 06a math essentials 2 - courses.washington.edu

Linear models

There are many types of linear models in machine learning.
– Common in both classification and regression.
– A linear model consists of a vector β in d-dimensional feature space.
– The vector β attempts to capture the strongest gradient (rate of change) in the output variable, as seen across all training samples.
– Different linear models optimize β in different ways.
– A point x in feature space is mapped from d dimensions to a scalar (1-dimensional) output z by projection onto β:

$$z = \alpha + \boldsymbol{\beta} \cdot \mathbf{x} = \alpha + \beta_1 x_1 + \cdots + \beta_d x_d$$
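A short MATLAB illustration of the mapping (mine; α, β, and x are arbitrary):

alpha = 0.5;                                        % intercept
beta  = [ 2; -1; 0.5 ];                             % model vector, d = 3
x     = [ 1; 3; 4 ];                                % a point in feature space
z = alpha + beta' * x;                              % projection onto beta plus intercept: z = 1.5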

Page 13: 06a math essentials 2 - courses.washington.edu

Linear models

There are many types of linear models in machine learning.
– The projection output z is typically transformed to a final predicted output y by some function ƒ:

$$y = f(z) = f(\alpha + \boldsymbol{\beta} \cdot \mathbf{x}) = f(\alpha + \beta_1 x_1 + \cdots + \beta_d x_d)$$

example: for logistic regression, ƒ is the logistic function
example: for linear regression, ƒ( z ) = z

– Models are called linear because they are a linear function of the model vector components β1, …, βd.
– Key feature of all linear models: no matter what ƒ is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after the transform.
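A short MATLAB sketch (mine) of the two example transforms; the logistic function used here, 1 / (1 + e^(−z)), is the standard form, which the slide names but does not write out:

z = -5 : 0.1 : 5;                                   % range of projection outputs
f_linear   = z;                                     % linear regression: f( z ) = z
f_logistic = 1 ./ ( 1 + exp( -z ) );                % logistic regression: squashes z into (0, 1)
plot( z, f_linear, z, f_logistic );
legend( 'f(z) = z', 'logistic f(z)' );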

Page 14: 06a math essentials 2 - courses.washington.edu

Geometry of projections

w0 ≡ α, w ≡ β


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 15: 06a math essentials 2 - courses.washington.edu

Geometry of projections


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 16: 06a math essentials 2 - courses.washington.edu

Geometry of projections


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 17: 06a math essentials 2 - courses.washington.edu

Geometry of projections


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 18: 06a math essentials 2 - courses.washington.edu

Geometry of projections

[Figure: margin]


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 19: 06a math essentials 2 - courses.washington.edu

From projection to prediction

positive margin → class 1

negative margin → class 0


Page 20: 06a math essentials 2 - courses.washington.edu

Logistic regression in two dimensions

Interpreting the model vector of coefficients

From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]
α = B( 1 ), β = [ β1 β2 ] = B( 2 : 3 )

α, β define location and orientation of decision boundary
– −α / || β || is the signed distance of the decision boundary from the origin
– decision boundary is perpendicular to β
– magnitude of β defines gradient of probabilities between 0 and 1
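A sketch (mine) of how these numbers would be used; B is copied from the slide, and the distance uses the standard point-to-hyperplane formula:

B = [ 13.0460 -1.9024 -0.4047 ];                    % coefficients from the slide
alpha = B( 1 );  beta = B( 2 : 3 );
% decision boundary: alpha + beta * x' = 0
dist = -alpha / norm( beta );                       % signed distance from origin, about -6.7
x1 = 0 : 0.1 : 10;
x2 = -( alpha + beta( 1 ) * x1 ) / beta( 2 );       % boundary expressed as x2( x1 )
plot( x1, x2 );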


Page 21: 06a math essentials 2 - courses.washington.edu

Logistic function in d dimensions


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 22: 06a math essentials 2 - courses.washington.edu

Decision boundary for logistic regression


slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)