Introduction to Machine Learning - Kangwon
cs.kangwon.ac.kr/~leeck/ML/04-1.pdf · 2016. 6. 6.

Transcript of Introduction to Machine Learning (Author: ethem, Created Date: 4/2/2013 7:03:30 AM)

Page 1: Introduction to Machine Learning

Page 2:

Parametric Estimation

X = { x^t }_t where x^t ~ p(x)

Parametric estimation:

Assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X

e.g., N(μ, σ²) where θ = {μ, σ²}

Sufficient statistic (Wikipedia):

a statistic which has the property of sufficiency with respect to a statistical model and its associated unknown parameter

e.g., Bernoulli dist. with p(x) = p^x (1 − p)^(1−x): for X1, ..., Xn, T(x) = X1 + ... + Xn is a sufficient statistic for p.
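As a quick sanity check of sufficiency (with made-up 0/1 samples): two Bernoulli samples with the same sum T(x) have identical likelihoods for every p, so the likelihood depends on the data only through T(x).

```python
import math

# Hypothetical Bernoulli samples, both with T(x) = X1 + ... + Xn = 5, n = 10
x_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
x_b = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]

def bernoulli_log_likelihood(xs, p):
    # log prod_t p^x (1 - p)^(1 - x)
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

# Equal T(x) => identical log likelihood at every p: T(x) is sufficient for p
for p in (0.2, 0.5, 0.8):
    diff = bernoulli_log_likelihood(x_a, p) - bernoulli_log_likelihood(x_b, p)
    assert abs(diff) < 1e-12
```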


Page 3:

Maximum Likelihood Estimation

Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)

Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)

Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)

Page 4:

Examples: Bernoulli

Bernoulli: two states, failure/success, x in {0, 1}

P(x) = p₀^x (1 − p₀)^(1−x)

L(p₀|X) = log ∏_t p₀^(x^t) (1 − p₀)^(1−x^t)

MLE (set dL/dp₀ = 0): p̂₀ = ∑_t x^t / N
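The closed form above is just the sample mean of the 0/1 outcomes; a one-line sketch with made-up data:

```python
# Bernoulli MLE: setting dL/dp0 = 0 gives p0_hat = (sum_t x^t) / N,
# i.e., the fraction of successes. Data below are made up.
x = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

p0_hat = sum(x) / len(x)
print(p0_hat)  # 0.7
```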

Page 5:

Examples: Multinomial

Multinomial: K > 2 states, x_i in {0, 1}

P(x₁, x₂, ..., x_K) = ∏_i p_i^(x_i)

L(p₁, p₂, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)

MLE: p̂_i = ∑_t x_i^t / N

Finding the local maxima and minima of a function subject to constraints (constrained optimization):

∑_i p_i = 1

Lagrange multiplier, Karush–Kuhn–Tucker (KKT) conditions
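The multinomial MLE reduces to per-state relative frequencies; a small sketch with made-up draws:

```python
from collections import Counter

# Multinomial MLE: p_i_hat = (count of state i) / N.
# Toy sample of N = 8 draws over K = 3 states (made-up data).
draws = ["a", "b", "a", "c", "a", "b", "a", "c"]

counts = Counter(draws)
N = len(draws)
p_hat = {state: counts[state] / N for state in counts}
print(p_hat)  # {'a': 0.5, 'b': 0.25, 'c': 0.25}

# The estimates automatically satisfy the constraint sum_i p_i = 1
assert abs(sum(p_hat.values()) - 1.0) < 1e-12
```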

Page 6:

Lagrange multiplier (Wikipedia)

Solve: maximize f(x, y) subject to g(x, y) = 0 by forming the Lagrangian Λ(x, y, λ) = f(x, y) − λ g(x, y) and setting ∇Λ = 0

Page 7:

Karush–Kuhn–Tucker (KKT) conditions (Wikipedia)

KKT multipliers: L_p = f(x) + ∑_{i=1}^{m} μ_i g_i(x) + ∑_{j=1}^{l} λ_j h_j(x)
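Applying the Lagrange-multiplier method to the multinomial slide recovers the MLE stated there; a short derivation:

```latex
\max_{p}\ \mathcal{L}(p,\lambda)
  = \sum_t \sum_i x_i^t \log p_i \;-\; \lambda\Big(\sum_i p_i - 1\Big)
\\[4pt]
\frac{\partial \mathcal{L}}{\partial p_i}
  = \frac{\sum_t x_i^t}{p_i} - \lambda = 0
  \;\Rightarrow\; p_i = \frac{1}{\lambda}\sum_t x_i^t
\\[4pt]
\sum_i p_i = 1
  \;\Rightarrow\; \lambda = \sum_i \sum_t x_i^t = N
  \;\Rightarrow\; \hat{p}_i = \frac{\sum_t x_i^t}{N}
```

The constraint multiplier λ turns out to equal the sample size N, since each x^t has exactly one state set to 1.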

Page 8:

Gaussian (Normal) Distribution

p(x) = N(μ, σ²): p(x) = [1 / (√(2π) σ)] exp(−(x − μ)² / (2σ²))

MLE for μ and σ²:

m = ∑_t x^t / N

s² = ∑_t (x^t − m)² / N
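The two ML estimates above, computed on a made-up sample (note the 1/N factor in s², not 1/(N−1)):

```python
# Gaussian MLE: m = sample mean, s^2 = average squared deviation from m.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data

N = len(x)
m = sum(x) / N
s2 = sum((xt - m) ** 2 for xt in x) / N
print(m, s2)  # 5.0 4.0
```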

Page 9:

Bias and Variance

Unknown parameter θ

Estimator d_i = d(X_i) on sample X_i

Bias: b_θ(d) = E[d] − θ

Variance: E[(d − E[d])²]

Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
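A Monte Carlo sketch of the decomposition, using the (biased) 1/N variance estimator on made-up N(0, 1) samples, where the true parameter is θ = σ² = 1:

```python
import random

random.seed(0)

# Estimator d: the 1/N variance estimate from a sample of size N = 5.
# Its expectation is (N-1)/N * sigma^2, so Bias ~ -1/N = -0.2.
theta, N = 1.0, 5
ests = []
for _ in range(20000):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    m = sum(xs) / N
    ests.append(sum((x - m) ** 2 for x in xs) / N)

mean_d = sum(ests) / len(ests)
bias = mean_d - theta                                  # E[d] - theta
var = sum((d - mean_d) ** 2 for d in ests) / len(ests) # E[(d - E[d])^2]
mse = sum((d - theta) ** 2 for d in ests) / len(ests)  # E[(d - theta)^2]

# r(d, theta) = Bias^2 + Variance holds (an algebraic identity)
assert abs(mse - (bias ** 2 + var)) < 1e-6
assert bias < -0.1  # the 1/N estimator underestimates sigma^2
```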

Page 10:

Bayes’ Estimator

Treat θ as a random variable with prior p(θ)

Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)

Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ

For easier calculation, reduce it to a single point:

Maximum a Posteriori (MAP): θ_MAP = argmax_θ p(θ|X)

Maximum Likelihood (ML): prior density is flat. θ_ML = argmax_θ p(X|θ)

Bayes’: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ

The best estimate (under squared error) of a random variable is its mean!

Page 11:

Bayes’ Estimator: Example

x^t ~ N(θ, σ²) and θ ~ N(μ₀, σ₀²)

θ_ML = m = ∑_t x^t / N

θ_MAP = θ_Bayes = E[θ|X] = [ (N/σ²) / (N/σ² + 1/σ₀²) ] m + [ (1/σ₀²) / (N/σ² + 1/σ₀²) ] μ₀
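The posterior mean is a precision-weighted blend of the sample mean and the prior mean; a numeric sketch with made-up values:

```python
# x^t ~ N(theta, sigma^2), theta ~ N(mu0, sigma0^2); all numbers made up.
x = [4.0, 5.0, 6.0, 5.0, 5.0]   # sample
sigma2 = 1.0                     # known data variance
mu0, sigma0_2 = 0.0, 1.0         # prior mean and variance

N = len(x)
m = sum(x) / N                   # theta_ML = sample mean
# weight on m: (N/sigma^2) / (N/sigma^2 + 1/sigma0^2); 1 - w goes to mu0
w = (N / sigma2) / (N / sigma2 + 1 / sigma0_2)
theta_bayes = w * m + (1 - w) * mu0
print(m, round(theta_bayes, 4))  # 5.0 4.1667
```

With more data (larger N) the weight w approaches 1 and θ_Bayes approaches θ_ML, matching the "flat prior" remark on the previous slide.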

Page 12:

SVM vs. MED

• Support Vector Machine (SVM):

min_{w,b,ξ} (1/2)||w||² + C ∑_i ξ_i

s.t. y_i (wᵀx_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, ∀i

• Maximum Entropy Discrimination (MED):

Combines max-margin learning with Bayesian-style learning

min_{p(w),ξ} KL( p(w) || p₀(w) ) + C ∑_{i=1}^{N} ξ_i

s.t. y_i ∫ p(w) wᵀf(x_i) dw ≥ 1 − ξ_i, ξ_i ≥ 0, ∀i

SVM is a special case of MED: (p₀(w) ~ N(0, I))
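A minimal subgradient-descent sketch of the soft-margin SVM primal above, on made-up 1-D data (not a production solver; real uses would call a library such as LIBSVM):

```python
# min_{w,b} 1/2 ||w||^2 + C sum_i max(0, 1 - y_i (w.x_i + b))
X = [[-2.0], [-1.0], [1.0], [2.0]]   # toy, linearly separable
y = [-1, -1, 1, 1]
C, lr, w, b = 1.0, 0.01, [0.0], 0.0

for _ in range(2000):
    gw, gb = [wi for wi in w], 0.0   # gradient of the 1/2 ||w||^2 term
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        if margin < 1:               # hinge-loss subgradient is active
            gw = [gwj - C * yi * xj for gwj, xj in zip(gw, xi)]
            gb -= C * yi
    w = [wj - lr * gwj for wj, gwj in zip(w, gw)]
    b -= lr * gb

# All four points end up on the correct side of the boundary
assert all(yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0
           for xi, yi in zip(X, y))
```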

Page 13:

Margin-based Learning paradigm (ICML) <Structural SVM>

Page 14:

Parametric Classification

g_i(x) = p(x|C_i) P(C_i) or g_i(x) = log p(x|C_i) + log P(C_i)

p(x|C_i) = [1 / (√(2π) σ_i)] exp(−(x − μ_i)² / (2σ_i²))

g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)

Page 15:

Given the sample X = {x^t, r^t}_{t=1}^{N} where r_i^t = 1 if x^t ∈ C_i and r_i^t = 0 if x^t ∈ C_j, j ≠ i

ML estimates are:

P̂(C_i) = ∑_t r_i^t / N

m_i = ∑_t x^t r_i^t / ∑_t r_i^t

s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t

Discriminant becomes:

g_i(x) = −(1/2) log 2π − log s_i − (x − m_i)² / (2s_i²) + log P̂(C_i)
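Putting the slide together as code: fit the per-class ML estimates, then classify by the largest discriminant. Data are made up; the helpers `fit`, `g`, and `predict` are illustrative names, not from the source.

```python
import math

# Toy labeled sample (x^t, class) -- made-up data
data = [(1.0, 0), (1.5, 0), (2.0, 0), (5.0, 1), (5.5, 1), (6.0, 1)]

def fit(data):
    # ML estimates per class: prior P(C_i), mean m_i, variance s_i^2
    params = {}
    N = len(data)
    for c in {r for _, r in data}:
        xs = [x for x, r in data if r == c]
        P = len(xs) / N
        m = sum(xs) / len(xs)
        s2 = sum((x - m) ** 2 for x in xs) / len(xs)
        params[c] = (P, m, s2)
    return params

def g(x, P, m, s2):
    # g_i(x) = -1/2 log 2pi - log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i)
    return (-0.5 * math.log(2 * math.pi) - 0.5 * math.log(s2)
            - (x - m) ** 2 / (2 * s2) + math.log(P))

params = fit(data)

def predict(x):
    return max(params, key=lambda c: g(x, *params[c]))

print(predict(1.2), predict(5.8))  # 0 1
```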

Page 16:


Equal variances

Single boundary at halfway between means
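With equal variances and equal priors, setting g₁(x) = g₂(x) cancels everything except the squared-distance terms, leaving the midpoint as the boundary; a tiny check with made-up means:

```python
# Equal variances, equal priors: g1(x) = g2(x) reduces to
# (x - m1)^2 = (x - m2)^2, i.e., x = (m1 + m2) / 2.
m1, m2 = 2.0, 6.0   # made-up class means

boundary = (m1 + m2) / 2
assert (boundary - m1) ** 2 == (boundary - m2) ** 2
print(boundary)  # 4.0
```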

Page 17:


Variances are different

Two boundaries