Parametric Estimation

X = {xᵗ}ₜ where xᵗ ~ p(x)
Parametric estimation:
Assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
e.g., N(μ, σ²) where θ = {μ, σ²}
Sufficient statistics (Wikipedia):
a statistic which has the property of sufficiency with respect to a statistical model and its associated unknown parameter
e.g., Bernoulli dist. with pmf pˣ(1 − p)¹⁻ˣ: for a sample X₁, ..., Xₙ, T(X) = X₁ + ... + Xₙ is a sufficient statistic for p.
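As a quick demonstration (a minimal sketch of my own, not from the slides): two different Bernoulli samples with the same sum T(X) yield exactly the same likelihood function, so T carries all the information the sample has about p.

```python
import numpy as np

def bernoulli_likelihood(sample, p):
    """Likelihood of p given a 0/1 sample: prod of p^x * (1-p)^(1-x)."""
    sample = np.asarray(sample)
    return np.prod(p ** sample * (1 - p) ** (1 - sample))

# Two different samples with the same sufficient statistic T = sum = 2.
a = [1, 1, 0, 0, 0]
b = [0, 0, 1, 0, 1]

for p in [0.2, 0.4, 0.6]:
    # Identical likelihoods for every p: T(X) is sufficient for p.
    print(p, bernoulli_likelihood(a, p), bernoulli_likelihood(b, p))
```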
Maximum Likelihood Estimation

Likelihood of θ given the sample X:
l(θ|X) = p(X|θ) = ∏ₜ p(xᵗ|θ)
Log likelihood:
L(θ|X) = log l(θ|X) = ∑ₜ log p(xᵗ|θ)
Maximum likelihood estimator (MLE):
θ* = argmaxθ L(θ|X)
Examples: Bernoulli

Bernoulli: two states, failure/success, x ∈ {0, 1}
P(x) = p₀ˣ(1 − p₀)¹⁻ˣ
L(p₀|X) = log ∏ₜ p₀^(xᵗ)(1 − p₀)^(1−xᵗ)
Setting dL/dp₀ = 0 gives the MLE: p̂₀ = ∑ₜ xᵗ / N
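A small numerical check (my sketch, not part of the slides): the closed-form MLE ∑ₜ xᵗ / N coincides with the maximizer of the log likelihood found by grid search.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # toy Bernoulli sample

# Closed-form MLE: the sample mean.
p_hat = x.sum() / len(x)

# Numerical check: maximize the log likelihood over a grid of p values.
grid = np.linspace(0.01, 0.99, 999)
log_lik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
p_grid = grid[np.argmax(log_lik)]

print(p_hat, p_grid)  # both ~0.7
```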
Examples: Multinomial

Multinomial: K > 2 states, xᵢ ∈ {0, 1}
P(x₁, x₂, ..., x_K) = ∏ᵢ pᵢ^(xᵢ)
L(p₁, p₂, ..., p_K|X) = log ∏ₜ ∏ᵢ pᵢ^(xᵢᵗ)
MLE: p̂ᵢ = ∑ₜ xᵢᵗ / N
Finding the maxima of the likelihood subject to the constraint ∑ᵢ pᵢ = 1 is a constrained optimization problem:
Lagrange multipliers, Karush–Kuhn–Tucker (KKT) conditions (see the sketch after the next slide)
Lagrange Multiplier (Wikipedia)

Lagrange multiplier: to optimize f(x) subject to an equality constraint g(x) = 0,
solve ∇f(x) = λ∇g(x) together with g(x) = 0.

Karush–Kuhn–Tucker Conditions (Wikipedia)

KKT multipliers: with inequality constraints gᵢ(x) ≤ 0, i = 1, ..., m, and equality constraints hⱼ(x) = 0, j = 1, ..., l, form the Lagrangian
L = f(x) + ∑ᵢ₌₁ᵐ μᵢ gᵢ(x) + ∑ⱼ₌₁ˡ λⱼ hⱼ(x)
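To make the constrained-optimization step concrete, here is a sketch of my own using scipy.optimize (the counts are made-up toy data): maximizing the multinomial log likelihood under ∑ᵢ pᵢ = 1 numerically recovers the closed-form MLE p̂ᵢ = ∑ₜ xᵢᵗ / N.

```python
import numpy as np
from scipy.optimize import minimize

counts = np.array([30, 50, 20])  # toy data: sum_t x_i^t for K = 3 states
N = counts.sum()

def neg_log_lik(p):
    """Negative multinomial log likelihood: -sum_i counts_i * log p_i."""
    return -np.sum(counts * np.log(p))

# Maximize subject to the equality constraint sum_i p_i = 1 (and p_i > 0).
result = minimize(
    neg_log_lik,
    x0=np.full(3, 1 / 3),  # start from the uniform distribution
    method="SLSQP",
    bounds=[(1e-9, 1.0)] * 3,
    constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1}],
)

print(result.x)    # ~[0.3, 0.5, 0.2]
print(counts / N)  # closed-form MLE, the same values
```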
Gaussian (Normal) Distribution

p(x) = N(μ, σ²):
p(x) = (1 / (√(2π) σ)) exp[−(x − μ)² / (2σ²)]
MLE for μ and σ²:
m = ∑ₜ xᵗ / N
s² = ∑ₜ (xᵗ − m)² / N
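A minimal sketch (mine, not from the slides) of these two estimators; note that np.var divides by N by default, which matches the MLE.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # sample from N(2, 1.5^2)

m = x.sum() / len(x)                # MLE of the mean
s2 = ((x - m) ** 2).sum() / len(x)  # MLE of the variance (divide by N)

print(m, s2)              # close to 2.0 and 2.25
print(x.mean(), x.var())  # np.var (ddof=0) is the same 1/N estimator
```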
Bias and Variance

Unknown parameter θ; estimator dᵢ = d(Xᵢ) on sample Xᵢ
Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
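As a concrete instance (my sketch): the 1/N variance estimator above is biased, E[s²] = ((N − 1)/N) σ², which a small simulation makes visible.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2 = 5, 4.0  # small samples make the bias easy to see

# Draw many samples of size N and compute the 1/N variance estimate on each.
samples = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, N))
s2 = samples.var(axis=1, ddof=0)  # MLE (biased) estimator

print(s2.mean())                           # ~3.2 = (N-1)/N * sigma2, not 4.0
print(samples.var(axis=1, ddof=1).mean())  # ~4.0: the unbiased estimator
```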
Bayes’ Estimator

Treat θ as a random variable with prior p(θ)
Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
For easier calculation, reduce it to a single point estimate:
Maximum a Posteriori (MAP): θ_MAP = argmaxθ p(θ|X)
Maximum Likelihood (ML): prior density is flat, so θ_ML = argmaxθ p(X|θ)
Bayes’: θ_Bayes’ = E[θ|X] = ∫ θ p(θ|X) dθ
The best estimate of a random variable is its mean!
Bayes’ Estimator: Example

xᵗ ~ N(θ, σ²) and θ ~ N(μ₀, σ₀²)
θ_ML = m
θ_MAP = θ_Bayes’ = E[θ|X] = (N/σ²) / (N/σ² + 1/σ₀²) · m + (1/σ₀²) / (N/σ² + 1/σ₀²) · μ₀
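A small sketch (my example) of this posterior mean: as N grows, the weight shifts from the prior mean μ₀ toward the sample mean m.

```python
import numpy as np

def theta_bayes(x, sigma2, mu0, sigma0_2):
    """Posterior mean of theta for x^t ~ N(theta, sigma2), theta ~ N(mu0, sigma0_2)."""
    N, m = len(x), np.mean(x)
    w = (N / sigma2) / (N / sigma2 + 1 / sigma0_2)  # weight on the sample mean
    return w * m + (1 - w) * mu0

rng = np.random.default_rng(2)
for N in (1, 10, 1000):
    x = rng.normal(5.0, 2.0, size=N)  # true theta = 5, sigma = 2
    # Prior N(0, 1): the estimate moves from near mu0 = 0 toward m as N grows.
    print(N, theta_bayes(x, sigma2=4.0, mu0=0.0, sigma0_2=1.0))
```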
SVM vs. MED

• Support Vector Machine (SVM)
min_{w,b,ξ} ½‖w‖² + C ∑ᵢ₌₁ⁿ ξᵢ
s.t. yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, ∀i
• Maximum Entropy Discrimination (MED)
Combines max-margin learning with Bayesian-style learning:
min_{p(w),ξ} KL(p(w) ‖ p₀(w)) + C ∑ᵢ₌₁ᴺ ξᵢ
s.t. yᵢ ∫ p(w) wᵀf(xᵢ) dw ≥ 1 − ξᵢ, ξᵢ ≥ 0, ∀i
SVM is a special case of MED (with prior p₀(w) ~ N(0, I)).
[Figure: two-class margin illustration with +1 / −1 regions and the posterior p(w)]
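For the SVM side, a minimal runnable sketch (my example using scikit-learn, not part of the slides): a linear soft-margin SVM corresponds to the primal above, with C as the slack penalty.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Two Gaussian blobs labeled +1 / -1, as in the margin figure.
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Linear soft-margin SVM; C is the slack penalty from the primal objective.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.coef_, clf.intercept_)  # w and b of the separating hyperplane
print(clf.predict([[0.5, 0.5]]))  # class of a new point
```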
Margin-based learning paradigm (ICML): <Structural SVM>
Parametric Classification

Discriminant:
gᵢ(x) = p(x|Cᵢ) P(Cᵢ)
or
gᵢ(x) = log p(x|Cᵢ) + log P(Cᵢ)
With Gaussian class densities
p(x|Cᵢ) = (1 / (√(2π) σᵢ)) exp[−(x − μᵢ)² / (2σᵢ²)]
the discriminant becomes
gᵢ(x) = −½ log 2π − log σᵢ − (x − μᵢ)² / (2σᵢ²) + log P(Cᵢ)
Given the sample X = {xᵗ, rᵗ}ₜ₌₁ᴺ, where rᵢᵗ = 1 if xᵗ ∈ Cᵢ and rᵢᵗ = 0 if xᵗ ∈ Cⱼ, j ≠ i,
the ML estimates are
P̂(Cᵢ) = ∑ₜ rᵢᵗ / N
mᵢ = ∑ₜ xᵗ rᵢᵗ / ∑ₜ rᵢᵗ
sᵢ² = ∑ₜ (xᵗ − mᵢ)² rᵢᵗ / ∑ₜ rᵢᵗ
and the discriminant becomes
gᵢ(x) = −½ log 2π − log sᵢ − (x − mᵢ)² / (2sᵢ²) + log P̂(Cᵢ)
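These estimates translate directly into a per-class Gaussian classifier; here is a compact sketch of my own, assuming 1-D inputs and one-hot labels r.

```python
import numpy as np

def fit(x, r):
    """ML estimates per class from sample x and one-hot labels r (N x K)."""
    prior = r.mean(axis=0)  # P_hat(C_i) = sum_t r_i^t / N
    m = (x[:, None] * r).sum(axis=0) / r.sum(axis=0)
    s2 = (((x[:, None] - m) ** 2) * r).sum(axis=0) / r.sum(axis=0)
    return prior, m, s2

def g(x, prior, m, s2):
    """Discriminant g_i(x) per class (constant -1/2 log 2pi dropped)."""
    return -0.5 * np.log(s2) - (x - m) ** 2 / (2 * s2) + np.log(prior)

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-1, 1, 60), rng.normal(3, 2, 40)])
r = np.zeros((100, 2))
r[:60, 0] = 1.0
r[60:, 1] = 1.0

prior, m, s2 = fit(x, r)
print(np.argmax(g(0.5, prior, m, s2)))  # predicted class for x = 0.5
```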
Equal variances: a single boundary halfway between the means.
Variances are different: two boundaries.
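A short sketch (mine, reusing the discriminant above) that makes both cases visible by locating the points where the winning class flips, i.e., where g₁(x) = g₂(x):

```python
import numpy as np

def g(x, prior, m, s2):
    """Gaussian discriminant per class (same form as above)."""
    return -0.5 * np.log(s2) - (x - m) ** 2 / (2 * s2) + np.log(prior)

def boundaries(m, s2, prior=(0.5, 0.5)):
    """Grid points where the winning class changes, i.e., g1(x) = g2(x)."""
    xs = np.linspace(-10, 10, 20001)
    scores = g(xs[:, None], np.array(prior), np.array(m), np.array(s2))
    winner = np.argmax(scores, axis=1)
    return xs[1:][np.diff(winner) != 0]

print(boundaries(m=(-2, 2), s2=(1, 1)))  # equal variances: one boundary at 0
print(boundaries(m=(-2, 2), s2=(1, 9)))  # different variances: two boundaries
```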