Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Parametric Estimation

X = { x^t }_t where x^t ~ p(x)

Parametric estimation:
Assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
e.g., N(μ, σ²) where θ = { μ, σ² }
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
Maximum Likelihood Estimation

Likelihood of θ given the sample X:
l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)

Log likelihood:
L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)

Maximum likelihood estimator (MLE):
θ* = argmax_θ L(θ|X)
Examples: Bernoulli/Multinomial

Bernoulli: Two states, failure/success, x in {0,1}
P(x) = p_o^x (1 – p_o)^(1–x)
L(p_o|X) = log ∏_t p_o^{x^t} (1 – p_o)^{1–x^t}
MLE: p̂_o = ∑_t x^t / N

Multinomial: K > 2 states, x_i in {0,1}
P(x_1, x_2, ..., x_K) = ∏_i p_i^{x_i}
L(p_1, p_2, ..., p_K|X) = log ∏_t ∏_i p_i^{x_i^t}
MLE: p̂_i = ∑_t x_i^t / N
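A minimal numeric check of these two MLEs; the true parameters and sample sizes below are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: the MLE of p_o is just the fraction of successes, sum_t x^t / N.
p0_true = 0.7
x = rng.binomial(1, p0_true, size=10_000)
p0_hat = x.mean()

# Multinomial: the MLE of p_i is the fraction of samples in state i,
# sum_t x_i^t / N, i.e. normalized counts.
p_true = np.array([0.2, 0.5, 0.3])
counts = rng.multinomial(10_000, p_true)
p_hat = counts / counts.sum()
```

With enough samples both estimates recover the generating probabilities.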
Gaussian (Normal) Distribution

p(x) = N(μ, σ²):
p(x) = [1/(√(2π) σ)] exp[– (x – μ)² / (2σ²)]

MLE for μ and σ²:
m = ∑_t x^t / N
s² = ∑_t (x^t – m)² / N
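A quick sketch of the Gaussian MLE formulas above; the true μ = 2 and σ = 3 are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

m = x.sum() / len(x)                 # MLE mean:     m   = sum_t x^t / N
s2 = ((x - m) ** 2).sum() / len(x)   # MLE variance: s^2 = sum_t (x^t - m)^2 / N
                                     # note: divides by N, not N - 1
```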
Bias and Variance

Unknown parameter θ
Estimator d_i = d(X_i) on sample X_i

Bias: b_θ(d) = E[d] – θ
Variance: E[(d – E[d])²]

Mean square error:
r(d, θ) = E[(d – θ)²]
        = (E[d] – θ)² + E[(d – E[d])²]
        = Bias² + Variance
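As a concrete instance of bias, the MLE variance s² above is a biased estimator: E[s²] = σ²(N – 1)/N. A Monte Carlo sketch (the sample size N = 5, the number of repetitions M, and σ² = 4 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 200_000                    # small sample size N, M repeated samples X_i
sigma2 = 4.0
X = rng.normal(0.0, np.sqrt(sigma2), size=(M, N))

m = X.mean(axis=1, keepdims=True)
s2 = ((X - m) ** 2).mean(axis=1)     # MLE variance estimate d(X_i) on each sample

# Bias b(d) = E[d] - sigma2; for the MLE variance this is about -sigma2/N = -0.8
bias = s2.mean() - sigma2
```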
Bayes’ Estimator

Treat θ as a random variable with prior p(θ)
Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full Bayes: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
Maximum a Posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
Maximum Likelihood (ML): θ_ML = argmax_θ p(X|θ)
Bayes’ estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
Bayes’ Estimator: Example

x^t ~ N(θ, σ_o²) and prior θ ~ N(μ, σ²)

θ_ML = m = ∑_t x^t / N

θ_MAP = θ_Bayes = E[θ|X]
      = [ (N/σ_o²) / (N/σ_o² + 1/σ²) ] m + [ (1/σ²) / (N/σ_o² + 1/σ²) ] μ
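A sketch of the posterior mean above; the prior N(0, 1), noise variance σ_o² = 4, true θ = 1.5, and N = 50 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

mu, sigma2 = 0.0, 1.0        # prior:      theta ~ N(mu, sigma^2)
sigma0_2 = 4.0               # likelihood: x^t ~ N(theta, sigma0^2)
theta_true = 1.5
N = 50
x = rng.normal(theta_true, np.sqrt(sigma0_2), size=N)

m = x.mean()                                        # theta_ML
w = (N / sigma0_2) / (N / sigma0_2 + 1 / sigma2)    # precision-based weight
theta_bayes = w * m + (1 - w) * mu                  # posterior mean E[theta|X]
```

The posterior mean is a precision-weighted average of the sample mean and the prior mean, so it shrinks the ML estimate toward μ.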
Parametric Classification

g_i(x) = p(x|C_i) P(C_i)
or
g_i(x) = log p(x|C_i) + log P(C_i)

If the class densities are Gaussian, p(x|C_i) = N(μ_i, σ_i²):
p(x|C_i) = [1/(√(2π) σ_i)] exp[– (x – μ_i)² / (2σ_i²)]

g_i(x) = – (1/2) log 2π – log σ_i – (x – μ_i)² / (2σ_i²) + log P(C_i)
Given the sample X = {x^t, r^t}_{t=1}^N, where

r_i^t = 1 if x^t ∈ C_i
r_i^t = 0 if x^t ∈ C_j, j ≠ i

the ML estimates are:
P̂(C_i) = ∑_t r_i^t / N
m_i = ∑_t x^t r_i^t / ∑_t r_i^t
s_i² = ∑_t (x^t – m_i)² r_i^t / ∑_t r_i^t

The discriminant becomes:
g_i(x) = – (1/2) log 2π – log s_i – (x – m_i)² / (2 s_i²) + log P̂(C_i)
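The estimates and discriminant above can be sketched end to end; the two synthetic Gaussian classes at –2 and +2 are illustrative choices.

```python
import numpy as np

def fit_gaussian_classes(x, r):
    """ML estimates from 1-D data x (N,) and one-hot labels r (N, K)."""
    priors = r.mean(axis=0)                                   # P(C_i) = sum_t r_i^t / N
    means = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)      # m_i
    vars_ = (r * (x[:, None] - means) ** 2).sum(axis=0) / r.sum(axis=0)  # s_i^2
    return priors, means, vars_

def discriminant(x, priors, means, vars_):
    """g_i(x) = -log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i), constant dropped."""
    return (-0.5 * np.log(vars_)
            - (x[:, None] - means) ** 2 / (2 * vars_)
            + np.log(priors))

rng = np.random.default_rng(4)
n = 500
x = np.concatenate([rng.normal(-2, 1, n), rng.normal(2, 1, n)])
y = np.repeat([0, 1], n)
r = np.eye(2)[y]                                 # one-hot labels r_i^t

priors, means, vars_ = fit_gaussian_classes(x, r)
pred = discriminant(x, priors, means, vars_).argmax(axis=1)
accuracy = (pred == y).mean()
```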
Equal variances: a single boundary halfway between the means.
Variances are different: two boundaries.
Regression

r = f(x) + ε, where ε ~ N(0, σ²)

Estimator: g(x|θ)
p(r|x) ~ N(g(x|θ), σ²)

L(θ|X) = log ∏_{t=1}^N p(x^t, r^t)
       = log ∏_{t=1}^N p(r^t|x^t) + log ∏_{t=1}^N p(x^t)
Regression: From LogL to Error

L(θ|X) = log ∏_{t=1}^N [1/(√(2π) σ)] exp[– (r^t – g(x^t|θ))² / (2σ²)]
       = – N log(√(2π) σ) – [1/(2σ²)] ∑_{t=1}^N [r^t – g(x^t|θ)]²

Maximizing L is therefore equivalent to minimizing the error:
E(θ|X) = (1/2) ∑_{t=1}^N [r^t – g(x^t|θ)]²
Linear Regression

g(x^t|w_1, w_0) = w_1 x^t + w_0

Setting the derivatives of the error to zero gives the normal equations:
∑_t r^t = N w_0 + w_1 ∑_t x^t
∑_t r^t x^t = w_0 ∑_t x^t + w_1 ∑_t (x^t)²

In matrix form, A w = y with

A = | N        ∑_t x^t    |     w = | w_0 |     y = | ∑_t r^t     |
    | ∑_t x^t  ∑_t (x^t)² |         | w_1 |         | ∑_t r^t x^t |

and the solution is w = A⁻¹ y.
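A sketch of solving the normal equations directly; the true line w_0 = 0.5, w_1 = 2.0 and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200
x = rng.uniform(0, 5, N)
r = 0.5 + 2.0 * x + rng.normal(0, 0.3, N)   # true w0 = 0.5, w1 = 2.0, plus noise

# Build A and y as on the slide, then solve A w = y
A = np.array([[N,        x.sum()],
              [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (r * x).sum()])
w0, w1 = np.linalg.solve(A, y)
```

Using `np.linalg.solve` rather than forming A⁻¹ explicitly is the numerically preferred way to compute A⁻¹y.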
Polynomial Regression

g(x^t|w_k, ..., w_2, w_1, w_0) = w_k (x^t)^k + ··· + w_2 (x^t)² + w_1 x^t + w_0

With the design matrix D and response vector r,

D = | 1  x^1  (x^1)²  ···  (x^1)^k |     r = | r^1 |
    | 1  x^2  (x^2)²  ···  (x^2)^k |         | r^2 |
    | ⋮                            |         | ⋮   |
    | 1  x^N  (x^N)²  ···  (x^N)^k |         | r^N |

the ML solution is w = (Dᵀ D)⁻¹ Dᵀ r.
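A sketch of the closed-form solution; the quadratic ground truth [1, –2, 3] and noise level are illustrative choices, and `np.vander` builds exactly the D above.

```python
import numpy as np

rng = np.random.default_rng(6)
N, k = 200, 2
x = rng.uniform(-1, 1, N)
r = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0, 0.1, N)

# Design matrix D: row t is [1, x^t, (x^t)^2, ..., (x^t)^k]
D = np.vander(x, k + 1, increasing=True)

# w = (D^T D)^{-1} D^T r, computed via a linear solve
w = np.linalg.solve(D.T @ D, D.T @ r)
```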
Other Error Measures

Square error:
E(θ|X) = (1/2) ∑_{t=1}^N [r^t – g(x^t|θ)]²

Relative square error:
E(θ|X) = ∑_{t=1}^N [r^t – g(x^t|θ)]² / ∑_{t=1}^N (r^t – r̄)²

Absolute error:
E(θ|X) = ∑_t |r^t – g(x^t|θ)|

ε-sensitive error:
E(θ|X) = ∑_t 1(|r^t – g(x^t|θ)| > ε) (|r^t – g(x^t|θ)| – ε)
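The four measures above, on a tiny hand-made example (the r, g, and ε values are illustrative):

```python
import numpy as np

def error_measures(r, g, eps=0.3):
    resid = r - g
    square = 0.5 * (resid ** 2).sum()                      # (1/2) sum of squared residuals
    relative = (resid ** 2).sum() / ((r - r.mean()) ** 2).sum()
    absolute = np.abs(resid).sum()
    # epsilon-sensitive: residuals smaller than eps cost nothing
    eps_sensitive = np.where(np.abs(resid) > eps,
                             np.abs(resid) - eps, 0.0).sum()
    return square, relative, absolute, eps_sensitive

r = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.1, 1.8, 3.0, 4.4])
sq, rel, ab, es = error_measures(r, g)
```

Only the last residual (0.4) exceeds ε = 0.3, so the ε-sensitive error counts just its excess, 0.1.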
Bias and Variance

E[(r – g(x))² | x] = E[(r – E[r|x])² | x] + (E[r|x] – g(x))²
                          noise                squared error

E_X[(E[r|x] – g(x))² | x] = (E[r|x] – E_X[g(x)])² + E_X[(g(x) – E_X[g(x)])²]
                                  bias²                   variance
Estimating Bias and Variance

M samples X_i = {x^t_i, r^t_i}, i = 1, ..., M
are used to fit g_i(x), i = 1, ..., M

ḡ(x) = (1/M) ∑_i g_i(x)

Bias² = (1/N) ∑_t [ḡ(x^t) – f(x^t)]²
Variance = (1/(N M)) ∑_t ∑_i [g_i(x^t) – ḡ(x^t)]²
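A sketch of this estimation procedure, using the constant model g_i(x) = ∑_t r^t_i / N discussed on the next slide; the target f(x) = sin(2x), noise level, and M, N are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
f = lambda x: np.sin(2 * x)          # assumed underlying function f
xs = np.linspace(0, np.pi, 20)       # fixed evaluation points x^t
M, N = 100, 20                       # M samples, N points each

# Fit M models; each g_i(x) is the constant equal to its sample's mean response
fits = np.empty((M, len(xs)))
for i in range(M):
    r = f(xs) + rng.normal(0, 0.5, N)
    fits[i] = r.mean()               # g_i(x) constant in x

g_bar = fits.mean(axis=0)                        # average fit at each x^t
bias2 = ((g_bar - f(xs)) ** 2).mean()            # (1/N) sum_t (g_bar - f)^2
variance = ((fits - g_bar) ** 2).mean()          # (1/(N M)) sum_t sum_i (...)^2
```

A constant model cannot track the sinusoid, so its bias² dominates its variance here.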
Bias/Variance Dilemma

Example: g_i(x) = 2 has no variance and high bias
g_i(x) = ∑_t r^t_i / N has lower bias, with variance

As we increase complexity,
bias decreases (a better fit to data) and
variance increases (fit varies more with data)

Bias/Variance dilemma (Geman et al., 1992)
[Figure: the target f, the individual fits g_i, and their average ḡ, with bias and variance indicated.]
Polynomial Regression

Best fit: “min error”
Best fit: “elbow”
Model Selection

Cross-validation: Measure generalization accuracy by testing on data unused during training

Regularization: Penalize complex models
E′ = error on data + λ · model complexity

Akaike’s information criterion (AIC), Bayesian information criterion (BIC)

Minimum description length (MDL): Kolmogorov complexity, shortest description of data

Structural risk minimization (SRM)
Bayesian Model Selection

p(model|data) = p(data|model) p(model) / p(data)

Prior on models, p(model)
Regularization, when the prior favors simpler models
Bayes: MAP of the posterior, p(model|data)
Average over a number of models with high posterior (voting, ensembles: Chapter 17)
Regression example
Coefficients increase in magnitude as order increases:
1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]
Regularization:
E′(w|X) = (1/2) ∑_{t=1}^N [r^t – g(x^t|w)]² + λ ∑_i w_i²
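A sketch of this penalized fit in closed form: adding λ ∑_i w_i² to the error leads to w = (Dᵀ D + λ′ I)⁻¹ Dᵀ r (the factor of 2 from differentiating is absorbed into λ′ here). The data, polynomial order, and λ′ = 1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
N, k = 50, 7                         # small sample, deliberately high polynomial order
x = rng.uniform(-1, 1, N)
r = 1.0 + 2.0 * x + rng.normal(0, 0.2, N)   # underlying function is just a line

D = np.vander(x, k + 1, increasing=True)
lam = 1.0

# Penalized solution: w = (D^T D + lam I)^{-1} D^T r
w_reg = np.linalg.solve(D.T @ D + lam * np.eye(k + 1), D.T @ r)
w_ml = np.linalg.solve(D.T @ D, D.T @ r)    # unpenalized MLE for comparison
```

The penalty shrinks the coefficient vector toward zero, which is exactly the growth in magnitude the example above is illustrating.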