Parametric Estimation

X = {xᵗ}ₜ where xᵗ ~ p(x)
Parametric estimation:
Assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
e.g., N(μ, σ²) where θ = {μ, σ²}
Sufficient statistics (Wikipedia):
a statistic which has the property of sufficiency with respect to a statistical model and its associated unknown parameter
e.g., Bernoulli dist. with pmf pˣ(1 − p)¹⁻ˣ: for a sample X₁, ..., Xₙ, T(X) = X₁ + ... + Xₙ is a sufficient statistic for p.
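As a quick demonstration (a minimal sketch of my own, not from the slides): two different Bernoulli samples with the same sum T(X) yield exactly the same likelihood function, so T carries all the information the sample has about p.

```python
import numpy as np

def bernoulli_likelihood(sample, p):
    """Likelihood of p given a 0/1 sample: prod of p^x * (1-p)^(1-x)."""
    sample = np.asarray(sample)
    return np.prod(p ** sample * (1 - p) ** (1 - sample))

# Two different samples with the same sufficient statistic T = sum = 2.
a = [1, 1, 0, 0, 0]
b = [0, 0, 1, 0, 1]

for p in [0.2, 0.4, 0.6]:
    # Identical likelihoods for every p: T(X) is sufficient for p.
    print(p, bernoulli_likelihood(a, p), bernoulli_likelihood(b, p))
```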
Maximum Likelihood Estimation

Likelihood of θ given the sample X:
l(θ|X) = p(X|θ) = ∏ₜ p(xᵗ|θ)
Log likelihood:
L(θ|X) = log l(θ|X) = ∑ₜ log p(xᵗ|θ)
Maximum likelihood estimator (MLE):
θ* = argmaxθ L(θ|X)
Examples: Bernoulli

Bernoulli: two states, failure/success, x ∈ {0, 1}
P(x) = p₀ˣ(1 − p₀)¹⁻ˣ
L(p₀|X) = log ∏ₜ p₀^(xᵗ)(1 − p₀)^(1−xᵗ)
Setting dL/dp₀ = 0 gives the MLE: p̂₀ = ∑ₜ xᵗ / N
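A small numerical check (my sketch, not part of the slides): the closed-form MLE ∑ₜ xᵗ / N coincides with the maximizer of the log likelihood found by grid search.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # toy Bernoulli sample

# Closed-form MLE: the sample mean.
p_hat = x.sum() / len(x)

# Numerical check: maximize the log likelihood over a grid of p values.
grid = np.linspace(0.01, 0.99, 999)
log_lik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
p_grid = grid[np.argmax(log_lik)]

print(p_hat, p_grid)  # both ~0.7
```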
Examples: Multinomial

Multinomial: K > 2 states, xᵢ ∈ {0, 1}
P(x₁, x₂, ..., x_K) = ∏ᵢ pᵢ^(xᵢ)
L(p₁, p₂, ..., p_K|X) = log ∏ₜ ∏ᵢ pᵢ^(xᵢᵗ)
MLE: p̂ᵢ = ∑ₜ xᵢᵗ / N
Finding the maxima of the likelihood subject to the constraint ∑ᵢ pᵢ = 1 is a constrained optimization problem:
Lagrange multipliers, Karush–Kuhn–Tucker (KKT) conditions (see the sketch after the next slide)
Lagrange Multiplier (Wikipedia)

Lagrange multiplier: to optimize f(x) subject to an equality constraint g(x) = 0,
solve ∇f(x) = λ∇g(x) together with g(x) = 0.

Karush–Kuhn–Tucker Conditions (Wikipedia)

KKT multipliers: with inequality constraints gᵢ(x) ≤ 0, i = 1, ..., m, and equality constraints hⱼ(x) = 0, j = 1, ..., l, form the Lagrangian
L = f(x) + ∑ᵢ₌₁ᵐ μᵢ gᵢ(x) + ∑ⱼ₌₁ˡ λⱼ hⱼ(x)
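To make the constrained-optimization step concrete, here is a sketch of my own using scipy.optimize (the counts are made-up toy data): maximizing the multinomial log likelihood under ∑ᵢ pᵢ = 1 numerically recovers the closed-form MLE p̂ᵢ = ∑ₜ xᵢᵗ / N.

```python
import numpy as np
from scipy.optimize import minimize

counts = np.array([30, 50, 20])  # toy data: sum_t x_i^t for K = 3 states
N = counts.sum()

def neg_log_lik(p):
    """Negative multinomial log likelihood: -sum_i counts_i * log p_i."""
    return -np.sum(counts * np.log(p))

# Maximize subject to the equality constraint sum_i p_i = 1 (and p_i > 0).
result = minimize(
    neg_log_lik,
    x0=np.full(3, 1 / 3),  # start from the uniform distribution
    method="SLSQP",
    bounds=[(1e-9, 1.0)] * 3,
    constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1}],
)

print(result.x)    # ~[0.3, 0.5, 0.2]
print(counts / N)  # closed-form MLE, the same values
```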
Gaussian (Normal) Distribution

p(x) = N(μ, σ²):
p(x) = (1 / (√(2π) σ)) exp[−(x − μ)² / (2σ²)]
MLE for μ and σ²:
m = ∑ₜ xᵗ / N
s² = ∑ₜ (xᵗ − m)² / N
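A minimal sketch (mine, not from the slides) of these two estimators; note that np.var divides by N by default, which matches the MLE.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # sample from N(2, 1.5^2)

m = x.sum() / len(x)                # MLE of the mean
s2 = ((x - m) ** 2).sum() / len(x)  # MLE of the variance (divide by N)

print(m, s2)              # close to 2.0 and 2.25
print(x.mean(), x.var())  # np.var (ddof=0) is the same 1/N estimator
```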
Bias and Variance

Unknown parameter θ; estimator dᵢ = d(Xᵢ) on sample Xᵢ
Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
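As a concrete instance (my sketch): the 1/N variance estimator above is biased, E[s²] = ((N − 1)/N) σ², which a small simulation makes visible.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2 = 5, 4.0  # small samples make the bias easy to see

# Draw many samples of size N and compute the 1/N variance estimate on each.
samples = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, N))
s2 = samples.var(axis=1, ddof=0)  # MLE (biased) estimator

print(s2.mean())                           # ~3.2 = (N-1)/N * sigma2, not 4.0
print(samples.var(axis=1, ddof=1).mean())  # ~4.0: the unbiased estimator
```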
Bayes’ Estimator

Treat θ as a random variable with prior p(θ)
Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
For easier calculation, reduce it to a single point estimate:
Maximum a Posteriori (MAP): θ_MAP = argmaxθ p(θ|X)
Maximum Likelihood (ML): prior density is flat, so θ_ML = argmaxθ p(X|θ)
Bayes’: θ_Bayes’ = E[θ|X] = ∫ θ p(θ|X) dθ
The best estimate of a random variable is its mean!
Bayes’ Estimator: Example

xᵗ ~ N(θ, σ²) and θ ~ N(μ₀, σ₀²)
θ_ML = m
θ_MAP = θ_Bayes’ = E[θ|X] = (N/σ²) / (N/σ² + 1/σ₀²) · m + (1/σ₀²) / (N/σ² + 1/σ₀²) · μ₀
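A small sketch (my example) of this posterior mean: as N grows, the weight shifts from the prior mean μ₀ toward the sample mean m.

```python
import numpy as np

def theta_bayes(x, sigma2, mu0, sigma0_2):
    """Posterior mean of theta for x^t ~ N(theta, sigma2), theta ~ N(mu0, sigma0_2)."""
    N, m = len(x), np.mean(x)
    w = (N / sigma2) / (N / sigma2 + 1 / sigma0_2)  # weight on the sample mean
    return w * m + (1 - w) * mu0

rng = np.random.default_rng(2)
for N in (1, 10, 1000):
    x = rng.normal(5.0, 2.0, size=N)  # true theta = 5, sigma = 2
    # Prior N(0, 1): the estimate moves from near mu0 = 0 toward m as N grows.
    print(N, theta_bayes(x, sigma2=4.0, mu0=0.0, sigma0_2=1.0))
```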
SVM vs. MED

• Support Vector Machine (SVM)
min_{w,b,ξ} ½‖w‖² + C ∑ᵢ₌₁ⁿ ξᵢ
s.t. yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0, ∀i
• Maximum Entropy Discrimination (MED)
Combines max-margin learning with Bayesian-style learning:
min_{p(w),ξ} KL(p(w) ‖ p₀(w)) + C ∑ᵢ₌₁ᴺ ξᵢ
s.t. yᵢ ∫ p(w) wᵀf(xᵢ) dw ≥ 1 − ξᵢ, ξᵢ ≥ 0, ∀i
SVM is a special case of MED (with prior p₀(w) ~ N(0, I)).
[Figure: two-class margin illustration with +1 / −1 regions and the posterior p(w)]
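For the SVM side, a minimal runnable sketch (my example using scikit-learn, not part of the slides): a linear soft-margin SVM corresponds to the primal above, with C as the slack penalty.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Two Gaussian blobs labeled +1 / -1, as in the margin figure.
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Linear soft-margin SVM; C is the slack penalty from the primal objective.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.coef_, clf.intercept_)  # w and b of the separating hyperplane
print(clf.predict([[0.5, 0.5]]))  # class of a new point
```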
Margin-based learning paradigm (ICML): <Structural SVM>
Parametric Classification

Discriminant:
gᵢ(x) = p(x|Cᵢ) P(Cᵢ)
or
gᵢ(x) = log p(x|Cᵢ) + log P(Cᵢ)
With Gaussian class densities
p(x|Cᵢ) = (1 / (√(2π) σᵢ)) exp[−(x − μᵢ)² / (2σᵢ²)]
the discriminant becomes
gᵢ(x) = −½ log 2π − log σᵢ − (x − μᵢ)² / (2σᵢ²) + log P(Cᵢ)
Given the sample X = {xᵗ, rᵗ}ₜ₌₁ᴺ, where rᵢᵗ = 1 if xᵗ ∈ Cᵢ and rᵢᵗ = 0 if xᵗ ∈ Cⱼ, j ≠ i,
the ML estimates are
P̂(Cᵢ) = ∑ₜ rᵢᵗ / N
mᵢ = ∑ₜ xᵗ rᵢᵗ / ∑ₜ rᵢᵗ
sᵢ² = ∑ₜ (xᵗ − mᵢ)² rᵢᵗ / ∑ₜ rᵢᵗ
and the discriminant becomes
gᵢ(x) = −½ log 2π − log sᵢ − (x − mᵢ)² / (2sᵢ²) + log P̂(Cᵢ)
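These estimates translate directly into a per-class Gaussian classifier; here is a compact sketch of my own, assuming 1-D inputs and one-hot labels r.

```python
import numpy as np

def fit(x, r):
    """ML estimates per class from sample x and one-hot labels r (N x K)."""
    prior = r.mean(axis=0)  # P_hat(C_i) = sum_t r_i^t / N
    m = (x[:, None] * r).sum(axis=0) / r.sum(axis=0)
    s2 = (((x[:, None] - m) ** 2) * r).sum(axis=0) / r.sum(axis=0)
    return prior, m, s2

def g(x, prior, m, s2):
    """Discriminant g_i(x) per class (constant -1/2 log 2pi dropped)."""
    return -0.5 * np.log(s2) - (x - m) ** 2 / (2 * s2) + np.log(prior)

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-1, 1, 60), rng.normal(3, 2, 40)])
r = np.zeros((100, 2))
r[:60, 0] = 1.0
r[60:, 1] = 1.0

prior, m, s2 = fit(x, r)
print(np.argmax(g(0.5, prior, m, s2)))  # predicted class for x = 0.5
```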
Equal variances: a single boundary halfway between the means.
Variances are different: two boundaries.
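A short sketch (mine, reusing the discriminant above) that makes both cases visible by locating the points where the winning class flips, i.e., where g₁(x) = g₂(x):

```python
import numpy as np

def g(x, prior, m, s2):
    """Gaussian discriminant per class (same form as above)."""
    return -0.5 * np.log(s2) - (x - m) ** 2 / (2 * s2) + np.log(prior)

def boundaries(m, s2, prior=(0.5, 0.5)):
    """Grid points where the winning class changes, i.e., g1(x) = g2(x)."""
    xs = np.linspace(-10, 10, 20001)
    scores = g(xs[:, None], np.array(prior), np.array(m), np.array(s2))
    winner = np.argmax(scores, axis=1)
    return xs[1:][np.diff(winner) != 0]

print(boundaries(m=(-2, 2), s2=(1, 1)))  # equal variances: one boundary at 0
print(boundaries(m=(-2, 2), s2=(1, 9)))  # different variances: two boundaries
```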