Introduction to Machine Learning - Kangwon (leeck/AI2/04-1)


  • Parametric Estimation

    X = { x^t }_t where x^t ~ p(x)

    Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X

    e.g., N(μ, σ²) where θ = {μ, σ²}

    Sufficient statistic (Wikipedia): a statistic which has the property of sufficiency with respect to a statistical model and its associated unknown parameter

    e.g., Bernoulli dist.: X1, ..., Xn → T(X) = X1 + ... + Xn is a sufficient statistic for p: P(x) = p^x (1 − p)^(1−x)

  • Maximum Likelihood Estimation

    Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)

    Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)

    Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)

  • Examples: Bernoulli

    Bernoulli: two states, failure/success, x in {0,1}

    P(x) = p_o^x (1 − p_o)^(1−x)

    L(p_o|X) = log ∏_t p_o^(x^t) (1 − p_o)^(1−x^t)

    MLE (set dL/dp_o = 0): p_o = ∑_t x^t / N
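A minimal NumPy check of the Bernoulli MLE above; the toy sample and the grid-search cross-check are illustrative assumptions, not from the slides.

```python
import numpy as np

# Toy Bernoulli sample (illustrative only)
x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

# Closed-form MLE from the slide: p_o = sum_t x^t / N
p_mle = x.mean()

# Cross-check by maximizing L(p|X) = sum_t [x^t log p + (1 - x^t) log(1 - p)] over a grid
grid = np.linspace(0.01, 0.99, 99)
loglik = np.array([np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) for p in grid])
p_grid = grid[np.argmax(loglik)]

print(p_mle, p_grid)   # both close to 0.7 for this sample
```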

  • Examples: Multinomial

    Multinomial: K > 2 states, x_i in {0,1}

    P(x_1, x_2, ..., x_K) = ∏_i p_i^(x_i)

    L(p_1, p_2, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)

    MLE: p_i = ∑_t x_i^t / N

    Finding the local maxima and minima of a function subject to constraints (constrained optimization), here ∑_i p_i = 1:

    Lagrange multiplier, Karush–Kuhn–Tucker (KKT) conditions
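A minimal NumPy sketch of the multinomial MLE above; the toy one-hot sample is an illustrative assumption.

```python
import numpy as np

# Toy sample of N one-hot vectors over K = 3 states (illustrative only)
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])
N = X.shape[0]

# MLE from the slide: p_i = sum_t x_i^t / N, i.e. the relative frequency of each state.
# The constraint sum_i p_i = 1 holds automatically because each row sums to 1.
p_hat = X.sum(axis=0) / N
print(p_hat)   # [0.2 0.6 0.2]
```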

  • Lagrange multiplier (Wikipedia)

    Solve the constrained maximization by introducing a multiplier λ (a worked version for the multinomial MLE follows below).
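A worked instance of the Lagrange-multiplier step for the multinomial MLE on the previous slide; the derivation is filled in here as the standard argument, not transcribed from the slide.

```latex
% Constrained maximization of the multinomial log likelihood under sum_i p_i = 1
\[
\mathcal{L}(p,\lambda)
  = \sum_t \sum_i x_i^t \log p_i + \lambda\Bigl(1-\sum_i p_i\Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial p_i}
  = \frac{\sum_t x_i^t}{p_i} - \lambda = 0
\;\Rightarrow\;
p_i = \frac{\sum_t x_i^t}{\lambda}.
\]
\[
\text{Imposing } \sum_i p_i = 1 \text{ gives }
\lambda = \sum_i \sum_t x_i^t = N,
\qquad\text{so}\qquad
\hat p_i = \frac{\sum_t x_i^t}{N}.
\]
```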

  • Karush–Kuhn–Tucker (KKT) conditions (Wikipedia)

    KKT multipliers μ_i, λ_j: L_P = f(x) + ∑_{i=1}^{m} μ_i g_i(x) + ∑_{j=1}^{l} λ_j h_j(x)
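For context, the KKT conditions that go with this Lagrangian for min f(x) s.t. g_i(x) ≤ 0, h_j(x) = 0, stated here from the cited Wikipedia formulation rather than copied from the slide:

```latex
% KKT conditions, with multipliers mu_i and lambda_j as in L_P above
\begin{aligned}
&\text{Stationarity:}            && \nabla f(x^\ast) + \sum_{i=1}^{m}\mu_i\,\nabla g_i(x^\ast) + \sum_{j=1}^{l}\lambda_j\,\nabla h_j(x^\ast) = 0,\\
&\text{Primal feasibility:}      && g_i(x^\ast)\le 0,\quad h_j(x^\ast)=0,\\
&\text{Dual feasibility:}        && \mu_i \ge 0,\\
&\text{Complementary slackness:} && \mu_i\,g_i(x^\ast) = 0 .
\end{aligned}
```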

  • Gaussian (Normal) Distribution

    p(x) = N(μ, σ²):

    p(x) = 1 / (√(2π) σ) · exp[ −(x − μ)² / (2σ²) ]

    MLE for μ and σ²:

    m = ∑_t x^t / N

    s² = ∑_t (x^t − m)² / N
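A minimal NumPy sketch of the Gaussian MLE formulas above; the synthetic sample is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # toy sample (illustrative only)

# MLE from the slide: m = sum_t x^t / N,  s^2 = sum_t (x^t - m)^2 / N
m = x.mean()
s2 = np.mean((x - m) ** 2)        # divides by N, not N-1 (the biased ML estimate)

print(m, s2)    # close to the true mu = 2.0 and sigma^2 = 2.25
```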

  • Bias and Variance

    Unknown parameter θ

    Estimator d_i = d(X_i) on sample X_i

    Bias: b_θ(d) = E[d] − θ

    Variance: E[(d − E[d])²]

    Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
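A quick simulation of the decomposition above, using the biased variance estimator s² from the previous slide as d(X); the choice of estimator, sample size, and trial count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, N, trials = 0.0, 4.0, 10, 100_000

# Estimator d(X) = biased sample variance; true parameter theta = sigma^2
d = np.empty(trials)
for k in range(trials):
    x = rng.normal(mu, np.sqrt(sigma2), size=N)
    d[k] = np.mean((x - x.mean()) ** 2)

bias = d.mean() - sigma2                 # E[d] - theta  (about -sigma^2/N here)
variance = np.mean((d - d.mean()) ** 2)  # E[(d - E[d])^2]
mse = np.mean((d - sigma2) ** 2)         # E[(d - theta)^2]

print(bias, variance, mse, bias**2 + variance)   # mse ≈ bias^2 + variance
```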

  • Bayes’ Estimator

    Treat θ as a random variable with prior p(θ)

    Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)

    Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ

    For easier calculation, reduce it to a single point:

    Maximum a Posteriori (MAP): θ_MAP = argmax_θ p(θ|X)

    Maximum Likelihood (ML), where the prior density is flat: θ_ML = argmax_θ p(X|θ)

    Bayes’: θ_Bayes’ = E[θ|X] = ∫ θ p(θ|X) dθ (the best estimate of a random variable is its mean!)

  • Bayes’ Estimator: Example

    x^t ~ N(θ, σ²) and θ ~ N(μ₀, σ₀²), with p(x^t|θ) = 1/(√(2π)σ) · exp[ −(x^t − θ)² / (2σ²) ]

    θ_ML = m

    θ_MAP = θ_Bayes’ = E[θ|X] = [ (N/σ²) / (N/σ² + 1/σ₀²) ] · m + [ (1/σ₀²) / (N/σ² + 1/σ₀²) ] · μ₀
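A small NumPy sketch of this normal–normal example, showing how the MAP/Bayes’ estimate interpolates between the sample mean and the prior mean; the particular σ², μ₀, σ₀², and sample are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model from the slide: x^t ~ N(theta, sigma^2), prior theta ~ N(mu0, sigma0^2)
sigma2, mu0, sigma0_2 = 1.0, 0.0, 0.5     # illustrative values
theta_true = 1.2
x = rng.normal(theta_true, np.sqrt(sigma2), size=20)
N, m = len(x), x.mean()

theta_ml = m
# Posterior mean = precision-weighted combination of the sample mean and the prior mean
w = (N / sigma2) / (N / sigma2 + 1.0 / sigma0_2)
theta_map = theta_bayes = w * m + (1 - w) * mu0

print(theta_ml, theta_map)   # the MAP/Bayes' estimate is pulled from m toward mu0
```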

  • SVM vs. MED

    Support Vector Machine (SVM):

    min_{w,b,ξ} (1/2)||w||² + C ∑_i ξ_i
    s.t. y_i (w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, ∀i

    Maximum Entropy Discrimination (MED): combines max-margin learning with Bayesian-style learning

    min_{p(w),ξ} KL( p(w) || p₀(w) ) + C ∑_{i=1}^{N} ξ_i
    s.t. y_i ∫ p(w) w^T f(x_i) dw ≥ 1 − ξ_i, ξ_i ≥ 0, ∀i

    SVM is a special case of MED (with prior p₀(w) ~ N(0, I))
  • Margin-based Learning paradigm (ICML)

  • Parametric Classification

    Discriminant: g_i(x) = p(x|C_i) P(C_i), or equivalently g_i(x) = log p(x|C_i) + log P(C_i)

    With Gaussian class-conditional densities p(x|C_i) = 1/(√(2π)σ_i) · exp[ −(x − μ_i)² / (2σ_i²) ]:

    g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)

  • Parametric Classification (cont.)

    Given the sample X = { x^t, r^t }_{t=1}^{N}, where r_i^t = 1 if x^t ∈ C_i and r_i^t = 0 if x^t ∈ C_j, j ≠ i

    ML estimates are:

    P̂(C_i) = ∑_t r_i^t / N

    m_i = ∑_t x^t r_i^t / ∑_t r_i^t

    s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t

    Discriminant becomes:

    g_i(x) = −log s_i − (x − m_i)² / (2 s_i²) + log P̂(C_i)
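A compact NumPy sketch of these ML estimates and the resulting discriminant; the one-dimensional toy data and the function names are illustrative assumptions.

```python
import numpy as np

def fit_gaussian_classes(x, r):
    """ML estimates from the slide: priors, per-class means and variances.
    x: (N,) samples, r: (N, K) one-hot class indicators."""
    N = len(x)
    priors = r.sum(axis=0) / N                                   # P(C_i) = sum_t r_i^t / N
    means = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)         # m_i
    vars_ = (r * (x[:, None] - means) ** 2).sum(axis=0) / r.sum(axis=0)  # s_i^2
    return priors, means, vars_

def discriminant(x, priors, means, vars_):
    """g_i(x) = -log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i)."""
    return (-0.5 * np.log(vars_) - (x[:, None] - means) ** 2 / (2 * vars_)
            + np.log(priors))

# Toy two-class 1-D data (illustrative only)
rng = np.random.default_rng(0)
x = np.hstack([rng.normal(0, 1, 60), rng.normal(3, 2, 40)])
r = np.zeros((100, 2))
r[:60, 0] = 1
r[60:, 1] = 1

priors, means, vars_ = fit_gaussian_classes(x, r)
pred = discriminant(x, priors, means, vars_).argmax(axis=1)
print(priors, means, vars_, (pred == r.argmax(axis=1)).mean())
```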

  • Equal variances: a single boundary, halfway between the means

  • Variances are different: two boundaries
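The boundary claims on these last two slides can be checked by solving g_1(x) = g_2(x), which is a quadratic in x; a small sketch under illustrative parameter values (the function name and the numbers are assumptions).

```python
import numpy as np

def boundary_points(m1, s1, m2, s2, p1=0.5, p2=0.5):
    """Solve g1(x) = g2(x) for the class boundary/boundaries, where
    g_i(x) = -log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i)."""
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = (m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2)
         + np.log(s2 / s1) + np.log(p1 / p2))
    if np.isclose(a, 0):                    # equal variances: single linear boundary
        return np.array([-c / b])
    return np.roots([a, b, c])              # unequal variances: up to two boundaries

print(boundary_points(0.0, 1.0, 4.0, 1.0))   # equal variances -> midpoint 2.0
print(boundary_points(0.0, 1.0, 4.0, 2.0))   # different variances -> two boundaries
```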