CSC411: Optimization for Machine Learning
University of Toronto
September 20–26, 2018

Based on slides by Eleni Triantafillou, Ladislav Rampasek, Jake Snell, Kevin Swersky, Shenlong Wang and others.



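  • $\theta^* = \arg\min_\theta f(\theta)$

    where $\theta \in \mathbb{R}^D$ and $f : \mathbb{R}^D \to \mathbb{R}$.

    Maximizing $f(\theta)$ is equivalent to minimizing $-f(\theta)$.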


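  • Learning $\theta$ by maximum likelihood

    Given data $(x, y)$ and a model $p(y \mid x, \theta)$,

    minimize the negative log-likelihood $-\log p(y \mid x, \theta)$.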

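  • At a minimum $\theta^*$ of a differentiable $f$: $\frac{\partial f(\theta^*)}{\partial \theta} = 0$

    Gradient: $\nabla_\theta f = \left( \frac{\partial f}{\partial \theta_1}, \frac{\partial f}{\partial \theta_2}, \ldots, \frac{\partial f}{\partial \theta_D} \right)$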
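As a small illustration, assume a hypothetical two-parameter objective $f(\theta) = (\theta_1 - 1)^2 + 2(\theta_2 + 3)^2$ (not from the slides). Its gradient is just the vector of partial derivatives, and it vanishes at the minimizer $\theta^* = (1, -3)$; the names `f` and `grad_f` are illustrative.

```python
from typing import List

Vector = List[float]


def f(theta: Vector) -> float:
    """Hypothetical example objective: f(theta) = (theta_1 - 1)^2 + 2 (theta_2 + 3)^2."""
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 3.0) ** 2


def grad_f(theta: Vector) -> Vector:
    """Gradient (df/dtheta_1, df/dtheta_2), derived by hand for the objective above."""
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]


theta_star = [1.0, -3.0]
print(grad_f(theta_star))  # [0.0, 0.0]: the gradient vanishes at the minimizer
print(grad_f([0.0, 0.0]))  # [-2.0, 12.0]: away from the minimum it does not
```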

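  • Gradient descent with learning rate $\eta$

    Initialize $\theta_0$

    for $t = 1 : T$:

        $\delta_t \leftarrow -\eta \nabla_\theta f_{t-1}$

        $\theta_t \leftarrow \theta_{t-1} + \delta_t$

    Here $\nabla_\theta f_{t-1}$ is the gradient of $f$ evaluated at $\theta_{t-1}$.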
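A minimal Python sketch of the gradient descent loop above, assuming plain lists for vectors and the same hypothetical quadratic objective $f(\theta) = (\theta_1 - 1)^2 + 2(\theta_2 + 3)^2$; `gradient_descent` and `grad_f` are illustrative names, not from the course.

```python
from typing import Callable, List

Vector = List[float]


def gradient_descent(grad: Callable[[Vector], Vector], theta0: Vector,
                     eta: float, T: int) -> Vector:
    """Plain gradient descent with a fixed learning rate eta for T steps."""
    theta = list(theta0)  # theta_0
    for _ in range(T):  # t = 1 : T
        g = grad(theta)  # gradient evaluated at theta_{t-1}
        delta = [-eta * gi for gi in g]  # delta_t = -eta * grad f_{t-1}
        theta = [th + d for th, d in zip(theta, delta)]  # theta_t = theta_{t-1} + delta_t
    return theta


def grad_f(theta: Vector) -> Vector:
    # Gradient of the hypothetical objective f(theta) = (theta_1 - 1)^2 + 2 (theta_2 + 3)^2
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]


theta_hat = gradient_descent(grad_f, [0.0, 0.0], eta=0.1, T=200)
print(theta_hat)  # close to [1.0, -3.0]
```

With $\eta = 0.1$ the error in each coordinate shrinks by a constant factor per step (0.8 for $\theta_1$, 0.6 for $\theta_2$), so 200 steps end very close to $(1, -3)$.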

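  • Gradient descent with line search

    Initialize $\theta_0$

    for $t = 1 : T$:

        Find $\eta_t$ such that $f(\theta_{t-1} - \eta_t \nabla_\theta f_{t-1}) < f(\theta_{t-1})$

        $\delta_t \leftarrow -\eta_t \nabla_\theta f_{t-1}$

        $\theta_t \leftarrow \theta_{t-1} + \delta_t$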
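One way to find $\eta_t$ is backtracking: start from a trial step and halve it until the decrease condition above holds. This is a sketch under that assumption (the slides do not say how $\eta_t$ is found); `gd_line_search`, `eta0`, `shrink` and `max_halvings` are hypothetical names. It uses only the plain strict-decrease test from the slide, not a sufficient-decrease (Armijo) condition.

```python
from typing import Callable, List

Vector = List[float]


def gd_line_search(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
                   theta0: Vector, T: int, eta0: float = 1.0, shrink: float = 0.5,
                   max_halvings: int = 50) -> Vector:
    """Gradient descent where each eta_t is found by backtracking (an assumed search strategy)."""
    theta = list(theta0)
    for _ in range(T):
        g = grad(theta)
        f_old = f(theta)
        eta = eta0
        for _k in range(max_halvings):
            candidate = [th - eta * gi for th, gi in zip(theta, g)]
            if f(candidate) < f_old:  # the decrease condition from the slide
                break
            eta *= shrink  # step too long: shrink it and try again
        else:
            # No decreasing step was found; theta is (numerically) a stationary point.
            return theta
        delta = [-eta * gi for gi in g]  # delta_t = -eta_t * grad f_{t-1}
        theta = [th + d for th, d in zip(theta, delta)]  # theta_t = theta_{t-1} + delta_t
    return theta


def f(theta: Vector) -> float:
    # Hypothetical example objective
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 3.0) ** 2


def grad_f(theta: Vector) -> Vector:
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]


print(gd_line_search(f, grad_f, [0.0, 0.0], T=100))  # [1.0, -3.0]
```

On this quadratic the accepted steps happen to be $\eta_1 = 0.5$ and $\eta_2 = 0.25$, after which no step decreases $f$ and the loop stops early.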

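  • Gradient descent with momentum, $\alpha \in [0, 1)$

    Initialize $\theta_0$ and $\delta_0$ (e.g. $\delta_0 = 0$)

    for $t = 1 : T$:

        $\delta_t \leftarrow -\eta \nabla_\theta f_{t-1} + \alpha \delta_{t-1}$

        $\theta_t \leftarrow \theta_{t-1} + \delta_t$

    $\alpha$ controls how much of the previous step $\delta_{t-1}$ carries over; $\alpha = 0$ recovers plain gradient descent.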
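A sketch of the momentum update, assuming $\delta_0 = 0$, list-based vectors, and the same hypothetical quadratic objective; `gd_momentum` is an illustrative name.

```python
from typing import Callable, List

Vector = List[float]


def gd_momentum(grad: Callable[[Vector], Vector], theta0: Vector,
                eta: float, alpha: float, T: int) -> Vector:
    """Gradient descent with momentum coefficient alpha in [0, 1), starting from delta_0 = 0."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    theta = list(theta0)
    delta = [0.0] * len(theta)  # delta_0 = 0 (assumed initialization)
    for _ in range(T):
        g = grad(theta)
        # delta_t = -eta * grad f_{t-1} + alpha * delta_{t-1}
        delta = [-eta * gi + alpha * di for gi, di in zip(g, delta)]
        theta = [th + d for th, d in zip(theta, delta)]  # theta_t = theta_{t-1} + delta_t
    return theta


def grad_f(theta: Vector) -> Vector:
    # Gradient of the hypothetical objective f(theta) = (theta_1 - 1)^2 + 2 (theta_2 + 3)^2
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]


print(gd_momentum(grad_f, [0.0, 0.0], eta=0.05, alpha=0.9, T=500))  # close to [1.0, -3.0]
```

Passing `alpha=0.0` makes every $\delta_t$ depend only on the current gradient, which is exactly the plain gradient descent update.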

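  • Learning rate $\eta$

    Initialize $\theta_0$, then repeat:

        $\delta_t \leftarrow -\eta \nabla_\theta f_{t-1}$

        $\theta_t \leftarrow \theta_{t-1} + \delta_t$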

  • Stopping criteria, e.g. stop when

    $|f(\theta_{t+1}) - f(\theta_t)| < \epsilon$, or

    $\|\nabla_\theta f\| < \epsilon$
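The two stopping tests can be combined with a maximum iteration count as a safeguard. This is a sketch assuming a single tolerance `eps` for both tests and a hypothetical `max_iters` cap; `gradient_descent_until_converged` is an illustrative name, and the objective is the same hypothetical quadratic as before.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def gradient_descent_until_converged(f: Callable[[Vector], float],
                                     grad: Callable[[Vector], Vector],
                                     theta0: Vector, eta: float, eps: float = 1e-8,
                                     max_iters: int = 10_000) -> Tuple[Vector, int]:
    """Run gradient descent until a stopping test fires; return (theta, steps taken)."""
    theta = list(theta0)
    f_prev = f(theta)
    for t in range(1, max_iters + 1):
        g = grad(theta)
        if math.sqrt(sum(gi * gi for gi in g)) < eps:  # ||grad f|| < eps
            return theta, t - 1
        theta = [th - eta * gi for th, gi in zip(theta, g)]
        f_curr = f(theta)
        if abs(f_curr - f_prev) < eps:  # |f(theta_{t+1}) - f(theta_t)| < eps
            return theta, t
        f_prev = f_curr
    return theta, max_iters  # safeguard: neither test fired


def f(theta: Vector) -> float:
    # Hypothetical example objective
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 3.0) ** 2


def grad_f(theta: Vector) -> Vector:
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]


theta_hat, steps = gradient_descent_until_converged(f, grad_f, [0.0, 0.0], eta=0.1, eps=1e-10)
print(steps, theta_hat)
```

A small change in $f$ does not guarantee that $\theta$ is close to a minimizer (progress can simply be slow), which is one reason to check the gradient norm as well.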

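  • Checking the gradient $\nabla_\theta f$ with (centered) finite differences:

    $\frac{\partial f}{\partial \theta_i} \approx \frac{f(\theta_1, \ldots, \theta_i + \epsilon, \ldots, \theta_D) - f(\theta_1, \ldots, \theta_i - \epsilon, \ldots, \theta_D)}{2\epsilon}$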
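A sketch of gradient checking with the centered difference above, comparing it against a hand-derived gradient through a relative error. The test function, `numerical_gradient`, and `relative_error` are illustrative assumptions, not part of the slides.

```python
import math
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], theta: Vector, eps: float = 1e-5) -> Vector:
    """Centered finite-difference estimate of each partial derivative of f at theta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        plus[i] += eps  # (theta_1, ..., theta_i + eps, ..., theta_D)
        minus = list(theta)
        minus[i] -= eps  # (theta_1, ..., theta_i - eps, ..., theta_D)
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


def relative_error(a: Vector, b: Vector) -> float:
    """||a - b|| / (||a|| + ||b||), a scale-free way to compare two gradients."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def f(theta: Vector) -> float:
    # Hypothetical test function: f(theta) = theta_1^2 * theta_2 + 3 * theta_2
    return theta[0] ** 2 * theta[1] + 3.0 * theta[1]


def grad_f(theta: Vector) -> Vector:
    # Hand-derived gradient that we want to check
    return [2.0 * theta[0] * theta[1], theta[0] ** 2 + 3.0]


theta = [1.5, -2.0]
print(relative_error(numerical_gradient(f, theta), grad_f(theta)))  # tiny: grad_f looks right
```

For this polynomial the centered difference is exact up to rounding, so a correct `grad_f` gives a relative error near machine precision, while a gradient with a dropped factor of 2 gives an error of order 0.1.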


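  • A function $f$ is convex if, for all $\theta_1, \theta_2$ and all $t \in [0, 1]$,

    $f(t\theta_1 + (1 - t)\theta_2) \le t f(\theta_1) + (1 - t) f(\theta_2)$

    If $f$ and $g$ are convex, then so are $\alpha f$ (for $\alpha \ge 0$) and $f + g$.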
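Sampling random pairs and checking the defining inequality cannot prove convexity, but a single violation disproves it. This is a sketch assuming a hypothetical helper `violates_convexity` that samples points uniformly from a box $[-5, 5]^D$; the example functions are illustrative.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def violates_convexity(f: Callable[[Vector], float], dim: int, trials: int = 10_000,
                       box: float = 5.0, seed: int = 0, tol: float = 1e-9) -> bool:
    """Search for theta_1, theta_2, t breaking f(t*th1 + (1-t)*th2) <= t*f(th1) + (1-t)*f(th2).

    Returning False is only evidence of convexity on the sampled box, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        th1 = [rng.uniform(-box, box) for _ in range(dim)]
        th2 = [rng.uniform(-box, box) for _ in range(dim)]
        t = rng.random()  # t in [0, 1)
        mix = [t * a + (1.0 - t) * b for a, b in zip(th1, th2)]
        if f(mix) > t * f(th1) + (1.0 - t) * f(th2) + tol:  # tol absorbs rounding error
            return True
    return False


def squared_norm(theta: Vector) -> float:
    return sum(x * x for x in theta)


def l1_norm(theta: Vector) -> float:
    return sum(abs(x) for x in theta)


def combo(theta: Vector) -> float:
    # alpha * f + g with alpha = 2 >= 0 and f, g convex, so combo is convex
    return 2.0 * squared_norm(theta) + l1_norm(theta)


def sine(theta: Vector) -> float:
    return math.sin(theta[0])


print(violates_convexity(squared_norm, dim=2))  # False
print(violates_convexity(combo, dim=2))  # False
print(violates_convexity(sine, dim=1))  # True
```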


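  • Example: logistic regression

    $f(\theta) = -\sum_{i} \left[ y^{(i)} \log p(y = 1 \mid x^{(i)}, \theta) + (1 - y^{(i)}) \log p(y = 0 \mid x^{(i)}, \theta) \right]$, with $p(y = 1 \mid x, \theta) = \sigma(\theta^\top x)$

    Since $1 - \sigma(z) = \sigma(-z)$, every term has the form $-\log \sigma(\cdot)$ applied to an affine function of $\theta$, which is convex; hence $f$ is convex.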
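A sketch of this loss and its gradient $\nabla_\theta f = \sum_i \left(\sigma(\theta^\top x^{(i)}) - y^{(i)}\right) x^{(i)}$, assuming labels $y^{(i)} \in \{0, 1\}$, no regularization term, and a bias handled by a constant feature $1$ in each $x^{(i)}$. It uses $-\log \sigma(z) = \log(1 + e^{-z})$ and $-\log(1 - \sigma(z)) = \log(1 + e^{z})$ to avoid overflow; the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softplus(z: float) -> float:
    """log(1 + e^z), arranged so math.exp never overflows."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z) = 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss(theta: Vector, X: Sequence[Vector], Y: Sequence[int]) -> float:
    """f(theta) = -sum_i [y_i log sigma(z_i) + (1 - y_i) log(1 - sigma(z_i))], z_i = theta^T x_i."""
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(w * xj for w, xj in zip(theta, x))
        # -log sigma(z) = softplus(-z) and -log(1 - sigma(z)) = softplus(z)
        total += y * softplus(-z) + (1 - y) * softplus(z)
    return total


def logistic_grad(theta: Vector, X: Sequence[Vector], Y: Sequence[int]) -> Vector:
    """Gradient sum_i (sigma(theta^T x_i) - y_i) x_i."""
    g = [0.0] * len(theta)
    for x, y in zip(X, Y):
        z = sum(w * xj for w, xj in zip(theta, x))
        r = sigmoid(z) - y
        for j, xj in enumerate(x):
            g[j] += r * xj
    return g


# Hypothetical toy data; the first feature is a constant 1 acting as a bias.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
Y = [1, 0, 1]
print(logistic_loss([0.0, 0.0], X, Y))  # equals 3 * log(2), since sigma(0) = 1/2
print(logistic_grad([0.0, 0.0], X, Y))  # [-0.5, -1.75]
```

Because the loss is convex, any of the gradient descent variants above can be applied to it without worrying about spurious local minima.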
