CSC411: Optimization for Machine Learning
University of Toronto
September 20–26, 2018
Based on slides by Eleni Triantafillou, Ladislav Rampasek, Jake Snell, Kevin Swersky, Shenlong Wang, and others.
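$\theta^* = \arg\min_\theta f(\theta)$
- $\theta \in \mathbb{R}^n$ are the parameters and $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function.
- Maximizing $f(\theta)$ is equivalent to minimizing $-f(\theta)$.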
- Data $(x, y)$ and model $p(y \mid x, \theta)$
- Minimize the negative log-likelihood $-\log p(y \mid x, \theta)$
Stationarity condition at $\theta^*$: $\frac{\partial f(\theta^*)}{\partial \theta} = 0$
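- Gradient: $\nabla_\theta f = \left( \frac{\partial f}{\partial \theta_1}, \frac{\partial f}{\partial \theta_2}, \ldots, \frac{\partial f}{\partial \theta_n} \right)$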
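Gradient descent with learning rate $\eta$:
- Initialize $\theta_0$
- For $t = 1 : T$
  - $\delta_t \leftarrow -\eta \nabla_{\theta_{t-1}} f$
  - $\theta_t \leftarrow \theta_{t-1} + \delta_t$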
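The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference code: the function name `gradient_descent`, the quadratic objective, and the default values of `eta` and `num_steps` are all assumptions chosen for the example.

```python
def gradient_descent(grad, theta0, eta=0.1, num_steps=100):
    """Apply theta_t = theta_{t-1} + delta_t with delta_t = -eta * grad(theta_{t-1})."""
    theta = list(theta0)
    for _ in range(num_steps):
        g = grad(theta)
        delta = [-eta * g_i for g_i in g]                 # delta_t
        theta = [th + d for th, d in zip(theta, delta)]   # theta_t
    return theta


# Hypothetical objective: f(theta) = (theta_1 - 3)^2 + (theta_2 + 1)^2,
# whose minimizer is (3, -1).
def grad_f(theta):
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_hat = gradient_descent(grad_f, [0.0, 0.0])
```

On this objective each step shrinks the distance to the minimizer by a constant factor, so a fixed learning rate is enough. On badly scaled problems the same `eta` may diverge or crawl.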
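Gradient descent with a step size $\eta_t$ chosen at each iteration:
- Initialize $\theta_0$
- For $t = 1 : T$
  - Choose $\eta_t$ such that $f(\theta_{t-1} - \eta_t \nabla_{\theta_{t-1}} f) < f(\theta_{t-1})$
  - $\delta_t \leftarrow -\eta_t \nabla_{\theta_{t-1}} f$
  - $\theta_t \leftarrow \theta_{t-1} + \delta_t$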
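One simple way to find a step size that decreases the objective is to start from a large trial value and halve it until the decrease condition holds. The sketch below assumes that strategy. The halving factor `shrink`, the starting value `eta0`, the floor `min_eta`, and the test objective are illustrative choices that do not come from the slides.

```python
def descent_with_backtracking(f, grad, theta0, eta0=1.0, shrink=0.5,
                              min_eta=1e-12, num_steps=100):
    """Gradient descent where each eta_t is shrunk until f strictly decreases."""
    theta = list(theta0)
    for _ in range(num_steps):
        g = grad(theta)
        f_old = f(theta)
        eta = eta0
        candidate = [th - eta * g_i for th, g_i in zip(theta, g)]
        # Shrink eta until f(theta - eta * grad) < f(theta), or eta becomes tiny.
        while f(candidate) >= f_old and eta > min_eta:
            eta *= shrink
            candidate = [th - eta * g_i for th, g_i in zip(theta, g)]
        if f(candidate) >= f_old:
            break  # no tried step size gave a decrease; stop
        theta = candidate
    return theta


# Hypothetical, poorly scaled objective with minimizer (3, -1).
def f(theta):
    return (theta[0] - 3.0) ** 2 + 10.0 * (theta[1] + 1.0) ** 2


def grad_f(theta):
    return [2.0 * (theta[0] - 3.0), 20.0 * (theta[1] + 1.0)]


theta_ls = descent_with_backtracking(f, grad_f, [0.0, 0.0])
```

A plain decrease test is the weakest usable condition. Practical line searches usually require a sufficient decrease (for example the Armijo condition), which this sketch omits for brevity.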
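Gradient descent with momentum coefficient $\alpha \in [0, 1)$:
- Initialize $\theta_0$, $\delta_0$
- For $t = 1 : T$
  - $\delta_t \leftarrow -\eta \nabla_{\theta_{t-1}} f + \alpha \delta_{t-1}$
  - $\theta_t \leftarrow \theta_{t-1} + \delta_t$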
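The momentum update differs from plain gradient descent only in carrying the previous step forward. A minimal sketch follows, assuming $\delta_0 = 0$ and illustrative values for `eta`, `alpha`, and `num_steps`. The function name and the quadratic objective are hypothetical.

```python
def gradient_descent_momentum(grad, theta0, eta=0.1, alpha=0.9, num_steps=500):
    """Apply delta_t = -eta * grad(theta_{t-1}) + alpha * delta_{t-1}."""
    theta = list(theta0)
    delta = [0.0] * len(theta)  # assumed initialization: delta_0 = 0
    for _ in range(num_steps):
        g = grad(theta)
        delta = [-eta * g_i + alpha * d for g_i, d in zip(g, delta)]
        theta = [th + d for th, d in zip(theta, delta)]
    return theta


# Hypothetical objective: f(theta) = (theta_1 - 3)^2 + (theta_2 + 1)^2.
def grad_f(theta):
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_m = gradient_descent_momentum(grad_f, [0.0, 0.0])
```

Setting `alpha=0.0` recovers ordinary gradient descent, which makes the two easy to compare on the same objective.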
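Gradient descent with learning rate $\eta$, run until convergence:
- Initialize $\theta_0$
- Repeat until converged:
  - $\delta_t \leftarrow -\eta \nabla_{\theta_{t-1}} f$
  - $\theta_t \leftarrow \theta_{t-1} + \delta_t$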
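Convergence criteria:
- Change in the objective is small: $|f(\theta_{t+1}) - f(\theta_t)| < \epsilon$
- Gradient norm is small: $\|\nabla_\theta f\| < \epsilon$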
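Both stopping tests can be folded into the descent loop. The sketch below is one possible arrangement, not a prescribed one: it checks the gradient norm before each step and the change in objective after it. The names `run_until_converged` and `max_steps`, the returned reason strings, and the test objective are assumptions for illustration.

```python
import math


def run_until_converged(f, grad, theta0, eta=0.1, eps=1e-8, max_steps=10000):
    """Gradient descent that stops when either convergence test fires."""
    theta = list(theta0)
    for step in range(1, max_steps + 1):
        g = grad(theta)
        grad_norm = math.sqrt(sum(g_i * g_i for g_i in g))
        if grad_norm < eps:                          # ||grad f|| < eps
            return theta, step - 1, "small gradient"
        new_theta = [th - eta * g_i for th, g_i in zip(theta, g)]
        if abs(f(new_theta) - f(theta)) < eps:       # |f(theta_{t+1}) - f(theta_t)| < eps
            return new_theta, step, "small change in objective"
        theta = new_theta
    return theta, max_steps, "max steps reached"


# Hypothetical objective with minimizer (3, -1).
def f(theta):
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2


def grad_f(theta):
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_c, steps, reason = run_until_converged(f, grad_f, [0.0, 0.0])
```

The `max_steps` cap is a safeguard added here so the loop terminates even if neither test is ever satisfied.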
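Checking an implementation of $\nabla_\theta f$ with finite differences:
$\frac{\partial f}{\partial \theta_i} \approx \frac{f(\theta_1, \ldots, \theta_i + \epsilon, \ldots, \theta_n) - f(\theta_1, \ldots, \theta_i - \epsilon, \ldots, \theta_n)}{2\epsilon}$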
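The centered-difference formula translates directly into a gradient checker. The following is a minimal sketch. The helper name `numerical_gradient`, the value `eps=1e-5`, and the test function are assumptions made for the example.

```python
import math


def numerical_gradient(f, theta, eps=1e-5):
    """Approximate each partial derivative with a centered difference."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps     # (theta_1, ..., theta_i + eps, ..., theta_n)
        minus[i] -= eps    # (theta_1, ..., theta_i - eps, ..., theta_n)
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


# Hypothetical function and its hand-derived gradient.
def f(theta):
    return theta[0] ** 2 * theta[1] + math.sin(theta[1])


def grad_f(theta):
    return [2.0 * theta[0] * theta[1], theta[0] ** 2 + math.cos(theta[1])]


point = [1.5, -0.7]
analytic = grad_f(point)
numeric = numerical_gradient(f, point)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
```

A large `max_error` usually points to a bug in the analytic gradient. A very small `eps` can also inflate the error through floating-point cancellation, so values around `1e-4` to `1e-6` are a common compromise.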
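A function $f$ is convex if, for all $\theta_1, \theta_2$ and all $\lambda \in [0, 1]$:
$f(\lambda \theta_1 + (1 - \lambda) \theta_2) \le \lambda f(\theta_1) + (1 - \lambda) f(\theta_2)$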
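The convexity inequality can be spot-checked numerically on random pairs of points. The sketch below, for one-dimensional functions, is only a falsification test: finding a violation shows a function is not convex on the sampled interval, while finding none proves nothing. The function name, sampling interval, trial count, and tolerance are assumptions.

```python
import math
import random


def violates_convexity(f, lo=-5.0, hi=5.0, trials=1000, seed=0, tol=1e-9):
    """Return True if some sampled pair breaks the convexity inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.uniform(lo, hi)
        b = rng.uniform(lo, hi)
        lam = rng.random()
        lhs = f(lam * a + (1.0 - lam) * b)
        rhs = lam * f(a) + (1.0 - lam) * f(b)
        if lhs > rhs + tol:
            return True
    return False


def square(x):
    return x * x


def neg_log_sigmoid(x):
    # -log(sigma(x)) = log(1 + exp(-x))
    return math.log(1.0 + math.exp(-x))


square_violates = violates_convexity(square)
sine_violates = violates_convexity(math.sin)
nls_violates = violates_convexity(neg_log_sigmoid)
```

Here `square` and `neg_log_sigmoid` are convex, so no sampled pair should break the inequality, while `math.sin` is concave on parts of the interval and should be caught.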
If $f$ and $g$ are convex, then so are:
- $\alpha f$ for $\alpha \ge 0$
- $f + g$
$L(\theta) = -\sum_i \left[ t^{(i)} \log p(y = 1 \mid x^{(i)}, \theta) + (1 - t^{(i)}) \log p(y = 0 \mid x^{(i)}, \theta) \right]$
$-\log \sigma(\theta)$
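To make the loss concrete, the sketch below evaluates it and its gradient and then minimizes it with a few gradient descent steps. It assumes the standard logistic regression model $p(y = 1 \mid x, \theta) = \sigma(\theta^\top x)$, which the surviving text does not spell out. The toy data set, learning rate, and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    # Assumed model: p(y = 1 | x, theta) = sigmoid(theta . x)
    return sigmoid(sum(th * x_j for th, x_j in zip(theta, x)))


def logistic_loss(theta, xs, ts):
    """Negative log-likelihood summed over the data set."""
    total = 0.0
    for x, t in zip(xs, ts):
        p1 = predict(theta, x)
        total -= t * math.log(p1) + (1 - t) * math.log(1.0 - p1)
    return total


def logistic_grad(theta, xs, ts):
    """Gradient of the loss: sum over examples of (p1 - t) * x."""
    grad = [0.0] * len(theta)
    for x, t in zip(xs, ts):
        p1 = predict(theta, x)
        for j, x_j in enumerate(x):
            grad[j] += (p1 - t) * x_j
    return grad


# Hypothetical toy data; the first feature is a constant bias term.
xs = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
ts = [1, 1, 0, 0]

theta = [0.0, 0.0]
initial_loss = logistic_loss(theta, xs, ts)
for _ in range(100):
    g = logistic_grad(theta, xs, ts)
    theta = [th - 0.1 * g_j for th, g_j in zip(theta, g)]
final_loss = logistic_loss(theta, xs, ts)
```

At $\theta = 0$ every prediction is 0.5, so the initial loss equals the number of examples times $\log 2$, which gives a quick sanity check on the implementation.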