Variance Reduction for Policy Gradient Methods
rll.berkeley.edu/deeprlcoursesp17/docs/lec6.pdf

"potential"
- Theorem: r̃ admits the same optimal policies as r.
- Proof sketch:
