Tags: imp square node and oss ast com eps image




^ is the square root of epsilon


a simplified version of the hard version
a smoother way to find the correct solution

the first term is the REINFORCE term, and the second term is the grad log probability of our loss
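
As a rough illustration of a two-term gradient of this kind, here is a minimal sketch under assumptions that are not from the lecture: a hypothetical loss (x + theta)^2 with x sampled from a unit-variance Gaussian centered at theta, and a hypothetical helper named estimate_gradient. Because theta affects both the sampling distribution and the loss directly, the gradient of the expected loss splits into a score-function (REINFORCE) term and a term that differentiates the loss with the sample held fixed.

```python
import random


def estimate_gradient(theta, num_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dtheta E_{x ~ N(theta, 1)}[(x + theta)^2].

    Returns the two terms separately:
      score_term  ~ E[loss * d/dtheta log p(x; theta)]   (REINFORCE term)
      direct_term ~ E[d loss / d theta with x held fixed]
    The loss and the distribution are illustrative assumptions.
    """
    rng = random.Random(seed)
    score_sum = 0.0
    direct_sum = 0.0
    for _ in range(num_samples):
        x = rng.gauss(theta, 1.0)
        loss = (x + theta) ** 2
        # For a unit-variance Gaussian, d/dtheta log N(x; theta, 1) = x - theta.
        score_sum += loss * (x - theta)
        # Partial derivative of the loss w.r.t. theta, treating x as a constant.
        direct_sum += 2.0 * (x + theta)
    return score_sum / num_samples, direct_sum / num_samples


score_term, direct_term = estimate_gradient(0.5)
# Analytically E[loss] = 4 * theta^2 + 1, so the true gradient is 8 * theta.
print("score term:", score_term)
print("direct term:", direct_term)
print("sum:", score_term + direct_term, "analytic:", 8 * 0.5)
```

In this toy setup each term contributes 4 * theta analytically, so the sum should land near 8 * theta up to sampling noise. The score term is noticeably noisier than the direct term.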

b is a stochastic node




Further formula derivations are omitted.
Deep RL Bootcamp Lecture 7: SVG, DDPG, and Stochastic Computation Graphs
Original source: https://www.cnblogs.com/ecoflex/p/8977893.html