Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO

Posted: 2018-05-01 20:50:24

https://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf

https://zhuanlan.zhihu.com/p/29934206

The blue curve is the lower bound.

Conjugate gradient is used to solve the optimization problem.
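
A minimal sketch of why conjugate gradient fits here, assuming the usual TRPO setup in which the step direction solves F x = g (F the Fisher matrix, g the policy gradient) and F is only available through matrix-vector products. The names `conjugate_gradient` and `fisher_vector_product`, and the 2x2 toy matrix, are hypothetical and chosen only for illustration.

```python
def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A,
    given only a function matvec(v) that returns A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x (x starts at zero)
    p = list(r)  # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(iters):
        if rs_old < tol:
            break  # residual is already negligible
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy stand-in for the Fisher matrix (assumption: a small SPD matrix).
F = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]


def fisher_vector_product(v):
    # In practice this product would come from autodiff, not an explicit F.
    return [sum(F[i][j] * v[j] for j in range(len(v))) for i in range(len(F))]


step = conjugate_gradient(fisher_vector_product, g)
```

The point of the design is that F is never inverted or even stored; only products F v are needed, which keeps the cost manageable for large policies.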

Fisher information matrix, natural policy gradient
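
For reference, the standard definitions behind these two terms can be written out as follows. The symbols (F for the Fisher matrix, g for the policy gradient, d for a parameter step, δ for the KL budget) are notation assumed here, not taken from the slides.

```latex
F(\theta) = \mathbb{E}_{s,\,a \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right]

D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\theta + d}\right) \approx \tfrac{1}{2}\, d^{\top} F(\theta)\, d

\theta_{\text{new}} = \theta + \sqrt{\frac{2\delta}{g^{\top} F^{-1} g}}\; F^{-1} g, \qquad g = \nabla_\theta J(\theta)
```

The second line is the second-order approximation of the KL divergence; the third is the natural gradient step that maximizes the linearized objective subject to that quadratic KL bound.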

The goal is to write down an optimization problem for the policy update that we can solve more robustly and with better sample efficiency.

But the losses L^IS and L^PG are not constrained, so we use KL to ...
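
A minimal sketch of the two quantities involved, assuming a discrete-action policy and reading the note's "Lis" as the importance-sampled surrogate loss (that reading is an assumption). The function `surrogate_and_kl` and its argument names are hypothetical.

```python
import math


def surrogate_and_kl(old_probs, new_probs, actions, advantages):
    """Estimate the importance-sampled surrogate and the mean KL(old || new).

    old_probs / new_probs: one action distribution (list of floats) per sample.
    actions: index of the action actually taken in each sample.
    advantages: advantage estimate for each sample.
    """
    n = len(actions)
    surrogate = 0.0
    kl = 0.0
    for p_old, p_new, a, adv in zip(old_probs, new_probs, actions, advantages):
        ratio = p_new[a] / p_old[a]  # importance weight pi_new / pi_old
        surrogate += ratio * adv
        kl += sum(po * math.log(po / pn)
                  for po, pn in zip(p_old, p_new) if po > 0)
    return surrogate / n, kl / n


# Toy data: two samples, two actions.
old = [[0.5, 0.5], [0.5, 0.5]]
new = [[0.6, 0.4], [0.6, 0.4]]
L_is, mean_kl = surrogate_and_kl(old, new, actions=[0, 1], advantages=[1.0, -1.0])
```

Maximizing the surrogate alone would push the ratio arbitrarily far from 1; the KL estimate is what a constraint or penalty would act on to keep the new policy close to the old one.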

It's hard to choose beta, the coefficient of the KL penalty.
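
One way around a fixed beta is to adapt it after each policy update. The sketch below follows the doubling and halving heuristic described for the adaptive-KL variant of PPO; the function name `update_beta` and the parameter names are hypothetical, and the 1.5 and 2.0 factors are that heuristic's usual values rather than anything stated in these notes.

```python
def update_beta(beta, measured_kl, target_kl):
    """Adjust the KL penalty coefficient after one policy update."""
    if measured_kl < target_kl / 1.5:
        return beta / 2.0  # policy barely moved: weaken the penalty
    if measured_kl > target_kl * 1.5:
        return beta * 2.0  # policy moved too far: strengthen the penalty
    return beta  # KL is close to the target: leave beta unchanged
```

With this rule the initial beta matters less, since it is corrected quickly whenever the measured KL drifts away from the target.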

TRPO does much worse than A3C on image-based games, where PPO does better.

See the slide "Limitations of TRPO".

Original post: https://www.cnblogs.com/ecoflex/p/8976876.html
