To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

Posted: 2017-09-30 20:53:09

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

[Average reward vs. discounted reward]

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].
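To make the contrast concrete, here is a minimal tabular sketch of the two update rules. It is not taken from the cited papers: the function names, the list-of-lists table layout, and the step-size defaults are illustrative assumptions, and the R-learning step follows one common textbook variant (the greedy check is made before the table is modified; other presentations check it afterwards).

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One discounted Q-learning step on a table Q[state][action].

    gamma is the discount factor applied to the best next-state value.
    """
    td_error = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * td_error
    return Q


def r_learning_update(R, rho, s, a, r, s_next, beta=0.1, alpha=0.01):
    """One R-learning step on a table R[state][action].

    rho is the running estimate of the average reward per step. There is
    no discount factor: the reward is measured relative to rho instead.
    Returns the (possibly updated) rho; R is modified in place.
    """
    # Assumption: greediness is judged on the table before this update.
    was_greedy = R[s][a] == max(R[s])
    td_error = r - rho + max(R[s_next]) - R[s][a]
    R[s][a] += beta * td_error
    if was_greedy:
        # Only greedy actions are used to adjust the average-reward estimate.
        rho += alpha * td_error
    return rho


if __name__ == "__main__":
    Q = [[0.0, 0.0], [1.0, 2.0]]
    q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
    print("Q after one step:", Q)

    R = [[0.0, 0.0], [1.0, 2.0]]
    rho = r_learning_update(R, 0.0, s=0, a=1, r=1.0, s_next=1, beta=0.5, alpha=0.5)
    print("R after one step:", R, "rho:", rho)
```

The only structural differences are that Q-learning scales the next-state value by `gamma`, while R-learning subtracts the average-reward estimate `rho` from the immediate reward and maintains `rho` as a second learned quantity.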

Original article: http://www.cnblogs.com/yuanjiangw/p/7615875.html
