
[Code Collection] Deep Reinforcement Learning Implementations in PyTorch

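This post shares a set of high-quality implementations of deep reinforcement learning algorithms written in PyTorch. The IPython notebooks are mainly meant to help with practicing and understanding the papers, so in some cases readability is chosen over efficiency. The paper implementations are uploaded first, followed by markup explaining each part of the code.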


Related Papers



  1. Human Level Control Through Deep Reinforcement Learning

     [Publication] https://deepmind.com/research/publications/human-level-control-through-deep-reinforcement-learning/

     [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/01.DQN.ipynb

  2. Multi-Step Learning (from Reinforcement Learning: An Introduction, Chapter 7) 

    [Publication] Reinforcement Learning: An Introduction (Sutton and Barto), Chapter 7

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/02.NStep_DQN.ipynb

  3. Deep Reinforcement Learning with Double Q-learning 

    [Publication] https://arxiv.org/abs/1509.06461

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/03.Double_DQN.ipynb

  4. Dueling Network Architectures for Deep Reinforcement Learning 

    [Publication] https://arxiv.org/abs/1511.06581

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/04.Dueling_DQN.ipynb

  5. Noisy Networks for Exploration 

    [Publication] https://arxiv.org/abs/1706.10295

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/05.DQN-NoisyNets.ipynb

  6. Prioritized Experience Replay 

    [Publication] https://arxiv.org/abs/1511.05952?context=cs

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/06.DQN_PriorityReplay.ipynb

  7. A Distributional Perspective on Reinforcement Learning 

    [Publication] https://arxiv.org/abs/1707.06887

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/07.Categorical-DQN.ipynb

  8. Rainbow: Combining Improvements in Deep Reinforcement Learning 

    [Publication] https://arxiv.org/abs/1710.02298

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/08.Rainbow.ipynb

  9. Distributional Reinforcement Learning with Quantile Regression 

    [Publication] https://arxiv.org/abs/1710.10044

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/09.QuantileRegression-DQN.ipynb

  10. Rainbow with Quantile Regression 

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/10.Quantile-Rainbow.ipynb

  11. Deep Recurrent Q-Learning for Partially Observable MDPs 

    [Publication] https://arxiv.org/abs/1507.06527

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/11.DRQN.ipynb

  12. Advantage Actor Critic (A2C) 

    [Publication1] https://arxiv.org/abs/1602.01783

    [Publication2] https://blog.openai.com/baselines-acktr-a2c/

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/12.A2C.ipynb

  13. High-Dimensional Continuous Control Using Generalized Advantage Estimation 

    [Publication] https://arxiv.org/abs/1506.02438

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/13.GAE.ipynb

  14. Proximal Policy Optimization Algorithms 

    [Publication] https://arxiv.org/abs/1707.06347

    [code] https://github.com/qfettes/DeepRL-Tutorials/blob/master/14.PPO.ipynb
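
As a rough illustration of the multi-step learning idea in item 2, the sketch below computes an n-step return: the discounted sum of n observed rewards plus a discounted bootstrap value estimate. It is not taken from the linked notebooks. The function name `n_step_return` and its parameters are hypothetical, and plain Python is used instead of PyTorch tensors to keep the example dependency-free.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    rewards         -- the n rewards observed after the current state (hypothetical input)
    bootstrap_value -- value estimate for the state reached after n steps,
                       e.g. max_a Q(s_n, a) in an n-step DQN; use 0.0 if the
                       episode terminated within the n steps
    gamma           -- discount factor
    """
    g = bootstrap_value
    # Fold the rewards in from last to first so each step applies one discount.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # 1 + 0.5*1 + 0.25*1 + 0.125*10 = 3.0
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With a single reward this reduces to the usual one-step target, which is one way to see how the n-step variant generalizes the basic DQN update.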

