[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71520":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},71520,"reinforcement-learning","dennybritz\u002Freinforcement-learning","dennybritz","Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.","http:\u002F\u002Fwww.wildml.com\u002F2016\u002F10\u002Flearning-reinforcement-learning\u002F",null,"Jupyter Notebook",22034,6137,855,97,0,3,8,32,9,83.2,"MIT License",false,"master",true,[],"2026-06-12 04:01:01","### Overview\n\nThis repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from\n\n- [Reinforcement Learning: An Introduction (2nd Edition)](http:\u002F\u002Fincompleteideas.net\u002Fbook\u002FRLbook2018.pdf)\n- [David Silver's Reinforcement Learning Course](http:\u002F\u002Fwww0.cs.ucl.ac.uk\u002Fstaff\u002Fd.silver\u002Fweb\u002FTeaching.html)\n\nEach folder corresponds to one or more chapters of the above textbook and\u002For course. 
In addition to exercises and solutions, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.\n\nAll code is written in Python 3 and uses RL environments from [OpenAI Gym](https:\u002F\u002Fgym.openai.com\u002F). Advanced techniques use [Tensorflow](https:\u002F\u002Fwww.tensorflow.org\u002F) for neural network implementations.\n\n\n### Table of Contents\n\n- [Introduction to RL problems & OpenAI Gym](Introduction\u002F)\n- [MDPs and Bellman Equations](MDP\u002F)\n- [Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration](DP\u002F)\n- [Monte Carlo Model-Free Prediction & Control](MC\u002F)\n- [Temporal Difference Model-Free Prediction & Control](TD\u002F)\n- [Function Approximation](FA\u002F)\n- [Deep Q Learning](DQN\u002F) (WIP)\n- [Policy Gradient Methods](PolicyGradient\u002F) (WIP)\n- Learning and Planning (WIP)\n- Exploration and Exploitation (WIP)\n\n\n### List of Implemented Algorithms\n\n- [Dynamic Programming Policy Evaluation](DP\u002FPolicy%20Evaluation%20Solution.ipynb)\n- [Dynamic Programming Policy Iteration](DP\u002FPolicy%20Iteration%20Solution.ipynb)\n- [Dynamic Programming Value Iteration](DP\u002FValue%20Iteration%20Solution.ipynb)\n- [Monte Carlo Prediction](MC\u002FMC%20Prediction%20Solution.ipynb)\n- [Monte Carlo Control with Epsilon-Greedy Policies](MC\u002FMC%20Control%20with%20Epsilon-Greedy%20Policies%20Solution.ipynb)\n- [Monte Carlo Off-Policy Control with Importance Sampling](MC\u002FOff-Policy%20MC%20Control%20with%20Weighted%20Importance%20Sampling%20Solution.ipynb)\n- [SARSA (On Policy TD Learning)](TD\u002FSARSA%20Solution.ipynb)\n- [Q-Learning (Off Policy TD Learning)](TD\u002FQ-Learning%20Solution.ipynb)\n- [Q-Learning with Linear Function Approximation](FA\u002FQ-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)\n- [Deep Q-Learning for Atari Games](DQN\u002FDeep%20Q%20Learning%20Solution.ipynb)\n- [Double Deep-Q Learning for Atari 
Games](DQN\u002FDouble%20DQN%20Solution.ipynb)\n- Deep Q-Learning with Prioritized Experience Replay (WIP)\n- [Policy Gradient: REINFORCE with Baseline](PolicyGradient\u002FCliffWalk%20REINFORCE%20with%20Baseline%20Solution.ipynb)\n- [Policy Gradient: Actor Critic with Baseline](PolicyGradient\u002FCliffWalk%20Actor%20Critic%20Solution.ipynb)\n- [Policy Gradient: Actor Critic with Baseline for Continuous Action Spaces](PolicyGradient\u002FContinuous%20MountainCar%20Actor%20Critic%20Solution.ipynb)\n- Deterministic Policy Gradients for Continuous Action Spaces (WIP)\n- Deep Deterministic Policy Gradients (DDPG) (WIP)\n- [Asynchronous Advantage Actor Critic (A3C)](PolicyGradient\u002Fa3c)\n\n\n### Resources\n\nTextbooks:\n\n- [Reinforcement Learning: An Introduction (2nd Edition)](http:\u002F\u002Fincompleteideas.net\u002Fbook\u002FRLbook2018.pdf)\n\nClasses:\n\n- [David Silver's Reinforcement Learning Course (UCL, 2015)](http:\u002F\u002Fwww0.cs.ucl.ac.uk\u002Fstaff\u002Fd.silver\u002Fweb\u002FTeaching.html)\n- [CS294 - Deep Reinforcement Learning (Berkeley, Fall 2015)](http:\u002F\u002Frll.berkeley.edu\u002Fdeeprlcourse\u002F)\n- [CS 8803 - Reinforcement Learning (Georgia Tech)](https:\u002F\u002Fwww.udacity.com\u002Fcourse\u002Freinforcement-learning--ud600)\n- [CS885 - Reinforcement Learning (UWaterloo), Spring 2018](https:\u002F\u002Fcs.uwaterloo.ca\u002F~ppoupart\u002Fteaching\u002Fcs885-spring18\u002F)\n- [CS294-112 - Deep Reinforcement Learning (UC Berkeley)](http:\u002F\u002Frail.eecs.berkeley.edu\u002Fdeeprlcourse\u002F)\n\nTalks\u002FTutorials:\n\n- [Introduction to Reinforcement Learning (Joelle Pineau @ Deep Learning Summer School 2016)](http:\u002F\u002Fvideolectures.net\u002Fdeeplearning2016_pineau_reinforcement_learning\u002F)\n- [Deep Reinforcement Learning (Pieter Abbeel @ Deep Learning Summer School 2016)](http:\u002F\u002Fvideolectures.net\u002Fdeeplearning2016_abbeel_deep_reinforcement\u002F)\n- [Deep Reinforcement Learning ICML 2016 Tutorial 
(David Silver)](http:\u002F\u002Ftechtalks.tv\u002Ftalks\u002Fdeep-reinforcement-learning\u002F62360\u002F)\n- [Tutorial: Introduction to Reinforcement Learning with Function Approximation](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=ggqnxyjaKe4)\n- [John Schulman - Deep Reinforcement Learning (4 Lectures)](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLjKEIQlKCTZYN3CYBlj8r58SbNorobqcp)\n- [Deep Reinforcement Learning Slides @ NIPS 2016](http:\u002F\u002Fpeople.eecs.berkeley.edu\u002F~pabbeel\u002Fnips-tutorial-policy-optimization-Schulman-Abbeel.pdf)\n- [OpenAI Spinning Up](https:\u002F\u002Fspinningup.openai.com\u002Fen\u002Flatest\u002Fuser\u002Fintroduction.html)\n- [Advanced Deep Learning & Reinforcement Learning (UCL 2018, DeepMind)](https:\u002F\u002Fwww.youtube.com\u002Fplaylist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)\n- [Deep RL Bootcamp](https:\u002F\u002Fsites.google.com\u002Fview\u002Fdeep-rl-bootcamp\u002Flectures)\n\nOther Projects:\n\n- [carpedm20\u002Fdeep-rl-tensorflow](https:\u002F\u002Fgithub.com\u002Fcarpedm20\u002Fdeep-rl-tensorflow)\n- [matthiasplappert\u002Fkeras-rl](https:\u002F\u002Fgithub.com\u002Fmatthiasplappert\u002Fkeras-rl)\n\nSelected Papers:\n\n- [Human-Level Control through Deep Reinforcement Learning (2015-02)](http:\u002F\u002Fwww.readcube.com\u002Farticles\u002F10.1038\u002Fnature14236)\n- [Deep Reinforcement Learning with Double Q-learning (2015-09)](http:\u002F\u002Farxiv.org\u002Fabs\u002F1509.06461)\n- [Continuous control with deep reinforcement learning (2015-09)](https:\u002F\u002Farxiv.org\u002Fabs\u002F1509.02971)\n- [Prioritized Experience Replay (2015-11)](http:\u002F\u002Farxiv.org\u002Fabs\u002F1511.05952)\n- [Dueling Network Architectures for Deep Reinforcement Learning (2015-11)](http:\u002F\u002Farxiv.org\u002Fabs\u002F1511.06581)\n- [Asynchronous Methods for Deep Reinforcement Learning (2016-02)](http:\u002F\u002Farxiv.org\u002Fabs\u002F1602.01783)\n- [Deep Reinforcement Learning from Self-Play in 
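Imperfect-Information Games (2016-03)](http:\u002F\u002Farxiv.org\u002Fabs\u002F1603.01121)\n- [Mastering the game of Go with deep neural networks and tree search](https:\u002F\u002Fgogameguru.com\u002Fi\u002F2016\u002F03\u002Fdeepmind-mastering-go.pdf)\n","This project provides implementation code, exercises, and solutions for reinforcement learning algorithms, intended as a learning aid alongside theoretical materials. Its core content covers implementations of mainstream reinforcement learning techniques, including dynamic programming, Monte Carlo methods, temporal difference learning, function approximation, and deep Q-learning. The code is written in Python 3 and uses OpenAI Gym environments and the TensorFlow framework for neural networks. It suits students and researchers working through Reinforcement Learning: An Introduction (2nd Edition) or David Silver's reinforcement learning course who want to deepen their understanding of the concepts and techniques through practice.",2,"2026-06-11 03:38:13","high_star"]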