【经典书】强化学习算法,98页pdf

强化学习是一种学习范式，它关注的是如何控制一个系统，从而最大化一个表示长期目标的数值性能度量。强化学习与监督学习的区别在于，对于学习器的预测，只会给予部分反馈。此外，这些预测可能通过影响被控制系统的未来状态而产生长期影响。因此，时间扮演着特殊的角色。强化学习的目标是发展有效的学习算法，以及了解算法的优点和局限性。强化学习之所以引起人们极大的兴趣，是因为它可以用于解决大量的实际应用，从人工智能到运筹学或控制工程的问题。在这本书中，我们专注于那些建立在强大的动态规划理论基础上的强化学习算法。我们给出了一个相当全面的学习问题的目录，描述了核心思想，关注大量的最先进的算法，然后讨论了它们的理论性质和局限性。

https://sites.ualberta.ca/~szepesva/rlbook.html

目录内容

Preface ix
Acknowledgments xiii
Markov Decision Processes 1

Preliminaries 1
Markov Decision Processes 1
Value functions 6
Dynamic programming algorithms for solving MDPs 10

Value Prediction Problems 11

TD(lambda) with function approximation 22
Gradient temporal difference learning 25
Least-squares methods 27
The choice of the function space 33
Tabular TD(0) 11
Every-visit Monte-Carlo 14
TD(lambda): Unifying Monte-Carlo and TD(0) 16

Temporal difference learning in finite state spaces 11
Algorithms for large state spaces 18

Control 37

Implementing a critic 54
Implementing an actor 56
Q-learning in finite MDPs 47
Q-learning with function approximation 49
Online learning in bandits 38
Active learning in bandits 40
Active learning in Markov Decision Processes 41
Online learning in Markov Decision Processes 42

A catalog of learning problems 37
Closed-loop interactive learning 38
Direct methods 47
Actor-critic methods 52

For Further Exploration 63

Further reading 63
Applications 63
Software 64

Appendix: The Theory of Discounted Markovian Decision Processes 65

A.1 Contractions and Banach’s fixed-point theorem 65
A.2 Application to MDPs 69

Bibliography 73
Author's Biography 89

专知便捷查看

便捷下载，请关注专知公众号（点击上方蓝色专知关注）
后台回复“RL98” 就可以获取《【经典书】强化学习算法，98页pdf》专知下载链接

专知，专业可信的人工智能知识分发，让认知协作更快更好！欢迎注册登录专知www.zhuanzhi.ai，获取5000+AI主题干货知识资料！

欢迎微信扫一扫加入专知人工智能知识星球群，获取最新AI专业干货知识教程资料和与专家交流咨询！

点击“阅读原文”，了解使用专知，查看获取5000+AI主题知识资源