This is a graduate course at the Hangzhou Institute for Advanced Study, UCAS.
Reinforcement learning is a learning method that closely mimics how children start learning: they try out several actions and see whether the result satisfies their goals. In this graduate course, students learn how to apply this idea in machine learning. The learning algorithm tries out actions and is “rewarded” depending on the result; by systematically exploring actions (and combinations of actions) and recording the outcomes, it learns over time to accumulate more and more reward. The course introduces several variants of this paradigm, each suited to different situations. Concretely, students learn about multi-armed bandits, the basic model of Markov chains, Monte Carlo methods, how to use these methods for policy evaluation and prediction, and finally approximate methods for large models.
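To make the trial-and-reward loop concrete, here is a minimal sketch of a multi-armed bandit solved with an epsilon-greedy rule. It is an illustration only, not taken from the course materials: the function name `run_epsilon_greedy`, the Bernoulli reward probabilities, and the parameter values are all assumptions chosen for this example.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule (illustrative sketch).

    reward_probs: hypothetical success probability of each arm (assumed values).
    Returns the estimated value of each arm, the pull counts, and the total reward.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms        # how often each arm was tried
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the best recorded result so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment "rewards" the action with 1 or 0.
        reward = 1 if rng.random() < reward_probs[arm] else 0

        # Record the result as an incremental average.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in estimates])
    print("pull counts:", counts)
    print("total reward:", total)
```

With a small `epsilon`, the learner spends most of its time on the arm that currently looks best while still occasionally trying the others, which is one simple way to trade off exploration against accumulating reward.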