Course Notes and Reading

Lectures

  1. Week 1: Course motivation and finite horizon MDPs. (Slides)

  2. Week 2: Infinite horizon and indefinite horizon MDPs. (Slides)

  3. Week 3: Algorithms for tractable state spaces. (Scribe notes)

  4. Week 4: Asynchronous DP, Real-Time DP, and Intro to RL. (Scribe notes)

  5. Week 5: Policy Evaluation Part 1. (Scribe notes)

  6. Week 6: Policy Evaluation Part 2. (Scribe notes)

  7. Week 7: Control with value function approximation. (Scribe notes)

  8. Week 8: Exploration in online optimization and Thompson sampling. (Slides) (Thompson sampling tutorial)

Readings

  1. Week 1: Bertsekas Volume 1, Sections 1.2-1.4 and Section 3.4 (asset selling example)

  2. Week 2: Bertsekas Volume 2, Sections 1.2 and 1.5. Chapter 3 covers indefinite horizon problems in much more depth than in class.

  3. Week 3: Bertsekas Volume 2, Sections 2.1-2.3. More detail on linear programming for MDPs can be found in Martin Puterman's book.