Course Notes and Reading

Lectures

  1. Week 1: Course motivation and finite horizon MDPs. (Slides)

  2. Week 2: Infinite horizon and indefinite horizon MDPs. (Slides)

  3. Week 3: Algorithms for tractable state spaces. (Scribe notes)

  4. Week 4: Asynchronous DP, Real-Time DP, and Intro to RL. (Scribe notes)

  5. Week 5: Policy Evaluation Part 1. (Scribe notes)

  6. Week 6: Policy Evaluation Part 2. (Scribe notes)

  7. Week 7: Control with value function approximation. (Scribe notes)

  8. Week 8: Exploration in online optimization and Thompson sampling. (Slides) (Thompson sampling tutorial)

Readings

  1. Week 1: Bertsekas Volume 1, Sections 1.2-1.4 and Section 3.4 (asset selling example)

  2. Week 2: Bertsekas Volume 2, Sections 1.2 and 1.5. Chapter 3 covers indefinite horizon problems in much more depth than in class.

  3. Week 3: Bertsekas Volume 2, Sections 2.1-2.3. More detail on linear programming for MDPs can be found in Martin Puterman's book.