Dynamic Programming and Reinforcement Learning

Schedule: Fall 2017, Monday 1:00-4:00pm
Course Number: B9140-001
Instructor: Daniel Russo
Contact: djr2174@gsb.columbia.edu
Location: 330 Uris until October 16, Grace Dodge Hall 363 thereafter
Office Hours: 418 Uris, Thursday 2:00-3:00 PM
TA Info: Francisco Castro will hold office hours Mondays 12:00-1:00 in cubicle 4R, Uris Hall.

Course Description

This course offers an advanced introduction to Markov Decision Processes (MDPs)–a formalization of the problem of optimal sequential decision making under uncertainty–and Reinforcement Learning (RL)–a paradigm for learning from data to make near-optimal sequential decisions. The first part of the course will cover foundational material on MDPs. We'll then look at the problem of estimating long-run value from data, including popular RL algorithms like temporal difference learning and Q-learning. The final part of the course looks at the design and analysis of efficient exploration algorithms, i.e. those that intelligently probe the environment to collect data that improves decision quality. This is a doctoral-level course. Students should have experience with mathematical proofs, coding for numerical computation, and the basics of statistics, optimization, and stochastic processes.
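To give a concrete flavor of the algorithms mentioned above, here is a minimal sketch of tabular Q-learning on a hypothetical two-state, two-action MDP (the environment, hyperparameters, and function names are illustrative assumptions, not course material):

```python
import random

# Hypothetical toy MDP for illustration: two states {0, 1}, two actions {0, 1}.
# Action 0 stays put with reward 0; action 1 switches states, and leaving
# state 1 earns reward 1.
def step(state, action):
    if action == 0:
        return state, 0.0
    reward = 1.0 if state == 1 else 0.0
    return 1 - state, reward

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, horizon=20, seed=0):
    rng = random.Random(seed)
    # Tabular action-value estimates, initialized to zero.
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        s = rng.choice((0, 1))
        for _ in range(horizon):
            # Epsilon-greedy exploration: mostly act greedily, sometimes randomly.
            if rng.random() < eps:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            s2, r = step(s, a)
            # Q-learning update: bootstrap off the greedy value of the next state.
            target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
```

Under the discount factor gamma = 0.9, the learned values should favor action 1 in both states, matching the optimal policy of always switching; the fixed point here solves the Bellman optimality equations V(1) = 1 + gamma V(0) and V(0) = gamma V(1).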

Course Requirements

There will be some homework problems at the beginning of the course covering fundamental material on MDPs. Afterward, the course will run like a doctoral seminar. You will be expected to engage with the material and to read some papers outside of class. The main assignment will be a course project, which could involve a literature review, implementation of algorithms, or original research.

Textbooks

Strongly Recommended: Dynamic Programming and Optimal Control, Vol I & II, Dimitri Bertsekas
These two volumes will be our main reference on MDPs, and I will recommend some readings from them during the first few weeks. The books also cover a lot of material on approximate DP and reinforcement learning.

Reinforcement Learning: An Introduction, Second Edition, Richard Sutton and Andrew Barto
A PDF of the working draft is freely available.

Algorithms for Reinforcement Learning, Csaba Szepesvári
A concise treatment, also freely available.

Course Meetings

Following the business school calendar, there will be no class on October 23 or November 6. Otherwise, we will meet every Monday from September 11 to December 11.

Related Courses at Columbia

This course complements two others that will be offered this Fall. Depending on your interests, you may wish to also enroll in one of these courses, or even both.

  1. COMS E6998.001: Alekh Agarwal and Alex Slivkins from Microsoft Research will offer a course on Bandits and RL this Fall. Relative to this course, theirs will have a much greater focus on contextual bandit problems and regret analyses.

  2. B9120-001: Awi Federgruen is teaching a full-semester course on dynamic programming. This course should provide a more thorough treatment of the general theory of dynamic programming, as well as of structural analysis of specific dynamic programs arising in important application areas.