Dynamic-Optimization-Course

# Dynamic programming and Optimal Control

Course Information

Schedule: Fall 2023, Thursdays 9:00am - 12:15pm Location: 430 Kravis Hall

Professor: Daniel Russo Office hours: 12:15-1:00 PM or by appointment

TA: David Cheikhi

Course Number: B9120-001

Course description:

This course serves as an advanced introduction to dynamic programming and optimal control. The first part of the course will cover problem formulation and problem specific solution ideas arising in canonical control problems. The second part of the course covers algorithms, treating foundations of approximate dynamic programming and reinforcement learning alongside exact dynamic programming algorithms.

Prerequisites

Markov chains; Some python and some linear programming; mathematical maturity (this is a doctoral course);

Textbooks

Dynamic Programming and Optimal Control by Dimitris Bertsekas, 4th Edition, Volumes I and II.

Class notes (and tentative schedule)

#	Date	Topic	Notes	Reading
		Part I: Problem formulation and special problem structure
1	9/7	Intro	pdf	Vol 1, Sec 1.2-1.4
2	9/7	Optimal stopping	pdf	Vol 1, Sec 3.4
3	9/14	Inventory control	pdf	Vol 1, Sec 3.2
4	9/14	Linear quadratic control	pdf	Vol 1, Sec 3.1
5	9/21	Imperfect state information; reformulation as a problem with perfect state information; separation principle	pdf	Vol 1, Sec 4.1-4.3
6	9/28	Discounted infinite horizon objectives.	pdf	Vol 2 Sec 1.4
7	10/5	Continuous time, discrete event problems; uniformization	pdf	Vol 2 Sec 1.5
8	10/5	Gittins index theorem; Applications to scheduling and optimal learning;	pdf1, pdf2, pdf3
9	10/12	Gittins index theorem continued
	10/19	No B-school classes scheduled
		Part 2: Algorithms and approximations
10	10/26	Infinite (or indefinite) horizon problems without discounting: average cost objectives & stochastic shortest path problems.	pdf
11	11/2	Online value iteration.	pdf
12	11/9	Approximate value iteration.	pdf
12	11/9	Approximation benefits of online value iteration.	pdf
13	11/16	Policy iteration; impact of distribution shift on convergence rate.	pdf
	11/23	No B-school classes scheduled
14	11/30	Policy gradient methods	pdf
15	Dec 7

Topics for part 2 are yet to be determined. Some possibilities include:

Temporal difference learning and parametric value function approximation.
Policy gradient methods and their global convergence; differentiable simulators;
Efficient exploration; optimism in the face of uncertainty
Problems with many “weakly coupled” components; Lagrangian relaxations

Evaluation

Homework (90%). Lecture scribing and class participation (10%).

Homework

We will have a short homework each week. Please write down a precise, rigorous, formulation of all word problems. For example, specify the state space, the cost functions at each state, etc.

Due September 14. Vol 1. Problems 1.23, 1.24, 3.18. Scan
Due September 21. pdf
Due September 29. see canvas annoucement
Due October 5. pdf
Due October 12. pdf
Due October 26. Vol 2. Problem 1.8. You may assume that the state space of each bandit process is finite.
Due November 2. pdf
Due November 17. pdf
Due December 9. pdf