Dan Russo – Teaching

OPNS 525: Learning in Sequential Decision Making

Course Overview

This course offers an advanced introduction to topics at the intersection of statistical (machine) learning and sequential decision making. A tentative course plan is as follows. We will begin with classic work on optimal hypothesis testing when data can be gathered sequentially and interactively. The second part of the course focuses on bandit learning and the design and analysis of algorithms that balance exploration and exploitation. The last part introduces reinforcement learning, including methods for value function approximation and algorithms for efficient exploration. Students should have experience with mathematical proofs and coding for numerical computation, as well as the basics of statistics, optimization, dynamic programming, and stochastic processes.

Course Logistics

Tuesday/Thursday 2:00-3:30 PM
Kellogg Global Hub 4302

Office Hours: Thursday 3:30-4:30 PM or by appointment
Office: Kellogg Global Hub 4171

Tentative Outline

  1. Sequential and Active Hypothesis Testing

    1. Wald's sequential probability ratio test and optimal stopping (sketched after this outline)

    2. Chernoff's optimal sequential design of experiments for hypothesis testing

  2. Bandit Learning

    1. Upper confidence bound algorithms (see the code sketch after this outline)

    2. Thompson sampling (see the code sketch after this outline)

    3. Regret analysis

    4. Applications to dynamic pricing and the shortest path problem

  3. Reinforcement Learning

    1. Value function learning: least-squares value iteration, temporal differences, and Q-learning (the Q-learning update appears after this outline)

    2. Parametric approximations to the value function

    3. The exploration problem in RL
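
To give a flavor of item 1.1, Wald's sequential probability ratio test can be stated in a few lines. In the sketch below, f_0 and f_1 are the densities of one observation under the two simple hypotheses, alpha and beta are the target type I and type II error probabilities, and the thresholds are Wald's classical approximations:

    \Lambda_n = \sum_{i=1}^{n} \log \frac{f_1(X_i)}{f_0(X_i)}, \qquad
    \text{continue sampling while } \log B < \Lambda_n < \log A;
    \text{stop and accept } H_1 \text{ if } \Lambda_n \ge \log A,
    \text{ or accept } H_0 \text{ if } \Lambda_n \le \log B,
    \text{where } A \approx \frac{1-\beta}{\alpha} \text{ and } B \approx \frac{\beta}{1-\alpha}.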
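
The following is a minimal code sketch of the UCB1 algorithm for a multi-armed bandit with rewards in [0, 1], illustrating item 2.1. It is a simplified illustration rather than the exact algorithm analyzed in lecture, and the environment interface pull is a hypothetical stand-in.

    import math
    import random

    def ucb1(pull, n_arms, horizon):
        # Minimal UCB1 sketch. `pull(a)` returns a random reward in
        # [0, 1] for arm `a` (hypothetical environment interface).
        counts = [0] * n_arms    # times each arm has been played
        means = [0.0] * n_arms   # empirical mean reward per arm
        total = 0.0
        for t in range(1, horizon + 1):
            if t <= n_arms:
                arm = t - 1      # initialize by playing each arm once
            else:
                # Optimism in the face of uncertainty: empirical mean
                # plus a confidence bonus that shrinks with more plays.
                arm = max(range(n_arms), key=lambda a: means[a]
                          + math.sqrt(2.0 * math.log(t) / counts[a]))
            r = pull(arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]  # running mean
            total += r
        return total / horizon

    # Hypothetical example: two Bernoulli arms with means 0.4 and 0.6.
    random.seed(0)
    probs = [0.4, 0.6]
    print(ucb1(lambda a: float(random.random() < probs[a]), n_arms=2, horizon=1000))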
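
In the same spirit, here is a minimal sketch of Thompson sampling for Bernoulli rewards with independent uniform (Beta(1, 1)) priors, as in item 2.2 and the tutorial listed under Readings. Again, pull is a hypothetical stand-in for the environment; the two-armed example above works here as well.

    import random

    def thompson_bernoulli(pull, n_arms, horizon):
        # Beta-Bernoulli Thompson sampling sketch. Each arm's mean has
        # a Beta(successes + 1, failures + 1) posterior; sample one
        # draw per arm and play the arm with the largest draw.
        succ = [0] * n_arms
        fail = [0] * n_arms
        total = 0
        for _ in range(horizon):
            draws = [random.betavariate(succ[a] + 1, fail[a] + 1)
                     for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: draws[a])
            r = pull(arm)        # expects a 0/1 reward
            succ[arm] += r
            fail[arm] += 1 - r
            total += r
        return total / horizon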
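
Finally, for item 3.1, the tabular Q-learning update is the prototype for the value function learning methods above: after a transition from state s to s' under action a with reward r, with step size \alpha and discount factor \gamma,

    Q(s, a) \leftarrow Q(s, a) + \alpha \big( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big).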

Readings

Sequential and Active Hypothesis Testing

Bandit Learning

  • Thompson sampling: A tutorial (distributed via email)

Lecture Notes

Lecture note template

  1. Lecture 1

  2. Lecture 4

  3. Gaussian UCB Regret Bound