## Dan Russo – Teaching

## OPNS 525: Learning in Sequential Decision Making

## Course Overview

This course offers an advanced introduction to topics at the intersection of statistical (machine) learning and sequential decision making. A tentative course plan is as follows. We will begin by covering classic work on optimal hypothesis testing when data can be gathered sequentially and interactively. The second part of the class focuses on bandit learning and the design and analysis of algorithms that balance exploration and exploitation. The last part of the course introduces reinforcement learning, including methods for value function approximation and algorithms for efficient exploration. Students should have experience with mathematical proofs, coding for numerical computation, and the basics of statistics, optimization, dynamic programming, and stochastic processes.

## Course Logistics

Tuesday/Thursday 2:00-3:30 PM

## Tentative Outline

**Sequential and Active Hypothesis Testing**
- Wald's sequential probability ratio test and optimal stopping
- Chernoff's optimal sequential design of experiments for hypothesis testing
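As a rough illustration of the first topic (a sketch of my own, not course material), Wald's sequential probability ratio test accumulates the log-likelihood ratio one observation at a time and stops as soon as it crosses one of two thresholds; the thresholds below use Wald's standard approximations in terms of the target error rates, and the function name and interface are assumptions for this sketch:

```python
import math

def sprt_bernoulli(p0, p1, alpha, beta, sample, max_steps=10_000):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on Bernoulli data.

    `sample()` draws one observation in {0, 1}. Stopping thresholds use
    Wald's approximations log(beta/(1-alpha)) and log((1-beta)/alpha).
    """
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    llr = 0.0                              # running log-likelihood ratio
    for n in range(1, max_steps + 1):
        x = sample()
        # Add log f1(x)/f0(x) for one Bernoulli observation.
        llr += math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", max_steps

# Deterministic check: with all-ones data the LLR climbs by log(1.4)
# per step and crosses log(19) at the 9th observation.
decision, n = sprt_bernoulli(0.5, 0.7, 0.05, 0.05, lambda: 1)
```

The key property, due to Wald, is that the stopping time adapts to the data: clear-cut samples end the test quickly, while ambiguous ones prompt further sampling.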
**Bandit Learning**
- Upper confidence bound algorithms
- Thompson sampling
- Regret analysis
- Applications to dynamic pricing and the shortest-path problem
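To give a flavor of the bandit topics above, here is a minimal sketch (mine, not course material) of the classic UCB1 index policy: pull each arm once, then repeatedly pull the arm maximizing its empirical mean plus an exploration bonus. The reward interface and parameter choices are assumptions for illustration:

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Run the UCB1 index policy on a set of stochastic arms.

    `reward_fns[i]()` returns a reward in [0, 1] for arm i. After an
    initialization round, the policy pulls the arm maximizing
    mean + sqrt(2 ln t / n_i), balancing exploration and exploitation.
    """
    random.seed(seed)
    k = len(reward_fns)
    counts = [0] * k          # pulls per arm
    means = [0.0] * k         # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fns[arm]()
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return counts, total

# Two Bernoulli arms with success probabilities 0.3 and 0.6.
arms = [lambda: float(random.random() < 0.3),
        lambda: float(random.random() < 0.6)]
counts, total = ucb1(arms, horizon=2000)
```

The shrinking bonus term is what drives the logarithmic regret bounds studied in the regret-analysis unit: suboptimal arms are pulled only often enough to keep their confidence intervals from overlapping the best arm's.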
**Reinforcement Learning**
- Value function learning: least-squares value iteration, temporal differences, and Q-learning
- Parametric approximations to the value function
- The exploration problem in RL
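For the reinforcement learning unit, a minimal sketch of tabular Q-learning with epsilon-greedy exploration (again an illustration of my own, with a hypothetical `step(s, a)` environment interface and a toy chain MDP as the example):

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` returns (next_state, reward, done). Applies the update
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    random.seed(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps:
                a = random.randrange(n_actions)                   # explore
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])  # exploit
            s2, r, done = step(s, a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])                 # TD update
            s = s2
    return Q

# Toy chain MDP: states 0..3; action 1 moves right, action 0 stays.
# Reaching state 3 yields reward 1 and ends the episode.
def chain_step(s, a):
    s2 = min(s + 1, 3) if a == 1 else s
    done = s2 == 3
    return s2, float(done), done

Q = q_learning(n_states=4, n_actions=2, step=chain_step)
```

Epsilon-greedy is the simplest answer to the exploration problem listed above; the course's later material on efficient exploration studies why such undirected randomization can be exponentially wasteful and what replaces it.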
## Readings

## Sequential and Active Hypothesis Testing

## Bandit Learning

Thompson sampling: A tutorial (distributed via email)
## Lecture Notes