Operations Research Seminars Amsterdam

Herman Blok (Leiden University and Technical University Eindhoven)
Thursday, 14 April 2016

In this talk we discuss continuous-time Markov decision processes (MDPs). A powerful way to derive a policy that minimizes the discounted or average cost is via the value function and the optimality equation. If a process is uniformizable, the value function of the equivalent discrete-time MDP can be approximated via value iteration. The value iteration algorithm can also be used to prove structural properties of the value function, such as convexity, which in turn imply a particular structure of the optimal policy.
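As an illustration that is not taken from the talk, the following minimal sketch uniformizes a small bounded-rate MDP and runs value iteration for the discounted cost. The example model (a finite queue with two selectable service rates), all parameter values, and the function names `queue_generator`, `uniformize` and `value_iteration` are assumptions made for this sketch.

```python
import numpy as np

def queue_generator(N, lam, mu):
    """Generator of a queue on states 0..N with arrival rate lam and service rate mu."""
    Q = np.zeros((N + 1, N + 1))
    for x in range(N + 1):
        if x < N:
            Q[x, x + 1] = lam      # arrival (blocked in state N)
        if x > 0:
            Q[x, x - 1] = mu       # service completion
        Q[x, x] = -Q[x].sum()      # rows of a generator sum to zero
    return Q

def uniformize(Q, Lam):
    """Transition matrix of the uniformized chain: P = I + Q / Lam."""
    return np.eye(Q.shape[0]) + Q / Lam

def value_iteration(Qs, costs, alpha, tol=1e-10, max_iter=100_000):
    """Discounted-cost value iteration on the uniformized discrete-time MDP.

    Qs: one generator per action; costs: array (n_actions, n_states) of cost rates.
    """
    Lam = max(-Q.diagonal().min() for Q in Qs)   # bound on all exit rates
    Ps = [uniformize(Q, Lam) for Q in Qs]
    beta = Lam / (alpha + Lam)                   # discrete-time discount factor
    V = np.zeros(Qs[0].shape[0])
    for _ in range(max_iter):
        Qvals = np.array([c / (alpha + Lam) + beta * (P @ V)
                          for P, c in zip(Ps, costs)])
        V_new = Qvals.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Qvals.argmin(axis=0)
        V = V_new
    return V, Qvals.argmin(axis=0)

# Hypothetical example: slow (free) versus fast (costly) server, holding cost x.
N, lam, alpha = 20, 1.0, 0.1
Qs = [queue_generator(N, lam, 1.0), queue_generator(N, lam, 2.0)]
holding = np.arange(N + 1, dtype=float)
costs = np.array([holding + 0.0, holding + 3.0])
V, policy = value_iteration(Qs, costs, alpha)
```

The discount factor and the scaled cost follow from rewriting the continuous-time optimality equation alpha V = min_a [c_a + Q_a V] with Q_a = Lam (P_a - I). The step requires a finite bound Lam on the exit rates, which is what fails for unbounded-rate models.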

MDPs with unbounded rates do not allow uniformization, hence discrete-time tools are not directly available. One can apply a truncation to make the process uniformizable, but this raises two issues: (1) the truncated processes need not have the desired properties, due to boundary effects; (2) it is not guaranteed that the truncated processes approach the original model as the truncation size goes to infinity. We will discuss both issues; together, the results provide a framework for obtaining structural properties for non-uniformizable problems.

As a remedy for the first issue we propose the smoothed rate truncation, which preserves the desired structural properties and is less vulnerable to boundary effects. We present examples showing that the smoothed rate truncation can also have a dramatic advantage in gaining insight via numerical calculations. Even in a bounded-rate MDP with an infinite state space, one needs to truncate the state space to get numerical results. Standard truncation can give a completely false idea of the optimal policy, while the smoothed rate truncation preserves the right policy.
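The abstract does not spell out the construction, so the following is only a sketch of one possible reading: the rates of upward transitions are scaled down linearly in the state so that they vanish at the truncation level, instead of being cut off abruptly in the last state. The linear factor 1 - x/N and the function names `plain_truncation` and `smoothed_truncation` are assumptions made for illustration.

```python
import numpy as np

def plain_truncation(up_rates):
    """Standard truncation: keep all upward rates, forbid the jump out of state N."""
    rates = np.asarray(up_rates, dtype=float).copy()
    rates[-1] = 0.0
    return rates

def smoothed_truncation(up_rates):
    """Assumed smoothing: scale the upward rate in state x by (1 - x / N)."""
    rates = np.asarray(up_rates, dtype=float)
    N = len(rates) - 1
    x = np.arange(N + 1)
    return rates * (1.0 - x / N)

# Hypothetical example: constant upward rate 2.0 on states 0..5.
base = np.full(6, 2.0)
plain = plain_truncation(base)
smooth = smoothed_truncation(base)
```

Under this reading, the plain truncation changes the dynamics only in the boundary state, creating a sharp discontinuity there, while the smoothed version spreads the adjustment gradually over all states.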

For the second issue we present conditions on the Markov decision process that guarantee convergence. For the discounted cost criterion the conditions are rather mild and cover a large class of problems. The analysis of the average cost criterion requires stronger conditions.

Joint work with Floske Spieksma and Sandjai Bhulai