Kybernetika 57 no. 2, 272-294, 2021

Risk probability optimization problem for finite horizon continuous time Markov decision processes with loss rate

Haifeng Huo and Xian Wen

DOI: 10.14736/kyb-2021-2-0272

Abstract:

This paper studies risk probability optimality for finite horizon continuous-time Markov decision processes with loss rates and unbounded transition rates. Under a drift condition that is slightly weaker than the regular condition used in the existing literature on risk probability optimality for semi-Markov decision processes, we prove that the value function is the unique solution of the corresponding optimality equation, and we use an iteration technique to show that a risk probability optimal policy exists. Furthermore, we verify the imposed condition with two examples, a controlled birth-and-death system and a risk control problem, and show that a value iteration algorithm can be used to compute the value function and construct an optimal policy.
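To illustrate what a value iteration for a finite-horizon risk probability looks like, here is a minimal sketch. It is not the authors' algorithm: it is an Euler time-discretization on a small finite model, and it assumes bounded rates (the paper allows unbounded ones), nonnegative integer loss rates `c[x, a]`, a loss-level grid of step `h`, and a risk probability defined as the probability that the accumulated loss over [0, T] exceeds a level λ. The function name `risk_value_iteration` and all parameters are hypothetical. The state is augmented with the remaining loss level, which is why the resulting policy depends on both the state and the level.

```python
import numpy as np


def risk_value_iteration(q, c, T, h, lam_max):
    """Time-discretized value iteration for a finite-horizon risk probability.

    Hypothetical sketch (not the paper's algorithm). Assumptions:
      q[x, a, y]  transition rates of a finite CTMDP (each row sums to zero),
      c[x, a]     nonnegative integer loss rates,
      loss levels lie on the grid {0, h, 2h, ..., lam_max}.
    Returns (V, policy): V[x, l] approximates the minimal probability that the
    accumulated loss over [0, T] exceeds the level l*h when starting in x, and
    policy[k, x, l] is a minimizing action when k + 1 time steps remain.
    """
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=int)
    S, A, _ = q.shape
    n_steps = int(round(T / h))
    L = int(round(lam_max / h))
    if np.any(c < 0) or not np.allclose(q.sum(axis=2), 0.0):
        raise ValueError("need nonnegative loss rates and conservative rates")
    # Euler step: P[x, a, y] = 1{x == y} + h * q[x, a, y] must be a probability vector.
    if np.any(1.0 + h * np.einsum("xax->xa", q) < 0):
        raise ValueError("time step h is too large for these transition rates")
    P = np.eye(S)[:, None, :] + h * q

    V = np.zeros((S, L + 1))  # no time left: a nonnegative level cannot be exceeded
    policy = np.zeros((n_steps, S, L + 1), dtype=int)
    for k in range(n_steps):
        new_V = np.empty_like(V)
        for x in range(S):
            for l in range(L + 1):
                best_val, best_a = np.inf, 0
                for a in range(A):
                    # A step of length h uses c[x, a] * h of the level,
                    # i.e. c[x, a] grid cells.
                    l_next = l - c[x, a]
                    if l_next < 0:
                        val = 1.0  # the level is exceeded during this step
                    else:
                        val = P[x, a] @ V[:, l_next]
                    if val < best_val:
                        best_val, best_a = val, a
                new_V[x, l] = best_val
                policy[k, x, l] = best_a
        V = new_V
    return V, policy


if __name__ == "__main__":
    # Two hypothetical states: 0 = normal (no loss), 1 = degraded.
    # Action 0 = wait (loss rate 1), action 1 = repair (faster return, loss rate 2).
    q = np.array([
        [[-1.0, 1.0], [-1.0, 1.0]],  # from state 0 under actions 0 and 1
        [[0.5, -0.5], [3.0, -3.0]],  # from state 1 under actions 0 and 1
    ])
    c = np.array([[0, 0], [1, 2]])
    V, policy = risk_value_iteration(q, c, T=1.0, h=0.05, lam_max=1.0)
    print(V[1])           # risk from state 1 for levels 0, 0.05, ..., 1.0
    print(policy[-1, 1])  # actions chosen in state 1 at the start of the horizon
```

The triple loop keeps the recursion readable; it is a correct instance of backward value iteration on the level-augmented state, but it only approximates the continuous-time problem as `h` shrinks.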

Keywords:

optimal policy, continuous-time Markov decision processes, risk probability criterion, loss rate, finite horizon, unbounded transition rate

Classification:

90C40, 60E20

References:

  1. K. Boda, J. A. Filar and Y. L. Lin: Stochastic target hitting time and the problem of early retirement. IEEE Trans. Automat. Control 49 (2004), 409-419.   DOI:10.1109/TAC.2004.824469
  2. M. Bouakiz and Y. Kebir: Target-level criterion in Markov decision processes. J. Optim. Theory Appl. 86 (1995), 1-15.   DOI:10.1007/BF02193458
  3. D. Bertsekas and S. Shreve: Stochastic Optimal Control: The Discrete-Time Case. Academic Press Inc, New York 1978.
  4. N. Bäuerle and U. Rieder: Markov Decision Processes with Applications to Finance. Springer, Heidelberg 2011.
  5. E. Feinberg: Continuous time discounted jump Markov decision processes: a discrete-event approach. Math. Oper. Res. 29 (2004), 492-524.   DOI:10.1287/moor.1040.0089
  6. X. P. Guo and O. Hernández-Lerma: Continuous-Time Markov Decision Processes: Theory and Applications. Springer-Verlag, Berlin 2009.
  7. X. P. Guo and A. Piunovskiy: Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Oper. Res. 36 (2011), 105-132.   DOI:10.1287/moor.1100.0477
  8. X. P. Guo, X. X. Huang and Y. H. Huang: Finite-horizon optimality for continuous-time Markov decision processes with unbounded transition rates. Adv. Appl. Prob. 47 (2015), 1064-1087.   DOI:10.1239/aap/1449859800
  9. O. Hernández-Lerma and J. B. Lasserre: Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York 1996.
  10. Y. H. Huang and X. P. Guo: Optimal risk probability for first passage models in semi-Markov decision processes. J. Math. Anal. Appl. 359 (2009), 404-420.   DOI:10.1016/j.jmaa.2009.05.058
  11. Y. H. Huang and X. P. Guo: First passage models for denumerable semi-Markov processes with nonnegative discounted cost. Acta Math. Appl. Sinica 27 (2011), 177-190.   DOI:10.1007/s10255-011-0061-2
  12. Y. H. Huang, X. P. Guo and Z. F. Li: Minimum risk probability for finite horizon semi-Markov decision processes. J. Math. Anal. Appl. 402 (2013), 378-391.   DOI:10.1016/j.jmaa.2013.01.021
  13. X. X. Huang, X. L. Zou and X. P. Guo: A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci. China Math. 58 (2015), 1923-1938.   DOI:10.1007/s11425-015-5029-x
  14. H. F. Huo, X. L. Zou and X. P. Guo: The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dyn. Syst.: Theory Appl. 27 (2017), 675-699.   DOI:10.1007/s10626-017-0257-6
  15. H. F. Huo and X. Wen: First passage risk probability optimality for continuous time Markov decision processes. Kybernetika 55 (2019), 114-133.   DOI:10.14736/kyb-2019-1-0114
  16. H. F. Huo and X. P. Guo: Risk probability minimization problems for continuous time Markov decision processes on finite horizon. IEEE Trans. Automat. Control 65 (2020), 3199-3206.   DOI:10.1109/TAC.2019.2947654
  17. J. Jacod: Multivariate point processes: Predictable projection, Radon-Nikodym derivatives, representation of martingales. Z. Wahrscheinlichkeitstheorie und verwandte Gebiete 31 (1975), 235-253.   DOI:10.1007/BF00536010
  18. J. Janssen and R. Manca: Semi-Markov Risk Models for Finance, Insurance, and Reliability. Springer-Verlag, New York 2006.
  19. Q. L. Liu and X. L. Zou: A risk minimization problem for finite horizon semi-Markov decision processes with loss rates. J. Dynamics Games 5 (2018), 143-163.   DOI:10.3934/jdg.2018009
  20. A. Piunovskiy and Y. Zhang: Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J. Control Optim. 49 (2011), 2032-2061.   DOI:10.1137/10081366x
  21. Y. Ohtsubo and K. Toyonaga: Optimal policy for minimizing risk models in Markov decision processes. J. Math. Anal. Appl. 271 (2002), 66-81.   DOI:10.1016/S0022-247X(02)00097-5
  22. Y. Ohtsubo: Risk minimization in optimal stopping problem and applications. J. Oper. Res. Soc. Japan 46 (2003), 342-352.   DOI:10.15807/jorsj.46.342
  23. Y. Ohtsubo and K. Toyonaga: Equivalence classes for optimizing risk models in Markov decision processes. Math. Methods Oper. Res. 60 (2004), 239-250.   DOI:10.1007/s001860400361
  24. M. L. Puterman: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York 1994.
  25. M. Sakaguchi and Y. Ohtsubo: Optimal threshold probability and expectation in semi-Markov decision processes. Appl. Math. Comput. 216 (2010), 2947-2958.
  26. M. J. Sobel: The variance of discounted Markov decision processes. J. Appl. Probab. 19 (1982), 794-802.
  27. Q. D. Wei and X. P. Guo: Constrained semi-Markov decision processes with ratio and time expected average criteria in Polish spaces. Optimization 64 (2015), 1593-1623.   DOI:10.1080/02331934.2013.860686
  28. D. J. White: Minimizing a threshold probability in discounted Markov decision processes. J. Math. Anal. Appl. 173 (1993), 634-646.   DOI:10.1006/jmaa.1993.1093
  29. C. B. Wu and Y. L. Lin: Minimizing risk models in Markov decision processes with policies depending on target values. J. Math. Anal. Appl. 231 (1999), 47-67.   DOI:10.1006/jmaa.1998.6203
  30. R. Wu and K. Fang: A risk model with delay in claim settlement. Acta Math. Appl. Sinica 15 (1999), 352-360.   DOI:10.1007/BF02684035
  31. S. X. Yu, Y. L. Lin and P. F. Yan: Optimization models for the first arrival target distribution function in discrete time. J. Math. Anal. Appl. 225 (1998), 193-223.   DOI:10.1006/jmaa.1998.6015
  32. L. Xia: Optimization of Markov decision processes under the variance criterion. Automatica 73 (2016), 269-278.   DOI:10.1016/j.automatica.2016.06.018