Abstract
In this paper, we consider the risk probability minimization problem for infinite-horizon discounted continuous-time Markov decision processes (CTMDPs) with unbounded transition rates. First, we introduce a class of history-dependent policies that additionally depend on reward levels. We then construct the corresponding probability spaces and establish the non-explosion of the state process. Second, under suitable conditions we prove, via an iteration technique, that the value function is a solution to the optimality equation for the probability criterion, and we obtain a value iteration algorithm that computes (or at least approximates) the value function. Furthermore, under an additional condition we establish the uniqueness of the solution to the optimality equation and the existence of an optimal policy. Finally, we illustrate our results with two examples: the first verifies our conditions for CTMDPs with unbounded transition rates, and the second gives a numerical computation of the value function and an optimal policy.
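The level-augmentation idea mentioned in the abstract (policies that also depend on the reward level still needed to avoid the risk event) can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it uses a toy finite-horizon, discrete-time MDP with integer rewards, where the continuous-time, unbounded-rate setting of the paper is far more delicate. All model data (states, actions, transition probabilities, rewards, target) are hypothetical. It performs backward induction on the augmented state (stage, state, remaining level) to minimize the risk probability P(total reward < target).

```python
# Illustrative sketch only: risk probability minimization via backward
# induction on a level-augmented state space, for a toy finite-horizon,
# discrete-time MDP with integer rewards.  All model data are hypothetical.
from functools import lru_cache

STATES = [0, 1]
ACTIONS = [0, 1]
HORIZON = 4
TARGET = 5  # risk event: accumulated reward < TARGET

# P[(s, a)] -> list of (next_state, probability); r[(s, a)] -> integer reward
P = {
    (0, 0): [(0, 0.5), (1, 0.5)], (0, 1): [(0, 0.9), (1, 0.1)],
    (1, 0): [(0, 0.3), (1, 0.7)], (1, 1): [(0, 0.2), (1, 0.8)],
}
r = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 2}


@lru_cache(maxsize=None)
def risk(n, s, level):
    """Minimal risk probability from stage n, state s, when `level` units
    of reward are still needed to avoid the risk event."""
    if n == HORIZON:
        # Risk occurred iff the target was not reached (positive level left).
        return 1.0 if level > 0 else 0.0
    return min(
        sum(p * risk(n + 1, s2, level - r[(s, a)]) for s2, p in P[(s, a)])
        for a in ACTIONS
    )


def best_action(n, s, level):
    """An action attaining the minimum in the optimality equation above."""
    return min(ACTIONS, key=lambda a: sum(
        p * risk(n + 1, s2, level - r[(s, a)]) for s2, p in P[(s, a)]))


if __name__ == "__main__":
    print("minimal risk probability:", risk(0, 0, TARGET))
    print("optimal first action:", best_action(0, 0, TARGET))
```

Note that the optimal action depends on the remaining level, not only on the state: this is precisely why the policy class must be enlarged with reward levels, as the abstract describes.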
Cite this article
Huo, H., Zou, X. & Guo, X. The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dyn Syst 27, 675–699 (2017). https://doi.org/10.1007/s10626-017-0257-6