
The risk probability criterion for discounted continuous-time Markov decision processes


Abstract

In this paper, we study the risk probability minimization problem for infinite-horizon discounted continuous-time Markov decision processes (CTMDPs) with unbounded transition rates. First, we introduce a class of history-dependent policies augmented with reward levels. We then construct the corresponding probability spaces and establish the non-explosion of the state process. Second, under suitable conditions we use an iteration technique to prove that the value function solves the optimality equation for the probability criterion, and we derive a value iteration algorithm to compute (or at least approximate) the value function. Furthermore, under an additional condition we establish the uniqueness of the solution to the optimality equation and the existence of an optimal policy. Finally, we illustrate our results with two examples: the first verifies our conditions for CTMDPs with unbounded transition rates, and the second gives a numerical calculation of the value function and an optimal policy.
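The value iteration described in the abstract operates on states augmented with a reward level. As a rough illustration only, the sketch below runs an analogous recursion in a much simpler setting: a discrete-time, finite-state model with a discretized grid of reward levels, minimizing the probability that the total discounted reward falls at or below a target level. Every number here (transition kernel `P`, rewards `r`, discount `alpha`, target grid) is an invented assumption; this is not the paper's CTMDP model or algorithm.

```python
import numpy as np

# Illustrative sketch (all model data assumed, not from the paper).
# U[x, i] approximates  min over policies of  P( sum_t alpha^t r(x_t, a_t) <= lam_grid[i] )
# starting from state x. The recursion shifts the reward level by the
# one-step reward and rescales it by the discount factor.

alpha = 0.9
states, actions = 2, 2
P = np.array([[[0.7, 0.3], [0.4, 0.6]],     # P[a, x, y]: next-state law (assumed)
              [[0.2, 0.8], [0.5, 0.5]]])
r = np.array([[1.0, 0.5], [0.0, 2.0]])      # r[x, a]: one-step reward (assumed)
lam_grid = np.linspace(-5.0, 30.0, 701)     # grid of reward (target) levels

# Initial guess: probability is 1 iff the level is already nonnegative.
U = np.tile((lam_grid >= 0.0).astype(float), (states, 1))

for _ in range(200):                        # value iteration on the augmented model
    U_new = np.empty_like(U)
    for x in range(states):
        q = []
        for a in range(actions):
            shifted = (lam_grid - r[x, a]) / alpha   # next-stage reward level
            # Interpolate U(y, .) at the shifted levels; clamp to 0 / 1 off-grid.
            vals = sum(P[a, x, y] * np.interp(shifted, lam_grid, U[y],
                                              left=0.0, right=1.0)
                       for y in range(states))
            q.append(vals)
        U_new[x] = np.minimum.reduce(q)     # minimize the risk probability over actions
    if np.max(np.abs(U_new - U)) < 1e-10:
        U = U_new
        break
    U = U_new
```

With nonnegative rewards the resulting `U` is nondecreasing in the reward level and takes values in [0, 1]: at very low levels the risk probability is 0, and at levels above the largest achievable discounted reward it is 1. The continuous-time, unbounded-rate setting of the paper requires the measure-theoretic constructions developed there and is not captured by this finite surrogate.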



Author information

Corresponding author

Correspondence to Xianping Guo.


Cite this article

Huo, H., Zou, X. & Guo, X. The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dyn Syst 27, 675–699 (2017). https://doi.org/10.1007/s10626-017-0257-6
