
The risk probability criterion for discounted continuous-time Markov decision processes


Abstract

In this paper, we study the risk probability minimization problem for infinite-horizon discounted continuous-time Markov decision processes (CTMDPs) with unbounded transition rates. First, we introduce a class of history-dependent policies augmented with reward levels. We then construct the corresponding probability spaces and establish the non-explosion of the state process. Second, under suitable conditions we use an iteration technique to prove that the value function solves the optimality equation for the probability criterion, and we derive a value iteration algorithm to compute (or at least approximate) the value function. Furthermore, under an additional condition we establish the uniqueness of the solution to the optimality equation and the existence of an optimal policy. Finally, we illustrate our results with two examples: the first verifies our conditions for CTMDPs with unbounded transition rates, and the second gives a numerical calculation of the value function and an optimal policy.
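The value iteration described in the abstract operates on states augmented with a reward level. As a rough illustration only, the sketch below runs an analogous recursion in a much simpler setting: a discrete-time, finite-state model with a discretized grid of reward levels, minimizing the probability that the total discounted reward falls at or below a target level. Every number here (transition kernel `P`, rewards `r`, discount `alpha`, target grid) is an invented assumption; this is not the paper's CTMDP model or algorithm.

```python
import numpy as np

# Illustrative sketch (all model data assumed, not from the paper).
# U[x, i] approximates  min over policies of  P( sum_t alpha^t r(x_t, a_t) <= lam_grid[i] )
# starting from state x. The recursion shifts the reward level by the
# one-step reward and rescales it by the discount factor.

alpha = 0.9
states, actions = 2, 2
P = np.array([[[0.7, 0.3], [0.4, 0.6]],     # P[a, x, y]: next-state law (assumed)
              [[0.2, 0.8], [0.5, 0.5]]])
r = np.array([[1.0, 0.5], [0.0, 2.0]])      # r[x, a]: one-step reward (assumed)
lam_grid = np.linspace(-5.0, 30.0, 701)     # grid of reward (target) levels

# Initial guess: probability is 1 iff the level is already nonnegative.
U = np.tile((lam_grid >= 0.0).astype(float), (states, 1))

for _ in range(200):                        # value iteration on the augmented model
    U_new = np.empty_like(U)
    for x in range(states):
        q = []
        for a in range(actions):
            shifted = (lam_grid - r[x, a]) / alpha   # next-stage reward level
            # Interpolate U(y, .) at the shifted levels; clamp to 0 / 1 off-grid.
            vals = sum(P[a, x, y] * np.interp(shifted, lam_grid, U[y],
                                              left=0.0, right=1.0)
                       for y in range(states))
            q.append(vals)
        U_new[x] = np.minimum.reduce(q)     # minimize the risk probability over actions
    if np.max(np.abs(U_new - U)) < 1e-10:
        U = U_new
        break
    U = U_new
```

With nonnegative rewards the resulting `U` is nondecreasing in the reward level and takes values in [0, 1]: at very low levels the risk probability is 0, and at levels above the largest achievable discounted reward it is 1. The continuous-time, unbounded-rate setting of the paper requires the measure-theoretic constructions developed there and is not captured by this finite surrogate.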



Author information

Corresponding author

Correspondence to Xianping Guo.


Cite this article

Huo, H., Zou, X. & Guo, X. The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dyn Syst 27, 675–699 (2017). https://doi.org/10.1007/s10626-017-0257-6
