Abstract
This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the limit" one. Furthermore, we show that any countable- state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numeri- cally, and give the corresponding error bound of the approximation.
Similar content being viewed by others
References
Almudevar A. Approximate Iterative Algorithms. Leiden: CRC Press/Balkema, 2014
Altman E. Denumerable constrained Markov decision processes and finite approximations. Math Methods Oper Res, 1994, 19: 169–191
Altman E. Constrained Markov Decision Processes. Florida: Chapman & Hall/CRC, 1999
Alvarez-Mena J, Hernández-Lerma O. Convergence of the optimal values of constrained Markov control processes. Math Methods Oper Res, 2002, 55: 461–484
Feinberg E A. Constrained discounted Markov decision processes and Hamiltonian cycles. Math Oper Res, 2000, 25: 130–140
Feinberg E A, Shwartz A. Constrained dynamic programming with two discount factors: Applications and an algorithm. IEEE Trans Automat Control, 1999, 44: 628–631
González-Hernández J, Hernández-Lerma O. Extreme points of sets of randomized strategies in constrained optimiza-tion and control problems. SIAM J Optim, 2005, 15: 1085–1104
González-Hernández J, López-Martńez R R, Minjárez-Sosa J A. Approximation, estimation, and control of stochastic systems under a randomized discounted cost criterion. Kybernetika (Prague), 2009, 45: 737–754
Guo X P, Hernández-Lerma O. Continuous-Time Markov Decision Processes. Berlin-Heidelberg: Springer-Verlag, 2009
Guo X P, Piunovskiy A. Discounted continuous-time Markov decision processes with constraints: Unbounded transition and loss rates. Math Oper Res, 2011, 36: 105–132
Guo X P, Zhang W Z. Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. European J Oper Res, 2014, 238: 486–496
Hernández-Lerma O, González-Hernández J. Constrained Markov decision processes in Borel spaces: The discounted case. Math Methods Oper Res, 2000, 52: 271–285
Hernández-Lerma O, González-Hernández J, López-Martńez R R. Constrained average cost Markov control processes in Borel spaces. SIAM J Control Optim, 2003, 42: 442–468
Hernández-Lerma O, Lasserre J B. Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1996
Hernández-Lerma O, Lasserre J B. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1999
Hinderer K, Waldmann K-H. Algorithms for countable state Markov decision models with an absorbing set. SIAM J Control Optim, 2005, 43: 2109–2131
Huang X X, Zou X L, Guo X P. A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci China Math, 2015, 58: 1923–1938
Huang Y H, Wei Q D, Guo X P. Constrained Markov decision processes with first passage criteria. Ann Oper Res, 2013, 206: 197–219
Mao X, Piunovskiy A. Strategic measures in optimal control problems for stochastic sequences. Stoch Anal Appl, 2000, 18: 755–776
Piunovskiy A. Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic, 1997
Piunovskiy A. Controlled random sequences: The convex analytic approach and constrained problems. Russian Math Surveys, 2000, 53: 1233–1293
Prieto-Rumeau T, Hernández-Lerma O. Discounted continuous-time controlled Markov chains: Convergence of control models. J Appl Probab, 2012, 49: 1072–1090
Prokhorov Y. Convergence of random processes and limit theorems in probability theory. Theory Probab Appl, 1956, 1: 157–214
Puterman, M L. Markov Decision Processes. New York: Wiley, 1994
Saldi N, Linder T, Yuksel S. Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans Automat Control, 2015, 60: 553–558
Sennott L I. Constrained discounted Markov decision chains. Probab Engrg Inform Sci, 1991, 5: 463–475
Sennott L I. Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley, 1999
Wei Q D, Guo X P. Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett, 2011, 39: 369–374
Wu X, Guo X P. First passage optimality and variance minimization of Markov decision processes with varying discount factors. J Appl Probab, 2015, 52: 441–456
Wu X, Zhang J Y. Finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. Discrete Event Dyn Syst, 2016, 26: 669–683
Wu X, Zou X L, Guo X P. First passage Markov decision processes with constraints and varying discount factors. Front Math China, 2015, 10: 1005–1023
Zhang W Z, Guo X P. Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. Sci China Math, 2012, 55: 2405–2416
Zhang Y. Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP, 2013, 21: 378–408
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant Nos. 61374067 and 41271076).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Wu, X., Guo, X. Convergence of Markov decision processes with constraints and state-action dependent discount factors. Sci. China Math. 63, 167–182 (2020). https://doi.org/10.1007/s11425-017-9292-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11425-017-9292-1
Keywords
- discrete-time Markov decision processes
- state-action dependent discount factors
- unbounded costs
- convergence