Abstract
This paper deals with discrete-time Markov control processes in Borel spaces, with unbounded rewards. The criterion to be optimized is a long-run sample-path (or pathwise) average reward subject to constraints on a long-run pathwise average cost. To study this pathwise problem, we give conditions for the existence of optimal policies for the problem with “expected” constraints. Moreover, we show that the expected case can be solved by means of a parametric family of optimality equations. These results are then extended to the problem with pathwise constraints.
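For orientation, the two constrained problems described above can be written out as follows. This is only a sketch in notation assumed here, not quoted from the paper: $r$ is the one-stage reward, $c$ the one-stage cost, $\theta$ the constraint constant, $(x_t, a_t)$ the state-action process, and $P^\pi_\nu$, $E^\pi_\nu$ the probability and expectation under a policy $\pi$ and initial distribution $\nu$. Taking $\liminf$ for rewards and $\limsup$ for costs is a common convention and is likewise an assumption.

```latex
% Pathwise (sample-path) long-run averages; both are random variables.
S_r(\pi,\nu) := \liminf_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} r(x_t,a_t),
\qquad
S_c(\pi,\nu) := \limsup_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} c(x_t,a_t).

% Problem with pathwise constraints.
\text{maximize } S_r(\pi,\nu)
\quad \text{subject to} \quad
S_c(\pi,\nu) \le \theta \quad P^\pi_\nu\text{-a.s.}

% Expected long-run averages; both are deterministic numbers.
J_r(\pi,\nu) := \liminf_{n\to\infty} \frac{1}{n}\,
  E^\pi_\nu\!\left[\sum_{t=0}^{n-1} r(x_t,a_t)\right],
\qquad
J_c(\pi,\nu) := \limsup_{n\to\infty} \frac{1}{n}\,
  E^\pi_\nu\!\left[\sum_{t=0}^{n-1} c(x_t,a_t)\right].

% Problem with "expected" constraints.
\text{maximize } J_r(\pi,\nu)
\quad \text{subject to} \quad
J_c(\pi,\nu) \le \theta.
```

Because $S_r$ is a random variable, "maximize" in the pathwise problem has to be understood in an almost-sure sense. The expected problem compares ordinary numbers, which is one reason it is the natural first step.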
Cite this article
Mendoza-Pérez, A.F., Hernández-Lerma, O. Markov control processes with pathwise constraints. Math Meth Oper Res 71, 477–502 (2010). https://doi.org/10.1007/s00186-010-0311-8
DOI: https://doi.org/10.1007/s00186-010-0311-8
Keywords
- (discrete-time) Markov control processes
- Average reward criteria
- Pathwise average reward
- Constrained control problems