Markov control processes with pathwise constraints | Mathematical Methods of Operations Research
Markov control processes with pathwise constraints

  • Original Article
Mathematical Methods of Operations Research

Abstract

This paper deals with discrete-time Markov control processes in Borel spaces, with unbounded rewards. The criterion to be optimized is a long-run sample-path (or pathwise) average reward subject to constraints on a long-run pathwise average cost. To study this pathwise problem, we give conditions for the existence of optimal policies for the problem with “expected” constraints. Moreover, we show that the expected case can be solved by means of a parametric family of optimality equations. These results are then extended to the problem with pathwise constraints.
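
To make the distinction between the two kinds of constraints concrete, the following is a minimal sketch of how the pathwise and expected problems are commonly written. The notation is assumed for illustration and is not taken from the article: x_t and a_t denote the state and action at time t, r and c the one-stage reward and cost, θ the constraint level, and P^π_x, E^π_x the probability and expectation induced by a policy π and initial state x.

```latex
% Illustrative notation only (assumed, not quoted from the paper):
%   x_t, a_t : state and action at time t
%   r, c     : one-stage reward and one-stage cost
%   \theta   : constraint level
%   P^{\pi}_{x}, E^{\pi}_{x} : law and expectation under policy \pi, initial state x
\begin{align*}
  S_n(r) &:= \sum_{t=0}^{n-1} r(x_t,a_t), \qquad
  S_n(c) := \sum_{t=0}^{n-1} c(x_t,a_t), \\
  \text{(pathwise)}\quad
    & \text{maximize } \liminf_{n\to\infty} \tfrac{1}{n}\, S_n(r)
      \quad\text{subject to}\quad
      \limsup_{n\to\infty} \tfrac{1}{n}\, S_n(c) \le \theta
      \quad P^{\pi}_{x}\text{-a.s.}, \\
  \text{(expected)}\quad
    & \text{maximize } \liminf_{n\to\infty} \tfrac{1}{n}\, E^{\pi}_{x}\bigl[S_n(r)\bigr]
      \quad\text{subject to}\quad
      \limsup_{n\to\infty} \tfrac{1}{n}\, E^{\pi}_{x}\bigl[S_n(c)\bigr] \le \theta .
\end{align*}
```

In the pathwise version the averages are random variables, so both the constraint and the comparison between policies are understood almost surely; in the expected version they are ordinary numbers. The exact definitions, including the choice of lim inf or lim sup, may differ in the article itself.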

Corresponding author

Correspondence to Onésimo Hernández-Lerma.

About this article

Cite this article

Mendoza-Pérez, A.F., Hernández-Lerma, O. Markov control processes with pathwise constraints. Math Meth Oper Res 71, 477–502 (2010). https://doi.org/10.1007/s00186-010-0311-8

  • DOI: https://doi.org/10.1007/s00186-010-0311-8
