Zero-sum infinite-horizon discounted piecewise deterministic Markov games | Mathematical Methods of Operations Research Skip to main content
Log in

Zero-sum infinite-horizon discounted piecewise deterministic Markov games

  • Original Article
  • Published:
Mathematical Methods of Operations Research Aims and scope Submit manuscript

Abstract

This paper is devoted to zero-sum piecewise deterministic Markov games with Borel state and action spaces, where the expected infinite-horizon discounted payoff criterion is considered. Both the transition rate and payoff rate are allowed to be unbounded. The policies of the two players are history-dependent, and the controls continuously act on the transition rate and the payoff rate. Under suitable conditions, Dynkin’s formula and the comparison theorem are developed in our setup, via which the game is shown to have the value function as the unique solution to the associated Shapley equation. By the Shapley equation in the form of a differential equation, we establish the existence of a saddle point with a very simple form, which only depends on the current state and can be applied at any time. A potential algorithm for computing saddle points is proposed.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Avrachenkov K, Habachi O, Piunovskiy A, Zhang Y (2015) Infinite horizon impulsive optimal control with applications to Internet congestion control. Int J Control 88:703–716

    Article  MATH  Google Scholar 

  • Barron EN (2013) Game Theory: An Introduction. Wiley series in operations research and management science, 2nd edn. Wiley series in operations research and management scienceWiley series in operations research and management scienceWiley series in operations research and management scienceWiley series in operations research and management science. John Wiley and Sons Inc, Hoboken, NJ

    Book  Google Scholar 

  • Bäuerle N, Rieder U (2011) Markov Decision Processes with Applications to Finance. Chapter 8: piecewise deterministic Markov decision processes. Universitext, Springer, Heidelberg

  • Costa OLV, Dufour F, Piunovskiy AB (2016) Constrained and unconstrained optimal discounted control of piecewise deterministic Markov processes. SIAM J Control Optim 54:1444–1474

    Article  MathSciNet  MATH  Google Scholar 

  • Costa OLV, Dufour F (2018) Zero-sum discounted reward criterion games for piecewise deterministic Markov processes. Appl Math Optim 78:587–611

    Article  MathSciNet  MATH  Google Scholar 

  • Fan K (1953) Minimax theorems. Proc Natl Acad Sci USA 39:42–47

    Article  MathSciNet  MATH  Google Scholar 

  • Gensbittel F, Renault J (2015) The value of Markov chain games with incomplete information on both sides. Math Oper Res 40:820–841

    Article  MathSciNet  MATH  Google Scholar 

  • Ghosh MK, Goswami A (2006) Partially observable semi-Markov games with discounted payoff. Stoch Anal Appl 24:1035–1059

    Article  MathSciNet  MATH  Google Scholar 

  • Guo X, Zhang Y (2017) Zero-sum continuous-time Markov pure jump game over a fixed duration. J Math Anal Appl 452:1194–1208

    Article  MathSciNet  MATH  Google Scholar 

  • Guo XP, Hernández-Lerma O (2003) Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. J Appl Probab 40:327–345

    Article  MathSciNet  MATH  Google Scholar 

  • Guo XP, Hernández-Lerma O (2007) Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs. Adv Appl Probab 39:645–668

    Article  MathSciNet  MATH  Google Scholar 

  • Guo XP, Song XY (2011) Discounted continuous-time constrained Markov decision processes in Polish spaces. Ann Appl Probab 21:2016–2049

    Article  MathSciNet  MATH  Google Scholar 

  • Haurie A, Krawczyk JB, Zaccour G (2012) Games and Dynamic Games. World Scientific Now Publishers Series in Business, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ

  • Hernández-Lerma O, Lasserre JB (1999) Further topics on discrete-time Markov control processes. Springer-Verlag, New York

    Book  MATH  Google Scholar 

  • Huang YH, Guo XP (2019) Finite-horizon piecewise deterministic Markov decision processes with unbounded transition rates. Stochastics 91:67–95

    Article  MathSciNet  MATH  Google Scholar 

  • Jacod J (1975) Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Z.Wahrscheinlichkeitstheorie und Verw Gebiete 31:235–253

    Article  MathSciNet  MATH  Google Scholar 

  • Jaśkiewicz A (2009) Zero-sum ergodic semi-Markov games with weakly continuous transition probabilities. J Optim Theory Appl 141:321–347

    Article  MathSciNet  MATH  Google Scholar 

  • Lorenzo JM, Hernández-Noriega I, Prieto-Rumeau T (2015) Approximation of two-person zero-sum continuous-time Markov games with average payoff criterion. Oper Res Lett 43:110–116

    Article  MathSciNet  MATH  Google Scholar 

  • Minjárez-Sosa JA (2020) Zero-sum discrete-time Markov games with unknown disturbance distribution. Discounted and average criteria. SpringerBriefs in Probability and Mathematical Statistics. Springer, Cham

  • Mondal P (2017) On zero-sum two-person undiscounted semi-Markov games with a multichain structure. Adv Appl Probab 49:826–849

    Article  MathSciNet  MATH  Google Scholar 

  • Nowak AS (1984) On zero-sum stochastic games with general state space. I. Probab Math Stat 4:13–32

    MathSciNet  MATH  Google Scholar 

  • Piunovskiy A, Zhang Y (2021) Aggregated occupation measures and linear programming approach to constrained impulse control problems. J Math Anal Appl 499(2):125070

    Article  MathSciNet  MATH  Google Scholar 

  • Prieto-Rumeau T, Lorenzo JM (2015) Approximation of zero-sum continuous-time Markov games under the discounted payoff criterion. TOP 23:799–836

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (11931018), the National Key Research and Development Program of China (2022YFA1004600), the University of Macau (MYRG2019-00031-FBA), and Guangdong Basic and Applied Basic Research Foundation. The authors are grateful to the two reviewers and the associate editor for their careful reading and many constructive suggestions that have improved this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yonghui Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Proposition 1

Appendix: Proof of Proposition 1

In fact, we extend the previous results (Guo and Song 2011, Theorem 3.1) for CTMDPs and (Huang and Guo 2019, Proposition 3.1) for PDMDPs to the case of PDMGs. Thus, the proof of Proposition 1 proceeds in a similar way. Since Assumption 1 in this paper slightly differs from Assumption 3.1 in Huang and Guo (2019), the proofs are accordingly adjusted. First, we provide two lemmas.

Lemma 3

Let some function \(w \ge 0\) on E, and some signed kernel \({\bar{q}}(dy \vert x,u)\) on E given \(E \times R_+\) satisfy the following conditions: (i) \({\bar{q}}(E \vert x,u)=0\); \({\bar{q}}(D \vert x,u)\ge 0\) if \(\phi (x,u) \notin D\); \({\bar{q}}(x,u):={\bar{q}}(E\setminus \{\phi (x,u)\} \vert x,u)<\infty \); (ii) \(\int _{E} {\bar{q}}(dy \vert x,u)w(y)\le c w(\phi (x,u))+ d\), where \(u\ge 0\), \(c> 0\) and \(d \ge 0\) are some constants; (iii) \(w(\phi (x,t))\le w(x)\) for all \(x\in E\) and \(t \in R_+\). Then the function

$$\begin{aligned} \Psi (s, x):=e^{c s} w (x)+\frac{d}{c}(e^{c s}-1), \ \ s \in R_+, x \in E, \end{aligned}$$
(29)

satisfies the following inequality

$$\begin{aligned}{} & {} \int _0^s \int _{E\setminus \{\phi (x,u)\}} e^{-\int _0^u {\bar{q}}(x,v)dv}{\bar{q}}(dy \vert x,u)\Psi (s-u,y) du + e^{-\int _0^s {\bar{q}}(x,v)dv} w(\phi (x,s)) \\{} & {} \quad \le \Psi (s,x) \end{aligned}$$

for all \(s \in R_+\) and \(x \in E\).

Proof

Using the conditions in this lemma, it is easy to verify that

$$\begin{aligned}{} & {} \int _0^s \int _{E\setminus \{\phi (x,u)\}} e^{-\int _0^u {\bar{q}}(x,v)dv}{\bar{q}}(dy {\vert }x,u)\Big [ e^{c(s-u)}w(y)+\frac{d}{c}(e^{c(s-u)}-1)\Big ] du \\{} & {} \qquad + e^{-\int _0^s {\bar{q}}(x,v)dv} w(\phi (x,s))\\{} & {} \quad \le \int _0^s e^{-\int _0^u {\bar{q}}(x,v)dv}\Big [ e^{c(s-u)}\Big (c w (\phi (x,u))+ d + {\bar{q}}(x,u) w(\phi (x,u)) \Big ) \\{} & {} \qquad +\frac{b}{c}(e^{c(s-u)}-1){\bar{q}}(x,u)\Big ] du +e^{-\int _0^s {\bar{q}}(x,v)dv} w(\phi (x,s))\\{} & {} \quad \le \int _0^s e^{-\int _0^u {\bar{q}}(x,v)dv}\Big [ e^{c(s-u)}\Big (c w (x)+ d + {\bar{q}}(x,u) w(x) \Big )\\ {}{} & {} +\frac{d}{c}(e^{c(s-u)}-1){\bar{q}}(x,u)\Big ] du \\{} & {} \qquad +e^{-\int _0^s {\bar{q}}(x,v)dv} w(x)\\{} & {} \quad =\Big (w(x)+\frac{d}{c} \Big ) \int _0^s e^{-\int _0^u {\bar{q}}(x,v)dv} d \Big (- e^{c(s-u)}\Big ) \\{} & {} \qquad + w(x) \int _0^s e^{c(s-u)} d\Big (-e^{-\int _0^u {\bar{q}}(x,v)dv} \Big ) \\{} & {} \qquad + \frac{d}{c} \int _0^s \Big (e^{c(s-u)}-1 \Big ) d\Big ( -e^{-\int _0^u {\bar{q}}(x,v)dv} \Big ) + e^{-\int _0^s {\bar{q}}(x,v)dv} w(x)\\{} & {} \quad =\Big (w(x)+\frac{d}{c} \Big ) \Big [e^{c s}-e^{-\int _0^s {\bar{q}}(x,v)dv} - \int _0^s e^{c(s-u)} d \Big (- e^{-\int _0^u {\bar{q}}(x,v)dv} \Big )\Big ]\\{} & {} \qquad + \Big (w(x)+\frac{d}{c} \Big ) \int _0^s e^{c(s-u)} d\Big (- e^{-\int _0^u {\bar{q}}(x,v)dv} \Big ) + \frac{d}{c} \Big ( e^{-\int _0^s {\bar{q}}(x,v)dv}-1 \Big ) \\{} & {} \qquad + e^{-\int _0^s {\bar{q}}(x,v)dv} w(x) \\{} & {} \quad = e^{c s}w(x)+\frac{d}{c}(e^{c s}-1)\\{} & {} \quad =\Psi (s,x), \end{aligned}$$

where the first and second inequalities follow from conditions (ii) and (iii), respectively. \(\square \)

Lemma 4

Let Assumption 1(a) and Assumption 1(c) be fulfilled. Then, for each \( (\pi ^1,\pi ^2) \in \Pi ^1 \times \Pi ^2\), \(x\in E\), and \(m=0,1,2,\ldots \),

$$\begin{aligned} \mathbbm {E}_{x}^{\pi ^1,\pi ^2} \big [w_0(\xi _t) \mathbbm {1}_{\{t< T_{m+1}\}} \big ] \le e^{c_0 t}w_0(x) + \dfrac{d_0}{c_0}(e^{c_0 t} -1) ~ ~\forall t \in R_+. \end{aligned}$$

Proof

The proof is similar to that of (Huang and Guo 2019, Lemma A.2) for PDMDPs. We omit it to save space. \(\square \)

Proof of Proposition 1

Based on Lemma 4 and Assumption 1, the proof of Proposition 1 can be proceeded in a similar way to that of (Guo and Song 2011, Theorem 3.1) for CTMDPs and that of (Huang and Guo 2019, Proposition 3.1) for PDMDPs. However, we omit the proof to save space. \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Huang, Y., Lian, Z. & Guo, X. Zero-sum infinite-horizon discounted piecewise deterministic Markov games. Math Meth Oper Res 97, 179–205 (2023). https://doi.org/10.1007/s00186-023-00809-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00186-023-00809-0

Keywords

Mathematics Subject Classification

Navigation