Solving efficiently Decentralized MDPs with temporal and resource constraints | Autonomous Agents and Multi-Agent Systems


Abstract

Optimizing the operation of cooperative multi-agent systems that can deal with large and realistic problems has become an important focal area of research in the multi-agent community. In this paper, we first present a new model, the OC-DEC-MDP (Opportunity Cost Decentralized Markov Decision Process), that allows us to represent large multi-agent decision problems with temporal and precedence constraints. Then, we propose polynomial algorithms to efficiently solve problems formalized by OC-DEC-MDPs. The problems we deal with consist of a set of agents that have to execute a set of tasks in a cooperative way. The agents cannot communicate during task execution and they must respect resource and temporal constraints. Our approach is based on Decentralized Markov Decision Processes (DEC-MDPs) and uses the concept of opportunity cost borrowed from economics to obtain approximate control policies. Experimental results show that our approach produces good quality solutions for complex problems which are out of reach of existing approaches.
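The opportunity-cost idea sketched above — trading an agent's local expected reward against the cost its timing imposes on dependent agents — can be illustrated with a toy start-time decision. The sketch below is a minimal illustration under assumed names and numbers (`best_start_time`, `oc_per_step`, and the specific duration distribution are all hypothetical); it is not the paper's OC-DEC-MDP algorithm, only an instance of the general trade-off the abstract describes.

```python
def best_start_time(duration_probs, deadline, reward, oc_per_step):
    """Pick the start time that maximizes local expected reward minus an
    opportunity-cost penalty for finishing late.

    duration_probs: dict mapping task duration -> probability.
    deadline: latest time by which the task must finish to earn `reward`.
    oc_per_step: assumed per-time-step cost a late finish imposes on
                 successor agents (an illustrative stand-in for the
                 opportunity cost borrowed from economics).
    """
    best_start, best_value = None, float("-inf")
    for start in range(deadline):
        # Probability the task completes before its temporal deadline.
        p_success = sum(p for d, p in duration_probs.items()
                        if start + d <= deadline)
        # Expected finish time over the duration distribution.
        exp_finish = sum((start + d) * p for d, p in duration_probs.items())
        # Local expected reward, penalized by the cost delaying the
        # finish imposes on agents with precedence constraints on us.
        value = reward * p_success - oc_per_step * exp_finish
        if value > best_value:
            best_start, best_value = start, value
    return best_start, best_value

# Example: a task that takes 2 or 4 time units with equal probability,
# a deadline of 6, reward 10, and unit opportunity cost per step.
start, value = best_start_time({2: 0.5, 4: 0.5},
                               deadline=6, reward=10.0, oc_per_step=1.0)
```

Starting as early as possible wins here because every unit of expected delay both risks missing the deadline and, through the opportunity-cost term, erodes the value available to successor tasks.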



Author information

Corresponding author

Correspondence to Aurélie Beynier.


About this article

Cite this article

Beynier, A., & Mouaddib, A.-I. (2011). Solving efficiently Decentralized MDPs with temporal and resource constraints. Autonomous Agents and Multi-Agent Systems, 23, 486–539. https://doi.org/10.1007/s10458-010-9145-2
