Abstract
Collaborative privacy preserving planning (CPPP) has gained much attention in the past decade. CPPP aims to create solutions for multi-agent planning problems in which cooperation is required to achieve an efficient solution, without exposing information that an agent considers private in the process. To date, CPPP has focused on domains with deterministic action effects. However, in real-world problems action effects are often non-deterministic, and actions can have multiple possible effects with varying probabilities. In this paper, we introduce stochastic CPPP (SCPPP), an extension of CPPP to domains with stochastic action effects. We show how SCPPP can be modeled as a Markov decision process (MDP) and how the value-iteration algorithm can be adapted to solve it. This adaptation requires extending value-iteration to support multiple agents and privacy. We then present two adaptations of the real-time dynamic programming (RTDP) algorithm, a popular algorithm for solving MDPs, designed to solve SCPPP problems. The first RTDP adaptation, called distributed RTDP (DRTDP), yields behavior identical to applying RTDP in a centralized manner to the joint problem. To preserve privacy, DRTDP uses a message-passing mechanism adopted from the MAFS algorithm. The second RTDP adaptation is an approximation of DRTDP called public synchronization RTDP (PS-RTDP). PS-RTDP differs from DRTDP mainly in its message-passing mechanism, sending significantly fewer messages than DRTDP. We experimented on domains adapted from the deterministic CPPP literature by adding different stochastic effects to different actions. The results show that PS-RTDP can reduce the number of messages compared to DRTDP by orders of magnitude, thus improving run-time, while producing policies with similar expected costs.
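To ground the algorithms named above, the following sketch shows standard single-agent RTDP on a small goal-oriented MDP. This is the centralized procedure whose behavior DRTDP reproduces in a distributed, privacy-preserving manner via MAFS-style message passing, and which PS-RTDP approximates with fewer messages. The MDP class, the rtdp function, and the toy two-state domain are assumptions made for this illustration only; they are not the paper's implementation.

```python
# A minimal sketch of standard single-agent RTDP (Barto et al., 1995) on a
# goal-oriented MDP -- the centralized baseline that DRTDP and PS-RTDP extend.
# The MDP class, rtdp function, and toy domain below are illustrative
# assumptions for this sketch, not the paper's implementation.
import random

class MDP:
    """A tiny goal-oriented MDP with unit action costs."""
    def __init__(self, transitions, goal):
        # transitions[state][action] = list of (probability, next_state)
        self.transitions = transitions
        self.goal = goal

def rtdp(mdp, start, trials=1000, horizon=100):
    """Run RTDP trials from `start`; return the learned value function V."""
    V = {}  # implicit zero heuristic (admissible for nonnegative costs)
    for _ in range(trials):
        s = start
        for _ in range(horizon):
            if s == mdp.goal:
                break
            # Greedy Bellman backup: Q(s,a) = cost(a) + sum_s' P(s'|s,a) * V(s')
            q = {a: 1.0 + sum(p * V.get(s2, 0.0) for p, s2 in outcomes)
                 for a, outcomes in mdp.transitions[s].items()}
            best = min(q, key=q.get)
            V[s] = q[best]
            # Sample the next state from the chosen action's outcome distribution
            r, acc = random.random(), 0.0
            for p, s2 in mdp.transitions[s][best]:
                acc += p
                if r <= acc:
                    s = s2
                    break
    return V

# Two-step chain where action 'go' succeeds with probability 0.8:
mdp = MDP({'s0': {'go': [(0.8, 's1'), (0.2, 's0')]},
           's1': {'go': [(0.8, 'g'), (0.2, 's1')]}}, goal='g')
print(rtdp(mdp, 's0')['s0'])  # converges to ~2.5 expected steps to the goal
```

Each RTDP trial simulates greedy execution from the start state and performs Bellman backups only on visited states. In the paper's setting, each agent holds only its own projection of such a model; DRTDP coordinates the trials across agents with MAFS-style messages, while PS-RTDP, as its name suggests, synchronizes on public information to cut the message count.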
References
Hefner, T., Stern, R., & Shani, G. (2020). Privacy preserving planning in stochastic environments. In J. C. Beck, O. Buffet, J. Hoffmann, E. Karpas, & S. Sohrabi, (Eds.). Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling, Nancy, France, October 26–30 (pp. 258–262). AAAI Press. https://aaai.org/ojs/index.php/ICAPS/article/view/6669
Brafman, R. I., & Domshlak, C. (2008). From one to many: Planning for loosely coupled multi-agent systems. In ICAPS (pp. 28–35).
Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.
Mausam, & Kolobov, A. (2012). Planning with Markov decision processes: An AI perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1–210.
Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1–2), 81–138.
Nissim, R., & Brafman, R. I. (2014). Distributed heuristic forward search for multi-agent planning. Journal of Artificial Intelligence Research, 51, 293–332.
Štolba, M., Komenda, A., & Kovacs, D. L. (2015). Competition of distributed and multiagent planners (CoDMAP). In The International Planning Competition (WIPC-15) (p. 24).
Brafman, R. I., & Domshlak, C. (2013). On the complexity of planning for agent teams and its implications for single agent planning. Artificial Intelligence, 198, 52–71.
Brafman, R. I. (2015). A privacy preserving algorithm for multi-agent planning and search. In IJCAI (pp. 1530–1536).
Tožička, J., Štolba, M., & Komenda, A. (2017). The limits of strong privacy preserving multi-agent planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 27).
Maliah, S., Shani, G., & Stern, R. (2018). Action dependencies in privacy-preserving multi-agent planning. Autonomous Agents and Multi-Agent Systems, 32(6), 779–821.
Maliah, S., Shani, G., & Brafman, R. I. (2016a). Online macro generation for privacy preserving planning. In Twenty-Sixth International Conference on Automated Planning and Scheduling.
Wu, F., Zilberstein, S., & Chen, X. (2018). Privacy-preserving policy iteration for decentralized POMDPs. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), New Orleans, USA (pp. 4759–4766).
Štolba, M., Tožička, J., & Komenda, A. (2018). Quantifying privacy leakage in multi-agent planning. ACM Transactions on Internet Technology, 18(3), 28:1–28:21.
Štolba, M., Fišer, D., & Komenda, A. (2019). Privacy leakage of search-based multi-agent planning algorithms. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 482–490).
Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored MDPs. In IJCAI (Vol. 17, pp. 673–680).
Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6, 679–684.
Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. Wiley.
Trevizan, F. W., Teichteil-Königsbuch, F., & Thiébaux, S. (2017). Efficient solutions for stochastic shortest path problems with dead ends. In UAI.
Du, W., & Zhan, Z. (2002). A practical approach to solve secure multi-party computation problems. In Proceedings of the 2002 Workshop on New Security Paradigms (pp. 127–135).
Bertsekas, D. (1982). Distributed dynamic programming. IEEE Transactions on Automatic Control, 27(3), 610–616.
Witzig, J., Beckenbach, I., Eifler, L., Fackeldey, K., Gleixner, A., Grever, A., & Weber, M. (2018). Mixed-integer programming for cycle detection in nonreversible Markov processes. Multiscale Modeling & Simulation, 16(1), 248–265.
Ameloot, T. J., & Van den Bussche, J. (2015). On the convergence of cycle detection for navigational reinforcement learning. arXiv:1511.08724
Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.
Bertsekas, D. P., & Tsitsiklis, J. N. (1991). An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3), 580–595.
Littman, M. L., Dean, T. L., & Kaelbling, L. P. (2013). On the complexity of solving Markov decision problems. arXiv:1302.4971
Štolba, M., & Komenda, A. (2017). The MADLA planner: Multi-agent planning by combination of distributed and local heuristic search. Artificial Intelligence, 252, 175–210.
Maliah, S., Shani, G., & Stern, R. (2016b). Stronger privacy preserving projections for multi-agent planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) (Vol. 26, pp. 221–229).
Maliah, S., Shani, G., & Stern, R. (2014). Privacy preserving landmark detection. In the European Conference on Artificial Intelligence (ECAI) (pp. 597–602).
Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T. L., & Boutilier, C. (2013). Hierarchical solution of Markov decision processes using macro-actions. arXiv:1301.7381
Gerevini, A. E., Lipovetzky, N., Percassi, F., Saetti, A., & Serina, I. (2019). Best-first width search for multi agent privacy-preserving planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 163–171).
Gohari, P., Hale, M., & Topcu, U. (2020). Privacy-preserving policy synthesis in Markov decision processes. In 2020 59th IEEE Conference on Decision and Control (CDC) (pp. 6266–6271). IEEE.
Venkitasubramaniam, P. (2013). Privacy in stochastic control: A Markov decision process perspective. In 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 381–388). IEEE.
Boutilier, C. (1999). Sequential optimality and coordination in multiagent systems. In IJCAI (Vol. 99, pp. 478–485).
Amato, C., Chowdhary, G., Geramifard, A., Kemal Üre, N., & Kochenderfer, M. J. (2013). Decentralized control of partially observable Markov decision processes. In 52nd IEEE Conference on Decision and Control (pp. 2398–2405). IEEE.
Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In European Conference on Machine Learning (pp. 282–293). Springer.
McMahan, H. B., Likhachev, M., & Gordon, G. J. (2005). Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In Proceedings of the 22nd International Conference on Machine Learning (pp. 569–576). ACM.
Smith, T., & Simmons, R. (2006). Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. In AAAI (pp. 1227–1232).
Bonet, B., & Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1), 5–33.
Keller, T., & Eyerich, P. (2012). PROST: Probabilistic planning based on UCT. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 22).
Bonet, B., & Geffner, H. (2006). Learning depth-first search: A unified approach to heuristic search in deterministic and non-deterministic settings, and its application to MDPs. In ICAPS (Vol. 6, pp. 142–151).
Acknowledgements
This research was supported by ISF Grant #210/17 to Roni Stern and by ISF Grant #1210/18.
Parts of this paper appeared in [1].
Cite this article
Hefner, T., Shani, G. & Stern, R. Privacy preserving planning in multi-agent stochastic environments. Auton Agent Multi-Agent Syst 36, 22 (2022). https://doi.org/10.1007/s10458-022-09554-w