Abstract
Task planning systems have been developed to help robots use human knowledge (about actions) to complete long-horizon tasks. Most of them have been developed for “closed worlds” while assuming the robot is provided with complete world knowledge. However, the real world is generally open, and robots frequently encounter unforeseen situations that can potentially break the planner’s completeness. Could we leverage recent advances in pre-trained Large Language Models (LLMs) to enable classical planning systems to deal with novel situations? This paper introduces a novel framework, called COWP, for open-world task planning and situation handling. COWP dynamically augments the robot’s action knowledge, including the preconditions and effects of actions, with task-oriented commonsense knowledge. COWP embraces the openness of LLMs and is grounded in specific domains via action knowledge. For systematic evaluations, we collected a dataset that includes 1085 execution-time situations. Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works. Experimental results show that our approach outperforms competitive baselines from the literature in the success rate of service tasks. Additionally, we have demonstrated COWP using a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/
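To make the core mechanism concrete, here is a minimal sketch of how commonsense from an LLM could augment an action's preconditions at execution time. It illustrates the idea only and is not the authors' COWP implementation; the Action class, the query_llm callable, and the prompt wording are all hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    preconditions: set = field(default_factory=set)
    effects: set = field(default_factory=set)

def augment_with_commonsense(action, situation, query_llm):
    """Ask an LLM whether an execution-time situation blocks an action;
    if so, gate the action with an extra precondition so the planner
    must resolve the situation before using the action."""
    prompt = (
        f"A robot is about to execute '{action.name}'. "
        f"Situation: {situation}. Answer yes or no: "
        "does this situation prevent the action from succeeding?"
    )
    if query_llm(prompt).strip().lower().startswith("yes"):
        # Hypothetical precondition naming: the situation must be resolved.
        action.preconditions.add(f"resolved({situation})")
    return action

# Usage with a stubbed LLM; a real system would call a pre-trained model.
pour = Action("pour_coffee", preconditions={"holding(cup)"})
augment_with_commonsense(pour, "the cup is dirty", lambda p: "yes")
print(pour.preconditions)  # {'holding(cup)', 'resolved(the cup is dirty)'}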
Notes
Intuitively, COWP’s solutions go beyond finding alternative objects when addressing unforeseen situations. COWP enables situation handling by manipulating the attributes of individual instances. For example, a “dirty cup” situation can be handled by running a dishwasher, where no second object is involved.
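A toy sketch of this attribute-manipulation view (hypothetical, not the paper's algorithm): situation handling amounts to finding an action whose delete effects remove the fact that blocks the plan, rather than substituting another object.

def handle_situation(blocking_fact, actions):
    """Return the name of an action whose delete effects remove the
    blocking fact, if any such action is known."""
    for action in actions:
        if blocking_fact in action["deletes"]:
            return action["name"]
    return None  # no known recovery; fall back to replanning or asking for help

actions = [
    {"name": "run_dishwasher(cup1)", "deletes": {"dirty(cup1)"}},
    {"name": "fetch(cup2)", "deletes": set()},  # object substitution, by contrast
]
print(handle_situation("dirty(cup1)", actions))  # -> run_dishwasher(cup1)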
References
Ghallab, M., Howe, A., Knoblock, C., McDermott, D., Ram, A., Veloso, M., et al. (1998). PDDL – the planning domain definition language. Technical Report, Yale Center for Computational Vision and Control.
Amiri, S., Bajracharya, S., Goktolgal, C., Thomason, J., & Zhang, S. (2019). Augmenting knowledge through statistical, goal-oriented human-robot dialog. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 744–750). IEEE.
Brohan, A., Chebotar, Y., Finn, C., Hausman, K., Herzog, A., Ho, D., et al. (2023a). Do as I can, not as I say: Grounding language in robotic affordances. In: Conference on Robot Learning (pp. 287–318).
Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Chen, X., Choromanski, K., Ding, T., Driess, D., Dubey, A., Finn, C., et al. (2023b). RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. d. O., Kaplan, J., et al. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
Chernova, S., Chu, V., Daruna, A., Garrison, H., Hahn, M., Khante, P., et al. (2020). Situated Bayesian reasoning framework for robots operating in diverse everyday environments. In: Robotics Research (pp. 353–369). Springer.
Davis, E., & Marcus, G. (2015). Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM, 58(9), 92–103.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT (pp. 4171–4186).
Ding, Y., Zhang, X., Paxton, C., & Zhang, S. (2023). Task and Motion Planning with Large Language Models for Object Rearrangement. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Elsweiler, D., Hauptmann, H., & Trattner, C. (2022). Food recommender systems. In: Recommender Systems Handbook (pp. 871–925). Springer.
Galindo, C., Fernández-Madrigal, J. A., González, J., & Saffiotti, A. (2008). Robot task planning using semantic maps. Robotics and Autonomous Systems, 56(11), 955–966.
Gao, P., Han, J., Zhang, R., Lin, Z., Geng, S., Zhou, A., et al. (2023). LLaMA-Adapter V2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010.
Garrett, C. R., Lozano-Pérez, T., & Kaelbling, L. P. (2020). PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. Proceedings of the International Conference on Automated Planning and Scheduling, 30, 440–448.
Garrett, C. R., Chitnis, R., Holladay, R., Kim, B., Silver, T., Kaelbling, L. P., et al. (2021). Integrated task and motion planning. Annual Review of Control, Robotics, and Autonomous Systems, 4, 265–293.
Ghallab, M., Nau, D., & Traverso, P. (2016). Automated planning and acting. Cambridge University Press.
Google: Bard FAQ. Accessed April 7, 2023. https://bard.google.com/faq.
Hanheide, M., Göbelbecker, M., Horn, G. S., Pronobis, A., Sjöö, K., Aydemir, A., et al. (2017). Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence, 247, 119–150.
Haslum, P., Lipovetzky, N., Magazzeni, D., & Muise, C. (2019). An introduction to the planning domain definition language. Synthesis Lectures on Artificial Intelligence and Machine Learning, 13(2), 1–187.
Helmert, M. (2006). The fast downward planning system. Journal of Artificial Intelligence Research, 26, 191–246.
Hoffmann, J. (2001). FF: The fast-forward planning system. AI Magazine, 22(3), 57–62.
Huang, W., Abbeel, P., Pathak, D., & Mordatch, I. (2022). Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In: Thirty-Ninth International Conference on Machine Learning.
Huang, W., Xia, F., Shah, D., Driess, D., Zeng, A., Lu, Y., et al. (2023). Grounded decoding: Guiding text generation with grounded models for robot control. arXiv preprint arXiv:2303.00855.
Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., et al. (2022). Inner monologue: Embodied reasoning through planning with language models. In: 6th Annual Conference on Robot Learning.
Jiang, Y., Walker, N., Hart, J., & Stone, P. (2019). Open-world reasoning for service robots. In: Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 725–733).
Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., Kwon, Y., Michael, K., et al. (2022). ultralytics/yolov5: v7.0 - YOLOv5 SOTA realtime instance segmentation. Zenodo.
Kant, Y., Ramachandran, A., Yenamandra, S., Gilitschenski, I., Batra, D., Szot, A., et al. (2022). Housekeep: Tidying virtual households using commonsense reasoning. In: Computer Vision–ECCV 2022 (pp. 355–373). Springer.
Knoblock, C. A., Tenenberg, J. D., & Yang, Q. (1991). Characterizing abstraction hierarchies for planning. In: Proceedings of the Ninth National Conference on Artificial Intelligence (Vol. 2, pp. 692–697).
Li, S., Puig, X., Paxton, C., Du, Y., Wang, C., & Fan, L., et al. (2022). Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems.
Lin, K., Agia, C., Migimatsu, T., Pavone, M., & Bohg, J. (2023). Text2Motion: From natural language instructions to feasible plans. arXiv preprint arXiv:2303.12153.
Liu, B., Jiang, Y., Zhang, X., Liu, Q., Zhang, S., & Biswas, J., et al. (2023). LLM+P: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1–35.
Lo, S. Y., Zhang, S., & Stone, P. (2020). The PETLON algorithm to plan efficiently for task-level-optimal navigation. Journal of Artificial Intelligence Research, 69, 471–500.
Morrison, D., Corke, P., & Leitner, J. (2018). Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. In: Robotics: Science and Systems (RSS).
Nau, D. S., Au, T. C., Ilghami, O., Kuter, U., Murdock, J. W., Wu, D., et al. (2003). SHOP2: An HTN planning system. Journal of Artificial Intelligence Research, 20, 379–404.
OpenAI: ChatGPT. Accessed 2023-02-08. https://openai.com/blog/chatgpt/.
OpenAI: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
OpenAI: Models – OpenAI API. Accessed 2023-07-10. https://platform.openai.com/docs/models/overview.
Perera, V., Soetens, R., Kollar, T., Samadi, M., Sun, Y., Nardi, D., et al. (2015). Learning task knowledge from dialog and web access. Robotics, 4(2), 223–252.
Puig, X., Ra, K., Boben, M., Li, J., Wang, T., Fidler, S., et al. (2018). VirtualHome: Simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8494–8502).
Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., et al. (2009). ROS: An open-source Robot Operating System. In: ICRA Workshop on Open Source Software (Vol. 3, p. 5). Kobe, Japan.
Reiter, R. (1981). On closed world data bases. In: Readings in Artificial Intelligence (pp. 119–140). Elsevier.
Singh, I., Blukis, V., Mousavian, A., Goyal, A., Xu, D., Tremblay, J., et al. (2023). ProgPrompt: Generating situated robot task plans using large language models. In: International Conference on Robotics and Automation (ICRA).
Song, C. H., Wu, J., Washington, C., Sadler, B. M., Chao, W. L., & Su, Y. (2023). LLM-Planner: Few-shot grounded planning for embodied agents with large language models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
Tucker, M., Aksaray, D., Paul, R., Stein, G. J., & Roy, N. (2020). Learning unknown groundings for natural language interaction with mobile robots. In: Robotics Research (pp. 317–333). Springer.
Valmeekam, K., Olmo, A., Sreedharan, S., & Kambhampati, S. (2022). Large language models still can’t plan (a benchmark for LLMs on planning and reasoning about change). In: Foundation Models for Decision Making Workshop at Neural Information Processing Systems.
Valmeekam, K., Sreedharan, S., Marquez, M., Olmo, A., & Kambhampati, S. (2023). On the planning abilities of large language models (a critical investigation with a proposed benchmark). arXiv preprint arXiv:2302.06706.
Wang, C., Liu, P., & Zhang, Y. (2021). Can generative pre-trained language models serve as knowledge bases for closed-book QA? In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing; 3241–3251.
West, P., Bhagavatula, C., Hessel, J., Hwang, J. D., Jiang, L., Le Bras, R., et al. (2022). Symbolic knowledge distillation: From general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Xie, Y., Yu, C., Zhu, T., Bai, J., Gong, Z., & Soh, H. (2023). Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128.
Jiang, Y., Zhang, S., Khandelwal, P., & Stone, P. (2019). Task planning in robotics: An empirical comparison of PDDL- and ASP-based systems. Frontiers of Information Technology & Electronic Engineering, 20(3), 363–373.
Zhao, Z., Lee, W. S., & Hsu, D. (2023). Large language models as commonsense knowledge for large-scale task planning. In: RSS Workshop on Learning for Task and Motion Planning.
Zhang, X., Ding, Y., Amiri, S., Yang, H., Kaminski, A., Esselink, C., et al. (2023). Grounding classical task planners via vision-language models. In: ICRA Workshop on Robot Execution Failures and Failure Management Strategies.
Zhang, N., Li, L., Chen, X., Deng, S., Bi, Z., Tan, C., et al. (2021). Differentiable prompt makes pre-trained language models better few-shot learners. In: International Conference on Learning Representations.
Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., & Chen, S., et al. (2022). OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
Zhu, D., Chen, J., Shen, X., Li, X., & Elhoseiny, M. (2023). Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592.
Acknowledgements
A portion of this work has taken place at the Autonomous Intelligent Robotics (AIR) Group, SUNY Binghamton. AIR research is supported in part by grants from the National Science Foundation (NRI-1925044), Ford Motor Company, OPPO, and SUNY Research Foundation.
Author information
Contributions
YD, XZ, SA, HY, AK, CE, and SZ contributed to the development of the initial ideas and methodology. YD, XZ, and SA contributed to implementing the methodology. YD, XZ, SA, and NC contributed to the experiments. YD, XZ, SA, HY, and SZ contributed to the analysis of the results. YD, XZ, SA, and SZ contributed to the manuscript writing. All authors reviewed and provided feedback on the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (MP4 78,689 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ding, Y., Zhang, X., Amiri, S. et al. Integrating action knowledge and LLMs for task planning and situation handling in open worlds. Auton Robot 47, 981–997 (2023). https://doi.org/10.1007/s10514-023-10133-5