Integrating action knowledge and LLMs for task planning and situation handling in open worlds


Abstract

Task planning systems have been developed to help robots use human knowledge (about actions) to complete long-horizon tasks. Most of them have been developed for “closed worlds,” assuming the robot is provided with complete world knowledge. However, the real world is generally open, and robots frequently encounter unforeseen situations that can potentially break the planner’s completeness. Could we leverage recent advances in pre-trained Large Language Models (LLMs) to enable classical planning systems to deal with novel situations? This paper introduces a novel framework, called COWP, for open-world task planning and situation handling. COWP dynamically augments the robot’s action knowledge, including the preconditions and effects of actions, with task-oriented commonsense knowledge. COWP embraces the openness of LLMs and is grounded in specific domains via action knowledge. For systematic evaluations, we collected a dataset that includes 1085 execution-time situations. Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works. Experimental results show that our approach outperforms competitive baselines from the literature in the success rate of service tasks. Additionally, we have demonstrated COWP using a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/
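
To make the augmentation loop described above concrete, the following Python sketch illustrates one way the idea could work. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Action` class, the `query_llm` stub, and the `handle_situation` helper are all hypothetical names introduced here for exposition.

```python
# Minimal, hypothetical sketch of a COWP-style situation-handling loop:
# when execution fails, an LLM supplies task-oriented commonsense that is
# folded back into the failed action's preconditions before replanning.
# All names here are illustrative, not the paper's actual code.
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    preconditions: set = field(default_factory=set)
    effects: set = field(default_factory=set)


def query_llm(prompt: str) -> str:
    # Stand-in for a call to a pre-trained LLM; a real system would send
    # `prompt` to a model API and parse the response.
    return "cup is clean"


def handle_situation(action: Action, situation: str) -> Action:
    # Ask for the missing precondition in commonsense terms, then augment
    # the action knowledge so a classical planner can replan around it.
    answer = query_llm(
        f"A robot failed to '{action.name}' because {situation}. "
        "State the missing precondition in a few words."
    )
    action.preconditions.add(answer)
    return action


pour = Action("pour coffee into the cup", preconditions={"cup is empty"})
pour = handle_situation(pour, "the cup is dirty")
print(sorted(pour.preconditions))  # ['cup is clean', 'cup is empty']
```

The key design point, per the abstract, is that the LLM's open-ended output is grounded back into the robot's action knowledge (preconditions and effects), so the classical planner retains control of plan generation.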


Notes

  1. Intuitively, the solution from COWP goes beyond finding alternative objects when addressing unforeseen situations. COWP enables situation handling by manipulating the attributes of individual instances: for example, a “dirty cup” situation can be handled by running a dishwasher, where no second object is involved (see the sketch below).
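
As a hedged illustration of this footnote (the dictionary-based state representation and the `run_dishwasher` action below are assumptions for exposition, not the paper's formalism), situation handling can change an attribute of the same object instance rather than substituting a second object:

```python
# Hypothetical sketch of the footnote's point: the "dirty cup" situation is
# handled by changing an attribute of the same cup, not by finding another.
state = {"cup1": {"dirty": True, "location": "table"}}


def run_dishwasher(state: dict, obj: str) -> dict:
    # Recovery action: its effect flips the object's `dirty` attribute.
    state[obj]["dirty"] = False
    return state


state = run_dishwasher(state, "cup1")
assert state["cup1"]["dirty"] is False  # the same cup is now usable
```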


Acknowledgements

A portion of this work has taken place at the Autonomous Intelligent Robotics (AIR) Group, SUNY Binghamton. AIR research is supported in part by grants from the National Science Foundation (NRI-1925044), Ford Motor Company, OPPO, and SUNY Research Foundation.


Author information


Contributions

YD, XZ, SA, HY, AK, CE, and SZ contributed to the development of the initial ideas and methodology. YD, XZ, and SA contributed to implementing the methodology. YD, XZ, SA, and NC contributed to the experiments. YD, XZ, SA, HY, and SZ contributed to the analysis of the results. YD, XZ, SA, and SZ contributed to the manuscript writing. All authors reviewed and provided feedback on the manuscript.

Corresponding author

Correspondence to Yan Ding.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 78689 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ding, Y., Zhang, X., Amiri, S. et al. Integrating action knowledge and LLMs for task planning and situation handling in open worlds. Auton Robot 47, 981–997 (2023). https://doi.org/10.1007/s10514-023-10133-5
