Abstract
As reinforcement learning (RL) continues to improve and is increasingly applied in settings alongside humans, the need to explain the learned behaviors of RL agents to end-users becomes more important. Strategies for explaining the reasoning behind an agent’s policy, called policy-level explanations, can lead to important insights about both the task and the agent’s behaviors. Following this line of research, we propose a novel approach, named CAPS, that summarizes an agent’s policy in the form of a directed graph with natural language descriptions. A decision-tree-based clustering method is used to abstract the state space of the task into fewer, condensed states, which makes the policy graph more digestible to end-users. We then use user-defined predicates to enrich the abstract states with semantic meaning. To introduce counterfactual state explanations into the policy graph, we first identify the critical states in the graph and then develop a novel counterfactual explanation method based on action perturbation in those critical states. We generate explanation graphs using CAPS on five RL tasks, using both deterministic and stochastic policies. We also evaluate the effectiveness of CAPS in two user studies with human participants who are not RL experts. When provided with our explanation graph, end-users are able to accurately interpret the policies of trained RL agents 80% of the time, compared to 10% with the next best baseline, and 68.2% of users reported increased confidence in understanding an agent’s behavior after being shown the counterfactual explanations.
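The pipeline summarized above can be pictured with a short, hypothetical Python sketch. This is a minimal illustration only, assuming toy trajectory data, a scikit-learn decision tree standing in for the CLTree-style clustering used in the paper, and made-up names (describe, action_entropy, etc.); it is not the authors' implementation.

# Hypothetical sketch of a CAPS-style pipeline; names and data are illustrative.
import numpy as np
import networkx as nx
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy trajectory data: (state, action) pairs collected from a trained policy.
states = rng.normal(size=(500, 4))            # 4-dimensional state features
actions = (states[:, 0] > 0).astype(int)      # stand-in for the agent's actions

# 1) Abstract the state space: a shallow decision tree groups states that map
#    to the same action; each leaf becomes an abstract state.
tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
leaves = tree.apply(states)                   # leaf index per concrete state

# 2) Attach semantic labels via user-defined predicates (illustrative predicate).
def describe(state):
    return "feature_0 high" if state[0] > 0 else "feature_0 low"

labels = {leaf: describe(states[leaves == leaf].mean(axis=0))
          for leaf in np.unique(leaves)}

# 3) Build the policy graph: nodes are abstract states, edges follow the
#    temporal order of the trajectory.
graph = nx.DiGraph()
for src, dst in zip(leaves[:-1], leaves[1:]):
    if src != dst:
        graph.add_edge(labels[src], labels[dst])

# 4) Flag "critical" abstract states, here the ones where the policy is most
#    decisive (lowest action entropy), as candidates for counterfactual
#    explanations generated by perturbing the chosen action.
def action_entropy(leaf):
    p = np.bincount(actions[leaves == leaf], minlength=2) / (leaves == leaf).sum()
    return -(p[p > 0] * np.log(p[p > 0])).sum()

critical = sorted(np.unique(leaves), key=action_entropy)[:2]
print("abstract states:", sorted(set(labels.values())))
print("critical states:", [labels[leaf] for leaf in critical])

In the actual method, the abstract states come from decision-tree clustering over agent trajectories, the labels come from user-defined predicates, and criticality is derived from the trained agent rather than this toy entropy heuristic; the sketch only mirrors the overall flow.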
Data availability
Our code is available for reproducibility at https://github.com/mccajl/CAPS.
Notes
The OpenAI Gym implementation of this game is available at https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py.
In our study, 74.5% of participants self-evaluated as having either no knowledge of RL at all or having heard of it without knowing any technical details.
Funding
This work was supported in part by NSF awards #1950491, #1909702, and #2105007.
Author information
Contributions
TL developed the counterfactual explanation and co-designed the second study. JM developed the initial CAPS without the counterfactual explanation and co-designed the first case study. TL co-designed the user studies and analyzed them. MAR co-designed the user studies and generated the fidelity tests. DL advised the co-authors in the development process and co-wrote the manuscript. SA advised the co-authors in the development process and wrote the manuscript. All authors participated in the brainstorming stage of this work.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Ethical approval
Our user studies were approved by the Institutional Review Board at Wake Forest University under number IRB00024657. The approved protocol falls under Exemption Category 3: Research involving benign behavioral interventions in conjunction with the collection of information from an adult subject through verbal or written responses (including data entry) or audiovisual recording if the subject prospectively agrees to the intervention and information collection. (A) The information obtained is recorded by the investigator in such a manner that the identity of the human subjects cannot readily be ascertained, directly or through identifiers linked to the subjects; (B) Any disclosure of the human subjects’ responses outside the research would not reasonably place the subjects at risk of criminal liability or be damaging to the subjects’ financial standing, employability, educational advancement, or reputation; or (C) The information is recorded by the investigator in such a manner that the identity of the human subjects can readily be ascertained, directly or through identifiers linked to the subjects.
Participant consent
You are invited to participate in a research study explaining the behavior of artificial intelligence agents. We are investigating whether summarizing the agent’s behavior in English improves the end user’s understanding and increases their trust in the agent’s behavior. In this study, you will complete several questionnaires to measure your understanding of the agent’s behavior. You may discontinue your participation at any time without penalty by closing your browser window; any responses entered to that point will be deleted. You may also choose to skip any question(s) you do not wish to answer, for any reason. We encourage you to print or save a copy of this page for your records (or future reference). By clicking on “I agree”, you indicate that you are at least 18 years old and that you agree to participate in this research project, and you will advance to the experiment. If you do not wish to participate, please close your browser window. Completing the experiment should take about 15 minutes. You will earn 20 cents for each minute you spend on the experiment (there will be a time limit for each question), and you will receive an extra 10 cents for each correct answer.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, T., McCalmon, J., Le, T. et al. A novel policy-graph approach with natural language and counterfactual abstractions for explaining reinforcement learning agents. Auton Agent Multi-Agent Syst 37, 34 (2023). https://doi.org/10.1007/s10458-023-09615-8