Abstract
Reinforcement learning (RL) is an important machine learning technique for training agents that make decisions autonomously. For safety-critical applications, however, the decision-making of an RL agent may not be intelligible to humans and is thus difficult to validate, verify, and certify.
This work presents a technique to link a concrete RL agent with a high-level formal B model of the safety shield and the environment. This allows us to run the RL agent in the formal methods tool ProB and, in particular, to use the formal model to surround the agent with a safety shield at runtime. The paper also presents a methodology to validate the behavior of RL agents and their safety shields with formal methods techniques, including trace replay, simulation, and statistical validation. The validation process is supported by domain-specific visualizations to ease human validation. Finally, we demonstrate the approach on a highway simulation.
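To illustrate the shielding idea, here is a minimal Python sketch. It is not the authors' implementation, and `SafetyShield`, `safe_actions`, and `fallback` are hypothetical names rather than ProB's API: the formal model is abstracted as a function returning the actions (B operations) that are enabled and safe in a given state.

```python
# Hypothetical sketch of a runtime safety shield around an RL agent.
# The formal B model is abstracted as safe_actions(state); none of
# these names come from the paper or from ProB's actual interface.

from typing import Callable, Generic, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


class SafetyShield(Generic[State, Action]):
    """Filters an RL agent's action choice through a formal-model check."""

    def __init__(
        self,
        safe_actions: Callable[[State], Sequence[Action]],
        fallback: Callable[[State, Sequence[Action]], Action],
    ) -> None:
        # safe_actions abstracts the formal model: the operations that
        # are enabled and safe in the given state.
        self.safe_actions = safe_actions
        # fallback picks a replacement when the agent's choice is unsafe.
        self.fallback = fallback

    def filter(self, state: State, proposed: Action) -> Action:
        allowed = self.safe_actions(state)
        # The shield relies on at least one operation always being
        # enabled (cf. note 2); an empty set would mean a deadlock.
        if not allowed:
            raise RuntimeError("no safe action enabled: model deadlock")
        if proposed in allowed:
            return proposed  # the agent's choice passes the shield
        return self.fallback(state, allowed)  # override with a safe action
```

An episode loop would then read `action = shield.filter(state, agent.act(state))` before stepping the environment, so every executed action has passed the model's safety check (here `agent.act` is likewise an assumed interface).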
The work of Fabian Vu is part of the KI-LOK project funded by the “Bundesministerium für Wirtschaft und Energie” (grant # 19/21007E), and the IVOIRE project funded by the “Deutsche Forschungsgemeinschaft” (DFG) and the Austrian Science Fund (FWF), grant # I 4744-N.
Notes
1. Cf. relative deadlock freedom [1, Chapter 14] for a proof-based approach.
2. Note that at least one operation must always be enabled, as discussed before.
3.
4. A scenario is a sequence of events which alters the system’s state; a minimal replay sketch is given below. Scenarios as static exports [36] are available at: https://hhu-stups.github.io/highway-env-b-model/.
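Since a scenario is just a sequence of events, trace replay (one of the validation techniques named in the abstract) can be sketched as follows. The `Model` protocol with `enabled_events` and `execute` is an assumed interface for illustration, not ProB's actual API:

```python
# Minimal sketch of trace replay: a recorded scenario (sequence of
# events) is re-run against a model, and replay fails as soon as an
# event is not enabled in the state it is replayed from.

from typing import Protocol, Sequence


class Model(Protocol):
    def enabled_events(self) -> Sequence[str]: ...
    def execute(self, event: str) -> None: ...


class ReplayError(Exception):
    def __init__(self, step: int, event: str) -> None:
        super().__init__(f"event {event!r} not enabled at step {step}")


def replay(model: Model, scenario: Sequence[str]) -> None:
    for step, event in enumerate(scenario):
        if event not in model.enabled_events():
            # The model (or the shielded agent) diverged from the recording.
            raise ReplayError(step, event)
        model.execute(event)  # advance the model's state
```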
References
Abrial, J.R.: Modeling in Event-B: System and Software Engineering. Cambridge University Press, Cambridge (2010). https://doi.org/10.1017/CBO9781139195881
Abrial, J.R.: The B-Book: Assigning Programs to Meanings. Cambridge University Press, Cambridge (2005). https://doi.org/10.1017/CBO9780511624162
Aichernig, B.K., Tappler, M.: Probabilistic black-box reachability checking (extended version). Form. Methods Syst. Des. 54(3), 416–448 (2019). https://doi.org/10.1007/s10703-019-00333-0
Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Proceedings AAAI, pp. 2669–2678. AAAI Press (2018). https://doi.org/10.1609/aaai.v32i1.11797
Bendisposto, J., et al.: ProB2-UI: a java-based user interface for ProB. In: Lluch Lafuente, A., Mavridou, A. (eds.) FMICS 2021. LNCS, vol. 12863, pp. 193–201. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85248-1_12
Fulton, N., Platzer, A.: Safe reinforcement learning via formal methods: toward safe control through proof and learning. In: Proceedings AAAI, pp. 6485–6492. AAAI Press (2018). https://doi.org/10.1609/aaai.v32i1.12107
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2018). https://doi.org/10.1109/SP.2018.00058
Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1
Huang, X., Ruan, W., Tang, Q., Zhao, X.: Bridging formal methods and machine learning with global optimisation. In: Riesco, A., Zhang, M. (eds.) Formal Methods and Software Engineering. ICFEM 2022. LNCS, vol. 13478, pp. 1–19. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17244-1_1
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
Kobayashi, T., Bondu, M., Ishikawa, F.: Formal modelling of safety architecture for responsibility-aware autonomous vehicle via Event-B refinement. In: Proceedings FM 2023, pp. 533–549 (2023). https://doi.org/10.1007/978-3-031-27481-7_30
Könighofer, B., Lorber, F., Jansen, N., Bloem, R.: Shield synthesis for reinforcement learning. In: Margaria, T., Steffen, B. (eds.) ISoLA 2020. LNCS, vol. 12476, pp. 290–306. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61362-4_16
Krings, S.: Towards Infinite-State Symbolic Model Checking for B and Event-B. Ph.D. thesis, Heinrich-Heine-Universität Düsseldorf (2017)
Landers, M., Doryab, A.: Deep reinforcement learning verification: a survey. ACM Comput. Surv. 55(14s) (2023). https://doi.org/10.1145/3596444
Leurent, E.: An Environment for Autonomous Driving Decision-Making (2018). https://github.com/eleurent/highway-env
Leuschel, M., Butler, M.: ProB: a model checker for B. In: Araki, K., Gnesi, S., Mandrioli, D. (eds.) FME 2003. LNCS, vol. 2805, pp. 855–874. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45236-2_46
Leuschel, M., Butler, M.: ProB: an automated analysis toolset for the B method. STTT 10(2), 185–203 (2008). https://doi.org/10.1007/s10009-007-0063-9
Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning Markov decision processes for model checking. In: Proceedings QFM, pp. 49–63. EPTCS 103, Open Publishing Association (2012). https://doi.org/10.4204/EPTCS.103.6
Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning deterministic probabilistic automata from a model checking perspective. Mach. Learn. 105(2), 255–299 (2016). https://doi.org/10.1007/s10994-016-5565-9
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236
Peer, E., Menkovski, V., Zhang, Y., Lee, W.J.: Shunting trains with deep reinforcement learning. In: Proceedings SMC, pp. 3063–3068. IEEE (2018). https://doi.org/10.1109/SMC.2018.00520
Phan, D.T., Grosu, R., Jansen, N., Paoletti, N., Smolka, S.A., Stoller, S.D.: Neural simplex architecture. In: Lee, R., Jha, S., Mavridou, A., Giannakopoulou, D. (eds.) NFM 2020. LNCS, vol. 12229, pp. 97–114. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_6
Plagge, D., Leuschel, M.: Seven at one stroke: LTL model checking for high-level specifications in B, Z, CSP, and more. STTT 12(1), 9–21 (2010). https://doi.org/10.1007/s10009-009-0132-3
Razzaghi, P., et al.: A Survey on Reinforcement Learning in Aviation Applications (2022). https://doi.org/10.48550/arXiv.2211.02147
Ruan, W., Huang, X., Kwiatkowska, M.: Reachability analysis of deep neural networks with provable guarantees. In: Proceedings IJCAI, pp. 2651–2659 (2018). https://doi.org/10.24963/ijcai.2018/368
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x
Sallab, A., Abdou, M., Perot, E., Yogamani, S.: Deep reinforcement learning framework for autonomous driving. Electron. Imaging 29(19), 70–76 (2017). https://doi.org/10.2352/ISSN.2470-1173.2017.19.AVM-023
Seshia, S.A., Sadigh, D., Sastry, S.S.: Toward verified artificial intelligence. Commun. ACM 65(7), 46–55 (2022). https://doi.org/10.1145/3503914
Sha, L.: Using simplicity to control complexity. IEEE Softw. 18(4), 20–28 (2001). https://doi.org/10.1109/MS.2001.936213
Shalev-Shwartz, S., Shammah, S., Shashua, A.: On a formal model of safe and scalable self-driving cars. CoRR abs/1708.06374 (2017). http://arxiv.org/abs/1708.06374
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Tappler, M., Aichernig, B.K.: Differential safety testing of deep RL agents enabled by automata learning. In: Steffen, B. (ed.) Bridging the Gap Between AI and Reality. AISoLA 2023. LNCS, vol. 14380, pp. 138–159. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-46002-9_8
Tappler, M., Cano Cordoba, F., Aichernig, B.K., Könighofer, B.: Search-based testing of reinforcement learning. In: Proceedings IJCAI, pp. 503–510 (2022). https://doi.org/10.24963/ijcai.2022/72
Tran, H.D., Cai, F., Diego, M.L., Musau, P., Johnson, T.T., Koutsoukos, X.: Safety verification of cyber-physical systems with reinforcement learning control. ACM Trans. Embed. Comput. Syst. 18(5s), 1–22 (2019). https://doi.org/10.1145/3358230
Vouros, G.A.: Explainable deep reinforcement learning: state of the art and challenges. ACM Comput. Surv. 55(5), 1–39 (2022). https://doi.org/10.1145/3527448
Vu, F., Happe, C., Leuschel, M.: Generating interactive documents for domain-specific validation of formal models. STTT (2024). https://doi.org/10.1007/s10009-024-00739-0
Vu, F., Leuschel, M.: Validation of formal models by interactive simulation. In: Glässer, U., Creissac Campos, J., Méry, D., Palanque, P. (eds.) ABZ 2023. LNCS, vol. 14010, pp. 59–69. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-33163-3_5
Vu, F., Leuschel, M., Mashkoor, A.: Validation of formal models by timed probabilistic simulation. In: Raschke, A., Méry, D. (eds.) ABZ 2021. LNCS, vol. 12709, pp. 81–96. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77543-8_6
Wang, X., Nair, S., Althoff, M.: Falsification-based robust adversarial reinforcement learning. In: Proceedings ICMLA, pp. 205–212. IEEE (2020). https://doi.org/10.1109/ICMLA51294.2020.00042
Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992). https://doi.org/10.1007/BF00992698
Werth, M., Leuschel, M.: VisB: a lightweight tool to visualize formal models with SVG graphics. In: Raschke, A., Méry, D., Houdek, F. (eds.) ABZ 2020. LNCS, vol. 12071, pp. 260–265. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48077-6_21
Zhou, Z.-H.: Machine Learning. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-1967-3
Acknowledgements
We would like to thank Davin Holten for his initial experiments showing the feasibility of ProB’s techniques — especially of SimB — for the highway RL agent.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Vu, F., Dunkelau, J., Leuschel, M. (2024). Validation of Reinforcement Learning Agents and Safety Shields with ProB. In: Benz, N., Gopinath, D., Shi, N. (eds.) NASA Formal Methods. NFM 2024. Lecture Notes in Computer Science, vol. 14627. Springer, Cham. https://doi.org/10.1007/978-3-031-60698-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-60697-7
Online ISBN: 978-3-031-60698-4