Validation of Reinforcement Learning Agents and Safety Shields with ProB

  • Conference paper
  • NASA Formal Methods (NFM 2024)

Abstract

Reinforcement learning (RL) is an important machine learning technique for training agents that make decisions autonomously. For safety-critical applications, however, the decision-making of an RL agent may not be intelligible to humans and is thus difficult to validate, verify, and certify.

This work presents a technique to link a concrete RL agent with a high-level formal B model of the safety shield and the environment. This allows us to run the RL agent in the formal methods tool ProB and, in particular, to use the formal model to surround the agent with a safety shield at runtime. This paper also presents a methodology to validate the behavior of RL agents and their safety shields with formal methods techniques, including trace replay, simulation, and statistical validation. The validation process is supported by domain-specific visualizations to ease human validation. Finally, we demonstrate the approach on a highway simulation.
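To illustrate the shielding idea described above, the following is a minimal, self-contained Python sketch: the agent proposes an action, and a shield standing in for the formal model overrides it whenever the corresponding operation is not enabled in the current state. All names here (safe_actions, shielded_step, the toy three-lane highway state) are illustrative assumptions, not the paper's API; in the actual approach, the enabled operations are computed by ProB from the B model rather than hand-coded.

import random

ACTIONS = ["KEEP_LANE", "LANE_LEFT", "LANE_RIGHT", "FASTER", "SLOWER"]

def safe_actions(state):
    # Stand-in for the formal B model: return the operations enabled in the
    # current state. Here: forbid lane changes into an occupied lane and
    # forbid leaving the three-lane road (lanes 0..2).
    lane, occupied = state["lane"], state["occupied_lanes"]
    allowed = {"KEEP_LANE", "FASTER", "SLOWER"}
    if lane > 0 and (lane - 1) not in occupied:
        allowed.add("LANE_LEFT")
    if lane < 2 and (lane + 1) not in occupied:
        allowed.add("LANE_RIGHT")
    return allowed  # non-empty by construction: KEEP_LANE is always enabled (cf. footnote 2)

def shielded_step(agent_policy, state):
    # One decision step: accept the agent's action if the model enables it,
    # otherwise substitute a safe alternative and report the intervention.
    proposed = agent_policy(state)
    allowed = safe_actions(state)
    if proposed in allowed:
        return proposed, False
    return random.choice(sorted(allowed)), True

if __name__ == "__main__":
    # Toy run: a random agent in lane 2 of a three-lane highway, lane 1 occupied.
    state = {"lane": 2, "occupied_lanes": {1}}
    agent = lambda s: random.choice(ACTIONS)
    for _ in range(5):
        action, intervened = shielded_step(agent, state)
        print(action, "(shield intervened)" if intervened else "")

Counting how often the shield intervenes over many such simulated runs is, in spirit, what the statistical validation mentioned above does systematically.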

The work of Fabian Vu is part of the KI-LOK project, funded by the “Bundesministerium für Wirtschaft und Energie” (grant # 19/21007E), and of the IVOIRE project, funded by the “Deutsche Forschungsgemeinschaft” (DFG) and the Austrian Science Fund (FWF) (grant # I 4744-N).

Notes

  1. Cf. relative deadlock freedom [1, Chapter 14] for a proof-based approach.

  2. Note that at least one operation must always be enabled, as discussed before.

  3. https://github.com/hhu-stups/reinforcement-learning-b-models.

  4. A scenario is a sequence of events which alters the system’s state. Scenarios are available as static exports [36] at: https://hhu-stups.github.io/highway-env-b-model/.

References

  1. Abrial, J.R.: Modeling in Event-B: System and Software Engineering. Cambridge University Press, Cambridge (2010). https://doi.org/10.1017/CBO9781139195881

2. Abrial, J.R.: The B-Book: Assigning Programs to Meanings. Cambridge University Press, Cambridge (2005). https://doi.org/10.1017/CBO9780511624162

  3. Aichernig, B.K., Tappler, M.: Probabilistic black-box reachability checking (extended version). Form. Methods Syst. Des. 54(3), 416–448 (2019). https://doi.org/10.1007/s10703-019-00333-0

  4. Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Proceedings AAAI, pp. 2669–2678. AAAI Press (2018). https://doi.org/10.1609/aaai.v32i1.11797

  5. Bendisposto, J., et al.: ProB2-UI: a java-based user interface for ProB. In: Lluch Lafuente, A., Mavridou, A. (eds.) FMICS 2021. LNCS, vol. 12863, pp. 193–201. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85248-1_12

  6. Fulton, N., Platzer, A.: Safe reinforcement learning via formal methods: toward safe control through proof and learning. In: Proceedings AAAI, pp. 6485–6492. AAAI Press (2018). https://doi.org/10.1609/aaai.v32i1.12107

  7. Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: Ai2: safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2018). https://doi.org/10.1109/SP.2018.00058

  8. Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1

  9. Huang, X., Ruan, W., Tang, Q., Zhao, X.: Bridging formal methods and machine learning with global optimisation. In: Riesco, A., Zhang, M. (eds.) Formal Methods and Software Engineering. ICFEM 2022. LNCS, vol. 13478, pp. 1–19. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17244-1_1

  10. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5

11. Kobayashi, T., Bondu, M., Ishikawa, F.: Formal modelling of safety architecture for responsibility-aware autonomous vehicle via Event-B refinement. In: Proceedings FM 2023, pp. 533–549 (2023). https://doi.org/10.1007/978-3-031-27481-7_30

  12. Könighofer, B., Lorber, F., Jansen, N., Bloem, R.: Shield synthesis for reinforcement learning. In: Margaria, T., Steffen, B. (eds.) ISoLA 2020. LNCS, vol. 12476, pp. 290–306. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61362-4_16

13. Krings, S.: Towards Infinite-State Symbolic Model Checking for B and Event-B. Ph.D. thesis, Heinrich-Heine-Universität Düsseldorf, August 2017

  14. Landers, M., Doryab, A.: Deep reinforcement learning verification: a survey. ACM Comput. Surv. 55(14s) (2023). https://doi.org/10.1145/3596444

  15. Leurent, E.: An Environment for Autonomous Driving Decision-Making (2018). https://github.com/eleurent/highway-env

  16. Leuschel, M., Butler, M.: ProB: a model checker for B. In: Araki, K., Gnesi, S., Mandrioli, D. (eds.) FME 2003. LNCS, vol. 2805, pp. 855–874. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45236-2_46

  17. Leuschel, M., Butler, M.: ProB: an automated analysis toolset for the B method. STTT 10(2), 185–203 (2008). https://doi.org/10.1007/s10009-007-0063-9

18. Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning Markov decision processes for model checking. In: Proceedings QFM, pp. 49–63. EPTCS 103, Open Publishing Association (2012). https://doi.org/10.4204/EPTCS.103.6

  19. Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G., Nielsen, B.: Learning deterministic probabilistic automata from a model checking perspective. Mach. Learn. 105(2), 255–299 (2016). https://doi.org/10.1007/s10994-016-5565-9

  20. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236

  21. Peer, E., Menkovski, V., Zhang, Y., Lee, W.J.: Shunting trains with deep reinforcement learning. In: Proceedings SMC, pp. 3063–3068. IEEE (2018). https://doi.org/10.1109/SMC.2018.00520

  22. Phan, D.T., Grosu, R., Jansen, N., Paoletti, N., Smolka, S.A., Stoller, S.D.: Neural simplex architecture. In: Lee, R., Jha, S., Mavridou, A., Giannakopoulou, D. (eds.) NFM 2020. LNCS, vol. 12229, pp. 97–114. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_6

  23. Plagge, D., Leuschel, M.: Seven at one stroke: LTL model checking for high-level specifications in B, Z, CSP, and more. STTT 12(1), 9–21 (2010). https://doi.org/10.1007/s10009-009-0132-3

  24. Razzaghi, P., et al.: A Survey on Reinforcement Learning in Aviation Applications (2022). https://doi.org/10.48550/arXiv.2211.02147

  25. Ruan, W., Huang, X., Kwiatkowska, M.: Reachability analysis of deep neural networks with provable guarantees. In: Proceedings IJCAI, pp. 2651–2659 (2018). https://doi.org/10.24963/ijcai.2018/368

  26. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x

  27. Sallab, A., Abdou, M., Perot, E., Yogamani, S.: Deep reinforcement learning framework for autonomous driving. Electron. Imaging 29(19), 70–76 (2017). https://doi.org/10.2352/ISSN.2470-1173.2017.19.AVM-023

  28. Seshia, S.A., Sadigh, D., Sastry, S.S.: Toward verified artificial intelligence. Commun. ACM 65(7), 46–55 (2022). https://doi.org/10.1145/3503914

  29. Sha, L.: Using simplicity to control complexity. IEEE Softw. 18(4), 20–28 (2001). https://doi.org/10.1109/MS.2001.936213

  30. Shalev-Shwartz, S., Shammah, S., Shashua, A.: On a formal model of safe and scalable self-driving cars. CoRR abs/1708.06374 (2017). http://arxiv.org/abs/1708.06374

  31. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)

32. Tappler, M., Aichernig, B.K.: Differential safety testing of deep RL agents enabled by automata learning. In: Steffen, B. (ed.) Bridging the Gap Between AI and Reality. AISoLA 2023. LNCS, vol. 14380, pp. 138–159. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-46002-9_8

  33. Tappler, M., Cano Cordoba, F., Aichernig, B., Könighofer, B.: Search-based testing of reinforcement learning. In: Proceedings IJCAI, pp. 503–510 (2022). https://doi.org/10.24963/ijcai.2022/72

  34. Tran, H.D., Cai, F., Diego, M.L., Musau, P., Johnson, T.T., Koutsoukos, X.: Safety verification of cyber-physical systems with reinforcement learning control. ACM Trans. Embed. Comput. Syst. 18(5s), 1–22 (2019). https://doi.org/10.1145/3358230

  35. Vouros, G.A.: Explainable deep reinforcement learning: state of the art and challenges. ACM Comput. Surv. 55(5), 1–39 (2022). https://doi.org/10.1145/3527448

  36. Vu, F., Happe, C., Leuschel, M.: Generating interactive documents for domain-specific validation of formal models. STTT (2024). https://doi.org/10.1007/s10009-024-00739-0

  37. Vu, F., Leuschel, M.: Validation of formal models by interactive simulation. In: Glässer, U., Creissac Campos, J., Méry, D., Palanque, P. (eds.) ABZ 2023. LNCS, vol. 14010, pp. 59–69. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-33163-3_5

  38. Vu, F., Leuschel, M., Mashkoor, A.: Validation of formal models by timed probabilistic simulation. In: Raschke, A., Méry, D. (eds.) ABZ 2021. LNCS, vol. 12709, pp. 81–96. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77543-8_6

  39. Wang, X., Nair, S., Althoff, M.: Falsification-based robust adversarial reinforcement learning. In: Proceedings ICMLA, pp. 205–212. IEEE (2020). https://doi.org/10.1109/ICMLA51294.2020.00042

  40. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992). https://doi.org/10.1007/BF00992698

  41. Werth, M., Leuschel, M.: VisB: a lightweight tool to visualize formal models with SVG graphics. In: Raschke, A., Méry, D., Houdek, F. (eds.) ABZ 2020. LNCS, vol. 12071, pp. 260–265. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-48077-6_21

  42. Zhou, Z.-H.: Machine Learning. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-1967-3

Acknowledgements

We would like to thank Davin Holten for his initial experiments showing the feasibility of ProB’s techniques, especially SimB, for the highway RL agent.

Author information

Correspondence to Fabian Vu or Jannik Dunkelau.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vu, F., Dunkelau, J., Leuschel, M. (2024). Validation of Reinforcement Learning Agents and Safety Shields with ProB. In: Benz, N., Gopinath, D., Shi, N. (eds) NASA Formal Methods. NFM 2024. Lecture Notes in Computer Science, vol 14627. Springer, Cham. https://doi.org/10.1007/978-3-031-60698-4_16

  • DOI: https://doi.org/10.1007/978-3-031-60698-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60697-7

  • Online ISBN: 978-3-031-60698-4

  • eBook Packages: Computer Science, Computer Science (R0)
