ECAI 2025 Tutorial: Formal Methods for Safe Reinforcement Learning

Dr. Edwin Hamel-De le Court, Dr. Francesco Belardinelli


Tutorial Description

The goal of this tutorial is to present the emerging relationship between Reinforcement Learning (RL) and Temporal Logic (TL). The topic is an active area of research that builds a bridge between the RL and verification communities. In a world where the safety of AI systems is becoming ever more critical, closing the gap between these two communities, one concerned with learning agents (RL) and one concerned with checking and guaranteeing system specifications (verification), is an important step towards safe, reliable and trustworthy AI. The tutorial is designed to appeal both to researchers in RL who want to understand how specifications coming from formal methods can be used, and to researchers in formal methods who want to know how RL can help find policies that satisfy specifications from their field. The basics of both RL and TL will therefore be explained carefully, making the tutorial accessible to a broad audience.

At the end of the tutorial, the audience will walk away with the basics of RL and safe RL, and with a good intuition of the properties expressible in Temporal Logic, so that they know in which real-world applications RL with TL specifications can be of use. Furthermore, the audience will gain a good understanding of the basics of TL model checking (i.e., how to check that a system satisfies a TL specification) and of reactive synthesis (i.e., how to compute a strategy for a known system so that it satisfies a TL specification). Finally, the audience will learn how to use these tools to design RL algorithms with TL objectives or constraints, with a focus on how to obtain formal guarantees that the TL constraints are satisfied. Several state-of-the-art methods will be introduced (e.g., constrained MDPs and shielding), with a focus on those that yield formal guarantees; shielding is illustrated in the sketch below. A wide range of real-life scenarios will be used throughout the tutorial to illustrate the core concepts of RL and TL, and to show how to apply the tools and methods presented. These examples range from robot motion planning to designing a controller for water tanks, through to developing an AI for games such as Pac-Man.
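To give a concrete flavour of these methods, the following minimal Python sketch illustrates the idea of shielding on a hypothetical one-dimensional corridor: a learner proposes possibly unsafe exploratory actions, and a shield derived from the safety specification G(¬hazard) ("the hazard cell is never visited") overrides any action that would violate it. The environment and all names in it (HAZARD, GOAL, is_safe, shield) are illustrative assumptions, not material from the tutorial.

    import random

    # Safety specification: G(not hazard), i.e. "the hazard cell is never visited".
    HAZARD = 7           # visiting this cell violates the specification
    GOAL = 5             # reaching this cell ends the episode
    ACTIONS = (-1, +1)   # move left / move right along the corridor 0..9

    def is_safe(state, action):
        # The shield's check against its model of the environment:
        # does taking `action` in `state` preserve the safety invariant?
        return min(9, max(0, state + action)) != HAZARD

    def shield(state, proposed):
        # Pass safe actions through unchanged; override unsafe ones
        # with an arbitrary safe alternative.
        if is_safe(state, proposed):
            return proposed
        return random.choice([a for a in ACTIONS if is_safe(state, a)])

    state = 0
    for t in range(100):
        proposed = random.choice(ACTIONS)   # stand-in for the learner's exploratory choice
        action = shield(state, proposed)    # the shield corrects it if necessary
        state = min(9, max(0, state + action))
        assert state != HAZARD              # safety holds throughout learning
        if state == GOAL:
            break
    print("episode finished in state", state)

In the methods covered in the tutorial, such a shield is not hand-written as above but synthesised automatically from the TL specification and a model of the environment, which is what yields the formal safety guarantee.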


  • Tutorial Outline


  • Presenters

Dr. Francesco Belardinelli is a Senior Lecturer at the Department of Computing, Imperial College London (ICL), where he heads the lab on Formal Methods in AI. He is also Deputy Director of the UKRI CDT in Safe and Trusted AI.

Dr. Edwin Hamel-De le Court is a Postdoctoral Research Associate at the Department of Computing, ICL. He is an expert in automata theory and controller synthesis; his research currently focuses on their application to Safe RL.

