Research | Hanna Krasowski

Autonomous system development is challenging due to dynamic environments, perception uncertainties, and model disturbances. Machine learning (ML) can solve complex automation tasks, yet purely ML-based controllers typically lack interpretability, safety guarantees, and require large high-quality training datasets. In contrast, model-based methods provide guarantees and explainability for autonomous systems, but often require substantial engineering and expert knowledge, leading to low transferability.

In my research, I aim to unlock the potential of ML-based techniques for real-world systems by incorporating formal methods to achieve data-efficient, reliable and interpretable autonomous systems. Using this hybrid control approach, I currently focus on methods that enable the use of abstract system knowledge to guide the learning process. Abstract system knowledge is rich in insight yet difficult to codify, for example traffic rules or natural-language descriptions of system behavior. I leverage formal methods, such as temporal logic and reachability analysis, to make abstract knowledge computationally tractable and design ML-based algorithms that learn from it.

Applications

I demonstrate the effectiveness of my control approaches on a variety of autonomous systems, with a special focus on motion planning for maritime vessels. Automation of maritime traffic holds immense potential to improve safety, prevent environmental damage, and enhance economic efficiency. At the same time, autonomous vessels provide a challenging safety-critical control problem due to low-frequency traffic data with significant uncertainty, complex dynamical behavior, as well as abstract knowledge implicit in legal documents or expert handbooks, which currently remains unused.

More recently, my research expanded into systems biology, where I apply my approaches to biomolecular systems. While large-scale genetic and protein data is available in biology, certain steps in diagnostics and drug discovery still lack adequate models due to data-scarcity. For example, in situ measurement often remains unfeasible or is prohibitively costly. My research aims to infer and synthesize biomolecular models by codifying abstract knowledge and qualitative observations, which enables using hybrid approaches to uncover cell-cell or gene-gene interaction.

In brief, I work at the intersection of formal methods, machine learning, and robotics. My work can be clustered into four thrusts:

Hybrid algorithms for reliable machine learning: Integrate guarantees into learning algorithms to ensure task or safety requirements are met.
Formal methods for model guidance: Codify abstract system knowledge with formal methods to efficiently guide model learning and enhance performance.
Automation of complex real-world tasks: Design and validate hybrid approaches by applying them to maritime vessel motion planning and biomolecular models.
Software for autonomous vessel navigation: Develop and publish open-source software for benchmarking and evaluation of vessel motion planners.

Selected Publications

2024

Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

Hanna Krasowski*, and Matthias Althoff

IEEE Transactions on Intelligent Vehicles, 2024

Abs arXiv Bib PDF

For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.
@article{Krasowski2024.safeRLautonomousVessels, archiveprefix = {arXiv}, title = {Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea}, author = {Krasowski*, Hanna and Althoff, Matthias}, year = {2024}, journal = {IEEE Transactions on Intelligent Vehicles}, volume = {}, pages = {1--18}, doi = {10.1109/TIV.2024.3400597}, issn = {2379-8904}, }
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking

Roland Stolz*, Hanna Krasowski*, Jakob Thumm, Michael Eichelbeck, Philipp Gassert, and Matthias Althoff

In Proc. of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024

Abs arXiv Bib PDF Code Video

Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
@inproceedings{Stolz2024, archiveprefix = {arXiv}, title = {Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking}, author = {Stolz*, Roland and Krasowski*, Hanna and Thumm, Jakob and Eichelbeck, Michael and Gassert, Philipp and Althoff, Matthias}, year = {2024}, booktitle = {Proc. of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS)}, }

2023

Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes

Niklas Kochdumper*, Hanna Krasowski*, Xiao Wang*, Stanley Bak, and Matthias Althoff

IEEE Open Journal of Control Systems, 2023

Abs arXiv Bib PDF Code Video

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue by a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which enables to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
@article{Kochdumper2023.safeRLReachabilityAnalysis, author = {Kochdumper*, Niklas and Krasowski*, Hanna and Wang*, Xiao and Bak, Stanley and Althoff, Matthias}, journal = {IEEE Open Journal of Control Systems}, title = {Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes}, year = {2023}, volume = {2}, pages = {79-92}, doi = {10.1109/OJCSYS.2023.3256305}, }
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking

Hanna Krasowski*, Jakob Thumm*, Marlon Müller, Lukas Schäfer, Xiao Wang, and Matthias Althoff

Transactions on Machine Learning Research, 2023

Abs arXiv Bib PDF Code

Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty, every time the safety verification is engaged, improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
@article{Krasowski2023b.ProvablySafeRLSurvey, author = {Krasowski*, Hanna and Thumm*, Jakob and Müller, Marlon and Schäfer, Lukas and Wang, Xiao and Althoff, Matthias}, title = {Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking}, year = {2023}, journal = {Transactions on Machine Learning Research}, issn = {2835-8856}, }