Reachability analysis evaluates system safety by identifying the set of states a system may evolve within over a finite time horizon. In contrast to model-based reachability analysis, data-driven reachability analysis estimates reachable sets and derives probabilistic guarantees directly from data. Several popular techniques for validating reachable sets – conformal prediction, scenario optimization, and the holdout method – admit similar Probably Approximately Correct (PAC) guarantees. We establish a formal connection between these PAC bounds and present an empirical case study on reachable sets to illustrate the computational and sample trade-offs associated with these methods. We argue that despite the formal relationship between these techniques, subtle differences arise in both the interpretation of guarantees and the parameterization. As a result, these methods are not generally interchangeable. We conclude with practical advice on the usage of these methods.
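As a rough illustration of the kind of PAC-style statement a holdout check can give, the sketch below applies a one-sided Hoeffding bound to held-out samples. It is a minimal sketch and not the bound derived in the paper: the function name `holdout_violation_bound` and the example numbers are assumptions made here, and the held-out samples are assumed to be i.i.d. and independent of the data used to fit the set.

```python
import math


def holdout_violation_bound(num_violations: int, num_samples: int, delta: float) -> float:
    """Upper bound on the probability that a fresh sample leaves the estimated set.

    One-sided Hoeffding bound: with probability at least 1 - delta over the
    draw of the holdout samples, the true violation probability is at most
    the empirical violation rate plus sqrt(ln(1/delta) / (2 * num_samples)).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if num_samples <= 0 or not 0 <= num_violations <= num_samples:
        raise ValueError("need num_samples > 0 and 0 <= num_violations <= num_samples")
    empirical_rate = num_violations / num_samples
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * num_samples))
    return min(1.0, empirical_rate + slack)


# Hypothetical numbers: 3 of 2000 held-out trajectories leave the set.
epsilon = holdout_violation_bound(num_violations=3, num_samples=2000, delta=0.01)
print(f"violation probability <= {epsilon:.4f} with confidence 0.99")
```

Here `delta` plays the role of the confidence parameter and the returned value the role of the accuracy parameter. Other validation techniques parameterize these two quantities differently.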
Importance Sampling for Statistical Certification of Viable Initial Sets
Elizabeth Dietrich, Hanna Krasowski, Vegard Flovik, and Murat Arcak
We study the problem of statistically certifying viable initial sets (VISs) – sets of initial conditions whose trajectories satisfy a given control specification. While VISs can be obtained from model-based methods, these methods typically rely on simplified models. We propose a simulation-based framework to certify VISs by estimating the probability of specification violations under a high-fidelity or black-box model. Since detecting these violations may be challenging due to their scarcity, we propose a sample-efficient framework that leverages importance sampling to target high-risk regions. We derive an empirical Bernstein inequality for weighted random variables, enabling finite-sample guarantees for importance sampling estimators. We demonstrate the effectiveness of the proposed approach on two systems and show improved convergence of the resulting bounds on an Adaptive Cruise Control benchmark.
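The idea of steering samples toward rare violations and correcting with likelihood ratios can be sketched on a toy problem. The example below is an assumption-laden illustration, not the paper's framework or its concentration bound: the "violation" is a standard normal variable exceeding a threshold, the proposal is a unit-variance normal centered at that threshold, and the function name `estimate_tail_probability` is hypothetical.

```python
import math
import random


def estimate_tail_probability(threshold: float, num_samples: int, seed: int = 0) -> float:
    """Importance-sampling estimate of P(X > threshold) for X ~ N(0, 1).

    Samples are drawn from the proposal N(threshold, 1), which concentrates
    on the rare region, and each sample is reweighted by the likelihood
    ratio p(x) / q(x) = exp(-threshold * x + threshold**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal q
        if x > threshold:  # indicator of the rare event
            total += math.exp(-threshold * x + 0.5 * threshold**2)
    return total / num_samples


estimate = estimate_tail_probability(threshold=3.0, num_samples=20000)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form Gaussian tail
print(f"importance sampling: {estimate:.6f}, exact: {exact:.6f}")
```

With plain Monte Carlo, only a few dozen of the 20000 samples would land in the rare region. Under the shifted proposal, roughly half of them do, which is what reduces the variance of the estimate.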
Finite-Step Invariant Sets for Hybrid Systems with Probabilistic Guarantees
Varun Madabushi*, Elizabeth Dietrich*, Hanna Krasowski, and Maegan Tucker
Poincaré return maps are a fundamental tool for analyzing periodic orbits in hybrid dynamical systems, including legged locomotion, power electronics, and other cyber-physical systems with switching behavior. The Poincaré return map captures the evolution of the hybrid system on a guard surface, reducing the stability analysis of a periodic orbit to that of a discrete-time system. While linearization provides local stability information, assessing robustness to disturbances requires identifying invariant sets of the state space under the return dynamics. However, computing such invariant sets is computationally difficult, especially when system dynamics are only available through forward simulation. In this work, we propose an algorithmic framework leveraging sampling-based optimization to compute a finite-step invariant ellipsoid around a nominal periodic orbit using sampled evaluations of the return map. The resulting solution is accompanied by probabilistic guarantees on finite-step invariance satisfying a user-defined accuracy threshold. We demonstrate the approach on two low-dimensional systems and a compass-gait walking model.
2025
pacSTL: PAC-Bounded Signal Temporal Logic from Data-Driven Reachability Analysis
Elizabeth Dietrich*, Hanna Krasowski*, Emir Cem Gezer, Roger Skjetne, Asgeir Johan Sørensen, and Murat Arcak
Real-world robotic systems must comply with safety requirements in the presence of uncertainty. To define and measure requirement adherence, Signal Temporal Logic (STL) offers a mathematically rigorous and expressive language. However, standard STL cannot account for uncertainty. We address this problem by presenting pacSTL, a framework that combines Probably Approximately Correct (PAC) bounded set predictions with an interval extension of STL through optimization problems on the atomic proposition level. pacSTL provides PAC-bounded robustness intervals on the specification level that can be utilized in monitoring. We demonstrate the effectiveness of this approach through maritime navigation and analyze the efficiency and scalability of pacSTL through simulation and real-world experimentation on model vessels.
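To make the notion of a robustness interval concrete, the sketch below evaluates one simple specification, "always (x > threshold)", over interval-valued predictions. It is a minimal sketch under strong assumptions and not the pacSTL implementation: the predicted sets are assumed to be scalar intervals per time step, the atomic proposition is a simple threshold, and the function name `always_greater_than` is hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def always_greater_than(bounds: Sequence[Interval], threshold: float) -> Interval:
    """Robustness interval of "always (x > threshold)" over set-valued predictions.

    bounds[t] = (lower, upper) encloses the signal x at time step t. For a
    single signal the robustness is min_t (x_t - threshold). Because min is
    monotone in every argument, its range over all enclosed signals is
    [min_t lower_t - threshold, min_t upper_t - threshold].
    """
    if not bounds:
        raise ValueError("bounds must contain at least one time step")
    if any(lo > hi for lo, hi in bounds):
        raise ValueError("each interval needs lower <= upper")
    worst_case = min(lo for lo, _ in bounds) - threshold
    best_case = min(hi for _, hi in bounds) - threshold
    return worst_case, best_case


# Hypothetical predicted distance to an obstacle (in meters) over three steps.
predicted = [(5.0, 6.0), (3.5, 4.5), (2.5, 4.0)]
print(always_greater_than(predicted, threshold=2.0))
```

A positive lower end means the specification holds for every signal inside the enclosures. An interval that straddles zero means the prediction is not tight enough to decide. The interval is only as trustworthy as the enclosures it is computed from.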
Intelligent Sailing Model for Open Sea Navigation
Hanna Krasowski*, Stefan Schärdinger*, Murat Arcak, and Matthias Althoff
Autonomous vessels potentially enhance safety and reliability of seaborne trade. To facilitate the development of autonomous vessels, high-fidelity simulations are required to model realistic interactions with other vessels. However, modeling realistic interactive maritime traffic is challenging due to the unstructured environment, coarsely specified traffic rules, and widely varying vessel types. Currently, there is no standard for simulating interactive maritime environments in order to rigorously benchmark autonomous vessel algorithms. In this paper, we introduce the first intelligent sailing model (ISM), which simulates rule-compliant vessels for navigation on the open sea. An ISM vessel reacts to other traffic participants according to maritime traffic rules while at the same time solving a motion planning task characterized by waypoints. In particular, the ISM monitors the applicable rules, generates rule-compliant waypoints accordingly, and utilizes a model predictive controller for tracking the waypoints. We evaluate the ISM in two environments: interactive traffic with only ISM vessels and mixed traffic where some vessel trajectories are from recorded real-world maritime traffic data or handcrafted for criticality. Our results show that simulations with many ISM vessels of different vessel types are rule-compliant and scalable. We tested 4,049 critical traffic scenarios. For interactive traffic with ISM vessels, no collisions occurred while goal-reaching rates of about 97 percent were achieved. We believe that our ISM can serve as a standard for challenging and realistic maritime traffic simulation to accelerate autonomous vessel development.
Published
2026
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Marlon Müller*, Florian Finkeldei*, Hanna Krasowski*, Murat Arcak, and Matthias Althoff
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
@article{Mueller2026,archiveprefix={arXiv},title={Falsification-Driven Reinforcement Learning for Maritime Motion Planning},author={Müller*, Marlon and Finkeldei*, Florian and Krasowski*, Hanna and Arcak, Murat and Althoff, Matthias},year={2026},journal={Ocean Engineering},}
Feedback and Filtering for Automated Translation of Biomedical Observations into Signal Temporal Logic using LLMs
Hanna Krasowski*, Lauren E. Malek*, Sanjit A. Seshia, and Murat Arcak
Accepted at International Conference on Neuro-Symbolic Systems (NeuS), 2026
Biochemical processes within organisms are usually only partially observable, e.g., the progression of a viral infection or autoimmune diseases. To better treat and understand these diseases, models are developed by scientists. However, the assumptions and knowledge used for the models are often not explicit, and there is limited a posteriori validation with respect to the observations in the literature. To systematize the development and validation of biochemical models, we propose a Large Language Model (LLM)-based translation of natural language statements into Signal Temporal Logic (STL) specifications, guided by feedback and consolidation based on formal checking methods. This results in a small human-checkable set of syntactically correct STL candidate specifications with high probability of semantic correctness. Specifically, we propose an STL grammar for biomedical observations and apply structured syntax checking alongside embedding-based cosine similarity to ensure syntactic validity and semantic alignment. Evaluating 69 sentences from 18 biomedical publications on COVID-19, Ebola, and measles, we find that our approach generates correct STL specifications for 87% of the sentences with the best-performing LLM, GPT-4o.
@inproceedings{Krasowski2026a,author={Krasowski*, Hanna and Malek*, Lauren E. and Seshia, Sanjit A. and Arcak, Murat},title={Feedback and Filtering for Automated Translation of Biomedical Observations into Signal Temporal Logic using LLMs},year={2026},booktitle={Accepted at International Conference on Neuro-Symbolic Systems (NeuS)},}
Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
Hannah Markgraf, Shambhuraj Sawant, Hanna Krasowski, Lukas Schäfer, Sebastien Gros, and Matthias Althoff
Projection-based safety filters, which modify unsafe actions by mapping them to the closest safe alternative, are widely used to enforce safety constraints in reinforcement learning (RL). Two integration strategies are commonly considered: Safe environment RL (SE-RL), where the safeguard is treated as part of the environment, and safe policy RL (SP-RL), where it is embedded within the policy through differentiable optimization layers. Despite their practical relevance in safety-critical settings, a formal understanding of their differences is lacking. In this work, we present a theoretical comparison of SE-RL and SP-RL. We identify a key distinction in how each approach is affected by action aliasing, a phenomenon in which multiple unsafe actions are projected to the same safe action, causing information loss in the policy gradients. In SE-RL, this effect is implicitly approximated by the critic, while in SP-RL, it manifests directly as rank-deficient Jacobians during backpropagation through the safeguard. Our contributions are threefold: (i) a unified formalization of SE-RL and SP-RL in the context of actor-critic algorithms, (ii) a theoretical analysis of their respective policy gradient estimates, highlighting the role of action aliasing, and (iii) a comparative study of mitigation strategies, including a novel penalty-based improvement for SP-RL that aligns with established SE-RL practices. Empirical results support our theoretical predictions, showing that action aliasing is more detrimental for SP-RL than for SE-RL. However, with appropriate improvement strategies, SP-RL can match or outperform improved SE-RL across a range of environments. These findings provide actionable insights for choosing and refining projection-based safe RL methods based on task characteristics.
@article{Markgraf2026,title={Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?},author={Markgraf, Hannah and Sawant, Shambhuraj and Krasowski, Hanna and Schäfer, Lukas and Gros, Sebastien and Althoff, Matthias},year={2026},journal={Transactions on Machine Learning Research},archiveprefix={arXiv},}
Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning
Beyazit Yalcinkaya, Marcell Vazquez-Chanlatte, Ameesh Shah, Hanna Krasowski, and Sanjit A. Seshia
Accepted at International Conference on Machine Learning (ICML), 2026
We study the problem of learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks enables the decomposition of complex tasks into simpler sub-tasks that can be assigned to agents. However, existing approaches remain sample-inefficient and are limited to the single-task case. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify the main challenges to ACC-MARL’s feasibility in practice, propose solutions, and prove the correctness of our approach. We further show that the value functions of learned policies can be used to assign tasks optimally at test time. Experiments show emergent task-aware, multi-step coordination among agents, e.g., pressing a button to unlock a door, holding the door, and short-circuiting tasks.
@inproceedings{Yalcinkaya2026,title={Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning},author={Yalcinkaya, Beyazit and Vazquez-Chanlatte, Marcell and Shah, Ameesh and Krasowski, Hanna and Seshia, Sanjit A.},booktitle={Accepted at International Conference on Machine Learning (ICML)},archiveprefix={arXiv},year={2026},}
Learning to Drive by Imitating Surrounding Vehicles
Yasin Sonmez, Hanna Krasowski, and Murat Arcak
Accepted at IEEE International Conference on Robotics and Automation (ICRA), 2026
Imitation learning is a promising approach for training autonomous vehicles (AV) to navigate complex traffic environments by mimicking expert driver behaviors. However, a major challenge in this paradigm lies in effectively utilizing available driving data, as collecting new data is resource-intensive and often limited in its ability to cover diverse driving scenarios. While existing imitation learning frameworks focus on leveraging expert demonstrations, they often overlook the potential of additional complex driving data from surrounding traffic participants. In this paper, we propose a data augmentation strategy that enhances imitation learning by leveraging the observed trajectories of nearby vehicles, captured through the AV’s sensors, as additional expert demonstrations. We introduce a vehicle selection sampling strategy that prioritizes informative and diverse driving behaviors, contributing to a richer and more diverse dataset for training. We evaluate our approach using the state-of-the-art learning-based planning method PLUTO on the nuPlan dataset and demonstrate that our augmentation method leads to improved performance in complex driving scenarios. Specifically, our method reduces collision rates and improves safety metrics compared to the baseline. Notably, even when using only 10% of the original dataset, our method achieves performance comparable to that of the full dataset, with improved collision rates. Our findings highlight the importance of leveraging diverse real-world trajectory data in imitation learning and provide insights into data augmentation strategies for autonomous driving.
@inproceedings{Sonmez2026,archiveprefix={arXiv},title={Learning to Drive by Imitating Surrounding Vehicles},author={Sonmez, Yasin and Krasowski, Hanna and Arcak, Murat},year={2026},booktitle={Accepted at IEEE International Conference on Robotics and Automation (ICRA)},}
Decentralized Safe Multi-Agent Reinforcement Learning via Predictive Shielding
Yacine El Yamani, Hanna Krasowski, and Elena Vanneaux
Environments are increasingly populated by multiple heterogeneous robots performing independent tasks with limited prior knowledge of each other. Deploying such multi-agent systems presents significant challenges. Specifically, shifts in deployment states compared to training data can lead to poor policy performance and compromised safety. While safety shields exist to mitigate these risks, they are typically reactive, which degrades performance near unseen obstacles, and centralized, which limits their scalability. To address this, we propose a decentralized framework that integrates predictive shielding with model-based finite-horizon Q-learning. This approach allows agents to safely adapt their pre-trained policies during deployment. Furthermore, to mitigate deadlocks in symmetric scenarios, we introduce a communication-free protocol for conflict resolution.
@inproceedings{ElYamani2026,title={Decentralized Safe Multi-Agent Reinforcement Learning via Predictive Shielding},author={Yamani, Yacine El and Krasowski, Hanna and Vanneaux, Elena},year={2026},booktitle={Accepted at IFAC World Congress},}
2025
Learning Biomolecular Models using Signal Temporal Logic
Hanna Krasowski, Eric Palanques-Tost, Calin Belta, and Murat Arcak
In Proc. of the Annual Learning for Dynamics & Control Conference (L4DC), 2025
Modeling dynamical biological systems is key for understanding, predicting, and controlling complex biological behaviors. Traditional methods for identifying governing equations, such as ordinary differential equations (ODEs), typically require extensive quantitative data, which is often scarce in biological systems due to experimental limitations. To address this challenge, we introduce an approach that determines biomolecular models from qualitative system behaviors expressed as Signal Temporal Logic (STL) statements, which are naturally suited to translate expert knowledge into computationally tractable specifications. Our method represents the biological network as a graph, where edges represent interactions between species, and uses a genetic algorithm to identify the graph. To infer the parameters of the ODEs modeling the interactions, we propose a gradient-based algorithm. On a numerical example, we evaluate two loss functions using STL robustness and analyze different initialization techniques to improve the convergence of the approach.
@inproceedings{Krasowski2025a,title={Learning Biomolecular Models using Signal Temporal Logic},booktitle={Proc. of the Annual Learning for Dynamics \& Control Conference (L4DC)},author={Krasowski, Hanna and Palanques-Tost, Eric and Belta, Calin and Arcak, Murat},year={2025},pages={1365--1377},archiveprefix={arXiv},}
Predictive Safety Shield for Dyna-Q Reinforcement Learning
Pin Jin, Hanna Krasowski, and Elena Vanneaux
In Proc. of the European Control Conference (ECC), 2025
Integrating safety guarantees into reinforcement learning is a major challenge to make this method applicable to real-world tasks. Safety shields extend standard reinforcement learning and achieve hard safety guarantees. However, existing safety shields use random sampling of safe actions or a fixed fallback controller, therefore disregarding future performance implications of different safe actions. In this work, we propose a predictive safety shield for model-based reinforcement learning agents in discrete space. Our safety shield updates the Q-function locally based on safe predictions, which originate from a safe simulation of the environment model. This shielding approach improves performance while maintaining hard safety guarantees. Our experiments on gridworld environments demonstrate that even short prediction horizons can be sufficient to identify the optimal path. We observe that our approach is robust to distribution shifts, e.g., between simulation and reality, without requiring additional training.
@inproceedings{Jin2025,title={Predictive Safety Shield for Dyna-Q Reinforcement Learning},booktitle={Proc. of the European Control Conference (ECC)},author={Jin, Pin and Krasowski, Hanna and Vanneaux, Elena},year={2025},archiveprefix={arXiv},doi={10.23919/ECC65951.2025.11186958},pages={2173--2179},}
CommonOcean-Sim: A Traffic Simulation Environment for Unmanned Surface Vessels
Hanna Krasowski and Stefan Schärdinger
In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2025
Autonomous vessels have the potential to enhance maritime safety and mitigate environmental, economic, and injury risks. However, research in this domain remains limited, partly due to a lack of benchmarks and open-source tools tailored to maritime applications. The CommonOcean platform addresses this gap by providing software and traffic scenarios for motion planning research on unmanned surface vessels. In this paper, we introduce CommonOcean-Sim, a modular simulation environment for multi-agent maritime traffic. CommonOcean-Sim enables configurable simulation using real-world or handcrafted safety-critical maritime scenarios, arbitrary vessel types, and various controllers. Our simulation software allows users to select controllers that ensure reactive and traffic rule compliant navigation according to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). CommonOcean-Sim features a modular architecture that allows for seamless integration of custom control algorithms as well as extensive configurability and scalability to a multitude of traffic situations, including multi-vessel and high-density traffic. We demonstrate these capabilities on a variety of traffic scenarios, highlighting the potential of CommonOcean-Sim to facilitate research on unmanned surface vessels.
@inproceedings{Krasowski2025b,title={CommonOcean-Sim: A Traffic Simulation Environment for Unmanned Surface Vessels},author={Krasowski, Hanna and Schärdinger, Stefan},year={2025},booktitle={Proc. of the {IEEE} Int. Conf. on Intelligent Transportation Systems (ITSC)},pages={966--972},doi={10.1109/ITSC60802.2025.11423548},}
STL-based Optimization of Biomolecular Neural Networks for Regression and Control
Eric Palanques-Tost, Hanna Krasowski, Murat Arcak, Ron Weiss, and Calin Belta
In Proc. of the IEEE Conference on Decision and Control (CDC), 2025
Biomolecular Neural Networks (BNNs), artificial neural networks with biologically synthesizable architectures, achieve universal function approximations beyond simple biological circuits. However, training BNNs remains challenging due to the lack of target data. To address this, we propose leveraging Signal Temporal Logic (STL) specifications to define training objectives for BNNs. We build on the differentiable quantitative semantics of STL, enabling gradient-based optimization of the BNN weights, and introduce a learning algorithm that enables BNNs to perform regression and control tasks in biological systems. Specifically, we investigate two regression problems in which we train BNNs to act as reporters of dysregulated states, and a feedback control problem in which we train the BNN in closed loop with a chronic disease model, learning to reduce inflammation while avoiding adverse responses to external infections. Our numerical experiments demonstrate that STL-based learning can solve the investigated regression and control tasks efficiently.
@inproceedings{PalanquesTost2025,archiveprefix={arXiv},title={STL-based Optimization of Biomolecular Neural Networks for Regression and Control},author={Palanques-Tost, Eric and Krasowski, Hanna and Arcak, Murat and Weiss, Ron and Belta, Calin},year={2025},booktitle={Proc. of the {IEEE} Conference on Decision and Control (CDC)},doi={10.1109/CDC57313.2025.11312596},pages={3276--3281},}
Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underpowered Systems Operating in Uncertain Ocean Currents
Matthias Killer*, Marius Wiggert*, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux, and Claire J. Tomlin
Seaweed biomass presents a substantial opportunity for climate mitigation, yet to realize its potential, farming must be expanded to the vast open oceans. However, in the open ocean neither anchored farming nor floating farms with powerful engines are economically viable. Thus, a potential solution is to use farms that operate by going with the flow, utilizing minimal propulsion to strategically leverage beneficial ocean currents. In this work, we focus on low-power autonomous seaweed farms and design controllers that maximize seaweed growth by taking advantage of ocean currents. We first introduce a Dynamic Programming (DP) formulation to solve for the growth-optimal value function when the true currents are known. However, in reality only short-term imperfect forecasts with increasing uncertainty are available. Hence, we present three extensions. First, we use frequent replanning to mitigate forecast errors. Second, to optimize for long-term growth, we extend the value function beyond the forecast horizon by estimating the expected future growth based on seasonal average currents. Lastly, we introduce a discounted finite-time DP formulation to account for the increasing uncertainty in future ocean current estimates. We empirically evaluate our approach with 30-day simulations of farms in realistic ocean conditions. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This demonstrates that low-power propulsion is a promising method to operate autonomous seaweed farms in real-world conditions.
@article{Killer2025,archiveprefix={arXiv},journal={IEEE Robotics and Automation Letters (RA-L)},volume={10},number={10},pages={10745--10752},title={Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underpowered Systems Operating in Uncertain Ocean Currents},author={Killer*, Matthias and Wiggert*, Marius and Krasowski, Hanna and Doshi, Manan and Lermusiaux, Pierre F.J. and Tomlin, Claire J.},year={2025},doi={10.1109/LRA.2025.3604727}}
Translating Biomedical Observations into Signal Temporal Logic with LLMs using Structured Feedback
Hanna Krasowski*, Lauren E. Malek*, Sanjit A. Seshia, and Murat Arcak
In NeurIPS 2025 Workshop on Biosecurity Safeguards for Generative AI, 2025
Biomedical literature contains valuable knowledge that can be used to validate or monitor machine learning models. To leverage this knowledge for machine learning, we propose an LLM-based approach that translates natural language statements into formal Signal Temporal Logic (STL) specifications, guided by semantic and syntactic feedback. To capture temporal and logical dependencies in biomedical sentences, we design an STL grammar and apply structured syntax checking alongside embedding-based cosine similarity to ensure syntactic validity and semantic alignment. Evaluating sentences from nine biomedical publications on COVID-19, we find that our approach generates semantically correct STL specifications, with GPT-4o achieving the strongest performance. The resulting specifications can be flexibly applied to monitor model outputs or incorporated into training objectives or constraints, enabling interpretable and specification-aware learning.
@inproceedings{Krasowski2025-workshop,title={Translating Biomedical Observations into Signal Temporal Logic with {LLM}s using Structured Feedback},author={Krasowski*, Hanna and Malek*, Lauren E. and Seshia, Sanjit A. and Arcak, Murat},booktitle={NeurIPS 2025 Workshop on Biosecurity Safeguards for Generative AI},year={2025},}
2024
Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea
Hanna Krasowski and Matthias Althoff
For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.
@article{Krasowski2024.safeRLautonomousVessels,archiveprefix={arXiv},title={Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea},author={Krasowski, Hanna and Althoff, Matthias},year={2024},journal={IEEE Transactions on Intelligent Vehicles},volume={9},number={12},pages={7617--7634},doi={10.1109/TIV.2024.3400597},issn={2379-8904},}
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking
Roland Stolz*, Hanna Krasowski*, Jakob Thumm, Michael Eichelbeck, Philipp Gassert, and Matthias Althoff
In Proc. of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
@inproceedings{Stolz2024,archiveprefix={arXiv},title={Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking},author={Stolz*, Roland and Krasowski*, Hanna and Thumm, Jakob and Eichelbeck, Michael and Gassert, Philipp and Althoff, Matthias},year={2024},booktitle={Proc. of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},}
2023
Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes
Niklas Kochdumper*, Hanna Krasowski*, Xiao Wang*, Stanley Bak, and Matthias Althoff
While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
@article{Kochdumper2023.safeRLReachabilityAnalysis,author={Kochdumper*, Niklas and Krasowski*, Hanna and Wang*, Xiao and Bak, Stanley and Althoff, Matthias},journal={IEEE Open Journal of Control Systems},title={Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes},year={2023},volume={2},pages={79--92},doi={10.1109/OJCSYS.2023.3256305},}
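The basic idea of action projection, replacing a proposed action with the closest safe one, can be sketched for the simplest possible safe set. The example below is a minimal sketch and not the paper's method: it assumes the safe action set is an axis-aligned box that is already known, whereas the paper obtains its constraints from reachability analysis and solves a mixed-integer program. The function name `project_to_box` and the numbers are hypothetical.

```python
from typing import List, Sequence


def project_to_box(
    action: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> List[float]:
    """Return the point of the box [lower, upper] closest to `action`.

    For an axis-aligned box, the Euclidean projection decouples across
    dimensions and reduces to clipping each component to its bounds.
    """
    if not (len(action) == len(lower) == len(upper)):
        raise ValueError("action, lower, and upper must have the same length")
    if any(lo > hi for lo, hi in zip(lower, upper)):
        raise ValueError("each bound needs lower <= upper")
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


# Hypothetical safe set for a two-dimensional action.
safe_lower, safe_upper = [-1.0, 0.0], [1.0, 0.5]
print(project_to_box([1.4, -0.2], safe_lower, safe_upper))  # unsafe proposal is moved
print(project_to_box([0.3, 0.2], safe_lower, safe_upper))   # safe proposal is unchanged
```

Distinct unsafe proposals such as `[1.4, -0.2]` and `[2.0, -5.0]` are mapped to the same safe action here, so the projection discards information about what the agent originally proposed.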
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking
Hanna Krasowski*, Jakob Thumm*, Marlon Müller, Lukas Schäfer, Xiao Wang, and Matthias Althoff
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty every time the safety verification is engaged improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
@article{Krasowski2023b.ProvablySafeRLSurvey,author={Krasowski*, Hanna and Thumm*, Jakob and Müller, Marlon and Schäfer, Lukas and Wang, Xiao and Althoff, Matthias},title={Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking},year={2023},journal={Transactions on Machine Learning Research},issn={2835-8856},}
Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces
Hanna Krasowski, Prithvi Akella, Aaron D. Ames, and Matthias Althoff
In Proc. of the IEEE Conference on Decision and Control (CDC), 2023
Vanilla Reinforcement Learning (RL) can efficiently solve complex tasks but does not provide any guarantees on system behavior. To bridge this gap, we propose a three-step safe RL procedure for continuous action spaces that provides probabilistic guarantees with respect to temporal logic specifications. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification while randomizing the control inputs to the system within a bounded set. Second, we improve the performance of this probabilistically verified controller by adding an RL agent that optimizes the verified controller for performance in the same bounded set around the control input. Third, we verify probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficiently implementable for continuous action and state spaces. The separation of safety verification and performance improvement into two distinct steps realizes both explicit probabilistic safety guarantees and a straightforward RL setup that focuses on performance. We evaluate our approach on an evasion task where a robot has to reach a goal while evading a dynamic obstacle with a specific maneuver. Our results show that our safe RL approach leads to efficient learning while maintaining its probabilistic safety specification.
@inproceedings{Krasowski2023a.probablisticSafeRL,author={Krasowski, Hanna and Akella, Prithvi and Ames, Aaron D. and Althoff, Matthias},title={Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces},booktitle={Proc. of the {IEEE} Conference on Decision and Control (CDC)},year={2023},doi={10.1109/CDC49753.2023.10383601},pages={4372--4378}}
Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers
Andreas Doering*, Marius Wiggert*, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux, and Claire J. Tomlin
In Proc. of the IEEE Conference on Decision and Control (CDC), 2023
Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because their propulsion is much smaller than the magnitude of surrounding currents, they might end up in currents that inevitably push them into unsafe areas such as shallow waters, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for passively floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. Hence, we demonstrate the safety of our approach empirically with large-scale realistic simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning as new forecasts become available reduces stranding below 1% despite forecast errors often exceeding the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination.
@inproceedings{Doering2023.safetyUnderactuatedVessels,title={Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers},booktitle={Proc. of the {IEEE} Conference on Decision and Control (CDC)},author={Doering*, Andreas and Wiggert*, Marius and Krasowski, Hanna and Doshi, Manan and Lermusiaux, Pierre F.J. and Tomlin, Claire J.},year={2023},pages={7055--7060},doi={10.1109/CDC49753.2023.10383383}}
2022
CommonOcean: Composable Benchmarks for Motion Planning on Oceans
Hanna Krasowski and Matthias Althoff
In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2022
Autonomous vessels can increase safety and reduce emissions compared to human-operated vessels. One important task for autonomous vessels is motion planning. Currently, there are no benchmarks for autonomous vessels to compare different motion planning methods. Thus, we introduce composable benchmarks for motion planning on oceans (CommonOcean), which is available at commonocean.cps.cit.tum.de. A CommonOcean benchmark consists of three elements: cost function, vessel model, and motion planning scenario. Benchmarks can be conveniently composed using unique identifiers for these elements, which are highly modular. CommonOcean is easy to use, because we provide meaningful parameters for vessel models, various motion planning scenarios, and comprehensive documentation. Furthermore, we developed a scenario generation tool, which allows one to effortlessly create new scenarios from marine traffic data. We believe that CommonOcean will lead to better reproducibility and comparability of research on motion planning for vessels.
@inproceedings{Krasowski2022.CommonOcean,author={Krasowski, Hanna and Althoff, Matthias},booktitle={Proc. of the {IEEE} Int. Conf. on Intelligent Transportation Systems (ITSC)},doi={10.1109/ITSC55140.2022.9921925},pages={1676--1682},title={CommonOcean: Composable Benchmarks for Motion Planning on Oceans},year={2022},}
Safe Reinforcement Learning for Urban Driving using Invariably Safe Braking Sets
Hanna Krasowski*, Yinqiang Zhang*, and Matthias Althoff
In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2022
Deep reinforcement learning (RL) has been widely applied to motion planning problems of autonomous vehicles in urban traffic. However, traditional deep RL algorithms cannot ensure safe trajectories throughout training and deployment. We propose a provably safe RL algorithm for urban autonomous driving to address this. We add a novel safety layer to the RL process to verify the safety of high-level actions before they are performed. Our safety layer is based on invariably safe braking sets to constrain actions for safe lane changing and safe intersection crossing. We introduce a generalized discrete high-level action space, which can represent all high-level intersection driving maneuvers and various desired accelerations. Finally, we conducted extensive experiments on the inD dataset containing urban driving scenarios. Our analysis demonstrates that the safe agent never causes a collision and that the safety layer’s lane changing verification can even improve the goal-reaching performance compared to the unsafe baseline agent.
@inproceedings{Krasowski2022b.safeRLurbanDriving,author={Krasowski*, Hanna and Zhang*, Yinqiang and Althoff, Matthias},booktitle={Proc. of the {IEEE} Int. Conf. on Intelligent Transportation Systems (ITSC)},doi={10.1109/ITSC55140.2022.9922166},pages={2407--2414},title={Safe Reinforcement Learning for Urban Driving using Invariably Safe Braking Sets},year={2022},}
2021
CommonRoad-RL: A Configurable Reinforcement Learning Environment for Motion Planning of Autonomous Vehicles
Xiao Wang, Hanna Krasowski, and Matthias Althoff
In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2021
Reinforcement learning (RL) methods have gained popularity in the field of motion planning for autonomous vehicles due to their success in robotics and computer games. However, no existing work enables researchers to conveniently compare different underlying Markov decision processes (MDPs). To address this issue, we present CommonRoad-RL, an open-source toolbox to train and evaluate RL-based motion planners for autonomous vehicles. Configurability, modularity, and stability of CommonRoad-RL simplify comparing different MDPs. This is demonstrated by comparing agents trained with different rewards, action spaces, and vehicle models on a real-world highway dataset. Our toolbox is available at commonroad.in.tum.de.
@inproceedings{Wang2021.CommonRoadRL,author={Wang, Xiao and Krasowski, Hanna and Althoff, Matthias},title={{CommonRoad-RL}: A Configurable Reinforcement Learning Environment for Motion Planning of Autonomous Vehicles},booktitle={Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC)},year={2021},pages={466--472},doi={10.1109/ITSC48978.2021.9564898},}
Temporal Logic Formalization of Marine Traffic Rules
Hanna Krasowski and Matthias Althoff
In Proc. of the IEEE Intelligent Vehicles Symposium (IV), 2021
Autonomous vessels have to adhere to marine traffic rules to ensure traffic safety and reduce the liability of manufacturers. However, autonomous systems can only evaluate rule compliance if rules are formulated in a precise and mathematical way. This paper formalizes marine traffic rules from the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS) using temporal logic. In particular, the collision prevention rules between two power-driven vessels are delineated. The formulation is based on modular predicates and adjustable parameters. We evaluate the formalized rules in three US coastal areas for over 1,200 vessels using real marine traffic data.
@inproceedings{Krasowski2021.MarineTrafficRules,author={Krasowski, Hanna and Althoff, Matthias},title={Temporal Logic Formalization of Marine Traffic Rules},booktitle={Proc. of the IEEE Intelligent Vehicles Symposium (IV)},year={2021},pages={186--192},doi={10.1109/IV48863.2021.9575685},}
2020
Safe Reinforcement Learning for Autonomous Lane Changing Using Set-Based Prediction
Hanna Krasowski*, Xiao Wang*, and Matthias Althoff
In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2020
Machine learning approaches often lack safety guarantees, which are often a key requirement in real-world tasks. This paper addresses the lack of safety guarantees by extending reinforcement learning with a safety layer that restricts the action space to the subspace of safe actions. We demonstrate the proposed approach using lane changing in autonomous driving. To distinguish safe actions from unsafe ones, we compare planned motions with the set of possible occupancies of traffic participants generated by set-based predictions. In situations where no safe action exists, a verified fail-safe controller is executed. We used real-world highway traffic data to train and test the proposed approach. The evaluation result shows that the proposed approach trains agents that do not cause collisions during training and deployment.
@inproceedings{Krasowski2020.safeRLhighway,author={Krasowski*, Hanna and Wang*, Xiao and Althoff, Matthias},title={Safe Reinforcement Learning for Autonomous Lane Changing Using Set-Based Prediction},booktitle={Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC)},year={2020},pages={1--7},doi={10.1109/ITSC45102.2020.9294259},}