Publications | Hanna Krasowski

Here is the link to my Google Scholar.

* indicates equal contribution

Preprints

2025

Intelligent Sailing Model for Open Sea Navigation

Hanna Krasowski*, Stefan Schärdinger*, Murat Arcak, and Matthias Althoff

2025

Abs arXiv

Autonomous vessels potentially enhance safety and reliability of seaborne trade. To facilitate the development of autonomous vessels, high-fidelity simulations are required to model realistic interactions with other vessels. However, modeling realistic interactive maritime traffic is challenging due to the unstructured environment, coarsely specified traffic rules, and largely varying vessel types. Currently, there is no standard for simulating interactive maritime environments in order to rigorously benchmark autonomous vessel algorithms. In this paper, we introduce the first intelligent sailing model (ISM), which simulates rule-compliant vessels for navigation on the open sea. An ISM vessel reacts to other traffic participants according to maritime traffic rules while at the same time solving a motion planning task characterized by waypoints. In particular, the ISM monitors the applicable rules, generates rule-compliant waypoints accordingly, and utilizes a model predictive control for tracking the waypoints. We evaluate the ISM in two environments: interactive traffic with only ISM vessels and mixed traffic where some vessel trajectories are from recorded real-world maritime traffic data or handcrafted for criticality. Our results show that simulations with many ISM vessels of different vessel types are rule-compliant and scalable. We tested 4,049 critical traffic scenarios. For interactive traffic with ISM vessels, no collisions occurred while goal-reaching rates of about 97 percent were achieved. We believe that our ISM can serve as a standard for challenging and realistic maritime traffic simulation to accelerate autonomous vessel development.
Learning to Drive by Imitating Surrounding Vehicles

Yasin Sonmez, Hanna Krasowski, and Murat Arcak

2025

Abs arXiv

Imitation learning is a promising approach for training autonomous vehicles (AV) to navigate complex traffic environments by mimicking expert driver behaviors. However, a major challenge in this paradigm lies in effectively utilizing available driving data, as collecting new data is resource-intensive and often limited in its ability to cover diverse driving scenarios. While existing imitation learning frameworks focus on leveraging expert demonstrations, they often overlook the potential of additional complex driving data from surrounding traffic participants. In this paper, we propose a data augmentation strategy that enhances imitation learning by leveraging the observed trajectories of nearby vehicles, captured through the AV’s sensors, as additional expert demonstrations. We introduce a vehicle selection sampling strategy that prioritizes informative and diverse driving behaviors, contributing to a richer and more diverse dataset for training. We evaluate our approach using the state-of-the-art learning-based planning method PLUTO on the nuPlan dataset and demonstrate that our augmentation method leads to improved performance in complex driving scenarios. Specifically, our method reduces collision rates and improves safety metrics compared to the baseline. Notably, even when using only 10% of the original dataset, our method achieves performance comparable to that of the full dataset, with improved collision rates. Our findings highlight the importance of leveraging diverse real-world trajectory data in imitation learning and provide insights into data augmentation strategies for autonomous driving.
Falsification-Driven Reinforcement Learning for Maritime Motion Planning

Marlon Müller*, Florian Finkeldei*, Hanna Krasowski*, Murat Arcak, and Matthias Althoff

2025

Abs

The training scenarios for reinforcement learning (RL) play a critical role in shaping the capabilities of the trained agents. Especially when agents need to learn sophisticated behaviors, e.g., maneuvering according to specific traffic rules, generating appropriate scenarios is challenging, and collecting real-world data is often insufficient. In this paper, we propose a falsification-driven RL approach that leverages counterexamples generated through efficient sampling-based falsification to improve policy learning of complex temporal specifications. We apply our method to maritime traffic, where constructing diverse training scenarios is particularly difficult and complying with safety, i.e., complex signal temporal logic formalized traffic rules, is challenging. Our experiments on open-sea navigation with two vessels demonstrate that the proposed falsification-driven RL approach achieves more consistent rule compliance.
Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?

Hannah Markgraf, Shambhuraj Sawant, Hanna Krasowski, Lukas Schäfer, Sebastien Gros, and Matthias Althoff

2025

Abs arXiv

Projection-based safety filters, which modify unsafe actions by mapping them to the closest safe alternative, are widely used to enforce safety constraints in reinforcement learning (RL). Two integration strategies are commonly considered: Safe environment RL (SE-RL), where the safeguard is treated as part of the environment, and safe policy RL (SP-RL), where it is embedded within the policy through differentiable optimization layers. Despite their practical relevance in safety-critical settings, a formal understanding of their differences is lacking. In this work, we present a theoretical comparison of SE-RL and SP-RL. We identify a key distinction in how each approach is affected by action aliasing, a phenomenon in which multiple unsafe actions are projected to the same safe action, causing information loss in the policy gradients. In SE-RL, this effect is implicitly approximated by the critic, while in SP-RL, it manifests directly as rank-deficient Jacobians during backpropagation through the safeguard. Our contributions are threefold: (i) a unified formalization of SE-RL and SP-RL in the context of actor-critic algorithms, (ii) a theoretical analysis of their respective policy gradient estimates, highlighting the role of action aliasing, and (iii) a comparative study of mitigation strategies, including a novel penalty-based improvement for SP-RL that aligns with established SE-RL practices. Empirical results support our theoretical predictions, showing that action aliasing is more detrimental for SP-RL than for SE-RL. However, with appropriate improvement strategies, SP-RL can match or outperform improved SE-RL across a range of environments. These findings provide actionable insights for choosing and refining projection-based safe RL methods based on task characteristics.

Published

2025

Learning Biomolecular Models using Signal Temporal Logic

Hanna Krasowski, Eric Palanques-Tost, Calin Belta, and Murat Arcak

In Proc. of the Annual Learning for Dynamics & Control Conference (L4DC), 2025

Abs arXiv Bib PDF

Modeling dynamical biological systems is key for understanding, predicting, and controlling complex biological behaviors. Traditional methods for identifying governing equations, such as ordinary differential equations (ODEs), typically require extensive quantitative data, which is often scarce in biological systems due to experimental limitations. To address this challenge, we introduce an approach that determines biomolecular models from qualitative system behaviors expressed as Signal Temporal Logic (STL) statements, which are naturally suited to translate expert knowledge into computationally tractable specifications. Our method represents the biological network as a graph, where edges represent interactions between species, and uses a genetic algorithm to identify the graph. To infer the parameters of the ODEs modeling the interactions, we propose a gradient-based algorithm. On a numerical example, we evaluate two loss functions using STL robustness and analyze different initialization techniques to improve the convergence of the approach.
@inproceedings{Krasowski2025a, title = {Learning Biomolecular Models using Signal Temporal Logic}, booktitle = {Proc. of the Annual Learning for Dynamics \& Control Conference (L4DC)}, author = {Krasowski, Hanna and Palanques-Tost, Eric and Belta, Calin and Arcak, Murat}, year = {2025}, pages = {1365--1377}, archiveprefix = {arXiv}, }
Predictive Safety Shield for Dyna-Q Reinforcement Learning

Pin Jin, Hanna Krasowski, and Elena Vanneaux

In Proc. of the European Control Conference (ECC), 2025

Abs Bib PDF

Integrating safety guarantees into reinforcement learning is a major challenge to make this method applicable to real-world tasks. Safety shields extend standard reinforcement learning and achieve hard safety guarantees. However, existing safety shields use random sampling of safe actions or a fixed fallback controller, therefore disregarding future performance implications of different safe actions. In this work, we propose a predictive safety shield for model-based reinforcement learning agents in discrete space. Our safety shield updates the Q-function locally based on safe predictions, which originate from a safe simulation of the environment model. This shielding approach improves performance while maintaining hard safety guarantees. Our experiments on gridworld environments demonstrate that even short prediction horizons can be sufficient to identify the optimal path. We observe that our approach is robust to distribution shifts, e.g., between simulation and reality, without requiring additional training.
@inproceedings{Jin2025, title = {Predictive Safety Shield for Dyna-Q Reinforcement Learning}, booktitle = {Proc. of the European Control Conference (ECC)}, author = {Jin, Pin and Krasowski, Hanna and Vanneaux, Elena}, year = {2025}, }
CommonOcean-Sim: A Traffic Simulation Environment for Unmanned Surface Vessels

Hanna Krasowski, and Stefan Schärdinger

In Accepted for Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2025

Abs Bib

Autonomous vessels have the potential to enhance maritime safety and mitigate environmental, economic, and injury risks. However, research in this domain remains limited, partly due to a lack of benchmarks and open-source tools tailored to maritime applications. The CommonOcean platform addresses this gap by providing software and traffic scenarios for motion planning research on unmanned surface vessels. In this paper, we introduce CommonOcean-Sim, a modular simulation environment for multi-agent maritime traffic. CommonOcean-Sim enables configurable simulation using real-world or handcrafted safety-critical maritime scenarios, arbitrary vessel types, and various controllers. Our simulation software allows users to select controllers that ensure reactive and traffic rule compliant navigation according to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). CommonOcean-Sim features a modular architecture that allows for seamless integration of custom control algorithms as well as extensive configurability and scalability to a multitude of traffic situations, including multi-vessel and high-density traffic. We demonstrate these capabilities on a variety of traffic scenarios, highlighting the potential of CommonOcean-Sim to facilitate research on unmanned surface vessels
@inproceedings{Krasowski2025b, title = {CommonOcean-Sim: A Traffic Simulation Environment for Unmanned Surface Vessels}, author = {Krasowski, Hanna and Schärdinger, Stefan}, year = {2025}, booktitle = {Accepted for Proc. of the {IEEE} Int. Conf. on Intelligent Transportation Systems (ITSC)}, }
STL-based Optimization of Biomolecular Neural Networks for Regression and Control

Eric Palanques-Tost, Hanna Krasowski, Murat Arcak, Ron Weiss, and Calin Belta

In Accepted for Proc. of the IEEE Conference on Decision and Control (CDC), 2025

Abs arXiv Bib

Biomolecular Neural Networks (BNNs), artificial neural networks with biologically synthesizable architectures, achieve universal function approximations beyond simple biological circuits. However, training BNNs remains challenging due to the lack of target data. To address this, we propose leveraging Signal Temporal Logic (STL) specifications to define training objectives for BNNs. We build on the differentiable quantitative semantics of STL, enabling gradient-based optimization of the BNN weights, and introduce a learning algorithm that enables BNNs to perform regression and control tasks in biological systems. Specifically, we investigate two regression problems in which we train BNNs to act as reporters of dysregulated states, and a feedback control problem in which we train the BNN in closed loop with a chronic disease model, learning to reduce inflammation while avoiding adverse responses to external infections. Our numerical experiments demonstrate that STL-based learning can solve the investigated regression and control tasks efficiently.
@inproceedings{PalanquesTost2025, archiveprefix = {arXiv}, title = {STL-based Optimization of Biomolecular Neural Networks for Regression and Control}, author = {Palanques-Tost, Eric and Krasowski, Hanna and Arcak, Murat and Weiss, Ron and Belta, Calin}, year = {2025}, booktitle = {Accepted for Proc. of the {IEEE} Conference on Decision and Control (CDC)}, }
Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underpowered Systems Operating in Uncertain Ocean Currents

Matthias Killer*, Marius Wiggert*, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux, and Claire J. Tomlin

Early Access at IEEE Robotics and Automation Letters (RA-L), 2025

Abs arXiv Bib

Seaweed biomass presents a substantial opportunity for climate mitigation, yet to realize its potential, farming must be expanded to the vast open oceans. However, in the open ocean neither anchored farming nor floating farms with powerful engines are economically viable. Thus, a potential solution are farms that operate by going with the flow, utilizing minimal propulsion to strategically leverage beneficial ocean currents. In this work, we focus on low-power autonomous seaweed farms and design controllers that maximize seaweed growth by taking advantage of ocean currents. We first introduce a Dynamic Programming (DP) formulation to solve for the growth-optimal value function when the true currents are known. However, in reality only short-term imperfect forecasts with increasing uncertainty are available. Hence, we present three additional extensions. Firstly, we use frequent replanning to mitigate forecast errors. Second, to optimize for long-term growth, we extend the value function beyond the forecast horizon by estimating the expected future growth based on seasonal average currents. Lastly, we introduce a discounted finite-time DP formulation to account for the increasing uncertainty in future ocean current estimates. We empirically evaluate our approach with 30-day simulations of farms in realistic ocean conditions. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This demonstrates that low-power propulsion is a promising method to operate autonomous seaweed farms in real-world conditions
@article{Killer2025, archiveprefix = {arXiv}, journal = {Early Access at IEEE Robotics and Automation Letters (RA-L)}, title = {Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underpowered Systems Operating in Uncertain Ocean Currents}, author = {Killer*, Matthias and Wiggert*, Marius and Krasowski, Hanna and Doshi, Manan and Lermusiaux, Pierre F.J. and Tomlin, Claire J.}, year = {2025}, doi = {10.1109/LRA.2025.3604727} }

2024

Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea

Hanna Krasowski, and Matthias Althoff

IEEE Transactions on Intelligent Vehicles, 2024

Abs arXiv Bib PDF

For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.
@article{Krasowski2024.safeRLautonomousVessels, archiveprefix = {arXiv}, title = {Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open Sea}, author = {Krasowski, Hanna and Althoff, Matthias}, year = {2024}, journal = {IEEE Transactions on Intelligent Vehicles}, volume = {9}, number = {12}, pages = {7617--7634}, doi = {10.1109/TIV.2024.3400597}, issn = {2379-8904}, }
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking

Roland Stolz*, Hanna Krasowski*, Jakob Thumm, Michael Eichelbeck, Philipp Gassert, and Matthias Althoff

In Proc. of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024

Abs arXiv Bib PDF Code Video

Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
@inproceedings{Stolz2024, archiveprefix = {arXiv}, title = {Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking}, author = {Stolz*, Roland and Krasowski*, Hanna and Thumm, Jakob and Eichelbeck, Michael and Gassert, Philipp and Althoff, Matthias}, year = {2024}, booktitle = {Proc. of the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS)}, }

2023

Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes

Niklas Kochdumper*, Hanna Krasowski*, Xiao Wang*, Stanley Bak, and Matthias Althoff

IEEE Open Journal of Control Systems, 2023

Abs arXiv Bib PDF Code Video

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue by a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents applying potentially unsafe actions from a reinforcement learning agent by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which enables to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
@article{Kochdumper2023.safeRLReachabilityAnalysis, author = {Kochdumper*, Niklas and Krasowski*, Hanna and Wang*, Xiao and Bak, Stanley and Althoff, Matthias}, journal = {IEEE Open Journal of Control Systems}, title = {Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes}, year = {2023}, volume = {2}, pages = {79-92}, doi = {10.1109/OJCSYS.2023.3256305}, }
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking

Hanna Krasowski*, Jakob Thumm*, Marlon Müller, Lukas Schäfer, Xiao Wang, and Matthias Althoff

Transactions on Machine Learning Research, 2023

Abs arXiv Bib PDF Code

Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks. However, vanilla RL and most safe RL approaches do not guarantee safety. In recent years, several methods have been proposed to provide hard safety guarantees for RL, which is essential for applications where unsafe actions could have disastrous consequences. Nevertheless, there is no comprehensive comparison of these provably safe RL methods. Therefore, we introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods. We categorize the methods based on how they adapt the action: action replacement, action projection, and action masking. Our experiments on an inverted pendulum and a quadrotor stabilization task indicate that action replacement is the best-performing approach for these applications despite its comparatively simple realization. Furthermore, adding a reward penalty, every time the safety verification is engaged, improved training performance in our experiments. Finally, we provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
@article{Krasowski2023b.ProvablySafeRLSurvey, author = {Krasowski*, Hanna and Thumm*, Jakob and Müller, Marlon and Schäfer, Lukas and Wang, Xiao and Althoff, Matthias}, title = {Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking}, year = {2023}, journal = {Transactions on Machine Learning Research}, issn = {2835-8856}, }
Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces

Hanna Krasowski, Prithvi Akella, Aaron D. Ames, and Matthias Althoff

In Proc. of the IEEE Conference on Decision and Control (CDC), 2023

Abs arXiv Bib PDF Video

Vanilla Reinforcement Learning (RL) can efficiently solve complex tasks but does not provide any guarantees on system behavior. To bridge this gap, we propose a three-step safe RL procedure for continuous action spaces that provides probabilistic guarantees with respect to temporal logic specifications. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification while randomizing the control inputs to the system within a bounded set. Second, we improve the performance of this probabilistically verified controller by adding an RL agent that optimizes the verified controller for performance in the same bounded set around the control input. Third, we verify probabilistic safety guarantees with respect to temporal logic specifications for the learned agent. Our approach is efficiently implementable for continuous action and state spaces. The separation of safety verification and performance improvement into two distinct steps realizes both explicit probabilistic safety guarantees and a straightforward RL setup that focuses on performance. We evaluate our approach on an evasion task where a robot has to reach a goal while evading a dynamic obstacle with a specific maneuver. Our results show that our safe RL approach leads to efficient learning while maintaining its probabilistic safety specification.
@inproceedings{Krasowski2023a.probablisticSafeRL, author = {Krasowski, Hanna and Akella, Prithvi and Ames, Aaron D. and Althoff, Matthias}, title = {Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces}, booktitle = {Proc. of the {IEEE} Conference on Decision and Control (CDC)}, year = {2023} }
Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers

Andreas Doering*, Marius Wiggert*, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux, and Claire J. Tomlin

In Proc. of the IEEE Conference on Decision and Control (CDC), 2023

Abs arXiv Bib

Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because their propulsion is much smaller than the magnitude of surrounding currents, they might end up in currents that inevitably push them into unsafe areas such as shallow waters, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for passively floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. Hence, we demonstrate the safety of our approach empirically with large-scale realistic simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning as new forecasts become available reduces stranding below 1% despite forecast errors often exceeding the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination
@inproceedings{Doering2023.safetyUnderactuatedVessels, title = {Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers}, booktitle = {Proc. of the {IEEE} Conference on Decision and Control (CDC)}, author = {Doering*, Andreas and Wiggert*, Marius and Krasowski, Hanna and Doshi, Manan and Lermusiaux, Pierre F.J. and Tomlin, Claire J.}, year = {2023} }

2022

CommonOcean: Composable Benchmarks for Motion Planning on Oceans

Hanna Krasowski, and Matthias Althoff

In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2022

Abs Bib PDF Code Video

Autonomous vessels can increase safety and reduce emissions compared to human-operated vessels. One important task for autonomous vessels is motion planning. Currently, there are no benchmarks for autonomous vessels to compare different motion planning methods. Thus, we introduce composable benchmarks for motion planning on oceans (CommonOcean), which is available at commonocean.cps.cit.tum.de. A CommonOcean benchmark consists of three elements: cost function, vessel model, and motion planning scenario. Benchmarks can be conveniently composed using unique identifiers for these elements, which are highly modular. CommonOcean is easy to use, because we provide meaningful parameters for vessel models, various motion planning scenarios, and comprehensive documentation. Furthermore, we developed a scenario generation tool, which allows one to effortlessly create new scenarios from marine traffic data. We believe that CommonOcean will lead to a better reproducibility and comparability of research on motion planning for vessels.
@inproceedings{Krasowski2022.CommonOcean, author = {Krasowski, Hanna and Althoff, Matthias}, booktitle = {Proc. of the {IEEE} Int. Conf. on Intelligent Transportation Systems (ITSC)}, doi = {10.1109/ITSC55140.2022.9921925}, pages = {1676--1682}, title = {CommonOcean: Composable Benchmarks for Motion Planning on Oceans}, year = {2022}, }
Safe Reinforcement Learning for Urban Driving using Invariably Safe Braking Sets

Hanna Krasowski*, Yinqiang Zhang*, and Matthias Althoff

In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2022

Abs Bib PDF Video

Deep reinforcement learning (RL) has been widely applied to motion planning problems of autonomous vehicles in urban traffic. However, traditional deep RL algorithms cannot ensure safe trajectories throughout training and deployment. We propose a provably safe RL algorithm for urban autonomous driving to address this. We add a novel safety layer to the RL process to verify the safety of high-level actions before they are performed. Our safety layer is based on invariably safe braking sets to constrain actions for safe lane changing and safe intersection crossing. We introduce a generalized discrete high-level action space, which can represent all high-level intersection driving maneuvers and various desired accelerations. Finally, we conducted extensive experiments on the inD dataset containing urban driving scenarios. Our analysis demonstrates that the safe agent never causes a collision and that the safety layer’s lane changing verification can even improve the goal-reaching performance compared to the unsafe baseline agent.
@inproceedings{Krasowski2022b.safeRLurbanDriving, author = {Krasowski*, Hanna and Zhang*, Yinqiang and Althoff, Matthias}, booktitle = {Proc. of the {IEEE} Int. Conf. on Intelligent Transportation Systems (ITSC)}, doi = {10.1109/ITSC55140.2022.9922166}, pages = {2407--2414}, title = {Safe Reinforcement Learning for Urban Driving using Invariably Safe Braking Sets}, year = {2022}, }

2021

CommonRoad-RL: A Configurable Reinforcement Learning Environment for Motion Planning of Autonomous Vehicles

Xiao Wang, Hanna Krasowski, and Matthias Althoff

In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2021

Abs Bib PDF Code Video

Reinforcement learning (RL) methods have gained popularity in the field of motion planning for autonomous vehicles due to their success in robotics and computer games. However, no existing work enables researchers to conveniently compare different underlying the Markov decision processes (MDPs). To address this issue, we present CommonRoad-RL-an open-source toolbox to train and evaluate RL-based motion planners for autonomous vehicles. Configurability, modularity, and stability of CommonRoad-RL simplify comparing different MDPs. This is demonstrated by comparing agents trained with different rewards, action spaces, and vehicle models on a real-world highway dataset. Our toolbox is available at commonroad.in.tum.de.
@inproceedings{Wang2021.CommonRoadRL, author = {Wang, Xiao and Krasowski, Hanna and Althoff, Matthias}, title = {{CommonRoad-RL}: A Configurable Reinforcement Learning Environment for Motion Planning of Autonomous Vehicles}, booktitle = {Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC)}, year = {2021}, pages = {466--472}, doi = {10.1109/ITSC48978.2021.9564898}, }
Temporal Logic Formalization of Marine Traffic Rules

Hanna Krasowski, and Matthias Althoff

In Proc. of the IEEE Intelligent Vehicles Symposium (IV), 2021

Abs Bib PDF Code Video

Autonomous vessels have to adhere to marine traffic rules to ensure traffic safety and reduce the liability of manufacturers. However, autonomous systems can only evaluate rule compliance if rules are formulated in a precise and mathematical way. This paper formalizes marine traffic rules from the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS) using temporal logic. In particular, the collision prevention rules between two power-driven vessels are delineated. The formulation is based on modular predicates and adjustable parameters. We evaluate the formalized rules in three US coastal areas for over 1,200 vessels using real marine traffic data.
@inproceedings{Krasowski2021.MarineTrafficRules, author = {Krasowski, Hanna and Althoff, Matthias}, title = {Temporal Logic Formalization of Marine Traffic Rules}, booktitle = {Proc. of the IEEE Intelligent Vehicles Symposium (IV)}, year = {2021}, pages = {186--192}, doi = {10.1109/IV48863.2021.9575685}, }

2020

Safe Reinforcement Learning for Autonomous Lane Changing Using Set-Based Prediction

Hanna Krasowski*, Xiao Wang*, and Matthias Althoff

In Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC), 2020

Abs Bib PDF Video

Machine learning approaches often lack safety guarantees, which are often a key requirement in real-world tasks. This paper addresses the lack of safety guarantees by extending reinforcement learning with a safety layer that restricts the action space to the subspace of safe actions. We demonstrate the proposed approach using lane changing in autonomous driving. To distinguish safe actions from unsafe ones, we compare planned motions with the set of possible occupancies of traffic participants generated by set-based predictions. In situations where no safe action exists, a verified fail-safe controller is executed. We used real-world highway traffic data to train and test the proposed approach. The evaluation result shows that the proposed approach trains agents that do not cause collisions during training and deployment.
@inproceedings{Krasowski2020.safeRLhighway, author = {Krasowski*, Hanna and Wang*, Xiao and Althoff, Matthias}, title = {Safe Reinforcement Learning for Autonomous Lane Changing Using Set-Based Prediction}, booktitle = {Proc. of the IEEE Int. Conf. on Intelligent Transportation Systems (ITSC)}, year = {2020}, pages = {1--7}, doi = {10.1109/ITSC45102.2020.9294259}, }