Abstract: In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance, it is crucial to guarantee the safety of an agent during training as well as deployment (e.g., a robot should avoid taking actions - exploratory or not - which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision processes (CMDPs), an extension of the standard Markov decision processes (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method. We define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that our proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
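The abstract describes enforcing global safety through a set of local linear Lyapunov constraints in a CMDP. As a rough illustration of that idea only (not the paper's actual construction or algorithm), below is a minimal numpy sketch for a toy tabular CMDP: lyapunov_slack checks the per-state constraint d(x) + gamma * E[L(x')] <= L(x) under a policy, and safe_mixture mixes a reward-greedy policy with a baseline policy assumed to satisfy the constraint, pushing as far toward the greedy policy as the constraint allows. All function names, array shapes, and the mixing heuristic are assumptions made for this sketch.

```python
import numpy as np

def lyapunov_slack(pi, P, d, L, gamma):
    """Per-state slack of the local linear Lyapunov constraint
        d(x) + gamma * E_{a~pi(.|x), x'~P(.|x,a)}[ L(x') ] <= L(x).
    Hypothetical toy-CMDP shapes:
        pi : (S, A)    action probabilities
        P  : (S, A, S) transition probabilities
        d  : (S,)      immediate constraint cost
        L  : (S,)      candidate Lyapunov function
    A non-negative slack at every state means the policy stays inside
    the safe set induced by L.
    """
    expected_next_L = np.einsum('sa,sap,p->s', pi, P, L)
    return L - d - gamma * expected_next_L

def safe_mixture(pi_greedy, pi_baseline, P, d, L, gamma, steps=20):
    """Per-state bisection on the weight given to a reward-greedy policy,
    mixed with a baseline policy assumed to have non-negative slack.
    Because the slack at state x depends only on pi(.|x) and is linear in
    the mixing weight, bisection finds the most greedy safe mixture."""
    S, A = pi_greedy.shape
    pi = pi_baseline.copy()
    for x in range(S):
        lo, hi = 0.0, 1.0                      # weight on the greedy policy
        for _ in range(steps):
            mid = 0.5 * (lo + hi)
            cand = pi.copy()
            cand[x] = mid * pi_greedy[x] + (1 - mid) * pi_baseline[x]
            if lyapunov_slack(cand, P, d, L, gamma)[x] >= 0:
                lo = mid                       # still safe: push toward greedy
            else:
                hi = mid
        pi[x] = lo * pi_greedy[x] + (1 - lo) * pi_baseline[x]
    return pi
```

In the paper the Lyapunov function itself is constructed from a feasible baseline policy (roughly, its constraint cost plus an auxiliary cost term); the sketch above only checks and enforces the local constraint that such a function would induce.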
