Abstract: In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance, it is crucial to guarantee the safety of an agent during training as well as deployment (e.g., a robot should avoid taking actions, exploratory or not, which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision processes (CMDPs), an extension of the standard Markov decision processes (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method. We define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that our proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
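S1.01 Generalized Flux Qubit Arrays as Low-Anharmonicity Analog Quantum Simulators. Ilan T. Rosen, Kasper Poulsen, Sarah Muschinske, William D. Oliver. Sponsorship: IC Postdoctoral Fellowship. There is growing interest in using superconducting qubit arrays for analog quantum simulation, because they natively implement the Bose-Hubbard Hamiltonian, offer a wide range of accessible energy scales, and allow full or partial state tomography measurements. However, the large anharmonicity of conventional qubits prevents qubit-array simulators from exploring weakly interacting physics. Generalized flux qubits (GFQs) have coherence, control, and measurement properties similar to those of conventional qubits, but also feature tunable anharmonicity. Here, we propose using arrays of superconducting GFQs as analog quantum simulators of weakly interacting physics. We discuss how uncertainties in device fabrication limit the ratios of disorder to self-energy and of disorder to anharmonicity in realistic GFQ arrays. We then numerically study benchmark models from condensed matter physics, highlighting the regimes accessible to realistic GFQ array simulators.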
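confusing, perhaps incomprehensible. For example, after stating the elements of negligence, Plaintiff alleges: "As a result of Defendant's act/actions, Plaintiff was subjected to racial profiling, stalking, and harassment by a private third party security maintaining (sic) handguns, causing Plaintiff to fear for his immediate safety and life."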
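Who is responsible when a semi-autonomous car crashes? Automakers claim that because advanced driver assistance systems (ADAS) require constant human supervision even while autonomous features are active, the driver is always fully responsible when a supervised autonomous feature fails. This article argues that the automakers' position is likely wrong both descriptively and normatively. On the descriptive side, current products liability law offers a pathway to shared legal responsibility. Automakers, after all, have run extensive marketing campaigns to win public trust in automated features. When that trust proves misplaced, drivers are not always able to react in time to retake control of the car. In such circumstances, automakers may face primary liability, perhaps reduced by the driver's comparative negligence. On the normative side, this article argues that the nature of modern semi-autonomous systems requires human and machine to drive collaboratively. Human drivers should not bear full responsibility for the harms that arise from this shared responsibility.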
Mr. Savo Minic, Minister of Agriculture, Water Management and Forestry of the Republic of Srpska, Bosnia and Herzegovina; Dr. Zeljko Budimir, Minister of Scientific and Technological Development, Higher Education and Information Society of the Republic of Srpska, Bosnia and Herzegovina; Rector of the University of East Sarajevo, Bosnia and Herzegovina; Dr. Dusan Zivkovic, Dean of the Faculty of Agriculture, University of Belgrade, Serbia; Dr. Maurizio Raeli, Mediterranean [...]; [...] Yilmaz, Rector, Selcuk [...]; Andreev, Rector of the Russian State Agricultural Technical University, Russia; Prof. dr Alexey Yu. Popov, Rector of the Voronezh State Agricultural University named after Peter the Great, Russia; Prof. dr Zhang Jijian, President of Jiangsu University, People's Republic of China; Prof. dr Barbara Hinterstoisser, Vice-Rector of the University of Natural Resources and Life Sciences (BOKU), Austria; Prof. dr Sorin Mihai Cimpeanu, Rector of the University of Agronomic Sciences and Veterinary Medicine of Bucharest, Romania; Prof. Shinichi Yonekura, Vice-President of Shinshu University, Japan.