Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity
ICLR(2024)
Abstract
Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both powerful tools for making decisions in the presence of uncertainties. Previous efforts have aimed to establish their connections, revealing equivalences in specific formulations. This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure (Ruszczyński 2010), and establishes its equivalence with a class of soft robust MDP (RMDP) problems, including the standard RMDP as a special case. Leveraging this equivalence, we further derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method under the tabular setting with direct parameterization. This forms a sharp contrast to the Markov risk measure, which is known to be potentially non-gradient-dominant (Huang et al. 2021). We also propose a sample-based offline learning algorithm, namely the robust fitted-Z iteration (RFZI), for a specific soft RMDP problem with a KL-divergence regularization term (or, equivalently, the risk-sensitive MDP with an entropy risk measure). We showcase its streamlined design and less stringent assumptions due to the equivalence, and analyze its sample complexity.
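The correspondence between KL-regularized soft robustness and the entropy risk measure mentioned above can be illustrated by the Gibbs variational (Donsker-Varadhan) formula. The following is a minimal sketch in our own notation (not the paper's), with X a random return, P a nominal reference distribution, and \beta > 0 the regularization strength:

\inf_{Q \ll P} \Big\{ \mathbb{E}_Q[X] + \tfrac{1}{\beta}\,\mathrm{KL}(Q \,\|\, P) \Big\} \;=\; -\tfrac{1}{\beta} \log \mathbb{E}_P\big[ e^{-\beta X} \big].

That is, an adversary minimizing the expected return over KL-penalized perturbations of P yields the entropic (exponential) risk of X under P, which is the kind of equivalence the abstract refers to.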