
A Distributional Analogue to the Successor Representation

ICLR 2024

Research Scientist Intern | PhD student | Gatsby Unit and Google DeepMind | Research Scientist | Adjunct Professor

Abstract
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
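The abstract's learning objective is built on the maximum mean discrepancy (MMD). As a minimal illustration of the inner level only, the sketch below computes a standard biased MMD² estimate between two sample sets under an RBF kernel; the paper's two-level construction, which applies a further MMD over distributions of such distributions, is not reproduced here. All function names and the choice of kernel bandwidth are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of x and y.
    sq_dists = (np.sum(x**2, axis=1)[:, None]
                + np.sum(y**2, axis=1)[None, :]
                - 2.0 * x @ y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd_squared(x, y, sigma=1.0):
    # Biased estimate of squared MMD between the empirical
    # distributions of samples x and y.
    kxx = rbf_kernel(x, x, sigma)
    kyy = rbf_kernel(y, y, sigma)
    kxy = rbf_kernel(x, y, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

# Identical samples give MMD^2 = 0; well-separated samples give a
# larger value than two draws from the same distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = rng.normal(size=(100, 2))
z = rng.normal(loc=5.0, size=(100, 2))
print(mmd_squared(x, x), mmd_squared(x, y), mmd_squared(x, z))
```

The MMD is a natural choice here because it compares distributions directly from samples, without density estimation, which is what a generative model of state occupancy requires.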
Key words
reinforcement learning, distributional reinforcement learning, successor representation, successor measure, geometric horizon models, gamma models, risk-aware

Key points: This paper proposes a new approach to distributional reinforcement learning, the distributional successor measure, which cleanly separates transition structure from reward and is learned from data by minimizing a two-level maximum mean discrepancy.

Method: The distributional successor measure is learned from data by minimizing a two-level maximum mean discrepancy.

Experiments: The authors employ a number of algorithmic techniques for learning generative models of state, and demonstrate by example that the distributional successor measure enables zero-shot risk-sensitive policy evaluation.