Quality Diversity Through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
ICML (2024)
Abstract
Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where easily defined performance measures are lacking. However, RLHF commonly optimizes for average human preferences, which is a drawback in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms in complex and open-ended domains. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of QD with manually crafted diversity metrics on standard benchmarks in robotics and reinforcement learning. Notably, in open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF's scalability, robustness, and quality of derived diversity metrics, emphasizing its strength in open-ended optimization tasks. Code and tutorials are available at https://liding.info/qdhf.
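To make the core mechanism in the abstract concrete, the sketch below shows, under simplifying assumptions, how a diversity metric might be inferred from human similarity (triplet) judgments and then used as the measure space of a MAP-Elites-style archive. This is not the authors' implementation: the linear projection, the synthetic stand-in judgments, the toy objective, and helper names such as train_on_judgments and descriptor are hypothetical placeholders chosen only to illustrate the interface between a learned metric and a QD archive.

```python
# Minimal, illustrative sketch of the QDHF idea (assumptions noted in comments):
# 1) learn a latent projection from triplet similarity judgments,
# 2) use the learned latent dimensions as diversity measures in a toy MAP-Elites loop.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

FEAT_DIM, LATENT_DIM = 8, 2  # solution feature size and number of diversity dims (assumed)

# 1) Learn diversity metrics from triplet judgments: "anchor is more similar to
#    positive than to negative", as a human rater might report.
projection = nn.Linear(FEAT_DIM, LATENT_DIM, bias=False)
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-2)
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def train_on_judgments(anchors, positives, negatives, epochs=200):
    """Fit the latent projection so its distances agree with the judgments."""
    a, p, n = (torch.tensor(x, dtype=torch.float32) for x in (anchors, positives, negatives))
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = triplet_loss(projection(a), projection(p), projection(n))
        loss.backward()
        optimizer.step()

# Synthetic stand-in for human feedback: judgments consistent with the first
# two feature dimensions being what raters perceive as "different".
anchors = rng.normal(size=(256, FEAT_DIM))
positives = anchors + 0.1 * rng.normal(size=(256, FEAT_DIM))
negatives = anchors.copy()
negatives[:, :2] += 2.0 * rng.normal(size=(256, 2))
train_on_judgments(anchors, positives, negatives)

# 2) Use the learned latent space as diversity measures in a toy MAP-Elites loop.
BINS = 10
archive = {}  # grid cell (i, j) -> (fitness, solution)

def fitness(x):
    # Toy objective (placeholder): prefer solutions near the origin.
    return -float(np.sum(x ** 2))

def descriptor(x):
    """Map a solution to an archive cell via the learned diversity metrics."""
    with torch.no_grad():
        z = projection(torch.tensor(x, dtype=torch.float32)).numpy()
    cell = np.clip(((z + 3.0) / 6.0 * BINS).astype(int), 0, BINS - 1)
    return tuple(int(c) for c in cell)

for _ in range(2000):
    if archive:  # mutate a random elite
        keys = list(archive.keys())
        _, parent = archive[keys[rng.integers(len(keys))]]
        x = parent + 0.2 * rng.normal(size=FEAT_DIM)
    else:        # or sample randomly while the archive is empty
        x = rng.normal(size=FEAT_DIM)
    cell = descriptor(x)
    if cell not in archive or fitness(x) > archive[cell][0]:
        archive[cell] = (fitness(x), x)

print(f"filled {len(archive)} of {BINS * BINS} cells")
```

In the paper's setting the triplet judgments would come from people comparing actual solutions (e.g., generated images or robot behaviors) and the projection would be refit progressively as the search proceeds; the toy loop above only conveys how a learned diversity metric plugs into a QD archive in place of a hand-crafted one.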