Generalized Preference Optimization: A Unified Approach to Offline Alignment
ICML 2024 (2024)
Google DeepMind | DeepMind | Google DeepMind, Inria, MVA
Abstract
Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. GPO enables a unified view over preference optimization, encompassing existing algorithms such as DPO, IPO and SLiC as special cases, while naturally introducing new variants. The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss. Our analysis and experiments reveal the connections and subtle differences between the offline regularization and the KL divergence regularization intended by the canonical RLHF formulation. In all, our results present new algorithmic toolkits and empirical insights to alignment practitioners.
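To make the construction concrete, the sketch below illustrates the kind of loss family the abstract describes: a single expected convex loss of the scaled log-likelihood-ratio margin, where the choice of convex function recovers DPO, IPO or SLiC as special cases. It is a minimal sketch; the function names, the `beta` scale, and the exact forms of the convex functions are assumptions based on the usual presentation of those baselines, not taken verbatim from the paper.

```python
# Minimal sketch of a GPO-style offline preference loss (assumed notation).
import torch
import torch.nn.functional as F

def gpo_loss(logratio_chosen, logratio_rejected, beta, f):
    """Generic offline preference loss E[ f(beta * (rho_w - rho_l)) ],
    where rho_w / rho_l are log(pi_theta / pi_ref) for the preferred and
    dispreferred responses, and f is a convex function."""
    margin = beta * (logratio_chosen - logratio_rejected)
    return f(margin).mean()

# Convex functions commonly associated with the special cases named in the
# abstract (assumed forms, not quoted from the paper):
dpo_f  = lambda t: F.softplus(-t)               # logistic loss -> DPO
slic_f = lambda t: torch.clamp(1.0 - t, min=0)  # hinge loss    -> SLiC
ipo_f  = lambda t: (t - 1.0) ** 2               # squared loss  -> IPO
```

Swapping the callable `f` is the only change needed to move between variants, which is the unified view the abstract refers to.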
Key words
Preference Optimization, Offline Alignment, RLHF, KL Regularization, Convex Loss Functions