Self-Alignment with Instruction Backtranslation

ICLR 2024

Citations: 127 | Views: 1304
Abstract
We present a scalable method to build a high-quality instruction-following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high-quality examples from among these candidates (self-curation). This data is then used to finetune a stronger model. Finetuning LLaMa on two iterations of our approach yields a model that outperforms all other LLaMa-based models on the Alpaca leaderboard that do not rely on distillation data, demonstrating highly effective self-alignment.
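
The abstract describes a two-stage loop: generate candidate instructions for existing web text, then filter the candidates with the same model. Below is a minimal Python sketch of that loop. The `generate` function, the prompt wording, and the 5-point rating threshold are illustrative assumptions, not the paper's exact prompts or curation rubric.

```python
# Sketch of instruction backtranslation: self-augmentation, then self-curation.
# `generate` is a placeholder for a call to the seed model (e.g. a LLaMA model
# finetuned on a small seed set); plug in your own model API.

def generate(prompt: str) -> str:
    """Hypothetical seed-model call; replace with a real LLM invocation."""
    raise NotImplementedError("wire this to your finetuned seed model")

def self_augment(documents: list[str]) -> list[dict]:
    """Treat each web document as an *output* and predict its instruction."""
    pairs = []
    for doc in documents:
        instruction = generate(
            f"Write the instruction that this text would answer:\n\n{doc}"
        )
        pairs.append({"instruction": instruction, "output": doc})
    return pairs

def self_curate(pairs: list[dict], threshold: int = 4) -> list[dict]:
    """Have the same seed model rate each candidate pair; keep only the best.
    The 1-5 scale and threshold of 4 are assumptions for this sketch."""
    kept = []
    for p in pairs:
        rating = generate(
            "Rate from 1 to 5 how well the response follows the instruction.\n"
            f"Instruction: {p['instruction']}\nResponse: {p['output']}\nRating:"
        )
        try:
            if int(rating.strip()[0]) >= threshold:
                kept.append(p)
        except (ValueError, IndexError):
            pass  # unparseable rating: drop the candidate
    return kept

# One iteration: augment, curate, then finetune a stronger model on the kept
# pairs plus the seed data; the paper reports two such iterations.
```

The key design point the sketch illustrates is that no human labels or distilled outputs from a stronger model are needed after the seed stage: the seed model both proposes and vets its own training data.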
Keywords
large language models, self-supervised learning, data augmentation