Self-Alignment with Instruction Backtranslation

ICLR 2024

Citations: 127 | Views: 1304
Abstract
We present a scalable method to build a high-quality instruction-following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high-quality examples from among these candidates (self-curation). This data is then used to finetune a stronger model. Finetuning LLaMa on two iterations of our approach yields a model that outperforms all other LLaMa-based models on the Alpaca leaderboard that do not rely on distillation data, demonstrating highly effective self-alignment.
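
The abstract describes a two-stage loop: generate candidate instructions for existing web text, then filter the candidates with the same model. Below is a minimal Python sketch of that loop. The `generate` function, the prompt wording, and the 5-point rating threshold are illustrative assumptions, not the paper's exact prompts or curation rubric.

```python
# Sketch of instruction backtranslation: self-augmentation, then self-curation.
# `generate` is a placeholder for a call to the seed model (e.g. a LLaMA model
# finetuned on a small seed set); plug in your own model API.

def generate(prompt: str) -> str:
    """Hypothetical seed-model call; replace with a real LLM invocation."""
    raise NotImplementedError("wire this to your finetuned seed model")

def self_augment(documents: list[str]) -> list[dict]:
    """Treat each web document as an *output* and predict its instruction."""
    pairs = []
    for doc in documents:
        instruction = generate(
            f"Write the instruction that this text would answer:\n\n{doc}"
        )
        pairs.append({"instruction": instruction, "output": doc})
    return pairs

def self_curate(pairs: list[dict], threshold: int = 4) -> list[dict]:
    """Have the same seed model rate each candidate pair; keep only the best.
    The 1-5 scale and threshold of 4 are assumptions for this sketch."""
    kept = []
    for p in pairs:
        rating = generate(
            "Rate from 1 to 5 how well the response follows the instruction.\n"
            f"Instruction: {p['instruction']}\nResponse: {p['output']}\nRating:"
        )
        try:
            if int(rating.strip()[0]) >= threshold:
                kept.append(p)
        except (ValueError, IndexError):
            pass  # unparseable rating: drop the candidate
    return kept

# One iteration: augment, curate, then finetune a stronger model on the kept
# pairs plus the seed data; the paper reports two such iterations.
```

The key design point the sketch illustrates is that no human labels or distilled outputs from a stronger model are needed after the seed stage: the seed model both proposes and vets its own training data.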
Keywords
large language models, self-supervised learning, data augmentation