Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Trans Mach Learn Res (2024)

Abstract
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST^EM, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST^EM scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
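
The three steps named in the abstract (generate and filter, fine-tune, repeat) can be illustrated with a toy loop. This is a minimal sketch under strong assumptions, not the paper's implementation: the "model" is a hypothetical lookup table of answer probabilities per problem, and "fine-tuning" is replaced by mixing each distribution toward the samples that passed the binary check. All function names, the `mix` parameter, and the default iteration and sample counts are invented for illustration.

```python
import random
from collections import Counter


def generate(policy, problem, n, rng):
    """Step 1a: sample n candidate answers from the current policy."""
    answers = list(policy[problem])
    weights = [policy[problem][a] for a in answers]
    return rng.choices(answers, weights=weights, k=n)


def binary_reward(problem, answer, reference):
    """Step 1b: binary feedback, 1 if the answer is verified correct, else 0."""
    return 1 if reference[problem] == answer else 0


def fine_tune(policy, kept, mix=0.5):
    """Step 2 (toy stand-in for fine-tuning): shift each problem's answer
    distribution toward the empirical distribution of its kept samples."""
    updated = {}
    for problem, dist in policy.items():
        counts = Counter(a for p, a in kept if p == problem)
        total = sum(counts.values())
        if total == 0:
            # No sample passed the filter, so leave this problem unchanged.
            updated[problem] = dict(dist)
            continue
        updated[problem] = {
            a: (1 - mix) * w + mix * counts[a] / total for a, w in dist.items()
        }
    return updated


def rest_em(policy, reference, iterations=3, samples_per_problem=8, seed=0):
    """Step 3: repeat generate -> filter -> fine-tune a few times."""
    rng = random.Random(seed)
    for _ in range(iterations):
        kept = []
        for problem in policy:
            for answer in generate(policy, problem, samples_per_problem, rng):
                if binary_reward(problem, answer, reference) == 1:
                    kept.append((problem, answer))
        policy = fine_tune(policy, kept)
    return policy


def accuracy(policy, reference):
    """Average probability the policy assigns to the correct answer."""
    return sum(policy[p][reference[p]] for p in policy) / len(policy)
```

Calling `rest_em(policy, reference)` with a `policy` such as `{"2+2": {"3": 0.25, "4": 0.25, "5": 0.5}}` and a `reference` of `{"2+2": "4"}` returns a new table without mutating the input. Because only verified-correct samples survive the filter, the probability of the correct answer in this toy setup can only stay the same or grow from one iteration to the next.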