Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Abstract
Recent works have shown the benefits to LLMs of fine-tuning on golden-standard Chain-of-Thought (CoT) rationales or using them as correct examples in few-shot prompting. While humans can indeed imitate correct examples, learning from our mistakes is another vital aspect of human cognition. Hence, a question naturally arises: can LLMs learn and benefit from their mistakes, especially for their reasoning? This study investigates this problem from both the prompting and model-tuning perspectives. We begin by introducing CoTErrorSet, a new benchmark with 609,432 questions, each designed with both correct and error references, and demonstrating the types of and reasons for such mistakes. To explore the effectiveness of those mistakes, we design two methods: (1) Self-rethinking prompting guides LLMs to rethink whether they have made similar mistakes before; and (2) Mistake tuning finetunes models on both correct and incorrect reasoning, rather than only tuning models to learn ground truth as in traditional methodology. We conduct a series of experiments showing that LLMs can obtain benefits from mistakes in both directions. Our two methods offer potentially cost-effective strategies for leveraging errors to enhance reasoning capabilities, at a cost significantly lower than creating meticulously hand-crafted golden references. We conclude with a thorough analysis of the reasons behind LLMs' errors, which provides directions that future research needs to overcome. CoTErrorSet will be published soon on .
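
The following is a minimal sketch of how the two methods described in the abstract might be realized in practice. It is illustrative only: the CoTErrorSet record fields (question, correct_cot, error_cot, error_type), the prompt wording, and the domain tags are assumptions for the sketch, not the paper's actual data schema or implementation.

```python
# Illustrative sketch (not the authors' code) of the two ideas in the abstract:
# (1) self-rethinking prompting: show the model its earlier mistakes and ask it
#     to check whether its new reasoning repeats them;
# (2) mistake tuning: build supervised pairs from both correct and incorrect
#     rationales, each marked with an explicit domain tag.
# Field names and tags below are hypothetical placeholders.


def self_rethinking_prompt(question: str, past_errors: list[dict]) -> str:
    """Build a prompt that first lists earlier mistakes, then asks the model
    to solve a new question while checking for similar errors."""
    error_notes = "\n".join(
        f"- Question: {e['question']}\n"
        f"  Faulty rationale: {e['error_cot']}\n"
        f"  Error type: {e['error_type']}"
        for e in past_errors
    )
    return (
        "You previously made the following reasoning mistakes:\n"
        f"{error_notes}\n\n"
        f"Now solve this question step by step: {question}\n"
        "Before giving the final answer, check whether your reasoning repeats "
        "any of the mistakes listed above; if so, revise it."
    )


def mistake_tuning_examples(record: dict) -> list[dict]:
    """Turn one benchmark record into two supervised pairs: one tagged as a
    correct rationale and one tagged as an incorrect rationale, so the model
    is tuned on both domains rather than only on ground truth."""
    return [
        {"input": f"[CORRECT RATIONALE] {record['question']}",
         "target": record["correct_cot"]},
        {"input": f"[INCORRECT RATIONALE] {record['question']}",
         "target": record["error_cot"]},
    ]


if __name__ == "__main__":
    # Toy record in the assumed format, for demonstration only.
    record = {
        "question": "If a train travels 60 km in 1.5 hours, what is its speed?",
        "correct_cot": "Speed = distance / time = 60 / 1.5 = 40 km/h.",
        "error_cot": "Speed = distance * time = 60 * 1.5 = 90 km/h.",
        "error_type": "calculation error",
    }
    print(self_rethinking_prompt(record["question"], [record]))
    for pair in mistake_tuning_examples(record):
        print(pair)
```

Under these assumptions, the self-rethinking prompt would be sent to the model at inference time, while the tagged pairs would feed a standard finetuning pipeline; the exact prompt phrasing and tagging scheme used in the paper may differ.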