FAC^2E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition
arXiv (2024)
Abstract
Large language models (LLMs) are primarily evaluated by overall performance
on various text understanding and generation tasks. However, such a paradigm
fails to comprehensively differentiate fine-grained language and cognitive
skills, leaving LLMs' capabilities insufficiently interpreted.
In this paper, we present FAC^2E, a framework for Fine-grAined and
Cognition-grounded LLMs' Capability Evaluation. Specifically, we formulate
LLMs' evaluation in a multi-dimensional and explainable manner by dissociating
the language-related capabilities and the cognition-related ones. In addition,
by extracting the intermediate reasoning from LLMs, we further break down
the process of applying a specific capability into three sub-steps: recalling
relevant knowledge, utilizing knowledge, and solving problems. Finally,
FAC^2E evaluates each sub-step of each fine-grained capability, providing a
two-faceted diagnosis for LLMs. Utilizing FAC^2E, we identify a common
shortfall in knowledge utilization among models and propose a straightforward,
knowledge-enhanced method to mitigate this issue. Our results not only showcase
promising performance enhancements but also highlight a direction for future
LLM advancements.
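
The two-faceted diagnosis described above (one axis for fine-grained capabilities, one for the three sub-steps) can be pictured as a small score table. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the function names `diagnose` and `weakest_step`, the sub-step labels, the capability labels, and the scores in the usage example are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical labels for the three sub-steps named in the abstract:
# recalling relevant knowledge, utilizing knowledge, solving problems.
SUB_STEPS = ("recall", "utilize", "solve")


def diagnose(records):
    """Build a capability x sub-step table of mean scores.

    records: iterable of (capability, sub_step, score) tuples,
    where score is assumed to lie in [0, 1].
    """
    buckets = defaultdict(list)
    for capability, sub_step, score in records:
        if sub_step not in SUB_STEPS:
            raise ValueError(f"unknown sub-step: {sub_step}")
        buckets[(capability, sub_step)].append(score)

    table = defaultdict(dict)
    for (capability, sub_step), scores in buckets.items():
        table[capability][sub_step] = mean(scores)
    return {cap: dict(steps) for cap, steps in table.items()}


def weakest_step(table):
    """Return the sub-step with the lowest mean score across capabilities."""
    per_step = defaultdict(list)
    for steps in table.values():
        for sub_step, score in steps.items():
            per_step[sub_step].append(score)
    return min(per_step, key=lambda s: mean(per_step[s]))


# Usage with made-up scores (illustrative only, not results from the paper).
records = [
    ("linguistic", "recall", 1.0),
    ("linguistic", "utilize", 0.5),
    ("linguistic", "solve", 1.0),
    ("reasoning", "recall", 1.0),
    ("reasoning", "utilize", 0.0),
    ("reasoning", "solve", 0.5),
]
table = diagnose(records)
print(table)
print(weakest_step(table))
```

Reading the table by row gives a per-capability profile; reading it by column shows which sub-step is weak across capabilities, which is the kind of view that would surface a shared shortfall such as the knowledge-utilization gap the abstract reports.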