Toward Infinite-Long Prefix in Transformer
CoRR (2024)
Abstract
Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks and can match full-parameter fine-tuning. However, there remains limited theoretical understanding of how these methods work. In this paper, we address this limitation by studying the learning ability of Prefix Learning from the perspective of prefix length. In particular, we approximate the optimization process of infinite-long Prefix Learning using the Neural Tangent Kernel (NTK) technique, formulating and solving it as the problem of learning an infinitely long prefix in a one-layer attention network. Our results confirm the over-parameterization property of infinite-long Prefix Learning in attention and guarantee convergence to an arbitrarily small loss. On the implementation side, we propose NTK-Attention, which efficiently performs attention computation "equivalent" to attention with an arbitrary prefix length. Its time complexity is sub-quadratic in the input length (excluding the prefix), and the method requires only d^2 + d extra parameters, where d is the feature dimension. In addition, we conducted experiments comparing NTK-Attention with full-parameter fine-tuning, LoRA, and P-Tuning V2 across vision and natural language datasets. The results indicate that our approach may be a promising parameter-efficient fine-tuning method, as it demonstrates superior performance in numerous scenarios. Our code can be found at .
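To make the d^2 + d parameter count concrete, the following is a minimal, illustrative sketch (not the paper's exact formulation) of how a prefix of arbitrary length could be folded into a learned matrix Z (d x d) and vector k (d), so that attention is computed over the real input tokens only. The feature map phi and the specific combination of Z and k with the numerator and denominator are assumptions made for illustration.

```python
import numpy as np

def ntk_attention(X, W_Q, W_K, W_V, Z, k, phi=None):
    """Illustrative sketch: attention over input X with a compressed prefix.

    X: (n, d) input tokens; no prefix tokens are materialized.
    Z: (d, d) and k: (d,) are the d^2 + d extra parameters standing in for
    the contribution of an arbitrarily long prefix (assumed parameterization).
    phi: feature map applied to queries; an exponential map is assumed here.
    """
    n, d = X.shape
    if phi is None:
        phi = lambda q: np.exp(q / np.sqrt(d))

    Q, K, V = X @ W_Q, X @ W_K, X @ W_V

    # Unnormalized attention scores among the real input tokens only.
    A = np.exp(Q @ K.T / np.sqrt(d))            # (n, n)

    # Prefix contribution folded into Z (numerator) and k (normalizer).
    F = phi(Q)                                  # (n, d)
    numerator = A @ V + F @ Z                   # (n, d)
    denominator = A @ np.ones(n) + F @ k        # (n,)

    return numerator / denominator[:, None]

# Purely illustrative usage with random weights.
rng = np.random.default_rng(0)
n, d = 8, 16
X = rng.standard_normal((n, d))
W_Q = rng.standard_normal((d, d)) * d ** -0.5
W_K = rng.standard_normal((d, d)) * d ** -0.5
W_V = rng.standard_normal((d, d)) * d ** -0.5
Z = rng.standard_normal((d, d)) * 0.01          # d^2 learned parameters
k = np.abs(rng.standard_normal(d)) * 0.01       # d learned parameters (kept positive)
out = ntk_attention(X, W_Q, W_K, W_V, Z, k)     # shape (n, d)
```

Because the prefix never appears as explicit tokens, the cost of this sketch is dominated by attention over the n input tokens plus O(n d^2) work for the Z and k terms, independent of the (conceptual) prefix length.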