
Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning

  • https://arxiv.org/abs/2104.08691

  • In this work, we explore “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3’s “few-shot” learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method “closes the gap” and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed “prefix tuning” of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.

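Below is a minimal sketch of the mechanism the abstract describes: a small set of learnable "soft prompt" embeddings is prepended to the input embeddings of a frozen language model, and only those prompt vectors receive gradient updates. The sketch assumes PyTorch and the HuggingFace `transformers` T5 implementation rather than the authors' original codebase; the prompt length, checkpoint name, learning rate, and the `training_step` helper are illustrative choices, not values taken from the paper.

```python
# Prompt tuning sketch: train only a soft prompt, keep the T5 model frozen.
# Assumes `torch` and `transformers` are installed; all hyperparameters are
# illustrative, not the paper's settings.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5ForConditionalGeneration

PROMPT_LENGTH = 20        # number of soft-prompt tokens (a tunable choice)
MODEL_NAME = "t5-small"   # any T5 checkpoint; the paper studies larger sizes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

# Freeze every pretrained weight: only the soft prompt will be trained.
for param in model.parameters():
    param.requires_grad = False

embed = model.get_input_embeddings()  # the frozen token-embedding table

# The soft prompt: PROMPT_LENGTH free embedding vectors, here initialized
# from randomly chosen token embeddings.
init_ids = torch.randint(0, embed.num_embeddings, (PROMPT_LENGTH,))
soft_prompt = nn.Parameter(embed(init_ids).detach().clone())

optimizer = torch.optim.AdamW([soft_prompt], lr=0.3)  # illustrative value

def training_step(input_text: str, target_text: str) -> torch.Tensor:
    """One gradient step that updates only the soft prompt."""
    enc = tokenizer(input_text, return_tensors="pt")
    labels = tokenizer(target_text, return_tensors="pt").input_ids

    # Embed the real tokens, then prepend the learned prompt embeddings.
    token_embeds = embed(enc.input_ids)        # (1, T, d_model)
    prompt_embeds = soft_prompt.unsqueeze(0)   # (1, P, d_model)
    inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)

    # Extend the attention mask to cover the prompt positions.
    prompt_mask = torch.ones(1, PROMPT_LENGTH, dtype=enc.attention_mask.dtype)
    attention_mask = torch.cat([prompt_mask, enc.attention_mask], dim=1)

    loss = model(inputs_embeds=inputs_embeds,
                 attention_mask=attention_mask,
                 labels=labels).loss
    optimizer.zero_grad()
    loss.backward()   # gradients flow only into `soft_prompt`
    optimizer.step()
    return loss

loss = training_step("sst2 sentence: the movie was wonderful", "positive")
print(float(loss))
```

Because all pretrained weights stay frozen, a single served checkpoint can be reused across many downstream tasks by swapping in each task's small soft-prompt tensor, which is the serving benefit the abstract highlights.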
