Paper Title
Language Models are General-Purpose Interfaces
Paper Authors
Paper Abstract
Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a broad convergence in terms of architecture, most pretrained models are still typically developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceives diverse modalities (such as vision and language) and docks with a language model that plays the role of a universal task layer. We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders. We subsume the advantages and capabilities of both causal and non-causal modeling, thereby combining the best of both worlds. Specifically, the proposed method not only inherits the capabilities of in-context learning and open-ended generation from causal language modeling, but is also conducive to finetuning because of the bidirectional encoders. More importantly, our approach seamlessly unlocks combinations of the above capabilities, e.g., enabling in-context learning or instruction following with finetuned encoders. Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
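To make the described setup concrete, below is a minimal PyTorch sketch of the semi-causal arrangement the abstract outlines: a bidirectional (non-causal) encoder perceives an input span, a connector docks its representations into a causal decoder that serves as the universal task layer, and the language-modeling loss is applied only to the positions generated causally. Everything here (the class name `SemiCausalLM`, the single text-span encoder, the linear connector, and all hyperparameters) is an illustrative assumption, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SemiCausalLM(nn.Module):
    """Illustrative sketch of a language model as a general-purpose
    interface: a bidirectional encoder perceives an input span, and a
    causal decoder (the universal task layer) generates the continuation.
    Not the authors' implementation."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8,
                 n_layers=4, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        # Non-causal (bidirectional) encoder for one modality; the paper
        # uses a collection of pretrained encoders (e.g. vision, language).
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Connector projecting encoder outputs into the decoder's space.
        self.connector = nn.Linear(d_model, d_model)
        # Causal decoder acting as the universal task layer.
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def _positions(self, x):
        return self.pos(torch.arange(x.size(1), device=x.device))

    def forward(self, span_ids, target_ids):
        # 1. Encode the span with full bidirectional attention (no mask).
        span_emb = self.embed(span_ids) + self._positions(span_ids)
        span_h = self.connector(self.encoder(span_emb))
        # 2. Splice the encoded span in front of the target embeddings.
        h = torch.cat([span_h, self.embed(target_ids)], dim=1)
        h = h + self._positions(h)
        # 3. Run the decoder under an autoregressive (causal) mask.
        L = h.size(1)
        causal = torch.triu(
            torch.full((L, L), float("-inf"), device=h.device), diagonal=1)
        out = self.decoder(h, mask=causal)
        # 4. Semi-causal objective: next-token loss only on the target
        #    positions, i.e. the part generated causally by the decoder.
        n_span = span_h.size(1)
        logits = self.lm_head(out[:, n_span - 1:-1])
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target_ids.reshape(-1))

# Toy usage: a 16-token bidirectional span conditions a 32-token continuation.
model = SemiCausalLM()
span = torch.randint(0, 32000, (2, 16))
target = torch.randint(0, 32000, (2, 32))
print(model(span, target))  # scalar training loss
```

This sketch simplifies the setup to a single encoder whose span sits at the start of the sequence; in the full approach described by the abstract, multiple modality encoders dock with the same decoder, which is what lets one pretrained interface cover finetuning, zero-shot generalization, and few-shot learning.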