Paper Title
Maximizing Use-Case Specificity through Precision Model Tuning
Paper Authors
Paper Abstract
Language models have become increasingly popular in recent years for tasks like information retrieval. As use-cases become oriented toward specific domains, fine-tuning becomes the default route to standard performance. To fine-tune these models for specific tasks and datasets, it is necessary to carefully tune the model's hyperparameters and training techniques. In this paper, we present an in-depth analysis of the performance of four transformer-based language models on the task of biomedical information retrieval. The models we consider are DeepMind's RETRO (7B parameters), GPT-J (6B parameters), GPT-3 (175B parameters), and BLOOM (176B parameters). We compare their performance on the basis of relevance, accuracy, and interpretability, using a large corpus of 480,000 research papers on protein structure/function prediction as our dataset. Our findings suggest that smaller models (<10B parameters) fine-tuned on domain-specific datasets tend to outperform larger language models on highly specific questions in terms of accuracy, relevance, and interpretability, by a significant margin (+50% on average). However, larger models do provide generally better results on broader prompts.
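The abstract does not specify the training setup, so the following is only a minimal sketch of the kind of domain-specific fine-tuning it describes: adapting a <10B-parameter causal language model (here GPT-J via Hugging Face Transformers) on a corpus of research-paper text. The corpus path, learning rate, and other hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the paper's released code) of fine-tuning a smaller
# causal LM on a domain-specific corpus, in the spirit of the abstract.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "EleutherAI/gpt-j-6b"  # one of the <10B-parameter models the paper studies

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J ships without a pad token

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Hypothetical corpus file: one paper abstract/passage per line.
dataset = load_dataset("text", data_files={"train": "protein_papers.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal LM objective: labels are the (shifted) input tokens, so mlm=False.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gptj-biomed-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # simulate a larger batch on limited memory
    learning_rate=5e-6,              # illustrative; the paper stresses careful tuning
    num_train_epochs=1,
    fp16=True,
    logging_steps=50,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

In practice the hyperparameters above (learning rate, batch size, epochs) are exactly what the abstract says must be tuned per task and dataset; the sketch only fixes one plausible starting point.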