Paper Title

Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks

Authors

Jaehoon Oh, Jongwoo Ko, Se-Young Yun

Abstract

Translation has played a crucial role in improving the performance on multilingual tasks: (1) to generate the target language data from the source language data for training and (2) to generate the source language data from the target language data for inference. However, prior works have not considered the use of both translations simultaneously. This paper shows that combining them can synergize the results on various multilingual sentence classification tasks. We empirically find that translation artifacts stylized by translators are the main factor of the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, considering translation artifacts. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves the performance. Our code is available at https://github.com/jongwooko/MUSC.
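
The abstract names two generic training techniques, MixUp and SupCon (supervised contrastive learning). As a rough illustration of what each one computes, here is a minimal pure-Python sketch of the standard textbook forms. It is not the authors' MUSC implementation, which lives in the repository linked above. The function names, the `alpha` and `temperature` defaults, and the idea of feeding in an original sentence embedding together with the embedding of its translation are assumptions made only for this example.

```python
import math
import random


def mixup(x_a, x_b, y_a, y_b, alpha=0.4, rng=random):
    """Standard MixUp: interpolate two inputs and their one-hot labels
    with a single coefficient drawn from Beta(alpha, alpha).
    alpha=0.4 is an illustrative default, not a value from the paper."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss in its common form: for each anchor,
    average the negative log-probability of its same-label examples
    under a softmax over all other examples in the batch."""
    z = [_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity to every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner contribute nothing
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total -= sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical toy batch: two embeddings per class, standing in for an
# original sentence and its translation (an assumption for illustration).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss = supcon_loss(feats, [0, 0, 1, 1])
mixed_x, mixed_y, lam = mixup(feats[0], feats[2], [1.0, 0.0], [0.0, 1.0])
```

In this toy batch the loss is small when examples sharing a label also have similar embeddings, and it grows when the labels are shuffled so that same-label examples point in different directions. The pairing, weighting, and joint objective actually used by MUSC should be taken from the paper and its code rather than from this sketch.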
