Paper Title
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Paper Authors
Paper Abstract
Multimodal machine translation (MMT), which mainly focuses on enhancing text-only translation with visual features, has attracted considerable attention from both the computer vision and natural language processing communities. Most current MMT models resort to the attention mechanism, global context modeling, or multimodal joint representation learning to exploit visual features. However, the attention mechanism lacks sufficient semantic interaction between modalities, while the other two provide a fixed visual context, which is unsuitable for modeling the variability observed when generating translations. To address these issues, in this paper we propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at each decoding timestep, we first employ conventional source-target attention to produce a timestep-specific source-side context vector. Next, the DCCN takes this vector as input and uses it to guide the iterative extraction of related visual features via a context-guided dynamic routing mechanism. In particular, since we represent the input image with both global and regional visual features, we introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities. Finally, we obtain two multimodal context vectors, which are fused and incorporated into the decoder to predict the target word. Experimental results on the Multi30K dataset for English-to-German and English-to-French translation demonstrate the superiority of DCCN. Our code is available at https://github.com/DeepLearnXMU/MM-DCCN.
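To make the routing step concrete, below is a minimal PyTorch-style sketch of what the context-guided dynamic routing inside one DCCN block might look like. This is an illustration under assumptions, not the authors' implementation (see the linked repository for the real code): the class name ContextGuidedRouting, the tensor dimensions, and the specific context-guidance term added to the routing logits are all hypothetical.

```python
# Hypothetical sketch of context-guided dynamic routing (NOT the authors'
# implementation; see https://github.com/DeepLearnXMU/MM-DCCN for the real code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm2 = (s * s).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class ContextGuidedRouting(nn.Module):
    """One DCCN-like block: routes N input (visual) capsules to M output
    capsules, biasing the routing coefficients toward the decoder's
    timestep-specific source-side context vector."""
    def __init__(self, in_caps, out_caps, in_dim, out_dim, ctx_dim, iters=3):
        super().__init__()
        self.iters = iters
        # Transformation matrices mapping input capsule i to a "vote"
        # for output capsule j.
        self.W = nn.Parameter(0.01 * torch.randn(in_caps, out_caps, in_dim, out_dim))
        # Projects the source-side context into capsule space so it can
        # score each vote (an assumed design choice for this sketch).
        self.ctx_proj = nn.Linear(ctx_dim, out_dim)

    def forward(self, u, ctx):
        # u:   (batch, in_caps, in_dim)   visual feature capsules
        # ctx: (batch, ctx_dim)           timestep-specific source context
        u_hat = torch.einsum('bni,nmio->bnmo', u, self.W)     # votes (b, N, M, d)
        c_vec = self.ctx_proj(ctx).unsqueeze(1).unsqueeze(1)  # (b, 1, 1, d)
        b = torch.zeros(u.size(0), u_hat.size(1), u_hat.size(2), device=u.device)
        for _ in range(self.iters):
            c = F.softmax(b, dim=2)                       # routing coefficients
            s = (c.unsqueeze(-1) * u_hat).sum(dim=1)      # weighted vote sum (b, M, d)
            v = squash(s)                                 # output capsules
            # Update logits with the usual vote-output agreement plus a
            # context-guidance term pulling routing toward visual features
            # relevant to the current source context.
            b = b + (u_hat * v.unsqueeze(1)).sum(-1) + (u_hat * c_vec).sum(-1)
        return v  # (batch, out_caps, out_dim) multimodal context capsules
```

Under this reading, two such blocks run in parallel, one fed global (image-level) features and one fed regional (object-level) features, and their outputs would yield the two multimodal context vectors that the abstract says are fused and incorporated into the decoder.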