Filtered Inner Product Projection for Crosslingual Embedding Alignment

Paper Title

Filtered Inner Product Projection for Crosslingual Embedding Alignment

Authors

Vin Sachidananda, Ziyi Yang, Chenguang Zhu

Abstract

Due to widespread interest in machine translation and transfer learning, there are numerous algorithms for mapping multiple embeddings to a shared representation space. Recently, these algorithms have been studied in the setting of bilingual dictionary induction where one seeks to align the embeddings of a source and a target language such that translated word pairs lie close to one another in a common representation space. In this paper, we propose a method, Filtered Inner Product Projection (FIPP), for mapping embeddings to a common representation space and evaluate FIPP in the context of bilingual dictionary induction. As semantic shifts are pervasive across languages and domains, FIPP first identifies the common geometric structure in both embeddings and then, only on the common structure, aligns the Gram matrices of these embeddings. Unlike previous approaches, FIPP is applicable even when the source and target embeddings are of differing dimensionalities. We show that our approach outperforms existing methods on the MUSE dataset for various language pairs. Furthermore, FIPP provides computational benefits both in ease of implementation and scalability.
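
The two steps named in the abstract (find the structure common to both embeddings, then align Gram matrices only there) can be illustrated with a small NumPy sketch. This is a simplified toy under stated assumptions, not the authors' algorithm: the abstract gives neither the objective nor the solver, so the agreement threshold `eps`, the rule of taking target inner products where the two Gram matrices agree, and the eigendecomposition used to recover embeddings are all assumptions made here for illustration. The function names `filtered_gram` and `embed_from_gram` are hypothetical.

```python
import numpy as np


def filtered_gram(src, tgt, eps):
    """Compare Gram matrices of row-aligned translation pairs.

    src: (n, d_src) source embeddings; tgt: (n, d_tgt) target embeddings.
    Row i of src and row i of tgt are assumed to be a translation pair.
    """
    g_src = src @ src.T  # (n, n) inner products in the source space
    g_tgt = tgt @ tgt.T  # (n, n) inner products in the target space
    # Assumed filter: an entry is "common structure" when both spaces
    # assign the word pair nearly the same inner product.
    mask = np.abs(g_src - g_tgt) < eps
    # Assumed rule: use target values on the common structure and keep
    # the source values everywhere else.
    g_filt = np.where(mask, g_tgt, g_src)
    return mask, g_filt


def embed_from_gram(gram, dim):
    """Return Z of shape (n, dim) such that Z @ Z.T approximates gram."""
    vals, vecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]  # indices of the largest eigenvalues
    vals_top = np.clip(vals[top], 0.0, None)  # discard negative eigenvalues
    return vecs[:, top] * np.sqrt(vals_top)


# Toy data: 6 translation pairs, 4-dim source and 3-dim target embeddings.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 4))
tgt = rng.normal(size=(6, 3))

mask, g_filt = filtered_gram(src, tgt, eps=0.5)
z = embed_from_gram(g_filt, dim=4)  # adjusted source embeddings (toy)
```

Because both Gram matrices are n × n regardless of the embedding dimension, the comparison is well defined even though `src` and `tgt` have different widths, which is the property the abstract highlights.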
