Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection

Paper Title

Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection

Authors

Yaghoobzadeh, Yadollah; Rochette, Alexandre; Hazen, Timothy J.

Abstract

Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs of the source and target domains in a rich representation space and then, using a k-nearest-neighbor retrieval-based method, aggregate the neighbors' labels and distances to rank pairs. We observe robust performance of this method in different cross-domain scenarios on the StackExchange, Spring, and Quora datasets, outperforming cross-entropy classification in multiple cases.
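
The retrieval-and-aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each question pair has already been encoded into a fixed-length vector by some neural encoder (not shown), uses Euclidean distance, and aggregates the k nearest labeled source-domain pairs with an inverse-distance-weighted average of their labels (1 = duplicate, 0 = not duplicate). The paper may use a different distance or aggregation rule, and the names `knn_score` and `rank_pairs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Labeled source-domain memory: (pair embedding, label), label 1 = duplicate.
Memory = List[Tuple[Vector, int]]


def knn_score(query: Vector, memory: Memory, k: int = 3, eps: float = 1e-8) -> float:
    """Score one target pair from its k nearest labeled source pairs.

    Assumed aggregation: inverse-distance-weighted mean of neighbor labels,
    so closer neighbors count more. The result lies in [0, 1].
    """
    # Distance from the query to every stored pair, keeping the k closest.
    nearest = sorted((math.dist(query, vec), label) for vec, label in memory)[:k]
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    weighted_labels = sum(w * label for w, (_, label) in zip(weights, nearest))
    return weighted_labels / sum(weights)


def rank_pairs(queries: List[Vector], memory: Memory, k: int = 3) -> List[int]:
    """Return target-pair indices ordered from most to least likely duplicate."""
    scores = [knn_score(q, memory, k) for q in queries]
    return sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)


# Toy 2-D vectors standing in for encoder outputs (hypothetical data).
source_memory: Memory = [([0.0, 0.0], 1), ([0.1, 0.0], 1), ([1.0, 1.0], 0), ([1.1, 0.9], 0)]
target_pairs = [[0.9, 1.0], [0.05, 0.05]]
print(rank_pairs(target_pairs, source_memory, k=2))  # prints [1, 0]
```

Because the labeled source pairs are stored and queried directly rather than distilled into classifier weights, the target domain is scored by similarity to remembered examples, which is the "memorization" the title refers to; swapping in a different encoder or distance only changes how `query` and the stored vectors are produced and compared.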
