Paper title
Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection
Paper authors
Abstract
Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs from the source and target domains in a rich representation space. Then, using a k-nearest-neighbor retrieval-based method, we aggregate the neighbors' labels and distances to rank pairs. We observe robust performance of this method in different cross-domain scenarios on the StackExchange, Spring and Quora datasets, outperforming cross-entropy classification in multiple cases.
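The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the question pairs have already been encoded as fixed-size vectors, and it aggregates neighbor labels with a softmax over negative Euclidean distances. The abstract does not specify the aggregation rule, the distance metric, or the encoder, so the function name `knn_duplicate_scores`, the `temperature` parameter, and the weighting scheme are all illustrative assumptions rather than the authors' method.

```python
import numpy as np


def knn_duplicate_scores(source_vecs, source_labels, target_vecs, k=3, temperature=1.0):
    """Score target pairs by a distance-weighted vote of their k nearest source pairs.

    Hypothetical helper: the softmax-over-negative-distance weighting is an
    assumption made for illustration, not the aggregation used in the paper.
    """
    source_vecs = np.asarray(source_vecs, dtype=float)
    source_labels = np.asarray(source_labels, dtype=float)
    target_vecs = np.asarray(target_vecs, dtype=float)
    k = min(k, len(source_vecs))

    # Pairwise Euclidean distances, shape (n_target, n_source).
    diffs = target_vecs[:, None, :] - source_vecs[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Indices, distances, and labels of the k nearest source pairs per target pair.
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_dists = np.take_along_axis(dists, nn_idx, axis=1)
    nn_labels = source_labels[nn_idx]

    # Closer neighbors get larger weights (numerically stable softmax).
    logits = -nn_dists / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted fraction of neighbors labeled as duplicates (1 = duplicate).
    return (weights * nn_labels).sum(axis=1)


# Toy data: two clusters of labeled source pairs and two unlabeled target pairs.
source_vecs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
source_labels = [1, 1, 0, 0]
target_vecs = [[0.1, 0.2], [5.1, 5.4]]

scores = knn_duplicate_scores(source_vecs, source_labels, target_vecs, k=2)
ranking = np.argsort(-scores)  # target pairs ordered from most to least likely duplicate
```

Here the labeled source-domain pairs act as a memory: no classifier is trained on the target domain, and the target pairs are ranked purely by what their nearest labeled neighbors say. In practice the vectors would come from a neural encoder of the question pair, which this sketch leaves out.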