Paper Title

Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection

Authors

Yadollah Yaghoobzadeh, Alexandre Rochette, Timothy J. Hazen

Abstract

Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs from the source and target domains in a rich representation space. Then, using a k-nearest-neighbor retrieval-based method, we aggregate the neighbors' labels and distances to rank pairs. We observe robust performance of this method in different cross-domain scenarios on the StackExchange, Spring and Quora datasets, outperforming cross-entropy classification in multiple cases.
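
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that each question pair has already been encoded as a fixed-length vector, that distance is Euclidean, and that neighbors vote with a weight of exp(-distance / temperature). The abstract does not specify the encoder or the aggregation rule, so the function names (`knn_score`, `rank_pairs`), the `temperature` parameter, and the weighting scheme are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(
    query: Vector,
    memory: List[Tuple[Vector, int]],
    k: int = 3,
    temperature: float = 1.0,
) -> float:
    """Score a target pair in [0, 1] from its k nearest labeled source pairs.

    Each neighbor votes with its label (1 = duplicate, 0 = not duplicate),
    weighted by exp(-distance / temperature). This weighting is an
    assumption for illustration, not necessarily the paper's rule.
    """
    if k <= 0 or not memory:
        raise ValueError("k must be positive and memory must be non-empty")
    # Sort labeled source pairs by distance to the query and keep the k closest.
    nearest = sorted(
        ((euclidean(query, vec), label) for vec, label in memory),
        key=lambda item: item[0],
    )[:k]
    # Shift by the smallest distance so the closest neighbor has weight 1,
    # which avoids a zero denominator when all distances are large.
    d_min = nearest[0][0]
    weights = [math.exp(-(dist - d_min) / temperature) for dist, _ in nearest]
    weighted_votes = sum(w * label for w, (_, label) in zip(weights, nearest))
    return weighted_votes / sum(weights)


def rank_pairs(
    queries: List[Vector], memory: List[Tuple[Vector, int]], k: int = 3
) -> List[int]:
    """Return query indices ordered from most to least likely duplicate."""
    scores = [knn_score(q, memory, k) for q in queries]
    return sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)


# Toy 2-D "encodings" of labeled source pairs: (vector, label).
memory = [([0.0, 0.0], 1), ([0.1, 0.0], 1), ([1.0, 1.0], 0), ([1.1, 0.9], 0)]
# Unlabeled target pairs to rank.
queries = [[0.9, 1.0], [0.05, 0.0]]
ranking = rank_pairs(queries, memory, k=3)
```

In this toy setup the second query lies next to the two pairs labeled as duplicates, so it is ranked ahead of the first. In practice the vectors would come from a neural encoder, and an approximate nearest-neighbor index would typically replace the exhaustive sort.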
