Paper Title
ErGAN: Generative Adversarial Networks for Entity Resolution
Authors
Abstract
Entity resolution aims to identify records that represent the same real-world entity across one or more datasets. A major challenge in learning-based entity resolution is how to reduce the labeling cost for training. Because the number of record pairs to compare grows quadratically with the number of records, labeling is a costly task that often requires significant effort from human experts. Inspired by recent advances in generative adversarial networks (GANs), we propose a novel deep learning method, called ErGAN, to address this challenge. ErGAN consists of two key components, a label generator and a discriminator, which are optimized alternately through adversarial learning. To alleviate the issues of overfitting and highly imbalanced distribution, we design two novel modules for diversity and propagation, which can greatly improve the model's generalization power. We have conducted extensive experiments to empirically verify the labeling and learning efficiency of ErGAN. The experimental results show that ErGAN beats the state-of-the-art baselines, including unsupervised, semi-supervised, and supervised learning methods.
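
The quadratic growth in record pairs that makes labeling expensive can be illustrated with a minimal sketch. It is not taken from the paper: the function name `count_candidate_pairs` and the toy records are hypothetical, and it assumes a single dataset compared against itself with no blocking step to prune pairs.

```python
from itertools import combinations


def count_candidate_pairs(n_records: int) -> int:
    """Number of unordered record pairs within one dataset of n_records records."""
    return n_records * (n_records - 1) // 2


# Toy records (hypothetical data, for illustration only).
records = ["A. Smith", "Alice Smith", "Bob Jones", "B. Jones"]

# Enumerate every unordered pair that a human expert could be asked to label.
pairs = list(combinations(records, 2))
assert len(pairs) == count_candidate_pairs(len(records))

# The number of pairs grows roughly 100x for every 10x increase in records.
for n in (1_000, 10_000, 100_000):
    print(f"{n} records: {count_candidate_pairs(n)} candidate pairs")
```

Even at a modest 10,000 records there are close to 50 million candidate pairs, which is why methods that need only a small number of labeled pairs are attractive.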