Paper Title
Automatically Discovering and Learning New Visual Categories with Ranking Statistics
Paper Authors
Paper Abstract
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. This setting is similar to semi-supervised learning, but significantly harder because there are no labelled examples for the new classes. The challenge, then, is to leverage the information contained in the labelled images to learn a general-purpose clustering model, and to use that model to identify the new classes in the unlabelled data. In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using only the labelled data introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data and the clustering of the unlabelled data. We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
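The abstract does not spell out how rank statistics are turned into a clustering signal. One plausible reading, sketched below under stated assumptions, is to treat two unlabelled images as belonging to the same class when the sets of their top-k most activated feature dimensions coincide, and to use the resulting binary matrix as pairwise pseudo-labels. The function names, the choice of k, and the toy feature values are all hypothetical and are not taken from the source.

```python
import numpy as np


def topk_indices(features: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the sorted indices of its k largest feature values."""
    # argsort on the negated values ranks dimensions from largest to smallest.
    top = np.argsort(-features, axis=1, kind="stable")[:, :k]
    # Sort the kept indices so the comparison ignores the order within the top-k.
    return np.sort(top, axis=1)


def pairwise_pseudo_labels(features: np.ndarray, k: int = 2) -> np.ndarray:
    """Assumed rule: label a pair 1 if both samples share the same top-k index set."""
    top = topk_indices(features, k)
    # Broadcast to compare every row against every other row.
    same = (top[:, None, :] == top[None, :, :]).all(axis=2)
    return same.astype(np.int64)


# Toy feature vectors (made up for illustration): rows 0 and 1 peak on
# dimensions 0 and 2, while row 2 peaks on dimensions 1 and 3.
feats = np.array([
    [0.90, 0.10, 0.80, 0.00],
    [0.70, 0.20, 0.95, 0.10],
    [0.00, 0.90, 0.10, 0.80],
])
labels = pairwise_pseudo_labels(feats, k=2)
print(labels)
```

In a training loop, a matrix like `labels` could serve as the target for a pairwise loss on the unlabelled subset, alongside the supervised loss on the labelled subset. How the joint objective is actually weighted is not stated in the abstract.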