Adding Seemingly Uninformative Labels Helps in Low Data Regimes

Paper Title

Adding Seemingly Uninformative Labels Helps in Low Data Regimes

Authors

Matsoukas, Christos; Hernandez, Albert Bou I; Liu, Yue; Dembrower, Karin; Miranda, Gisele; Konuk, Emir; Haslum, Johan Fredin; Zouzos, Athanasios; Lindholm, Peter; Strand, Fredrik; Smith, Kevin

Abstract

Evidence suggests that networks trained on large datasets generalize well not solely because of the numerous training examples, but also because of class diversity, which encourages the learning of enriched features. This raises the question of whether this remains true when data is scarce: is there an advantage to learning with additional labels in low-data regimes? In this work, we consider a task that requires difficult-to-obtain expert annotations: tumor segmentation in mammography images. We show that, in low-data settings, performance can be improved by complementing the expert annotations with seemingly uninformative labels from non-expert annotators, turning the task into a multi-class problem. We reveal that these gains increase when less expert data is available, and uncover several interesting properties through further studies. We demonstrate our findings on CSAW-S, a new dataset that we introduce here, and confirm them on two public datasets.
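The abstract's central move is to turn single-class tumor segmentation into a multi-class problem by adding non-expert labels. That step can be illustrated as merging label maps. This is a minimal sketch under stated assumptions and is not the authors' pipeline. The function name `build_multiclass_target`, the class names `pectoral_muscle` and `nipple`, the id assignment, and the rule that the expert tumor label wins wherever masks overlap are all hypothetical choices made here for illustration.

```python
from typing import Dict, List, Tuple

# Binary mask: 1 = pixel belongs to the structure, 0 = it does not.
Mask = List[List[int]]


def build_multiclass_target(
    tumor_mask: Mask,
    aux_masks: Dict[str, Mask],
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Merge an expert tumor mask with non-expert auxiliary masks (hypothetical helper).

    Class 0 is background and class 1 is the expert tumor label. Each auxiliary
    structure receives the next free id in sorted-name order. Assumption of this
    sketch: the tumor label overrides every auxiliary label on overlap, and among
    auxiliary masks the later name in sorted order overrides earlier ones.
    """
    height, width = len(tumor_mask), len(tumor_mask[0])
    class_ids = {"background": 0, "tumor": 1}
    for name in sorted(aux_masks):
        class_ids[name] = len(class_ids)

    target = [[0] * width for _ in range(height)]

    # Paint the non-expert (auxiliary) labels first.
    for name in sorted(aux_masks):
        mask = aux_masks[name]
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    target[r][c] = class_ids[name]

    # Then overwrite with the expert tumor label so it takes precedence.
    for r in range(height):
        for c in range(width):
            if tumor_mask[r][c]:
                target[r][c] = class_ids["tumor"]

    return target, class_ids


if __name__ == "__main__":
    tumor = [[0, 1], [0, 0]]
    aux = {
        "pectoral_muscle": [[1, 1], [0, 0]],  # illustrative name, not from the paper
        "nipple": [[0, 0], [0, 1]],           # illustrative name, not from the paper
    }
    target, ids = build_multiclass_target(tumor, aux)
    print(target)  # [[3, 1], [0, 2]]
    print(ids)     # {'background': 0, 'tumor': 1, 'nipple': 2, 'pectoral_muscle': 3}
```

Keeping the expert label as a single fixed class id means tumor performance can still be measured on that one class, even though a network trained on `target` predicts several classes.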
