Paper Title

Improving Semi-supervised Deep Learning by using Automatic Thresholding to Deal with Out of Distribution Data for COVID-19 Detection using Chest X-ray Images

Authors

Isaac Benavides-Mata, Saul Calderon-Ramirez

Abstract

Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train models when labeled data is limited and unlabeled data is abundant. Unlabeled data is frequently more widely available than labeled data, so it is used to improve a model's generalization when labeled data is scarce. However, in real-world settings the unlabeled data might follow a different distribution than the labeled dataset. This is known as distribution mismatch. Such a problem generally occurs when the source of the unlabeled data differs from that of the labeled data. For instance, in the medical imaging domain, when training a COVID-19 detector using chest X-ray images, unlabeled datasets sampled from different hospitals might be used. In this work, we propose an automatic thresholding method to filter out-of-distribution data from the unlabeled dataset. We score each unlabeled observation using the Mahalanobis distance between the labeled and unlabeled datasets, computed in the feature space built by an ImageNet pre-trained Feature Extractor (FE). We test two simple automatic thresholding methods in the context of training a COVID-19 detector using chest X-ray images. The tested methods provide an automatic way to decide which unlabeled data to keep when training a semi-supervised deep learning architecture.
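The scoring-and-filtering idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. Each unlabeled feature vector is scored by its Mahalanobis distance to the mean and covariance of the labeled features. Otsu's method stands in for the automatic threshold, because the abstract does not name the two methods tested. Random arrays stand in for features from a pre-trained extractor. The function names `mahalanobis_scores` and `otsu_threshold` are hypothetical.

```python
import numpy as np


def mahalanobis_scores(labeled_feats, unlabeled_feats, eps=1e-6):
    """Distance of each unlabeled feature vector to the labeled distribution."""
    mu = labeled_feats.mean(axis=0)
    cov = np.cov(labeled_feats, rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])  # small ridge for numerical stability
    cov_inv = np.linalg.pinv(cov)
    diff = unlabeled_feats - mu
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def otsu_threshold(scores, bins=64):
    """Otsu's method on a 1-D score histogram (assumed stand-in threshold rule)."""
    hist, edges = np.histogram(scores, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()
    w0 = np.cumsum(weights)            # mass of the low-score class
    w1 = 1.0 - w0                      # mass of the high-score class
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]
    denom = w0 * w1
    valid = denom > 1e-12              # skip splits that leave one class empty
    between = np.zeros_like(denom)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / denom[valid]
    k = int(np.argmax(between))        # bin whose right edge best separates the classes
    return edges[k + 1]


# Synthetic stand-ins for extracted features (assumption: 8-dimensional vectors).
rng = np.random.default_rng(0)
labeled = rng.normal(0.0, 1.0, size=(200, 8))
unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 8)),   # in-distribution samples
    rng.normal(6.0, 1.0, size=(100, 8)),   # shifted, out-of-distribution samples
])

scores = mahalanobis_scores(labeled, unlabeled)
threshold = otsu_threshold(scores)
keep = scores < threshold                  # unlabeled samples retained for SSL training
```

In practice the feature arrays would come from a network pre-trained on ImageNet, and the retained subset `unlabeled[keep]` would be passed to the semi-supervised training step. A histogram-based rule such as Otsu's works best when the scores are clearly bimodal, which real data may not guarantee.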
