论文标题
学会从越南语中的放射学报告中诊断出胸部射线照相的常见胸部疾病
Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese
论文作者
论文摘要
我们提出了一个数据收集和注释管道,该数据从越南放射学报告中提取信息,以提供胸部X射线(CXR)图像的准确标签。这可以通过注释与其特有诊断类别的数据相匹配,这些数据可能因国家而异。为了评估所提出的标签技术的功效,我们构建了一个包含9,752项研究的CXR数据集,并使用此数据集的子集评估了我们的管道。评估的F1得分至少为0.9923,表明我们的标签工具在所有类别中都精确而始终如一。构建数据集后,我们训练深度学习模型,以利用从大型公共CXR数据集传输的知识。我们采用各种损失功能来克服不平衡的多标签数据集的诅咒,并使用各种模型体系结构进行实验,以选择提供最佳性能的诅咒。我们的最佳模型(CHEXPERT-EDISTISENET-B2)的F1得分为0.6989(95%CI 0.6740,0.7240),AUC为0.7912,灵敏度为0.7064,特异性为0.8760,普遍诊断为0.8760。最后,我们证明我们的粗分类(基于五个特定的异常位置)可以在基准CHEXPERT数据集上获得可比的结果(十二个病理),以用于一般异常检测,同时在所有类别的平均表现方面提供更好的性能。
We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country. To assess the efficacy of the proposed labeling technique, we built a CXR dataset containing 9,752 studies and evaluated our pipeline using a subset of this dataset. With an F1-score of at least 0.9923, the evaluation demonstrates that our labeling tool performs precisely and consistently across all classes. After building the dataset, we train deep learning models that leverage knowledge transferred from large public CXR datasets. We employ a variety of loss functions to overcome the curse of imbalanced multi-label datasets and conduct experiments with various model architectures to select the one that delivers the best performance. Our best model (CheXpert-pretrained EfficientNet-B2) yields an F1-score of 0.6989 (95% CI 0.6740, 0.7240), AUC of 0.7912, sensitivity of 0.7064 and specificity of 0.8760 for the abnormal diagnosis in general. Finally, we demonstrate that our coarse classification (based on five specific locations of abnormalities) yields comparable results to fine classification (twelve pathologies) on the benchmark CheXpert dataset for general anomaly detection while delivering better performance in terms of the average performance of all classes.