Paper title
How precise are performance estimates for typical medical image segmentation tasks?
Paper authors
Paper abstract
An important issue in medical image processing is to estimate not only the performance of algorithms but also the precision of these performance estimates. Reporting precision typically amounts to reporting the standard error of the mean (SEM) or, equivalently, confidence intervals. However, this is rarely done in medical image segmentation studies. In this paper, we aim to estimate the typical confidence that can be expected in such studies. To that end, we first perform experiments on Dice metric estimation using a standard deep learning model (U-Net) and a classical task from the Medical Segmentation Decathlon. We extensively study precision estimation using both a Gaussian assumption and bootstrapping (which does not require any assumption on the distribution). We then perform simulations for other test set sizes and performance spreads. Overall, our work shows that small test sets lead to wide confidence intervals (e.g. $\sim$8 points of Dice for 20 samples with $\sigma \simeq 10$).
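As an illustration of the two precision estimates named in the abstract, the following is a minimal sketch, not the authors' implementation. It computes a 95% confidence interval for the mean Dice score under a Gaussian assumption (mean ± z · SEM) and with a percentile bootstrap. The function names, the choice of z = 1.96, the number of bootstrap resamples, and the synthetic scores are all assumptions made for this example. Under these assumptions, the Gaussian interval width for 20 samples with a standard deviation of 10 comes out at roughly 8.8 Dice points, the same order of magnitude as the figure quoted in the abstract.

```python
import numpy as np


def gaussian_ci(scores, z=1.96):
    """CI for the mean under a Gaussian assumption: mean +/- z * SEM."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    # SEM = sample standard deviation / sqrt(n)
    sem = scores.std(ddof=1) / np.sqrt(scores.size)
    return float(mean - z * sem), float(mean + z * sem)


def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (no distributional assumption)."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the test set with replacement, n_boot times.
    idx = rng.integers(0, scores.size, size=(n_boot, scores.size))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def gaussian_ci_width(sigma, n, z=1.96):
    """Full width of the Gaussian CI for a given spread and test set size."""
    return float(2 * z * sigma / np.sqrt(n))


# Hypothetical Dice scores (in points, 0-100) for a test set of 20 images.
# These are synthetic values for illustration, not results from the paper.
rng = np.random.default_rng(42)
dice = np.clip(rng.normal(loc=80.0, scale=10.0, size=20), 0.0, 100.0)

g_lo, g_hi = gaussian_ci(dice)
b_lo, b_hi = bootstrap_ci(dice)
print(f"mean Dice    : {dice.mean():.2f}")
print(f"Gaussian CI  : [{g_lo:.2f}, {g_hi:.2f}]")
print(f"bootstrap CI : [{b_lo:.2f}, {b_hi:.2f}]")

# Width predicted for n = 20 and sigma = 10 with z = 1.96.
width_20 = gaussian_ci_width(sigma=10.0, n=20)
print(f"width (n=20, sigma=10): {width_20:.2f}")  # ≈ 8.77
```

Because the width scales as $1/\sqrt{n}$, halving the interval requires four times as many test samples; calling `gaussian_ci_width(10.0, 80)` gives half the width obtained for `n=20`.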