Paper Title
Machine Learning Workflow to Explain Black-box Models for Early Alzheimer's Disease Classification Evaluated for Multiple Datasets
Paper Authors
Paper Abstract
Purpose: Hard-to-interpret black-box machine learning (ML) models are often used for early Alzheimer's Disease (AD) detection.
Methods: A workflow based on Shapley values was developed to interpret eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM) black-box models. All models were trained on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and evaluated on an independent ADNI test set, as well as on the external Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) and Open Access Series of Imaging Studies (OASIS) datasets. Shapley values were compared to intuitively interpretable Decision Trees (DTs) and Logistic Regression (LR), as well as to natural and permutation feature importances. Correlated features reduce the validity of explanations, so forward selection and aspect consolidation were implemented to avoid this.
Results: Some black-box models outperformed DTs and LR. The forward-selected features correspond to brain areas previously associated with AD. Shapley values identified biologically plausible associations and showed moderate to strong correlations with feature importances. The most important RF features for predicting AD conversion were the volume of the amygdalae and a cognitive test score. Good cognitive test performance and large brain volumes decreased the AD risk. Models trained on cognitive test scores significantly outperformed brain volumetric models ($p<0.05$). Cognitively Normal (CN) vs. AD models were successfully transferred to the external datasets.
Conclusion: Compared to previous work, CN vs. Mild Cognitive Impairment (MCI) classification using brain volumes achieved improved performance on ADNI and AIBL. The Shapley values and the feature importances showed moderate to strong correlations.
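The abstract does not describe how the Shapley values were computed. The following is a minimal sketch of exact Shapley values obtained by enumerating every feature coalition. It assumes that a single baseline vector (for example, the training-set mean) stands in for "absent" features. Averaging the absolute Shapley values over subjects then gives one common global importance score. The model `toy_risk`, its coefficients, and the three standardized features (amygdala volume, a cognitive score where higher is better, age) are hypothetical illustrations, not the paper's models or data. Brute-force enumeration needs on the order of 2^n model calls, so it only suits a handful of features. Tree ensembles such as RF and XGBoost are usually explained with the polynomial-time TreeSHAP algorithm instead.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, List, Sequence

# A model maps one feature vector to a scalar output (here: predicted AD risk).
Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition S take their baseline value, i.e.
    v(S) = model(x_S, baseline_rest). Cost grows as 2**n model calls.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shapley(model: Model, subjects: Sequence[Sequence[float]],
                     baseline: Sequence[float]) -> List[float]:
    """Global importance: mean |Shapley value| of each feature over subjects."""
    per_subject = [shapley_values(model, x, baseline) for x in subjects]
    return [sum(abs(p[j]) for p in per_subject) / len(per_subject)
            for j in range(len(baseline))]


# Hypothetical logistic risk model on standardized features:
# [amygdala volume, cognitive score (higher = better), age].
def toy_risk(z: Sequence[float]) -> float:
    logit = -1.2 * z[0] - 0.8 * z[1] + 0.3 * z[2] - 0.5
    return 1.0 / (1.0 + exp(-logit))


baseline = [0.0, 0.0, 0.0]   # e.g. training-set mean after standardization
subject = [-1.0, -0.5, 1.0]  # small amygdalae, weak test score, older
phi = shapley_values(toy_risk, subject, baseline)
importance = mean_abs_shapley(toy_risk, [subject, [0.5, 1.0, -0.2]], baseline)
print(phi, importance)
```

Because the value function is anchored at `baseline`, the efficiency property holds: a subject's Shapley values sum to `toy_risk(x) - toy_risk(baseline)`. Replacing a single feature by its baseline while keeping correlated features at their observed values can create unrealistic inputs. This is one commonly cited reason why correlated features weaken such explanations.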