Paper Title
An Empirical Analysis of Backward Compatibility in Machine Learning Systems
Paper Authors
Paper Abstract
In many applications of machine learning (ML), updates are performed with the goal of enhancing model performance. However, current practices for updating models rely solely on isolated, aggregate performance analyses, overlooking important dependencies, expectations, and needs in real-world deployments. We consider how updates, intended to improve ML models, can introduce new errors that can significantly affect downstream systems and users. For example, updates to models used in cloud-based classification services, such as image recognition, can cause unexpected erroneous behavior in systems that make calls to the services. Prior work has shown the importance of "backward compatibility" for maintaining human trust. We study challenges with backward compatibility across different ML architectures and datasets, focusing on common settings including data shifts with structured noise and ML employed in inferential pipelines. Our results show that (i) compatibility issues arise even without data shift due to optimization stochasticity, (ii) training on large-scale noisy datasets often results in significant decreases in backward compatibility even when model accuracy increases, and (iii) distributions of incompatible points align with noise bias, motivating the need for compatibility-aware de-noising and robustness methods.
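The notion of backward compatibility discussed in the abstract can be made concrete with a simple score from prior work on model updates: the fraction of examples the previously deployed model classified correctly that the updated model also classifies correctly. The sketch below is a minimal, hypothetical illustration of such a check; the function name and toy data are assumptions for illustration, not taken from the paper.

import numpy as np

def backward_trust_compatibility(y_true, y_old, y_new):
    # Fraction of examples the old model got right that the new model
    # also gets right (a BTC-style score); 1.0 means the update introduces
    # no new errors on examples the deployed model already handled correctly.
    old_correct = (y_old == y_true)
    if old_correct.sum() == 0:
        return 1.0
    both_correct = old_correct & (y_new == y_true)
    return both_correct.sum() / old_correct.sum()

# Hypothetical usage: compare a candidate update against the deployed model
# on a held-out evaluation set before rolling out the update.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_old  = np.array([0, 1, 0, 0, 1, 0])   # deployed model's predictions
y_new  = np.array([0, 1, 1, 0, 0, 0])   # candidate update's predictions

print(backward_trust_compatibility(y_true, y_old, y_new))  # 0.8 on this toy data

A high aggregate accuracy for the new model does not guarantee a high score here, which is exactly the gap between isolated performance analysis and deployment-level compatibility that the paper examines.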