Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection

Paper Title

Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection

Authors

Bridges, Robert A., Oesch, Sean, Verma, Miki E., Iannacone, Michael D., Huffer, Kelly M. T., Jewell, Brian, Nichols, Jeff A., Weber, Brian, Beaver, Justin M., Smith, Jared M., Scofield, Daniel, Miles, Craig, Plummer, Thomas, Daniell, Mark, Tall, Anne M.

Abstract

In this paper, we present a scientific evaluation of four prominent malware detection tools to assist an organization with two primary questions: To what extent do ML-based tools accurately classify previously- and never-before-seen files? Is it worth purchasing a network-level malware detector? To identify weaknesses, we tested each tool against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign) of a variety of file types, including hundreds of malicious zero-days, polyglots, and APT-style files, delivered on multiple protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of the recent cost-benefit evaluation procedure of Iannacone & Bridges. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files -- 37% of malware tested, including all polyglot files, were undetected. Priorities for researchers and takeaways for end users are given.
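
As a reading aid for the abstract's contrast between "near-perfect precision" and "alarmingly low recall", the following minimal Python sketch checks the stated dataset split and shows how the two metrics diverge. Only the file counts (3,536 total, 2,554 malicious, 982 benign) come from the abstract. The confusion counts `hyp_tp`, `hyp_fp`, and `hyp_fn` are invented for illustration and are not results from the paper.

```python
# Dataset composition as stated in the abstract.
TOTAL_FILES = 3536
MALICIOUS = 2554
BENIGN = 982


def precision(tp: int, fp: int) -> float:
    """Share of files flagged as malicious that really are malicious."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of malicious files that the detector actually flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


malicious_share = MALICIOUS / TOTAL_FILES
benign_share = BENIGN / TOTAL_FILES

# ASSUMPTION: hypothetical counts for an imaginary detector, chosen only to
# illustrate the metrics. They are NOT taken from the paper.
hyp_tp = 1500                 # malicious files correctly flagged
hyp_fn = MALICIOUS - hyp_tp   # malicious files missed
hyp_fp = 3                    # benign files wrongly flagged

hyp_precision = precision(hyp_tp, hyp_fp)
hyp_recall = recall(hyp_tp, hyp_fn)

print(f"malicious share: {malicious_share:.1%}, benign share: {benign_share:.1%}")
print(f"hypothetical precision: {hyp_precision:.3f}, recall: {hyp_recall:.3f}")
```

A detector that raises very few false alarms can score high precision while still missing a large fraction of the malware, which is the pattern the abstract describes.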
