Paper Title
SeqTrans: Automatic Vulnerability Fix via Sequence to Sequence Learning
Paper Authors
Paper Abstract
Software vulnerabilities are now reported at an unprecedented rate due to the recent development of automated vulnerability-hunting tools. However, fixing vulnerabilities still depends mainly on programmers' manual effort. Developers need to deeply understand the vulnerability and try to affect the system's functionality as little as possible. In this paper, building on advances in Neural Machine Translation (NMT) techniques, we present a novel approach called SeqTrans that exploits historical vulnerability fixes to provide suggestions and automatically fix source code. To capture the contextual information around the vulnerable code, we propose to leverage data-flow dependencies to construct code sequences and feed them into a state-of-the-art Transformer model. A fine-tuning strategy is introduced to overcome the small-sample-size problem. We evaluate SeqTrans on a dataset containing 1,282 commits that fix 624 vulnerabilities in 205 Java projects. Results show that SeqTrans outperforms the latest techniques, achieving 23.3% accuracy in statement-level fixes and 25.3% in CVE-level fixes. We also examine the results in depth and observe that the NMT model performs very well on certain kinds of vulnerabilities, such as CWE-287 (Improper Authentication) and CWE-863 (Incorrect Authorization).
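The data-flow-based sequence construction mentioned in the abstract can be illustrated with a minimal, hypothetical sketch. The function below is a toy approximation (all names, the `<SEP>` token, and the slicing logic are assumptions for illustration, not the authors' implementation): the definitions of the variables used by a vulnerable statement are prepended to that statement, forming the input sequence for the NMT model.

```python
# Toy sketch of data-flow-based sequence construction: prepend the defining
# statements of the variables a vulnerable statement uses, so the model sees
# relevant context. NOT SeqTrans's actual slicing algorithm.

def build_sequence(vulnerable_stmt, used_vars, definitions):
    """Concatenate the definitions of used variables with the vulnerable
    statement, separated by a (hypothetical) <SEP> token."""
    context = [definitions[v] for v in used_vars if v in definitions]
    return " <SEP> ".join(context + [vulnerable_stmt])

# Hypothetical Java snippet where `key` and `iv` flow into the vulnerable call.
definitions = {
    "key": "byte[] key = password.getBytes();",
    "iv": "byte[] iv = new byte[16];",
}
seq = build_sequence(
    "cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));",
    ["key", "iv"],
    definitions,
)
print(seq)
```

In this sketch the resulting sequence carries both defining statements ahead of the vulnerable call, so a sequence-to-sequence model can condition its fix on how the inputs were constructed rather than on the single statement in isolation.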