Auto-labelling of Bug Report using Natural Language Processing

Paper Title

Auto-labelling of Bug Report using Natural Language Processing

Authors

Patil, Avinash; Jadon, Aryan

Abstract

Detecting similar bug reports in a bug tracking system is known as duplicate bug report detection. Knowing in advance that a bug report already exists reduces the effort spent debugging the problem and identifying its root cause. Rule-based and query-based solutions recommend a long list of potentially similar bug reports with no clear ranking, and triage engineers have little motivation to work through such an extensive list. Consequently, this deters the use of duplicate bug report retrieval solutions. In this paper, we propose a solution that combines several NLP techniques. Our approach considers both the unstructured and the structured attributes of a bug report, such as summary, description, severity, impacted products, platforms, and categories. It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports. We performed numerous experiments on significant data sources containing thousands of bug reports and show that the proposed solution achieves a high retrieval accuracy of 70% for recall@5.
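
To make the recall@5 figure concrete, here is a minimal sketch of how such a metric could be computed over a nearest-neighbor retrieval step. It is not the authors' implementation. The abstract does not define recall@k, so this sketch assumes the common reading: the fraction of query reports for which at least one known duplicate appears among the top k retrieved candidates. It also assumes that the "non-generalizing machine learning method" is a nearest-neighbor search, and it uses cosine similarity over toy 2-D vectors in place of the embeddings a neural network would produce. All names (`cosine`, `top_k`, `recall_at_k`, the report ids) are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, corpus, k):
    """Return the ids of the k corpus reports most similar to query_vec."""
    ranked = sorted(corpus, key=lambda rid: cosine(query_vec, corpus[rid]), reverse=True)
    return ranked[:k]


def recall_at_k(queries, corpus, k=5):
    """Fraction of queries with at least one true duplicate in the top k results.

    queries: list of (vector, set_of_true_duplicate_ids) pairs.
    """
    hits = sum(1 for vec, dups in queries if dups & set(top_k(vec, corpus, k)))
    return hits / len(queries)


# Toy data (assumption): ids and vectors are made up for illustration only.
corpus = {
    "B1": [1.0, 0.0],
    "B2": [0.9, 0.1],
    "B3": [0.0, 1.0],
    "B4": [0.1, 0.9],
    "B5": [0.7, 0.7],
    "B6": [-1.0, -0.2],
}
queries = [
    ([1.0, 0.05], {"B1"}),  # true duplicate is the nearest neighbor
    ([0.0, 1.0], {"B6"}),   # true duplicate is the farthest report
]

print(recall_at_k(queries, corpus, k=5))  # one of the two queries is a hit
```

With this toy data only the first query finds its duplicate within the top five candidates, so the sketch reports a recall@5 of 0.5. A short ranked list of this kind is what the abstract contrasts with the long, unranked lists produced by rule-based and query-based tools.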
