Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

Paper Title

Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

Authors

Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen

Abstract

Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community. Selecting a reviewer who lacks expertise and understanding can slow development or result in more defects. To date, most reviewer recommendation systems rely primarily on historical file change and review information; those who changed or reviewed a file in the past are the best positioned to review in the future. We posit that while these approaches are able to identify and suggest qualified reviewers, they may be blind to reviewers who have the needed expertise and have simply never interacted with the changed files before. Fortunately, at Microsoft, we have a wealth of work artifacts across many repositories that can yield valuable information about our developers. To address the aforementioned problem, we present CORAL, a novel approach to reviewer recommendation that leverages a socio-technical graph built from the rich set of entities (developers, repositories, files, pull requests (PRs), work items, etc.) and their relationships in modern source code management systems. We employ a graph convolutional neural network on this graph and train it on two and a half years of history on 332 repositories within Microsoft. We show that CORAL is able to model the manual history of reviewer selection remarkably well. Further, based on an extensive user study, we demonstrate that this approach identifies relevant and qualified reviewers who traditional reviewer recommenders miss, and that these developers desire to be included in the review process. Finally, we find that "classical" reviewer recommendation systems perform better on smaller (in terms of developers) software projects while CORAL excels on larger projects, suggesting that there is "no one model to rule them all."
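
To make the idea in the abstract concrete, the following is a minimal sketch of graph-convolution-style reviewer scoring on a toy socio-technical graph. It is not CORAL's implementation: the node names, the edges, the one-hot features, the untrained identity weights, and the dot-product scoring are all assumptions made for illustration, and a real system would learn the weights from review history.

```python
import math

# Hypothetical toy graph; every node and edge here is invented for illustration.
nodes = ["dev:alice", "dev:bob", "file:auth.py", "pr:1", "pr:2", "workitem:7"]
edges = [
    ("dev:alice", "pr:1"),      # alice authored PR 1
    ("pr:1", "file:auth.py"),   # PR 1 changed auth.py
    ("dev:bob", "workitem:7"),  # bob worked on work item 7
    ("workitem:7", "pr:1"),     # work item 7 is linked to PR 1
    ("pr:2", "file:auth.py"),   # the new PR 2 also changes auth.py
]


def normalized_adjacency(nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Start from the identity so every node keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[index[u]][index[v]] = 1.0
        a[index[v]][index[u]] = 1.0  # treat relationships as undirected
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain dense matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def rank_reviewers(pr, nodes, edges):
    """Score each developer node against a PR node after two propagation steps."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a_hat = normalized_adjacency(nodes, edges)
    # One-hot input features and identity weights stand in for learned parameters.
    h = gcn_layer(a_hat, identity, identity)
    h = gcn_layer(a_hat, h, identity)
    target = h[index[pr]]
    scores = {
        name: sum(p * q for p, q in zip(h[index[name]], target))
        for name in nodes if name.startswith("dev:")
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_reviewers("pr:2", nodes, edges)
print(ranking)
```

In this toy graph neither developer has touched pr:2 directly, yet both receive a non-zero score because two rounds of propagation carry information across shared files, pull requests, and work items. That is the intuition behind recommending reviewers who have never interacted with the changed files.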
