Paper Title
Follow-up Attention: An Empirical Study of Developer and Neural Model Code Exploration
Paper Authors
Paper Abstract
Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning, and the way their attention mechanism scans the code, matches the patterns of developers. A poor understanding of the model's reasoning process limits the ways in which current neural models are leveraged, so far mostly for their raw predictions. To fill this gap, this work studies how the processed attention signal of three open large language models (CodeGen, InCoder and GPT-J) agrees with how developers look at and explore code when each answers the same sensemaking questions about code. Furthermore, we contribute an open-source eye-tracking dataset comprising 92 manually labeled sessions from 25 developers engaged in sensemaking tasks. We empirically evaluate five heuristics that do not use attention and ten attention-based post-processing approaches applied to CodeGen's attention signal against our ground truth of developers exploring code, including the novel concept of follow-up attention, which exhibits the highest agreement between model and human attention. Our follow-up attention method can predict the next line a developer will look at with 47% accuracy. This outperforms the baseline prediction accuracy of 42.3%, which uses the session history of other developers to recommend the next line. These results demonstrate the potential of leveraging the attention signal of pre-trained models for effective code exploration.
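As a rough illustration of the general idea (not the paper's actual method), the step of turning a model's token-level attention into a line-level next-line recommendation can be sketched as follows. The attention matrix, the token-to-line mapping, and the function name are synthetic placeholders introduced for this example.

```python
import numpy as np

def next_line_from_attention(attention, token_to_line, current_line):
    """Aggregate token-level attention into line-level scores and
    recommend the line receiving the most attention from the tokens
    on the line the developer is currently viewing.

    attention     -- (num_tokens, num_tokens) array of attention weights
    token_to_line -- line index for each token position
    current_line  -- line the developer is looking at now
    """
    token_to_line = np.asarray(token_to_line)
    # Attention paid *from* tokens on the current line to every token.
    from_current = attention[token_to_line == current_line].sum(axis=0)
    # Sum that attention per source-code line.
    num_lines = int(token_to_line.max()) + 1
    line_scores = np.zeros(num_lines)
    np.add.at(line_scores, token_to_line, from_current)
    line_scores[current_line] = -np.inf  # do not recommend the same line
    return int(np.argmax(line_scores))

# Synthetic example: 6 tokens spread over 3 lines.
attn = np.array([
    [0.1, 0.1, 0.3, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.1, 0.1, 0.2, 0.2],
    [0.2, 0.2, 0.1, 0.1, 0.2, 0.2],
    [0.1, 0.4, 0.2, 0.1, 0.1, 0.1],
    [0.3, 0.2, 0.2, 0.1, 0.1, 0.1],
])
lines = [0, 0, 1, 1, 2, 2]
print(next_line_from_attention(attn, lines, current_line=0))  # -> 1
```

In a real setting the attention matrix would come from a pre-trained model such as CodeGen, and the paper's follow-up attention applies further post-processing to this raw signal before making the prediction.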