Heterogeneous Anomaly Detection for Software Systems via Attentive Multi-modal Learning

Paper Title

Heterogeneous Anomaly Detection for Software Systems via Attentive Multi-modal Learning

Authors

Li, Baitong, Yang, Tianyi, Chen, Zhuangbin, Su, Yuxin, Yang, Yongqiang, Lyu, Michael R.

Abstract

Prompt and accurate detection of system anomalies is essential to ensure the reliability of software systems. Unlike manual efforts that exploit all available run-time information, existing approaches usually leverage only a single type of monitoring data (often logs or metrics) or fail to make effective use of the joint information among multi-source data. Consequently, many false predictions occur. To better understand the manifestations of system anomalies, we conduct a comprehensive empirical study based on a large amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates that system anomalies could manifest distinctly in different data types. Thus, integrating heterogeneous data can help recover the complete picture of a system's health status. In this context, we propose HADES, the first work to effectively identify system anomalies based on heterogeneous data. Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns. It captures discriminative features and meaningful interactions from multi-modal data via a novel cross-modal attention module, enabling accurate system anomaly detection. We evaluate HADES extensively on large-scale simulated and industrial datasets. The experimental results present the superiority of HADES in detecting system anomalies on heterogeneous data. We release the code and the annotated dataset for reproducibility and future research.
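
The abstract names a cross-modal attention module that fuses log semantics with metric patterns but gives no implementation details. As a rough illustration of the general idea only, the sketch below lets each modality attend to the other with plain scaled dot-product attention and then pools the results into a single vector. It is not the HADES module: the function names (`cross_attention`, `fuse`), the feature shapes, the absence of learned projections, and the mean-pool-and-concatenate fusion are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention in which one modality attends to the other.

    queries:     (n_q, d) features of the attending modality
    keys_values: (n_kv, d) features of the attended modality
    Returns the attended context (n_q, d) and the attention weights (n_q, n_kv).
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, weights


def fuse(log_feats, metric_feats):
    """Hypothetical fusion: attend in both directions, mean-pool, concatenate."""
    log_ctx, log_w = cross_attention(log_feats, metric_feats)
    metric_ctx, metric_w = cross_attention(metric_feats, log_feats)
    fused = np.concatenate([log_ctx.mean(axis=0), metric_ctx.mean(axis=0)])
    return fused, log_w, metric_w


# Made-up inputs: 5 log-event embeddings and 12 metric time steps, both 8-dimensional.
rng = np.random.default_rng(0)
log_feats = rng.normal(size=(5, 8))
metric_feats = rng.normal(size=(12, 8))

fused, log_w, metric_w = fuse(log_feats, metric_feats)
print(fused.shape, log_w.shape, metric_w.shape)
```

In a real detector the fused vector would feed a classifier, and the attention would use trained query, key, and value projections; both are omitted here to keep the sketch short.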
