Title
Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation
Authors
Abstract
In the past decade, Deep Learning (DL) systems have been widely deployed in various domains to facilitate our daily life. Meanwhile, it is extremely challenging to ensure the correctness of DL systems (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can cause serious consequences and may even threaten human lives. In the literature, researchers have explored various techniques to test, analyze, and verify DL models, since their quality directly affects the corresponding system behaviors. Recently, researchers have also proposed novel techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations of each high-level DL operator so that various DL models can run on many platforms. However, there is still limited work targeting the reliability of the emerging tensor compilers, which aim to directly compile high-level tensor computation graphs into high-performance binaries for better efficiency, portability, and scalability. In this paper, we target the important problem of tensor compiler testing and propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating the low-level Intermediate Representation (IR) for TVM due to the limited mutation space for the high-level IR. More specifically, Tzer leverages both general-purpose and tensor-compiler-specific mutators guided by coverage feedback for evolutionary IR mutation; furthermore, Tzer also performs pass mutation in tandem with IR mutation for more effective fuzzing. Our results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, with 75% higher coverage and 50% more valuable tests than the second-best technique. To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed (PR merged).
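The abstract describes the core loop only at a high level: coverage feedback decides which mutated inputs are kept, and each input pairs a low-level IR program with a pass sequence that are mutated jointly. The following is a minimal toy sketch of that idea under stated assumptions, not Tzer's implementation. `toy_compile`, the op names in `OPS`, the pass names in `PASSES`, and the mutators `mutate_ir` / `mutate_passes` are all hypothetical stand-ins; they do not reflect TVM's real TIR, its pass infrastructure, or Tzer's actual mutators.

```python
import random
from dataclasses import dataclass

# Hypothetical toy stand-ins: a "program" is a list of op names (the low-level IR)
# and a pass sequence is a list of pass names. Neither mirrors TVM's real APIs.
OPS = ["add", "mul", "load", "store", "for", "if"]
PASSES = ["simplify", "unroll", "vectorize", "inline"]


def toy_compile(ir, passes):
    """Return the set of (hypothetical) branch IDs one compile run covers."""
    cov = set()
    for i, op in enumerate(ir):
        cov.add(("op", op))
        if i > 0:
            cov.add(("pair", ir[i - 1], op))  # op adjacency as a proxy for deeper paths
    for p in passes:
        for op in set(ir):
            cov.add(("pass", p, op))  # a pass's behavior depends on the IR it sees
    return cov


@dataclass
class Seed:
    ir: list
    passes: list


def mutate_ir(ir, rng):
    """Insert, replace, or delete one op (always insert if the IR is empty)."""
    ir = list(ir)
    choice = rng.randrange(3)
    if choice == 0 or not ir:
        ir.insert(rng.randrange(len(ir) + 1), rng.choice(OPS))
    elif choice == 1:
        ir[rng.randrange(len(ir))] = rng.choice(OPS)
    else:
        del ir[rng.randrange(len(ir))]
    return ir


def mutate_passes(passes, rng):
    """Delete one pass or insert a new one at a random position."""
    passes = list(passes)
    if passes and rng.random() < 0.5:
        del passes[rng.randrange(len(passes))]
    else:
        passes.insert(rng.randrange(len(passes) + 1), rng.choice(PASSES))
    return passes


def fuzz(iterations=500, seed=0):
    """Coverage-guided evolutionary loop with joint IR and pass mutation."""
    rng = random.Random(seed)
    pool = [Seed(["add"], [])]
    covered = toy_compile(pool[0].ir, pool[0].passes)
    for _ in range(iterations):
        parent = rng.choice(pool)
        # Joint mutation: mutate the IR, the pass sequence, or both.
        mode = rng.randrange(3)
        ir = mutate_ir(parent.ir, rng) if mode in (0, 2) else parent.ir
        passes = mutate_passes(parent.passes, rng) if mode in (1, 2) else parent.passes
        cov = toy_compile(ir, passes)
        if not cov <= covered:  # keep only inputs that reach new coverage
            covered |= cov
            pool.append(Seed(ir, passes))
    return pool, covered


if __name__ == "__main__":
    pool, covered = fuzz(300, seed=1)
    print(len(pool), "seeds kept,", len(covered), "toy branches covered")
```

Because coverage is the only selection signal, a pass-sequence mutation survives only when it exercises new (toy) behavior on some IR, which is the intuition behind mutating passes in tandem with the IR rather than fixing one pass pipeline.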