Title


Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks

Authors

Chuan-Chi Wang, Ying-Chiao Liao, Ming-Chang Kao, Wen-Yew Liang, Shih-Hao Hung

Abstract


In this paper, we present a fine-grained machine learning-based method, PerfNetV2, which improves on the accuracy of our previous work for modeling neural network performance on a variety of GPU accelerators. Given an application, the proposed method can predict the inference time and training time of the convolutional neural networks used in the application, which enables system developers to optimize performance by choosing among neural networks and/or incorporating hardware accelerators to deliver satisfactory results in time. Furthermore, the proposed method can predict the performance of an unseen or non-existent device, e.g., a new GPU with a higher operating frequency, fewer processor cores, but more memory capacity. This allows a system developer to quickly search the hardware design space and/or fine-tune the system configuration. Compared to previous work, PerfNetV2 delivers more accurate results by modeling the detailed host-accelerator interactions involved in executing full neural networks and by improving the architecture of the machine learning model used in the predictor. Our case studies show that PerfNetV2 yields a mean absolute percentage error within 13.1% on LeNet, AlexNet, and VGG16 on the NVIDIA GTX-1080Ti, while the error rate of a previous work published at ICBD 2018 could be as large as 200%.
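The accuracy figure quoted above is a mean absolute percentage error (MAPE). As a minimal sketch of how that metric is computed (the latency values below are illustrative placeholders, not measurements from the paper):

```python
def mape(actual, predicted):
    """MAPE = mean(|actual - predicted| / |actual|) * 100."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Hypothetical measured vs. model-predicted inference times (ms)
measured = [10.0, 20.0, 40.0]
predicted = [9.0, 22.0, 38.0]
print(round(mape(measured, predicted), 2))  # → 8.33
```

A MAPE within 13.1% thus means the predicted times deviate from the measured times by at most that fraction on average across the evaluated networks.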
