Paper Title

Efficient Stein Variational Inference for Reliable Distribution-lossless Network Pruning

Paper Authors

Yingchun Wang, Song Guo, Jingcai Guo, Weizhan Zhang, Yida Xu, Jie Zhang, Yi Liu

Paper Abstract

Network pruning is a promising way to generate light but accurate models that can be deployed on resource-limited edge devices. However, the current state of the art assumes that the effective sub-network and the other superfluous parameters in a given network share the same distribution, so pruning inevitably involves a distribution truncation operation, usually by eliminating values near zero. While simple, this may not be the most appropriate approach, because an effective model may naturally contain many small values. Removing near-zero values that are already embedded in the model space may significantly reduce model accuracy. Another line of work proposes assigning a discrete prior over all possible sub-structures, but it still relies on hand-crafted prior hypotheses. Worse still, existing methods use regularized point estimates, namely hard pruning, which cannot provide error estimates and thus fail to justify the reliability of the pruned networks. In this paper, we propose a novel distribution-lossless pruning method, named DLLP, to find the pruned lottery ticket theoretically within a Bayesian treatment. Specifically, DLLP remodels the vanilla network as a discrete prior over the latent pruned model and the remaining redundancy. More importantly, DLLP uses Stein Variational Inference to approximate the latent prior, effectively bypassing the computation of KL divergence with an unknown distribution. Extensive experiments on small-scale CIFAR-10 and large-scale ImageNet demonstrate that our method obtains sparser networks with strong generalization performance while providing quantified reliability for the pruned model.
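
Since the abstract's core tool is Stein Variational Inference, a minimal NumPy sketch of the generic Stein Variational Gradient Descent (SVGD) update that underlies it may help fix ideas. This is an illustrative sketch, not the paper's DLLP implementation: the RBF kernel, the fixed bandwidth h, the particle count, and the Gaussian toy target in the usage example are all assumptions made here for clarity.

import numpy as np

def rbf_kernel(x, h=1.0):
    """RBF kernel matrix over particles and its gradient w.r.t. the first argument."""
    diffs = x[:, None, :] - x[None, :, :]        # (n, n, d): diffs[j, i] = x_j - x_i
    sq_dists = (diffs ** 2).sum(axis=-1)         # (n, n) pairwise squared distances
    k = np.exp(-sq_dists / (2.0 * h ** 2))       # (n, n): k[j, i] = k(x_j, x_i)
    grad_k = -diffs / h ** 2 * k[:, :, None]     # (n, n, d): grad w.r.t. x_j of k(x_j, x_i)
    return k, grad_k

def svgd_step(particles, grad_log_p, step_size=0.1, h=1.0):
    """One SVGD update along the kernelized Stein direction:
    phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i)]."""
    n = particles.shape[0]
    k, grad_k = rbf_kernel(particles, h)
    scores = grad_log_p(particles)               # (n, d) score evaluated at each particle
    phi = (k @ scores + grad_k.sum(axis=0)) / n  # attraction to high density + repulsion
    return particles + step_size * phi

# Usage: drive 100 particles toward a standard 2-D Gaussian target.
rng = np.random.default_rng(0)
particles = 3.0 * rng.normal(size=(100, 2))      # deliberately dispersed initialization
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)  # score of N(0, I) is -x

Note that SVGD never evaluates the KL divergence itself, only the score grad_log_p, which is how a Stein-based treatment can sidestep KL computation against an unknown distribution. The spread of the converged particles is what yields error estimates rather than a single point estimate, which is the sense in which such a treatment can attach quantified reliability to a pruned model.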
