Paper Title
Spartan: Differentiable Sparsity via Regularized Transportation
Paper Authors
Paper Abstract
We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of low-magnitude parameters via a regularized optimal transportation problem and (2) dual averaging-based parameter updates with hard sparsification in the forward pass. This scheme realizes an exploration-exploitation tradeoff: early in training, the learner is able to explore various sparsity patterns, and as the soft top-k approximation is gradually sharpened over the course of training, the balance shifts towards parameter optimization with respect to a fixed sparsity mask. Spartan is sufficiently flexible to accommodate a variety of sparsity allocation policies, including both unstructured and block structured sparsity, as well as general cost-sensitive sparsity allocation mediated by linear models of per-parameter costs. On ImageNet-1K classification, Spartan yields 95% sparse ResNet-50 models and 90% block sparse ViT-B/16 models while incurring absolute top-1 accuracy losses of less than 1% compared to fully dense training.
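To make the two ingredients of the abstract concrete, below is a minimal, illustrative PyTorch sketch, not the authors' implementation. It shows (1) an entropy-regularized soft top-k mask: for the two-column transport problem this reduces to a sigmoid whose scalar threshold is found by bisection so the mask mass equals k, and (2) a hard top-k applied in the forward pass with a straight-through backward pass so gradients still reach the dense parameters, in the spirit of the dual-averaging update. The function names (`soft_topk_mask`, `HardTopK`), the bisection solver, the annealing schedule, and the stand-in loss are all assumptions for illustration; Spartan's exact composition of these pieces differs in its details.

```python
import torch

def soft_topk_mask(scores, k, beta, iters=50):
    """Soft top-k mask m in (0, 1)^n with sum(m) ~= k.

    For the two-column entropy-regularized transport problem the mask takes
    the form sigmoid(beta * (scores - tau)); the scalar dual variable tau is
    found here by bisection so that the mask mass equals k. As beta grows,
    the mask sharpens toward a hard top-k indicator. (Illustrative solver,
    not the paper's code.)
    """
    lo = scores.min() - 10.0 / beta
    hi = scores.max() + 10.0 / beta
    for _ in range(iters):
        tau = (lo + hi) / 2
        mass = torch.sigmoid(beta * (scores - tau)).sum()
        lo, hi = (tau, hi) if mass > k else (lo, tau)  # too dense -> raise tau
    return torch.sigmoid(beta * (scores - tau))

class HardTopK(torch.autograd.Function):
    """Hard top-k sparsification in the forward pass; the backward pass
    sends the gradient straight through to the dense parameters, so the
    dense iterate keeps accumulating updates (dual-averaging flavor)."""

    @staticmethod
    def forward(ctx, w, k):
        # Keep only the k largest-magnitude entries of w.
        thresh = w.abs().flatten().kthvalue(w.numel() - k + 1).values
        return w * (w.abs() >= thresh).to(w.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None

# Illustrative training loop: anneal beta upward so early iterates explore
# sparsity patterns and late iterates optimize against a near-fixed mask.
w = torch.randn(1024, requires_grad=True)
for beta in torch.linspace(1.0, 100.0, 10):
    m = soft_topk_mask(w.abs().detach(), k=64, beta=beta.item())
    loss = (HardTopK.apply(w * m, 64) ** 2).sum()  # stand-in loss
    loss.backward()
    with torch.no_grad():
        w -= 1e-2 * w.grad
        w.grad = None
```

The exploration-exploitation tradeoff described in the abstract is visible in the sketch: at small beta the soft mask is diffuse, so many sparsity patterns receive gradient signal; as beta is annealed upward the mask approaches a hard top-k indicator and the update effectively optimizes the surviving parameters against a fixed mask.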