Paper Title
Spartan: Differentiable Sparsity via Regularized Transportation
Paper Authors
Paper Abstract
We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of low-magnitude parameters via a regularized optimal transportation problem and (2) dual averaging-based parameter updates with hard sparsification in the forward pass. This scheme realizes an exploration-exploitation tradeoff: early in training, the learner is able to explore various sparsity patterns, and as the soft top-k approximation is gradually sharpened over the course of training, the balance shifts towards parameter optimization with respect to a fixed sparsity mask. Spartan is sufficiently flexible to accommodate a variety of sparsity allocation policies, including both unstructured and block structured sparsity, as well as general cost-sensitive sparsity allocation mediated by linear models of per-parameter costs. On ImageNet-1K classification, Spartan yields 95% sparse ResNet-50 models and 90% block sparse ViT-B/16 models while incurring absolute top-1 accuracy losses of less than 1% compared to fully dense training.
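To make the two ingredients of the abstract concrete, below is a minimal, illustrative PyTorch sketch, not the authors' implementation. It shows (1) an entropy-regularized soft top-k mask: for the two-column transport problem this reduces to a sigmoid whose scalar threshold is found by bisection so the mask mass equals k, and (2) a hard top-k applied in the forward pass with a straight-through backward pass so gradients still reach the dense parameters, in the spirit of the dual-averaging update. The function names (`soft_topk_mask`, `HardTopK`), the bisection solver, the annealing schedule, and the stand-in loss are all assumptions for illustration; Spartan's exact composition of these pieces differs in its details.

```python
import torch

def soft_topk_mask(scores, k, beta, iters=50):
    """Soft top-k mask m in (0, 1)^n with sum(m) ~= k.

    For the two-column entropy-regularized transport problem the mask takes
    the form sigmoid(beta * (scores - tau)); the scalar dual variable tau is
    found here by bisection so that the mask mass equals k. As beta grows,
    the mask sharpens toward a hard top-k indicator. (Illustrative solver,
    not the paper's code.)
    """
    lo = scores.min() - 10.0 / beta
    hi = scores.max() + 10.0 / beta
    for _ in range(iters):
        tau = (lo + hi) / 2
        mass = torch.sigmoid(beta * (scores - tau)).sum()
        lo, hi = (tau, hi) if mass > k else (lo, tau)  # too dense -> raise tau
    return torch.sigmoid(beta * (scores - tau))

class HardTopK(torch.autograd.Function):
    """Hard top-k sparsification in the forward pass; the backward pass
    sends the gradient straight through to the dense parameters, so the
    dense iterate keeps accumulating updates (dual-averaging flavor)."""

    @staticmethod
    def forward(ctx, w, k):
        # Keep only the k largest-magnitude entries of w.
        thresh = w.abs().flatten().kthvalue(w.numel() - k + 1).values
        return w * (w.abs() >= thresh).to(w.dtype)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None

# Illustrative training loop: anneal beta upward so early iterates explore
# sparsity patterns and late iterates optimize against a near-fixed mask.
w = torch.randn(1024, requires_grad=True)
for beta in torch.linspace(1.0, 100.0, 10):
    m = soft_topk_mask(w.abs().detach(), k=64, beta=beta.item())
    loss = (HardTopK.apply(w * m, 64) ** 2).sum()  # stand-in loss
    loss.backward()
    with torch.no_grad():
        w -= 1e-2 * w.grad
        w.grad = None
```

The exploration-exploitation tradeoff described in the abstract is visible in the sketch: at small beta the soft mask is diffuse, so many sparsity patterns receive gradient signal; as beta is annealed upward the mask approaches a hard top-k indicator and the update effectively optimizes the surviving parameters against a fixed mask.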