Paper Title
Sparsity-Control Ternary Weight Networks
Paper Authors
Abstract
Deep neural networks (DNNs) have been widely and successfully applied to various applications, but they require large amounts of memory and computational power, which severely restricts their deployment on resource-limited devices. To address this issue, many efforts have been made on training low-bit-weight DNNs. In this paper, we focus on training ternary weight \{-1, 0, +1\} networks, which avoid multiplications and dramatically reduce memory and computation requirements. A ternary weight network can be viewed as a sparser version of its binary weight counterpart, obtained by replacing some of the -1s or 1s in the binary weights with 0s; this yields more efficient inference but a slightly higher memory cost (ternary weights require about two bits each versus one bit for binary). However, existing approaches to training ternary weight networks cannot control the sparsity (i.e., the percentage of 0s) of the ternary weights, which undermines the advantage of ternary weights. In this paper, we propose, to the best of our knowledge, the first sparsity-control approach (SCA) to training ternary weight networks, which is achieved simply through a weight discretization regularizer (WDR). SCA differs from all existing regularizer-based approaches in that it can control the sparsity of the ternary weights through a controller $\alpha$ and does not rely on gradient estimators. We show theoretically and empirically that the sparsity of the trained ternary weights is positively related to $\alpha$. SCA is extremely simple, easy to implement, and is shown to consistently and significantly outperform state-of-the-art approaches on several benchmark datasets, even matching the performance of the full-precision weight counterparts.
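The two efficiency claims above, multiplication-free inference and a single scalar that sets the fraction of zeros, can be illustrated with a minimal sketch. Note that this sketch uses a simple post-hoc threshold ternarization purely for illustration; SCA itself controls sparsity through its WDR regularizer during training rather than by thresholding, and the helpers `ternarize`, `sparsity`, and `ternary_matvec` are hypothetical names, not code from the paper.

```python
import numpy as np

def ternarize(w, alpha):
    """Illustrative threshold ternarization (NOT SCA's method): weights whose
    magnitude falls below alpha * max|w| are set to 0, the rest to sign(w).
    A larger alpha widens the dead zone around zero, so more weights map to 0."""
    delta = alpha * np.max(np.abs(w))
    t = np.sign(w)
    t[np.abs(w) < delta] = 0.0
    return t

def sparsity(t):
    """Fraction of zeros in a ternary weight tensor."""
    return float(np.mean(t == 0))

def ternary_matvec(T, x):
    """Compute y = T @ x for T in {-1, 0, +1} using only adds/subtracts,
    which is why ternary (and binary) weight networks avoid multiplications."""
    y = np.zeros(T.shape[0])
    for i in range(T.shape[0]):
        for j in range(T.shape[1]):
            if T[i, j] == 1:
                y[i] += x[j]
            elif T[i, j] == -1:
                y[i] -= x[j]
    return y

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
t_low = ternarize(w, 0.2)   # small controller -> fewer zeros
t_high = ternarize(w, 0.6)  # large controller -> more zeros
x = rng.standard_normal(8)
print(sparsity(t_low), sparsity(t_high))
print(ternary_matvec(t_high, x))
```

Here the controller directly sets the dead-zone width; in SCA the analogous $\alpha$ instead reshapes the regularizer so that training itself drives the desired fraction of weights to zero, with no thresholding or gradient estimator involved.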