Paper Title

Triangular Dropout: Variable Network Width without Retraining

Paper Authors

Edward W. Staley, Jared Markowitz

Paper Abstract

One of the most fundamental design choices in neural networks is layer width: it affects the capacity of what a network can learn and determines the complexity of the solution. This latter property is often exploited when introducing information bottlenecks, forcing a network to learn compressed representations. However, such an architecture decision is typically immutable once training begins; switching to a more compressed architecture requires retraining. In this paper we present a new layer design, called Triangular Dropout, which does not have this limitation. After training, the layer can be arbitrarily reduced in width to exchange performance for narrowness. We demonstrate the construction and potential use cases of such a mechanism in three areas. Firstly, we describe the formulation of Triangular Dropout in autoencoders, creating models with selectable compression after training. Secondly, we add Triangular Dropout to VGG19 on ImageNet, creating a powerful network which, without retraining, can be significantly reduced in parameters. Lastly, we explore the application of Triangular Dropout to reinforcement learning (RL) policies on selected control problems.
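The abstract describes a layer whose width can be reduced arbitrarily after training. Below is a minimal, hypothetical PyTorch sketch of one way such behavior could be arranged with nested ("triangular") masks; the class name, the per-sample width sampling, and the `keep` argument are illustrative assumptions, not the authors' published implementation.

from typing import Optional

import torch
import torch.nn as nn


class TriangularDropoutLinear(nn.Module):
    """Linear layer trained under nested masks so that, after training,
    the first `keep` units can be used on their own for any keep <= width."""

    def __init__(self, in_features: int, width: int):
        super().__init__()
        self.linear = nn.Linear(in_features, width)
        self.width = width

    def forward(self, x: torch.Tensor, keep: Optional[int] = None) -> torch.Tensor:
        h = self.linear(x)
        if self.training:
            # Sample a width per example and zero all units beyond it.
            # Across examples the masks form a triangular pattern:
            # unit i is active only when the sampled width exceeds i.
            k = torch.randint(1, self.width + 1, (h.shape[0], 1), device=h.device)
            mask = (torch.arange(self.width, device=h.device).unsqueeze(0) < k).float()
            return h * mask
        if keep is not None:
            # At evaluation time, truncate to any chosen width without retraining.
            mask = (torch.arange(self.width, device=h.device) < keep).float()
            return h * mask
        return h


# Example: after training, evaluate at half width without retraining.
layer = TriangularDropoutLinear(in_features=128, width=64)
layer.eval()
z = layer(torch.randn(8, 128), keep=32)  # only the first 32 units remain active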
