Paper Title

Neural Attentive Circuits

Paper Authors

Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

Paper Abstract

Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data-structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and CUBs dataset by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text-classification from ASCII bytes, thereby confirming its general purpose nature.
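
The abstract describes NACs as two jointly trained systems: one that produces the module parameterizations ("codes") and a sparse connectivity over modules, and one that executes the configured modules on an input. Purely to illustrate that split, below is a minimal, hypothetical PyTorch sketch; the class names, the signature-based top-k attention routing, and all shapes are assumptions made for this example and do not reproduce the authors' implementation.

```python
# A hypothetical sketch of the "configurator + executor" split described in the
# abstract. Everything here (names, routing rule, shapes) is an illustrative
# assumption, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Configurator(nn.Module):
    """Produces per-module codes (behavior) and signatures (connectivity)."""

    def __init__(self, num_modules: int, code_dim: int, sig_dim: int):
        super().__init__()
        # Learned seeds from which module codes and signatures are derived.
        self.seeds = nn.Parameter(torch.randn(num_modules, code_dim))
        self.to_code = nn.Linear(code_dim, code_dim)
        self.to_signature = nn.Linear(code_dim, sig_dim)

    def forward(self):
        codes = self.to_code(self.seeds)            # conditions each module's behavior
        signatures = self.to_signature(self.seeds)  # determines who talks to whom
        return codes, signatures


class Executor(nn.Module):
    """Runs modules whose interactions are gated by signature similarity."""

    def __init__(self, code_dim: int, state_dim: int, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.read = nn.Linear(state_dim + code_dim, state_dim)
        self.update = nn.Linear(2 * state_dim, state_dim)

    def forward(self, states, codes, signatures):
        # states: (batch, num_modules, state_dim), one state vector per module.
        # Sparse connectivity: each module attends only to its top-k most
        # similar modules under signature dot-product (an assumption here).
        affinity = signatures @ signatures.t()                       # (M, M)
        topk = affinity.topk(self.top_k, dim=-1)
        mask = torch.full_like(affinity, float("-inf"))
        mask.scatter_(-1, topk.indices, topk.values)
        weights = F.softmax(mask, dim=-1)                            # sparse rows
        # Condition each module on its code, then mix messages from neighbors.
        conditioned = self.read(torch.cat(
            [states, codes.expand(states.size(0), -1, -1)], dim=-1))
        messages = torch.einsum("ij,bjd->bid", weights, conditioned)
        return self.update(torch.cat([states, messages], dim=-1))


if __name__ == "__main__":
    B, M, C, S = 2, 8, 16, 32
    configurator, executor = Configurator(M, C, C), Executor(C, S)
    codes, signatures = configurator()
    states = torch.randn(B, M, S)
    print(executor(states, codes, signatures).shape)  # torch.Size([2, 8, 32])
```

In the architecture described by the abstract, both systems are trained end-to-end with the downstream loss; the sketch only illustrates the interface between the configuration system and the execution system.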
