Paper Title
Improved Representation Learning Through Tensorized Autoencoders
Paper Authors
Paper Abstract
The central question in representation learning is what constitutes a good or meaningful representation. In this work we argue that if we consider data with an inherent cluster structure, where clusters can be characterized through different means and covariances, that structure should be reflected in the embedding as well. While autoencoders (AE) are widely used in practice for unsupervised representation learning, they do not fulfil this condition, as they obtain a single representation of the data. To overcome this, we propose a meta-algorithm that extends an arbitrary AE architecture to a tensorized version (TAE), which learns cluster-specific embeddings while simultaneously learning the cluster assignment. For the linear setting we prove that TAE recovers the principal components of the individual clusters, in contrast to the principal components of the entire data recovered by a standard AE. We validate this on planted models, and for general non-linear and convolutional AEs we empirically show that tensorizing the AE is beneficial for clustering and denoising tasks.
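The abstract does not state the paper's exact objective, but in the linear setting one plausible way to write a tensorized-AE loss, reconstructed from the description above rather than quoted from the paper, is a per-cluster tied linear AE with cluster means and assignments:

\min_{\{W_k,\,\mu_k\},\,s}\ \sum_{i=1}^{n} \sum_{k=1}^{K} s_{ik}\, \big\| x_i - \mu_k - W_k^{\top} W_k (x_i - \mu_k) \big\|_2^2, \qquad s_{ik} \in \{0,1\},\ \ \sum_{k=1}^{K} s_{ik} = 1,

where W_k ∈ R^{L×D} is the tied encoder/decoder of cluster k. For fixed assignments, minimizing over W_k reduces to a linear AE on cluster k alone, which by the standard linear-AE/PCA equivalence recovers the top-L principal subspace of that cluster, consistent with the claim in the abstract.

Below is a minimal, hypothetical PyTorch sketch of this idea with soft assignments derived from per-cluster reconstruction errors. The class name TensorizedLinearAE, the softmax assignment rule, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TensorizedLinearAE(nn.Module):
    """Hypothetical sketch: K cluster-specific linear autoencoders.

    Each cluster k has its own mean mu_k and encoder/decoder pair, so the
    model can capture per-cluster principal directions rather than the
    single global subspace a standard linear AE would recover.
    """

    def __init__(self, dim, latent_dim, n_clusters):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_clusters, dim))                 # cluster means
        self.enc = nn.Parameter(0.01 * torch.randn(n_clusters, dim, latent_dim))
        self.dec = nn.Parameter(0.01 * torch.randn(n_clusters, latent_dim, dim))

    def forward(self, x):
        # Center each sample by every cluster mean: (batch, K, dim).
        xc = x.unsqueeze(1) - self.mu
        # Per-cluster latent codes and reconstructions.
        z = torch.einsum('bkd,kdl->bkl', xc, self.enc)
        recon = torch.einsum('bkl,kld->bkd', z, self.dec) + self.mu
        # Per-cluster squared reconstruction error: (batch, K).
        err = ((x.unsqueeze(1) - recon) ** 2).sum(-1)
        # Soft cluster assignment from the errors; detached so the
        # assignment step is EM-like and gradients only flow through err.
        assign = torch.softmax(-err.detach(), dim=1)
        loss = (assign * err).sum(dim=1).mean()
        return loss, assign

# Usage sketch on stand-in data (real experiments would use clustered data).
model = TensorizedLinearAE(dim=10, latent_dim=2, n_clusters=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(256, 10)
for _ in range(200):
    loss, assign = model(x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under these assumptions, each einsum applies cluster k's encoder/decoder to all samples in parallel, and the weighted loss trains every cluster's AE mostly on the points it reconstructs best, which is how cluster-specific embeddings and assignments can be learned jointly.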