Consistent Representation Learning for High Dimensional Data Analysis

Title

Consistent Representation Learning for High Dimensional Data Analysis

Authors

Li, Stan Z.; Wu, Lirong; Zang, Zelin

Abstract

High-dimensional data analysis for exploration and discovery involves three fundamental tasks: dimensionality reduction, clustering, and visualization. When these three associated tasks are performed separately, as has typically been the case so far, inconsistencies can arise among them in terms of data geometry and other aspects. This can lead to confusing or misleading data interpretation. In this paper, we propose a novel neural network-based method, called Consistent Representation Learning (CRL), that accomplishes the three associated tasks end-to-end and improves their consistency. The CRL network consists of two nonlinear dimensionality reduction (NLDR) transformations: (1) one from the input data space to a latent feature space for clustering, and (2) another from the clustering space to the final 2D or 3D space for visualization. Importantly, the two NLDR transformations are performed so as to best satisfy local geometry preserving (LGP) constraints across the spaces or network layers, improving data consistency along the processing flow. We also propose a novel metric, clustering-visualization inconsistency (CVI), for evaluating the inconsistencies. Extensive comparative results show that the proposed CRL neural network method outperforms popular t-SNE- and UMAP-based algorithms and other contemporary clustering and visualization algorithms in terms of both evaluation metrics and visualization.
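The abstract does not spell out how the LGP constraint is defined. As a rough illustration only, the Python sketch below assumes one simple form of local geometry preservation: for each point, take its k nearest neighbours in the source space and penalise any change in those neighbour distances in the target space, then sum that penalty over both NLDR stages (input to clustering space, clustering space to visualization space). The names `lgp_loss` and `two_stage_lgp_objective` and the parameters `k` and `weight` are hypothetical and do not come from the paper; CRL's actual constraint may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    # Stable sort by Euclidean distance to points[i]; ties keep index order.
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lgp_loss(source: List[Point], target: List[Point], k: int = 3) -> float:
    """Hypothetical local-geometry-preserving penalty (an assumption, not CRL's).

    Neighbourhoods are defined in the source space; the loss is the mean squared
    difference between each neighbour distance in the source space and the
    distance between the same pair of points in the target space.
    """
    if len(source) != len(target):
        raise ValueError("source and target must describe the same points")
    total, count = 0.0, 0
    for i in range(len(source)):
        for j in knn_indices(source, i, k):
            d_src = math.dist(source[i], source[j])
            d_tgt = math.dist(target[i], target[j])
            total += (d_src - d_tgt) ** 2
            count += 1
    return total / count if count else 0.0


def two_stage_lgp_objective(
    x: List[Point], z: List[Point], y: List[Point], k: int = 3, weight: float = 1.0
) -> float:
    """Penalise geometry distortion at both stages: x -> z and z -> y."""
    return lgp_loss(x, z, k) + weight * lgp_loss(z, y, k)


if __name__ == "__main__":
    x = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 1.0]]
    z = [p[:2] for p in x]  # stand-in "latent" space: drop the last coordinate
    y = z                   # stand-in "visualization" space: identity map
    print(two_stage_lgp_objective(x, z, y, k=2))
```

In a CRL-style setup, x would be the input data, z the latent features used for clustering, and y the 2D or 3D visualization coordinates, with z and y produced by network layers and the objective minimized during training. Raw distance matching is used here only for simplicity; similarity-based neighbourhood objectives such as those in t-SNE or UMAP are common alternatives.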
