Paper Title

Tree Index: A New Cluster Evaluation Technique

Authors

Beg, A. H., Islam, Md Zahidul, Estivill-Castro, Vladimir

Abstract

We introduce a cluster evaluation technique called Tree Index. Our Tree Index algorithm aims to describe the structural information of a clustering rather than to give the quantitative summary produced by cluster-quality indexes (where the representation power of a clustering is some cumulative error, similar to vector quantization). Tree Index finds margins among clusters for easy learning, without the complications of Minimum Description Length. It produces a decision tree from the clustered data set, using the cluster identifiers as labels, and combines the entropy of each leaf with that leaf's depth. Intuitively, a shorter tree with pure leaves generalizes the data well: the clusters are easy to learn because they are well separated, so the labels correspond to meaningful clusters. If the clustering algorithm does not separate the clusters well, the trees learned from its results will be large and overly detailed. We show that, on clustering results obtained by various techniques on a brain dataset, Tree Index discriminates between reasonable and non-sensible clusters, and we confirm its effectiveness through graphical visualizations. Tree Index rates the sensible solutions higher than the non-sensible ones, whereas existing cluster-quality indexes fail to do so.
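
The abstract does not state the exact formula that combines leaf entropy with leaf depth, so the following is only a minimal sketch of the idea under stated assumptions. It grows a greedy, entropy-based, axis-aligned decision tree (an ID3/C4.5-style stand-in, not necessarily the authors' tree learner) on the data with cluster identifiers as labels, then computes a hypothetical score: the sample-weighted average over leaves of (leaf entropy + `depth_weight` × leaf depth). The names `tree_index_sketch`, `depth_weight`, and `max_depth` are illustrative and do not come from the paper. In this sketch a lower value means a shorter, purer tree (easier-to-learn clusters); the paper's own scale and direction may differ.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X: List[Point], y: List[int]) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, information gain) of the best axis-aligned split, or None."""
    base, n, best = entropy(y), len(y), None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best


def leaves(X: List[Point], y: List[int], depth: int = 0,
           max_depth: int = 10) -> List[Tuple[int, List[int]]]:
    """Grow the tree greedily and return (depth, labels) for every leaf."""
    if entropy(y) == 0.0 or depth == max_depth:
        return [(depth, list(y))]
    split = best_split(X, y)
    if split is None or split[2] <= 1e-12:  # no useful split: stop here
        return [(depth, list(y))]
    f, t, _ = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (leaves([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth)
            + leaves([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def tree_index_sketch(X: List[Point], y: List[int], max_depth: int = 10,
                      depth_weight: float = 1.0) -> float:
    """Hypothetical score: weighted mean of (leaf entropy + depth_weight * depth)."""
    n = len(y)
    return sum(len(lab) / n * (entropy(lab) + depth_weight * d)
               for d, lab in leaves(X, y, 0, max_depth))


X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # labels follow the two well-separated groups
bad = [0, 1, 0, 1, 0, 1]   # labels cut across the groups
print(tree_index_sketch(X, good))  # → 1.0
print(tree_index_sketch(X, bad))   # ≈ 1.67
```

On this toy data the well-separated labeling is learned by a single split into two pure leaves, while the interleaved labeling needs a deeper tree, so its score is worse. The depth term is what penalizes "large and overly detailed" trees; how the real Tree Index weights depth against entropy is not specified in the abstract.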