Paper title
Predictive K-means with local models
Paper authors
Paper abstract
Supervised classification can be effective for prediction but is sometimes weak on interpretability or explainability (XAI). Clustering, on the other hand, tends to isolate categories or profiles that can be meaningful, but there is no guarantee that they are useful for label prediction. Predictive clustering seeks to obtain the best of both worlds. Starting from labeled data, it looks for clusters that are as pure as possible with regard to the class labels. One technique consists of tweaking a clustering algorithm so that data points sharing the same label tend to aggregate together. With distance-based algorithms, such as k-means, a solution is to modify the distance used by the algorithm so that it incorporates information about the labels of the data points. In this paper, we propose another method, which relies on a change of representation guided by class densities and then carries out clustering in this new representation space. We present two new algorithms using this technique and show on a variety of data sets that they are competitive in predictive performance with pure supervised classifiers while offering interpretability of the clusters discovered.
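To make the idea of "a change of representation guided by class densities" followed by clustering more concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the abstract does not say how the class densities are estimated or how the clusters are used for prediction. The sketch assumes per-feature Gaussian densities (a naive Bayes simplification), maps each point to its vector of log posteriors log p(c | x), runs ordinary k-means in that space with a deterministic farthest-first initialization, and predicts with the majority label of the nearest cluster. All function names are hypothetical.

```python
import math
from collections import Counter

def fit_class_densities(X, y):
    """Assumption: per-class, per-feature Gaussians plus class priors (naive Bayes)."""
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Population variance with a small floor to avoid division by zero.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        params[c] = (means, variances, len(rows) / len(X))
    return params

def to_density_space(x, params):
    """Map x to the vector of log posteriors log p(c | x), one coordinate per class."""
    scores = []
    for means, variances, prior in params.values():
        log_lik = sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                      for v, m, s in zip(x, means, variances))
        scores.append(math.log(prior) + log_lik)
    top = max(scores)  # log-sum-exp normalization for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def farthest_first(Z, k):
    """Deterministic initialization (an assumption of this sketch, not of the paper)."""
    centers = [list(Z[0])]
    while len(centers) < k:
        nxt = max(Z, key=lambda z: min(sq_dist(z, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with the ordinary Euclidean distance."""
    centers = farthest_first(Z, k)
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: sq_dist(z, centers[j])) for z in Z]
        for j in range(k):
            members = [z for z, a in zip(Z, assign) if a == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def fit_predictive_kmeans(X, y, k):
    params = fit_class_densities(X, y)
    Z = [to_density_space(x, params) for x in X]
    centers = kmeans(Z, k)
    assign = [min(range(k), key=lambda j: sq_dist(z, centers[j])) for z in Z]
    cluster_label = {}
    for j in range(k):
        labels = [label for label, a in zip(y, assign) if a == j]
        if labels:  # label each non-empty cluster by majority vote
            cluster_label[j] = Counter(labels).most_common(1)[0][0]
    return params, centers, cluster_label

def predict(x, model):
    params, centers, cluster_label = model
    z = to_density_space(x, params)
    j = min(cluster_label, key=lambda i: sq_dist(z, centers[i]))
    return cluster_label[j]

# Toy usage: two well-separated groups of labeled points.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_predictive_kmeans(X, y, k=2)
print(predict([0.5, 0.5], model), predict([10.5, 10.5], model))
```

In this sketch the label information enters only through the representation, so the clustering step itself is unchanged k-means. Each cluster center is a vector of average log posteriors, which gives a direct reading of how strongly that cluster leans toward each class.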