Automating DBSCAN via Deep Reinforcement Learning

Paper Title

Automating DBSCAN via Deep Reinforcement Learning

Authors

Zhang, Ruitong; Peng, Hao; Dou, Yingtong; Wu, Jia; Sun, Qingyun; Zhang, Jingyi; Yu, Philip S.

Abstract

DBSCAN is widely used in many scientific and engineering fields because of its simplicity and practicality. However, because its parameters are highly sensitive, the accuracy of the clustering result depends heavily on practical experience. In this paper, we first propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameter search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by treating the clustering environment as a Markov decision process, with the aim of finding the best clustering parameters without manual assistance. DRL-DBSCAN learns the optimal clustering parameter search policy for different feature distributions by interacting with the clusters, using a policy network trained with weakly supervised rewards. In addition, we present a recursive search mechanism driven by the scale of the data to process large parameter spaces efficiently and controllably. Extensive experiments are conducted on five artificial and real-world datasets under the four proposed working modes. The results on offline and online tasks show that DRL-DBSCAN not only consistently improves DBSCAN clustering accuracy, by up to 26% and 25% respectively, but also stably finds dominant parameters with high computational efficiency. The code is available at https://github.com/RingBDStack/DRL-DBSCAN.
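The abstract describes the search only at a high level. The sketch below illustrates the setting it automates, under explicit assumptions: a plain DBSCAN, a reward computed from a handful of labeled points (a hypothetical stand-in for the paper's weakly supervised reward), and a greedy one-step-lookahead agent in place of the learned policy network. The agent's state is (eps, min_pts). A coarse-to-fine loop that halves the eps step is included only as a loose analogue of the recursive search mechanism. The names `dbscan`, `pairwise_agreement`, `search`, `ACTIONS`, and all default values are hypothetical, not taken from the authors' code; see the repository above for their implementation.

```python
import math
from itertools import combinations

def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns one label per point; -1 marks noise.
    min_pts counts the point itself, as scikit-learn's min_samples does."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster  # unvisited core point: start a new cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels

def pairwise_agreement(pred, labeled):
    """Hypothetical weak-supervision reward: the fraction of pairs of labeled
    points on which the clustering agrees with the labels (same vs. different
    cluster). Noise points never count as sharing a cluster."""
    pairs = list(combinations(labeled.items(), 2))
    if not pairs:
        raise ValueError("need at least two labeled points")
    agree = sum((pred[i] == pred[j] and pred[i] != -1) == (yi == yj)
                for (i, yi), (j, yj) in pairs)
    return agree / len(pairs)

# Hypothetical action set: (direction of the eps change, change in min_pts)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def search(points, labeled, eps=0.5, min_pts=3, eps_step=0.4, levels=3, max_steps=20):
    """Greedy stand-in for a learned policy. At each step, take the action with
    the highest reward and stop once no action improves it. Then halve the eps
    step and repeat (a coarse-to-fine search, not the paper's recursion)."""
    def reward(e, m):
        return pairwise_agreement(dbscan(points, e, m), labeled)

    best = reward(eps, min_pts)
    for _ in range(levels):
        for _ in range(max_steps):
            candidates = [(reward(eps + de * eps_step, min_pts + dm),
                           eps + de * eps_step, min_pts + dm)
                          for de, dm in ACTIONS
                          if eps + de * eps_step > 0 and min_pts + dm >= 1]
            r, e, m = max(candidates)
            if r <= best:
                break  # no action improves the reward at this step size
            best, eps, min_pts = r, e, m
        eps_step /= 2
    return eps, min_pts, best

# Usage on toy data: two well-separated blobs, with only 4 of 10 points labeled.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labeled = {0: "a", 1: "a", 5: "b", 6: "b"}
eps, min_pts, score = search(points, labeled)
```

The starting eps of 0.5 is deliberately too small: every point is labeled as noise. The agent then moves eps upward until both blobs are recovered. A real DRL agent would replace the exhaustive one-step lookahead with a policy network trained across many datasets, which is what lets DRL-DBSCAN avoid re-running DBSCAN for every candidate action.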
