Paper Title

A Topological Approach for Semi-Supervised Learning

Paper Authors

Adrián Inés, César Domínguez, Jónathan Heras, Gadea Mata, Julio Rubio

Paper Abstract

Nowadays, Machine Learning and Deep Learning methods have become the state-of-the-art approach for solving data classification tasks. In order to use those methods, it is necessary to acquire and label a considerable amount of data; however, this is not straightforward in some fields, since data annotation is time-consuming and might require expert knowledge. This challenge can be tackled by means of semi-supervised learning methods, which take advantage of both labelled and unlabelled data. In this work, we present new semi-supervised learning methods based on techniques from Topological Data Analysis (TDA), a field that is gaining importance for analysing large amounts of data with high variety and dimensionality. In particular, we have created two semi-supervised learning methods following two different topological approaches. In the former, we use a homological approach that consists of studying the persistence diagrams associated with the data using the Bottleneck and Wasserstein distances. In the latter, we take into account the connectivity of the data. In addition, we have carried out a thorough analysis of the developed methods using 3 synthetic datasets, 5 structured datasets, and 2 image datasets. The results show that the semi-supervised methods developed in this work outperform both models trained with only manually labelled data and classical semi-supervised learning methods, with improvements of up to 16%.
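To make the homological approach mentioned in the abstract more concrete, the following is a minimal, illustrative sketch (not the paper's actual algorithm): an unlabelled point is tentatively added to each class's point cloud, and the class whose 0-dimensional persistence diagram is perturbed the least under the Bottleneck distance provides the predicted label. It assumes the gudhi library; the helper names, the Rips-complex parameters, and the labelling rule itself are assumptions made for the example. The Wasserstein distance (e.g. gudhi.wasserstein.wasserstein_distance, which additionally requires the POT package) could be substituted for the Bottleneck distance.

```python
# Illustrative sketch only: label an unlabelled point by measuring how much it
# perturbs the persistence diagram of each class's point cloud. The exact
# procedure used in the paper may differ.
import numpy as np
import gudhi


def persistence_diagram(points, max_edge_length=10.0, dim=0):
    """0-dimensional persistence diagram of a Vietoris-Rips filtration."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge_length)
    st = rips.create_simplex_tree(max_dimension=dim + 1)
    st.persistence()
    diag = st.persistence_intervals_in_dimension(dim)
    # Drop essential classes (infinite death) so the distance stays finite.
    return diag[np.isfinite(diag[:, 1])]


def label_by_bottleneck(x, clouds):
    """Assign x to the class whose diagram changes least when x is added."""
    scores = {}
    for label, cloud in clouds.items():
        base = persistence_diagram(cloud)
        extended = persistence_diagram(np.vstack([cloud, x]))
        scores[label] = gudhi.bottleneck_distance(base, extended)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clouds = {
        0: rng.normal(loc=0.0, scale=0.3, size=(30, 2)),  # cluster near (0, 0)
        1: rng.normal(loc=3.0, scale=0.3, size=(30, 2)),  # cluster near (3, 3)
    }
    unlabelled = np.array([2.9, 3.1])
    print(label_by_bottleneck(unlabelled, clouds))  # expected: 1
```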
