Paper Title
Neural Topic Modeling with Continual Lifelong Learning
Paper Authors
Paper Abstract
Lifelong learning has recently attracted attention in building machine learning systems that continually accumulate and transfer knowledge to help future learning. Unsupervised topic modeling has been popularly used to discover topics from document collections. However, applying topic modeling is challenging under data sparsity, e.g., in a small collection of (short) documents, and thus generates incoherent topics and sub-optimal document representations. To address the problem, we propose a lifelong learning framework for neural topic modeling that can continuously process streams of document collections, accumulate topics, and guide future topic modeling tasks via knowledge transfer from multiple sources to better deal with sparse data. In the lifelong process, we particularly investigate jointly: (1) sharing generative homologies (latent topics) over the lifetime to transfer prior knowledge, and (2) minimizing catastrophic forgetting to retain past learning via novel selective data augmentation, co-training, and topic regularization approaches. Given a stream of document collections, we apply the proposed Lifelong Neural Topic Modeling (LNTM) framework to modeling three sparse document collections as future tasks and demonstrate improved performance quantified by perplexity, topic coherence, and an information retrieval task.
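The abstract names topic regularization as one of the mechanisms for minimizing catastrophic forgetting but does not spell out its form. Below is a minimal sketch of one plausible reading: a penalty that keeps the current task's topic-word matrix close to its best-matching topics accumulated from earlier tasks. The function name, the `strength` parameter, and the cosine-similarity formulation are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def topic_regularization(current_topics, past_topics, strength=1.0):
    """Hypothetical topic-regularization term (an assumption, not the
    paper's exact loss): penalize drift of each current topic-word
    vector away from its closest accumulated past topic.

    current_topics: (K, V) topic-word matrix of the model being trained.
    past_topics:    (K_past, V) topics accumulated over earlier tasks.
    """
    # Cosine similarity between every current and every past topic.
    sim = F.normalize(current_topics, dim=1) @ F.normalize(past_topics, dim=1).T
    # For each current topic, reward high similarity to its best past match,
    # so shared topics are retained while unmatched topics remain free to adapt.
    best_match, _ = sim.max(dim=1)
    return strength * (1.0 - best_match).mean()

# Usage sketch: 10 current topics, 25 accumulated topics, vocabulary of 500.
cur = torch.randn(10, 500, requires_grad=True)
past = torch.randn(25, 500)
loss = topic_regularization(cur, past, strength=0.5)  # added to the NTM loss
loss.backward()
```

In a lifelong setup, a term like this would be added to the neural topic model's reconstruction loss for each new collection, with `past_topics` frozen, so knowledge transfer and forgetting-mitigation are trained jointly as the abstract describes.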