Paper Title
Context Reinforced Neural Topic Modeling over Short Texts
Paper Authors
Paper Abstract
As one of the prevalent topic mining tools, neural topic modeling has attracted a lot of interest owing to its efficient training and strong generalization ability. However, due to the lack of context in each short text, existing neural topic models may suffer from feature sparsity on such documents. To alleviate this issue, we propose a Context Reinforced Neural Topic Model (CRNTM), whose characteristics can be summarized as follows. Firstly, by assuming that each short text covers only a few salient topics, CRNTM infers the topic for each word over a narrow range. Secondly, our model exploits pre-trained word embeddings by treating topics as multivariate Gaussian distributions or Gaussian mixture distributions in the embedding space. Extensive experiments on two benchmark datasets validate the effectiveness of the proposed model on both topic discovery and text classification.
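The second idea, modeling each topic as a Gaussian over the word-embedding space, can be sketched as follows. This is an illustrative toy with an isotropic Gaussian and random 2-D embeddings, not the authors' implementation; all names (`topic_word_distribution`, `mu`, `sigma2`) and values are assumptions for exposition.

```python
import numpy as np

def topic_word_distribution(word_vecs, mu, sigma2):
    """p(word | topic) obtained by evaluating an isotropic multivariate
    Gaussian N(mu, sigma2 * I) at each word embedding and normalizing
    over the vocabulary.

    word_vecs: (V, D) pre-trained word embeddings
    mu:        (D,)   topic mean in embedding space
    sigma2:    scalar topic variance
    """
    d = word_vecs.shape[1]
    diff = word_vecs - mu
    # log-density of each word embedding under the topic Gaussian
    log_dens = -0.5 * (d * np.log(2 * np.pi * sigma2)
                       + (diff ** 2).sum(axis=1) / sigma2)
    dens = np.exp(log_dens)
    return dens / dens.sum()  # normalize to a distribution over words

# Toy vocabulary of 4 words with hypothetical 2-D embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 2))
# Center the topic on word 0's embedding: that word gets the highest mass.
beta = topic_word_distribution(E, mu=E[0], sigma2=1.0)
```

Words whose embeddings lie closer to the topic mean receive higher probability, which is how pre-trained embeddings inject semantic context that a short text alone lacks.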