Paper title
Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks
Paper authors
Paper abstract
In hierarchical text classification, we perform a sequence of inference steps to predict the category of a document from the top to the bottom of a given class taxonomy. Most studies have focused on developing novel neural network architectures to deal with the hierarchical structure, but we prefer to look for efficient ways to strengthen a baseline model. We first define the task as a sequence-to-sequence problem. Afterwards, we propose an auxiliary synthetic task of bottom-up classification. Then, from external dictionaries, we retrieve textual definitions for the classes of all the hierarchy's layers and map them into the word vector space. We use the class-definition embeddings as an additional input to condition the prediction of the next layer, and in an adapted beam search. Although the modified search did not provide large gains, the combination of the auxiliary task and the additional input of class definitions significantly enhances the classification accuracy. With our efficient approaches, we outperform previous studies, using a drastically reduced number of parameters, on two well-known English datasets.
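As a rough illustration of the sequence-to-sequence framing and the bottom-up auxiliary task described in the abstract, the sketch below builds two training pairs from one document: a top-down target (root to leaf) and a bottom-up target (leaf to root). It is a minimal sketch and not the authors' implementation. The function name `make_examples`, the task marker tokens, and the sample document and labels are all assumptions made for illustration; the abstract does not say how the two tasks are distinguished in the input.

```python
from typing import List, Tuple

# Hypothetical task markers (an assumption): the abstract does not specify
# how the main task and the auxiliary task are told apart.
TOP_DOWN = "<top-down>"
BOTTOM_UP = "<bottom-up>"


def make_examples(document: str, label_path: List[str]) -> List[Tuple[str, List[str]]]:
    """Build seq2seq (source, target) pairs from a document and its root-to-leaf label path."""
    # Main task: predict the classes layer by layer, from the top of the taxonomy down.
    main = (f"{TOP_DOWN} {document}", list(label_path))
    # Auxiliary synthetic task: predict the same path in reverse, from the leaf up.
    auxiliary = (f"{BOTTOM_UP} {document}", list(reversed(label_path)))
    return [main, auxiliary]


# Made-up document and three-level label path, for illustration only.
examples = make_examples(
    "The striker scored twice in the final.",
    ["Sports", "Football", "Match report"],
)
for source, target in examples:
    print(source, "->", target)
```

Under this framing, a single encoder-decoder model could be trained on both kinds of pairs. The class-definition embeddings and the adapted beam search mentioned in the abstract are not covered by this sketch.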