Paper Title
Graph-in-Graph Network for Automatic Gene Ontology Description Generation
Paper Authors
Paper Abstract
Gene Ontology (GO) is the primary knowledge base of gene function and enables computational tasks in biomedicine. The basic element of GO is a term, which comprises a set of genes with the same function. Existing research on GO mainly focuses on predicting gene-term associations; other tasks, such as generating descriptions of new terms, are rarely pursued. In this paper, we propose a novel task: GO term description generation. The task is to automatically generate a sentence that describes the function of a GO term belonging to one of three categories: molecular function, biological process, or cellular component. To address this task, we propose a Graph-in-Graph network that can efficiently leverage the structural information of GO. The proposed network introduces a two-layer graph: the first layer is a graph of GO terms, and each node of that graph is itself a graph (a gene graph). Such a Graph-in-Graph network can derive the biological functions of GO terms and generate proper descriptions. To validate the effectiveness of the proposed network, we build three large-scale benchmark datasets. Incorporating the proposed Graph-in-Graph network substantially boosts the performance of seven different sequence-to-sequence models across all evaluation metrics, with relative improvements of up to 34.7%, 14.5%, and 39.1% in BLEU, ROUGE-L, and METEOR, respectively.
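The two-layer structure described in the abstract can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' model: the class names `GeneGraph` and `TermGraph`, the mean-pooling readout, the single round of neighbor averaging, and the placeholder identifiers (`TERM_A`, `g1`, and so on) are all hypothetical, chosen to show how an outer graph of terms might hold an inner gene graph at each node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneGraph:
    """Inner graph: the genes attached to one GO term (hypothetical layout)."""
    features: dict[str, list[float]]  # gene id -> feature vector
    edges: list[tuple[str, str]] = field(default_factory=list)  # gene-gene links

    def pool(self) -> list[float]:
        """Mean-pool gene features into one vector (an assumed readout)."""
        vectors = list(self.features.values())
        dim = len(vectors[0])
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


@dataclass
class TermGraph:
    """Outer graph: GO terms as nodes, each node holding its own GeneGraph."""
    nodes: dict[str, GeneGraph]   # term id -> inner gene graph
    edges: list[tuple[str, str]]  # term-term links, treated as undirected here

    def encode(self) -> dict[str, list[float]]:
        """Pool each inner graph, then average over each term and its neighbors."""
        pooled = {term: graph.pool() for term, graph in self.nodes.items()}
        neighbors = {term: [term] for term in self.nodes}  # include the term itself
        for a, b in self.edges:
            neighbors[a].append(b)
            neighbors[b].append(a)
        encoded = {}
        for term, group in neighbors.items():
            dim = len(pooled[term])
            encoded[term] = [
                sum(pooled[n][i] for n in group) / len(group) for i in range(dim)
            ]
        return encoded


def build_example() -> TermGraph:
    """Toy instance with placeholder ids and made-up feature values."""
    term_a = GeneGraph(
        features={"g1": [1.0, 0.0], "g2": [3.0, 2.0]},
        edges=[("g1", "g2")],
    )
    term_b = GeneGraph(features={"g3": [0.0, 1.0]})
    return TermGraph(
        nodes={"TERM_A": term_a, "TERM_B": term_b},
        edges=[("TERM_A", "TERM_B")],
    )
```

In a full system, the per-term vectors returned by `encode()` would be passed to a sequence-to-sequence decoder that generates the description; that step, and any learned parameters, are omitted here.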