Paper Title

HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval

Paper Authors

Jie Guo, Meiting Wang, Yan Zhou, Bin Song, Yuhao Chi, Wei Fan, Jianglong Chang

Paper Abstract

Image-text retrieval (ITR) is a challenging task in the field of multimodal information processing due to the semantic gap between different modalities. In recent years, researchers have made great progress in exploring accurate alignment between images and texts. However, existing works mainly focus on fine-grained alignment between image regions and sentence fragments, ignoring the guidance provided by global contextual information. In fact, integrating local fine-grained information with global contextual information can provide more semantic clues for retrieval. In this paper, we propose a novel Hierarchical Graph Alignment Network (HGAN) for image-text retrieval. First, to capture comprehensive multimodal features, we construct feature graphs for the image and text modalities respectively. Then, a multi-granularity shared space is established via the proposed Multi-granularity Feature Aggregation and Rearrangement (MFAR) module, which strengthens the semantic correspondence between local and global information and yields more accurate feature representations for the image and text modalities. Finally, the resulting image and text features are further refined through three-level similarity functions to achieve hierarchical alignment. To validate the proposed model, we perform extensive experiments on the MS-COCO and Flickr30K datasets. Experimental results show that the proposed HGAN outperforms state-of-the-art methods on both datasets, demonstrating the effectiveness and superiority of our model.
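
The abstract does not give implementation details for the feature graphs, the MFAR module, or the three-level similarity functions. Purely to illustrate the hierarchical-alignment idea, the PyTorch sketch below combines a local (region-word), a graph-level (neighborhood-aggregated), and a global (pooled) cosine similarity into one matching score. All function names, feature shapes, adjacency construction, and fusion weights here are assumptions for illustration, not the paper's actual method.

```python
# Hypothetical sketch of a three-level image-text similarity, inspired by the
# abstract's local / graph / global hierarchy. Everything below (shapes,
# names, adjacency matrices, fusion weights) is assumed, not from the paper.
import torch
import torch.nn.functional as F

def local_similarity(regions, words):
    """Fine-grained alignment between image regions and words.

    regions: (R, D) image-region features; words: (W, D) word features.
    """
    r = F.normalize(regions, dim=-1)
    w = F.normalize(words, dim=-1)
    sim = r @ w.t()  # (R, W) region-word cosine similarities
    # For each word, take its best-matching region, then average over words.
    return sim.max(dim=0).values.mean()

def graph_similarity(region_adj, word_adj, regions, words):
    """Structure-level alignment using neighborhood-aggregated features.

    region_adj: (R, R) and word_adj: (W, W) are (assumed) row-normalized
    adjacency matrices of the image and text feature graphs.
    """
    r_ctx = F.normalize(region_adj @ regions, dim=-1)  # graph-smoothed regions
    w_ctx = F.normalize(word_adj @ words, dim=-1)      # graph-smoothed words
    sim = r_ctx @ w_ctx.t()
    return sim.max(dim=0).values.mean()

def global_similarity(regions, words):
    """Global alignment: cosine similarity of mean-pooled features."""
    g_img = F.normalize(regions.mean(dim=0), dim=-1)
    g_txt = F.normalize(words.mean(dim=0), dim=-1)
    return g_img @ g_txt

def hierarchical_similarity(regions, words, region_adj, word_adj,
                            weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three similarity levels (weights are assumed)."""
    a, b, c = weights
    return (a * local_similarity(regions, words)
            + b * graph_similarity(region_adj, word_adj, regions, words)
            + c * global_similarity(regions, words))

# Toy usage: 36 detected regions and 12 words, both projected to D = 256.
regions = torch.randn(36, 256)
words = torch.randn(12, 256)
region_adj = torch.softmax(torch.randn(36, 36), dim=-1)  # stand-in graphs
word_adj = torch.softmax(torch.randn(12, 12), dim=-1)
print(hierarchical_similarity(regions, words, region_adj, word_adj))
```

In a full model, the three levels would be trained jointly (typically with a triplet ranking loss over matched and mismatched image-text pairs), and the graph adjacencies would come from learned semantic relations rather than the random stand-ins used above.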
