Cross-modal Search Method of Technology Video based on Adversarial Learning and Feature Fusion

Paper Title

Cross-modal Search Method of Technology Video based on Adversarial Learning and Feature Fusion

Authors

Liu, Xiangbin; Du, Junping; Liang, Meiyu; Li, Ang

Abstract

Technology videos contain rich multi-modal information. In cross-modal search, features from different modalities cannot be compared directly, so the semantic gap between modalities is the key problem to solve. To address it, this paper proposes a novel Feature Fusion based Adversarial Cross-modal Retrieval method (FFACR) for text-to-video matching, ranking, and search. The method uses an adversarial learning framework to construct a video multi-modal feature fusion network, with a feature mapping network as the generator and a modality discrimination network as the discriminator. The feature fusion network produces the multi-modal features of each video. The feature mapping network projects the multi-modal features into a common semantic space based on semantics and similarity. The modality discrimination network determines which modality a feature originally came from. The generator and discriminator are trained alternately, so that the representations produced by the feature mapping network stay semantically consistent with the original data while modality-specific characteristics are eliminated. Finally, similarity in the semantic space is used to rank candidates and obtain the search results. Experimental results demonstrate that the proposed method performs better in text-to-video search than other existing methods, and validate its effectiveness on the self-built datasets of technology videos.
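
The final ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the function names (`cosine_similarity`, `rank_videos`) and the toy embeddings are hypothetical stand-ins for the outputs of the feature mapping network, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_videos(text_embedding, video_embeddings):
    """Return (video_id, score) pairs sorted by descending similarity."""
    scored = [
        (video_id, cosine_similarity(text_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, already projected into the common semantic space.
query = [1.0, 0.0, 1.0]
videos = {
    "video_a": [1.0, 0.0, 1.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [1.0, 1.0, 0.0],
}
ranking = rank_videos(query, videos)
```

With these toy vectors, `video_a` ranks first because it points in the same direction as the query, and `video_b` ranks last because it is orthogonal to it.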
