Paper Title

(Un)likelihood Training for Interpretable Embedding

Authors

Wu, Jiaxin; Ngo, Chong-Wah; Chan, Wing-Kwong; Hou, Zhijian

Abstract

Cross-modal representation learning has become the new normal for bridging the semantic gap between text and visual data. Learning modality-agnostic representations in a continuous latent space, however, is often treated as a black-box, data-driven training process. It is well known that the effectiveness of representation learning depends heavily on the quality and scale of the training data. For video representation learning, obtaining a complete set of labels that annotates the full spectrum of video content for training is difficult, if not impossible. These two issues, black-box training and dataset bias, make representation learning practically challenging to deploy for video understanding, owing to unexplainable and unpredictable results. In this paper, we propose two novel training objectives, likelihood and unlikelihood functions, to unroll the semantics behind embeddings while addressing the label-sparsity problem in training. Likelihood training aims to interpret the semantics of embeddings beyond the training labels, while unlikelihood training leverages prior knowledge as regularization to ensure semantically coherent interpretation. With both training objectives, a new encoder-decoder network that learns interpretable cross-modal representations is proposed for ad-hoc video search. Extensive experiments on the TRECVid and MSR-VTT datasets show that the proposed network outperforms several state-of-the-art retrieval models by a statistically significant performance margin.
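Since the abstract describes the two objectives only in prose, a minimal sketch may help make them concrete. The PyTorch snippet below is a hypothetical illustration under stated assumptions, not the paper's implementation: the names concept_logits, pos_mask, and neg_mask, as well as the sigmoid-based formulation, are all assumed for exposition. It pairs a likelihood term that raises the probability of relevant concepts with an unlikelihood term that suppresses concepts ruled out by prior knowledge.

```python
# Hypothetical sketch of combined likelihood + unlikelihood objectives;
# names and formulation are assumptions, not the paper's actual code.
import torch

def likelihood_unlikelihood_loss(concept_logits, pos_mask, neg_mask, alpha=1.0):
    """concept_logits: (batch, vocab) decoder scores over a concept vocabulary.
    pos_mask: 1 where a concept is labeled (or inferred) as relevant.
    neg_mask: 1 where prior knowledge marks a concept as incoherent.
    alpha: weight of the unlikelihood (regularization) term."""
    probs = torch.sigmoid(concept_logits)
    eps = 1e-8
    # Likelihood term: push probabilities of relevant concepts towards 1.
    likelihood = -(pos_mask * torch.log(probs + eps)).sum(dim=1)
    # Unlikelihood term: push probabilities of incoherent concepts towards 0.
    unlikelihood = -(neg_mask * torch.log(1.0 - probs + eps)).sum(dim=1)
    return (likelihood + alpha * unlikelihood).mean()

# Toy usage: 4 video clips, a 1000-concept vocabulary.
logits = torch.randn(4, 1000)
pos = torch.zeros(4, 1000); pos[:, :5] = 1.0    # assumed relevant concepts
neg = torch.zeros(4, 1000); neg[:, 5:10] = 1.0  # assumed incoherent concepts
print(likelihood_unlikelihood_loss(logits, pos, neg))
```

Under these assumptions, minimizing the loss aligns the decoded concept distribution with the labels, while the unlikelihood term keeps the interpretation semantically coherent, matching the role the abstract assigns to each objective.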
