Paper Title

Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

Paper Authors

Abhinav Joshi, Naman Gupta, Jinang Shah, Binod Bhattarai, Ashutosh Modi, Danail Stoyanov

Abstract

A real-world application or setting involves interaction between different modalities (e.g., video, speech, text). In order to process the multimodal information automatically and use it for an end application, Multimodal Representation Learning (MRL) has emerged as an active area of research in recent times. MRL involves learning reliable and robust representations of information from heterogeneous sources and fusing them. However, in practice, the data acquired from different sources are typically noisy. In some extreme cases, a noise of large magnitude can completely alter the semantics of the data leading to inconsistencies in the parallel multimodal data. In this paper, we propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique. In the proposed method, we train a separate network for each modality to assess the credibility of information coming from that modality, and subsequently, the contribution from each modality is dynamically varied while estimating the joint distribution. We evaluate our method on two challenging benchmarks from two diverse domains: multimodal 3D hand-pose estimation and multimodal surgical video segmentation. We attain state-of-the-art performance on both benchmarks. Our extensive quantitative and qualitative evaluations show the advantages of our method compared to previous approaches.
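
To make the fusion idea in the abstract concrete, here is a minimal sketch of a generalized product of experts. It is not the authors' implementation. It assumes each modality contributes a one-dimensional Gaussian expert, and it takes the per-modality credibility weights as plain numbers, whereas the paper describes learning them with a separate network per modality. The function name `generalized_poe` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def generalized_poe(
    means: Sequence[float],
    variances: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, float]:
    """Fuse 1-D Gaussian experts N(mean_i, var_i), each raised to weight_i.

    Up to normalisation, prod_i N(z; mean_i, var_i) ** weight_i is a Gaussian
    whose precision is sum_i weight_i / var_i.
    Assumption: weights are given here; they stand in for learned credibility scores.
    """
    if not (len(means) == len(variances) == len(weights)):
        raise ValueError("means, variances and weights must have equal length")

    # Weighted precision of each expert; a weight of 0 removes that expert.
    precisions = [w / v for w, v in zip(weights, variances)]
    total_precision = sum(precisions)
    if total_precision <= 0:
        raise ValueError("at least one expert needs a positive weight")

    # Precision-weighted average of the expert means.
    fused_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    fused_var = 1.0 / total_precision
    return fused_mean, fused_var


if __name__ == "__main__":
    # Two equally trusted modalities that disagree.
    print(generalized_poe([0.0, 4.0], [1.0, 1.0], [1.0, 1.0]))
    # The second modality is judged unreliable and down-weighted to zero.
    print(generalized_poe([0.0, 4.0], [1.0, 1.0], [1.0, 0.0]))
```

With equal weights the fused estimate sits between the two experts. Setting a modality's weight to zero removes its influence entirely, which is the kind of dynamic down-weighting of noisy modalities the abstract describes.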
