On Negative Sampling for Contrastive Audio-Text Retrieval

Paper Title

On Negative Sampling for Contrastive Audio-Text Retrieval

Authors

Huang Xie, Okko Räsänen, Tuomas Virtanen

Abstract

This paper investigates negative sampling for contrastive learning in the context of audio-text retrieval. A negative sampling strategy selects negatives (either audio clips or textual descriptions) for a positive audio-text pair from a pool of candidates. We explore sampling strategies based on model-estimated within-modality and cross-modality relevance scores for audio and text samples. Keeping the training setting of the retrieval system from [1] constant, we study eight sampling strategies, including hard and semi-hard negative sampling. Experimental results show that retrieval performance varies dramatically across strategies. In particular, by selecting semi-hard negatives with cross-modality scores, the retrieval system achieves improved performance in both text-to-audio and audio-to-text retrieval. In addition, we show that feature collapse occurs when hard negatives are sampled with cross-modality scores.
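To make the difference between hard and semi-hard selection concrete, here is a minimal Python sketch of both rules applied to a relevance-score matrix. It is not the authors' implementation: the abstract does not specify the relevance function, the candidate pool, or the exact semi-hard criterion, so the sketch assumes cosine similarity as the relevance score, the other items of a mini-batch as the candidate pool, and a FaceNet-style semi-hard rule (the highest-scoring negative that still scores below the positive). All function names, the fallback rule, and the triplet hinge loss with margin 0.2 are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors (assumed relevance score)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def score_matrix(anchors: List[Vector], candidates: List[Vector]) -> Matrix:
    """scores[i][j] = relevance of candidates[j] to anchors[i].

    Audio vs. text embeddings give cross-modality scores; the same modality
    twice gives within-modality scores. Item i of each list is assumed to
    belong to the i-th positive pair, so the positive sits on the diagonal.
    """
    return [[cosine(a, c) for c in candidates] for a in anchors]


def hard_negatives(scores: Matrix) -> List[int]:
    """For each anchor i, pick the highest-scoring candidate j != i."""
    picks = []
    for i, row in enumerate(scores):
        others = [j for j in range(len(row)) if j != i]
        picks.append(max(others, key=lambda j: row[j]))
    return picks


def semi_hard_negatives(scores: Matrix) -> List[int]:
    """For each anchor i, pick the highest-scoring candidate j != i whose score
    is still below the positive score scores[i][i]. If no such candidate
    exists, fall back to the hard negative (a hypothetical fallback rule)."""
    hard = hard_negatives(scores)
    picks = []
    for i, row in enumerate(scores):
        easier = [j for j in range(len(row)) if j != i and row[j] < row[i]]
        picks.append(max(easier, key=lambda j: row[j]) if easier else hard[i])
    return picks


def triplet_loss(scores: Matrix, negatives: List[int], margin: float = 0.2) -> float:
    """Mean hinge loss max(0, margin - s(i, i) + s(i, n_i)) over anchors.

    A common contrastive objective, assumed here for illustration only."""
    losses = [
        max(0.0, margin - scores[i][i] + scores[i][n])
        for i, n in enumerate(negatives)
    ]
    return sum(losses) / len(losses)
```

For audio-to-text retrieval, `score_matrix(audio_embs, text_embs)` ranks caption negatives for each audio anchor; swapping the arguments ranks audio negatives for each caption. Passing `score_matrix(text_embs, text_embs)` to `hard_negatives` gives a within-modality variant that picks the most similar other caption. With within-modality scores the diagonal is a self-similarity of about 1, so almost every other candidate qualifies as semi-hard and this particular rule behaves much like hard selection; it is mainly meaningful for cross-modality scores. The hard rule can return a negative that outscores the positive itself, while the semi-hard rule only admits negatives scored below the positive; the abstract reports feature collapse for the former and improved retrieval for the latter when cross-modality scores are used.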
