HEAR: Holistic Evaluation of Audio Representations

Paper title

HEAR: Holistic Evaluation of Audio Representations

Authors

Turian, Joseph; Shier, Jordie; Khan, Humair Raj; Raj, Bhiksha; Schuller, Björn W.; Steinmetz, Christian J.; Malloy, Colin; Tzanetakis, George; Velarde, Gissel; McNally, Kirk; Henry, Max; Pinto, Nicolas; Noufi, Camille; Clough, Christian; Herremans, Dorien; Fonseca, Eduardo; Engel, Jesse; Salamon, Justin; Esling, Philippe; Manocha, Pranay; Watanabe, Shinji; Jin, Zeyu; Bisk, Yonatan

Abstract

What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.
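The abstract says every submission followed a common API. The sketch below mirrors the shape of that API as we understand it from the public HEAR 2021 rules: a `load_model` function, a model exposing `sample_rate`, `scene_embedding_size` and `timestamp_embedding_size`, and two functions, `get_timestamp_embeddings` (per-frame embeddings plus their timestamps in milliseconds) and `get_scene_embeddings` (one vector per clip). Treat those names and shapes as assumptions to check against the official specification. The real API passes PyTorch tensors, while this version uses NumPy arrays, and the "model" is a hypothetical, untrained log band-energy extractor (the `ToyModel` class, its frame and hop sizes, and its features are illustrative only, not any submitted model).

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class ToyModel:
    # Attributes the HEAR API (as we understand it) expects a model to expose.
    sample_rate: int = 16000
    scene_embedding_size: int = 8
    timestamp_embedding_size: int = 8
    # Hypothetical framing choices: 25 ms windows, 10 ms hop at 16 kHz.
    frame_len: int = 400
    hop_len: int = 160


def load_model(model_file_path: str = "") -> ToyModel:
    # A real submission would load trained weights from model_file_path;
    # this toy model has no weights, so the path is ignored.
    return ToyModel()


def get_timestamp_embeddings(audio, model: ToyModel):
    """audio: (n_sounds, n_samples) at model.sample_rate.

    Returns (embeddings, timestamps) with shapes
    (n_sounds, n_frames, timestamp_embedding_size) and (n_sounds, n_frames);
    timestamps are frame centres in milliseconds.
    """
    audio = np.asarray(audio, dtype=np.float64)
    n_sounds, n_samples = audio.shape
    # Zero-pad clips shorter than one frame so every clip yields >= 1 frame.
    if n_samples < model.frame_len:
        audio = np.pad(audio, ((0, 0), (0, model.frame_len - n_samples)))
        n_samples = model.frame_len
    n_frames = 1 + (n_samples - model.frame_len) // model.hop_len
    starts = np.arange(n_frames) * model.hop_len
    # Gather overlapping frames: (n_sounds, n_frames, frame_len).
    idx = starts[:, None] + np.arange(model.frame_len)[None, :]
    frames = audio[:, idx] * np.hanning(model.frame_len)
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    # Pool the power spectrum into equal-width bands and take the log.
    bands = np.array_split(power, model.timestamp_embedding_size, axis=-1)
    embeddings = np.stack([np.log(b.mean(axis=-1) + 1e-10) for b in bands], axis=-1)
    centres_ms = (starts + model.frame_len / 2) / model.sample_rate * 1000.0
    timestamps = np.tile(centres_ms, (n_sounds, 1))
    return embeddings, timestamps


def get_scene_embeddings(audio, model: ToyModel):
    # One vector per clip: mean-pool the timestamp embeddings over time.
    embeddings, _ = get_timestamp_embeddings(audio, model)
    return embeddings.mean(axis=1)


if __name__ == "__main__":
    model = load_model()
    audio = np.random.default_rng(0).standard_normal((2, model.sample_rate))
    emb, ts = get_timestamp_embeddings(audio, model)
    print(emb.shape, ts.shape, get_scene_embeddings(audio, model).shape)
```

Mean pooling over time is only one simple way to derive scene embeddings from timestamp embeddings. To our understanding, HEAR keeps these outputs frozen and trains only a small downstream predictor on them, which is what "without fine-tuning" refers to.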