Representation Learning for the Automatic Indexing of Sound Effects Libraries

Paper Title

Representation Learning for the Automatic Indexing of Sound Effects Libraries

Authors

Ma, Alison B., Lerch, Alexander

Abstract

Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy updates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overcome dataset-dependent limitations that inhibit the successful training of deep learning models, we pursue representation learning to train generalized embeddings that can be used for a wide variety of sound effects libraries and are a taxonomy-agnostic representation of sound. We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size, outperforming established representations such as OpenL3. Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
