Paper title
GAFX: A General Audio Feature eXtractor
Authors
Abstract
Most machine learning models for audio tasks operate on a handcrafted feature, the spectrogram. However, it remains unknown whether the spectrogram can be replaced with deep-learning-based features. In this paper, we address this question by comparing different learnable neural feature extractors with a successful spectrogram-based model, and we propose a General Audio Feature eXtractor (GAFX) based on a dual U-Net (GAFX-U), a ResNet (GAFX-R), and Attention (GAFX-A) modules. We evaluate the model on music genre classification using the GTZAN dataset and perform a detailed ablation study of different configurations of our framework. Our model GAFX-U, combined with the Audio Spectrogram Transformer (AST) classifier, achieves competitive performance.