A novel audio representation using space filling curves

Paper title

A novel audio representation using space filling curves

Authors

Mari, Alessandro; Salarian, Arash

Abstract

Because convolutional neural networks (CNNs) have revolutionized the field of image processing, they have also been widely applied to audio. A common approach is to convert the one-dimensional audio time series into a two-dimensional image using a time-frequency decomposition, and it is also common to discard the phase information. In this paper, we propose to map one-dimensional audio waveforms to two-dimensional images using space filling curves (SFCs). These mappings preserve the local structure of the input signal without compressing it. Moreover, the mappings benefit from progress in deep learning and from the large collection of existing computer vision networks. We test eight SFCs on two keyword spotting problems. We show that the Z curve yields the best results owing to its shift equivariance under convolution operations. Additionally, across multiple CNNs, the Z curve produces results comparable to those obtained with the widely used mel-frequency cepstral coefficients (MFCCs).
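To make the waveform-to-image idea concrete, here is a minimal sketch of a Z-curve (Morton order) mapping in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the grid is assumed square with side 2^order, each sample index is split into (row, column) by de-interleaving its bits (even bits to the column, odd bits to the row), and clips shorter than the grid are assumed to be zero-padded. The function names `morton_decode`, `z_curve_image` and `image_to_signal` are hypothetical.

```python
"""Z-curve (Morton order) mapping between a 1-D waveform and a 2-D image.

Illustrative sketch only: the bit convention and zero-padding are assumptions.
"""


def morton_decode(index: int, order: int) -> tuple[int, int]:
    """Return (row, col) for a Z-curve index on a 2**order x 2**order grid.

    Even-position bits of `index` form the column, odd-position bits the row
    (an assumed convention; swapping them gives the transposed Z curve).
    """
    row = col = 0
    for b in range(order):
        col |= ((index >> (2 * b)) & 1) << b
        row |= ((index >> (2 * b + 1)) & 1) << b
    return row, col


def z_curve_image(signal: list[float], order: int) -> list[list[float]]:
    """Place each sample of `signal` at its Z-curve position (no compression)."""
    side = 1 << order
    capacity = side * side
    if len(signal) > capacity:
        raise ValueError(f"signal has {len(signal)} samples; grid holds {capacity}")
    # Assumed policy: zero-pad short clips so they fill the square grid.
    padded = list(signal) + [0.0] * (capacity - len(signal))
    image = [[0.0] * side for _ in range(side)]
    for i, sample in enumerate(padded):
        r, c = morton_decode(i, order)
        image[r][c] = sample
    return image


def image_to_signal(image: list[list[float]], order: int, length: int) -> list[float]:
    """Invert z_curve_image by reading pixels back in Z-curve order."""
    side = 1 << order
    samples = []
    for i in range(side * side):
        r, c = morton_decode(i, order)
        samples.append(image[r][c])
    return samples[:length]


if __name__ == "__main__":
    # 16 samples on a 4 x 4 grid (order 2): each aligned 2 x 2 block
    # holds 4 consecutive samples.
    for row in z_curve_image([float(i) for i in range(16)], order=2):
        print(row)
    # prints rows [0.0, 1.0, 4.0, 5.0], [2.0, 3.0, 6.0, 7.0],
    #             [8.0, 9.0, 12.0, 13.0], [10.0, 11.0, 14.0, 15.0]
```

With order 7 the grid has 128 × 128 = 16,384 pixels, enough for a one-second clip sampled at 16 kHz (16,000 samples) plus 384 padding zeros. Because every sample is copied to exactly one pixel, `image_to_signal` recovers the original waveform exactly, which is the sense in which such a mapping does not compress the input.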
