Blind Restoration of Real-World Audio by 1D Operational GANs

Paper Title

Blind Restoration of Real-World Audio by 1D Operational GANs

Authors

Ince, Turker; Kiranyaz, Serkan; Devecioglu, Ozer Can; Khan, Muhammad Salman; Chowdhury, Muhammad; Gabbouj, Moncef

Abstract

Objective: Although numerous audio restoration studies have been proposed in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation and ignore other artifacts. Moreover, it is common practice to assume a noisy or reverberant environment with a limited number of fixed signal-to-distortion ratio (SDR) levels. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture, with varying types, severities, and durations. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics, to enhance the quality of the restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with a generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets, which are corrupted with a random blend of artifacts, each with a random severity, to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneering study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio while achieving an unprecedented level of performance over a wide range of SDR levels and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source code and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository.
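
The results in the abstract are reported as SDR improvements in dB. As a rough illustration of what such a figure measures, the following is a minimal sketch that assumes the plain energy-ratio definition of SDR, 10·log10(‖s‖² / ‖s − ŝ‖²); the abstract does not state which SDR variant the authors use, and the function names `sdr_db` and `sdr_improvement_db` are hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB (assumed plain energy-ratio form)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    distortion = reference - estimate
    # eps guards against division by zero and log of zero.
    signal_energy = np.sum(reference ** 2) + eps
    distortion_energy = np.sum(distortion ** 2) + eps
    return 10.0 * np.log10(signal_energy / distortion_energy)


def sdr_improvement_db(reference, corrupted, restored):
    """SDR gain of the restored signal over the corrupted input, in dB."""
    return sdr_db(reference, restored) - sdr_db(reference, corrupted)


# Toy signals only; these are not data from the paper.
clean = np.ones(100)
corrupted = clean + 0.1   # larger distortion
restored = clean + 0.01   # smaller distortion after a hypothetical restoration
gain = sdr_improvement_db(clean, corrupted, restored)
```

Under this definition, an improvement of about 7 dB corresponds to the distortion energy dropping by roughly a factor of five relative to the corrupted input.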
