Paper Title

Blind Restoration of Real-World Audio by 1D Operational GANs

Authors

Ince, Turker; Kiranyaz, Serkan; Devecioglu, Ozer Can; Khan, Muhammad Salman; Chowdhury, Muhammad; Gabbouj, Moncef

Abstract

Objective: Although numerous studies on audio restoration have been proposed in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation and ignore other artifacts. Moreover, it is common practice to assume a noisy or reverberant environment with a limited number of fixed signal-to-distortion ratio (SDR) levels. However, real-world audio is often corrupted by a blend of artifacts, such as reverberation, sensor noise, and background audio mixture, with varying types, severities, and durations. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of the restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with a generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets, which are corrupted with a random blend of artifacts, each with a random severity, to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneering study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio while achieving an unprecedented level of performance for a wide SDR range and a wide variety of artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source code and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository.
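The abstract reports its results as average SDR improvement in dB. As a rough illustration of how such a figure can be read, the sketch below computes a plain energy-ratio SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), and the gain of a restored signal over its corrupted input. This definition and the helper names `sdr_db` and `sdr_improvement_db` are assumptions made for illustration; the abstract does not state which SDR variant the authors use, and the code is not taken from their repository.

```python
import math


def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB (assumed definition, see lead-in)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent; SDR is undefined")
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # estimate matches the reference exactly
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_improvement_db(reference, corrupted, restored):
    """SDR gain of the restored signal over the corrupted input, in dB."""
    return sdr_db(reference, restored) - sdr_db(reference, corrupted)


# Hypothetical toy signals: a clean sine plus a deterministic disturbance.
clean = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
disturbance = [(-1) ** n for n in range(100)]
corrupted = [c + 0.5 * d for c, d in zip(clean, disturbance)]
restored = [c + 0.05 * d for c, d in zip(clean, disturbance)]

# Shrinking the error amplitude tenfold corresponds to a 20 dB gain.
print(f"SDR improvement: {sdr_improvement_db(clean, corrupted, restored):.2f} dB")
```

On this scale, an improvement of about 7 dB corresponds to reducing the residual error energy by roughly a factor of five.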
