GACELA: A generative adversarial context encoder for long audio inpainting

Paper title

GACELA -- A generative adversarial context encoder for long audio inpainting

Authors

Andres Marafioti, Piotr Majdak, Nicki Holighaus, Nathanaël Perraudin

Abstract

We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with a duration ranging from hundreds of milliseconds to a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gaps in two aspects. First, it considers various time scales of audio information by relying on five parallel discriminators with increasing resolution of receptive fields. Second, it is conditioned not only on the available information surrounding the gap, i.e., the context, but also on the latent variable of the conditional GAN. This addresses the inherent multi-modality of audio inpainting at such long gaps and provides the option of user-defined inpainting. GACELA was evaluated in listening tests on music signals of varying complexity and gap durations ranging from 375 ms to 1500 ms. While our subjects were often able to detect the inpaintings, the severity of the artifacts decreased from unacceptable to mildly disturbing. GACELA represents a framework capable of integrating future improvements such as processing of more auditory-related features or more explicit musical features.

Paper title