Paper Title
From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music
Paper Authors
Paper Abstract
Music is often experienced as a progression of concurrent streams of notes, or voices. The degree to which this happens depends on the position along a voice-leading continuum, ranging from monophonic, to homophonic, to polyphonic, which complicates the design of automatic voice separation models. We address this continuum by defining voice separation as the task of decomposing music into streams that exhibit both a high degree of external perceptual separation from the other streams and a high degree of internal perceptual consistency. The proposed voice separation task allows for a voice to diverge into multiple voices and also for multiple voices to converge into the same voice. Equipped with this flexible task definition, we manually annotated a corpus of popular music and used it to train neural networks that assign notes to voices either separately for each note in a chord (note-level) or jointly for all notes in a chord (chord-level). The trained neural models greedily assign notes to voices in a left-to-right traversal of the input chord sequence, using a diverse set of perceptually informed input features. When evaluated on the extraction of consecutive within-voice note pairs, both models surpass a strong baseline based on an iterative application of an envelope extraction function, with the chord-level model consistently edging out the note-level model. The two models are also shown to outperform previous approaches on separating the voices in Bach music.
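To make the greedy left-to-right assignment concrete, the following is a minimal sketch, not the paper's trained model: it walks the chord sequence in temporal order and assigns each note to the closest available voice by pitch proximity, a simple stand-in for the learned, perceptually informed scoring function. The function name, the pitch-distance threshold, and the one-note-per-voice-per-chord constraint are all illustrative assumptions.

```python
def separate_voices(chords, max_leap=7):
    """Greedy left-to-right voice separation sketch.

    chords: list of chords in temporal order, each a list of MIDI pitches.
    max_leap: assumed pitch-distance threshold beyond which a new voice
              is started (a crude proxy for perceptual stream separation).
    Returns a list of voices, each a list of (chord_index, pitch) pairs.
    """
    voices = []
    for t, chord in enumerate(chords):
        used = set()  # voices already extended at this chord
        for pitch in sorted(chord):
            # Score candidate voices by proximity of their last pitch,
            # standing in for the model's learned assignment score.
            best, best_dist = None, None
            for v in voices:
                if id(v) in used:
                    continue  # at most one note per voice per chord
                dist = abs(v[-1][1] - pitch)
                if best is None or dist < best_dist:
                    best, best_dist = v, dist
            # Start a new voice if no existing voice is close enough.
            if best is None or best_dist > max_leap:
                best = []
                voices.append(best)
            best.append((t, pitch))
            used.add(id(best))
    return voices
```

For example, `separate_voices([[60, 67], [62, 65]])` yields two voices, one following 60→62 and the other 67→65, mirroring how the models track streams across consecutive chords.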