Version Control of Speaker Recognition Systems

Paper Title

Version Control of Speaker Recognition Systems

Authors

Wang, Quan; Moreno, Ignacio Lopez

Abstract

This paper discusses one of the most challenging practical engineering problems in speaker recognition systems: the version control of models and user profiles. A typical speaker recognition system consists of two stages: the enrollment stage, where a profile is generated from user-provided enrollment audio; and the runtime stage, where the voice identity of the runtime audio is compared against the stored profiles. As technology advances, the speaker recognition system needs to be updated for better performance. However, if the stored user profiles are not updated accordingly, the version mismatch will produce meaningless recognition results. In this paper, we describe different version control strategies for speaker recognition systems that have been carefully studied at Google over years of engineering practice. These strategies are categorized into three groups according to how they are deployed in the production environment: device-side deployment, server-side deployment, and hybrid deployment. To compare different strategies with quantitative metrics under various network configurations, we present SpeakerVerSim, an easily extensible Python-based simulation framework for different server-side deployment strategies of speaker recognition systems.
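The version mismatch problem described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or from SpeakerVerSim: the names `Profile` and `verify`, the use of cosine similarity as the comparison score, and the threshold of 0.7 are all hypothetical. The sketch only shows the idea that a stored profile should carry the version of the model that produced it, so that a comparison across versions is refused rather than scored.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Profile:
    """Hypothetical stored user profile, tagged with the model version that produced it."""
    model_version: str
    embedding: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(profile: Profile, runtime_version: str,
           runtime_embedding: List[float], threshold: float = 0.7) -> bool:
    """Compare a runtime embedding against a stored profile.

    Raises ValueError on a version mismatch instead of returning a score,
    since embeddings from different model versions are assumed not to be
    comparable. The threshold value is an arbitrary illustrative choice.
    """
    if profile.model_version != runtime_version:
        raise ValueError(
            f"profile version {profile.model_version!r} does not match "
            f"runtime model version {runtime_version!r}"
        )
    return cosine_similarity(profile.embedding, runtime_embedding) >= threshold


# Enrollment stage: a profile is created with the model version in use.
enrolled = Profile(model_version="v1", embedding=[1.0, 0.0])

# Runtime stage: the same model version yields a meaningful comparison.
same_version_result = verify(enrolled, "v1", [0.9, 0.1])
```

In this sketch a caller that receives the `ValueError` would have to decide how to recover, for example by re-enrolling the user or by keeping the older model available. Choosing among such options is the kind of question the deployment strategies in the paper address.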
